Database

Scientific data management is emerging as a key enabling technology for research and collaboration.

“NSF envisions a world in which digital science and engineering data are routinely deposited in convenient repositories, can be readily discovered in well-documented form by specialists and non-specialists alike, are open and accessible, and are reliably preserved.”

NSF Cyberinfrastructure Vision for 21st Century Discovery

RDBMS

Center staff have expertise in Relational Database Management Systems used to create and retrieve data, such as PostgreSQL, MySQL, and Microsoft SQL Server.

Petabyte Databases

SQL technologies can meet the data needs of groups ranging from small research teams to large-scale archives holding hundreds of terabytes or petabytes. One example is Web Lab, a joint project of Cornell University and the Internet Archive that is part of the NSF-funded Petabyte Storage Devices for Data-Driven Science. The challenge of transferring and managing very large data sets is described in “Building a Research Library for the History of the Web.” Instrument data from the Arecibo sky survey and the CLEO high-energy particle physics experiment are other examples of large-scale data flows.

Relational Databases for Engineering

Driving engineering simulations with relational database backends rather than flat files can reduce I/O errors and provide other advantages. Anthony Ingraffea is a leader in the application of databases for engineering and has used this approach on NSF-funded multiscale materials modeling projects.
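
As a rough illustration of the idea, the sketch below stores simulation inputs and outputs in relational tables instead of flat files. It uses SQLite from the Python standard library purely as a stand-in; the table names (runs, results), the columns (mesh_size, load_newtons, max_stress), and the helper functions are hypothetical and are not taken from any CAC or Cornell project. It assumes a simulation that emits one stress value per step.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open a SQLite database and create the (hypothetical) schema."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # enforce run_id references
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS runs (
            run_id       INTEGER PRIMARY KEY,
            mesh_size    REAL NOT NULL CHECK (mesh_size > 0),
            load_newtons REAL NOT NULL
        );
        CREATE TABLE IF NOT EXISTS results (
            run_id     INTEGER NOT NULL REFERENCES runs(run_id),
            step       INTEGER NOT NULL,
            max_stress REAL NOT NULL,
            PRIMARY KEY (run_id, step)
        );
        """
    )
    return conn


def record_run(conn, mesh_size, load_newtons, stresses):
    """Store one run and its per-step results in a single transaction."""
    with conn:  # commits on success, rolls back if any insert fails
        cur = conn.execute(
            "INSERT INTO runs (mesh_size, load_newtons) VALUES (?, ?)",
            (mesh_size, load_newtons),
        )
        run_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO results (run_id, step, max_stress) VALUES (?, ?, ?)",
            [(run_id, step, s) for step, s in enumerate(stresses)],
        )
    return run_id


def peak_stress(conn, run_id):
    """Return the largest recorded stress for a run."""
    row = conn.execute(
        "SELECT MAX(max_stress) FROM results WHERE run_id = ?", (run_id,)
    ).fetchone()
    return row[0]
```

In this sketch the schema constraints reject malformed values at write time, and the transaction ensures that a run is stored either completely or not at all. A partially written flat file offers neither guarantee by default, which is one plausible reading of the reduced I/O errors mentioned above.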

Low Latency, High-Throughput Databases

CAC staff have designed a database solution that effectively masks latencies by using SQL and a Web services front-end to “push” data out to the compute nodes. This solution is well suited to high-throughput applications in fields such as finance and the life sciences.

Database Research

Cornell database research is focused on areas such as database systems, digital libraries and Web information, and data mining.

CAC supports collaborative research projects in emerging information technologies, particularly in areas that impact the design of effective cyberinfrastructure for scientific research, data preservation, and discovery.