|ESnet: It’s Everywhere You Want Your Data To Be|
Although it’s defined by DOE as a national user facility just as the ALS is, the Energy Sciences Network (ESnet) doesn’t quite fit the image of a centrally located facility serving a specific set of users. Rather, ESnet is a nationwide network that provides high-bandwidth, reliable connectivity linking tens of thousands of scientists at more than 40 DOE labs and facilities. The systems and services provided by ESnet staff advance research by helping scientists share their ideas, their data and their discoveries with collaborators and peers around the world.
Managed by Lawrence Berkeley National Laboratory (LBNL), ESnet will soon take bandwidth to the next level as it rolls out the world’s first 100 gigabits per second (100 Gbps) network by the end of 2012. According to ESnet Director Greg Bell, the scientific data resulting from experiments at DOE’s particle accelerators, light sources and genome sequencing facilities are pushing the limits of ESnet’s current 10 Gbps network.
“We never want the network to be a gating function for scientific discovery,” Bell said.
According to ALS Scientist Dula Parkinson, a new fast camera installed at the hard x-ray tomography Beamline 8.3.2 at the ALS last year allows scientists to study a variety of structures as a function of time—from bones to rocks, and even metallic alloys—in unprecedented detail. The new camera produces data at a rate of 300 megabytes per second, which is 50 times faster than the one it replaced.
“Two years ago the hard x-ray tomography beamline at Berkeley Lab’s ALS generated about 100 gigabytes of data per week, but we got a faster camera and now we are generating anywhere from 2 to 5 terabytes of data per week,” says Parkinson. “This is pushing the limit of what our current infrastructure can handle.”
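The jump Parkinson describes is easy to verify with back-of-envelope arithmetic: at 300 megabytes per second, a few hours of sustained acquisition per week lands squarely in the terabyte range he quotes. A minimal sketch (the camera rate and weekly totals come from the article; the hours of sustained acquisition per week are an illustrative assumption):

```python
# Back-of-envelope check of the data volumes quoted in the article.
# The 300 MB/s camera rate and the "50 times faster" comparison are
# from the text; the weekly duty cycle is a hypothetical illustration.

CAMERA_RATE_MB_S = 300                   # new fast camera: 300 MB/s
OLD_RATE_MB_S = CAMERA_RATE_MB_S / 50    # "50 times faster than the one it replaced"

def weekly_terabytes(rate_mb_s, hours_per_week):
    """Terabytes produced per week at a given sustained rate."""
    return rate_mb_s * 3600 * hours_per_week / 1e6   # 1 TB = 1e6 MB

# Hypothetical: roughly 2-5 hours of sustained acquisition per week
# reproduces the 2-5 TB/week range Parkinson quotes.
print(f"{weekly_terabytes(CAMERA_RATE_MB_S, 2):.2f} TB/week")   # ~2.16
print(f"{weekly_terabytes(CAMERA_RATE_MB_S, 5):.2f} TB/week")   # ~5.40
```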
According to Parkinson, in the current system a typical ALS user will create a folder on a data storage server connected to the instrument and save their raw data to this folder. In many cases, users do some initial processing on desktop computers at the ALS and save these files on the facility’s storage server. Upon leaving the facility, researchers copy their data onto an external hard drive and carry it home for further analysis. The files and raw data initially saved on the ALS storage server are typically left behind for the facility’s staff to manage. Keeping up with this torrent of data requires new methods for moving, storing, and analyzing it.
For the first step, ESnet staff helped Parkinson set up a data transfer node (tuned for optimal performance) and LBNL networking staff helped deploy a 10 gigabits-per-second (10 Gbps) switch, giving Parkinson’s data a high-performance path through the LBNL network (LBLnet) to ESnet. This approach is an example of ESnet’s “Science DMZ” model, where data-intensive science applications are run on dedicated infrastructure configured for high performance.
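Simple arithmetic also shows why a dedicated 10 Gbps path matters for data at this scale. A sketch of the transfer times involved (the 80% link-utilization figure is an assumption for illustration; real transfers rarely sustain full line rate):

```python
# How long a week's worth of tomography data takes to move at different
# link speeds. The utilization factor is a hypothetical assumption, not
# a figure from the article.

def transfer_hours(terabytes, link_gbps, utilization=0.8):
    """Hours to move `terabytes` over a `link_gbps` link, assuming the
    transfer sustains `utilization` of line rate."""
    bits = terabytes * 8e12                          # 1 TB = 8e12 bits
    seconds = bits / (link_gbps * 1e9 * utilization)
    return seconds / 3600

# 5 TB (the high end of the weekly volume quoted in the article):
for gbps in (1, 10, 100):
    print(f"5 TB over {gbps:>3} Gbps: {transfer_hours(5, gbps):.1f} h")
```

At a well-utilized 10 Gbps, the week’s 5 TB moves in under two hours; over a typical 1 Gbps campus link it would take most of a working day, which is why the Science DMZ path bypasses general-purpose infrastructure.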
ESnet carries the ALS data to DOE’s National Energy Research Scientific Computing Center (NERSC) in Oakland, where the data is stored, managed and shared with other researchers. To pave the way, NERSC staff helped Parkinson with his data acquisition and data transfer workflow. Future plans include tapping into NERSC’s supercomputing resources to improve data analysis.
“This is a success story for the ALS, for LBLnet, for ESnet and for NERSC,” said ESnet network engineer Eli Dart, who worked with Parkinson. “This is a working example of the science infrastructure needed to support a new generation of data-intensive science experiments at X-ray light sources, neutron sources, and free-electron lasers.”