Google's Next-Gen of Sneakernet

How do you get 120 terabytes of data -- the equivalent of 123,000 iPod shuffles (roughly 30 million songs) -- from A to B? For the most part, the old-fashioned way: via a sneakernet. It's not glamorous, but Google engineers hope to at least ease the arduous process of transferring massive quantities of data -- which can literally take weeks to upload onto the internet -- with something affectionately called "FedExNet" by the scientists who use it.

Chris DiBona, the open-source program manager at Google, just returned late last week from Washington, D.C., where he met with Hubble researchers at the Space Telescope Science Institute to set the stage for what will be the largest data transfer for the project ever: The near totality of all the astronomical data and images that Hubble has ever collected -- about 120 terabytes.

The project comes out of DiBona's efforts last fall to put together an informal system in which Google acts as both a repository and courier for large data sets between teams of scientists. Now, he leads a team that sets up small form-factor PCs, hooked up to drive arrays that can store up to 3 terabytes of data.
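
For a sense of scale, the figures quoted above can be run through a quick back-of-envelope calculation. The sketch below is illustrative only: the function names are hypothetical, the 100-megabit-per-second link speed is an assumption that does not come from Google or the researchers, and it assumes decimal terabytes and that the whole 120-terabyte archive would travel on 3-terabyte arrays.

```python
import math

TB = 10**12  # bytes in one decimal terabyte (assumed unit)


def arrays_needed(total_tb: float, array_tb: float) -> int:
    """Number of drive arrays required to hold total_tb of data."""
    return math.ceil(total_tb / array_tb)


def upload_days(total_tb: float, link_mbps: float) -> float:
    """Days needed to push total_tb over a link sustaining link_mbps megabits/s."""
    bits = total_tb * TB * 8
    seconds = bits / (link_mbps * 10**6)
    return seconds / 86_400  # seconds per day


if __name__ == "__main__":
    # 120 TB archive, 3 TB per array (figures from the article)
    print(arrays_needed(120, 3))
    # 100 Mbit/s is an assumed link speed, not a figure from the article
    print(round(upload_days(120, 100)))
```

Under those assumptions, the archive fills 40 arrays, while the same data would tie up the assumed link for roughly 111 days of uninterrupted uploading.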

The process lightens the load, but it isn't simple: DiBona ships both the PC and array to teams of scientists at various research institutions, which then connect their local servers to the array via an eSATA connection. Once the data transfer is complete, the drives get sent straight back to Mountain View, where DiBona and others copy the data to Google's servers for archival purposes. The idea then is that if other scientists around the world needed access to such a large quantity of data, Google would simply reverse the process.

"Right now, we're just acting as a conduit," DiBona says. "We make a copy of it, and then we can use the hard drives for something else. They'll get banged around a little bit too much (to store the data directly on the drives). They're not intended to be a long-term storage medium -- they're like envelopes to us."

For now, the program is only working in one direction -- data being sent from the field straight back to Google. But that should change later this year. Also, for the time being, the data is largely limited to astronomical data, such as Arizona State University's nearly 6 terabytes of thermal infrared images of the surface of Mars.

Christian also said she has been working with Google to help the company create a new way to access their astronomical data: typing a star's name into a traditional search field simply won't do. This raises the question of what Google intends to do with such a large amount of data, beyond just lending a helping hand. While the company remains cagey about its future plans, it's conceivable that it may be working on a more science-oriented search engine, along the lines of Google Scholar.

By Cyrus Farivar
