Thursday, December 09, 2010

Software as a service for science

Greetings from the famous room N101 at the Australian National University, where Ian Foster (Argonne National Laboratory) is speaking on "What the cloud *really* means for science". He argues that while hardware as a service is useful for business, science needs more from cloud computing. One example is software as a service for the relatively mundane but time-consuming task of moving large amounts of data from one location to another and converting it into a format that can be processed. His example was Globus Online, a software-as-a-service offering built on Amazon's cloud service, which allows large amounts of scientific data to be moved from place to place:
We developed this service to address the challenges faced by researchers in moving, sharing, and archiving large volumes of data among distributed sites. With Globus Online, you hand off data movement tasks to a hosted service that manages the entire operation, monitoring performance and errors, retrying failed transfers, correcting problems automatically whenever possible, and reporting status to keep you informed so that you can focus on your research.

This example brings out some of the problems with cloud computing. It is not clear why the data needs to be moved from its storage location for processing at all. In theory, at least, it should be more efficient to send the algorithm to where the data is and then send only the result back. In most cases this would reduce the amount of data transferred by a factor of millions. The idea behind the science grid and Web Services is to distribute applications so that data does not need to be moved around.
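To make the arithmetic behind this argument concrete, here is a minimal Python sketch. The dataset, program and result sizes are hypothetical assumptions chosen for illustration, not figures from the talk; the point is only that when a result is much smaller than its input, shipping the program to the data moves far fewer bytes than shipping the data to the program.

```python
# Hypothetical sizes for illustration only; none of these come from the talk.
DATASET_BYTES = 10 * 10**12   # assume a 10 TB dataset held at the storage site
PROGRAM_BYTES = 5 * 10**6     # assume a 5 MB analysis program
RESULT_BYTES = 1 * 10**6      # assume a 1 MB summary result


def bytes_moved_ship_data(dataset: int) -> int:
    """Move the whole dataset to where the program runs."""
    return dataset


def bytes_moved_ship_code(program: int, result: int) -> int:
    """Send the program to the data, then send only the result back."""
    return program + result


data_way = bytes_moved_ship_data(DATASET_BYTES)
code_way = bytes_moved_ship_code(PROGRAM_BYTES, RESULT_BYTES)

print(f"Ship the data: {data_way:,} bytes")
print(f"Ship the code: {code_way:,} bytes")
print(f"Reduction:     about {data_way / code_way:,.0f}x")
```

With these assumed sizes the reduction is roughly 1.7 million times; the real ratio depends entirely on how much the analysis shrinks its input.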

Given that Globus Online is built on Amazon Web Services, the obvious approach would be to process the data using applications running in the Amazon cloud. However, Globus Online does not use the Amazon service for the data transfer itself; it uses it only to set up the process and then issue instructions to traditional, non-cloud applications. It would seem to make sense to take the next step and provide cloud-based applications, perhaps using web services.

The idea of using cloud services in place of specially acquired scientific supercomputers is an interesting one. Supercomputers traditionally used specially designed processors, but most now use off-the-shelf PC processors and chips from game consoles. Current research focuses on multiple custom high-speed interconnects between the chips. It would be interesting to see whether this could be applied to cloud computing generally.

Ian mentioned the "CSIRO Science Cloud", which was intriguing. CSIRO is researching cloud computing, but I had not heard of a specific cloud service.
