Scientific Applications and EC2 (or any other commercial IaaS provider)
>> Friday, August 21, 2009
Ian Foster makes a very good point about Ed Walker's article comparing the execution of HPC applications on EC2 and on supercomputers. Echoing his ideas: yes, supercomputers are faster than a cluster on EC2, but only if we look at execution time alone. If we take the total time into account, including the time spent waiting in the queue, then EC2 can compete with supercomputers. I totally agree.
But there are a few more points I'd like to mention here.
The NAS benchmark Ed uses contains HPC applications implemented with OpenMP, and most of the applications in this benchmark rely on the fast interconnects between the nodes of a supercomputer. So I don't think it's fair to compare execution times on an HPC cluster with those on EC2, which, I believe, has neither fast interconnects nor powerful machines.
Also, there are many more scientific applications that don't need or use fast interconnects. Quite a number of applications, in areas such as fusion simulation, fluid dynamics and weather simulation, are written using MPI. But that's not all. Thousands of scientists around the world run their applications on their desktops or on clusters of fewer than 10 commodity nodes. They don't need supercomputers to run their scientific experiments.
Not only that. There are lots of programs that are embarrassingly parallel and are implemented with a MapReduce-like programming model. Take Google's massive data-mining programs as an example. They don't necessarily need fast interconnects or the best computers either, yet they are still fast enough to work in real time.
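A toy word count shows why embarrassingly parallel work does not lean on fast interconnects: each map task reads only its own input, and the only communication is merging the partial results at the end. This is a minimal Python sketch of the MapReduce-like pattern, not Google's implementation; the function names `map_task`, `reduce_task` and `map_reduce` are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_task(document: str) -> Counter:
    # Each map task looks only at its own input, so tasks never
    # need to talk to each other while they run.
    return Counter(document.lower().split())


def reduce_task(a: Counter, b: Counter) -> Counter:
    # The only coordination step: merge two partial results.
    return a + b


def map_reduce(documents: list[str], workers: int = 4) -> Counter:
    # Threads keep this sketch self-contained; for real CPU-bound work
    # the map tasks would run as separate processes or on separate
    # commodity machines.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_task, documents))
    return reduce(reduce_task, partials, Counter())


docs = ["grid cloud grid", "cloud EC2", "EC2 grid"]
counts = map_reduce(docs)
print(counts)
```

Because the map tasks are independent, adding more (slower, loosely connected) machines shortens the map phase without any extra inter-node traffic.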
So it's very important to understand that scientific applications are not always HPC applications.
How many scientists will have access to Grid resources, and how many of those who have access to grids will get the chance to use them as and when they need to? For example, if a scientist has a paper deadline and needs to run a thousand jobs, he can either submit all thousand jobs to the grid and wait until they are done, or, if he has the money, create a few clusters on demand on EC2, divide his jobs among these clusters and get his work done.
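The "divide the jobs among the clusters" step can be sketched as a simple greedy schedule. Assuming every job is independent and occupies one node, sort the jobs longest first and hand each one to the currently least-loaded node. The job durations, the number of clusters and the nodes per cluster below are hypothetical, and a real run would also have to account for the time it takes to boot the instances.

```python
import heapq


def divide_jobs(job_hours: list[float], n_nodes: int) -> list[list[float]]:
    """Greedy longest-job-first split of independent jobs across nodes."""
    # Min-heap of (current load in hours, node index).
    loads = [(0.0, node) for node in range(n_nodes)]
    heapq.heapify(loads)
    assignment: list[list[float]] = [[] for _ in range(n_nodes)]
    # Place the longest jobs first, each on the least-loaded node so far.
    for hours in sorted(job_hours, reverse=True):
        load, node = heapq.heappop(loads)
        assignment[node].append(hours)
        heapq.heappush(loads, (load + hours, node))
    return assignment


# Hypothetical workload: 1000 independent jobs of 0.5 to 1.25 hours each.
jobs = [0.5 + (i % 4) * 0.25 for i in range(1000)]
clusters, nodes_per_cluster = 5, 8  # assumed on-demand EC2 setup
plan = divide_jobs(jobs, clusters * nodes_per_cluster)
finish_h = max(sum(node_jobs) for node_jobs in plan)
print(f"{len(jobs)} jobs on {clusters * nodes_per_cluster} nodes "
      f"finish in about {finish_h:.1f} h")
```

With any greedy list schedule like this one, the finish time is at most the average load per node plus the longest single job, so adding clusters shortens the wait roughly in proportion.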
We cannot simply dismiss on-demand IaaS resources as unsuitable for science. I think what we need is better scheduling algorithms that can schedule jobs onto grid, local and IaaS resources, taking into account user requirements and parameters such as total execution time.
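One way to read that suggestion: estimate the total time (queue wait plus execution) on each available resource, drop the options that violate the user's deadline or budget, and pick the one that finishes first. The minimal Python sketch below assumes all these estimates are known up front, which is the hard part in practice; the resource names, the numbers (including the EC2 cost) and the `choose` function are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Option:
    name: str
    queue_wait_h: float  # expected wait before the jobs start (assumed)
    run_time_h: float    # expected execution time on this resource (assumed)
    cost_usd: float      # what the user pays (0 for grid/local allocations)

    @property
    def total_h(self) -> float:
        # Total time the scientist actually experiences.
        return self.queue_wait_h + self.run_time_h


def choose(options: list[Option], deadline_h: float,
           budget_usd: float) -> Optional[Option]:
    # Keep only options that meet both user requirements.
    feasible = [o for o in options
                if o.total_h <= deadline_h and o.cost_usd <= budget_usd]
    if not feasible:
        return None
    # Prefer the earliest finish; break ties by lower cost.
    return min(feasible, key=lambda o: (o.total_h, o.cost_usd))


# Hypothetical estimates for the same batch of jobs.
options = [
    Option("local cluster", queue_wait_h=0.0, run_time_h=80.0, cost_usd=0.0),
    Option("grid", queue_wait_h=36.0, run_time_h=12.0, cost_usd=0.0),
    Option("EC2", queue_wait_h=0.25, run_time_h=22.0, cost_usd=400.0),
]

for budget in (0.0, 500.0):
    pick = choose(options, deadline_h=48.0, budget_usd=budget)
    print(budget, pick.name if pick else "nothing fits")
```

Note that the grid has the shortest execution time here, yet with enough budget the scheduler picks EC2 because it finishes first once queue time is counted.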
(One more note: Ian writes in his blog "before we conclude that EC2 is no good for science". I think he must have meant HPC applications, not scientific applications in general.)
See also my other post on a similar topic.
