Using OpenMPI 1.2.6 with on-demand clouds
>> Friday, January 30, 2009
OpenMPI seems to check that all the nodes are on the same subnet before it will run MPI jobs on them. I was shocked to read this observation about OpenMPI in this paper [1].
But when one requests several VMs from EC2, most of the time the VMs that boot are not on the same subnet. So this can be a showstopper if you are trying to use OpenMPI within clouds, especially the commercial ones.
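A quick way to see whether a freshly booted set of VMs would trip over this is to compare their private addresses before launching a job. The sketch below is only an illustration: the function name, the sample addresses, and the /24 prefix length are assumptions, and it does not reproduce whatever check OpenMPI itself performs.

```python
import ipaddress


def same_subnet(addresses, prefix_len=24):
    """Return True if every address falls in one network of the given prefix length.

    prefix_len is an assumption; use the netmask your instances actually report.
    """
    networks = {
        ipaddress.ip_interface(f"{addr}/{prefix_len}").network
        for addr in addresses
    }
    return len(networks) == 1


# Hypothetical private addresses, not taken from a real EC2 reservation.
nodes_ok = ["10.0.0.11", "10.0.0.42", "10.0.0.200"]
nodes_split = ["10.0.0.11", "10.0.7.42"]

print(same_subnet(nodes_ok))
print(same_subnet(nodes_split))
```

If the second kind of result comes back for your reservation, the nodes span more than one subnet at that prefix length.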
[1] : Cloud Computing for Parallel Scientific HPC Applications : Feasibility of Running Coupled Atmospheric-Ocean Climate Models on Amazon's EC2

5 comments:
That isn't so much a failing of MPI (which is optimised for HPC) as a limitation of EC2; other limitations include no multicast and low IO performance to virtual disks. If you want to run HPC apps on datacentres that suit you better, IBM, HP or, until recently Sun,
FWIW, the new version of Open MPI v1.3 fixes the problem of multiple TCP subnet connectivity.
As for the bad latency, I'll have to check into that, but I suspect that it is due to OMPI's polling progression model (i.e., it keeps polling sockets for progress rather than blocking in a select() or poll() -- there are long, deep reasons for this that I won't go into in a blog comment :-) ). We've long since talked about adopting a blocking model (which LAM is optimized for, as is its "fast path" send/receive path for benchmarks), but haven't gotten around to it because we've been focusing on the low latency / high bandwidth interconnects -- not TCP.
We'd love to have someone join our community to help improve our TCP support / include a blocking progression model... :-)
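For readers unfamiliar with the distinction between polling and blocking progression, here is a minimal Python sketch of the two styles on a plain socket pair. It is not Open MPI's or LAM's code; the function names, the 4096-byte buffer, and the 50 ms delay are all assumptions made for illustration.

```python
import select
import socket
import threading
import time


def delayed_send(sock, payload, delay):
    """Send payload after a short delay, standing in for a slow remote peer."""
    time.sleep(delay)
    sock.sendall(payload)


def recv_by_polling(sock):
    """Spin on a non-blocking socket until data arrives; returns (data, spins)."""
    sock.setblocking(False)
    spins = 0
    while True:
        try:
            return sock.recv(4096), spins
        except BlockingIOError:
            spins += 1  # nothing yet: keep the CPU busy and try again


def recv_by_blocking(sock):
    """Sleep inside select() until the socket is readable; returns (data, wakeups)."""
    sock.setblocking(True)
    readable, _, _ = select.select([sock], [], [])
    return readable[0].recv(4096), 1


def demo(receiver, delay=0.05):
    """Run one receive strategy against a peer that sends after `delay` seconds."""
    a, b = socket.socketpair()
    try:
        sender = threading.Thread(target=delayed_send, args=(a, b"ping", delay))
        sender.start()
        result = receiver(b)
        sender.join()
        return result
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    print("polling :", demo(recv_by_polling))
    print("blocking:", demo(recv_by_blocking))
```

The polling version retries the receive continuously while it waits, whereas the blocking version sleeps in the kernel until data is ready. Which one gives lower latency depends on the interconnect and on how the hosts are scheduled, which this toy does not measure.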
Steve, thanks for the comment.
I wasn't complaining about OpenMPI; it was more of a warning to people (including me) who are trying to use OpenMPI within EC2. EC2, or most clouds for that matter, will have tons of other problems when running existing scientific applications within them.
Jeff, thanks for the update.
I'm looking for a reasonably good and free MPI implementation to use within EC2. Any recommendations?
I don't know what is already available on EC2 vs. what you can supply for yourself; that paper seems to imply that you can use anything.
Open MPI v1.3 is certainly quite feature-full, but perhaps its latency is a bit high (again, per that paper). Another point in OMPI's favor is that your colleagues there at IU might be able to help you out with any problems you might encounter. ;-)
LAM is also a good choice, but it isn't really supported these days. It should "just work", but... If you do use LAM, make sure to take the latest beta that Brian put out; it contains very minor fixes for bit rot that has accumulated over the years since we moved to Open MPI.
I don't know much about the other MPIs; being an OMPI core developer (and previously being a core LAM/MPI developer), my bias is pretty clear. ;-)
I appreciate your comments very much and want to thank you.