Saturday, May 14, 2011

CIS(theta) 2010-2011 - Eureka, we did it! - Meeting XIV

Aim: 
Completing Step 4!

Attending: 
CIS(theta) 2010-2011: DavidG, RyanH

Absent: 
CIS(theta) 2010-2011: HerbertK, JoshG

Reading:
NA

Parallel Python
IPython
Large Integer number crunching Mersenne Primes
http://www.hoise.com/primeur/03/articles/weekly/AE-PR-01-04-37.html
Large Integer number crunching Beal Conjecture
http://www.bealconjecture.com/



InstantCluster Step 4: Software Stack (Completed)
By Jove, I think we've done it! OK, we had a really short meeting today, as everyone came late due to extra help and AP exams. Anyway, we actually got openMPI working last time and didn't even know it! All we had to do was add the IPs to a file called "machines" to get nearly 2700 MFLOPs using 2 cores per node on 3 nodes!

Now we should be able to scale the cluster up from 3 nodes to all 24 student boxes and, if performance scales roughly linearly, get over 20 GFLOPs. The trick was to use the 10.10.*.* IPs and not the 10.5.*.* ones. Maybe we could get the 3 servers in on this too for an additional 3 or more GFLOPs?
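The 20 GFLOPs figure is just a back-of-the-envelope extrapolation from our 3-node run. Here's a quick sketch of the arithmetic, assuming every one of the 24 student boxes contributes 2 cores like the 3 we tested, and that performance scales linearly with core count (we haven't measured either yet). The variable names are only for illustration:

```shell
#!/bin/sh
# Linear scaling estimate from the 3-node run.
# Assumes no communication overhead as nodes are added (optimistic!).
measured_mflops=2650   # roughly what we measured on 6 cores
measured_cores=6       # 2 cores per node x 3 nodes
boxes=24               # all the student boxes
cores_per_box=2        # assumption: every box matches the 3 we tested

total_cores=$((boxes * cores_per_box))
estimate_mflops=$((measured_mflops * total_cores / measured_cores))
echo "Estimated: $estimate_mflops MFLOPs on $total_cores cores"
```

That works out to a bit over 21 GFLOPs, so "over 20" only holds if the network doesn't eat much of the gain.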

BTW, we figured out how to set up the "machines" file by running ClusterByNight first, thanx Kevin Lynagh! That's also where we got the idea to run openMPI over openSSH. Sorry, but PXE boot just didn't work for us! We should also be able to add the octave/mpitb functionality from pelicanHPC, thanx Michael Creel!




We only installed gfortran on one box so we could compile flops.f on that node (let's call it the master node):
mpif77 -o flops flops.f

and copied "flops" to all the worker boxes (and used chmod 755 to make it executable) getting about 2650 MFLOPs on 6 cores over 3 nodes:
mpirun -np 6 --hostfile machines flops
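For anyone following along, the "machines" file is just a plain-text list of hosts, one per line. Here's a minimal sketch of generating one. The 10.10.*.* addresses are made-up placeholders (ours come from DHCP, so yours will differ), and the slots=2 suffix is Open MPI's optional hostfile syntax for saying how many processes a node should take; we simply listed the IPs. The function name and the machines.example filename are hypothetical:

```shell
#!/bin/sh
# Write an Open MPI style hostfile with 2 slots (cores) per node.
# The IPs passed in below are placeholders, NOT our real addresses.
make_machines() {
  for ip in "$@"; do
    echo "$ip slots=2"
  done
}

# Written to machines.example so a real "machines" file is not clobbered.
make_machines 10.10.0.11 10.10.0.12 10.10.0.13 > machines.example
cat machines.example
```

With 3 nodes at 2 slots each, -np 6 fills every listed core exactly once.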

What we messed up last time was making the "machines" file based on the 10.5.*.* static IPs we set up on eth1. When we listed all the 10.10.*.* DHCP IPs used on eth0, all worked fine! So, openMPI just defaults to eth0?
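On the eth0 question: as far as we can tell from the Open MPI docs, the TCP transport doesn't strictly default to eth0 but tries the interfaces it finds up, and the MCA parameter btl_tcp_if_include can pin it to one. Here's a sketch we have NOT tried on our cluster yet. The helper function is hypothetical and only prints the command so it can be eyeballed before running:

```shell
#!/bin/sh
# Hypothetical helper: print an mpirun command restricted to one interface.
# btl_tcp_if_include is an Open MPI MCA parameter; untested on our boxes.
mpirun_cmd() {
  iface="$1"   # e.g. eth0 (DHCP, 10.10.*.*) or eth1 (static, 10.5.*.*)
  np="$2"      # number of MPI processes to launch
  echo "mpirun --mca btl_tcp_if_include $iface -np $np --hostfile machines flops"
}

mpirun_cmd eth0 6
```

If pinning to eth1 this way makes the 10.5.*.* hostfile work, that would tell us the problem was interface selection rather than the IPs themselves.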

Well, it was almost fine. When the remote processes terminated, the master node hung. I think that's because the 10.5.*.* IPs are listed in the authorized_keys file. Maybe we can just edit that and make them 10.10.*.* too?
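Before editing anything, it's probably worth checking which ssh files actually mention the old addresses. Here's a minimal sketch, assuming the usual ~/.ssh file locations. Host addresses normally live in known_hosts, while authorized_keys mostly holds public keys, so the culprit may not be the file we guessed. The function name is made up:

```shell
#!/bin/sh
# Hypothetical helper: count lines in a file that mention a 10.5.*.* address.
# Note: hashed known_hosts entries will not show up in this search.
count_old_ips() {
  if [ -f "$1" ]; then
    # grep -c prints the number of matching lines (0 if none).
    grep -c '10\.5\.[0-9]*\.[0-9]*' "$1" || true
  else
    # Treat a missing file as having no matches.
    echo 0
  fi
}

for f in "$HOME/.ssh/authorized_keys" "$HOME/.ssh/known_hosts"; do
  echo "$f: $(count_old_ips "$f") line(s) with 10.5.*.* addresses"
done
```

This only reports; it doesn't change any files, so it's safe to run on the master node and the workers alike.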

Well, that's all for now, enjoy!
