So you've decided to run your simulations on a supercomputer, like Iridis at Southampton Uni. You can just schedule jobs and run them remotely, without having to run them one by one on your machine. Awesome. Well, not always. Using a cluster means you have to wait in a queue, and you don't immediately see if anything went wrong. Also, the resources you get at any one time are limited. If something does go wrong, you have to contact support and wait. If there is maintenance going on, you have to wait too. Is there a way of running all this stuff on your own machine, without the hassle? Sure there is.
Now, supercomputers are great if your simulation truly needs more cores than your own machine has. But what if your application is serial and you just want to run a number of copies in parallel to get results quicker? Basically, you can write a bash script that starts the jobs as background processes, run that script and leave your computer on. There is a good chance your machine will even run the jobs faster. For example, a particular simulation I am working on takes 75 minutes on Iridis but only 60 on my machine. Small difference? Sure, but multiply that by 100 or 500 runs and the gap gets bigger.
So here is how you do it:
1. (if applicable) get rid of Windows and install MacOS or Linux :)
2. Create your simulation so that it can be run from the terminal, e.g. build a C application or create a Java jar file. Make sure the application can write its output into a folder you specify; this is important when several instances run simultaneously, since otherwise they would overwrite each other's files.
3. Test your basic command to make sure the application runs from the terminal. It is a good idea to accept parameters on the command line, so that you don't need separate builds just to change a parameter value.
It is also good to be able to tell the application not to create any UI when it is run from the command line, which makes it much faster. For example, my Java application takes 75% of CPU when running with the UI but only 25% without it. The UI is useful for testing, but you don't need it if you just want a lot of results.
I made my Java application so that I can run
java -jar RobotForaging.jar true myFolderName1 30 50 500
where the first argument tells the app to run console-only (no UI), the second is the output folder, and the remaining arguments are simulation parameters.
4. Measure how many instances you can run at the same time without using up the whole computer. Do this by watching the running processes, for example in Activity Monitor on a Mac, where a process keeping one core fully busy is shown as 100% CPU.
For example, one of my simulations takes around 25% of CPU (RAM is not an issue for me at the moment), so I can have 9 simulations running at the same time on a 4-core machine and still use my computer for other stuff.
5. Create a schedule.sh file (or any other file name) in the same directory as your built app. Start a number of instances simultaneously by writing your basic command (with different parameters, perhaps) on separate lines and putting '&' at the end of each. This makes them all background processes. Then use 'wait' to wait for all of them to finish before the next batch starts. For example, I put this in my .sh file:
#!/bin/bash
# Batch 1: nine runs with tar300, each started in the background with '&'
java -jar RobotForaging.jar true e1_tar300_nsr60_nst10 30 50 300 60 10 1 &
java -jar RobotForaging.jar true e1_tar300_nsr60_nst15 30 50 300 60 15 1 &
java -jar RobotForaging.jar true e1_tar300_nsr60_nst30 30 50 300 60 30 1 &
java -jar RobotForaging.jar true e1_tar300_nsr120_nst15 30 50 300 120 15 1 &
java -jar RobotForaging.jar true e1_tar300_nsr120_nst30 30 50 300 120 30 1 &
java -jar RobotForaging.jar true e1_tar300_nsr120_nst45 30 50 300 120 45 1 &
java -jar RobotForaging.jar true e1_tar300_nsr180_nst30 30 50 300 180 30 1 &
java -jar RobotForaging.jar true e1_tar300_nsr180_nst45 30 50 300 180 45 1 &
java -jar RobotForaging.jar true e1_tar300_nsr180_nst60 30 50 300 180 60 1 &
# Block until all nine have finished before starting the next batch
wait
# Batch 2: the same nine combinations with tar500
java -jar RobotForaging.jar true e1_tar500_nsr60_nst10 30 50 500 60 10 1 &
java -jar RobotForaging.jar true e1_tar500_nsr60_nst15 30 50 500 60 15 1 &
java -jar RobotForaging.jar true e1_tar500_nsr60_nst30 30 50 500 60 30 1 &
java -jar RobotForaging.jar true e1_tar500_nsr120_nst15 30 50 500 120 15 1 &
java -jar RobotForaging.jar true e1_tar500_nsr120_nst30 30 50 500 120 30 1 &
java -jar RobotForaging.jar true e1_tar500_nsr120_nst45 30 50 500 120 45 1 &
java -jar RobotForaging.jar true e1_tar500_nsr180_nst30 30 50 500 180 30 1 &
java -jar RobotForaging.jar true e1_tar500_nsr180_nst45 30 50 500 180 45 1 &
java -jar RobotForaging.jar true e1_tar500_nsr180_nst60 30 50 500 180 60 1 &
# Wait again, so the script only returns once every run has finished
wait
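Both batches in the script above use the same nine (nsr, nst) combinations and differ only in the tar value, so the same schedule can also be generated with loops. This is a minimal sketch, assuming the argument order shown above (console-only flag, output folder, then 30 50 <tar> <nsr> <nst> 1). The helper names launch and run_grid and the DRY_RUN switch are hypothetical, and the sketch adds one thing the original script doesn't do: each run's console output goes to its own <folder>.log file, so nine background jobs don't interleave their output in the terminal.

```shell
#!/bin/bash
# Sketch: the same 18 runs as the hand-written schedule.sh, generated with loops.
# Assumes RobotForaging.jar takes: console-only flag, output folder,
# then 30 50 <tar> <nsr> <nst> 1, as in the script above.

JAR=RobotForaging.jar

# Start one simulation in the background, sending its console output to
# <folder>.log. If DRY_RUN is set, only print the command instead.
launch() {
  local folder=$1
  if [ -n "${DRY_RUN:-}" ]; then
    echo java -jar "$JAR" true "$@"
  else
    java -jar "$JAR" true "$@" > "$folder.log" 2>&1 &
  fi
}

# Two batches (tar 300, then 500), each with nine (nsr, nst) combinations.
run_grid() {
  local tar pair nsr nst
  for tar in 300 500; do
    for pair in 60:10 60:15 60:30 120:15 120:30 120:45 180:30 180:45 180:60; do
      nsr=${pair%%:*}   # part before the colon
      nst=${pair##*:}   # part after the colon
      launch "e1_tar${tar}_nsr${nsr}_nst${nst}" 30 50 "$tar" "$nsr" "$nst" 1
    done
    wait   # let the whole batch finish before starting the next one
  done
}
```

To use it as schedule.sh, add run_grid as the last line and run the file as before. Running 'DRY_RUN=1 bash schedule.sh' prints the 18 commands without starting anything, which is a cheap way to check the parameter grid before committing hours of CPU time.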
And that's it! Run schedule.sh from the terminal, for example with 'bash schedule.sh' (or make it executable once with 'chmod +x schedule.sh' and then run './schedule.sh'). Check Activity Monitor (or similar); you should see your processes running. It is always good to leave yourself a bit of free CPU and RAM so that your computer is not completely useless while your jobs run.
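To put a number on "a bit of free CPU", the reasoning from step 4 can be written down as a small calculation: total capacity is the number of logical cores times 100% (the Activity Monitor / top convention), minus what you want to keep for yourself, divided by what one simulation uses. This is a minimal sketch; the function names and the default reserve of one core are hypothetical choices, it only looks at CPU (as step 4 does), and the per-job percentage still has to come from your own measurement.

```shell
#!/bin/bash
# Sketch: estimate how many serial simulations to run side by side.
# CPU figures use the Activity Monitor / top convention: 100% = one busy logical core.

# Number of logical cores: nproc on Linux, sysctl on macOS.
logical_cores() {
  if command -v nproc >/dev/null 2>&1; then
    nproc
  else
    sysctl -n hw.ncpu
  fi
}

# max_jobs CORES PER_JOB_PCT [RESERVE_PCT]
# How many jobs of PER_JOB_PCT each fit into CORES*100%, after holding back
# RESERVE_PCT (default 100, i.e. one core) for everything else. Never below 1.
max_jobs() {
  local cores=$1 per_job=$2 reserve=${3:-100}
  local n=$(( (cores * 100 - reserve) / per_job ))
  if [ "$n" -lt 1 ]; then
    n=1
  fi
  echo "$n"
}
```

For example, max_jobs 4 25 prints 12, since (400 - 100) / 25 = 12; holding back more, as in max_jobs 4 25 175, gives 9. On your own machine, max_jobs "$(logical_cores)" 25 uses the detected core count. Note that nproc and hw.ncpu count logical cores, which can be twice the physical core count on processors with hyper-threading.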
For this particular set of simulations, Iridis let me complete around 30 simulations in about 15 hours (there was a lot of waiting in the queue; my jobs requested 50 minutes of run time each). Running them as background processes on my own machine, I get through 30 simulations in about 3 hours. If you are simply running serial applications side by side, check whether you can save some time too.
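To check whether your own machine actually saves time, it helps to record how long each run takes. Here is a minimal sketch of a hypothetical timed_run helper: it runs a command, then appends the run name, the elapsed time in seconds (measured with date +%s, so whole-second resolution) and the command's exit status to a log file.

```shell
#!/bin/bash
# Sketch: time each simulation so local and cluster run times can be compared.

# timed_run LOGFILE NAME COMMAND [ARGS...]
# Runs COMMAND, then appends "NAME ELAPSED_SECONDS EXIT_STATUS" to LOGFILE
# and returns COMMAND's exit status.
timed_run() {
  local log=$1 name=$2
  shift 2
  local start
  start=$(date +%s)
  "$@"
  local status=$?
  echo "$name $(( $(date +%s) - start )) $status" >> "$log"
  return "$status"
}
```

Each line of schedule.sh can then be prefixed with timed_run times.log <name>, keeping the trailing '&' (a backgrounded function still counts for 'wait'). Once everything has finished, times.log has one line per simulation, and a non-zero last column points straight at the runs that failed.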
Comments
[25/06/2013]