[Do you really need a supercomputer?]

Java C++ Added on 25/06/2013

So you've decided to use a supercomputer to run your simulations, like Iridis at Southampton Uni. You can just schedule jobs and run them remotely, without having to run them one by one on your machine. Awesome. Well, not always. Using a cluster means you have to wait in a queue and you don't immediately see if anything went wrong. Also, you only get a limited share of the resources at any one time. If something does go wrong, you have to contact support and wait. If there is maintenance going on, you have to wait too. Is there a way of running all the stuff on your machine, without the hassle? Sure there is.

Now, supercomputers are great if your simulation truly uses multiple cores that your own machine just doesn't have. But what if you have an application that is serial and you just want to run a number of copies in parallel to obtain results quicker? Basically, you can write a bash script to schedule background jobs, run that script and leave your computer on. There is a good chance that your machine can run the jobs even faster. For example, a particular simulation I am working on takes 75 minutes on Iridis but only 60 on my machine. Small difference? Sure, but multiply that by 100 or 500 runs and the gap gets bigger.

So here is how you do it:

1. (if applicable) get rid of Windows and install MacOS or Linux :)

2. Create your simulation so that it can be run from the terminal, e.g. build a C application or create a Java jar file. Make sure the application can output data into a specific folder. This is important if a number of them will run simultaneously, so that they don't overwrite each other's results.

3. Test your basic command, so that you know it runs from the terminal. It is a good idea to be able to specify parameters through the command line, so that you don't need separate builds if you just want to change a parameter value.

Also, it is good to be able to tell the application not to create any UI if run from the command line. It will make it much faster. For example, my Java application takes 75% of CPU when running with UI but only 25% if it runs without it. UI is good for testing, but you don't need it if you just want a lot of results.

I made my Java application so that I can run

java -jar RobotForaging.jar true myFolderName1 30 50 500

where the 1st argument tells the app it is console-only (no UI), the output folder is the 2nd argument and a number of parameters follow.
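When testing the basic command, it can help to see how long one run takes and whether it exited cleanly. The helper below is only a sketch and not part of my original setup: `time_run` is a made-up name, and it simply runs whatever command it is given and then reports the exit status and the elapsed whole seconds.

```shell
#!/bin/bash
# Sketch only: time_run is a hypothetical helper, not an existing tool.

# time_run COMMAND [ARGS...]
# Runs the command in the foreground, then reports how it went on stderr.
time_run() {
  local start=$SECONDS      # bash counts seconds since the shell started
  "$@"
  local status=$?           # exit status of the command we just ran
  echo "exit status $status after $(( SECONDS - start )) s" >&2
  return $status
}

# Example call (not executed here):
#   time_run java -jar RobotForaging.jar true myFolderName1 30 50 500
```

An exit status of 0 conventionally means the run finished without an error, assuming the application reports failures through its exit status.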

4. Measure how many applications you can run at the same time, without using up the computer completely. Do this by observing processes that run, for example by using Activity Monitor on a Mac:

  • Check how much CPU each job takes. Basically, you don't want all jobs on your computer (including other applications!) to add up to more than number_of_cores*100%, otherwise they will slow each other down.
  • Also check RAM usage. There should always be some free memory left.



For example, if one of my simulations takes around 25% of CPU (RAM is not an issue for me at the moment), I can have 9 simulations running at the same time on a 4-core machine and still use my computer for other stuff.
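The arithmetic behind that example can be written down as a short script. This is a rough sketch under my own assumptions: `detect_cores` and `max_jobs` are made-up names, and the 175% held back for other applications is just the figure that makes the numbers above come out (9 jobs at 25% each use 225% of the 400% available on 4 cores).

```shell
#!/bin/bash
# Sketch only: detect_cores and max_jobs are hypothetical helper names.

# Number of logical cores; falls back to 1 if getconf cannot tell us.
detect_cores() {
  getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1
}

# max_jobs CORES PER_JOB_CPU RESERVE_CPU
# CPU figures are percentages where one fully used core counts as 100.
max_jobs() {
  local cores=$1 per_job=$2 reserve=$3
  local n=$(( (cores * 100 - reserve) / per_job ))
  if [ "$n" -lt 0 ]; then
    n=0                     # the reserve alone already exceeds the machine
  fi
  echo "$n"
}

cores=$(detect_cores)
# 25 and 175 are example values; measure your own job before trusting them.
echo "Cores: $cores, jobs that fit: $(max_jobs "$cores" 25 175)"
```

For a 4-core machine, `max_jobs 4 25 175` gives 9. The result is only as good as the per-job CPU figure you measured, and it ignores RAM entirely.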

5. Create a schedule.sh file (or use any other file name) in the same directory as your built app. Schedule a number of applications to run simultaneously by using your basic command (with different parameters perhaps) and putting '&' after it. This makes them all background processes. Then use 'wait' to wait for all of them to finish before the next batch is scheduled. For example, I put this in my .sh file:

#!/bin/bash

# Batch 1 (folders e1_tar300_*): the trailing '&' sends each run to the background
java -jar RobotForaging.jar true e1_tar300_nsr60_nst10 30 50 300 60 10 1 &
java -jar RobotForaging.jar true e1_tar300_nsr60_nst15 30 50 300 60 15 1 &
java -jar RobotForaging.jar true e1_tar300_nsr60_nst30 30 50 300 60 30 1 &
java -jar RobotForaging.jar true e1_tar300_nsr120_nst15 30 50 300 120 15 1 &
java -jar RobotForaging.jar true e1_tar300_nsr120_nst30 30 50 300 120 30 1 &
java -jar RobotForaging.jar true e1_tar300_nsr120_nst45 30 50 300 120 45 1 &
java -jar RobotForaging.jar true e1_tar300_nsr180_nst30 30 50 300 180 30 1 &
java -jar RobotForaging.jar true e1_tar300_nsr180_nst45 30 50 300 180 45 1 &
java -jar RobotForaging.jar true e1_tar300_nsr180_nst60 30 50 300 180 60 1 &

# Block here until every job of batch 1 has finished
wait

# Batch 2 (folders e1_tar500_*)
java -jar RobotForaging.jar true e1_tar500_nsr60_nst10 30 50 500 60 10 1 &
java -jar RobotForaging.jar true e1_tar500_nsr60_nst15 30 50 500 60 15 1 &
java -jar RobotForaging.jar true e1_tar500_nsr60_nst30 30 50 500 60 30 1 &
java -jar RobotForaging.jar true e1_tar500_nsr120_nst15 30 50 500 120 15 1 &
java -jar RobotForaging.jar true e1_tar500_nsr120_nst30 30 50 500 120 30 1 &
java -jar RobotForaging.jar true e1_tar500_nsr120_nst45 30 50 500 120 45 1 &
java -jar RobotForaging.jar true e1_tar500_nsr180_nst30 30 50 500 180 30 1 &
java -jar RobotForaging.jar true e1_tar500_nsr180_nst45 30 50 500 180 45 1 &
java -jar RobotForaging.jar true e1_tar500_nsr180_nst60 30 50 500 180 60 1 &


And that's it! Run schedule.sh from the terminal, for example with 'bash schedule.sh'. Check your Activity Monitor (or similar) and you should see your processes running. It is always good to leave yourself a bit of free CPU / RAM so that your computer is not completely useless while your jobs are running.
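The same two batches can also be produced with a loop, which saves editing 18 lines by hand when the parameters change. What follows is a sketch rather than the file I actually used: `DRY_RUN`, `launch` and `run_batches` are made-up names, and the variable names `tar`, `nsr` and `nst` are simply borrowed from the folder names. By default it only prints the commands so that you can check them first.

```shell
#!/bin/bash
# Sketch of a loop-based schedule.sh. Prints the commands by default;
# run it as `DRY_RUN=0 bash schedule.sh` to actually launch the jobs.
# DRY_RUN, launch and run_batches are names invented for this sketch.
DRY_RUN=${DRY_RUN:-1}

# nsr:nst pairs, copied from the hand-written batches
PAIRS="60:10 60:15 60:30 120:15 120:30 120:45 180:30 180:45 180:60"

launch() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"            # only show what would be run
  else
    "$@" &               # start the job in the background
  fi
}

run_batches() {
  local tar pair nsr nst
  for tar in 300 500; do
    for pair in $PAIRS; do
      nsr=${pair%%:*}    # part before the colon
      nst=${pair##*:}    # part after the colon
      launch java -jar RobotForaging.jar true \
        "e1_tar${tar}_nsr${nsr}_nst${nst}" 30 50 "$tar" "$nsr" "$nst" 1
    done
    wait                 # let the whole batch finish before the next one
  done
}

run_batches
```

Each pass of the outer loop is one batch of 9 jobs, so the behaviour matches the hand-written version: the second batch only starts once 'wait' has seen the first one finish.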

For this particular set of simulations, Iridis allowed me to run around 30 simulations in around 15 hours (I had to wait a lot, my jobs requested 50 minutes of run time). Using background processes on my machine allows me to run 30 simulations in about 3 hours. If you are simply running serial applications simultaneously, check whether you can save some time too.
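One limitation of batches separated by 'wait' is that the next batch cannot start until the slowest job of the current one has finished. A possible alternative, which I did not use for the numbers above, is to let xargs keep a fixed number of jobs running and start a new one whenever a slot frees up. The sketch below assumes an xargs that supports the -P option (GNU and BSD/macOS versions do, although it is not part of POSIX); `print_args` and `run_pool` are made-up names.

```shell
#!/bin/bash
# Sketch only: print_args and run_pool are hypothetical helper names.

# Print one line of arguments (8 words) per simulation.
print_args() {
  local tar pair nsr nst
  for tar in 300 500; do
    for pair in 60:10 60:15 60:30 120:15 120:30 120:45 180:30 180:45 180:60; do
      nsr=${pair%%:*}    # part before the colon
      nst=${pair##*:}    # part after the colon
      echo "true e1_tar${tar}_nsr${nsr}_nst${nst} 30 50 $tar $nsr $nst 1"
    done
  done
}

# run_pool MAX_JOBS COMMAND [ARGS...]
# Appends 8 words from print_args to COMMAND for each job and keeps
# at most MAX_JOBS of them running at the same time.
run_pool() {
  local max_jobs=$1
  shift
  print_args | xargs -P "$max_jobs" -n 8 "$@"
}

# Dry run: 'echo' just prints each command. Remove 'echo' to launch for real.
run_pool 9 echo java -jar RobotForaging.jar
```

Because the jobs run in parallel, the printed order in the dry run is not guaranteed to match the order of `print_args`.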

Comments

Lenka
[25/06/2013]
Sure you could, that would save even more time editing the .sh script! Thanks for the tip.

Davide
[25/06/2013]
Great post. Can you not put the java -jar command in a loop? Maybe a bit neater, though it would work the same both ways. You'd however save time next time, when you re-run your simulation with different parameters. :-)

