Eden is a script-based tool for managing runs of many small jobs on Nautilus or Darter without flooding the job queue. All you need is a list of command lines that run your jobs; Eden takes care of the rest. Eden generates and submits a single PBS script that starts a number of job-executing processes (one for each CPU requested). These processes then execute your job commands in parallel, working through the list until it is exhausted. When the run completes, Eden provides a summary file of stderr, stdout and timing information from the individual runs.
If you have lots of small jobs to run, Eden is the tool for you.
Notice: These instructions cover the use of Eden 1.4, which is now the default version.
To start using Eden, you must first load it through the module system:
> module load eden
Eden is very easy to use. The basic steps are:
- Create a run directory--call it whatever you wish.
- Create a list of command lines to run your jobs--name it 'commands' and place it in your run directory.
- Create a header file for a PBS script including the specific PBS options for your job and place it in your run directory.
- Run the main eden script, specifying the name of your run directory.
And that's all there is to it. Here's an in-depth example:
> # first load eden
> module load eden
>
> # make a run directory
> mkdir testrun
>
> # let's say we have already generated our command list
> # with 3000 commands called test_commands
> wc -l test_commands
3000
>
> head -n 5 test_commands
mkdir /path/to/output00; ./my_program -t 0.1 -o /path/to/output00
mkdir /path/to/output01; ./my_program -t 0.2 -o /path/to/output01
mkdir /path/to/output02; ./my_program -t 0.3 -o /path/to/output02
mkdir /path/to/output03; ./my_program -t 0.4 -o /path/to/output03
mkdir /path/to/output04; ./my_program -t 0.5 -o /path/to/output04
>
> # copy test_commands into run directory and rename it
> cp test_commands testrun/commands
>
> # also copy our executable into the run directory
> cp my_program testrun/
>
> # make a header file including our PBS options for the job
> # (it needs to be called 'header.pbs' and placed in the run directory)
> # (this example is for Nautilus; Darter would use 'size' instead of 'ncpus'
> # and the size would have to be a multiple of 16)
>
> cat testrun/header.pbs
#PBS -l ncpus=128
#PBS -N test1
#PBS -A UT-TENN0038
#PBS -j oe
>
> # now we're ready to run Eden, giving the name of the run directory
> # as a command line option
> eden testrun
EDEN: writing eden.config
EDEN: creating outfiles directory
EDEN: creating pbs script
EDEN: submitting batch job
31114.nemo.nics.utk.edu
>
In the above example, Eden will submit a single PBS script that will use 128 CPUs to run 3000 instances of my_program.
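The example assumes that test_commands already exists. One way such a list might be produced is sketched below. The function name gen_commands is hypothetical, and the step of 0.1 in the -t value per job is an assumption made only to reproduce the first five lines shown above; substitute whatever your own jobs require.

```shell
#!/bin/sh
# Sketch: write one job per line to a commands file.
# gen_commands is a hypothetical helper, not part of Eden. The paths and
# the -t step of 0.1 per job are assumptions mirroring the example listing.
gen_commands() {
    n=$1    # number of command lines to generate
    awk -v n="$n" 'BEGIN {
        for (i = 0; i < n; i++)
            printf "mkdir /path/to/output%02d; ./my_program -t %.1f -o /path/to/output%02d\n",
                   i, (i + 1) * 0.1, i
    }'
}

# Generate 3000 jobs and confirm the line count.
gen_commands 3000 > test_commands
wc -l < test_commands
```

Each generated line is a complete job, so anything a job needs (here, creating its output directory) has to appear on that same line.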
A few clarifications:
- Notice in the above example that each command line in the list actually contains two commands separated by a semicolon. Eden takes an entire line at a time from your commands file as a separate job for a particular CPU. So each 'job' can actually be multiple commands chained together.
- Eden automatically writes a PBS script and submits it (via qsub) from within your run directory. The PBS script will include whatever options you provide in header.pbs. Eden will also add the line 'cd $PBS_O_WORKDIR' to the PBS script, which means that any job executed in the script will be executed from within your run directory. That's the reason we copied the my_program executable to the run directory in the above example.
When your Eden run completes, you'll have several new things in your run directory:
- commands_done - as commands are completed, their index number is written to this file; if an Eden job is stopped prematurely, this file can be used to start again from where it left off (see 'Restarting a run' below); also you can use this file to monitor the progress of your Eden run
- eden.config - this file lists all of the information regarding your job including a timestamp; it is used internally by Eden to coordinate processes
- eden_job.pbs - this is the actual PBS script generated by Eden (including your header.pbs file)
- summary.csv - this is a CSV listing of the stats from your run, including timing, the file sizes of the stdout and stderr, and the command line for each job
- PBS output files - the usual .o and/or .e files that PBS outputs; note that Eden includes the 'ja' command at the end of the PBS script so job accounting information (timing, cpu usage, etc.) will be included in these files
- outfiles/ - this directory contains files for the stdout, stderr and timing information from the individual commands run by Eden; for each command n, there will be n.out, n.err and n.time; stats from these files are collected in the summary file
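Since commands_done grows as jobs finish, a rough progress report can be had by comparing its line count with that of the commands file. The sketch below assumes that commands_done holds one completed index per line, which is an assumption about the file layout; eden_progress is a hypothetical helper name, not something Eden provides.

```shell
#!/bin/sh
# Sketch: report how many commands of an Eden run have finished.
# Assumes commands_done contains one completed index per line (an
# assumption about its layout) and commands contains one job per line.
eden_progress() {
    run_dir=$1
    total=$(wc -l < "$run_dir/commands" | tr -d ' ')
    if [ -f "$run_dir/commands_done" ]; then
        finished=$(wc -l < "$run_dir/commands_done" | tr -d ' ')
    else
        finished=0    # nothing has completed yet
    fi
    echo "$finished of $total commands completed"
}

# Example (run directory name is a placeholder):
#   eden_progress testrun
```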
Restarting a Run
If your Eden job stops prematurely for any reason, it can be restarted and will pick up where it left off. To accomplish this, create a new run directory and copy the commands, header.pbs and commands_done files from the incomplete run into this new directory. Then run Eden with this new run directory to complete your run.
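The restart steps can be sketched as a small helper. The name prepare_restart and the directory names in the usage comment are hypothetical; only the three copied files come from the description above.

```shell
#!/bin/sh
# Sketch: stage a new run directory for restarting an incomplete Eden run.
# prepare_restart is a hypothetical helper name, not part of Eden.
prepare_restart() {
    old_dir=$1    # the incomplete run directory
    new_dir=$2    # the new run directory to create
    mkdir -p "$new_dir" || return 1
    # Copy the three files named in the restart instructions.
    for f in commands header.pbs commands_done; do
        cp "$old_dir/$f" "$new_dir/" || return 1
    done
}

# Example (directory names are placeholders):
#   prepare_restart testrun testrun_restart && eden testrun_restart
```

Because jobs execute from within the run directory, anything else your commands expect to find there (such as the my_program executable in the earlier example) would presumably need to be copied as well.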
Running Eden from a params file
For cases where you need to perform a parameter sweep through all possible combinations of parameters, Eden provides a way to generate these commands automatically via a params file. A params file contains a template of the command you wish to run with placeholders for values that will change from run to run. It then lists the various parameters with their possible values. Eden will generate a commands file from this information which includes a command for every possible combination of the parameter values.
Here's a simple example of a params file:
./myprogram -t $threshold -o file$i $alpha $beta
threshold 0.2 0.4 0.6 0.8 1.0
alpha NULL -a
beta NULL -b -B
The first line is the command template with placeholders for the different parameters. The placeholders are simply the parameter names prefixed with a dollar sign ($), just like shell script variables ($threshold, $i, $alpha and $beta). Each following line lists a parameter name (without the $) along with the possible values it can take. For instance, the threshold parameter lists five values in this example. The special keyword NULL is used for an empty value (meaning the parameter will not appear in the command line). For instance, the beta parameter has three possibilities: it will appear as -b, as -B, or not at all. The $i parameter is built into Eden and simply provides an incremental index number to use for your commands. In this case, it is appended to the output filename for each run.
To see how all of this works, look at the first few commands generated by the above params file:
./myprogram -t 0.2 -o file000
./myprogram -t 0.2 -o file001 -b
./myprogram -t 0.2 -o file001 -B
./myprogram -t 0.2 -o file002 -a
./myprogram -t 0.2 -o file003 -a -b
./myprogram -t 0.2 -o file003 -a -B
./myprogram -t 0.4 -o file004
./myprogram -t 0.4 -o file005 -b
...
To run Eden with a params file, simply place the params file (instead of a commands file) in your run directory and launch Eden. Eden will then automatically generate a command list and run your jobs.
You may want to modify the commands file generated from your params file before running your jobs. In this case, you can run the make_commands.sh script separately:
> make_commands.sh < params > commands
Then you can edit the generated commands file before placing it in the run directory and running Eden.
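Before editing, it can be useful to check that the generated list has the size you expect. Because there is one command for every combination of values, the expected count is the product of the number of values listed for each parameter. The sketch below computes that product; expected_commands is a hypothetical helper name, and it assumes the params layout shown earlier (template on the first line, then one parameter per line with whitespace-separated values, NULL counting as a value).

```shell
#!/bin/sh
# Sketch: predict how many commands a params file should expand to.
# expected_commands is a hypothetical helper, not part of Eden.
# Assumes line 1 is the template and each later line is a parameter name
# followed by its whitespace-separated values.
expected_commands() {
    awk 'BEGIN { n = 1 }
         NR > 1 && NF > 1 { n *= NF - 1 }   # NF - 1 values per parameter
         END { print n }' "$1"
}

# Example: compare against the generated list.
#   expected_commands params
#   wc -l < commands
```

For the example params file, this gives 5 x 2 x 3 = 30 commands.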
Specifying More Than One Core Per Job
By default, Eden assigns each job from your command list to a single CPU. If you need your jobs to use more than one CPU, use the -m flag when you run Eden. For example, to have your jobs take 4 CPUs each, run Eden with:
> eden run_dir -m 4
Comments or Bugs
Contact Scott Simmerman at email@example.com