The Gaut Lab

Department of Ecology & Evolutionary Biology

Running a job on Meta-Titus using Sun Grid Engine

January 3, 2007

1 When to use the queueing system

1.1 When not to use the queueing system

sftp, ftp, or other file transfer
normal system commands (cd, ls, vim, etc.)
short jobs (say <1 hour)
interactive jobs (R, MySQL, etc.)

For short jobs and interactive work, please use the command "qrsh", which automatically logs you on to one of the cluster computing nodes. This only makes much difference when the system is being heavily used, but it is good practice nonetheless. Remember to log off the computing node ("exit") when you're done.

1.2 When to use the queueing system

long jobs (>1 hour)
computationally intensive jobs
many repeated runs of short jobs
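
As a rough illustration of the last case, the sketch below submits the same job script once per input value. The script name run_sim.sh and the seed values are hypothetical placeholders, and the loop falls back to printing the commands when qsub is not installed.

```shell
#! /bin/bash
# Sketch: submit the same job script once per input value.
# run_sim.sh and the seed values are hypothetical placeholders.

job_script=run_sim.sh
submitted=0

for seed in 1 2 3 4 5; do
    if command -v qsub >/dev/null 2>&1; then
        # Arguments after the script name are passed to the script as $1, $2, ...
        qsub -cwd -S /bin/sh "$job_script" "$seed"
    else
        # Dry run on machines without Grid Engine: just show the command.
        echo "qsub -cwd -S /bin/sh $job_script $seed"
    fi
    submitted=$((submitted + 1))
done

echo "Requested $submitted runs"
```

Each run becomes its own job, so the queueing system can spread the runs across nodes instead of running them one after another on the login node.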

2 How to submit a job

To run something using the queueing system, you will need to put all of your commands in a script (see section 3) and then use the qsub command to submit the script. Some common options for qsub are listed below. Please see the manpage for qsub (http://www.hpc.dtu.dk/GridEngine/man/qsub.html or type "man qsub" anywhere on titus) for more details. Note that virtually any kind of command can be run this way, including batch jobs for R, perl scripts, and C programs. The cluster will automatically create two files, SCRIPTNAME.eJOBID and SCRIPTNAME.oJOBID, substituting in the script's name and its job ID. These files contain the standard error (the e file) and standard output (the o file) from your script. If your script already redirects standard output to a file, you won't need the o file. Otherwise, the output from your commands will be found in these files (also see the -e and -o options below).
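
To make the file naming concrete, here is a small sketch that builds the two file names from a script name and a job ID. The job ID 1234 is a made-up example value; the real one is reported when the job is submitted.

```shell
#! /bin/bash
# Sketch: build the names of the files Grid Engine creates for a job.
# The script name and job ID below are made-up example values.

script_name=usage.sh
job_id=1234

stdout_file="${script_name}.o${job_id}"   # standard output
stderr_file="${script_name}.e${job_id}"   # standard error

echo "Output: $stdout_file"
echo "Errors: $stderr_file"

# After the job finishes, show the error file if it exists and is not empty.
if [ -s "$stderr_file" ]; then
    cat "$stderr_file"
fi
```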

2.1 qsub options

All of the options for qsub can be run either from the command line (e.g. "qsub -cwd stupid.sh") or inserted as lines in your script starting with #$ (e.g. "#$ -cwd"). If you are using the scripting resources in this howto, you will always need to include the -S /bin/sh option when running a script.

-@ optionfile

Forces qsub to use the options contained in optionfile. The indicated file may contain all valid options. Comment lines start with a "#" sign.

-cwd

Execute the job from the current working directory. This switch will activate Grid Engine's path aliasing facility, if the corresponding configuration files are present.

-e [hostname:]path,...

Defines or redefines the path used for the standard error stream of the job. If the path constitutes an absolute path name, the error-path attribute of the job is set to its value including the hostname. If the path name is relative, Grid Engine expands path either with the current working directory path in case the -cwd (see above) switch is also specified or with the home directory path otherwise. If hostname is present, the standard error stream will be placed under the corresponding location if the job runs on the specified host. By default the file name for standard error has the form job_name.ejob_id. If path is a directory, the standard error stream of the job will be put in this directory under the default file name.

-help

Prints a listing of all options.

-j y|n

Specifies whether or not the standard error stream of the job is merged into the standard output stream. If both the -j y and the -e options are present, Grid Engine sets the error-path attribute but ignores it.

-m b|e|a|s|n,...

Defines or redefines under which circumstances mail is to be sent to the job owner or to the users defined with the -M option described below. The option arguments have the following meaning:
'b' Mail is sent at the beginning of the job.
'e' Mail is sent at the end of the job.
'a' Mail is sent when the job is aborted or rescheduled.
's' Mail is sent when the job is suspended.
'n' No mail is sent.
Note that although the 's' argument is accepted, currently no mail is sent when a job is suspended.

-M user[@host],...

Defines or redefines the list of users to which the server that executes the job has to send mail, if the server sends mail about the job. Default is the job owner at the originating host. You can use this to add your email address for updates or information about your jobs. For example, "qsub -M johndoe@example.com stupid.sh" will send email about the job status of the stupid.sh script to johndoe@example.com.

-N name

The name of the job. The name can be any printable set of characters, starting with an alphabetic character. If the -N option is not present Grid Engine assigns the name of the job script to the job after any directory pathname has been removed from the script-name.

-o [hostname:]path,...

The path used for the standard output stream of the job. The path is handled as described in the -e option for the standard error stream. By default the file name for standard output has the form job_name.ojob_id.

-S [host:]pathname,...

Specifies the interpreting shell for the job. Only one pathname component without a host specifier is valid and only one path name for a given host is allowed. Shell paths with host assignments define the interpreting shell for the job if the host is the execution host. The shell path without host specification is used if the execution host matches none of the hosts in the list.
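
The options described above can be combined as "#$" lines at the top of a job script. The following is a minimal sketch, not a recommended configuration: the job name, log file name, and e-mail address are placeholders, and the script is only submitted if qsub is available.

```shell
#! /bin/bash
# Sketch: write a job script whose header carries qsub options, then submit it.
# The job name, log file, and e-mail address are placeholders.

cat > example_job.sh <<'EOF'
#! /bin/bash
#$ -S /bin/sh
#$ -cwd
#$ -N example_job
#$ -j y
#$ -o example_job.log
#$ -m e
#$ -M johndoe@example.com

echo "Job started on $(hostname)"
du -sh .
EOF

if command -v qsub >/dev/null 2>&1; then
    qsub example_job.sh
else
    echo "qsub not found; example_job.sh was written but not submitted"
fi
```

With "-j y" and "-o", both output streams end up in example_job.log in the submission directory, and "-m e" together with "-M" requests one mail when the job ends.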

3 How to write a script (for bash)

This section is lifted almost completely from http://floppix.ccai.com/scripts1.html. For more detailed instructions on bash scripting including loops and conditionals, check out BASH Programming - Introduction HOW-TO or http://pegasus.rutgers.edu/~elflord/unix/bash-tute.html.

3.1 What is a script

A bash script is a file containing a list of commands to be executed by the bash shell.

3.2 Simple scripts

The very simplest scripts contain a set of commands that you would normally enter from the keyboard. For example, the following six lines (not counting the blank line) are stored in the script usage.sh.

#! /bin/bash
# script to determine disk usage in home directory

cd ~/
echo The script has now entered your home directory.
echo We will now see how much space you are using here.
du -sh

Line 1: specifies which shell (the bash shell) should be used to interpret the commands in the script. This is needed in all scripts.
Line 2: is a comment (has no effect when the script is executed).
Line 3: changes to the home directory.
Lines 4-5: echo messages to the screen.
Line 6: calculates disk usage of the current directory.

3.3 Running a script

From the queueing system

To submit a job to the queueing system, you do not need either of the two methods described below. Just type "qsub usage.sh" to run the script usage.sh.

As an executable

Make the script executable with "chmod 700 usage.sh". Now run the script with the command "./usage.sh", which means: run usage.sh from the current directory.

Using bash

You can also run a bash script that is not executable by simply typing "bash" before the script name: "bash usage.sh". If you are getting error messages when you run the script, you can trace the lines as they execute using the command "bash -v usage.sh". As the script executes, each line is displayed on the screen so that you know exactly what your script is doing.
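
The two ways of running a script outside the queue can be tried with the sketch below. It uses a throwaway script called hello.sh in a temporary directory, both invented for illustration. It also shows "bash -x", a standard bash flag not mentioned above, which prints each command as it is executed.

```shell
#! /bin/bash
# Sketch: create a small script, then run it directly and through bash.
# The temporary directory and the script name hello.sh are for illustration only.

workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > hello.sh <<'EOF'
#! /bin/bash
echo "hello from the script"
EOF

chmod 700 hello.sh        # owner may read, write, and execute
direct_output=$(./hello.sh)

chmod 600 hello.sh        # no longer executable
bash_output=$(bash hello.sh)

echo "$direct_output"
echo "$bash_output"

# -x prints each command, prefixed with "+", as it is executed
bash -x hello.sh
```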

3.4 Using variables in a script

Variables are created when you assign a value to them (e.g. homedir=~/). There must be no spaces around the = sign.

To use the variable, put a $ before the variable name (e.g. echo $homedir).

Modify the usage.sh script to use the homedir variable as follows:

#! /bin/bash
# script to determine disk usage in home directory

homedir=~/
cd $homedir
echo The script has now entered your home directory.
echo We will now see how much space you are using here.
du -sh
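
One detail worth adding is quoting. The sketch below assumes a directory whose name contains a space ("my data", invented for this example) and shows that putting the variable in double quotes keeps the name together as one argument.

```shell
#! /bin/bash
# Sketch: assign a variable (no spaces around "=") and quote it when used.
# The directory name "my data" is invented to show why quoting matters.

workdir=$(mktemp -d)
target="$workdir/my data"
mkdir "$target"

cd "$target" || exit 1     # quoted: the space stays part of one argument
current=$(pwd)
echo "Now in $current"
du -sh
```

Without the quotes, "cd $target" would be split into two words at the space and fail.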

3.5 Getting user input

A script can get input from the user while it is running. Use the echo command to display a prompt on the screen and the read command to get the input.

#! /bin/bash

echo -n "Pick a directory:"
read -e direc
cd $direc
echo The script is now in $direc
du -sh
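
Keep in mind that a job submitted with qsub normally has no terminal attached, so there is nobody to answer the prompt. A possible way to handle both cases, sketched here with a hypothetical helper function pick_directory, is to fall back to a default when no input arrives.

```shell
#! /bin/bash
# Sketch: read a directory name from standard input, falling back to the
# home directory when nothing is entered or no input is available.
# pick_directory is a hypothetical helper written for this example.

pick_directory() {
    local direc
    if read -r direc && [ -n "$direc" ]; then
        printf '%s\n' "$direc"
    else
        printf '%s\n' "$HOME"
    fi
}

# Input supplied: the typed value is used.
chosen=$(printf '/tmp\n' | pick_directory)

# No input at all, as in a batch job: the default is used.
fallback=$(pick_directory < /dev/null)

echo "With input:    $chosen"
echo "Without input: $fallback"
```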

Passing Parameters on the command line:

You can also pass parameters to the script on the command line, separated by spaces. The first parameter is $1, the second parameter is $2, and so on up to $9; parameters beyond the ninth must be written with braces (${10}, ${11}, etc.). The usage.sh script using input parameters is shown below:

#! /bin/bash

cd $1
du -sh

To run the script, use the command "bash usage.sh ~/".
In this case, the shell expands "~/" to the path of your home directory before the script starts, so $1 holds that path.
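
A script that takes parameters usually benefits from checking them first. The sketch below does this inside a function named report_usage (a name made up for this example) so it can be tried without creating a file; in a real script the same checks would apply to the script's own $# and $1.

```shell
#! /bin/bash
# Sketch: check positional parameters before using them.
# report_usage is a hypothetical function standing in for a script body.

report_usage() {
    if [ "$#" -lt 1 ]; then
        echo "usage: report_usage DIRECTORY" >&2
        return 1
    fi
    if [ ! -d "$1" ]; then
        echo "not a directory: $1" >&2
        return 1
    fi
    # Run in a subshell so the caller's working directory is unchanged.
    ( cd "$1" && du -sh )
}

demo_dir=$(mktemp -d)      # an empty directory to report on
report_usage "$demo_dir"
```

Calling report_usage with no argument, or with a path that is not a directory, prints a message to standard error and returns a non-zero status.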

4 Managing cluster jobs

4.1 Checking status

To check the jobs in the queue, you can use the command "qstat". The website http://titus.bio.uci.edu/ganglia will also show you the status of all the cluster nodes and offers other tools for checking on the system. On the left-hand side under the time, where it says "Rocks Tools", you can click on "Job Queue" to see running jobs, pending jobs, finished jobs, etc. Note that this website is only accessible from certain IP addresses on campus, so you will not be able to use it from elsewhere.

Try running the example script sleeper.sh using "qsub /opt/gridengine/examples/jobs/sleeper.sh" and then using the web or qstat to view system resources.

Under the state column from qstat you can see the status of your job. Some of the codes are:

r: the job is running
t: the job is being transferred to a cluster node
qw: the job is queued (and not running yet)
Eqw: an error occurred with the job
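
The codes listed above can be turned into readable text with a small helper. This is only a sketch: describe_state is a hypothetical function, and the qstat parsing assumes the default output layout (two header lines, job-ID in column 1, state in column 5), which may differ on other installations.

```shell
#! /bin/bash
# Sketch: translate the qstat state codes listed above into plain text.
# describe_state is a hypothetical helper written for this example.

describe_state() {
    case "$1" in
        r)   echo "running" ;;
        t)   echo "being transferred to a cluster node" ;;
        qw)  echo "queued, not running yet" ;;
        Eqw) echo "error" ;;
        *)   echo "other state; see 'man qstat'" ;;
    esac
}

if command -v qstat >/dev/null 2>&1; then
    # Assumption: default qstat layout with two header lines,
    # job-ID in column 1 and state in column 5.
    qstat | awk 'NR > 2 { print $1, $5 }' | while read -r id state; do
        echo "$id: $(describe_state "$state")"
    done
else
    echo "qstat not found; example: qw means $(describe_state qw)"
fi
```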

You can look at the manual page for qstat (type man qstat at the prompt) to get more information on the state codes.

You can use "qstat -j jobid" to check the status of the job or to get error messages.

Another important thing to note is the job-ID for your job. You need to know this if you ever want to make changes to your job.

Also note that you can have the system send mail when certain things happen to the job (see the -m and -M options for qsub above).

4.2 Killing jobs

Normally, to end a job submitted to the queue, you can run "qdel job-ID", where job-ID is the job-ID you get from qstat. qdel only deletes the job from the queue. Sometimes, if the process spawns several smaller executables, you may want to kill those as well. You would do this using the command "cluster-kill" and a regular expression for the name of the process you want to kill. If you are not sure whether you need to use this, don't.
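
As a cautious sketch of the qdel step, the hypothetical wrapper delete_job below checks that its argument looks like a numeric job-ID before calling qdel, and only prints the command when qdel is not installed. The job-ID 1234 is a made-up value.

```shell
#! /bin/bash
# Sketch: delete a job by its numeric job-ID, with a basic sanity check.
# delete_job is a hypothetical wrapper; 1234 is a made-up job-ID.

delete_job() {
    case "$1" in
        ''|*[!0-9]*)
            echo "job-ID must be a number: '$1'" >&2
            return 1
            ;;
    esac
    if command -v qdel >/dev/null 2>&1; then
        qdel "$1"
    else
        echo "qdel not found; would run: qdel $1"
    fi
}

delete_job 1234
```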