Using Genome
- Introduction
- Batch Queueing Commands
- How to specify SGE Resources
- Example Jobs
- Configuration Information
Introduction
Both parallel and single jobs can be run on the CGRB Genome cluster. Genome uses Sun Grid Engine (SGE) to manage and schedule jobs.
The following sections demonstrate how to configure your account and submit jobs to Genome using SGE. Job policies are described, and several example batch job scripts for the most common types of runs are provided.
While it is possible to run interactive jobs and small computational runs on the nodes of Genome, the bulk of computational work requires the use of SGE.
Policies and Limitations
- Unless otherwise instructed to do so, compiling or developing software on the nodes is strictly prohibited.
- Users are not allowed to access the nodes directly (this includes ssh and rsh).
- Overloading nodes with jobs may lead to job termination by CGRB personnel, especially if memory consumption or disk usage exceeds available resources.
Batch Queueing Commands
The following table lists the most commonly used SGE commands. Please consult the sge_intro section of the SGE man pages for a more complete list.
| What do I want to do? | SGE command |
|---|---|
| Submit a batch script myscript | qsub myscript |
| Change parameters for job with ID job_id waiting in queue | qalter [job_id] |
| Remove job with ID job_id | qdel [job_id] |
| Display status of batch jobs | qstat |
| Full Listing for qstat | qstat -f |
| qstat for a specific user | qstat -u <username> |
| X Window Frontend (integrated functionality) | qmon |
Users should compile code and create submit scripts on the master machine (waterman) and
then submit the scripts to SGE, which will dispatch them to appropriate nodes for execution.
Please consult CGRB staff if you expect computation time to exceed 5 days.
Basic commands to monitor and submit jobs using SGE:
- Show job/queue status - qstat
  - no arguments – show currently running/pending jobs
  - -f Show full listing of all queues
  - -j Show detailed information on a pending/running job
  - -u Show current jobs by user
  Each node is assigned a unique queue name.
- Show job/host status - qhost
  - no arguments – show a table of all execution hosts and information about their configuration
  - -l attr=val Show only certain hosts
  - -j Show the jobs running on each host
  - -q Show detailed information on queues at each host
- Submitting scripts and binaries - qsub
  - no arguments – accept input from STDIN (^D to submit the input)
  - -cwd Run the job from the current working directory
  - -v VAR Pass the variable VAR (-V passes all variables)
  - -o Redirect standard output (Default: Home directory)
  Example:
  qsub a.out
  qsub -cwd -v SOME_VAR -o /dev/null -e /dev/null myjob.sh
  The submit parameters can be specified in the script, myjob.sh.
  In this case, just run:
  qsub myjob.sh
  Note that qsub only accepts scripts, not binary executables; to run a compiled program, call it from a shell script.
  See also: man qsub for details
- Status of your running job - qstat
- Deleting submitted job from the queue - qdel
  Example:
  qdel jobID
  For running jobs, use the force flag, "-f".
  Example:
  qdel -f jobID
- Manual pages for some SGE commands
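The qstat -j and qdel commands above need the job ID that qsub reports at submission. The following is a minimal sketch of capturing it. It assumes that qsub prints a confirmation of the form Your job 12345 ("myjob.sh") has been submitted, and extract_job_id is a hypothetical helper, not an SGE command. Please check the message format on Genome before relying on it.

```shell
#!/bin/sh
# Sketch: pull the numeric job ID out of qsub's confirmation message.
# Assumption: the message has the form
#   Your job 12345 ("myjob.sh") has been submitted
# extract_job_id is a hypothetical helper; it reads the message on stdin.
extract_job_id() {
    sed -n 's/^Your job \([0-9][0-9]*\) .*/\1/p'
}

# A sample message stands in for real qsub output here.
sample='Your job 12345 ("myjob.sh") has been submitted'
job_id=$(printf '%s\n' "$sample" | extract_job_id)

# Commands you could then run for this job.
echo "qstat -j $job_id"
echo "qdel $job_id"
```

On the cluster, the sample line would be replaced by something like job_id=$(qsub myjob.sh | extract_job_id).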
How to specify SGE Resources
Batch job options and resources can be given as command-line switches to qsub, or they can be included in an SGE job script as a comment line of the form
#$ option
See the examples below for details.
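As an illustration of the #$ form, the sketch below writes a small job script with embedded options and lists the options qsub would read from it. The file name hello_sge.sh, the job name hello, and the helper embedded_options are hypothetical. The script is only written and inspected here, not submitted.

```shell
#!/bin/sh
# Sketch: a job script whose qsub options are embedded as "#$" lines.
# The file name and job name are hypothetical examples.
job_script="${TMPDIR:-/tmp}/hello_sge.sh"

cat > "$job_script" <<'EOF'
#!/bin/sh
#$ -S /bin/sh
#$ -cwd
#$ -N hello
#$ -j y
echo "running on $(hostname)"
date
EOF

# Print the options embedded in a job script, one per line.
# embedded_options is a hypothetical helper, not an SGE command.
embedded_options() {
    sed -n 's/^#\$ *//p' "$1"
}

embedded_options "$job_script"
echo "submit with: qsub $job_script"
```

The same options could instead be given on the command line, for example qsub -S /bin/sh -cwd -N hello -j y followed by the script name.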
General options

| SGE option | Functionality | Remarks |
|---|---|---|
| -cwd | Start job in directory from which it was submitted. | |
| -M <mail_addr> | User's e-mail address. | Obligatory, so Support can contact you in case of problems. |
| -m a, -m b, -m e | Batch system sends e-mail when the job is aborted (a), begins (b), or ends (e). | |
| -N <req_name> | Name of batch request. Default is name of the script. | 8 characters at most. |
| -o <filename> | Write standard output to specified file. | Support recommends specifying the full path name. Default value is job_name.ojob_id, where job_name is the name of the job specified via the -N parameter, and job_id its identification number. |
| -e <filename> | Write standard error to specified file. The same pseudo-environment variables can be used as for the -o option. | Support recommends specifying the full path name. job_name.ejob_id is used as a default if no explicit name is specified. |
| -j y | Write standard error to the same file as standard output. | Any -e specification is ignored. |

Note that under SGE it is possible to use one of the following pseudo-environment variables as part of the full name specification for -o and -e. These are replaced by the run-time contents of the actual environment variables.

| Variable | Meaning |
|---|---|
| $HOME | home directory |
| $USER | user ID |
| $JOB_ID | job ID |
| $JOB_NAME | job name (see -N option) |
| $HOSTNAME | name of target node |
| $TASK_ID | job array task index |
Job Control and Limits

| Option | Functionality | Remarks |
|---|---|---|
| -hard | All the following requirements must be fulfilled for the job to be initiated. | This is the default, so you only need this to invalidate a previous -soft option. |
| -soft | After fulfilment of all -hard requirements, the job is processed within the queue fulfilling as many -soft specifications as possible. | At runtime, not all requirements specified as -soft may be fulfilled. Hence the user's program must be able to cope with this situation. |
| -h | Batch request is kept in User Hold (meaning it is queued but not initiated). | With qalter -hU <job_id> you can remove the user hold from your job. |
| -l p4=yes | Batch request is run on a Pentium 4. | lxsrv13-18 or the IA32 extension. |
| -l p4_himem=yes | Batch request is run on a Pentium 4 with up to 2 GByte of memory and up to 336 hours. | lxsrv51-lxsrv60 |
| -l mf=<memory> | Explicit memory requirement (mf abbreviates mem_free); available for IA32 only. | See the examples and notes following this table. |
| -q <queue_name> | Batch request targets a specific queue (mapped to a specific node). | qstat -f gives you a table whose first column lists the queue names. It is also possible to specify multiple queue names in a comma-separated list. The job then runs on the first queue that becomes available. Also note that specifying -q usually results in a longer waiting time for the request. |
| -pe <par_env> <num_cpus> | Specification of a parallel environment and the number of CPUs. The latter can be given as a fixed value or as a range (e.g., 4-7). If a range is given, SGE decides how many CPUs within the range are actually assigned. | Obligatory for parallel jobs. The available parallel environments are listed after this table. |

Examples for -l mf:
- -l mf=1200M specifies 1200 * 1024 * 1024 Bytes
- -l mf=1200m specifies 1200 * 1000 * 1000 Bytes
- -l mf=700000K specifies 700000 * 1024 Bytes
- -l mf=700000k specifies 700000 * 1000 Bytes
Note that contradictory requirements (e.g., -q chrom1.q -l mf=4G) will result in a non-initiating job. The "mf" setting replaces the "vf" ("virtual_free") setting. Because "vf" refers to free virtual memory, whereas "mf" refers to physical memory, using "vf" could lead to swapping and consequent performance degradation.

Available parallel environments:

| Parallel environment | Use |
|---|---|
| mpi | MPI programs on parallel pool 1 (at most 18 CPUs) |
| mpi_2 | MPI programs on parallel pool 2 (first 9 CPUs) |
| mpi_3 | MPI programs on parallel pool 2 (second 9 CPUs) |
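To make the difference between the upper-case and lower-case suffixes in the mf examples concrete, here is a minimal sketch that converts such a value to bytes. sge_mem_to_bytes is a hypothetical helper, not part of SGE. It uses the multipliers from the examples above and assumes that G and g follow the same pattern.

```shell
#!/bin/sh
# Sketch: convert a memory value such as 1200M or 700000k to bytes.
# Upper-case suffixes are powers of 1024, lower-case suffixes are powers
# of 1000. G/g are assumed to follow the same pattern as K/k and M/m.
# sge_mem_to_bytes is a hypothetical helper, not an SGE command.
sge_mem_to_bytes() {
    value=$1
    number=${value%[KkMmGg]}     # strip one trailing suffix letter, if any
    suffix=${value#"$number"}    # the letter that was stripped (may be empty)
    case $number in
        ''|*[!0-9]*)
            echo "invalid memory value: $value" >&2
            return 1 ;;
    esac
    case $suffix in
        K) echo $((number * 1024)) ;;
        k) echo $((number * 1000)) ;;
        M) echo $((number * 1024 * 1024)) ;;
        m) echo $((number * 1000 * 1000)) ;;
        G) echo $((number * 1024 * 1024 * 1024)) ;;
        g) echo $((number * 1000 * 1000 * 1000)) ;;
        '') echo "$number" ;;
    esac
}

sge_mem_to_bytes 1200M
sge_mem_to_bytes 1200m
```

A value with an unrecognized suffix, such as 4GB, is rejected with a non-zero exit status rather than silently converted.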
Examples
Submitting a shell script with qsub
#!/bin/bash
#$ -S /bin/sh
#$ -e $HOME/cluster/error
#$ -o $HOME/cluster/output
echo 'running date'
date
If the file containing this text is called thisscript.sh, you can submit the script by typing the following:
qsub thisscript.sh
Submitting a perl script with qsub
#$ -o $HOME/cluster/output
#$ -e $HOME/cluster/error
#$ -N reCOG
#$ -S /usr/bin/perl
#
use lib '/local/cluster/lib/perl5/site_perl';
$ENV{PERL5LIB} = '/mnt/local/cluster/lib/perl5/site_perl:/mnt/home/cgrb/cgrblib/perl5/COGDB';
print "Hello world!\n";
MPI jobs on the parallel pools
#!/bin/bash -f
#
#---------------- SHORT COMMENT ----------------------------------------
# Template script for parallel MPI jobs to run on mphase Grid Engine cluster.
# Modify it for your case and submit to CODINE with
# command "qsub mpi_run.sh".
# You may want to modify the parameters for
# "-N" (job queue name), "-pe" (queue type and number of requested CPUs),
# "myjob" (your compiled executable).
# You can compile your code, for example myjob.c (*.f), with GNU mpicc or
# mpif77 compilers as follows:
# "mpicc -o myjob myjob.c" or "mpif77 -o myjob myjob.f"
# You can monitor your jobs with command
# "qstat -u your_username" or "qstat -f" to see all queues.
# To remove your job, run "qdel job_id"
# To kill running job, use "qdel -f job_id"
# ------Attention: #$ is a special CODINE symbol, not a comment -----
#
# The name, which will identify your job in the queue system
#$ -N Test_MPI_Job
#
# Queue request, mpich. You can specify the number of requested CPUs,
# for example, from 1 to 4
#$ -pe chris_mpi 1-4
#
# ---------------------------
#$ -cwd
#$ -o $HOME/output/Test_MPI.$JOB_ID.out
#$ -e $HOME/output/error/Test_MPI.$JOB_ID.error
#$ -v MPIR_HOME=/usr/local/cluster/mpich-1.2.6
# ---------------------------
myjob=$HOME/cluster/bin/cpi
echo "Got $NSLOTS slots."
# Don't modify the line below if you don't know what it is
$MPIR_HOME/bin/mpirun -np $NSLOTS $myjob
If the file containing this text is called thisscript.sh, you can submit the script by typing the following:
qsub thisscript.sh
#!/bin/bash -f
#
#---------------- SHORT COMMENT ----------------------------------------
# Template script for parallel MPI jobs to run on mphase Grid Engine cluster.
# Modify it for your case and submit to CODINE with
# command "qsub thisscript.sh filename".
#
# This Script will allow for the binary to be used as the first argument
# thus you would do a
# qsub thisscript.sh filename.exe
# You may want to modify the parameters for
# "-N" (job queue name) and "-pe" (queue type and number of requested CPUs),
# and pass your compiled executable as the first argument.
# You can compile your code, for example myjob.c (*.f), with GNU mpicc or
# mpif77 compilers as follows:
# "mpicc -o myjob myjob.c" or "mpif77 -o myjob myjob.f"
# You can monitor your jobs with command
# "qstat -u your_username" or "qstat -f" to see all queues.
# To remove your job, run "qdel job_id"
# To kill running job, use "qdel -f job_id"
# ------Attention: #$ is a special CODINE symbol, not a comment -----
#
# The name, which will identify your job in the queue system
#$ -N Test_MPI_Job
#
# Queue request, mpich. You can specify the number of requested CPUs,
# for example, from 1 to 4
#$ -pe chris_mpi 1-4
#
# ---------------------------
#$ -cwd
#$ -o $HOME/output/Test_MPI.$JOB_ID.out
#$ -e $HOME/output/error/Test_MPI.$JOB_ID.error
#$ -v MPIR_HOME=/usr/local/cluster/mpich-1.2.6
# ---------------------------
echo "Got $NSLOTS slots."
# Don't modify the line below if you don't know what it is
$MPIR_HOME/bin/mpirun -np $NSLOTS $1
If the file containing this text is called thisscript.sh, you can submit the script by typing the following:
qsub thisscript.sh filename.exe
Configuration Hints
Mapping of Queue Names to Host Names
Under SGE you can specify on which host your job should run (by using
the -q parameter, as described above). However, with the migration to
the presently installed SGE release, the mapping of queue names to host
names has changed: queue names are no longer identical to host names.
Instead, the mapping indicated in the following table is valid:
| Hostname | Batch Queue Name |
|---|---|
| chrom1 | chrom1.q |
| chrom2 | chrom2.q |
| ... | ... |
| chrom17 | chrom17.q |
| chrom18 | chrom18.q |
| ... | ... |
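Since each queue name in the table is the host name followed by .q, a comma-separated -q argument for several hosts can be built mechanically. The sketch below assumes that this pattern holds for every host, including the rows elided with "..." in the table. queue_list is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: build a comma-separated queue list from host names.
# Assumption: every queue is named <hostname>.q, as in the table above.
# queue_list is a hypothetical helper, not an SGE command.
queue_list() {
    list=
    for host in "$@"; do
        # Append ",<host>.q", omitting the comma for the first entry.
        list="${list:+$list,}$host.q"
    done
    printf '%s\n' "$list"
}

# Show the qsub command line this would produce (nothing is submitted).
echo "qsub -q $(queue_list chrom1 chrom2) myjob.sh"
```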
System Setup for csh and tcsh users
CGRB recommends using /bin/sh as
the script shell in batch scripts. Should you still want to use
csh or tcsh, please note that you may need to include
source /raid0/local/etc/std.cshrc
in the example scripts presented above.
MPI jobs on the parallel pools
The first and second parallel pools are not (necessarily) binary
compatible.
Please consult the MPI Documentation
concerning the correct way of setting up the compilation process as
well as the runtime environment.
Names of Job Scripts
Batch scripts must not have a number as first character of their name.
E. g., a script of the form 01ismyjob will not be correctly
started. Please use one of the characters a-z, A-Z as first character
of your job script name.
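The naming rule can be checked before submission. The following is a minimal sketch. valid_job_script_name is a hypothetical helper that tests only the first character of the file name, which is all the rule above requires.

```shell
#!/bin/sh
# Sketch: check that a job script name starts with a letter (a-z, A-Z).
# valid_job_script_name is a hypothetical helper, not an SGE command.
valid_job_script_name() {
    # Look at the file name only, ignoring any leading directories.
    case $(basename "$1") in
        [A-Za-z]*) return 0 ;;
        *) return 1 ;;
    esac
}

for name in myjob.sh 01ismyjob; do
    if valid_job_script_name "$name"; then
        echo "$name: ok"
    else
        echo "$name: rename before submitting"
    fi
done
```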
Unprintable characters in Batch Scripts
Scripts that have been edited under DOS/Windows may contain carriage returns
in addition to line feeds; these might not work under SGE. Furthermore, apparent
whitespace, for example in the
#! /bin/sh
specification, could lead to problems with SGE. Scripts like these tend
to block a queue altogether! Please remove such special whitespace.
Remark: To determine whether your script is in DOS/Windows
format the following possibilities are available:
Please consult CGRB staff for more information regarding problems with whitespace.
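One possible check for DOS/Windows line endings is to search the script for carriage-return characters. The sketch below assumes only standard grep, tr, and mktemp. has_carriage_returns and strip_carriage_returns are hypothetical helper names, and the demonstration file is a throwaway created just for this example.

```shell
#!/bin/sh
# Sketch: detect and remove carriage returns in a batch script.
# Both helpers are hypothetical, not SGE commands.
has_carriage_returns() {
    cr=$(printf '\r')       # a literal carriage-return character
    grep -q "$cr" "$1"
}

strip_carriage_returns() {
    # Write a copy of the file without carriage returns to stdout.
    tr -d '\r' < "$1"
}

# Demonstration on a throwaway file with DOS line endings.
demo=$(mktemp)
trap 'rm -f "$demo" "$demo.unix"' EXIT
printf '#!/bin/sh\r\ndate\r\n' > "$demo"

if has_carriage_returns "$demo"; then
    echo "DOS/Windows line endings found"
    strip_carriage_returns "$demo" > "$demo.unix"
fi
```

The cleaned copy, rather than the original file, would then be the one to submit.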