
Using Genome

  • Introduction

  • Batch Queueing Commands

  • How to specify SGE Resources

  • Example Jobs

  • Configuration Information

    Introduction

    Both parallel and single (serial) jobs can be run on the CGRB Genome cluster. Genome uses Sun Grid Engine (SGE) to schedule and manage jobs.

    The following sections demonstrate how to configure your account and submit jobs to Genome using SGE. Job policies are described and several example batch job scripts for the most common types of runs are provided.

    While it is possible to run interactive jobs and small computational runs on the nodes of Genome, the bulk of computational work requires the use of SGE.

    Policies and Limitations

    • Unless otherwise instructed to do so, compiling or developing software on the nodes is strictly prohibited.
    • Users are not allowed to access the nodes directly (this includes ssh and rsh).
    • Overloading of nodes with jobs may lead to job termination by CGRB personnel, especially if memory consumption or disk usage exceeds available resources.

    Batch Queueing Commands

    The following table lists the most commonly used SGE commands. Please consult the sge_intro section of the SGE man pages for a more complete list.

    What do I want to do?                                       SGE command
    Submit a batch script myscript                              qsub myscript
    Change parameters for job with ID job_id waiting in queue   qalter [job_id]
    Remove job with ID job_id                                   qdel [job_id]
    Display status of batch jobs                                qstat
    Full listing for qstat                                      qstat -f
    qstat for a specific user                                   qstat -u <username>
    X Window frontend (integrated functionality)                qmon

    Users should compile code and create submit scripts on the master machine (waterman) and then submit the scripts to SGE, which will dispatch them to appropriate nodes for execution. Please consult CGRB staff if you expect computation time to exceed 5 days.

    Basic commands to monitor and submit jobs using SGE:

    1. Show job/queue status - qstat
      • no arguments – show currently running/pending jobs
      • -f Show full listing of all queues
      • -j Shows detailed information on pending/running job
      • -u Shows current jobs by user

      Each node is assigned a unique queue name.

    2. Show job/host status - qhost
      • no arguments – show a table of all execution hosts and information about their configuration
      • -l attr=val Show only certain hosts
      • -j Shows the jobs running on each host
      • -q Shows detailed information on queues at each host
    3. Submitting scripts and binaries - qsub
      • no arguments – accepts input from STDIN (^D to send submit input)
      • -cwd Run the job from the current working directory
      • -v Pass the variable VAR (-V passes all variables)
      • -o Redirect standard output (Default: Home directory)

      Example:

      qsub a.out

      qsub -cwd -v SOME_VAR -o /dev/null -e /dev/null myjob.sh

      The submit parameters can be specified in the script, myjob.sh.
      In this case, just run:

      qsub myjob.sh

      Note that qsub accepts only shell scripts, not binary executables.

      See man qsub for details.

    4. Status of your running job - qstat
    5. Deleting submitted job from the queue - qdel

      Example:

      qdel jobID

      For running jobs, use the force flag, "-f".

      Example:

      qdel -f jobID

    6. Manual pages for SGE commands - man <command>, for example man qsub
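The qsub switches shown in the example above can also be moved into the job script as "#$" lines. The following is a minimal sketch, assuming a hypothetical script myjob.sh written to a temporary directory and a hypothetical variable SOME_VAR. It only calls qsub if the command is found, so it can also be tried on a machine without SGE.

```shell
#!/bin/bash
# Sketch: write a job script whose qsub options are embedded as "#$" lines.
# The directory, the file name and SOME_VAR are illustrative assumptions.
workdir=$(mktemp -d)
job_script="$workdir/myjob.sh"

cat > "$job_script" <<'EOF'
#!/bin/sh
#$ -cwd
#$ -v SOME_VAR
#$ -o /dev/null
#$ -e /dev/null
echo "SOME_VAR is: $SOME_VAR"
EOF

# Submit only where SGE is installed; otherwise report what would be run.
if command -v qsub >/dev/null 2>&1; then
    SOME_VAR=hello qsub "$job_script"
else
    echo "qsub not found; would run: qsub $job_script"
fi
```

With the options embedded this way, the command line shrinks to "qsub myjob.sh". As in the example above, output is discarded to /dev/null, so point -o and -e at real files to see the result.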

    How to specify SGE Resources

    Batch job options and resources can be given as command line switches to qsub, or they can be included in an SGE job script as a comment line of the form

    #$ option

    See the examples below for details.

    General options
    SGE option   Functionality   Remarks
    -cwd   Start job in directory from which it was submitted.
    -M <mail_addr>   User's e-mail address.   Obligatory, so that Support can contact you in case of problems.
    -m [a|b|e]   Batch system sends e-mail when [aborting|starting|ending] job.
    -N <req_name>   Name of batch request.   Default is name of the script. 8 characters at most.
    -o <filename>   Write standard output to specified file.   Support recommends specifying the full path name. Default value is job_name.ojob_id, where job_name is the name of the job specified via the -N parameter, and job_id its identification number.

    Note that under SGE it is possible to use one of the following pseudo-environment variables as part of the full name specification:

    $HOME   home directory
    $USER   user ID
    $JOB_ID   job ID
    $JOB_NAME   job name (see -N option)
    $HOSTNAME   name of target node
    $TASK_ID   job array task index
    These are replaced by the run-time contents of the actual environment variables.
    -e <filename>   Write standard error to specified file.   Under SGE, the same pseudo-environment variables can be used as for the -o option. Support recommends specifying the full path name. job_name.ejob_id is used as a default if no explicit name is specified.
    -j y   Write standard error to the same file as standard output.   Any -e specification is ignored.
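To make the default file names for -o and -e concrete, here is a small sketch that builds job_name.ojob_id and job_name.ejob_id from a job name and a job ID. The function names and the job ID 4711 are hypothetical and used only for illustration; they are not part of SGE.

```shell
#!/bin/bash
# Sketch: reproduce the default SGE output/error file names described above,
# job_name.ojob_id and job_name.ejob_id. Function names are hypothetical.
default_stdout_name() {   # usage: default_stdout_name <job_name> <job_id>
    printf '%s.o%s\n' "$1" "$2"
}

default_stderr_name() {   # usage: default_stderr_name <job_name> <job_id>
    printf '%s.e%s\n' "$1" "$2"
}

# Example: a job submitted with "-N reCOG" that was assigned job ID 4711.
default_stdout_name reCOG 4711
default_stderr_name reCOG 4711
```

This can help when looking for the output of a job that was submitted without explicit -o or -e options.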
    Job Control and Limits
    Option   Functionality   Remarks
    -hard   All the following requirements must be fulfilled for the job to be initiated.   This is the default, so you only need this to invalidate a previous -soft option.
    -soft   After fulfilment of all -hard requirements, the job is processed within the queue fulfilling as many -soft specifications as possible.   This means that at runtime not all requirements specified as -soft may be fulfilled. Hence the user's program must be able to cope with this situation.
    -h   Batch request is kept in User Hold (meaning it is queued but not initiated).   With qalter -h U <job_id> you can remove the user hold from your job.
    -l p4=yes   Batch request is run on a Pentium 4.   lxsrv13-18 or the IA32 extension.
    -l p4_himem=yes   Batch request is run on a Pentium 4 with up to 2 GByte of memory and up to 336 hours.   lxsrv51-lxsrv60
    -l mf=<memory>   Explicit memory requirement (mf abbreviates mem_free); available for IA32 only.   Examples:
    • -l mf=1200M specifies 1200 * 1024 * 1024 Bytes
    • -l mf=1200m specifies 1200 * 1000 * 1000 Bytes
    • -l mf=700000K specifies 700000 * 1024 Bytes
    • -l mf=700000k specifies 700000 * 1000 Bytes
    Note that contradictory requirements (e.g., -q chrom1.q -l mf=4GB) will result in a job that never starts. The "mf" setting replaces the "vf" setting, which could lead to swapping and consequent performance degradation, since "vf" ("virtual_free") refers to free virtual memory, whereas "mf" refers to physical memory.
    -q <queue_name>   Batch request targets a specific queue (mapped to a specific node).   Note that qstat -f prints a table whose first column gives the queue names. It is also possible to specify multiple queue names in a comma-separated list. The job then runs on the first queue that becomes available. Also note that specifying -q usually results in a longer waiting time for the request.
    -pe <par_env> <num_cpus>   Specification of a parallel environment and the number of CPUs. The latter can be given as a fixed value or as a range (e.g., 4-7). If a range is given, SGE decides how many CPUs within the range are actually assigned.   Obligatory for parallel jobs.

    Available parallel environments:

    mpi   MPI programs on parallel pool 1 (at most 18 CPUs)
    mpi_2   MPI programs on parallel pool 2 (first 9 CPUs)
    mpi_3   MPI programs on parallel pool 2 (second 9 CPUs)
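The suffix rules given for -l mf above (upper case for multiples of 1024, lower case for multiples of 1000) can be checked with a short sketch. mf_to_bytes is a hypothetical helper name, not an SGE command, and it handles only the K, k, M and m suffixes shown in the examples.

```shell
#!/bin/bash
# Sketch: convert an SGE-style memory value such as 1200M or 700000k to bytes,
# following the suffix rules listed above. mf_to_bytes is a hypothetical
# helper, not an SGE command. Only the K/k/M/m suffixes are handled.
mf_to_bytes() {
    local value=$1
    local number=${value%[KkMm]}      # strip one trailing suffix letter, if any
    local suffix=${value#"$number"}   # the stripped suffix (may be empty)
    case $number in
        ''|*[!0-9]*) echo "invalid memory value: $value" >&2; return 1 ;;
    esac
    number=$((10#$number))            # force base 10 even with leading zeros
    case $suffix in
        M)  echo $(( number * 1024 * 1024 )) ;;
        m)  echo $(( number * 1000 * 1000 )) ;;
        K)  echo $(( number * 1024 )) ;;
        k)  echo $(( number * 1000 )) ;;
        '') echo "$number" ;;
    esac
}

mf_to_bytes 1200M     # prints 1258291200
mf_to_bytes 1200m     # prints 1200000000
mf_to_bytes 700000K   # prints 716800000
mf_to_bytes 700000k   # prints 700000000
```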

    Examples

    Submitting a shell script with qsub

    #!/bin/bash
    #$ -S /bin/sh
    #$ -e $HOME/cluster/error
    #$ -o $HOME/cluster/output
    echo 'running date'
    date

    If the file containing this text is called thisscript.sh, you can submit the script by typing the following:

    qsub thisscript.sh

    Submitting a perl script with qsub

    #$ -o $HOME/cluster/output
    #$ -e $HOME/cluster/error
    #$ -N reCOG
    #$ -S /usr/bin/perl
    # use lib '/local/cluster/lib/perl5/site_perl';
    $ENV{PERL5LIB} = '/mnt/local/cluster/lib/perl5/site_perl:/mnt/home/cgrb/cgrblib/perl5/COGDB';
    print "Hello world!\n";

    MPI jobs on the parallel pools

    Submit Script #1 for MPI jobs on the parallel pools

    #!/bin/bash -f
    #
    #---------------- SHORT COMMENT ----------------------------------------
    # Template script for parallel MPI jobs to run on mphase Grid Engine cluster.
    # Modify it for your case and submit to CODINE with
    # command "qsub mpi_run.sh".

    # You may want to modify the parameters for
    # "-N" (job queue name), "-pe" (queue type and number of requested CPUs),
    # "myjob" (your compiled executable).


    # You can compile your code, for example myjob.c (*.f), with GNU mpicc or
    # mpif77 compilers as follows:
    # "mpicc -o myjob myjob.c" or "mpif77 -o myjob myjob.f"

    # You can monitor your jobs with command
    # "qstat -u your_username" or "qstat -f" to see all queues.
    # To remove your job, run "qdel job_id"
    # To kill running job, use "qdel -f job_id"

    # ------Attention: #$ is a special CODINE symbol, not a comment -----
    #
    # The name, which will identify your job in the queue system
    #$ -N Test_MPI_Job
    #
    # Queue request, mpich. You can specify the number of requested CPUs,
    # for example, from 1 to 4
    #$ -pe chris_mpi 1-4
    #
    # ---------------------------
    #$ -cwd
    #$ -o $HOME/output/Test_MPI.$JOB_ID.out
    #$ -e $HOME/output/error/Test_MPI.$JOB_ID.error
    #$ -v MPIR_HOME=/usr/local/cluster/mpich-1.2.6
    # ---------------------------
    myjob=$HOME/cluster/bin/cpi
    echo "Got $NSLOTS slots."


    # Don't modify the line below if you don't know what it is
    $MPIR_HOME/bin/mpirun -np $NSLOTS $myjob

    If the file containing this text is called thisscript.sh, you can submit the script by typing the following:

    qsub thisscript.sh

    Submit Script #2 for MPI jobs on the parallel pools by calling the executable as an argument



    #!/bin/bash -f
    #
    #---------------- SHORT COMMENT ----------------------------------------
    # Template script for parallel MPI jobs to run on mphase Grid Engine cluster.
    # Modify it for your case and submit to CODINE with
    # command "qsub thisscript.sh filename".
    #
    # This Script will allow for the binary to be used as the first argument
    # thus you would do a
    # qsub thisscript.sh filename.exe

    # You may want to modify the parameters for
    # "-N" (job queue name), "-pe" (queue type and number of requested CPUs),
    # "myjob" (your compiled executable).


    # You can compile your code, for example myjob.c (*.f), with GNU mpicc or
    # mpif77 compilers as follows:
    # "mpicc -o myjob myjob.c" or "mpif77 -o myjob myjob.f"

    # You can monitor your jobs with command
    # "qstat -u your_username" or "qstat -f" to see all queues.
    # To remove your job, run "qdel job_id"
    # To kill running job, use "qdel -f job_id"

    # ------Attention: #$ is a special CODINE symbol, not a comment -----
    #
    # The name, which will identify your job in the queue system
    #$ -N Test_MPI_Job
    #
    # Queue request, mpich. You can specify the number of requested CPUs,
    # for example, from 1 to 4
    #$ -pe chris_mpi 1-4
    #
    # ---------------------------
    #$ -cwd
    #$ -o $HOME/output/Test_MPI.$JOB_ID.out
    #$ -e $HOME/output/error/Test_MPI.$JOB_ID.error
    #$ -v MPIR_HOME=/usr/local/cluster/mpich-1.2.6
    # ---------------------------

    echo "Got $NSLOTS slots."


    # Don't modify the line below if you don't know what it is
    $MPIR_HOME/bin/mpirun -np $NSLOTS $1

    If the file containing this text is called thisscript.sh, you can submit the script by typing the following:

    qsub thisscript.sh filename.exe



    Configuration Hints

    Mapping of Queue Names to Host Names

    Under SGE you can specify on which host your job should run (by using the -q parameter, as described above). However, with the migration to the presently installed SGE release, the mapping of queue names to host names has changed: queue names are no longer identical to host names. Instead, the mapping shown in the following table applies:

    Hostname   Batch Queue Name
    chrom1   chrom1.q
    chrom2   chrom2.q
    ...   ...
    chrom17   chrom17.q
    chrom18   chrom18.q
    ...   ...

    System Setup for csh and tcsh users

    CGRB recommends using /bin/sh as the script shell in batch scripts. Should you still want to use csh or tcsh, please note that you may need to include

    source /raid0/local/etc/std.cshrc

    in the example scripts presented above.

    MPI jobs on the parallel pools

    The first and second parallel pools are not (necessarily) binary compatible. Please consult the MPI Documentation concerning the correct way of setting up the compilation process as well as the runtime environment.

    Names of Job Scripts

    Batch scripts must not have a digit as the first character of their name. For example, a script named 01ismyjob will not be started correctly. Please use one of the characters a-z or A-Z as the first character of your job script name.
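The naming rule can be checked before submission with a few lines of shell. This is only a sketch: valid_job_script_name is a hypothetical helper, not part of SGE, and it tests nothing beyond the first character of the file name.

```shell
#!/bin/bash
# Sketch: check that a job script's file name starts with a letter (a-z, A-Z).
# valid_job_script_name is a hypothetical helper, not part of SGE.
valid_job_script_name() {
    local name=${1##*/}   # drop any leading directory part
    case $name in
        [a-zA-Z]*) return 0 ;;
        *)         return 1 ;;
    esac
}

for f in myjob.sh 01ismyjob; do
    if valid_job_script_name "$f"; then
        echo "$f: ok"
    else
        echo "$f: rename before submitting"
    fi
done
```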

    Unprintable characters in Batch Scripts

    Scripts that have been edited under DOS/Windows may end each line with a carriage return followed by a line feed; such scripts might not work under SGE. Furthermore, stray whitespace, for example in the
    #! /bin/sh

    specification, could lead to problems with SGE. Scripts like these tend to block a queue altogether! Please remove such whitespace.
    Remark: To determine whether your script is in DOS/Windows format, you can do either of the following:
    • Edit the script with vi (vim). The status line will show the string [dos].
    • Do an octal dump of your script:
      $ od -c myscript | less
      0000000   #   !   /   b   i   n   /   s   h  \r  \n   #   $       -   o
       etc. etc. ...

      After #!/bin/sh you see the two characters "\r\n", where UNIX requires only "\n".
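The same check, and the fix, can be scripted. The sketch below uses only standard tools (grep and tr); has_crlf and strip_cr are hypothetical helper names, and the demonstration file is a throwaway created in a temporary directory.

```shell
#!/bin/bash
# Sketch: detect and remove DOS/Windows carriage returns in a job script.
# has_crlf and strip_cr are hypothetical helper names.
has_crlf() {
    # Succeeds if any line of the file contains a carriage return.
    grep -q $'\r' -- "$1"
}

strip_cr() {
    # Write a copy of $1 to $2 with all carriage returns removed.
    tr -d '\r' < "$1" > "$2"
}

# Demonstration on a throwaway file written with DOS line endings.
tmpdir=$(mktemp -d)
printf '#!/bin/sh\r\n#$ -o /dev/null\r\ndate\r\n' > "$tmpdir/dos.sh"

if has_crlf "$tmpdir/dos.sh"; then
    echo "dos.sh has DOS line endings; writing cleaned copy to unix.sh"
    strip_cr "$tmpdir/dos.sh" "$tmpdir/unix.sh"
fi
```

Note that strip_cr removes every carriage return in the file, not only those at line ends, which is normally what is wanted for a shell script.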

    Please consult CGRB staff for more information regarding problems with whitespace.

If you have any questions, please email Scott Givan.