Batch system

From High Performance Computers at ACK CYFRONET AGH

The batch system is a convenient tool for running jobs on compute resources. On request, it can notify the user by e-mail when a job starts or stops.

Running jobs

To run a job, use the qsub command. Jobs can be executed in two modes: interactive (using a terminal) or batch. In batch mode all commands are placed in a file. Lines in that file that start with #PBS carry qsub options: the batch system reads its options from these lines and runs the remaining lines of the file as the job's commands. The most important options and parameters of the qsub command are listed in the table below.
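As an illustration of the batch-mode file layout, here is a minimal sketch of a job script. The job name example_job is made up for this example, and the queue name is the one used in the examples later on this page. The fallback to the current directory is only there so the script can also be tried outside the batch system.

```shell
#!/bin/bash
# Lines beginning with #PBS are read by qsub as options;
# bash itself treats them as ordinary comments.
#PBS -N example_job
#PBS -q l_infinite
#PBS -l nodes=1:ppn=1
#PBS -j oe

# Everything below runs as the job itself.
# PBS_O_WORKDIR is set by the batch system; fall back to the current
# directory (an assumption for testing outside a job).
JOB_DIR="${PBS_O_WORKDIR:-$PWD}"
cd "$JOB_DIR" || exit 1

JOB_MESSAGE="Job running in $JOB_DIR on $(uname -n)"
echo "$JOB_MESSAGE"
```

Saved under a name of your choice, such a file would be passed to qsub as its argument.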

| Option | Parameters | Description | Notes |
| -l | | Specifies the resources (cores, nodes, memory) for a job. Parameters can be combined on one line, in which case they must be separated by colons. | |
| | nodes=<number_of_nodes> | Number of nodes. | |
| | ppn=<number_of_cores> | Number of requested cores per node. | on Baribal or Panda |
| | cpus=<number_of_processors> | | |
| | pmem=<amount_of_memory> | Amount of memory per allocated core. Alternatively, you can set the memory requirement for the job with the mem= parameter. In both cases the value must be given in megabytes or gigabytes, e.g. 1gb. | |
| | walltime=<computations_time> | Maximum computation time for the job, in the format DD:HH:MM:SS, where DD, HH, MM and SS stand for days, hours, minutes and seconds respectively. | |
| -N | <job_name> | Name of the job in the batch system. | |
| -j | oe | Merges standard output with standard error output during the computation. | |
| -m | b, e or a | The batch system sends an e-mail to the user when the job starts (b), ends (e) or is aborted (a). | |
| -M | <user@e-mail.address> | E-mail address to which job status messages are sent. | |
| -q | <name_of_queue> | Name of the queue in which the job should be executed. | |
| -I | | Interactive job execution. | |
| -X | | Allows running graphical (windowed) programs in interactive mode. | unavailable on Baribal and Mars machines |
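The days:hours:minutes:seconds value expected by walltime can be awkward to write by hand. The sketch below converts a number of seconds into that four-field form. The function name to_walltime is hypothetical and not part of the batch system.

```shell
# Convert a duration in seconds to the four-field
# days:hours:minutes:seconds form described for walltime.
# to_walltime is a hypothetical helper, not a batch system command.
to_walltime() {
    local total=$1
    local days=$(( total / 86400 ))
    local hours=$(( (total % 86400) / 3600 ))
    local minutes=$(( (total % 3600) / 60 ))
    local seconds=$(( total % 60 ))
    printf '%02d:%02d:%02d:%02d\n' "$days" "$hours" "$minutes" "$seconds"
}

# Two and a half hours (9000 seconds):
to_walltime 9000    # prints 00:02:30:00
```

The result could then be used as the value of the walltime= parameter of the -l option.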

Removing jobs

The qdel command removes jobs. To remove a job from a queue you need its ID (jobid), which can be displayed with the qstat command. For example:

qdel 12345678910 

If the job cannot be removed, please contact the administrators.

Displaying information about jobs in queues

The qstat command displays information about jobs and queues. Called without parameters, it displays information about all jobs in the queues. Frequently used options of the qstat command are listed in the table below:

| Option | Parameters | Description | Notes |
| -q | <queue_name> | Without parameters, displays information about available queues and their attributes. | |
| -u | <user_name> | Displays information about the jobs of a specific user. | |
| | <job_identifier> | Displays information about the job. | |
| -f | | Displays full information about all jobs in the batch system. | |


Useful variables of batch system

  • $PBS_NODEFILE This variable points to the file containing the names of the nodes allocated to the job. It is usually used to determine the number of cores allocated to the job, e.g.:
export NPROC=$(wc -l < "$PBS_NODEFILE")
  • $PBS_O_WORKDIR This variable points to the directory from which the job was submitted. It is usually used to change to the directory holding the files the job needs, e.g.:
cd "$PBS_O_WORKDIR"
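The node file can also be used to tell cores and nodes apart. The following is a minimal sketch, assuming the file holds one line per allocated core. The stand-in file and the node names node01 and node02 are made up so that the snippet can be tried outside a job. NNODES is a name chosen for this example.

```shell
# Outside a job PBS_NODEFILE is unset, so build a small stand-in file.
# The node names are invented for illustration only.
if [ -z "${PBS_NODEFILE:-}" ]; then
    PBS_NODEFILE=$(mktemp)
    printf '%s\n' node01 node01 node02 node02 > "$PBS_NODEFILE"
fi

# One line per allocated core, so the line count is the core count.
# The arithmetic expansion strips any padding that wc may add.
NPROC=$(( $(wc -l < "$PBS_NODEFILE") ))

# Distinct host names give the number of nodes.
NNODES=$(( $(sort -u "$PBS_NODEFILE" | wc -l) ))

echo "cores: $NPROC, nodes: $NNODES"
```

With the stand-in file this reports 4 cores on 2 nodes. Inside a real job the values come from the actual allocation.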


Examples

Interactive job

Start an interactive job in the "l_interactive" queue in text mode, requesting one core on one compute node.

qsub -I -q l_interactive -l nodes=1:ppn=1 

Start an interactive job in the "l_interactive" queue in graphical mode, requesting one core on one compute node.

qsub -IX -q l_interactive -l nodes=1:ppn=1 

Note: To use graphical mode, you must connect to the access machine in a way that supports it, for example with X11 forwarding enabled in your SSH client (ssh -X).

Single-core job

Submit a job to the "l_infinite" queue requesting one core on one compute node, using the command line.

qsub -q l_infinite -l nodes=1:ppn=1 

In a script, use the following lines:

#PBS -q l_infinite 
#PBS -l nodes=1:ppn=1 

Task reserving a full compute node

First, check whether all nodes of the compute machine have the same number of cores. You can use the pbsnodes -a command for this: in its output, the number of cores on each node is the number after np = .

To start a job in the "l_infinite" queue reserving a full 12-core compute node from the command line:

qsub -q l_infinite -l nodes=1:ppn=12 

In a script, use the following lines:

#PBS -q l_infinite 
#PBS -l nodes=1:ppn=12 
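Checking the np values by eye is tedious on a large machine. The sketch below summarises them with awk, assuming each node's entry in the pbsnodes -a output contains a line of the form np = <number>. The helper name count_np and the sample text are made up for illustration.

```shell
# Summarise the np values from `pbsnodes -a` style output read on stdin.
# count_np is a hypothetical helper name.
count_np() {
    awk '$1 == "np" && $2 == "=" { count[$3]++ }
         END { for (np in count) print count[np], "node(s) with np =", np }'
}

# Invented sample standing in for real `pbsnodes -a` output.
sample='node01
     state = free
     np = 12
node02
     state = free
     np = 12'

printf '%s\n' "$sample" | count_np

# On the cluster itself one would run: pbsnodes -a | count_np
```

If the summary shows a single np value, every node has the same number of cores and that value can be used for ppn when reserving a full node.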

Task reserving X cores on Y nodes

To start a job in the "l_infinite" queue reserving X cores on each of Y compute nodes from the command line:

qsub -q l_infinite -l nodes=Y:ppn=X 

In a script, use the following lines:

#PBS -q l_infinite 
#PBS -l nodes=Y:ppn=X
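To make the substitution of X and Y concrete, here is a small sketch that prints the script lines for given values. The function name pbs_header is hypothetical. Its arguments are the queue name, the number of nodes (Y) and the number of cores per node (X).

```shell
# Print the #PBS lines for a job using a given number of cores
# on each of a given number of nodes.
# pbs_header is a hypothetical helper: queue, nodes (Y), cores per node (X).
pbs_header() {
    local queue=$1 nodes=$2 ppn=$3
    printf '#PBS -q %s\n' "$queue"
    printf '#PBS -l nodes=%s:ppn=%s\n' "$nodes" "$ppn"
    # Total number of cores the job will hold, as a plain comment.
    printf '# total cores: %s\n' "$(( nodes * ppn ))"
}

# Two nodes with twelve cores each:
pbs_header l_infinite 2 12
```

For Y=2 and X=12 the job holds 24 cores in total.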