Basic Usage: CPU Based Resources With Slurm
== Introduction ==
Slurm [1] is a workload manager for clusters, offering both batch and interactive job scheduling. You interact with it through a text-based interface on the Linux terminal.
Slurm provides the following to help you make use of the cluster:
- Information on what resources are available on the cluster.
- Queuing and allocation of jobs based on the resources you request.
- Job monitoring and status reporting.
== Simple commands with SLURM ==
You can obtain information on the Slurm "partitions" that accept jobs using the sinfo command:
<code bash>
$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
test         up       1:00      1   idle gnt-usiu-gpu-00.kenet.or.ke
gpu1         up 1-00:00:00      1   idle gnt-usiu-gpu-00.kenet.or.ke
normal*      up 1-00:00:00      1   idle gnt-usiu-gpu-00.kenet.or.ke
</code>
The test partition is reserved for testing, with a very short time limit. The normal partition is to be used for CPU-only jobs, and the gpu1 queue is reserved for GPU jobs. Both production partitions have a time limit of 24 hours per individual job.
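If you need more detail than the default summary, for example the exact limits configured on a partition, the standard Slurm query commands below can be used (a minimal sketch; the fields shown will depend on how the cluster is configured):
<code bash>
# Per-node listing with CPU, memory and state details
$ sinfo -N -l

# Full configuration of the normal partition, including its time limit
$ scontrol show partition normal
</code>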
== Showing The Queue ==
The Slurm squeue command lists all submitted jobs, giving you an indication of how busy the cluster is as well as the status of all running or waiting jobs. Jobs that have completed leave the queue and will not appear in this list.
<code bash>
$ squeue
  JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
     63    normal     gpu1   jotuya  R       0:03      1 gnt-usiu-gpu-00.kenet.or.ke
$
</code>
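By default squeue lists every job on the cluster. The ST column reports the job state, for example R for running and PD for pending. As a quick sketch, two common variations restrict the listing to your own jobs or to a single job ID:
<code bash>
# Show only your own jobs
$ squeue -u $USER

# Show a single job by its job ID (63 is the running job above)
$ squeue -j 63
</code>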
== Submitting Your First Job ==
You require a submission script, a plain-text file containing the resource requests and the command you intend to run:
<code bash>
#!/bin/bash

#SBATCH -J testjob              # Job name
#SBATCH -o job.%j.out           # Name of stdout output file (%j expands to jobId)
#SBATCH -e %j.err               # Name of stderr output file
#SBATCH --partition=normal      # Queue (partition) to submit to
#SBATCH --nodes=1               # Total number of nodes requested
#SBATCH --ntasks=4              # Total number of MPI tasks requested
#SBATCH --cpus-per-task=1       # CPU cores per MPI task
#SBATCH --time=00:03:00         # Run time (hh:mm:ss) - 3 minutes

# Load Quantum ESPRESSO and launch the MPI-based executable
module load applications/qespresso/7.3.1

cd $HOME/test
mpirun -np 4 pw.x < input.in > output.out
</code>
Put this in a file called <code>test.slurm</code>.
==== Submitting the Job to the Queue ====
The Slurm sbatch command is used to submit batch jobs to the queue:
<code bash>
$ sbatch test.slurm
Submitted batch job 64
$
</code>
This will run the named program on four cores. Note that the parallelism is built into the program; if the program itself is not parallelised, running it on multiple cores will not provide any benefit.
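Once the job is in the queue you can follow its progress, inspect its output, and cancel it if necessary. The sketch below uses the job ID 64 returned by sbatch above; the output file name follows the job.%j.out pattern set in the submission script, and sacct assumes job accounting is enabled on the cluster:
<code bash>
# Check whether the job is still running or waiting
$ squeue -j 64

# Show accounting information (state, elapsed time, exit code) once it has finished
$ sacct -j 64

# Inspect the standard output written by the job
$ cat job.64.out

# Cancel the job if it is no longer needed
$ scancel 64
</code>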