Basic Usage: GPU Based Resources With Slurm
Introduction
Simple commands with SLURM
You can obtain information on the Slurm "Partitions" that accept jobs using the sinfo command
$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
test up 1:00 1 idle gnt-usiu-gpu-00.kenet.or.ke
gpu1 up 1-00:00:00 1 idle gnt-usiu-gpu-00.kenet.or.ke
normal* up 1-00:00:00 1 idle gnt-usiu-gpu-00.kenet.or.ke
The test partition is reserved for testing and has a very short time limit. The normal partition (the asterisk marks it as the default) is for CPU-only jobs, and the gpu1 partition is reserved for GPU jobs. Both production partitions limit individual jobs to 24 hours at a time.
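If you want to check a single partition's limits and GPU resources directly, sinfo accepts a partition filter and a format string. A minimal sketch (the exact GRES string printed depends on how the node is configured):
$ sinfo -p gpu1 -o "%P %l %G"    # partition, time limit, generic resources (GPUs)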
Showing The Queue
The Slurm squeue command lists all submitted jobs, giving you an indication of how busy the cluster is as well as the status of all running or waiting jobs. Jobs that have completed leave the queue and will not appear in this list.
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
63 normal gpu1 jotuya R 0:03 1 gnt-usiu-gpu-00.kenet.or.ke
$
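If the list is long, you can restrict squeue to your own jobs or to a single job; for example, using the JOBID from the listing above:
$ squeue -u $USER    # show only your own jobs
$ squeue -j 63       # show only job 63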
Submitting Your first GPU Job
Create a submission script for Quantum Espresso
You require a submission script, which is a plain-text file containing all the instructions for the command you intend to run.
Retrieve the example files into your scratch directory from this GitHub repository: https://github.com/Materials-Modelling-Group/training-examples
cd ~/localscratch/
git clone https://github.com/Materials-Modelling-Group/training-examples.git
cd training-examples
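The submission script below expects the Quantum ESPRESSO input file al.scf.david.in to be in this directory; assuming it is included in the repository, you can confirm it is present with:
$ ls al.scf.david.in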
In this directory we will place the following content in a file:
#!/bin/bash
#SBATCH -J gputest # Job name
#SBATCH -o job.%j.out # Name of stdout output file (%j expands to jobId)
#SBATCH -e %j.err # Name of std err
#SBATCH --partition=gpu1 # Queue
#SBATCH --nodes=1 # Total number of nodes requested
#SBATCH --gres=gpu:1 # Total number of gpus requested
#SBATCH --cpus-per-task=1 # Number of CPU cores per task
#SBATCH --time=00:03:00 # Run time (hh:mm:ss) - 3 minutes
# Launch MPI-based executable
module load applications/gpu/qespresso/7.3.1
cd $HOME/localscratch/training-examples
mpirun -np 1 pw.x < al.scf.david.in > output.out
Put this in a file called test.slurm
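Any text editor available on the login node will do for creating the file; for example, with nano (assuming nano is installed):
$ nano test.slurm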
Submitting the Job to the Queue
The Slurm sbatch command is used to submit batch jobs to the queue:
$ sbatch test.slurm
Submitted batch job 64
$
This will run the named program on a single GPU. Note that GPU acceleration is built into the program; if the program itself does not support GPU acceleration, attempting to run it on the GPU will fail.
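After submitting, you can follow the job and inspect the files it produces. The file names below assume the example job ID 64 printed by sbatch above; use the ID printed for your own submission:
$ squeue -u $USER    # is the job still pending or running?
$ cat job.64.out     # stdout captured by the -o directive
$ cat output.out     # pw.x output written by the mpirun redirection
A job that is stuck or no longer needed can be removed from the queue with scancel, e.g. scancel 64.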
Watch Demo: https://asciinema.org/a/i0VEeL4p6CdpJA9iUFMNTvPQT
Next: Intermediate usage: PyTorch and Tensorflow
Up: HPC_Usage