Slurm
From KENET Training
Introduction
Slurm [1] is a workload manager for clusters that offers both batch and interactive job scheduling. It is operated through a text-based interface on the Linux terminal.
Slurm provides the following to help you make use of the cluster:
- Information on the resources available on the cluster.
- Queuing and allocation of jobs based on specified resources.
- Job monitoring and status reporting.
The most commonly used Slurm commands include:
$ sinfo : view the cluster's partitions, nodes and resources.
$ squeue : view submitted jobs.
$ sbatch : submit a batch job.
$ sacct : view accounting information for running and completed jobs (also used by admins).
$ scancel : cancel a job that you have submitted.
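As a minimal sketch of how these commands might be combined in a monitoring session, the script below runs them only when Slurm is installed, so it can also be tried safely off the cluster. The variable SLURM_AVAILABLE is introduced here purely for illustration.

```shell
#!/bin/bash
# Sketch of a typical monitoring session. The Slurm commands only run
# when they are installed, i.e. on a cluster login node.
if command -v sinfo >/dev/null 2>&1; then
    SLURM_AVAILABLE=yes
    sinfo                 # partitions, node states and time limits
    squeue -u "$USER"     # only the jobs that belong to you
    sacct                 # accounting records for your recent jobs
else
    SLURM_AVAILABLE=no
    echo "Slurm commands not found; run this on the cluster."
fi
```

A job shown by squeue can then be cancelled with scancel followed by its job ID, for example scancel 12345 (a hypothetical ID).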
Alongside these commands, a job submission script can be passed to Slurm to set a job's parameters. Practical usage examples are given in the subsequent pages.
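To give a rough idea of what such a job submission script looks like, here is a minimal sketch. The job name, time limit and output file name are illustrative assumptions, not values required by this cluster. The #SBATCH lines are read by sbatch as job parameters and are plain comments to the shell.

```shell
#!/bin/bash
# Name shown in the squeue listing (hypothetical).
#SBATCH --job-name=hello_slurm
# Run a single task on one CPU core.
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
# Wall-clock limit in HH:MM:SS (assumed value).
#SBATCH --time=00:05:00
# File for the job's output; %j expands to the job ID.
#SBATCH --output=hello_%j.out

# SLURM_JOB_ID is set by Slurm inside a job; fall back to a
# placeholder so the script also runs in a plain shell.
JOB_ID="${SLURM_JOB_ID:-not-in-slurm}"
MESSAGE="Job ${JOB_ID} running on $(hostname)"
echo "$MESSAGE"
```

Assuming the script is saved as hello_job.sh (a hypothetical file name), it would be submitted with sbatch hello_job.sh and its progress followed with squeue.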
Quality of Service and Limitations
Users limited to CPU resources (the cpu_only QOS, with gres/gpu=0) have no access to the GPU resources. The QOS levels configured on the cluster, and their limits, can be listed with:
$ sacctmgr show qos format=Name,Priority,GrpTRES,MaxTRES,MaxTRESMins
Name Priority GrpTRES MaxTRES MaxTRESMins
---------- ---------- ------------- ------------- -------------
normal 0
gpu_only 0 gres/gpu=2
cpu_only 0 gres/gpu=0
debug 50 gres/gpu=1
The Name column lists the Slurm Quality of Service (QOS) levels defined on the cluster, for example gpu_only and normal.
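A QOS is usually selected per job with the --qos option. The sketch below assumes that your account is allowed to use the gpu_only QOS and that GPUs are requested through the generic resource named gpu, as the gres/gpu entries in the listing suggest. The job name and time limit are illustrative only.

```shell
#!/bin/bash
#SBATCH --job-name=gpu_check
# QOS name taken from the sacctmgr listing; your account must be
# permitted to use it (assumption).
#SBATCH --qos=gpu_only
# Request one GPU through the generic resource "gpu".
#SBATCH --gres=gpu:1
#SBATCH --time=00:05:00

# Slurm typically exports CUDA_VISIBLE_DEVICES for jobs that were
# granted GPUs; outside such a job the variable is usually unset.
GPUS="${CUDA_VISIBLE_DEVICES:-none}"
echo "GPUs visible to this job: ${GPUS}"
```

A request that exceeds the limits of the chosen QOS would normally be held in the queue or rejected, depending on how the cluster is configured.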
Next: Basic_Usage:_CPU_Based_Resources_With_Slurm
Up: HPC_Usage