Slurm
From KENET Training
Introduction
Slurm [1] is a workload manager for clusters, offering both batch and interactive job scheduling. It is operated through a text-based interface on the Linux terminal.
Slurm provides the following to help you make use of the cluster:
- Information on what resources are available on the cluster.
- Queuing and allocation of jobs based on specified resources.
- Job monitoring and status reporting.
The main Slurm commands include:
$ sinfo : view the cluster's resources and partitions.
$ squeue : view submitted jobs.
$ sbatch : submit a batch job.
$ sacct : view job accounting data (mainly for admins).
$ scancel : cancel a job that you have submitted.
Together with these commands, a job submission script can be provided to Slurm to set a job's
parameters. Practical usage examples are given in the subsequent pages.
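As a rough illustration of what a job submission script looks like, the sketch below writes a minimal one to disk. The file name hello_job.sh, the job name, and the resource values are placeholders chosen for this example, and cpu_only is assumed to be a valid partition on this cluster (confirm with sinfo).

```shell
# Write a minimal job script (file name and values are hypothetical).
cat > hello_job.sh <<'EOF'
#!/bin/bash
# Job name shown in squeue (placeholder).
#SBATCH --job-name=hello
# Assumed partition name; check the output of sinfo.
#SBATCH --partition=cpu_only
# One task, with a five-minute wall-time limit.
#SBATCH --ntasks=1
#SBATCH --time=00:05:00
# %j is replaced by the job ID in the output file name.
#SBATCH --output=hello_%j.out

echo "Running on $(uname -n)"
EOF
```

On the cluster, such a script would be submitted with `sbatch hello_job.sh`, monitored with `squeue -u $USER`, and cancelled with `scancel` followed by the job ID that sbatch reports.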
Quality of Service and Limitations
Users restricted to CPU resources have no access to GPU resources. The configured limits can be listed with sacctmgr:
$ sacctmgr show qos format=Name,Priority,GrpTRES,MaxTRES,MaxTRESMins
Name Priority GrpTRES MaxTRES MaxTRESMins
---------- ---------- ------------- ------------- -------------
normal 0
gpu_only 0 gres/gpu=2
cpu_only 0 gres/gpu=0
gpu_only and cpu_only are Slurm partitions (partitions are to Slurm what queues are to PBS/TORQUE).
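To show how these limits might affect a job request, here is a hedged sketch of a script that asks for a single GPU on gpu_only, which stays within the gres/gpu=2 limit listed above. The file name gpu_job.sh and the wall-time value are placeholders, and whether this site also requires an explicit QOS option is not stated on this page.

```shell
# Write a minimal GPU job script (file name and values are hypothetical).
cat > gpu_job.sh <<'EOF'
#!/bin/bash
# Assumed partition name, as named on this page; confirm with sinfo.
#SBATCH --partition=gpu_only
# Request one GPU, within the gres/gpu=2 limit.
#SBATCH --gres=gpu:1
#SBATCH --time=00:10:00

# Slurm usually exports CUDA_VISIBLE_DEVICES for GPU allocations (site-dependent).
echo "GPUs visible: ${CUDA_VISIBLE_DEVICES:-none}"
EOF
```

Run outside a GPU allocation, the script prints "GPUs visible: none". A request that exceeds the configured limit would be expected to be rejected or left pending by the scheduler.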
Next: Basic_Usage:_CPU_Based_Resources_With_Slurm
Up: HPC_Usage