Difference between revisions of "Slurm"
From KENET Training
Latest revision as of 14:44, 8 May 2025
Introduction
Slurm (https://slurm.schedmd.com/documentation.html) is a workload manager for clusters, offering both batch and interactive job scheduling. It works through a text-based interface on the Linux terminal.
Slurm provides the following to help you make use of the cluster:
- Information about the resources available on the cluster.
- Queuing and allocation of jobs based on the resources they request.
- Job monitoring and status reporting.
The main Slurm commands include:
$ sinfo : view the cluster's nodes, resources and partitions.
$ squeue : view submitted jobs.
$ sbatch : submit a batch job.
$ sacct : display job accounting data (mainly used by admins).
$ scancel : cancel a job that you have submitted.
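A typical session with these commands might look like the sketch below. The script name job.sh and the helper parse_job_id are hypothetical, introduced only for illustration; the session is guarded so that it does nothing on a machine without the Slurm client tools or without a job.sh in the current directory.

```shell
#!/bin/bash
# Sketch of a typical session. job.sh is a hypothetical job script name.

# sbatch --parsable prints "jobid" or "jobid;cluster"; keep only the job ID.
parse_job_id() {
    printf '%s\n' "${1%%;*}"
}

# Run the session only where Slurm is installed and job.sh exists.
if command -v sbatch >/dev/null 2>&1 && [ -f job.sh ]; then
    sinfo                                   # partitions and node states
    jobid=$(parse_job_id "$(sbatch --parsable job.sh)")
    squeue -u "$USER"                       # your queued and running jobs
    if [ -n "$jobid" ]; then
        scancel "$jobid"                    # cancel the job just submitted
    fi
fi
```

Capturing the job ID at submission time avoids having to read it back from the squeue listing before calling scancel.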
Together with these commands, a job submission script can be provided to Slurm to set a job's
parameters. Practical usage examples are illustrated in the subsequent pages.
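As a minimal sketch of such a job submission script: the lines starting with #SBATCH are read by sbatch as job parameters and are plain comments to the shell. The job name, output file name, resource values and the describe_job helper are assumptions chosen for illustration, not settings taken from this cluster.

```shell
#!/bin/bash
# Name shown in the squeue listing (illustrative value).
#SBATCH --job-name=hello
# File for the job's output; %j expands to the job ID.
#SBATCH --output=hello_%j.out
# Request a single task with one CPU.
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
# Wall-clock limit in HH:MM:SS (illustrative value).
#SBATCH --time=00:05:00

# Report where the job runs. Slurm sets SLURM_JOB_ID and SLURM_JOB_NODELIST
# inside a job; the fallbacks let the script also run outside Slurm.
describe_job() {
    echo "Job ${SLURM_JOB_ID:-none} on ${SLURM_JOB_NODELIST:-$(uname -n)}"
}

describe_job
```

Saved as, say, hello.sh (a hypothetical file name), the script would be submitted with `sbatch hello.sh`, and its output would land in hello_<jobid>.out.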
Watch Demo: https://asciinema.org/a/FsZFGQQBRcRulln07btPWUR99
Quality of Service and Limitations
Users of CPU resources have no access to the GPU resources; their jobs are confined to CPU resources.
$ sacctmgr show qos format=Name,Priority,GrpTRES,MaxTRES,MaxTRESMins
Name Priority GrpTRES MaxTRES MaxTRESMins
---------- ---------- ------------- ------------- -------------
normal 0
gpu_only 0 gres/gpu=2
cpu_only 0 gres/gpu=0
debug 50 gres/gpu=1
The entries in the Name column, such as gpu_only and normal, are Slurm QOS (Quality of Service) names.
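The GPU limit for a given QOS can also be pulled out of the listing with a script. The sketch below assumes sacctmgr is run with -n (no header) and -P (pipe-delimited output); gpu_limit is a hypothetical helper that scans the whole row for a gres/gpu entry, so it does not depend on which TRES column holds the value.

```shell
#!/bin/bash
# Print the gres/gpu limit recorded for one QOS, or "none" if the row has
# no gres/gpu entry. Reads pipe-delimited rows (Name first) on stdin.
gpu_limit() {
    awk -F'|' -v qos="$1" '$1 == qos {
        if (match($0, /gres\/gpu=[0-9]+/)) {
            # Skip the 9 characters of "gres/gpu=" and keep the number.
            print substr($0, RSTART + 9, RLENGTH - 9)
        } else {
            print "none"
        }
    }'
}

# Query the live cluster only where sacctmgr is installed.
if command -v sacctmgr >/dev/null 2>&1; then
    sacctmgr -n -P show qos format=Name,GrpTRES,MaxTRES,MaxTRESMins |
        gpu_limit gpu_only
fi
```

If users are allowed to select a QOS themselves on this cluster (an assumption), a job would request one with sbatch's --qos option, for example `sbatch --qos=gpu_only --gres=gpu:1 job.sh`, where job.sh is a hypothetical script name.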
Next: Basic_Usage:_CPU_Based_Resources_With_Slurm
Up: HPC_Usage