Difference between revisions of "Slurm"
From KENET Training
								
												
Revision as of 12:45, 8 May 2025
Introduction
Slurm [1] is a workload manager for clusters, offering both batch and interactive job scheduling. It is operated through a text-based interface on the Linux terminal.
Slurm provides the following to help you make use of the cluster:
- Information on the resources available on the cluster.
- Queuing and allocation of jobs based on specified resources.
- Job monitoring and status reporting.
 
The main Slurm commands include:
- $ sinfo : view the cluster, its resources and partitions.
- $ squeue : view submitted jobs.
- $ sbatch : submit a batch job.
- $ sacct : view job accounting data (mainly used by admins).
- $ scancel : cancel a job that you have submitted.
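As a rough illustration of how these commands fit together, the sketch below walks through a typical session. It is not taken from this page: the helper `missing_cmds` and the script name `job.sh` are hypothetical, and the session only runs when all five Slurm commands are found on the machine.

```shell
#!/bin/bash
# Sketch of a typical Slurm session. "job.sh" is a hypothetical script name.

# Print the name of every given command that is not found on PATH.
missing_cmds() {
    local cmd
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

missing=$(missing_cmds sinfo squeue sbatch sacct scancel)
if [ -n "$missing" ]; then
    # Unquoted on purpose so the names are printed on one line.
    echo "Slurm commands not found on this machine:" $missing
else
    sinfo                    # partitions and node states
    squeue -u "$USER"        # only your own pending and running jobs
    if [ -f job.sh ]; then
        # --parsable prints just the job ID (plus ";cluster" on multi-cluster setups).
        jobid=$(sbatch --parsable job.sh)
        jobid=${jobid%%;*}
        sacct -j "$jobid"    # accounting record for that job
        scancel "$jobid"     # cancel it again, since this is only a demonstration
    fi
fi
```

On a machine without Slurm the script only reports which commands are missing, so it is safe to try before logging in to the cluster.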
Together with these commands, a job submission script can be provided to Slurm to set a job's parameters. Practical usage examples will be illustrated in the subsequent pages.
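To make the idea of a job submission script concrete, here is a minimal sketch. The job name, resource values and output file name are illustrative assumptions rather than settings from this cluster; the partition name `cpu_only` is the one mentioned later on this page. The `#SBATCH` lines are read by `sbatch` and are ordinary comments to the shell.

```shell
#!/bin/bash
# Hypothetical job name.
#SBATCH --job-name=demo_job
# Partition name assumed from this page; adjust to the partition you may use.
#SBATCH --partition=cpu_only
# One task with one CPU core.
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
# Wall-clock limit of five minutes.
#SBATCH --time=00:05:00
# %j expands to the job ID.
#SBATCH --output=demo_job_%j.out

# SLURM_JOB_ID is set by Slurm inside a job; fall back to "none" elsewhere.
msg="Job ${SLURM_JOB_ID:-none} running on $(uname -n)"
echo "$msg"
```

Saved under a name of your choice, such a script would be submitted with `sbatch`, and the message would end up in the file named by `--output`.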
Watch Demo: https://asciinema.org/a/FsZFGQQBRcRulln07btPWUR99
Quality of Service and Limitations
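Users restricted to CPU resources have no access to GPU resources. The QoS listing below shows the limits: cpu_only caps GPUs at zero (gres/gpu=0), while gpu_only allows up to two GPUs per job (gres/gpu=2).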
$ sacctmgr show qos format=Name,Priority,GrpTRES,MaxTRES,MaxTRESMins
      Name   Priority       GrpTRES       MaxTRES   MaxTRESMins
---------- ---------- ------------- ------------- -------------
    normal          0
  gpu_only          0                  gres/gpu=2
  cpu_only          0                  gres/gpu=0
In the listing above, gpu_only and cpu_only appear as QoS names; they are also Slurm partitions (a partition is to Slurm what a queue is to PBS/Torque).
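The GPU cap of each QoS can also be pulled out of the listing with a short script. The sketch below is an illustration only: `gpu_limits` is a hypothetical helper, and the sample input simply mirrors the table above in the pipe-separated form that `sacctmgr` produces with its `-P` (parsable) and `-n` (no header) options.

```shell
#!/bin/bash
# Read "Name|MaxTRES" lines on stdin and print the GPU cap of each QoS.
# "none" means the QoS sets no gres/gpu limit in MaxTRES.
gpu_limits() {
    awk -F'|' '{
        limit = "none"
        n = split($2, tres, ",")
        for (i = 1; i <= n; i++) {
            if (tres[i] ~ /^gres\/gpu=/) {
                sub(/^gres\/gpu=/, "", tres[i])
                limit = tres[i]
            }
        }
        print $1 ": " limit
    }'
}

# Sample input mirroring the QoS table on this page.
sample='normal|
gpu_only|gres/gpu=2
cpu_only|gres/gpu=0'
printf '%s\n' "$sample" | gpu_limits

# On the cluster, the same helper could read live data:
#   sacctmgr -n -P show qos format=Name,MaxTRES | gpu_limits
```

For the sample input this prints `normal: none`, `gpu_only: 2` and `cpu_only: 0`, one per line.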
Next: Basic_Usage:_CPU_Based_Resources_With_Slurm
Up: HPC_Usage