Slurm

From KENET Training
Revision as of 14:39, 8 May 2025 by Atambo

Introduction

Slurm [1] is a workload manager for clusters that offers both batch and interactive job scheduling. It is used through a text-based interface on the Linux terminal.

Slurm provides the following to help you make use of the cluster:

  1. Information on the resources available on the cluster.
  2. Queuing and allocation of jobs based on the resources they request.
  3. Job monitoring and status reporting.

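The most commonly used Slurm commands are:

 $ sinfo : view the cluster's partitions, nodes and available resources.
 $ squeue : view submitted jobs that are pending or running.
 $ sbatch : submit a batch job.
 $ sacct : view accounting information for running and completed jobs (also used by admins).
 $ scancel : cancel a job that you have submitted.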
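
The commands above are typically used together in one session. The following is a minimal sketch of such a session, not a prescribed workflow: the script name job.sh is hypothetical, and the guard at the top only exists so that the snippet does nothing harmful on a machine without Slurm installed.

```shell
#!/bin/bash
# Sketch of a typical Slurm session. "job.sh" is a hypothetical job script.
if command -v sbatch >/dev/null 2>&1; then
    # 1. See which partitions and nodes are available.
    sinfo

    # 2. Submit the job; --parsable prints the job ID (optionally "id;cluster").
    JOB_ID=$(sbatch --parsable job.sh)
    JOB_ID="${JOB_ID%%;*}"   # keep only the numeric job ID

    # 3. Check your pending and running jobs.
    squeue -u "$USER"

    # 4. Look at the accounting record for the job.
    sacct -j "$JOB_ID" --format=JobID,JobName,State,Elapsed

    # 5. Cancel the job if it is no longer needed.
    scancel "$JOB_ID"
    STATUS="ran"
else
    echo "Slurm client commands not found on this machine"
    STATUS="skipped"
fi
```

Capturing the job ID from sbatch means that the later sacct and scancel calls act on exactly the job that was submitted.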

Together with these commands, a job submission script can be given to Slurm to set a job's parameters. Practical usage examples are given on the subsequent pages.
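
As a rough illustration of what such a job submission script can look like, here is a minimal sketch. The job name, time limit and output file name are made-up values, and it assumes that cpu_only (a name that appears later on this page) is a valid partition on your cluster; adjust all of these to your own setup.

```shell
#!/bin/bash
#SBATCH --job-name=hello_slurm     # hypothetical job name
#SBATCH --partition=cpu_only       # assumption: cpu_only is a valid partition
#SBATCH --nodes=1                  # run on a single node
#SBATCH --ntasks=1                 # run a single task
#SBATCH --time=00:05:00            # wall-time limit (HH:MM:SS), illustrative
#SBATCH --output=hello_%j.out      # %j is replaced by the job ID

# Slurm sets SLURM_JOB_ID inside a job; fall back to "none" when the
# script is run directly with bash instead of through sbatch.
JOB_ID="${SLURM_JOB_ID:-none}"
MESSAGE="Job ${JOB_ID} running on $(hostname)"
echo "$MESSAGE"
```

If the script is saved as hello.sh (a hypothetical file name), it would be submitted with `sbatch hello.sh`. The #SBATCH lines are ordinary comments to bash, so the same file can also be run directly for a quick check.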


Quality of Service and Limitations

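Users who are assigned CPU resources have no access to the GPU resources and are confined to CPU-only jobs. The QOS listing below shows how this is enforced: MaxTRES is the per-job limit, so cpu_only allows zero GPUs per job (gres/gpu=0), while gpu_only allows at most two GPUs per job (gres/gpu=2).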

 $ sacctmgr show qos format=Name,Priority,GrpTRES,MaxTRES,MaxTRESMins
       Name   Priority       GrpTRES       MaxTRES   MaxTRESMins
 ---------- ---------- ------------- ------------- -------------
     normal          0
   gpu_only          0                  gres/gpu=2
   cpu_only          0                  gres/gpu=0

gpu_only and cpu_only, listed above as QOS names, are also Slurm partitions (partitions are to Slurm what queues are to PBS/Torque).
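
To show how these limits affect a job request, here is a hedged sketch of a GPU job script. It assumes that gpu_only is usable both as a partition and as a QOS for your account, and that the cluster exposes GPUs through the generic resource name gpu; the job name and time limit are made up. The request stays within the gres/gpu=2 per-job limit shown in the table above.

```shell
#!/bin/bash
#SBATCH --job-name=gpu_check       # hypothetical job name
#SBATCH --partition=gpu_only       # assumption: gpu_only is a valid partition
#SBATCH --qos=gpu_only             # assumption: your account may use this QOS
#SBATCH --gres=gpu:2               # at most 2 GPUs per job under this QOS
#SBATCH --time=00:05:00            # illustrative wall-time limit

# On clusters configured for GPU scheduling, Slurm usually sets
# CUDA_VISIBLE_DEVICES for the job; fall back to "none" otherwise.
GPUS="${CUDA_VISIBLE_DEVICES:-none}"
echo "GPUs visible to this job: ${GPUS}"
```

A job that asks for more GPUs than the QOS allows, for example `--gres=gpu:3` here, would be expected to be rejected or left pending rather than run.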

Next: Basic_Usage:_CPU_Based_Resources_With_Slurm

Up: HPC_Usage