Slurm

From KENET Training
Latest revision as of 14:44, 8 May 2025


Introduction

Slurm (https://slurm.schedmd.com/documentation.html) is a workload manager for clusters, offering both batch and interactive job scheduling. It is operated through a text-based interface on the Linux terminal.

Slurm provides the following to help you make use of the cluster:

  1. Information on what resources are available on the cluster.
  2. Queuing and allocation of jobs based on specified resources.
  3. Job monitoring and status reporting.

These functions are accessed through commands that include:

 $ sinfo : view the cluster's partitions, nodes, and available resources.
 $ squeue : view submitted jobs in the queue.
 $ sbatch : submit a batch job.
 $ sacct : view job accounting data (for admins).
 $ scancel : cancel a job that you have submitted.
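
A typical session chains these commands: inspect the cluster, submit a script, check the queue, and cancel the job if needed. The sketch below wraps those steps in a function so that nothing runs until you call it. The script name job.sh and the helper parse_job_id are hypothetical (not part of Slurm); the helper assumes sbatch replies with a line of the form "Submitted batch job <id>".

```shell
#!/usr/bin/env bash

# Extract the numeric job ID from sbatch's confirmation line,
# e.g. "Submitted batch job 12345" -> "12345".
# parse_job_id is an illustrative helper, not a Slurm command.
parse_job_id() {
    local line="$1"
    # The job ID is the last space-separated field of the line.
    echo "${line##* }"
}

# Walk through a basic session. job.sh is a hypothetical script name.
run_demo_session() {
    local script="${1:-job.sh}"
    local reply job_id

    sinfo                                    # partitions and node states
    reply="$(sbatch "$script")" || return 1  # submit the batch job
    job_id="$(parse_job_id "$reply")"
    squeue -u "$USER"                        # list your own jobs
    scancel "$job_id"                        # cancel the job just submitted
}
```

Calling run_demo_session on a login node would submit job.sh and then cancel it again, which is only useful as a dry run of the workflow.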

Together with these commands, a job submission script can be provided to Slurm to set a job's parameters. Practical usage examples are given in the subsequent pages.
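
As a rough illustration of what such a submission script looks like, here is a minimal sketch. The #SBATCH lines are read by sbatch as job parameters and ignored by bash as comments. All values (job name, time limit, output file name) are placeholders chosen for this example, not settings taken from this cluster, and job_label is a hypothetical helper.

```shell
#!/bin/bash
# Name shown in the squeue listing (placeholder value).
#SBATCH --job-name=hello
# Run a single task.
#SBATCH --ntasks=1
# Wall-clock limit in HH:MM:SS (placeholder value).
#SBATCH --time=00:05:00
# Output file; %j is replaced by the job ID.
#SBATCH --output=hello_%j.out

# Slurm sets SLURM_JOB_ID inside a running job. Fall back to "local"
# so the script can also be run directly for a quick check.
job_label() {
    echo "job ${SLURM_JOB_ID:-local}"
}

echo "Running $(job_label) on $(uname -n)"
```

Saved as, say, hello.sh (a hypothetical file name), it would be submitted with sbatch hello.sh.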


Watch Demo: https://asciinema.org/a/FsZFGQQBRcRulln07btPWUR99

Quality of Service and Limitations

Users of CPU resources have no access to GPU resources; their jobs are confined to CPU resources. The QOS listing below shows this as a MaxTRES of gres/gpu=0 for cpu_only.

 $ sacctmgr show qos format=Name,Priority,GrpTRES,MaxTRES,MaxTRESMins
       Name   Priority       GrpTRES       MaxTRES   MaxTRESMins
 ---------- ---------- ------------- ------------- -------------
     normal          0
   gpu_only          0                  gres/gpu=2
   cpu_only          0                  gres/gpu=0
      debug         50                  gres/gpu=1

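normal, gpu_only, cpu_only and debug are the Slurm QOS (Quality of Service) levels listed above. The MaxTRES column gives the most GPUs a single job may request under each QOS.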
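
To make these limits concrete, the sketch below encodes the MaxTRES values from the table above and checks a GPU request against them. The function names are hypothetical, and treating the blank MaxTRES of the normal QOS as "no QOS-level GPU cap" is an assumption. Slurm itself enforces the real limits; this only illustrates how to read the table.

```shell
#!/usr/bin/env bash

# Per-job GPU cap for each QOS, copied from the MaxTRES column above.
# Prints an empty string for a QOS with no gres/gpu entry
# (assumed here to mean no cap at the QOS level).
max_gpus_for_qos() {
    case "$1" in
        gpu_only) echo 2 ;;
        cpu_only) echo 0 ;;
        debug)    echo 1 ;;
        normal)   echo "" ;;
        *)        return 1 ;;   # unknown QOS name
    esac
}

# Return 0 if a request for $2 GPUs fits within the cap of QOS $1.
gpu_request_allowed() {
    local qos="$1" wanted="$2" cap
    cap="$(max_gpus_for_qos "$qos")" || return 1
    [ -z "$cap" ] || [ "$wanted" -le "$cap" ]
}

# Example: check a 2-GPU request against three of the QOS levels.
for qos in gpu_only debug cpu_only; do
    if gpu_request_allowed "$qos" 2; then
        echo "$qos: 2 GPUs allowed"
    else
        echo "$qos: 2 GPUs exceed the cap"
    fi
done
```

In a job script, the QOS and GPU count would typically be requested with directives such as #SBATCH --qos=gpu_only and #SBATCH --gres=gpu:1.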
