Basic Usage: CPU Based Resources With Slurm

From KENET Training

== Introduction ==

Slurm [1] is a workload manager for clusters, offering both batch and interactive job scheduling. It works through a text-based interface on the Linux terminal.

Slurm provides the following to help you make use of the cluster:

  1. Information on what resources are available on the cluster.
  2. Queuing and allocation of jobs based on the resources you specify.
  3. Monitoring and status reporting for your jobs.
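
In practice, each of these maps onto one or two client commands. The sketch below assumes the standard Slurm client tools; sinfo and squeue are demonstrated in the sections that follow.

<code bash>
sinfo     # 1. list partitions, nodes and their state
sbatch    # 2. submit a batch script to the queue
squeue    # 3. watch queued and running jobs
sacct     # 3. report on finished jobs (requires accounting to be enabled)
</code>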

== Simple Commands With Slurm ==

You can obtain information on the Slurm "Partitions" that accept jobs using the sinfo command:

<code bash>
$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
test        up       1:00     1  idle gnt-usiu-gpu-00.kenet.or.ke
gpu1        up 1-00:00:00     1  idle gnt-usiu-gpu-00.kenet.or.ke
normal*      up 1-00:00:00     1  idle gnt-usiu-gpu-00.kenet.or.ke
</code>

The test partition is reserved for testing and has a very short time limit. The normal partition is for CPU-only jobs, and the gpu1 partition is reserved for GPU jobs. Both production partitions limit an individual job to 24 hours at a time.
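
You can direct a job to one of these partitions with the standard --partition option, and request wall time within the partition's cap with --time. A minimal sketch (the command at the end is a placeholder; any program runs the same way):

<code bash>
# Run a trivial command on the CPU partition, asking for
# 10 minutes of wall time (well under the 24-hour cap).
$ srun --partition=normal --time=00:10:00 hostname
</code>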

== Showing The Queue ==

The squeue command lists all submitted jobs. It gives you an indication of how busy the cluster is, along with the status of every running or waiting job. Jobs that have completed leave the queue and will not appear in this list.

<code bash>
$ squeue
            JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               63    normal     gpu1   jotuya  R       0:03      1 gnt-usiu-gpu-00.kenet.or.ke
$
</code>
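
On a busy cluster the full listing gets long. Two standard refinements worth knowing (a sketch; sacct output depends on accounting being enabled on the cluster):

<code bash>
# Show only your own jobs.
$ squeue -u $USER

# Completed jobs drop out of squeue; query their accounting
# records (state, elapsed time, etc.) with sacct instead.
$ sacct -u $USER --format=JobID,JobName,Partition,State,Elapsed
</code>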

== Submitting Your First Job ==
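
As a starting point, here is a minimal batch script sketch showing the shape of a submission. The partition and time limit come from the sinfo output above; the script name, job name, and output file (hello.sh, hello, hello.out) are placeholders, not site conventions:

<code bash>
#!/bin/bash
#SBATCH --job-name=hello        # name shown in the squeue listing
#SBATCH --partition=normal      # CPU-only production partition
#SBATCH --time=00:05:00         # well under the 24-hour limit
#SBATCH --output=hello.out      # stdout and stderr are written here

hostname
</code>

Submit it with "sbatch hello.sh" and watch its progress with squeue.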