Debugging and Interactive Slurm Jobs

From KENET Training
Revision as of 20:35, 8 May 2025

Interactive Jobs and Testing

It is sometimes useful to test out commands, prototype code, or debug before submitting production jobs to the cluster. For this, Slurm provides a different way of interacting with the compute nodes: interactive jobs, which give you a terminal on a compute node once the job starts running.

Submitting Interactive Jobs

An interactive job can be submitted with the following srun command, rather than the usual sbatch command:

$ srun --time=00:30:00 --gres=gpu:1 --partition=gpu1 --pty /bin/bash -i
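For comparison, the same resources could be requested non-interactively with the usual sbatch command. The sketch below is a minimal batch-script example, assuming the same partition (gpu1) and GPU request as the srun command above; the job name and the echo placeholder are hypothetical:

```shell
#!/bin/bash
#SBATCH --time=00:30:00
#SBATCH --gres=gpu:1
#SBATCH --partition=gpu1
#SBATCH --job-name=gpu-test     # hypothetical job name

# Commands placed here run on the compute node without an attached terminal
echo "running on $(hostname)"
```

With sbatch the commands run unattended when the job is scheduled, whereas the srun form above attaches your terminal to the job instead.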

This command will block the terminal until the job starts executing, so you may need to wait if queued jobs are ahead of your interactive job. Once it starts running, you can interact with your code from the terminal as usual; you will notice you are no longer logged into the login node, but into a compute node.
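A couple of quick checks confirm that the prompt is now on a compute node. This is a minimal sketch; SLURM_JOB_ID is an environment variable Slurm sets inside any running job:

```shell
# Confirm the shell is on a compute node, not the login node
hostname                     # prints the compute node's name
echo "job: $SLURM_JOB_ID"    # Slurm exports SLURM_JOB_ID inside the job
# squeue -u $USER            # from another terminal: shows the job as RUNNING
```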

$ module av
$ module load applications/gpu/python/conda-25.1.1-python-3.9.21
$ conda env list
base                   /opt/ohpc/pub/conda/instdir
python-3.9.21          /opt/ohpc/pub/conda/instdir/envs/python-3.9.21
$ conda activate python-3.9.21
$ mkdir mnist
$ cd mnist
$ wget https://raw.githubusercontent.com/pytorch/examples/refs/heads/main/mnist/main.py
$ python main.py
...

Watch Interactive Jobs Demo

Next: Advanced_Usage

Up: HPC_Usage