Intermediate Usage: PyTorch and Tensorflow

== Modules For Machine Learning ==

The cluster has ready-made Python environments with conda, TensorFlow and PyTorch for machine learning users. The usage differs from a Jupyter notebook interface, since everything has to run in the background. As a user, you place all of your training/inference/testing/IO code in a Python script, which is then added as a command in the shell-script section of the Slurm job submission file.

== Listing available modules ==

To view all available modules, we can use the module command:
<code bash>
   $ module av

   ----------------------------------------------------------------- /usr/share/modulefiles -------------------------------
   mpi/openmpi-x86_64

   ----------------------------------------------------------------- /opt/ohpc/pub/modulefiles ------------------------------
   applications/gpu/gromacs/2024.4        applications/gpu/python/conda-25.1.1-python-3.9.21 (D)
   applications/gpu/python/base-3.9.21    applications/gpu/qespresso/7.3.1

   ---------------------------------------------------------- /usr/share/lmod/lmod/modulefiles/Core -------------------------
   lmod    settar
</code>
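If the module list is long, it can be narrowed down; both of the commands below are standard Lmod commands, and the exact output will depend on what is installed on the cluster:

<code bash>
   $ module av python       # list only modules whose names contain "python"
   $ module spider python   # search all module trees for python-related modules
</code>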
[[File:Conda_logo.svg.png|250px]]
 
== Modules with Tensorflow and PyTorch ==

This conda module, which appears in the list above, has both TensorFlow and PyTorch installed:
 
<code bash>
   applications/gpu/python/conda-25.1.1-python-3.9.21
</code>
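If you want to see what this module sets up (paths and environment variables) before loading it, the standard Lmod command module show can be used; the output depends on how the module file is written:

<code bash>
   $ module show applications/gpu/python/conda-25.1.1-python-3.9.21
</code>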
  
== Loading The Python (Conda) Module ==

We can load the module using the module command:

<code bash>
   $ module load applications/gpu/python/conda-25.1.1-python-3.9.21
</code>
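To confirm the module loaded correctly, you can list the currently loaded modules and check which conda is now on your PATH; the exact path shown will depend on the cluster installation:

<code bash>
   $ module list    # should now include applications/gpu/python/conda-25.1.1-python-3.9.21
   $ which conda    # should point to the conda provided by the module
</code>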
 
== Listing Conda Environments ==

The loaded module gives us access to a custom conda installation, and we can now list the available conda environments:

<code bash>
   $ conda env list
   # conda environments:
   #
   base                   /opt/ohpc/pub/conda/instdir
   python-3.9.21          /opt/ohpc/pub/conda/instdir/envs/python-3.9.21
</code>

We can safely ignore the base environment and make use of the ''python-3.9.21'' conda environment, which has the two machine learning frameworks, TensorFlow and PyTorch.
[[File:Pytorch_logo.png|250px]]
 
<code bash>
   $ conda activate python-3.9.21
   (python-3.9.21)$
</code>
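As an optional sanity check, you can confirm that both frameworks import correctly inside the activated environment; the version numbers printed depend on what is installed there:

<code bash>
   (python-3.9.21)$ python -c "import torch; print(torch.__version__)"
   (python-3.9.21)$ python -c "import tensorflow as tf; print(tf.__version__)"
</code>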
 
These are the commands we will have in the Slurm submission script. Let's now create the Python code that will run a simple machine learning exercise with PyTorch. We will use the MNIST example from PyTorch; run these shell commands to create the working directory and retrieve the files:
 
<code bash>
   $ mkdir -p ~/localscratch/mnist    # create a working dir
   $ cd ~/localscratch/mnist          # change into the working dir
   $ wget https://raw.githubusercontent.com/pytorch/examples/refs/heads/main/mnist/main.py
</code>
Now we can call the Python script from our submission script. Place the following in a plain text file called torch.job:

<code bash>
#!/bin/bash
#SBATCH -J gputest            # Job name
#SBATCH -o job.%j.out         # Name of stdout output file (%j expands to jobId)
#SBATCH -e %j.err             # Name of stderr output file
#SBATCH --partition=gpu1      # Queue (GPU partition)
#SBATCH --nodes=1             # Total number of nodes requested
#SBATCH --gres=gpu:1          # Total number of GPUs requested
#SBATCH --cpus-per-task=1     # CPU cores per task
#SBATCH --time=00:03:00       # Run time (hh:mm:ss)

cd ~/localscratch/mnist
module load applications/gpu/python/conda-25.1.1-python-3.9.21
conda activate python-3.9.21
python main.py
</code>

Finally, we can submit this script to Slurm, which will run the entire process in the background.

<code bash>
   $ sbatch torch.job
</code>
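Once the job is submitted, standard Slurm commands can be used to follow its progress, and the stdout file named by the #SBATCH -o directive (job.<jobid>.out) will appear in the directory you submitted from:

<code bash>
   $ squeue -u $USER         # check whether the job is pending or running
   $ cat job.<jobid>.out     # view the training output once the job has started
</code>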
== [https://asciinema.org/a/m8HJLldFQk0SrrpOYOAIQQrYj Watch Demo] ==

Next:
[[Module_system|Module_system]]

Up:
[[HPC_Usage|HPC_Usage]]
