Modules For Machine Learning
The cluster has ready-made Python environments, managed with conda, that include TensorFlow and PyTorch for machine learning users. Usage differs from a Jupyter notebook interface, since everything has to run in the background as a batch job. As a user, you place all of your training/inference/testing/IO code in a Python script, which is then added as a command in the shell-script section of the Slurm job submission file.
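The overall pattern is sketched below: a short batch script loads the conda module, activates the environment, and runs your Python script (here train.py is just a placeholder for your own code). The fully worked version of this pattern, torch.job, is built up step by step in the sections that follow.

#!/bin/bash
#SBATCH --partition=gpu1       # GPU partition (options explained below)
#SBATCH --gres=gpu:1           # request one GPU
module load applications/gpu/python/conda-25.1.1-python-3.9.21
conda activate python-3.9.21   # environment containing TensorFlow and PyTorch
python train.py                # train.py is a placeholder for your own script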
Listing available modules
To view all available modules, we can use the module command:
$ module av
----------------------------------------------------------------- /usr/share/modulefiles -------------------------------
mpi/openmpi-x86_64
----------------------------------------------------------------- /opt/ohpc/pub/modulefiles ------------------------------
applications/gpu/gromacs/2024.4 applications/gpu/python/conda-25.1.1-python-3.9.21 (D)
applications/gpu/python/base-3.9.21 applications/gpu/qespresso/7.3.1
---------------------------------------------------------- /usr/share/lmod/lmod/modulefiles/Core -------------------------
lmod settar
Modules with Tensorflow and PyTorch
The conda module that appears in the list above has both TensorFlow and PyTorch installed:
applications/gpu/python/conda-25.1.1-python-3.9.21
Loading The Python (Conda) Module
We can load the module with the following command:
$ module load applications/gpu/python/conda-25.1.1-python-3.9.21
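To confirm that the module was loaded, you can optionally list the currently loaded modules; the conda module should appear in the output:

$ module list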
Listing Conda Environments
The loaded module gives us access to a custom conda installation, and we can now list the available conda environments:
$ conda env list
base /opt/ohpc/pub/conda/instdir
python-3.9.21 /opt/ohpc/pub/conda/instdir/envs/python-3.9.21
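If you want to check which framework versions the environment provides, you can inspect its package list without activating it (an optional check; the exact package names and versions depend on how the environment was built):

$ conda list -n python-3.9.21 | grep -Ei 'torch|tensorflow'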
We can safely ignore the base environment and make use of the *python-3.9.21* conda environment, which contains the two
machine learning frameworks, TensorFlow and PyTorch.
$ conda activate python-3.9.21
(python-3.9.21)$
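With the environment active, you can optionally verify that both frameworks import and can see a GPU. This check is only meaningful on a GPU node (for example inside a batch or interactive job); on a login node without a GPU it will simply report that no device is available:

(python-3.9.21)$ python -c "import torch; print(torch.cuda.is_available())"
(python-3.9.21)$ python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"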
This activation step is also what we will include in the Slurm submission script.
Let's now create the Python code that will run a simple machine learning exercise with PyTorch. We will use the
MNIST example from the PyTorch examples repository; run these shell commands to create the working directory and retrieve the script:
$ mkdir -p ~/localscratch/mnist # creating a working dir
$ cd ~/localscratch/mnist # changing directory to the working dir
$ wget https://raw.githubusercontent.com/pytorch/examples/refs/heads/main/mnist/main.py
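The upstream MNIST example script accepts a number of command-line options of its own (these flags come from the example script, not from the cluster); for instance, the number of training epochs can be lowered to keep the run short:

$ python main.py --help       # list the options supported by the example script
$ python main.py --epochs 1   # example: train for a single epoch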
Now we can call the Python script from our submission script. Place the following in a plain-text file called torch.job:
#!/bin/bash
#SBATCH -J gputest # Job name
#SBATCH -o job.%j.out # Name of stdout output file (%j expands to jobId)
#SBATCH -e %j.err             # Name of stderr output file
#SBATCH --partition=gpu1 # Queue
#SBATCH --nodes=1 # Total number of nodes requested
#SBATCH --gres=gpu:1 # Total number of gpus requested
#SBATCH --cpus-per-task=1     # Number of CPU cores per task
#SBATCH --time=00:03:00       # Run time (hh:mm:ss) - 3 minutes
cd ~/localscratch/mnist
module load applications/gpu/python/conda-25.1.1-python-3.9.21
conda activate python-3.9.21
python main.py
Finally, we can submit this script to Slurm, which will run the entire process in the background.
$ sbatch torch.job
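After submission, the job can be monitored with standard Slurm tools, and the training output appears in the file named by the #SBATCH -o line above (job.<jobid>.out):

$ squeue -u $USER         # check whether the job is pending or running
$ cat job.<jobid>.out     # view the training output once the job has run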
Watch Demo: https://asciinema.org/a/m8HJLldFQk0SrrpOYOAIQQrYj
Next: Module_system
Up: HPC_Usage