# Generic Slurm Job Script Examples

    This folder contains common job script examples and best practices.

## 1. Asynchronous jobs

The following table lists example scripts for asynchronous jobs. Each of them contains both:

- the allocation requests for your job, e.g. in the form of `#SBATCH` flags in your batch script, and
- the actual task or instructions that your job needs to perform.

You can submit such jobs to the Slurm batch system via `sbatch <parameters> <script-name>` (detailed documentation here). Typically, these jobs are then queued and scheduled by the workload manager as soon as the desired resources are free and it is your turn to compute. (Remember: many users may want to use the same hardware resources, so Slurm has to find a fair compromise.)
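For orientation, here is a minimal sketch of such a batch script. The job name, resource numbers, and executable are placeholders chosen for illustration, not the contents of any of the files listed in the table below.

```bash
#!/usr/bin/env bash
# Minimal sketch of an asynchronous batch job (all names and numbers are placeholders).

#SBATCH --job-name=example_job        # job name shown in the queue
#SBATCH --output=example_job.%j.log   # stdout/stderr file, %j expands to the job ID
#SBATCH --time=00:10:00               # maximum wall-clock time
#SBATCH --nodes=1                     # number of nodes
#SBATCH --ntasks-per-node=1           # number of tasks (processes) per node
#SBATCH --cpus-per-task=4             # CPU cores per task

# The actual work the job performs:
echo "Running on $(hostname)"
srun ./my_program                     # my_program is a placeholder for your executable
```

You would then submit it with `sbatch example_job.sh` and can check its state with `squeue -u $USER`.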

| File/Folder | Description |
| --- | --- |
| `beeond_job.sh` | Job script for setting up and using BeeOND (BeeGFS On Demand) in an HPC environment. |
| `gpu_job_1gpu.sh` | Runs a job with 1 GPU and a single process. |
| `gpu_job_2gpus-1proc.sh` | Runs a job with 2 GPUs and a single process. Useful for tasks that require multi-GPU acceleration but not multi-processing. |
| `gpu_job_2gpus-2procs.sh` | Runs a job with 2 GPUs and 2 separate processes. Commonly used for parallel deep learning training. |
| `gpu_job_4gpus-4procs.sh` | Runs a job with 4 GPUs and 4 separate processes (full node with 4x H100). Commonly used for parallel deep learning training. |
| `gpu_job_8gpus-8procs.sh` | Runs a job with 8 GPUs and 8 separate processes (2 full nodes with 4x H100). Commonly used for parallel deep learning training. |
| `hybrid_mpi_openmp_job.sh` | Hybrid job script combining MPI (distributed computing) with OpenMP (shared-memory parallelism). Ideal for hybrid HPC workloads. |
| `mpi_job_basic.sh` | A basic MPI job script, useful for testing and learning MPI-based job submission. |
| `mpi_job_1node.sh` | Runs an MPI job on a single node, demonstrating intra-node parallel processing with multiple processes per node. |
| `mpi_job_2nodes.sh` | Runs an MPI job spanning 2 full compute nodes, demonstrating inter-node parallelism and distributed computing across multiple machines. |
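As a hedged illustration of how the multi-GPU, multi-process scripts above are typically structured: the sketch below requests 2 GPUs and starts 2 processes on one node. The partition name and CPU counts mirror the interactive GPU example further down in this README; the job name and training script are placeholders, and the actual files may differ.

```bash
#!/usr/bin/env bash
# Hedged sketch in the spirit of gpu_job_2gpus-2procs.sh: 2 GPUs, 2 processes on one node.
# Partition and CPU counts mirror the interactive examples below; your files may differ.

#SBATCH --job-name=gpu_2x2
#SBATCH --time=00:30:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=2           # one process per GPU
#SBATCH --cpus-per-task=24            # CPU cores per process
#SBATCH --gres=gpu:2                  # request 2 GPUs on the node
#SBATCH --partition=c23g              # GPU partition

# srun starts one task per requested task; each task typically works with one GPU
srun python train.py                  # train.py is a placeholder for your training script
```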

## 2. Interactive jobs

Sometimes you are still in the testing/debugging phase or do not yet know exactly what your job script instructions should look like. In such cases, an interactive job might be what you want.

    An interactive job allows users to run commands in real-time on an HPC cluster, making it useful for debugging, testing scripts, or exploring data interactively. Unlike asynchronous batch jobs, which are submitted to a queue and executed without user interaction, interactive jobs provide immediate feedback and enable on-the-fly adjustments. This is especially valuable when developing or fine-tuning workflows before submitting long-running batch jobs.

In such a case, you only define your resource requirements and boundary conditions with `salloc` (detailed documentation here). After the job has been scheduled by Slurm, the system will provide a regular shell for interactive work. Here are a few examples:

**Example: Interactive job on CPU resources for OpenMP (full node)**

```bash
salloc --time=00:15:00 --nodes=1 --ntasks-per-node=1 --cpus-per-task=96 --partition=c23ms
```

**Example: Interactive job on CPU resources for MPI (2 full nodes)**

```bash
salloc --time=00:15:00 --nodes=2 --ntasks-per-node=96 --partition=c23ms
```

**Example: Interactive job on CPU resources for hybrid MPI+OpenMP (2 full nodes)**

```bash
salloc --time=00:15:00 --nodes=2 --ntasks-per-node=4 --cpus-per-task=24 --partition=c23ms
```

**Example: Interactive job on GPU resources (using 1 GPU)**

```bash
salloc --time=00:15:00 --nodes=1 --ntasks-per-node=1 --cpus-per-task=24 --gres=gpu:1 --partition=c23g
```
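Once the allocation is granted, you can work in the provided shell as usual. A small hedged example of a typical interactive session follows; the module and program names are placeholders and depend on your site's software stack.

```bash
# Inside the interactive allocation (names are placeholders):
module load GCC OpenMPI      # load a toolchain; available modules are site-specific
srun ./my_mpi_program        # srun launches the tasks requested via salloc
exit                         # leaving the shell ends the allocation
```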