SLURM

Slurm is an open-source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters.

The HPC team maintains the most comprehensive documentation available for Dalma. We will go through some of the basic commands here.

Submit jobs with – sbatch

Create a file named job.1.sh in your training folder.

#!/bin/bash
# job.1.sh
#SBATCH -p serial
#SBATCH --job-name=job1
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=00:30:00
#SBATCH -o job.%J.out
#SBATCH -e job.%J.err

#Load your modules
module load gencore/1

# Commands
touch example
ls -lah
tar --remove-files -cvf example.tar example
ls -lah
sleep 5

Submit this job with sbatch.

[gencore@login-0-2 ~]$ sbatch job.1.sh
Submitted batch job MYJOBID

Take note of the job id, because we will add this job as a dependency for the next job.
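
Rather than copying the ID by hand, you can pull it out of the confirmation line in a script. The following is a minimal sketch, assuming sbatch prints its usual "Submitted batch job <id>" message; the function name job_id_from_message is hypothetical and only for illustration.

```bash
# Hypothetical helper: print the numeric job ID found in sbatch's
# confirmation message (expected form: "Submitted batch job <id>").
job_id_from_message() {
    # sed prints only the captured digits; nothing is printed if the
    # message does not match the expected form.
    printf '%s\n' "$1" | sed -n 's/^Submitted batch job \([0-9][0-9]*\).*/\1/p'
}

# Usage on a cluster where sbatch is available:
#   message=$(sbatch job.1.sh)
#   jobid=$(job_id_from_message "$message")
```

sbatch also accepts --parsable, which prints only the job ID (optionally followed by a semicolon and the cluster name), so the parsing step can be skipped when that option is used.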

Investigate jobs with – squeue

squeue -u $USER
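
When many jobs are queued, a quick summary by state can be easier to read than the full listing. This is a rough sketch, assuming the default squeue column layout (state in the fifth column) and job names without spaces; summarize_states is a hypothetical helper name.

```bash
# Hypothetical helper: count jobs per state from squeue-style output on stdin.
# Assumes a one-line header and the state (ST) in the fifth column.
summarize_states() {
    awk 'NR > 1 { count[$5]++ } END { for (s in count) print s, count[s] }' | sort
}

# Usage on a cluster where squeue is available:
#   squeue -u "$USER" | summarize_states
```

To list only pending jobs directly, squeue also accepts a state filter, for example squeue -u $USER -t PENDING.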

Submit a job with dependencies

Create a second file named job.2.sh in the same folder.

#!/bin/bash
# job.2.sh
#SBATCH -p serial
#SBATCH --job-name=job2
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=00:30:00
#SBATCH -o job.%J.out
#SBATCH -e job.%J.err

#Load your modules
module load gencore/1

# Commands
touch example
ls -lah
tar -xvf example.tar
ls -lah
sleep 5
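
Submit this job with the dependency option placed before the script name. sbatch passes anything after the script name to the script itself as arguments, so an option placed there is not seen by the scheduler.

sbatch --dependency=afterok:MYJOBID job.2.sh # Substitute in your job ID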
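
The two submissions can also be chained in one step so the job ID never has to be copied by hand. The sketch below assumes sbatch is on your PATH and uses its --parsable option; the function name submit_chain is hypothetical.

```bash
# Hypothetical helper: submit two scripts so that the second starts only
# after the first finishes successfully (afterok).
submit_chain() {
    local first="$1" second="$2" first_id
    # --parsable makes sbatch print just the job ID, optionally as "id;cluster".
    first_id=$(sbatch --parsable "$first") || return 1
    # Drop the ";cluster" suffix if present.
    first_id="${first_id%%;*}"
    # Options go before the script name; anything after it is passed
    # to the script as arguments.
    sbatch --dependency=afterok:"$first_id" "$second"
}

# Usage on a cluster:
#   submit_chain job.1.sh job.2.sh
```

With afterok, the second job runs only if the first exits with code zero; if the first job fails, the second remains pending and typically has to be cancelled manually.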

If you submitted your jobs within ~5 minutes of each other, you should see something like this when you run squeue.

JOBID   PARTITION  NAME  USER     ST  TIME  NODES  NODELIST(REASON)
55125   ser_std    job2  gencore  PD  0:00  1      (Dependency)
55124   ser_std    job1  gencore  PD  0:04  1      compute-7-18