Using the Slurm Job Scheduler

 


Introduction
What is Slurm?
Terminology
Key Functions of Slurm
Partitions on BioHPC (Nucleus)
Slurm Job States

How-To Guides
Basic Slurm Commands
Slurm Job Script Templates

FAQs
Using a Conda environment in an sbatch script
Using NAMD in an sbatch script
Using Gromacs in an sbatch script
Maximum node allocation for a single user
Extending the runtime of a job

Troubleshooting

Downloads

Further Reading

 


Introduction

 

What is Slurm?

Slurm, formerly known as SLURM (Simple Linux Utility for Resource Management), is a powerful computational workload scheduler used on many of the world's largest supercomputers. Its main function is to allocate computing resources to workloads submitted to a compute cluster, like BioHPC's Nucleus. When workloads outnumber the available resources, Slurm manages a fair-share resource allocation system, placing workloads in queues until they can be executed.

Think of it like...

  • A Laboratory Resource Manager
    • Similar to how a core facility in a biology lab oversees shared equipment like microscopes and sequencers, Slurm manages shared computer resources within a high-performance computing (HPC) cluster.
  • A Traffic Control System
    • Slurm functions like a traffic control system, effectively managing the flow of computational tasks (vehicles) on the HPC cluster (road network) to ensure optimal resource utilization and timely completion of research projects.

For more detailed information about Slurm and its latest documentation, please refer to the official Slurm website.

 

Terminology

Job - A unit of work submitted to Slurm, consisting of one or more tasks.

Task - A single executable program within a job.

Partition - A logical division of the cluster with its own resource limits and scheduling policies. Also known as a job queue.

Reservation - A block of computing resources pre-allocated for specific jobs or users at a specified time.

Node - A single computer within a cluster, containing a motherboard, CPU, RAM, and possibly a GPU.

Compute Node - Nodes within the Slurm cluster that run the jobs. Each compute node contains two sockets.

Socket - A physical slot on a node where a physical CPU is installed. Each physical CPU contains cores with direct access to the node's RAM.

Core - A single physical processing unit within the CPU, capable of performing computations. Each physical core comprises two logical cores, allowing it to process two threads concurrently.

Thread - A sequence of computer instructions that can be processed independently by a single logical core.
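These terms map directly onto the resource options used in job scripts. As a rough illustration (a minimal sketch; the partition name and counts below are placeholders, not recommendations), the relationship looks like this:

#!/bin/bash
#SBATCH --partition=super        # partition (job queue) to submit the job to
#SBATCH --nodes=1                # number of nodes (individual computers) requested
#SBATCH --ntasks=4               # number of tasks (independent programs) in the job
#SBATCH --cpus-per-task=2        # logical cores (threads) reserved for each task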

 

Key Functions of Slurm

Job Handling:

  • Provides tools for users to submit jobs with specified resource requirements.
  • Evaluates, prioritizes, schedules, and executes jobs.
  • Allows users to monitor and manage their jobs.

Resource Management:

  • Allocates cluster resources (CPUs, memory, etc.) to jobs.
  • Manages partitions.
  • Tracks resource usage for accounting and reporting.

Typical Slurm Workflow (a command-level example follows the list):

  1. User Actions:
    • Log into Nucleus.
    • Create a job script defining resource needs and tasks.
    • Submit the job to Slurm.
    • Monitor and/or interact with the job.
  2. Slurm Actions:
    • Receive the user's job.
    • Match job requirements to available resources.
    • Schedule the job for execution.
    • Send the job to compute nodes.
    • Track the job and its resource usage information.
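
In practice, the user side of this workflow maps onto a handful of shell commands. A minimal session might look like the following sketch (myjob.sh and <jobid> are placeholders):

nano myjob.sh               # create a job script defining resource needs and tasks
sbatch myjob.sh             # submit the job; Slurm replies with a job ID
squeue -u $USER             # monitor your pending and running jobs
scontrol show job <jobid>   # inspect scheduling details for a specific job
scancel <jobid>             # cancel the job if it is no longer needed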

 

Partitions on BioHPC (Nucleus)

Slurm partitions are separate collections of nodes. As of 2018, BioHPC had a total of 500 compute nodes, 494 of which were addressable by Slurm. These nodes are grouped into the partitions listed below according to their hardware, memory capacity, and GPU resources, with some partitions overlapping nodes with others:

Partition | Nodes | Node List                            | CPU             | Physical (Logical) Cores | Memory Capacity (GB) | GPU
32GB      | 280   | NucleusA[002-241], NucleusB[002-041] | Intel E5-2680   | 16 (32) | 32      | N/A
128GB     | 24    | Nucleus[010-033]                     | Intel E5-2670   | 16 (32) | 128     | N/A
256GB     | 78    | Nucleus[034-041, 050-081, 084-121]   | Intel E5-2680v3 | 24 (48) | 256     | N/A
256GBv1   | 48    | Nucleus[126-157, 174-189]            | Intel E5-2680v4 | 28 (56) | 256     | N/A
384GB     | 2     | Nucleus[082-083]                     | Intel E5-2670   | 16 (32) | 384     | N/A
GPU       | 40    | Nucleus[042-049], NucleusC[002-033]  | various         | various | various | Tesla K20/K40/P4/P40
GPUp4     | 16    | NucleusC[002-017]                    | Intel Gold 6140 | 36 (72) | 384     | Tesla P4
GPUp40    | 16    | NucleusC[018-033]                    | Intel Gold 6140 | 36 (72) | 384     | Tesla P40
GPUp100   | 12    | Nucleus[162-173]                     | Intel E5-2680v4 | 28 (56) | 256     | Tesla P100 (2x)
GPUv100   | 2     | NucleusC[034-035]                    | Intel Gold 6140 | 36 (72) | 384     | Tesla V100 16GB (2x)
GPUv100s  | 10    | NucleusC[036-045]                    | Intel Gold 6140 | 36 (72) | 384     | Tesla V100 32GB (1x)
GPU4v100  | 12    | NucleusC[070-081]                    | Intel Gold 6240 | 36 (72) | 376     | Tesla V100 32GB (4x)
GPUA100   | 16    | NucleusC[086-101]                    | Intel Gold 6240 | 36 (72) | 1423    | Tesla A100 40GB (1x)
GPU4A100  | 10    | NucleusC[102-111]                    | Intel Gold 6354 | 36 (72) | 977     | Tesla A100 80GB (4x)
PHG       | 8     | Nucleus[122-125, 158-161]            | Intel E5-2680v3 | 24 (48) | 256     | N/A
super     | 432   | All non-GPU and non-PHG nodes        | various         | various | various | N/A

 

If a partition is not explicitly specified upon job submission, Slurm will allocate your job to the 128GB partition by default. The PHG partition is only available for the PHG group.
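
To target a specific partition from the table above, name it in the job script or on the command line, for example (a sketch; substitute your own partition and script name):

#SBATCH --partition=256GB                                # inside a job script: request nodes from the 256GB partition

sbatch --partition=256GB myjob.sh                        # or override the partition at submission time
srun --partition=GPU --gres=gpu:1 -n 1 --pty /bin/bash   # interactive session on a GPU partition node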

 

Figure 1: High-level view of a sample node's CPU resources

 

Slurm Partition Example

 

Figure 1 shows a high-level view of the computational resources available on a node similar to one of Nucleus' 128GB partition nodes.

 

We see that the node contains:

  • Two sockets, each containing an eight-core CPU, and
  • Sixteen physical cores in total, each capable of running two threads concurrently.

 

So the maximum number of threads the node can process concurrently is:

  • 2 sockets * 8 physical cores * 2 threads = 32 threads
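
This layout can be checked directly from Slurm for any node or partition (a sketch; the node name below is just an example, and output is omitted):

sinfo -N -p 128GB -o "%N %X %Y %Z"     # node name, sockets, cores per socket, threads per core
scontrol show node Nucleus015 | grep -E "Sockets|CoresPerSocket|ThreadsPerCore"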

 

 


Slurm Job States

Figure 2: Common transitions between a few typical Slurm job states (Blue)

When a job is submitted to Slurm, it is assigned a job state describing its current status. This state provides insight into the job's progress, resource usage, and any issues that may require intervention.

Figure 2 outlines several common job states that a user will encounter, connected by typical transitions between them.

Below are descriptions of each of these job states, along with their shorthand codes in parentheses, written as STATE (ST):

PENDING (PD): The job is waiting in the queue to be scheduled and run. It is waiting for resources to become available or for other conditions to be met.

CONFIGURING (CF): The job has been allocated nodes and is in the process of setting up before execution begins.

RUNNING (R): The job is currently executing on the allocated nodes.

RESIZING (RS): The job is about to change size by increasing or decreasing the number of nodes, CPUs, or memory it is using. If the job requests additional resources that are not currently available, it may be returned to the PENDING state.

SUSPENDED (S): The job has been temporarily stopped and is not currently executing, but it retains its allocation of nodes and can be resumed later.

COMPLETING (CG): The job is in the process of completing. Some processes may still be running, but Slurm is in the process of cleaning up and finalizing the job.

COMPLETED (CD): The job has completed successfully, and all processes have exited with an exit code of zero.

CANCELLED (CA): The job was cancelled by the user or system administrator before it could complete.

FAILED (F): The job terminated with a non-zero exit code, indicating that it did not complete successfully.

TIMEOUT (TO): The job was terminated because it exceeded the time limit specified for the job or partition.
 

For more details on these listed job states and others, visit Slurm's documentation.
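
To check the state of a specific job yourself, squeue and scontrol can report it directly (a sketch; replace <jobid> with your job's numeric ID):

squeue -j <jobid> -o "%.10i %.9P %.12T %.20r"   # job ID, partition, full state name, and pending reason
scontrol show job <jobid> | grep JobState       # e.g. "JobState=RUNNING Reason=None ..."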

 


 

How-To Guides

 

Basic Slurm Commands

For more information on each command shown below, type man <command> when logged into Nucleus.


sinfo: Display information about Slurm nodes and partitions.

sinfo Example
[s219741@Nucleus005 ~]$ sinfo
    PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
    super        up   infinite      4 drain* Nucleus[013-014,051],NucleusA045
    super        up   infinite     15   drng NucleusA[004-008,011-012,017,019-025]
    super        up   infinite     25  drain NucleusA[003,009-010,013-016,018,026-041,046]
    super        up   infinite    271  alloc Nucleus[010-012,016-017,024-041,050,052-053,055-075,077-082,099-112,114-115,119-148,158-161,174-177,184-194,196],NucleusA[043-044,047-092,132-197,209,236],NucleusB[006-041]
    super        up   infinite    101   idle Nucleus[015,018-023,054,076,083-098,113,116-118,149-157,178-183,195],NucleusA[042,093-131],NucleusB[042-057]
    256GB        up   infinite      1 drain* Nucleus051
    256GB        up   infinite     57  alloc Nucleus[034-041,050,052-053,055-071,073-075,077,080-081,099-110,114-115,120,122-125,158-161]
    256GB        up   infinite     21   idle Nucleus[054,076,084-098,113,116-118]
    384GB        up   infinite      1  alloc Nucleus082
    384GB        up   infinite     17   idle Nucleus083,NucleusB[042-057]
    128GB*       up   infinite      2 drain* Nucleus[013-014]
    128GB*       up   infinite     15  alloc Nucleus[010-012,016-017,024-033]
    128GB*       up   infinite      7   idle Nucleus[015,018-023]
    256GBv1      up   infinite     39  alloc Nucleus[126-148,174-177,184-194,196]
    256GBv1      up   infinite     16   idle Nucleus[149-157,178-183,195]
    GPU          up   infinite     50  alloc Nucleus[042-043,045],NucleusC[002-003,005-008,016-017,019,022-026,028-031,036-046,048-057,059-066]
    GPU          up   infinite     21   idle Nucleus[044,046-049],NucleusC[009-015,018,020-021,027,032-033,067-069]
    32GB         up   infinite      1 drain* NucleusA045
    32GB         up   infinite     15   drng NucleusA[004-008,011-012,017,019-025]
    32GB         up   infinite     25  drain NucleusA[003,009-010,013-016,018,026-041,046]
    32GB         up   infinite    157  alloc Nucleus[072,078-079,111-112,119,121],NucleusA[043-044,047-092,132-197],NucleusB[006-041]
    32GB         up   infinite     40   idle NucleusA[042,093-131]
    GPUp4        up   infinite      8  alloc NucleusC[002-003,005-008,016-017]
    GPUp4        up   infinite      7   idle NucleusC[009-015]
    GPUp40       up   infinite     10  alloc NucleusC[019,022-026,028-031]
    GPUp40       up   infinite      6   idle NucleusC[018,020-021,027,032-033]
    GPUp100      up   infinite      1 drain* Nucleus166
    GPUp100      up   infinite      2  drain Nucleus[171-172]
    GPUp100      up   infinite      5  alloc Nucleus[162,167-170]
    GPUp100      up   infinite      3   idle Nucleus[163-165]
    GPUv100s     up   infinite     30  alloc NucleusC[036-057,059-066]
    GPUv100s     up   infinite      3   idle NucleusC[067-069]
    GPU4v100     up   infinite     10  alloc NucleusC[070-075,077-080]
    GPU4v100     up   infinite      1   idle NucleusC076
    GPUA100      up   infinite      1  drain NucleusC096
    GPUA100      up   infinite     15  alloc NucleusC[086-095,097-101]
    GPU4A100     up   infinite     10  alloc NucleusC[102-111]
    512GB        up   infinite      1  down* NucleusB079
    512GB        up   infinite      1  drain NucleusB107
    512GB        up   infinite     80  alloc NucleusA[198-241],NucleusB[002-005,058-064,075-078,080-083,086-087,092-106]
    512GB        up   infinite     16   idle NucleusB[065-074,084-085,088-091]
    GPU4H100     up   infinite      1  down* NucleusC122
    GPU4H100     up   infinite      3  drain NucleusC[118-120]
    GPU4H100     up   infinite      1  alloc NucleusC121

sbatch: Submits a batch job script to Slurm.

sbatch Example
[s219741@Nucleus005 ~]$ sbatch testjob.sh
Submitted batch job 4298643

srun: Executes a program or command in a job allocation.

srun Example
[s219741@Nucleus005 ~]$ srun -n 1 -p 256GB hostname
Nucleus075
[s219741@Nucleus005 ~]$ srun -n 1 -p 256GB python hello.py
Hello World, from Nucleus075
srun Example

Start an interactive job on allocated resources.
Helpful for manually running and testing programs on BioHPC compute nodes.

  • --pty: This key flag requests a pseudo-terminal (PTY) allocation. This is essential for interactive sessions, as it allows proper interaction with programs that expect a terminal-like environment.
  • /bin/bash: Specifies that you want to launch a Bash shell, providing you with a command-line interface on the allocated compute node.
[s219741@Nucleus005 ~]$ srun -n 1 -p 128GB --pty /bin/bash
[modulestats] Wrapper already loaded
[s219741@Nucleus030 ~]$ module load python
[s219741@Nucleus030 ~]$ python hello.py
Hello, world!

 


squeue: Report the job status for all the user jobs currently being managed by Slurm.

Below is an explanation of each column in the table that squeue returns:

  • JOBID - Unique numerical identifier for the job.
  • PARTITION - The partition the job belongs to.
  • NAME - Job name (from the Slurm script).
  • USER - Username of the job owner.
  • ST - Job State (e.g., R: Running, PD: Pending, CG: Completing; see Slurm Job States)
  • TIME - Elapsed runtime of the job.
  • NODES - Number of nodes allocated to the job.
  • NODELIST (REASON) - List of nodes allocated, and the reason if pending.
squeue Example
[s219741@Nucleus005 ~]$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           4174562      32GB    TEM.1  s177440  R 100-05:01:24      4 Nucleus[079,111,119],NucleusB041
           4175483      32GB    TEM.2  s177440  R 99-03:36:42      4 Nucleus112,NucleusB[038-040]
           4178109      32GB Alpha_99  s224996  R 97-02:03:47      7 NucleusA[192-195],NucleusB[027,029-030]
           4178121   256GBv1 Alph_Ful  s224996  R 97-01:54:13      2 Nucleus[184-185]
           4185089  GPUv100s     bash  s185484  R 91-21:36:50      1 NucleusC051
           4209196      32GB AlPha_99  s224996  R 72-00:17:48      7 NucleusA[132-138]
           4215457     128GB     PDAC  s202223  R 69-03:18:36      1 Nucleus028
           4235967      32GB    TEM.3  s177440  R 52-03:09:46      4 NucleusB[034-037]
           4236138   256GBv1 AlPh_Str  s224996  R 51-21:45:05      2 Nucleus[177,194]
           4242976      32GB  R164S.1  s177440  R 49-02:45:18      4 NucleusB[010-011,025-026]
           4243043      32GB  R164S.2  s177440  R 49-01:08:47      4 NucleusB[006-009]
           4243044      32GB  R164S.3  s177440  R 49-01:07:55      4 NucleusA[173-176]
           4243045      32GB  R164N.1  s177440  R 49-01:07:00      4 NucleusA[163-164,171-172]
           4243047      32GB  R164N.2  s177440  R 49-01:06:06      4 NucleusA[159-162]
           4249997      32GB  R164N.3  s177440  R 41-01:33:00      4 NucleusA[154-156,158]
           4251687     512GB BATCH_DH ansir_fm  R 36-23:55:53      1 NucleusB063
           4252072     512GB BATCH_TR ansir_fm  R 36-06:09:03      1 NucleusA205
squeue Example

-u <username> - Request jobs or job steps from a comma-separated list of one or more users.
Helpful for users to check the status of their submitted jobs on Nucleus.

[s219741@Nucleus005 ~]$ squeue -u s219741
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           4298707      32GB  TestJob  s219741  R       0:10      1 NucleusA049
           4298708      32GB  TestJob  s219741  R       0:07      1 NucleusA140
           4298709      32GB  TestJob  s219741  R       0:05      1 NucleusA141

scontrol: View/update system, job, step, partition, or reservation status.

scontrol Example
[s219741@Nucleus005 ~]$ scontrol show job 4298714
JobId=4298714 JobName=TestJob
   UserId=s219741(219741) GroupId=biohpc_admin(1001) MCS_label=N/A
   Priority=4294791033 Nice=0 Account=(null) QOS=normal
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:00:13 TimeLimit=02:00:00 TimeMin=N/A
   SubmitTime=2024-02-29T15:02:48 EligibleTime=2024-02-29T15:02:48
   StartTime=2024-02-29T15:02:49 EndTime=2024-02-29T17:02:49 Deadline=N/A
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=32GB AllocNode:Sid=Nucleus005:46165
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=NucleusA186
   BatchHost=NucleusA186
   NumNodes=1 NumCPUs=32 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=32,mem=28G,node=1
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=28G MinTmpDiskNode=0
   Features=(null) Gres=(null) Reservation=(null)
   OverSubscribe=NO Contiguous=0 Licenses=(null) Network=(null)
   Command=/home2/s219741/testjob.sh
   WorkDir=/home2/s219741
   StdErr=/home2/s219741/job_4298714.err
   StdIn=/dev/null
   StdOut=/home2/s219741/job_4298714.out
   Power=

scancel: Control jobs, job arrays, or job steps by sending signals or canceling them.

Useful for cancelling jobs that are unresponsive or that need to be stopped manually.

scancel Example

[s219741@Nucleus004 ~]$ squeue -u 219741
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           4327688      32GB   webGUI  s219741  R   12:53:22      1 NucleusA147
           4327759      32GB   webGUI  s219741  R    5:32:55      1 NucleusA082
           4327931     128GB nf-callP  s219741  R    2:58:57      1 Nucleus013
           4327937     128GB nf-callP  s219741  R    2:56:59      1 Nucleus017
           4327954      32GB   webGUI  s219741  R    2:32:11      1 NucleusB021
           4328288      32GB  TestJob  s219741  R       0:08      1 NucleusA167
[s219741@Nucleus004 ~]$ scancel 4328288
[s219741@Nucleus004 ~]$ squeue -u s219741
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           4327688      32GB   webGUI  s219741  R   12:54:10      1 NucleusA147
           4327759      32GB   webGUI  s219741  R    5:33:43      1 NucleusA082
           4327931     128GB nf-callP  s219741  R    2:59:45      1 Nucleus013
           4327937     128GB nf-callP  s219741  R    2:57:47      1 Nucleus017
           4327954      32GB   webGUI  s219741  R    2:32:59      1 NucleusB021

 


sacct: Report accounting information by individual job and job step

sacct Example
[ydu@biohpcws009 ~]$ sacct -j 28066
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
28066        ReadCount+      super                    32  COMPLETED      0:0
28066.batch       batch                                1  COMPLETED      0:0
28066.0      count_rea+                                2  COMPLETED      0:0 
sacct Example

--start=<YYYY-MM-DD> - Specify a start date to begin reporting the job and job step accounting information on or after the given date.

--end=<YYYY-MM-DD> - Further refine the results by adding an --end parameter to specify an end date.

[s219741@Nucleus030 ~]$ sacct --start=2024-01-01 --end=2024-02-01
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
4223127          webGUI       32GB                    32    TIMEOUT      1:0
4223127.bat+      batch                               32  COMPLETED      0:0
4224383      nf-trackS+      super                    56  COMPLETED      0:0
4224383.bat+      batch                               56  COMPLETED      0:0
4224384      nf-trimRe+      super                    32  COMPLETED      0:0
4224384.bat+      batch                               32  COMPLETED      0:0

 

Slurm Job Script Templates

 

Basic Serial Job

#!/bin/bash
#SBATCH --job-name=serialJob                              # job name
#SBATCH --partition=super                                 # select partition from 128GB, 256GB, 384GB, GPU and super
#SBATCH --nodes=1                                         # number of nodes requested by user
#SBATCH --time=0-00:00:30                                 # run time, format: D-H:M:S (max wallclock time)
#SBATCH --output=serialJob.%j.out                         # standard output file name
#SBATCH --error=serialJob.%j.err                          # standard error output file name
#SBATCH --mail-user=username@utsouthwestern.edu           # specify an email address
#SBATCH --mail-type=ALL                                   # send email on job status changes (start, end, abort, etc.)

module add matlab/2014b                                   # load software package

matlab -nodisplay -nodesktop -nosplash < script.m         # execute program

 

Multi-task Job

  #!/bin/bash
  #SBATCH --job-name=multiTaskJob                                     # job name
  #SBATCH --partition=super                                           # select partition from 128GB, 256GB, 384GB, GPU and super
  #SBATCH --nodes=2                                                   # number of nodes requested by user
  #SBATCH --ntasks=64                                                 # number of total tasks
  #SBATCH --time=0-00:00:30                                           # run time, format: D-H:M:S (max wallclock time)
  #SBATCH --output=multiTaskJob.%j.out                                # standard output file name
  #SBATCH --error=multiTaskJob.%j.err                                 # standard error output file name
  #SBATCH --mail-user=username@utsouthwestern.edu                     # specify an email address
  #SBATCH --mail-type=ALL                                             # send email on job status changes (start, end, abort, etc.)

  module add matlab/2014b                                             # load software package

  let "ID=$SLURM_NODEID*$SLURM_NTASKS/$SLURM_NNODES+SLURM_LOCALID+1"  # distribute tasks to 2 nodes based on their ID

  srun matlab -nodisplay -nodesktop -nosplash < script.m $ID          # execute program

 

Multi-threading Job

  #!/bin/bash
  #SBATCH --job-name=multiThreading                                   # job name
  #SBATCH --partition=super                                           # select partition from 128GB, 256GB, 384GB, GPU and super
  #SBATCH --nodes=1                                                   # number of nodes requested by user
  #SBATCH --ntasks=30                                                 # number of total tasks
  #SBATCH --time=0-10:00:00                                           # run time, format: D-H:M:S (max wallclock time)
  #SBATCH --output=multiThreading.%j.out                              # redirect both standard output and error output to the same file
  #SBATCH --mail-user=username@utsouthwestern.edu                     # specify an email address
  #SBATCH --mail-type=ALL                                             # send email on job status changes (start, end, abort, etc.)

  module add phenix/1.9                                               # load software package

  phenix.den_refine model.pdb data.mtz nproc=30                       # execute program with 30 CPUs 

 

Multi-core Job (MPI)

  #!/bin/bash
  #SBATCH --job-name=MPI                                              # job name
  #SBATCH --partition=super                                           # select partition from 128GB, 256GB, 384GB, GPU and super
  #SBATCH --nodes=2                                                   # number of nodes requested by user
  #SBATCH --ntasks=64                                                 # number of total tasks
  #SBATCH --time=0-00:00:10                                           # run time, format: D-H:M:S (max wallclock time)
  #SBATCH --output=MPI.%j.out                                         # redirect both standard output and error output to the same file
  #SBATCH --mail-user=username@utsouthwestern.edu                     # specify an email address
  #SBATCH --mail-type=ALL                                             # send email on job status changes (start, end, abort, etc.)

  module add mvapich2/gcc/1.9                                         # load MPI library

  mpirun ./MPI_only                                                   # execute 64 MPI tasks across 2 nodes 

 

Hybrid multi-core/multi-threading Job (MPI with pthread)

  #!/bin/bash
  #SBATCH --job-name=MPI_pthread                                       # job name
  #SBATCH --partition=super                                            # select partition from 128GB, 256GB, 384GB, GPU and super
  #SBATCH --nodes=4                                                    # number of nodes requested by user
  #SBATCH --ntasks=8                                                   # number of total MPI tasks
  #SBATCH --time=0-00:00:10                                            # run time, format: D-H:M:S (max wallclock time)
  #SBATCH --output=MPI_pthread.%j.out                                  # redirect both standard output and error output to the same file
  #SBATCH --mail-user=username@utsouthwestern.edu                      # specify an email address
  #SBATCH --mail-type=ALL                                              # send email on job status changes (start, end, abort, etc.)

  module add mvapich2/gcc/1.9                                          # load MPI library

  let "NUM_THREADS=$SLURM_CPUS_ON_NODE/($SLURM_NTASKS/$SLURM_NNODES)"  # calculate the number of threads per MPI task

  mpirun ./MPI_pthread $NUM_THREADS                                    # 8 MPI tasks across 4 nodes, each task running 16 threads

 

GPU Job

  #!/bin/bash
  #SBATCH --job-name=cuda-test                             # job name
  #SBATCH --partition=GPU                                  # select the GPU partition
  #SBATCH --nodes=1                                        # number of nodes requested by user
  #SBATCH --gres=gpu:1                                     # use generic resource GPU, format: --gres=gpu:[n], n is the number of GPU cards
  #SBATCH --time=0-00:00:10                                # run time, format: D-H:M:S (max wallclock time)
  #SBATCH --output=cuda.%j.out                             # redirect both standard output and error output to the same file
  #SBATCH --mail-user=username@utsouthwestern.edu          # specify an email address
  #SBATCH --mail-type=ALL                                  # send email on job status changes (start, end, abort, etc.)

  module add cuda65                                        # load cuda library

  ./matrixMul                                              # execute GPU program 

 

Several full Slurm job examples can be downloaded here.

 


 

FAQs

Q: How do I activate a Conda environment in a Slurm sbatch script to run a Python job?

A:

#!/bin/bash
#SBATCH --job-name=my_conda_job  # Descriptive job name
#SBATCH --partition=super 
#SBATCH --nodes=1 
#SBATCH --time=0-00:30   
#SBATCH --output=my_conda_job.%j.out  
#SBATCH --error=my_conda_job.%j.err
#SBATCH --mail-user=username@utsouthwestern.edu  
#SBATCH --mail-type=ALL

# Load Conda module (if necessary)
module load python/3.12.x-anaconda-mamba

# Activate Conda environment
conda activate my_environment
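# Note: if "conda activate" is not recognized inside a batch job, initializing the shell for
# conda first, or using the older "source activate my_environment" form, may be needed,
# depending on how the loaded Python module sets up the environment.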

# Execute Python script 
python my_python_script.py

Q: How do I run NAMD in a Slurm sbatch script?

A:

#!/bin/bash
#SBATCH --job-name=my_namd_job  # Descriptive job name
#SBATCH --partition=super 
#SBATCH --nodes=1 
#SBATCH --time=0-00:30   
#SBATCH --output=my_namd_job.%j.out  
#SBATCH --error=my_namd_job.%j.err
#SBATCH --mail-user=username@utsouthwestern.edu  
#SBATCH --mail-type=ALL

module load namd/gpu/2.14

charmrun $(which namd2) ++local ++auto-provision +ignoresharing equilibration.conf > equil.log
charmrun $(which namd2) ++local ++auto-provision +ignoresharing md.conf >md.log

Q: How do I run a Gromacs program in a Slurm sbatch script?

A:

#!/bin/bash
#SBATCH --job-name=my_gromacs_job  # Descriptive job name
#SBATCH --partition=super
#SBATCH --nodes=1
#SBATCH --time=0-00:30   
#SBATCH --output=my_gromacs_job.%j.out  
#SBATCH --error=my_gromacs_job.%j.err
#SBATCH --mail-user=username@utsouthwestern.edu  
#SBATCH --mail-type=ALL

# Load Gromacs Module
module load gromacs/5.1.4

# Execute Gromacs commands
gmx grompp -f input.mdp -c conf.gro -p topol.top -o md_out.tpr
gmx mdrun -deffnm md_out 

Q: What is the maximum number of nodes I can allocate at one time?

A:

A single user can allocate the following maximum amount of computing resources across all submitted jobs:

  • 4 GPU Nodes, and
  • 16 Heavy Nodes (>128GB each) or 64 Light Nodes (128GB each).
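
To see how many nodes your jobs currently hold against these limits, one quick check is the following (a sketch built from standard squeue output fields):

squeue -u $USER -t RUNNING -h -o "%D" | awk '{n+=$1} END {print n, "nodes currently allocated"}'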

Q: How can I extend my job's time limit?

A:

To request a job extension, please email the BioHPC team at biohpc-help@utsouthwestern.edu


 

 

Troubleshooting

My Slurm job has failed. What can I do to fix it?

1. Gather Information

  • Job Status: Use squeue -u your_username to check if your job is running, pending, failed, etc.
  • Error Messages: Scrutinize these closely:
    • SBATCH output and error files: (specified by #SBATCH -o and #SBATCH -e)
    • Your application's log files: If it generates any.

2. Common Issue Areas

  • Incorrect Configuration:
    • Partitions: Verify you're using the correct partition for the resources you need.
    • Resource Requests: Double-check that your requested time, memory, nodes, and GPUs align with your job's needs.
    • SBATCH Directives: Ensure there are no typos or incorrect directives in your script.
  • Environment Issues:
    • Module Loads: Make sure you're loading necessary modules correctly.
    • Conda Activation: Check that the path to your Conda installation and environment are accurate.
    • File Paths: Verify that your script and input data can be found at the specified paths.
  • Application Problems
    • Compatibility: Is your software version compatible with the cluster's setup?
    • Bugs in Your Code: Debug your application outside of Slurm to isolate issues.
    • Input Data: Ensure your input data is formatted correctly and accessible.

3. Debugging Steps

  • Simplify: If possible, break your job into smaller test cases to isolate the problem.
  • Log Everything: Add logging to your scripts or application to track execution.
  • Interactive Session: Request an interactive session on a similar node to replicate the environment and test commands manually (see the example after this list).
  • Seek Guidance: Provide your script, error messages, and any relevant logs when asking for help via the BioHPC Help Desk email: biohpc-help@utsouthwestern.edu
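
For the interactive-session step above, a session matching the failing job's partition can be started as follows (a sketch; the partition, module, and script names are placeholders to adapt):

srun --partition=super --nodes=1 --pty /bin/bash   # open a shell on a compute node
module load python                                 # load the same modules your job uses
python my_python_script.py                         # run the failing step by hand to see the error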

Additional Tips

  • Start Early: Submit test jobs with short runtimes to catch configuration issues quickly.
  • Test outside Slurm: Verify your code works independently before submitting a full-scale job.

 

Downloads

Slurm Job Examples


 

Further Reading

Slurm Documentation
Slurm Cheatsheet:
https://slurm.schedmd.com/pdfs/summary.pdf
Official Slurm documentation: https://slurm.schedmd.com/documentation.html
Slurm Quick Start User Guide: https://slurm.schedmd.com/quickstart.html

Tutorials and Guides
Slurm Tutorial from SchedMD (Slurm creators): https://slurm.schedmd.com/tutorials.html
Slurm Tutorial from Livermore Computing (LLNL): https://hpc.llnl.gov/banks-jobs/running-jobs/slurm
Slurm User Guide from Rochester Institute of Technology: https://research-computing.git-pages.rit.edu/docs/slurm_quick_start_tutorial.html

Online Courses and Videos
Slurm Job Management - Center for Advanced Research Computing at the University of Southern California: https://www.youtube.com/watch?v=GD5Ov75lQoM
Slurm Video Tutorials from SchedMD:
https://slurm.schedmd.com/tutorials.html

Community Resources
Submit a Ticket for Slurm on BioHPC to: biohpc-help@utsouthwestern.edu
Slurm Knowledge Base: https://slurm.schedmd.com/kb.html
Slurm User Community: https://slurm.schedmd.com/community.html