SLURM workload manager
Overview
Teaching: 10 min
Exercises: 15 min
Questions
How can I submit a job to the cluster?
Objectives
Run a Unix command on HiPerGator
Unlike a personal computer, a computing cluster is used by a large number of users to run resource-intensive jobs. The sum of the resources requested by those jobs often exceeds the resources available in the cluster. To deal with this, many high-performance clusters use schedulers to manage the resources allocated to those jobs.
UF HiPerGator uses a popular scheduler called the SLURM workload manager. Portable Batch System (PBS), MOAB, and Workload Scheduler are examples of other schedulers.
SLURM has three major functions:
- Allocate exclusive and/or non-exclusive access to resources to users for a duration of time so they can perform work,
- Provide a framework for starting, executing, and monitoring work on allocated nodes,
- Arbitrate contention for resources by managing a queue of pending work.
Advantages of using the SLURM scheduler:
- Once resources are allocated to a job, they are not taken away until the job exits. If someone else runs an intensive job at the same time, the resources available to your job do not change. This improves the speed and reliability of job completion.
- It allows low-priority, resource-intensive jobs to run outside peak hours.
- Unlike an interactive session, SLURM jobs do not stop when the user logs out.
SLURM scripts
To submit a job, the user must specify the account, time limit, memory, job output, and other information. A common way to do this is to create a submission script. A submission script, like other bash scripts, contains all the commands to be run during the job. In addition, it contains special comments that SLURM reads to determine the resources requested.
To get started with SLURM, let's create a new directory in your work directory and copy a submission script template from the share/scripts directory.
$ cd /blue/general_workshop/<username>
$ mkdir slurm
$ cd slurm
$ cp ../../share/scripts/slurm_template.sh ./slurm.sh
Note: .sh is a commonly used extension for shell scripts. Using an extension is not mandatory.
Adding information to SLURM script
We have to modify some information in the template to give SLURM more information about the job.
We can use a small text editor program called nano to edit the file. The command below opens slurm.sh in nano.
$ nano slurm.sh
-----------------------------------------------------------------------------------------------
GNU nano 3.3 beta 02 File: slurm.sh
-----------------------------------------------------------------------------------------------
#!/bin/bash
#
#SBATCH --job-name serial_job_test # Job name
#SBATCH --account general_workshop # Account to run the computational task
#SBATCH --qos general_workshop # Account allocation
#SBATCH --mail-type END,FAIL # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user <email_address> # Where to send mail
#SBATCH --ntasks 1 # Run on a single CPU
#SBATCH --mem 10mb # Job memory request
#SBATCH --time 00:05:00 # Time limit hrs:min:sec
#SBATCH --output serial_test_%j.log # Standard output and error log
# Print the current date and time every 10 seconds, 6 times
for i in {0..5}; do printf '%s\n' "$(date)"; sleep 10s; done
-----------------------------------------------------------------------------------------------
^G Get Help ^O WriteOut ^R Read File ^Y Prev Page ^K Cut Text ^C Cur Pos
^X Exit ^J Justify ^W Where Is ^V Next Page ^U UnCut Text ^T To Spell
-----------------------------------------------------------------------------------------------
The comments beginning with #SBATCH give SLURM information about the job.
The actual commands to run appear after these comments. In this case, the script prints the current date and time at 10-second intervals.
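Because every #SBATCH line starts with #, bash treats it as an ordinary comment, while SLURM reads it as a resource request. The following is a minimal sketch of that split. It writes a shortened, illustrative job script to a temporary file (not the workshop template itself), lists the directives SLURM would read, and then runs the file with plain bash to show that only the command is executed.

```bash
#!/bin/bash
# Write a shortened, illustrative job script to a temporary file.
demo_script="$(mktemp)"
cat > "$demo_script" <<'EOF'
#!/bin/bash
#SBATCH --job-name serial_job_test
#SBATCH --ntasks 1
#SBATCH --mem 10mb
#SBATCH --time 00:05:00
date
EOF

# Lines starting with #SBATCH are the requests SLURM reads.
grep '^#SBATCH' "$demo_script"
directive_count="$(grep -c '^#SBATCH' "$demo_script")"

# Plain bash skips those lines as comments and runs only the command (date).
bash_output="$(bash "$demo_script")"
echo "$bash_output"

# Clean up the temporary file.
rm -f "$demo_script"
```

Running the sketch on any machine with bash prints the four directives followed by the current date; no cluster is needed.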
Waiting in bash
To pause the execution of commands for a certain period of time, use the sleep command followed by the length of the pause.
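A shortened version of the loop in the script shows the effect of sleep. This sketch pauses for 1 second instead of 10, purely to keep the demonstration quick, and uses the bash built-in variable SECONDS to report how long the loop took.

```bash
#!/bin/bash
# SECONDS is a bash built-in that counts seconds since it was last reset.
SECONDS=0

# Print the date three times, pausing 1 second after each print.
for i in {1..3}; do
    date
    sleep 1
done

elapsed=$SECONDS
echo "Elapsed: ${elapsed} seconds"
```

The reported time should be roughly the number of iterations multiplied by the pause length.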
Change <email_address> to an email address that you can check. Once you are done, press Ctrl+X to exit nano, then press Y and Enter to save the changes made to the file and return to the bash prompt.
Editing in nano
nano is a command-line editor. You can only move the cursor with the arrow keys: ↑, ↓, ← and →. Clicking with the mouse does not change the position of the cursor, so be careful that you are not editing in the wrong place.
If you accidentally edit the wrong place, exit nano, delete the script slurm.sh, and copy it again from the share directory. Do not forget to edit <email_address>.
Running a job in SLURM
Submitting a SLURM job
To submit the job to SLURM, use the sbatch command.
$ sbatch slurm.sh
Submitted batch job <jobid>
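The job ID in this message is useful later for finding the job in the queue and identifying its log file. A minimal sketch of extracting the ID with bash parameter expansion follows. The message is hard-coded so the sketch runs without a cluster, and the ID 12345 is a made-up placeholder. On HiPerGator you would capture the real message with msg="$(sbatch slurm.sh)" instead.

```bash
#!/bin/bash
# Hypothetical submission message; 12345 is a placeholder job ID.
msg="Submitted batch job 12345"

# Remove everything up to and including the last space, leaving the ID.
jobid="${msg##* }"
echo "Job ID: $jobid"
```

sbatch also has a --parsable option that prints the job ID without the surrounding text, which avoids the string handling when scripting.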
Checking status of a SLURM job
You can check the status of the job using the squeue command. The -u argument accepts a <username> and displays all jobs submitted by that user. The -A argument accepts an account name and displays all jobs using the resources allocated to that account.
$ squeue -u <username>
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
<jobid> hpg2-comp serial_j <user> R 0:07 1 c29a-s2
If you do not see your job, it may have already completed. Submit the job again and check within a minute. Also, check your email to see if you received any messages from HiPerGator.
$ squeue -A general_workshop
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
<jobid> hpg2-comp serial_j <user1> PD 0:00 1 c29a-s2
<jobid> hpg2-comp serial_j <user2> R 1:12 1 c15a-s1
<jobid> hpg2-comp serial_j <user3> R 0:46 1 c09a-s4
Understanding Job Status
Under the status column ST, R stands for running and PD stands for pending. If the job is pending, a reason may be provided in the last column, e.g.:
- None: Just taking a while before running.
- Priority: Higher priority jobs exist in this partition.
- QOSMaxCpuPerUserLimit: The user is already using the maximum number of CPUs that they are allowed to use.
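As a sketch of reading the ST column in a script, the lines below count running and pending jobs with awk. The table is hard-coded sample text modelled on the squeue output above, with made-up job IDs and user names, so it runs without a cluster. On HiPerGator you would pipe the output of squeue -A general_workshop into the same awk commands instead.

```bash
#!/bin/bash
# Sample squeue-style output; job IDs and user names are placeholders.
sample_output='JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
101 hpg2-comp serial_j user1 PD 0:00 1 (Priority)
102 hpg2-comp serial_j user2 R 1:12 1 c15a-s1
103 hpg2-comp serial_j user3 R 0:46 1 c09a-s4'

# Skip the header line (NR > 1) and tally column 5, the ST field.
running="$(echo "$sample_output" | awk 'NR > 1 && $5 == "R" {n++} END {print n+0}')"
pending="$(echo "$sample_output" | awk 'NR > 1 && $5 == "PD" {n++} END {print n+0}')"

echo "Running: $running"
echo "Pending: $pending"
```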
Checking the output
The SLURM submission script contains the line #SBATCH --output serial_test_%j.log. Thus the output for this job will be in the file serial_test_<jobid>.log.
$ ls
serial_test_<jobid>.log slurm.sh
$ cat serial_test_<jobid>.log
Tue Sep 15 02:04:05 EDT 2020
Tue Sep 15 02:04:15 EDT 2020
Tue Sep 15 02:04:25 EDT 2020
Tue Sep 15 02:04:35 EDT 2020
Tue Sep 15 02:04:45 EDT 2020
Tue Sep 15 02:04:55 EDT 2020
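SLURM replaces %j in the --output file name with the job ID. The sketch below mimics that substitution with sed to show which file name to expect. It only illustrates the naming rule and is not how SLURM performs the replacement internally. The job ID 12345 is a placeholder.

```bash
#!/bin/bash
# File name pattern taken from the #SBATCH --output line.
pattern="serial_test_%j.log"

# Placeholder job ID; use the number reported by sbatch on the cluster.
jobid="12345"

# Replace %j with the job ID to get the expected log file name.
logfile="$(printf '%s\n' "$pattern" | sed "s/%j/${jobid}/g")"
echo "$logfile"
```

With a real job ID, cat "$logfile" would then display the job's output.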
Autocomplete in terminal
To autocomplete a file or directory name, press the Tab key. Names are autocompleted until there is a conflict (e.g. files with the same prefix). In case of a conflict, press Tab twice to view the list of matching files and folders (similar to ls). To try autocomplete, type cat se and press Tab.
Key Points
Don't forget to edit the memory, CPUs requested, and email in the SLURM request.
The job we submit here is a single-threaded job. There are several other types of jobs. Multithreaded, hybrid, GPU, and array scripts request different resources depending on the job you are running.