Jobs on Baskerville are under the control of the Slurm scheduling system. The scheduling system is configured to offer an equitable distribution of resources over time to all users. The key means by which this is achieved are:
- Jobs are scheduled according to the resources that are requested.
- Jobs are not necessarily run in the order in which they are submitted.
- Jobs requiring a large number of cores and/or a long walltime will have to queue until the requested resources become available. The system will run smaller jobs that fit in the available gaps until all of the resources requested for the larger job become available; this is known as backfill. It is therefore beneficial to specify a realistic walltime for a job so that it can be fitted into these gaps.
Here we give a quick introduction to Slurm commands. Those requiring more fine-grained control should consult the relevant documentation.
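As a rough illustration of the last point, the sketch below shows a job script header that requests an explicit, modest walltime. The resource values are placeholders, not recommendations for Baskerville. The `#SBATCH` lines are only interpreted when the script is submitted to Slurm; if the script is run directly, the shell treats them as comments.

```shell
#!/bin/bash
# Illustrative values only: request one node for 30 minutes.
#SBATCH --nodes=1
#SBATCH --time=00:30:00

# A realistic --time gives the scheduler more opportunities to
# backfill this job into gaps left while larger jobs wait.

# SLURM_JOB_ID is set by Slurm inside a running job; the fallback
# applies when the script is run outside the scheduler.
job_id="${SLURM_JOB_ID:-none}"
echo "Job ID: ${job_id}"
```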
Submitting a job
The command to submit a job is sbatch. For example, to submit the set of commands contained in the file myscript.sh, use the command:
sbatch myscript.sh
The system will return a job number, for example:
Submitted batch job 55260
Slurm is aware of your current working directory when submitting the job so there is no need to manually specify it in the script.
Upon completion of the job, there will be two output files in the directory from which you submitted the job. These files, for job ID 55260, are:
- slurm-55260.out: standard output and standard error
- slurm-55260.stats: information about the job from Slurm
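The file names can be derived from the text that sbatch prints. The following is a minimal sketch that uses the sample output shown above in place of a real submission. The variable names are illustrative only.

```shell
# Sample sbatch output, copied from the example above.
submit_output="Submitted batch job 55260"

# Strip everything up to the last space to leave only the job ID.
job_id="${submit_output##* }"

# File names follow the pattern described above.
stdout_file="slurm-${job_id}.out"
stats_file="slurm-${job_id}.stats"
echo "$stdout_file $stats_file"
```

In a real workflow the first assignment would typically be `submit_output=$(sbatch myscript.sh)`. sbatch also provides a `--parsable` option, which prints the job ID without the surrounding text (followed by a cluster name in multi-cluster setups).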
Cancelling a job
To cancel a queued or running job, use the scancel command and supply it with the ID of the job to be cancelled. For example, to cancel the previous job:
scancel 55260
Monitoring Your Jobs
There are a number of ways to monitor the current status of your job. You can view what’s going on by issuing any one of the following commands:
- squeue is Slurm’s command for viewing the status of your jobs. This shows information such as the job’s ID and name, the QoS used (the “partition”, which will tell you the node type), the user that submitted the job, the time elapsed and the number of nodes being used.
- scontrol is a powerful interface that provides a greater level of detail about the status of your job. The scontrol show job command can be used to view details of a specific job.
squeue
squeue -j 55620
scontrol show job 55620
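When a script needs to wait for a job to finish, squeue can be polled in a loop. The helper below is a hypothetical sketch, not part of Slurm or Baskerville. It assumes that a job which is no longer listed by squeue has left the queue, and it uses an arbitrary default polling interval of 30 seconds.

```shell
# Block until the given job ID no longer appears in squeue output.
# Optional second argument: polling interval in seconds (default 30).
wait_for_job() {
    local job_id="$1"
    local interval="${2:-30}"
    # -h suppresses the header, so any output means the job is still listed.
    while squeue -h -j "$job_id" 2>/dev/null | grep -q .; do
        sleep "$interval"
    done
}
```

For example, `wait_for_job 55620 && cat slurm-55620.out` would print the job's output once it has left the queue. Polling too frequently adds load to the scheduler, so a generous interval is advisable.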
Associate Jobs with Projects and QoS
Every job has to be associated with a project to ensure the equitable distribution of resources. Project owners and members will have been issued a project code for each registered project, and only usernames authorised by the project owner will be able to run jobs using that project code. Additionally, every job has to be associated with a QoS.
You can see what projects you are a member of, and what QoS are available to you, by running the command:
If you are registered on more than one project then the project should be specified using the --account option followed by the project code. For example, if your project is project-name then add the following line to your job script:
#SBATCH --account=project-name
You can specify the QoS using the --qos option followed by the QoS name. For example, if the QoS is qos-name then add the following line to your job script:
#SBATCH --qos=qos-name
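Putting both options together, a job script header might look like the sketch below. Here project-name and qos-name are the placeholders used above, not real values. The script echoes the account and QoS that Slurm reports through its environment variables, which can help confirm that the job ran under the intended project.

```shell
#!/bin/bash
#SBATCH --account=project-name
#SBATCH --qos=qos-name

# Slurm exports these variables inside a running job; the fallbacks
# apply when the script is run outside the scheduler.
job_account="${SLURM_JOB_ACCOUNT:-not set}"
job_qos="${SLURM_JOB_QOS:-not set}"
echo "account=${job_account} qos=${job_qos}"
```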