Slurm is **free** software distributed under the [[https://www.gnu.org/licenses/gpl.html|GNU General Public License]].
==== What is a parallel job? ====

//A parallel job consists of tasks that run simultaneously.// Such tasks can be created in two ways:

  * by running a multi-process program (for example one using MPI);
  * by running a multi-threaded program (for example one using OpenMP).

A multi-process program consists of multiple tasks orchestrated by MPI and possibly executed on different nodes. A multi-threaded program, on the other hand, consists of a single task that uses several CPUs on the same node.
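As a sketch, the two models map onto different ''srun'' resource requests (the program names ''mpi_prog'' and ''omp_prog'' are hypothetical placeholders):

<code>
# multi-process: 8 tasks, possibly spread over several nodes
srun -n 8 ./mpi_prog

# multi-threaded: 1 task given 8 CPUs on a single node
srun -n 1 -c 8 ./omp_prog
</code>

Here ''-n'' sets the number of tasks and ''-c'' the number of CPUs per task.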
==== Slurm's most common user commands ====
<code>
$ sinfo
</code>
The partitions defined on this cluster include ''playground'' (the default), ''computation'', ''emergency'', ''notebook'', ''gpu'' and ''computation-intel''; for each, ''sinfo'' reports its availability, time limit, node count, node state and node list.
A ''*'' near a partition name indicates the default partition. See ''man sinfo'' for all available options.
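A couple of useful ''sinfo'' variations (a sketch; ''playground'' is this cluster's default partition):

<code>
$ sinfo -p playground   # restrict the listing to one partition
$ sinfo -N -l           # node-oriented long listing
</code>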
<code>
$ squeue -u <username>
</code>
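''squeue'' output can be tailored with ''--format''; for instance, to show each job's id, partition, state and the reason it is still pending, something like the following should work (''%i'', ''%P'', ''%T'' and ''%R'' are standard format specifiers):

<code>
$ squeue -u <username> -o "%.10i %.12P %.8T %.20R"
</code>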
<code>
$ scontrol show partition notebook
</code>
<code>
$ scontrol show node maris004
</code>
<code>
novamaris [1087] $ scontrol show jobs 1052
</code>
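Besides inspecting state, ''scontrol'' can also modify pending jobs. A couple of common sketches, reusing job id ''1052'' from above:

<code>
$ scontrol hold 1052                              # keep the job pending
$ scontrol release 1052                           # make it eligible to run again
$ scontrol update JobId=1052 TimeLimit=02:00:00   # change its time limit
</code>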
==== Examples ====
**Create two tasks running on different nodes**
<code>
srun -n2 -N2 -l hostname
0: maris005
1: maris006
</code>
**Create three tasks running on the same node**
<code>
srun -n3 -N1 -l hostname
</code>
**Create three tasks running on different nodes, specifying which nodes should __at least__ be used**

<code>
srun -N3 -w "maris00[5-6]" -l hostname
</code>
**Create a job script and submit it to slurm for execution**

Suppose ''batch.sh'' has the following contents
<code>
#!/usr/bin/env bash
#SBATCH -n 2
#SBATCH -w maris00[5-6]
srun hostname
</code>

then submit it using ''sbatch batch.sh''. See ''man sbatch'' for the full list of options.
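A sketch of some other commonly used ''#SBATCH'' directives (the job name, output pattern and time limit below are illustrative assumptions, not site defaults):

<code>
#!/usr/bin/env bash
#SBATCH --job-name=myjob        # name shown by squeue (hypothetical)
#SBATCH --output=myjob-%j.out   # %j expands to the job id
#SBATCH --time=01:00:00         # wall-clock limit (HH:MM:SS)
#SBATCH -n 4                    # number of tasks
srun hostname
</code>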
==== Less-common user commands ====
| < | < | ||
| $ sacctmgr show qos format=Name, | $ sacctmgr show qos format=Name, | ||
| - | Name MaxCPUsPU MaxJobsPU | ||
| - | ---------- --------- --------- -------------------- | ||
| - | normal | ||
| - | playground | ||
| - | notebook | ||
| </ | </ | ||
:!: ''sacct'' lists each job step on its own line: the batch script itself appears with ''.batch'' appended to the job id, while steps launched with ''srun'' appear as ''<jobid>.<stepid>''.
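A hedged sketch of querying accounting data for a finished job (''<jobid>'' is a placeholder; the format fields are standard ''sacct'' ones):

<code>
$ sacct -j <jobid> --format=JobID,JobName,Partition,Elapsed,MaxRSS,State
</code>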
=== sshare ===
| < | < | ||
| - | $ sshare -U -u xxxxx | + | $ sshare -U -u < |
| | | ||
| -------------------- ---------- ---------- ----------- ----------- ------------- ---------- | -------------------- ---------- ---------- ----------- ----------- ------------- ---------- | ||
| - | xxxxx yyyyyy | + | xxxxx yyyyyy |
| </ | </ | ||
:!: On maris, usage parameters decay over time according to a ''PriorityDecayHalfLife'' of 14 days.
=== sprio ===
:!: Use ''sprio -l'' for a long listing of the priority factors of all pending jobs.
===== Tips =====
To minimize the time your job spends in the queue, you can specify multiple partitions so that it starts as soon as any of them has free resources. Use ''-p'' with a comma-separated list of partition names.
To get a rough estimate of when your queued job will start, type ''squeue --start -j <jobid>''.
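For example (a sketch; ''<jobid>'' is a placeholder for the id reported by ''sbatch''):

<code>
$ sbatch -p playground,computation batch.sh
$ squeue --start -j <jobid>
</code>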
| < | < | ||
| - | watch -n 1 -x sinfo -S" | + | sinfo -i 5 -S" |
| </ | </ | ||
For instance ''#SBATCH -n 2'' asks slurm for two tasks.
=== Environment variables available to slurm jobs ===

Slurm exports a number of ''SLURM_*'' variables into the environment of every job. Type ''printenv | grep -i slurm_'' within a job or allocation to list them all.
| - | < | ||
| - | $ salloc -p playground -N 10 | ||
| - | salloc: Granted job allocation 13709 | ||
| - | $ printenv | grep -i slurm_ | ||
| - | SLURM_NODELIST=maris[031-033, | ||
| - | SLURM_JOB_NAME=bash | ||
| - | SLURM_NODE_ALIASES=(null) | ||
| - | SLURM_JOB_QOS=normal | ||
| - | SLURM_NNODES=10 | ||
| - | SLURM_JOBID=13709 | ||
| - | SLURM_TASKS_PER_NODE=1(x10) | ||
| - | SLURM_JOB_ID=13709 | ||
| - | SLURM_SUBMIT_DIR=/ | ||
| - | SLURM_JOB_NODELIST=maris[031-033, | ||
| - | SLURM_CLUSTER_NAME=maris | ||
| - | SLURM_JOB_CPUS_PER_NODE=1(x10) | ||
| - | SLURM_SUBMIT_HOST=novamaris.lorentz.leidenuniv.nl | ||
| - | SLURM_JOB_PARTITION=playground | ||
| - | SLURM_JOB_ACCOUNT=yuyuysu | ||
| - | SLURM_JOB_NUM_NODES=10 | ||
| - | SLURM_MEM_PER_NODE=32174 | ||
| - | |||
| - | </ | ||