| slurm_tutorial [2017/11/27 15:43] – [Tips] lenocil | slurm_tutorial [2019/01/16 08:35] (current) – [Less-common user commands] lenocil |
|---|---|
| <code> | <code> |
| $ sinfo | $ sinfo |
| PARTITION AVAIL TIMELIMIT NODES STATE NODELIST | |
| playground* up infinite 2 mix maris[029,031] | |
| playground* up infinite 34 idle maris[004-022,030,032-033,035-046] | |
| computation up infinite 14 mix maris[052,057-061,064-068,071-073] | |
| computation up infinite 5 alloc maris[062-063,069-070,074] | |
| computation up infinite 9 idle maris[047-051,053-056] | |
| emergency up infinite 3 mix maris[071-073] | |
| emergency up infinite 3 alloc maris[069-070,074] | |
| notebook up infinite 2 mix maris[024,027] | |
| notebook up infinite 3 alloc maris[023,025-026] | |
| notebook up infinite 1 idle maris028 | |
| gpu up infinite 1 idle maris075 | |
| computation-intel up 3-00:00:00 1 mix maris076 | |
| computation-intel up 3-00:00:00 1 alloc maris077 | |
| </code> | </code> |
| An asterisk (*) after a partition name marks the default partition. See ''man sinfo''. | An asterisk (*) after a partition name marks the default partition. See ''man sinfo''. |
| |
| <code> | <code> |
| $ squeue -u bongo | $ squeue -u <username> |
| JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) | |
| 324936 computati VVV2_VV bongo R 1-02:11:54 1 maris068 | |
| | |
| </code> | </code> |
| |
| <code> | <code> |
| $ scontrol show partition notebook | $ scontrol show partition notebook |
| PartitionName=notebook | |
| AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL | |
| AllocNodes=ALL Default=NO QoS=notebook | |
| DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO | |
| MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED | |
| Nodes=maris0[23-28] | |
| PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO | |
| OverTimeLimit=NONE PreemptMode=OFF | |
| State=UP TotalCPUs=48 TotalNodes=6 SelectTypeParameters=NONE | |
| DefMemPerNode=UNLIMITED MaxMemPerCPU=4096 | |
| |
| </code> | </code> |
| |
| <code> | <code> |
| $ scontrol show node maris004 | $ scontrol show node maris004 |
| NodeName=maris004 Arch=x86_64 CoresPerSocket=4 | |
| CPUAlloc=8 CPUErr=0 CPUTot=8 CPULoad=0.01 | |
| AvailableFeatures=(null) | |
| ActiveFeatures=(null) | |
| Gres=(null) | |
| NodeAddr=maris004 NodeHostName=maris004 Version=16.05 | |
| OS=Linux RealMemory=16046 AllocMem=16000 FreeMem=2082 Sockets=2 Boards=1 | |
| State=ALLOCATED ThreadsPerCore=1 TmpDisk=9951 Weight=1 Owner=N/A MCS_label=N/A | |
| BootTime=2016-12-22T12:08:05 SlurmdStartTime=2017-02-17T09:19:46 | |
| CapWatts=n/a | |
| CurrentWatts=0 LowestJoules=0 ConsumedJoules=0 | |
| ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s | |
| |
| </code> | </code> |
| |
| <code> | <code> |
| novamaris [1087] $ scontrol show jobs 1052 | novamaris [1087] $ scontrol show jobs 1052 |
| JobId=1052 JobName=slurm_engine.sbatch | |
| UserId=xxxxxxx(1261909) GroupId=lorentz(9999) MCS_label=N/A | |
| Priority=1 Nice=0 Account=zzzzzz QOS=normal | |
| JobState=RUNNING Reason=None Dependency=(null) | |
| Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0 | |
| RunTime=00:49:06 TimeLimit=UNLIMITED TimeMin=N/A | |
| SubmitTime=2017-02-23T12:17:34 EligibleTime=2017-02-23T12:17:34 | |
| StartTime=2017-02-23T12:17:36 EndTime=Unknown Deadline=N/A | |
| PreemptTime=None SuspendTime=None SecsPreSuspend=0 | |
| Partition=average-computation AllocNode:Sid=maris004:20658 | |
| ReqNodeList=(null) ExcNodeList=(null) | |
| NodeList=maris[024-033,035-040] | |
| BatchHost=maris024 | |
| NumNodes=16 NumCPUs=128 NumTasks=128 CPUs/Task=1 ReqB:S:C:T=0:0:*:* | |
| TRES=cpu=128,mem=514784M,node=16 | |
| Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=* | |
| MinCPUsNode=1 MinMemoryNode=32174M MinTmpDiskNode=0 | |
| Features=(null) Gres=(null) Reservation=(null) | |
| OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null) | |
| Command=./slurm_engine.sbatch | |
| WorkDir=/marisdata/%u/ | |
| StdErr=/marisdata/xxxxxxx/.log/abcd.err | |
| StdIn=/dev/null | |
| StdOut=/marisdata/xxxxxxx/.log/abcd.out | |
| Power= | |
| |
| </code> | </code> |
| <code> | <code> |
| 0: maris005 | 0: maris005 |
| 1: maris006 | 1: maris006 |
| |
| </code> | </code> |
| | |
| **Create three tasks running on the same node** | **Create three tasks running on the same node** |
| <code> | <code> |
| </code> | </code> |
| **Create three tasks running on different nodes, specifying which nodes should __at least__ be used** | **Create three tasks running on different nodes, specifying which nodes should __at least__ be used** |
| | |
| <code> | <code> |
| srun -N3 -w "maris00[5-6]" -l /bin/hostname | srun -N3 -w "maris00[5-6]" -l /bin/hostname |
| </code> | </code> |
| <code> | <code> |
| $ sacctmgr show qos format=Name,MaxCpusPerUser,MaxJobsPerUser,Flags | $ sacctmgr show qos format=Name,MaxCpusPerUser,MaxJobsPerUser,Flags |
| Name MaxCPUsPU MaxJobsPU Flags | |
| ---------- --------- --------- -------------------- | |
| normal | |
| playground 32 DenyOnLimit | |
| notebook 4 1 DenyOnLimit | |
| </code> | </code> |
| |
| |
| <code> | <code> |
| $ sshare -U -u xxxxx | $ sshare -U -u <username> |
| Account User RawShares NormShares RawUsage EffectvUsage FairShare | Account User RawShares NormShares RawUsage EffectvUsage FairShare |
| -------------------- ---------- ---------- ----------- ----------- ------------- ---------- | -------------------- ---------- ---------- ----------- ----------- ------------- ---------- |
| |
| </code> | </code> |
| | |
| | :!: Use ''sacct --noconvert'' to keep values such as memory in their original units (e.g. 2048M is not converted to 2G), so that they are consistent across jobs. |
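As noted above, ''sinfo'' marks the default partition with an asterisk. The following is a minimal sketch for picking that partition out in a script. ''default_partition'' is a hypothetical helper name, not a Slurm command, and the sketch assumes the partition name is in the first column, as in the default ''sinfo'' layout and in ''sinfo -h -o "%P"'' (where ''%P'' prints the partition name, with ''*'' for the default).

```shell
#!/bin/sh
# Sketch: default_partition is a hypothetical helper, not part of Slurm.
# It reads sinfo-style output on stdin and prints the first partition
# name that ends in '*' (the default partition), with the '*' removed.
# Only the first column is checked, because sinfo also appends '*' to a
# node STATE (e.g. "idle*") when that node is not responding.
default_partition() {
  awk '$1 ~ /\*$/ { sub(/\*$/, "", $1); print $1; exit }'
}

# On a cluster, ask sinfo for the partition column only (%P), without
# the header line (-h):
#   sinfo -h -o "%P" | default_partition
```

Fed the sample ''sinfo'' output shown above, it prints ''playground''.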
| ===== Tips ===== | ===== Tips ===== |
| |