Disk filesystem (df) – lists a summary of the available and used disk space on each mounted file system.
df -kh
k: display file system usage in 1024-byte blocks
h: human readable
Example usage:
[root@login03 ~]# df -kh
Filesystem                    Size  Used Avail Use% Mounted on
/dev/mapper/centos-root       200G   70G  131G  35% /
devtmpfs                      252G     0  252G   0% /dev
tmpfs                         252G  5.3M  252G   1% /dev/shm
tmpfs                         252G  422M  252G   1% /run
tmpfs                         252G     0  252G   0% /sys/fs/cgroup
/dev/sda2                    1016M  247M  770M  25% /boot
/dev/sda1                     200M  9.8M  191M   5% /boot/efi
beegfs_home2                  219T  164T   55T  75% /home2
beegfs_scratch                175T   77T   99T  44% /scratch
172.20.239.200:/vol_home2      82T   72T   11T  88% /userfiles
172.20.239.202:/VOL_KUTTAM     49T   34T   16T  69% /kuttam
172.20.239.201:/vol_datasets   50T   45T  5.7T  89% /datasets
Disks mounted on:
/ : server root disk
/scratch : User home disk
/datasets: General Datasets disk
/userfiles: User Datasets disk
/kuttam: Kuttam disk
/home2 : System Disk
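As a small sketch of how the Use% column can be checked from a script, the helper functions below read it for a single path. The function names and the default threshold of 90 are illustrative assumptions, not part of the cluster tooling.

```shell
# Print the Use% value that df reports for the file system holding a path.
# -P selects the POSIX output format, so each file system stays on one line.
fs_use_percent() {
    df -P -- "$1" | awk 'NR == 2 { print $5 }'
}

# Report whether usage is at or above a limit (default 90, an arbitrary choice).
warn_if_full() {
    local path="$1" limit="${2:-90}" used
    used="$(fs_use_percent "$path")"
    used="${used%\%}"                      # strip the trailing percent sign
    case "$used" in
        ''|*[!0-9]*) echo "cannot read usage for $path" >&2; return 1 ;;
    esac
    if [ "$used" -ge "$limit" ]; then
        echo "WARNING: $path is ${used}% full"
    else
        echo "OK: $path is ${used}% full"
    fi
}
```

For example, `warn_if_full /scratch 85` would print a warning once /scratch reaches 85% usage.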
Disk usage (du) – reports the disk usage of a file or folder.
du -sh
s: print only the grand total for each file or directory given
h: human readable
du -sh filename
To list the disk usage of every file and folder in the current directory:
du -sh *
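When a directory holds many entries, sorting the du output makes the largest ones easy to spot. The sketch below assumes GNU sort (for the -h option); the function name largest_entries is made up for illustration. Note that * does not match hidden files.

```shell
# List the entries of a directory by disk usage, largest first.
# sort -h understands the human-readable suffixes (K, M, G) printed by du -h.
largest_entries() {
    local dir="${1:-.}" count="${2:-10}"
    ( cd -- "$dir" && du -sh -- * 2>/dev/null | sort -rh | head -n "$count" )
}
```

For example, `largest_entries ~ 5` would show the five largest files or folders in your home directory.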
sinfo – lists information about Slurm nodes and partitions. The command output has 6 columns.
Partition: Slurm partitions
Avail: Partition availability
Timelimit: Partition max time limit
Nodes: Number of nodes in the partition that are in the listed state
State: State of nodes (mix, idle, down, draining, drained, alloc)
Nodelist: Names of those nodes
sinfo shows all partitions and nodes in the cluster. For the ai partition, you can use "sinfo|grep ai". Note that grep also matches lines of other partitions whose node list contains ai nodes.
[root@login02 ~]# sinfo|grep ai
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
admin        up   infinite     30    mix ag01,ai[01,03-14],be[04,06-07],buyukliman,da[01-04],dy03,it[02-04],ke[02,04],rk01,sm01
admin        up   infinite      3   idle ai02,be[11-12]
short*       up    2:00:00     29    mix ag01,ai[01,03-14],be[04,06-07],buyukliman,da[01-04],dy03,it[01-04],rk01,sm01
short*       up    2:00:00      3   idle ai02,be[11-12]
mid          up 1-00:00:00     18    mix ai[05-11],be[04,06-07],buyukliman,da[03-04],dy03,it[03-04],rk01,sm01
long         up 7-00:00:00     19    mix ag01,ai[06-14],be[04,06-07],buyukliman,da04,dy03,it04,rk01,sm01
ai           up 7-00:00:00     14    mix ai[01,03-14],dy03
ai           up 7-00:00:00      1  alloc dy02
ai           up 7-00:00:00      1   idle ai02
In the command output for the ai partition:
Partition: ai
Availability: the partition is up
Time limit: max 7 days
Nodes-State-Nodelist:
-There are 14 nodes in the mix state. This means that some of the resources on these nodes (ai[01,03-14],dy03) are in use, but resources are still available.
-There is only one fully allocated node (dy02).
-There is only one fully free node (ai02).
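The per-state reading above can be sketched as a small filter over sinfo output. It assumes the 6-column layout shown in this section; the function name nodes_by_state is hypothetical. Matching on the first column avoids the extra lines that grep picks up from other partitions.

```shell
# Sum the NODES column per STATE for one partition, reading sinfo output on stdin.
# A trailing "*" on the partition name (the default partition) is ignored.
nodes_by_state() {
    awk -v part="$1" '
        { name = $1; sub(/\*$/, "", name) }
        name == part { total[$5] += $4 }
        END { for (s in total) print s, total[s] }
    ' | sort
}
```

A typical call would be `sinfo | nodes_by_state ai`. Slurm's sinfo also accepts `-p ai` to restrict the listing to one partition.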
⚠️⚠️⚠️ The sinfo command is useful for a short listing of resources. However, another command (kuacc-info) is more detailed and more useful.
kuacc-info – gives a cluster usage summary (CPUs, memory, nodes).
The command output has two parts. The first part lists users with their numbers of running and pending jobs and their node, CPU and memory usage on KUACC (IT) nodes (kcc), cosmos nodes (csms), ilac nodes (ilc) and ai nodes.
The second part is a detailed version of the sinfo output. It lists the name of each node, its partition (TYPE column), CPU usage, memory usage and status. The following output is the second part:
------------------------------------------------------------------------------------------------------------
NAME       TYPE    CPU_USAGE   MEM_USAGE       STATUS
------------------------------------------------------------------------------------------------------------
ag01       COSBI    7 ( 87%)    55.25 ( 45%)   BUSY
ai01       AI      38 ( 95%)   186.88 ( 38%)   BUSY
ai02       AI       0 (  0%)     0.00 (  0%)   FREE
ai03       AI      36 ( 90%)   319.00 ( 64%)   BUSY
ai04       AI      20 ( 50%)   211.00 ( 42%)   BUSY
ai05       AI      19 ( 47%)   176.00 ( 35%)   BUSY
ai06       AI       5 ( 12%)    60.00 ( 12%)   BUSY
ai07       AI      38 ( 95%)   164.00 ( 33%)   BUSY
ai08       AI      26 ( 65%)   148.00 ( 30%)   BUSY
ai09       AI      29 ( 72%)   134.00 ( 27%)   BUSY
ai10       AI      34 ( 85%)   256.00 ( 52%)   BUSY
ai11       AI      35 ( 87%)   484.00 ( 98%)   FULL(MEM)
ai12       AI      37 ( 92%)   406.30 ( 82%)   BUSY
ai13       AI      26 ( 65%)   468.14 ( 95%)   FULL(MEM)
ai14       AI      24 ( 60%)   378.00 ( 76%)   BUSY
be01       ILAC    12 (100%)    48.00 ( 39%)   FULL(CPU)
be02       ILAC    12 (100%)    48.00 ( 39%)   FULL(CPU)
be03       ILAC    12 (100%)    48.00 ( 39%)   FULL(CPU)
be04       ILAC    11 ( 91%)    44.00 ( 35%)   BUSY
be05       ILAC    12 (100%)    48.00 ( 39%)   FULL(CPU)
be06       ILAC    11 ( 91%)    44.00 ( 35%)   BUSY
be07       ILAC     8 ( 66%)    32.00 ( 26%)   BUSY
be08       ILAC    12 (100%)   123.05 (100%)   FULL(CPU)
be09       ILAC    12 (100%)    56.00 ( 45%)   FULL(CPU)
be10       ILAC    12 (100%)    56.00 ( 45%)   FULL(CPU)
be11       ILAC     0 (  0%)     0.00 (  0%)   FREE
be12       ILAC     0 (  0%)     0.00 (  0%)   FREE
buyukliman HAMSI   18 ( 25%)   108.00 ( 44%)   BUSY
da01       BIYOFIZ 10 ( 50%)    40.00 ( 16%)   BUSY
da02       BIYOFIZ 10 ( 50%)    40.00 ( 16%)   BUSY
da03       BIYOFIZ 10 ( 50%)    40.00 ( 16%)   BUSY
da04       BIYOFIZ  3 ( 15%)   140.00 ( 57%)   BUSY
dy02       AI      36 (100%)   144.00 ( 29%)   FULL(CPU)
dy03       AI       2 (  8%)    78.12 ( 17%)   BUSY
it01       KUACC   26 ( 65%)   104.00 ( 21%)   BUSY
it02       KUACC   10 ( 25%)    40.00 (  8%)   BUSY
it03       KUACC   35 ( 87%)   140.00 ( 28%)   BUSY
it04       KUACC   38 ( 95%)   192.00 ( 39%)   BUSY
ke01       COSMOS  36 (100%)   217.66 ( 44%)   FULL(CPU)
ke02       COSMOS  35 ( 97%)   242.48 ( 49%)   FULL(CPU)
ke03       COSMOS  36 (100%)   225.66 ( 45%)   FULL(CPU)
ke04       COSMOS  36 (100%)   231.66 ( 47%)   FULL(CPU)
ke05       COSMOS  36 (100%)   198.83 ( 40%)   FULL(CPU)
ke06       COSMOS  36 (100%)   190.83 ( 38%)   FULL(CPU)
ke07       COSMOS  36 (100%)   198.83 ( 40%)   FULL(CPU)
ke08       COSMOS  35 ( 97%)   160.00 ( 32%)   FULL(CPU)
rk01       KUTEM   40 ( 55%)   240.00 ( 48%)   BUSY
sm01       IUI      4 ( 20%)    24.00 ( 38%)   BUSY
⚠️⚠️⚠️ This output is useful when submitting jobs. Users can find free resources and submit jobs to those nodes with the nodelist or constraint flags.
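Finding a free node in the kuacc-info listing can be sketched as a one-line filter. It assumes the second part of the output has the layout shown above (node name first, TYPE second, STATUS last); the function name free_nodes is hypothetical.

```shell
# Print the names of nodes of a given TYPE whose STATUS is FREE,
# reading the node table of kuacc-info on stdin.
free_nodes() {
    awk -v type="$1" '$2 == type && $NF == "FREE" { print $1 }'
}
```

For example, `kuacc-info | free_nodes AI` would print ai02 for the listing above, and that name could then be passed to the job script with `#SBATCH --nodelist=ai02`.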
kuacc-nodes – lists cluster nodes with their specifications (CPU types, GPU list, memory, features, etc.). It is a detailed version of the sinfo command.
(base) [yakarken18@login02 ~]$ kuacc-nodes
NODELIST         STATE      CPUS  S:C:T   MEMORY  TMP_DISK  GRES                              AVAIL_FEATURES
ai[01,04-06,09]  mixed      40    2:20:1  503000  0         gpu:tesla_t4:8                    ai,ib,compute,40cpu,gpu
ai[11-14]        mixed      40    2:20:1  503000  0         gpu:tesla_v100:8                  ai,ib,compute,40cpu,gpu
buyukliman       mixed      72    2:18:2  250000  0         (null)                            hamsi,ib,compute,72cpu
da[03-04]        mixed      20    2:10:1  250000  0         gpu:tesla_k20m:1                  biyofiz,ib,compute,20cpu
dy02             mixed      36    2:18:1  504000  0         gpu:tesla_k80:4                   ai,ib,compute,36cpu,gpu
dy03             mixed      24    2:12:1  450000  0         gpu:tesla_k80:8                   ai,ib,compute,24cpu,gpu
it[01-02]        mixed      40    2:20:1  500000  0         gpu:tesla_v100:1                  IT,ib,compute,tesla_v100
sm01             mixed      20    2:10:1  64000   0         gpu:tesla_k40m:2,gpu:tesla_k80:1  iui,ib,compute,20cpu
ag01             allocated  8     2:4:1   124000  0         gpu:gtx_1080ti:2                  cosbi,ib,compute,8cpu,gpu
ai[02-03,07-08,  allocated  40    2:20:1  503000  0         gpu:tesla_t4:8                    ai,ib,compute,40cpu,gpu
be[01-12]        allocated  12    2:6:1   126000  0         gpu:tesla_k20m:1                  ilac,ib,compute,12cpu
da02             allocated  20    2:10:1  250000  0         gpu:tesla_k20m:1                  biyofiz,ib,compute,20cpu
it[03-04]        allocated  40    2:20:1  500000  0         gpu:tesla_v100:1                  IT,ib,compute,tesla_v100
rk01             allocated  72    2:18:2  504000  0         (null)                            kutem,ib,compute,72cpu,HT
da01             idle       20    2:10:1  250000  0         gpu:tesla_k20m:1                  biyofiz,ib,compute,20cpu
ke[01-08]        allocated  36    2:18:1  504000  0         (null)                            cosmos,ib,compute,36cpu
============================================================================================================
KUACC NODES CPU LIST
============================================================================================================
login02-login03 |model name : Intel(R) Xeon(R) CPU E5-2695 v4 @ 2.10GHz
ag01            |model name : Intel(R) Xeon(R) CPU E5-2637 v4 @ 3.50GHz
be01 - be14     |model name : Intel(R) Xeon(R) CPU E5-2640 @ 2.50GHz
buyukliman      |model name : Intel(R) Xeon(R) CPU E5-2695 v4 @ 2.10GHz
da01 - da04     |model name : Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
dy02            |model name : Intel(R) Xeon(R) CPU E5-2695 v4 @ 2.10GHz
dy03            |model name : Intel(R) Xeon(R) CPU E5-2695 v2 @ 2.10GHz
ke01 - ke08     |model name : Intel(R) Xeon(R) CPU E5-2695 v4 @ 2.10GHz
rk01            |model name : Intel(R) Xeon(R) CPU E5-2695 v4 @ 2.10GHz
sm01            |model name : Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
it01 - it04     |model name : Intel(R) Xeon(R) Gold 6148 @ 2.40GHz
ai01 - ai14     |model name : Intel(R) Xeon(R) Gold 6248 @ 2.50GHz
============================================================================================================
⚠️⚠️⚠️ This command lists the specifications of all nodes in the cluster. It is very useful when you need a specific resource. For example, the GRES column shows all GPU types in the cluster. You can request a specific GPU type with the gres flag in your job script, using a GRES value from this command output.
#SBATCH --gres=gpu:tesla_k80:1
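To see which GPU type names can be used in the gres flag, the GRES column can be reduced to a list of distinct types. This is a sketch that assumes the 8-column kuacc-nodes layout shown above, with GRES in the seventh column; the function name gpu_types is hypothetical.

```shell
# Print the distinct GPU types found in the GRES column (7th field),
# reading kuacc-nodes output on stdin. Entries look like gpu:<type>:<count>
# and one node may list several, separated by commas.
gpu_types() {
    awk '
        $7 ~ /^gpu:/ {
            n = split($7, entries, ",")
            for (i = 1; i <= n; i++) {
                split(entries[i], parts, ":")
                if (parts[1] == "gpu" && parts[2] != "") print parts[2]
            }
        }
    ' | sort -u
}
```

A typical call would be `kuacc-nodes | gpu_types`.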
⚠️⚠️⚠️ The AVAIL_FEATURES column in the command output lists all features of the nodes. These features are set by system admins and are used with the constraint flag in a Slurm job script.
For example,
(base) [yakarken18@login02 ~]$ kuacc-nodes |grep ai
ai[01,04-06,09]  mixed      40    2:20:1  503000  0         gpu:tesla_t4:8    ai,ib,compute,40cpu,gpu,tesla_t4,6248,molpro,vnc
ai[11-14]        mixed      40    2:20:1  503000  0         gpu:tesla_v100:8  ai,ib,compute,40cpu,gpu,tesla_v100,6248,molpro,vnc
dy02             mixed      36    2:18:1  504000  0         gpu:tesla_k80:4   ai,ib,compute,36cpu,gpu,tesla_k80,e52695,e52695v4,
dy03             mixed      24    2:12:1  450000  0         gpu:tesla_k80:8   ai,ib,compute,24cpu,gpu,tesla_k80,e52695,e52695v2,
ai[02-03,07-08,  allocated  40    2:20:1  503000  0         gpu:tesla_t4:8    ai,ib,compute,40cpu,gpu,tesla_t4,6248,molpro,vnc
ai01 - ai14     |model name : Intel(R) Xeon(R) Gold 6248 @ 2.50GHz
ai[01-04,06-10]  mixed      40    2:20:1  503000  0         gpu:tesla_t4:8    ai,ib,compute,40cpu,gpu,tesla_t4,6248,molpro,vnc
Features for ai nodes:
ai[01-10]: Feature=ai,ib,compute,40cpu,gpu,tesla_t4,6248,molpro,vnc
ai[11-14]: Feature=ai,ib,compute,40cpu,gpu,tesla_v100,6248,molpro,vnc
dy02: Feature=ai,ib,compute,36cpu,gpu,tesla_k80,e52695,e52695v4,vnc
dy03: Feature=ai,ib,compute,24cpu,gpu,tesla_k80,e52695,e52695v2,vnc
ai: ai partition
ib: node with infiniband network
compute: compute node
24cpu: node with 24 cores
36cpu: node with 36 cores
40cpu: node with 40 cores
tesla_XX: node with tesla_XX GPU
molpro: node with 1TB local disk
6248: node with Intel Gold 6248 CPU
e52695: node with Intel E5-2695 CPU
⚠️⚠️⚠️ Some users need to run their jobs on a specific type of node, for example on Intel Gold 6248 CPUs. Using the 6248 feature in the constraint flag limits the job to nodes with 6248 CPUs.
⚠️⚠️⚠️ Any feature can be added to a node's available feature list.
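Before putting a feature into the constraint flag, it can help to check which nodes actually offer it. The sketch below assumes the 8-column kuacc-nodes layout shown above, with the feature list in the last column; the function name nodes_with_feature is hypothetical.

```shell
# Print the NODELIST entries whose AVAIL_FEATURES (8th field) contain a feature,
# reading kuacc-nodes output on stdin. Requiring a numeric CPUS column (3rd field)
# skips the header and the CPU model list at the end of the output.
nodes_with_feature() {
    awk -v want="$1" '
        NF >= 8 && $3 ~ /^[0-9]+$/ {
            n = split($8, feats, ",")
            for (i = 1; i <= n; i++)
                if (feats[i] == want) { print $1; break }
        }
    '
}
```

For example, `kuacc-nodes | nodes_with_feature 6248` lists the node groups that a job with `#SBATCH --constraint=6248` could land on.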
Examples:
#SBATCH --constraint=tesla_k80
With the constraint flag and the molpro feature, the job runs on a node with a local disk, and the tmp2 folder can be used as scratch:
#SBATCH --constraint=molpro
export TMP=/tmp2
At the end of the job, scratch data should be cleaned up.
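One way to make sure scratch data is removed is to give each job its own directory under /tmp2 and delete it when the script exits. This is a minimal sketch, not the site's prescribed method: the function names and the job_ directory prefix are assumptions, and it relies on mktemp accepting a path template as GNU mktemp does. SLURM_JOB_ID is set by Slurm inside a job.

```shell
#!/bin/bash
#SBATCH --constraint=molpro

# Create a private scratch directory under the given base (default /tmp2)
# and export it as TMP for the programs started by this job.
setup_scratch() {
    local base="${1:-/tmp2}"
    TMP="$(mktemp -d "${base}/job_${SLURM_JOB_ID:-manual}_XXXXXX")" || return 1
    export TMP
}

# Remove the scratch directory created by setup_scratch, if it exists.
cleanup_scratch() {
    if [ -n "${TMP:-}" ] && [ -d "$TMP" ]; then
        rm -rf -- "$TMP"
    fi
}
```

In a job script, calling `setup_scratch && trap cleanup_scratch EXIT` before the main commands would run the cleanup when the script ends, including when a command fails.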
scontrol show node – this command is used to check detailed information about a compute node.
[root@login03 ~]# scontrol show node ai12
NodeName=ai12 Arch=x86_64 CoresPerSocket=20
   CPUAlloc=23 CPUErr=0 CPUTot=40 CPULoad=13.54
   AvailableFeatures=ai,ib,compute,40cpu,gpu,tesla_v100,6248,molpro,vnc
   ActiveFeatures=ai,ib,compute,40cpu,gpu,tesla_v100,6248,molpro,vnc
   Gres=gpu:tesla_v100:8
   NodeAddr=ai12 NodeHostName=ai12 Version=17.02
   OS=Linux RealMemory=503000 AllocMem=487188 FreeMem=109049 Sockets=2 Boards=1
   State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=90 Owner=N/A MCS_label=N/A
   Partitions=admin,short,mid,ai
   BootTime=2021-10-20T14:17:44 SlurmdStartTime=2021-10-25T21:10:44
   CfgTRES=cpu=40,mem=503000M,gres/gpu:tesla_v100=8
   AllocTRES=cpu=23,mem=487188M,gres/gpu:tesla_v100=3
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
Information: NodeName, Arch (architecture), CoresPerSocket, CPUAlloc, CPUErr, CPUTot, CPULoad, AvailableFeatures, ActiveFeatures, Gres, NodeAddr, NodeHostName, Version, OS, RealMemory, AllocMem, FreeMem, Sockets, Boards, State, ThreadsPerCore, TmpDisk, Weight, Owner, MCS_label, Partitions, BootTime, SlurmdStartTime, CfgTRES, AllocTRES, CapWatts, CurrentWatts, LowestJoules, ConsumedJoules, etc.
AvailableFeatures: You can use kuacc-info to choose a node from the list, then run "scontrol show node node_name" and check its features for the constraint flag.
Gres: shows the GPU type and count on the node
Partitions: shows node’s partitions
CfgTRES: Resources available on node
AllocTRES: Resources allocated on node. (This output has a bug: it only shows GPU usage for jobs that request a GPU type (gres=gpu:tesla_v100:x). If a user submits a job with --gres=gpu:1, it is not listed in the command output.)
⚠️⚠️⚠️ You can check the other parameters in the manual with "man scontrol".
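Single fields can be pulled out of the scontrol output with a small filter, for example to read the feature list for the constraint flag. This sketch assumes the KEY=VALUE layout shown above; the function name node_field is hypothetical, and values that contain spaces would be cut at the first space.

```shell
# Print the value of one KEY=VALUE field from "scontrol show node" output on stdin.
# The input is split into one token per line, then the token starting with KEY= is used.
node_field() {
    tr -s ' \t' '\n' | awk -v key="$1" '
        index($0, key "=") == 1 { print substr($0, length(key) + 2) }
    '
}
```

For example, `scontrol show node ai12 | node_field AvailableFeatures` prints the comma-separated feature list, and `node_field CPUAlloc` together with `node_field CPUTot` shows how many cores are in use.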