Posts: SLURM (6)
HPE CRAY Resource Sharing
※ How to build Slurm with rpmbuild so that the AutoDetect option can be used in gres.conf, with a simple example
1. NVIDIA GPU
- Install the CUDA toolkit
  $ wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
  $ sudo sh cuda_11.8.0_520.61.05_linux.run
- Add "--with-nvml" to the rpmbuild options
  $ rpmbuild --define "_with_nvml --with-nvml=/usr/local/cuda-11.8" -ta slurm-22.05.6.tar.bz2
- Verify that the GPU library is included
  $ cd ${HOM..
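Once Slurm has been built with NVML support, the gres.conf side of the setup can be as small as a single AutoDetect line. The sketch below is not taken from the post: `gres_conf_nvml` is a hypothetical helper name, and the target path in the usage comment is an assumption that depends on where your site keeps its Slurm configuration.

```shell
# Hypothetical helper (not from the original post): print a minimal
# gres.conf that asks slurmd to discover NVIDIA GPUs through NVML.
gres_conf_nvml() {
  cat <<'EOF'
# gres.conf (sketch): GPU discovery via NVML instead of listing devices by hand
AutoDetect=nvml
EOF
}

# Usage (the path is an assumption; adjust to your installation):
#   gres_conf_nvml > /etc/slurm/gres.conf
```

AutoDetect only covers device discovery. slurm.conf still needs `GresTypes=gpu` and a `Gres=gpu:<count>` entry on the node definition.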
1. Install dependency packages
  # apt install gcc gfortran make
  # apt install build-essential fakeroot devscripts
  # apt install -y munge libmunge-dev libmunge2 rng-tools python3 python3-pip libpython3-dev libssl-dev bzip2 libbz2-dev \
      gcc openssl numactl hwloc lua5.3 man2html mariadb-server libmariadb-dev \
      make ruby ruby-dev libmunge-dev libpam0g-dev libreadline8 libreadline-dev lz4 liblz4-dev \
      libg..
Notes on installing slurm + pyxis + enroot on RHEL 8.6
1. Install dependency packages
  # yum groupinstall "Development Tools"
  # yum install jna python3-docutils python3-devel kernel-rpm-macros \
      gcc-gfortran golang bzip2-devel pam-devel readline-devel java-1.8.0-openjdk-devel \
      python39 python39-devel python39-pip libatomic libatomic-static \
      mariadb mariadb-server mariadb-devel tcl-devel tk-devel libseccomp-devel \
      perl perl..
1. Create users
  # export MUNGEUSER=966
  # groupadd -g $MUNGEUSER munge
  # useradd -m -c "MUNGE Uid 'N' Gid Emporium" -d /var/lib/munge -u $MUNGEUSER -g munge -s /sbin/nologin munge
  # export SLURMUSER=967
  # groupadd -g $SLURMUSER slurm
  # useradd -m -c "SLURM workload manager" -d /var/lib/slurm -u $SLURMUSER -g slurm -s /bin/bash slurm
2. Install dependency packages
  # apt install -y munge libmunge-dev libmung..
1. Commands
User Commands | PBS | Slurm |
---|---|---|
Job submission | qsub [script_file] | sbatch [script_file] |
Job deletion | qdel [job_id] | scancel [job_id] |
Job status (by job) | qstat [job_id] | squeue -j [job_id] |
Job status (by user) | qstat -u [user_name] | squeue -u [user_name] |
Job hold | qhold [job_id] | scontrol hold [job_id] |
Job release | qrls [job_id] | scontrol release [job_id] |
Queue list | qstat -Q | squeue |
Node list | pbsnodes -l | ..
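For users migrating from PBS, the command pairs listed above can be kept at hand as a small lookup function. This is only an illustrative sketch: `pbs2slurm` is a hypothetical name, it covers only the base commands shown in this excerpt, and it does not translate options or arguments.

```shell
# Hypothetical lookup helper: print the Slurm counterpart of a PBS command name.
# Only the pairs shown in the excerpt are covered; options are not translated.
pbs2slurm() {
  case "$1" in
    qsub)  echo "sbatch" ;;
    qdel)  echo "scancel" ;;
    qstat) echo "squeue" ;;
    qhold) echo "scontrol hold" ;;
    qrls)  echo "scontrol release" ;;
    *)     echo "unknown PBS command: $1" >&2; return 1 ;;
  esac
}

# Usage:
#   pbs2slurm qhold
```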
Use the sinfo command to check which nodes are in the "idle" state.
  $ sinfo
  PARTITION AVAIL TIMELIMIT   NODES STATE NODELIST
  short*    up    4:00:00     11    down* gpu_a100n[01-06],gpu_v100n[01-05]
  short*    up    4:00:00     1     alloc node43
  short*    up    4:00:00     6     idle  gpu_v100n[06-08],node[44-46]
  normal    up    1-00:00:00  11    down* gpu_a100n[01-06],gpu_v100n[01-05]
  normal    up    1-00:00:00  1     alloc node43
  normal    up    1-00:00:00  6     idle  gpu_v100n[06-08],node[44-46]
  ..
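When the listing is long, the idle rows can be picked out with a short filter rather than by eye. The sketch below assumes the default six-column sinfo layout shown above (STATE in column 5, NODELIST in column 6); `idle_nodes` is a hypothetical helper name, not part of Slurm.

```shell
# Hypothetical filter: read sinfo's default output on stdin and print
# "<partition> <nodelist>" for every row whose STATE column is exactly "idle".
# Assumes the default column order: PARTITION AVAIL TIMELIMIT NODES STATE NODELIST.
idle_nodes() {
  awk 'NR > 1 && $5 == "idle" { print $1, $6 }'
}

# Usage on a cluster where sinfo is available:
#   sinfo | idle_nodes
```

sinfo can also do this filtering itself with `sinfo -t idle` (`--states=idle`); the awk version is mainly useful when post-processing saved output.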