Post list: Applications/Scheduler (7)

HPE CRAY Resource Sharing
Allow job submission by users in PBS:

```
# qmgr -c "set server flatuid=True"
```

Enable job-history retention in PBS:

```
# qmgr -c "set server job_history_enable=1"
```

Set the job-history retention period, here 30 days (default: 2 weeks):

```
# qmgr -c "set server job_history_duration=720:00:00"
```

※ The duration uses the "HH:MM:SS" format.
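To confirm the attributes took effect, the server configuration can be printed back; a small sketch using standard PBS Pro commands (output omitted):

```
# qmgr -c "list server"     # prints the server attributes set above
# qstat -x                  # with job_history_enable=1, finished jobs appear here too
```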
1. Install dependency packages

```
# apt install gcc gfortran make
# apt install build-essential fakeroot devscripts
# apt install -y munge libmunge-dev libmunge2 rng-tools python3 python3-pip libpython3-dev libssl-dev bzip2 libbz2-dev \
  gcc openssl numactl hwloc lua5.3 man2html mariadb-server libmariadb-dev \
  make ruby ruby-dev libmunge-dev libpam0g-dev libreadline8 libreadline-dev lz4 liblz4-dev \
  libg..
```
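The excerpt cuts off before the build step. As a minimal sketch of the usual source build that follows on Ubuntu (the 22.05.6 tarball version is an assumption carried over from the gres.conf post below):

```
$ wget https://download.schedmd.com/slurm/slurm-22.05.6.tar.bz2
$ tar -xaf slurm-22.05.6.tar.bz2
$ cd slurm-22.05.6
$ ./configure --prefix=/usr --sysconfdir=/etc/slurm
$ make -j"$(nproc)"
$ sudo make install
```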
A record of installing slurm + pyxis + enroot on RHEL 8.6.

1. Install dependency packages

```
# yum groupinstall "Development Tools"
# yum install jna python3-docutils python3-devel kernel-rpm-macros \
  gcc-gfortran golang bzip2-devel pam-devel readline-devel java-1.8.0-openjdk-devel \
  python39 python39-devel python39-pip libatomic libatomic-static \
  mariadb mariadb-server mariadb-devel tcl-devel tk-devel libseccomp-devel \
  perl perl..
```
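On RHEL the build is typically done as RPMs rather than `make install`; a minimal sketch of that step (the tarball version is again an assumption from the gres.conf post below):

```
$ rpmbuild -ta slurm-22.05.6.tar.bz2
$ cd ${HOME}/rpmbuild/RPMS/x86_64
$ sudo yum localinstall slurm-*.rpm
```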
※ A quick example for enabling slurm's gres.conf

- Install the CUDA toolkit

```
$ wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
$ sudo sh cuda_11.8.0_520.61.05_linux.run
```

- Add "--with-nvml" to the rpmbuild options

```
$ rpmbuild --define "_with_nvml --with-nvml=/usr/local/cuda-11.8" -ta slurm-22.05.6.tar.bz2
```

- Verify that the GPU library is included

```
$ cd ${HOME}/rpmbuild/RPMS/x86_64
$ rpm -qlp slur..
```
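For reference, with NVML compiled in, gres.conf can autodetect the GPUs instead of listing device files by hand; a minimal configuration sketch (the node name and GPU count are hypothetical, not from the post):

```
# /etc/slurm/gres.conf
AutoDetect=nvml

# /etc/slurm/slurm.conf (matching entries; node name and count are hypothetical)
GresTypes=gpu
NodeName=gpu_v100n06 Gres=gpu:4
```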
1. Create users

```
# export MUNGEUSER=966
# groupadd -g $MUNGEUSER munge
# useradd -m -c "MUNGE Uid 'N' Gid Emporium" -d /var/lib/munge -u $MUNGEUSER -g munge -s /sbin/nologin munge
# export SLURMUSER=967
# groupadd -g $SLURMUSER slurm
# useradd -m -c "SLURM workload manager" -d /var/lib/slurm -u $SLURMUSER -g slurm -s /bin/bash slurm
```

2. Install dependency packages

```
# apt install -y munge libmunge-dev libmung..
```
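Once munge is installed, a shared key must be generated and the service started before any slurm daemons run; a minimal sketch using the standard munge tools and the default Debian/Ubuntu paths (not part of the truncated post):

```
# dd if=/dev/urandom bs=1 count=1024 of=/etc/munge/munge.key
# chown munge:munge /etc/munge/munge.key
# chmod 0400 /etc/munge/munge.key
# systemctl enable --now munge
# munge -n | unmunge    # sanity check: encode and decode a credential locally
```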
1. Commands

| User Commands | PBS | Slurm |
|---|---|---|
| Job submission | qsub [script_file] | sbatch [script_file] |
| Job deletion | qdel [job_id] | scancel [job_id] |
| Job status (by job) | qstat [job_id] | squeue -j [job_id] |
| Job status (by user) | qstat -u [user_name] | squeue -u [user_name] |
| Job hold | qhold [job_id] | scontrol hold [job_id] |
| Job release | qrls [job_id] | scontrol release [job_id] |
| Queue list | qstat -Q | squeue |
| Node list | pbsnodes -l | .. |
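One practical consequence of this mapping: a batch script can carry both sets of directives, since each scheduler treats the other's lines as plain comments, so the same file works under both qsub and sbatch. A minimal sketch (resource values are hypothetical):

```
#!/bin/bash
#PBS -N hello
#PBS -l select=1:ncpus=4
#PBS -l walltime=00:10:00
#SBATCH --job-name=hello
#SBATCH --nodes=1
#SBATCH --cpus-per-task=4
#SBATCH --time=00:10:00
hostname
```

Submit with `qsub job.sh` on PBS or `sbatch job.sh` on Slurm.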
Use the sinfo command to check for nodes in the "idle" state:

```
$ sinfo
PARTITION AVAIL  TIMELIMIT   NODES  STATE  NODELIST
short*    up     4:00:00     11     down*  gpu_a100n[01-06],gpu_v100n[01-05]
short*    up     4:00:00     1      alloc  node43
short*    up     4:00:00     6      idle   gpu_v100n[06-08],node[44-46]
normal    up     1-00:00:00  11     down*  gpu_a100n[01-06],gpu_v100n[01-05]
normal    up     1-00:00:00  1      alloc  node43
normal    up     1-00:00:00  6      idle   gpu_v100n[06-08],node[44-46]
..
```
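sinfo can also filter by state directly, which avoids scanning the full partition table; a small sketch using standard sinfo flags:

```
$ sinfo -N -t idle              # one line per idle node
$ sinfo --Node --states=idle    # equivalent long form
```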