- build
- SLURM
- ubuntu
- infiniband
- 1.9
- patch
- nvidia
- HPE
- Linux
- hpcm patch
- rhel
- GPU
- Docker
- LUSTRE
- Source
- Cray
- client
- AMD
- Singularity
- HPCM
- gpfs
- CUDA
- 1.10
- v1.9
- version
- top500
- HPFSS
- CPU
- java
- PFSS
Post list: Applications (48)
HPE CRAY Material Sharing
Notes on installing slurm + pyxis + enroot on RHEL 8.6
1. Install dependency packages
# yum groupinstall "Development Tools"
# yum install jna python3-docutils python3-devel kernel-rpm-macros \
  gcc-gfortran golang bzip2-devel pam-devel readline-devel java-1.8.0-openjdk-devel \
  python39 python39-devel python39-pip libatomic libatomic-static \
  mariadb mariadb-server mariadb-devel tcl-devel tk-devel libseccomp-devel \
  perl perl..
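The excerpt is truncated after the dependency list; the next step is normally to build and install the Slurm RPMs, as the gres.conf excerpt below also shows. A minimal sketch, assuming the slurm-22.05.6 tarball from SchedMD's download site (the URL and install step are not part of this excerpt):
- Download the source tarball and build binary RPMs
$ wget https://download.schedmd.com/slurm/slurm-22.05.6.tar.bz2
$ rpmbuild -ta slurm-22.05.6.tar.bz2
- Install the resulting packages
# cd ${HOME}/rpmbuild/RPMS/x86_64
# yum localinstall slurm-*.rpm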
※ A simple example for using slurm gres.conf
- Install the CUDA toolkit
$ wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
$ sudo sh cuda_11.8.0_520.61.05_linux.run
- Add "--with-nvml" to the rpmbuild options
$ rpmbuild --define "_with_nvml --with-nvml=/usr/local/cuda-11.8" -ta slurm-22.05.6.tar.bz2
- Verify that the GPU library is included
$ cd ${HOME}/rpmbuild/RPMS/x86_64
$ rpm -qlp slur..
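The excerpt stops before the configuration itself; a minimal sketch of what a gres.conf and the matching slurm.conf entries typically look like when Slurm is built with NVML support (the node name, CPU and GPU counts are assumptions, not from the original post):
[/etc/slurm/gres.conf]
AutoDetect=nvml
[/etc/slurm/slurm.conf]
GresTypes=gpu
NodeName=gpu01 Gres=gpu:2 CPUs=64 State=UNKNOWN
A quick check after restarting slurmd is to request a GPU through the GRES plugin:
$ srun --gres=gpu:1 nvidia-smi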
[Compare nm-settings with ifcfg-* directives (IPv4)]

nmcli con mod | ifcfg-* file | Effect
---|---|---
ipv4.method manual | BOOTPROTO=none | IPv4 address configured statically
ipv4.method auto | BOOTPROTO=dhcp | Will look for configuration settings from a DHCPv4 server
ipv4.address "192.168.0.10/24" | IPADDR=192.168.0.10 PREFIX=24 | Set static IPv4 address, network prefix
ipv4.gateway 192.168.0.1 | GATEWAY=192.168.0.1 | Set IPv..
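The left-hand column maps directly onto nmcli commands; a small sketch that applies the static settings from the table (the connection name "eth0" is an assumption):
$ nmcli con mod eth0 ipv4.method manual ipv4.address "192.168.0.10/24" ipv4.gateway 192.168.0.1
$ nmcli con up eth0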
1. Install dependency packages
# yum groupinstall "Development Tools"
# yum install gcc-gfortran golang tcl-devel tk-devel
2. Environment Modules Source Build
- Source Download page : https://modules.sourceforge.net
- Source Build
# wget https://sourceforge.net/projects/modules/files/Modules/modules-5.2.0/modules-5.2.0.tar.gz/download -O modules-5.2.0.tar.gz
# tar xvzf modules-5.2.0.tar.gz
# cd modules-5.2.0
# ..
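The build commands are cut off; a minimal sketch of how an Environment Modules source build typically finishes (the install prefix is an assumption):
- Configure, build and install
# ./configure --prefix=/opt/modules-5.2.0
# make
# make install
- Enable the module command in the current shell (per-shell init scripts live under the prefix's init/ directory)
# source /opt/modules-5.2.0/init/bash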
# ipmitool
No command provided!
Commands:
  raw       Send a RAW IPMI request and print response
  i2c       Send an I2C Master Write-Read command and print response
  spd       Print SPD info from remote I2C device
  lan       Configure LAN Channels
  chassis   Get chassis status and set power state
  power     Shortcut to chassis power commands
  event     Send pre-defined events to MC
  mc        Management Controller status and global enables
  sdr ..
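A few common invocations, as a hedged sketch (the BMC address and credentials are placeholders):
- Query and control chassis power on a remote BMC over lanplus
$ ipmitool -I lanplus -H 10.0.0.100 -U admin -P 'password' chassis power status
$ ipmitool -I lanplus -H 10.0.0.100 -U admin -P 'password' power cycle
- Read sensor data records and LAN channel settings on the local host
# ipmitool sdr list
# ipmitool lan print 1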
- Linux open file count
# cat /proc/sys/fs/file-nr
- drop_caches
Clear pagecache:
# echo 1 > /proc/sys/vm/drop_caches
Clear dentries and inodes:
# echo 2 > /proc/sys/vm/drop_caches
Clear pagecache, dentries, and inodes:
# echo 3 > /proc/sys/vm/drop_caches
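A common refinement (not in the excerpt) is to run sync first so that dirty pages are written back before the caches are dropped:
# sync; echo 3 > /proc/sys/vm/drop_caches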
Notes on using MLDE 0.19.8 in an offline environment with conda
Exporting the environment with conda pack
$ conda create -n mlde_0.19.8 python=3.8
$ source activate mlde_0.19.8
$ conda install conda-pack
$ pip install "determined==0.19.8" "msrest==0.6.21" "backoff==1.10.0" "azure_core==1.22.1"
$ conda pack -n mlde_0.19.8 -o mlde_0.19.8.tar.gz
$ conda deactivate
Installing the environment with conda unpack
$ mkdir -p mlde_0.19.8
$ cd mlde_0.19.8
$ ..
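The excerpt cuts off mid-step; the conda-pack workflow usually continues on the offline host like this (a sketch based on the conda-pack documentation, with the tarball assumed to sit in the parent directory):
$ tar -xzf ../mlde_0.19.8.tar.gz          # extract inside the mlde_0.19.8 directory
$ source ./bin/activate                   # activate the relocated environment
$ conda-unpack                            # rewrite the prefixes recorded at pack time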
Check for groups in /etc/group that have no member accounts listed
# awk -F ":" '{if($4=="") print $0}' /etc/group
Example script that adds each account to its primary group as a supplementary group
#!/bin/sh
while read line
do
    # field 1 = user name, field 4 = primary GID (passwd layout)
    user_name=$(echo $line | awk -F ":" '{print $1}')
    user_group=$(echo $line | awk -F ":" '{print $4}')
    if [ ${user_group} -eq 0 ]; then
        echo "disallow root"
    else
        # echo "UID: ${user_name}, GID: ${user_group}"
        usermod -aG ${user_group} ${user_name}
    fi
done
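The loop reads from standard input and parses ":"-separated fields that match the /etc/passwd layout, so the script is presumably fed that file; a hedged usage sketch (the script file name is hypothetical):
# sh add_primary_group.sh < /etc/passwd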
[/etc/ssh/sshd_config]
#Port 22
Port 22022
#PermitRootLogin yes
PermitRootLogin no
Match Address 192.168.0.0/24
    PermitRootLogin yes
※ Change the sshd port to 22022
※ Block the root account, and allow root login only from "192.168.0.0/24"
[/etc/ssh/ssh_config]
Host *
    Port 22022
※ Configure the changed nodes to use port 22022 for ssh connections between one another
Restart the service
# systemctl restart sshd.service
# systemctl status sshd.service
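Before restarting, the configuration can be syntax-checked and the new port tested; on SELinux-enforcing RHEL hosts the nonstandard port usually also has to be registered (a sketch, not part of the original excerpt; the user and host names are placeholders):
# sshd -t                                        # prints nothing if sshd_config is valid
# semanage port -a -t ssh_port_t -p tcp 22022    # allow sshd to bind the new port under SELinux
$ ssh -p 22022 admin@node01                      # test from another node after the restart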
- Cause: the NVIDIA HPC SDK requires the "libatomic.so.1" library, but the "Development Tools" group on RedHat 8 does not include the libatomic package
- Error #1
# mpicc --version
/apps/nvidia/hpc_sdk/Linux_x86_64/22.11/comm_libs/openmpi/openmpi-3.1.5/bin/.bin/mpicc: error while loading shared libraries: libatomic.so.1: cannot open shared object file: No such file or directory
- Error #2
# ldd /apps/nvidia/hpc_sdk/Linux_x86_64/22.11/co..
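The excerpt stops before the fix; given the cause described above, the usual remedy is to install libatomic from the base repository (it also appears in the dependency list of the first post) and re-check the linked libraries. A sketch, not the original post's wording:
# yum install libatomic
# ldd /apps/nvidia/hpc_sdk/Linux_x86_64/22.11/comm_libs/openmpi/openmpi-3.1.5/bin/.bin/mpicc | grep libatomic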