List: All categories (92)
HPE CRAY resource sharing
1. Install the dependency packages
    # yum groupinstall "Development Tools"
    # yum install gcc-gfortran golang tcl-devel tk-devel
2. Environment Modules source build
- Source download page: https://modules.sourceforge.net
- Source build
    # wget https://sourceforge.net/projects/modules/files/Modules/modules-5.2.0/modules-5.2.0.tar.gz/download -O modules-5.2.0.tar.gz
    # tar xvzf modules-5.2.0.tar.gz
    # cd modules-5.2.0
    # ..
    # ipmitool
    No command provided!
    Commands:
        raw      Send a RAW IPMI request and print response
        i2c      Send an I2C Master Write-Read command and print response
        spd      Print SPD info from remote I2C device
        lan      Configure LAN Channels
        chassis  Get chassis status and set power state
        power    Shortcut to chassis power commands
        event    Send pre-defined events to MC
        mc       Management Controller status and global enables
        sdr      ..
- Number of open files on Linux
    # cat /proc/sys/fs/file-nr
- drop_caches
  Clear the page cache:
    # echo 1 > /proc/sys/vm/drop_caches
  Clear the dentries and inodes caches:
    # echo 2 > /proc/sys/vm/drop_caches
  Clear the page cache, dentries, and inodes caches:
    # echo 3 > /proc/sys/vm/drop_caches
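The three numbers in /proc/sys/fs/file-nr are the allocated file handles, the allocated but unused handles, and the system-wide maximum. The following is a minimal sketch that labels those fields; the function name `describe_file_nr` is hypothetical and not part of the original post.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): label the three
# fields of one line in the /proc/sys/fs/file-nr format.
describe_file_nr() {
    # Unquoted on purpose: word splitting separates the three
    # whitespace-delimited fields into $1, $2 and $3.
    set -- $1
    echo "allocated=$1 unused=$2 max=$3"
}

# Example on a Linux host:
#   describe_file_nr "$(cat /proc/sys/fs/file-nr)"
```

Because drop_caches only frees clean objects, running `sync` before writing to /proc/sys/vm/drop_caches can increase the amount of cache that is released.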
How to use MLDE 0.19.8 in an offline environment with conda
Exporting the packages with conda pack
    $ conda create -n mlde_0.19.8 python=3.8
    $ source activate mlde_0.19.8
    $ conda install conda-pack
    $ pip install "determined==0.19.8" "msrest==0.6.21" "backoff==1.10.0" "azure_core==1.22.1"
    $ conda pack -n mlde_0.19.8 -o mlde_0.19.8.tar.gz
    $ conda deactivate
Installing the packages with conda unpack
    $ mkdir -p mlde_0.19.8
    $ cd mlde_0.19.8
    $ ..
Check for groups in /etc/group that have no member accounts listed
    # awk -F ":" '{if($4=="") print $0}' /etc/group
Example script that adds each account to the member list of its primary group
    #!/bin/sh
    while read line
    do
        user_name=$(echo "$line" | awk -F ":" '{print $1}')
        user_group=$(echo "$line" | awk -F ":" '{print $4}')
        if [ "${user_group}" -eq 0 ]; then
            echo "disallow root"
        else
            # echo "UID: ${user_name}, GID: ${user_group}"
            usermod -aG "${user_group}" "${user_name}"
        fi
    done
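Before running a script like the one above against real accounts, it can help to preview what it would do. The following is a dry-run sketch that prints the `usermod` commands instead of executing them. It assumes the input is in /etc/passwd format (name:password:UID:GID:...), which the excerpt does not state explicitly; the function name `plan_group_adds` is hypothetical.

```shell
#!/bin/sh
# Dry-run sketch (hypothetical helper, not from the original post):
# print the usermod commands instead of executing them.
# Assumption: standard input is in passwd format, name:passwd:UID:GID:...
plan_group_adds() {
    while IFS=: read -r user_name pw uid user_group rest
    do
        # Skip blank or malformed lines that have no numeric GID field.
        case "$user_group" in
            ''|*[!0-9]*) continue ;;
        esac
        if [ "$user_group" -eq 0 ]; then
            echo "skip ${user_name}: disallow root"
        else
            echo "usermod -aG ${user_group} ${user_name}"
        fi
    done
}

# Example:
#   plan_group_adds < /etc/passwd
```

Splitting the fields with `IFS=:` in `read` avoids starting two `awk` processes per line, and the printed commands can be reviewed before anything is changed.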
[/etc/ssh/sshd_config]
    #Port 22
    Port 22022
    #PermitRootLogin yes
    PermitRootLogin no
    Match Address 192.168.0.0/24
        PermitRootLogin yes
※ Change the sshd port to 22022.
※ Block root login in general and allow root access only from "192.168.0.0/24".
[/etc/ssh/ssh_config]
    Host *
        Port 22022
※ Configure the changed nodes to use port 22022 for ssh connections between them.
Restart the service
    # systemctl restart sshd.service
    # systemctl status sshd.service
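Since a mistake in sshd_config can lock out remote access, it may be worth confirming which ports are actually configured before restarting the service. The following is a minimal sketch; the function name `list_sshd_ports` is hypothetical, and it only reads the file passed to it (it does not follow `Include` directives).

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): print every active
# (uncommented) Port value in an sshd_config-style file.
list_sshd_ports() {
    # Keywords are case-insensitive; "#Port 22" is skipped because its
    # first field is "#Port", not "port".
    awk 'tolower($1) == "port" { print $2 }' "$1"
}

# Example:
#   list_sshd_ports /etc/ssh/sshd_config
```

Running `sshd -t` before the restart additionally checks the configuration file for syntax errors.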
- See the MLDE notes for reference
Export
    # cm image capture -i hpe-mlde-login-0.17.15-rhel-8.5 -n
    # tar -C /opt/clmgr/image/images --numeric-owner --xattrs --acls -cpvzf hpe-mlde-login-0.17.15-rhel-8.5.tar.gz hpe-mlde-login-0.17.15-rhel-8.5
Import
    # tar -C /opt/clmgr/image/images --xattrs --acls --xattrs-include=* -zpxvf hpe-mlde-master-0.17.15-rhel-8.5.tar.gz
    # cm image create -i hpe-mlde-master-0.17.15-rhel-8.5 --..
- Cause: The NVIDIA HPC SDK requires the "libatomic.so.1" library, but the "Development Tools" group on Red Hat 8 does not include the libatomic library.
- Error #1
    # mpicc --version
    /apps/nvidia/hpc_sdk/Linux_x86_64/22.11/comm_libs/openmpi/openmpi-3.1.5/bin/.bin/mpicc: error while loading shared libraries: libatomic.so.1: cannot open shared object file: No such file or directory
- Error #2
    # ldd /apps/nvidia/hpc_sdk/Linux_x86_64/22.11/co..
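A missing shared library such as libatomic.so.1 shows up in `ldd` output as a line ending in "=> not found". The following is a small sketch that filters those lines; the function name `missing_libs` is hypothetical, and the binary path in the usage comment is a placeholder rather than the truncated path from the excerpt.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): read ldd output on
# standard input and print the names of libraries that were not resolved.
missing_libs() {
    # ldd reports unresolved dependencies as "<name> => not found".
    awk '/=> not found/ { print $1 }'
}

# Example (placeholder path):
#   ldd /path/to/binary | missing_libs
```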
1. Intel 1-Socket Server (s7200AP - KNL)
Event Data (RAW) | Event Data 1 | Event Data 2 | Event Data 3 | DIMM Slot |
---|---|---|---|---|
a00000 | a0=Correctable Error | 00=N/A | 00=CPU1-CH=A | P1-DimmA |
a00001 | a0=Correctable Error | 00=N/A | 01=CPU1-CH=B | P1-DimmB |
a00002 | a0=Correctable Error | 00=N/A | 02=CPU1-CH=C | P1-DimmC |
a00003 | a0=Correctable Error | 00=N/A | 03=CPU1-CH=D | P1-DimmD |
a00004 | a0=Correctable Error | 00=N/A | 04=CPU1-CH=E | P1-DimmE |
a0..
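The mapping listed above can be expressed as a small lookup. The following sketch decodes only the five raw values visible in the excerpt (a00000 through a00004) and reports anything else as unknown, since the rest of the table is truncated; the function name `decode_dimm_event` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): decode a 6-digit raw
# event data value into the DIMM slot, covering only the rows shown above.
decode_dimm_event() {
    raw=$1
    # Event Data 1 is the first byte, Event Data 3 is the third byte.
    ed1=$(printf '%s' "$raw" | cut -c1-2)
    ed3=$(printf '%s' "$raw" | cut -c5-6)
    if [ "$ed1" != "a0" ]; then
        echo "unknown event: $raw"
        return 1
    fi
    case "$ed3" in
        00) ch=A ;;
        01) ch=B ;;
        02) ch=C ;;
        03) ch=D ;;
        04) ch=E ;;
        *)  echo "unknown channel: $raw"; return 1 ;;
    esac
    echo "Correctable Error CPU1-CH=${ch} P1-Dimm${ch}"
}

# Example:
#   decode_dimm_event a00002
```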
- Error message
    The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'address'
    The error appears to be in '/root/ece-installer/ansible/config-server.yml': line 43, column 7, but may be elsewhere in the file depending on the exact syntax problem.
    The offending line appears to be:
    - name: Update the /etc/hosts file with entries for all nodes
    here
- Cause of the error: ..