HPE CRAY Resource Sharing
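Check for groups in /etc/group that have no member accounts
    # awk -F ":" '{if($4=="") print $0}' /etc/group
Example script that adds each account to its own primary group
    #!/bin/sh
    # Reads passwd-format lines (name:passwd:UID:GID:...) from standard input.
    while IFS= read -r line
    do
        user_name=$(echo "$line" | awk -F ":" '{print $1}')
        user_group=$(echo "$line" | awk -F ":" '{print $4}')
        if [ "${user_group}" -eq 0 ]; then
            # Skip accounts whose primary GID is 0 (root).
            echo "disallow root"
        else
            # echo "UID: ${user_name}, GID: ${user_group}"
            usermod -aG "${user_group}" "${user_name}"
        fi
    done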
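Because the script above modifies accounts as it goes, it can help to preview the commands first. The following is a minimal dry-run sketch, assuming the input is a passwd-format file passed as an argument; the function names `empty_groups` and `plan_primary_group_adds` are hypothetical and do not come from the original post.

```shell
#!/bin/sh
# Print the names of groups whose member field (4th field) is empty.
# $1: a file in /etc/group format.
empty_groups() {
    awk -F ":" 'NF >= 4 && $4 == "" { print $1 }' "$1"
}

# Print (do not run) one usermod command per account, skipping GID 0.
# $1: a file in /etc/passwd format.
plan_primary_group_adds() {
    awk -F ":" 'NF >= 4 && $4 != "" && $4 != 0 {
        printf "usermod -aG %s %s\n", $4, $1
    }' "$1"
}
```

For example, `plan_primary_group_adds /etc/passwd` lists the commands for review before anything is changed.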
[/etc/ssh/sshd_config]
    #Port 22
    Port 22022
    #PermitRootLogin yes
    PermitRootLogin no
    Match Address 192.168.0.0/24
        PermitRootLogin yes
※ Change the sshd port to 22022.
※ Block root login in general, and allow root access only from "192.168.0.0/24".
[/etc/ssh/ssh_config]
    Host *
        Port 22022
※ Configure the modified nodes to use port 22022 for ssh connections between them.
Restart the service
    # systemctl restart sshd.service
    # systemctl status sshd.service
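Before restarting sshd it is worth confirming which port the edited file actually sets. The sketch below is one way to do that, assuming a plain sshd_config-style file; the helper name `sshd_port` is hypothetical. It relies on the fact that sshd uses the first value it finds for a keyword and ignores commented lines.

```shell
#!/bin/sh
# Print the first active (uncommented) Port value in an sshd_config-style file.
# Keywords are case-insensitive, so the match is done on the lowercased keyword.
# $1: path to the config file.
sshd_port() {
    awk 'tolower($1) == "port" { print $2; exit }' "$1"
}
```

A possible use is `sshd_port /etc/ssh/sshd_config`, followed by `sshd -t` to validate the whole file before running `systemctl restart sshd.service`.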
- Refer to the MLDE notes
Export
    # cm image capture -i hpe-mlde-login-0.17.15-rhel-8.5 -n
    # tar -C /opt/clmgr/image/images --numeric-owner --xattrs --acls -cpvzf hpe-mlde-login-0.17.15-rhel-8.5.tar.gz hpe-mlde-login-0.17.15-rhel-8.5
Import
    # tar -C /opt/clmgr/image/images --xattrs --acls --xattrs-include='*' -zpxvf hpe-mlde-master-0.17.15-rhel-8.5.tar.gz
    # cm image create -i hpe-mlde-master-0.17.15-rhel-8.5 --..
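The tar half of the export/import procedure can be tried on a scratch directory before touching real images. This is a simplified sketch, not the procedure from the post: the function names `pack_image` and `unpack_image` are hypothetical, and the `--xattrs` and `--acls` flags are deliberately left out so that it runs on any filesystem. For real HPCM images those flags should be kept as shown above.

```shell
#!/bin/sh
# Pack one image directory into a gzip-compressed archive.
# $1: directory that contains the image, $2: image name, $3: output archive.
pack_image() {
    tar -C "$1" --numeric-owner -cpzf "$3" "$2"
}

# Unpack an archive into the target images directory, preserving permissions.
# $1: target directory, $2: archive to extract.
unpack_image() {
    tar -C "$1" -xpzf "$2"
}
```

Archiving relative to the images directory with `-C` keeps the image name as the top-level entry, so the archive unpacks to the same layout on the destination host.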
- Cause: the NVIDIA HPC SDK requires the "libatomic.so.1" library, but the "Development Tools" group on RedHat 8 does not include the libatomic library.
- Error #1
    # mpicc --version
    /apps/nvidia/hpc_sdk/Linux_x86_64/22.11/comm_libs/openmpi/openmpi-3.1.5/bin/.bin/mpicc: error while loading shared libraries: libatomic.so.1: cannot open shared object file: No such file or directory
- Error #2
    # ldd /apps/nvidia/hpc_sdk/Linux_x86_64/22.11/co..
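When a binary fails with "cannot open shared object file", filtering the `ldd` output for unresolved entries shows every missing library at once. A minimal sketch, assuming standard `ldd` output on stdin; the function name `missing_libs` is hypothetical.

```shell
#!/bin/sh
# Read `ldd` output on stdin and print the libraries reported as "not found".
missing_libs() {
    awk '/=> not found/ { print $1 }'
}
```

It would be used as `ldd /path/to/binary | missing_libs`, where the path is whichever binary reported the error.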
1. Intel 1-Socket Server (s7200AP - KNL)
Event Data (RAW) | Event Data 1 | Event Data 2 | Event Data 3 | DIMM Slot |
---|---|---|---|---|
a00000 | a0=Correctable Error | 00=N/A | 00=CPU1-CH=A | P1-DimmA |
a00001 | a0=Correctable Error | 00=N/A | 01=CPU1-CH=B | P1-DimmB |
a00002 | a0=Correctable Error | 00=N/A | 02=CPU1-CH=C | P1-DimmC |
a00003 | a0=Correctable Error | 00=N/A | 03=CPU1-CH=D | P1-DimmD |
a00004 | a0=Correctable Error | 00=N/A | 04=CPU1-CH=E | P1-DimmE |
a0..
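The table lookup above can be scripted for the rows that are visible. The sketch below splits the raw value into its three bytes and maps only the entries shown (event type a0, channels 00 to 04). Anything else is reported as unknown rather than guessed, since the rest of the table is cut off. The function name `decode_dimm_event` is hypothetical.

```shell
#!/bin/sh
# Decode a 6-hex-digit raw event value (e.g. a00002) using the visible table rows.
# Byte 1 = event type, byte 2 = unused (N/A), byte 3 = CPU channel.
decode_dimm_event() {
    raw=$1
    type=$(printf '%s' "$raw" | cut -c1-2)
    chan=$(printf '%s' "$raw" | cut -c5-6)
    case "$type" in
        a0) desc="Correctable Error" ;;
        *)  desc="unknown event type" ;;
    esac
    case "$chan" in
        00) slot="P1-DimmA" ;;
        01) slot="P1-DimmB" ;;
        02) slot="P1-DimmC" ;;
        03) slot="P1-DimmD" ;;
        04) slot="P1-DimmE" ;;
        *)  slot="unknown slot" ;;
    esac
    printf '%s: %s, %s\n' "$raw" "$desc" "$slot"
}
```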
- Error
    The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'address'
    The error appears to be in '/root/ece-installer/ansible/config-server.yml': line 43, column 7, but may be elsewhere in the file depending on the exact syntax problem.
    The offending line appears to be:
    - name: Update the /etc/hosts file with entries for all nodes
    here
- Cause: ..
1. Before the fix
    # curl -X GET "admin:9200/_cluster/health?pretty"
    {
      "cluster_name" : "hpcm_cluster",
      "status" : "yellow",
      "timed_out" : false,
      "number_of_nodes" : 1,
      "number_of_data_nodes" : 1,
      "active_primary_shards" : 520,
      "active_shards" : 520,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 519,
      "delayed_unassigned_shards" : 0,
      "number_of_pending_tasks" : 0,
      "number_of_in_f..
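To compare the cluster state before and after a change, the two fields that matter here (`status` and `unassigned_shards`) can be pulled out of the health response. A minimal sketch using only `sed`, assuming the JSON arrives on stdin in the layout shown above; the function names are hypothetical, and a real JSON parser would be more robust.

```shell
#!/bin/sh
# Print the value of "status" from a _cluster/health response on stdin.
cluster_status() {
    sed -n 's/.*"status" *: *"\([^"]*\)".*/\1/p'
}

# Print the value of "unassigned_shards" (not "delayed_unassigned_shards").
# The opening quote in the pattern prevents a match on the longer key.
unassigned_shards() {
    sed -n 's/.*"unassigned_shards" *: *\([0-9][0-9]*\).*/\1/p'
}
```

For example, `curl -s -X GET "admin:9200/_cluster/health?pretty" | cluster_status` prints only the status word.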
1. Network configuration
    # nmcli con mod enp0s8 ipv4.address 192.168.56.10/24
    # nmcli con mod enp0s8 ipv4.method manual
    # nmcli con mod enp0s8 connection.autoconnect yes
    # nmcli con up enp0s8
2. Yum local repo configuration
    [AppStream]
    name=AppStream
    baseurl=file:///mnt/AppStream
    enabled=1
    gpgcheck=0
    [BaseOS]
    name=BaseOS
    baseurl=file:///mnt/BaseOS
    enabled=1
    gpgcheck=0
3. HOSTNAME configuration
    # hostnamectl set-hostname mgmt
4...
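The two repo stanzas in step 2 differ only in the repository name, so they can be generated rather than typed. A small sketch, assuming the installation media is mounted at /mnt as in the post; the function name `local_repo_stanza` and the output file name used below are hypothetical.

```shell
#!/bin/sh
# Print a yum repo stanza for a local repository directory.
# $1: repository name (e.g. BaseOS), $2: mount point (defaults to /mnt).
local_repo_stanza() {
    name=$1
    mount=${2:-/mnt}
    printf '[%s]\nname=%s\nbaseurl=file://%s/%s\nenabled=1\ngpgcheck=0\n' \
        "$name" "$name" "$mount" "$name"
}
```

One way to use it: `for r in AppStream BaseOS; do local_repo_stanza "$r"; done > /etc/yum.repos.d/local.repo`.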
1. Check via /proc
    $ cat /proc/driver/nvidia/version
    NVRM version: NVIDIA UNIX x86_64 Kernel Module 450.51.06 Sun Jul 19 20:02:54 UTC 2020
    GCC version: gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
2. Check with the nvidia-smi command
    $ nvidia-smi --query-gpu=driver_version --format=csv,noheader
    450.51.06
    450.51.06
    450.51.06
    450.51.06
    450.51.06
    450.51.06
    450.51.06
    450.51.06
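The two checks above can be combined to confirm that the loaded kernel module and every GPU report the same driver version. A minimal sketch, assuming the output formats shown above; the function names `nvrm_version` and `all_match` are hypothetical.

```shell
#!/bin/sh
# Read /proc/driver/nvidia/version-style text on stdin and print the first
# dotted version number found on the "NVRM version:" line.
nvrm_version() {
    awk '/^NVRM version:/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9]+\.[0-9]+(\.[0-9]+)?$/) { print $i; exit }
    }'
}

# Succeed (exit 0) only if every line on stdin equals the expected version.
# $1: expected version string.
all_match() {
    awk -v v="$1" '$0 != v { bad = 1 } END { exit (bad ? 1 : 0) }'
}
```

A possible use: `v=$(nvrm_version < /proc/driver/nvidia/version)` followed by `nvidia-smi --query-gpu=driver_version --format=csv,noheader | all_match "$v"`.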
1. Use the xtcheckhss command to find the Bus Address
2. Log in to the blade with xtlogin and check the AOC cable information for that bus
3. Run cat on /sys/bus/i2c/devices/{BusAddr}/vendor_part_number/vendor_part_number
4. Run cat on /sys/bus/i2c/devices/{BusAddr}/vendor_serial_number/vendor_serial_number
[Example: c1-0c1s14 blade]
    # xtcheckhss --cclist=none --bclist=c1-0c1s14 --detail=f
    # xtlogin c1-0c1s14
    # cat /sys/bus/i2c/devices/1-0054/vendor_part_nu..