HPE CRAY resource sharing

- Error message
The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'address'
The error appears to be in '/root/ece-installer/ansible/config-server.yml': line 43, column 7, but may be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
- name: Update the /etc/hosts file with entries for all nodes
here
- Cause : ..

1. Before the fix
# curl -X GET "admin:9200/_cluster/health?pretty"
{
  "cluster_name" : "hpcm_cluster",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 520,
  "active_shards" : 520,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 519,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_f..

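When comparing the cluster state before and after a change, the two fields worth watching in the output above are `status` and `unassigned_shards`. On a single-node cluster, a yellow status with about as many unassigned as active shards usually means replica shards have no second node to be allocated to. The sketch below extracts both fields with standard tools. `health_field` and `health_summary` are hypothetical helper names, not part of HPCM or Elasticsearch, and the parsing assumes one field per line, as `?pretty` prints it.

```shell
# health_field NAME
# Reads pretty-printed _cluster/health JSON on stdin and prints the value of NAME.
# Assumes one "key" : value pair per line, as ?pretty prints it.
health_field() {
  grep "\"$1\"" | head -n 1 |
    sed -e 's/^[^:]*:[[:space:]]*//' -e 's/[",]//g' -e 's/[[:space:]]*$//'
}

# health_summary
# Reads the same JSON on stdin and prints a one-line summary.
health_summary() {
  json=$(cat)
  status=$(printf '%s\n' "$json" | health_field status)
  unassigned=$(printf '%s\n' "$json" | health_field unassigned_shards)
  echo "status=$status unassigned_shards=$unassigned"
}

# Usage against the endpoint from the post:
#   curl -s -X GET "admin:9200/_cluster/health?pretty" | health_summary
```

Running the summary before and after the fix gives two short lines that are easy to compare side by side.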
1. Network configuration
# nmcli con mod enp0s8 ipv4.address 192.168.56.10/24
# nmcli con mod enp0s8 ipv4.method manual
# nmcli con mod enp0s8 connection.autoconnect yes
# nmcli con up enp0s8
2. Yum local repo configuration
[AppStream]
name=AppStream
baseurl=file:///mnt/AppStream
enabled=1
gpgcheck=0
[BaseOS]
name=BaseOS
baseurl=file:///mnt/BaseOS
enabled=1
gpgcheck=0
3. Hostname configuration
# hostnamectl set-hostname mgmt
4...

1. Check via /proc
$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 450.51.06 Sun Jul 19 20:02:54 UTC 2020
GCC version: gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
2. Check with the nvidia-smi command
$ nvidia-smi --query-gpu=driver_version --format=csv,noheader
450.51.06
450.51.06
450.51.06
450.51.06
450.51.06
450.51.06
450.51.06
450.51.06

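Both checks above print more than the bare version: the /proc file wraps it in a longer line, and nvidia-smi repeats it once per GPU. The following is a minimal sketch for reducing each to a single version string, for example when collecting versions from many nodes. `nvrm_version` and `unique_versions` are hypothetical helper names, and the pattern assumes the "Kernel Module <version>" wording shown in the post; other driver builds may word that line differently.

```shell
# nvrm_version
# Reads the contents of /proc/driver/nvidia/version on stdin and prints only
# the kernel module version. Prints nothing if the line layout differs from
# the "Kernel Module <version>" form assumed here.
nvrm_version() {
  sed -n 's/^NVRM version:.*Kernel Module[[:space:]]*\([0-9][0-9.]*\).*/\1/p'
}

# unique_versions
# Reads one version per line (for example nvidia-smi output, one line per GPU)
# and prints each distinct version once.
unique_versions() {
  sort -u
}

# Usage on a GPU node:
#   nvrm_version < /proc/driver/nvidia/version
#   nvidia-smi --query-gpu=driver_version --format=csv,noheader | unique_versions
```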
1. Find the Bus Address with the xtcheckhss command.
2. Log in to the blade with xtlogin and check the AOC cable information for that bus.
3. Use cat to read /sys/bus/i2c/devices/{BusAddr}/vendor_part_number/vendor_part_number
4. Use cat to read /sys/bus/i2c/devices/{BusAddr}/vendor_serial_number/vendor_serial_number
[Example: blade c1-0c1s14]
# xtcheckhss --cclist=none --bclist=c1-0c1s14 --detail=f
# xtlogin c1-0c1s14
# cat /sys/bus/i2c/devices/1-0054/vendor_part_nu..

Description | Option | Note |
---|---|---|
"bash: orted: command not found" error | --enable-mpirun-prefix-by-default | |
When "#PBS -V" does not work | --with-tm | |
IB | --with-verbs (1.8.x and later) | Before 1.8.x: --with-openib |
OPA | --with-psm2 | |
Lustre filesystem | --with-lustre | |
UCX | --with-ucx | |
- Reference: OpenMPI installation options
$ export LD_LIBRARY_PATH=/opt/pbs/lib:$LD_LIBRARY_PATH
$ export LDFLAGS="-L/opt/pbs/lib -lpbs -lpthread -lcrypto"
$ ./configure -prefix=/apps/compiler/intel/18.0...

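The options listed above can be combined on one `./configure` line depending on what the cluster provides. As a rough sketch, the helper below maps short feature keywords to those options. The function name `ompi_configure_flags` and the keywords (`pbs`, `ib`, `opa`, `lustre`, `ucx`) are made up for this example, and only the option strings come from the list above.

```shell
# ompi_configure_flags FEATURE...
# Prints the ./configure options for the requested features.
# Feature keywords are hypothetical; the options are the ones listed above.
ompi_configure_flags() {
  # Always included: addresses the "orted: command not found" error.
  flags="--enable-mpirun-prefix-by-default"
  for feature in "$@"; do
    case "$feature" in
      pbs)    flags="$flags --with-tm" ;;
      ib)     flags="$flags --with-verbs" ;;
      opa)    flags="$flags --with-psm2" ;;
      lustre) flags="$flags --with-lustre" ;;
      ucx)    flags="$flags --with-ucx" ;;
      *)      echo "unknown feature: $feature" >&2; return 1 ;;
    esac
  done
  echo "$flags"
}

# Usage (the prefix is a placeholder, not the path from the post):
#   ./configure --prefix=/path/to/openmpi $(ompi_configure_flags pbs ucx)
```

Rejecting unknown keywords keeps a typo from silently producing a build without the intended support.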
1. Prepare the patch files
Download the patch files to apply to the CentOS 7.9 kernel from GitHub.
- URL : https://github.com/AMDEPYC/CENTOS-MILAN-SUPPORT-PATCHES
2. Prepare the kernel source for CentOS 7.9
# wget https://vault.centos.org/centos/7/updates/Source/SPackages/kernel-3.10.0-1160.el7.src.rpm --no-check-certificate
3. Install dependency packages
# yum install asciidoc audit-libs-devel binutils-devel bison \
elfutils-devel flex hmaccalc java-d..

1. Install dependency packages
# yum groupinstall "Development Tools"
# yum install openssl-devel wget cryptsetup libuuid-devel libseccomp-devel squashfs-tools
2. Install Golang
$ wget https://golang.org/dl/go1.16.5.linux-amd64.tar.gz
$ tar xvzf go1.16.5.linux-amd64.tar.gz
※ Set PATH for go before continuing. The steps below assume an environment module for go has already been created.
3. Install singularity
$ module load go/1.16.5
$ export VERSION=3.8.5
$ wget https://github.com/hpcng/..

1. Test environment
HPE HPC Partner Lab znode44
2. Write the Dockerfile and build
Example Dockerfile
FROM tensorflow/tensorflow:latest-gpu
RUN pip install tensorflow_datasets
As explained later, when docker is run under a user account it is not easy to install Python packages into the docker image, so write the Dockerfile and build the image first.
$ docker build -t <image>:<tag> .
3. Allocate an interactive Slurm session
$ srun -p short -N 1 -n 1 -w znode44 --pty bash
4. (nvidia) docker command (on znode44)
$ docker run -u $(i..

1. Create users
# export MUNGEUSER=966
# groupadd -g $MUNGEUSER munge
# useradd -m -c "MUNGE Uid 'N' Gid Emporium" -d /var/lib/munge -u $MUNGEUSER -g munge -s /sbin/nologin munge
# export SLURMUSER=967
# groupadd -g $SLURMUSER slurm
# useradd -m -c "SLURM workload manager" -d /var/lib/slurm -u $SLURMUSER -g slurm -s /bin/bash slurm
2. Install dependency packages
# apt install -y munge libmunge-dev libmung..