HPE CRAY 자료 공유
[BMT] HPL benchmark using conda
1. Create the conda environment
```
$ conda create -n hpl_2.3
```
2. Install the HPL dependencies
```
$ source activate hpl_2.3
$ conda install gcc_linux-64 gxx_linux-64 gfortran_linux-64 openmpi mkl mkl-static -c intel
```
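Before building, it can help to confirm that the compiler, MPI, and MKL actually come from the `hpl_2.3` environment rather than the system. A minimal check, using the environment name and install path from the steps above:

```
$ source activate hpl_2.3
$ which mpicc                          # should resolve inside ~/miniconda3/envs/hpl_2.3/bin
$ mpicc --version
$ ls $CONDA_PREFIX/lib | grep -i mkl   # MKL libraries referenced later by LAlib
```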
3. HPL Build
```
$ wget https://www.netlib.org/benchmark/hpl/hpl-2.3.tar.gz
$ tar xvzf hpl-2.3.tar.gz
$ cd hpl-2.3
$ cp setup/Make.Linux_Intel64 Make.Linux_x86_64
$ vi Make.Linux_x86_64          # see [Modifications] below
$ make arch=Linux_x86_64
```
[Modifications]
```
-- omitted --
ARCH         = Linux_x86_64
-- omitted --
TOPdir       = $(HOME)/HPL/hpl-2.3
-- omitted --
MPdir        = $(HOME)/miniconda3/envs/hpl_2.3
MPinc        = -I$(MPdir)/include
MPlib        = -L$(MPdir)/lib -lmpi
-- omitted --
LAdir        = $(HOME)/miniconda3/envs/hpl_2.3
ifndef LAinc
LAinc        = $(LAdir)/include
endif
ifndef LAlib
LAlib        = -L$(LAdir)/lib \
               -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 \
               -lpthread -ldl
endif
-- omitted --
CC           = mpicc
OMP_DEFS     = -fopenmp
CCFLAGS      = $(HPL_DEFS) -O3 -w -z noexecstack -z relro -z now -Wall
-- omitted --
```
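Once `make arch=Linux_x86_64` finishes, a quick sanity check is to confirm that `xhpl` is linked against the MPI and MKL libraries from the conda environment rather than any system copies. A sketch, assuming the binary lands under `bin/Linux_x86_64` of the `TOPdir` configured above:

```
$ cd $HOME/HPL/hpl-2.3/bin/Linux_x86_64
$ ldd ./xhpl | grep -E 'mpi|mkl'   # paths should point into ~/miniconda3/envs/hpl_2.3/lib
```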
4. Test
- Create the input file (HPL.dat)
[Example: 1 node, 32 cores, 512000 MB memory, block size 192]
```
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
231552       Ns
1            # of NBs
192          NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
4            Ps
8            Qs
16.0         threshold
1            # of panel fact
2            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
4            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
1            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
1            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
##### This line (no. 32) is ignored (it serves as a separator). ######
0                            Number of additional problem sizes for PTRANS
1200 10000 30000             values of N
0                            number of additional blocking sizes for PTRANS
40 9 8 13 13 20 16 32 64     values of NB
```
※ For a simple test, it is recommended to generate HPL.dat with https://www.advancedclustering.com/act_kb/tune-hpl-dat-file.
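The problem size `Ns` above (231552) follows the usual sizing rule: choose N so the matrix fills most of the node's memory, then round down to a multiple of NB. A rough sketch of that calculation for the example node (the 0.85 memory fraction is an assumption; the generator linked above applies a similar rule):

```
$ awk 'BEGIN { mem_mb = 512000; nb = 192; frac = 0.85;   # frac: assumed usable memory fraction
               n = sqrt(frac * mem_mb * 1e6 / 8);        # 8 bytes per double-precision element
               printf "N ~ %d\n", int(n / nb) * nb }'
N ~ 233088
```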
- Set up the conda environment and run
```
$ source activate hpl_2.3
$ cd $HOME/HPL/hpl-2.3
$ cd bin/Linux_x86_64
$ mpirun -np 32 ./xhpl
```
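Here `mpirun -np 32` matches P × Q = 4 × 8 from HPL.dat, i.e. one MPI rank per core. Since the build links the threaded MKL (`-lmkl_intel_thread`), limiting each rank to a single OpenMP/MKL thread is a reasonable starting point to avoid oversubscription; a sketch, not a tuned launch line:

```
$ export OMP_NUM_THREADS=1   # one rank per core; keep MKL from spawning extra threads per rank
$ mpirun -np 32 ./xhpl
```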
- How to calculate the HPL theoretical peak performance (Rpeak)
Rpeak (GFlops) = Clock Speed (GHz) × cores per socket × CPU sockets × FLOPs per cycle per core
[Example: AMD EPYC 7543]
2.8 GHz × 32 cores × 2 sockets × 16 FLOPs/cycle = 2867.2 GFlops
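The same arithmetic can be done on the command line; the 16 double-precision FLOPs per cycle per core corresponds to two 256-bit FMA units (AVX2), which matches the EPYC 7543:

```
$ awk 'BEGIN { printf "%.1f GFlops\n", 2.8 * 32 * 2 * 16 }'
2867.2 GFlops
```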
※ Caution: this page is intended for simple testing only and does not cover optimization.
※ Reference: Top500 HPL Calculator (http://hpl-calculator.sourceforge.net)