[BMT] STREAM

CRAY KOREA Blog 2021. 7. 5. 13:59


STREAM: Sustainable Memory Bandwidth in High Performance Computers

- Reference: https://www.cs.virginia.edu/stream

1. Source code download

$ wget https://www.cs.virginia.edu/stream/FTP/Code/Versions/stream_omp.c

2. Compile

$ gcc -O3 -fopenmp -D OPENMP stream_omp.c -o stream_c.x

3. Test

- Write the test script

#!/bin/sh
#SBATCH -J stream
#SBATCH -p short
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -o %x_%j.out
#SBATCH -e %x_%j.err
#SBATCH --time=01:00:00

export OMP_NUM_THREADS=32

/home/hpcd/leesangy/stream/stream_c.x

exit 0

- Submit the job

$ sbatch job.sh
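
To repeat the run with several thread counts, the job script can be generated instead of edited by hand. The following is a minimal sketch: `make_stream_job` and `STREAM_BIN` are hypothetical names, and the `#SBATCH -c` line is an assumption that is not in the script above. On clusters that bind a task to its allocated CPUs, `-n 1` alone may confine all OpenMP threads to a single core, so requesting one CPU per thread is a common precaution.

```shell
#!/bin/sh
# Hypothetical helper: print a Slurm job script for a given OpenMP thread count.
# STREAM_BIN is an assumed variable; point it at your own stream_c.x binary.
STREAM_BIN="${STREAM_BIN:-./stream_c.x}"

make_stream_job() {
    threads="$1"
    cat <<EOF
#!/bin/sh
#SBATCH -J stream_t${threads}
#SBATCH -p short
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -c ${threads}
#SBATCH -o %x_%j.out
#SBATCH -e %x_%j.err
#SBATCH --time=01:00:00

export OMP_NUM_THREADS=${threads}

${STREAM_BIN}
EOF
}

# Show the line that changes between runs, here for 32 threads.
make_stream_job 32 | grep OMP_NUM_THREADS
```

On a login node, something like `make_stream_job 64 > job_64.sh` followed by `sbatch job_64.sh` would submit one run. Whether `-c 64` is accepted depends on how the partition counts hardware threads.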

- The output looks like this:

-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 2000000, Offset = 0
Total memory required = 45.8 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Number of Threads requested = 32
Number of Threads requested = 32
Number of Threads requested = 32
Number of Threads requested = 32
Number of Threads requested = 32
Number of Threads requested = 32
Number of Threads requested = 32
Number of Threads requested = 32
Number of Threads requested = 32
Number of Threads requested = 32
Number of Threads requested = 32
Number of Threads requested = 32
Number of Threads requested = 32
Number of Threads requested = 32
Number of Threads requested = 32
Number of Threads requested = 32
Number of Threads requested = 32
Number of Threads requested = 32
Number of Threads requested = 32
Number of Threads requested = 32
Number of Threads requested = 32
Number of Threads requested = 32
Number of Threads requested = 32
Number of Threads requested = 32
Number of Threads requested = 32
Number of Threads requested = 32
Number of Threads requested = 32
Number of Threads requested = 32
Number of Threads requested = 32
Number of Threads requested = 32
Number of Threads requested = 32
Number of Threads requested = 32
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 87 microseconds.
   (= 87 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:      681308.2640       0.0001       0.0000       0.0001
Scale:     604584.3604       0.0001       0.0001       0.0001
Add:       696631.8062       0.0001       0.0001       0.0001
Triad:     696631.8062       0.0001       0.0001       0.0001
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------
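
When collecting results from several runs, only the four rate lines are needed. The sketch below pulls them out with awk and converts MB/s to GB/s (STREAM counts 1 MB as 10^6 bytes). The function name `stream_rates` is hypothetical, and the sample input is the rate table copied from the output above.

```shell
#!/bin/sh
# stream_rates: read STREAM output on stdin and print "<kernel> <GB/s>" per kernel.
stream_rates() {
    awk '/^(Copy|Scale|Add|Triad):/ {
        name = $1
        sub(/:$/, "", name)              # drop the trailing colon from the kernel name
        printf "%s %.1f\n", name, $2 / 1000
    }'
}

# Example with the rate lines shown above.
stream_rates <<'EOF'
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:      681308.2640       0.0001       0.0000       0.0001
Scale:     604584.3604       0.0001       0.0001       0.0001
Add:       696631.8062       0.0001       0.0001       0.0001
Triad:     696631.8062       0.0001       0.0001       0.0001
EOF
```

With the job script above, the output file is named after the job name and job ID, so a call would look like `stream_rates < stream_<jobid>.out`.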

- Test #1: compiler gcc 4.8.5, options -O2 -fopenmp

Threads	Copy(MB/s)	Scale(MB/s)	Add(MB/s)	Triad(MB/s)
1	35394.9705	21461.1014	34215.9402	31516.3732
10	200026.4203	110014.5311	202541.8431	129387.2699
24	371794.2604	251815.6248	390167.8140	300039.6304
32	464421.2042	323416.2120	480493.0597	377723.4371
64	464421.2042	438620.0261	479349.0286	475949.3901
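
A quick way to read a table like this is to express each row as a speedup over the 1-thread run and as a parallel efficiency (speedup divided by thread count). The sketch below does that for the Triad column of Test #1. `stream_scaling` is a hypothetical helper name, and the first input row is assumed to be the baseline.

```shell
#!/bin/sh
# stream_scaling: stdin holds "<threads> <MB/s>" rows; the first row is the baseline.
# Prints "<threads> <speedup> <efficiency %>" for every row.
stream_scaling() {
    awk 'NR == 1 { base_t = $1; base_r = $2 }
         {
             s = $2 / base_r
             printf "%d %.2f %.1f\n", $1, s, 100 * s * base_t / $1
         }'
}

# Triad column of Test #1 (gcc 4.8.5, -O2 -fopenmp), copied from the table above.
stream_scaling <<'EOF'
1 31516.3732
10 129387.2699
24 300039.6304
32 377723.4371
64 475949.3901
EOF
```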

- Test #2: compiler gcc 9.2.0, options -O3 -fopenmp -march=znver1

Threads	Copy(MB/s)	Scale(MB/s)	Add(MB/s)	Triad(MB/s)
1	51286.8659	36611.4915	49236.1438	45579.9393
10	271146.9253	162491.1961	177068.2427	175218.9661
24	593883.7522	532610.0317	608237.4381	608237.4381
32	681308.2640	604584.3604	696631.8062	696631.8062
64	654720.6244	651542.3689	748425.9926	751218.6269
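
To compare the two builds row by row, the percentage change from Test #1 to Test #2 can be computed per thread count. This is only a sketch: `stream_gain` is a hypothetical name, and the input is the Copy column of both tables. The two tests differ in compiler version and in flags at the same time, so the change cannot be attributed to either one alone.

```shell
#!/bin/sh
# stream_gain: stdin holds "<threads> <MB/s test 1> <MB/s test 2>" rows.
# Prints "<threads> <signed % change from test 1 to test 2>".
stream_gain() {
    awk '{ printf "%d %+.1f\n", $1, 100 * ($3 / $2 - 1) }'
}

# Copy column: Test #1 (gcc 4.8.5) against Test #2 (gcc 9.2.0), from the tables above.
stream_gain <<'EOF'
1 35394.9705 51286.8659
10 200026.4203 271146.9253
24 371794.2604 593883.7522
32 464421.2042 681308.2640
64 464421.2042 654720.6244
EOF
```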

# gcc 9.2 was used in order to enable "-march=znver1", the optimization option for AMD EPYC™ 7xx1 Series processors.

# Reference: https://www.amd.com/system/files/documents/compiler-options-guide-amd-epyc-7xx1-series-processors.pdf

4. Miscellaneous (test environment)

Item	Details	Notes
Hostname	aixl645dn04
Platform	Apollo 6500 Gen10+ XL645d (Bantha)
Processor	1 x 7543	hyperthreading enabled
OS	Ubuntu 20.04
Memory	8 x 64GB 3200MHz = 512GB
GPU	4 x A100-SXM4-40GB
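
The environment above allows a rough sanity check of the measured rates. The sketch below computes the theoretical DRAM peak, the memory footprint of the three STREAM arrays, and the array size suggested by the STREAM run rules (each array at least 4 times the total cache size). The assumptions are mine and not stated in the post: the 8 DIMMs populate 8 separate memory channels, each channel moves 8 bytes per transfer, and the EPYC 7543 has 256 MiB of L3 cache. All function names are hypothetical.

```shell
#!/bin/sh
# peak_bw_gbs CHANNELS MT_PER_S: theoretical DRAM peak in GB/s,
# assuming 8 bytes per transfer on each channel.
peak_bw_gbs() {
    awk -v c="$1" -v r="$2" 'BEGIN { printf "%.1f\n", c * r * 8 / 1000 }'
}

# stream_footprint_mib ELEMENTS: MiB used by the three arrays of 8-byte doubles.
stream_footprint_mib() {
    awk -v n="$1" 'BEGIN { printf "%.1f\n", 3 * n * 8 / 1048576 }'
}

# min_array_elements CACHE_MIB: elements per array so that one array is 4x the cache.
min_array_elements() {
    awk -v m="$1" 'BEGIN { printf "%d\n", 4 * m * 1048576 / 8 }'
}

peak_bw_gbs 8 3200             # assumed: 8 channels of DDR4-3200
stream_footprint_mib 2000000   # array size reported in the output above
min_array_elements 256         # assumed: 256 MiB L3 cache
```

Under these assumptions the DRAM peak is 204.8 GB/s, while the 32-thread Triad rate above is about 696.6 GB/s, and the 45.8 MB footprint is smaller than the assumed L3 cache. This suggests the arrays were served largely from cache, so a larger array size would likely be needed to measure sustained memory bandwidth.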
