[AMD] podman test

CRAY KOREA Blog 2025. 1. 22. 16:53

- OS: RHEL 9.4

- GPU: Radeon PRO W6800

 

1. Create a local Red Hat OS repo

[root@cray ~]# cat /etc/yum.repos.d/local.repo 
[media-baseos]
name=BaseOS
baseurl=file:///data/REPO/rhel9.4/BaseOS
gpgcheck=1
enabled=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release

[media-appstream]
name=AppStream
baseurl=file:///data/REPO/rhel9.4/AppStream
gpgcheck=1
enabled=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release

 

2. Download the AMD GPU driver & ROCm

- Running wget as in the example below downloads all of the RPM files from the repositories AMD provides in one pass.

# wget -e robots=off -r -np 'https://repo.radeon.com/amdgpu/latest/rhel/9.4/main/x86_64'
# wget -e robots=off -r -np 'https://repo.radeon.com/rocm/rhel9/latest/main'

- The "-e robots=off" option tells wget to ignore the site's robots.txt, which would otherwise block the recursive crawl.
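
- A recursive wget run normally saves files under a directory named after the host, followed by the URL path. The sketch below only computes that location for the two URLs above, assuming wget's default layout (no -nH or --cut-dirs); wget_mirror_dir is a hypothetical helper name, not part of wget.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# URLs taken from the wget commands in this step.
AMD_URLS = [
    "https://repo.radeon.com/amdgpu/latest/rhel/9.4/main/x86_64",
    "https://repo.radeon.com/rocm/rhel9/latest/main",
]


def wget_mirror_dir(url):
    """Return the relative directory where `wget -r` is expected to store url.

    Assumption: default wget layout, i.e. <host>/<url path>.
    """
    parts = urlsplit(url)
    return PurePosixPath(parts.netloc) / parts.path.lstrip("/")


if __name__ == "__main__":
    for url in AMD_URLS:
        print(wget_mirror_dir(url))
```

- Step 3 works in /data/REPO/AMD/amdgpu and /data/REPO/AMD/rocm, so the downloaded trees presumably need to be moved or copied there first; the original post does not show that step.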

 

3. Create the AMD GPU repo

# yum install createrepo_c.x86_64
# cd /data/REPO/AMD/amdgpu
# createrepo .
# cd /data/REPO/AMD/rocm
# createrepo .
# vi /etc/yum.repos.d/amd.repo
- - see the amd.repo example below - -

 

- amd.repo example

[amdgpu]
name=amdgpu Packages
gpgcheck=0
enabled=1
baseurl=file:///data/REPO/AMD/amdgpu

[rocm]
name=rocm Packages
gpgcheck=0
enabled=1
baseurl=file:///data/REPO/AMD/rocm
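
- Instead of typing the file in vi, the same stanzas could be generated with a script. This is a minimal sketch using Python's standard configparser; render_repo and AMD_REPOS are hypothetical names, and the values simply mirror the amd.repo example above.

```python
import configparser
import io

# Repo IDs and base URLs copied from the amd.repo example.
AMD_REPOS = {
    "amdgpu": "file:///data/REPO/AMD/amdgpu",
    "rocm": "file:///data/REPO/AMD/rocm",
}


def render_repo(repos):
    """Render {repo_id: baseurl} as the text of a yum .repo file."""
    parser = configparser.ConfigParser(interpolation=None)
    for repo_id, baseurl in repos.items():
        parser[repo_id] = {
            "name": f"{repo_id} Packages",
            "gpgcheck": "0",  # as in the example: GPG checks are off
            "enabled": "1",
            "baseurl": baseurl,
        }
    buf = io.StringIO()
    # Write key=value without spaces around "=", as in the example.
    parser.write(buf, space_around_delimiters=False)
    return buf.getvalue()


if __name__ == "__main__":
    # Print to stdout; redirect to /etc/yum.repos.d/amd.repo as root if desired.
    print(render_repo(AMD_REPOS), end="")
```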

 

4. Install the packages

# yum groupinstall "Development Tools"
# yum install amdgpu-dkms
# yum install rocm

 

- After installation, use the lsmod command to confirm that the amdgpu module is loaded.

[sylee@cray ~]$ lsmod | grep amdgpu
amdgpu              15462400  0
amddrm_ttm_helper      16384  1 amdgpu
amdttm                106496  2 amdgpu,amddrm_ttm_helper
amddrm_buddy           24576  1 amdgpu
amdxcp                 16384  1 amdgpu
i2c_algo_bit           16384  1 amdgpu
drm_exec               16384  1 amdgpu
drm_suballoc_helper    16384  1 amdgpu
amd_sched              69632  1 amdgpu
amdkcl                 32768  3 amd_sched,amdttm,amdgpu
drm_display_helper    212992  1 amdgpu
drm_kms_helper        245760  4 drm_display_helper,amdgpu
video                  73728  1 amdgpu
drm                   741376  11 drm_kms_helper,drm_exec,amd_sched,amdttm,drm_suballoc_helper,drm_display_helper,amdgpu,amddrm_buddy,amddrm_ttm_helper,amdxcp
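
- The same check can be scripted. The sketch below parses lsmod-style text into a dictionary and looks for amdgpu; loaded_modules and SAMPLE are hypothetical names, and the sample lines are copied from the output above.

```python
import subprocess

# A few lines copied from the lsmod output shown in this step.
SAMPLE = """\
amdgpu              15462400  0
amddrm_ttm_helper      16384  1 amdgpu
amdttm                106496  2 amdgpu,amddrm_ttm_helper
"""


def loaded_modules(lsmod_text):
    """Parse lsmod output into {module_name: [modules that use it]}."""
    modules = {}
    for line in lsmod_text.splitlines():
        fields = line.split()
        # Skip blank lines and the "Module  Size  Used by" header.
        if len(fields) < 3 or fields[0] == "Module":
            continue
        used_by = fields[3].split(",") if len(fields) > 3 else []
        modules[fields[0]] = used_by
    return modules


if __name__ == "__main__":
    try:
        text = subprocess.run(["lsmod"], capture_output=True,
                              text=True, check=True).stdout
    except (OSError, subprocess.CalledProcessError):
        text = SAMPLE  # fall back to the sample when lsmod is unavailable
    print("amdgpu loaded:", "amdgpu" in loaded_modules(text))
```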

 

5. GPU permissions: udev rule with mode 0666

# vi /etc/udev/rules.d/70-amdgpu.rules
KERNEL=="kfd", GROUP="video", MODE="0666"
# udevadm control --reload-rules && udevadm trigger

- Reference: https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/prerequisites.html
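
- To confirm that the rule took effect, the permission bits of /dev/kfd can be inspected. A minimal sketch, assuming the device node exists once the driver is loaded; is_world_rw is a hypothetical helper name.

```python
import os
import stat

KFD = "/dev/kfd"  # device node targeted by the udev rule above


def is_world_rw(mode):
    """Return True if the permission bits allow read and write for 'other'."""
    wanted = stat.S_IROTH | stat.S_IWOTH
    return mode & wanted == wanted


if __name__ == "__main__":
    try:
        mode = stat.S_IMODE(os.stat(KFD).st_mode)
    except FileNotFoundError:
        print(f"{KFD} not found; is the amdgpu driver loaded?")
    else:
        print(f"{KFD}: mode {mode:04o}, world read/write: {is_world_rw(mode)}")
```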

 

6. Disable SELinux

# grubby --update-kernel ALL --args selinux=0
# shutdown -r now
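
- After the reboot, the kernel command line should contain selinux=0. The sketch below checks a command-line string for that argument; selinux_disabled_on_cmdline is a hypothetical helper, and reading /proc/cmdline assumes a Linux host.

```python
def selinux_disabled_on_cmdline(cmdline):
    """Return True if the kernel command line carries the selinux=0 argument."""
    return "selinux=0" in cmdline.split()


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as fh:
            current = fh.read()
    except OSError:
        current = ""  # not on Linux, or /proc is unavailable
    print("selinux=0 present:", selinux_disabled_on_cmdline(current))
```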

 

7. AMD GPU podman test

# yum install podman-docker.noarch podman-plugins.x86_64
# setsebool -P container_use_devices 1 

 

- Example: running PyTorch in a container

$ podman pull docker.io/rocm/pytorch:latest
$ podman run -it --device /dev/kfd --device /dev/dri --net=host --security-opt=no-new-privileges --cap-drop=ALL docker.io/rocm/pytorch:latest python3
>>> import torch;
>>> torch.cuda.is_available();
True
>>> torch.cuda.current_device();
0
>>> torch.cuda.get_device_name(0);
'AMD Radeon Graphics' 
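
- The interactive session above can also be run as a script. ROCm builds of PyTorch expose the GPU through the torch.cuda API, which is why the calls are the same as on NVIDIA hardware. This is a sketch only; gpu_report is a hypothetical helper that takes the torch module as an argument so it can be exercised without a GPU.

```python
def gpu_report(torch):
    """Summarize the GPU that PyTorch sees, using the same calls as the session."""
    if not torch.cuda.is_available():
        return "no GPU visible to PyTorch"
    index = torch.cuda.current_device()
    return f"device {index}: {torch.cuda.get_device_name(index)}"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        print(gpu_report(torch))
```

- Inside the container this could be saved as a file and passed to python3 in place of the interactive prompt.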

 

- Example: running rocm-smi in a container

[sylee@cray ~]$ podman run -it --device /dev/kfd --device /dev/dri --net=host --security-opt=no-new-privileges --cap-drop=ALL docker.io/rocm/pytorch:latest rocm-smi


======================================== ROCm System Management Interface ========================================
================================================== Concise Info ==================================================
Device  Node  IDs              Temp    Power  Partitions          SCLK  MCLK   Fan    Perf  PwrCap  VRAM%  GPU%  
              (DID,     GUID)  (Edge)  (Avg)  (Mem, Compute, ID)                                                 
==================================================================================================================
0       1     0x73a3,   45123  24.0°C  8.0W   N/A, N/A, 0         0Mhz  96Mhz  20.0%  auto  213.0W  0%     0%    
==================================================================================================================
============================================== End of ROCm SMI Log ===============================================
[sylee@cray ~]$ 
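
- Both container examples share the same podman options and differ only in the command at the end. A small sketch that builds that argument list; podman_gpu_argv is a hypothetical helper, and the flags are exactly those used in the commands above.

```python
import shlex

IMAGE = "docker.io/rocm/pytorch:latest"
GPU_DEVICES = ("/dev/kfd", "/dev/dri")  # device nodes passed into the container


def podman_gpu_argv(image, *command):
    """Build the `podman run` argument list used in the examples above."""
    argv = ["podman", "run", "-it"]
    for dev in GPU_DEVICES:
        argv += ["--device", dev]
    argv += [
        "--net=host",
        "--security-opt=no-new-privileges",
        "--cap-drop=ALL",
        image,
        *command,
    ]
    return argv


if __name__ == "__main__":
    # Print the command lines; pass the list to subprocess.run() to execute them.
    print(shlex.join(podman_gpu_argv(IMAGE, "rocm-smi")))
    print(shlex.join(podman_gpu_argv(IMAGE, "python3")))
```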

- Reference: https://access.redhat.com/solutions/7073764
