HPE CRAY Resource Sharing
[AMD] podman test
- OS: RHEL 9.4
- GPU: Radeon PRO W6800
1. Create a local Red Hat OS repo
    [root@cray ~]# cat /etc/yum.repos.d/local.repo
    [media-baseos]
    name=BaseOS
    baseurl=file:///data/REPO/rhel9.4/BaseOS
    gpgcheck=1
    enabled=1
    gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release

    [media-appstream]
    name=AppStream
    baseurl=file:///data/REPO/rhel9.4/AppStream
    gpgcheck=1
    enabled=1
    gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release
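- The same repo file can be generated from a script instead of being typed by hand. The following is a minimal sketch, not the method used in this post: the function name write_media_repo and its argument are hypothetical, and it assumes the repo root contains BaseOS/ and AppStream/ directories as shown above.

```shell
#!/bin/sh
# Hypothetical helper: print BaseOS/AppStream repo stanzas for a local RHEL tree.
# $1 = directory holding BaseOS/ and AppStream/ (e.g. /data/REPO/rhel9.4)
write_media_repo() {
    root=$1
    key=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release
    for name in BaseOS AppStream; do
        # Section id is "media-" plus the lower-cased directory name.
        id=$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]')
        printf '[media-%s]\nname=%s\nbaseurl=file://%s/%s\ngpgcheck=1\nenabled=1\ngpgkey=%s\n\n' \
            "$id" "$name" "$root" "$name" "$key"
    done
}

# Preview on stdout; as root, redirect to /etc/yum.repos.d/local.repo instead.
write_media_repo /data/REPO/rhel9.4
```

Printing to stdout first makes it easy to review the stanzas before writing them under /etc/yum.repos.d.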
2. Download the AMD GPU driver and ROCm
- Using wget as in the example below, you can bulk-download the rpm files from the repository that AMD provides.
    # wget -e robots=off -r -np 'https://repo.radeon.com/amdgpu/latest/rhel/9.4/main/x86_64'
    # wget -e robots=off -r -np 'https://repo.radeon.com/rocm/rhel9/latest/main'
- The "-e robots=off" option makes wget ignore the site's robots.txt, which would otherwise block crawlers from the recursive download.
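- By default a recursive wget saves files under a directory tree named after the host and URL path (repo.radeon.com/amdgpu/...), while step 3 works in /data/REPO/AMD/amdgpu and /data/REPO/AMD/rocm. One possible way to download straight into those directories is to add -nH, --cut-dirs, and -P. The sketch below only prints the commands so they can be reviewed first; the function name mirror_cmd is hypothetical, and the extra options are an assumption, not what the original commands used.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) a wget command that mirrors URL into DEST
# without recreating the host name and leading URL directories.
mirror_cmd() {
    url=${1%/}; dest=$2
    path=${url#*://}      # drop the scheme
    path=${path#*/}       # drop the host name
    # Number of directory levels in the URL path, used for --cut-dirs.
    depth=$(printf '%s\n' "$path" | tr '/' '\n' | grep -c .)
    printf "wget -e robots=off -r -np -nH --cut-dirs=%s -R 'index.html*' -P %s '%s/'\n" \
        "$depth" "$dest" "$url"
}

mirror_cmd 'https://repo.radeon.com/amdgpu/latest/rhel/9.4/main/x86_64' /data/REPO/AMD/amdgpu
mirror_cmd 'https://repo.radeon.com/rocm/rhel9/latest/main' /data/REPO/AMD/rocm
```

The trailing slash added to the URL matters with -np: without it wget treats the last path component as a file name, so the "parent" boundary sits one level higher than intended.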
3. Create the AMD GPU repo
    # yum install createrepo_c.x86_64
    # cd /data/REPO/AMD/amdgpu
    # createrepo .
    # cd /data/REPO/AMD/rocm
    # createrepo .
    # vi /etc/yum.repos.d/amd.repo
    - - see the amd.repo example below - -
- amd.repo example
    [amdgpu]
    name=amdgpu Packages
    gpgcheck=0
    enabled=1
    baseurl=file:///data/REPO/AMD/amdgpu

    [rocm]
    name=rocm Packages
    gpgcheck=0
    enabled=1
    baseurl=file:///data/REPO/AMD/rocm
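- createrepo writes its metadata into a repodata/ subdirectory indexed by repomd.xml, so a quick check that this file exists can catch a missed createrepo run before yum is pointed at the directory. A minimal sketch follows; the function name has_repodata is hypothetical.

```shell
#!/bin/sh
# Hypothetical check: succeed only if DIR contains createrepo metadata.
has_repodata() {
    [ -f "$1/repodata/repomd.xml" ]
}

for dir in /data/REPO/AMD/amdgpu /data/REPO/AMD/rocm; do
    if has_repodata "$dir"; then
        echo "ok:      $dir"
    else
        echo "missing: $dir (run createrepo there first)"
    fi
done
```

Once both directories report ok, yum repolist should list the amdgpu and rocm repos defined in amd.repo.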
4. Install packages
    # yum groupinstall "Development Tools"
    # yum install amdgpu-dkms
    # yum install rocm
- After installation, use the lsmod command to confirm that the amdgpu module is loaded.
    [sylee@cray ~]$ lsmod | grep amdgpu
    amdgpu              15462400  0
    amddrm_ttm_helper      16384  1 amdgpu
    amdttm                106496  2 amdgpu,amddrm_ttm_helper
    amddrm_buddy           24576  1 amdgpu
    amdxcp                 16384  1 amdgpu
    i2c_algo_bit           16384  1 amdgpu
    drm_exec               16384  1 amdgpu
    drm_suballoc_helper    16384  1 amdgpu
    amd_sched              69632  1 amdgpu
    amdkcl                 32768  3 amd_sched,amdttm,amdgpu
    drm_display_helper    212992  1 amdgpu
    drm_kms_helper        245760  4 drm_display_helper,amdgpu
    video                  73728  1 amdgpu
    drm                   741376  11 drm_kms_helper,drm_exec,amd_sched,amdttm,drm_suballoc_helper,drm_display_helper,amdgpu,amddrm_buddy,amddrm_ttm_helper,amdxcp
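- grep amdgpu also matches every module that merely depends on amdgpu, so it prints lines even when the driver itself is absent. For a scripted check, matching the first column exactly is stricter. A minimal sketch, with the hypothetical helper name module_loaded:

```shell
#!/bin/sh
# Hypothetical helper: read lsmod-style text on stdin and succeed if MODULE
# appears in the first column (an exact match, not a substring match).
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Only query the live system when lsmod is available.
if command -v lsmod >/dev/null 2>&1; then
    if lsmod | module_loaded amdgpu; then
        echo "amdgpu is loaded"
    else
        echo "amdgpu is NOT loaded; check dmesg and dkms status"
    fi
fi
```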
5. Set GPU device permissions to mode 0666 with udev
    # vi /etc/udev/rules.d/70-amdgpu.rules
    KERNEL=="kfd", GROUP="video", MODE="0666"
    # udevadm control --reload-rules && udevadm trigger
- Reference: https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/prerequisites.html
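- After the udev trigger, it is worth confirming that /dev/kfd really ended up world read-writable. The sketch below reads the permission string from ls so it does not depend on a particular stat implementation; the helper name is_mode_666 is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given path exists and its permission
# bits are rw-rw-rw- (octal 666). Characters 2-10 of "ls -ld" are the mode.
is_mode_666() {
    [ -e "$1" ] && [ "$(ls -ld "$1" | cut -c2-10)" = "rw-rw-rw-" ]
}

if is_mode_666 /dev/kfd; then
    echo "/dev/kfd is world read-writable"
else
    echo "/dev/kfd is missing or not mode 0666"
fi
```

Keep in mind that mode 0666 lets every local user open the device, which is convenient for rootless containers but broader than group-based access.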
6. Disable SELinux
    # grubby --update-kernel ALL --args selinux=0
    # shutdown -r now
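- The selinux=0 argument only takes effect after the reboot. One way to confirm it is to look for it on the running kernel's command line. A minimal sketch, with the hypothetical helper name cmdline_disables_selinux:

```shell
#!/bin/sh
# Hypothetical helper: succeed if a kernel command line (passed as one string)
# contains the selinux=0 argument as a whole word.
cmdline_disables_selinux() {
    case " $1 " in
        *" selinux=0 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# After the reboot, inspect the running kernel's command line.
if [ -r /proc/cmdline ] && cmdline_disables_selinux "$(cat /proc/cmdline)"; then
    echo "selinux=0 is active"
else
    echo "selinux=0 not found on the kernel command line"
fi
```

With SELinux disabled this way, getenforce should print Disabled.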
7. AMD GPU podman test
# yum install podman-docker.noarch podman-plugins.x86_64
# setsebool -P container_use_devices 1
- Example: running PyTorch in a container
    $ podman pull docker.io/rocm/pytorch:latest
    $ podman run -it --device /dev/kfd --device /dev/dri --net=host --security-opt=no-new-privileges --cap-drop=ALL docker.io/rocm/pytorch:latest python3
    >>> import torch;
    >>> torch.cuda.is_available();
    True
    >>> torch.cuda.current_device();
    0
    >>> torch.cuda.get_device_name(0);
    'AMD Radeon Graphics'
- Example: running rocm-smi in a container
    [sylee@cray ~]$ podman run -it --device /dev/kfd --device /dev/dri --net=host --security-opt=no-new-privileges --cap-drop=ALL docker.io/rocm/pytorch:latest rocm-smi
    ======================================== ROCm System Management Interface ========================================
    ================================================== Concise Info ==================================================
    Device  Node  IDs              Temp    Power  Partitions          SCLK  MCLK   Fan    Perf  PwrCap  VRAM%  GPU%
                  (DID,     GUID)  (Edge)  (Avg)  (Mem, Compute, ID)
    ==================================================================================================================
    0       1     0x73a3,   45123  24.0°C  8.0W   N/A, N/A, 0         0Mhz  96Mhz  20.0%  auto  213.0W  0%     0%
    ==================================================================================================================
    ============================================== End of ROCm SMI Log ===============================================
    [sylee@cray ~]$
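- Both examples pass the same device and security flags to podman, so a small wrapper can keep them in one place. This is a sketch under assumptions: the function name gpu_run and the PODMAN override variable are hypothetical, and it swaps -it for --rm because it targets non-interactive one-shot commands rather than the interactive session shown above.

```shell
#!/bin/sh
# Hypothetical wrapper around the podman invocation used in this post.
# PODMAN can be overridden (e.g. PODMAN=echo) to print the arguments instead.
IMAGE=docker.io/rocm/pytorch:latest

gpu_run() {
    "${PODMAN:-podman}" run --rm \
        --device /dev/kfd --device /dev/dri \
        --net=host --security-opt=no-new-privileges --cap-drop=ALL \
        "$IMAGE" "$@"
}

# Dry run in a subshell: show what would be passed to podman for a
# non-interactive PyTorch GPU check, without starting a container.
(PODMAN=echo; gpu_run python3 -c 'import torch; print(torch.cuda.is_available())')
```

On a host prepared as in steps 1 to 6, calling gpu_run rocm-smi without the override would run the same check as the rocm-smi example.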