HPE CRAY Resource Sharing

[ROCM] Unable to open /dev/kfd read-write: Permission denied

CRAY KOREA Blog 2024. 6. 13. 14:10

How to fix the error that appears when rocminfo is run from a regular (non-root) user account

 

Error details

[sylee@cray ~]$ rocminfo
ROCk module is loaded
Unable to open /dev/kfd read-write: Permission denied
sylee is not member of "video" group, the default DRM access group. Users must be a member of the "video" group or another DRM access group in order for ROCm applications to run successfully.

- Running rocminfo on an AMD GPU system as a regular account (e.g., sylee) fails with a permission error on the /dev/kfd device.

 

Diagnosis

[sylee@cray ~]$ ls -l /dev/kfd 
crw-rw----. 1 root video 242, 0 Jun 10 14:33 /dev/kfd

[sylee@cray ~]$ getent group video
video:x:39:

[sylee@cray ~]$ id
uid=5003(sylee) gid=5003(sylee) groups=5003(sylee) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023

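- /dev/kfd belongs to the video group and its mode (crw-rw----) grants read-write access only to root and to members of that group. The test account sylee is not a member of video, so the open fails.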
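The three diagnostic commands above can be folded into one check. The following is a minimal sketch, not part of ROCm: is_member and check_dev_group are hypothetical helper names, and it assumes GNU coreutils (stat -c) on Linux.

```shell
# Hypothetical helpers (not part of ROCm); assumes GNU stat and id.

# is_member GROUP "G1 G2 ...": succeed if GROUP is in the space-separated list.
is_member() {
    local want g
    want="$1"
    for g in $2; do
        if [ "$g" = "$want" ]; then
            return 0
        fi
    done
    return 1
}

# check_dev_group DEVICE USER: report whether USER is in DEVICE's owning group.
# Returns 0 if a member, 1 if not, 2 if the device or user cannot be queried.
check_dev_group() {
    local dev user grp user_groups
    dev="$1"
    user="$2"
    # stat -c %G prints the name of the group that owns the device node.
    grp=$(stat -c '%G' "$dev") || return 2
    # id -nG USER lists the groups recorded for the account in the group database.
    user_groups=$(id -nG "$user") || return 2
    if is_member "$grp" "$user_groups"; then
        echo "$user is a member of $grp"
    else
        echo "$user is NOT a member of $grp"
        return 1
    fi
}

# Example: check_dev_group /dev/kfd "$(id -un)"
```

Passing the device path as an argument keeps the check reusable for other device nodes that have their own access group.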

 

Add the sylee account to the video group

[sylee@cray ~]$ sudo usermod -aG video sylee

 

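Verify that the account was added to the video group

- The new membership applies only to sessions started after the change, so log out and back in (or run newgrp video) before checking with id.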

[sylee@cray ~]$ getent group video
video:x:39:sylee

[sylee@cray ~]$ id
uid=5003(sylee) gid=5003(sylee) groups=5003(sylee),39(video) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
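usermod updates the group database immediately, but a shell that was already open keeps the group list it received at login. The sketch below compares the two views to show whether a fresh login is still needed. missing_in_session and report_stale_session are hypothetical helper names introduced for illustration.

```shell
# Hypothetical helpers for illustration only.

# missing_in_session "DB_GROUPS" "SESSION_GROUPS":
# print each group in the first list that is absent from the second list.
missing_in_session() {
    local g s found
    for g in $1; do
        found=0
        for s in $2; do
            if [ "$g" = "$s" ]; then
                found=1
            fi
        done
        if [ "$found" -eq 0 ]; then
            echo "$g"
        fi
    done
}

# report_stale_session: compare the account's groups in the database
# with the groups held by the current shell process.
report_stale_session() {
    local db_groups session_groups
    # With a user name, id reads the group database (reflects usermod at once).
    db_groups=$(id -nG "$(id -un)") || return 2
    # Without arguments, id reports the groups of the current process.
    session_groups=$(id -nG) || return 2
    missing_in_session "$db_groups" "$session_groups"
}

# Example: report_stale_session
# Any group printed here is not active in this session until you log in again.
```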

 

Re-run rocminfo

[sylee@cray ~]$ rocminfo 
ROCk module is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             
Mwaitx:                  DISABLED
DMAbuf Support:          YES

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz
  Uuid:                    CPU-XX                             
  Marketing Name:          Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB     
- - - - remainder omitted - - - -
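To confirm the fix from a script rather than by reading the full listing, the agent list can be filtered. This is a minimal sketch: count_gpu_agents is a hypothetical helper, and it assumes GPU agents print a "Device Type: GPU" line in the same layout as the CPU agent shown above.

```shell
# Hypothetical helper; assumes GPU agents are reported as "Device Type:   GPU".
# Reads rocminfo output on stdin and prints the number of GPU agents.
count_gpu_agents() {
    # grep -c prints the number of matching lines; it exits non-zero when the
    # count is 0, so '|| true' keeps that case from looking like a failure.
    grep -c 'Device Type:[[:space:]]*GPU' || true
}

# Example: rocminfo | count_gpu_agents
```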

 
