
[Patch] HPCM 1.10 Patch List

CRAY KOREA Blog, June 3, 2024, 13:34

1. Patch 11793 - HPCM 1.10: cfirmware updates

1.1. Patch Information URL

https://support.hpe.com/connect/s/softwaredetails?language=en_US&collectionId=MTX-2435e54955e04bfa

 

1.2. Patch List

HPCM-1765  add FW flashing support for Cray XD2000 computes

HPCM-2589  add support for iLO firmware upgrade via cfirmware

HPCM-5186  add new async_apis rpm

HPCM-5225  python library needs requests-toolbelt 1.0.0

HPCM-5297  asyncio_cmdb add to_thread for io blocking functions

HPCM-5337  aiclientsession: Add more kwargs filters

HPCM-5382  cfirmware: ModuleNotFoundError - 'requests_toolbelt'

HPCM-5383  add library dependency on async-apis in cfirmware

HPCM-5412  fix error with cfirmware mnic

HPCM-5427  clientsession_kw: regression with duplicate timeout args

HPCM-5434  unable to flash the cC controllers

HPCM-5437  cassini check fails on mountain system
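HPCM-5297 mentions adding `to_thread` to asyncio_cmdb for I/O-blocking functions. The snippet below is a generic sketch of that pattern using Python's standard `asyncio.to_thread` (Python 3.9+). It is not HPCM's asyncio_cmdb code: `blocking_read`, `fetch_all`, and the paths are hypothetical stand-ins for a synchronous database or file call.

```python
import asyncio
import time


def blocking_read(path: str) -> str:
    # Hypothetical blocking I/O call (stand-in for a synchronous CMDB/file access).
    time.sleep(0.1)
    return f"contents of {path}"


async def fetch_all(paths: list[str]) -> list[str]:
    # asyncio.to_thread runs each blocking call in the default thread pool,
    # so the event loop is not blocked and the calls can overlap.
    tasks = [asyncio.to_thread(blocking_read, p) for p in paths]
    # gather returns results in the same order as the input tasks.
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(fetch_all(["/tmp/a", "/tmp/b", "/tmp/c"]))
    print(results)
    print(f"elapsed: {time.perf_counter() - start:.2f}s")
```

Because the three calls run in worker threads, the elapsed time is typically well below the roughly 0.3 s a sequential loop would take, while the coroutine code itself stays non-blocking.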

  

2. Patch 11795 - HPCM 1.10: field and performance diags

2.1. Patch Information URL

https://support.hpe.com/connect/s/softwaredetails?language=en_US&collectionId=MTX-67955bddeb8845ca

 

2.2. Patch List

HPCM-2427  HPCG for AMD Gpus

HPCM-4452  EX255a -Check rectifier status and telemetry for issues, Make sure they are running and balanced

HPCM-5061  Cluster health-Verify AMD GPU dgemm and stream test failed

HPCM-5113  Add xkdiags for EX254n

HPCM-5115  Add nvidia dgemm for EX254n

HPCM-5141  Add EX254n diagnostics

HPCM-5191  (memchk) Memory size and DIMM speed are not reported in EX254n nodes

HPCM-5345  Remove agt & AMDXIO from stout728

HPCM-5378  HPCG -local : If job fails on one node due to UME slurm kills off on all other node

HPCM-5449  Add wrapper to run rochpcg

HPCM-5466  Add babelstream binaries and script for EX255a

HPCM-5467  Add transferbench binary for EX255a

HPCM-5469  Remove rochpl & rochpcg build part and mhist the binary

HPCM-5474  Add wrapper to run rochpl

HPCM-5582  EX254n diags failure due to recipe change

HPCM-5603  Fix linpack and nvidia-gpu-xhpl on EX254n

 

3. Patch 11796 - HPCM 1.10: monitoring and clusterhealth updates

3.1. Patch Information URL

https://support.hpe.com/connect/s/softwaredetails?language=en_US&collectionId=MTX-a09e73e026294f9a

 

3.2. Patch List

HPCM-2012  gluster-exporter causes "gluster volume status" to continuously say that locking failed

HPCM-3935  Upgrade cray-sdu-rda component

HPCM-4268  Integrate HPE Cluster View Dashboard Automation (mPhasis)

HPCM-4377  Opensearch Grafana Dashboards giving Unexpected Error

HPCM-4462  Upgrade Telegraf to newer version in HPCM

HPCM-4606  SIM dashboard are not enabled after upgrade

HPCM-4633  Alerting rule reference file needs more comments and schema file needs doc string

HPCM-4731  Alerta being retired, need to changes in AIOps for code

HPCM-4741  Make changes to the WLM dashboards and Logstash files to accommodate remlog-collect to Telegraf transition for wlm (slurm/pbs) monitoring

HPCM-4882  Increase task.shutdown.graceful.timeout.ms

HPCM-5033  WLM Telemetry Fails to Write to Timescale

HPCM-5134  After HPCM1.10 upgrade ELK and SIM services failed to start

HPCM-5190  Netchk reports inaccurate errors in the log files of EX254n/EX255a nodes

HPCM-5232  Support NVME disks in diskchk, diskperf and fsperf

HPCM-5233  Add 'loop' and 'fabric' parameter to cpuperf, cwcpuperf and fabricperf

HPCM-5242  SIM: logstash-exporter messages continue flooding in /var/log/messages after adding monitoring-services group in SIM

HPCM-5247  System monitoring (cn) timescaledb not showing data in dashboard

HPCM-5254  slurm/jobmonitor/grafana - dashboards have incorrect or missing partition information

HPCM-5277  Routing and unrouting alerts to kafka and opensearch

HPCM-5325  Routing and unrouting alerts to slack

HPCM-5329  cm health alertman: csv/json/text dump of alerts

HPCM-5356  Node down/up alert rules status not updated when heartbeat elk indices are generated

HPCM-5380  cm aiops enable should remove dependency on alerta

HPCM-5408  Alerting enable validation should continue when there is a failure instead exiting

HPCM-5409  Modify cm monitoring alerting status command output to include routing status

HPCM-5523  Add rpm dependency on clusterview-config-automation

HPCM-5548  PDU Monitoring grafana dashboard fails to load any data

HPCM-5648  sst-nginx is not installed with fresh installation

 

4. Patch 11797 - HPCM 1.10: core infrastructure updates

4.1. Patch Information URL

https://support.hpe.com/connect/s/softwaredetails?language=en_US&collectionId=MTX-1cac4fd17cf241c9

 

4.2. Patch List

HPCM-4300  Add the DNS search path to cminfo via a new cm-configuration script

HPCM-4820  Add cfirmware support for XD6500/XD665 M4 Genoa

HPCM-5246  Fix conserver reload issue on large cluster

HPCM-5249  Set up DNS server correctly for cray-sdu-rda container for HPCM

HPCM-5375  Copy /tmp/miniroot-mgmt-network-device to /opt/clmgr/etc to handle upgrades

HPCM-5379  Document unsupported procedure to upgrade an ubuntu compute image and node

HPCM-5489  discover_skip_switchconfig has a comma in configure-cluster preventing it from being set

HPCM-5505  cfirmware sc check|update|type not working on new slingshot blade switches

HPCM-5511  uboot not rebooting after update

HPCM-5514  cfirmware cannot update cassini

HPCM-5525  gluster volumes are mounted multiple times over head and head-bmc

HPCM-5534  cfirmware fails to update cc --recovery_image

HPCM-5561  adjust file permissions for resolv.conf

HPCM-5585  missing iptables dependency for ip failover event from ctdb

HPCM-5602  admin DNS_DOMAIN set to cluster instead of house

 

5. Patch 11808 - HPCM 1.10: optional field and perf diags updates for SLES15 with CPE 23.12

5.1. Patch Information URL

https://support.hpe.com/connect/s/softwaredetails?language=en_US&collectionId=MTX-299c9aa31d9341eb

 

5.2. Patch List

HPCM-5854  Diags failure on x86 sles15sp5 cluster because of craype lib mismatch

HPCM-6050  Rebuild xkdiags and rank for SLES CPE 23.12 (x86)

 

6. Patch 11819 - HPCM 1.10: optional sgi-admin-node update

6.1. Patch Information URL

https://support.hpe.com/connect/s/softwaredetails?language=en_US&collectionId=MTX-41c26d544e454e18

 

6.2. Patch List

HPCM-6261  improve default gluster nfs server mount options

 

7. Patch 11822 - HPCM 1.10: recommended timescale-sink update

7.1. Patch Information URL

https://support.hpe.com/connect/s/softwaredetails?language=en_US&collectionId=MTX-5b6839bfb8694486

 

7.2. Patch List

HPCM-5485  timescale sink writing empty labels