NVIDIA has released urgent security advisories for multiple vulnerabilities in its Hopper HGX 8-GPU High-Performance Computing platforms. A high-severity flaw (CVE-2024-0114, CVSS 8.1) allows unauthorized code execution, privilege escalation, and data compromise.
A medium-severity vulnerability (CVE-2024-0141, CVSS 6.8) in the GPU vBIOS layer could enable denial-of-service attacks through unsupported registry writes. Together, the flaws affect critical infrastructure in AI/ML clusters, supercomputing environments, and enterprise data centers using NVIDIA’s HGX architecture.
CVE-2024-0114: HMC Privilege Escalation:
The HGX Management Controller (HMC), responsible for managing GPU resources and firmware updates across multiple GPUs, has an authentication bypass vulnerability.
Attackers with admin access to the Baseboard Management Controller (BMC), often reachable via IPMI or Redfish interfaces, can elevate their privileges to HMC administrator level. That level of control enables:
Code Execution: Run harmful programs through the HMC’s firmware update process, affecting all connected GPUs.
Data Tampering: Modify GPU compute workloads or training datasets in AI pipelines.
Lateral Movement: Exploit HMC’s intra-node communication (NVLink/NVSwitch) to propagate across GPU clusters.
NVIDIA’s advisory states that exploit chains can remain effective after reboots because HMC’s persistence layer saves configuration data in non-volatile flash memory.
CVE-2024-0141: vBIOS Registry Corruption:
The GPU vBIOS vulnerability lets cloud users or containerized applications write to restricted hardware registers. This disrupts the GPU’s power management and memory controllers, causing systemic issues. Exploiting this can render GPUs unresponsive, necessitating physical resets or BMC-level hard resets to recover.
Affected Firmware Versions and Mitigation:
Component | Vulnerable Firmware Versions | Patched Version
HMC Controller | HGX-22.10-1-rc67 (1.5.0) | 1.6.0+
HMC Controller | HGX-22.10-1-rc63 (1.4.0) | 1.6.0+
HMC Controller | HGX-22.10-1-rc59 (1.3.2) | 1.6.0+
GPU vBIOS | All versions prior to 1.6.0 | 1.6.0+
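The version boundary in the table can be checked mechanically when triaging an inventory. The following is a minimal sketch, not an NVIDIA tool: the function names are hypothetical, and it assumes firmware strings carry a dotted x.y.z version like the ones listed above.

```python
import re

# First fixed release according to the table above.
PATCHED = (1, 6, 0)

def parse_version(text):
    """Extract a dotted x.y.z version from a firmware string.

    Accepts a bare version ("1.5.0") or a build label followed by the
    version in parentheses ("HGX-22.10-1-rc67 (1.5.0)").
    """
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no x.y.z version found in {text!r}")
    return tuple(int(part) for part in match.groups())

def is_vulnerable(firmware):
    """Return True when the firmware version is older than 1.6.0."""
    return parse_version(firmware) < PATCHED
```

Comparing integer tuples rather than raw strings avoids misordering versions such as 1.10.0, which would sort before 1.6.0 as text.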
Administrators must:
Isolate BMC Interfaces by strictly segmenting the network for IPMI/Redfish endpoints and using certificate-based authentication.
Apply Firmware Updates with NVIDIA’s nvfwupd tool to install HMC 1.6.0 or higher.
Audit Tenant Permissions by limiting GPU passthrough in virtualized environments to avoid vBIOS exploits.
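As a rough illustration of the segmentation step, an administrator could audit a list of BMC endpoints against the dedicated management range. This sketch uses only Python's standard ipaddress module; the network range and the function name are hypothetical placeholders, not values from NVIDIA's advisory.

```python
import ipaddress

# Hypothetical out-of-band management range; replace with the site's real one.
MGMT_NETWORK = ipaddress.ip_network("10.20.0.0/24")

def outside_management_network(bmc_addresses, mgmt_network=MGMT_NETWORK):
    """Return the BMC addresses that fall outside the management network."""
    return [
        addr
        for addr in bmc_addresses
        if ipaddress.ip_address(addr) not in mgmt_network
    ]
```

Any address the function returns points to an IPMI/Redfish endpoint that sits outside the isolated segment and deserves a closer look.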
These vulnerabilities highlight systemic risks in computational acceleration platforms where hardware controllers and firmware run with high privileges.