Linux Kernel GICv3 Quirk for NVIDIA T241-FABRIC-4 Erratum
Vulnerability
A vulnerability exists in the Linux kernel's handling of the ARM Generic Interrupt Controller (GIC) on NVIDIA T241 platforms. It stems from a hardware erratum (T241-FABRIC-4): when the GIC receives multiple transactions simultaneously from different sources, packets from different GICs can be incorrectly interleaved. This disrupts the expected flow of data and can corrupt GIC state, leading to problems such as kernel panics. The erratum affects NVIDIA server platforms with more than two interconnected T241 chips, each of which supports 320 {E}SPIs (SPIs and extended SPIs).
Impact
On affected configurations, GIC state corruption can lead to kernel panics and other unexpected system behavior.
Reproduction
On an NVIDIA T241 platform with more than two chips, the issue can be triggered when different GICs simultaneously send inter-socket AXI4 Stream packets that consist of multiple transfers. GICv3 operations that generate such multi-transfer packets over the inter-socket AXI4 Stream interface include register reads from GICD_I* and GICD_N*, writes to 64-bit GICD registers (except GICD_IROUTERn*), and the ITS MOVALL command. When these packets are interleaved, GIC state can be corrupted, with the consequences described above.
Remediation
The Linux kernel has been updated to include a workaround for this issue. The workaround does two things: it directs read accesses to the GICD_In{E} registers to the chip that owns the SPI, and it disables GICv4.x features. Instructions for applying this update can be found in the Linux kernel's official documentation.
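To make "the chip that owns the SPI" concrete, the following is a minimal sketch of how an interrupt ID could be mapped to its owning chip. It is not the kernel's code. It assumes that SPIs are assigned to chips in contiguous blocks of 320 starting at INTID 32, that there are at most four chips, and that the fourth chip uses the extended SPI range starting at INTID 4096 (three blocks of 320 are all that fit in the regular SPI range). The function and macro names are hypothetical.

```c
#include <assert.h>

/* All names and the layout below are illustrative assumptions. */
#define SPI_FIRST_INTID    32u    /* first SPI INTID in the GIC architecture */
#define ESPI_FIRST_INTID   4096u  /* first extended SPI INTID (GICv3.1) */
#define T241_SPIS_PER_CHIP 320u   /* per-chip {E}SPI count from the erratum */
#define T241_MAX_CHIPS     4u     /* assumed maximum number of chips */

/*
 * Return the index of the chip assumed to own `intid`, or -1 if the
 * INTID is not an {E}SPI covered by the assumed layout.
 */
int t241_owner_chip(unsigned int intid)
{
	/* Chips 0..2: three blocks of 320 regular SPIs (INTIDs 32..991). */
	if (intid >= SPI_FIRST_INTID &&
	    intid < SPI_FIRST_INTID + 3u * T241_SPIS_PER_CHIP)
		return (int)((intid - SPI_FIRST_INTID) / T241_SPIS_PER_CHIP);

	/* Chip 3: 320 extended SPIs (INTIDs 4096..4415). */
	if (intid >= ESPI_FIRST_INTID &&
	    intid < ESPI_FIRST_INTID + T241_SPIS_PER_CHIP)
		return (int)(T241_MAX_CHIPS - 1u);

	return -1;
}

/* Usage: spot-check the assumed mapping at the block boundaries. */
void t241_check_mapping(void)
{
	assert(t241_owner_chip(32) == 0);
	assert(t241_owner_chip(351) == 0);
	assert(t241_owner_chip(352) == 1);
	assert(t241_owner_chip(991) == 2);
	assert(t241_owner_chip(4096) == 3);
	assert(t241_owner_chip(31) == -1);
}
```

In a workaround of this shape, the returned index would select a per-chip distributor mapping for the read, so the access is served by the owning chip rather than by the global distributor.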
