llama.cpp Integer Overflow Vulnerability Leading to Heap Buffer Overflow and Remote Code Execution

Vulnerability

An integer overflow vulnerability has been identified in the `ggml_nbytes` function of llama.cpp, a C/C++ inference library for large language models. In versions prior to b7824, an attacker can craft a GGUF file whose tensor dimensions bypass memory validation. The manipulated dimensions cause `ggml_nbytes` to return a size far smaller than the tensor actually requires, which leads to a heap-based buffer overflow when the application processes the tensor. The resulting memory corruption can potentially be leveraged for remote code execution.

Impact

Successful exploitation triggers a heap-based buffer overflow. The resulting memory corruption may allow remote code execution.

Reproduction

To reproduce this vulnerability, create a GGUF file containing a tensor of type `GGML_TYPE_F32` whose dimensions include `4398046511105` (which is `2^42 + 1`). When the file is loaded, the size calculation overflows and the application underestimates the memory the tensor requires. Processing the tensor then overflows the undersized heap buffer.
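
The arithmetic behind this can be sketched in C. The snippet below is a simplified stand-in for a byte-size computation, not the actual `ggml_nbytes` source. The names `naive_nbytes`, `demo_wrapped_size`, and `DEMO_MAX_DIMS`, as well as the second dimension of `1048576` (`2^20`), are assumptions chosen for illustration, since the advisory only states that the dimensions include `4398046511105`. With 4-byte F32 elements, the true size of such a tensor would be `2^64 + 2^22` bytes, which does not fit in 64 bits and wraps to `2^22` bytes (4 MiB).

```c
#include <stdint.h>

#define DEMO_MAX_DIMS 4  /* hypothetical constant for this sketch */

/* Simplified illustration only: multiplies the element size by every
 * dimension using 64-bit unsigned arithmetic, which silently wraps
 * modulo 2^64 on overflow. This is NOT the real ggml_nbytes code. */
uint64_t naive_nbytes(const int64_t ne[DEMO_MAX_DIMS], uint64_t type_size) {
    uint64_t nbytes = type_size;
    for (int i = 0; i < DEMO_MAX_DIMS; i++) {
        nbytes *= (uint64_t) ne[i];
    }
    return nbytes;
}

/* Returns the wrapped size for the illustrative shape
 * { 2^42 + 1, 2^20, 1, 1 } with 4-byte elements. */
uint64_t demo_wrapped_size(void) {
    const int64_t ne[DEMO_MAX_DIMS] = { 4398046511105LL, 1048576LL, 1, 1 };
    return naive_nbytes(ne, 4 /* bytes per F32 element */);
}
```

Under these assumptions, `demo_wrapped_size()` returns 4194304, so a loader that trusted this value would size its buffer at 4 MiB for a tensor that nominally needs far more.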

Remediation

Update to llama.cpp version b7824 or later, which fixes this vulnerability.
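
For code that parses untrusted tensor shapes, one general defense is to check each multiplication before performing it. The following is a minimal sketch of that idea and should not be read as the fix applied in b7824; the helper `checked_nbytes` and its signature are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: computes type_size * ne[0] * ... * ne[n_dims-1].
 * Returns false instead of wrapping when the product would exceed
 * UINT64_MAX, or when any dimension is negative. On success the
 * result is stored in *out. */
bool checked_nbytes(const int64_t *ne, int n_dims,
                    uint64_t type_size, uint64_t *out) {
    uint64_t nbytes = type_size;
    for (int i = 0; i < n_dims; i++) {
        if (ne[i] < 0) {
            return false;  /* negative dimensions are never valid */
        }
        uint64_t dim = (uint64_t) ne[i];
        if (dim != 0 && nbytes > UINT64_MAX / dim) {
            return false;  /* product would overflow 64 bits */
        }
        nbytes *= dim;
    }
    *out = nbytes;
    return true;
}
```

A caller would reject the file whenever `checked_nbytes` returns false, rather than allocating based on a wrapped value.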

Added: Mar 24, 2026, 1:25 AM
Updated: Mar 24, 2026, 1:25 AM

Vulnerability Rating

Custom Algorithm

spread: 0.0
impact: 7.5
exploitability: 7.0
remediation: 0.0
relevance: 4.3
threat: 6.4
urgency: 2.9
incentive: 0.0

Our algorithm analyzes dozens of metrics to generate the 8 category scores above, which are then combined to calculate the overall risk score.