llama.cpp Buffer Overflow Vulnerability in Vocabulary Loading Prior to Version b5662

Vulnerability

A buffer overflow vulnerability has been identified in llama.cpp, a C/C++ library for LLM inference, in versions prior to b5662. The issue is in the vocabulary-loading code, specifically in the function 'llama_vocab::impl::token_to_piece()'. An attacker-supplied GGUF model vocabulary can trigger it by including a token whose length exceeds INT32_MAX. When that length is narrowed to a signed 32-bit integer, the length check is bypassed, and the subsequent 'memcpy' runs with the oversized length, writing past the end of the destination buffer. The resulting memory corruption can potentially be exploited for code execution.
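The bypass described above can be illustrated with a minimal sketch. It assumes, based on the public advisory description, that the pre-patch check narrows the size_t token size to int32_t before comparing it with the buffer length. The function names are hypothetical and this is not the actual llama.cpp code.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed shape of the pre-patch check (hypothetical name): the size_t token
// size is narrowed to int32_t before the comparison.
// Returns true when the copy would be rejected as "buffer too small".
bool narrowed_check_rejects(size_t size, int32_t length) {
    return length < static_cast<int32_t>(size);
}

// Same check with the comparison done in size_t, so no narrowing can occur.
// A negative buffer length is always rejected.
bool widened_check_rejects(size_t size, int32_t length) {
    return length < 0 || static_cast<size_t>(length) < size;
}
```

With a size of INT32_MAX + 1, the narrowed value is negative on common two's-complement platforms, so an 8-byte buffer is not rejected by the first function, while the second function rejects it.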

Impact

Exploitation of this vulnerability causes a buffer overflow that corrupts memory beyond the destination buffer. This can lead to application crashes or instability and, in some cases, to remote code execution by overwriting critical data such as return addresses or vtable pointers. It can also cause a denial-of-service condition, especially when the application runs under a memory sanitizer, which detects the out-of-bounds write and aborts the program.

Reproduction

To reproduce this vulnerability, load a GGUF model file that contains a vocabulary entry with a token size exceeding INT32_MAX. During model loading, 'token_to_piece()' is called, the oversized token length bypasses the length check, and the buffer overflow is triggered. The same code path can also be reached through the public API 'llama_token_to_piece()' or during detokenization.

Remediation

Users should update to llama.cpp version b5662 or later, where this vulnerability has been patched.
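For code that cannot be updated immediately, or for similar copy routines elsewhere, one way to guard a copy of this kind is to reject any size that does not fit in int32_t and to compare sizes in size_t. The following is a hedged sketch under those assumptions. 'copy_token_checked' and its return convention are hypothetical, and this is not the patch shipped in b5662.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper, not the llama.cpp implementation.
// Copies `size` bytes of `token` into `buf`, whose capacity is `length`.
// Returns the number of bytes written, the negated required size when the
// buffer is too small, or INT32_MIN when `size` does not fit in int32_t.
int32_t copy_token_checked(const char * token, size_t size, char * buf, int32_t length) {
    // Reject sizes that cannot be represented in the int32_t return type.
    if (size > static_cast<size_t>(std::numeric_limits<int32_t>::max())) {
        return std::numeric_limits<int32_t>::min();
    }
    // Compare in size_t so the size is never narrowed before the check.
    if (length < 0 || static_cast<size_t>(length) < size) {
        return -static_cast<int32_t>(size);
    }
    assert(token != nullptr && buf != nullptr);
    std::memcpy(buf, token, size);
    return static_cast<int32_t>(size);
}
```

The oversized case returns before 'memcpy' is reached, so a malformed vocabulary entry results in an error code rather than an out-of-bounds write.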

Added: Jun 17, 2025, 8:17 PM
Updated: Jun 17, 2025, 9:51 PM

Vulnerability Rating

Custom Algorithm

spread: 0.0
impact: 10.0
exploitability: 7.4
remediation: 7.7
relevance: 0.2
threat: 4.8
urgency: 2.9
incentive: 1.7

Our algorithm analyzes dozens of metrics to generate these 8 key vulnerability categories, which are then combined to calculate the overall risk score.