llama.cpp Out-of-Bounds Write Vulnerability in Completion Endpoints Allowing Memory Corruption and Remote Code Execution

Vulnerability

A vulnerability in llama.cpp, an inference implementation for various large language models, allows out-of-bounds memory writes that can lead to process crashes or remote code execution. The issue affects versions up to and including commit 55d4206c8, where the n_discard parameter is parsed from JSON input in the server's completion endpoints without being checked for negative values. When a negative value is supplied and the context becomes full, the resulting context shift desynchronizes the key-value cache, which leads to memory corruption during token evaluation. The vulnerability is reachable through the public completions, chat completions, and slots resume endpoints when the server is running with context shift enabled.
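The missing check described above can be illustrated with a minimal sketch. It is not the llama.cpp code or its actual fix: the ShiftPlan struct, the plan_context_shift helper, and the names n_past and n_keep are hypothetical. It also assumes that a context shift drops n_discard cache positions after a kept prefix and moves the rest left by the same amount. Under that assumption, a negative n_discard would produce a reversed range and a positive offset, so the sketch rejects it before any cache operation is planned.

```cpp
#include <cassert>

// Hypothetical description of one context shift: drop the cache positions in
// [begin, end) and move the positions after them by `delta` (negative = left).
struct ShiftPlan {
    int begin;
    int end;
    int delta;
};

// Hypothetical helper, not a llama.cpp function. Fills `out` and returns true
// only when the request describes a forward, in-bounds range.
// Assumption for this sketch: n_discard == 0 means "discard half of the
// shiftable region".
bool plan_context_shift(int n_past, int n_keep, int n_discard, ShiftPlan & out) {
    if (n_keep < 0 || n_past < n_keep) {
        return false;
    }
    const int n_left = n_past - n_keep;  // positions eligible for discarding
    if (n_discard < 0 || n_discard > n_left) {
        return false;  // a negative count would give end < begin and delta > 0
    }
    if (n_discard == 0) {
        n_discard = n_left / 2;
    }
    out.begin = n_keep;
    out.end   = n_keep + n_discard;
    out.delta = -n_discard;
    assert(out.begin <= out.end && out.end <= n_past);
    return true;
}
```

With n_past = 100 and n_keep = 10, a request carrying n_discard = -5 is rejected, while n_discard = 20 yields the range [10, 30) and an offset of -20.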

Impact

A remote, unauthenticated attacker can trigger out-of-bounds writes that crash the server process or allow arbitrary code execution. The issue affects any llama.cpp deployment that runs the HTTP server with context shift enabled, on both CPU and GPU builds.

Reproduction

To reproduce this vulnerability, build the llama.cpp server with CUDA support and AddressSanitizer enabled, then launch the server with context shift enabled. Next, send a POST request to the server's completions endpoint that includes a negative n_discard value. Once the context fills, the server logs the context shift and the n_discard value, followed by reports of key-value cache inconsistency or out-of-bounds memory access, which confirm that the bug was triggered. In a non-sanitized build, the memory corruption occurs silently, which can enable remote code execution.

Added: Jan 8, 2026, 12:17 AM
Updated: Jan 8, 2026, 12:17 AM

Vulnerability Rating

Custom Algorithm

spread: 0.0
impact: 10.0
exploitability: 8.7
remediation: 0.0
relevance: 1.9
threat: 6.4
urgency: 2.9
incentive: 5.8

Our algorithm analyzes dozens of metrics to generate these 8 key vulnerability categories, which are then combined to calculate the overall risk score.