vLLM
cpe:2.3:a:vllm:vllm:*:*:*:*:*:*:*
- >= 0.10.2, < 0.11.1
A memory corruption vulnerability has been identified in vLLM, an inference and serving engine for large language models, affecting versions from 0.10.2 up to, but not including, 0.11.1. The vulnerability is in the Completions API endpoint, which accepts user-supplied prompt embeddings and deserializes them with torch.load() without adequate validation. Since PyTorch 2.8.0, sparse tensor integrity checks are disabled by default, so a maliciously crafted sparse tensor can pass deserialization without the internal bounds checks being applied. When the tensor is then converted to a dense format, the result is an out-of-bounds memory write. This memory corruption can crash vLLM and may allow code execution on the server hosting it.
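The defensive pattern implied by the description above can be sketched as follows. This is a minimal illustration, not necessarily the change vLLM shipped: it assumes PyTorch 2.0 or later, and the helper name `load_prompt_embeds` is hypothetical. It re-enables sparse tensor invariant checks around both the deserialization and the dense conversion, and rejects payloads that do not deserialize to a tensor.

```python
import base64
import io

import torch


def load_prompt_embeds(b64_payload: str) -> torch.Tensor:
    """Decode a base64 tensor payload and return a validated dense tensor.

    Hypothetical helper for illustration only; not vLLM's actual code.
    """
    # Reject payloads containing non-base64 characters.
    raw = base64.b64decode(b64_payload, validate=True)

    # Turn sparse invariant checks back on for the load and the dense
    # conversion, so malformed sparse tensors raise an error instead of
    # reaching the conversion unchecked.
    with torch.sparse.check_sparse_tensor_invariants():
        obj = torch.load(io.BytesIO(raw), weights_only=True, map_location="cpu")
        if not isinstance(obj, torch.Tensor):
            raise ValueError("payload did not deserialize to a tensor")
        # Only non-strided (e.g. sparse) layouts need converting.
        if obj.layout != torch.strided:
            obj = obj.to_dense()
    return obj
```

A caller would pass the base64 string taken from the request body and treat any exception as a rejected request. Additional checks on shape and dtype would likely be needed in a real service.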
Successful exploitation crashes the vLLM server, causing a denial of service. The memory corruption could also be leveraged for remote code execution on the server.
To reproduce this vulnerability, send a request to the Completions API endpoint containing a base64-encoded serialized tensor that has been crafted as a malformed sparse tensor. With PyTorch 2.8.0 or later, where sparse tensor integrity checks are disabled by default, torch.load() accepts the tensor, and the out-of-bounds write occurs when the tensor is converted to a dense format.
This vulnerability is patched in vLLM version 0.11.1. Users should upgrade to 0.11.1 or later to address the issue.
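To check whether a given environment falls in the affected range, a small script along the following lines may help. It is a sketch under stated assumptions: the helper names are hypothetical, the range (0.10.2 inclusive to 0.11.1 exclusive) is taken from the text above, and the parser only reads the leading `X.Y.Z` of the version string, so pre-release builds such as release candidates are not distinguished from final releases. A full version parser such as the `packaging` library would be more robust.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

AFFECTED_MIN = (0, 10, 2)  # first affected release, per the advisory text
FIXED_IN = (0, 11, 1)      # first patched release, per the advisory text


def parse_release(version: str) -> Tuple[int, int, int]:
    """Return the leading X.Y.Z of a version string as integers.

    Suffixes such as '.post1' or '+cu128' are ignored (a simplification).
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def is_affected(version: str) -> bool:
    """True if the version lies in [0.10.2, 0.11.1)."""
    return AFFECTED_MIN <= parse_release(version) < FIXED_IN


def installed_vllm_is_affected() -> Optional[bool]:
    """Check the installed vllm package; None if vllm is not installed."""
    try:
        return is_affected(metadata.version("vllm"))
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("installed vllm affected:", installed_vllm_is_affected())
```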