vLLM Memory Corruption Vulnerability in Completions API Endpoint Allows Denial-of-Service and Potential Remote Code Execution

Vulnerability

A memory corruption vulnerability has been identified in vLLM, an inference and serving engine for large language models, affecting versions from 0.10.2 up to, but not including, 0.11.1. The vulnerability is in the Completions API endpoint, which accepts user-supplied prompt embeddings and deserializes them with torch.load() without adequate validation. Because PyTorch 2.8.0 disables sparse tensor integrity checks by default, a maliciously crafted tensor can evade internal bounds checks, and converting it to a dense format then causes an out-of-bounds memory write. This memory corruption can crash vLLM and could potentially allow code execution on the server hosting it.
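
The role of the integrity check can be illustrated without PyTorch. The following is a minimal pure-Python sketch, assuming a COO-style sparse layout (a list of coordinates plus a list of values) and a 2-D shape. The helper names check_coo_invariants and to_dense are hypothetical, and this is not the code used by vLLM or PyTorch. It only shows why skipping the bounds check before a dense conversion is dangerous.

```python
from typing import List, Sequence, Tuple


def check_coo_invariants(indices: Sequence[Tuple[int, ...]],
                         shape: Tuple[int, ...]) -> None:
    """Raise ValueError if any coordinate lies outside the declared shape."""
    for coord in indices:
        if len(coord) != len(shape):
            raise ValueError(f"coordinate {coord} does not match rank {len(shape)}")
        for axis, (i, dim) in enumerate(zip(coord, shape)):
            if not 0 <= i < dim:
                raise ValueError(f"index {i} out of range for axis {axis} (size {dim})")


def to_dense(indices: Sequence[Tuple[int, int]],
             values: Sequence[float],
             shape: Tuple[int, int],
             validate: bool = True) -> List[float]:
    """Scatter sparse values into a flat row-major buffer (hypothetical helper)."""
    rows, cols = shape
    if validate:
        check_coo_invariants(indices, shape)
    dense = [0.0] * (rows * cols)
    for (r, c), v in zip(indices, values):
        # Python raises IndexError for an offset past the end of the list.
        # Native code performing the same write without a check would
        # instead write outside the allocated buffer.
        dense[r * cols + c] = v
    return dense


# Well-formed input converts cleanly.
print(to_dense([(0, 1), (1, 0)], [1.0, 2.0], (2, 2)))

# Malformed input is rejected up front when validation is enabled.
try:
    to_dense([(5, 0)], [9.0], (2, 2))
except ValueError as exc:
    print("rejected:", exc)
```

With validate=False, the malformed coordinate reaches the write itself. Python's list bounds checking turns that into an IndexError, but a native implementation has no such safety net, which is the situation the advisory describes.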

Impact

Exploitation of this vulnerability can cause a denial of service by crashing the vLLM server. The memory corruption could also potentially be leveraged for remote code execution on the server.

Reproduction

To reproduce this vulnerability, send a request to the Completions API endpoint containing a base64-encoded tensor crafted so that it would fail the sparse tensor integrity checks, which PyTorch 2.8.0 leaves disabled by default. When the endpoint deserializes the tensor with torch.load() and converts it to a dense format, the out-of-bounds write occurs.

Remediation

This vulnerability has been patched in vLLM version 0.11.1. Users running an affected version should upgrade to 0.11.1 or later.
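
As a quick way to triage deployments, the affected range stated above can be checked against a version string. The following is a minimal sketch, not an official tool: the names parse_release and is_affected are hypothetical, and only the leading major.minor.patch numbers are compared, so pre-release and local suffixes (for example "rc1" or "+cu128") are ignored.

```python
import re
from typing import Tuple

AFFECTED_FROM = (0, 10, 2)  # first affected release, per the advisory
FIXED_IN = (0, 11, 1)       # first patched release, per the advisory


def parse_release(version: str) -> Tuple[int, ...]:
    """Extract the leading major.minor.patch numbers from a version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_affected(version: str) -> bool:
    """Return True if the version falls in the range [0.10.2, 0.11.1)."""
    return AFFECTED_FROM <= parse_release(version) < FIXED_IN


for candidate in ("0.10.1", "0.10.2", "0.11.0", "0.11.1"):
    print(candidate, "affected" if is_affected(candidate) else "not affected")
```

In an environment where vLLM is installed, the version string could be obtained with importlib.metadata.version("vllm") from the Python standard library and passed to is_affected.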

Added: Nov 21, 2025, 2:22 AM
Updated: Nov 21, 2025, 2:22 AM

Vulnerability Rating

Custom Algorithm
spread: 2.6
impact: 10.0
exploitability: 5.9
remediation: 8.3
relevance: 1.1
threat: 4.8
urgency: 2.9
incentive: 1.7

Our algorithm analyzes dozens of metrics to generate these 8 key vulnerability categories, which are then combined to calculate the overall risk score.