vllm Uninitialized Resource Vulnerability in KV Block Handler

Vulnerability

A vulnerability exists in vllm versions up to and including 0.19.0, in the KV Block Handler component. The issue arises in the 'has_mamba_layers' function of 'vllm/v1/kv_cache_interface.py', where the block allocator returns GPU KV cache blocks to the free pool without clearing their contents. A subsequent request that is allocated one of those blocks can therefore read stale data left behind by a previous request. The vulnerability can be exploited remotely and matters most in multi-tenant environments, where one user's data could inadvertently influence another user's output. The issue has been publicly disclosed. It is challenging to exploit, but a proof-of-concept exploit is available.
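The failure pattern described above (a block returned to the free pool with its old contents intact) can be illustrated with a toy model. This is a minimal sketch and not vllm's allocator: `ToyBlockPool`, `leaked_after_reuse`, and the `clear_on_free` flag are hypothetical names, and plain Python lists stand in for GPU memory.

```python
# Toy model of a block pool; NOT vllm's allocator. All names are hypothetical.
from collections import deque


class ToyBlockPool:
    """Fixed pool of equally sized blocks backed by plain Python lists."""

    def __init__(self, num_blocks: int, block_size: int, clear_on_free: bool):
        self.blocks = [[0] * block_size for _ in range(num_blocks)]
        self.free_ids = deque(range(num_blocks))
        self.clear_on_free = clear_on_free

    def allocate(self) -> int:
        # Hand out the oldest free block without touching its contents.
        return self.free_ids.popleft()

    def free(self, block_id: int) -> None:
        if self.clear_on_free:
            # Mitigation in this toy model: wipe the block before reuse.
            block = self.blocks[block_id]
            block[:] = [0] * len(block)
        self.free_ids.append(block_id)


def leaked_after_reuse(clear_on_free: bool) -> list[int]:
    pool = ToyBlockPool(num_blocks=1, block_size=4, clear_on_free=clear_on_free)

    # Request A fills its block completely, then finishes.
    a = pool.allocate()
    pool.blocks[a][:] = [11, 22, 33, 44]
    pool.free(a)

    # Request B is shorter and writes only the first two slots.
    b = pool.allocate()
    pool.blocks[b][:2] = [7, 8]

    # Slots B never wrote: stale data from A unless the block was cleared.
    return pool.blocks[b][2:]


if __name__ == "__main__":
    print("without clearing:", leaked_after_reuse(clear_on_free=False))
    print("with clearing:   ", leaked_after_reuse(clear_on_free=True))
```

In the toy model, the shorter second request sees the first request's leftover values only when the pool skips clearing. How the actual fix in vllm is implemented is not stated in this advisory.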

Impact

Exploitation leads to the use of uninitialized resources: KV cache blocks carry over stale data from previous requests. This can result in non-deterministic outputs, which is most noticeable at low sampling temperatures, where responses are expected to be consistent. In a shared environment, it could allow one user's data to affect another user's responses.

Reproduction

The vulnerability can be reproduced by sending concurrent requests to a vllm server running an affected version. Neither prefix caching nor any special configuration is required. Start the server through the vllm OpenAI API server entry point with a model that supports variable-length sequences, then send a mix of short and long requests to trigger the KV block corruption issue.
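The request mix described above might be driven by a script along the following lines. This is a rough sketch under stated assumptions, not the published proof-of-concept: it assumes a server started with something like `python -m vllm.entrypoints.openai.api_server --model <model>` and listening on `localhost:8000`, and the model name, prompts, token limits, and helper names (`build_payloads`, `complete`, `distinct_outputs`) are placeholders chosen for illustration.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

BASE_URL = "http://localhost:8000"  # assumption: local server on the default port
MODEL = "your-model-name"           # placeholder: the model the server was started with


def build_payloads(model: str, rounds: int = 8) -> list[dict]:
    """Build an alternating mix of short and long completion requests."""
    short_prompt = "Name one primary color."
    long_prompt = "Summarize this text. " + "The quick brown fox jumps over the lazy dog. " * 50
    payloads = []
    for _ in range(rounds):
        for prompt, max_tokens in ((short_prompt, 16), (long_prompt, 128)):
            payloads.append({
                "model": model,
                "prompt": prompt,
                "max_tokens": max_tokens,
                "temperature": 0.0,  # greedy decoding, so repeats should normally agree
            })
    return payloads


def complete(payload: dict) -> str:
    """POST one request to the OpenAI-compatible completions endpoint."""
    request = urllib.request.Request(
        BASE_URL + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


def distinct_outputs(payloads: list[dict], outputs: list[str]) -> dict[str, int]:
    """Count how many different completions each identical prompt received."""
    seen: dict[str, set] = {}
    for payload, output in zip(payloads, outputs):
        seen.setdefault(payload["prompt"], set()).add(output)
    return {prompt: len(texts) for prompt, texts in seen.items()}


if __name__ == "__main__":
    payloads = build_payloads(MODEL)
    # Send everything at once so short and long sequences share the KV cache.
    with ThreadPoolExecutor(max_workers=len(payloads)) as pool:
        outputs = list(pool.map(complete, payloads))
    for prompt, count in distinct_outputs(payloads, outputs).items():
        print(f"{count} distinct output(s) for prompt starting {prompt[:30]!r}")
```

More than one distinct output for the same prompt at temperature 0 is a hint rather than proof, since batching can also introduce small numerical differences. Comparing results between an affected and a fixed version gives a clearer signal.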

Remediation

Users are advised to update to vllm version 0.19.1 or later, where this vulnerability has been fixed.
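To check whether an installed copy falls in the affected range, a comparison along these lines may help. It is a minimal sketch: `parse_release` and `is_affected` are hypothetical helpers, the parser looks only at the leading numeric components (so a pre-release such as `0.19.1rc1` is treated like `0.19.1`), and the `packaging` library's `Version` class is a more robust choice where it is available.

```python
from importlib.metadata import PackageNotFoundError, version

FIXED_RELEASE = (0, 19, 1)  # first fixed version according to this advisory


def parse_release(version_string: str) -> tuple:
    """Extract the leading numeric components, e.g. '0.19.0+cu124' -> (0, 19, 0)."""
    parts = []
    for piece in version_string.split("+")[0].split("."):
        digits = ""
        for char in piece:
            if not char.isdigit():
                break
            digits += char
        if not digits:
            break  # stop at suffixes such as 'dev3'
        parts.append(int(digits))
    return tuple(parts)


def is_affected(version_string: str) -> bool:
    """True when the version is older than the first fixed release."""
    return parse_release(version_string) < FIXED_RELEASE


if __name__ == "__main__":
    try:
        installed = version("vllm")
    except PackageNotFoundError:
        print("vllm is not installed in this environment")
    else:
        status = "affected, update advised" if is_affected(installed) else "not affected"
        print(f"vllm {installed}: {status}")
```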

Added: Apr 27, 2026, 5:21 PM
Updated: Apr 27, 2026, 5:21 PM

Vulnerability Rating

Custom Algorithm
spread: 2.6
impact: 0.6
exploitability: 5.8
remediation: 7.7
relevance: 6.8
threat: 6.4
urgency: 2.9
incentive: 0.0

Our algorithm analyzes dozens of metrics to generate these 8 key vulnerability categories, which are then combined to calculate the overall risk score.