vLLM Cache Salting Vulnerability Allows Timing-Based Side-Channel Attacks

Vulnerability

A timing-based side-channel vulnerability has been identified in vLLM, an inference engine for large language models, in versions prior to 0.9.0. The issue arises from the prefix caching mechanism: when a prompt's leading chunks match chunks that are already cached, those chunks are reused and the prefill phase completes noticeably faster. An attacker who can submit prompts and measure response times can use this difference to detect prompt reuse and to test guesses about sensitive inputs, potentially leaking private information.
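The mechanism can be illustrated with a toy model. The sketch below is not vLLM's implementation: the block size, the `ToyPrefixCache` class, and the `salt` parameter are assumptions made for illustration. It chain-hashes fixed-size token blocks and reports how many tokens still need to be computed, which is the quantity that shows up as latency. The optional salt shows the general idea that the "cache salting" in the title points at: prompts hashed under different salts never share cache entries.

```python
import hashlib

BLOCK_SIZE = 4  # tokens per cache block (illustrative value, not vLLM's default)


def block_hashes(tokens, salt=""):
    """Chain-hash each full block so a hash depends on the whole prefix before it."""
    hashes = []
    parent = salt  # hypothetical per-tenant salt; it changes every hash in the chain
    full = len(tokens) - len(tokens) % BLOCK_SIZE  # ignore a trailing partial block
    for start in range(0, full, BLOCK_SIZE):
        block = tokens[start:start + BLOCK_SIZE]
        data = (parent + "|" + " ".join(block)).encode()
        digest = hashlib.sha256(data).hexdigest()
        hashes.append(digest)
        parent = digest
    return hashes


class ToyPrefixCache:
    """Toy stand-in for a prefix cache: remembers which block hashes were seen."""

    def __init__(self):
        self._seen = set()

    def prefill(self, tokens, salt=""):
        """Return how many tokens must be computed rather than served from cache."""
        hashes = block_hashes(tokens, salt)
        hits = 0
        for h in hashes:
            if h not in self._seen:
                break  # reuse stops at the first block that is not cached
            hits += 1
        self._seen.update(hashes)
        return len(tokens) - hits * BLOCK_SIZE
```

In this model, a prompt that shares its first block with an earlier prompt needs less prefill work, while the same prompt submitted under a different salt gets no reuse at all.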

Impact

An attacker who exploits this vulnerability can infer the content of cached prompts by observing latency differences, potentially leading to the leakage of sensitive information.

Reproduction

The vulnerability can be reproduced by processing prompts in vLLM versions prior to 0.9.0, with the PagedAttention mechanism enabled. Timing differences can be exploited by measuring the Time to First Token (TTFT) for prompts that share matching prefixes, allowing an attacker to infer cached content.
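When checking whether a deployment shows this behavior, the TTFT comparison can be reduced to a simple statistic. The helper below is a minimal sketch, not part of vLLM: the function name and the `min_speedup` threshold are assumptions, and the TTFT samples are expected to come from your own measurements against a test instance. It compares medians to reduce the effect of network jitter.

```python
from statistics import median


def likely_cache_hit(baseline_ttft, probe_ttft, min_speedup=0.2):
    """Guess whether probe requests were served from a cached prefix.

    baseline_ttft: TTFT samples in seconds for prompts with a fresh prefix.
    probe_ttft: TTFT samples in seconds for the prompt under test.
    min_speedup: hypothetical threshold, the fraction by which the median
        TTFT must drop before the probe is treated as a cache hit.
    """
    if not baseline_ttft or not probe_ttft:
        raise ValueError("both sample lists must be non-empty")
    base = median(baseline_ttft)
    probe = median(probe_ttft)
    return probe < base * (1.0 - min_speedup)
```

A suitable threshold depends on prompt length, hardware, and network noise, so it should be calibrated per deployment rather than taken from this sketch.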

Remediation

Users can update to vLLM version 0.9.0 or later, where this vulnerability has been patched.
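To confirm that an environment is on a fixed release, the installed version can be compared against 0.9.0. The following is a minimal sketch using only the Python standard library; the helper names are hypothetical, and the parser deliberately looks only at the leading numeric release, so a pre-release such as a release candidate of 0.9.0 would also be reported as patched.

```python
import re
from importlib import metadata

PATCHED = (0, 9, 0)  # first fixed release according to this advisory


def parse_release(version):
    """Extract the leading numeric release, e.g. '0.8.5.post1' -> (0, 8, 5)."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if m is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part or 0) for part in m.groups())


def is_patched(version):
    """True if the version string is at or above the patched release."""
    return parse_release(version) >= PATCHED


def installed_vllm_is_patched():
    """True/False for the installed vLLM package, or None if it is not installed."""
    try:
        return is_patched(metadata.version("vllm"))
    except metadata.PackageNotFoundError:
        return None
```

For example, `is_patched("0.8.5")` is false and `is_patched("0.9.0")` is true.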

Added: Jun 9, 2025, 7:46 PM
Updated: Jun 9, 2025, 7:46 PM

Vulnerability Rating

Custom Algorithm
spread: 2.6
impact: 2.5
exploitability: 6.9
remediation: 7.7
relevance: 0.0
threat: 4.8
urgency: 2.9
incentive: 1.7

Our algorithm analyzes dozens of metrics to generate these 8 key vulnerability categories, which are then combined to calculate the overall risk score.