vLLM
cpe:2.3:a:vllm:vllm:*:*:*:*:*:*:*
- < 0.9.0
vLLM, an inference engine for large language models, contains a timing-based side-channel vulnerability in versions prior to 0.9.0. The issue arises from the prefix caching mechanism. When a new prompt is processed and one of its prefix chunks matches a chunk that is already cached, that chunk's prefill work is reused, so the prefill step completes faster. These timing differences are large enough to be measured, which lets an attacker infer whether a prompt prefix was previously processed. By measuring response times for guessed inputs, an attacker could potentially recover sensitive inputs and leak private information.
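The mechanism described above can be illustrated with a toy model. The sketch below is not vLLM's implementation; it assumes a block-level prefix cache keyed by chained hashes of fixed-size token blocks, a whitespace "tokenizer", and a linear latency model. The class `ToyPrefixCache` and the constants `BASE_MS` and `PER_TOKEN_MS` are hypothetical names chosen for illustration.

```python
import hashlib
from typing import List, Set

# Hypothetical latency model (illustrative numbers, not measurements of vLLM):
BASE_MS = 5.0       # fixed per-request overhead
PER_TOKEN_MS = 2.0  # prefill cost per token that is NOT served from the cache


class ToyPrefixCache:
    """Simplified block-level prefix cache; an assumption, not vLLM's code."""

    def __init__(self, block_size: int = 4) -> None:
        self.block_size = block_size
        self.cached: Set[str] = set()

    def _block_hashes(self, tokens: List[str]) -> List[str]:
        # Each full block's hash chains in its parent's hash, so a block only
        # matches when the entire prefix before it also matches.
        hashes: List[str] = []
        parent = ""
        for start in range(0, len(tokens) - self.block_size + 1, self.block_size):
            block = tokens[start:start + self.block_size]
            digest = hashlib.sha256((parent + "|" + " ".join(block)).encode()).hexdigest()
            hashes.append(digest)
            parent = digest
        return hashes

    def prefill(self, tokens: List[str]) -> float:
        """Run a simulated prefill and return its simulated TTFT in milliseconds."""
        hashes = self._block_hashes(tokens)
        hit_blocks = 0
        for digest in hashes:
            if digest not in self.cached:
                break  # prefix reuse stops at the first uncached block
            hit_blocks += 1
        self.cached.update(hashes)  # this request's blocks are now cached as well
        uncached = len(tokens) - hit_blocks * self.block_size
        return BASE_MS + PER_TOKEN_MS * uncached


cache = ToyPrefixCache()
# The victim's request populates the shared cache.
cache.prefill("the patient name is alice and diagnosis is flu".split())

# The attacker probes candidate prefixes and compares the simulated TTFT.
for guess in ["bob", "alice"]:
    probe = f"the patient name is {guess} and diagnosis is".split()
    print(guess, cache.prefill(probe))
```

In this toy run the `alice` probe reports a lower simulated TTFT (5.0) than the `bob` probe (13.0), because both of its blocks are found in the victim's cached entries while the `bob` probe reuses only the first block. Since matching happens per block, a guess is only confirmed once it completes a full block.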
Exploitation enables timing-based side-channel attacks in which an attacker infers cached prompt content from latency differences, potentially leaking sensitive information.
The vulnerability can be reproduced on vLLM versions prior to 0.9.0 with the PagedAttention prefix caching mechanism enabled. An attacker measures the Time to First Token (TTFT) for prompts that share a prefix with previously processed prompts; a noticeably shorter TTFT indicates a cache hit and therefore reveals cached content.
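The TTFT comparison described above can be sketched generically. The helpers below are hypothetical: `measure_ttft` accepts any zero-argument callable that sends a request and returns an iterator of streamed chunks (no specific client or vLLM API is assumed), and `looks_cached` applies a simple heuristic with a caller-chosen `margin`. Only the first submission of a given probe is informative, because that request itself inserts its prefix into the cache, so the probe is compared against the median TTFT of unique prompts of similar length that should miss the cache.

```python
import statistics
import time
from typing import Callable, Iterable, Sequence


def measure_ttft(start_stream: Callable[[], Iterable[object]]) -> float:
    """Seconds from issuing the request until the first streamed chunk arrives."""
    start = time.perf_counter()
    for _ in start_stream():
        return time.perf_counter() - start
    raise ValueError("stream produced no items")


def looks_cached(probe_ttft: float, baseline_ttfts: Sequence[float], margin: float) -> bool:
    """Heuristic: a first-time probe is faster than the uncached baseline median by >= margin."""
    return statistics.median(baseline_ttfts) - probe_ttft >= margin
```

Using the median of several baseline requests damps network and scheduling jitter; the choice of `margin` is left to the caller and would need calibrating against the deployment being tested.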
Upgrade to vLLM 0.9.0 or later, which patches this vulnerability.