vLLM
cpe:2.3:a:vllm:vllm:*:*:*:*:*:*:*:*
- < 0.7.2
vLLM, a high-throughput inference engine for large language models, is vulnerable to hash collisions in its prefix caching. The issue affects vLLM versions prior to 0.7.2 and can be triggered with maliciously crafted prompts, because prefix caching relies on Python's built-in hash function. As of Python 3.12, the hash value of None is a predictable constant, which makes deliberate collisions more feasible. A successful collision causes cached data generated from different content to be reused, which can produce unintended behavior and degrade the accuracy of the model's output.
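The following is a minimal sketch of why Python's built-in hash function is unsuitable as a collision-resistant cache key. It is not vLLM's code: the chained_block_hash helper and the token values are hypothetical, and the example assumes CPython, where the hash of a non-negative integer is reduced modulo sys.hash_info.modulus. The colliding value is far larger than any real vocabulary ID, so this illustrates the weakness of the hash function rather than a working exploit.

```python
import sys


def colliding_int(value: int) -> int:
    """Return a different non-negative int with the same built-in hash.

    Assumes CPython, where hash(n) == n % sys.hash_info.modulus for
    non-negative integers.
    """
    return value + sys.hash_info.modulus


def chained_block_hash(parent_hash, token_ids):
    """Hypothetical block key: built-in hash over (parent hash, token IDs).

    The first block has no parent, so None stands in for the parent hash.
    """
    return hash((parent_hash, tuple(token_ids)))


# Two blocks with different contents...
block_a = [101, 2023, 2003]
block_b = [101, colliding_int(2023), 2003]

# ...that nevertheless map to the same key, because the tuple hash is built
# from the element hashes and hash(None) contributes the same value to both.
same_key = chained_block_hash(None, block_a) == chained_block_hash(None, block_b)
print("contents differ:", block_a != block_b)
print("keys collide:   ", same_key)
```

Because the hash of None is a fixed constant on Python 3.12, the None placeholder adds no per-process unpredictability to the key of the first block.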
Exploitation allows prefix cache entries to be reused on the basis of predictable hash collisions, which can interfere with the accuracy of responses in a shared model inference environment.
The vulnerability can be reproduced with a vLLM version prior to 0.7.2 running on Python 3.12. Prompts can be crafted to collide by exploiting the predictable hash of None, so that cache entries are mixed up during processing. Responses may then reflect cached data from a different prompt, producing "mixed summaries" or otherwise incorrect output.
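The cache mix-up described above can be illustrated with a toy model. This is a sketch under stated assumptions, not a reproduction against vLLM: ToyPrefixCache, builtin_key, sha256_key, and fake_compute are all hypothetical names, the "attacker" token value relies on CPython's integer hashing, and the SHA-256 variant is shown only as one general mitigation, not as a description of the fix shipped in 0.7.2.

```python
import hashlib
import sys


class ToyPrefixCache:
    """Hypothetical cache that stores one computed result per key."""

    def __init__(self, key_fn):
        self._key_fn = key_fn
        self._store = {}

    def get_or_compute(self, token_ids, compute):
        key = self._key_fn(token_ids)
        if key not in self._store:
            self._store[key] = compute(token_ids)
        # On a key collision this returns data computed for other content.
        return self._store[key]


def builtin_key(token_ids):
    """Weak key: Python's built-in hash, with None as the parent placeholder."""
    return hash((None, tuple(token_ids)))


def sha256_key(token_ids):
    """Stronger key: cryptographic digest of the serialized token IDs."""
    data = ",".join(str(t) for t in token_ids).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def fake_compute(token_ids):
    """Stand-in for the expensive per-prefix computation."""
    return f"state-for-{list(token_ids)}"


victim = [7, 8, 9]
# Different content with the same built-in hash (CPython assumption).
attacker = [7, 8 + sys.hash_info.modulus, 9]

weak = ToyPrefixCache(builtin_key)
weak.get_or_compute(attacker, fake_compute)  # attacker populates the cache
victim_result_weak = weak.get_or_compute(victim, fake_compute)

strong = ToyPrefixCache(sha256_key)
strong.get_or_compute(attacker, fake_compute)
victim_result_strong = strong.get_or_compute(victim, fake_compute)

print("built-in hash key:", victim_result_weak)
print("sha256 key:       ", victim_result_strong)
```

With the built-in hash as the key, the second request is served the state computed for the first request's tokens; with the digest-based key, each request gets its own entry.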
Users are advised to upgrade to vLLM version 0.7.2 or later, where this vulnerability has been fixed.
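A deployment can check whether its installed vLLM falls in the affected range. The sketch below is a simple illustration: is_affected and installed_vllm_is_affected are hypothetical helpers, the parser compares only the leading numeric release segments (so a pre-release such as 0.7.2rc1 is treated as 0.7.2), and the installed version is read with the standard importlib.metadata module.

```python
import re
from importlib import metadata

FIXED_VERSION = (0, 7, 2)  # first fixed release according to this advisory


def is_affected(version_string: str) -> bool:
    """Return True if the version's numeric release is below 0.7.2.

    Only the leading major.minor[.patch] digits are compared; suffixes such
    as ".post1" or "rc1" are ignored in this simplified check.
    """
    match = re.match(r"^(\d+)\.(\d+)(?:\.(\d+))?", version_string.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version_string!r}")
    major, minor, patch = (int(part or 0) for part in match.groups())
    return (major, minor, patch) < FIXED_VERSION


def installed_vllm_is_affected():
    """Check the installed vllm package; return None if it is not installed."""
    try:
        return is_affected(metadata.version("vllm"))
    except metadata.PackageNotFoundError:
        return None


print("installed vLLM affected:", installed_vllm_is_affected())
```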