vLLM
cpe:2.3:a:vllm:vllm:*:*:*:*:*:*:*:*
- >= 0.8.0, < 0.8.5
A denial-of-service vulnerability has been identified in vLLM, a high-throughput inference engine for large language models. The issue affects versions from 0.8.0 up to, but not including, 0.8.5, and arises in the input preprocessing logic of the multimodal tokenizer. The tokenizer dynamically replaces audio and image placeholder tokens with repeated tokens based on precomputed lengths. Because the replacement is built with inefficient list concatenation, the process has quadratic time complexity in the number of placeholders, so an attacker can exhaust resources by sending specially crafted inputs. The vulnerability is patched in version 0.8.5.
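To make the complexity claim concrete, here is a minimal Python sketch of the general pattern. It is not vLLM's actual code: the function names, the token IDs, and the `lengths` mapping are hypothetical, chosen only to contrast rebuilding a list on every step with appending in place.

```python
from typing import Dict, List


def expand_quadratic(tokens: List[int], lengths: Dict[int, int]) -> List[int]:
    """Hypothetical illustration of the slow pattern (not vLLM's code)."""
    out: List[int] = []
    for tok in tokens:
        repeat = lengths.get(tok, 1)  # non-placeholders expand to one token
        # `out + [...]` allocates a new list and copies every element
        # accumulated so far, so total work grows quadratically.
        out = out + [tok] * repeat
    return out


def expand_linear(tokens: List[int], lengths: Dict[int, int]) -> List[int]:
    """Same result, but appends in place, so work grows linearly."""
    out: List[int] = []
    for tok in tokens:
        out.extend([tok] * lengths.get(tok, 1))
    return out
```

Both functions return the same list; only the cost differs. Whether the 0.8.5 patch uses this exact approach is not stated in this advisory.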
Exploitation of this vulnerability leads to significant CPU and memory exhaustion, causing a denial-of-service condition.
The vulnerability can be reproduced by sending inputs that include a large number of placeholder tokens for audio or images. For example, an input containing 10,000 audio placeholder tokens can trigger approximately 100 million operations (10,000 squared) due to the quadratic time complexity of the token processing algorithm.
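The order of magnitude can be checked with a small cost model. The sketch below assumes, purely for illustration, that each placeholder causes the whole accumulated list to be rebuilt; `copied_elements` and its `repeat` parameter are hypothetical names, and the exact constant depends on how many tokens each placeholder expands to.

```python
def copied_elements(n: int, repeat: int = 1) -> int:
    """Count list elements written when a list is rebuilt on each of n steps.

    Assumed cost model: every step produces a fresh list holding all
    elements accumulated so far plus `repeat` new ones.
    """
    total = 0
    length = 0
    for _ in range(n):
        length += repeat   # size of the newly built list
        total += length    # every element of it is written once
    return total


n = 10_000
# The loop matches the closed form repeat * n * (n + 1) / 2.
assert copied_elements(n) == n * (n + 1) // 2
print(copied_elements(n))  # on the order of n**2 = 100,000,000
```

Under this model the count for 10,000 placeholders is in the tens of millions when each expands to one token and passes 100 million when each expands to two, which is consistent with the approximate figure quoted above.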
Users should upgrade to vLLM version 0.8.5 or later, where this vulnerability has been fixed.
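A quick way to check whether an installed copy falls in the affected range is sketched below. The helper names `parse_release` and `is_affected` are hypothetical, and the parser is deliberately simple: it reads only the leading numeric components, so pre-release tags such as `rc1` are ignored and would need a full version parser to handle correctly.

```python
import re
from importlib import metadata
from typing import Tuple


def parse_release(version: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from the start of a version string."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if m is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(m.group(1)), int(m.group(2)), int(m.group(3) or 0)


def is_affected(version: str) -> bool:
    """True if the version is >= 0.8.0 and < 0.8.5 (the range listed above)."""
    return (0, 8, 0) <= parse_release(version) < (0, 8, 5)


if __name__ == "__main__":
    try:
        installed = metadata.version("vllm")
    except metadata.PackageNotFoundError:
        print("vllm is not installed in this environment")
    else:
        status = "affected, upgrade to 0.8.5 or later" if is_affected(installed) else "not in the affected range"
        print(f"vllm {installed}: {status}")
```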