vLLM Multimodal Tokenizer Input Processing Denial-of-Service Vulnerability

Vulnerability

A denial-of-service vulnerability has been identified in vLLM, a high-throughput inference engine for large language models. The issue affects versions from 0.8.0 up to, but not including, 0.8.5, and arises from the input preprocessing logic of the multimodal tokenizer. The tokenizer replaces each audio or image placeholder token with a run of repeated tokens whose length is precomputed. Because each replacement is performed with list concatenation, which copies the list every time, the total cost grows quadratically with the number of placeholders, so a malicious actor can exhaust resources by sending specially crafted inputs. The vulnerability has been patched in version 0.8.5.
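The difference between the quadratic pattern and a linear alternative can be sketched as follows. This is a minimal illustration, not vLLM's actual code: the function names, the placeholder ID, and the `repeat_lengths` argument are all hypothetical, and the sketch assumes one precomputed length per placeholder.

```python
def expand_quadratic(token_ids, placeholder_id, repeat_lengths):
    """Hypothetical sketch: rebuild the whole list on every placeholder hit."""
    lengths = iter(repeat_lengths)
    i = 0
    while i < len(token_ids):
        if token_ids[i] == placeholder_id:
            n = next(lengths)
            # Slicing and concatenation copy the entire list, so each hit
            # costs O(len(token_ids)); many hits give quadratic total work.
            token_ids = token_ids[:i] + [placeholder_id] * n + token_ids[i + 1:]
            i += n  # skip past the run that was just inserted
        else:
            i += 1
    return token_ids


def expand_linear(token_ids, placeholder_id, repeat_lengths):
    """Hypothetical sketch: build the output once, in a single pass."""
    lengths = iter(repeat_lengths)
    out = []
    for tok in token_ids:
        if tok == placeholder_id:
            # extend() appends in place; no copy of the existing output.
            out.extend([placeholder_id] * next(lengths))
        else:
            out.append(tok)
    return out
```

Both functions return the same result; only the amount of copying differs. For example, with placeholder ID 9, the input `[1, 9, 2, 9]` and lengths `[3, 2]` expand to `[1, 9, 9, 9, 2, 9, 9]` in either version.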

Impact

Exploitation of this vulnerability leads to significant CPU and memory exhaustion, causing a denial-of-service condition.

Reproduction

The vulnerability can be reproduced by sending inputs that include a large number of placeholder tokens for audio or images. For example, an input containing 10,000 audio placeholder tokens can trigger approximately 100 million operations (10,000 × 10,000), because each of the 10,000 replacements copies a list of at least 10,000 elements.
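The 100 million figure can be checked with a rough cost model. The sketch below is an assumption-laden estimate, not a measurement of vLLM: it supposes the input consists only of placeholders, that each replacement copies the whole list once, and that every placeholder expands to `repeat_length` tokens. The function name and parameters are hypothetical.

```python
def estimated_copies(num_placeholders, repeat_length=1):
    """Estimate elements copied when every replacement rebuilds the list."""
    total = 0
    length = num_placeholders  # input made up solely of placeholders
    for _ in range(num_placeholders):
        # One placeholder is swapped for `repeat_length` tokens.
        length += repeat_length - 1
        # The rebuilt list is copied in full.
        total += length
    return total


print(estimated_copies(10_000))  # 100000000
```

With a repeat length greater than one the list grows with each replacement, so the real cost would be higher than this baseline.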

Remediation

Users should upgrade to vLLM version 0.8.5 or later, where this vulnerability has been fixed.
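A quick way to see whether an installed copy falls in the affected range is sketched below. It assumes the package is installed under the distribution name `vllm` and compares only the leading numeric components of the version string, so pre-release or local version suffixes are ignored; the helper names are hypothetical.

```python
import re
from importlib import metadata


def parse_release(version_string):
    """Return leading numeric components, e.g. '0.8.4.post1' -> (0, 8, 4)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version_string)
    if match is None:
        raise ValueError(f"unrecognized version: {version_string!r}")
    return tuple(int(part or 0) for part in match.groups())


def is_affected(version_string):
    """True for versions from 0.8.0 up to, but not including, 0.8.5."""
    return (0, 8, 0) <= parse_release(version_string) < (0, 8, 5)


def installed_vllm_is_affected():
    """Check the locally installed package; False if vllm is not installed."""
    try:
        return is_affected(metadata.version("vllm"))
    except metadata.PackageNotFoundError:
        return False
```

Calling `installed_vllm_is_affected()` in the environment that serves the model indicates whether an upgrade is needed.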

Added: Jun 9, 2025, 7:46 PM
Updated: Jun 9, 2025, 7:46 PM

Vulnerability Rating

Custom Algorithm
spread: 2.6
impact: 2.5
exploitability: 6.2
remediation: 7.7
relevance: 0.0
threat: 6.4
urgency: 2.9
incentive: 1.7

Our algorithm analyzes dozens of metrics to generate these 8 key vulnerability categories, which are then combined to calculate the overall risk score.