vLLM
cpe:2.3:a:vllm:vllm:*:*:*:*:*:*:*:*
- >= 0.6.1, < 0.20.0
A token injection vulnerability has been identified in vLLM, an inference and serving engine for large language models. It affects versions from 0.6.1 up to, but not including, 0.20.0, and lies in vLLM's multimodal processing. Special tokens embedded in a text-only prompt, which can be submitted without authentication, are treated as control tokens rather than as literal text. When such a prompt contains image or video placeholder sequences without corresponding media data, vLLM indexes into empty grids during input-position calculation. The resulting IndexError is not handled, so the worker terminates or service availability degrades. The affected multimodal paths are those that use 'image_grid_thw' or 'video_grid_thw'.
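The failure mode can be illustrated with a toy model. The sketch below is not vLLM's code: the function `compute_positions` and its token-list input are hypothetical, and only the names 'image_grid_thw' and '<|image_pad|>' come from the advisory. It shows why one placeholder token paired with an empty grid list ends in an IndexError.

```python
# Toy model of the failure mode; this is NOT vLLM's actual implementation.
IMAGE_PAD = "<|image_pad|>"  # placeholder token named in the advisory


def compute_positions(tokens, image_grid_thw):
    """Pair each image placeholder in the prompt with its (t, h, w) grid entry."""
    grids = []
    image_index = 0
    for token in tokens:
        if token == IMAGE_PAD:
            # A text-only prompt supplies no grids, so this lookup raises IndexError.
            grids.append(image_grid_thw[image_index])
            image_index += 1
    return grids


def demo():
    """Run the toy model on a placeholder-only prompt with no image data."""
    tokens = ["<|vision_start|>", IMAGE_PAD, "<|vision_end|>"]
    try:
        compute_positions(tokens, image_grid_thw=[])
    except IndexError as exc:
        return f"IndexError: {exc}"
    return "no error"
```

In this toy model the exception is caught by `demo()`; in the affected versions, according to the advisory, the equivalent error goes unhandled and takes the worker down.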
Exploitation raises an unhandled exception that terminates the worker process and degrades service availability. In some cases, capacity remains reduced until the worker is restarted manually.
The vulnerability can be reproduced by sending a text-only prompt that includes special vision tokens, such as '<|vision_start|><|image_pad|><|vision_end|>', without accompanying image or video data. This can be done through vLLM's OpenAI-compatible API with a model that supports multimodal processing, such as Qwen2.5-VL.
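Where an immediate upgrade is not possible, a gateway in front of the server could screen for requests of this shape. The following is a minimal sketch, not an official mitigation. It assumes OpenAI-style chat `messages`. The function names, the token list (taken from the example above), and the `video_url` part type are assumptions that would need adjusting per model and deployment.

```python
# Hypothetical pre-flight check; not part of vLLM.
VISION_TOKENS = ("<|vision_start|>", "<|image_pad|>", "<|vision_end|>")
MEDIA_PART_TYPES = ("image_url", "video_url")  # assumed content-part types


def has_media(messages):
    """Return True if any message carries an image or video content part."""
    for message in messages:
        content = message.get("content")
        if isinstance(content, list):
            for part in content:
                if isinstance(part, dict) and part.get("type") in MEDIA_PART_TYPES:
                    return True
    return False


def iter_text(messages):
    """Yield every text fragment from string or list-style message content."""
    for message in messages:
        content = message.get("content")
        if isinstance(content, str):
            yield content
        elif isinstance(content, list):
            for part in content:
                if isinstance(part, dict) and part.get("type") == "text":
                    yield part.get("text", "")


def is_suspicious(messages):
    """Flag requests that contain vision tokens but no image or video data."""
    if has_media(messages):
        return False
    return any(tok in text for text in iter_text(messages) for tok in VISION_TOKENS)
```

A gateway could reject any request for which `is_suspicious(messages)` is true. The check is deliberately coarse: it does not verify that the number of placeholders matches the number of attached media items, so it complements an upgrade rather than replacing one.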
Users can update to vLLM version 0.20.0 or later, where this vulnerability has been fixed.
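To check whether a given installation falls in the affected range (>= 0.6.1, < 0.20.0), the version string can be compared numerically rather than as text. This is a minimal sketch using only the standard library. The helper names are hypothetical, and the parser ignores pre-release and local suffixes, which is an assumption that may not suit every build.

```python
import re

AFFECTED_MIN = (0, 6, 1)  # first affected release per the advisory
FIXED_IN = (0, 20, 0)     # first fixed release per the advisory


def parse_release(version):
    """Extract the leading major.minor[.patch] numbers as a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(group or 0) for group in match.groups())


def is_affected(version):
    """Return True if the version lies in the range >= 0.6.1, < 0.20.0."""
    return AFFECTED_MIN <= parse_release(version) < FIXED_IN
```

In a live environment the installed version can be read with `importlib.metadata.version("vllm")` and passed to `is_affected`. Comparing tuples of integers avoids the string-comparison pitfall where "0.10.2" sorts before "0.6.1".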