vLLM Denial-of-Service Vulnerability via Improperly Shaped Multimodal Embedding Inputs

Vulnerability

A denial-of-service vulnerability has been identified in vLLM, an inference and serving engine for large language models. It affects versions from 0.5.5 up to, but not including, 0.11.1. The vulnerability arises when a user passes multimodal embedding inputs that have the correct number of dimensions but an incorrect shape, such as a mismatched hidden dimension. Such inputs can crash the vLLM engine while it is serving multimodal models, regardless of whether the specific model is designed to handle embedding inputs. The problem dates back to version 0.5.5, which introduced image embedding support.
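
The difference between "correct number of dimensions" and "correct shape" can be illustrated with a minimal Python sketch. The function name check_embedding_shape, the accepted ranks (2 or 3), and the assumption that the hidden dimension is the last axis are all hypothetical choices made for illustration; this is not vLLM's actual validation logic.

```python
from typing import Sequence


def check_embedding_shape(shape: Sequence[int], expected_hidden_size: int) -> None:
    """Reject an embedding shape whose rank or hidden size is unexpected.

    Assumptions (illustrative only): embeddings are 2-D (tokens, hidden)
    or 3-D (items, tokens, hidden), and the hidden size is the last axis.
    """
    # Rank check: this alone is not enough, which is the point of the advisory.
    if len(shape) not in (2, 3):
        raise ValueError(f"expected 2 or 3 dimensions, got {len(shape)}")

    # Every axis must be non-empty.
    if any(dim <= 0 for dim in shape):
        raise ValueError(f"all dimensions must be positive, got {tuple(shape)}")

    # Shape check: the last axis must match the model's hidden size.
    if shape[-1] != expected_hidden_size:
        raise ValueError(
            f"hidden dimension mismatch: got {shape[-1]}, "
            f"expected {expected_hidden_size}"
        )


if __name__ == "__main__":
    check_embedding_shape((1, 576, 4096), 4096)  # accepted
    try:
        check_embedding_shape((1, 576, 1024), 4096)  # right rank, wrong shape
    except ValueError as err:
        print(f"rejected: {err}")
```

Both shapes in the usage example have three dimensions, so a rank-only check would accept the second one. Rejecting it at the request boundary with an ordinary error is what keeps a single malformed request from reaching the engine.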

Impact

Exploitation of this vulnerability leads to a crash of the vLLM engine, causing a denial-of-service condition.

Reproduction

To reproduce this vulnerability, first ensure that the vLLM engine is running an affected version (0.5.5 or later, but earlier than 0.11.1). Then send a request to a multimodal model that includes embedding inputs with the correct number of dimensions but an incorrect shape, such as a wrong hidden dimension size. This can be done through the vLLM API by using the 'image_embeds' key in the 'multi_modal_data' dictionary, or by using the 'prompt_embeds' key for text embeddings. The engine crashes when it processes the malformed embeddings.

Remediation

Users should update to vLLM version 0.11.1 or later, which patches this vulnerability. For those unable to update, the '--limit-mm-per-prompt' flag can be set to 0 to disable non-text multimodal inputs, although this prevents multimodal models from accepting any non-text input.
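
To decide whether an installation needs the update, the affected range stated above can be expressed as a simple version comparison. The following is a minimal sketch in Python; the helper names parse_release and is_affected are hypothetical, and the parser reads only the leading MAJOR.MINOR.PATCH digits, so pre-release and development suffixes are ignored.

```python
import re

# Range taken from the advisory: 0.5.5 <= version < 0.11.1.
FIRST_AFFECTED = (0, 5, 5)
FIRST_FIXED = (0, 11, 1)


def parse_release(version: str) -> tuple:
    """Return (major, minor, patch) from the start of a version string.

    Limitation: suffixes such as 'rc1' or '.dev0' are ignored, so a
    pre-release of 0.11.1 is treated the same as the final 0.11.1.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_affected(version: str) -> bool:
    """True if the version falls inside the affected range."""
    return FIRST_AFFECTED <= parse_release(version) < FIRST_FIXED


if __name__ == "__main__":
    for candidate in ("0.5.4", "0.5.5", "0.11.0", "0.11.1"):
        print(candidate, is_affected(candidate))
```

The installed version string can be obtained with importlib.metadata.version("vllm") from the Python standard library and passed to is_affected.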

Added: Nov 21, 2025, 2:21 AM
Updated: Nov 21, 2025, 2:21 AM

Vulnerability Rating

Custom Algorithm
spread: 2.6
impact: 2.5
exploitability: 5.9
remediation: 7.9
relevance: 1.1
threat: 4.8
urgency: 2.9
incentive: 1.7

Our algorithm analyzes dozens of metrics to generate the 8 category scores above, which are then combined to calculate the overall risk score.