vLLM Denial-of-Service Vulnerability in Idefics3 Vision Models

Vulnerability

A denial-of-service vulnerability has been identified in vLLM, an inference and serving engine for large language models. It affects versions from 0.6.4 up to, but not including, 0.12.0. The issue arises when serving multimodal models that use the Idefics3 vision model implementation. By sending a specially crafted 1x1 pixel image, a user can trigger a tensor dimension mismatch that raises an unhandled runtime error and terminates the server. The vulnerability is patched in vLLM version 0.12.0.

Impact

Exploitation of this vulnerability causes the vLLM engine to crash, terminating the server process and disrupting service.

Reproduction

The vulnerability can be reproduced by sending a 1x1 pixel image in HWC (Height, Width, Channel) format to a vLLM server running a model that uses the Idefics3 vision implementation. The image processing misinterprets the dimensions of such an image, so the number of image patches is calculated incorrectly. When the engine then attempts to split the tensor based on that patch count, the split fails with a fatal error. The exception is not handled, and the server crashes.
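To illustrate why a 1x1 image in HWC format is ambiguous, the following minimal sketch uses a hypothetical layout heuristic. The function names (`guess_layout`, `interpreted_size`) and the heuristic itself are assumptions made for illustration and are not taken from the vLLM code base. The point is only that a shape of (1, 1, 3) can plausibly be read either as a 1x1 RGB image or as a 1x3 single-channel image.

```python
# Sizes that a naive heuristic accepts as a channel count (grayscale or RGB).
CHANNEL_COUNTS = (1, 3)


def guess_layout(shape):
    """Hypothetical heuristic: prefer channels-first when the first axis
    looks like a channel count, otherwise fall back to channels-last."""
    if len(shape) != 3:
        raise ValueError("expected a 3-dimensional image shape")
    if shape[0] in CHANNEL_COUNTS:
        return "CHW"
    if shape[2] in CHANNEL_COUNTS:
        return "HWC"
    raise ValueError("cannot infer the channel axis")


def interpreted_size(shape):
    """Return the height, width and channel count the heuristic would assume."""
    layout = guess_layout(shape)
    if layout == "CHW":
        channels, height, width = shape
    else:
        height, width, channels = shape
    return {"layout": layout, "height": height, "width": width, "channels": channels}


# An ordinary HWC image is read correctly because 224 is not a channel count.
normal = interpreted_size((224, 224, 3))

# A 1x1 HWC image is misread: the leading 1 is taken for the channel axis,
# so the image appears to be 1 pixel high and 3 pixels wide with 1 channel.
tiny = interpreted_size((1, 1, 3))

print(normal)
print(tiny)
```

Any later computation that derives a patch count from the misread height and width will disagree with the data actually produced for the image, which is the kind of mismatch described above.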

Remediation

Users should upgrade to vLLM version 0.12.0 or later, where this vulnerability has been fixed.
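As a rough aid for checking a deployment, the sketch below tests whether a version string falls inside the affected range stated in this advisory. It assumes the range is inclusive of 0.6.4 and exclusive of 0.12.0, and the helper names (`parse_release`, `is_affected`) are hypothetical. It compares only the numeric major.minor.patch prefix, so pre-release or locally built versions should be checked by hand.

```python
import re

# Range taken from this advisory; inclusive lower bound is an assumption.
FIRST_AFFECTED = (0, 6, 4)
FIRST_FIXED = (0, 12, 0)


def parse_release(version):
    """Extract the leading major.minor.patch numbers from a version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_affected(version):
    """Return True if the version lies in [0.6.4, 0.12.0).

    Suffixes such as 'rc1' or '+cu121' are ignored, so a release
    candidate of 0.12.0 is reported as not affected even though it
    may not contain the fix.
    """
    return FIRST_AFFECTED <= parse_release(version) < FIRST_FIXED


for candidate in ("0.6.3", "0.6.4", "0.11.2", "0.12.0"):
    print(candidate, is_affected(candidate))
```

On a host where vLLM is installed, the installed version string can be obtained with `importlib.metadata.version("vllm")` from the Python standard library and passed to `is_affected`.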

Added: Jan 10, 2026, 7:18 AM
Updated: Jan 10, 2026, 7:18 AM

Vulnerability Rating

Custom Algorithm

spread: 2.6
impact: 2.5
exploitability: 5.2
remediation: 7.7
relevance: 2.0
threat: 1.6
urgency: 2.9
incentive: 1.7

Our algorithm analyzes dozens of metrics to generate these 8 key vulnerability categories, which are then combined to calculate the overall risk score.