vLLM Denial-of-Service Vulnerability via Unvalidated Chat Template Parameters

Vulnerability

A denial-of-service vulnerability has been identified in vLLM, an inference and serving engine for large language models. This issue affects versions 0.5.5 prior to 0.11.1. The vulnerability arises in the /v1/chat/completions and /tokenize endpoints, where the chat_template_kwargs request parameter is used without proper validation. Exploiting this flaw allows an authenticated user to send requests that block the API server's processing for extended periods, delaying all other requests. The vulnerability is caused by the chat_template_kwargs being unpacked into the apply_hf_chat_template method without validation, enabling the override of optional parameters like tokenize, which can disrupt the server's event loop and hinder request handling.

Impact

Exploitation of this vulnerability leads to a significant denial-of-service condition on the vLLM server, particularly during Chat Completion or Tokenization requests.

Reproduction

To reproduce this vulnerability, send a request to the /v1/chat/completions or /tokenize endpoints with a chat_template_kwargs parameter that includes a tokenize value set to True. This will trigger the vulnerability by causing the server to process the request in a way that blocks other incoming requests.

Remediation

Users can update to vLLM version 0.11.1 or later, where this vulnerability has been patched.

Added: Nov 21, 2025, 2:19 AM
Updated: Nov 21, 2025, 2:19 AM

Vulnerability Rating

Custom Algorithm
spread
2.6
impact
2.5
exploitability
5.9
remediation
7.7
relevance
1.1
threat
4.8
urgency
2.9
incentive
1.7

Our algorithm analyzes dozens of metrics to generate these 8 key vulnerability categories, which are then combined to calculate the overall risk score.