vLLM
cpe:2.3:a:vllm:vllm:*:*:*:*:*:*:*
- >= 0.5.5
A denial-of-service vulnerability has been identified in vLLM, an inference and serving engine for large language models. This issue affects versions 0.5.5 prior to 0.11.1. The vulnerability arises in the /v1/chat/completions and /tokenize endpoints, where the chat_template_kwargs request parameter is used without proper validation. Exploiting this flaw allows an authenticated user to send requests that block the API server's processing for extended periods, delaying all other requests. The vulnerability is caused by the chat_template_kwargs being unpacked into the apply_hf_chat_template method without validation, enabling the override of optional parameters like tokenize, which can disrupt the server's event loop and hinder request handling.
Exploitation of this vulnerability leads to a significant denial-of-service condition on the vLLM server, particularly during Chat Completion or Tokenization requests.
To reproduce this vulnerability, send a request to the /v1/chat/completions or /tokenize endpoints with a chat_template_kwargs parameter that includes a tokenize value set to True. This will trigger the vulnerability by causing the server to process the request in a way that blocks other incoming requests.
Users can update to vLLM version 0.11.1 or later, where this vulnerability has been patched.
Our algorithm analyzes dozens of metrics to generate these 8 key vulnerability categories, which are then combined to calculate the overall risk score.