vLLM
cpe:2.3:a:vllm:vllm:*:*:*:*:*:*:*
- >= 0.5.2, < 0.8.5
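As a rough aid for checking an installation against the affected range listed above, the following sketch compares a version string with the bounds `>= 0.5.2, < 0.8.5`. The helper names `parse_release` and `is_affected` are illustrative and not part of vLLM. The parser only reads the leading numeric components, so pre-release or development builds (for example a release candidate of 0.8.5) are not distinguished from the final release and should be checked by hand.

```python
import re

AFFECTED_MIN = (0, 5, 2)  # inclusive lower bound from the advisory
FIXED_IN = (0, 8, 5)      # first patched release, exclusive upper bound


def parse_release(version: str) -> tuple[int, int, int]:
    """Return the leading MAJOR.MINOR.PATCH numbers of a version string.

    Suffixes such as '.post1' or '+cu121' are ignored. This is a simplification:
    pre-release markers like 'rc1' are ignored as well.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups(default="0")
    return int(major), int(minor), int(patch)


def is_affected(version: str) -> bool:
    """True if the version falls inside the range >= 0.5.2, < 0.8.5."""
    return AFFECTED_MIN <= parse_release(version) < FIXED_IN
```

On a host where vLLM is installed, the installed version can be obtained with `importlib.metadata.version("vllm")` and passed to `is_affected`.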
A denial-of-service and data exposure vulnerability has been identified in vLLM, a high-throughput inference engine for large language models. The issue affects vLLM versions from 0.5.2 up to, but not including, 0.8.5, specifically in multi-node deployments where vLLM uses ZeroMQ for inter-node communication. The primary host opens an XPUB socket bound to all interfaces, allowing any client with network access to connect and receive data broadcast for secondary vLLM hosts. The broadcast data is internal state information that is not directly useful to an attacker, but a client that connects and does not read it can slow down or block the publisher, causing a denial of service. The vulnerability has been patched in vLLM version 0.8.5.
Exploitation can lead to a denial-of-service condition on the vLLM host by disrupting the publication of data over the ZeroMQ socket, which is used for coordinating tasks in a multi-node deployment.
In a multi-node vLLM deployment, the primary host opens an XPUB ZeroMQ socket bound to all interfaces. This socket is used to broadcast internal state information to secondary vLLM hosts when tensor parallelism spans multiple hosts. Any client with network access to the primary host can connect to this socket and receive the broadcast data. By connecting to the socket multiple times and not reading the published data, a client can slow down or block the publisher, causing a denial-of-service condition.
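To illustrate what "bound to all interfaces" means for a ZeroMQ TCP endpoint, the sketch below flags endpoint strings whose host part is a wildcard. In ZeroMQ, `tcp://*:PORT` and `tcp://0.0.0.0:PORT` (and `tcp://[::]:PORT` for IPv6) ask the socket to listen on every interface, while an endpoint that names a specific address restricts it to that interface. The function name and the example endpoints are hypothetical and are not taken from the vLLM code base.

```python
# Host parts that make a ZeroMQ TCP socket listen on every interface.
WILDCARD_HOSTS = {"*", "0.0.0.0", "[::]"}


def binds_all_interfaces(endpoint: str) -> bool:
    """Return True if a ZeroMQ-style endpoint string binds TCP on all interfaces."""
    scheme, sep, rest = endpoint.partition("://")
    if not sep or scheme != "tcp":
        # ipc:// and inproc:// endpoints are not reachable over the network.
        return False
    # Split off the port at the last colon so bracketed IPv6 hosts stay intact.
    host, sep, _port = rest.rpartition(":")
    if not sep:
        return False
    return host in WILDCARD_HOSTS


# Illustrative endpoints only; the port numbers are placeholders.
for candidate in ("tcp://*:5555", "tcp://10.0.0.5:5555", "ipc:///tmp/example.sock"):
    print(candidate, "->", binds_all_interfaces(candidate))
```

A check like this only inspects configuration strings. It does not show which address a running process has actually bound.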
Users can upgrade to vLLM version 0.8.5 or later, where this vulnerability has been patched. If an upgrade is not immediately possible, ensure that the vLLM host is not exposed to untrusted networks and that only other vLLM hosts can connect to the TCP port used for the XPUB socket.
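One way to sanity-check the network restriction is to attempt a plain TCP connection to the primary host from a machine that should not have access. The sketch below is a minimal example using only the Python standard library. The function name, host, and port are placeholders: the port used by the XPUB socket has to be determined from your own deployment. A failed connection from one vantage point does not prove the port is unreachable from every untrusted network.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, running `port_is_reachable("primary.example.internal", 5555)` from outside the cluster should return `False` once firewall rules are in place. Both the hostname and the port in this call are made-up values.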