vLLM Remote Code Execution Vulnerability via Unsafe Deserialization in Mooncake Integration

Vulnerability

A remote code execution vulnerability has been identified in vLLM, a high-throughput inference engine for large language models. When vLLM is configured to use Mooncake, it exposes unsafe deserialization over ZMQ/TCP on all network interfaces. An attacker who can reach those sockets can execute arbitrary code on the hosts of a distributed deployment. The root cause is the Mooncake Pipe, which by design opens certain sockets to the network and deserializes whatever arrives on them. The vulnerability affects vLLM versions from 0.6.5 up to, but not including, 0.8.0.
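To make "all network interfaces" concrete, the sketch below classifies a ZMQ-style TCP endpoint string as a wildcard bind or not. The helper name `binds_all_interfaces` and the sample endpoints are hypothetical and are not taken from vLLM or Mooncake; the sketch only assumes the usual `tcp://host:port` endpoint form, where `*`, `0.0.0.0`, and `[::]` all mean "every interface".

```python
import ipaddress


def binds_all_interfaces(endpoint: str) -> bool:
    """Return True if a tcp:// endpoint listens on every interface.

    Hypothetical audit helper; not part of vLLM, Mooncake, or pyzmq.
    """
    scheme, _, rest = endpoint.partition("://")
    if scheme != "tcp":
        return False  # ipc://, inproc://, etc. are not reachable over TCP
    host, _, _port = rest.rpartition(":")
    host = host.strip("[]")  # drop IPv6 brackets, e.g. [::]
    if host == "*":
        return True  # ZMQ wildcard interface
    try:
        # 0.0.0.0 and :: are the "unspecified" (wildcard) addresses
        return ipaddress.ip_address(host).is_unspecified
    except ValueError:
        return False  # a hostname or interface name, not a wildcard


for ep in ("tcp://*:5555", "tcp://0.0.0.0:5555", "tcp://[::]:5555",
           "tcp://127.0.0.1:5555"):
    print(ep, binds_all_interfaces(ep))
```

A wildcard bind is not a flaw on its own; it becomes one here because the listening socket feeds untrusted bytes to a deserializer.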

Impact

Successful exploitation gives an attacker who can reach the exposed socket remote code execution on the affected host, with the code running inside the vLLM process.

Reproduction

To reproduce this vulnerability, deploy an affected vLLM version (0.6.5 or later, before 0.8.0) with Mooncake integration enabled. The Mooncake Pipe automatically opens a ZMQ/TCP socket that any host able to reach it over the network can connect to. When an attacker sends a payload to this socket, the vLLM process deserializes the data with Python's pickle module, which executes any code embedded in the payload on the host.
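The reason unpickling untrusted bytes amounts to code execution can be shown without touching vLLM at all. The following is a minimal, harmless sketch: `Payload` and `NoGlobalsUnpickler` are hypothetical names, and the "malicious" callable is just `len`. The restricted unpickler follows the `find_class` override pattern described in the Python documentation; it is shown as a general hardening idea, not as the change vLLM made.

```python
import io
import pickle


class Payload:
    """Hypothetical stand-in for attacker-controlled bytes."""

    def __reduce__(self):
        # Tells pickle: "to rebuild me, call len('attacker-chosen call')".
        # A real exploit would name a dangerous callable instead of len.
        return (len, ("attacker-chosen call",))


class NoGlobalsUnpickler(pickle.Unpickler):
    """Refuses to resolve any callable or class named in the stream."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


blob = pickle.dumps(Payload())

# Plain pickle.loads runs the callable chosen by whoever produced the bytes,
# so the result is the return value of len(...), not a Payload instance.
result = pickle.loads(blob)
print(type(result).__name__, result)

# The restricted unpickler rejects the same bytes before any call happens.
try:
    NoGlobalsUnpickler(io.BytesIO(blob)).load()
    blocked = False
except pickle.UnpicklingError:
    blocked = True
print("blocked:", blocked)
```

The receiver never needs the `Payload` class; the byte stream alone names the function to call and its arguments, which is why a network-reachable `pickle.loads` is equivalent to a remote shell.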

Remediation

This vulnerability has been fixed in vLLM version 0.8.0. Upgrade to 0.8.0 or later.
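A quick way to check a deployment against the affected range is sketched below. It assumes the range is 0.6.5 inclusive to 0.8.0 exclusive, as implied by the fix version above; `parse_release` and `is_affected` are hypothetical helpers, and the parser only reads the leading `major.minor.patch` digits, so pre-release or local suffixes are ignored.

```python
import re

FIRST_AFFECTED = (0, 6, 5)  # first release in the affected range
FIRST_FIXED = (0, 8, 0)     # release that contains the fix


def parse_release(version: str) -> tuple:
    """Extract (major, minor, patch) from the start of a version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_affected(version: str) -> bool:
    """True if the version falls in [0.6.5, 0.8.0)."""
    return FIRST_AFFECTED <= parse_release(version) < FIRST_FIXED


for v in ("0.6.4", "0.6.5", "0.7.3", "0.8.0"):
    print(v, "affected" if is_affected(v) else "not affected")
```

On a host with vLLM installed, the installed version string can be obtained with `importlib.metadata.version("vllm")` and passed to `is_affected`.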

Added: Jun 9, 2025, 7:46 PM
Updated: Jun 9, 2025, 7:46 PM

Vulnerability Rating

Custom Algorithm
spread: 2.6
impact: 10.0
exploitability: 5.9
remediation: 7.7
relevance: 0.0
threat: 4.8
urgency: 2.9
incentive: 1.7

Our algorithm analyzes dozens of metrics to generate these 8 key vulnerability categories, which are then combined to calculate the overall risk score.