Hugging Face Smolagents Sandbox Escape Vulnerability Leading to Remote Code Execution

Vulnerability

A sandbox escape vulnerability allowing remote code execution has been identified in Hugging Face Smolagents version 1.14.0. The issue lies in the local_python_executor.py module, which fails to properly restrict Python code execution. Although the module implements static and dynamic checks, whitelisted functions and modules can still be abused to execute arbitrary code and potentially compromise the host system. This defeats the intended isolation of untrusted code and can result in unauthorized code execution, data leakage, and compromise of integrations built on the executor.

Impact

Exploitation of this vulnerability allows for arbitrary code execution on the host system, bypassing the intended sandbox restrictions. This could lead to unauthorized access to data, such as credentials and internal files, and allow for lateral movement within production workflows. Additionally, it could compromise integrations that use Hugging Face agent-based chains.

Reproduction

The vulnerability can be reproduced with the LocalPythonExecutor class from the Smolagents library. After creating an executor instance without any additional trusted modules, 'itertools.accumulate()' can be used to chain 'getattr()' calls, which evades the module whitelist checks. Because the 're' module is whitelisted, this chain can reach the 'open' function through it. Once 'open' is obtained, it can be used to write a malicious module file whose code runs arbitrary commands, such as creating a file in the '/tmp' directory. After the original 'random' module is removed from memory, the malicious module can be imported in its place and executed, achieving remote code execution.
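The 'getattr()' chaining step can be illustrated harmlessly outside any sandbox. The following is a minimal sketch, not the reported proof of concept: the classes Outer and Inner and the helper attribute_names_in are hypothetical names introduced here, and the AST scan is only a simplified stand-in for a syntax-level check, not the actual logic in local_python_executor.py. It shows that folding 'getattr' over a list with 'itertools.accumulate()' walks an attribute path even though the source contains no attribute-access syntax naming those attributes.

```python
import ast
from itertools import accumulate


class Inner:
    secret = "reached"


class Outer:
    inner = Inner()


def attribute_names_in(source):
    """Return attribute names that appear as explicit `obj.attr` syntax."""
    tree = ast.parse(source)
    return {node.attr for node in ast.walk(tree) if isinstance(node, ast.Attribute)}


# Direct attribute access: the names are visible in the syntax tree.
direct_source = "Outer().inner.secret"
direct_value = Outer().inner.secret

# Chained access: accumulate yields Outer(), then getattr(Outer(), "inner"),
# then getattr(<that result>, "secret"); the names are plain string data.
chained_source = "list(accumulate([Outer(), 'inner', 'secret'], getattr))[-1]"
chained_value = list(accumulate([Outer(), "inner", "secret"], getattr))[-1]

direct_names = attribute_names_in(direct_source)
chained_names = attribute_names_in(chained_source)

print("same value:", direct_value == chained_value)
print("names seen in direct form:", sorted(direct_names))
print("names seen in chained form:", sorted(chained_names))
```

Both forms reach the same object, but only the direct form exposes the attribute names to a scan of the syntax tree, which is why a check of this kind has to inspect what each 'getattr' call resolves to at run time.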

Remediation

Users should update to Hugging Face Smolagents version 1.17.0 or later, where this vulnerability has been fixed.
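A quick way to confirm whether an environment still runs an affected release is to compare the installed version against 1.17.0. The following is a minimal sketch using only the standard library; the helper names parse_version, is_patched, and installed_smolagents_version are hypothetical, and the simplified parser ignores pre-release suffixes (so a build such as "1.17.0.dev0" would be treated as 1.17.0), which is an assumption rather than a full version-ordering implementation.

```python
from importlib import metadata

FIXED_VERSION = (1, 17, 0)  # first fixed release according to this advisory


def parse_version(text):
    """Turn '1.14.0' into (1, 14, 0), keeping only the leading digits of each part."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_patched(version_text):
    """True if the version is at or above the fixed release."""
    return parse_version(version_text) >= FIXED_VERSION


def installed_smolagents_version():
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version("smolagents")
    except metadata.PackageNotFoundError:
        return None


version = installed_smolagents_version()
if version is None:
    print("smolagents is not installed in this environment")
elif is_patched(version):
    print(f"smolagents {version} includes the fix")
else:
    print(f"smolagents {version} is affected; upgrade to 1.17.0 or later")
```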

Added: Jul 27, 2025, 8:19 AM
Updated: Jul 27, 2025, 8:19 AM

Vulnerability Rating

Custom Algorithm

spread: 0.0
impact: 2.5
exploitability: 7.7
remediation: 7.7
relevance: 0.3
threat: 6.4
urgency: 2.9
incentive: 0.8

Our algorithm analyzes dozens of metrics to generate scores for these 8 key vulnerability categories, which are then combined to calculate the overall risk score.