Langchain-Text-Splitters XML External Entity Vulnerability in HTMLSectionSplitter

Vulnerability

An XML External Entity (XXE) vulnerability has been identified in the HTMLSectionSplitter class of langchain-text-splitters version 0.3.8. The class loads XSLT stylesheets with lxml without any protective measures. In lxml versions up to 4.9.x, external entities are resolved by default, so an attacker-controlled stylesheet can read arbitrary local files or make outbound HTTP(S) requests. lxml versions 5.0 and above disable entity expansion, but the XSLT document() function can still access any URI unless XSLTAccessControl is applied.

A successful exploit gives a remote attacker read-only access to any file the LangChain process can read, including SSH keys, environment files, source code, and cloud metadata. No authentication, special privileges, or user interaction is required, and default deployments that accept custom XSLT are exploitable.
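The protective measures referred to above can be sketched with lxml as follows. This is a minimal illustration under stated assumptions, not the patch shipped in the library: the helper name load_xslt_hardened is hypothetical, and the parser options shown are one reasonable way to switch off entity resolution and network access before compiling a stylesheet with XSLTAccessControl.DENY_ALL.

```python
from lxml import etree


def load_xslt_hardened(xslt_bytes: bytes) -> etree.XSLT:
    """Compile an XSLT stylesheet with entity and I/O access locked down.

    Hypothetical helper for illustration; not part of langchain-text-splitters.
    """
    # Do not expand entities, do not fetch anything over the network,
    # and do not load external DTDs while parsing the stylesheet itself.
    parser = etree.XMLParser(
        resolve_entities=False,
        no_network=True,
        load_dtd=False,
    )
    stylesheet = etree.fromstring(xslt_bytes, parser)

    # DENY_ALL forbids file and network reads/writes during the transform,
    # which is what stops document() from reaching arbitrary URIs.
    return etree.XSLT(stylesheet, access_control=etree.XSLTAccessControl.DENY_ALL)


# A harmless stylesheet that copies the text of <a> to the output.
SAFE_STYLESHEET = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/"><xsl:value-of select="/a"/></xsl:template>
</xsl:stylesheet>"""

transform = load_xslt_hardened(SAFE_STYLESHEET)
result = transform(etree.fromstring(b"<a>hi</a>"))
```

Ordinary stylesheets keep working under these settings; only entity expansion and external reads or writes are affected.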

Impact

Exploitation could expose sensitive files readable by the LangChain process, such as SSH keys, environment files, source code, or cloud metadata. The impact is limited to disclosure: the flaw is read-only and does not allow data to be modified or the service to be disrupted.

Reproduction

The vulnerability can be reproduced by creating a malicious XSLT file that triggers external entity resolution, such as one that reads the /etc/hostname file. When the HTMLSectionSplitter class processes this XSLT file, it discloses the contents of the targeted file.
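For defenders, it can help to see the general shape of such a stylesheet and how it might be flagged before it reaches the splitter. The sketch below is an assumption-laden illustration: the stylesheet is a generic XXE pattern rather than the original proof of concept, and find_risky_constructs is a hypothetical regex heuristic that complements, but does not replace, upgrading.

```python
import re

# Illustrative payload shape only (an assumption, not the original PoC):
# an external entity pointing at a local file, referenced from a template.
SUSPICIOUS_XSLT = """<?xml version="1.0"?>
<!DOCTYPE xsl:stylesheet [
  <!ENTITY xxe SYSTEM "file:///etc/hostname">
]>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <leak>&xxe;</leak>
  </xsl:template>
</xsl:stylesheet>
"""

# Constructs that let a stylesheet pull in content from outside itself.
# The xsl: prefix is assumed; a stylesheet may bind another prefix.
_RISKY_PATTERNS = {
    "doctype declaration": re.compile(r"<!DOCTYPE", re.IGNORECASE),
    "entity declaration": re.compile(r"<!ENTITY", re.IGNORECASE),
    "document() call": re.compile(r"\bdocument\s*\("),
    "external include/import": re.compile(r"<xsl:(include|import)\b"),
}


def find_risky_constructs(xslt_text: str) -> list[str]:
    """Return the names of risky constructs found in a stylesheet's text."""
    return [name for name, pattern in _RISKY_PATTERNS.items()
            if pattern.search(xslt_text)]


findings = find_risky_constructs(SUSPICIOUS_XSLT)
```

A text scan like this is easy to evade (for example through alternative encodings), so it is best treated as an early warning in logs or code review rather than a security boundary.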

Remediation

Users should upgrade to langchain-text-splitters version 0.3.9 or later, which addresses this vulnerability.
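One way to confirm that an environment has picked up the fix is to compare the installed version against 0.3.9. The following is a small sketch using only the standard library; the function names are hypothetical, and the simplified parser ignores pre-release suffixes (so "0.3.9rc1" would count as patched), which may be too lenient for some environments.

```python
from importlib.metadata import PackageNotFoundError, version

# First release stated by the advisory to contain the fix.
FIXED_VERSION = (0, 3, 9)


def parse_release(version_string: str) -> tuple[int, ...]:
    """Extract leading numeric components, e.g. '0.3.9' -> (0, 3, 9)."""
    parts = []
    for piece in version_string.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_patched(version_string: str) -> bool:
    """True if the given version is 0.3.9 or later."""
    return parse_release(version_string) >= FIXED_VERSION


def installed_is_patched(package: str = "langchain-text-splitters"):
    """Check the installed package; returns None if it is not installed."""
    try:
        return is_patched(version(package))
    except PackageNotFoundError:
        return None
```

Calling installed_is_patched() in a deployment check or CI step returns False for 0.3.8 and earlier, True from 0.3.9 onward, and None when the package is absent.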

Added: Oct 6, 2025, 6:17 PM
Updated: Oct 6, 2025, 6:17 PM

Vulnerability Rating

Custom Algorithm
spread: 0.0
impact: 2.5
exploitability: 8.7
remediation: 0.0
relevance: 0.7
threat: 6.4
urgency: 2.9
incentive: 5.8

Our algorithm analyzes dozens of metrics to generate these 8 key vulnerability categories, which are then combined to calculate the overall risk score.