lxml_html_clean <base> Tag Injection Vulnerability Allowing URL Hijacking
Vulnerability
A vulnerability exists in lxml_html_clean versions prior to 0.4.4: the default Cleaner configuration does not remove <base> tags, so an attacker can inject one and hijack the relative links on the page. With page_structure=True (the default), the Cleaner strips <html>, <head>, and <title> tags, but it leaves <base> tags untouched. An injected <base> tag makes every relative URL on the page resolve against an attacker-controlled domain, which enables phishing, cross-site scripting, or defacement attacks.
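The effect of an injected <base> tag on URL resolution can be illustrated with the standard library. The sketch below uses urllib.parse.urljoin as an approximation of browser URL resolution. The page URL, attacker origin, and the resolve helper are hypothetical and are not part of lxml_html_clean.

```python
from urllib.parse import urljoin

PAGE_URL = "https://shop.example/account/profile"  # hypothetical victim page
INJECTED_BASE = "https://attacker.example/"  # hypothetical attacker origin


def resolve(href, document_url, base_href=None):
    """Approximate how a browser resolves href, honouring a <base href> if present.

    Hypothetical helper: urljoin approximates, but does not fully implement,
    browser URL parsing.
    """
    # A <base href> is itself resolved against the document URL and then
    # becomes the base for every other relative URL in the document.
    base = urljoin(document_url, base_href) if base_href else document_url
    return urljoin(base, href)


for href in ["/login", "static/app.js", "https://cdn.example/lib.js"]:
    clean = resolve(href, PAGE_URL)
    hijacked = resolve(href, PAGE_URL, INJECTED_BASE)
    print(f"{href}: {clean} -> {hijacked}")
```

Only the relative references ("/login", "static/app.js") are redirected to the attacker's origin. The absolute URL is unaffected, which is why the attack targets relative links, form actions, script sources, and stylesheet references.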
Impact
An injected <base> tag can hijack every relative URL on the page. An attacker can use it to redirect links and form submissions to an attacker-controlled domain, steal credentials or other sensitive data, load malicious JavaScript files, or facilitate UI redressing and defacement by manipulating image or stylesheet references.
Reproduction
To reproduce the issue, use lxml_html_clean 0.4.3 with the default Cleaner configuration to clean HTML that contains an injected <base> tag. The <base> tag is still present in the cleaned output, so any page that renders that output resolves its relative URLs against the attacker-controlled domain.
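A minimal reproduction sketch follows. It runs a payload through Cleaner().clean_html from lxml_html_clean and scans the result for surviving <base> tags with the standard-library html.parser. The payload, the BaseTagFinder class, and find_base_hrefs are hypothetical illustrations. The script only reports a result when lxml_html_clean is installed, and it is expected to print VULNERABLE on 0.4.3 and not on 0.4.4 or later.

```python
from html.parser import HTMLParser

# Hypothetical untrusted input containing an injected <base> tag.
PAYLOAD = '<base href="https://attacker.example/"><a href="/login">Sign in</a>'


class BaseTagFinder(HTMLParser):
    """Collect the href of every <base> tag in an HTML string."""

    def __init__(self):
        super().__init__()
        self.base_hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <BASE HREF=...> matches too.
        # Self-closing <base .../> is routed here by the default handle_startendtag.
        if tag == "base":
            self.base_hrefs.append(dict(attrs).get("href"))


def find_base_hrefs(html):
    """Return the href values of all <base> tags found in html."""
    finder = BaseTagFinder()
    finder.feed(html)
    finder.close()
    return finder.base_hrefs


if __name__ == "__main__":
    try:
        from lxml_html_clean import Cleaner
    except ImportError:
        print("lxml_html_clean is not installed; nothing to check")
    else:
        cleaned = Cleaner().clean_html(PAYLOAD)  # default configuration
        hrefs = find_base_hrefs(cleaned)
        print("cleaned output:", cleaned)
        print("VULNERABLE" if hrefs else "no <base> tag survived", hrefs)
```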
Remediation
Users should upgrade to lxml_html_clean 0.4.4 or later, where this vulnerability is patched.
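To confirm that an environment has the fix, the installed version can be compared against 0.4.4. The sketch below uses importlib.metadata from the standard library. The helper names are hypothetical, and the simple numeric comparison ignores pre-release suffixes (so "0.4.4rc1" counts as patched); packaging.version is the more rigorous choice where that package is available.

```python
from __future__ import annotations

import re
from importlib.metadata import PackageNotFoundError, version

FIXED_VERSION = (0, 4, 4)  # first release the advisory lists as patched


def parse_release(ver: str) -> tuple[int, ...]:
    """Extract the numeric release segment, e.g. '0.4.4.post1' -> (0, 4, 4)."""
    match = re.match(r"\d+(?:\.\d+)*", ver)
    if match is None:
        raise ValueError(f"unrecognised version string: {ver!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_patched(ver: str) -> bool:
    """True if ver is at or above FIXED_VERSION (pre-release tags are ignored)."""
    release = parse_release(ver)
    # Pad both tuples to the same length so that '0.4' compares as '0.4.0'.
    width = max(len(release), len(FIXED_VERSION))
    padded = release + (0,) * (width - len(release))
    fixed = FIXED_VERSION + (0,) * (width - len(FIXED_VERSION))
    return padded >= fixed


def installed_is_patched(dist: str = "lxml_html_clean") -> bool | None:
    """Check the installed distribution; None means it is not installed."""
    try:
        return is_patched(version(dist))
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("lxml_html_clean patched:", installed_is_patched())
```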
