NLTK StanfordSegmenter Arbitrary Code Execution Vulnerability

Vulnerability

NLTK versions through 3.9.2 contain an arbitrary code execution vulnerability in the StanfordSegmenter module. The module loads an external Java .jar file and hands it to the Java Virtual Machine (JVM) without verifying its integrity or sandboxing its execution. An attacker who can substitute that JAR with a malicious one gets their code executed the next time the segmenter is used. The substitution can happen through model poisoning, a man-in-the-middle attack, or dependency poisoning, and it results in remote code execution.
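The trust problem described above can be illustrated with a minimal sketch of how a Python wrapper typically launches a Java tool. This is not NLTK's actual code: the function name `build_java_command`, the JAR name, and the class name `example.Main` are hypothetical placeholders. The point is that the JAR path goes straight onto the JVM classpath, so whatever file sits at that path is what runs.

```python
from typing import List


def build_java_command(jar_path: str, main_class: str, args: List[str]) -> List[str]:
    """Build the argument list a wrapper might pass to subprocess to start the JVM.

    Hypothetical illustration only. Nothing here checks who produced the JAR
    or whether its contents match a known-good release.
    """
    # The JAR is placed on the classpath unverified; the JVM will load and
    # execute any class named main_class that it finds inside it.
    return ["java", "-cp", jar_path, main_class, *args]


# Placeholder values; the command is only constructed here, never executed.
cmd = build_java_command("segmenter.jar", "example.Main", ["-textFile", "input.txt"])
print(cmd)
```

If an attacker swaps `segmenter.jar` for a file containing their own `example.Main`, the command is unchanged and the attacker's class runs with the privileges of the Python process.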

Impact

Successful exploitation runs arbitrary Java code outside the Python runtime. That code can execute operating system-level commands through Java APIs and may compromise the entire environment in which segmentation is performed. The issue is a supply-chain remote code execution risk, most acute when the attacker controls the JAR file path or the execution environment.

Reproduction

To reproduce the issue, write a malicious Java class that replaces the core classifier in the Stanford segmenter JAR and executes a payload, such as a command that creates a file. Compile the class, package it into a JAR, and use the modified JAR with the NLTK StanfordSegmenter. When the segmenter processes text, the payload executes, demonstrating arbitrary code execution.
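From the defender's side, the class replacement described above is detectable when a known-good copy of the JAR is available, because a JAR is a ZIP archive whose entries can be hashed one by one. The following is a minimal sketch under that assumption; the helper names `jar_class_digests` and `changed_classes` are hypothetical and not part of NLTK or the Stanford tools.

```python
import hashlib
import zipfile
from typing import Dict, List


def jar_class_digests(jar_path: str) -> Dict[str, str]:
    """Map every .class entry in a JAR (a ZIP archive) to its SHA-256 digest."""
    with zipfile.ZipFile(jar_path) as jar:
        return {
            info.filename: hashlib.sha256(jar.read(info)).hexdigest()
            for info in jar.infolist()
            if info.filename.endswith(".class")
        }


def changed_classes(trusted_jar: str, candidate_jar: str) -> List[str]:
    """List class entries that were added, removed, or modified in candidate_jar."""
    trusted = jar_class_digests(trusted_jar)
    candidate = jar_class_digests(candidate_jar)
    all_names = trusted.keys() | candidate.keys()
    return sorted(n for n in all_names if trusted.get(n) != candidate.get(n))
```

A non-empty result for a JAR that is supposed to be an unmodified release is a sign that it should not be handed to the JVM.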

Remediation

Users should update to NLTK version 3.9.3 or later, where this vulnerability has been fixed. The fix adds verification of the JAR files used by the StanfordSegmenter, ensuring that only trusted or user-validated files are executed.
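The advisory does not say how the fixed release verifies JAR files. For environments that cannot upgrade immediately, one common form of verification is to pin the SHA-256 digest of a JAR obtained from a trusted source and refuse to use any file that does not match. The sketch below shows that approach; it is not NLTK's implementation, and the helper names `sha256_of_file` and `require_trusted_jar` are hypothetical.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def require_trusted_jar(path: str, expected_sha256: str) -> str:
    """Return path unchanged if its digest matches the pinned value, else raise."""
    actual = sha256_of_file(path)
    if not hmac.compare_digest(actual, expected_sha256.lower()):
        raise ValueError(f"Refusing to use {path}: SHA-256 does not match the pinned value")
    return path
```

The returned path can then be passed as the `path_to_jar` argument of `StanfordSegmenter`. The pinned digest has to come from a channel the attacker does not control, otherwise the check adds nothing.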

Added: Mar 5, 2026, 9:24 PM

Updated: Mar 5, 2026, 9:24 PM

Vulnerability Rating

Custom Algorithm

spread: 6.6
impact: 10.0
exploitability: 5.4
remediation: 0.0
relevance: 3.5
threat: 6.4
urgency: 2.9
incentive: 0.0

Our algorithm analyzes dozens of metrics to generate these 8 key vulnerability categories, which are then combined to calculate the overall risk score.

Affected Products

NLTK
