NLTK
cpe:2.3:a:nltk:nltk:*:*:*:*:*:*:*:*
- <= 3.9.2
NLTK versions through 3.9.2 contain an arbitrary code execution vulnerability in the StanfordSegmenter module. The module loads external Java .jar files without verifying their integrity or sandboxing them. An attacker who can substitute a malicious JAR for the expected one gets that code executed by the Java Virtual Machine (JVM) when the segmenter is used. The substitution can happen through model poisoning, a man-in-the-middle attack, or dependency poisoning, and leads to remote code execution.
Exploitation allows arbitrary Java code execution. The code is not confined to the Python runtime: it can run operating system-level commands through Java APIs and potentially compromise the entire environment in which segmentation is performed. This is a supply-chain remote code execution risk, particularly when an attacker controls the JAR file path or the execution environment.
To reproduce the vulnerability, write a malicious Java class that takes the place of the core classifier in the Stanford segmenter JAR and executes a payload, such as a command that creates a file. Compile the class, package it into a JAR, and use the modified JAR with the NLTK StanfordSegmenter. When the segmenter processes text, the payload runs, demonstrating arbitrary code execution.
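From the defender's side, a JAR modified in this way can be spotted by comparing it entry by entry against a copy obtained from a trusted source. The following is a minimal sketch, not part of NLTK; the function names jar_entry_digests and diff_jars are hypothetical, and it assumes a known-good reference JAR is available.

```python
import hashlib
import zipfile


def jar_entry_digests(jar_path):
    """Map each file entry in a JAR (a ZIP archive) to the SHA-256 of its bytes."""
    digests = {}
    with zipfile.ZipFile(jar_path) as jar:
        for info in jar.infolist():
            if info.is_dir():
                continue  # directory entries carry no content
            digests[info.filename] = hashlib.sha256(jar.read(info)).hexdigest()
    return digests


def diff_jars(trusted_jar, candidate_jar):
    """Report entries added, removed, or changed relative to the trusted JAR."""
    trusted = jar_entry_digests(trusted_jar)
    candidate = jar_entry_digests(candidate_jar)
    return {
        "added": sorted(set(candidate) - set(trusted)),
        "removed": sorted(set(trusted) - set(candidate)),
        "changed": sorted(
            name
            for name in set(trusted) & set(candidate)
            if trusted[name] != candidate[name]
        ),
    }
```

A replaced classifier class would show up under "changed", and any extra payload class under "added". An empty report only means the candidate matches the reference, so the reference itself has to come from a source that is trusted.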
Users should update to NLTK 3.9.3 or later, which fixes this vulnerability. The fix adds verification of the JAR files used by the StanfordSegmenter so that only trusted or user-validated files are executed.
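Where an immediate upgrade is not possible, callers can apply a similar idea themselves by pinning the digest of a JAR they have validated and checking it before the segmenter is constructed. This is a minimal sketch under stated assumptions, not the code of the upstream fix: require_trusted_jar and sha256_of_file are hypothetical helpers, and the pinned digest must come from a source the user trusts.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def require_trusted_jar(path, expected_sha256):
    """Return path unchanged if its digest matches the pin, else raise ValueError."""
    actual = sha256_of_file(path)
    if not hmac.compare_digest(actual, expected_sha256.lower()):
        raise ValueError(
            f"untrusted JAR {path!r}: SHA-256 {actual} does not match the pinned digest"
        )
    return path


# Hypothetical usage (JAR_PATH and PINNED_SHA256 are placeholders chosen by the user):
# from nltk.tokenize.stanford_segmenter import StanfordSegmenter
# segmenter = StanfordSegmenter(path_to_jar=require_trusted_jar(JAR_PATH, PINNED_SHA256))
```

The check narrows the window for substitution but does not close it: a file replaced between hashing and JVM startup would still be loaded, so the JAR should also live in a directory that untrusted users cannot write to.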