NLTK WordListCorpusReader
cpe:2.3:a:nltk:nltk:*:*:*:*:*:*:*
- <= 3.9.2
A path traversal vulnerability has been identified in NLTK versions through 3.9.2, affecting multiple CorpusReader classes, including WordListCorpusReader, TaggedCorpusReader, and BracketParseCorpusReader. These classes do not adequately sanitize or validate file paths, so an attacker who controls a file path can traverse out of the corpus directory and read sensitive files on the server. The issue is most serious where user-controlled file inputs are processed, such as machine learning APIs, chatbots, or natural language processing pipelines. Exploitation could expose system files, SSH private keys, and API tokens, and could escalate to remote code execution when combined with other vulnerabilities.
Successful exploitation allows for arbitrary file reading, with potential access to sensitive files such as /etc/passwd, /etc/shadow, SSH private keys, environment secrets, and API tokens. In machine learning contexts, this vulnerability could be exploited to read training data or source code, and if combined with certain types of deserialization vulnerabilities, could lead to remote code execution.
The vulnerability can be reproduced by instantiating one of the affected CorpusReader classes with a file path that traverses out of the corpus root, such as '../etc/passwd'. The corpus reader then returns the contents of the traversed file. The same flaw can be exploited through a web application that uses NLTK and accepts file paths from users, such as a Flask app with an endpoint that reads files using the vulnerable corpus readers.
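The underlying mechanism can be illustrated without NLTK. The following is a minimal sketch, assuming a reader that simply joins the corpus root with the requested file identifier; the function name naive_join and the root /srv/corpora are hypothetical and are not taken from NLTK's source.

```python
import posixpath


def naive_join(root: str, fileid: str) -> str:
    """Hypothetical helper: join root and fileid with no validation.

    This is not NLTK code; it only shows what an unvalidated join resolves to.
    """
    return posixpath.normpath(posixpath.join(root, fileid))


root = "/srv/corpora"  # assumed corpus root, for illustration only

# A well-behaved identifier stays under the root.
print(naive_join(root, "words.txt"))

# Enough '..' segments climb out of the root entirely.
print(naive_join(root, "../../etc/passwd"))

# An absolute identifier discards the root altogether.
print(naive_join(root, "/etc/passwd"))
```

How many '..' segments are needed depends on how deep the corpus root sits in the filesystem, which is why a fixed payload may work against one deployment and not another.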
A suggested patch adds path validation to the CorpusReader.open() method: blocking absolute paths, preventing path traversal, and ensuring that file accesses are sandboxed within the corpus root.
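The three checks described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the actual NLTK patch: resolve_in_root and is_allowed are hypothetical names, and the sketch uses only the standard library rather than hooking into CorpusReader.open().

```python
import os
import tempfile


def resolve_in_root(root: str, fileid: str) -> str:
    """Hypothetical validator: return the real path of fileid under root.

    Raises ValueError if fileid is absolute or resolves outside root.
    """
    # Check 1: block absolute paths outright.
    if os.path.isabs(fileid):
        raise ValueError(f"absolute paths are not allowed: {fileid!r}")

    # Resolve symlinks and '..' segments on both sides before comparing.
    root_real = os.path.realpath(root)
    candidate = os.path.realpath(os.path.join(root_real, fileid))

    # Checks 2 and 3: the resolved path must remain inside the corpus root.
    if os.path.commonpath([root_real, candidate]) != root_real:
        raise ValueError(f"path escapes the corpus root: {fileid!r}")
    return candidate


def is_allowed(root: str, fileid: str) -> bool:
    """Convenience wrapper that reports acceptance instead of raising."""
    try:
        resolve_in_root(root, fileid)
        return True
    except ValueError:
        return False


# Usage: try a few identifiers against a temporary corpus root.
with tempfile.TemporaryDirectory() as corpus_root:
    for fileid in ("words.txt", "sub/words.txt", "../etc/passwd", "/etc/passwd"):
        print(fileid, is_allowed(corpus_root, fileid))
```

Comparing resolved paths with os.path.commonpath, rather than testing for the substring '..', avoids false positives on legitimate names and also catches symlinks that point outside the root.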