Docugami Reader MD5 Hash Collision Vulnerability in Llama Index
Vulnerability
A hash collision vulnerability has been identified in the DocugamiReader class of the run-llama/llama_index repository, affecting versions prior to 0.12.28. The class uses MD5 hashing to generate IDs for document chunks, so structurally distinct chunks that contain identical text receive the same ID. One chunk can then overwrite another, which causes the loss of semantically or legally important content, disrupts parent-child chunk hierarchies, and can lead to inaccurate or hallucinated AI outputs.
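The failure mode can be reproduced in a few lines. The sketch below is not the DocugamiReader source: `chunk_id_from_text`, the sample clause, the `path` and `text` fields, and the dictionary store are hypothetical stand-ins, assuming only that the ID is an MD5 digest of the chunk text alone.

```python
import hashlib

def chunk_id_from_text(text: str) -> str:
    """Hypothetical ID scheme: MD5 of the chunk text only (an assumption, not library code)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

# Two structurally distinct chunks that happen to share the same text.
chunks = [
    {"path": "/Lease/Section1/Clause", "text": "The tenant shall pay all fees."},
    {"path": "/Lease/Section7/Clause", "text": "The tenant shall pay all fees."},
]

# A store keyed by chunk ID: the second write replaces the first.
store = {}
for chunk in chunks:
    store[chunk_id_from_text(chunk["text"])] = chunk

surviving_paths = [c["path"] for c in store.values()]
```

No cryptographic attack on MD5 is needed here: identical input always produces an identical digest, so any hash of the text alone would collide in the same way.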
Impact
When two chunks with identical text receive the same ID, one overwrites the other. Important content is lost and parent-child chunk hierarchies are disrupted. In AI applications built on the affected index, this can result in inaccurate or fabricated responses.
Remediation
Users can upgrade to version 0.3.1 to address this vulnerability.
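For custom readers that follow the same pattern, one general way to avoid this class of collision is to derive the ID from the chunk's structural position as well as its text. The following is a minimal sketch under that assumption, not the patch shipped in the fixed release; `chunk_id`, `find_id_collisions`, and the `path` and `text` fields are hypothetical names.

```python
import hashlib
from collections import defaultdict

def chunk_id(path: str, text: str) -> str:
    """Hash the structural path together with the text (illustrative scheme)."""
    # Length-prefix the path so ("ab", "c") and ("a", "bc") yield different inputs.
    payload = f"{len(path)}:{path}{text}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def find_id_collisions(chunks, id_fn):
    """Return {id: [paths]} for every ID shared by more than one chunk."""
    by_id = defaultdict(list)
    for chunk in chunks:
        by_id[id_fn(chunk)].append(chunk["path"])
    return {cid: paths for cid, paths in by_id.items() if len(paths) > 1}

chunks = [
    {"path": "/Lease/Section1/Clause", "text": "The tenant shall pay all fees."},
    {"path": "/Lease/Section7/Clause", "text": "The tenant shall pay all fees."},
]

# Text-only MD5 IDs collide; path-aware IDs do not.
text_only = find_id_collisions(
    chunks, lambda c: hashlib.md5(c["text"].encode("utf-8")).hexdigest()
)
with_path = find_id_collisions(chunks, lambda c: chunk_id(c["path"], c["text"]))
```

The `find_id_collisions` helper can also be run over already-parsed chunks to check whether an existing index was affected before re-ingesting documents.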
