Langchain-Chatchat Weak Hash Vulnerability in Vision Chat Image Handling
Vulnerability
A vulnerability exists in Langchain-Chatchat versions up to 0.3.1.3, in the Vision Chat feature that processes pasted images. The application derives each image's filename from an MD5 hash of the output of the PIL library's Image.tobytes() method, which contains only the raw pixel data. This ignores essential image metadata, such as dimensions and color information (for example, the palette of a Palette-mode image), so two visually distinct images can produce identical byte representations and therefore identical hash values and filenames. As a result, an attacker on the same local network can upload a colliding image that overwrites a victim's stored image file, leading to incorrect responses from the language model during chat interactions.
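The collision described above can be illustrated with a minimal Pillow sketch. The helpers legacy_name and palette are hypothetical names introduced here for illustration; legacy_name only approximates the naming scheme described in this advisory and is not copied from the project's code.

```python
import hashlib

from PIL import Image  # Pillow


def legacy_name(img):
    """Hypothetical stand-in for the vulnerable scheme: MD5 over tobytes() only."""
    return hashlib.md5(img.tobytes()).hexdigest()


def palette(color0, color1):
    """Full 256-entry palette whose first two entries are the given RGB colors."""
    return list(color0) + list(color1) + [0] * (768 - 6)


# 16 palette indices, one byte per pixel in Palette ("P") mode.
indices = bytes([0, 1, 1, 0] * 4)

# Same indices, different palettes: the images look different.
img_a = Image.frombytes("P", (4, 4), indices)
img_a.putpalette(palette((255, 0, 0), (0, 0, 255)))    # red / blue
img_b = Image.frombytes("P", (4, 4), indices)
img_b.putpalette(palette((0, 255, 0), (255, 255, 0)))  # green / yellow

# Same indices and palette as img_a, but different dimensions.
img_c = Image.frombytes("P", (2, 8), indices)
img_c.putpalette(palette((255, 0, 0), (0, 0, 255)))

same_bytes = img_a.tobytes() == img_b.tobytes() == img_c.tobytes()
same_name = legacy_name(img_a) == legacy_name(img_b) == legacy_name(img_c)
different_pixels = img_a.convert("RGB").tobytes() != img_b.convert("RGB").tobytes()
print(same_bytes, same_name, different_pixels, img_a.size, img_c.size)
```

In this sketch all three images should hash to the same name, even though img_a and img_b render in different colors and img_c has a different shape.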
Impact
Exploitation of this vulnerability allows for silent overwriting of image files associated with chat sessions, causing the language model to reference incorrect visual content and generate irrelevant responses.
Reproduction
To reproduce this vulnerability, first create a pair of PNG images in Palette mode that are visually different but yield the same output when processed with the Image.tobytes() method. This can be achieved by changing the images' palettes while keeping the pixel (palette index) data identical. Once the collision pair is prepared, upload the first image (Image A) through the Langchain-Chatchat interface. The server will process this image and assign it a filename based on its MD5 hash. Afterward, upload the second image (Image B). Because its Image.tobytes() output is identical, the server derives the same filename and overwrites the file corresponding to Image A with Image B, replacing the original content. Finally, initiate a chat session that retrieves the image file, which will now reflect the overwritten content.
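The overwrite step can be simulated locally without a running server. The sketch below is an assumption-laden model, not the project's handler: save_pasted_image, make_image, and the temporary upload directory are hypothetical, and the real filename format may differ.

```python
import hashlib
import os
import tempfile

from PIL import Image  # Pillow


def save_pasted_image(img, upload_dir):
    """Hypothetical handler: name the file after MD5(tobytes()) and save as PNG."""
    name = hashlib.md5(img.tobytes()).hexdigest() + ".png"
    path = os.path.join(upload_dir, name)
    img.save(path, format="PNG")
    return path


def make_image(color0, color1):
    """4x4 Palette-mode image; only the palette differs between calls."""
    img = Image.frombytes("P", (4, 4), bytes([0, 1] * 8))
    img.putpalette(list(color0) + list(color1) + [0] * (768 - 6))
    return img


def first_pixel(path):
    """Read back the RGB value of the top-left pixel from disk."""
    with Image.open(path) as stored:
        return stored.convert("RGB").getpixel((0, 0))


image_a = make_image((255, 0, 0), (0, 0, 255))    # victim's image
image_b = make_image((0, 255, 0), (255, 255, 0))  # attacker's colliding image

upload_dir = tempfile.mkdtemp()
path_a = save_pasted_image(image_a, upload_dir)
pixel_before = first_pixel(path_a)

path_b = save_pasted_image(image_b, upload_dir)   # same name, so it overwrites
pixel_after = first_pixel(path_a)

print(path_a == path_b, pixel_before, pixel_after, os.listdir(upload_dir))
```

After the second save, the directory should still hold a single file, and reading it back through Image A's path should return Image B's colors.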
Remediation
Update the image handling code to hash the complete encoded PNG byte stream, including all metadata, instead of the output of Image.tobytes(), and switch from MD5 to SHA-256 to mitigate collision risks.
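One way this remediation might look is sketched below using only the standard library. The names content_filename, chunk, and tiny_palette_png are hypothetical; the hand-built PNGs exist only to show that a palette change alters the full byte stream. In the application, the bytes would presumably come from encoding the pasted image, for example by saving it to an in-memory buffer with Pillow.

```python
import hashlib
import struct
import zlib


def content_filename(encoded, ext=".png"):
    """Name a file after the SHA-256 of its complete encoded byte stream."""
    return hashlib.sha256(encoded).hexdigest() + ext


def chunk(kind, data):
    """Serialize one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(kind + data)
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def tiny_palette_png(palette_bytes):
    """Build a 2x1, 8-bit palette PNG (color type 3) with fixed pixel indices."""
    ihdr = struct.pack(">IIBBBBB", 2, 1, 8, 3, 0, 0, 0)
    scanline = b"\x00" + bytes([0, 1])  # filter byte, then two palette indices
    return (
        b"\x89PNG\r\n\x1a\n"
        + chunk(b"IHDR", ihdr)
        + chunk(b"PLTE", palette_bytes)
        + chunk(b"IDAT", zlib.compress(scanline))
        + chunk(b"IEND", b"")
    )


# Identical pixel indices, different palettes.
png_a = tiny_palette_png(bytes([255, 0, 0, 0, 0, 255]))
png_b = tiny_palette_png(bytes([0, 255, 0, 255, 255, 0]))

name_a = content_filename(png_a)
name_b = content_filename(png_b)
print(name_a != name_b)
```

Because the PLTE chunk is part of the hashed input, the two files receive different names even though their pixel indices match. Hashing the encoded stream also covers dimensions and any other header fields.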
