uutils coreutils comm Utility Data Corruption Vulnerability via Lossy UTF-8 Conversion

Vulnerability

The comm utility in uutils coreutils mishandles input that is not valid UTF-8. The utility uses String::from_utf8_lossy(), which replaces invalid UTF-8 byte sequences with the Unicode replacement character (U+FFFD). As a result, when comm is used to compare binary files or files in a non-UTF-8 legacy encoding, the output is silently corrupted. GNU comm, by contrast, processes raw bytes and preserves the original input unchanged.
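The effect of the lossy conversion can be illustrated with a minimal Rust sketch. It is not the uutils source: the helper names lossy_bytes and raw_bytes are hypothetical and only contrast a String::from_utf8_lossy() round trip with a byte-preserving copy.

```rust
/// Lossy path (hypothetical helper): round-trips the line through
/// String::from_utf8_lossy, so every invalid UTF-8 sequence becomes
/// U+FFFD, which is encoded in UTF-8 as the bytes ef bf bd.
fn lossy_bytes(line: &[u8]) -> Vec<u8> {
    String::from_utf8_lossy(line).into_owned().into_bytes()
}

/// Byte-preserving path (hypothetical helper): the line is copied as-is,
/// which mirrors the raw-byte behavior the advisory attributes to GNU comm.
fn raw_bytes(line: &[u8]) -> Vec<u8> {
    line.to_vec()
}

/// Format bytes as space-separated lowercase hex for display.
fn hex(bytes: &[u8]) -> String {
    bytes
        .iter()
        .map(|b| format!("{:02x}", b))
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    // 0xfe and 0xff can never appear in well-formed UTF-8.
    let input: &[u8] = &[0xfe, 0xff, b'\n'];

    println!("input: {}", hex(input));
    println!("lossy: {}", hex(&lossy_bytes(input)));
    println!("raw:   {}", hex(&raw_bytes(input)));
}
```

In the lossy line, each of the two invalid bytes is replaced by the three-byte sequence ef bf bd, so the output is both longer than and different from the input. Valid UTF-8 input passes through both paths unchanged, which is why the problem only appears with binary or legacy-encoded data.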

Impact

The vulnerability causes silent data corruption: bytes that are not valid UTF-8 are altered in the output, which can lead to loss of information or misinterpretation of the data.

Reproduction

To reproduce the issue, create two files that contain invalid UTF-8 byte sequences, such as the bytes 0xfe and 0xff, and run comm to compare them. Inspect the output with the od command, which displays the bytes in hexadecimal. GNU comm preserves the original byte sequences, whereas the uutils version replaces the invalid bytes with the UTF-8 encoding of the replacement character (ef bf bd), demonstrating the data corruption.
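The reproduction can also be scripted. The following Rust sketch makes several assumptions: the file names are hypothetical, whichever comm binary is first on PATH is the one under test, LC_ALL=C is set for predictable ordering, and the looks_corrupted heuristic is illustrative only. It writes two identical files containing 0xfe and 0xff, runs comm on them, and prints the output bytes in hex, much as od would.

```rust
use std::fs;
use std::process::Command;

/// UTF-8 encoding of U+FFFD, the Unicode replacement character.
const REPLACEMENT: [u8; 3] = [0xef, 0xbf, 0xbd];

/// Render bytes as space-separated hex, similar to an od hex dump.
fn hex(bytes: &[u8]) -> String {
    bytes
        .iter()
        .map(|b| format!("{:02x}", b))
        .collect::<Vec<_>>()
        .join(" ")
}

/// True if `needle` occurs as a contiguous run inside `haystack`.
/// `needle` must be non-empty.
fn contains(haystack: &[u8], needle: &[u8]) -> bool {
    haystack.windows(needle.len()).any(|w| w == needle)
}

/// Heuristic (an assumption of this sketch): the output is treated as
/// corrupted if it contains the replacement bytes and the input did not.
fn looks_corrupted(input: &[u8], output: &[u8]) -> bool {
    contains(output, &REPLACEMENT) && !contains(input, &REPLACEMENT)
}

fn main() -> std::io::Result<()> {
    // Hypothetical file names in the system temp directory.
    let dir = std::env::temp_dir();
    let a = dir.join("comm_repro_a.bin");
    let b = dir.join("comm_repro_b.bin");

    // Two lines, each a single byte that is invalid in UTF-8.
    let data: &[u8] = &[0xfe, b'\n', 0xff, b'\n'];
    fs::write(&a, data)?;
    fs::write(&b, data)?;

    // Runs whichever `comm` is found on PATH.
    match Command::new("comm").env("LC_ALL", "C").arg(&a).arg(&b).output() {
        Ok(out) => {
            println!("stdout bytes: {}", hex(&out.stdout));
            println!("corrupted: {}", looks_corrupted(data, &out.stdout));
        }
        Err(e) => eprintln!("could not run comm: {e}"),
    }
    Ok(())
}
```

Running the sketch once with GNU comm on PATH and once with an affected uutils build makes the difference visible: only the affected build should report the ef bf bd sequence in place of the original bytes.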

Remediation

Users can update to uutils coreutils version 0.6.0 or later, where this issue has been fixed.

Added: Apr 22, 2026, 6:16 PM
Updated: Apr 22, 2026, 6:16 PM

Vulnerability Rating

Custom Algorithm
spread: 0.0
impact: 0.6
exploitability: 4.6
remediation: 0.0
relevance: 6.5
threat: 6.4
urgency: 2.9
incentive: 0.0

Our algorithm analyzes dozens of metrics to generate these 8 key vulnerability categories, which are then combined to calculate the overall risk score.