Anthropic Claude Code CLI and Agent SDK OS Command Injection Vulnerability in Authentication Helpers
Vulnerability
A command injection vulnerability has been identified in Anthropic's Claude Code CLI and Claude Agent SDK for Python. The vulnerability arises during authentication helper execution, where configuration values are passed to a shell for interpretation without input validation. An attacker who can manipulate authentication settings can inject shell metacharacters through parameters such as apiKeyHelper, awsAuthRefresh, awsCredentialExport, and gcpAuthRefresh. Successful exploitation executes arbitrary commands with the privileges of the user or the automation environment, enabling credential theft and exfiltration of environment variables.
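The general class of bug described above can be illustrated with a minimal Python sketch. This is not the affected product's code: the function names and the configured string are hypothetical, the payload is a harmless echo, and a POSIX shell with an echo binary on PATH is assumed. It only contrasts passing a configured string to a shell with passing it as an argument vector.

```python
import shlex
import subprocess

def run_helper_with_shell(helper: str) -> str:
    # Hypothetical vulnerable pattern: the whole string goes to /bin/sh,
    # so ';', '|', '$(...)' and similar metacharacters are interpreted.
    result = subprocess.run(helper, shell=True, capture_output=True, text=True)
    return result.stdout

def run_helper_as_argv(helper: str) -> str:
    # Tokenize once, then execute without a shell. Metacharacters stay
    # literal text inside the arguments and are never interpreted.
    argv = shlex.split(helper)
    result = subprocess.run(argv, shell=False, capture_output=True, text=True)
    return result.stdout

# Illustrative configured value with a harmless injected second command.
configured = "echo token; echo INJECTED"

print(run_helper_with_shell(configured))  # two commands run
print(run_helper_as_argv(configured))     # one command runs; ';' is literal
```

With the shell enabled, the second command runs as a separate process. Without it, the semicolon is simply part of an argument to the first command.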
Impact
Exploitation of this vulnerability leads to arbitrary command execution in the context of the user or automation environment. This is particularly critical in CI/CD contexts, where it allows for the exfiltration of cloud credentials, API tokens, and other sensitive information.
Reproduction
The vulnerability can be reproduced by causing the CLI's authentication settings to include shell metacharacters. One way to do this is to place a malicious payload in a repository's settings file, which is then executed when the CLI runs in non-interactive mode. Execution of the injected command can be confirmed by checking for its effects on the system or, if the payload is designed to exfiltrate data, by monitoring for network callbacks.
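From the defender's side, the condition described above (helper values containing shell metacharacters) can be screened for before the CLI runs. The following is a rough sketch only. It assumes the four helper parameters named in this advisory appear as top-level string values in a JSON settings file, and the function name, metacharacter set, and sample values are hypothetical. Legitimate helpers may also use some of these characters, so a match is a prompt for review rather than proof of compromise.

```python
import json
import re

# Parameter names taken from this advisory; their placement as top-level
# JSON keys is an assumption made for this sketch.
HELPER_KEYS = ("apiKeyHelper", "awsAuthRefresh", "awsCredentialExport", "gcpAuthRefresh")

# Heuristic set of characters that a shell treats specially.
SHELL_META = re.compile(r"[;&|`$<>(){}\n\\]")

def find_suspicious_helpers(settings_text: str) -> dict[str, str]:
    """Return helper settings whose values contain shell metacharacters."""
    settings = json.loads(settings_text)
    if not isinstance(settings, dict):
        return {}
    flagged = {}
    for key in HELPER_KEYS:
        value = settings.get(key)
        if isinstance(value, str) and SHELL_META.search(value):
            flagged[key] = value
    return flagged

# Illustrative settings content with a harmless injected command.
sample = json.dumps({
    "apiKeyHelper": "/usr/local/bin/get-key.sh; echo INJECTED",
    "awsAuthRefresh": "aws sso login --profile dev",
})

print(find_suspicious_helpers(sample))
```

In a CI/CD pipeline, a check like this could run against the repository's settings file before any step that invokes the CLI, failing the job when anything is flagged.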
Remediation
Users are advised to set credentials using environment variables instead of through the authentication helpers. In CI/CD environments, it is important to generate settings from trusted sources and to review changes to the .claude/settings.json file with the same scrutiny as other CI/CD configuration changes. For the vendor, recommendations include executing authentication helpers with structured command configurations that do not allow shell metacharacters, adding flags to skip helper execution or pin settings content, and logging all helper executions.
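The vendor-side recommendations above (structured command configurations and logging of helper executions) could look roughly like the following sketch. It is an illustration under stated assumptions, not the vendor's implementation: the function name, logger name, forbidden-character set, and timeout are all hypothetical, and a POSIX system with an echo binary is assumed for the usage line.

```python
import logging
import subprocess

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("helper-audit")  # hypothetical logger name

# Characters rejected outright. With shell=False they would not be
# interpreted anyway, so this check is defense in depth.
FORBIDDEN = set(";&|`$<>\n")

def run_structured_helper(argv: list[str]) -> str:
    """Run a helper given as an argument list, never as a shell string."""
    if not argv or not all(isinstance(arg, str) for arg in argv):
        raise ValueError("helper must be a non-empty list of strings")
    for arg in argv:
        if FORBIDDEN & set(arg):
            raise ValueError(f"shell metacharacter in helper argument: {arg!r}")
    # Record every execution so unexpected helpers are visible in audit logs.
    log.info("running auth helper: %r", argv)
    result = subprocess.run(
        argv, shell=False, capture_output=True, text=True, check=True, timeout=30
    )
    return result.stdout.strip()

print(run_structured_helper(["echo", "example-token"]))
```

Representing the helper as a list means there is no string for a shell to reinterpret, and the log line gives reviewers a record of exactly what was executed.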
