Early results from research on CodeMender, a new AI-powered agent, demonstrate its ability to automatically improve code security.
Software vulnerabilities pose a significant challenge for developers: they are difficult and time-consuming to identify and resolve, even with traditional automated techniques such as fuzzing. Initiatives like Big Sleep and OSS-Fuzz have already shown that AI can uncover new zero-day vulnerabilities in thoroughly tested software. As AI-powered vulnerability discovery continues to advance, it will become increasingly difficult for human effort alone to keep pace.
CodeMender addresses this issue through a comprehensive code security strategy. It acts reactively by instantly patching new vulnerabilities and proactively by rewriting and securing existing code, thereby eliminating entire categories of vulnerabilities. In its initial six months, CodeMender has contributed 72 security fixes to open-source projects, some involving codebases as extensive as 4.5 million lines.
By automating the creation and application of high-quality security patches, CodeMender’s AI-powered agent enables developers and maintainers to concentrate on software development.
CodeMender in action
CodeMender builds on the reasoning abilities of recent Gemini Deep Think models to create an autonomous agent capable of debugging and resolving intricate vulnerabilities.
The CodeMender agent is equipped with robust tools that allow it to reason about code prior to making modifications and to automatically validate those changes, ensuring correctness and preventing regressions.
Animation showing CodeMender’s process for fixing vulnerabilities.
Although large language models are advancing quickly, mistakes in code security can be costly. CodeMender's automated validation process checks that code changes are correct along several dimensions, and it surfaces only high-quality patches for human review: patches that address the root cause, are functionally sound, introduce no regressions, and adhere to style guidelines.
New techniques and tools were developed to enable CodeMender to reason about code and validate changes more effectively. These include:
- Advanced program analysis: CodeMender is equipped with tools grounded in advanced program analysis, including static analysis, dynamic analysis, differential testing, fuzzing, and SMT solvers. By using these tools to systematically scrutinize code patterns, control flow, and data flow, it can more reliably pinpoint the root causes of security flaws and architectural weaknesses; a minimal SMT sketch follows this list.
- Multi-agent systems: Special-purpose agents were developed to enable CodeMender to address specific aspects of an underlying problem. For example, CodeMender employs a large language model-based critique tool that highlights differences between original and modified code to verify that proposed changes do not introduce regressions, and allows for self-correction as needed.
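To make the SMT-solver item above concrete, here is a minimal, hedged sketch (not CodeMender's actual tooling) of how a bounds question about a code path can be posed to the Z3 solver through its C API. The variables `i` and `len` stand for a loop index and a buffer length extracted from the code under analysis.

```c
/* Minimal illustration: ask Z3 whether an off-by-one access is reachable,
 * given the path condition 0 <= i && i <= len observed in a loop. */
#include <stdio.h>
#include <z3.h>   /* requires the Z3 SMT solver library (libz3) */

int main(void) {
    Z3_config cfg = Z3_mk_config();
    Z3_context ctx = Z3_mk_context(cfg);
    Z3_del_config(cfg);

    Z3_solver s = Z3_mk_solver(ctx);
    Z3_solver_inc_ref(ctx, s);

    Z3_sort int_sort = Z3_mk_int_sort(ctx);
    Z3_ast i   = Z3_mk_const(ctx, Z3_mk_string_symbol(ctx, "i"), int_sort);
    Z3_ast len = Z3_mk_const(ctx, Z3_mk_string_symbol(ctx, "len"), int_sort);
    Z3_ast zero = Z3_mk_int(ctx, 0, int_sort);

    /* Path condition taken from the code under analysis: 0 <= i && i <= len. */
    Z3_ast path[2] = { Z3_mk_ge(ctx, i, zero), Z3_mk_le(ctx, i, len) };
    Z3_solver_assert(ctx, s, Z3_mk_and(ctx, 2, path));

    /* Query: can the access buf[i] fall outside the buffer, i.e. i >= len? */
    Z3_solver_assert(ctx, s, Z3_mk_ge(ctx, i, len));

    /* "sat" means the off-by-one access i == len is reachable. */
    printf("out-of-bounds reachable: %s\n",
           Z3_solver_check(ctx, s) == Z3_L_TRUE ? "yes" : "no");

    Z3_solver_dec_ref(ctx, s);
    Z3_del_context(ctx);
    return 0;
}
```

Compiled against libz3 (for example, `cc smt_demo.c -lz3`), a "sat" answer tells the analysis that the index can reach the buffer length, pointing at the loop condition rather than the crash site.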
Fixing vulnerabilities
To effectively patch a vulnerability and prevent its re-emergence, CodeMender uses a debugger, a source code browser, and other tools to identify root causes and formulate patches. Two examples of CodeMender patching vulnerabilities are described below.
Example #1: Identifying the root cause of a vulnerability
Here’s a snippet of the agent’s reasoning about the root cause for a CodeMender-generated patch, after analyzing the results of debugger output and a code search tool.
Although the final patch in this example changed only a few lines of code, the root cause of the vulnerability was not immediately clear. The crash report showed a heap buffer overflow, but the actual problem lay elsewhere: incorrect stack management of Extensible Markup Language (XML) elements during parsing.
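The patched project's code is not reproduced here; as a purely illustrative, hedged sketch of this bug class, consider a parser that tracks open XML elements on a fixed-size stack:

```c
/* Illustrative only: not the patched project's code. The crash is reported
 * as a heap buffer overflow, but the root cause is unbalanced push/pop
 * bookkeeping on the element stack. */
#define MAX_DEPTH 64

typedef struct {
    const char *names[MAX_DEPTH];
    int depth;
} ElementStack;

static void push_element(ElementStack *s, const char *name) {
    s->names[s->depth++] = name;   /* BUG: no depth < MAX_DEPTH check */
}

static void pop_element(ElementStack *s) {
    s->depth--;                    /* BUG: mismatched end tags drive depth negative */
}

/* A root-cause fix validates the stack on both operations instead of
 * patching the eventual crash site: */
static int push_element_fixed(ElementStack *s, const char *name) {
    if (s->depth >= MAX_DEPTH) return 0;   /* reject over-deep nesting */
    s->names[s->depth++] = name;
    return 1;
}

static int pop_element_fixed(ElementStack *s) {
    if (s->depth <= 0) return 0;           /* reject mismatched end tags */
    s->depth--;
    return 1;
}
```

In a case like this, the stray write lands on whatever allocation happens to sit nearby, which is why the report points at a heap buffer overflow even though the fix belongs in the push and pop logic.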
Example #2: Agent creates non-trivial patches
In this example, the CodeMender agent devised a non-trivial patch for a complex object lifetime issue.
The agent not only identified the root cause of the vulnerability, but also modified a completely custom system for generating C code within the project.
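The project's code is again not shown; the following hedged sketch (all identifiers hypothetical) illustrates the kind of object lifetime hazard involved and a fix that makes ownership explicit:

```c
/* Hypothetical sketch of an object-lifetime bug; identifiers are
 * illustrative and not taken from the patched project. */
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *buffer;    /* owned by the node */
    size_t length;
} Node;

/* BUG: returns a pointer into node->buffer, but callers may free the node
 * first and then keep using the returned pointer (use-after-free). */
const char *node_text_unsafe(const Node *node) {
    return node->buffer;
}

/* Root-cause fix: make the lifetime explicit by handing the caller an owned
 * copy, so the text outlives the node that produced it. */
char *node_text_copy(const Node *node) {
    char *copy = malloc(node->length + 1);
    if (copy == NULL) return NULL;
    memcpy(copy, node->buffer, node->length);
    copy[node->length] = '\0';
    return copy;     /* caller frees */
}
```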
Proactively rewriting existing code for better security
CodeMender was also designed to proactively rewrite existing code, enabling the use of more secure data structures and APIs.
For instance, CodeMender was deployed to apply -fbounds-safety annotations to sections of libwebp, a widely used image compression library. With -fbounds-safety annotations, the compiler incorporates bounds checks into the code, which helps prevent attackers from exploiting buffer overflows or underflows to execute arbitrary code.
Several years prior, a heap buffer overflow vulnerability in libwebp (CVE-2023-4863) was exploited by a threat actor as part of a zero-click iOS exploit. With -fbounds-safety annotations, this vulnerability, and most other buffer overflows in annotated parts of the project, would have been rendered unexploitable.
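As a minimal sketch of what such an annotation looks like, assuming Clang's experimental -fbounds-safety extension and its __counted_by attribute (the function below is hypothetical, not actual libwebp code):

```c
/* Hypothetical example, not libwebp code. Builds only with a Clang that
 * supports the experimental -fbounds-safety extension; __counted_by is
 * provided by <ptrcheck.h> in toolchains that ship the extension. */
#include <stddef.h>
#include <stdint.h>
#include <ptrcheck.h>

/* Before: the compiler cannot tell how many bytes `row` points to, so an
 * out-of-bounds index silently corrupts memory:
 *     void fill_row(uint8_t *row, size_t width);
 *
 * After: `row` is declared to hold exactly `width` elements; the compiler
 * now emits a bounds check for each access and traps on overflow instead of
 * letting it become exploitable. */
void fill_row(uint8_t *__counted_by(width) row, size_t width) {
    for (size_t x = 0; x < width; ++x) {
        row[x] = 0;
    }
}
```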
The examples below illustrate the agent's decision-making process, including its validation steps.
Example #1: Agent’s reasoning steps
In this example, the CodeMender agent is asked to address a -fbounds-safety error on the bit_depths pointer.
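The compiler error itself is not reproduced here. As a hedged sketch of the kind of change such an error typically calls for (the function and count parameter below are hypothetical; only the bit_depths name comes from the example):

```c
#include <ptrcheck.h>   /* __counted_by, assuming Clang's -fbounds-safety extension */

/* Under -fbounds-safety, an unannotated pointer parameter is, roughly
 * speaking, treated as referring to a single element, so indexing
 * bit_depths produces an error. Tying the pointer to its element count
 * resolves it: */
void set_bit_depths(const int *__counted_by(num_channels) bit_depths,
                    int num_channels);
```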
Example #2: Agent automatically corrects errors and test failures
Another of CodeMender’s key features is its ability to automatically correct new errors and any test failures that arise from its own annotations. Here is an example of the agent recovering from a compilation error.
Example #3: Agent validates the changes
In this example, the CodeMender agent modifies a function and then uses an LLM judge tool configured to check functional equivalence, verifying that the change preserves the original behavior. When the tool detects a failure, the agent self-corrects based on the judge's feedback.
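The LLM judge itself is not shown here. As a hedged sketch of the complementary idea, differential testing (listed among CodeMender's analysis tools above) can check the same property mechanically: run the original and rewritten functions on the same inputs and flag any divergence. Everything below is illustrative.

```c
/* Illustrative differential-testing harness; both functions are stand-ins
 * for an original routine and the agent's rewrite of it. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Original function (stand-in). */
static uint32_t sum_bytes_original(const uint8_t *data, size_t len) {
    uint32_t acc = 0;
    for (size_t i = 0; i < len; ++i) acc += data[i];
    return acc;
}

/* Candidate rewrite (stand-in). */
static uint32_t sum_bytes_rewritten(const uint8_t *data, size_t len) {
    uint32_t acc = 0;
    const uint8_t *end = data + len;
    while (data != end) acc += *data++;
    return acc;
}

int main(void) {
    uint8_t buf[256];
    srand(12345);                                    /* fixed seed: reproducible */
    for (int trial = 0; trial < 100000; ++trial) {
        size_t len = (size_t)(rand() % (int)sizeof(buf));
        for (size_t i = 0; i < len; ++i) buf[i] = (uint8_t)rand();
        if (sum_bytes_original(buf, len) != sum_bytes_rewritten(buf, len)) {
            fprintf(stderr, "divergence at trial %d (len=%zu)\n", trial, len);
            return 1;                                /* behaviour changed: reject patch */
        }
    }
    puts("no divergence observed");
    return 0;
}
```

A harness like this catches concrete behavioral differences on sampled inputs, while a judge based on a language model can also flag semantic changes that tests happen not to exercise.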
Making software secure for everyone
While early results with CodeMender show promise, a cautious approach is being taken, prioritizing reliability. Currently, all patches generated by CodeMender undergo review by human researchers before being submitted upstream.
CodeMender has already been used to submit patches to various critical open-source libraries, with many already accepted and upstreamed. This process is being gradually scaled up to ensure quality and systematically incorporate feedback from the open-source community.
Maintainers of critical open-source projects who are interested will also gradually be offered CodeMender-generated patches. By iterating on feedback from this process, the aim is to release CodeMender as a tool that any software developer can use to maintain secure codebases.
A number of these techniques and results will be shared as technical papers and reports in the coming months. CodeMender represents an initial step in exploring AI's significant potential to enhance software security for all.