The emergence of agentic software development has dramatically accelerated the pace of code creation, review, and deployment across the industry. Testing frameworks need to advance in step: faster development cycles demand testing solutions that can identify bugs as soon as they appear in a codebase, without requiring constant updates and manual upkeep.
Just-in-Time Tests (JiTTests) represent a groundbreaking approach in which large language models (LLMs) automatically generate tests on demand for each code change. These tests are designed to catch bugs, including those that traditional methods miss, precisely when they matter most: before code is deployed to production.
A Catching JiTTest specifically targets regressions introduced by a code change, an approach that rethinks decades of software testing principles and practices. Unlike traditional testing, which relies on static test suites, manual authoring, and continuous upkeep, Catching JiTTests eliminate the need for test maintenance and for human review of test code. This frees engineers to spend their expertise resolving genuine bugs rather than triaging false positives. Catching JiTTests combine several techniques to maximize the value of the test signal and minimize the burden of false positives, directing attention to critical failures.
HOW TRADITIONAL TESTING OPERATES
Under the conventional testing model, tests are written by hand as new code is integrated into a codebase and are executed continuously, which demands regular updates and ongoing maintenance. Engineers who author these tests must verify the behavior not only of the current code but also of all potential future changes. That inherent uncertainty about future modifications often produces tests that either miss real issues or flag intended changes as failures. Agentic development significantly increases the rate of code change, placing immense strain on test development and escalating the costs of false positives and test maintenance to an unsustainable level.
HOW CATCHING JITTESTS FUNCTION
Broadly, a JiTTest is a test tailored to a particular code change that gives engineers clear, actionable feedback about unexpected behavior changes without requiring them to write or read test code. LLMs can generate JiTTests automatically the moment a pull request is submitted. Because JiTTests are LLM-generated, the system can often infer the likely intent behind a code change and simulate faults that the change might plausibly introduce. By accounting for the intended purpose, Catching JiTTests can significantly reduce false positives.
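To make the generation step concrete, here is a minimal sketch of what it could look like. The `complete` helper, the prompt wording, and the pytest convention are illustrative assumptions for this example, not the interface described in the paper.

```python
# Sketch only: `complete` must be wired to an actual LLM provider.
PROMPT = """You are writing a Just-in-Time Test for a pull request.
1. Infer the likely intent of the change from the diff.
2. Write one pytest test that passes if the code behaves as intended
   and fails if the change introduced a regression.

Source file after the change:
{source}

Diff:
{diff}

Return only the test code."""

def complete(prompt: str) -> str:
    """Stand-in for an LLM completion call (assumed interface)."""
    raise NotImplementedError("wire this to your LLM provider")

def generate_jittest(diff: str, new_source: str) -> str:
    """Ask the LLM to infer the intent of a change and emit one targeted test."""
    return complete(PROMPT.format(source=new_source, diff=diff))
```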
The key steps in the Catching JiTTest process are as follows (a code sketch of the pipeline appears after the list):
- New code is introduced into the codebase.
- The system deduces the intended purpose of the code change.
- It generates mutants (versions of the code with faults intentionally inserted) to simulate potential issues.
- Tests are generated and executed to identify these faults.
- Combinations of rule-based and LLM-based assessors refine the signal to pinpoint true positive failures.
- Engineers receive precise, relevant reports about unexpected changes exactly when they are most crucial.
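Putting those steps together, a simplified end-to-end sketch might look like the following. It reuses `generate_jittest` from the sketch above; the toy mutation operator, the `llm_assess` stub, and the convention that the generated test imports the code under test as `target` are all assumptions made for illustration, not details from the paper.

```python
import ast
import subprocess
import tempfile
from pathlib import Path

class _FlipComparisons(ast.NodeTransformer):
    """Toy mutation operator: invert comparisons to seed a plausible fault."""
    SWAP = {ast.Eq: ast.NotEq, ast.NotEq: ast.Eq, ast.Lt: ast.GtE, ast.GtE: ast.Lt}

    def visit_Compare(self, node):
        self.generic_visit(node)
        node.ops = [self.SWAP.get(type(op), type(op))() for op in node.ops]
        return node

def generate_mutants(source: str) -> list[str]:
    """Return faulty variants of the source (a single toy mutant here)."""
    tree = _FlipComparisons().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return [ast.unparse(tree)]

def run_pytest(test_code: str, source: str) -> bool:
    """Run the generated test against one version of the code; True = passed.

    Assumes pytest is installed and the test imports the module as `target`.
    """
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "target.py").write_text(source)
        Path(tmp, "test_jit.py").write_text(test_code)
        return subprocess.run(["pytest", "-q", "test_jit.py"],
                              cwd=tmp, capture_output=True).returncode == 0

def llm_assess(test_code: str, diff: str) -> bool:
    """Stand-in for an LLM-based assessor that vets a failing test."""
    raise NotImplementedError("wire this to your LLM provider")

def catching_jittest(old_source: str, new_source: str, diff: str):
    """Return a vetted test that flags a regression, or None if no signal."""
    test = generate_jittest(diff, new_source)  # from the sketch above
    # Rule-based check: the test must pass on the pre-change code, so any
    # failure it reports is attributable to the change itself.
    if not run_pytest(test, old_source):
        return None
    # Mutation check: the test should kill at least one seeded fault;
    # otherwise it has no power to detect regressions in this change.
    if all(run_pytest(test, mutant) for mutant in generate_mutants(new_source)):
        return None
    # Run against the actual change; if it passes, no regression was caught.
    if run_pytest(test, new_source):
        return None
    # LLM-based assessor filters residual false positives before reporting.
    return test if llm_assess(test, diff) else None
```

The ordering mirrors the list above: cheap rule-based checks discard weak or flaky tests first, so the more expensive LLM-based assessor only sees candidates that have already demonstrated fault-detection power, and engineers only see failures that survive every filter.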
WHY THIS APPROACH IS SIGNIFICANT
Catching JiTTests are purpose-built for the era of AI-powered agentic software development, accelerating testing by concentrating on critical, unexpected bugs. With this system, engineers no longer need to spend time writing, reviewing, and maintaining complex test code. Catching JiTTests inherently address many of the problems associated with traditional testing:
- They are generated dynamically for each code change and do not reside within the codebase, thus eliminating ongoing maintenance costs and shifting the effort from human intervention to automated processes.
- They are customized for each specific change, making them more robust and less likely to break when behavior is updated intentionally.
- They automatically adapt as the underlying code evolves.
- Human review is only required when an actual bug is detected.
This represents a crucial shift in testing infrastructure, moving the focus from general code quality to whether a test effectively identifies faults in a specific change without generating false positives. This approach enhances overall testing efficiency while enabling it to keep pace with the rapid speed of agentic coding. Further details can be found in the paper Just-in-Time Catching Test Generation at Meta.

