Claude Code running Opus 4.6 is the strongest coding agent in the Claude family, particularly for intricate design decisions. Its per-token pricing, however, makes it an expensive choice for routine implementation work.
This is where Moonshot AI’s Kimi K2.5 offers a compelling solution.
Why Kimi K2.5?
Kimi K2.5, a 1-trillion parameter MoE model released in January 2026, functions as a CLI-based coding agent known as “Kimi Code.” It can autonomously manage file edits, execute commands, and run tests directly in the terminal, much like Claude Code.
Its performance as a standalone agent is noteworthy:
| Benchmark | Kimi K2.5 | Claude Opus 4.5 | GPT-5.2 |
|---|---|---|---|
| SWE-Bench Verified | 76.8% | 80.9% | 80.0% |
| AIME 2025 | 96.1% | 92.8% | — |
| LiveCodeBench v6 | 85.0% | — | — |
Kimi K2.5’s performance on SWE-Bench is comparable to Opus, and it even surpasses Opus in mathematical reasoning. Achieving this level of performance with the Moderato plan (priced at $19/month at the time of writing, offering 2048 requests/week) represents excellent value.
However, Kimi K2.5’s most distinctive feature is its Agent Swarm capability.
Agent Swarm: The Swarm Intelligence Concept
Agent Swarm is an architectural design that allows for the simultaneous launch of up to 100 sub-agents, capable of executing as many as 1,500 tool calls in parallel. An internal orchestrator breaks down complex tasks into smaller, parallelizable subtasks, distributing them among specialized agents (such as an AI Researcher or Fact Checker).
On BrowseComp, Agent Swarm lifts the standard mode's 60.6% success rate to 78.4% while running up to 4.5× faster. The philosophy behind Moonshot AI's swarm intelligence: "A group of moderately intelligent models often outperforms a single highly intelligent model on practical tasks."
Design Insight: Strengths and Weaknesses of Swarm Intelligence
An analysis of the benchmarks and Agent Swarm’s operational mechanics leads to a specific hypothesis:
Kimi’s primary strength lies in its parallel execution power. It performs exceptionally well when multiple agents execute clearly defined tasks concurrently. Conversely, high-level design decisions—such as determining “what to build” and “how to design it”—are outside the scope of swarm intelligence. Kimi’s internal orchestrator is optimized for task decomposition and parallelization, not for “higher-level orchestrator” roles like architectural design or making trade-off decisions.
In essence, Kimi is an excellent worker but not suited to be an orchestrator. This suggests a natural division of labor: Opus serves as the orchestrator (handling design, decisions, and review), while Kimi manages the implementation and testing.
Another important consideration is how to instruct Kimi. If a human-oriented plan (e.g., “fix the auth module”) is provided as-is, Kimi will spend time determining “which file?” and “what are the completion criteria?” To effectively utilize swarm intelligence’s parallel execution, agents require structured specifications that include concrete file paths, verification commands, and hints for parallelization. This structured approach is referred to as spec.md.
Architecture: Orchestrator + Worker
Based on this analysis, the following setup was designed: Opus creates the initial plan, converts it into a spec.md document after approval, and then passes it to Kimi.
```
User
 ↓ Task request
Claude Code (Opus 4.6) — Orchestrator
 ├── Plan creation (design decisions)
 ├── Kimi delegation judgment
 ├── Plan → spec.md conversion
 ├── Dispatch
 └── Review
 ↓ spec.md
Kimi K2.5 — Worker
 ├── Implementation based on spec (leveraging swarm intelligence)
 ├── Test execution
 └── Result return (on isolated branch)
```
This architecture combines Opus’s design capabilities with Kimi’s execution power. Kimi’s modifications are always performed on an isolated branch, which is merged only after review by Claude. This safety mechanism allows for confident delegation of tasks to Kimi.
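Concretely, the handoff might look like the following shell sketch. The `kimi` flags match the dispatch shown later in this article, but the function name and branch-naming scheme are assumptions for illustration, not the actual kimi-wrapper.sh:

```shell
# Sketch of the isolated-branch handoff (illustrative; assumes a `kimi` CLI
# on PATH and a git worktree; branch naming is a made-up convention).
kimi_dispatch() {
  spec="$1"
  branch="kimi/$(basename "$spec" .md)"    # e.g. spec-001.md -> kimi/spec-001

  git checkout -b "$branch"                # isolate Kimi's changes
  kimi --prompt "$(cat "$spec")" --thinking --yolo --max-steps-per-turn 100
  git checkout -                           # back to the base branch for review
  echo "$branch ready for review"          # merged only after Claude approves
}
```

Because the worker never touches the base branch, a failed or low-quality run costs nothing but the quota it consumed.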
Initial Design: Two-Step Commands
The first implementation utilized two distinct slash commands:
```
/kimi-spec <task summary>   → Generate spec.md
/kimi-dispatch <spec-path>  → Pass to Kimi for execution
```
While functional, this approach introduced cognitive load, as users had to remember two separate commands.
The Turning Point: “Can We Integrate into Plan Mode?”
Claude Code includes a plan mode (accessible via Shift+Tab or /plan). For complex tasks, it generates a plan that users approve before implementation. By integrating into this existing workflow, users would not need to learn new commands.
Standard flow:
1. User requests task
2. Claude creates plan
3. User approves → Claude implements
Hybrid flow:
1. User requests task
2. Claude creates plan
3. User approves with choice:
- "Claude implements" → Standard flow
- "Delegate to Kimi" → Plan → spec conversion → Kimi dispatch
From a user’s perspective, this simply appears as “one more option on the usual approval screen,” resulting in a zero learning curve.
Is spec.md Really Necessary? — A Design Discussion
At this point, a pause was taken to consider if spec.md was truly necessary. If integration with plan mode was the goal, could the plan file not be passed directly to Kimi? Was the spec.md conversion step an unnecessary overhead consuming Opus tokens?
The idea of reverting to “just pass the plan directly” was considered. However, a calm comparison revealed clear differences in their roles:
| | plan | spec.md |
|---|---|---|
| Audience | Human + Claude | Kimi (autonomous agent) |
| Path specification | "Fix auth module" | `MODIFY src/auth/handler.ts` |
| Completion criteria | "Tests pass" | `pytest --cov=src` achieves 80%+ |
| Parallel hints | None | `[INDEPENDENT]` tag |
Plans and specs serve different audiences. A plan is intended for human review and judgment, while a spec provides precise instructions for autonomous agents to execute without hesitation.
While Kimi K2.5 is intelligent enough to work with vague instructions, under the Moderato plan’s constraint of 2048 requests/week, the waste incurred by Kimi “searching around” for steps becomes significant.
- Opus conversion cost: A few thousand tokens (inexpensive)
- Kimi step savings: 10-20 steps per task (directly impacts quota)
Therefore, spec conversion is an investment in quota conservation. The decision was made to retain it.
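The trade-off can be made concrete with back-of-envelope shell arithmetic, using the numbers from this article: 2048 requests/week, roughly 6 steps per task with a spec versus 20-30 without:

```shell
# Back-of-envelope quota math for the Moderato plan.
weekly_quota=2048
steps_with_spec=6     # steps observed for the spec-driven dispatch
steps_vague=25        # midpoint of the 20-30 range for vague instructions

echo "tasks/week with spec:    $(( weekly_quota / steps_with_spec ))"
echo "tasks/week without spec: $(( weekly_quota / steps_vague ))"
```

Roughly 341 tasks per week versus 81: a 4× difference in throughput, bought with a few thousand Opus tokens per task.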
Implementation: Having Kimi Create the Rules
As the initial real-world test of the hybrid environment, the task of creating “plan mode integration rules” was delegated to Kimi itself.
spec.md Content
```
# Spec: 001 -- Kimi Plan Integration

## Tasks

### Task 1: Create Kimi Delegation Rules [INDEPENDENT]

Files to create:
- CREATE ~/.claude/rules/common/kimi-delegation.md

Requirements:
- When to Suggest (proposal criteria)
- When NOT to Suggest (prohibitions)
- Plan Approval Flow (approval process)
- Plan to Spec Conversion (conversion procedure)
- Quota Awareness (quota consciousness)

Verification:
- Confirm 5 sections exist
- Confirm reference to kimi-wrapper.sh
```
Kimi Execution Results
```
$ kimi --prompt "$(cat spec-001.md)" --thinking --yolo --max-steps-per-turn 100
```
The `kimi-wrapper.sh` script handles model specification and working directory assignment, but the essential options are shown above.
The first dispatch completed in approximately 10 seconds, involving 6 steps. Kimi read 3 existing rule files, understood their format, generated a 119-line rule file, and passed verification.
Because the spec provided concrete paths, section structure, and verification commands, Kimi proceeded directly without any hesitation regarding “what to create.”
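The spec's verification step can also be encoded mechanically. A minimal sketch, assuming the checks amount to grep-level assertions on the generated file (the function name is illustrative):

```shell
# Check a generated rules file against the spec's Verification section:
# at least five "## " sections, plus a reference to kimi-wrapper.sh.
verify_rules() {
  file="$1"
  [ "$(grep -c '^## ' "$file")" -ge 5 ] && grep -q 'kimi-wrapper.sh' "$file"
}
```

Machine-checkable completion criteria like these are what let Kimi self-verify instead of burning steps asking "am I done?"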
Quality of Generated Rules
```
# Kimi Delegation

## When to Suggest
- Simple implementation tasks — boilerplate, CRUD, standard patterns
- Mechanical changes across multiple files — renaming, format unification
- User explicitly specifies Kimi
...

## Quota Awareness

| Change Scale | Recommended Approach |
|----------|---------------|
| 1-2 files, under 50 lines | Direct Claude implementation |
| 3+ files, 100+ lines | Actively propose Kimi delegation |
```
The generated rules were consistent with existing formats and included concrete judgment criteria. They passed Opus review without requiring any changes.
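The quota-awareness table reduces to a tiny predicate. As an illustrative encoding of the generated rule (not Kimi's actual implementation):

```shell
# Suggest Kimi delegation only for larger changes: 3+ files and 100+ lines.
suggest_kimi() {
  files="$1"; lines="$2"
  [ "$files" -ge 3 ] && [ "$lines" -ge 100 ]
}

suggest_kimi 5 200 && echo "delegate to Kimi"            # prints "delegate to Kimi"
suggest_kimi 1 30  || echo "Claude implements directly"  # prints the fallback
```

Changes that fall between the two rows of the table (say, 2 files and 120 lines) stay with Claude under this encoding, which errs on the conservative side.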
File Structure (Full Picture)
The final file listing for the hybrid environment is as follows:
```
~/.kimi/
├── config.toml             # Moderato profile (max_steps=100)
├── config.swarm.toml       # Swarm backup
└── credentials/            # OAuth credentials

~/.claude/
├── bin/
│   ├── kimi-wrapper.sh         # Claude → Kimi dispatcher
│   └── kimi-profile-switch.sh  # Moderato ⇔ Swarm toggle
├── commands/
│   ├── kimi-spec.md            # spec generation (manual)
│   └── kimi-dispatch.md        # dispatch & review (manual)
├── rules/common/
│   └── kimi-delegation.md      # plan mode integration rules ← Kimi created
└── templates/hybrid/
    └── spec-template.md        # spec template
```
Moderato Plan Optimization Settings
Constraints and corresponding settings for the Kimi Moderato plan ($19/month):
| Setting | Value | Why |
|---|---|---|
| `max_steps_per_turn` | 100 | Step count = request consumption |
| `tool_call_timeout_ms` | 120000 | Maintain for parallel tool calls |
| wrapper `TIMEOUT` | 300s | 300s is enough for 100 steps |
If upgrading to the Swarm plan, `kimi-profile-switch.sh swarm` allows for instant toggling.
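The switch script itself is not shown in this article; a plausible sketch, assuming it swaps `config.toml` against the backups in the file listing above (the `config.moderato.toml` backup name and `KIMI_DIR` override are assumptions):

```shell
# Toggle the active Kimi profile between Moderato and Swarm.
# KIMI_DIR is overridable for testing; defaults to ~/.kimi.
kimi_profile_switch() {
  dir="${KIMI_DIR:-$HOME/.kimi}"
  case "$1" in
    swarm)
      cp "$dir/config.toml" "$dir/config.moderato.toml"  # keep Moderato backup
      cp "$dir/config.swarm.toml" "$dir/config.toml" ;;
    moderato)
      cp "$dir/config.moderato.toml" "$dir/config.toml" ;;
    *)
      echo "usage: kimi_profile_switch {moderato|swarm}" >&2; return 1 ;;
  esac
}
```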
Key Learnings
1. Spec Precision = Quota Efficiency
Providing concrete paths, section structures, and verification commands within spec.md significantly reduces Kimi’s exploration steps. In this instance, it took only 6 steps to completion. With vague instructions, it would have required 20-30 steps.
2. Integration with Existing Workflows is Key
Rather than introducing new commands, embedding the functionality into existing plan mode workflows drastically lowers adoption barriers. From the user’s perspective, it is simply “one more option on the approval screen,” resulting in virtually no learning cost.
3. Safety Design Between Agents
Kimi’s changes are always made on isolated branches. Merging these changes only after review by Claude instills confidence, allowing for a casual “just throw it to Kimi” approach.
Summary
The initial dispatch was successful, completing in 10 seconds with 6 steps. This confirmed that the overhead of spec conversion significantly improves Kimi’s quota efficiency.
The core principle of this setup is “dividing roles between LLMs and embedding handoffs into existing workflows.” By combining Kimi Moderato ($19/month) with Claude Code, individuals can establish a practical multi-agent development environment.
Future plans include exploring parallel dispatch for multiple tasks and automatic pull request creation from Kimi’s results.
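As a speculative sketch of that parallel dispatch (not an existing feature of this setup), each spec file could get its own background worker:

```shell
# Dispatch every spec in a directory as a concurrent Kimi worker, then wait.
# Speculative: flags mirror the single dispatch used earlier in this article.
kimi_dispatch_all() {
  for spec in "$1"/spec-*.md; do
    kimi --prompt "$(cat "$spec")" --yolo --max-steps-per-turn 100 &
  done
  wait   # block until every worker has finished
}
```

Quota would be the binding constraint here: N parallel workers consume roughly N× the steps, so the spec-precision lesson above matters even more.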

