feat: implement three-tier model strategy with Opus 4.5 (#139)

* feat: implement three-tier model strategy with Opus 4.5

This implements a strategic model selection approach based on agent complexity and use case, addressing Issue #136.

Three-Tier Strategy:
- Tier 1 (opus): 17 critical agents for architecture, security, code review
- Tier 2 (inherit): 21 complex agents where users choose their model
- Tier 3 (sonnet): 63 routine development agents (unchanged)
- Tier 4 (haiku): 47 fast operational agents (unchanged)

Why Opus 4.5 for Tier 1:
- 80.9% on SWE-bench (industry-leading for code)
- 65% fewer tokens for long-horizon tasks
- Superior reasoning for architectural decisions

Changes:
- Update architect-review, cloud-architect, kubernetes-architect, database-architect, security-auditor, code-reviewer to opus
- Update backend-architect, performance-engineer, ai-engineer, prompt-engineer, ml-engineer, mlops-engineer, data-scientist, blockchain-developer, quant-analyst, risk-manager, sql-pro, database-optimizer to inherit
- Update README with three-tier model documentation

Relates to #136

* feat: comprehensive model tier redistribution for Opus 4.5

This commit implements a strategic rebalancing of agent model assignments, significantly increasing the use of Opus 4.5 for critical coding tasks while ensuring Sonnet is used more than Haiku for support tasks.

Final Distribution (153 total agent files):
- Tier 1 Opus: 42 agents (27.5%) - All production coding + critical architecture
- Tier 2 Inherit: 42 agents (27.5%) - Complex tasks, user-choosable
- Tier 3 Sonnet: 38 agents (24.8%) - Support tasks needing intelligence
- Tier 4 Haiku: 31 agents (20.3%) - Simple operational tasks

Key Changes:

Tier 1 (Opus) - Production Coding + Critical Review:
- ALL code-reviewers (6 total): Ensures highest quality code review across all contexts (comprehensive, git PR, code docs, codebase cleanup, refactoring, TDD)
- All major language pros (7): python, golang, rust, typescript, cpp, java, c
- Framework specialists (6): django (2), fastapi (2), graphql-architect (2)
- Complex specialists (6): terraform-specialist (3), tdd-orchestrator (2), data-engineer
- Blockchain: blockchain-developer (smart contracts are critical)
- Game dev (2): unity-developer, minecraft-bukkit-pro
- Architecture (existing): architect-review, cloud-architect, kubernetes-architect, hybrid-cloud-architect, database-architect, security-auditor

Tier 2 (Inherit) - User Flexibility:
- Secondary languages (6): javascript, scala, csharp, ruby, php, elixir
- All frontend/mobile (8): frontend-developer (4), mobile-developer (2), flutter-expert, ios-developer
- Specialized (6): observability-engineer (2), temporal-python-pro, arm-cortex-expert, context-manager (2), database-optimizer (2)
- AI/ML, backend-architect, performance-engineer, quant/risk (existing)

Tier 3 (Sonnet) - Intelligent Support:
- Documentation (4): docs-architect (2), tutorial-engineer (2)
- Testing (2): test-automator (2)
- Developer experience (3): dx-optimizer (2), business-analyst
- Modernization (4): legacy-modernizer (3), database-admin
- Other support agents (existing)

Tier 4 (Haiku) - Simple Operations:
- SEO/Marketing (10): All SEO agents, content, search
- Deployment (4): deployment-engineer (4 instances)
- Debugging (5): debugger (2), error-detective (3)
- DevOps (3): devops-troubleshooter (3)
- Other simple operational tasks

Rationale:
- Opus 4.5 achieves 80.9% on SWE-bench with 65% fewer tokens on complex tasks
- Production code deserves the best model: all language pros now on Opus
- All code review uses Opus for maximum quality and security
- Sonnet > Haiku (38 vs 31) ensures better intelligence for support tasks
- Inherit tier gives users cost control for frontend, mobile, and specialized tasks

Related: #136, #132

* feat: upgrade final 13 agents from Haiku to Sonnet

Based on research into Haiku 4.5 vs Sonnet 4.5 capabilities, upgraded agents requiring deep analytical intelligence from Haiku to Sonnet.

Research Findings:
- Haiku 4.5: 73.3% SWE-bench, 3-5x faster, 1/3 cost, sub-200ms responses
- Best for Haiku: Real-time apps, data extraction, templates, high-volume ops
- Best for Sonnet: Complex reasoning, root cause analysis, strategic planning

Agents Upgraded (13 total):
- Debugging (5): debugger (2), error-detective (3) - Complex root cause analysis
- DevOps (3): devops-troubleshooter (3) - System diagnostics & troubleshooting
- Network (2): network-engineer (2) - Complex network analysis & optimization
- API Documentation (2): api-documenter (2) - Deep API understanding required
- Payments (1): payment-integration - Critical financial integration

Final Distribution (153 total):
- Tier 1 Opus: 42 agents (27.5%) - Production coding + critical architecture
- Tier 2 Inherit: 42 agents (27.5%) - Complex tasks, user-choosable
- Tier 3 Sonnet: 51 agents (33.3%) - Support tasks needing intelligence
- Tier 4 Haiku: 18 agents (11.8%) - Fast operational tasks only

Haiku Now Reserved For:
- SEO/Marketing (8): Pattern matching, data extraction, content templates
- Deployment (4): Operational execution tasks
- Simple Docs (3): reference-builder, mermaid-expert, c4-code
- Sales/Support (2): High-volume, template-based interactions
- Search (1): Knowledge retrieval

Sonnet > Haiku as requested (51 vs 18)

Sources:
- https://www.creolestudios.com/claude-haiku-4-5-vs-sonnet-4-5-comparison/
- https://www.anthropic.com/news/claude-haiku-4-5
- https://caylent.com/blog/claude-haiku-4-5-deep-dive-cost-capabilities-and-the-multi-agent-opportunity

Related: #136

* docs: add cost considerations and clarify inherit behavior

Addresses PR feedback:
- Added comprehensive cost comparison for all model tiers
- Documented how 'inherit' model works (uses session default, falls back to Sonnet)
- Explained cost optimization strategies
- Clarified when Opus token efficiency offsets higher rate

This helps users make informed decisions about model selection and cost control.
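The distribution figures and the 'inherit' fallback described above can be checked with a short script. The following is a minimal sketch and not part of this commit: the function names `read_model`, `tier_shares`, and `resolve_model` are hypothetical, the frontmatter parsing assumes the simple `key: value` layout visible in the agent files in this diff, and the fallback rule follows only what the message states (session default, otherwise Sonnet).

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def read_model(agent_markdown: str) -> Optional[str]:
    """Return the `model:` value from the leading `---` frontmatter block, or None."""
    lines = agent_markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return None  # no frontmatter at the top of the file
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter reached without a model field
        key, sep, value = line.partition(":")
        if sep and key.strip() == "model":
            return value.strip()
    return None


def tier_shares(models: Iterable[str]) -> Dict[str, Tuple[int, float]]:
    """Map each model name to (agent count, percentage of all agents, 1 decimal)."""
    counts = Counter(models)
    total = sum(counts.values())
    return {name: (n, round(100 * n / total, 1)) for name, n in counts.items()}


def resolve_model(declared: str, session_default: Optional[str] = None) -> str:
    """Assumed 'inherit' rule: use the session default, else fall back to sonnet."""
    if declared != "inherit":
        return declared
    return session_default or "sonnet"


if __name__ == "__main__":
    sample = "---\nname: c-pro\nmodel: opus\n---\n\nYou are a C programming expert."
    print(read_model(sample))
    final = ["opus"] * 42 + ["inherit"] * 42 + ["sonnet"] * 51 + ["haiku"] * 18
    print(tier_shares(final))
    print(resolve_model("inherit"))
```

Feeding `tier_shares` the final counts from the message (42, 42, 51, 18) reproduces the stated 27.5%, 27.5%, 33.3%, and 11.8% shares of 153 agent files.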
2026-03-18 09:37:15 +00:00 · 2025-12-10 15:52:06 -05:00
parent 27a246a8c6
commit c7ad381360
108 changed files with 137 additions and 114 deletions
--- a/plugins/systems-programming/agents/c-pro.md
+++ b/plugins/systems-programming/agents/c-pro.md
@@ -1,7 +1,7 @@
 ---
 name: c-pro
 description: Write efficient C code with proper memory management, pointer arithmetic, and system calls. Handles embedded systems, kernel modules, and performance-critical code. Use PROACTIVELY for C optimization, memory issues, or system programming.
-model: sonnet
+model: opus
 ---

 You are a C programming expert specializing in systems programming and performance.
--- a/plugins/systems-programming/agents/cpp-pro.md
+++ b/plugins/systems-programming/agents/cpp-pro.md
@@ -1,7 +1,7 @@
 ---
 name: cpp-pro
 description: Write idiomatic C++ code with modern features, RAII, smart pointers, and STL algorithms. Handles templates, move semantics, and performance optimization. Use PROACTIVELY for C++ refactoring, memory safety, or complex C++ patterns.
-model: sonnet
+model: opus
 ---

 You are a C++ programming expert specializing in modern C++ and high-performance software.
--- a/plugins/systems-programming/agents/golang-pro.md
+++ b/plugins/systems-programming/agents/golang-pro.md
@@ -1,7 +1,7 @@
 ---
 name: golang-pro
 description: Master Go 1.21+ with modern patterns, advanced concurrency, performance optimization, and production-ready microservices. Expert in the latest Go ecosystem including generics, workspaces, and cutting-edge frameworks. Use PROACTIVELY for Go development, architecture design, or performance optimization.
-model: sonnet
+model: opus
 ---

 You are a Go expert specializing in modern Go 1.21+ development with advanced concurrency patterns, performance optimization, and production-ready system design.
--- a/plugins/systems-programming/agents/rust-pro.md
+++ b/plugins/systems-programming/agents/rust-pro.md
@@ -1,7 +1,7 @@
 ---
 name: rust-pro
 description: Master Rust 1.75+ with modern async patterns, advanced type system features, and production-ready systems programming. Expert in the latest Rust ecosystem including Tokio, axum, and cutting-edge crates. Use PROACTIVELY for Rust development, performance optimization, or systems programming.
-model: sonnet
+model: opus
 ---

 You are a Rust expert specializing in modern Rust 1.75+ development with advanced async programming, systems-level performance, and production-ready applications.