feat: implement three-tier model strategy with Opus 4.5 (#139)

* feat: implement three-tier model strategy with Opus 4.5

This implements a strategic model selection approach based on agent
complexity and use case, addressing Issue #136.

Three-Tier Strategy:
- Tier 1 (opus): 17 critical agents for architecture, security, code review
- Tier 2 (inherit): 21 complex agents where users choose their model
- Tier 3 (sonnet): 63 routine development agents (unchanged)
- Tier 4 (haiku): 47 fast operational agents (unchanged)

Why Opus 4.5 for Tier 1:
- 80.9% on SWE-bench (industry-leading for code)
- 65% fewer tokens for long-horizon tasks
- Superior reasoning for architectural decisions

Changes:
- Update architect-review, cloud-architect, kubernetes-architect,
  database-architect, security-auditor, code-reviewer to opus
- Update backend-architect, performance-engineer, ai-engineer,
  prompt-engineer, ml-engineer, mlops-engineer, data-scientist,
  blockchain-developer, quant-analyst, risk-manager, sql-pro,
  database-optimizer to inherit
- Update README with three-tier model documentation

Relates to #136

* feat: comprehensive model tier redistribution for Opus 4.5

This commit implements a strategic rebalancing of agent model assignments,
significantly increasing the use of Opus 4.5 for critical coding tasks while
ensuring Sonnet is used more than Haiku for support tasks.

Final Distribution (153 total agent files):
- Tier 1 Opus: 42 agents (27.5%) - All production coding + critical architecture
- Tier 2 Inherit: 42 agents (27.5%) - Complex tasks, user-choosable
- Tier 3 Sonnet: 38 agents (24.8%) - Support tasks needing intelligence
- Tier 4 Haiku: 31 agents (20.3%) - Simple operational tasks

Key Changes:

Tier 1 (Opus) - Production Coding + Critical Review:
- ALL code-reviewers (6 total): Ensures highest quality code review across
  all contexts (comprehensive, git PR, code docs, codebase cleanup, refactoring, TDD)
- All major language pros (7): python, golang, rust, typescript, cpp, java, c
- Framework specialists (6): django (2), fastapi (2), graphql-architect (2)
- Complex specialists (6): terraform-specialist (3), tdd-orchestrator (2), data-engineer
- Blockchain: blockchain-developer (smart contracts are critical)
- Game dev (2): unity-developer, minecraft-bukkit-pro
- Architecture (existing): architect-review, cloud-architect, kubernetes-architect,
  hybrid-cloud-architect, database-architect, security-auditor

Tier 2 (Inherit) - User Flexibility:
- Secondary languages (6): javascript, scala, csharp, ruby, php, elixir
- All frontend/mobile (8): frontend-developer (4), mobile-developer (2),
  flutter-expert, ios-developer
- Specialized (6): observability-engineer (2), temporal-python-pro,
  arm-cortex-expert, context-manager (2), database-optimizer (2)
- AI/ML, backend-architect, performance-engineer, quant/risk (existing)

Tier 3 (Sonnet) - Intelligent Support:
- Documentation (4): docs-architect (2), tutorial-engineer (2)
- Testing (2): test-automator (2)
- Developer experience (3): dx-optimizer (2), business-analyst
- Modernization (4): legacy-modernizer (3), database-admin
- Other support agents (existing)

Tier 4 (Haiku) - Simple Operations:
- SEO/Marketing (10): All SEO agents, content, search
- Deployment (4): deployment-engineer (4 instances)
- Debugging (5): debugger (2), error-detective (3)
- DevOps (3): devops-troubleshooter (3)
- Other simple operational tasks

Rationale:
- Opus 4.5 achieves 80.9% on SWE-bench with 65% fewer tokens on complex tasks
- Production code deserves the best model: all language pros now on Opus
- All code review uses Opus for maximum quality and security
- Sonnet > Haiku (38 vs 31) ensures better intelligence for support tasks
- Inherit tier gives users cost control for frontend, mobile, and specialized tasks

Related: #136, #132

* feat: upgrade final 13 agents from Haiku to Sonnet

Based on research into Haiku 4.5 vs Sonnet 4.5 capabilities, upgraded
agents requiring deep analytical intelligence from Haiku to Sonnet.

Research Findings:
- Haiku 4.5: 73.3% SWE-bench, 3-5x faster, 1/3 cost, sub-200ms responses
- Best for Haiku: Real-time apps, data extraction, templates, high-volume ops
- Best for Sonnet: Complex reasoning, root cause analysis, strategic planning

Agents Upgraded (13 total):
- Debugging (5): debugger (2), error-detective (3) - Complex root cause analysis
- DevOps (3): devops-troubleshooter (3) - System diagnostics & troubleshooting
- Network (2): network-engineer (2) - Complex network analysis & optimization
- API Documentation (2): api-documenter (2) - Deep API understanding required
- Payments (1): payment-integration - Critical financial integration

Final Distribution (153 total):
- Tier 1 Opus: 42 agents (27.5%) - Production coding + critical architecture
- Tier 2 Inherit: 42 agents (27.5%) - Complex tasks, user-choosable
- Tier 3 Sonnet: 51 agents (33.3%) - Support tasks needing intelligence
- Tier 4 Haiku: 18 agents (11.8%) - Fast operational tasks only

Haiku Now Reserved For:
- SEO/Marketing (8): Pattern matching, data extraction, content templates
- Deployment (4): Operational execution tasks
- Simple Docs (3): reference-builder, mermaid-expert, c4-code
- Sales/Support (2): High-volume, template-based interactions
- Search (1): Knowledge retrieval

Sonnet > Haiku as requested (51 vs 18)

Sources:
- https://www.creolestudios.com/claude-haiku-4-5-vs-sonnet-4-5-comparison/
- https://www.anthropic.com/news/claude-haiku-4-5
- https://caylent.com/blog/claude-haiku-4-5-deep-dive-cost-capabilities-and-the-multi-agent-opportunity

Related: #136

* docs: add cost considerations and clarify inherit behavior

Addresses PR feedback:
- Added comprehensive cost comparison for all model tiers
- Documented how 'inherit' model works (uses session default, falls back to Sonnet)
- Explained cost optimization strategies
- Clarified when Opus token efficiency offsets higher rate

This helps users make informed decisions about model selection and cost control.
This commit is contained in:
Seth Hobson
2025-12-10 15:52:06 -05:00
committed by GitHub
parent 27a246a8c6
commit c7ad381360
108 changed files with 137 additions and 114 deletions

View File

@@ -1,7 +1,7 @@
---
name: cloud-architect
description: Expert cloud architect specializing in AWS/Azure/GCP multi-cloud infrastructure design, advanced IaC (Terraform/OpenTofu/CDK), FinOps cost optimization, and modern architectural patterns. Masters serverless, microservices, security, compliance, and disaster recovery. Use PROACTIVELY for cloud architecture, cost optimization, migration planning, or multi-cloud strategies.
model: sonnet
model: opus
---
You are a cloud architect specializing in scalable, cost-effective, and secure multi-cloud infrastructure design.

View File

@@ -1,7 +1,7 @@
---
name: devops-troubleshooter
description: Expert DevOps troubleshooter specializing in rapid incident response, advanced debugging, and modern observability. Masters log analysis, distributed tracing, Kubernetes debugging, performance optimization, and root cause analysis. Handles production outages, system reliability, and preventive monitoring. Use PROACTIVELY for debugging, incident response, or system troubleshooting.
model: haiku
model: sonnet
---
You are a DevOps troubleshooter specializing in rapid incident response, advanced debugging, and modern observability practices.

View File

@@ -1,7 +1,7 @@
---
name: kubernetes-architect
description: Expert Kubernetes architect specializing in cloud-native infrastructure, advanced GitOps workflows (ArgoCD/Flux), and enterprise container orchestration. Masters EKS/AKS/GKE, service mesh (Istio/Linkerd), progressive delivery, multi-tenancy, and platform engineering. Handles security, observability, cost optimization, and developer experience. Use PROACTIVELY for K8s architecture, GitOps implementation, or cloud-native platform design.
model: sonnet
model: opus
---
You are a Kubernetes architect specializing in cloud-native infrastructure, modern GitOps workflows, and enterprise container orchestration at scale.

View File

@@ -1,7 +1,7 @@
---
name: terraform-specialist
description: Expert Terraform/OpenTofu specialist mastering advanced IaC automation, state management, and enterprise infrastructure patterns. Handles complex module design, multi-cloud deployments, GitOps workflows, policy as code, and CI/CD integration. Covers migration strategies, security best practices, and modern IaC ecosystems. Use PROACTIVELY for advanced IaC, state management, or infrastructure automation.
model: sonnet
model: opus
---
You are a Terraform/OpenTofu specialist focused on advanced infrastructure automation, state management, and modern IaC practices.