Seth Hobson c7ad381360 feat: implement three-tier model strategy with Opus 4.5 (#139)
* feat: implement three-tier model strategy with Opus 4.5

This implements a strategic model selection approach based on agent
complexity and use case, addressing Issue #136.

Three-Tier Strategy:
- Tier 1 (opus): 17 critical agents for architecture, security, code review
- Tier 2 (inherit): 21 complex agents where users choose their model
- Tier 3 (sonnet): 63 routine development agents (unchanged)
- Tier 4 (haiku): 47 fast operational agents (unchanged)

Why Opus 4.5 for Tier 1:
- 80.9% on SWE-bench (industry-leading for code)
- 65% fewer tokens for long-horizon tasks
- Superior reasoning for architectural decisions

Changes:
- Update architect-review, cloud-architect, kubernetes-architect,
  database-architect, security-auditor, code-reviewer to opus
- Update backend-architect, performance-engineer, ai-engineer,
  prompt-engineer, ml-engineer, mlops-engineer, data-scientist,
  blockchain-developer, quant-analyst, risk-manager, sql-pro,
  database-optimizer to inherit
- Update README with three-tier model documentation

Relates to #136

* feat: comprehensive model tier redistribution for Opus 4.5

This commit implements a strategic rebalancing of agent model assignments,
significantly increasing the use of Opus 4.5 for critical coding tasks while
ensuring Sonnet is used more than Haiku for support tasks.

Final Distribution (153 total agent files):
- Tier 1 Opus: 42 agents (27.5%) - All production coding + critical architecture
- Tier 2 Inherit: 42 agents (27.5%) - Complex tasks, user-choosable
- Tier 3 Sonnet: 38 agents (24.8%) - Support tasks needing intelligence
- Tier 4 Haiku: 31 agents (20.3%) - Simple operational tasks

Key Changes:

Tier 1 (Opus) - Production Coding + Critical Review:
- ALL code-reviewers (6 total): Ensures highest quality code review across
  all contexts (comprehensive, git PR, code docs, codebase cleanup, refactoring, TDD)
- All major language pros (7): python, golang, rust, typescript, cpp, java, c
- Framework specialists (6): django (2), fastapi (2), graphql-architect (2)
- Complex specialists (6): terraform-specialist (3), tdd-orchestrator (2), data-engineer
- Blockchain: blockchain-developer (smart contracts are critical)
- Game dev (2): unity-developer, minecraft-bukkit-pro
- Architecture (existing): architect-review, cloud-architect, kubernetes-architect,
  hybrid-cloud-architect, database-architect, security-auditor

Tier 2 (Inherit) - User Flexibility:
- Secondary languages (6): javascript, scala, csharp, ruby, php, elixir
- All frontend/mobile (8): frontend-developer (4), mobile-developer (2),
  flutter-expert, ios-developer
- Specialized (6): observability-engineer (2), temporal-python-pro,
  arm-cortex-expert, context-manager (2), database-optimizer (2)
- AI/ML, backend-architect, performance-engineer, quant/risk (existing)

Tier 3 (Sonnet) - Intelligent Support:
- Documentation (4): docs-architect (2), tutorial-engineer (2)
- Testing (2): test-automator (2)
- Developer experience (3): dx-optimizer (2), business-analyst
- Modernization (4): legacy-modernizer (3), database-admin
- Other support agents (existing)

Tier 4 (Haiku) - Simple Operations:
- SEO/Marketing (10): All SEO agents, content, search
- Deployment (4): deployment-engineer (4 instances)
- Debugging (5): debugger (2), error-detective (3)
- DevOps (3): devops-troubleshooter (3)
- Other simple operational tasks

Rationale:
- Opus 4.5 achieves 80.9% on SWE-bench with 65% fewer tokens on complex tasks
- Production code deserves the best model: all language pros now on Opus
- All code review uses Opus for maximum quality and security
- Sonnet > Haiku (38 vs 31) ensures better intelligence for support tasks
- Inherit tier gives users cost control for frontend, mobile, and specialized tasks

Related: #136, #132

* feat: upgrade final 13 agents from Haiku to Sonnet

Based on research into Haiku 4.5 vs Sonnet 4.5 capabilities, upgraded
agents requiring deep analytical intelligence from Haiku to Sonnet.

Research Findings:
- Haiku 4.5: 73.3% SWE-bench, 3-5x faster, 1/3 cost, sub-200ms responses
- Best for Haiku: Real-time apps, data extraction, templates, high-volume ops
- Best for Sonnet: Complex reasoning, root cause analysis, strategic planning

Agents Upgraded (13 total):
- Debugging (5): debugger (2), error-detective (3) - Complex root cause analysis
- DevOps (3): devops-troubleshooter (3) - System diagnostics & troubleshooting
- Network (2): network-engineer (2) - Complex network analysis & optimization
- API Documentation (2): api-documenter (2) - Deep API understanding required
- Payments (1): payment-integration - Critical financial integration

Final Distribution (153 total):
- Tier 1 Opus: 42 agents (27.5%) - Production coding + critical architecture
- Tier 2 Inherit: 42 agents (27.5%) - Complex tasks, user-choosable
- Tier 3 Sonnet: 51 agents (33.3%) - Support tasks needing intelligence
- Tier 4 Haiku: 18 agents (11.8%) - Fast operational tasks only

Haiku Now Reserved For:
- SEO/Marketing (8): Pattern matching, data extraction, content templates
- Deployment (4): Operational execution tasks
- Simple Docs (3): reference-builder, mermaid-expert, c4-code
- Sales/Support (2): High-volume, template-based interactions
- Search (1): Knowledge retrieval

Sonnet > Haiku as requested (51 vs 18)

Sources:
- https://www.creolestudios.com/claude-haiku-4-5-vs-sonnet-4-5-comparison/
- https://www.anthropic.com/news/claude-haiku-4-5
- https://caylent.com/blog/claude-haiku-4-5-deep-dive-cost-capabilities-and-the-multi-agent-opportunity

Related: #136

* docs: add cost considerations and clarify inherit behavior

Addresses PR feedback:
- Added comprehensive cost comparison for all model tiers
- Documented how 'inherit' model works (uses session default, falls back to Sonnet)
- Explained cost optimization strategies
- Clarified when Opus token efficiency offsets higher rate

This helps users make informed decisions about model selection and cost control.
2025-12-10 15:52:06 -05:00
2025-10-16 21:06:05 -04:00

Claude Code Plugins: Orchestration and Automation

Updated for Opus 4.5, Sonnet 4.5 & Haiku 4.5 — Three-tier model strategy for optimal performance

🎯 Agent Skills Enabled — 47 specialized skills extend Claude's capabilities across plugins with progressive disclosure

A comprehensive production-ready system combining 91 specialized AI agents, 15 multi-agent workflow orchestrators, 47 agent skills, and 45 development tools organized into 65 focused, single-purpose plugins for Claude Code.

Overview

This unified repository provides everything needed for intelligent automation and multi-agent orchestration across modern software development:

  • 65 Focused Plugins - Granular, single-purpose plugins optimized for minimal token usage and composability
  • 91 Specialized Agents - Domain experts with deep knowledge across architecture, languages, infrastructure, quality, data/AI, documentation, business operations, and SEO
  • 47 Agent Skills - Modular knowledge packages with progressive disclosure for specialized expertise
  • 15 Workflow Orchestrators - Multi-agent coordination systems for complex operations like full-stack development, security hardening, ML pipelines, and incident response
  • 45 Development Tools - Optimized utilities including project scaffolding, security scanning, test automation, and infrastructure setup

Key Features

  • Granular Plugin Architecture: 65 focused plugins optimized for minimal token usage
  • Comprehensive Tooling: 45 development tools including test generation, scaffolding, and security scanning
  • 100% Agent Coverage: All plugins include specialized agents
  • Agent Skills: 47 specialized skills that use progressive disclosure for token efficiency
  • Clear Organization: 23 categories with 1-6 plugins each for easy discovery
  • Efficient Design: Average 3.4 components per plugin (follows Anthropic's 2-8 pattern)

How It Works

Each plugin is completely isolated with its own agents, commands, and skills:

  • Install only what you need - Each plugin loads only its specific agents, commands, and skills
  • Minimal token usage - No unnecessary resources loaded into context
  • Mix and match - Compose multiple plugins for complex workflows
  • Clear boundaries - Each plugin has a single, focused purpose
  • Progressive disclosure - Skills load knowledge only when activated

Example: Installing python-development loads 3 Python agents, 1 scaffolding tool, and makes 5 skills available (~300 tokens), not the entire marketplace.

Quick Start

Step 1: Add the Marketplace

Add this marketplace to Claude Code:

/plugin marketplace add wshobson/agents

This makes all 65 plugins available for installation, but does not load any agents or tools into your context.

Step 2: Install Plugins

Browse available plugins:

/plugin

Install the plugins you need:

# Essential development plugins
/plugin install python-development          # Python with 5 specialized skills
/plugin install javascript-typescript       # JS/TS with 4 specialized skills
/plugin install backend-development         # Backend APIs with 3 architecture skills

# Infrastructure & operations
/plugin install kubernetes-operations       # K8s with 4 deployment skills
/plugin install cloud-infrastructure        # AWS/Azure/GCP with 4 cloud skills

# Security & quality
/plugin install security-scanning           # SAST with security skill
/plugin install code-review-ai             # AI-powered code review

# Full-stack orchestration
/plugin install full-stack-orchestration   # Multi-agent workflows

Each installed plugin loads only its specific agents, commands, and skills into Claude's context.

Documentation

Core Guides

What's New

Agent Skills (47 skills across 14 plugins)

Specialized knowledge packages following Anthropic's progressive disclosure architecture:

Language Development:

  • Python (5 skills): async patterns, testing, packaging, performance, UV package manager
  • JavaScript/TypeScript (4 skills): advanced types, Node.js patterns, testing, modern ES6+

Infrastructure & DevOps:

  • Kubernetes (4 skills): manifests, Helm charts, GitOps, security policies
  • Cloud Infrastructure (4 skills): Terraform, multi-cloud, hybrid networking, cost optimization
  • CI/CD (4 skills): pipeline design, GitHub Actions, GitLab CI, secrets management

Development & Architecture:

  • Backend (3 skills): API design, architecture patterns, microservices
  • LLM Applications (4 skills): LangChain, prompt engineering, RAG, evaluation

Blockchain & Web3 (4 skills): DeFi protocols, NFT standards, Solidity security, Web3 testing

And more: Framework migration, observability, payment processing, ML operations, security scanning

→ View complete skills documentation

Three-Tier Model Strategy

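Strategic model assignment for optimal performance and cost. Agents are pinned to one of three models (Opus, Sonnet, Haiku) or marked inherit to follow the session's model: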

Tier   | Model    | Agents | Use Case
Tier 1 | Opus 4.5 | 42     | Critical architecture, security, ALL code review, production coding (language pros, frameworks)
Tier 2 | Inherit  | 42     | Complex tasks - user chooses model (AI/ML, backend, frontend/mobile, specialized)
Tier 3 | Sonnet   | 51     | Support with intelligence (docs, testing, debugging, network, API docs, DX, legacy, payments)
Tier 4 | Haiku    | 18     | Fast operational tasks (SEO, deployment, simple docs, sales, content, search)
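
The agent counts above sum to 153, which matches the "153 total agent files" figure in the commit notes rather than the 91 distinct agents listed in the overview; the commit notes count some agents several times (for example "frontend-developer (4)"). A minimal sketch for checking the shares per tier; the names `TIER_COUNTS` and `tier_shares` are illustrative and not part of the repository:

```python
# Agent-file counts per tier, copied from the table above.
TIER_COUNTS = {"opus": 42, "inherit": 42, "sonnet": 51, "haiku": 18}


def tier_shares(counts):
    """Return each tier's share of the total, as a percentage rounded to one decimal."""
    total = sum(counts.values())
    return {tier: round(100 * n / total, 1) for tier, n in counts.items()}


if __name__ == "__main__":
    print("total agent files:", sum(TIER_COUNTS.values()))
    for tier, share in tier_shares(TIER_COUNTS).items():
        print(f"{tier}: {share}%")
```

The resulting shares (27.5%, 27.5%, 33.3%, 11.8%) agree with the final distribution quoted in the commit notes.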

Why Opus 4.5 for Critical Agents?

  • 80.9% on SWE-bench (industry-leading)
  • 65% fewer tokens for complex tasks
  • Best for architecture decisions and security audits

Tier 2 Flexibility (inherit): Agents marked inherit use your session's default model, letting you balance cost and capability:

  • Set via claude --model opus or claude --model sonnet when starting a session
  • Falls back to Sonnet 4.5 if no default is specified
  • Perfect for frontend/mobile developers who want cost control
  • AI/ML engineers can choose Opus for complex model work
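
The inherit rule described above can be sketched as a small lookup. This models only the behavior stated in this README (pinned tiers keep their model, inherit follows the session default, and Sonnet is the fallback); `resolve_model` and `FALLBACK_MODEL` are hypothetical names, not Claude Code internals:

```python
from typing import Optional

# README: inherit falls back to Sonnet 4.5 when no session default is set.
FALLBACK_MODEL = "sonnet"


def resolve_model(agent_model: str, session_default: Optional[str] = None) -> str:
    """Return the model an agent would run on under the rule described above."""
    if agent_model != "inherit":
        # Pinned tiers (opus, sonnet, haiku) ignore the session default.
        return agent_model
    return session_default or FALLBACK_MODEL


if __name__ == "__main__":
    print(resolve_model("opus", "haiku"))    # pinned agent keeps its own model
    print(resolve_model("inherit", "opus"))  # inherit follows the session
    print(resolve_model("inherit"))          # no session default: fallback
```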

Cost Considerations:

  • Opus 4.5: $5/$25 per million input/output tokens - Premium for critical work
  • Sonnet 4.5: $3/$15 per million input/output tokens - Balanced performance/cost
  • Haiku 4.5: $1/$5 per million input/output tokens - Fast, cost-effective operations
  • Opus's 65% token reduction on complex tasks can offset its higher per-token rate
  • Use the inherit tier to control costs for high-volume use cases
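
To see when the token reduction offsets the higher rate, the prices above can be plugged into a short calculation. This is a sketch under a loud assumption: it treats the 65% figure as a reduction relative to Sonnet on the same task, applied equally to input and output tokens, which this README does not state. The task size is hypothetical.

```python
# USD per million tokens as (input, output), from the list above.
PRICES = {"opus": (5.0, 25.0), "sonnet": (3.0, 15.0), "haiku": (1.0, 5.0)}


def cost_usd(model, input_tokens, output_tokens):
    """Cost of one task in USD at the listed per-million-token rates."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


if __name__ == "__main__":
    # Hypothetical task: Sonnet consumes 1M input and 200k output tokens.
    sonnet_cost = cost_usd("sonnet", 1_000_000, 200_000)
    # ASSUMPTION: Opus needs 65% fewer tokens than Sonnet for the same task.
    opus_cost = cost_usd("opus", 350_000, 70_000)
    print(f"sonnet: ${sonnet_cost:.2f}  opus: ${opus_cost:.2f}")
```

Both Opus rates are 5/3 of the corresponding Sonnet rates, so under this simple model Opus is cheaper only when it uses fewer than 60% of the tokens Sonnet would. If the reduction on a given task is smaller than that, the higher rate dominates.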

Orchestration patterns combine models for efficiency:

Opus (architecture) → Sonnet (development) → Haiku (deployment)

→ View model configuration details

Full-Stack Feature Development

/full-stack-orchestration:full-stack-feature "user authentication with OAuth2"

Coordinates 7+ agents: backend-architect → database-architect → frontend-developer → test-automator → security-auditor → deployment-engineer → observability-engineer
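
As an illustration of how the tiers mix inside this workflow, the sketch below pairs each agent in the chain with the tier given for it in the commit notes at the top of this page, then applies a session default to the inherit agents. It does not reflect how the orchestrator is implemented; `WORKFLOW` and `plan` are hypothetical names.

```python
# (agent, tier) in execution order; tiers taken from the commit notes on this page.
WORKFLOW = [
    ("backend-architect", "inherit"),
    ("database-architect", "opus"),
    ("frontend-developer", "inherit"),
    ("test-automator", "sonnet"),
    ("security-auditor", "opus"),
    ("deployment-engineer", "haiku"),
    ("observability-engineer", "inherit"),
]


def plan(session_default="sonnet"):
    """Return (agent, effective model) pairs, resolving inherit to the session default."""
    return [
        (agent, session_default if tier == "inherit" else tier)
        for agent, tier in WORKFLOW
    ]


if __name__ == "__main__":
    for agent, model in plan(session_default="opus"):
        print(f"{agent:24s} {model}")
```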

→ View all workflow examples

Security Hardening

/security-scanning:security-hardening --level comprehensive

Multi-agent security assessment with SAST, dependency scanning, and code review.

Python Development with Modern Tools

/python-development:python-scaffold fastapi-microservice

Creates a production-ready FastAPI project with async patterns and activates these skills:

  • async-python-patterns - AsyncIO and concurrency
  • python-testing-patterns - pytest and fixtures
  • uv-package-manager - Fast dependency management

Kubernetes Deployment

# Activates k8s skills automatically
"Create production Kubernetes deployment with Helm chart and GitOps"

Uses the kubernetes-architect agent with 4 specialized skills for production-grade configs.

→ View complete usage guide

Plugin Categories

23 categories, 65 plugins:

  • 🎨 Development (4) - debugging, backend, frontend, multi-platform
  • 📚 Documentation (3) - code docs, API specs, diagrams, C4 architecture
  • 🔄 Workflows (3) - git, full-stack, TDD
  • Testing (2) - unit testing, TDD workflows
  • 🔍 Quality (3) - code review, comprehensive review, performance
  • 🤖 AI & ML (4) - LLM apps, agent orchestration, context, MLOps
  • 📊 Data (2) - data engineering, data validation
  • 🗄️ Database (2) - database design, migrations
  • 🚨 Operations (4) - incident response, diagnostics, distributed debugging, observability
  • Performance (2) - application performance, database/cloud optimization
  • ☁️ Infrastructure (5) - deployment, validation, Kubernetes, cloud, CI/CD
  • 🔒 Security (4) - scanning, compliance, backend/API, frontend/mobile
  • 💻 Languages (7) - Python, JS/TS, systems, JVM, scripting, functional, embedded
  • 🔗 Blockchain (1) - smart contracts, DeFi, Web3
  • 💰 Finance (1) - quantitative trading, risk management
  • 💳 Payments (1) - Stripe, PayPal, billing
  • 🎮 Gaming (1) - Unity, Minecraft plugins
  • 📢 Marketing (4) - SEO content, technical SEO, SEO analysis, content marketing
  • 💼 Business (3) - analytics, HR/legal, customer/sales
  • And more...

→ View complete plugin catalog

Architecture Highlights

Granular Design

  • Single responsibility - Each plugin does one thing well
  • Minimal token usage - Average 3.4 components per plugin
  • Composable - Mix and match for complex workflows
  • 100% coverage - All 91 agents accessible across plugins

Progressive Disclosure (Skills)

Three-tier architecture for token efficiency:

  1. Metadata - Name and activation criteria (always loaded)
  2. Instructions - Core guidance (loaded when activated)
  3. Resources - Examples and templates (loaded on demand)
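
The three loading stages can be sketched as a lazily loaded object. This is a conceptual model of the description above, not the actual skill loader; the `Skill` class, its fields, and the loader callables are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Skill:
    name: str
    activation_criteria: str                # stage 1: always in context
    load_instructions: Callable[[], str]    # stage 2: called on first activation
    resource_loaders: Dict[str, Callable[[], str]] = field(default_factory=dict)  # stage 3
    _instructions: Optional[str] = field(default=None, init=False, repr=False)
    _resources: Dict[str, str] = field(default_factory=dict, init=False, repr=False)

    def metadata(self) -> str:
        """Cheap summary that is always available; triggers no loading."""
        return f"{self.name}: {self.activation_criteria}"

    def activate(self) -> str:
        """Load the core instructions once, on first activation."""
        if self._instructions is None:
            self._instructions = self.load_instructions()
        return self._instructions

    def resource(self, key: str) -> str:
        """Load a single example or template only when it is requested."""
        if key not in self._resources:
            self._resources[key] = self.resource_loaders[key]()
        return self._resources[key]
```

Only `metadata()` is paid for up front; the instructions and each resource are fetched at most once, and only if something asks for them.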

Repository Structure

claude-agents/
├── .claude-plugin/
│   └── marketplace.json          # 65 plugins
├── plugins/
│   ├── python-development/
│   │   ├── agents/               # 3 Python experts
│   │   ├── commands/             # Scaffolding tool
│   │   └── skills/               # 5 specialized skills
│   ├── kubernetes-operations/
│   │   ├── agents/               # K8s architect
│   │   ├── commands/             # Deployment tools
│   │   └── skills/               # 4 K8s skills
│   └── ... (63 more plugins)
├── docs/                          # Comprehensive documentation
└── README.md                      # This file
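
Given the layout above, a short script can summarize what each plugin contains. A minimal sketch, assuming every visible entry directly under agents/, commands/, or skills/ counts as one component; the repository may organize skills differently, and `count_components` is a hypothetical helper.

```python
from pathlib import Path

COMPONENT_DIRS = ("agents", "commands", "skills")


def count_components(plugins_root):
    """Map each plugin name to the number of entries in its component directories."""
    summary = {}
    for plugin in sorted(Path(plugins_root).iterdir()):
        if not plugin.is_dir():
            continue
        counts = {}
        for sub in COMPONENT_DIRS:
            directory = plugin / sub
            if directory.is_dir():
                # ASSUMPTION: one visible file or folder equals one component.
                counts[sub] = sum(
                    1 for entry in directory.iterdir() if not entry.name.startswith(".")
                )
            else:
                counts[sub] = 0
        summary[plugin.name] = counts
    return summary


if __name__ == "__main__":
    for name, counts in count_components("plugins").items():
        print(name, counts)
```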

→ View architecture details

Contributing

To add new agents, skills, or commands:

  1. Identify or create the appropriate plugin directory in plugins/
  2. Create .md files in the appropriate subdirectory:
    • agents/ - For specialized agents
    • commands/ - For tools and workflows
    • skills/ - For modular knowledge packages
  3. Follow naming conventions (lowercase, hyphen-separated)
  4. Write clear activation criteria and comprehensive content
  5. Update the plugin definition in .claude-plugin/marketplace.json

See Architecture Documentation for detailed guidelines.
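
The naming convention in step 3 can be checked mechanically before opening a pull request. A minimal sketch; the pattern also allows digits (names such as c4-code appear in this repository), which is an assumption about the convention rather than a documented rule, and `is_valid_component_name` is a hypothetical helper.

```python
import re

# Lowercase words separated by single hyphens; digits allowed (assumption).
NAME_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_component_name(filename: str) -> bool:
    """Check a component file name such as 'python-pro.md' against the convention."""
    stem = filename[:-3] if filename.endswith(".md") else filename
    return NAME_PATTERN.fullmatch(stem) is not None


if __name__ == "__main__":
    for candidate in ("python-pro.md", "Python_Pro.md", "c4-code.md", "-bad.md"):
        print(candidate, is_valid_component_name(candidate))
```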

Resources

Documentation

This Repository

License

MIT License - see LICENSE file for details.

