mirror of https://github.com/wshobson/agents.git synced 2026-03-18 17:47:16 +00:00

Seth Hobson c7ad381360 feat: implement three-tier model strategy with Opus 4.5 (#139)

* feat: implement three-tier model strategy with Opus 4.5

This implements a strategic model selection approach based on agent
complexity and use case, addressing Issue #136.

Three-Tier Strategy:
- Tier 1 (opus): 17 critical agents for architecture, security, code review
- Tier 2 (inherit): 21 complex agents where users choose their model
- Tier 3 (sonnet): 63 routine development agents (unchanged)
- Tier 4 (haiku): 47 fast operational agents (unchanged)

Why Opus 4.5 for Tier 1:
- 80.9% on SWE-bench (industry-leading for code)
- 65% fewer tokens for long-horizon tasks
- Superior reasoning for architectural decisions

Changes:
- Update architect-review, cloud-architect, kubernetes-architect,
  database-architect, security-auditor, code-reviewer to opus
- Update backend-architect, performance-engineer, ai-engineer,
  prompt-engineer, ml-engineer, mlops-engineer, data-scientist,
  blockchain-developer, quant-analyst, risk-manager, sql-pro,
  database-optimizer to inherit
- Update README with three-tier model documentation

Relates to #136

* feat: comprehensive model tier redistribution for Opus 4.5

This commit implements a strategic rebalancing of agent model assignments,
significantly increasing the use of Opus 4.5 for critical coding tasks while
ensuring Sonnet is used more than Haiku for support tasks.

Final Distribution (153 total agent files):
- Tier 1 Opus: 42 agents (27.5%) - All production coding + critical architecture
- Tier 2 Inherit: 42 agents (27.5%) - Complex tasks, user-choosable
- Tier 3 Sonnet: 38 agents (24.8%) - Support tasks needing intelligence
- Tier 4 Haiku: 31 agents (20.3%) - Simple operational tasks

Key Changes:

Tier 1 (Opus) - Production Coding + Critical Review:
- ALL code-reviewers (6 total): Ensures highest quality code review across
  all contexts (comprehensive, git PR, code docs, codebase cleanup, refactoring, TDD)
- All major language pros (7): python, golang, rust, typescript, cpp, java, c
- Framework specialists (6): django (2), fastapi (2), graphql-architect (2)
- Complex specialists (6): terraform-specialist (3), tdd-orchestrator (2), data-engineer
- Blockchain: blockchain-developer (smart contracts are critical)
- Game dev (2): unity-developer, minecraft-bukkit-pro
- Architecture (existing): architect-review, cloud-architect, kubernetes-architect,
  hybrid-cloud-architect, database-architect, security-auditor

Tier 2 (Inherit) - User Flexibility:
- Secondary languages (6): javascript, scala, csharp, ruby, php, elixir
- All frontend/mobile (8): frontend-developer (4), mobile-developer (2),
  flutter-expert, ios-developer
- Specialized (6): observability-engineer (2), temporal-python-pro,
  arm-cortex-expert, context-manager (2), database-optimizer (2)
- AI/ML, backend-architect, performance-engineer, quant/risk (existing)

Tier 3 (Sonnet) - Intelligent Support:
- Documentation (4): docs-architect (2), tutorial-engineer (2)
- Testing (2): test-automator (2)
- Developer experience (3): dx-optimizer (2), business-analyst
- Modernization (4): legacy-modernizer (3), database-admin
- Other support agents (existing)

Tier 4 (Haiku) - Simple Operations:
- SEO/Marketing (10): All SEO agents, content, search
- Deployment (4): deployment-engineer (4 instances)
- Debugging (5): debugger (2), error-detective (3)
- DevOps (3): devops-troubleshooter (3)
- Other simple operational tasks

Rationale:
- Opus 4.5 achieves 80.9% on SWE-bench with 65% fewer tokens on complex tasks
- Production code deserves the best model: all language pros now on Opus
- All code review uses Opus for maximum quality and security
- Sonnet > Haiku (38 vs 31) ensures better intelligence for support tasks
- Inherit tier gives users cost control for frontend, mobile, and specialized tasks

Related: #136, #132

* feat: upgrade final 13 agents from Haiku to Sonnet

Based on research into Haiku 4.5 vs Sonnet 4.5 capabilities, upgraded
agents requiring deep analytical intelligence from Haiku to Sonnet.

Research Findings:
- Haiku 4.5: 73.3% SWE-bench, 3-5x faster, 1/3 cost, sub-200ms responses
- Best for Haiku: Real-time apps, data extraction, templates, high-volume ops
- Best for Sonnet: Complex reasoning, root cause analysis, strategic planning

Agents Upgraded (13 total):
- Debugging (5): debugger (2), error-detective (3) - Complex root cause analysis
- DevOps (3): devops-troubleshooter (3) - System diagnostics & troubleshooting
- Network (2): network-engineer (2) - Complex network analysis & optimization
- API Documentation (2): api-documenter (2) - Deep API understanding required
- Payments (1): payment-integration - Critical financial integration

Final Distribution (153 total):
- Tier 1 Opus: 42 agents (27.5%) - Production coding + critical architecture
- Tier 2 Inherit: 42 agents (27.5%) - Complex tasks, user-choosable
- Tier 3 Sonnet: 51 agents (33.3%) - Support tasks needing intelligence
- Tier 4 Haiku: 18 agents (11.8%) - Fast operational tasks only

Haiku Now Reserved For:
- SEO/Marketing (8): Pattern matching, data extraction, content templates
- Deployment (4): Operational execution tasks
- Simple Docs (3): reference-builder, mermaid-expert, c4-code
- Sales/Support (2): High-volume, template-based interactions
- Search (1): Knowledge retrieval

Sonnet > Haiku as requested (51 vs 18)

Sources:
- https://www.creolestudios.com/claude-haiku-4-5-vs-sonnet-4-5-comparison/
- https://www.anthropic.com/news/claude-haiku-4-5
- https://caylent.com/blog/claude-haiku-4-5-deep-dive-cost-capabilities-and-the-multi-agent-opportunity

Related: #136

* docs: add cost considerations and clarify inherit behavior

Addresses PR feedback:
- Added comprehensive cost comparison for all model tiers
- Documented how 'inherit' model works (uses session default, falls back to Sonnet)
- Explained cost optimization strategies
- Clarified when Opus token efficiency offsets higher rate

This helps users make informed decisions about model selection and cost control.

2025-12-10 15:52:06 -05:00

.claude-plugin

add c4 documentation workflow and agents (#129)

2025-12-10 14:53:11 -05:00

.github

Add Claude Code GitHub Workflow (#140)

2025-12-10 15:12:53 -05:00

docs

add c4 documentation workflow and agents (#129)

2025-12-10 14:53:11 -05:00

plugins

feat: implement three-tier model strategy with Opus 4.5 (#139)

2025-12-10 15:52:06 -05:00

.gitignore

chore: add .venv to gitignore

2025-10-16 21:06:05 -04:00

LICENSE

chore: add copyright holder name to LICENSE

2025-11-01 09:35:17 -04:00

README.md

feat: implement three-tier model strategy with Opus 4.5 (#139)

2025-12-10 15:52:06 -05:00

README.md

Claude Code Plugins: Orchestration and Automation

⚡ Updated for Opus 4.5, Sonnet 4.5 & Haiku 4.5 — Three-tier model strategy for optimal performance

🎯 Agent Skills Enabled — 47 specialized skills extend Claude's capabilities across plugins with progressive disclosure

A comprehensive production-ready system combining 91 specialized AI agents, 15 multi-agent workflow orchestrators, 47 agent skills, and 45 development tools organized into 65 focused, single-purpose plugins for Claude Code.

Overview

This unified repository provides everything needed for intelligent automation and multi-agent orchestration across modern software development:

65 Focused Plugins - Granular, single-purpose plugins optimized for minimal token usage and composability
91 Specialized Agents - Domain experts with deep knowledge across architecture, languages, infrastructure, quality, data/AI, documentation, business operations, and SEO
47 Agent Skills - Modular knowledge packages with progressive disclosure for specialized expertise
15 Workflow Orchestrators - Multi-agent coordination systems for complex operations like full-stack development, security hardening, ML pipelines, and incident response
45 Development Tools - Optimized utilities including project scaffolding, security scanning, test automation, and infrastructure setup

Key Features

Granular Plugin Architecture: 65 focused plugins optimized for minimal token usage
Comprehensive Tooling: 45 development tools including test generation, scaffolding, and security scanning
100% Agent Coverage: All plugins include specialized agents
Agent Skills: 47 specialized skills designed for progressive disclosure and token efficiency
Clear Organization: 23 categories with 1-6 plugins each for easy discovery
Efficient Design: Average 3.4 components per plugin (follows Anthropic's 2-8 pattern)

How It Works

Each plugin is completely isolated with its own agents, commands, and skills:

Install only what you need - Each plugin loads only its specific agents, commands, and skills
Minimal token usage - No unnecessary resources loaded into context
Mix and match - Compose multiple plugins for complex workflows
Clear boundaries - Each plugin has a single, focused purpose
Progressive disclosure - Skills load knowledge only when activated

Example: Installing python-development loads 3 Python agents, 1 scaffolding tool, and makes 5 skills available (~300 tokens), not the entire marketplace.

Quick Start

Step 1: Add the Marketplace

Add this marketplace to Claude Code:

/plugin marketplace add wshobson/agents

This makes all 65 plugins available for installation, but does not load any agents or tools into your context.

Step 2: Install Plugins

Browse available plugins:

/plugin

Install the plugins you need:

# Essential development plugins
/plugin install python-development          # Python with 5 specialized skills
/plugin install javascript-typescript       # JS/TS with 4 specialized skills
/plugin install backend-development         # Backend APIs with 3 architecture skills

# Infrastructure & operations
/plugin install kubernetes-operations       # K8s with 4 deployment skills
/plugin install cloud-infrastructure        # AWS/Azure/GCP with 4 cloud skills

# Security & quality
/plugin install security-scanning           # SAST with security skill
/plugin install code-review-ai             # AI-powered code review

# Full-stack orchestration
/plugin install full-stack-orchestration   # Multi-agent workflows

Each installed plugin loads only its specific agents, commands, and skills into Claude's context.

Documentation

Core Guides

Plugin Reference - Complete catalog of all 65 plugins
Agent Reference - All 91 agents organized by category
Agent Skills - 47 specialized skills with progressive disclosure
Usage Guide - Commands, workflows, and best practices
Architecture - Design principles and patterns

Quick Links

Installation - Get started in 2 steps
Essential Plugins - Top plugins for immediate productivity
Command Reference - All slash commands organized by category
Multi-Agent Workflows - Pre-configured orchestration examples
Model Configuration - Haiku/Sonnet hybrid orchestration

What's New

Agent Skills (47 skills across 14 plugins)

Specialized knowledge packages following Anthropic's progressive disclosure architecture:

Language Development:

Python (5 skills): async patterns, testing, packaging, performance, UV package manager
JavaScript/TypeScript (4 skills): advanced types, Node.js patterns, testing, modern ES6+

Infrastructure & DevOps:

Kubernetes (4 skills): manifests, Helm charts, GitOps, security policies
Cloud Infrastructure (4 skills): Terraform, multi-cloud, hybrid networking, cost optimization
CI/CD (4 skills): pipeline design, GitHub Actions, GitLab CI, secrets management

Development & Architecture:

Backend (3 skills): API design, architecture patterns, microservices
LLM Applications (4 skills): LangChain, prompt engineering, RAG, evaluation

Blockchain & Web3 (4 skills): DeFi protocols, NFT standards, Solidity security, Web3 testing

And more: Framework migration, observability, payment processing, ML operations, security scanning

→ View complete skills documentation

Three-Tier Model Strategy

Strategic model assignment for optimal performance and cost, using three models (Opus, Sonnet, Haiku) plus an inherit tier that defers to your session's model:

Tier	Model	Agents	Use Case
Tier 1	Opus 4.5	42	Critical architecture, security, ALL code review, production coding (language pros, frameworks)
Tier 2	Inherit	42	Complex tasks - user chooses model (AI/ML, backend, frontend/mobile, specialized)
Tier 3	Sonnet	51	Support with intelligence (docs, testing, debugging, network, API docs, DX, legacy, payments)
Tier 4	Haiku	18	Fast operational tasks (SEO, deployment, simple docs, sales, content, search)
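
The counts in the table can be cross-checked with a few lines of Python. This is only an illustrative sketch: the variable names are made up for the example, and the numbers are copied from the table above.

```python
# Agent counts per tier, copied from the table above.
TIER_COUNTS = {
    "Opus 4.5": 42,
    "Inherit": 42,
    "Sonnet": 51,
    "Haiku": 18,
}

total = sum(TIER_COUNTS.values())

# Share of each tier, formatted to one decimal place.
shares = {
    model: f"{count / total * 100:.1f}%"
    for model, count in TIER_COUNTS.items()
}

print(f"total: {total}")
for model, share in shares.items():
    print(f"{model}: {share}")
```

The total comes to 153, which matches the number of agent files reported in the commit message for this change.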

Why Opus 4.5 for Critical Agents?

80.9% on SWE-bench (industry-leading)
65% fewer tokens for complex tasks
Best for architecture decisions and security audits

Tier 2 Flexibility (inherit): Agents marked inherit use your session's default model, letting you balance cost and capability:

Set via claude --model opus or claude --model sonnet when starting a session
Falls back to Sonnet 4.5 if no default specified
Perfect for frontend/mobile developers who want cost control
AI/ML engineers can choose Opus for complex model work
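
The inherit behavior described above can be sketched as a small lookup. This is a minimal illustration, not Claude Code's actual implementation: `resolve_model`, `FALLBACK_MODEL`, and the argument names are hypothetical, and the only rules encoded are the ones stated in this section.

```python
from typing import Optional

# Hypothetical constant: the README says inherit falls back to Sonnet 4.5
# when no session default is specified.
FALLBACK_MODEL = "sonnet"


def resolve_model(agent_model: str, session_default: Optional[str] = None) -> str:
    """Return the model an agent would run on, per the rules in this section."""
    if agent_model != "inherit":
        # Tier 1, 3, and 4 agents pin a specific model.
        return agent_model
    # Tier 2 agents defer to the session default, then to the fallback.
    return session_default or FALLBACK_MODEL


print(resolve_model("opus", "haiku"))    # pinned agent ignores the session default
print(resolve_model("inherit", "opus"))  # session started with `claude --model opus`
print(resolve_model("inherit"))          # no session default given
```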

Cost Considerations:

Opus 4.5: $5/$25 per million input/output tokens - Premium for critical work
Sonnet 4.5: $3/$15 per million tokens - Balanced performance/cost
Haiku 4.5: $1/$5 per million tokens - Fast, cost-effective operations
Opus's 65% token reduction on complex tasks often offsets its higher rate
Use inherit tier to control costs for high-volume use cases
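
To make the trade-off concrete, here is a rough cost comparison using the prices listed above. The task size (200k input, 50k output tokens) is a made-up example, and applying the 65% reduction equally to input and output tokens is an assumption; real savings will vary by task.

```python
# USD per million tokens (input, output), taken from the list above.
PRICES = {
    "opus": (5.0, 25.0),
    "sonnet": (3.0, 15.0),
    "haiku": (1.0, 5.0),
}


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one task in USD for the given token counts."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Hypothetical task size on Sonnet.
sonnet_in, sonnet_out = 200_000, 50_000
sonnet_cost = cost_usd("sonnet", sonnet_in, sonnet_out)

# Assumption: Opus needs 65% fewer tokens of both kinds for the same task.
remaining = 1 - 0.65
opus_cost = cost_usd("opus", round(sonnet_in * remaining), round(sonnet_out * remaining))

print(f"sonnet: ${sonnet_cost:.4f}")
print(f"opus:   ${opus_cost:.4f}")
```

Both Opus rates are 5/3 of the Sonnet rates, so under this assumption Opus breaks even when it uses 60% of the tokens Sonnet would (a 40% reduction) and is cheaper beyond that.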

Orchestration patterns combine models for efficiency:

Opus (architecture) → Sonnet (development) → Haiku (deployment)

→ View model configuration details

Popular Use Cases

Full-Stack Feature Development

/full-stack-orchestration:full-stack-feature "user authentication with OAuth2"

Coordinates 7+ agents: backend-architect → database-architect → frontend-developer → test-automator → security-auditor → deployment-engineer → observability-engineer

→ View all workflow examples

Security Hardening

/security-scanning:security-hardening --level comprehensive

Multi-agent security assessment with SAST, dependency scanning, and code review.

Python Development with Modern Tools

/python-development:python-scaffold fastapi-microservice

Creates a production-ready FastAPI project with async patterns, activating skills:

async-python-patterns - AsyncIO and concurrency
python-testing-patterns - pytest and fixtures
uv-package-manager - Fast dependency management

Kubernetes Deployment

# Activates k8s skills automatically
"Create production Kubernetes deployment with Helm chart and GitOps"

Uses the kubernetes-architect agent with 4 specialized skills for production-grade configs.

→ View complete usage guide

Plugin Categories

23 categories, 65 plugins:

🎨 Development (4) - debugging, backend, frontend, multi-platform
📚 Documentation (3) - code docs, API specs, diagrams, C4 architecture
🔄 Workflows (3) - git, full-stack, TDD
✅ Testing (2) - unit testing, TDD workflows
🔍 Quality (3) - code review, comprehensive review, performance
🤖 AI & ML (4) - LLM apps, agent orchestration, context, MLOps
📊 Data (2) - data engineering, data validation
🗄️ Database (2) - database design, migrations
🚨 Operations (4) - incident response, diagnostics, distributed debugging, observability
⚡ Performance (2) - application performance, database/cloud optimization
☁️ Infrastructure (5) - deployment, validation, Kubernetes, cloud, CI/CD
🔒 Security (4) - scanning, compliance, backend/API, frontend/mobile
💻 Languages (7) - Python, JS/TS, systems, JVM, scripting, functional, embedded
🔗 Blockchain (1) - smart contracts, DeFi, Web3
💰 Finance (1) - quantitative trading, risk management
💳 Payments (1) - Stripe, PayPal, billing
🎮 Gaming (1) - Unity, Minecraft plugins
📢 Marketing (4) - SEO content, technical SEO, SEO analysis, content marketing
💼 Business (3) - analytics, HR/legal, customer/sales
And more...

→ View complete plugin catalog

Architecture Highlights

Granular Design

Single responsibility - Each plugin does one thing well
Minimal token usage - Average 3.4 components per plugin
Composable - Mix and match for complex workflows
100% coverage - All 91 agents accessible across plugins

Progressive Disclosure (Skills)

Three-tier architecture for token efficiency:

Metadata - Name and activation criteria (always loaded)
Instructions - Core guidance (loaded when activated)
Resources - Examples and templates (loaded on demand)
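
The three levels above amount to lazy loading. The sketch below shows one way to model that idea in Python; it is not how Claude Code loads skills internally, and the `Skill` class, its fields, and the loader functions are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Skill:
    # Level 1: metadata, always held in memory.
    name: str
    activation_criteria: str
    # Levels 2 and 3 are stored as loaders and only called when needed.
    load_instructions: Callable[[], str]
    resource_loaders: Dict[str, Callable[[], str]] = field(default_factory=dict)
    _instructions: Optional[str] = field(default=None, init=False, repr=False)
    _resources: Dict[str, str] = field(default_factory=dict, init=False, repr=False)

    def activate(self) -> str:
        """Level 2: load core guidance the first time the skill is activated."""
        if self._instructions is None:
            self._instructions = self.load_instructions()
        return self._instructions

    def resource(self, key: str) -> str:
        """Level 3: load a single example or template on demand."""
        if key not in self._resources:
            self._resources[key] = self.resource_loaders[key]()
        return self._resources[key]


loads: List[str] = []  # records which loaders actually ran


def read_instructions() -> str:
    loads.append("instructions")
    return "core guidance text"


def read_example() -> str:
    loads.append("example")
    return "template text"


skill = Skill(
    name="async-python-patterns",
    activation_criteria="AsyncIO and concurrency",
    load_instructions=read_instructions,
    resource_loaders={"example": read_example},
)

print(loads)      # nothing beyond metadata has been loaded yet
skill.activate()
skill.activate()  # second call reuses the cached instructions
print(loads)
```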

Repository Structure

claude-agents/
├── .claude-plugin/
│   └── marketplace.json          # 65 plugins
├── plugins/
│   ├── python-development/
│   │   ├── agents/               # 3 Python experts
│   │   ├── commands/             # Scaffolding tool
│   │   └── skills/               # 5 specialized skills
│   ├── kubernetes-operations/
│   │   ├── agents/               # K8s architect
│   │   ├── commands/             # Deployment tools
│   │   └── skills/               # 4 K8s skills
│   └── ... (63 more plugins)
├── docs/                          # Comprehensive documentation
└── README.md                      # This file

→ View architecture details
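
Given the layout above, a short script can summarize what each plugin contains. This is a hedged sketch: `count_components` is a hypothetical helper, and counting every visible entry in `agents/`, `commands/`, and `skills/` as one component is an assumption about how the directories are organized.

```python
from pathlib import Path
from typing import Dict

COMPONENT_DIRS = ("agents", "commands", "skills")


def count_components(repo_root: Path) -> Dict[str, Dict[str, int]]:
    """Map each plugin name to the number of entries in its component directories."""
    summary: Dict[str, Dict[str, int]] = {}
    plugins_dir = repo_root / "plugins"
    if not plugins_dir.is_dir():
        return summary
    for plugin in sorted(p for p in plugins_dir.iterdir() if p.is_dir()):
        counts: Dict[str, int] = {}
        for kind in COMPONENT_DIRS:
            folder = plugin / kind
            if folder.is_dir():
                # Assumption: each visible file or directory is one component.
                counts[kind] = sum(
                    1 for entry in folder.iterdir() if not entry.name.startswith(".")
                )
            else:
                counts[kind] = 0
        summary[plugin.name] = counts
    return summary


if __name__ == "__main__":
    # Run from the repository root.
    for name, counts in count_components(Path(".")).items():
        print(name, counts)
```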

Contributing

To add new agents, skills, or commands:

1. Identify or create the appropriate plugin directory in plugins/
2. Create .md files in the appropriate subdirectory:
   - agents/ - For specialized agents
   - commands/ - For tools and workflows
   - skills/ - For modular knowledge packages
3. Follow naming conventions (lowercase, hyphen-separated)
4. Write clear activation criteria and comprehensive content
5. Update the plugin definition in .claude-plugin/marketplace.json
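
The naming convention (lowercase, hyphen-separated) is easy to check before opening a pull request. The helper below is a hypothetical sketch, not part of this repository; allowing digits is an assumption based on existing names such as c4-code.

```python
import re

# Lowercase letters and digits, in groups separated by single hyphens.
NAME_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_component_name(name: str) -> bool:
    """Return True if the name is lowercase and hyphen-separated."""
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ("python-pro", "c4-code", "Python_Pro", "trailing-"):
    print(candidate, is_valid_component_name(candidate))
```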
