agents

mirror of https://github.com/wshobson/agents.git synced 2026-03-18 09:37:15 +00:00

Author	SHA1	Message	Date
Seth Hobson	4d504ed8fa	fix: eliminate cross-plugin dependencies and modernize plugin.json across marketplace Rewrites 14 commands across 11 plugins to remove all cross-plugin subagent_type references (e.g., "unit-testing::test-automator"), which break when plugins are installed standalone. Each command now uses only local bundled agents or general-purpose with role context in the prompt. All rewritten commands follow conductor-style patterns: - CRITICAL BEHAVIORAL RULES with strong directives - State files for session tracking and resume support - Phase checkpoints requiring explicit user approval - File-based context passing between steps Also fixes 4 plugin.json files missing version/license fields and adds plugin.json for dotnet-contribution. Closes #433	2026-02-06 19:34:26 -05:00
Seth Hobson	1135ac6062	docs: update installation commands for llm-application-dev and conductor	2026-01-19 17:08:27 -05:00
Seth Hobson	56848874a2	style: format all files with prettier	2026-01-19 17:07:03 -05:00
Seth Hobson	1b9d881d11	fix(llm-application-dev): use auto-discovery pattern like conductor v2.0.2	2026-01-19 16:55:01 -05:00
Seth Hobson	16f8e8c66e	fix(llm-application-dev): add command frontmatter for slash command registration v2.0.1	2026-01-19 16:26:41 -05:00
Seth Hobson	8be0e8ac7a	feat(llm-application-dev): modernize to LangGraph and latest models v2.0.0 - Migrate from LangChain 0.x to LangChain 1.x/LangGraph patterns - Update model references to Claude 4.5 and GPT-5.2 - Add Voyage AI as primary embedding recommendation - Add structured outputs with Pydantic - Replace deprecated initialize_agent() with StateGraph - Fix security: use AST-based safe math instead of unsafe execution - Add plugin.json and README.md for consistency - Bump marketplace version to 1.3.3	2026-01-19 15:43:25 -05:00
google-labs-jules[bot]	a86384334b	⚡ Bolt: optimize prompt evaluation loop to skip redundant calls (#152) - Avoid re-evaluating the current prompt if metrics are already available from the previous iteration. - Pass metrics from the best variation to the next iteration. - Eliminates N-1 expensive LLM calls in an N-iteration optimization loop. Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>	2025-12-21 19:02:37 -05:00
google-labs-jules[bot]	fda45604b7	⚡ Bolt: Optimize PromptOptimizer thread pool usage (#147) * ⚡ Bolt: Reuse ThreadPoolExecutor in PromptOptimizer 💡 What: Initialized `ThreadPoolExecutor` in `PromptOptimizer.__init__` and reused it in `evaluate_prompt`. Added a `shutdown` method and wrapped execution in `try...finally` for proper resource management. 🎯 Why: The previous implementation created a new `ThreadPoolExecutor` for every call to `evaluate_prompt`. Since `evaluate_prompt` is called repeatedly inside the `optimize` loop (and for every variation), this caused significant overhead from repeatedly creating and destroying thread pools. 📊 Impact: Benchmark showed a reduction in execution time from ~5.36s to ~3.76s (~30% improvement) for 500 iterations with a mocked LLM. 🔬 Measurement: Ran a benchmark script executing `evaluate_prompt` 500 times. Before: 5.36s After: 3.76s --------- Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>	2025-12-20 21:28:39 -05:00
google-labs-jules[bot]	70cf3f3682	⚡ Bolt: Parallelize Prompt Evaluation in optimize-prompt.py (#145) * feat: Parallelize prompt evaluation in optimize-prompt.py - Update `PromptOptimizer.evaluate_prompt` to use `ThreadPoolExecutor` for concurrent test case processing - Significantly reduces total execution time when using high-latency LLM clients (network IO bound) - Maintain accurate metric aggregation (latency, accuracy, token count) from parallel results - This prepares the script for real-world usage where sequential execution is a major bottleneck - Ensure no generated artifacts (`optimization_results.json`) are committed ⚡ Bolt: Reduces total evaluation time from O(n) to O(1) latency-wise (bounded by max_workers) for concurrent requests. --------- Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>	2025-12-19 09:12:15 -05:00
Seth Hobson	01d93fc227	feat: add 5 new specialized agents with 20 skills Add domain expert agents with comprehensive skill sets: - service-mesh-expert (cloud-infrastructure): Istio/Linkerd patterns, mTLS, observability - event-sourcing-architect (backend-development): CQRS, event stores, projections, sagas - vector-database-engineer (llm-application-dev): embeddings, similarity search, hybrid search - monorepo-architect (developer-essentials): Nx, Turborepo, Bazel, pnpm workspaces - threat-modeling-expert (security-scanning): STRIDE, attack trees, security requirements Update all documentation to reflect correct counts: - 67 plugins, 99 agents, 107 skills, 71 commands	2025-12-16 16:00:58 -05:00
Seth Hobson	c7ad381360	feat: implement three-tier model strategy with Opus 4.5 (#139) * feat: implement three-tier model strategy with Opus 4.5 This implements a strategic model selection approach based on agent complexity and use case, addressing Issue #136. Three-Tier Strategy: - Tier 1 (opus): 17 critical agents for architecture, security, code review - Tier 2 (inherit): 21 complex agents where users choose their model - Tier 3 (sonnet): 63 routine development agents (unchanged) - Tier 4 (haiku): 47 fast operational agents (unchanged) Why Opus 4.5 for Tier 1: - 80.9% on SWE-bench (industry-leading for code) - 65% fewer tokens for long-horizon tasks - Superior reasoning for architectural decisions Changes: - Update architect-review, cloud-architect, kubernetes-architect, database-architect, security-auditor, code-reviewer to opus - Update backend-architect, performance-engineer, ai-engineer, prompt-engineer, ml-engineer, mlops-engineer, data-scientist, blockchain-developer, quant-analyst, risk-manager, sql-pro, database-optimizer to inherit - Update README with three-tier model documentation Relates to #136 * feat: comprehensive model tier redistribution for Opus 4.5 This commit implements a strategic rebalancing of agent model assignments, significantly increasing the use of Opus 4.5 for critical coding tasks while ensuring Sonnet is used more than Haiku for support tasks. 
Final Distribution (153 total agent files): - Tier 1 Opus: 42 agents (27.5%) - All production coding + critical architecture - Tier 2 Inherit: 42 agents (27.5%) - Complex tasks, user-choosable - Tier 3 Sonnet: 38 agents (24.8%) - Support tasks needing intelligence - Tier 4 Haiku: 31 agents (20.3%) - Simple operational tasks Key Changes: Tier 1 (Opus) - Production Coding + Critical Review: - ALL code-reviewers (6 total): Ensures highest quality code review across all contexts (comprehensive, git PR, code docs, codebase cleanup, refactoring, TDD) - All major language pros (7): python, golang, rust, typescript, cpp, java, c - Framework specialists (6): django (2), fastapi (2), graphql-architect (2) - Complex specialists (6): terraform-specialist (3), tdd-orchestrator (2), data-engineer - Blockchain: blockchain-developer (smart contracts are critical) - Game dev (2): unity-developer, minecraft-bukkit-pro - Architecture (existing): architect-review, cloud-architect, kubernetes-architect, hybrid-cloud-architect, database-architect, security-auditor Tier 2 (Inherit) - User Flexibility: - Secondary languages (6): javascript, scala, csharp, ruby, php, elixir - All frontend/mobile (8): frontend-developer (4), mobile-developer (2), flutter-expert, ios-developer - Specialized (6): observability-engineer (2), temporal-python-pro, arm-cortex-expert, context-manager (2), database-optimizer (2) - AI/ML, backend-architect, performance-engineer, quant/risk (existing) Tier 3 (Sonnet) - Intelligent Support: - Documentation (4): docs-architect (2), tutorial-engineer (2) - Testing (2): test-automator (2) - Developer experience (3): dx-optimizer (2), business-analyst - Modernization (4): legacy-modernizer (3), database-admin - Other support agents (existing) Tier 4 (Haiku) - Simple Operations: - SEO/Marketing (10): All SEO agents, content, search - Deployment (4): deployment-engineer (4 instances) - Debugging (5): debugger (2), error-detective (3) - DevOps (3): devops-troubleshooter (3) 
- Other simple operational tasks Rationale: - Opus 4.5 achieves 80.9% on SWE-bench with 65% fewer tokens on complex tasks - Production code deserves the best model: all language pros now on Opus - All code review uses Opus for maximum quality and security - Sonnet > Haiku (38 vs 31) ensures better intelligence for support tasks - Inherit tier gives users cost control for frontend, mobile, and specialized tasks Related: #136, #132 * feat: upgrade final 13 agents from Haiku to Sonnet Based on research into Haiku 4.5 vs Sonnet 4.5 capabilities, upgraded agents requiring deep analytical intelligence from Haiku to Sonnet. Research Findings: - Haiku 4.5: 73.3% SWE-bench, 3-5x faster, 1/3 cost, sub-200ms responses - Best for Haiku: Real-time apps, data extraction, templates, high-volume ops - Best for Sonnet: Complex reasoning, root cause analysis, strategic planning Agents Upgraded (13 total): - Debugging (5): debugger (2), error-detective (3) - Complex root cause analysis - DevOps (3): devops-troubleshooter (3) - System diagnostics & troubleshooting - Network (2): network-engineer (2) - Complex network analysis & optimization - API Documentation (2): api-documenter (2) - Deep API understanding required - Payments (1): payment-integration - Critical financial integration Final Distribution (153 total): - Tier 1 Opus: 42 agents (27.5%) - Production coding + critical architecture - Tier 2 Inherit: 42 agents (27.5%) - Complex tasks, user-choosable - Tier 3 Sonnet: 51 agents (33.3%) - Support tasks needing intelligence - Tier 4 Haiku: 18 agents (11.8%) - Fast operational tasks only Haiku Now Reserved For: - SEO/Marketing (8): Pattern matching, data extraction, content templates - Deployment (4): Operational execution tasks - Simple Docs (3): reference-builder, mermaid-expert, c4-code - Sales/Support (2): High-volume, template-based interactions - Search (1): Knowledge retrieval Sonnet > Haiku as requested (51 vs 18) Sources: - 
https://www.creolestudios.com/claude-haiku-4-5-vs-sonnet-4-5-comparison/ - https://www.anthropic.com/news/claude-haiku-4-5 - https://caylent.com/blog/claude-haiku-4-5-deep-dive-cost-capabilities-and-the-multi-agent-opportunity Related: #136 * docs: add cost considerations and clarify inherit behavior Addresses PR feedback: - Added comprehensive cost comparison for all model tiers - Documented how 'inherit' model works (uses session default, falls back to Sonnet) - Explained cost optimization strategies - Clarified when Opus token efficiency offsets higher rate This helps users make informed decisions about model selection and cost control.	2025-12-10 15:52:06 -05:00
Kunal Shah	1305e48672	Replace GPT and Claude models with the latest, better, and cheaper models (#118) * Updated GPT and Claude models to the latest, better, and cheaper models * Updated more files to use GPT-5 and Sonnet/Haiku 4.5 because they are the latest, cheaper, and better models	2025-11-16 20:22:36 -05:00
Seth Hobson	65e5cb093a	feat: add Agent Skills and restructure documentation - Add 47 Agent Skills across 14 plugins following Anthropic's specification - Python (5): async patterns, testing, packaging, performance, UV package manager - JavaScript/TypeScript (4): advanced types, Node.js patterns, testing, modern JS - Kubernetes (4): manifests, Helm charts, GitOps, security policies - Cloud Infrastructure (4): Terraform, multi-cloud, hybrid networking, cost optimization - CI/CD (4): pipeline design, GitHub Actions, GitLab CI, secrets management - Backend (3): API design, architecture patterns, microservices - LLM Applications (4): LangChain, prompt engineering, RAG, evaluation - Blockchain/Web3 (4): DeFi protocols, NFT standards, Solidity security, Web3 testing - Framework Migration (4): React, Angular, database, dependency upgrades - Observability (4): Prometheus, Grafana, distributed tracing, SLO - Payment Processing (4): Stripe, PayPal, PCI compliance, billing - API Scaffolding (1): FastAPI templates - ML Operations (1): ML pipeline workflow - Security (1): SAST configuration - Restructure documentation into /docs directory - agent-skills.md: Complete guide to all 47 skills - agents.md: All 85 agents with model configuration - plugins.md: Complete catalog of 63 plugins - usage.md: Commands, workflows, and best practices - architecture.md: Design principles and patterns - Update README.md - Add Agent Skills banner announcement - Reduce length by ~75% with links to detailed docs - Add What's New section showcasing Agent Skills - Add Popular Use Cases with real examples - Improve navigation with Core Guides and Quick Links - Update marketplace.json with skills arrays for 14 plugins All 47 skills follow Agent Skills Specification: - Required YAML frontmatter (name, description) - Use when activation clauses - Progressive disclosure architecture - Under 1024 character descriptions	2025-10-16 20:33:27 -04:00
Seth Hobson	8346c1f2f7	refactor: migrate to model-agnostic Sonnet/Haiku architecture - Migrate all 48 Opus agents to Sonnet - Optimize 35 execution-focused agents for Haiku - Update README with hybrid orchestration patterns - Simplify model configuration to use agnostic aliases Final distribution: 97 Sonnet / 47 Haiku agents	2025-10-15 14:06:54 -04:00
Seth Hobson	20d4472a3b	Restructure marketplace for isolated plugin architecture - Organize 62 plugins into isolated directories under plugins/ - Consolidate tools and workflows into commands/ following Anthropic conventions - Update marketplace.json with isolated source paths for each plugin - Revise README to reflect plugin-based structure and token efficiency - Remove shared resource directories (agents/, tools/, workflows/) Each plugin now contains only its specific agents and commands, enabling granular installation and minimal token usage. Installing a single plugin loads only its resources rather than the entire marketplace. Structure: plugins/{plugin-name}/{agents/,commands/}	2025-10-13 10:19:10 -04:00
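
Note on commits 70cf3f3682 and fda45604b7: the executor-reuse pattern they describe can be sketched roughly as follows. This is a minimal illustration, not the code in optimize-prompt.py. Only the names `PromptOptimizer`, `evaluate_prompt`, `shutdown`, and `ThreadPoolExecutor` come from the log; the client callable, the test-case format, `_run_case`, `fake_llm`, and the shape of the returned metrics are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


class PromptOptimizer:
    """Illustrative stand-in; constructor arguments are hypothetical."""

    def __init__(self, llm_client, test_cases, max_workers=4):
        self.llm_client = llm_client  # assumed: any callable str -> str
        self.test_cases = test_cases  # assumed: list of (input, expected) pairs
        # Create the pool once so evaluate_prompt does not pay the
        # thread-pool setup and teardown cost on every call.
        self.executor = ThreadPoolExecutor(max_workers=max_workers)

    def _run_case(self, prompt, case):
        text, expected = case
        return self.llm_client(prompt.format(input=text)) == expected

    def evaluate_prompt(self, prompt):
        # Executor.map preserves input order, so results align with test_cases.
        results = list(
            self.executor.map(lambda case: self._run_case(prompt, case), self.test_cases)
        )
        return {"accuracy": sum(results) / len(results)}

    def shutdown(self):
        # Release worker threads once optimization is finished.
        self.executor.shutdown(wait=True)


def fake_llm(text):
    # Mocked client for the example: upper-cases its input.
    return text.upper()


optimizer = PromptOptimizer(fake_llm, [("a", "A"), ("b", "B"), ("c", "x")])
try:
    metrics = optimizer.evaluate_prompt("{input}")
finally:
    optimizer.shutdown()
```

The `try...finally` around the call mirrors the cleanup the commit message mentions, so worker threads are released even if an evaluation raises.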

15 Commits
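
Note on commit 8be0e8ac7a: the log says only that unsafe execution was replaced with "AST-based safe math". One common way to do that with Python's standard `ast` module is sketched below. The function name `safe_eval` and the set of permitted operators are assumptions chosen for illustration, not the plugin's actual implementation.

```python
import ast
import operator

# Whitelisted operators (an assumption for this sketch); anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def safe_eval(expression):
    """Evaluate a plain arithmetic expression without calling eval()."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        # Names, calls, attribute access, and so on all end up here.
        raise ValueError(f"unsupported expression: {ast.dump(node)}")

    return _eval(ast.parse(expression, mode="eval"))


result = safe_eval("2 + 3 * 4")
```

Because only numeric constants and the listed operators are walked, input such as `__import__('os')` is rejected with `ValueError` rather than executed. Malformed input still raises `SyntaxError` from `ast.parse`, which callers may want to catch separately.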