agents

mirror of https://github.com/wshobson/agents.git synced 2026-03-18 17:47:16 +00:00

Author	SHA1	Message	Date
Seth Hobson	47a5dbc3f9	fix(skills): remove phantom resource references and fix CoC links (#447) Remove references to non-existent resource files (references/, assets/, scripts/, examples/) from 115 skill SKILL.md files. These sections pointed to directories and files that were never created, causing confusion when users install skills. Also fix broken Code of Conduct links in issue templates to use absolute GitHub URLs instead of relative paths that 404.	2026-03-07 10:53:17 -05:00
Seth Hobson	086557180a	chore: update model references to Claude 4.6 and GPT-5.2 - Claude Opus 4.5 → Opus 4.6, Claude Sonnet 4.5 → Sonnet 4.6 (Haiku stays 4.5) - Update claude-sonnet-4-5 model IDs to claude-sonnet-4-6 in code examples - Update SWE-bench stat from 80.9% to 80.8% for Opus 4.6 - Update GPT refs: GPT-5 → GPT-5.2, GPT-4o → gpt-5.2, GPT-4o-mini → GPT-5-mini - Fix GPT-5.2-mini → GPT-5-mini (correct model name per OpenAI) - Bump marketplace to v1.5.2 and affected plugin versions	2026-02-19 14:03:46 -05:00
Seth Hobson	56848874a2	style: format all files with prettier	2026-01-19 17:07:03 -05:00
Seth Hobson	8be0e8ac7a	feat(llm-application-dev): modernize to LangGraph and latest models v2.0.0 - Migrate from LangChain 0.x to LangChain 1.x/LangGraph patterns - Update model references to Claude 4.5 and GPT-5.2 - Add Voyage AI as primary embedding recommendation - Add structured outputs with Pydantic - Replace deprecated initialize_agent() with StateGraph - Fix security: use AST-based safe math instead of unsafe execution - Add plugin.json and README.md for consistency - Bump marketplace version to 1.3.3	2026-01-19 15:43:25 -05:00
google-labs-jules[bot]	a86384334b	⚡ Bolt: optimize prompt evaluation loop to skip redundant calls (#152) - Avoid re-evaluating the current prompt if metrics are already available from the previous iteration. - Pass metrics from the best variation to the next iteration. - Reduces N-1 expensive LLM calls in an N-iteration optimization loop. Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>	2025-12-21 19:02:37 -05:00
google-labs-jules[bot]	fda45604b7	⚡ Bolt: Optimize PromptOptimizer thread pool usage (#147) * ⚡ Bolt: Reuse ThreadPoolExecutor in PromptOptimizer 💡 What: Initialized `ThreadPoolExecutor` in `PromptOptimizer.__init__` and reused it in `evaluate_prompt`. Added a `shutdown` method and wrapped execution in `try...finally` for proper resource management. 🎯 Why: The previous implementation created a new `ThreadPoolExecutor` for every call to `evaluate_prompt`. Since `evaluate_prompt` is called repeatedly inside the `optimize` loop (and for every variation), this caused significant overhead from repeatedly creating and destroying thread pools. 📊 Impact: Benchmark showed a reduction in execution time from ~5.36s to ~3.76s (~30% improvement) for 500 iterations with a mocked LLM. 🔬 Measurement: Ran a benchmark script executing `evaluate_prompt` 500 times. Before: 5.36s After: 3.76s --------- Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>	2025-12-20 21:28:39 -05:00
google-labs-jules[bot]	70cf3f3682	⚡ Bolt: Parallelize Prompt Evaluation in optimize-prompt.py (#145) * feat: Parallelize prompt evaluation in optimize-prompt.py - Update `PromptOptimizer.evaluate_prompt` to use `ThreadPoolExecutor` for concurrent test case processing - Significantly reduces total execution time when using high-latency LLM clients (network IO bound) - Maintain accurate metric aggregation (latency, accuracy, token count) from parallel results - This prepares the script for real-world usage where sequential execution is a major bottleneck - Ensure no generated artifacts (`optimization_results.json`) are committed ⚡ Bolt: Reduces total evaluation time from O(n) to O(1) latency-wise (bounded by max_workers) for concurrent requests. --------- Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>	2025-12-19 09:12:15 -05:00
Seth Hobson	01d93fc227	feat: add 5 new specialized agents with 20 skills Add domain expert agents with comprehensive skill sets: - service-mesh-expert (cloud-infrastructure): Istio/Linkerd patterns, mTLS, observability - event-sourcing-architect (backend-development): CQRS, event stores, projections, sagas - vector-database-engineer (llm-application-dev): embeddings, similarity search, hybrid search - monorepo-architect (developer-essentials): Nx, Turborepo, Bazel, pnpm workspaces - threat-modeling-expert (security-scanning): STRIDE, attack trees, security requirements Update all documentation to reflect correct counts: - 67 plugins, 99 agents, 107 skills, 71 commands	2025-12-16 16:00:58 -05:00
Kunal Shah	1305e48672	Replace GPT and Claude models to latest, better and cheaper models (#118) * Updated GPT and Claude models to latest, better and cheaper models * updated more files to use GPT-5 and Sonnet/Haiku 4.5 because they are the latest, cheaper and better models	2025-11-16 20:22:36 -05:00
Seth Hobson	65e5cb093a	feat: add Agent Skills and restructure documentation - Add 47 Agent Skills across 14 plugins following Anthropic's specification - Python (5): async patterns, testing, packaging, performance, UV package manager - JavaScript/TypeScript (4): advanced types, Node.js patterns, testing, modern JS - Kubernetes (4): manifests, Helm charts, GitOps, security policies - Cloud Infrastructure (4): Terraform, multi-cloud, hybrid networking, cost optimization - CI/CD (4): pipeline design, GitHub Actions, GitLab CI, secrets management - Backend (3): API design, architecture patterns, microservices - LLM Applications (4): LangChain, prompt engineering, RAG, evaluation - Blockchain/Web3 (4): DeFi protocols, NFT standards, Solidity security, Web3 testing - Framework Migration (4): React, Angular, database, dependency upgrades - Observability (4): Prometheus, Grafana, distributed tracing, SLO - Payment Processing (4): Stripe, PayPal, PCI compliance, billing - API Scaffolding (1): FastAPI templates - ML Operations (1): ML pipeline workflow - Security (1): SAST configuration - Restructure documentation into /docs directory - agent-skills.md: Complete guide to all 47 skills - agents.md: All 85 agents with model configuration - plugins.md: Complete catalog of 63 plugins - usage.md: Commands, workflows, and best practices - architecture.md: Design principles and patterns - Update README.md - Add Agent Skills banner announcement - Reduce length by ~75% with links to detailed docs - Add What's New section showcasing Agent Skills - Add Popular Use Cases with real examples - Improve navigation with Core Guides and Quick Links - Update marketplace.json with skills arrays for 14 plugins All 47 skills follow Agent Skills Specification: - Required YAML frontmatter (name, description) - Use when activation clauses - Progressive disclosure architecture - Under 1024 character descriptions	2025-10-16 20:33:27 -04:00
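
The thread pool commits above (70cf3f3682, fda45604b7) describe evaluating test cases concurrently with a `ThreadPoolExecutor` that is created once in `__init__`, reused by `evaluate_prompt`, and released through a `shutdown` method inside `try...finally`. The sketch below illustrates that pattern only; it is not the code from `optimize-prompt.py`. The constructor arguments, the `_run_case` helper, the `fake_llm` stand-in and the single `accuracy` metric are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


class PromptOptimizer:
    """Illustrative stand-in, not the repository's actual class."""

    def __init__(self, llm_client, test_cases, max_workers=4):
        self.llm_client = llm_client  # assumed: any callable mapping prompt text to response text
        self.test_cases = test_cases  # assumed shape: [{"input": ..., "expected": ...}, ...]
        # Create the pool once; every evaluate_prompt call reuses it.
        self.executor = ThreadPoolExecutor(max_workers=max_workers)

    def _run_case(self, prompt, case):
        # Hypothetical helper: one LLM call, scored by substring match.
        response = self.llm_client(prompt.format(input=case["input"]))
        return case["expected"] in response

    def evaluate_prompt(self, prompt):
        # executor.map runs the cases concurrently and yields results in input order.
        results = list(
            self.executor.map(lambda case: self._run_case(prompt, case), self.test_cases)
        )
        return {"accuracy": sum(results) / len(results)}

    def shutdown(self):
        # Release the worker threads once optimization is finished.
        self.executor.shutdown(wait=True)


def fake_llm(text):
    # Mocked LLM for the example: upper-cases the prompt text.
    return text.upper()


optimizer = PromptOptimizer(
    fake_llm,
    [{"input": "a", "expected": "A"}, {"input": "b", "expected": "X"}],
)
try:
    metrics = optimizer.evaluate_prompt("echo {input}")
finally:
    optimizer.shutdown()
```

With `max_workers` threads, wall-clock time for n network-bound cases falls to roughly n / `max_workers` call latencies, which is the sense in which the commit message says the speedup is bounded by `max_workers`.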

10 Commits
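
Commit 8be0e8ac7a mentions replacing unsafe execution with AST-based safe math. The log does not show the repository's implementation, so the following is only a minimal sketch of the general technique: parse the expression with Python's `ast` module and evaluate a whitelist of node types. The `safe_eval` name and the set of allowed operators are assumptions for illustration.

```python
import ast
import operator

# Whitelisted operators (an assumption; exponentiation is left out on purpose
# because inputs such as 9**9**9 can exhaust CPU and memory).
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def safe_eval(expression):
    """Evaluate basic arithmetic without eval() or exec()."""

    def _eval(node):
        # Numeric literals only; bool is excluded even though it subclasses int.
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        # Names, calls, attribute access and everything else are rejected.
        raise ValueError(f"unsupported expression element: {type(node).__name__}")

    return _eval(ast.parse(expression, mode="eval").body)


def is_rejected(expression):
    # Convenience check for the example: True when safe_eval refuses the input.
    try:
        safe_eval(expression)
    except ValueError:
        return True
    return False
```

Because only literal numbers and the listed operators are evaluated, an input such as `__import__('os').system('ls')` parses to a `Call` node and is refused rather than executed.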