- Claude Opus 4.5 → Opus 4.6, Claude Sonnet 4.5 → Sonnet 4.6 (Haiku stays 4.5)
- Update claude-sonnet-4-5 model IDs to claude-sonnet-4-6 in code examples
- Update SWE-bench stat from 80.9% to 80.8% for Opus 4.6
- Update GPT refs: GPT-5 → GPT-5.2, GPT-4o → gpt-5.2, GPT-4o-mini → GPT-5-mini
- Fix GPT-5.2-mini → GPT-5-mini (correct model name per OpenAI)
- Bump marketplace to v1.5.2 and affected plugin versions
Sync marketplace.json versions with plugin.json for all 14 touched
plugins. Fix plugin.json versions for llm-application-dev (2.0.3),
startup-business-analyst (1.0.4), and ui-design (1.0.2) to match
marketplace lineage. Add dotnet-contribution to marketplace.
Rewrites 14 commands across 11 plugins to remove all cross-plugin
subagent_type references (e.g., "unit-testing::test-automator"), which
break when plugins are installed standalone. Each command now uses only
local bundled agents or general-purpose with role context in the prompt.
All rewritten commands follow conductor-style patterns:
- CRITICAL BEHAVIORAL RULES with strong directives
- State files for session tracking and resume support
- Phase checkpoints requiring explicit user approval
- File-based context passing between steps
Also fixes 4 plugin.json files missing version/license fields and adds
plugin.json for dotnet-contribution.
Closes#433
- Migrate from LangChain 0.x to LangChain 1.x/LangGraph patterns
- Update model references to Claude 4.5 and GPT-5.2
- Add Voyage AI as primary embedding recommendation
- Add structured outputs with Pydantic
- Replace deprecated initialize_agent() with StateGraph
- Fix security: use AST-based safe math instead of unsafe execution
- Add plugin.json and README.md for consistency
- Bump marketplace version to 1.3.3
- Avoid re-evaluating the current prompt if metrics are already available from the previous iteration.
- Pass metrics from the best variation to the next iteration.
- Reduces N-1 expensive LLM calls in an N-iteration optimization loop.
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
* ⚡ Bolt: Reuse ThreadPoolExecutor in PromptOptimizer
💡 What:
Initialized `ThreadPoolExecutor` in `PromptOptimizer.__init__` and reused it in `evaluate_prompt`.
🎯 Why:
The previous implementation created a new `ThreadPoolExecutor` for every call to `evaluate_prompt`. Since `evaluate_prompt` is called repeatedly inside the `optimize` loop (and for every variation), this caused significant overhead from repeatedly creating and destroying thread pools.
📊 Impact:
Benchmark showed a reduction in execution time from ~5.36s to ~3.76s (~30% improvement) for 500 iterations with a mocked LLM.
🔬 Measurement:
Ran a benchmark script executing `evaluate_prompt` 500 times.
Before: 5.36s
After: 3.76s
* ⚡ Bolt: Reuse ThreadPoolExecutor in PromptOptimizer
💡 What:
Initialized `ThreadPoolExecutor` in `PromptOptimizer.__init__` and reused it in `evaluate_prompt`. Added a `shutdown` method for proper cleanup.
🎯 Why:
The previous implementation created a new `ThreadPoolExecutor` for every call to `evaluate_prompt`. Since `evaluate_prompt` is called repeatedly inside the `optimize` loop (and for every variation), this caused significant overhead from repeatedly creating and destroying thread pools.
📊 Impact:
Benchmark showed a reduction in execution time from ~5.36s to ~3.76s (~30% improvement) for 500 iterations with a mocked LLM.
🔬 Measurement:
Ran a benchmark script executing `evaluate_prompt` 500 times.
Before: 5.36s
After: 3.76s
* ⚡ Bolt: Reuse ThreadPoolExecutor in PromptOptimizer
💡 What:
Initialized `ThreadPoolExecutor` in `PromptOptimizer.__init__` and reused it in `evaluate_prompt`. Added a `shutdown` method and wrapped execution in `try...finally` for proper resource management.
🎯 Why:
The previous implementation created a new `ThreadPoolExecutor` for every call to `evaluate_prompt`. Since `evaluate_prompt` is called repeatedly inside the `optimize` loop (and for every variation), this caused significant overhead from repeatedly creating and destroying thread pools.
📊 Impact:
Benchmark showed a reduction in execution time from ~5.36s to ~3.76s (~30% improvement) for 500 iterations with a mocked LLM.
🔬 Measurement:
Ran a benchmark script executing `evaluate_prompt` 500 times.
Before: 5.36s
After: 3.76s
---------
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
* feat: Parallelize prompt evaluation in optimize-prompt.py
- Update `PromptOptimizer.evaluate_prompt` to use `ThreadPoolExecutor` for concurrent test case processing
- Significantly reduces total execution time when using high-latency LLM clients (network IO bound)
- Maintain accurate metric aggregation (latency, accuracy, token count) from parallel results
- This prepares the script for real-world usage where sequential execution is a major bottleneck
⚡ Bolt: Reduces total evaluation time from O(n) to O(1) latency-wise (bounded by max_workers) for concurrent requests.
* feat: Parallelize prompt evaluation in optimize-prompt.py
- Update `PromptOptimizer.evaluate_prompt` to use `ThreadPoolExecutor` for concurrent test case processing
- Significantly reduces total execution time when using high-latency LLM clients (network IO bound)
- Maintain accurate metric aggregation (latency, accuracy, token count) from parallel results
- Ensure no generated artifacts (`optimization_results.json`) are committed
⚡ Bolt: Reduces total evaluation time from O(n) to O(1) latency-wise (bounded by max_workers) for concurrent requests.
---------
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
* Updated GPT and Claude models to latest, better and cheaper models
* updated more files to use GPT-5 and Sonnet/Haiku 4.5 because theu are the latest, cheaper and better models
- Migrate all 48 Opus agents to Sonnet
- Optimize 35 execution-focused agents for Haiku
- Update README with hybrid orchestration patterns
- Simplify model configuration to use agnostic aliases
Final distribution: 97 Sonnet / 47 Haiku agents
- Organize 62 plugins into isolated directories under plugins/
- Consolidate tools and workflows into commands/ following Anthropic conventions
- Update marketplace.json with isolated source paths for each plugin
- Revise README to reflect plugin-based structure and token efficiency
- Remove shared resource directories (agents/, tools/, workflows/)
Each plugin now contains only its specific agents and commands, enabling
granular installation and minimal token usage. Installing a single plugin
loads only its resources rather than the entire marketplace.
Structure: plugins/{plugin-name}/{agents/,commands/}