- Migrate from LangChain 0.x to LangChain 1.x/LangGraph patterns
- Update model references to Claude 4.5 and GPT-5.2
- Add Voyage AI as primary embedding recommendation
- Add structured outputs with Pydantic
- Replace deprecated initialize_agent() with StateGraph
- Fix security: use AST-based safe math instead of unsafe execution (see the sketch after this list)
- Add plugin.json and README.md for consistency
- Bump marketplace version to 1.3.3
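For the security fix above, a minimal sketch of an AST-based safe math evaluator, assuming a simple arithmetic allow-list (the exact operator set and function names in the repo may differ):

```python
import ast
import operator

# Allow-listed arithmetic operators; anything else is rejected.
_BINOPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
           ast.Div: operator.truediv, ast.Pow: operator.pow, ast.Mod: operator.mod}
_UNARYOPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}

def safe_eval(expression: str):
    """Evaluate arithmetic without eval()/exec(): walk the AST, allow-list nodes."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
            return _BINOPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARYOPS:
            return _UNARYOPS[type(node.op)](walk(node.operand))
        raise ValueError(f"Disallowed expression node: {type(node).__name__}")
    return walk(ast.parse(expression, mode="eval"))

print(safe_eval("2 ** 10 - (3 * 4)"))  # 1012
```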
- Avoid re-evaluating the current prompt when its metrics are already available from the previous iteration.
- Pass metrics from the best variation to the next iteration (see the sketch below).
- Saves N-1 expensive LLM calls in an N-iteration optimization loop.
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
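A minimal sketch of the metric-reuse pattern, assuming a hypothetical optimizer API (`generate_variations` and the `accuracy` field are illustrative, not the actual interface):

```python
def optimize(optimizer, prompt, test_cases, iterations=5):
    best_prompt, best_metrics = prompt, None
    for _ in range(iterations):
        # Only evaluate the current best prompt when no metrics were carried
        # over from the previous iteration (saves one evaluation per loop).
        if best_metrics is None:
            best_metrics = optimizer.evaluate_prompt(best_prompt, test_cases)
        for variation in optimizer.generate_variations(best_prompt):
            metrics = optimizer.evaluate_prompt(variation, test_cases)
            # The winning variation's metrics carry into the next iteration.
            if metrics["accuracy"] > best_metrics["accuracy"]:
                best_prompt, best_metrics = variation, metrics
    return best_prompt, best_metrics
```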
* ⚡ Bolt: Reuse ThreadPoolExecutor in PromptOptimizer
💡 What:
Initialized `ThreadPoolExecutor` in `PromptOptimizer.__init__` and reused it in `evaluate_prompt`. Added a `shutdown` method and wrapped execution in `try...finally` for proper resource management (see the sketch below).
🎯 Why:
The previous implementation created a new `ThreadPoolExecutor` for every call to `evaluate_prompt`. Since `evaluate_prompt` is called repeatedly inside the `optimize` loop (and for every variation), this caused significant overhead from repeatedly creating and destroying thread pools.
📊 Impact:
Benchmark showed a reduction in execution time from ~5.36s to ~3.76s (~30% improvement) for 500 iterations with a mocked LLM.
🔬 Measurement:
Ran a benchmark script executing `evaluate_prompt` 500 times.
Before: 5.36s
After: 3.76s
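A minimal sketch of the reuse pattern described above; only `PromptOptimizer`, `evaluate_prompt`, and `shutdown` come from the commit, everything else (client interface, worker count) is an illustrative assumption:

```python
from concurrent.futures import ThreadPoolExecutor

class PromptOptimizer:
    def __init__(self, llm_client, max_workers: int = 8):
        self.llm_client = llm_client
        # One pool for the optimizer's lifetime, not one per evaluate_prompt call.
        self.executor = ThreadPoolExecutor(max_workers=max_workers)

    def evaluate_prompt(self, prompt: str, test_cases: list) -> list:
        futures = [self.executor.submit(self._run_case, prompt, c) for c in test_cases]
        return [f.result() for f in futures]

    def _run_case(self, prompt, case):
        return self.llm_client.complete(prompt, case)  # hypothetical client call

    def shutdown(self) -> None:
        self.executor.shutdown(wait=True)

class _MockClient:  # stand-in for a real LLM client
    def complete(self, prompt, case):
        return {"case": case, "output": "ok"}

# Callers wrap usage in try...finally so the pool is always released:
optimizer = PromptOptimizer(_MockClient())
try:
    results = optimizer.evaluate_prompt("Summarize:", ["case 1", "case 2"])
finally:
    optimizer.shutdown()
```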
---------
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
* feat: Parallelize prompt evaluation in optimize-prompt.py
- Update `PromptOptimizer.evaluate_prompt` to use `ThreadPoolExecutor` for concurrent test case processing (see the sketch below)
- Significantly reduces total execution time with high-latency, network-I/O-bound LLM clients, preparing the script for real-world usage where sequential execution is the major bottleneck
- Maintain accurate metric aggregation (latency, accuracy, token count) from parallel results
- Ensure no generated artifacts (`optimization_results.json`) are committed
⚡ Bolt: Cuts wall-clock evaluation latency from O(n) sequential calls to roughly O(⌈n / max_workers⌉) rounds of concurrent requests.
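A hedged sketch of the parallel evaluation with metric aggregation; the test-case fields, response shape, and client call are assumptions, not the repo's actual code:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def evaluate_prompt(llm_client, prompt, test_cases, max_workers=8):
    def run_case(case):
        start = time.perf_counter()
        output = llm_client.complete(prompt, case["input"])  # network-bound call
        return {
            "latency": time.perf_counter() - start,
            "correct": output["text"].strip() == case["expected"],
            "tokens": output["tokens"],
        }

    # Futures preserve submission order, so results stay aligned with test_cases.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(run_case, c) for c in test_cases]
        results = [f.result() for f in futures]

    # Aggregate exactly as the sequential version did, just from parallel results.
    n = len(results)
    return {
        "avg_latency": sum(r["latency"] for r in results) / n,
        "accuracy": sum(r["correct"] for r in results) / n,
        "total_tokens": sum(r["tokens"] for r in results),
    }
```

The pool-reuse commits above move the executor into `__init__`; this sketch shows the per-call version that this commit introduced.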
---------
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
* Update GPT and Claude model references to the latest, cheaper, and better models
* Update more files to use GPT-5 and Claude Sonnet/Haiku 4.5, because they are the latest, cheaper, and better models