chore: update model references to Claude 4.6 and GPT-5.2

- Claude Opus 4.5 → Opus 4.6, Claude Sonnet 4.5 → Sonnet 4.6 (Haiku stays 4.5)
- Update claude-sonnet-4-5 model IDs to claude-sonnet-4-6 in code examples
- Update SWE-bench stat from 80.9% to 80.8% for Opus 4.6
- Update GPT refs: GPT-5 → GPT-5.2, GPT-4o → gpt-5.2, GPT-4o-mini → GPT-5-mini
- Fix GPT-5.2-mini → GPT-5-mini (correct model name per OpenAI)
- Bump marketplace to v1.5.2 and affected plugin versions
2026-03-18 09:37:15 +00:00 · 2026-02-19 14:03:46 -05:00
parent 5d65aa1063
commit 086557180a
19 changed files with 62 additions and 62 deletions
--- a/plugins/llm-application-dev/commands/langchain-agent.md
+++ b/plugins/llm-application-dev/commands/langchain-agent.md
@@ -37,7 +37,7 @@ class AgentState(TypedDict):

 ### Model & Embeddings

- **Primary LLM**: Claude Sonnet 4.5 (`claude-sonnet-4-5`)
+- **Primary LLM**: Claude Sonnet 4.6 (`claude-sonnet-4-6`)
 - **Embeddings**: Voyage AI (`voyage-3-large`) - officially recommended by Anthropic for Claude
 - **Specialized**: `voyage-code-3` (code), `voyage-finance-2` (finance), `voyage-law-2` (legal)

@@ -158,7 +158,7 @@ from langsmith.evaluation import evaluate
 # Run evaluation suite
 eval_config = RunEvalConfig(
    evaluators=["qa", "context_qa", "cot_qa"],
-    eval_llm=ChatAnthropic(model="claude-sonnet-4-5")
+    eval_llm=ChatAnthropic(model="claude-sonnet-4-6")
 )

 results = await evaluate(
@@ -209,7 +209,7 @@ async def call_with_retry():

 ## Implementation Checklist

- [ ] Initialize LLM with Claude Sonnet 4.5
+- [ ] Initialize LLM with Claude Sonnet 4.6
 - [ ] Setup Voyage AI embeddings (voyage-3-large)
 - [ ] Create tools with async support and error handling
 - [ ] Implement memory system (choose type based on use case)
--- a/plugins/llm-application-dev/commands/prompt-optimize.md
+++ b/plugins/llm-application-dev/commands/prompt-optimize.md
@@ -150,7 +150,7 @@ gpt5_optimized = """

 ````

-**Claude 4.5/4**
+**Claude 4.6/4.5**
 ```python
 claude_optimized = """
 <context>
@@ -607,7 +607,7 @@ testing_recommendations:
  metrics: ["accuracy", "satisfaction", "cost"]

 deployment_strategy:
-  model: "GPT-5.2 for quality, Claude 4.5 for safety"
+  model: "GPT-5.2 for quality, Claude 4.6 for safety"
  temperature: 0.7
  max_tokens: 2000
  monitoring: "Track success, latency, feedback"