From 1305e486724958adc4b14927dbc92413cd24f004 Mon Sep 17 00:00:00 2001
From: Kunal Shah <shahkunal9@gmail.com>
Date: Mon, 17 Nov 2025 09:22:36 +0800
Subject: [PATCH] Replace GPT and Claude models with latest, better and cheaper
 models (#118)

* Updated GPT and Claude models to latest, better and cheaper models

* Updated more files to use GPT-5 and Sonnet/Haiku 4.5 because they are the latest, cheaper and better models
---
 .../commands/multi-agent-optimize.md                 |  6 +++---
 plugins/code-review-ai/commands/ai-review.md         | 12 ++++++------
 plugins/llm-application-dev/agents/ai-engineer.md    |  6 +++---
 .../llm-application-dev/agents/prompt-engineer.md    |  2 +-
 .../llm-application-dev/commands/prompt-optimize.md  |  6 +++---
 .../skills/llm-evaluation/SKILL.md                   |  6 +++---
 .../references/chain-of-thought.md                   |  2 +-
 .../performance-testing-review/commands/ai-review.md | 12 ++++++------
 8 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/plugins/agent-orchestration/commands/multi-agent-optimize.md b/plugins/agent-orchestration/commands/multi-agent-optimize.md
index dc4f12a..1ee1b24 100644
--- a/plugins/agent-orchestration/commands/multi-agent-optimize.md
+++ b/plugins/agent-orchestration/commands/multi-agent-optimize.md
@@ -132,9 +132,9 @@ class CostOptimizer:
         self.token_budget = 100000  # Monthly budget
         self.token_usage = 0
         self.model_costs = {
-            'gpt-4': 0.03,
-            'claude-3-sonnet': 0.015,
-            'claude-3-haiku': 0.0025
+            'gpt-5': 0.03,
+            'claude-4-sonnet': 0.015,
+            'claude-4-haiku': 0.0025
         }
 
     def select_optimal_model(self, complexity):
diff --git a/plugins/code-review-ai/commands/ai-review.md b/plugins/code-review-ai/commands/ai-review.md
index db5de35..51571bf 100644
--- a/plugins/code-review-ai/commands/ai-review.md
+++ b/plugins/code-review-ai/commands/ai-review.md
@@ -1,6 +1,6 @@
 # AI-Powered Code Review Specialist
 
-You are an expert AI-powered code review specialist combining automated static analysis, intelligent pattern recognition, and modern DevOps practices. Leverage AI tools (GitHub Copilot, Qodo, GPT-4, Claude 3.5 Sonnet) with battle-tested platforms (SonarQube, CodeQL, Semgrep) to identify bugs, vulnerabilities, and performance issues.
+You are an expert AI-powered code review specialist combining automated static analysis, intelligent pattern recognition, and modern DevOps practices. Leverage AI tools (GitHub Copilot, Qodo, GPT-5, Claude 4.5 Sonnet) with battle-tested platforms (SonarQube, CodeQL, Semgrep) to identify bugs, vulnerabilities, and performance issues.
 
 ## Context
 
@@ -30,7 +30,7 @@ Execute in parallel:
 
 ### AI-Assisted Review
 ```python
-# Context-aware review prompt for Claude 3.5 Sonnet
+# Context-aware review prompt for Claude 4.5 Sonnet
 review_prompt = f"""
 You are reviewing a pull request for a {language} {project_type} application.
 
@@ -59,8 +59,8 @@ Format as JSON array.
 ```
 
 ### Model Selection (2025)
-- **Fast reviews (<200 lines)**: GPT-4o-mini or Claude 3.5 Sonnet
-- **Deep reasoning**: Claude 3.7 Sonnet or GPT-4.5 (200K+ tokens)
+- **Fast reviews (<200 lines)**: GPT-4o-mini or Claude 4.5 Haiku
+- **Deep reasoning**: Claude 4.5 Sonnet or GPT-5 (200K+ tokens)
 - **Code generation**: GitHub Copilot or Qodo
 - **Multi-language**: Qodo or CodeAnt AI (30+ languages)
 
@@ -284,7 +284,7 @@ jobs:
           codeql database create codeql-db --language=javascript,python
           semgrep scan --config=auto --sarif --output=semgrep.sarif
 
-      - name: AI-Enhanced Review (GPT-4)
+      - name: AI-Enhanced Review (GPT-5)
         env:
           OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
         run: |
@@ -417,7 +417,7 @@ if __name__ == '__main__':
 
 Comprehensive AI code review combining:
 1. Multi-tool static analysis (SonarQube, CodeQL, Semgrep)
-2. State-of-the-art LLMs (GPT-4, Claude 3.5 Sonnet)
+2. State-of-the-art LLMs (GPT-5, Claude 4.5 Sonnet)
 3. Seamless CI/CD integration (GitHub Actions, GitLab, Azure DevOps)
 4. 30+ language support with language-specific linters
 5. Actionable review comments with severity and fix examples
diff --git a/plugins/llm-application-dev/agents/ai-engineer.md b/plugins/llm-application-dev/agents/ai-engineer.md
index d84b45c..fbda80b 100644
--- a/plugins/llm-application-dev/agents/ai-engineer.md
+++ b/plugins/llm-application-dev/agents/ai-engineer.md
@@ -13,7 +13,7 @@ Expert AI engineer specializing in LLM application development, RAG systems, and
 
 ### LLM Integration & Model Management
 - OpenAI GPT-4o/4o-mini, o1-preview, o1-mini with function calling and structured outputs
-- Anthropic Claude 3.5 Sonnet, Claude 3 Haiku/Opus with tool use and computer use
+- Anthropic Claude 4.5 Sonnet/Haiku, Claude 4.1 Opus with tool use and computer use
 - Open-source models: Llama 3.1/3.2, Mixtral 8x7B/8x22B, Qwen 2.5, DeepSeek-V2
 - Local deployment with Ollama, vLLM, TGI (Text Generation Inference)
 - Model serving with TorchServe, MLflow, BentoML for production deployment
@@ -68,7 +68,7 @@ Expert AI engineer specializing in LLM application development, RAG systems, and
 - Observability: logging, metrics, tracing with LangSmith, Phoenix, Weights & Biases
 
 ### Multimodal AI Integration
-- Vision models: GPT-4V, Claude 3 Vision, LLaVA, CLIP for image understanding
+- Vision models: GPT-4V, Claude 4 Vision, LLaVA, CLIP for image understanding
 - Audio processing: Whisper for speech-to-text, ElevenLabs for text-to-speech
 - Document AI: OCR, table extraction, layout understanding with models like LayoutLM
 - Video analysis and processing for multimedia applications
@@ -111,7 +111,7 @@ Expert AI engineer specializing in LLM application development, RAG systems, and
 - Balances cutting-edge techniques with proven, stable solutions
 
 ## Knowledge Base
-- Latest LLM developments and model capabilities (GPT-4o, Claude 3.5, Llama 3.2)
+- Latest LLM developments and model capabilities (GPT-4o, Claude 4.5, Llama 3.2)
 - Modern vector database architectures and optimization techniques
 - Production AI system design patterns and best practices
 - AI safety and security considerations for enterprise deployments
diff --git a/plugins/llm-application-dev/agents/prompt-engineer.md b/plugins/llm-application-dev/agents/prompt-engineer.md
index aba9c33..f209499 100644
--- a/plugins/llm-application-dev/agents/prompt-engineer.md
+++ b/plugins/llm-application-dev/agents/prompt-engineer.md
@@ -53,7 +53,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
 - Multi-turn conversation management
 - Image and multimodal prompt engineering
 
-#### Anthropic Claude (3.5 Sonnet, Haiku, Opus)
+#### Anthropic Claude (4.5 Sonnet, Haiku, Opus)
 - Constitutional AI alignment with Claude's training
 - Tool use optimization for complex workflows
 - Computer use prompting for automation tasks
diff --git a/plugins/llm-application-dev/commands/prompt-optimize.md b/plugins/llm-application-dev/commands/prompt-optimize.md
index 1b46e49..0afc30b 100644
--- a/plugins/llm-application-dev/commands/prompt-optimize.md
+++ b/plugins/llm-application-dev/commands/prompt-optimize.md
@@ -113,7 +113,7 @@ Final Response: [Refined]
 
 ### 5. Model-Specific Optimization
 
-**GPT-4/GPT-4o**
+**GPT-5/GPT-4o**
 ```python
 gpt4_optimized = """
 ##CONTEXT##
@@ -136,7 +136,7 @@ gpt4_optimized = """
 """
 ```
 
-**Claude 3.5/4**
+**Claude 4.5/4**
 ```python
 claude_optimized = """
 <context>
@@ -566,7 +566,7 @@ testing_recommendations:
   metrics: ["accuracy", "satisfaction", "cost"]
 
 deployment_strategy:
-  model: "GPT-4 for quality, Claude for safety"
+  model: "GPT-5 for quality, Claude for safety"
   temperature: 0.7
   max_tokens: 2000
   monitoring: "Track success, latency, feedback"
diff --git a/plugins/llm-application-dev/skills/llm-evaluation/SKILL.md b/plugins/llm-application-dev/skills/llm-evaluation/SKILL.md
index 06a1e71..23638d9 100644
--- a/plugins/llm-application-dev/skills/llm-evaluation/SKILL.md
+++ b/plugins/llm-application-dev/skills/llm-evaluation/SKILL.md
@@ -186,7 +186,7 @@ def calculate_factuality(claim, knowledge_base):
 ### Single Output Evaluation
 ```python
 def llm_judge_quality(response, question):
-    """Use GPT-4 to judge response quality."""
+    """Use GPT-5 to judge response quality."""
     prompt = f"""Rate the following response on a scale of 1-10 for:
 1. Accuracy (factually correct)
 2. Helpfulness (answers the question)
@@ -205,7 +205,7 @@ Provide ratings in JSON format:
 """
 
     result = openai.ChatCompletion.create(
-        model="gpt-4",
+        model="gpt-5",
         messages=[{"role": "user", "content": prompt}],
         temperature=0
     )
@@ -236,7 +236,7 @@ Answer with JSON:
 """
 
     result = openai.ChatCompletion.create(
-        model="gpt-4",
+        model="gpt-5",
         messages=[{"role": "user", "content": prompt}],
         temperature=0
     )
diff --git a/plugins/llm-application-dev/skills/prompt-engineering-patterns/references/chain-of-thought.md b/plugins/llm-application-dev/skills/prompt-engineering-patterns/references/chain-of-thought.md
index 4f48d4d..459a361 100644
--- a/plugins/llm-application-dev/skills/prompt-engineering-patterns/references/chain-of-thought.md
+++ b/plugins/llm-application-dev/skills/prompt-engineering-patterns/references/chain-of-thought.md
@@ -65,7 +65,7 @@ def self_consistency_cot(query, n=5, temperature=0.7):
     responses = []
     for _ in range(n):
         response = openai.ChatCompletion.create(
-            model="gpt-4",
+            model="gpt-5",
             messages=[{"role": "user", "content": prompt}],
             temperature=temperature
         )
diff --git a/plugins/performance-testing-review/commands/ai-review.md b/plugins/performance-testing-review/commands/ai-review.md
index db5de35..26c9657 100644
--- a/plugins/performance-testing-review/commands/ai-review.md
+++ b/plugins/performance-testing-review/commands/ai-review.md
@@ -1,6 +1,6 @@
 # AI-Powered Code Review Specialist
 
-You are an expert AI-powered code review specialist combining automated static analysis, intelligent pattern recognition, and modern DevOps practices. Leverage AI tools (GitHub Copilot, Qodo, GPT-4, Claude 3.5 Sonnet) with battle-tested platforms (SonarQube, CodeQL, Semgrep) to identify bugs, vulnerabilities, and performance issues.
+You are an expert AI-powered code review specialist combining automated static analysis, intelligent pattern recognition, and modern DevOps practices. Leverage AI tools (GitHub Copilot, Qodo, GPT-5, Claude 4.5 Sonnet) with battle-tested platforms (SonarQube, CodeQL, Semgrep) to identify bugs, vulnerabilities, and performance issues.
 
 ## Context
 
@@ -30,7 +30,7 @@ Execute in parallel:
 
 ### AI-Assisted Review
 ```python
-# Context-aware review prompt for Claude 3.5 Sonnet
+# Context-aware review prompt for Claude 4.5 Sonnet
 review_prompt = f"""
 You are reviewing a pull request for a {language} {project_type} application.
 
@@ -59,8 +59,8 @@ Format as JSON array.
 ```
 
 ### Model Selection (2025)
-- **Fast reviews (<200 lines)**: GPT-4o-mini or Claude 3.5 Sonnet
-- **Deep reasoning**: Claude 3.7 Sonnet or GPT-4.5 (200K+ tokens)
+- **Fast reviews (<200 lines)**: GPT-4o-mini or Claude 4.5 Haiku
+- **Deep reasoning**: Claude 4.5 Sonnet or GPT-5 (200K+ tokens)
 - **Code generation**: GitHub Copilot or Qodo
 - **Multi-language**: Qodo or CodeAnt AI (30+ languages)
 
@@ -284,7 +284,7 @@ jobs:
           codeql database create codeql-db --language=javascript,python
           semgrep scan --config=auto --sarif --output=semgrep.sarif
 
-      - name: AI-Enhanced Review (GPT-4)
+      - name: AI-Enhanced Review (GPT-5)
         env:
           OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
         run: |
@@ -417,7 +417,7 @@ if __name__ == '__main__':
 
 Comprehensive AI code review combining:
 1. Multi-tool static analysis (SonarQube, CodeQL, Semgrep)
-2. State-of-the-art LLMs (GPT-4, Claude 3.5 Sonnet)
+2. State-of-the-art LLMs (GPT-5, Claude 4.5 Sonnet)
 3. Seamless CI/CD integration (GitHub Actions, GitLab, Azure DevOps)
 4. 30+ language support with language-specific linters
 5. Actionable review comments with severity and fix examples