From 1305e486724958adc4b14927dbc92413cd24f004 Mon Sep 17 00:00:00 2001
From: Kunal Shah
Date: Mon, 17 Nov 2025 09:22:36 +0800
Subject: [PATCH] Replace GPT and Claude models to latest, better and cheaper models (#118)

* Updated GPT and Claude models to latest, better and cheaper models

* updated more files to use GPT-5 and Sonnet/Haiku 4.5 because they are the latest, cheaper and better models
---
 .../commands/multi-agent-optimize.md                 |  6 +++---
 plugins/code-review-ai/commands/ai-review.md         | 12 ++++++------
 plugins/llm-application-dev/agents/ai-engineer.md    |  6 +++---
 .../llm-application-dev/agents/prompt-engineer.md    |  2 +-
 .../llm-application-dev/commands/prompt-optimize.md  |  6 +++---
 .../skills/llm-evaluation/SKILL.md                   |  6 +++---
 .../references/chain-of-thought.md                   |  2 +-
 .../performance-testing-review/commands/ai-review.md | 12 ++++++------
 8 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/plugins/agent-orchestration/commands/multi-agent-optimize.md b/plugins/agent-orchestration/commands/multi-agent-optimize.md
index dc4f12a..1ee1b24 100644
--- a/plugins/agent-orchestration/commands/multi-agent-optimize.md
+++ b/plugins/agent-orchestration/commands/multi-agent-optimize.md
@@ -132,9 +132,9 @@ class CostOptimizer:
         self.token_budget = 100000 # Monthly budget
         self.token_usage = 0
         self.model_costs = {
-            'gpt-4': 0.03,
-            'claude-3-sonnet': 0.015,
-            'claude-3-haiku': 0.0025
+            'gpt-5': 0.03,
+            'claude-4-sonnet': 0.015,
+            'claude-4-haiku': 0.0025
         }
 
     def select_optimal_model(self, complexity):
diff --git a/plugins/code-review-ai/commands/ai-review.md b/plugins/code-review-ai/commands/ai-review.md
index db5de35..51571bf 100644
--- a/plugins/code-review-ai/commands/ai-review.md
+++ b/plugins/code-review-ai/commands/ai-review.md
@@ -1,6 +1,6 @@
 # AI-Powered Code Review Specialist
 
-You are an expert AI-powered code review specialist combining automated static analysis, intelligent pattern recognition, and modern DevOps practices. Leverage AI tools (GitHub Copilot, Qodo, GPT-4, Claude 3.5 Sonnet) with battle-tested platforms (SonarQube, CodeQL, Semgrep) to identify bugs, vulnerabilities, and performance issues.
+You are an expert AI-powered code review specialist combining automated static analysis, intelligent pattern recognition, and modern DevOps practices. Leverage AI tools (GitHub Copilot, Qodo, GPT-5, Claude 4.5 Sonnet) with battle-tested platforms (SonarQube, CodeQL, Semgrep) to identify bugs, vulnerabilities, and performance issues.
 
 ## Context
 
@@ -30,7 +30,7 @@ Execute in parallel:
 ### AI-Assisted Review
 
 ```python
-# Context-aware review prompt for Claude 3.5 Sonnet
+# Context-aware review prompt for Claude 4.5 Sonnet
 review_prompt = f"""
 You are reviewing a pull request for a {language} {project_type} application.
 
@@ -59,8 +59,8 @@ Format as JSON array.
 ```
 
 ### Model Selection (2025)
-- **Fast reviews (<200 lines)**: GPT-4o-mini or Claude 3.5 Sonnet
-- **Deep reasoning**: Claude 3.7 Sonnet or GPT-4.5 (200K+ tokens)
+- **Fast reviews (<200 lines)**: GPT-4o-mini or Claude 4.5 Haiku
+- **Deep reasoning**: Claude 4.5 Sonnet or GPT-5 (200K+ tokens)
 - **Code generation**: GitHub Copilot or Qodo
 - **Multi-language**: Qodo or CodeAnt AI (30+ languages)
 
@@ -284,7 +284,7 @@ jobs:
           codeql database create codeql-db --language=javascript,python
           semgrep scan --config=auto --sarif --output=semgrep.sarif
 
-      - name: AI-Enhanced Review (GPT-4)
+      - name: AI-Enhanced Review (GPT-5)
         env:
           OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
         run: |
@@ -417,7 +417,7 @@ if __name__ == '__main__':
 Comprehensive AI code review combining:
 
 1. Multi-tool static analysis (SonarQube, CodeQL, Semgrep)
-2. State-of-the-art LLMs (GPT-4, Claude 3.5 Sonnet)
+2. State-of-the-art LLMs (GPT-5, Claude 4.5 Sonnet)
 3. Seamless CI/CD integration (GitHub Actions, GitLab, Azure DevOps)
 4. 30+ language support with language-specific linters
 5. Actionable review comments with severity and fix examples
diff --git a/plugins/llm-application-dev/agents/ai-engineer.md b/plugins/llm-application-dev/agents/ai-engineer.md
index d84b45c..fbda80b 100644
--- a/plugins/llm-application-dev/agents/ai-engineer.md
+++ b/plugins/llm-application-dev/agents/ai-engineer.md
@@ -13,7 +13,7 @@ Expert AI engineer specializing in LLM application development, RAG systems, and
 
 ### LLM Integration & Model Management
 - OpenAI GPT-4o/4o-mini, o1-preview, o1-mini with function calling and structured outputs
-- Anthropic Claude 3.5 Sonnet, Claude 3 Haiku/Opus with tool use and computer use
+- Anthropic Claude 4.5 Sonnet/Haiku, Claude 4.1 Opus with tool use and computer use
 - Open-source models: Llama 3.1/3.2, Mixtral 8x7B/8x22B, Qwen 2.5, DeepSeek-V2
 - Local deployment with Ollama, vLLM, TGI (Text Generation Inference)
 - Model serving with TorchServe, MLflow, BentoML for production deployment
@@ -68,7 +68,7 @@ Expert AI engineer specializing in LLM application development, RAG systems, and
 - Observability: logging, metrics, tracing with LangSmith, Phoenix, Weights & Biases
 
 ### Multimodal AI Integration
-- Vision models: GPT-4V, Claude 3 Vision, LLaVA, CLIP for image understanding
+- Vision models: GPT-4V, Claude 4 Vision, LLaVA, CLIP for image understanding
 - Audio processing: Whisper for speech-to-text, ElevenLabs for text-to-speech
 - Document AI: OCR, table extraction, layout understanding with models like LayoutLM
 - Video analysis and processing for multimedia applications
@@ -111,7 +111,7 @@ Expert AI engineer specializing in LLM application development, RAG systems, and
 - Balances cutting-edge techniques with proven, stable solutions
 
 ## Knowledge Base
-- Latest LLM developments and model capabilities (GPT-4o, Claude 3.5, Llama 3.2)
+- Latest LLM developments and model capabilities (GPT-4o, Claude 4.5, Llama 3.2)
 - Modern vector database architectures and optimization techniques
 - Production AI system design patterns and best practices
 - AI safety and security considerations for enterprise deployments
diff --git a/plugins/llm-application-dev/agents/prompt-engineer.md b/plugins/llm-application-dev/agents/prompt-engineer.md
index aba9c33..f209499 100644
--- a/plugins/llm-application-dev/agents/prompt-engineer.md
+++ b/plugins/llm-application-dev/agents/prompt-engineer.md
@@ -53,7 +53,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
 - Multi-turn conversation management
 - Image and multimodal prompt engineering
 
-#### Anthropic Claude (3.5 Sonnet, Haiku, Opus)
+#### Anthropic Claude (4.5 Sonnet, Haiku, Opus)
 - Constitutional AI alignment with Claude's training
 - Tool use optimization for complex workflows
 - Computer use prompting for automation tasks
diff --git a/plugins/llm-application-dev/commands/prompt-optimize.md b/plugins/llm-application-dev/commands/prompt-optimize.md
index 1b46e49..0afc30b 100644
--- a/plugins/llm-application-dev/commands/prompt-optimize.md
+++ b/plugins/llm-application-dev/commands/prompt-optimize.md
@@ -113,7 +113,7 @@ Final Response: [Refined]
 
 ### 5. Model-Specific Optimization
 
-**GPT-4/GPT-4o**
+**GPT-5/GPT-4o**
 ```python
 gpt4_optimized = """
 ##CONTEXT##
@@ -136,7 +136,7 @@ gpt4_optimized = """
 """
 ```
 
-**Claude 3.5/4**
+**Claude 4.5/4**
 ```python
 claude_optimized = """
 
@@ -566,7 +566,7 @@ testing_recommendations:
   metrics: ["accuracy", "satisfaction", "cost"]
 
 deployment_strategy:
-  model: "GPT-4 for quality, Claude for safety"
+  model: "GPT-5 for quality, Claude for safety"
   temperature: 0.7
   max_tokens: 2000
   monitoring: "Track success, latency, feedback"
diff --git a/plugins/llm-application-dev/skills/llm-evaluation/SKILL.md b/plugins/llm-application-dev/skills/llm-evaluation/SKILL.md
index 06a1e71..23638d9 100644
--- a/plugins/llm-application-dev/skills/llm-evaluation/SKILL.md
+++ b/plugins/llm-application-dev/skills/llm-evaluation/SKILL.md
@@ -186,7 +186,7 @@ def calculate_factuality(claim, knowledge_base):
 ### Single Output Evaluation
 ```python
 def llm_judge_quality(response, question):
-    """Use GPT-4 to judge response quality."""
+    """Use GPT-5 to judge response quality."""
     prompt = f"""Rate the following response on a scale of 1-10 for:
 1. Accuracy (factually correct)
 2. Helpfulness (answers the question)
@@ -205,7 +205,7 @@ Provide ratings in JSON format:
 """
 
     result = openai.ChatCompletion.create(
-        model="gpt-4",
+        model="gpt-5",
         messages=[{"role": "user", "content": prompt}],
         temperature=0
     )
@@ -236,7 +236,7 @@ Answer with JSON:
 """
 
     result = openai.ChatCompletion.create(
-        model="gpt-4",
+        model="gpt-5",
         messages=[{"role": "user", "content": prompt}],
         temperature=0
     )
diff --git a/plugins/llm-application-dev/skills/prompt-engineering-patterns/references/chain-of-thought.md b/plugins/llm-application-dev/skills/prompt-engineering-patterns/references/chain-of-thought.md
index 4f48d4d..459a361 100644
--- a/plugins/llm-application-dev/skills/prompt-engineering-patterns/references/chain-of-thought.md
+++ b/plugins/llm-application-dev/skills/prompt-engineering-patterns/references/chain-of-thought.md
@@ -65,7 +65,7 @@ def self_consistency_cot(query, n=5, temperature=0.7):
     responses = []
     for _ in range(n):
         response = openai.ChatCompletion.create(
-            model="gpt-4",
+            model="gpt-5",
             messages=[{"role": "user", "content": prompt}],
             temperature=temperature
         )
diff --git a/plugins/performance-testing-review/commands/ai-review.md b/plugins/performance-testing-review/commands/ai-review.md
index db5de35..26c9657 100644
--- a/plugins/performance-testing-review/commands/ai-review.md
+++ b/plugins/performance-testing-review/commands/ai-review.md
@@ -1,6 +1,6 @@
 # AI-Powered Code Review Specialist
 
-You are an expert AI-powered code review specialist combining automated static analysis, intelligent pattern recognition, and modern DevOps practices. Leverage AI tools (GitHub Copilot, Qodo, GPT-4, Claude 3.5 Sonnet) with battle-tested platforms (SonarQube, CodeQL, Semgrep) to identify bugs, vulnerabilities, and performance issues.
+You are an expert AI-powered code review specialist combining automated static analysis, intelligent pattern recognition, and modern DevOps practices. Leverage AI tools (GitHub Copilot, Qodo, GPT-5, Claude 4.5 Sonnet) with battle-tested platforms (SonarQube, CodeQL, Semgrep) to identify bugs, vulnerabilities, and performance issues.
 
 ## Context
 
@@ -30,7 +30,7 @@ Execute in parallel:
 ### AI-Assisted Review
 
 ```python
-# Context-aware review prompt for Claude 3.5 Sonnet
+# Context-aware review prompt for Claude 4.5 Sonnet
 review_prompt = f"""
 You are reviewing a pull request for a {language} {project_type} application.
 
@@ -59,8 +59,8 @@ Format as JSON array.
 ```
 
 ### Model Selection (2025)
-- **Fast reviews (<200 lines)**: GPT-4o-mini or Claude 3.5 Sonnet
-- **Deep reasoning**: Claude 3.7 Sonnet or GPT-4.5 (200K+ tokens)
+- **Fast reviews (<200 lines)**: GPT-4o-mini or Claude 4.5 Haiku
+- **Deep reasoning**: Claude 4.5 Sonnet or GPT-4.5 (200K+ tokens)
 - **Code generation**: GitHub Copilot or Qodo
 - **Multi-language**: Qodo or CodeAnt AI (30+ languages)
 
@@ -284,7 +284,7 @@ jobs:
           codeql database create codeql-db --language=javascript,python
           semgrep scan --config=auto --sarif --output=semgrep.sarif
 
-      - name: AI-Enhanced Review (GPT-4)
+      - name: AI-Enhanced Review (GPT-5)
         env:
           OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
         run: |
@@ -417,7 +417,7 @@ if __name__ == '__main__':
 Comprehensive AI code review combining:
 
 1. Multi-tool static analysis (SonarQube, CodeQL, Semgrep)
-2. State-of-the-art LLMs (GPT-4, Claude 3.5 Sonnet)
+2. State-of-the-art LLMs (GPT-5, Claude 4.5 Sonnet)
 3. Seamless CI/CD integration (GitHub Actions, GitLab, Azure DevOps)
 4. 30+ language support with language-specific linters
 5. Actionable review comments with severity and fix examples