mirror of
https://github.com/wshobson/agents.git
synced 2026-03-18 09:37:15 +00:00
Replace GPT and Claude models to latest, better and cheaper models (#118)
* Updated GPT and Claude models to latest, better and cheaper models
* Updated more files to use GPT-5 and Sonnet/Haiku 4.5 because they are the latest, cheaper and better models
@@ -132,9 +132,9 @@ class CostOptimizer:
 self.token_budget = 100000 # Monthly budget
 self.token_usage = 0
 self.model_costs = {
-'gpt-4': 0.03,
-'claude-3-sonnet': 0.015,
-'claude-3-haiku': 0.0025
+'gpt-5': 0.03,
+'claude-4-sonnet': 0.015,
+'claude-4-haiku': 0.0025
 }

 def select_optimal_model(self, complexity):
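The hunk above shows only the cost table and the signature of `select_optimal_model`; the method body is not part of the diff. A minimal sketch of how such a method might choose a model follows. It assumes the costs are USD per 1K tokens and that `complexity` is a float in [0, 1]. The 0.3 / 0.7 thresholds, the `estimated_tokens` parameter, and the `remaining_tokens` and `record_usage` helpers are hypothetical and do not appear in the commit.

```python
class CostOptimizer:
    """Hypothetical sketch: pick a model from complexity and remaining budget."""

    def __init__(self, token_budget=100_000):
        self.token_budget = token_budget  # Monthly budget, as in the diff
        self.token_usage = 0
        # Costs copied from the diff; the unit (USD per 1K tokens) is assumed.
        self.model_costs = {
            "gpt-5": 0.03,
            "claude-4-sonnet": 0.015,
            "claude-4-haiku": 0.0025,
        }

    def remaining_tokens(self):
        """Tokens left in the monthly budget, never negative."""
        return max(self.token_budget - self.token_usage, 0)

    def record_usage(self, tokens):
        """Add the tokens consumed by a finished request to the running total."""
        self.token_usage += tokens

    def select_optimal_model(self, complexity, estimated_tokens=1_000):
        """Return a model name; the 0.3 / 0.7 thresholds are illustrative only."""
        cheapest = min(self.model_costs, key=self.model_costs.get)
        # Fall back to the cheapest model when the request would exceed the budget.
        if estimated_tokens > self.remaining_tokens():
            return cheapest
        if complexity < 0.3:
            return cheapest
        if complexity < 0.7:
            return "claude-4-sonnet"
        return "gpt-5"


optimizer = CostOptimizer()
print(optimizer.select_optimal_model(0.1))  # claude-4-haiku
print(optimizer.select_optimal_model(0.9))  # gpt-5
```

Keeping the budget check ahead of the complexity check means an exhausted budget always degrades to the cheapest model rather than failing the request.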
@@ -1,6 +1,6 @@
 # AI-Powered Code Review Specialist

-You are an expert AI-powered code review specialist combining automated static analysis, intelligent pattern recognition, and modern DevOps practices. Leverage AI tools (GitHub Copilot, Qodo, GPT-4, Claude 3.5 Sonnet) with battle-tested platforms (SonarQube, CodeQL, Semgrep) to identify bugs, vulnerabilities, and performance issues.
+You are an expert AI-powered code review specialist combining automated static analysis, intelligent pattern recognition, and modern DevOps practices. Leverage AI tools (GitHub Copilot, Qodo, GPT-5, Claude 4.5 Sonnet) with battle-tested platforms (SonarQube, CodeQL, Semgrep) to identify bugs, vulnerabilities, and performance issues.

 ## Context

@@ -30,7 +30,7 @@ Execute in parallel:

 ### AI-Assisted Review
 ```python
-# Context-aware review prompt for Claude 3.5 Sonnet
+# Context-aware review prompt for Claude 4.5 Sonnet
 review_prompt = f"""
 You are reviewing a pull request for a {language} {project_type} application.

@@ -59,8 +59,8 @@ Format as JSON array.
 ```

 ### Model Selection (2025)
-- **Fast reviews (<200 lines)**: GPT-4o-mini or Claude 3.5 Sonnet
-- **Deep reasoning**: Claude 3.7 Sonnet or GPT-4.5 (200K+ tokens)
+- **Fast reviews (<200 lines)**: GPT-4o-mini or Claude 4.5 Haiku
+- **Deep reasoning**: Claude 4.5 Sonnet or GPT-5 (200K+ tokens)
 - **Code generation**: GitHub Copilot or Qodo
 - **Multi-language**: Qodo or CodeAnt AI (30+ languages)

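The updated Model Selection list routes small reviews to a faster model and deep reasoning to a larger one. A rough sketch of that routing is shown below, assuming "lines" means added or removed lines in a unified diff and that any diff at or above the limit is treated as a deep-reasoning case. The function names, the `needs_deep_reasoning` flag, and the returned display names are illustrative and are not API model identifiers from the commit.

```python
def count_changed_lines(diff_text: str) -> int:
    """Count added/removed lines in a unified diff, skipping file headers.

    Simplification: a removed line whose own content starts with "--" looks
    like a "---" header and is skipped as well.
    """
    changed = 0
    for line in diff_text.splitlines():
        if line.startswith(("+++", "---")):
            continue
        if line.startswith(("+", "-")):
            changed += 1
    return changed


def choose_review_model(diff_text, needs_deep_reasoning=False, fast_limit=200):
    """Pick a reviewer tier; fast_limit mirrors the '<200 lines' rule in the list."""
    if needs_deep_reasoning or count_changed_lines(diff_text) >= fast_limit:
        return "Claude 4.5 Sonnet"
    return "Claude 4.5 Haiku"


sample_diff = "--- a/app.py\n+++ b/app.py\n@@ -1 +1 @@\n-old\n+new\n context\n"
print(count_changed_lines(sample_diff))   # 2
print(choose_review_model(sample_diff))   # Claude 4.5 Haiku
```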
@@ -284,7 +284,7 @@ jobs:
 codeql database create codeql-db --language=javascript,python
 semgrep scan --config=auto --sarif --output=semgrep.sarif

-- name: AI-Enhanced Review (GPT-4)
+- name: AI-Enhanced Review (GPT-5)
 env:
 OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
 run: |
@@ -417,7 +417,7 @@ if __name__ == '__main__':

 Comprehensive AI code review combining:
 1. Multi-tool static analysis (SonarQube, CodeQL, Semgrep)
-2. State-of-the-art LLMs (GPT-4, Claude 3.5 Sonnet)
+2. State-of-the-art LLMs (GPT-5, Claude 4.5 Sonnet)
 3. Seamless CI/CD integration (GitHub Actions, GitLab, Azure DevOps)
 4. 30+ language support with language-specific linters
 5. Actionable review comments with severity and fix examples
@@ -13,7 +13,7 @@ Expert AI engineer specializing in LLM application development, RAG systems, and

 ### LLM Integration & Model Management
 - OpenAI GPT-4o/4o-mini, o1-preview, o1-mini with function calling and structured outputs
-- Anthropic Claude 3.5 Sonnet, Claude 3 Haiku/Opus with tool use and computer use
+- Anthropic Claude 4.5 Sonnet/Haiku, Claude 4.1 Opus with tool use and computer use
 - Open-source models: Llama 3.1/3.2, Mixtral 8x7B/8x22B, Qwen 2.5, DeepSeek-V2
 - Local deployment with Ollama, vLLM, TGI (Text Generation Inference)
 - Model serving with TorchServe, MLflow, BentoML for production deployment
@@ -68,7 +68,7 @@ Expert AI engineer specializing in LLM application development, RAG systems, and
 - Observability: logging, metrics, tracing with LangSmith, Phoenix, Weights & Biases

 ### Multimodal AI Integration
-- Vision models: GPT-4V, Claude 3 Vision, LLaVA, CLIP for image understanding
+- Vision models: GPT-4V, Claude 4 Vision, LLaVA, CLIP for image understanding
 - Audio processing: Whisper for speech-to-text, ElevenLabs for text-to-speech
 - Document AI: OCR, table extraction, layout understanding with models like LayoutLM
 - Video analysis and processing for multimedia applications
@@ -111,7 +111,7 @@ Expert AI engineer specializing in LLM application development, RAG systems, and
 - Balances cutting-edge techniques with proven, stable solutions

 ## Knowledge Base
-- Latest LLM developments and model capabilities (GPT-4o, Claude 3.5, Llama 3.2)
+- Latest LLM developments and model capabilities (GPT-4o, Claude 4.5, Llama 3.2)
 - Modern vector database architectures and optimization techniques
 - Production AI system design patterns and best practices
 - AI safety and security considerations for enterprise deployments
@@ -53,7 +53,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
 - Multi-turn conversation management
 - Image and multimodal prompt engineering

-#### Anthropic Claude (3.5 Sonnet, Haiku, Opus)
+#### Anthropic Claude (4.5 Sonnet, Haiku, Opus)
 - Constitutional AI alignment with Claude's training
 - Tool use optimization for complex workflows
 - Computer use prompting for automation tasks
@@ -113,7 +113,7 @@ Final Response: [Refined]

 ### 5. Model-Specific Optimization

-**GPT-4/GPT-4o**
+**GPT-5/GPT-4o**
 ```python
 gpt4_optimized = """
 ##CONTEXT##
@@ -136,7 +136,7 @@ gpt4_optimized = """
 """
 ```

-**Claude 3.5/4**
+**Claude 4.5/4**
 ```python
 claude_optimized = """
 <context>
@@ -566,7 +566,7 @@ testing_recommendations:
 metrics: ["accuracy", "satisfaction", "cost"]

 deployment_strategy:
-model: "GPT-4 for quality, Claude for safety"
+model: "GPT-5 for quality, Claude for safety"
 temperature: 0.7
 max_tokens: 2000
 monitoring: "Track success, latency, feedback"
@@ -186,7 +186,7 @@ def calculate_factuality(claim, knowledge_base):
 ### Single Output Evaluation
 ```python
 def llm_judge_quality(response, question):
-"""Use GPT-4 to judge response quality."""
+"""Use GPT-5 to judge response quality."""
 prompt = f"""Rate the following response on a scale of 1-10 for:
 1. Accuracy (factually correct)
 2. Helpfulness (answers the question)
@@ -205,7 +205,7 @@ Provide ratings in JSON format:
 """

 result = openai.ChatCompletion.create(
-model="gpt-4",
+model="gpt-5",
 messages=[{"role": "user", "content": prompt}],
 temperature=0
 )
@@ -236,7 +236,7 @@ Answer with JSON:
 """

 result = openai.ChatCompletion.create(
-model="gpt-4",
+model="gpt-5",
 messages=[{"role": "user", "content": prompt}],
 temperature=0
 )
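The judge hunks above change only the model name and keep `openai.ChatCompletion.create`, which is the pre-1.0 interface of the `openai` package; releases from 1.0 onward expose the same operation as `client.chat.completions.create`. The sketch below sidesteps the client version by taking the model call as an injected callable, so the prompt building and rating validation can be exercised offline. The `complete` parameter, `build_judge_prompt`, and the JSON key names are hypothetical, and only the two criteria visible in the diff are used because the rest of the prompt is elided there.

```python
import json


def build_judge_prompt(response: str, question: str) -> str:
    """Build a rating prompt from the two criteria visible in the diff."""
    return (
        "Rate the following response on a scale of 1-10 for:\n"
        "1. Accuracy (factually correct)\n"
        "2. Helpfulness (answers the question)\n\n"
        f"Question: {question}\n"
        f"Response: {response}\n\n"
        'Provide ratings in JSON format, e.g. {"accuracy": 7, "helpfulness": 8}'
    )


def llm_judge_quality(response, question, complete):
    """`complete` is a hypothetical callable: prompt text in, model reply text out."""
    raw = complete(build_judge_prompt(response, question))
    ratings = json.loads(raw)
    if not isinstance(ratings, dict):
        raise ValueError("judge reply is not a JSON object")
    # Reject replies that are missing a criterion or use an out-of-range score.
    for key in ("accuracy", "helpfulness"):
        score = ratings.get(key)
        if not isinstance(score, (int, float)) or not 1 <= score <= 10:
            raise ValueError(f"invalid {key!r} rating: {score!r}")
    return ratings


def fake_complete(prompt):
    """Stand-in for a real model call, used only for this offline example."""
    return '{"accuracy": 9, "helpfulness": 8}'


print(llm_judge_quality("Paris", "What is the capital of France?", fake_complete))
```

With `openai` 1.0 or later, `complete` could wrap `client.chat.completions.create(model=..., messages=[{"role": "user", "content": prompt}])` and return `choices[0].message.content`.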
@@ -65,7 +65,7 @@ def self_consistency_cot(query, n=5, temperature=0.7):
 responses = []
 for _ in range(n):
 response = openai.ChatCompletion.create(
-model="gpt-4",
+model="gpt-5",
 messages=[{"role": "user", "content": prompt}],
 temperature=temperature
 )
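The `self_consistency_cot` hunk shows only the sampling loop. The aggregation step that self-consistency relies on, a majority vote over the sampled final answers, can be sketched as follows. This is an illustration under stated assumptions, not the file's actual code: the `sample` callable replaces the direct API call (and owns the temperature setting), and the `Answer:` marker convention in `extract_answer` is hypothetical.

```python
from collections import Counter


def extract_answer(text: str) -> str:
    """Assumed convention: the final answer follows the last 'Answer:' marker."""
    marker = "Answer:"
    if marker in text:
        return text.rsplit(marker, 1)[1].strip()
    return text.strip()


def self_consistency_cot(query, sample, n=5):
    """Sample n reasoning paths and return (majority answer, agreement ratio)."""
    answers = [extract_answer(sample(query)) for _ in range(n)]
    answer, votes = Counter(answers).most_common(1)[0]
    return answer, votes / n


# Offline stand-in for a sampled model: three paths agree, two disagree.
canned = iter([
    "6 * 7 = 42. Answer: 42",
    "7 * 6 is 41? Answer: 41",
    "Six sevens. Answer: 42",
    "Answer: 42",
    "Answer: 41",
])
print(self_consistency_cot("What is 6 * 7?", lambda q: next(canned)))  # ('42', 0.6)
```

Returning the agreement ratio alongside the answer lets the caller treat low-agreement results as uncertain.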
@@ -1,6 +1,6 @@
 # AI-Powered Code Review Specialist

-You are an expert AI-powered code review specialist combining automated static analysis, intelligent pattern recognition, and modern DevOps practices. Leverage AI tools (GitHub Copilot, Qodo, GPT-4, Claude 3.5 Sonnet) with battle-tested platforms (SonarQube, CodeQL, Semgrep) to identify bugs, vulnerabilities, and performance issues.
+You are an expert AI-powered code review specialist combining automated static analysis, intelligent pattern recognition, and modern DevOps practices. Leverage AI tools (GitHub Copilot, Qodo, GPT-5, Claude 4.5 Sonnet) with battle-tested platforms (SonarQube, CodeQL, Semgrep) to identify bugs, vulnerabilities, and performance issues.

 ## Context

@@ -30,7 +30,7 @@ Execute in parallel:

 ### AI-Assisted Review
 ```python
-# Context-aware review prompt for Claude 3.5 Sonnet
+# Context-aware review prompt for Claude 4.5 Sonnet
 review_prompt = f"""
 You are reviewing a pull request for a {language} {project_type} application.

@@ -59,8 +59,8 @@ Format as JSON array.
 ```

 ### Model Selection (2025)
-- **Fast reviews (<200 lines)**: GPT-4o-mini or Claude 3.5 Sonnet
-- **Deep reasoning**: Claude 3.7 Sonnet or GPT-4.5 (200K+ tokens)
+- **Fast reviews (<200 lines)**: GPT-4o-mini or Claude 4.5 Haiku
+- **Deep reasoning**: Claude 4.5 Sonnet or GPT-4.5 (200K+ tokens)
 - **Code generation**: GitHub Copilot or Qodo
 - **Multi-language**: Qodo or CodeAnt AI (30+ languages)

@@ -284,7 +284,7 @@ jobs:
 codeql database create codeql-db --language=javascript,python
 semgrep scan --config=auto --sarif --output=semgrep.sarif

-- name: AI-Enhanced Review (GPT-4)
+- name: AI-Enhanced Review (GPT-5)
 env:
 OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
 run: |
@@ -417,7 +417,7 @@ if __name__ == '__main__':

 Comprehensive AI code review combining:
 1. Multi-tool static analysis (SonarQube, CodeQL, Semgrep)
-2. State-of-the-art LLMs (GPT-4, Claude 3.5 Sonnet)
+2. State-of-the-art LLMs (GPT-5, Claude 4.5 Sonnet)
 3. Seamless CI/CD integration (GitHub Actions, GitLab, Azure DevOps)
 4. 30+ language support with language-specific linters
 5. Actionable review comments with severity and fix examples