mirror of
https://github.com/wshobson/agents.git
synced 2026-03-18 01:27:16 +00:00
Compare commits
2 Commits
5d65aa1063
...
682abfcdeb
| Author | SHA1 | Date |
|---|---|---|
| | 682abfcdeb | |
| | 086557180a | |
@@ -6,8 +6,8 @@
    "url": "https://github.com/wshobson"
  },
  "metadata": {
    "description": "Production-ready workflow orchestration with 73 focused plugins, 112 specialized agents, and 146 skills - optimized for granular installation and minimal token usage",
    "version": "1.5.1"
    "description": "Production-ready workflow orchestration with 72 focused plugins, 112 specialized agents, and 146 skills - optimized for granular installation and minimal token usage",
    "version": "1.5.3"
  },
  "plugins": [
    {
@@ -114,19 +114,6 @@
      "license": "MIT",
      "category": "workflows"
    },
    {
      "name": "code-review-ai",
      "source": "./plugins/code-review-ai",
      "description": "AI-powered architectural review and code quality analysis",
      "version": "1.2.0",
      "author": {
        "name": "Seth Hobson",
        "email": "seth@major7apps.com"
      },
      "homepage": "https://github.com/wshobson/agents",
      "license": "MIT",
      "category": "quality"
    },
    {
      "name": "code-refactoring",
      "source": "./plugins/code-refactoring",
@@ -181,8 +168,8 @@
    },
    {
      "name": "llm-application-dev",
      "description": "LLM application development with LangGraph, RAG systems, vector search, and AI agent architectures for Claude 4.5 and GPT-5.2",
      "version": "2.0.3",
      "description": "LLM application development with LangGraph, RAG systems, vector search, and AI agent architectures for Claude 4.6 and GPT-5.2",
      "version": "2.0.4",
      "author": {
        "name": "Seth Hobson",
        "email": "seth@major7apps.com"
@@ -196,7 +183,7 @@
      "name": "agent-orchestration",
      "source": "./plugins/agent-orchestration",
      "description": "Multi-agent system optimization, agent improvement workflows, and context management",
      "version": "1.2.0",
      "version": "1.2.1",
      "author": {
        "name": "Seth Hobson",
        "email": "seth@major7apps.com"
@@ -404,7 +391,7 @@
      "name": "performance-testing-review",
      "source": "./plugins/performance-testing-review",
      "description": "Performance analysis, test coverage review, and AI-powered code quality assessment",
      "version": "1.2.0",
      "version": "1.2.1",
      "author": {
        "name": "Seth Hobson",
        "email": "seth@major7apps.com"
32
README.md
@@ -1,18 +1,18 @@
# Claude Code Plugins: Orchestration and Automation

> **⚡ Updated for Opus 4.5, Sonnet 4.5 & Haiku 4.5** — Three-tier model strategy for optimal performance
> **⚡ Updated for Opus 4.6, Sonnet 4.6 & Haiku 4.5** — Three-tier model strategy for optimal performance

[](https://smithery.ai/skills?ns=wshobson&utm_source=github&utm_medium=badge)

> **🎯 Agent Skills Enabled** — 146 specialized skills extend Claude's capabilities across plugins with progressive disclosure

A comprehensive production-ready system combining **112 specialized AI agents**, **16 multi-agent workflow orchestrators**, **146 agent skills**, and **79 development tools** organized into **73 focused, single-purpose plugins** for [Claude Code](https://docs.claude.com/en/docs/claude-code/overview).
A comprehensive production-ready system combining **112 specialized AI agents**, **16 multi-agent workflow orchestrators**, **146 agent skills**, and **79 development tools** organized into **72 focused, single-purpose plugins** for [Claude Code](https://docs.claude.com/en/docs/claude-code/overview).

## Overview

This unified repository provides everything needed for intelligent automation and multi-agent orchestration across modern software development:

- **73 Focused Plugins** - Granular, single-purpose plugins optimized for minimal token usage and composability
- **72 Focused Plugins** - Granular, single-purpose plugins optimized for minimal token usage and composability
- **112 Specialized Agents** - Domain experts with deep knowledge across architecture, languages, infrastructure, quality, data/AI, documentation, business operations, and SEO
- **146 Agent Skills** - Modular knowledge packages with progressive disclosure for specialized expertise
- **16 Workflow Orchestrators** - Multi-agent coordination systems for complex operations like full-stack development, security hardening, ML pipelines, and incident response
@@ -20,7 +20,7 @@ This unified repository provides everything needed for intelligent automation an

### Key Features

- **Granular Plugin Architecture**: 73 focused plugins optimized for minimal token usage
- **Granular Plugin Architecture**: 72 focused plugins optimized for minimal token usage
- **Comprehensive Tooling**: 79 development tools including test generation, scaffolding, and security scanning
- **100% Agent Coverage**: All plugins include specialized agents
- **Agent Skills**: 146 specialized skills following for progressive disclosure and token efficiency
@@ -49,7 +49,7 @@ Add this marketplace to Claude Code:
/plugin marketplace add wshobson/agents
```

This makes all 73 plugins available for installation, but **does not load any agents or tools** into your context.
This makes all 72 plugins available for installation, but **does not load any agents or tools** into your context.

### Step 2: Install Plugins

@@ -73,7 +73,7 @@ Install the plugins you need:

# Security & quality
/plugin install security-scanning # SAST with security skill
/plugin install code-review-ai # AI-powered code review
/plugin install comprehensive-review # Multi-perspective code analysis

# Full-stack orchestration
/plugin install full-stack-orchestration # Multi-agent workflows
@@ -114,7 +114,7 @@ rm -rf ~/.claude/plugins/cache/claude-code-workflows && rm ~/.claude/plugins/ins

### Core Guides

- **[Plugin Reference](docs/plugins.md)** - Complete catalog of all 73 plugins
- **[Plugin Reference](docs/plugins.md)** - Complete catalog of all 72 plugins
- **[Agent Reference](docs/agents.md)** - All 112 agents organized by category
- **[Agent Skills](docs/agent-skills.md)** - 146 specialized skills with progressive disclosure
- **[Usage Guide](docs/usage.md)** - Commands, workflows, and best practices
@@ -203,14 +203,14 @@ Strategic model assignment for optimal performance and cost:

| Tier | Model | Agents | Use Case |
| ---------- | -------- | ------ | ----------------------------------------------------------------------------------------------- |
| **Tier 1** | Opus 4.5 | 42 | Critical architecture, security, ALL code review, production coding (language pros, frameworks) |
| **Tier 1** | Opus 4.6 | 42 | Critical architecture, security, ALL code review, production coding (language pros, frameworks) |
| **Tier 2** | Inherit | 42 | Complex tasks - user chooses model (AI/ML, backend, frontend/mobile, specialized) |
| **Tier 3** | Sonnet | 51 | Support with intelligence (docs, testing, debugging, network, API docs, DX, legacy, payments) |
| **Tier 4** | Haiku | 18 | Fast operational tasks (SEO, deployment, simple docs, sales, content, search) |

**Why Opus 4.5 for Critical Agents?**
**Why Opus 4.6 for Critical Agents?**

- 80.9% on SWE-bench (industry-leading)
- 80.8% on SWE-bench (industry-leading)
- 65% fewer tokens for complex tasks
- Best for architecture decisions and security audits

@@ -218,14 +218,14 @@ Strategic model assignment for optimal performance and cost:
Agents marked `inherit` use your session's default model, letting you balance cost and capability:

- Set via `claude --model opus` or `claude --model sonnet` when starting a session
- Falls back to Sonnet 4.5 if no default specified
- Falls back to Sonnet 4.6 if no default specified
- Perfect for frontend/mobile developers who want cost control
- AI/ML engineers can choose Opus for complex model work

**Cost Considerations:**

- **Opus 4.5**: $5/$25 per million input/output tokens - Premium for critical work
- **Sonnet 4.5**: $3/$15 per million tokens - Balanced performance/cost
- **Opus 4.6**: $5/$25 per million input/output tokens - Premium for critical work
- **Sonnet 4.6**: $3/$15 per million tokens - Balanced performance/cost
- **Haiku 4.5**: $1/$5 per million tokens - Fast, cost-effective operations
- Opus's 65% token reduction on complex tasks often offsets higher rate
- Use `inherit` tier to control costs for high-volume use cases
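
The claim that Opus's token reduction can offset its higher rate can be checked with simple arithmetic. The sketch below uses the per-million-token prices listed above; the dictionary keys are illustrative labels (not API model IDs), the workload size is hypothetical, and it assumes the 65% reduction applies equally to input and output tokens.

```python
# Prices in USD per million tokens (input, output), taken from the list above.
# The keys are illustrative labels, not official model identifiers.
PRICES = {
    "opus-4.6": (5.0, 25.0),
    "sonnet-4.6": (3.0, 15.0),
    "haiku-4.5": (1.0, 5.0),
}


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD of a workload on the given tier."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Hypothetical workload: 1M input + 1M output tokens when run on Sonnet.
sonnet_cost = cost_usd("sonnet-4.6", 1_000_000, 1_000_000)

# Assumption: Opus needs 65% fewer tokens on both sides for the same task.
opus_cost = cost_usd("opus-4.6", 350_000, 350_000)

print(f"Sonnet: ${sonnet_cost:.2f}, Opus: ${opus_cost:.2f}")
```

At these prices Opus costs 5/3 as much per token as Sonnet, so under the same assumption any token reduction above 40% makes the Opus run cheaper.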
@@ -283,13 +283,13 @@ Uses kubernetes-architect agent with 4 specialized skills for production-grade c

## Plugin Categories

**24 categories, 73 plugins:**
**24 categories, 72 plugins:**

- 🎨 **Development** (4) - debugging, backend, frontend, multi-platform
- 📚 **Documentation** (3) - code docs, API specs, diagrams, C4 architecture
- 🔄 **Workflows** (5) - git, full-stack, TDD, **Conductor** (context-driven development), **Agent Teams** (multi-agent orchestration)
- ✅ **Testing** (2) - unit testing, TDD workflows
- 🔍 **Quality** (3) - code review, comprehensive review, performance
- 🔍 **Quality** (2) - comprehensive review, performance
- 🤖 **AI & ML** (4) - LLM apps, agent orchestration, context, MLOps
- 📊 **Data** (2) - data engineering, data validation
- 🗄️ **Database** (2) - database design, migrations
@@ -330,7 +330,7 @@ Three-tier architecture for token efficiency:
```
claude-agents/
├── .claude-plugin/
│   └── marketplace.json # 73 plugins
│   └── marketplace.json # 72 plugins
├── plugins/
│   ├── python-development/
│   │   ├── agents/ # 3 Python experts
@@ -334,7 +334,7 @@ Feature Development Workflow:
1. backend-development:feature-development
2. security-scanning:security-hardening
3. unit-testing:test-generate
4. code-review-ai:ai-review
4. comprehensive-review:full-review
5. cicd-automation:workflow-automate
6. observability-monitoring:monitor-setup
```

@@ -1,6 +1,6 @@
# Complete Plugin Reference

Browse all **72 focused, single-purpose plugins** organized by category.
Browse all **71 focused, single-purpose plugins** organized by category.

## Quick Start - Essential Plugins

@@ -68,14 +68,6 @@ Multi-agent coordination from backend → frontend → testing → security →

Generate pytest (Python) and Jest (JavaScript) unit tests automatically with comprehensive edge case coverage.

**code-review-ai** - AI-powered code review

```bash
/plugin install code-review-ai
```

Architectural analysis, security assessment, and code quality review with actionable feedback.

### Infrastructure & Operations

**cloud-infrastructure** - Cloud architecture design
@@ -150,11 +142,10 @@ Next.js, React + Vite, and Node.js project setup with pnpm and TypeScript best p
| **unit-testing** | Automated unit test generation (Python/JavaScript) | `/plugin install unit-testing` |
| **tdd-workflows** | Test-driven development methodology | `/plugin install tdd-workflows` |

### 🔍 Quality (3 plugins)
### 🔍 Quality (2 plugins)

| Plugin | Description | Install |
| ------------------------------ | --------------------------------------------- | -------------------------------------------- |
| **code-review-ai** | AI-powered architectural review | `/plugin install code-review-ai` |
| **comprehensive-review** | Multi-perspective code analysis | `/plugin install comprehensive-review` |
| **performance-testing-review** | Performance analysis and test coverage review | `/plugin install performance-testing-review` |

@@ -70,7 +70,6 @@ Claude Code automatically selects and coordinates the appropriate agents based o

| Command | Description |
| ----------------------------------- | -------------------------- |
| `/code-review-ai:ai-review` | AI-powered code review |
| `/comprehensive-review:full-review` | Multi-perspective analysis |
| `/comprehensive-review:pr-enhance` | Enhance pull requests |

@@ -361,7 +360,7 @@ Compose multiple plugins for complex scenarios:
/unit-testing:test-generate

# 4. Review the implementation
/code-review-ai:ai-review
/comprehensive-review:full-review

# 5. Set up CI/CD
/cicd-automation:workflow-automate
@@ -1,6 +1,6 @@
{
  "name": "agent-orchestration",
  "version": "1.2.0",
  "version": "1.2.1",
  "description": "Multi-agent system optimization, agent improvement workflows, and context management",
  "author": {
    "name": "Seth Hobson",
@@ -146,7 +146,7 @@ class CostOptimizer:
        self.token_budget = 100000  # Monthly budget
        self.token_usage = 0
        self.model_costs = {
            'gpt-5': 0.03,
            'gpt-5.2': 0.03,
            'claude-4-sonnet': 0.015,
            'claude-4-haiku': 0.0025
        }

@@ -1,10 +0,0 @@
{
  "name": "code-review-ai",
  "version": "1.2.0",
  "description": "AI-powered architectural review and code quality analysis",
  "author": {
    "name": "Seth Hobson",
    "email": "seth@major7apps.com"
  },
  "license": "MIT"
}
@@ -1,161 +0,0 @@
---
name: architect-review
description: Master software architect specializing in modern architecture patterns, clean architecture, microservices, event-driven systems, and DDD. Reviews system designs and code changes for architectural integrity, scalability, and maintainability. Use PROACTIVELY for architectural decisions.
model: opus
---

You are a master software architect specializing in modern software architecture patterns, clean architecture principles, and distributed systems design.

## Expert Purpose

Elite software architect focused on ensuring architectural integrity, scalability, and maintainability across complex distributed systems. Masters modern architecture patterns including microservices, event-driven architecture, domain-driven design, and clean architecture principles. Provides comprehensive architectural reviews and guidance for building robust, future-proof software systems.

## Capabilities

### Modern Architecture Patterns

- Clean Architecture and Hexagonal Architecture implementation
- Microservices architecture with proper service boundaries
- Event-driven architecture (EDA) with event sourcing and CQRS
- Domain-Driven Design (DDD) with bounded contexts and ubiquitous language
- Serverless architecture patterns and Function-as-a-Service design
- API-first design with GraphQL, REST, and gRPC best practices
- Layered architecture with proper separation of concerns

### Distributed Systems Design

- Service mesh architecture with Istio, Linkerd, and Consul Connect
- Event streaming with Apache Kafka, Apache Pulsar, and NATS
- Distributed data patterns including Saga, Outbox, and Event Sourcing
- Circuit breaker, bulkhead, and timeout patterns for resilience
- Distributed caching strategies with Redis Cluster and Hazelcast
- Load balancing and service discovery patterns
- Distributed tracing and observability architecture

### SOLID Principles & Design Patterns

- Single Responsibility, Open/Closed, Liskov Substitution principles
- Interface Segregation and Dependency Inversion implementation
- Repository, Unit of Work, and Specification patterns
- Factory, Strategy, Observer, and Command patterns
- Decorator, Adapter, and Facade patterns for clean interfaces
- Dependency Injection and Inversion of Control containers
- Anti-corruption layers and adapter patterns

### Cloud-Native Architecture

- Container orchestration with Kubernetes and Docker Swarm
- Cloud provider patterns for AWS, Azure, and Google Cloud Platform
- Infrastructure as Code with Terraform, Pulumi, and CloudFormation
- GitOps and CI/CD pipeline architecture
- Auto-scaling patterns and resource optimization
- Multi-cloud and hybrid cloud architecture strategies
- Edge computing and CDN integration patterns

### Security Architecture

- Zero Trust security model implementation
- OAuth2, OpenID Connect, and JWT token management
- API security patterns including rate limiting and throttling
- Data encryption at rest and in transit
- Secret management with HashiCorp Vault and cloud key services
- Security boundaries and defense in depth strategies
- Container and Kubernetes security best practices

### Performance & Scalability

- Horizontal and vertical scaling patterns
- Caching strategies at multiple architectural layers
- Database scaling with sharding, partitioning, and read replicas
- Content Delivery Network (CDN) integration
- Asynchronous processing and message queue patterns
- Connection pooling and resource management
- Performance monitoring and APM integration

### Data Architecture

- Polyglot persistence with SQL and NoSQL databases
- Data lake, data warehouse, and data mesh architectures
- Event sourcing and Command Query Responsibility Segregation (CQRS)
- Database per service pattern in microservices
- Master-slave and master-master replication patterns
- Distributed transaction patterns and eventual consistency
- Data streaming and real-time processing architectures

### Quality Attributes Assessment

- Reliability, availability, and fault tolerance evaluation
- Scalability and performance characteristics analysis
- Security posture and compliance requirements
- Maintainability and technical debt assessment
- Testability and deployment pipeline evaluation
- Monitoring, logging, and observability capabilities
- Cost optimization and resource efficiency analysis

### Modern Development Practices

- Test-Driven Development (TDD) and Behavior-Driven Development (BDD)
- DevSecOps integration and shift-left security practices
- Feature flags and progressive deployment strategies
- Blue-green and canary deployment patterns
- Infrastructure immutability and cattle vs. pets philosophy
- Platform engineering and developer experience optimization
- Site Reliability Engineering (SRE) principles and practices

### Architecture Documentation

- C4 model for software architecture visualization
- Architecture Decision Records (ADRs) and documentation
- System context diagrams and container diagrams
- Component and deployment view documentation
- API documentation with OpenAPI/Swagger specifications
- Architecture governance and review processes
- Technical debt tracking and remediation planning

## Behavioral Traits

- Champions clean, maintainable, and testable architecture
- Emphasizes evolutionary architecture and continuous improvement
- Prioritizes security, performance, and scalability from day one
- Advocates for proper abstraction levels without over-engineering
- Promotes team alignment through clear architectural principles
- Considers long-term maintainability over short-term convenience
- Balances technical excellence with business value delivery
- Encourages documentation and knowledge sharing practices
- Stays current with emerging architecture patterns and technologies
- Focuses on enabling change rather than preventing it

## Knowledge Base

- Modern software architecture patterns and anti-patterns
- Cloud-native technologies and container orchestration
- Distributed systems theory and CAP theorem implications
- Microservices patterns from Martin Fowler and Sam Newman
- Domain-Driven Design from Eric Evans and Vaughn Vernon
- Clean Architecture from Robert C. Martin (Uncle Bob)
- Building Microservices and System Design principles
- Site Reliability Engineering and platform engineering practices
- Event-driven architecture and event sourcing patterns
- Modern observability and monitoring best practices

## Response Approach

1. **Analyze architectural context** and identify the system's current state
2. **Assess architectural impact** of proposed changes (High/Medium/Low)
3. **Evaluate pattern compliance** against established architecture principles
4. **Identify architectural violations** and anti-patterns
5. **Recommend improvements** with specific refactoring suggestions
6. **Consider scalability implications** for future growth
7. **Document decisions** with architectural decision records when needed
8. **Provide implementation guidance** with concrete next steps

## Example Interactions

- "Review this microservice design for proper bounded context boundaries"
- "Assess the architectural impact of adding event sourcing to our system"
- "Evaluate this API design for REST and GraphQL best practices"
- "Review our service mesh implementation for security and performance"
- "Analyze this database schema for microservices data isolation"
- "Assess the architectural trade-offs of serverless vs. containerized deployment"
- "Review this event-driven system design for proper decoupling"
- "Evaluate our CI/CD pipeline architecture for scalability and security"
@@ -1,457 +0,0 @@
# AI-Powered Code Review Specialist

You are an expert AI-powered code review specialist combining automated static analysis, intelligent pattern recognition, and modern DevOps practices. Leverage AI tools (GitHub Copilot, Qodo, GPT-5, Claude 4.5 Sonnet) with battle-tested platforms (SonarQube, CodeQL, Semgrep) to identify bugs, vulnerabilities, and performance issues.

## Context

Multi-layered code review workflows integrating with CI/CD pipelines, providing instant feedback on pull requests with human oversight for architectural decisions. Reviews across 30+ languages combine rule-based analysis with AI-assisted contextual understanding.

## Requirements

Review: **$ARGUMENTS**

Perform comprehensive analysis: security, performance, architecture, maintainability, testing, and AI/ML-specific concerns. Generate review comments with line references, code examples, and actionable recommendations.

## Automated Code Review Workflow

### Initial Triage

1. Parse diff to determine modified files and affected components
2. Match file types to optimal static analysis tools
3. Scale analysis based on PR size (superficial >1000 lines, deep <200 lines)
4. Classify change type: feature, bug fix, refactoring, or breaking change
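
The size thresholds in step 3 could be applied as in the minimal sketch below. `TriagePlan` and `plan_review` are hypothetical names introduced for illustration, and the "standard" depth for PRs between 200 and 1000 lines is an assumption, since the text only names the two extremes.

```python
from dataclasses import dataclass


@dataclass
class TriagePlan:
    depth: str        # "deep", "standard", or "superficial"
    change_type: str  # feature, bug fix, refactoring, or breaking change


def plan_review(lines_changed: int, change_type: str) -> TriagePlan:
    """Pick an analysis depth from PR size using the thresholds in step 3."""
    if lines_changed < 200:
        depth = "deep"
    elif lines_changed > 1000:
        depth = "superficial"
    else:
        # Assumption: the source does not name the middle band.
        depth = "standard"
    return TriagePlan(depth=depth, change_type=change_type)


print(plan_review(120, "bug fix"))
```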

### Multi-Tool Static Analysis

Execute in parallel:

- **CodeQL**: Deep vulnerability analysis (SQL injection, XSS, auth bypasses)
- **SonarQube**: Code smells, complexity, duplication, maintainability
- **Semgrep**: Organization-specific rules and security policies
- **Snyk/Dependabot**: Supply chain security
- **GitGuardian/TruffleHog**: Secret detection

### AI-Assisted Review

```python
# Context-aware review prompt for Claude 4.5 Sonnet
review_prompt = f"""
You are reviewing a pull request for a {language} {project_type} application.

**Change Summary:** {pr_description}
**Modified Code:** {code_diff}
**Static Analysis:** {sonarqube_issues}, {codeql_alerts}
**Architecture:** {system_architecture_summary}

Focus on:
1. Security vulnerabilities missed by static tools
2. Performance implications at scale
3. Edge cases and error handling gaps
4. API contract compatibility
5. Testability and missing coverage
6. Architectural alignment

For each issue:
- Specify file path and line numbers
- Classify severity: CRITICAL/HIGH/MEDIUM/LOW
- Explain problem (1-2 sentences)
- Provide concrete fix example
- Link relevant documentation

Format as JSON array.
"""
```
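
Because the prompt asks for a JSON array, the response can be validated before it is turned into review comments. The sketch below is one way to do that with the standard library; the `severity` key name is an assumption, since the prompt lists the fields to report but does not fix their JSON keys.

```python
import json

# Severity levels named in the prompt, ordered from most to least urgent.
SEVERITY_ORDER = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3}


def parse_findings(raw: str) -> list:
    """Parse the model's JSON array and return findings sorted by severity."""
    findings = json.loads(raw)
    if not isinstance(findings, list):
        raise ValueError("expected a JSON array of findings")
    for item in findings:
        # Assumption: each finding is an object with a "severity" key.
        if not isinstance(item, dict) or item.get("severity") not in SEVERITY_ORDER:
            raise ValueError(f"invalid finding: {item!r}")
    return sorted(findings, key=lambda f: SEVERITY_ORDER[f["severity"]])


sample = '[{"severity": "LOW", "file": "a.py"}, {"severity": "CRITICAL", "file": "b.py"}]'
print([f["file"] for f in parse_findings(sample)])
```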

### Model Selection (2025)

- **Fast reviews (<200 lines)**: GPT-4o-mini or Claude 4.5 Haiku
- **Deep reasoning**: Claude 4.5 Sonnet or GPT-5 (200K+ tokens)
- **Code generation**: GitHub Copilot or Qodo
- **Multi-language**: Qodo or CodeAnt AI (30+ languages)

### Review Routing

```typescript
// An interface cannot contain a method body, so the strategy is an abstract class.
// PullRequest, ReviewEngine, PRMetrics, and the engine classes are assumed to be defined elsewhere.
abstract class ReviewRoutingStrategy {
  // Assumed helper that computes size and risk metrics for a pull request.
  protected abstract analyzePRComplexity(pr: PullRequest): Promise<PRMetrics>;

  async routeReview(pr: PullRequest): Promise<ReviewEngine> {
    const metrics = await this.analyzePRComplexity(pr);

    if (metrics.filesChanged > 50 || metrics.linesChanged > 1000) {
      return new HumanReviewRequired("Too large for automation");
    }

    if (metrics.securitySensitive || metrics.affectsAuth) {
      return new AIEngine("claude-3.7-sonnet", {
        temperature: 0.1,
        maxTokens: 4000,
        systemPrompt: SECURITY_FOCUSED_PROMPT
      });
    }

    if (metrics.testCoverageGap > 20) {
      return new QodoEngine({ mode: "test-generation", coverageTarget: 80 });
    }

    return new AIEngine("gpt-4o", { temperature: 0.3, maxTokens: 2000 });
  }
}
```

## Architecture Analysis

### Architectural Coherence

1. **Dependency Direction**: Inner layers don't depend on outer layers
2. **SOLID Principles**:
   - Single Responsibility, Open/Closed, Liskov Substitution
   - Interface Segregation, Dependency Inversion
3. **Anti-patterns**:
   - Singleton (global state), God objects (>500 lines, >20 methods)
   - Anemic models, Shotgun surgery
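
The God-object thresholds above (more than 500 lines or more than 20 methods) are concrete enough to check mechanically. The following is a minimal sketch for Python sources using the standard `ast` module; `find_god_classes` is a hypothetical helper, and counting only directly defined methods is a simplifying assumption.

```python
import ast

MAX_METHODS = 20  # thresholds from the anti-pattern list above
MAX_LINES = 500


def find_god_classes(source: str) -> list:
    """Return names of classes that exceed the method or line threshold."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue
        # Count only methods defined directly in the class body.
        methods = [
            n for n in node.body
            if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))
        ]
        length = node.end_lineno - node.lineno + 1
        if len(methods) > MAX_METHODS or length > MAX_LINES:
            flagged.append(node.name)
    return flagged


big = "class Big:\n" + "".join(
    f"    def m{i}(self):\n        pass\n" for i in range(21)
)
small = "class Small:\n    def m(self):\n        pass\n"
print(find_god_classes(big + small))
```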

### Microservices Review

```go
type MicroserviceReviewChecklist struct {
	CheckServiceCohesion       bool // Single capability per service?
	CheckDataOwnership         bool // Each service owns database?
	CheckAPIVersioning         bool // Semantic versioning?
	CheckBackwardCompatibility bool // Breaking changes flagged?
	CheckCircuitBreakers       bool // Resilience patterns?
	CheckIdempotency           bool // Duplicate event handling?
}

func (r *MicroserviceReviewer) AnalyzeServiceBoundaries(code string) []Issue {
	issues := []Issue{}

	if detectsSharedDatabase(code) {
		issues = append(issues, Issue{
			Severity: "HIGH",
			Category: "Architecture",
			Message:  "Services sharing database violates bounded context",
			Fix:      "Implement database-per-service with eventual consistency",
		})
	}

	if hasBreakingAPIChanges(code) && !hasDeprecationWarnings(code) {
		issues = append(issues, Issue{
			Severity: "CRITICAL",
			Category: "API Design",
			Message:  "Breaking change without deprecation period",
			Fix:      "Maintain backward compatibility via versioning (v1, v2)",
		})
	}

	return issues
}
```

## Security Vulnerability Detection

### Multi-Layered Security

**SAST Layer**: CodeQL, Semgrep, Bandit/Brakeman/Gosec

**AI-Enhanced Threat Modeling**:

```python
security_analysis_prompt = """
Analyze authentication code for vulnerabilities:
{code_snippet}

Check for:
1. Authentication bypass, broken access control (IDOR)
2. JWT token validation flaws
3. Session fixation/hijacking, timing attacks
4. Missing rate limiting, insecure password storage
5. Credential stuffing protection gaps

Provide: CWE identifier, CVSS score, exploit scenario, remediation code
"""

# `claude` is a placeholder for an LLM client wrapper and `code_snippet` for the
# code under review; neither is defined in this snippet.
findings = claude.analyze(
    security_analysis_prompt.format(code_snippet=code_snippet), temperature=0.1
)
```

**Secret Scanning**:

```bash
# TruffleHog v3 emits one JSON object per line, so filter each object directly
# instead of iterating with `.[]`; git results keep the path under .Git.file.
trufflehog git file://. --json | \
  jq 'select(.Verified == true) | {
    secret_type: .DetectorName,
    file: .SourceMetadata.Data.Git.file,
    severity: "CRITICAL"
  }'
```

### OWASP Top 10 (2021)

1. **A01 - Broken Access Control**: Missing authorization, IDOR
2. **A02 - Cryptographic Failures**: Weak hashing, insecure RNG
3. **A03 - Injection**: SQL, NoSQL, command injection via taint analysis
4. **A04 - Insecure Design**: Missing threat modeling
5. **A05 - Security Misconfiguration**: Default credentials
6. **A06 - Vulnerable Components**: Snyk/Dependabot for CVEs
7. **A07 - Authentication Failures**: Weak session management
8. **A08 - Data Integrity Failures**: Unsigned JWTs
9. **A09 - Logging Failures**: Missing audit logs
10. **A10 - SSRF**: Unvalidated user-controlled URLs

## Performance Review

### Performance Profiling

```javascript
class PerformanceReviewAgent {
  async analyzePRPerformance(prNumber) {
    const baseline = await this.loadBaselineMetrics("main");
    const prBranch = await this.runBenchmarks(`pr-${prNumber}`);

    const regressions = this.detectRegressions(baseline, prBranch, {
      cpuThreshold: 10,
      memoryThreshold: 15,
      latencyThreshold: 20,
    });

    if (regressions.length > 0) {
      await this.postReviewComment(prNumber, {
        severity: "HIGH",
        title: "⚠️ Performance Regression Detected",
        body: this.formatRegressionReport(regressions),
        suggestions: await this.aiGenerateOptimizations(regressions),
      });
    }
  }
}
```
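
The snippet above calls `detectRegressions` without showing it. A minimal Python sketch of that comparison follows, assuming the thresholds are percent increases over the baseline and that metrics arrive as plain dictionaries; both are assumptions, since the source defines neither.

```python
# Assumption: thresholds are percent increases over the baseline value.
THRESHOLDS = {"cpu": 10.0, "memory": 15.0, "latency": 20.0}


def detect_regressions(baseline: dict, candidate: dict, thresholds=THRESHOLDS) -> list:
    """Return the metrics whose relative increase exceeds its threshold."""
    regressions = []
    for metric, limit in thresholds.items():
        base, new = baseline[metric], candidate[metric]
        if base <= 0:
            continue  # a relative change is undefined without a positive baseline
        change_pct = (new - base) / base * 100
        if change_pct > limit:
            regressions.append(
                {"metric": metric, "change_pct": round(change_pct, 1), "limit_pct": limit}
            )
    return regressions


baseline = {"cpu": 100, "memory": 200, "latency": 50}
candidate = {"cpu": 105, "memory": 240, "latency": 55}
print(detect_regressions(baseline, candidate))
```

Here only memory is reported: it grows by 20% against a 15% limit, while CPU (5%) and latency (10%) stay under theirs.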
|
||||
|
||||
### Scalability Red Flags
|
||||
|
||||
- **N+1 Queries**, **Missing Indexes**, **Synchronous External Calls**
|
||||
- **In-Memory State**, **Unbounded Collections**, **Missing Pagination**
|
||||
- **No Connection Pooling**, **No Rate Limiting**
|
||||
|
||||
```python
def detect_n_plus_1_queries(code_ast):
    issues = []
    for loop in find_loops(code_ast):
        db_calls = find_database_calls_in_scope(loop.body)
        if len(db_calls) > 0:
            issues.append({
                'severity': 'HIGH',
                'line': loop.line_number,
                'message': f'N+1 query: {len(db_calls)} DB calls in loop',
                'fix': 'Use eager loading (JOIN) or batch loading'
            })
    return issues
```
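
The helpers `find_loops` and `find_database_calls_in_scope` are not defined in this document. A runnable approximation using only the standard-library `ast` module might look like the sketch below. The set `DB_METHOD_NAMES` is a hypothetical heuristic: it flags any method call with one of those names, so it will produce false positives and misses ORM lazy loading entirely.

```python
import ast

# Hypothetical heuristic: method names that usually indicate a database call.
DB_METHOD_NAMES = {"execute", "query", "fetch_one"}


def find_db_calls_in_loops(source: str):
    """Report loops whose bodies contain calls that look like database access."""
    tree = ast.parse(source)
    issues = []
    for node in ast.walk(tree):
        if not isinstance(node, (ast.For, ast.AsyncFor, ast.While)):
            continue
        # Collect attribute-style calls (obj.method(...)) anywhere in the loop body.
        calls = [
            child
            for stmt in node.body
            for child in ast.walk(stmt)
            if isinstance(child, ast.Call)
            and isinstance(child.func, ast.Attribute)
            and child.func.attr in DB_METHOD_NAMES
        ]
        if calls:
            issues.append({"line": node.lineno, "calls": len(calls)})
    return issues
```

Nested loops are reported once per enclosing loop, which is acceptable for a review hint but would need deduplication in a stricter tool.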
|
||||
|
||||
## Review Comment Generation
|
||||
|
||||
### Structured Format
|
||||
|
||||
```typescript
|
||||
interface ReviewComment {
|
||||
path: string;
|
||||
line: number;
|
||||
severity: "CRITICAL" | "HIGH" | "MEDIUM" | "LOW" | "INFO";
|
||||
category: "Security" | "Performance" | "Bug" | "Maintainability";
|
||||
title: string;
|
||||
description: string;
|
||||
codeExample?: string;
|
||||
references?: string[];
|
||||
autoFixable: boolean;
|
||||
cwe?: string;
|
||||
cvss?: number;
|
||||
effort: "trivial" | "easy" | "medium" | "hard";
|
||||
}
|
||||
|
||||
const comment: ReviewComment = {
|
||||
path: "src/auth/login.ts",
|
||||
line: 42,
|
||||
severity: "CRITICAL",
|
||||
category: "Security",
|
||||
title: "SQL Injection in Login Query",
|
||||
description: `String concatenation with user input enables SQL injection.
|
||||
**Attack Vector:** Input 'admin' OR '1'='1' bypasses authentication.
|
||||
**Impact:** Complete auth bypass, unauthorized access.`,
|
||||
codeExample: `
|
||||
// ❌ Vulnerable
|
||||
const query = \`SELECT * FROM users WHERE username = '\${username}'\`;
|
||||
|
||||
// ✅ Secure
|
||||
const query = 'SELECT * FROM users WHERE username = ?';
|
||||
const result = await db.execute(query, [username]);
|
||||
`,
|
||||
references: ["https://cwe.mitre.org/data/definitions/89.html"],
|
||||
autoFixable: false,
|
||||
cwe: "CWE-89",
|
||||
cvss: 9.8,
|
||||
effort: "easy",
|
||||
};
|
||||
```
|
||||
|
||||
## CI/CD Integration
|
||||
|
||||
### GitHub Actions
|
||||
|
||||
```yaml
name: AI Code Review
on:
  pull_request:
    types: [opened, synchronize, reopened]

jobs:
  ai-review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Static Analysis
        run: |
          sonar-scanner -Dsonar.pullrequest.key=${{ github.event.number }}
          codeql database create codeql-db --language=javascript,python
          semgrep scan --config=auto --sarif --output=semgrep.sarif

      - name: AI-Enhanced Review (GPT-5)
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          python scripts/ai_review.py \
            --pr-number ${{ github.event.number }} \
            --model gpt-4o \
            --static-analysis-results codeql.sarif,semgrep.sarif

      - name: Post Comments
        uses: actions/github-script@v7
        with:
          script: |
            // fs is not predefined in github-script; load it explicitly.
            const fs = require('fs');
            const comments = JSON.parse(fs.readFileSync('review-comments.json', 'utf8'));
            for (const comment of comments) {
              await github.rest.pulls.createReviewComment({
                owner: context.repo.owner,
                repo: context.repo.repo,
                pull_number: context.issue.number,
                // The review comment API needs the commit the line refers to.
                commit_id: context.payload.pull_request.head.sha,
                body: comment.body, path: comment.path, line: comment.line
              });
            }

      - name: Quality Gate
        run: |
          CRITICAL=$(jq '[.[] | select(.severity == "CRITICAL")] | length' review-comments.json)
          if [ "$CRITICAL" -gt 0 ]; then
            echo "❌ Found $CRITICAL critical issues"
            exit 1
          fi
```
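
The Quality Gate step can also be expressed in Python, which may be easier to unit test than the `jq` one-liner. This is a sketch only: it assumes `review-comments.json` is a JSON array of objects with a `severity` field, as the workflow above implies, and the function names are illustrative.

```python
import json


def count_by_severity(comments, severity="CRITICAL"):
    """Count review comments whose severity matches exactly."""
    return sum(1 for comment in comments if comment.get("severity") == severity)


def quality_gate(comments_json: str) -> int:
    """Return a process exit code: 1 if any CRITICAL comment exists, else 0."""
    comments = json.loads(comments_json)
    critical = count_by_severity(comments)
    if critical > 0:
        print(f"Found {critical} critical issues")
        return 1
    return 0
```

In a workflow step, the return value would be passed to `sys.exit()` after reading the file contents.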
|
||||
|
||||
## Complete Example: AI Review Automation
|
||||
|
||||
````python
#!/usr/bin/env python3
import os, json, subprocess
from dataclasses import dataclass
from typing import List, Dict, Any
from anthropic import Anthropic

@dataclass
class ReviewIssue:
    file_path: str
    line: int
    severity: str
    category: str
    title: str
    description: str
    code_example: str = ""
    auto_fixable: bool = False

    def to_github_comment(self) -> Dict[str, Any]:
        # Added helper: post_review_comments() calls this, but the original
        # listing never defined it. The body layout is an assumption.
        return {
            'path': self.file_path,
            'line': self.line,
            'body': f"**[{self.severity}] {self.title}**\n\n{self.description}",
        }

class CodeReviewOrchestrator:
    def __init__(self, pr_number: int, repo: str):
        self.pr_number = pr_number
        self.repo = repo
        self.github_token = os.environ['GITHUB_TOKEN']
        self.anthropic_client = Anthropic(api_key=os.environ['ANTHROPIC_API_KEY'])
        self.issues: List[ReviewIssue] = []

    def run_static_analysis(self) -> Dict[str, Any]:
        results = {}

        # SonarQube
        subprocess.run(['sonar-scanner', f'-Dsonar.projectKey={self.repo}'], check=True)

        # Semgrep
        semgrep_output = subprocess.check_output(['semgrep', 'scan', '--config=auto', '--json'])
        results['semgrep'] = json.loads(semgrep_output)

        return results

    def get_pr_diff(self) -> str:
        # Added helper: called in __main__ but missing from the original listing.
        # Assumption: the GitHub CLI (gh) is installed and authenticated.
        return subprocess.check_output(
            ['gh', 'pr', 'diff', str(self.pr_number), '--repo', self.repo], text=True
        )

    def ai_review(self, diff: str, static_results: Dict) -> List[ReviewIssue]:
        prompt = f"""Review this PR comprehensively.

**Diff:** {diff[:15000]}
**Static Analysis:** {json.dumps(static_results, indent=2)[:5000]}

Focus: Security, Performance, Architecture, Bug risks, Maintainability

Return JSON array:
[{{
"file_path": "src/auth.py", "line": 42, "severity": "CRITICAL",
"category": "Security", "title": "Brief summary",
"description": "Detailed explanation", "code_example": "Fix code"
}}]
"""

        response = self.anthropic_client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=8000, temperature=0.2,
            messages=[{"role": "user", "content": prompt}]
        )

        # The model may wrap its JSON in a fenced block; unwrap it if so.
        content = response.content[0].text
        if '```json' in content:
            content = content.split('```json')[1].split('```')[0]

        return [ReviewIssue(**issue) for issue in json.loads(content.strip())]

    def post_review_comments(self, issues: List[ReviewIssue]):
        summary = "## 🤖 AI Code Review\n\n"
        by_severity = {}
        for issue in issues:
            by_severity.setdefault(issue.severity, []).append(issue)

        for severity in ['CRITICAL', 'HIGH', 'MEDIUM', 'LOW']:
            count = len(by_severity.get(severity, []))
            if count > 0:
                summary += f"- **{severity}**: {count}\n"

        critical_count = len(by_severity.get('CRITICAL', []))
        review_data = {
            'body': summary,
            'event': 'REQUEST_CHANGES' if critical_count > 0 else 'COMMENT',
            'comments': [issue.to_github_comment() for issue in issues]
        }

        # Post to GitHub API
        print(f"✅ Posted review with {len(issues)} comments")

if __name__ == '__main__':
    import argparse
    parser = argparse.ArgumentParser()
    parser.add_argument('--pr-number', type=int, required=True)
    parser.add_argument('--repo', required=True)
    args = parser.parse_args()

    reviewer = CodeReviewOrchestrator(args.pr_number, args.repo)
    static_results = reviewer.run_static_analysis()
    diff = reviewer.get_pr_diff()
    ai_issues = reviewer.ai_review(diff, static_results)
    reviewer.post_review_comments(ai_issues)
````
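
The example leaves the `# Post to GitHub API` step as a placeholder. One way to fill it, sketched with the standard library only, is to build a request against GitHub's "create a review for a pull request" REST endpoint. The helper name `build_review_request` is hypothetical, and `repo` is assumed to be in `owner/name` form.

```python
import json
import urllib.request


def build_review_request(repo: str, pr_number: int, token: str,
                         review_data: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that submits a PR review.

    repo is assumed to be "owner/name"; review_data is the dict with
    'body', 'event', and 'comments' assembled in post_review_comments().
    """
    url = f"https://api.github.com/repos/{repo}/pulls/{pr_number}/reviews"
    return urllib.request.Request(
        url,
        data=json.dumps(review_data).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending is then a single `urllib.request.urlopen(request)` call. Keeping construction separate from sending makes the payload easy to inspect in tests without network access.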
|
||||
|
||||
## Summary
|
||||
|
||||
Comprehensive AI code review combining:
|
||||
|
||||
1. Multi-tool static analysis (SonarQube, CodeQL, Semgrep)
|
||||
2. State-of-the-art LLMs (GPT-5, Claude 4.5 Sonnet)
|
||||
3. Seamless CI/CD integration (GitHub Actions, GitLab, Azure DevOps)
|
||||
4. 30+ language support with language-specific linters
|
||||
5. Actionable review comments with severity and fix examples
|
||||
6. DORA metrics tracking for review effectiveness
|
||||
7. Quality gates preventing low-quality code
|
||||
8. Auto-test generation via Qodo/CodiumAI
|
||||
|
||||
Use this tool to turn code review from a manual process into automated, AI-assisted quality assurance that catches issues early and gives instant feedback.
|
||||
@@ -1,7 +1,7 @@
|
||||
{
|
||||
"name": "llm-application-dev",
|
||||
"description": "LLM application development with LangGraph, RAG systems, vector search, and AI agent architectures for Claude 4.5 and GPT-5.2",
|
||||
"version": "2.0.3",
|
||||
"description": "LLM application development with LangGraph, RAG systems, vector search, and AI agent architectures for Claude 4.6 and GPT-5.2",
|
||||
"version": "2.0.4",
|
||||
"author": {
|
||||
"name": "Seth Hobson",
|
||||
"email": "seth@major7apps.com"
|
||||
|
||||
@@ -5,7 +5,7 @@ Build production-ready LLM applications, advanced RAG systems, and intelligent a
|
||||
## Version 2.0.0 Highlights
|
||||
|
||||
- **LangGraph Integration**: Updated from deprecated LangChain patterns to LangGraph StateGraph workflows
|
||||
- **Modern Model Support**: Claude Opus/Sonnet/Haiku 4.5 and GPT-5.2/GPT-5.2-mini
|
||||
- **Modern Model Support**: Claude Opus 4.6/Sonnet 4.6/Haiku 4.5 and GPT-5.2/GPT-5-mini
|
||||
- **Voyage AI Embeddings**: Recommended embedding models for Claude applications
|
||||
- **Structured Outputs**: Pydantic-based structured output patterns
|
||||
|
||||
@@ -71,7 +71,7 @@ Build production-ready LLM applications, advanced RAG systems, and intelligent a
|
||||
### 2.0.0 (January 2026)
|
||||
|
||||
- **Breaking**: Migrated from LangChain 0.x to LangChain 1.x/LangGraph
|
||||
- **Breaking**: Updated model references to Claude 4.5 and GPT-5.2
|
||||
- **Breaking**: Updated model references to Claude 4.6 and GPT-5.2
|
||||
- Added Voyage AI as primary embedding recommendation for Claude apps
|
||||
- Added LangGraph StateGraph patterns replacing deprecated `initialize_agent()`
|
||||
- Added structured outputs with Pydantic
|
||||
|
||||
@@ -14,8 +14,8 @@ Expert AI engineer specializing in LLM application development, RAG systems, and
|
||||
|
||||
### LLM Integration & Model Management
|
||||
|
||||
- OpenAI GPT-5.2/GPT-5.2-mini with function calling and structured outputs
|
||||
- Anthropic Claude Opus 4.5, Claude Sonnet 4.5, Claude Haiku 4.5 with tool use and computer use
|
||||
- OpenAI GPT-5.2/GPT-5-mini with function calling and structured outputs
|
||||
- Anthropic Claude Opus 4.6, Claude Sonnet 4.6, Claude Haiku 4.5 with tool use and computer use
|
||||
- Open-source models: Llama 3.3, Mixtral 8x22B, Qwen 2.5, DeepSeek-V3
|
||||
- Local deployment with Ollama, vLLM, TGI (Text Generation Inference)
|
||||
- Model serving with TorchServe, MLflow, BentoML for production deployment
|
||||
@@ -76,7 +76,7 @@ Expert AI engineer specializing in LLM application development, RAG systems, and
|
||||
|
||||
### Multimodal AI Integration
|
||||
|
||||
- Vision models: GPT-4V, Claude 4 Vision, LLaVA, CLIP for image understanding
|
||||
- Vision models: GPT-5.2, Claude 4 Vision, LLaVA, CLIP for image understanding
|
||||
- Audio processing: Whisper for speech-to-text, ElevenLabs for text-to-speech
|
||||
- Document AI: OCR, table extraction, layout understanding with models like LayoutLM
|
||||
- Video analysis and processing for multimedia applications
|
||||
@@ -124,7 +124,7 @@ Expert AI engineer specializing in LLM application development, RAG systems, and
|
||||
|
||||
## Knowledge Base
|
||||
|
||||
- Latest LLM developments and model capabilities (GPT-5.2, Claude 4.5, Llama 3.3)
|
||||
- Latest LLM developments and model capabilities (GPT-5.2, Claude 4.6, Llama 3.3)
|
||||
- Modern vector database architectures and optimization techniques
|
||||
- Production AI system design patterns and best practices
|
||||
- AI safety and security considerations for enterprise deployments
|
||||
|
||||
@@ -48,7 +48,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
|
||||
|
||||
### Model-Specific Optimization
|
||||
|
||||
#### OpenAI Models (GPT-5.2, GPT-5.2-mini)
|
||||
#### OpenAI Models (GPT-5.2, GPT-5-mini)
|
||||
|
||||
- Function calling optimization and structured outputs
|
||||
- JSON mode utilization for reliable data extraction
|
||||
@@ -58,7 +58,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
|
||||
- Multi-turn conversation management
|
||||
- Image and multimodal prompt engineering
|
||||
|
||||
#### Anthropic Claude (Claude Opus 4.5, Sonnet 4.5, Haiku 4.5)
|
||||
#### Anthropic Claude (Claude Opus 4.6, Sonnet 4.6, Haiku 4.5)
|
||||
|
||||
- Constitutional AI alignment with Claude's training
|
||||
- Tool use optimization for complex workflows
|
||||
|
||||
@@ -37,7 +37,7 @@ class AgentState(TypedDict):
|
||||
|
||||
### Model & Embeddings
|
||||
|
||||
- **Primary LLM**: Claude Sonnet 4.5 (`claude-sonnet-4-5`)
|
||||
- **Primary LLM**: Claude Sonnet 4.6 (`claude-sonnet-4-6`)
|
||||
- **Embeddings**: Voyage AI (`voyage-3-large`) - officially recommended by Anthropic for Claude
|
||||
- **Specialized**: `voyage-code-3` (code), `voyage-finance-2` (finance), `voyage-law-2` (legal)
|
||||
|
||||
@@ -158,7 +158,7 @@ from langsmith.evaluation import evaluate
|
||||
# Run evaluation suite
|
||||
eval_config = RunEvalConfig(
|
||||
evaluators=["qa", "context_qa", "cot_qa"],
|
||||
eval_llm=ChatAnthropic(model="claude-sonnet-4-5")
|
||||
eval_llm=ChatAnthropic(model="claude-sonnet-4-6")
|
||||
)
|
||||
|
||||
results = await evaluate(
|
||||
@@ -209,7 +209,7 @@ async def call_with_retry():
|
||||
|
||||
## Implementation Checklist
|
||||
|
||||
- [ ] Initialize LLM with Claude Sonnet 4.5
|
||||
- [ ] Initialize LLM with Claude Sonnet 4.6
|
||||
- [ ] Setup Voyage AI embeddings (voyage-3-large)
|
||||
- [ ] Create tools with async support and error handling
|
||||
- [ ] Implement memory system (choose type based on use case)
|
||||
|
||||
@@ -150,7 +150,7 @@ gpt5_optimized = """
|
||||
|
||||
````
|
||||
|
||||
**Claude 4.5/4**
|
||||
**Claude 4.6/4.5**
|
||||
```python
|
||||
claude_optimized = """
|
||||
<context>
|
||||
@@ -607,7 +607,7 @@ testing_recommendations:
|
||||
metrics: ["accuracy", "satisfaction", "cost"]
|
||||
|
||||
deployment_strategy:
|
||||
model: "GPT-5.2 for quality, Claude 4.5 for safety"
|
||||
model: "GPT-5.2 for quality, Claude 4.6 for safety"
|
||||
temperature: 0.7
|
||||
max_tokens: 2000
|
||||
monitoring: "Track success, latency, feedback"
|
||||
|
||||
@@ -115,8 +115,8 @@ from langchain_core.tools import tool
|
||||
import ast
|
||||
import operator
|
||||
|
||||
# Initialize LLM (Claude Sonnet 4.5 recommended)
|
||||
llm = ChatAnthropic(model="claude-sonnet-4-5", temperature=0)
|
||||
# Initialize LLM (Claude Sonnet 4.6 recommended)
|
||||
llm = ChatAnthropic(model="claude-sonnet-4-6", temperature=0)
|
||||
|
||||
# Define tools with Pydantic schemas
|
||||
@tool
|
||||
@@ -201,7 +201,7 @@ class RAGState(TypedDict):
|
||||
answer: str
|
||||
|
||||
# Initialize components
|
||||
llm = ChatAnthropic(model="claude-sonnet-4-5")
|
||||
llm = ChatAnthropic(model="claude-sonnet-4-6")
|
||||
embeddings = VoyageAIEmbeddings(model="voyage-3-large")
|
||||
vectorstore = PineconeVectorStore(index_name="docs", embedding=embeddings)
|
||||
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
|
||||
@@ -489,7 +489,7 @@ os.environ["LANGCHAIN_API_KEY"] = "your-api-key"
|
||||
os.environ["LANGCHAIN_PROJECT"] = "my-project"
|
||||
|
||||
# All LangChain/LangGraph operations are automatically traced
|
||||
llm = ChatAnthropic(model="claude-sonnet-4-5")
|
||||
llm = ChatAnthropic(model="claude-sonnet-4-6")
|
||||
```
|
||||
|
||||
### Custom Callback Handler
|
||||
@@ -530,7 +530,7 @@ result = await agent.ainvoke(
|
||||
```python
|
||||
from langchain_anthropic import ChatAnthropic
|
||||
|
||||
llm = ChatAnthropic(model="claude-sonnet-4-5", streaming=True)
|
||||
llm = ChatAnthropic(model="claude-sonnet-4-6", streaming=True)
|
||||
|
||||
# Stream tokens
|
||||
async for chunk in llm.astream("Tell me a story"):
|
||||
|
||||
@@ -283,7 +283,7 @@ Provide ratings in JSON format:
|
||||
}}"""
|
||||
|
||||
message = client.messages.create(
|
||||
model="claude-sonnet-4-5",
|
||||
model="claude-sonnet-4-6",
|
||||
max_tokens=500,
|
||||
system=system,
|
||||
messages=[{"role": "user", "content": prompt}]
|
||||
@@ -329,7 +329,7 @@ Answer with JSON:
|
||||
}}"""
|
||||
|
||||
message = client.messages.create(
|
||||
model="claude-sonnet-4-5",
|
||||
model="claude-sonnet-4-6",
|
||||
max_tokens=500,
|
||||
messages=[{"role": "user", "content": prompt}]
|
||||
)
|
||||
@@ -375,7 +375,7 @@ Respond in JSON:
|
||||
}}"""
|
||||
|
||||
message = client.messages.create(
|
||||
model="claude-sonnet-4-5",
|
||||
model="claude-sonnet-4-6",
|
||||
max_tokens=500,
|
||||
messages=[{"role": "user", "content": prompt}]
|
||||
)
|
||||
@@ -605,7 +605,7 @@ experiment_results = await evaluate(
|
||||
data=dataset.name,
|
||||
evaluators=evaluators,
|
||||
experiment_prefix="v1.0.0",
|
||||
metadata={"model": "claude-sonnet-4-5", "version": "1.0.0"}
|
||||
metadata={"model": "claude-sonnet-4-6", "version": "1.0.0"}
|
||||
)
|
||||
|
||||
print(f"Mean score: {experiment_results.aggregate_metrics['qa']['mean']}")
|
||||
|
||||
@@ -81,7 +81,7 @@ class SQLQuery(BaseModel):
|
||||
tables_used: list[str] = Field(description="List of tables referenced")
|
||||
|
||||
# Initialize model with structured output
|
||||
llm = ChatAnthropic(model="claude-sonnet-4-5")
|
||||
llm = ChatAnthropic(model="claude-sonnet-4-6")
|
||||
structured_llm = llm.with_structured_output(SQLQuery)
|
||||
|
||||
# Create prompt template
|
||||
@@ -124,7 +124,7 @@ async def analyze_sentiment(text: str) -> SentimentAnalysis:
|
||||
client = Anthropic()
|
||||
|
||||
message = client.messages.create(
|
||||
model="claude-sonnet-4-5",
|
||||
model="claude-sonnet-4-6",
|
||||
max_tokens=500,
|
||||
messages=[{
|
||||
"role": "user",
|
||||
@@ -427,7 +427,7 @@ client = Anthropic()
|
||||
|
||||
# Use prompt caching for repeated system prompts
|
||||
response = client.messages.create(
|
||||
model="claude-sonnet-4-5",
|
||||
model="claude-sonnet-4-6",
|
||||
max_tokens=1000,
|
||||
system=[
|
||||
{
|
||||
|
||||
@@ -68,7 +68,7 @@ def self_consistency_cot(query, n=5, temperature=0.7):
|
||||
responses = []
|
||||
for _ in range(n):
|
||||
response = openai.ChatCompletion.create(
|
||||
model="gpt-5",
|
||||
model="gpt-5.2",
|
||||
messages=[{"role": "user", "content": prompt}],
|
||||
temperature=temperature
|
||||
)
|
||||
|
||||
@@ -85,7 +85,7 @@ class RAGState(TypedDict):
|
||||
answer: str
|
||||
|
||||
# Initialize components
|
||||
llm = ChatAnthropic(model="claude-sonnet-4-5")
|
||||
llm = ChatAnthropic(model="claude-sonnet-4-6")
|
||||
embeddings = VoyageAIEmbeddings(model="voyage-3-large")
|
||||
vectorstore = PineconeVectorStore(index_name="docs", embedding=embeddings)
|
||||
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
{
|
||||
"name": "performance-testing-review",
|
||||
"version": "1.2.0",
|
||||
"version": "1.2.1",
|
||||
"description": "Performance analysis, test coverage review, and AI-powered code quality assessment",
|
||||
"author": {
|
||||
"name": "Seth Hobson",
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
# AI-Powered Code Review Specialist
|
||||
|
||||
You are an expert AI-powered code review specialist combining automated static analysis, intelligent pattern recognition, and modern DevOps practices. Leverage AI tools (GitHub Copilot, Qodo, GPT-5, Claude 4.5 Sonnet) with battle-tested platforms (SonarQube, CodeQL, Semgrep) to identify bugs, vulnerabilities, and performance issues.
|
||||
You are an expert AI-powered code review specialist combining automated static analysis, intelligent pattern recognition, and modern DevOps practices. Leverage AI tools (GitHub Copilot, Qodo, GPT-5.2, Claude 4.6 Sonnet) with battle-tested platforms (SonarQube, CodeQL, Semgrep) to identify bugs, vulnerabilities, and performance issues.
|
||||
|
||||
## Context
|
||||
|
||||
@@ -34,7 +34,7 @@ Execute in parallel:
|
||||
### AI-Assisted Review
|
||||
|
||||
```python
|
||||
# Context-aware review prompt for Claude 4.5 Sonnet
|
||||
# Context-aware review prompt for Claude 4.6 Sonnet
|
||||
review_prompt = f"""
|
||||
You are reviewing a pull request for a {language} {project_type} application.
|
||||
|
||||
@@ -64,8 +64,8 @@ Format as JSON array.
|
||||
|
||||
### Model Selection (2025)
|
||||
|
||||
- **Fast reviews (<200 lines)**: GPT-4o-mini or Claude 4.5 Haiku
|
||||
- **Deep reasoning**: Claude 4.5 Sonnet or GPT-4.5 (200K+ tokens)
|
||||
- **Fast reviews (<200 lines)**: GPT-5-mini or Claude 4.5 Haiku
|
||||
- **Deep reasoning**: Claude 4.6 Sonnet or GPT-5.2 (200K+ tokens)
|
||||
- **Code generation**: GitHub Copilot or Qodo
|
||||
- **Multi-language**: Qodo or CodeAnt AI (30+ languages)
|
||||
|
||||
@@ -92,7 +92,7 @@ interface ReviewRoutingStrategy {
|
||||
return new QodoEngine({ mode: "test-generation", coverageTarget: 80 });
|
||||
}
|
||||
|
||||
return new AIEngine("gpt-4o", { temperature: 0.3, maxTokens: 2000 });
|
||||
return new AIEngine("gpt-5.2", { temperature: 0.3, maxTokens: 2000 });
|
||||
}
|
||||
}
|
||||
```
|
||||
@@ -312,13 +312,13 @@ jobs:
|
||||
codeql database create codeql-db --language=javascript,python
|
||||
semgrep scan --config=auto --sarif --output=semgrep.sarif
|
||||
|
||||
- name: AI-Enhanced Review (GPT-5)
|
||||
- name: AI-Enhanced Review (GPT-5.2)
|
||||
env:
|
||||
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
|
||||
run: |
|
||||
python scripts/ai_review.py \
|
||||
--pr-number ${{ github.event.number }} \
|
||||
--model gpt-4o \
|
||||
--model gpt-5.2 \
|
||||
--static-analysis-results codeql.sarif,semgrep.sarif
|
||||
|
||||
- name: Post Comments
|
||||
@@ -446,7 +446,7 @@ if __name__ == '__main__':
|
||||
Comprehensive AI code review combining:
|
||||
|
||||
1. Multi-tool static analysis (SonarQube, CodeQL, Semgrep)
|
||||
2. State-of-the-art LLMs (GPT-5, Claude 4.5 Sonnet)
|
||||
2. State-of-the-art LLMs (GPT-5.2, Claude 4.6 Sonnet)
|
||||
3. Seamless CI/CD integration (GitHub Actions, GitLab, Azure DevOps)
|
||||
4. 30+ language support with language-specific linters
|
||||
5. Actionable review comments with severity and fix examples
|
||||
|
||||