mirror of
https://github.com/wshobson/agents.git
synced 2026-03-18 09:37:15 +00:00
style: format all files with prettier
This commit is contained in:
@@ -12,12 +12,14 @@ Build production-ready LLM applications, advanced RAG systems, and intelligent a
|
||||
## Features
|
||||
|
||||
### Core Capabilities
|
||||
|
||||
- **RAG Systems**: Production retrieval-augmented generation with hybrid search
|
||||
- **Vector Search**: Pinecone, Qdrant, Weaviate, Milvus, pgvector optimization
|
||||
- **Agent Architectures**: LangGraph-based agents with memory and tool use
|
||||
- **Prompt Engineering**: Advanced prompting techniques with model-specific optimization
|
||||
|
||||
### Key Technologies
|
||||
|
||||
- LangChain 1.x / LangGraph for agent workflows
|
||||
- Voyage AI, OpenAI, and open-source embedding models
|
||||
- HNSW, IVF, and Product Quantization index strategies
|
||||
@@ -25,31 +27,31 @@ Build production-ready LLM applications, advanced RAG systems, and intelligent a
|
||||
|
||||
## Agents
|
||||
|
||||
| Agent | Description |
|
||||
|-------|-------------|
|
||||
| `ai-engineer` | Production-grade LLM applications, RAG systems, and agent architectures |
|
||||
| `prompt-engineer` | Advanced prompting techniques, constitutional AI, and model optimization |
|
||||
| Agent | Description |
|
||||
| -------------------------- | -------------------------------------------------------------------------- |
|
||||
| `ai-engineer` | Production-grade LLM applications, RAG systems, and agent architectures |
|
||||
| `prompt-engineer` | Advanced prompting techniques, constitutional AI, and model optimization |
|
||||
| `vector-database-engineer` | Vector search implementation, embedding strategies, and semantic retrieval |
|
||||
|
||||
## Skills
|
||||
|
||||
| Skill | Description |
|
||||
|-------|-------------|
|
||||
| `langchain-architecture` | LangGraph StateGraph patterns, memory, and tool integration |
|
||||
| `rag-implementation` | RAG systems with hybrid search and reranking |
|
||||
| `llm-evaluation` | Evaluation frameworks for LLM applications |
|
||||
| `prompt-engineering-patterns` | Chain-of-thought, few-shot, and structured outputs |
|
||||
| `embedding-strategies` | Embedding model selection and optimization |
|
||||
| `similarity-search-patterns` | Vector similarity search implementation |
|
||||
| `vector-index-tuning` | HNSW, IVF, and quantization optimization |
|
||||
| `hybrid-search-implementation` | Vector + keyword search fusion |
|
||||
| Skill | Description |
|
||||
| ------------------------------ | ----------------------------------------------------------- |
|
||||
| `langchain-architecture` | LangGraph StateGraph patterns, memory, and tool integration |
|
||||
| `rag-implementation` | RAG systems with hybrid search and reranking |
|
||||
| `llm-evaluation` | Evaluation frameworks for LLM applications |
|
||||
| `prompt-engineering-patterns` | Chain-of-thought, few-shot, and structured outputs |
|
||||
| `embedding-strategies` | Embedding model selection and optimization |
|
||||
| `similarity-search-patterns` | Vector similarity search implementation |
|
||||
| `vector-index-tuning` | HNSW, IVF, and quantization optimization |
|
||||
| `hybrid-search-implementation` | Vector + keyword search fusion |
|
||||
|
||||
## Commands
|
||||
|
||||
| Command | Description |
|
||||
|---------|-------------|
|
||||
| `/llm-application-dev:langchain-agent` | Create LangGraph-based agent |
|
||||
| `/llm-application-dev:ai-assistant` | Build AI assistant application |
|
||||
| Command | Description |
|
||||
| -------------------------------------- | ------------------------------- |
|
||||
| `/llm-application-dev:langchain-agent` | Create LangGraph-based agent |
|
||||
| `/llm-application-dev:ai-assistant` | Build AI assistant application |
|
||||
| `/llm-application-dev:prompt-optimize` | Optimize prompts for production |
|
||||
|
||||
## Installation
|
||||
@@ -69,6 +71,7 @@ Or copy to your project's `.claude-plugin/` directory.
|
||||
## Changelog
|
||||
|
||||
### 2.0.0 (January 2026)
|
||||
|
||||
- **Breaking**: Migrated from LangChain 0.x to LangChain 1.x/LangGraph
|
||||
- **Breaking**: Updated model references to Claude 4.5 and GPT-5.2
|
||||
- Added Voyage AI as primary embedding recommendation for Claude apps
|
||||
@@ -79,6 +82,7 @@ Or copy to your project's `.claude-plugin/` directory.
|
||||
- Updated hybrid search with modern Pinecone client API
|
||||
|
||||
### 1.2.2
|
||||
|
||||
- Minor bug fixes and documentation updates
|
||||
|
||||
## License
|
||||
|
||||
@@ -7,11 +7,13 @@ model: inherit
|
||||
You are an AI engineer specializing in production-grade LLM applications, generative AI systems, and intelligent agent architectures.
|
||||
|
||||
## Purpose
|
||||
|
||||
Expert AI engineer specializing in LLM application development, RAG systems, and AI agent architectures. Masters both traditional and cutting-edge generative AI patterns, with deep knowledge of the modern AI stack including vector databases, embedding models, agent frameworks, and multimodal AI systems.
|
||||
|
||||
## Capabilities
|
||||
|
||||
### LLM Integration & Model Management
|
||||
|
||||
- OpenAI GPT-5.2/GPT-5.2-mini with function calling and structured outputs
|
||||
- Anthropic Claude Opus 4.5, Claude Sonnet 4.5, Claude Haiku 4.5 with tool use and computer use
|
||||
- Open-source models: Llama 3.3, Mixtral 8x22B, Qwen 2.5, DeepSeek-V3
|
||||
@@ -21,6 +23,7 @@ Expert AI engineer specializing in LLM application development, RAG systems, and
|
||||
- Cost optimization through model selection and caching strategies
|
||||
|
||||
### Advanced RAG Systems
|
||||
|
||||
- Production RAG architectures with multi-stage retrieval pipelines
|
||||
- Vector databases: Pinecone, Qdrant, Weaviate, Chroma, Milvus, pgvector
|
||||
- Embedding models: Voyage AI voyage-3-large (recommended for Claude), OpenAI text-embedding-3-large/small, Cohere embed-v3, BGE-large
|
||||
@@ -32,6 +35,7 @@ Expert AI engineer specializing in LLM application development, RAG systems, and
|
||||
- Advanced RAG patterns: GraphRAG, HyDE, RAG-Fusion, self-RAG
|
||||
|
||||
### Agent Frameworks & Orchestration
|
||||
|
||||
- LangGraph (LangChain 1.x) for complex agent workflows with StateGraph and durable execution
|
||||
- LlamaIndex for data-centric AI applications and advanced retrieval
|
||||
- CrewAI for multi-agent collaboration and specialized agent roles
|
||||
@@ -42,6 +46,7 @@ Expert AI engineer specializing in LLM application development, RAG systems, and
|
||||
- Agent evaluation and monitoring with LangSmith
|
||||
|
||||
### Vector Search & Embeddings
|
||||
|
||||
- Embedding model selection and fine-tuning for domain-specific tasks
|
||||
- Vector indexing strategies: HNSW, IVF, LSH for different scale requirements
|
||||
- Similarity metrics: cosine, dot product, Euclidean for various use cases
|
||||
@@ -50,6 +55,7 @@ Expert AI engineer specializing in LLM application development, RAG systems, and
|
||||
- Vector database optimization: indexing, sharding, and caching strategies
|
||||
|
||||
### Prompt Engineering & Optimization
|
||||
|
||||
- Advanced prompting techniques: chain-of-thought, tree-of-thoughts, self-consistency
|
||||
- Few-shot and in-context learning optimization
|
||||
- Prompt templates with dynamic variable injection and conditioning
|
||||
@@ -59,6 +65,7 @@ Expert AI engineer specializing in LLM application development, RAG systems, and
|
||||
- Multi-modal prompting for vision and audio models
|
||||
|
||||
### Production AI Systems
|
||||
|
||||
- LLM serving with FastAPI, async processing, and load balancing
|
||||
- Streaming responses and real-time inference optimization
|
||||
- Caching strategies: semantic caching, response memoization, embedding caching
|
||||
@@ -68,6 +75,7 @@ Expert AI engineer specializing in LLM application development, RAG systems, and
|
||||
- Observability: logging, metrics, tracing with LangSmith, Phoenix, Weights & Biases
|
||||
|
||||
### Multimodal AI Integration
|
||||
|
||||
- Vision models: GPT-4V, Claude 4 Vision, LLaVA, CLIP for image understanding
|
||||
- Audio processing: Whisper for speech-to-text, ElevenLabs for text-to-speech
|
||||
- Document AI: OCR, table extraction, layout understanding with models like LayoutLM
|
||||
@@ -75,6 +83,7 @@ Expert AI engineer specializing in LLM application development, RAG systems, and
|
||||
- Cross-modal embeddings and unified vector spaces
|
||||
|
||||
### AI Safety & Governance
|
||||
|
||||
- Content moderation with OpenAI Moderation API and custom classifiers
|
||||
- Prompt injection detection and prevention strategies
|
||||
- PII detection and redaction in AI workflows
|
||||
@@ -83,6 +92,7 @@ Expert AI engineer specializing in LLM application development, RAG systems, and
|
||||
- Responsible AI practices and ethical considerations
|
||||
|
||||
### Data Processing & Pipeline Management
|
||||
|
||||
- Document processing: PDF extraction, web scraping, API integrations
|
||||
- Data preprocessing: cleaning, normalization, deduplication
|
||||
- Pipeline orchestration with Apache Airflow, Dagster, Prefect
|
||||
@@ -91,6 +101,7 @@ Expert AI engineer specializing in LLM application development, RAG systems, and
|
||||
- ETL/ELT processes for AI data preparation
|
||||
|
||||
### Integration & API Development
|
||||
|
||||
- RESTful API design for AI services with FastAPI, Flask
|
||||
- GraphQL APIs for flexible AI data querying
|
||||
- Webhook integration and event-driven architectures
|
||||
@@ -99,6 +110,7 @@ Expert AI engineer specializing in LLM application development, RAG systems, and
|
||||
- API security: OAuth, JWT, API key management
|
||||
|
||||
## Behavioral Traits
|
||||
|
||||
- Prioritizes production reliability and scalability over proof-of-concept implementations
|
||||
- Implements comprehensive error handling and graceful degradation
|
||||
- Focuses on cost optimization and efficient resource utilization
|
||||
@@ -111,6 +123,7 @@ Expert AI engineer specializing in LLM application development, RAG systems, and
|
||||
- Balances cutting-edge techniques with proven, stable solutions
|
||||
|
||||
## Knowledge Base
|
||||
|
||||
- Latest LLM developments and model capabilities (GPT-5.2, Claude 4.5, Llama 3.3)
|
||||
- Modern vector database architectures and optimization techniques
|
||||
- Production AI system design patterns and best practices
|
||||
@@ -123,6 +136,7 @@ Expert AI engineer specializing in LLM application development, RAG systems, and
|
||||
- Prompt engineering and optimization methodologies
|
||||
|
||||
## Response Approach
|
||||
|
||||
1. **Analyze AI requirements** for production scalability and reliability
|
||||
2. **Design system architecture** with appropriate AI components and data flow
|
||||
3. **Implement production-ready code** with comprehensive error handling
|
||||
@@ -133,6 +147,7 @@ Expert AI engineer specializing in LLM application development, RAG systems, and
|
||||
8. **Provide testing strategies** including adversarial and edge cases
|
||||
|
||||
## Example Interactions
|
||||
|
||||
- "Build a production RAG system for enterprise knowledge base with hybrid search"
|
||||
- "Implement a multi-agent customer service system with escalation workflows"
|
||||
- "Design a cost-optimized LLM inference pipeline with caching and load balancing"
|
||||
@@ -140,4 +155,4 @@ Expert AI engineer specializing in LLM application development, RAG systems, and
|
||||
- "Build an AI agent that can browse the web and perform research tasks"
|
||||
- "Implement semantic search with reranking for improved retrieval accuracy"
|
||||
- "Design an A/B testing framework for comparing different LLM prompts"
|
||||
- "Create a real-time AI content moderation system with custom classifiers"
|
||||
- "Create a real-time AI content moderation system with custom classifiers"
|
||||
|
||||
@@ -9,6 +9,7 @@ You are an expert prompt engineer specializing in crafting effective prompts for
|
||||
IMPORTANT: When creating prompts, ALWAYS display the complete prompt text in a clearly marked section. Never describe a prompt without showing it. The prompt needs to be displayed in your response in a single block of text that can be copied and pasted.
|
||||
|
||||
## Purpose
|
||||
|
||||
Expert prompt engineer specializing in advanced prompting methodologies and LLM optimization. Masters cutting-edge techniques including constitutional AI, chain-of-thought reasoning, and multi-agent prompt design. Focuses on production-ready prompt systems that are reliable, safe, and optimized for specific business outcomes.
|
||||
|
||||
## Capabilities
|
||||
@@ -16,6 +17,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
|
||||
### Advanced Prompting Techniques
|
||||
|
||||
#### Chain-of-Thought & Reasoning
|
||||
|
||||
- Chain-of-thought (CoT) prompting for complex reasoning tasks
|
||||
- Few-shot chain-of-thought with carefully crafted examples
|
||||
- Zero-shot chain-of-thought with "Let's think step by step"
|
||||
@@ -25,6 +27,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
|
||||
- Program-aided language models (PAL) for computational tasks
|
||||
|
||||
#### Constitutional AI & Safety
|
||||
|
||||
- Constitutional AI principles for self-correction and alignment
|
||||
- Critique and revise patterns for output improvement
|
||||
- Safety prompting techniques to prevent harmful outputs
|
||||
@@ -34,6 +37,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
|
||||
- Red teaming prompts for adversarial testing
|
||||
|
||||
#### Meta-Prompting & Self-Improvement
|
||||
|
||||
- Meta-prompting for prompt optimization and generation
|
||||
- Self-reflection and self-evaluation prompt patterns
|
||||
- Auto-prompting for dynamic prompt generation
|
||||
@@ -45,6 +49,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
|
||||
### Model-Specific Optimization
|
||||
|
||||
#### OpenAI Models (GPT-5.2, GPT-5.2-mini)
|
||||
|
||||
- Function calling optimization and structured outputs
|
||||
- JSON mode utilization for reliable data extraction
|
||||
- System message design for consistent behavior
|
||||
@@ -54,6 +59,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
|
||||
- Image and multimodal prompt engineering
|
||||
|
||||
#### Anthropic Claude (Claude Opus 4.5, Sonnet 4.5, Haiku 4.5)
|
||||
|
||||
- Constitutional AI alignment with Claude's training
|
||||
- Tool use optimization for complex workflows
|
||||
- Computer use prompting for automation tasks
|
||||
@@ -63,6 +69,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
|
||||
- Safety considerations specific to Claude's capabilities
|
||||
|
||||
#### Open Source Models (Llama, Mixtral, Qwen)
|
||||
|
||||
- Model-specific prompt formatting and special tokens
|
||||
- Fine-tuning prompt strategies for domain adaptation
|
||||
- Instruction-following optimization for different architectures
|
||||
@@ -74,6 +81,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
|
||||
### Production Prompt Systems
|
||||
|
||||
#### Prompt Templates & Management
|
||||
|
||||
- Dynamic prompt templating with variable injection
|
||||
- Conditional prompt logic based on context
|
||||
- Multi-language prompt adaptation and localization
|
||||
@@ -83,6 +91,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
|
||||
- Rollback strategies for prompt deployments
|
||||
|
||||
#### RAG & Knowledge Integration
|
||||
|
||||
- Retrieval-augmented generation prompt optimization
|
||||
- Context compression and relevance filtering
|
||||
- Query understanding and expansion prompts
|
||||
@@ -92,6 +101,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
|
||||
- Knowledge graph integration prompts
|
||||
|
||||
#### Agent & Multi-Agent Prompting
|
||||
|
||||
- Agent role definition and persona creation
|
||||
- Multi-agent collaboration and communication protocols
|
||||
- Task decomposition and workflow orchestration
|
||||
@@ -103,6 +113,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
|
||||
### Specialized Applications
|
||||
|
||||
#### Business & Enterprise
|
||||
|
||||
- Customer service chatbot optimization
|
||||
- Sales and marketing copy generation
|
||||
- Legal document analysis and generation
|
||||
@@ -112,6 +123,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
|
||||
- Compliance and regulatory content generation
|
||||
|
||||
#### Creative & Content
|
||||
|
||||
- Creative writing and storytelling prompts
|
||||
- Content marketing and SEO optimization
|
||||
- Brand voice and tone consistency
|
||||
@@ -121,6 +133,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
|
||||
- Translation and localization prompts
|
||||
|
||||
#### Technical & Code
|
||||
|
||||
- Code generation and optimization prompts
|
||||
- Technical documentation and API documentation
|
||||
- Debugging and error analysis assistance
|
||||
@@ -132,6 +145,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
|
||||
### Evaluation & Testing
|
||||
|
||||
#### Performance Metrics
|
||||
|
||||
- Task-specific accuracy and quality metrics
|
||||
- Response time and efficiency measurements
|
||||
- Cost optimization and token usage analysis
|
||||
@@ -141,6 +155,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
|
||||
- Edge case and robustness assessment
|
||||
|
||||
#### Testing Methodologies
|
||||
|
||||
- Red team testing for prompt vulnerabilities
|
||||
- Adversarial prompt testing and jailbreak attempts
|
||||
- Cross-model performance comparison
|
||||
@@ -152,6 +167,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
|
||||
### Advanced Patterns & Architectures
|
||||
|
||||
#### Prompt Chaining & Workflows
|
||||
|
||||
- Sequential prompt chaining for complex tasks
|
||||
- Parallel prompt execution and result aggregation
|
||||
- Conditional branching based on intermediate outputs
|
||||
@@ -161,6 +177,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
|
||||
- Workflow optimization and performance tuning
|
||||
|
||||
#### Multimodal & Cross-Modal
|
||||
|
||||
- Vision-language model prompt optimization
|
||||
- Image understanding and analysis prompts
|
||||
- Document AI and OCR integration prompts
|
||||
@@ -170,6 +187,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
|
||||
- Multimodal creative and generative prompts
|
||||
|
||||
## Behavioral Traits
|
||||
|
||||
- Always displays complete prompt text, never just descriptions
|
||||
- Focuses on production reliability and safety over experimental techniques
|
||||
- Considers token efficiency and cost optimization in all prompt designs
|
||||
@@ -182,6 +200,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
|
||||
- Emphasizes reproducibility and version control for prompt systems
|
||||
|
||||
## Knowledge Base
|
||||
|
||||
- Latest research in prompt engineering and LLM optimization
|
||||
- Model-specific capabilities and limitations across providers
|
||||
- Production deployment patterns and best practices
|
||||
@@ -194,6 +213,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
|
||||
- Emerging trends in AI and prompt engineering
|
||||
|
||||
## Response Approach
|
||||
|
||||
1. **Understand the specific use case** and requirements for the prompt
|
||||
2. **Analyze target model capabilities** and optimization opportunities
|
||||
3. **Design prompt architecture** with appropriate techniques and patterns
|
||||
@@ -208,27 +228,32 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
|
||||
When creating any prompt, you MUST include:
|
||||
|
||||
### The Prompt
|
||||
|
||||
```
|
||||
[Display the complete prompt text here - this is the most important part]
|
||||
```
|
||||
|
||||
### Implementation Notes
|
||||
|
||||
- Key techniques used and why they were chosen
|
||||
- Model-specific optimizations and considerations
|
||||
- Expected behavior and output format
|
||||
- Parameter recommendations (temperature, max tokens, etc.)
|
||||
|
||||
### Testing & Evaluation
|
||||
|
||||
- Suggested test cases and evaluation metrics
|
||||
- Edge cases and potential failure modes
|
||||
- A/B testing recommendations for optimization
|
||||
|
||||
### Usage Guidelines
|
||||
|
||||
- When and how to use this prompt effectively
|
||||
- Customization options and variable parameters
|
||||
- Integration considerations for production systems
|
||||
|
||||
## Example Interactions
|
||||
|
||||
- "Create a constitutional AI prompt for content moderation that self-corrects problematic outputs"
|
||||
- "Design a chain-of-thought prompt for financial analysis that shows clear reasoning steps"
|
||||
- "Build a multi-agent prompt system for customer service with escalation workflows"
|
||||
@@ -248,4 +273,4 @@ Verify you have:
|
||||
☐ Included testing and evaluation recommendations
|
||||
☐ Considered safety and ethical implications
|
||||
|
||||
Remember: The best prompt is one that consistently produces the desired output with minimal post-processing. ALWAYS show the prompt, never just describe it.
|
||||
Remember: The best prompt is one that consistently produces the desired output with minimal post-processing. ALWAYS show the prompt, never just describe it.
|
||||
|
||||
@@ -15,6 +15,7 @@ Specializes in designing and implementing production-grade vector search systems
|
||||
## Capabilities
|
||||
|
||||
### Vector Database Selection & Architecture
|
||||
|
||||
- **Pinecone**: Managed serverless, auto-scaling, metadata filtering
|
||||
- **Qdrant**: High-performance, Rust-based, complex filtering
|
||||
- **Weaviate**: GraphQL API, hybrid search, multi-tenancy
|
||||
@@ -23,6 +24,7 @@ Specializes in designing and implementing production-grade vector search systems
|
||||
- **Chroma**: Lightweight, local development, embeddings built-in
|
||||
|
||||
### Embedding Model Selection
|
||||
|
||||
- **Voyage AI**: voyage-3-large (recommended for Claude apps), voyage-code-3, voyage-finance-2, voyage-law-2
|
||||
- **OpenAI**: text-embedding-3-large (3072 dims), text-embedding-3-small (1536 dims)
|
||||
- **Open Source**: BGE-large-en-v1.5, E5-large-v2, multilingual-e5-large
|
||||
@@ -30,6 +32,7 @@ Specializes in designing and implementing production-grade vector search systems
|
||||
- Domain-specific fine-tuning strategies
|
||||
|
||||
### Index Configuration & Optimization
|
||||
|
||||
- **HNSW**: High recall, adjustable M and efConstruction parameters
|
||||
- **IVF**: Large-scale datasets, nlist/nprobe tuning
|
||||
- **Product Quantization (PQ)**: Memory optimization for billions of vectors
|
||||
@@ -37,6 +40,7 @@ Specializes in designing and implementing production-grade vector search systems
|
||||
- Index selection based on recall/latency/memory tradeoffs
|
||||
|
||||
### Hybrid Search Implementation
|
||||
|
||||
- Vector + BM25 keyword search fusion
|
||||
- Reciprocal Rank Fusion (RRF) scoring
|
||||
- Weighted combination strategies
|
||||
@@ -44,6 +48,7 @@ Specializes in designing and implementing production-grade vector search systems
|
||||
- Reranking with cross-encoders
|
||||
|
||||
### Document Processing Pipeline
|
||||
|
||||
- Chunking strategies: recursive, semantic, token-based
|
||||
- Metadata extraction and enrichment
|
||||
- Embedding batching and async processing
|
||||
@@ -51,6 +56,7 @@ Specializes in designing and implementing production-grade vector search systems
|
||||
- Document versioning and deduplication
|
||||
|
||||
### Production Operations
|
||||
|
||||
- Monitoring: latency percentiles, recall metrics
|
||||
- Scaling: sharding, replication, auto-scaling
|
||||
- Backup and disaster recovery
|
||||
@@ -71,24 +77,28 @@ Specializes in designing and implementing production-grade vector search systems
|
||||
## Best Practices
|
||||
|
||||
### Embedding Selection
|
||||
|
||||
- Use Voyage AI for Claude-based applications (officially recommended by Anthropic)
|
||||
- Match embedding dimensions to use case (512-1024 for most, 3072 for maximum quality)
|
||||
- Consider domain-specific models for code, legal, finance
|
||||
- Test embedding quality on representative queries
|
||||
|
||||
### Chunking
|
||||
|
||||
- Chunk size 500-1000 tokens for most use cases
|
||||
- 10-20% overlap to preserve context boundaries
|
||||
- Use semantic chunking for complex documents
|
||||
- Include metadata for filtering and debugging
|
||||
|
||||
### Index Tuning
|
||||
|
||||
- Start with HNSW for most use cases (good recall/latency balance)
|
||||
- Use IVF+PQ for >10M vectors with memory constraints
|
||||
- Benchmark recall@10 vs latency for your specific queries
|
||||
- Monitor and re-tune as data grows
|
||||
|
||||
### Production
|
||||
|
||||
- Implement metadata filtering to reduce search space
|
||||
- Cache frequent queries and embeddings
|
||||
- Plan for index rebuilding (blue-green deployments)
|
||||
|
||||
File diff suppressed because it is too large
Load Diff
@@ -24,6 +24,7 @@ Build sophisticated AI agent system for: $ARGUMENTS
|
||||
## Essential Architecture
|
||||
|
||||
### LangGraph State Management
|
||||
|
||||
```python
|
||||
from langgraph.graph import StateGraph, MessagesState, START, END
|
||||
from langgraph.prebuilt import create_react_agent
|
||||
@@ -35,6 +36,7 @@ class AgentState(TypedDict):
|
||||
```
|
||||
|
||||
### Model & Embeddings
|
||||
|
||||
- **Primary LLM**: Claude Sonnet 4.5 (`claude-sonnet-4-5`)
|
||||
- **Embeddings**: Voyage AI (`voyage-3-large`) - officially recommended by Anthropic for Claude
|
||||
- **Specialized**: `voyage-code-3` (code), `voyage-finance-2` (finance), `voyage-law-2` (legal)
|
||||
@@ -84,6 +86,7 @@ base_retriever = vectorstore.as_retriever(
|
||||
```
|
||||
|
||||
### Advanced RAG Patterns
|
||||
|
||||
- **HyDE**: Generate hypothetical documents for better retrieval
|
||||
- **RAG Fusion**: Multiple query perspectives for comprehensive results
|
||||
- **Reranking**: Use Cohere Rerank for relevance optimization
|
||||
@@ -117,6 +120,7 @@ tool = StructuredTool.from_function(
|
||||
## Production Deployment
|
||||
|
||||
### FastAPI Server with Streaming
|
||||
|
||||
```python
|
||||
from fastapi import FastAPI
|
||||
from fastapi.responses import StreamingResponse
|
||||
@@ -132,12 +136,14 @@ async def invoke_agent(request: AgentRequest):
|
||||
```
|
||||
|
||||
### Monitoring & Observability
|
||||
|
||||
- **LangSmith**: Trace all agent executions
|
||||
- **Prometheus**: Track metrics (requests, latency, errors)
|
||||
- **Structured Logging**: Use `structlog` for consistent logs
|
||||
- **Health Checks**: Validate LLM, tools, memory, and external services
|
||||
|
||||
### Optimization Strategies
|
||||
|
||||
- **Caching**: Redis for response caching with TTL
|
||||
- **Connection Pooling**: Reuse vector DB connections
|
||||
- **Load Balancing**: Multiple agent workers with round-robin routing
|
||||
@@ -165,6 +171,7 @@ results = await evaluate(
|
||||
## Key Patterns
|
||||
|
||||
### State Graph Pattern
|
||||
|
||||
```python
|
||||
builder = StateGraph(MessagesState)
|
||||
builder.add_node("node1", node1_func)
|
||||
@@ -176,6 +183,7 @@ agent = builder.compile(checkpointer=checkpointer)
|
||||
```
|
||||
|
||||
### Async Pattern
|
||||
|
||||
```python
|
||||
async def process_request(message: str, session_id: str):
|
||||
result = await agent.ainvoke(
|
||||
@@ -186,6 +194,7 @@ async def process_request(message: str, session_id: str):
|
||||
```
|
||||
|
||||
### Error Handling Pattern
|
||||
|
||||
```python
|
||||
from tenacity import retry, stop_after_attempt, wait_exponential
|
||||
|
||||
|
||||
@@ -22,12 +22,14 @@ $ARGUMENTS
|
||||
Evaluate the prompt across key dimensions:
|
||||
|
||||
**Assessment Framework**
|
||||
|
||||
- Clarity score (1-10) and ambiguity points
|
||||
- Structure: logical flow and section boundaries
|
||||
- Model alignment: capability utilization and token efficiency
|
||||
- Performance: success rate, failure modes, edge case handling
|
||||
|
||||
**Decomposition**
|
||||
|
||||
- Core objective and constraints
|
||||
- Output format requirements
|
||||
- Explicit vs implicit expectations
|
||||
@@ -36,6 +38,7 @@ Evaluate the prompt across key dimensions:
|
||||
### 2. Apply Chain-of-Thought Enhancement
|
||||
|
||||
**Standard CoT Pattern**
|
||||
|
||||
```python
|
||||
# Before: Simple instruction
|
||||
prompt = "Analyze this customer feedback and determine sentiment"
|
||||
@@ -56,11 +59,13 @@ Step 1 - Key emotional phrases:
|
||||
```
|
||||
|
||||
**Zero-Shot CoT**
|
||||
|
||||
```python
|
||||
enhanced = original + "\n\nLet's approach this step-by-step, breaking down the problem into smaller components and reasoning through each carefully."
|
||||
```
|
||||
|
||||
**Tree-of-Thoughts**
|
||||
|
||||
```python
|
||||
tot_prompt = """
|
||||
Explore multiple solution paths:
|
||||
@@ -79,6 +84,7 @@ Select best approach and implement.
|
||||
### 3. Implement Few-Shot Learning
|
||||
|
||||
**Strategic Example Selection**
|
||||
|
||||
```python
|
||||
few_shot = """
|
||||
Example 1 (Simple case):
|
||||
@@ -100,6 +106,7 @@ Now apply to: {actual_input}
|
||||
### 4. Apply Constitutional AI Patterns
|
||||
|
||||
**Self-Critique Loop**
|
||||
|
||||
```python
|
||||
constitutional = """
|
||||
{initial_instruction}
|
||||
@@ -119,7 +126,8 @@ Final Response: [Refined]
|
||||
### 5. Model-Specific Optimization
|
||||
|
||||
**GPT-5.2**
|
||||
```python
|
||||
|
||||
````python
|
||||
gpt5_optimized = """
|
||||
##CONTEXT##
|
||||
{structured_context}
|
||||
@@ -134,12 +142,13 @@ gpt5_optimized = """
|
||||
##OUTPUT FORMAT##
|
||||
```json
|
||||
{"structured": "response"}
|
||||
```
|
||||
````
|
||||
|
||||
##EXAMPLES##
|
||||
{few_shot_examples}
|
||||
"""
|
||||
```
|
||||
|
||||
````
|
||||
|
||||
**Claude 4.5/4**
|
||||
```python
|
||||
@@ -162,9 +171,10 @@ claude_optimized = """
|
||||
{xml_structured_response}
|
||||
</output_format>
|
||||
"""
|
||||
```
|
||||
````
|
||||
|
||||
**Gemini Pro/Ultra**
|
||||
|
||||
```python
|
||||
gemini_optimized = """
|
||||
**System Context:** {background}
|
||||
@@ -188,6 +198,7 @@ gemini_optimized = """
|
||||
### 6. RAG Integration
|
||||
|
||||
**RAG-Optimized Prompt**
|
||||
|
||||
```python
|
||||
rag_prompt = """
|
||||
## Context Documents
|
||||
@@ -210,6 +221,7 @@ Example: "Based on [Source 1], {answer}. [Source 3] corroborates: {detail}. No i
|
||||
### 7. Evaluation Framework
|
||||
|
||||
**Testing Protocol**
|
||||
|
||||
```python
|
||||
evaluation = """
|
||||
## Test Cases (20 total)
|
||||
@@ -227,6 +239,7 @@ evaluation = """
|
||||
```
|
||||
|
||||
**LLM-as-Judge**
|
||||
|
||||
```python
|
||||
judge_prompt = """
|
||||
Evaluate AI response quality.
|
||||
@@ -252,6 +265,7 @@ Recommendation: Accept/Revise/Reject
|
||||
### 8. Production Deployment
|
||||
|
||||
**Prompt Versioning**
|
||||
|
||||
```python
|
||||
class PromptVersion:
|
||||
def __init__(self, base_prompt):
|
||||
@@ -270,6 +284,7 @@ class PromptVersion:
|
||||
```
|
||||
|
||||
**Error Handling**
|
||||
|
||||
```python
|
||||
robust_prompt = """
|
||||
{main_instruction}
|
||||
@@ -291,15 +306,18 @@ Provide partial solution with boundaries and next steps if full task cannot be c
|
||||
### Example 1: Customer Support
|
||||
|
||||
**Before**
|
||||
|
||||
```
|
||||
Answer customer questions about our product.
|
||||
```
|
||||
|
||||
**After**
|
||||
```markdown
|
||||
|
||||
````markdown
|
||||
You are a senior customer support specialist for TechCorp with 5+ years experience.
|
||||
|
||||
## Context
|
||||
|
||||
- Product: {product_name}
|
||||
- Customer Tier: {tier}
|
||||
- Issue Category: {category}
|
||||
@@ -307,9 +325,11 @@ You are a senior customer support specialist for TechCorp with 5+ years experien
|
||||
## Framework
|
||||
|
||||
### 1. Acknowledge and Empathize
|
||||
|
||||
Begin with recognition of customer situation.
|
||||
|
||||
### 2. Diagnostic Reasoning
|
||||
|
||||
<thinking>
|
||||
1. Identify core issue
|
||||
2. Consider common causes
|
||||
@@ -318,23 +338,27 @@ Begin with recognition of customer situation.
|
||||
</thinking>
|
||||
|
||||
### 3. Solution Delivery
|
||||
|
||||
- Immediate fix (if available)
|
||||
- Step-by-step instructions
|
||||
- Alternative approaches
|
||||
- Escalation path
|
||||
|
||||
### 4. Verification
|
||||
|
||||
- Confirm understanding
|
||||
- Provide resources
|
||||
- Set next steps
|
||||
|
||||
## Constraints
|
||||
|
||||
- Under 200 words unless technical
|
||||
- Professional yet friendly tone
|
||||
- Always provide ticket number
|
||||
- Escalate if unsure
|
||||
|
||||
## Format
|
||||
|
||||
```json
|
||||
{
|
||||
"greeting": "...",
|
||||
@@ -343,14 +367,18 @@ Begin with recognition of customer situation.
|
||||
"follow_up": "..."
|
||||
}
|
||||
```
|
||||
````
|
||||
|
||||
```
|
||||
|
||||
### Example 2: Data Analysis
|
||||
|
||||
**Before**
|
||||
```
|
||||
|
||||
Analyze this sales data and provide insights.
|
||||
```
|
||||
|
||||
````
|
||||
|
||||
**After**
|
||||
```python
|
||||
@@ -404,16 +432,20 @@ recommendations:
|
||||
immediate: []
|
||||
short_term: []
|
||||
long_term: []
|
||||
```
|
||||
````
|
||||
|
||||
"""
|
||||
|
||||
```
|
||||
|
||||
### Example 3: Code Generation
|
||||
|
||||
**Before**
|
||||
```
|
||||
|
||||
Write a Python function to process user data.
|
||||
```
|
||||
|
||||
````
|
||||
|
||||
**After**
|
||||
```python
|
||||
@@ -473,15 +505,17 @@ def process_user_data(raw_data: Dict[str, Any]) -> Union[ProcessedUser, Dict[str
|
||||
name=sanitize_string(raw_data['name'], 100),
|
||||
metadata={k: v for k, v in raw_data.items() if k not in required}
|
||||
)
|
||||
```
|
||||
````
|
||||
|
||||
### Self-Review
|
||||
|
||||
✓ Input validation and sanitization
|
||||
✓ Injection prevention
|
||||
✓ Error handling
|
||||
✓ Performance: O(n) complexity
|
||||
"""
|
||||
```
|
||||
|
||||
````
|
||||
|
||||
### Example 4: Meta-Prompt Generator
|
||||
|
||||
@@ -530,18 +564,20 @@ ELSE: APPLY hybrid
|
||||
Overall: []/50
|
||||
Recommendation: use_as_is | iterate | redesign
|
||||
"""
|
||||
```
|
||||
````
|
||||
|
||||
## Output Format
|
||||
|
||||
Deliver comprehensive optimization report:
|
||||
|
||||
### Optimized Prompt
|
||||
|
||||
```markdown
|
||||
[Complete production-ready prompt with all enhancements]
|
||||
```
|
||||
|
||||
### Optimization Report
|
||||
|
||||
```yaml
|
||||
analysis:
|
||||
original_assessment:
|
||||
@@ -583,6 +619,7 @@ next_steps:
|
||||
```
|
||||
|
||||
### Usage Guidelines
|
||||
|
||||
1. **Implementation**: Use optimized prompt exactly
|
||||
2. **Parameters**: Apply recommended settings
|
||||
3. **Testing**: Run test cases before production
|
||||
|
||||
@@ -20,18 +20,18 @@ Guide to selecting and optimizing embedding models for vector search application
|
||||
|
||||
### 1. Embedding Model Comparison (2026)
|
||||
|
||||
| Model | Dimensions | Max Tokens | Best For |
|
||||
|-------|------------|------------|----------|
|
||||
| **voyage-3-large** | 1024 | 32000 | Claude apps (Anthropic recommended) |
|
||||
| **voyage-3** | 1024 | 32000 | Claude apps, cost-effective |
|
||||
| **voyage-code-3** | 1024 | 32000 | Code search |
|
||||
| **voyage-finance-2** | 1024 | 32000 | Financial documents |
|
||||
| **voyage-law-2** | 1024 | 32000 | Legal documents |
|
||||
| **text-embedding-3-large** | 3072 | 8191 | OpenAI apps, high accuracy |
|
||||
| **text-embedding-3-small** | 1536 | 8191 | OpenAI apps, cost-effective |
|
||||
| **bge-large-en-v1.5** | 1024 | 512 | Open source, local deployment |
|
||||
| **all-MiniLM-L6-v2** | 384 | 256 | Fast, lightweight |
|
||||
| **multilingual-e5-large** | 1024 | 512 | Multi-language |
|
||||
| Model | Dimensions | Max Tokens | Best For |
|
||||
| -------------------------- | ---------- | ---------- | ----------------------------------- |
|
||||
| **voyage-3-large** | 1024 | 32000 | Claude apps (Anthropic recommended) |
|
||||
| **voyage-3** | 1024 | 32000 | Claude apps, cost-effective |
|
||||
| **voyage-code-3** | 1024 | 32000 | Code search |
|
||||
| **voyage-finance-2** | 1024 | 32000 | Financial documents |
|
||||
| **voyage-law-2** | 1024 | 32000 | Legal documents |
|
||||
| **text-embedding-3-large** | 3072 | 8191 | OpenAI apps, high accuracy |
|
||||
| **text-embedding-3-small** | 1536 | 8191 | OpenAI apps, cost-effective |
|
||||
| **bge-large-en-v1.5** | 1024 | 512 | Open source, local deployment |
|
||||
| **all-MiniLM-L6-v2** | 384 | 256 | Fast, lightweight |
|
||||
| **multilingual-e5-large** | 1024 | 512 | Multi-language |
|
||||
|
||||
### 2. Embedding Pipeline
|
||||
|
||||
@@ -583,6 +583,7 @@ def compare_embedding_models(
|
||||
## Best Practices
|
||||
|
||||
### Do's
|
||||
|
||||
- **Match model to use case**: Code vs prose vs multilingual
|
||||
- **Chunk thoughtfully**: Preserve semantic boundaries
|
||||
- **Normalize embeddings**: For cosine similarity search
|
||||
@@ -591,6 +592,7 @@ def compare_embedding_models(
|
||||
- **Use Voyage AI for Claude apps**: Recommended by Anthropic
|
||||
|
||||
### Don'ts
|
||||
|
||||
- **Don't ignore token limits**: Truncation loses information
|
||||
- **Don't mix embedding models**: Incompatible vector spaces
|
||||
- **Don't skip preprocessing**: Garbage in, garbage out
|
||||
|
||||
@@ -27,12 +27,12 @@ Query → ┬─► Vector Search ──► Candidates ─┐
|
||||
|
||||
### 2. Fusion Methods
|
||||
|
||||
| Method | Description | Best For |
|
||||
|--------|-------------|----------|
|
||||
| **RRF** | Reciprocal Rank Fusion | General purpose |
|
||||
| **Linear** | Weighted sum of scores | Tunable balance |
|
||||
| Method | Description | Best For |
|
||||
| ----------------- | ------------------------ | --------------- |
|
||||
| **RRF** | Reciprocal Rank Fusion | General purpose |
|
||||
| **Linear** | Weighted sum of scores | Tunable balance |
|
||||
| **Cross-encoder** | Rerank with neural model | Highest quality |
|
||||
| **Cascade** | Filter then rerank | Efficiency |
|
||||
| **Cascade** | Filter then rerank | Efficiency |
|
||||
|
||||
## Templates
|
||||
|
||||
@@ -549,6 +549,7 @@ class HybridRAGPipeline:
|
||||
## Best Practices
|
||||
|
||||
### Do's
|
||||
|
||||
- **Tune weights empirically** - Test on your data
|
||||
- **Use RRF for simplicity** - Works well without tuning
|
||||
- **Add reranking** - Significant quality improvement
|
||||
@@ -556,6 +557,7 @@ class HybridRAGPipeline:
|
||||
- **A/B test** - Measure real user impact
|
||||
|
||||
### Don'ts
|
||||
|
||||
- **Don't assume one size fits all** - Different queries need different weights
|
||||
- **Don't skip keyword search** - Handles exact matches better
|
||||
- **Don't over-fetch** - Balance recall vs latency
|
||||
|
||||
@@ -33,9 +33,11 @@ langchain-pinecone # Pinecone vector store
|
||||
## Core Concepts
|
||||
|
||||
### 1. LangGraph Agents
|
||||
|
||||
LangGraph is the standard for building agents in 2026. It provides:
|
||||
|
||||
**Key Features:**
|
||||
|
||||
- **StateGraph**: Explicit state management with typed state
|
||||
- **Durable Execution**: Agents persist through failures
|
||||
- **Human-in-the-Loop**: Inspect and modify state at any point
|
||||
@@ -43,12 +45,14 @@ LangGraph is the standard for building agents in 2026. It provides:
|
||||
- **Checkpointing**: Save and resume agent state
|
||||
|
||||
**Agent Patterns:**
|
||||
|
||||
- **ReAct**: Reasoning + Acting with `create_react_agent`
|
||||
- **Plan-and-Execute**: Separate planning and execution nodes
|
||||
- **Multi-Agent**: Supervisor routing between specialized agents
|
||||
- **Tool-Calling**: Structured tool invocation with Pydantic schemas
|
||||
|
||||
### 2. State Management
|
||||
|
||||
LangGraph uses TypedDict for explicit state:
|
||||
|
||||
```python
|
||||
@@ -69,6 +73,7 @@ class CustomState(TypedDict):
|
||||
```
|
||||
|
||||
### 3. Memory Systems
|
||||
|
||||
Modern memory implementations:
|
||||
|
||||
- **ConversationBufferMemory**: Stores all messages (short conversations)
|
||||
@@ -78,15 +83,18 @@ Modern memory implementations:
|
||||
- **LangGraph Checkpointers**: Persistent state across sessions
|
||||
|
||||
### 4. Document Processing
|
||||
|
||||
Loading, transforming, and storing documents:
|
||||
|
||||
**Components:**
|
||||
|
||||
- **Document Loaders**: Load from various sources
|
||||
- **Text Splitters**: Chunk documents intelligently
|
||||
- **Vector Stores**: Store and retrieve embeddings
|
||||
- **Retrievers**: Fetch relevant documents
|
||||
|
||||
### 5. Callbacks & Tracing
|
||||
|
||||
LangSmith is the standard for observability:
|
||||
|
||||
- Request/response logging
|
||||
|
||||
@@ -20,9 +20,11 @@ Master comprehensive evaluation strategies for LLM applications, from automated
|
||||
## Core Evaluation Types
|
||||
|
||||
### 1. Automated Metrics
|
||||
|
||||
Fast, repeatable, scalable evaluation using computed scores.
|
||||
|
||||
**Text Generation:**
|
||||
|
||||
- **BLEU**: N-gram overlap (translation)
|
||||
- **ROUGE**: Recall-oriented (summarization)
|
||||
- **METEOR**: Semantic similarity
|
||||
@@ -30,21 +32,25 @@ Fast, repeatable, scalable evaluation using computed scores.
|
||||
- **Perplexity**: Language model confidence
|
||||
|
||||
**Classification:**
|
||||
|
||||
- **Accuracy**: Percentage correct
|
||||
- **Precision/Recall/F1**: Class-specific performance
|
||||
- **Confusion Matrix**: Error patterns
|
||||
- **AUC-ROC**: Ranking quality
|
||||
|
||||
**Retrieval (RAG):**
|
||||
|
||||
- **MRR**: Mean Reciprocal Rank
|
||||
- **NDCG**: Normalized Discounted Cumulative Gain
|
||||
- **Precision@K**: Relevant in top K
|
||||
- **Recall@K**: Coverage in top K
|
||||
|
||||
### 2. Human Evaluation
|
||||
|
||||
Manual assessment for quality aspects difficult to automate.
|
||||
|
||||
**Dimensions:**
|
||||
|
||||
- **Accuracy**: Factual correctness
|
||||
- **Coherence**: Logical flow
|
||||
- **Relevance**: Answers the question
|
||||
@@ -53,9 +59,11 @@ Manual assessment for quality aspects difficult to automate.
|
||||
- **Helpfulness**: Useful to the user
|
||||
|
||||
### 3. LLM-as-Judge
|
||||
|
||||
Use stronger LLMs to evaluate weaker model outputs.
|
||||
|
||||
**Approaches:**
|
||||
|
||||
- **Pointwise**: Score individual responses
|
||||
- **Pairwise**: Compare two responses
|
||||
- **Reference-based**: Compare to gold standard
|
||||
@@ -134,6 +142,7 @@ results = await suite.evaluate(model=your_model, test_cases=test_cases)
|
||||
## Automated Metrics Implementation
|
||||
|
||||
### BLEU Score
|
||||
|
||||
```python
|
||||
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
|
||||
|
||||
@@ -149,6 +158,7 @@ def calculate_bleu(reference: str, hypothesis: str, **kwargs) -> float:
|
||||
```
|
||||
|
||||
### ROUGE Score
|
||||
|
||||
```python
|
||||
from rouge_score import rouge_scorer
|
||||
|
||||
@@ -168,6 +178,7 @@ def calculate_rouge(reference: str, hypothesis: str, **kwargs) -> dict:
|
||||
```
|
||||
|
||||
### BERTScore
|
||||
|
||||
```python
|
||||
from bert_score import score
|
||||
|
||||
@@ -192,6 +203,7 @@ def calculate_bertscore(
|
||||
```
|
||||
|
||||
### Custom Metrics
|
||||
|
||||
```python
|
||||
def calculate_groundedness(response: str, context: str, **kwargs) -> float:
|
||||
"""Check if response is grounded in provided context."""
|
||||
@@ -232,6 +244,7 @@ def calculate_factuality(claim: str, sources: list[str], **kwargs) -> float:
|
||||
## LLM-as-Judge Patterns
|
||||
|
||||
### Single Output Evaluation
|
||||
|
||||
```python
|
||||
from anthropic import Anthropic
|
||||
from pydantic import BaseModel, Field
|
||||
@@ -280,6 +293,7 @@ Provide ratings in JSON format:
|
||||
```
|
||||
|
||||
### Pairwise Comparison
|
||||
|
||||
```python
|
||||
from pydantic import BaseModel, Field
|
||||
from typing import Literal
|
||||
@@ -324,6 +338,7 @@ Answer with JSON:
|
||||
```
|
||||
|
||||
### Reference-Based Evaluation
|
||||
|
||||
```python
|
||||
class ReferenceEvaluation(BaseModel):
|
||||
semantic_similarity: float = Field(ge=0, le=1)
|
||||
@@ -371,6 +386,7 @@ Respond in JSON:
|
||||
## Human Evaluation Frameworks
|
||||
|
||||
### Annotation Guidelines
|
||||
|
||||
```python
|
||||
from dataclasses import dataclass, field
|
||||
from typing import Optional
|
||||
@@ -412,6 +428,7 @@ class AnnotationTask:
|
||||
```
|
||||
|
||||
### Inter-Rater Agreement
|
||||
|
||||
```python
|
||||
from sklearn.metrics import cohen_kappa_score
|
||||
|
||||
@@ -444,6 +461,7 @@ def calculate_agreement(
|
||||
## A/B Testing
|
||||
|
||||
### Statistical Testing Framework
|
||||
|
||||
```python
|
||||
from scipy import stats
|
||||
import numpy as np
|
||||
@@ -504,6 +522,7 @@ class ABTest:
|
||||
## Regression Testing
|
||||
|
||||
### Regression Detection
|
||||
|
||||
```python
|
||||
from dataclasses import dataclass
|
||||
|
||||
@@ -595,6 +614,7 @@ print(f"Mean score: {experiment_results.aggregate_metrics['qa']['mean']}")
|
||||
## Benchmarking
|
||||
|
||||
### Running Benchmarks
|
||||
|
||||
```python
|
||||
from dataclasses import dataclass
|
||||
import numpy as np
|
||||
|
||||
@@ -21,6 +21,7 @@ Master advanced prompt engineering techniques to maximize LLM performance, relia
|
||||
## Core Capabilities
|
||||
|
||||
### 1. Few-Shot Learning
|
||||
|
||||
- Example selection strategies (semantic similarity, diversity sampling)
|
||||
- Balancing example count with context window constraints
|
||||
- Constructing effective demonstrations with input-output pairs
|
||||
@@ -28,6 +29,7 @@ Master advanced prompt engineering techniques to maximize LLM performance, relia
|
||||
- Handling edge cases through strategic example selection
|
||||
|
||||
### 2. Chain-of-Thought Prompting
|
||||
|
||||
- Step-by-step reasoning elicitation
|
||||
- Zero-shot CoT with "Let's think step by step"
|
||||
- Few-shot CoT with reasoning traces
|
||||
@@ -35,12 +37,14 @@ Master advanced prompt engineering techniques to maximize LLM performance, relia
|
||||
- Verification and validation steps
|
||||
|
||||
### 3. Structured Outputs
|
||||
|
||||
- JSON mode for reliable parsing
|
||||
- Pydantic schema enforcement
|
||||
- Type-safe response handling
|
||||
- Error handling for malformed outputs
|
||||
|
||||
### 4. Prompt Optimization
|
||||
|
||||
- Iterative refinement workflows
|
||||
- A/B testing prompt variations
|
||||
- Measuring prompt performance metrics (accuracy, consistency, latency)
|
||||
@@ -48,6 +52,7 @@ Master advanced prompt engineering techniques to maximize LLM performance, relia
|
||||
- Handling edge cases and failure modes
|
||||
|
||||
### 5. Template Systems
|
||||
|
||||
- Variable interpolation and formatting
|
||||
- Conditional prompt sections
|
||||
- Multi-turn conversation templates
|
||||
@@ -55,6 +60,7 @@ Master advanced prompt engineering techniques to maximize LLM performance, relia
|
||||
- Modular prompt components
|
||||
|
||||
### 6. System Prompt Design
|
||||
|
||||
- Setting model behavior and constraints
|
||||
- Defining output formats and structure
|
||||
- Establishing role and expertise
|
||||
@@ -395,6 +401,7 @@ Response:"""
|
||||
## Performance Optimization
|
||||
|
||||
### Token Efficiency
|
||||
|
||||
```python
|
||||
# Before: Verbose prompt (150+ tokens)
|
||||
verbose_prompt = """
|
||||
@@ -457,6 +464,7 @@ response = client.messages.create(
|
||||
## Success Metrics
|
||||
|
||||
Track these KPIs for your prompts:
|
||||
|
||||
- **Accuracy**: Correctness of outputs
|
||||
- **Consistency**: Reproducibility across similar inputs
|
||||
- **Latency**: Response time (P50, P95, P99)
|
||||
|
||||
@@ -3,6 +3,7 @@
|
||||
## Classification Templates
|
||||
|
||||
### Sentiment Analysis
|
||||
|
||||
```
|
||||
Classify the sentiment of the following text as Positive, Negative, or Neutral.
|
||||
|
||||
@@ -12,6 +13,7 @@ Sentiment:
|
||||
```
|
||||
|
||||
### Intent Detection
|
||||
|
||||
```
|
||||
Determine the user's intent from the following message.
|
||||
|
||||
@@ -23,6 +25,7 @@ Intent:
|
||||
```
|
||||
|
||||
### Topic Classification
|
||||
|
||||
```
|
||||
Classify the following article into one of these categories: {categories}
|
||||
|
||||
@@ -35,6 +38,7 @@ Category:
|
||||
## Extraction Templates
|
||||
|
||||
### Named Entity Recognition
|
||||
|
||||
```
|
||||
Extract all named entities from the text and categorize them.
|
||||
|
||||
@@ -50,6 +54,7 @@ Entities (JSON format):
|
||||
```
|
||||
|
||||
### Structured Data Extraction
|
||||
|
||||
```
|
||||
Extract structured information from the job posting.
|
||||
|
||||
@@ -70,6 +75,7 @@ Extracted Information (JSON):
|
||||
## Generation Templates
|
||||
|
||||
### Email Generation
|
||||
|
||||
```
|
||||
Write a professional {email_type} email.
|
||||
|
||||
@@ -84,6 +90,7 @@ Body:
|
||||
```
|
||||
|
||||
### Code Generation
|
||||
|
||||
```
|
||||
Generate {language} code for the following task:
|
||||
|
||||
@@ -101,6 +108,7 @@ Code:
|
||||
```
|
||||
|
||||
### Creative Writing
|
||||
|
||||
```
|
||||
Write a {length}-word {style} story about {topic}.
|
||||
|
||||
@@ -115,6 +123,7 @@ Story:
|
||||
## Transformation Templates
|
||||
|
||||
### Summarization
|
||||
|
||||
```
|
||||
Summarize the following text in {num_sentences} sentences.
|
||||
|
||||
@@ -125,6 +134,7 @@ Summary:
|
||||
```
|
||||
|
||||
### Translation with Context
|
||||
|
||||
```
|
||||
Translate the following {source_lang} text to {target_lang}.
|
||||
|
||||
@@ -137,6 +147,7 @@ Translation:
|
||||
```
|
||||
|
||||
### Format Conversion
|
||||
|
||||
```
|
||||
Convert the following {source_format} to {target_format}.
|
||||
|
||||
@@ -149,6 +160,7 @@ Output ({target_format}):
|
||||
## Analysis Templates
|
||||
|
||||
### Code Review
|
||||
|
||||
```
|
||||
Review the following code for:
|
||||
1. Bugs and errors
|
||||
@@ -163,6 +175,7 @@ Review:
|
||||
```
|
||||
|
||||
### SWOT Analysis
|
||||
|
||||
```
|
||||
Conduct a SWOT analysis for: {subject}
|
||||
|
||||
@@ -185,6 +198,7 @@ Threats:
|
||||
## Question Answering Templates
|
||||
|
||||
### RAG Template
|
||||
|
||||
```
|
||||
Answer the question based on the provided context. If the context doesn't contain enough information, say so.
|
||||
|
||||
@@ -197,6 +211,7 @@ Answer:
|
||||
```
|
||||
|
||||
### Multi-Turn Q&A
|
||||
|
||||
```
|
||||
Previous conversation:
|
||||
{conversation_history}
|
||||
@@ -209,6 +224,7 @@ Answer (continue naturally from conversation):
|
||||
## Specialized Templates
|
||||
|
||||
### SQL Query Generation
|
||||
|
||||
```
|
||||
Generate a SQL query for the following request.
|
||||
|
||||
@@ -221,6 +237,7 @@ SQL Query:
|
||||
```
|
||||
|
||||
### Regex Pattern Creation
|
||||
|
||||
```
|
||||
Create a regex pattern to match: {requirement}
|
||||
|
||||
@@ -234,6 +251,7 @@ Regex pattern:
|
||||
```
|
||||
|
||||
### API Documentation
|
||||
|
||||
```
|
||||
Generate API documentation for this function:
|
||||
|
||||
|
||||
@@ -7,6 +7,7 @@ Chain-of-Thought (CoT) prompting elicits step-by-step reasoning from LLMs, drama
|
||||
## Core Techniques
|
||||
|
||||
### Zero-Shot CoT
|
||||
|
||||
Add a simple trigger phrase to elicit reasoning:
|
||||
|
||||
```python
|
||||
@@ -29,6 +30,7 @@ prompt = zero_shot_cot(query)
|
||||
```
|
||||
|
||||
### Few-Shot CoT
|
||||
|
||||
Provide examples with explicit reasoning chains:
|
||||
|
||||
```python
|
||||
@@ -53,6 +55,7 @@ A: Let's think step by step:"""
|
||||
```
|
||||
|
||||
### Self-Consistency
|
||||
|
||||
Generate multiple reasoning paths and take the majority vote:
|
||||
|
||||
```python
|
||||
@@ -85,6 +88,7 @@ def self_consistency_cot(query, n=5, temperature=0.7):
|
||||
## Advanced Patterns
|
||||
|
||||
### Least-to-Most Prompting
|
||||
|
||||
Break complex problems into simpler subproblems:
|
||||
|
||||
```python
|
||||
@@ -125,6 +129,7 @@ Final Answer:"""
|
||||
```
|
||||
|
||||
### Tree-of-Thought (ToT)
|
||||
|
||||
Explore multiple reasoning branches:
|
||||
|
||||
```python
|
||||
@@ -176,6 +181,7 @@ Score:"""
|
||||
```
|
||||
|
||||
### Verification Step
|
||||
|
||||
Add explicit verification to catch errors:
|
||||
|
||||
```python
|
||||
@@ -220,6 +226,7 @@ Corrected solution:"""
|
||||
## Domain-Specific CoT
|
||||
|
||||
### Math Problems
|
||||
|
||||
```python
|
||||
math_cot_template = """
|
||||
Problem: {problem}
|
||||
@@ -248,6 +255,7 @@ Answer: {final_answer}
|
||||
```
|
||||
|
||||
### Code Debugging
|
||||
|
||||
```python
|
||||
debug_cot_template = """
|
||||
Code with error:
|
||||
@@ -278,6 +286,7 @@ Fixed code:
|
||||
```
|
||||
|
||||
### Logical Reasoning
|
||||
|
||||
```python
|
||||
logic_cot_template = """
|
||||
Premises:
|
||||
@@ -305,6 +314,7 @@ Answer: {final_answer}
|
||||
## Performance Optimization
|
||||
|
||||
### Caching Reasoning Patterns
|
||||
|
||||
```python
|
||||
class ReasoningCache:
|
||||
def __init__(self):
|
||||
@@ -328,6 +338,7 @@ class ReasoningCache:
|
||||
```
|
||||
|
||||
### Adaptive Reasoning Depth
|
||||
|
||||
```python
|
||||
def adaptive_cot(problem, initial_depth=3):
|
||||
depth = initial_depth
|
||||
@@ -378,6 +389,7 @@ def evaluate_cot_quality(reasoning_chain):
|
||||
## When to Use CoT
|
||||
|
||||
**Use CoT for:**
|
||||
|
||||
- Math and arithmetic problems
|
||||
- Logical reasoning tasks
|
||||
- Multi-step planning
|
||||
@@ -385,6 +397,7 @@ def evaluate_cot_quality(reasoning_chain):
|
||||
- Complex decision making
|
||||
|
||||
**Skip CoT for:**
|
||||
|
||||
- Simple factual queries
|
||||
- Direct lookups
|
||||
- Creative writing
|
||||
|
||||
@@ -7,6 +7,7 @@ Few-shot learning enables LLMs to perform tasks by providing a small number of e
|
||||
## Example Selection Strategies
|
||||
|
||||
### 1. Semantic Similarity
|
||||
|
||||
Select examples most similar to the input query using embedding-based retrieval.
|
||||
|
||||
```python
|
||||
@@ -29,6 +30,7 @@ class SemanticExampleSelector:
|
||||
**Best For**: Question answering, text classification, extraction tasks
|
||||
|
||||
### 2. Diversity Sampling
|
||||
|
||||
Maximize coverage of different patterns and edge cases.
|
||||
|
||||
```python
|
||||
@@ -58,6 +60,7 @@ class DiversityExampleSelector:
|
||||
**Best For**: Demonstrating task variability, edge case handling
|
||||
|
||||
### 3. Difficulty-Based Selection
|
||||
|
||||
Gradually increase example complexity to scaffold learning.
|
||||
|
||||
```python
|
||||
@@ -75,6 +78,7 @@ class ProgressiveExampleSelector:
|
||||
**Best For**: Complex reasoning tasks, code generation
|
||||
|
||||
### 4. Error-Based Selection
|
||||
|
||||
Include examples that address common failure modes.
|
||||
|
||||
```python
|
||||
@@ -98,6 +102,7 @@ class ErrorGuidedSelector:
|
||||
## Example Construction Best Practices
|
||||
|
||||
### Format Consistency
|
||||
|
||||
All examples should follow identical formatting:
|
||||
|
||||
```python
|
||||
@@ -121,6 +126,7 @@ examples = [
|
||||
```
|
||||
|
||||
### Input-Output Alignment
|
||||
|
||||
Ensure examples demonstrate the exact task you want the model to perform:
|
||||
|
||||
```python
|
||||
@@ -138,6 +144,7 @@ example = {
|
||||
```
|
||||
|
||||
### Complexity Balance
|
||||
|
||||
Include examples spanning the expected difficulty range:
|
||||
|
||||
```python
|
||||
@@ -156,6 +163,7 @@ examples = [
|
||||
## Context Window Management
|
||||
|
||||
### Token Budget Allocation
|
||||
|
||||
Typical distribution for a 4K context window:
|
||||
|
||||
```
|
||||
@@ -166,6 +174,7 @@ Response: 1500 tokens (38%)
|
||||
```
|
||||
|
||||
### Dynamic Example Truncation
|
||||
|
||||
```python
|
||||
class TokenAwareSelector:
|
||||
def __init__(self, examples, tokenizer, max_tokens=1500):
|
||||
@@ -197,6 +206,7 @@ class TokenAwareSelector:
|
||||
## Edge Case Handling
|
||||
|
||||
### Include Boundary Examples
|
||||
|
||||
```python
|
||||
edge_case_examples = [
|
||||
# Empty input
|
||||
@@ -216,6 +226,7 @@ edge_case_examples = [
|
||||
## Few-Shot Prompt Templates
|
||||
|
||||
### Classification Template
|
||||
|
||||
```python
|
||||
def build_classification_prompt(examples, query, labels):
|
||||
prompt = f"Classify the text into one of these categories: {', '.join(labels)}\n\n"
|
||||
@@ -228,6 +239,7 @@ def build_classification_prompt(examples, query, labels):
|
||||
```
|
||||
|
||||
### Extraction Template
|
||||
|
||||
```python
|
||||
def build_extraction_prompt(examples, query):
|
||||
prompt = "Extract structured information from the text.\n\n"
|
||||
@@ -240,6 +252,7 @@ def build_extraction_prompt(examples, query):
|
||||
```
|
||||
|
||||
### Transformation Template
|
||||
|
||||
```python
|
||||
def build_transformation_prompt(examples, query):
|
||||
prompt = "Transform the input according to the pattern shown in examples.\n\n"
|
||||
@@ -254,6 +267,7 @@ def build_transformation_prompt(examples, query):
|
||||
## Evaluation and Optimization
|
||||
|
||||
### Example Quality Metrics
|
||||
|
||||
```python
|
||||
def evaluate_example_quality(example, validation_set):
|
||||
metrics = {
|
||||
@@ -266,6 +280,7 @@ def evaluate_example_quality(example, validation_set):
|
||||
```
|
||||
|
||||
### A/B Testing Example Sets
|
||||
|
||||
```python
|
||||
class ExampleSetTester:
|
||||
def __init__(self, llm_client):
|
||||
@@ -295,6 +310,7 @@ class ExampleSetTester:
|
||||
## Advanced Techniques
|
||||
|
||||
### Meta-Learning (Learning to Select)
|
||||
|
||||
Train a small model to predict which examples will be most effective:
|
||||
|
||||
```python
|
||||
@@ -334,6 +350,7 @@ class LearnedExampleSelector:
|
||||
```
|
||||
|
||||
### Adaptive Example Count
|
||||
|
||||
Dynamically adjust the number of examples based on task difficulty:
|
||||
|
||||
```python
|
||||
|
||||
@@ -3,6 +3,7 @@
|
||||
## Systematic Refinement Process
|
||||
|
||||
### 1. Baseline Establishment
|
||||
|
||||
```python
|
||||
def establish_baseline(prompt, test_cases):
|
||||
results = {
|
||||
@@ -26,6 +27,7 @@ def establish_baseline(prompt, test_cases):
|
||||
```
|
||||
|
||||
### 2. Iterative Refinement Workflow
|
||||
|
||||
```
|
||||
Initial Prompt → Test → Analyze Failures → Refine → Test → Repeat
|
||||
```
|
||||
@@ -64,6 +66,7 @@ class PromptOptimizer:
|
||||
```
|
||||
|
||||
### 3. A/B Testing Framework
|
||||
|
||||
```python
|
||||
class PromptABTest:
|
||||
def __init__(self, variant_a, variant_b):
|
||||
@@ -116,6 +119,7 @@ class PromptABTest:
|
||||
## Optimization Strategies
|
||||
|
||||
### Token Reduction
|
||||
|
||||
```python
|
||||
def optimize_for_tokens(prompt):
|
||||
optimizations = [
|
||||
@@ -144,6 +148,7 @@ def optimize_for_tokens(prompt):
|
||||
```
|
||||
|
||||
### Latency Reduction
|
||||
|
||||
```python
|
||||
def optimize_for_latency(prompt):
|
||||
strategies = {
|
||||
@@ -167,6 +172,7 @@ def optimize_for_latency(prompt):
|
||||
```
|
||||
|
||||
### Accuracy Improvement
|
||||
|
||||
```python
|
||||
def improve_accuracy(prompt, failure_cases):
|
||||
improvements = []
|
||||
@@ -194,6 +200,7 @@ def improve_accuracy(prompt, failure_cases):
|
||||
## Performance Metrics
|
||||
|
||||
### Core Metrics
|
||||
|
||||
```python
|
||||
class PromptMetrics:
|
||||
@staticmethod
|
||||
@@ -230,6 +237,7 @@ class PromptMetrics:
|
||||
```
|
||||
|
||||
### Automated Evaluation
|
||||
|
||||
```python
|
||||
def evaluate_prompt_comprehensively(prompt, test_suite):
|
||||
results = {
|
||||
@@ -274,6 +282,7 @@ def evaluate_prompt_comprehensively(prompt, test_suite):
|
||||
## Failure Analysis
|
||||
|
||||
### Categorizing Failures
|
||||
|
||||
```python
|
||||
class FailureAnalyzer:
|
||||
def categorize_failures(self, test_results):
|
||||
@@ -326,6 +335,7 @@ class FailureAnalyzer:
|
||||
## Versioning and Rollback
|
||||
|
||||
### Prompt Version Control
|
||||
|
||||
```python
|
||||
class PromptVersionControl:
|
||||
def __init__(self, storage_path):
|
||||
@@ -381,24 +391,28 @@ class PromptVersionControl:
|
||||
## Common Optimization Patterns
|
||||
|
||||
### Pattern 1: Add Structure
|
||||
|
||||
```
|
||||
Before: "Analyze this text"
|
||||
After: "Analyze this text for:\n1. Main topic\n2. Key arguments\n3. Conclusion"
|
||||
```
|
||||
|
||||
### Pattern 2: Add Examples
|
||||
|
||||
```
|
||||
Before: "Extract entities"
|
||||
After: "Extract entities\\n\\nExample:\\nText: Apple released iPhone\\nEntities: {company: Apple, product: iPhone}"
|
||||
```
|
||||
|
||||
### Pattern 3: Add Constraints
|
||||
|
||||
```
|
||||
Before: "Summarize this"
|
||||
After: "Summarize in exactly 3 bullet points, 15 words each"
|
||||
```
|
||||
|
||||
### Pattern 4: Add Verification
|
||||
|
||||
```
|
||||
Before: "Calculate..."
|
||||
After: "Calculate... Then verify your calculation is correct before responding."
|
||||
|
||||
@@ -3,6 +3,7 @@
|
||||
## Template Architecture
|
||||
|
||||
### Basic Template Structure
|
||||
|
||||
```python
|
||||
class PromptTemplate:
|
||||
def __init__(self, template_string, variables=None):
|
||||
@@ -30,6 +31,7 @@ prompt = template.render(
|
||||
```
|
||||
|
||||
### Conditional Templates
|
||||
|
||||
```python
|
||||
class ConditionalTemplate(PromptTemplate):
|
||||
def render(self, **kwargs):
|
||||
@@ -84,6 +86,7 @@ Reference examples:
|
||||
```
|
||||
|
||||
### Modular Template Composition
|
||||
|
||||
```python
|
||||
class ModularTemplate:
|
||||
def __init__(self):
|
||||
@@ -133,6 +136,7 @@ advanced_prompt = builder.render(
|
||||
## Common Template Patterns
|
||||
|
||||
### Classification Template
|
||||
|
||||
```python
|
||||
CLASSIFICATION_TEMPLATE = """
|
||||
Classify the following {content_type} into one of these categories: {categories}
|
||||
@@ -153,6 +157,7 @@ Category:"""
|
||||
```
|
||||
|
||||
### Extraction Template
|
||||
|
||||
```python
|
||||
EXTRACTION_TEMPLATE = """
|
||||
Extract structured information from the {content_type}.
|
||||
@@ -171,6 +176,7 @@ Extracted information (JSON):"""
|
||||
```
|
||||
|
||||
### Generation Template
|
||||
|
||||
```python
|
||||
GENERATION_TEMPLATE = """
|
||||
Generate {output_type} based on the following {input_type}.
|
||||
@@ -198,6 +204,7 @@ Examples:
|
||||
```
|
||||
|
||||
### Transformation Template
|
||||
|
||||
```python
|
||||
TRANSFORMATION_TEMPLATE = """
|
||||
Transform the input {source_format} to {target_format}.
|
||||
@@ -219,6 +226,7 @@ Output {target_format}:"""
|
||||
## Advanced Features
|
||||
|
||||
### Template Inheritance
|
||||
|
||||
```python
|
||||
class TemplateRegistry:
|
||||
def __init__(self):
|
||||
@@ -251,6 +259,7 @@ registry.register('sentiment_analysis', {
|
||||
```
|
||||
|
||||
### Variable Validation
|
||||
|
||||
```python
|
||||
class ValidatedTemplate:
|
||||
def __init__(self, template, schema):
|
||||
@@ -294,6 +303,7 @@ template = ValidatedTemplate(
|
||||
```
|
||||
|
||||
### Template Caching
|
||||
|
||||
```python
|
||||
class CachedTemplate:
|
||||
def __init__(self, template):
|
||||
@@ -323,6 +333,7 @@ class CachedTemplate:
|
||||
## Multi-Turn Templates
|
||||
|
||||
### Conversation Template
|
||||
|
||||
```python
|
||||
class ConversationTemplate:
|
||||
def __init__(self, system_prompt):
|
||||
@@ -349,6 +360,7 @@ class ConversationTemplate:
|
||||
```

### State-Based Templates

```python
class StatefulTemplate:
    def __init__(self):
@@ -406,6 +418,7 @@ Here's the result: {result}
## Template Libraries

### Question Answering

```python
QA_TEMPLATES = {
    'factual': """Answer the question based on the context.
@@ -432,6 +445,7 @@ Assistant:"""
```

### Content Generation

```python
GENERATION_TEMPLATES = {
    'blog_post': """Write a blog post about {topic}.

@@ -11,6 +11,7 @@ System prompts set the foundation for LLM behavior. They define role, expertise,
```

### Example: Code Assistant

```
You are an expert software engineer with deep knowledge of Python, JavaScript, and system design.

@@ -36,6 +37,7 @@ Output format:
## Pattern Library

### 1. Customer Support Agent

```
You are a friendly, empathetic customer support representative for {company_name}.

@@ -59,6 +61,7 @@ Constraints:
```

### 2. Data Analyst

```
You are an experienced data analyst specializing in business intelligence.

@@ -85,6 +88,7 @@ Output:
```

### 3. Content Editor

```
You are a professional editor with expertise in {content_type}.

@@ -112,6 +116,7 @@ Format your feedback as:
## Advanced Techniques

### Dynamic Role Adaptation

```python
def build_adaptive_system_prompt(task_type, difficulty):
    base = "You are an expert assistant"
@@ -136,6 +141,7 @@ Expertise level: {difficulty}
```

### Constraint Specification

```
Hard constraints (MUST follow):
- Never generate harmful, biased, or illegal content

@@ -20,9 +20,11 @@ Master Retrieval-Augmented Generation (RAG) to build LLM applications that provi
## Core Components

### 1. Vector Databases

**Purpose**: Store and retrieve document embeddings efficiently

**Options:**

- **Pinecone**: Managed, scalable, serverless
- **Weaviate**: Open-source, hybrid search, GraphQL
- **Milvus**: High performance, on-premise
@@ -31,6 +33,7 @@ Master Retrieval-Augmented Generation (RAG) to build LLM applications that provi
- **pgvector**: PostgreSQL extension, SQL integration

### 2. Embeddings

**Purpose**: Convert text to numerical vectors for similarity search

**Models (2026):**

@@ -44,7 +47,9 @@ Master Retrieval-Augmented Generation (RAG) to build LLM applications that provi
| **multilingual-e5-large** | 1024 | Multi-language support |

### 3. Retrieval Strategies

**Approaches:**

- **Dense Retrieval**: Semantic similarity via embeddings
- **Sparse Retrieval**: Keyword matching (BM25, TF-IDF)
- **Hybrid Search**: Combine dense + sparse with weighted fusion
@@ -52,9 +57,11 @@ Master Retrieval-Augmented Generation (RAG) to build LLM applications that provi
- **HyDE**: Generate hypothetical documents for better retrieval
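The hybrid-search bullet above reduces to a weighted score fusion. A minimal pure-Python sketch, assuming scores are already min-max normalized per retriever (`hybrid_fuse` and the document IDs are illustrative, not from the source):

```python
def hybrid_fuse(dense_scores, sparse_scores, alpha=0.7):
    """Combine dense and sparse retrieval scores with weighted fusion.

    dense_scores / sparse_scores: dicts mapping doc_id -> normalized score.
    alpha: weight on the dense (semantic) side; 1 - alpha on sparse.
    """
    doc_ids = set(dense_scores) | set(sparse_scores)
    fused = {
        doc_id: alpha * dense_scores.get(doc_id, 0.0)
        + (1 - alpha) * sparse_scores.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


ranking = hybrid_fuse(
    dense_scores={"doc1": 0.9, "doc2": 0.4},
    sparse_scores={"doc2": 1.0, "doc3": 0.8},
)
```

Documents missing from one retriever simply score zero on that side, so a strong keyword hit can still surface a document the dense retriever missed.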

### 4. Reranking

**Purpose**: Improve retrieval quality by reordering results

**Methods:**

- **Cross-Encoders**: BERT-based reranking (ms-marco-MiniLM)
- **Cohere Rerank**: API-based reranking
- **Maximal Marginal Relevance (MMR)**: Diversity + relevance
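MMR from the list above is simple to implement directly over precomputed similarities. A minimal greedy sketch in pure Python (function and document names are illustrative):

```python
def mmr(query_sim, doc_sim, k=2, lambda_mult=0.5):
    """Greedy Maximal Marginal Relevance selection.

    query_sim: dict doc_id -> similarity to the query.
    doc_sim: dict (doc_id, doc_id) -> pairwise document similarity.
    lambda_mult: 1.0 = pure relevance, 0.0 = pure diversity.
    """
    selected = []
    candidates = set(query_sim)
    while candidates and len(selected) < k:
        def score(d):
            # Penalize similarity to anything already selected.
            redundancy = max(
                (doc_sim.get((d, s), doc_sim.get((s, d), 0.0)) for s in selected),
                default=0.0,
            )
            return lambda_mult * query_sim[d] - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


picked = mmr(
    query_sim={"a": 0.9, "b": 0.85, "c": 0.5},
    doc_sim={("a", "b"): 0.95, ("a", "c"): 0.1, ("b", "c"): 0.1},
    k=2,
)
# "a" wins on relevance; "c" then beats the near-duplicate "b" on diversity.
```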

@@ -255,6 +262,7 @@ hyde_rag = builder.compile()
## Document Chunking Strategies

### Recursive Character Text Splitter

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

@@ -269,6 +277,7 @@ chunks = splitter.split_documents(documents)
```

### Token-Based Splitting

```python
from langchain_text_splitters import TokenTextSplitter

@@ -280,6 +289,7 @@ splitter = TokenTextSplitter(
```

### Semantic Chunking

```python
from langchain_experimental.text_splitter import SemanticChunker

@@ -291,6 +301,7 @@ splitter = SemanticChunker(
```

### Markdown Header Splitter

```python
from langchain_text_splitters import MarkdownHeaderTextSplitter

@@ -309,6 +320,7 @@ splitter = MarkdownHeaderTextSplitter(
## Vector Store Configurations

### Pinecone (Serverless)

```python
from pinecone import Pinecone, ServerlessSpec
from langchain_pinecone import PineconeVectorStore
@@ -331,6 +343,7 @@ vectorstore = PineconeVectorStore(index=index, embedding=embeddings)
```

### Weaviate

```python
import weaviate
from langchain_weaviate import WeaviateVectorStore
@@ -346,6 +359,7 @@ vectorstore = WeaviateVectorStore(
```

### Chroma (Local Development)

```python
from langchain_chroma import Chroma

@@ -357,6 +371,7 @@ vectorstore = Chroma(
```

### pgvector (PostgreSQL)

```python
from langchain_postgres.vectorstores import PGVector

@@ -372,6 +387,7 @@ vectorstore = PGVector(
## Retrieval Optimization

### 1. Metadata Filtering

```python
from langchain_core.documents import Document

@@ -394,6 +410,7 @@ results = await vectorstore.asimilarity_search(
```

### 2. Maximal Marginal Relevance (MMR)

```python
# Balance relevance with diversity
results = await vectorstore.amax_marginal_relevance_search(
@@ -405,6 +422,7 @@ results = await vectorstore.amax_marginal_relevance_search(
```

### 3. Reranking with Cross-Encoder

```python
from sentence_transformers import CrossEncoder

@@ -424,6 +442,7 @@ async def retrieve_and_rerank(query: str, k: int = 5) -> list[Document]:
```

### 4. Cohere Rerank

```python
from langchain_cohere import CohereRerank
@@ -440,6 +459,7 @@ reranked_retriever = ContextualCompressionRetriever(
## Prompt Engineering for RAG

### Contextual Prompt with Citations

```python
rag_prompt = ChatPromptTemplate.from_template(
    """Answer the question based on the context below. Include citations using [1], [2], etc.
@@ -461,6 +481,7 @@ rag_prompt = ChatPromptTemplate.from_template(
```

### Structured Output for RAG

```python
from pydantic import BaseModel, Field

@@ -20,12 +20,12 @@ Patterns for implementing efficient similarity search in production systems.

### 1. Distance Metrics

| Metric | Formula | Best For |
|--------|---------|----------|
| **Cosine** | 1 - (A·B)/(‖A‖‖B‖) | Normalized embeddings |
| **Euclidean (L2)** | √Σ(a-b)² | Raw embeddings |
| **Dot Product** | A·B | Magnitude matters |
| **Manhattan (L1)** | Σ\|a-b\| | Sparse vectors |
| Metric | Formula | Best For |
| ------------------ | ------------------ | --------------------- |
| **Cosine** | 1 - (A·B)/(‖A‖‖B‖) | Normalized embeddings |
| **Euclidean (L2)** | √Σ(a-b)² | Raw embeddings |
| **Dot Product** | A·B | Magnitude matters |
| **Manhattan (L1)** | Σ\|a-b\| | Sparse vectors |
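The four metrics in the table translate directly to code; a small stdlib-only sketch matching the formulas above:

```python
import math

def cosine_distance(a, b):
    # 1 - (A·B) / (‖A‖‖B‖)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1 - dot / (norm_a * norm_b)

def euclidean(a, b):
    # √Σ(a-b)²
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dot_product(a, b):
    # A·B — larger is more similar, so negate if a distance is needed
    return sum(x * y for x, y in zip(a, b))

def manhattan(a, b):
    # Σ|a-b|
    return sum(abs(x - y) for x, y in zip(a, b))
```

For two orthogonal unit vectors such as `[1, 0]` and `[0, 1]`, these give cosine distance 1, Euclidean √2, dot product 0, and Manhattan 2.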

### 2. Index Types

@@ -538,6 +538,7 @@ class WeaviateVectorStore:
## Best Practices

### Do's

- **Use appropriate index** - HNSW for most cases
- **Tune parameters** - ef_search, nprobe for recall/speed
- **Implement hybrid search** - Combine with keyword search
@@ -545,6 +546,7 @@ class WeaviateVectorStore:
- **Pre-filter when possible** - Reduce search space

### Don'ts

- **Don't skip evaluation** - Measure before optimizing
- **Don't over-index** - Start with flat, scale up
- **Don't ignore latency** - P99 matters for UX

@@ -31,11 +31,11 @@ Data Size Recommended Index

### 2. HNSW Parameters

| Parameter | Default | Effect |
|-----------|---------|--------|
| **M** | 16 | Connections per node, ↑ = better recall, more memory |
| **efConstruction** | 100 | Build quality, ↑ = better index, slower build |
| **efSearch** | 50 | Search quality, ↑ = better recall, slower search |
| Parameter | Default | Effect |
| ------------------ | ------- | ---------------------------------------------------- |
| **M** | 16 | Connections per node, ↑ = better recall, more memory |
| **efConstruction** | 100 | Build quality, ↑ = better index, slower build |
| **efSearch** | 50 | Search quality, ↑ = better recall, slower search |

### 3. Quantization Types

@@ -502,6 +502,7 @@ def profile_index_build(
## Best Practices

### Do's

- **Benchmark with real queries** - Synthetic may not represent production
- **Monitor recall continuously** - Can degrade with data drift
- **Start with defaults** - Tune only when needed
@@ -509,6 +510,7 @@ def profile_index_build(
- **Consider tiered storage** - Hot/cold data separation
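Recall monitoring from the Do's above reduces to comparing approximate results against an exact brute-force baseline; a minimal recall@k sketch in pure Python (helper name and IDs are illustrative, not from the source):

```python
def recall_at_k(approx_ids, exact_ids, k=10):
    """Fraction of the true top-k neighbors the ANN index returned."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k


# Hypothetical example: the ANN index missed one of the ten true neighbors.
exact = [3, 7, 1, 9, 4, 0, 8, 2, 6, 5]
approx = [3, 7, 1, 9, 4, 0, 8, 2, 6, 42]
score = recall_at_k(approx, exact, k=10)
```

Sampling live queries and recomputing this against periodic brute-force runs is what catches recall degradation from data drift.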

### Don'ts

- **Don't over-optimize early** - Profile first
- **Don't ignore build time** - Index updates have cost
- **Don't forget reindexing** - Plan for maintenance