style: format all files with prettier

This commit is contained in:
Seth Hobson
2026-01-19 17:07:03 -05:00
parent 8d37048deb
commit 56848874a2
355 changed files with 15215 additions and 10241 deletions


@@ -12,12 +12,14 @@ Build production-ready LLM applications, advanced RAG systems, and intelligent a
## Features
### Core Capabilities
- **RAG Systems**: Production retrieval-augmented generation with hybrid search
- **Vector Search**: Pinecone, Qdrant, Weaviate, Milvus, pgvector optimization
- **Agent Architectures**: LangGraph-based agents with memory and tool use
- **Prompt Engineering**: Advanced prompting techniques with model-specific optimization
### Key Technologies
- LangChain 1.x / LangGraph for agent workflows
- Voyage AI, OpenAI, and open-source embedding models
- HNSW, IVF, and Product Quantization index strategies
@@ -25,31 +27,31 @@ Build production-ready LLM applications, advanced RAG systems, and intelligent a
## Agents
| Agent | Description |
| -------------------------- | -------------------------------------------------------------------------- |
| `ai-engineer` | Production-grade LLM applications, RAG systems, and agent architectures |
| `prompt-engineer` | Advanced prompting techniques, constitutional AI, and model optimization |
| `vector-database-engineer` | Vector search implementation, embedding strategies, and semantic retrieval |
## Skills
| Skill | Description |
| ------------------------------ | ----------------------------------------------------------- |
| `langchain-architecture` | LangGraph StateGraph patterns, memory, and tool integration |
| `rag-implementation` | RAG systems with hybrid search and reranking |
| `llm-evaluation` | Evaluation frameworks for LLM applications |
| `prompt-engineering-patterns` | Chain-of-thought, few-shot, and structured outputs |
| `embedding-strategies` | Embedding model selection and optimization |
| `similarity-search-patterns` | Vector similarity search implementation |
| `vector-index-tuning` | HNSW, IVF, and quantization optimization |
| `hybrid-search-implementation` | Vector + keyword search fusion |
## Commands
| Command | Description |
| -------------------------------------- | ------------------------------- |
| `/llm-application-dev:langchain-agent` | Create LangGraph-based agent |
| `/llm-application-dev:ai-assistant` | Build AI assistant application |
| `/llm-application-dev:prompt-optimize` | Optimize prompts for production |
## Installation
@@ -69,6 +71,7 @@ Or copy to your project's `.claude-plugin/` directory.
## Changelog
### 2.0.0 (January 2026)
- **Breaking**: Migrated from LangChain 0.x to LangChain 1.x/LangGraph
- **Breaking**: Updated model references to Claude 4.5 and GPT-5.2
- Added Voyage AI as primary embedding recommendation for Claude apps
@@ -79,6 +82,7 @@ Or copy to your project's `.claude-plugin/` directory.
- Updated hybrid search with modern Pinecone client API
### 1.2.2
- Minor bug fixes and documentation updates
## License


@@ -7,11 +7,13 @@ model: inherit
You are an AI engineer specializing in production-grade LLM applications, generative AI systems, and intelligent agent architectures.
## Purpose
Expert AI engineer specializing in LLM application development, RAG systems, and AI agent architectures. Masters both traditional and cutting-edge generative AI patterns, with deep knowledge of the modern AI stack including vector databases, embedding models, agent frameworks, and multimodal AI systems.
## Capabilities
### LLM Integration & Model Management
- OpenAI GPT-5.2/GPT-5.2-mini with function calling and structured outputs
- Anthropic Claude Opus 4.5, Claude Sonnet 4.5, Claude Haiku 4.5 with tool use and computer use
- Open-source models: Llama 3.3, Mixtral 8x22B, Qwen 2.5, DeepSeek-V3
@@ -21,6 +23,7 @@ Expert AI engineer specializing in LLM application development, RAG systems, and
- Cost optimization through model selection and caching strategies
### Advanced RAG Systems
- Production RAG architectures with multi-stage retrieval pipelines
- Vector databases: Pinecone, Qdrant, Weaviate, Chroma, Milvus, pgvector
- Embedding models: Voyage AI voyage-3-large (recommended for Claude), OpenAI text-embedding-3-large/small, Cohere embed-v3, BGE-large
@@ -32,6 +35,7 @@ Expert AI engineer specializing in LLM application development, RAG systems, and
- Advanced RAG patterns: GraphRAG, HyDE, RAG-Fusion, self-RAG
### Agent Frameworks & Orchestration
- LangGraph (LangChain 1.x) for complex agent workflows with StateGraph and durable execution
- LlamaIndex for data-centric AI applications and advanced retrieval
- CrewAI for multi-agent collaboration and specialized agent roles
@@ -42,6 +46,7 @@ Expert AI engineer specializing in LLM application development, RAG systems, and
- Agent evaluation and monitoring with LangSmith
### Vector Search & Embeddings
- Embedding model selection and fine-tuning for domain-specific tasks
- Vector indexing strategies: HNSW, IVF, LSH for different scale requirements
- Similarity metrics: cosine, dot product, Euclidean for various use cases
@@ -50,6 +55,7 @@ Expert AI engineer specializing in LLM application development, RAG systems, and
- Vector database optimization: indexing, sharding, and caching strategies
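The similarity metrics listed above can be sketched from scratch to show how they differ: for parallel vectors, cosine similarity is 1.0 regardless of magnitude, while Euclidean distance still reflects it. A minimal plain-Python illustration (not tied to any particular vector database):

```python
import math

def dot(a, b):
    # Dot product: sensitive to both direction and magnitude
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    # Cosine similarity: direction only, magnitude normalized away
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean(a, b):
    # Euclidean distance: absolute positional difference
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a, b = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
print(cosine(a, b))     # parallel vectors -> 1.0
print(euclidean(a, b))  # nonzero: magnitude still differs
```

This is why embeddings are often L2-normalized before indexing: with unit-length vectors, dot product and cosine similarity coincide.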
### Prompt Engineering & Optimization
- Advanced prompting techniques: chain-of-thought, tree-of-thoughts, self-consistency
- Few-shot and in-context learning optimization
- Prompt templates with dynamic variable injection and conditioning
@@ -59,6 +65,7 @@ Expert AI engineer specializing in LLM application development, RAG systems, and
- Multi-modal prompting for vision and audio models
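Dynamic variable injection with conditional sections, as mentioned above, can be sketched with the standard library's `string.Template`; the roles and section names here are illustrative only, not part of any particular framework:

```python
from string import Template

SYSTEM = Template("You are a $role. Audience: $audience.")
FEWSHOT = Template("Examples:\n$examples")

def build_prompt(role, audience, task, examples=None):
    parts = [SYSTEM.substitute(role=role, audience=audience)]
    if examples:  # conditioning: inject the few-shot block only when provided
        parts.append(FEWSHOT.substitute(examples="\n".join(examples)))
    parts.append(f"Task: {task}")
    return "\n\n".join(parts)

print(build_prompt("support agent", "enterprise customers", "summarize the ticket"))
```

Production systems typically layer versioning and A/B routing on top of a builder like this rather than hard-coding prompt strings inline.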
### Production AI Systems
- LLM serving with FastAPI, async processing, and load balancing
- Streaming responses and real-time inference optimization
- Caching strategies: semantic caching, response memoization, embedding caching
@@ -68,6 +75,7 @@ Expert AI engineer specializing in LLM application development, RAG systems, and
- Observability: logging, metrics, tracing with LangSmith, Phoenix, Weights & Biases
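Embedding caching from the list above can be sketched as a content-hash memoizer; `fake_embed` is a hypothetical stand-in for a real embedding API call, and the normalization step (strip + lowercase) is a deliberately simple stand-in for semantic matching:

```python
import hashlib

class EmbeddingCache:
    """Memoize embeddings keyed by a hash of the normalized input text."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self.store = {}
        self.hits = 0

    def _key(self, text: str) -> str:
        return hashlib.sha256(text.strip().lower().encode()).hexdigest()

    def embed(self, text: str):
        key = self._key(text)
        if key in self.store:
            self.hits += 1
        else:
            self.store[key] = self.embed_fn(text)
        return self.store[key]

# Hypothetical stand-in embedder; a real system would call an embedding API.
fake_embed = lambda text: [float(len(text))]
cache = EmbeddingCache(fake_embed)
cache.embed("What is RAG?")
cache.embed("what is rag?  ")  # normalizes to the same key -> cache hit
print(cache.hits)  # 1
```

True semantic caching replaces the exact-hash key with a nearest-neighbor lookup over query embeddings, trading exactness for higher hit rates.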
### Multimodal AI Integration
- Vision models: GPT-4V, Claude 4 Vision, LLaVA, CLIP for image understanding
- Audio processing: Whisper for speech-to-text, ElevenLabs for text-to-speech
- Document AI: OCR, table extraction, layout understanding with models like LayoutLM
@@ -75,6 +83,7 @@ Expert AI engineer specializing in LLM application development, RAG systems, and
- Cross-modal embeddings and unified vector spaces
### AI Safety & Governance
- Content moderation with OpenAI Moderation API and custom classifiers
- Prompt injection detection and prevention strategies
- PII detection and redaction in AI workflows
@@ -83,6 +92,7 @@ Expert AI engineer specializing in LLM application development, RAG systems, and
- Responsible AI practices and ethical considerations
### Data Processing & Pipeline Management
- Document processing: PDF extraction, web scraping, API integrations
- Data preprocessing: cleaning, normalization, deduplication
- Pipeline orchestration with Apache Airflow, Dagster, Prefect
@@ -91,6 +101,7 @@ Expert AI engineer specializing in LLM application development, RAG systems, and
- ETL/ELT processes for AI data preparation
### Integration & API Development
- RESTful API design for AI services with FastAPI, Flask
- GraphQL APIs for flexible AI data querying
- Webhook integration and event-driven architectures
@@ -99,6 +110,7 @@ Expert AI engineer specializing in LLM application development, RAG systems, and
- API security: OAuth, JWT, API key management
## Behavioral Traits
- Prioritizes production reliability and scalability over proof-of-concept implementations
- Implements comprehensive error handling and graceful degradation
- Focuses on cost optimization and efficient resource utilization
@@ -111,6 +123,7 @@ Expert AI engineer specializing in LLM application development, RAG systems, and
- Balances cutting-edge techniques with proven, stable solutions
## Knowledge Base
- Latest LLM developments and model capabilities (GPT-5.2, Claude 4.5, Llama 3.3)
- Modern vector database architectures and optimization techniques
- Production AI system design patterns and best practices
@@ -123,6 +136,7 @@ Expert AI engineer specializing in LLM application development, RAG systems, and
- Prompt engineering and optimization methodologies
## Response Approach
1. **Analyze AI requirements** for production scalability and reliability
2. **Design system architecture** with appropriate AI components and data flow
3. **Implement production-ready code** with comprehensive error handling
@@ -133,6 +147,7 @@ Expert AI engineer specializing in LLM application development, RAG systems, and
8. **Provide testing strategies** including adversarial and edge cases
## Example Interactions
- "Build a production RAG system for enterprise knowledge base with hybrid search"
- "Implement a multi-agent customer service system with escalation workflows"
- "Design a cost-optimized LLM inference pipeline with caching and load balancing"
@@ -140,4 +155,4 @@ Expert AI engineer specializing in LLM application development, RAG systems, and
- "Build an AI agent that can browse the web and perform research tasks"
- "Implement semantic search with reranking for improved retrieval accuracy"
- "Design an A/B testing framework for comparing different LLM prompts"
- "Create a real-time AI content moderation system with custom classifiers"


@@ -9,6 +9,7 @@ You are an expert prompt engineer specializing in crafting effective prompts for
IMPORTANT: When creating prompts, ALWAYS display the complete prompt text in a clearly marked section. Never describe a prompt without showing it. The prompt needs to be displayed in your response in a single block of text that can be copied and pasted.
## Purpose
Expert prompt engineer specializing in advanced prompting methodologies and LLM optimization. Masters cutting-edge techniques including constitutional AI, chain-of-thought reasoning, and multi-agent prompt design. Focuses on production-ready prompt systems that are reliable, safe, and optimized for specific business outcomes.
## Capabilities
@@ -16,6 +17,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
### Advanced Prompting Techniques
#### Chain-of-Thought & Reasoning
- Chain-of-thought (CoT) prompting for complex reasoning tasks
- Few-shot chain-of-thought with carefully crafted examples
- Zero-shot chain-of-thought with "Let's think step by step"
@@ -25,6 +27,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
- Program-aided language models (PAL) for computational tasks
#### Constitutional AI & Safety
- Constitutional AI principles for self-correction and alignment
- Critique and revise patterns for output improvement
- Safety prompting techniques to prevent harmful outputs
@@ -34,6 +37,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
- Red teaming prompts for adversarial testing
#### Meta-Prompting & Self-Improvement
- Meta-prompting for prompt optimization and generation
- Self-reflection and self-evaluation prompt patterns
- Auto-prompting for dynamic prompt generation
@@ -45,6 +49,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
### Model-Specific Optimization
#### OpenAI Models (GPT-5.2, GPT-5.2-mini)
- Function calling optimization and structured outputs
- JSON mode utilization for reliable data extraction
- System message design for consistent behavior
@@ -54,6 +59,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
- Image and multimodal prompt engineering
#### Anthropic Claude (Claude Opus 4.5, Sonnet 4.5, Haiku 4.5)
- Constitutional AI alignment with Claude's training
- Tool use optimization for complex workflows
- Computer use prompting for automation tasks
@@ -63,6 +69,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
- Safety considerations specific to Claude's capabilities
#### Open Source Models (Llama, Mixtral, Qwen)
- Model-specific prompt formatting and special tokens
- Fine-tuning prompt strategies for domain adaptation
- Instruction-following optimization for different architectures
@@ -74,6 +81,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
### Production Prompt Systems
#### Prompt Templates & Management
- Dynamic prompt templating with variable injection
- Conditional prompt logic based on context
- Multi-language prompt adaptation and localization
@@ -83,6 +91,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
- Rollback strategies for prompt deployments
#### RAG & Knowledge Integration
- Retrieval-augmented generation prompt optimization
- Context compression and relevance filtering
- Query understanding and expansion prompts
@@ -92,6 +101,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
- Knowledge graph integration prompts
#### Agent & Multi-Agent Prompting
- Agent role definition and persona creation
- Multi-agent collaboration and communication protocols
- Task decomposition and workflow orchestration
@@ -103,6 +113,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
### Specialized Applications
#### Business & Enterprise
- Customer service chatbot optimization
- Sales and marketing copy generation
- Legal document analysis and generation
@@ -112,6 +123,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
- Compliance and regulatory content generation
#### Creative & Content
- Creative writing and storytelling prompts
- Content marketing and SEO optimization
- Brand voice and tone consistency
@@ -121,6 +133,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
- Translation and localization prompts
#### Technical & Code
- Code generation and optimization prompts
- Technical documentation and API documentation
- Debugging and error analysis assistance
@@ -132,6 +145,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
### Evaluation & Testing
#### Performance Metrics
- Task-specific accuracy and quality metrics
- Response time and efficiency measurements
- Cost optimization and token usage analysis
@@ -141,6 +155,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
- Edge case and robustness assessment
#### Testing Methodologies
- Red team testing for prompt vulnerabilities
- Adversarial prompt testing and jailbreak attempts
- Cross-model performance comparison
@@ -152,6 +167,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
### Advanced Patterns & Architectures
#### Prompt Chaining & Workflows
- Sequential prompt chaining for complex tasks
- Parallel prompt execution and result aggregation
- Conditional branching based on intermediate outputs
@@ -161,6 +177,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
- Workflow optimization and performance tuning
#### Multimodal & Cross-Modal
- Vision-language model prompt optimization
- Image understanding and analysis prompts
- Document AI and OCR integration prompts
@@ -170,6 +187,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
- Multimodal creative and generative prompts
## Behavioral Traits
- Always displays complete prompt text, never just descriptions
- Focuses on production reliability and safety over experimental techniques
- Considers token efficiency and cost optimization in all prompt designs
@@ -182,6 +200,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
- Emphasizes reproducibility and version control for prompt systems
## Knowledge Base
- Latest research in prompt engineering and LLM optimization
- Model-specific capabilities and limitations across providers
- Production deployment patterns and best practices
@@ -194,6 +213,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
- Emerging trends in AI and prompt engineering
## Response Approach
1. **Understand the specific use case** and requirements for the prompt
2. **Analyze target model capabilities** and optimization opportunities
3. **Design prompt architecture** with appropriate techniques and patterns
@@ -208,27 +228,32 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
When creating any prompt, you MUST include:
### The Prompt
```
[Display the complete prompt text here - this is the most important part]
```
### Implementation Notes
- Key techniques used and why they were chosen
- Model-specific optimizations and considerations
- Expected behavior and output format
- Parameter recommendations (temperature, max tokens, etc.)
### Testing & Evaluation
- Suggested test cases and evaluation metrics
- Edge cases and potential failure modes
- A/B testing recommendations for optimization
### Usage Guidelines
- When and how to use this prompt effectively
- Customization options and variable parameters
- Integration considerations for production systems
## Example Interactions
- "Create a constitutional AI prompt for content moderation that self-corrects problematic outputs"
- "Design a chain-of-thought prompt for financial analysis that shows clear reasoning steps"
- "Build a multi-agent prompt system for customer service with escalation workflows"
@@ -248,4 +273,4 @@ Verify you have:
☐ Included testing and evaluation recommendations
☐ Considered safety and ethical implications
Remember: The best prompt is one that consistently produces the desired output with minimal post-processing. ALWAYS show the prompt, never just describe it.


@@ -15,6 +15,7 @@ Specializes in designing and implementing production-grade vector search systems
## Capabilities
### Vector Database Selection & Architecture
- **Pinecone**: Managed serverless, auto-scaling, metadata filtering
- **Qdrant**: High-performance, Rust-based, complex filtering
- **Weaviate**: GraphQL API, hybrid search, multi-tenancy
@@ -23,6 +24,7 @@ Specializes in designing and implementing production-grade vector search systems
- **Chroma**: Lightweight, local development, embeddings built-in
### Embedding Model Selection
- **Voyage AI**: voyage-3-large (recommended for Claude apps), voyage-code-3, voyage-finance-2, voyage-law-2
- **OpenAI**: text-embedding-3-large (3072 dims), text-embedding-3-small (1536 dims)
- **Open Source**: BGE-large-en-v1.5, E5-large-v2, multilingual-e5-large
@@ -30,6 +32,7 @@ Specializes in designing and implementing production-grade vector search systems
- Domain-specific fine-tuning strategies
### Index Configuration & Optimization
- **HNSW**: High recall, adjustable M and efConstruction parameters
- **IVF**: Large-scale datasets, nlist/nprobe tuning
- **Product Quantization (PQ)**: Memory optimization for billions of vectors
@@ -37,6 +40,7 @@ Specializes in designing and implementing production-grade vector search systems
- Index selection based on recall/latency/memory tradeoffs
### Hybrid Search Implementation
- Vector + BM25 keyword search fusion
- Reciprocal Rank Fusion (RRF) scoring
- Weighted combination strategies
@@ -44,6 +48,7 @@ Specializes in designing and implementing production-grade vector search systems
- Reranking with cross-encoders
### Document Processing Pipeline
- Chunking strategies: recursive, semantic, token-based
- Metadata extraction and enrichment
- Embedding batching and async processing
@@ -51,6 +56,7 @@ Specializes in designing and implementing production-grade vector search systems
- Document versioning and deduplication
### Production Operations
- Monitoring: latency percentiles, recall metrics
- Scaling: sharding, replication, auto-scaling
- Backup and disaster recovery
@@ -71,24 +77,28 @@ Specializes in designing and implementing production-grade vector search systems
## Best Practices
### Embedding Selection
- Use Voyage AI for Claude-based applications (officially recommended by Anthropic)
- Match embedding dimensions to use case (512-1024 for most, 3072 for maximum quality)
- Consider domain-specific models for code, legal, finance
- Test embedding quality on representative queries
### Chunking
- Chunk size 500-1000 tokens for most use cases
- 10-20% overlap to preserve context boundaries
- Use semantic chunking for complex documents
- Include metadata for filtering and debugging
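The chunking guidance above (500-1000 tokens, 10-20% overlap) can be sketched as a fixed-size splitter over a token list; the placeholder tokens below stand in for real tokenizer output:

```python
def chunk_tokens(tokens, chunk_size=500, overlap_pct=0.15):
    """Split a token list into fixed-size chunks with fractional overlap."""
    step = max(1, int(chunk_size * (1 - overlap_pct)))
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # last chunk reached the end; stop before emitting tiny tails
    return chunks

tokens = [f"t{i}" for i in range(1200)]  # placeholder token ids
chunks = chunk_tokens(tokens)
print(len(chunks), len(chunks[-1]))  # 3 chunks; final partial chunk of 350 tokens
```

Semantic chunking replaces the fixed `step` with boundaries detected from the document structure, but the overlap principle is the same.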
### Index Tuning
- Start with HNSW for most use cases (good recall/latency balance)
- Use IVF+PQ for >10M vectors with memory constraints
- Benchmark recall@10 vs latency for your specific queries
- Monitor and re-tune as data grows
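The recall@10 benchmark mentioned above reduces to a few lines once you have ground-truth relevant ids per query; the ids below are illustrative:

```python
def recall_at_k(retrieved_ids, relevant_ids, k=10):
    """Fraction of the relevant set found in the top-k retrieved results."""
    hits = len(set(retrieved_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)

retrieved = [3, 7, 1, 9, 4, 2, 8, 5, 6, 0]   # ANN result ids, best first
relevant = {3, 9, 11, 4}                      # ground-truth relevant ids
print(recall_at_k(retrieved, relevant, k=10)) # 3 of 4 relevant found -> 0.75
```

Pairing this with per-query latency measurements over a representative query set gives the recall/latency curve used to tune HNSW `ef` or IVF `nprobe`.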
### Production
- Implement metadata filtering to reduce search space
- Cache frequent queries and embeddings
- Plan for index rebuilding (blue-green deployments)

File diff suppressed because it is too large


@@ -24,6 +24,7 @@ Build sophisticated AI agent system for: $ARGUMENTS
## Essential Architecture
### LangGraph State Management
```python
from langgraph.graph import StateGraph, MessagesState, START, END
from langgraph.prebuilt import create_react_agent
@@ -35,6 +36,7 @@ class AgentState(TypedDict):
```
### Model & Embeddings
- **Primary LLM**: Claude Sonnet 4.5 (`claude-sonnet-4-5`)
- **Embeddings**: Voyage AI (`voyage-3-large`) - officially recommended by Anthropic for Claude
- **Specialized**: `voyage-code-3` (code), `voyage-finance-2` (finance), `voyage-law-2` (legal)
@@ -84,6 +86,7 @@ base_retriever = vectorstore.as_retriever(
```
### Advanced RAG Patterns
- **HyDE**: Generate hypothetical documents for better retrieval
- **RAG Fusion**: Multiple query perspectives for comprehensive results
- **Reranking**: Use Cohere Rerank for relevance optimization
@@ -117,6 +120,7 @@ tool = StructuredTool.from_function(
## Production Deployment
### FastAPI Server with Streaming
```python
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
@@ -132,12 +136,14 @@ async def invoke_agent(request: AgentRequest):
```
### Monitoring & Observability
- **LangSmith**: Trace all agent executions
- **Prometheus**: Track metrics (requests, latency, errors)
- **Structured Logging**: Use `structlog` for consistent logs
- **Health Checks**: Validate LLM, tools, memory, and external services
### Optimization Strategies
- **Caching**: Redis for response caching with TTL
- **Connection Pooling**: Reuse vector DB connections
- **Load Balancing**: Multiple agent workers with round-robin routing
@@ -165,6 +171,7 @@ results = await evaluate(
## Key Patterns
### State Graph Pattern
```python
builder = StateGraph(MessagesState)
builder.add_node("node1", node1_func)
@@ -176,6 +183,7 @@ agent = builder.compile(checkpointer=checkpointer)
```
### Async Pattern
```python
async def process_request(message: str, session_id: str):
result = await agent.ainvoke(
@@ -186,6 +194,7 @@ async def process_request(message: str, session_id: str):
```
### Error Handling Pattern
```python
from tenacity import retry, stop_after_attempt, wait_exponential


@@ -22,12 +22,14 @@ $ARGUMENTS
Evaluate the prompt across key dimensions:
**Assessment Framework**
- Clarity score (1-10) and ambiguity points
- Structure: logical flow and section boundaries
- Model alignment: capability utilization and token efficiency
- Performance: success rate, failure modes, edge case handling
**Decomposition**
- Core objective and constraints
- Output format requirements
- Explicit vs implicit expectations
@@ -36,6 +38,7 @@ Evaluate the prompt across key dimensions:
### 2. Apply Chain-of-Thought Enhancement
**Standard CoT Pattern**
```python
# Before: Simple instruction
prompt = "Analyze this customer feedback and determine sentiment"
@@ -56,11 +59,13 @@ Step 1 - Key emotional phrases:
```
**Zero-Shot CoT**
```python
enhanced = original + "\n\nLet's approach this step-by-step, breaking down the problem into smaller components and reasoning through each carefully."
```
**Tree-of-Thoughts**
```python
tot_prompt = """
Explore multiple solution paths:
@@ -79,6 +84,7 @@ Select best approach and implement.
### 3. Implement Few-Shot Learning
**Strategic Example Selection**
```python
few_shot = """
Example 1 (Simple case):
@@ -100,6 +106,7 @@ Now apply to: {actual_input}
### 4. Apply Constitutional AI Patterns
**Self-Critique Loop**
```python
constitutional = """
{initial_instruction}
@@ -119,7 +126,8 @@ Final Response: [Refined]
### 5. Model-Specific Optimization
**GPT-5.2**
````python
gpt5_optimized = """
##CONTEXT##
{structured_context}
@@ -134,12 +142,13 @@ gpt5_optimized = """
##OUTPUT FORMAT##
```json
{"structured": "response"}
```
##EXAMPLES##
{few_shot_examples}
"""
````
**Claude 4.5/4**
```python
@@ -162,9 +171,10 @@ claude_optimized = """
{xml_structured_response}
</output_format>
"""
```
**Gemini Pro/Ultra**
```python
gemini_optimized = """
**System Context:** {background}
@@ -188,6 +198,7 @@ gemini_optimized = """
### 6. RAG Integration
**RAG-Optimized Prompt**
```python
rag_prompt = """
## Context Documents
@@ -210,6 +221,7 @@ Example: "Based on [Source 1], {answer}. [Source 3] corroborates: {detail}. No i
### 7. Evaluation Framework
**Testing Protocol**
```python
evaluation = """
## Test Cases (20 total)
@@ -227,6 +239,7 @@ evaluation = """
```
**LLM-as-Judge**
```python
judge_prompt = """
Evaluate AI response quality.
@@ -252,6 +265,7 @@ Recommendation: Accept/Revise/Reject
### 8. Production Deployment
**Prompt Versioning**
```python
class PromptVersion:
def __init__(self, base_prompt):
@@ -270,6 +284,7 @@ class PromptVersion:
```
**Error Handling**
```python
robust_prompt = """
{main_instruction}
@@ -291,15 +306,18 @@ Provide partial solution with boundaries and next steps if full task cannot be c
### Example 1: Customer Support
**Before**
```
Answer customer questions about our product.
```
**After**
````markdown
You are a senior customer support specialist for TechCorp with 5+ years experience.
## Context
- Product: {product_name}
- Customer Tier: {tier}
- Issue Category: {category}
@@ -307,9 +325,11 @@ You are a senior customer support specialist for TechCorp with 5+ years experien
## Framework
### 1. Acknowledge and Empathize
Begin with recognition of customer situation.
### 2. Diagnostic Reasoning
<thinking>
1. Identify core issue
2. Consider common causes
@@ -318,23 +338,27 @@ Begin with recognition of customer situation.
</thinking>
### 3. Solution Delivery
- Immediate fix (if available)
- Step-by-step instructions
- Alternative approaches
- Escalation path
### 4. Verification
- Confirm understanding
- Provide resources
- Set next steps
## Constraints
- Under 200 words unless technical
- Professional yet friendly tone
- Always provide ticket number
- Escalate if unsure
## Format
```json
{
"greeting": "...",
@@ -343,14 +367,18 @@ Begin with recognition of customer situation.
"follow_up": "..."
}
```
````
### Example 2: Data Analysis
**Before**
```
Analyze this sales data and provide insights.
```
**After**
````python
@@ -404,16 +432,20 @@ recommendations:
  immediate: []
  short_term: []
  long_term: []
```
"""
````
### Example 3: Code Generation
**Before**
```
Write a Python function to process user data.
```
**After**
````python
@@ -473,15 +505,17 @@ def process_user_data(raw_data: Dict[str, Any]) -> Union[ProcessedUser, Dict[str
        name=sanitize_string(raw_data['name'], 100),
        metadata={k: v for k, v in raw_data.items() if k not in required}
    )
```
### Self-Review
✓ Input validation and sanitization
✓ Injection prevention
✓ Error handling
✓ Performance: O(n) complexity
"""
````
### Example 4: Meta-Prompt Generator
@@ -530,18 +564,20 @@ ELSE: APPLY hybrid
Overall: []/50
Recommendation: use_as_is | iterate | redesign
"""
````
## Output Format
Deliver comprehensive optimization report:
### Optimized Prompt
```markdown
[Complete production-ready prompt with all enhancements]
```
### Optimization Report
```yaml
analysis:
original_assessment:
@@ -583,6 +619,7 @@ next_steps:
```
### Usage Guidelines
1. **Implementation**: Use optimized prompt exactly
2. **Parameters**: Apply recommended settings
3. **Testing**: Run test cases before production


@@ -20,18 +20,18 @@ Guide to selecting and optimizing embedding models for vector search application
### 1. Embedding Model Comparison (2026)
| Model | Dimensions | Max Tokens | Best For |
| -------------------------- | ---------- | ---------- | ----------------------------------- |
| **voyage-3-large** | 1024 | 32000 | Claude apps (Anthropic recommended) |
| **voyage-3** | 1024 | 32000 | Claude apps, cost-effective |
| **voyage-code-3** | 1024 | 32000 | Code search |
| **voyage-finance-2** | 1024 | 32000 | Financial documents |
| **voyage-law-2** | 1024 | 32000 | Legal documents |
| **text-embedding-3-large** | 3072 | 8191 | OpenAI apps, high accuracy |
| **text-embedding-3-small** | 1536 | 8191 | OpenAI apps, cost-effective |
| **bge-large-en-v1.5** | 1024 | 512 | Open source, local deployment |
| **all-MiniLM-L6-v2** | 384 | 256 | Fast, lightweight |
| **multilingual-e5-large** | 1024 | 512 | Multi-language |
### 2. Embedding Pipeline
@@ -583,6 +583,7 @@ def compare_embedding_models(
## Best Practices
### Do's
- **Match model to use case**: Code vs prose vs multilingual
- **Chunk thoughtfully**: Preserve semantic boundaries
- **Normalize embeddings**: For cosine similarity search
@@ -591,6 +592,7 @@ def compare_embedding_models(
- **Use Voyage AI for Claude apps**: Recommended by Anthropic
### Don'ts
- **Don't ignore token limits**: Truncation loses information
- **Don't mix embedding models**: Incompatible vector spaces
- **Don't skip preprocessing**: Garbage in, garbage out


@@ -27,12 +27,12 @@ Query → ┬─► Vector Search ──► Candidates ─┐
### 2. Fusion Methods
| Method | Description | Best For |
| ----------------- | ------------------------ | --------------- |
| **RRF** | Reciprocal Rank Fusion | General purpose |
| **Linear** | Weighted sum of scores | Tunable balance |
| **Cross-encoder** | Rerank with neural model | Highest quality |
| **Cascade** | Filter then rerank | Efficiency |
| **Cascade** | Filter then rerank | Efficiency |
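RRF is popular precisely because it needs no score calibration: it only looks at ranks. A minimal sketch with toy document IDs, using the conventional k = 60 constant:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists: each doc scores the sum of 1/(k + rank) across lists."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
assert fused[0] == "doc_b"  # ranked 2nd and 1st, so highest combined score
```

Because only ranks enter the formula, dense and sparse retrievers with wildly different score scales fuse cleanly without tuning.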
## Templates
@@ -549,6 +549,7 @@ class HybridRAGPipeline:
## Best Practices
### Do's
- **Tune weights empirically** - Test on your data
- **Use RRF for simplicity** - Works well without tuning
- **Add reranking** - Significant quality improvement
@@ -556,6 +557,7 @@ class HybridRAGPipeline:
- **A/B test** - Measure real user impact
### Don'ts
- **Don't assume one size fits all** - Different queries need different weights
- **Don't skip keyword search** - Handles exact matches better
- **Don't over-fetch** - Balance recall vs latency
View File
@@ -33,9 +33,11 @@ langchain-pinecone # Pinecone vector store
## Core Concepts
### 1. LangGraph Agents
LangGraph is the standard for building agents in 2026. It provides:
**Key Features:**
- **StateGraph**: Explicit state management with typed state
- **Durable Execution**: Agents persist through failures
- **Human-in-the-Loop**: Inspect and modify state at any point
@@ -43,12 +45,14 @@ LangGraph is the standard for building agents in 2026. It provides:
- **Checkpointing**: Save and resume agent state
**Agent Patterns:**
- **ReAct**: Reasoning + Acting with `create_react_agent`
- **Plan-and-Execute**: Separate planning and execution nodes
- **Multi-Agent**: Supervisor routing between specialized agents
- **Tool-Calling**: Structured tool invocation with Pydantic schemas
### 2. State Management
LangGraph uses TypedDict for explicit state:
```python
@@ -69,6 +73,7 @@ class CustomState(TypedDict):
```
### 3. Memory Systems
Modern memory implementations:
- **ConversationBufferMemory**: Stores all messages (short conversations)
@@ -78,15 +83,18 @@ Modern memory implementations:
- **LangGraph Checkpointers**: Persistent state across sessions
### 4. Document Processing
Loading, transforming, and storing documents:
**Components:**
- **Document Loaders**: Load from various sources
- **Text Splitters**: Chunk documents intelligently
- **Vector Stores**: Store and retrieve embeddings
- **Retrievers**: Fetch relevant documents
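As a toy stand-in for the text-splitter component, a fixed-size character window with overlap illustrates the core idea (real splitters such as LangChain's RecursiveCharacterTextSplitter also respect separators like paragraphs and sentences):

```python
def chunk_text(text: str, chunk_size: int = 100, overlap: int = 20) -> list[str]:
    """Split text into fixed-size windows that overlap to preserve context."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, start = [], 0
    step = chunk_size - overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks

doc = "x" * 250
chunks = chunk_text(doc, chunk_size=100, overlap=20)
assert len(chunks) == 4
assert all(len(c) <= 100 for c in chunks)
```

The overlap is what keeps a sentence that straddles a boundary retrievable from at least one chunk.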
### 5. Callbacks & Tracing
LangSmith is the standard for observability:
- Request/response logging
View File
@@ -20,9 +20,11 @@ Master comprehensive evaluation strategies for LLM applications, from automated
## Core Evaluation Types
### 1. Automated Metrics
Fast, repeatable, scalable evaluation using computed scores.
**Text Generation:**
- **BLEU**: N-gram overlap (translation)
- **ROUGE**: Recall-oriented (summarization)
- **METEOR**: Semantic similarity
@@ -30,21 +32,25 @@ Fast, repeatable, scalable evaluation using computed scores.
- **Perplexity**: Language model confidence
**Classification:**
- **Accuracy**: Percentage correct
- **Precision/Recall/F1**: Class-specific performance
- **Confusion Matrix**: Error patterns
- **AUC-ROC**: Ranking quality
**Retrieval (RAG):**
- **MRR**: Mean Reciprocal Rank
- **NDCG**: Normalized Discounted Cumulative Gain
- **Precision@K**: Relevant in top K
- **Recall@K**: Coverage in top K
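The retrieval metrics above are straightforward to compute once you have ranked results and gold labels. A minimal sketch of MRR and Precision@K with toy document IDs:

```python
def mean_reciprocal_rank(results: list[list[str]], relevant: list[str]) -> float:
    """Average of 1/rank of the first relevant document per query (0 if absent)."""
    total = 0.0
    for ranked, gold in zip(results, relevant):
        for rank, doc_id in enumerate(ranked, start=1):
            if doc_id == gold:
                total += 1.0 / rank
                break
    return total / len(results)

def precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k

queries = [["d1", "d2", "d3"], ["d4", "d5", "d6"]]
gold = ["d2", "d4"]  # first answer at rank 2, second at rank 1
assert mean_reciprocal_rank(queries, gold) == 0.75  # (1/2 + 1/1) / 2
assert precision_at_k(["d1", "d2", "d3", "d4"], {"d2", "d4"}, k=4) == 0.5
```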
### 2. Human Evaluation
Manual assessment for quality aspects difficult to automate.
**Dimensions:**
- **Accuracy**: Factual correctness
- **Coherence**: Logical flow
- **Relevance**: Answers the question
@@ -53,9 +59,11 @@ Manual assessment for quality aspects difficult to automate.
- **Helpfulness**: Useful to the user
### 3. LLM-as-Judge
Use stronger LLMs to evaluate weaker model outputs.
**Approaches:**
- **Pointwise**: Score individual responses
- **Pairwise**: Compare two responses
- **Reference-based**: Compare to gold standard
@@ -134,6 +142,7 @@ results = await suite.evaluate(model=your_model, test_cases=test_cases)
## Automated Metrics Implementation
### BLEU Score
```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
@@ -149,6 +158,7 @@ def calculate_bleu(reference: str, hypothesis: str, **kwargs) -> float:
```
### ROUGE Score
```python
from rouge_score import rouge_scorer
@@ -168,6 +178,7 @@ def calculate_rouge(reference: str, hypothesis: str, **kwargs) -> dict:
```
### BERTScore
```python
from bert_score import score
@@ -192,6 +203,7 @@ def calculate_bertscore(
```
### Custom Metrics
```python
def calculate_groundedness(response: str, context: str, **kwargs) -> float:
"""Check if response is grounded in provided context."""
@@ -232,6 +244,7 @@ def calculate_factuality(claim: str, sources: list[str], **kwargs) -> float:
## LLM-as-Judge Patterns
### Single Output Evaluation
```python
from anthropic import Anthropic
from pydantic import BaseModel, Field
@@ -280,6 +293,7 @@ Provide ratings in JSON format:
```
### Pairwise Comparison
```python
from pydantic import BaseModel, Field
from typing import Literal
@@ -324,6 +338,7 @@ Answer with JSON:
```
### Reference-Based Evaluation
```python
class ReferenceEvaluation(BaseModel):
semantic_similarity: float = Field(ge=0, le=1)
@@ -371,6 +386,7 @@ Respond in JSON:
## Human Evaluation Frameworks
### Annotation Guidelines
```python
from dataclasses import dataclass, field
from typing import Optional
@@ -412,6 +428,7 @@ class AnnotationTask:
```
### Inter-Rater Agreement
```python
from sklearn.metrics import cohen_kappa_score
@@ -444,6 +461,7 @@ def calculate_agreement(
## A/B Testing
### Statistical Testing Framework
```python
from scipy import stats
import numpy as np
@@ -504,6 +522,7 @@ class ABTest:
## Regression Testing
### Regression Detection
```python
from dataclasses import dataclass
@@ -595,6 +614,7 @@ print(f"Mean score: {experiment_results.aggregate_metrics['qa']['mean']}")
## Benchmarking
### Running Benchmarks
```python
from dataclasses import dataclass
import numpy as np
View File
@@ -21,6 +21,7 @@ Master advanced prompt engineering techniques to maximize LLM performance, relia
## Core Capabilities
### 1. Few-Shot Learning
- Example selection strategies (semantic similarity, diversity sampling)
- Balancing example count with context window constraints
- Constructing effective demonstrations with input-output pairs
@@ -28,6 +29,7 @@ Master advanced prompt engineering techniques to maximize LLM performance, relia
- Handling edge cases through strategic example selection
### 2. Chain-of-Thought Prompting
- Step-by-step reasoning elicitation
- Zero-shot CoT with "Let's think step by step"
- Few-shot CoT with reasoning traces
@@ -35,12 +37,14 @@ Master advanced prompt engineering techniques to maximize LLM performance, relia
- Verification and validation steps
### 3. Structured Outputs
- JSON mode for reliable parsing
- Pydantic schema enforcement
- Type-safe response handling
- Error handling for malformed outputs
### 4. Prompt Optimization
- Iterative refinement workflows
- A/B testing prompt variations
- Measuring prompt performance metrics (accuracy, consistency, latency)
@@ -48,6 +52,7 @@ Master advanced prompt engineering techniques to maximize LLM performance, relia
- Handling edge cases and failure modes
### 5. Template Systems
- Variable interpolation and formatting
- Conditional prompt sections
- Multi-turn conversation templates
@@ -55,6 +60,7 @@ Master advanced prompt engineering techniques to maximize LLM performance, relia
- Modular prompt components
### 6. System Prompt Design
- Setting model behavior and constraints
- Defining output formats and structure
- Establishing role and expertise
@@ -395,6 +401,7 @@ Response:"""
## Performance Optimization
### Token Efficiency
```python
# Before: Verbose prompt (150+ tokens)
verbose_prompt = """
@@ -457,6 +464,7 @@ response = client.messages.create(
## Success Metrics
Track these KPIs for your prompts:
- **Accuracy**: Correctness of outputs
- **Consistency**: Reproducibility across similar inputs
- **Latency**: Response time (P50, P95, P99)
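Latency percentiles can be computed from raw samples with the nearest-rank method; a minimal sketch over toy latency values:

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: value at ceil(p/100 * n) in the sorted sample."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

latencies_ms = [120, 95, 410, 105, 98, 250, 101, 99, 103, 97]
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
assert p50 <= p95 <= p99
assert p99 == 410  # the worst observed request dominates the tail
```

With small samples, P95 and P99 collapse onto the same outlier, which is why tail metrics need sustained traffic to be meaningful.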
View File
@@ -3,6 +3,7 @@
## Classification Templates
### Sentiment Analysis
```
Classify the sentiment of the following text as Positive, Negative, or Neutral.
@@ -12,6 +13,7 @@ Sentiment:
```
### Intent Detection
```
Determine the user's intent from the following message.
@@ -23,6 +25,7 @@ Intent:
```
### Topic Classification
```
Classify the following article into one of these categories: {categories}
@@ -35,6 +38,7 @@ Category:
## Extraction Templates
### Named Entity Recognition
```
Extract all named entities from the text and categorize them.
@@ -50,6 +54,7 @@ Entities (JSON format):
```
### Structured Data Extraction
```
Extract structured information from the job posting.
@@ -70,6 +75,7 @@ Extracted Information (JSON):
## Generation Templates
### Email Generation
```
Write a professional {email_type} email.
@@ -84,6 +90,7 @@ Body:
```
### Code Generation
```
Generate {language} code for the following task:
@@ -101,6 +108,7 @@ Code:
```
### Creative Writing
```
Write a {length}-word {style} story about {topic}.
@@ -115,6 +123,7 @@ Story:
## Transformation Templates
### Summarization
```
Summarize the following text in {num_sentences} sentences.
@@ -125,6 +134,7 @@ Summary:
```
### Translation with Context
```
Translate the following {source_lang} text to {target_lang}.
@@ -137,6 +147,7 @@ Translation:
```
### Format Conversion
```
Convert the following {source_format} to {target_format}.
@@ -149,6 +160,7 @@ Output ({target_format}):
## Analysis Templates
### Code Review
```
Review the following code for:
1. Bugs and errors
@@ -163,6 +175,7 @@ Review:
```
### SWOT Analysis
```
Conduct a SWOT analysis for: {subject}
@@ -185,6 +198,7 @@ Threats:
## Question Answering Templates
### RAG Template
```
Answer the question based on the provided context. If the context doesn't contain enough information, say so.
@@ -197,6 +211,7 @@ Answer:
```
### Multi-Turn Q&A
```
Previous conversation:
{conversation_history}
@@ -209,6 +224,7 @@ Answer (continue naturally from conversation):
## Specialized Templates
### SQL Query Generation
```
Generate a SQL query for the following request.
@@ -221,6 +237,7 @@ SQL Query:
```
### Regex Pattern Creation
```
Create a regex pattern to match: {requirement}
@@ -234,6 +251,7 @@ Regex pattern:
```
### API Documentation
```
Generate API documentation for this function:
View File
@@ -7,6 +7,7 @@ Chain-of-Thought (CoT) prompting elicits step-by-step reasoning from LLMs, drama
## Core Techniques
### Zero-Shot CoT
Add a simple trigger phrase to elicit reasoning:
```python
@@ -29,6 +30,7 @@ prompt = zero_shot_cot(query)
```
### Few-Shot CoT
Provide examples with explicit reasoning chains:
```python
@@ -53,6 +55,7 @@ A: Let's think step by step:"""
```
### Self-Consistency
Generate multiple reasoning paths and take the majority vote:
```python
@@ -85,6 +88,7 @@ def self_consistency_cot(query, n=5, temperature=0.7):
## Advanced Patterns
### Least-to-Most Prompting
Break complex problems into simpler subproblems:
```python
@@ -125,6 +129,7 @@ Final Answer:"""
```
### Tree-of-Thought (ToT)
Explore multiple reasoning branches:
```python
@@ -176,6 +181,7 @@ Score:"""
```
### Verification Step
Add explicit verification to catch errors:
```python
@@ -220,6 +226,7 @@ Corrected solution:"""
## Domain-Specific CoT
### Math Problems
```python
math_cot_template = """
Problem: {problem}
@@ -248,6 +255,7 @@ Answer: {final_answer}
```
### Code Debugging
```python
debug_cot_template = """
Code with error:
@@ -278,6 +286,7 @@ Fixed code:
```
### Logical Reasoning
```python
logic_cot_template = """
Premises:
@@ -305,6 +314,7 @@ Answer: {final_answer}
## Performance Optimization
### Caching Reasoning Patterns
```python
class ReasoningCache:
def __init__(self):
@@ -328,6 +338,7 @@ class ReasoningCache:
```
### Adaptive Reasoning Depth
```python
def adaptive_cot(problem, initial_depth=3):
depth = initial_depth
@@ -378,6 +389,7 @@ def evaluate_cot_quality(reasoning_chain):
## When to Use CoT
**Use CoT for:**
- Math and arithmetic problems
- Logical reasoning tasks
- Multi-step planning
@@ -385,6 +397,7 @@ def evaluate_cot_quality(reasoning_chain):
- Complex decision making
**Skip CoT for:**
- Simple factual queries
- Direct lookups
- Creative writing
View File
@@ -7,6 +7,7 @@ Few-shot learning enables LLMs to perform tasks by providing a small number of e
## Example Selection Strategies
### 1. Semantic Similarity
Select examples most similar to the input query using embedding-based retrieval.
```python
@@ -29,6 +30,7 @@ class SemanticExampleSelector:
**Best For**: Question answering, text classification, extraction tasks
### 2. Diversity Sampling
Maximize coverage of different patterns and edge cases.
```python
@@ -58,6 +60,7 @@ class DiversityExampleSelector:
**Best For**: Demonstrating task variability, edge case handling
### 3. Difficulty-Based Selection
Gradually increase example complexity to scaffold learning.
```python
@@ -75,6 +78,7 @@ class ProgressiveExampleSelector:
**Best For**: Complex reasoning tasks, code generation
### 4. Error-Based Selection
Include examples that address common failure modes.
```python
@@ -98,6 +102,7 @@ class ErrorGuidedSelector:
## Example Construction Best Practices
### Format Consistency
All examples should follow identical formatting:
```python
@@ -121,6 +126,7 @@ examples = [
```
### Input-Output Alignment
Ensure examples demonstrate the exact task you want the model to perform:
```python
@@ -138,6 +144,7 @@ example = {
```
### Complexity Balance
Include examples spanning the expected difficulty range:
```python
@@ -156,6 +163,7 @@ examples = [
## Context Window Management
### Token Budget Allocation
Typical distribution for a 4K context window:
```
@@ -166,6 +174,7 @@ Response: 1500 tokens (38%)
```
### Dynamic Example Truncation
```python
class TokenAwareSelector:
def __init__(self, examples, tokenizer, max_tokens=1500):
@@ -197,6 +206,7 @@ class TokenAwareSelector:
## Edge Case Handling
### Include Boundary Examples
```python
edge_case_examples = [
# Empty input
@@ -216,6 +226,7 @@ edge_case_examples = [
## Few-Shot Prompt Templates
### Classification Template
```python
def build_classification_prompt(examples, query, labels):
prompt = f"Classify the text into one of these categories: {', '.join(labels)}\n\n"
@@ -228,6 +239,7 @@ def build_classification_prompt(examples, query, labels):
```
### Extraction Template
```python
def build_extraction_prompt(examples, query):
prompt = "Extract structured information from the text.\n\n"
@@ -240,6 +252,7 @@ def build_extraction_prompt(examples, query):
```
### Transformation Template
```python
def build_transformation_prompt(examples, query):
prompt = "Transform the input according to the pattern shown in examples.\n\n"
@@ -254,6 +267,7 @@ def build_transformation_prompt(examples, query):
## Evaluation and Optimization
### Example Quality Metrics
```python
def evaluate_example_quality(example, validation_set):
metrics = {
@@ -266,6 +280,7 @@ def evaluate_example_quality(example, validation_set):
```
### A/B Testing Example Sets
```python
class ExampleSetTester:
def __init__(self, llm_client):
@@ -295,6 +310,7 @@ class ExampleSetTester:
## Advanced Techniques
### Meta-Learning (Learning to Select)
Train a small model to predict which examples will be most effective:
```python
@@ -334,6 +350,7 @@ class LearnedExampleSelector:
```
### Adaptive Example Count
Dynamically adjust the number of examples based on task difficulty:
```python
View File
@@ -3,6 +3,7 @@
## Systematic Refinement Process
### 1. Baseline Establishment
```python
def establish_baseline(prompt, test_cases):
results = {
@@ -26,6 +27,7 @@ def establish_baseline(prompt, test_cases):
```
### 2. Iterative Refinement Workflow
```
Initial Prompt → Test → Analyze Failures → Refine → Test → Repeat
```
@@ -64,6 +66,7 @@ class PromptOptimizer:
```
### 3. A/B Testing Framework
```python
class PromptABTest:
def __init__(self, variant_a, variant_b):
@@ -116,6 +119,7 @@ class PromptABTest:
## Optimization Strategies
### Token Reduction
```python
def optimize_for_tokens(prompt):
optimizations = [
@@ -144,6 +148,7 @@ def optimize_for_tokens(prompt):
```
### Latency Reduction
```python
def optimize_for_latency(prompt):
strategies = {
@@ -167,6 +172,7 @@ def optimize_for_latency(prompt):
```
### Accuracy Improvement
```python
def improve_accuracy(prompt, failure_cases):
improvements = []
@@ -194,6 +200,7 @@ def improve_accuracy(prompt, failure_cases):
## Performance Metrics
### Core Metrics
```python
class PromptMetrics:
@staticmethod
@@ -230,6 +237,7 @@ class PromptMetrics:
```
### Automated Evaluation
```python
def evaluate_prompt_comprehensively(prompt, test_suite):
results = {
@@ -274,6 +282,7 @@ def evaluate_prompt_comprehensively(prompt, test_suite):
## Failure Analysis
### Categorizing Failures
```python
class FailureAnalyzer:
def categorize_failures(self, test_results):
@@ -326,6 +335,7 @@ class FailureAnalyzer:
## Versioning and Rollback
### Prompt Version Control
```python
class PromptVersionControl:
def __init__(self, storage_path):
@@ -381,24 +391,28 @@ class PromptVersionControl:
## Common Optimization Patterns
### Pattern 1: Add Structure
```
Before: "Analyze this text"
After: "Analyze this text for:\n1. Main topic\n2. Key arguments\n3. Conclusion"
```
### Pattern 2: Add Examples
```
Before: "Extract entities"
After: "Extract entities\n\nExample:\nText: Apple released iPhone\nEntities: {company: Apple, product: iPhone}"
```
### Pattern 3: Add Constraints
```
Before: "Summarize this"
After: "Summarize in exactly 3 bullet points, 15 words each"
```
### Pattern 4: Add Verification
```
Before: "Calculate..."
After: "Calculate... Then verify your calculation is correct before responding."
View File
@@ -3,6 +3,7 @@
## Template Architecture
### Basic Template Structure
```python
class PromptTemplate:
def __init__(self, template_string, variables=None):
@@ -30,6 +31,7 @@ prompt = template.render(
```
### Conditional Templates
```python
class ConditionalTemplate(PromptTemplate):
def render(self, **kwargs):
@@ -84,6 +86,7 @@ Reference examples:
```
### Modular Template Composition
```python
class ModularTemplate:
def __init__(self):
@@ -133,6 +136,7 @@ advanced_prompt = builder.render(
## Common Template Patterns
### Classification Template
```python
CLASSIFICATION_TEMPLATE = """
Classify the following {content_type} into one of these categories: {categories}
@@ -153,6 +157,7 @@ Category:"""
```
### Extraction Template
```python
EXTRACTION_TEMPLATE = """
Extract structured information from the {content_type}.
@@ -171,6 +176,7 @@ Extracted information (JSON):"""
```
### Generation Template
```python
GENERATION_TEMPLATE = """
Generate {output_type} based on the following {input_type}.
@@ -198,6 +204,7 @@ Examples:
```
### Transformation Template
```python
TRANSFORMATION_TEMPLATE = """
Transform the input {source_format} to {target_format}.
@@ -219,6 +226,7 @@ Output {target_format}:"""
## Advanced Features
### Template Inheritance
```python
class TemplateRegistry:
def __init__(self):
@@ -251,6 +259,7 @@ registry.register('sentiment_analysis', {
```
### Variable Validation
```python
class ValidatedTemplate:
def __init__(self, template, schema):
@@ -294,6 +303,7 @@ template = ValidatedTemplate(
```
### Template Caching
```python
class CachedTemplate:
def __init__(self, template):
@@ -323,6 +333,7 @@ class CachedTemplate:
## Multi-Turn Templates
### Conversation Template
```python
class ConversationTemplate:
def __init__(self, system_prompt):
@@ -349,6 +360,7 @@ class ConversationTemplate:
```
### State-Based Templates
```python
class StatefulTemplate:
def __init__(self):
@@ -406,6 +418,7 @@ Here's the result: {result}
## Template Libraries
### Question Answering
```python
QA_TEMPLATES = {
'factual': """Answer the question based on the context.
@@ -432,6 +445,7 @@ Assistant:"""
```
### Content Generation
```python
GENERATION_TEMPLATES = {
'blog_post': """Write a blog post about {topic}.
View File
@@ -11,6 +11,7 @@ System prompts set the foundation for LLM behavior. They define role, expertise,
```
### Example: Code Assistant
```
You are an expert software engineer with deep knowledge of Python, JavaScript, and system design.
@@ -36,6 +37,7 @@ Output format:
## Pattern Library
### 1. Customer Support Agent
```
You are a friendly, empathetic customer support representative for {company_name}.
@@ -59,6 +61,7 @@ Constraints:
```
### 2. Data Analyst
```
You are an experienced data analyst specializing in business intelligence.
@@ -85,6 +88,7 @@ Output:
```
### 3. Content Editor
```
You are a professional editor with expertise in {content_type}.
@@ -112,6 +116,7 @@ Format your feedback as:
## Advanced Techniques
### Dynamic Role Adaptation
```python
def build_adaptive_system_prompt(task_type, difficulty):
base = "You are an expert assistant"
@@ -136,6 +141,7 @@ Expertise level: {difficulty}
```
### Constraint Specification
```
Hard constraints (MUST follow):
- Never generate harmful, biased, or illegal content
View File
@@ -20,9 +20,11 @@ Master Retrieval-Augmented Generation (RAG) to build LLM applications that provi
## Core Components
### 1. Vector Databases
**Purpose**: Store and retrieve document embeddings efficiently
**Options:**
- **Pinecone**: Managed, scalable, serverless
- **Weaviate**: Open-source, hybrid search, GraphQL
- **Milvus**: High performance, on-premise
@@ -31,6 +33,7 @@ Master Retrieval-Augmented Generation (RAG) to build LLM applications that provi
- **pgvector**: PostgreSQL extension, SQL integration
### 2. Embeddings
**Purpose**: Convert text to numerical vectors for similarity search
**Models (2026):**
@@ -44,7 +47,9 @@ Master Retrieval-Augmented Generation (RAG) to build LLM applications that provi
| **multilingual-e5-large** | 1024 | Multi-language support |
### 3. Retrieval Strategies
**Approaches:**
- **Dense Retrieval**: Semantic similarity via embeddings
- **Sparse Retrieval**: Keyword matching (BM25, TF-IDF)
- **Hybrid Search**: Combine dense + sparse with weighted fusion
@@ -52,9 +57,11 @@ Master Retrieval-Augmented Generation (RAG) to build LLM applications that provi
- **HyDE**: Generate hypothetical documents for better retrieval
### 4. Reranking
**Purpose**: Improve retrieval quality by reordering results
**Methods:**
- **Cross-Encoders**: BERT-based reranking (ms-marco-MiniLM)
- **Cohere Rerank**: API-based reranking
- **Maximal Marginal Relevance (MMR)**: Diversity + relevance
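MMR greedily picks the document that best trades query relevance against redundancy with what is already selected. A minimal sketch over precomputed similarity scores (toy values, λ = 0.5):

```python
def mmr(query_sims: dict[str, float],
        doc_sims: dict[tuple[str, str], float],
        k: int, lambda_mult: float = 0.5) -> list[str]:
    """Greedy MMR: lambda * relevance - (1 - lambda) * max similarity to picks."""
    selected: list[str] = []
    candidates = set(query_sims)
    while candidates and len(selected) < k:
        def score(doc: str) -> float:
            redundancy = max(
                (doc_sims.get((doc, s), doc_sims.get((s, doc), 0.0))
                 for s in selected),
                default=0.0,
            )
            return lambda_mult * query_sims[doc] - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy scores: d1 and d2 are near-duplicates; d3 is less relevant but novel.
query_sims = {"d1": 0.9, "d2": 0.85, "d3": 0.7}
doc_sims = {("d1", "d2"): 0.95, ("d1", "d3"): 0.1, ("d2", "d3"): 0.1}
picks = mmr(query_sims, doc_sims, k=2)
assert picks == ["d1", "d3"]  # d2 is penalized for duplicating d1
```

λ = 1 reduces to plain relevance ranking; lowering it buys diversity at the cost of raw similarity.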
@@ -255,6 +262,7 @@ hyde_rag = builder.compile()
## Document Chunking Strategies
### Recursive Character Text Splitter
```python
from langchain_text_splitters import RecursiveCharacterTextSplitter
@@ -269,6 +277,7 @@ chunks = splitter.split_documents(documents)
```
### Token-Based Splitting
```python
from langchain_text_splitters import TokenTextSplitter
@@ -280,6 +289,7 @@ splitter = TokenTextSplitter(
```
### Semantic Chunking
```python
from langchain_experimental.text_splitter import SemanticChunker
@@ -291,6 +301,7 @@ splitter = SemanticChunker(
```
### Markdown Header Splitter
```python
from langchain_text_splitters import MarkdownHeaderTextSplitter
@@ -309,6 +320,7 @@ splitter = MarkdownHeaderTextSplitter(
## Vector Store Configurations
### Pinecone (Serverless)
```python
from pinecone import Pinecone, ServerlessSpec
from langchain_pinecone import PineconeVectorStore
@@ -331,6 +343,7 @@ vectorstore = PineconeVectorStore(index=index, embedding=embeddings)
```
### Weaviate
```python
import weaviate
from langchain_weaviate import WeaviateVectorStore
@@ -346,6 +359,7 @@ vectorstore = WeaviateVectorStore(
```
### Chroma (Local Development)
```python
from langchain_chroma import Chroma
@@ -357,6 +371,7 @@ vectorstore = Chroma(
```
### pgvector (PostgreSQL)
```python
from langchain_postgres.vectorstores import PGVector
@@ -372,6 +387,7 @@ vectorstore = PGVector(
## Retrieval Optimization
### 1. Metadata Filtering
```python
from langchain_core.documents import Document
@@ -394,6 +410,7 @@ results = await vectorstore.asimilarity_search(
```
### 2. Maximal Marginal Relevance (MMR)
```python
# Balance relevance with diversity
results = await vectorstore.amax_marginal_relevance_search(
@@ -405,6 +422,7 @@ results = await vectorstore.amax_marginal_relevance_search(
```
### 3. Reranking with Cross-Encoder
```python
from sentence_transformers import CrossEncoder
@@ -424,6 +442,7 @@ async def retrieve_and_rerank(query: str, k: int = 5) -> list[Document]:
```
### 4. Cohere Rerank
```python
from langchain.retrievers import ContextualCompressionRetriever
from langchain_cohere import CohereRerank
@@ -440,6 +459,7 @@ reranked_retriever = ContextualCompressionRetriever(
## Prompt Engineering for RAG
### Contextual Prompt with Citations
```python
rag_prompt = ChatPromptTemplate.from_template(
"""Answer the question based on the context below. Include citations using [1], [2], etc.
@@ -461,6 +481,7 @@ rag_prompt = ChatPromptTemplate.from_template(
```
### Structured Output for RAG
```python
from pydantic import BaseModel, Field
View File
@@ -20,12 +20,12 @@ Patterns for implementing efficient similarity search in production systems.
### 1. Distance Metrics
| Metric             | Formula            | Best For              |
| ------------------ | ------------------ | --------------------- |
| **Cosine**         | 1 - (A·B)/(‖A‖‖B‖) | Normalized embeddings |
| **Euclidean (L2)** | √Σ(a-b)²           | Raw embeddings        |
| **Dot Product**    | A·B                | Magnitude matters     |
| **Manhattan (L1)** | Σ\|a-b\|           | Sparse vectors        |
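The four metrics in the table reduce to a few lines each; a pure-Python sketch on toy 2-D vectors (production systems would use numpy or the index's built-in metric):

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cos(A, B); zero for parallel vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1 - dot / (norm_a * norm_b)

def euclidean(a: list[float], b: list[float]) -> float:
    """L2 distance: square root of summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dot_product(a: list[float], b: list[float]) -> float:
    """Higher is more similar; sensitive to vector magnitude."""
    return sum(x * y for x, y in zip(a, b))

def manhattan(a: list[float], b: list[float]) -> float:
    """L1 distance: summed absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

a, b = [1.0, 0.0], [0.0, 1.0]
assert abs(cosine_distance(a, b) - 1.0) < 1e-9  # orthogonal vectors
assert abs(euclidean(a, b) - math.sqrt(2)) < 1e-9
assert dot_product(a, b) == 0.0
assert manhattan(a, b) == 2.0
```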
### 2. Index Types
@@ -538,6 +538,7 @@ class WeaviateVectorStore:
## Best Practices
### Do's
- **Use appropriate index** - HNSW for most cases
- **Tune parameters** - ef_search, nprobe for recall/speed
- **Implement hybrid search** - Combine with keyword search
@@ -545,6 +546,7 @@ class WeaviateVectorStore:
- **Pre-filter when possible** - Reduce search space
### Don'ts
- **Don't skip evaluation** - Measure before optimizing
- **Don't over-index** - Start with flat, scale up
- **Don't ignore latency** - P99 matters for UX
View File
@@ -31,11 +31,11 @@ Data Size Recommended Index
### 2. HNSW Parameters
| Parameter | Default | Effect |
| ------------------ | ------- | ---------------------------------------------------- |
| **M** | 16 | Connections per node, ↑ = better recall, more memory |
| **efConstruction** | 100 | Build quality, ↑ = better index, slower build |
| **efSearch** | 50 | Search quality, ↑ = better recall, slower search |
### 3. Quantization Types
@@ -502,6 +502,7 @@ def profile_index_build(
## Best Practices
### Do's
- **Benchmark with real queries** - Synthetic may not represent production
- **Monitor recall continuously** - Can degrade with data drift
- **Start with defaults** - Tune only when needed
@@ -509,6 +510,7 @@ def profile_index_build(
- **Consider tiered storage** - Hot/cold data separation
### Don'ts
- **Don't over-optimize early** - Profile first
- **Don't ignore build time** - Index updates have cost
- **Don't forget reindexing** - Plan for maintenance