mirror of
https://github.com/wshobson/agents.git
synced 2026-03-18 09:37:15 +00:00
style: format all files with prettier
@@ -20,18 +20,18 @@ Guide to selecting and optimizing embedding models for vector search application

### 1. Embedding Model Comparison (2026)

| Model                      | Dimensions | Max Tokens | Best For                            |
| -------------------------- | ---------- | ---------- | ----------------------------------- |
| **voyage-3-large**         | 1024       | 32000      | Claude apps (Anthropic recommended) |
| **voyage-3**               | 1024       | 32000      | Claude apps, cost-effective         |
| **voyage-code-3**          | 1024       | 32000      | Code search                         |
| **voyage-finance-2**       | 1024       | 32000      | Financial documents                 |
| **voyage-law-2**           | 1024       | 32000      | Legal documents                     |
| **text-embedding-3-large** | 3072       | 8191       | OpenAI apps, high accuracy          |
| **text-embedding-3-small** | 1536       | 8191       | OpenAI apps, cost-effective         |
| **bge-large-en-v1.5**      | 1024       | 512        | Open source, local deployment       |
| **all-MiniLM-L6-v2**       | 384        | 256        | Fast, lightweight                   |
| **multilingual-e5-large**  | 1024       | 512        | Multi-language                      |

### 2. Embedding Pipeline
@@ -583,6 +583,7 @@ def compare_embedding_models(

## Best Practices

### Do's

- **Match model to use case**: Code vs prose vs multilingual
- **Chunk thoughtfully**: Preserve semantic boundaries
- **Normalize embeddings**: For cosine similarity search

@@ -591,6 +592,7 @@ def compare_embedding_models(

- **Use Voyage AI for Claude apps**: Recommended by Anthropic

### Don'ts

- **Don't ignore token limits**: Truncation loses information
- **Don't mix embedding models**: Incompatible vector spaces
- **Don't skip preprocessing**: Garbage in, garbage out
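The "normalize embeddings" practice above can be sketched in a few lines. This is a minimal, self-contained illustration using NumPy; the toy 3-dimensional vectors are invented for the example (real models emit 384-3072 dimensions):

```python
import numpy as np

def normalize(vectors: np.ndarray) -> np.ndarray:
    """L2-normalize each row so that a dot product equals cosine similarity."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)

# Toy "embeddings" for two documents and one query.
docs = normalize(np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]))
query = normalize(np.array([[1.0, 0.0, 0.0]]))

# With unit vectors, cosine similarity is just a matrix product.
scores = query @ docs.T
best = int(np.argmax(scores))
```

Because both sides are unit length, the same score function works whether the index stores raw cosine or inner-product distances.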
@@ -27,12 +27,12 @@ Query → ┬─► Vector Search ──► Candidates ─┐

### 2. Fusion Methods

| Method            | Description              | Best For        |
| ----------------- | ------------------------ | --------------- |
| **RRF**           | Reciprocal Rank Fusion   | General purpose |
| **Linear**        | Weighted sum of scores   | Tunable balance |
| **Cross-encoder** | Rerank with neural model | Highest quality |
| **Cascade**       | Filter then rerank       | Efficiency      |
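Reciprocal Rank Fusion from the table above is simple enough to sketch directly; a minimal version (the `k=60` constant is the value commonly used in the RRF literature, and the document IDs are invented):

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Vector search and keyword search disagree; RRF rewards consensus across lists.
fused = rrf_fuse([["a", "b", "c"], ["b", "c", "a"]])
```

Because RRF only looks at ranks, not raw scores, it needs no score normalization or weight tuning, which is why it is the "general purpose" default.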
## Templates
@@ -549,6 +549,7 @@ class HybridRAGPipeline:

## Best Practices

### Do's

- **Tune weights empirically** - Test on your data
- **Use RRF for simplicity** - Works well without tuning
- **Add reranking** - Significant quality improvement

@@ -556,6 +557,7 @@ class HybridRAGPipeline:

- **A/B test** - Measure real user impact

### Don'ts

- **Don't assume one size fits all** - Different queries need different weights
- **Don't skip keyword search** - Handles exact matches better
- **Don't over-fetch** - Balance recall vs latency
@@ -33,9 +33,11 @@ langchain-pinecone # Pinecone vector store

## Core Concepts

### 1. LangGraph Agents

LangGraph is the standard for building agents in 2026. It provides:

**Key Features:**

- **StateGraph**: Explicit state management with typed state
- **Durable Execution**: Agents persist through failures
- **Human-in-the-Loop**: Inspect and modify state at any point

@@ -43,12 +45,14 @@ LangGraph is the standard for building agents in 2026. It provides:

- **Checkpointing**: Save and resume agent state

**Agent Patterns:**

- **ReAct**: Reasoning + Acting with `create_react_agent`
- **Plan-and-Execute**: Separate planning and execution nodes
- **Multi-Agent**: Supervisor routing between specialized agents
- **Tool-Calling**: Structured tool invocation with Pydantic schemas

### 2. State Management

LangGraph uses TypedDict for explicit state:

```python
@@ -69,6 +73,7 @@ class CustomState(TypedDict):
```
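The typed-state idea can be illustrated without any LangGraph dependency; a minimal sketch in plain Python, where the `AgentState` fields and the `planner` node are invented for the example:

```python
from typing import TypedDict

class AgentState(TypedDict):
    """Explicit, typed agent state; every node reads and returns this shape."""
    messages: list[str]
    next_step: str

def planner(state: AgentState) -> AgentState:
    # A node is just a function from state to updated state.
    return {"messages": state["messages"] + ["plan created"], "next_step": "execute"}

state: AgentState = {"messages": ["user: summarize the report"], "next_step": "plan"}
state = planner(state)
```

In LangGraph the graph runtime handles this threading of state between nodes; the sketch only shows why a TypedDict makes each node's contract explicit and type-checkable.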
### 3. Memory Systems

Modern memory implementations:

- **ConversationBufferMemory**: Stores all messages (short conversations)

@@ -78,15 +83,18 @@ Modern memory implementations:

- **LangGraph Checkpointers**: Persistent state across sessions

### 4. Document Processing

Loading, transforming, and storing documents:

**Components:**

- **Document Loaders**: Load from various sources
- **Text Splitters**: Chunk documents intelligently
- **Vector Stores**: Store and retrieve embeddings
- **Retrievers**: Fetch relevant documents

### 5. Callbacks & Tracing

LangSmith is the standard for observability:

- Request/response logging
@@ -20,9 +20,11 @@ Master comprehensive evaluation strategies for LLM applications, from automated

## Core Evaluation Types

### 1. Automated Metrics

Fast, repeatable, scalable evaluation using computed scores.

**Text Generation:**

- **BLEU**: N-gram overlap (translation)
- **ROUGE**: Recall-oriented (summarization)
- **METEOR**: Semantic similarity

@@ -30,21 +32,25 @@ Fast, repeatable, scalable evaluation using computed scores.

- **Perplexity**: Language model confidence

**Classification:**

- **Accuracy**: Percentage correct
- **Precision/Recall/F1**: Class-specific performance
- **Confusion Matrix**: Error patterns
- **AUC-ROC**: Ranking quality

**Retrieval (RAG):**

- **MRR**: Mean Reciprocal Rank
- **NDCG**: Normalized Discounted Cumulative Gain
- **Precision@K**: Relevant in top K
- **Recall@K**: Coverage in top K
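The retrieval metrics above reduce to a few lines each; a minimal sketch (the document IDs are invented for illustration):

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    top_k = retrieved[:k]
    return sum(doc in relevant for doc in top_k) / k

def mrr(queries: list[tuple[list[str], set[str]]]) -> float:
    """Mean Reciprocal Rank: average over queries of 1/rank of first relevant hit."""
    total = 0.0
    for retrieved, relevant in queries:
        for rank, doc in enumerate(retrieved, start=1):
            if doc in relevant:
                total += 1.0 / rank
                break
    return total / len(queries)

p = precision_at_k(["d1", "d2", "d3", "d4"], {"d1", "d4"}, k=3)   # 1 of top 3 relevant
m = mrr([(["d2", "d1"], {"d1"}), (["d3"], {"d3"})])               # (1/2 + 1) / 2
```

MRR only credits the first relevant hit, so it suits single-answer lookups; Precision@K and Recall@K suit multi-document retrieval.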
### 2. Human Evaluation

Manual assessment for quality aspects difficult to automate.

**Dimensions:**

- **Accuracy**: Factual correctness
- **Coherence**: Logical flow
- **Relevance**: Answers the question

@@ -53,9 +59,11 @@ Manual assessment for quality aspects difficult to automate.

- **Helpfulness**: Useful to the user

### 3. LLM-as-Judge

Use stronger LLMs to evaluate weaker model outputs.

**Approaches:**

- **Pointwise**: Score individual responses
- **Pairwise**: Compare two responses
- **Reference-based**: Compare to gold standard
@@ -134,6 +142,7 @@ results = await suite.evaluate(model=your_model, test_cases=test_cases)

## Automated Metrics Implementation

### BLEU Score

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

@@ -149,6 +158,7 @@ def calculate_bleu(reference: str, hypothesis: str, **kwargs) -> float:
```

### ROUGE Score

```python
from rouge_score import rouge_scorer

@@ -168,6 +178,7 @@ def calculate_rouge(reference: str, hypothesis: str, **kwargs) -> dict:
```

### BERTScore

```python
from bert_score import score

@@ -192,6 +203,7 @@ def calculate_bertscore(
```

### Custom Metrics

```python
def calculate_groundedness(response: str, context: str, **kwargs) -> float:
    """Check if response is grounded in provided context."""
@@ -232,6 +244,7 @@ def calculate_factuality(claim: str, sources: list[str], **kwargs) -> float:

## LLM-as-Judge Patterns

### Single Output Evaluation

```python
from anthropic import Anthropic
from pydantic import BaseModel, Field

@@ -280,6 +293,7 @@ Provide ratings in JSON format:
```

### Pairwise Comparison

```python
from pydantic import BaseModel, Field
from typing import Literal

@@ -324,6 +338,7 @@ Answer with JSON:
```

### Reference-Based Evaluation

```python
class ReferenceEvaluation(BaseModel):
    semantic_similarity: float = Field(ge=0, le=1)
@@ -371,6 +386,7 @@ Respond in JSON:

## Human Evaluation Frameworks

### Annotation Guidelines

```python
from dataclasses import dataclass, field
from typing import Optional

@@ -412,6 +428,7 @@ class AnnotationTask:
```

### Inter-Rater Agreement

```python
from sklearn.metrics import cohen_kappa_score

@@ -444,6 +461,7 @@ def calculate_agreement(

## A/B Testing

### Statistical Testing Framework

```python
from scipy import stats
import numpy as np

@@ -504,6 +522,7 @@ class ABTest:

## Regression Testing

### Regression Detection

```python
from dataclasses import dataclass

@@ -595,6 +614,7 @@ print(f"Mean score: {experiment_results.aggregate_metrics['qa']['mean']}")

## Benchmarking

### Running Benchmarks

```python
from dataclasses import dataclass
import numpy as np
@@ -21,6 +21,7 @@ Master advanced prompt engineering techniques to maximize LLM performance, relia

## Core Capabilities

### 1. Few-Shot Learning

- Example selection strategies (semantic similarity, diversity sampling)
- Balancing example count with context window constraints
- Constructing effective demonstrations with input-output pairs

@@ -28,6 +29,7 @@ Master advanced prompt engineering techniques to maximize LLM performance, relia

- Handling edge cases through strategic example selection

### 2. Chain-of-Thought Prompting

- Step-by-step reasoning elicitation
- Zero-shot CoT with "Let's think step by step"
- Few-shot CoT with reasoning traces

@@ -35,12 +37,14 @@ Master advanced prompt engineering techniques to maximize LLM performance, relia

- Verification and validation steps

### 3. Structured Outputs

- JSON mode for reliable parsing
- Pydantic schema enforcement
- Type-safe response handling
- Error handling for malformed outputs
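The structured-output points above can be sketched with Pydantic v2's JSON validation; a minimal example, where the `Sentiment` schema and the raw JSON string stand in for a real model response:

```python
from pydantic import BaseModel, ValidationError

class Sentiment(BaseModel):
    """Schema the model's JSON output must satisfy."""
    label: str
    confidence: float

raw = '{"label": "positive", "confidence": 0.93}'  # stand-in for an LLM response

try:
    result = Sentiment.model_validate_json(raw)
except ValidationError:
    result = None  # malformed output: retry, repair, or fall back
```

The `except` branch is where the "error handling for malformed outputs" bullet lives: validation failures are caught at the boundary instead of propagating bad data downstream.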
### 4. Prompt Optimization

- Iterative refinement workflows
- A/B testing prompt variations
- Measuring prompt performance metrics (accuracy, consistency, latency)

@@ -48,6 +52,7 @@ Master advanced prompt engineering techniques to maximize LLM performance, relia

- Handling edge cases and failure modes

### 5. Template Systems

- Variable interpolation and formatting
- Conditional prompt sections
- Multi-turn conversation templates

@@ -55,6 +60,7 @@ Master advanced prompt engineering techniques to maximize LLM performance, relia

- Modular prompt components

### 6. System Prompt Design

- Setting model behavior and constraints
- Defining output formats and structure
- Establishing role and expertise

@@ -395,6 +401,7 @@ Response:"""

## Performance Optimization

### Token Efficiency

```python
# Before: Verbose prompt (150+ tokens)
verbose_prompt = """

@@ -457,6 +464,7 @@ response = client.messages.create(

## Success Metrics

Track these KPIs for your prompts:

- **Accuracy**: Correctness of outputs
- **Consistency**: Reproducibility across similar inputs
- **Latency**: Response time (P50, P95, P99)
@@ -3,6 +3,7 @@

## Classification Templates

### Sentiment Analysis

```
Classify the sentiment of the following text as Positive, Negative, or Neutral.

@@ -12,6 +13,7 @@ Sentiment:
```

### Intent Detection

```
Determine the user's intent from the following message.

@@ -23,6 +25,7 @@ Intent:
```

### Topic Classification

```
Classify the following article into one of these categories: {categories}

@@ -35,6 +38,7 @@ Category:

## Extraction Templates

### Named Entity Recognition

```
Extract all named entities from the text and categorize them.

@@ -50,6 +54,7 @@ Entities (JSON format):
```

### Structured Data Extraction

```
Extract structured information from the job posting.

@@ -70,6 +75,7 @@ Extracted Information (JSON):

## Generation Templates

### Email Generation

```
Write a professional {email_type} email.

@@ -84,6 +90,7 @@ Body:
```

### Code Generation

```
Generate {language} code for the following task:

@@ -101,6 +108,7 @@ Code:
```

### Creative Writing

```
Write a {length}-word {style} story about {topic}.

@@ -115,6 +123,7 @@ Story:

## Transformation Templates

### Summarization

```
Summarize the following text in {num_sentences} sentences.

@@ -125,6 +134,7 @@ Summary:
```

### Translation with Context

```
Translate the following {source_lang} text to {target_lang}.

@@ -137,6 +147,7 @@ Translation:
```

### Format Conversion

```
Convert the following {source_format} to {target_format}.

@@ -149,6 +160,7 @@ Output ({target_format}):

## Analysis Templates

### Code Review

```
Review the following code for:
1. Bugs and errors

@@ -163,6 +175,7 @@ Review:
```

### SWOT Analysis

```
Conduct a SWOT analysis for: {subject}

@@ -185,6 +198,7 @@ Threats:

## Question Answering Templates

### RAG Template

```
Answer the question based on the provided context. If the context doesn't contain enough information, say so.

@@ -197,6 +211,7 @@ Answer:
```

### Multi-Turn Q&A

```
Previous conversation:
{conversation_history}

@@ -209,6 +224,7 @@ Answer (continue naturally from conversation):

## Specialized Templates

### SQL Query Generation

```
Generate a SQL query for the following request.

@@ -221,6 +237,7 @@ SQL Query:
```

### Regex Pattern Creation

```
Create a regex pattern to match: {requirement}

@@ -234,6 +251,7 @@ Regex pattern:
```

### API Documentation

```
Generate API documentation for this function:
```
@@ -7,6 +7,7 @@ Chain-of-Thought (CoT) prompting elicits step-by-step reasoning from LLMs, drama

## Core Techniques

### Zero-Shot CoT

Add a simple trigger phrase to elicit reasoning:

```python
@@ -29,6 +30,7 @@ prompt = zero_shot_cot(query)
```

### Few-Shot CoT

Provide examples with explicit reasoning chains:

```python
@@ -53,6 +55,7 @@ A: Let's think step by step:"""
```

### Self-Consistency

Generate multiple reasoning paths and take the majority vote:

```python
@@ -85,6 +88,7 @@ def self_consistency_cot(query, n=5, temperature=0.7):

## Advanced Patterns

### Least-to-Most Prompting

Break complex problems into simpler subproblems:

```python
@@ -125,6 +129,7 @@ Final Answer:"""
```

### Tree-of-Thought (ToT)

Explore multiple reasoning branches:

```python
@@ -176,6 +181,7 @@ Score:"""
```

### Verification Step

Add explicit verification to catch errors:

```python
@@ -220,6 +226,7 @@ Corrected solution:"""

## Domain-Specific CoT

### Math Problems

```python
math_cot_template = """
Problem: {problem}

@@ -248,6 +255,7 @@ Answer: {final_answer}
```

### Code Debugging

```python
debug_cot_template = """
Code with error:

@@ -278,6 +286,7 @@ Fixed code:
```

### Logical Reasoning

```python
logic_cot_template = """
Premises:

@@ -305,6 +314,7 @@ Answer: {final_answer}

## Performance Optimization

### Caching Reasoning Patterns

```python
class ReasoningCache:
    def __init__(self):

@@ -328,6 +338,7 @@ class ReasoningCache:
```

### Adaptive Reasoning Depth

```python
def adaptive_cot(problem, initial_depth=3):
    depth = initial_depth

@@ -378,6 +389,7 @@ def evaluate_cot_quality(reasoning_chain):

## When to Use CoT

**Use CoT for:**

- Math and arithmetic problems
- Logical reasoning tasks
- Multi-step planning

@@ -385,6 +397,7 @@ def evaluate_cot_quality(reasoning_chain):

- Complex decision making

**Skip CoT for:**

- Simple factual queries
- Direct lookups
- Creative writing
@@ -7,6 +7,7 @@ Few-shot learning enables LLMs to perform tasks by providing a small number of e

## Example Selection Strategies

### 1. Semantic Similarity

Select examples most similar to the input query using embedding-based retrieval.

```python
@@ -29,6 +30,7 @@ class SemanticExampleSelector:

**Best For**: Question answering, text classification, extraction tasks

### 2. Diversity Sampling

Maximize coverage of different patterns and edge cases.

```python
@@ -58,6 +60,7 @@ class DiversityExampleSelector:

**Best For**: Demonstrating task variability, edge case handling

### 3. Difficulty-Based Selection

Gradually increase example complexity to scaffold learning.

```python
@@ -75,6 +78,7 @@ class ProgressiveExampleSelector:

**Best For**: Complex reasoning tasks, code generation

### 4. Error-Based Selection

Include examples that address common failure modes.

```python
@@ -98,6 +102,7 @@ class ErrorGuidedSelector:

## Example Construction Best Practices

### Format Consistency

All examples should follow identical formatting:

```python
@@ -121,6 +126,7 @@ examples = [
```

### Input-Output Alignment

Ensure examples demonstrate the exact task you want the model to perform:

```python
@@ -138,6 +144,7 @@ example = {
```

### Complexity Balance

Include examples spanning the expected difficulty range:

```python
@@ -156,6 +163,7 @@ examples = [

## Context Window Management

### Token Budget Allocation

Typical distribution for a 4K context window:

```
@@ -166,6 +174,7 @@ Response: 1500 tokens (38%)
```

### Dynamic Example Truncation

```python
class TokenAwareSelector:
    def __init__(self, examples, tokenizer, max_tokens=1500):

@@ -197,6 +206,7 @@ class TokenAwareSelector:

## Edge Case Handling

### Include Boundary Examples

```python
edge_case_examples = [
    # Empty input

@@ -216,6 +226,7 @@ edge_case_examples = [

## Few-Shot Prompt Templates

### Classification Template

```python
def build_classification_prompt(examples, query, labels):
    prompt = f"Classify the text into one of these categories: {', '.join(labels)}\n\n"

@@ -228,6 +239,7 @@ def build_classification_prompt(examples, query, labels):
```

### Extraction Template

```python
def build_extraction_prompt(examples, query):
    prompt = "Extract structured information from the text.\n\n"

@@ -240,6 +252,7 @@ def build_extraction_prompt(examples, query):
```

### Transformation Template

```python
def build_transformation_prompt(examples, query):
    prompt = "Transform the input according to the pattern shown in examples.\n\n"

@@ -254,6 +267,7 @@ def build_transformation_prompt(examples, query):

## Evaluation and Optimization

### Example Quality Metrics

```python
def evaluate_example_quality(example, validation_set):
    metrics = {

@@ -266,6 +280,7 @@ def evaluate_example_quality(example, validation_set):
```

### A/B Testing Example Sets

```python
class ExampleSetTester:
    def __init__(self, llm_client):

@@ -295,6 +310,7 @@ class ExampleSetTester:

## Advanced Techniques

### Meta-Learning (Learning to Select)

Train a small model to predict which examples will be most effective:

```python
@@ -334,6 +350,7 @@ class LearnedExampleSelector:
```

### Adaptive Example Count

Dynamically adjust the number of examples based on task difficulty:

```python
@@ -3,6 +3,7 @@

## Systematic Refinement Process

### 1. Baseline Establishment

```python
def establish_baseline(prompt, test_cases):
    results = {

@@ -26,6 +27,7 @@ def establish_baseline(prompt, test_cases):
```

### 2. Iterative Refinement Workflow

```
Initial Prompt → Test → Analyze Failures → Refine → Test → Repeat
```

@@ -64,6 +66,7 @@ class PromptOptimizer:
```

### 3. A/B Testing Framework

```python
class PromptABTest:
    def __init__(self, variant_a, variant_b):

@@ -116,6 +119,7 @@ class PromptABTest:

## Optimization Strategies

### Token Reduction

```python
def optimize_for_tokens(prompt):
    optimizations = [

@@ -144,6 +148,7 @@ def optimize_for_tokens(prompt):
```

### Latency Reduction

```python
def optimize_for_latency(prompt):
    strategies = {

@@ -167,6 +172,7 @@ def optimize_for_latency(prompt):
```

### Accuracy Improvement

```python
def improve_accuracy(prompt, failure_cases):
    improvements = []

@@ -194,6 +200,7 @@ def improve_accuracy(prompt, failure_cases):

## Performance Metrics

### Core Metrics

```python
class PromptMetrics:
    @staticmethod

@@ -230,6 +237,7 @@ class PromptMetrics:
```

### Automated Evaluation

```python
def evaluate_prompt_comprehensively(prompt, test_suite):
    results = {

@@ -274,6 +282,7 @@ def evaluate_prompt_comprehensively(prompt, test_suite):

## Failure Analysis

### Categorizing Failures

```python
class FailureAnalyzer:
    def categorize_failures(self, test_results):

@@ -326,6 +335,7 @@ class FailureAnalyzer:

## Versioning and Rollback

### Prompt Version Control

```python
class PromptVersionControl:
    def __init__(self, storage_path):

@@ -381,24 +391,28 @@ class PromptVersionControl:

## Common Optimization Patterns

### Pattern 1: Add Structure

```
Before: "Analyze this text"
After: "Analyze this text for:\n1. Main topic\n2. Key arguments\n3. Conclusion"
```

### Pattern 2: Add Examples

```
Before: "Extract entities"
After: "Extract entities\n\nExample:\nText: Apple released iPhone\nEntities: {company: Apple, product: iPhone}"
```

### Pattern 3: Add Constraints

```
Before: "Summarize this"
After: "Summarize in exactly 3 bullet points, 15 words each"
```

### Pattern 4: Add Verification

```
Before: "Calculate..."
After: "Calculate... Then verify your calculation is correct before responding."
@@ -3,6 +3,7 @@

## Template Architecture

### Basic Template Structure

```python
class PromptTemplate:
    def __init__(self, template_string, variables=None):

@@ -30,6 +31,7 @@ prompt = template.render(
```
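A minimal stand-in for the class above, assuming rendering is plain `str.format`-style interpolation (the template text is invented for the example):

```python
class MiniTemplate:
    """Tiny prompt template: named placeholders filled in at render time."""

    def __init__(self, template_string: str):
        self.template_string = template_string

    def render(self, **kwargs) -> str:
        # str.format raises KeyError if a placeholder is left unfilled,
        # which surfaces missing variables early instead of shipping "{text}".
        return self.template_string.format(**kwargs)

template = MiniTemplate("Summarize the {content_type} below in {num_sentences} sentences:\n{text}")
prompt = template.render(content_type="article", num_sentences=2, text="...")
```

Real template classes add validation and defaults on top of this core, but the render step is essentially the same substitution.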
### Conditional Templates

```python
class ConditionalTemplate(PromptTemplate):
    def render(self, **kwargs):

@@ -84,6 +86,7 @@ Reference examples:
```

### Modular Template Composition

```python
class ModularTemplate:
    def __init__(self):

@@ -133,6 +136,7 @@ advanced_prompt = builder.render(

## Common Template Patterns

### Classification Template

```python
CLASSIFICATION_TEMPLATE = """
Classify the following {content_type} into one of these categories: {categories}

@@ -153,6 +157,7 @@ Category:"""
```

### Extraction Template

```python
EXTRACTION_TEMPLATE = """
Extract structured information from the {content_type}.

@@ -171,6 +176,7 @@ Extracted information (JSON):"""
```

### Generation Template

```python
GENERATION_TEMPLATE = """
Generate {output_type} based on the following {input_type}.

@@ -198,6 +204,7 @@ Examples:
```

### Transformation Template

```python
TRANSFORMATION_TEMPLATE = """
Transform the input {source_format} to {target_format}.

@@ -219,6 +226,7 @@ Output {target_format}:"""

## Advanced Features

### Template Inheritance

```python
class TemplateRegistry:
    def __init__(self):

@@ -251,6 +259,7 @@ registry.register('sentiment_analysis', {
```

### Variable Validation

```python
class ValidatedTemplate:
    def __init__(self, template, schema):

@@ -294,6 +303,7 @@ template = ValidatedTemplate(
```

### Template Caching

```python
class CachedTemplate:
    def __init__(self, template):

@@ -323,6 +333,7 @@ class CachedTemplate:

## Multi-Turn Templates

### Conversation Template

```python
class ConversationTemplate:
    def __init__(self, system_prompt):

@@ -349,6 +360,7 @@ class ConversationTemplate:
```

### State-Based Templates

```python
class StatefulTemplate:
    def __init__(self):

@@ -406,6 +418,7 @@ Here's the result: {result}

## Template Libraries

### Question Answering

```python
QA_TEMPLATES = {
    'factual': """Answer the question based on the context.

@@ -432,6 +445,7 @@ Assistant:"""
```

### Content Generation

```python
GENERATION_TEMPLATES = {
    'blog_post': """Write a blog post about {topic}.
@@ -11,6 +11,7 @@ System prompts set the foundation for LLM behavior. They define role, expertise,

```

### Example: Code Assistant

```
You are an expert software engineer with deep knowledge of Python, JavaScript, and system design.

@@ -36,6 +37,7 @@ Output format:

## Pattern Library

### 1. Customer Support Agent

```
You are a friendly, empathetic customer support representative for {company_name}.

@@ -59,6 +61,7 @@ Constraints:
```

### 2. Data Analyst

```
You are an experienced data analyst specializing in business intelligence.

@@ -85,6 +88,7 @@ Output:
```

### 3. Content Editor

```
You are a professional editor with expertise in {content_type}.

@@ -112,6 +116,7 @@ Format your feedback as:

## Advanced Techniques

### Dynamic Role Adaptation

```python
def build_adaptive_system_prompt(task_type, difficulty):
    base = "You are an expert assistant"

@@ -136,6 +141,7 @@ Expertise level: {difficulty}
```

### Constraint Specification

```
Hard constraints (MUST follow):
- Never generate harmful, biased, or illegal content
@@ -20,9 +20,11 @@ Master Retrieval-Augmented Generation (RAG) to build LLM applications that provi

## Core Components

### 1. Vector Databases

**Purpose**: Store and retrieve document embeddings efficiently

**Options:**

- **Pinecone**: Managed, scalable, serverless
- **Weaviate**: Open-source, hybrid search, GraphQL
- **Milvus**: High performance, on-premise

@@ -31,6 +33,7 @@ Master Retrieval-Augmented Generation (RAG) to build LLM applications that provi

- **pgvector**: PostgreSQL extension, SQL integration

### 2. Embeddings

**Purpose**: Convert text to numerical vectors for similarity search

**Models (2026):**

@@ -44,7 +47,9 @@ Master Retrieval-Augmented Generation (RAG) to build LLM applications that provi

| **multilingual-e5-large** | 1024 | Multi-language support |

### 3. Retrieval Strategies

**Approaches:**

- **Dense Retrieval**: Semantic similarity via embeddings
- **Sparse Retrieval**: Keyword matching (BM25, TF-IDF)
- **Hybrid Search**: Combine dense + sparse with weighted fusion

@@ -52,9 +57,11 @@ Master Retrieval-Augmented Generation (RAG) to build LLM applications that provi

- **HyDE**: Generate hypothetical documents for better retrieval
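The hybrid search approach above can be sketched as a weighted score fusion. A minimal illustration with invented scores and document IDs; a real system would first min-max normalize the scores coming from each retriever so the weighting is meaningful:

```python
def hybrid_fuse(dense: dict[str, float], sparse: dict[str, float], alpha: float = 0.7) -> list[str]:
    """Blend dense (semantic) and sparse (keyword) scores; alpha weights the dense side."""
    combined = {
        doc: alpha * dense.get(doc, 0.0) + (1 - alpha) * sparse.get(doc, 0.0)
        for doc in set(dense) | set(sparse)
    }
    return sorted(combined, key=combined.get, reverse=True)

# "d2" wins on keywords only; with alpha=0.7 the semantic match "d1" still ranks first.
ranked = hybrid_fuse({"d1": 0.9, "d2": 0.2}, {"d2": 1.0, "d3": 0.5})
```

The `alpha` knob is the "weighted fusion" being tuned: higher values trust semantic similarity, lower values trust exact keyword matches.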
### 4. Reranking

**Purpose**: Improve retrieval quality by reordering results

**Methods:**

- **Cross-Encoders**: BERT-based reranking (ms-marco-MiniLM)
- **Cohere Rerank**: API-based reranking
- **Maximal Marginal Relevance (MMR)**: Diversity + relevance
@@ -255,6 +262,7 @@ hyde_rag = builder.compile()
## Document Chunking Strategies

### Recursive Character Text Splitter

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

@@ -269,6 +277,7 @@ chunks = splitter.split_documents(documents)
```

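The LangChain block above is cut off by the hunk; the idea behind recursive splitting can be shown dependency-free. Split on the coarsest separator first, recurse to finer separators only for oversized pieces, then merge neighbors back up toward the target size (separators and sizes here are illustrative):

```python
# Dependency-free sketch of the recursive idea: split on the coarsest
# separator first, recurse on oversized pieces, then greedily merge
# neighbors back up toward chunk_size. The real class also supports
# chunk overlap, which this sketch omits.
def recursive_split(text, chunk_size=100, separators=("\n\n", "\n", " ")):
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators:
        # No separator left: fall back to a hard cut.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, rest = separators[0], separators[1:]
    pieces = []
    for piece in text.split(sep):
        if len(piece) <= chunk_size:
            pieces.append(piece)
        else:
            pieces.extend(recursive_split(piece, chunk_size, rest))
    # Greedily merge adjacent pieces so chunks approach chunk_size.
    chunks, current = [], ""
    for piece in pieces:
        candidate = current + sep + piece if current else piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current.strip():
                chunks.append(current)
            current = piece
    if current.strip():
        chunks.append(current)
    return chunks

doc = "First paragraph about vector search.\n\n" + "Second paragraph. " * 20
chunks = recursive_split(doc, chunk_size=100)
```

Paragraph boundaries are preserved when possible, and only the oversized paragraph is broken at finer separators.
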
### Token-Based Splitting

```python
from langchain_text_splitters import TokenTextSplitter

@@ -280,6 +289,7 @@ splitter = TokenTextSplitter(
```

### Semantic Chunking

```python
from langchain_experimental.text_splitter import SemanticChunker

@@ -291,6 +301,7 @@ splitter = SemanticChunker(
```

### Markdown Header Splitter

```python
from langchain_text_splitters import MarkdownHeaderTextSplitter

@@ -309,6 +320,7 @@ splitter = MarkdownHeaderTextSplitter(
## Vector Store Configurations

### Pinecone (Serverless)

```python
from pinecone import Pinecone, ServerlessSpec
from langchain_pinecone import PineconeVectorStore
@@ -331,6 +343,7 @@ vectorstore = PineconeVectorStore(index=index, embedding=embeddings)
```

### Weaviate

```python
import weaviate
from langchain_weaviate import WeaviateVectorStore
@@ -346,6 +359,7 @@ vectorstore = WeaviateVectorStore(
```

### Chroma (Local Development)

```python
from langchain_chroma import Chroma

@@ -357,6 +371,7 @@ vectorstore = Chroma(
```

### pgvector (PostgreSQL)

```python
from langchain_postgres.vectorstores import PGVector

@@ -372,6 +387,7 @@ vectorstore = PGVector(
## Retrieval Optimization

### 1. Metadata Filtering

```python
from langchain_core.documents import Document

@@ -394,6 +410,7 @@ results = await vectorstore.asimilarity_search(
```

### 2. Maximal Marginal Relevance (MMR)

```python
# Balance relevance with diversity
results = await vectorstore.amax_marginal_relevance_search(
@@ -405,6 +422,7 @@ results = await vectorstore.amax_marginal_relevance_search(
```

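The truncated call above uses LangChain's built-in MMR; the selection rule itself is simple enough to sketch directly. Candidates are picked greedily to maximize λ·relevance − (1−λ)·redundancy, where redundancy is the similarity to the closest already-selected document (toy scores and λ below are illustrative):

```python
# Greedy MMR selection. `relevance` maps doc -> query relevance and
# `similarity` returns doc-doc similarity; both are toy stand-ins for
# real embedding scores. lam (λ) trades relevance against diversity.
def mmr_select(relevance, similarity, k=2, lam=0.5):
    selected, candidates = [], set(relevance)
    while candidates and len(selected) < k:
        def mmr_score(doc):
            redundancy = max((similarity(doc, s) for s in selected), default=0.0)
            return lam * relevance[doc] - (1 - lam) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected

# "a" and "b" are near-duplicates: plain top-k would return both, but
# MMR picks the less relevant yet diverse "c" second.
rel = {"a": 0.9, "b": 0.85, "c": 0.6}
sim = lambda x, y: 0.95 if {x, y} == {"a", "b"} else 0.1
picked = mmr_select(rel, sim, k=2, lam=0.5)
```
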
### 3. Reranking with Cross-Encoder

```python
from sentence_transformers import CrossEncoder

@@ -424,6 +442,7 @@ async def retrieve_and_rerank(query: str, k: int = 5) -> list[Document]:
```

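Fleshed out, the rerank flow is: over-fetch candidates, score each (query, document) pair, keep the top k. To keep the sketch runnable without sentence-transformers, a lexical-overlap stub stands in for `CrossEncoder.predict`; in production you would load a model such as the ms-marco-MiniLM cross-encoder named in the methods list above:

```python
# `score_pairs` is a stand-in for CrossEncoder.predict; it scores a
# (query, doc) pair by how many query words appear in the doc.
def score_pairs(pairs):
    return [
        sum(word in doc.lower() for word in query.lower().split())
        for query, doc in pairs
    ]

def retrieve_and_rerank(query, candidates, k=3):
    """Score every candidate against the query, return the top k."""
    scores = score_pairs([(query, doc) for doc in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in ranked[:k]]

docs = [
    "HNSW index tuning for vector search",
    "Cooking pasta at home",
    "Vector search with hybrid retrieval",
    "Weather report",
]
top = retrieve_and_rerank("vector search index", docs, k=2)
```

The over-fetch/rescore shape is the same regardless of scorer; only `score_pairs` changes when a real cross-encoder is dropped in.
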
### 4. Cohere Rerank

```python
from langchain.retrievers import ContextualCompressionRetriever
from langchain_cohere import CohereRerank
@@ -440,6 +459,7 @@ reranked_retriever = ContextualCompressionRetriever(
## Prompt Engineering for RAG

### Contextual Prompt with Citations

```python
rag_prompt = ChatPromptTemplate.from_template(
    """Answer the question based on the context below. Include citations using [1], [2], etc.
@@ -461,6 +481,7 @@ rag_prompt = ChatPromptTemplate.from_template(
```

### Structured Output for RAG

```python
from pydantic import BaseModel, Field

@@ -20,12 +20,12 @@ Patterns for implementing efficient similarity search in production systems.

### 1. Distance Metrics

| Metric | Formula | Best For |
|--------|---------|----------|
| **Cosine** | 1 - (A·B)/(‖A‖‖B‖) | Normalized embeddings |
| **Euclidean (L2)** | √Σ(a-b)² | Raw embeddings |
| **Dot Product** | A·B | Magnitude matters |
| **Manhattan (L1)** | Σ|a-b| | Sparse vectors |
| Metric             | Formula            | Best For              |
| ------------------ | ------------------ | --------------------- |
| **Cosine**         | 1 - (A·B)/(‖A‖‖B‖) | Normalized embeddings |
| **Euclidean (L2)** | √Σ(a-b)²           | Raw embeddings        |
| **Dot Product**    | A·B                | Magnitude matters     |
| **Manhattan (L1)** | Σ\|a-b\|           | Sparse vectors        |

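The table's formulas translate directly to code; a quick check with parallel vectors of different magnitude shows why cosine suits normalized embeddings while the other metrics are scale-sensitive (plain-Python sketch, no vector library assumed):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    # 1 - (A·B)/(‖A‖‖B‖)
    return 1 - dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean(a, b):
    # √Σ(a-b)²
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # Σ|a-b|
    return sum(abs(x - y) for x, y in zip(a, b))

# b is a scaled copy of a: cosine distance is 0, while the
# scale-sensitive metrics all report a nonzero difference.
a, b = [1.0, 2.0, 2.0], [2.0, 4.0, 4.0]
```
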
### 2. Index Types

@@ -538,6 +538,7 @@ class WeaviateVectorStore:
## Best Practices

### Do's

- **Use appropriate index** - HNSW for most cases
- **Tune parameters** - ef_search, nprobe for recall/speed
- **Implement hybrid search** - Combine with keyword search
@@ -545,6 +546,7 @@ class WeaviateVectorStore:
- **Pre-filter when possible** - Reduce search space

### Don'ts

- **Don't skip evaluation** - Measure before optimizing
- **Don't over-index** - Start with flat, scale up
- **Don't ignore latency** - P99 matters for UX

@@ -31,11 +31,11 @@ Data Size Recommended Index

### 2. HNSW Parameters

| Parameter | Default | Effect |
|-----------|---------|--------|
| **M** | 16 | Connections per node, ↑ = better recall, more memory |
| **efConstruction** | 100 | Build quality, ↑ = better index, slower build |
| **efSearch** | 50 | Search quality, ↑ = better recall, slower search |
| Parameter          | Default | Effect                                               |
| ------------------ | ------- | ---------------------------------------------------- |
| **M**              | 16      | Connections per node, ↑ = better recall, more memory |
| **efConstruction** | 100     | Build quality, ↑ = better index, slower build        |
| **efSearch**       | 50      | Search quality, ↑ = better recall, slower search     |

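One way to make the M row concrete is a back-of-envelope memory estimate. This assumes float32 vectors and roughly 2·M neighbor links per node at layer 0 (as in hnswlib), and ignores upper layers and allocator overhead, so treat the numbers as a lower bound:

```python
def hnsw_memory_mb(n_vectors: int, dim: int, m: int = 16) -> float:
    """Rough HNSW footprint in MB: float32 vectors + layer-0 links."""
    vector_bytes = n_vectors * dim * 4   # 4-byte float32 components
    link_bytes = n_vectors * 2 * m * 4   # ~2*M int32 neighbor ids per node
    return (vector_bytes + link_bytes) / 2**20

# 1M 1024-dim vectors: doubling M from 16 to 32 costs ~122 MB more link
# storage, the price of the better recall the table describes.
base = hnsw_memory_mb(1_000_000, 1024, m=16)
bigger = hnsw_memory_mb(1_000_000, 1024, m=32)
```
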
### 3. Quantization Types

@@ -502,6 +502,7 @@ def profile_index_build(
## Best Practices

### Do's

- **Benchmark with real queries** - Synthetic may not represent production
- **Monitor recall continuously** - Can degrade with data drift
- **Start with defaults** - Tune only when needed
@@ -509,6 +510,7 @@ def profile_index_build(
- **Consider tiered storage** - Hot/cold data separation

### Don'ts

- **Don't over-optimize early** - Profile first
- **Don't ignore build time** - Index updates have cost
- **Don't forget reindexing** - Plan for maintenance
