style: format all files with prettier

This commit is contained in:
Seth Hobson
2026-01-19 17:07:03 -05:00
parent 8d37048deb
commit 56848874a2
355 changed files with 15215 additions and 10241 deletions

View File

@@ -20,18 +20,18 @@ Guide to selecting and optimizing embedding models for vector search application
### 1. Embedding Model Comparison (2026)
| Model                      | Dimensions | Max Tokens | Best For                            |
| -------------------------- | ---------- | ---------- | ----------------------------------- |
| **voyage-3-large**         | 1024       | 32000      | Claude apps (Anthropic recommended) |
| **voyage-3**               | 1024       | 32000      | Claude apps, cost-effective         |
| **voyage-code-3**          | 1024       | 32000      | Code search                         |
| **voyage-finance-2**       | 1024       | 32000      | Financial documents                 |
| **voyage-law-2**           | 1024       | 32000      | Legal documents                     |
| **text-embedding-3-large** | 3072       | 8191       | OpenAI apps, high accuracy          |
| **text-embedding-3-small** | 1536       | 8191       | OpenAI apps, cost-effective         |
| **bge-large-en-v1.5**      | 1024       | 512        | Open source, local deployment       |
| **all-MiniLM-L6-v2**       | 384        | 256        | Fast, lightweight                   |
| **multilingual-e5-large**  | 1024       | 512        | Multi-language                      |
### 2. Embedding Pipeline
@@ -583,6 +583,7 @@ def compare_embedding_models(
## Best Practices
### Do's
- **Match model to use case**: Code vs prose vs multilingual
- **Chunk thoughtfully**: Preserve semantic boundaries
- **Normalize embeddings**: For cosine similarity search
@@ -591,6 +592,7 @@ def compare_embedding_models(
- **Use Voyage AI for Claude apps**: Recommended by Anthropic
### Don'ts
- **Don't ignore token limits**: Truncation loses information
- **Don't mix embedding models**: Incompatible vector spaces
- **Don't skip preprocessing**: Garbage in, garbage out
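The "normalize embeddings" rule above is easy to sketch in plain Python, independent of any embedding provider (a minimal illustration: unit-length vectors make the dot product equal to cosine similarity):

```python
import math

def normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length so dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product of the unit-normalized vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))
```

In production you would normalize once at index time rather than per query.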

View File

@@ -27,12 +27,12 @@ Query → ┬─► Vector Search ──► Candidates ─┐
### 2. Fusion Methods
| Method            | Description              | Best For        |
| ----------------- | ------------------------ | --------------- |
| **RRF**           | Reciprocal Rank Fusion   | General purpose |
| **Linear**        | Weighted sum of scores   | Tunable balance |
| **Cross-encoder** | Rerank with neural model | Highest quality |
| **Cascade**       | Filter then rerank       | Efficiency      |
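RRF from the table above is simple enough to sketch directly. This assumes each retriever returns a ranked list of document ids; `k=60` is the conventional smoothing constant from the original RRF paper:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists: each doc scores sum(1 / (k + rank)) across lists."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)
```

Because RRF only uses ranks, it needs no score calibration between the vector and keyword retrievers, which is why it works well without tuning.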
## Templates
@@ -549,6 +549,7 @@ class HybridRAGPipeline:
## Best Practices
### Do's
- **Tune weights empirically** - Test on your data
- **Use RRF for simplicity** - Works well without tuning
- **Add reranking** - Significant quality improvement
@@ -556,6 +557,7 @@ class HybridRAGPipeline:
- **A/B test** - Measure real user impact
### Don'ts
- **Don't assume one size fits all** - Different queries need different weights
- **Don't skip keyword search** - Handles exact matches better
- **Don't over-fetch** - Balance recall vs latency

View File

@@ -33,9 +33,11 @@ langchain-pinecone # Pinecone vector store
## Core Concepts
### 1. LangGraph Agents
LangGraph is the standard for building agents in 2026. It provides:
**Key Features:**
- **StateGraph**: Explicit state management with typed state
- **Durable Execution**: Agents persist through failures
- **Human-in-the-Loop**: Inspect and modify state at any point
@@ -43,12 +45,14 @@ LangGraph is the standard for building agents in 2026. It provides:
- **Checkpointing**: Save and resume agent state
**Agent Patterns:**
- **ReAct**: Reasoning + Acting with `create_react_agent`
- **Plan-and-Execute**: Separate planning and execution nodes
- **Multi-Agent**: Supervisor routing between specialized agents
- **Tool-Calling**: Structured tool invocation with Pydantic schemas
### 2. State Management
LangGraph uses TypedDict for explicit state:
```python
@@ -69,6 +73,7 @@ class CustomState(TypedDict):
```
### 3. Memory Systems
Modern memory implementations:
- **ConversationBufferMemory**: Stores all messages (short conversations)
@@ -78,15 +83,18 @@ Modern memory implementations:
- **LangGraph Checkpointers**: Persistent state across sessions
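A window-style buffer memory of the kind listed above can be sketched without any framework. This is a hypothetical minimal class, not the LangChain implementation:

```python
class ConversationWindowMemory:
    """Keep only the last k (role, content) messages of a conversation."""

    def __init__(self, k: int = 4):
        self.k = k
        self.messages: list[tuple[str, str]] = []

    def add(self, role: str, content: str) -> None:
        self.messages.append((role, content))

    def load(self) -> list[tuple[str, str]]:
        # Only the most recent k messages are sent back to the model
        return self.messages[-self.k:]
```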
### 4. Document Processing
Loading, transforming, and storing documents:
**Components:**
- **Document Loaders**: Load from various sources
- **Text Splitters**: Chunk documents intelligently
- **Vector Stores**: Store and retrieve embeddings
- **Retrievers**: Fetch relevant documents
### 5. Callbacks & Tracing
LangSmith is the standard for observability:
- Request/response logging

View File

@@ -20,9 +20,11 @@ Master comprehensive evaluation strategies for LLM applications, from automated
## Core Evaluation Types
### 1. Automated Metrics
Fast, repeatable, scalable evaluation using computed scores.
**Text Generation:**
- **BLEU**: N-gram overlap (translation)
- **ROUGE**: Recall-oriented (summarization)
- **METEOR**: Semantic similarity
@@ -30,21 +32,25 @@ Fast, repeatable, scalable evaluation using computed scores.
- **Perplexity**: Language model confidence
**Classification:**
- **Accuracy**: Percentage correct
- **Precision/Recall/F1**: Class-specific performance
- **Confusion Matrix**: Error patterns
- **AUC-ROC**: Ranking quality
**Retrieval (RAG):**
- **MRR**: Mean Reciprocal Rank
- **NDCG**: Normalized Discounted Cumulative Gain
- **Precision@K**: Relevant in top K
- **Recall@K**: Coverage in top K
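Two of the retrieval metrics above are short enough to implement inline (a minimal sketch, assuming ranked lists of document ids and per-query relevant sets):

```python
def mean_reciprocal_rank(results: list[list[str]], relevant: list[set[str]]) -> float:
    """Average of 1/rank of the first relevant hit per query (0 if none found)."""
    total = 0.0
    for ranked, rel in zip(results, relevant):
        for rank, doc in enumerate(ranked, start=1):
            if doc in rel:
                total += 1.0 / rank
                break
    return total / len(results)

def precision_at_k(ranked: list[str], rel: set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in rel) / k
```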
### 2. Human Evaluation
Manual assessment for quality aspects difficult to automate.
**Dimensions:**
- **Accuracy**: Factual correctness
- **Coherence**: Logical flow
- **Relevance**: Answers the question
@@ -53,9 +59,11 @@ Manual assessment for quality aspects difficult to automate.
- **Helpfulness**: Useful to the user
### 3. LLM-as-Judge
Use stronger LLMs to evaluate weaker model outputs.
**Approaches:**
- **Pointwise**: Score individual responses
- **Pairwise**: Compare two responses
- **Reference-based**: Compare to gold standard
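The pointwise approach reduces to assembling a rating prompt for the judge model. A hypothetical sketch (the wording and the 1-5 scale are assumptions, not a fixed API):

```python
def build_judge_prompt(question: str, response: str, criteria: list[str]) -> str:
    """Assemble a pointwise LLM-as-judge prompt requesting JSON ratings."""
    criteria_lines = "\n".join(f"- {c}: 1-5" for c in criteria)
    return (
        "You are an impartial judge. Rate the response on each criterion.\n\n"
        f"Question: {question}\n\n"
        f"Response: {response}\n\n"
        f"Criteria:\n{criteria_lines}\n\n"
        "Provide ratings in JSON format."
    )
```

The returned string would then be sent to a stronger model and the JSON parsed into a Pydantic schema, as in the evaluation examples below.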
@@ -134,6 +142,7 @@ results = await suite.evaluate(model=your_model, test_cases=test_cases)
## Automated Metrics Implementation
### BLEU Score
```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
@@ -149,6 +158,7 @@ def calculate_bleu(reference: str, hypothesis: str, **kwargs) -> float:
```
### ROUGE Score
```python
from rouge_score import rouge_scorer
@@ -168,6 +178,7 @@ def calculate_rouge(reference: str, hypothesis: str, **kwargs) -> dict:
```
### BERTScore
```python
from bert_score import score
@@ -192,6 +203,7 @@ def calculate_bertscore(
```
### Custom Metrics
```python
def calculate_groundedness(response: str, context: str, **kwargs) -> float:
"""Check if response is grounded in provided context."""
@@ -232,6 +244,7 @@ def calculate_factuality(claim: str, sources: list[str], **kwargs) -> float:
## LLM-as-Judge Patterns
### Single Output Evaluation
```python
from anthropic import Anthropic
from pydantic import BaseModel, Field
@@ -280,6 +293,7 @@ Provide ratings in JSON format:
```
### Pairwise Comparison
```python
from pydantic import BaseModel, Field
from typing import Literal
@@ -324,6 +338,7 @@ Answer with JSON:
```
### Reference-Based Evaluation
```python
class ReferenceEvaluation(BaseModel):
semantic_similarity: float = Field(ge=0, le=1)
@@ -371,6 +386,7 @@ Respond in JSON:
## Human Evaluation Frameworks
### Annotation Guidelines
```python
from dataclasses import dataclass, field
from typing import Optional
@@ -412,6 +428,7 @@ class AnnotationTask:
```
### Inter-Rater Agreement
```python
from sklearn.metrics import cohen_kappa_score
@@ -444,6 +461,7 @@ def calculate_agreement(
## A/B Testing
### Statistical Testing Framework
```python
from scipy import stats
import numpy as np
@@ -504,6 +522,7 @@ class ABTest:
## Regression Testing
### Regression Detection
```python
from dataclasses import dataclass
@@ -595,6 +614,7 @@ print(f"Mean score: {experiment_results.aggregate_metrics['qa']['mean']}")
## Benchmarking
### Running Benchmarks
```python
from dataclasses import dataclass
import numpy as np

View File

@@ -21,6 +21,7 @@ Master advanced prompt engineering techniques to maximize LLM performance, relia
## Core Capabilities
### 1. Few-Shot Learning
- Example selection strategies (semantic similarity, diversity sampling)
- Balancing example count with context window constraints
- Constructing effective demonstrations with input-output pairs
@@ -28,6 +29,7 @@ Master advanced prompt engineering techniques to maximize LLM performance, relia
- Handling edge cases through strategic example selection
### 2. Chain-of-Thought Prompting
- Step-by-step reasoning elicitation
- Zero-shot CoT with "Let's think step by step"
- Few-shot CoT with reasoning traces
@@ -35,12 +37,14 @@ Master advanced prompt engineering techniques to maximize LLM performance, relia
- Verification and validation steps
### 3. Structured Outputs
- JSON mode for reliable parsing
- Pydantic schema enforcement
- Type-safe response handling
- Error handling for malformed outputs
### 4. Prompt Optimization
- Iterative refinement workflows
- A/B testing prompt variations
- Measuring prompt performance metrics (accuracy, consistency, latency)
@@ -48,6 +52,7 @@ Master advanced prompt engineering techniques to maximize LLM performance, relia
- Handling edge cases and failure modes
### 5. Template Systems
- Variable interpolation and formatting
- Conditional prompt sections
- Multi-turn conversation templates
@@ -55,6 +60,7 @@ Master advanced prompt engineering techniques to maximize LLM performance, relia
- Modular prompt components
### 6. System Prompt Design
- Setting model behavior and constraints
- Defining output formats and structure
- Establishing role and expertise
@@ -395,6 +401,7 @@ Response:"""
## Performance Optimization
### Token Efficiency
```python
# Before: Verbose prompt (150+ tokens)
verbose_prompt = """
@@ -457,6 +464,7 @@ response = client.messages.create(
## Success Metrics
Track these KPIs for your prompts:
- **Accuracy**: Correctness of outputs
- **Consistency**: Reproducibility across similar inputs
- **Latency**: Response time (P50, P95, P99)

View File

@@ -3,6 +3,7 @@
## Classification Templates
### Sentiment Analysis
```
Classify the sentiment of the following text as Positive, Negative, or Neutral.
@@ -12,6 +13,7 @@ Sentiment:
```
### Intent Detection
```
Determine the user's intent from the following message.
@@ -23,6 +25,7 @@ Intent:
```
### Topic Classification
```
Classify the following article into one of these categories: {categories}
@@ -35,6 +38,7 @@ Category:
## Extraction Templates
### Named Entity Recognition
```
Extract all named entities from the text and categorize them.
@@ -50,6 +54,7 @@ Entities (JSON format):
```
### Structured Data Extraction
```
Extract structured information from the job posting.
@@ -70,6 +75,7 @@ Extracted Information (JSON):
## Generation Templates
### Email Generation
```
Write a professional {email_type} email.
@@ -84,6 +90,7 @@ Body:
```
### Code Generation
```
Generate {language} code for the following task:
@@ -101,6 +108,7 @@ Code:
```
### Creative Writing
```
Write a {length}-word {style} story about {topic}.
@@ -115,6 +123,7 @@ Story:
## Transformation Templates
### Summarization
```
Summarize the following text in {num_sentences} sentences.
@@ -125,6 +134,7 @@ Summary:
```
### Translation with Context
```
Translate the following {source_lang} text to {target_lang}.
@@ -137,6 +147,7 @@ Translation:
```
### Format Conversion
```
Convert the following {source_format} to {target_format}.
@@ -149,6 +160,7 @@ Output ({target_format}):
## Analysis Templates
### Code Review
```
Review the following code for:
1. Bugs and errors
@@ -163,6 +175,7 @@ Review:
```
### SWOT Analysis
```
Conduct a SWOT analysis for: {subject}
@@ -185,6 +198,7 @@ Threats:
## Question Answering Templates
### RAG Template
```
Answer the question based on the provided context. If the context doesn't contain enough information, say so.
@@ -197,6 +211,7 @@ Answer:
```
### Multi-Turn Q&A
```
Previous conversation:
{conversation_history}
@@ -209,6 +224,7 @@ Answer (continue naturally from conversation):
## Specialized Templates
### SQL Query Generation
```
Generate a SQL query for the following request.
@@ -221,6 +237,7 @@ SQL Query:
```
### Regex Pattern Creation
```
Create a regex pattern to match: {requirement}
@@ -234,6 +251,7 @@ Regex pattern:
```
### API Documentation
```
Generate API documentation for this function:

View File

@@ -7,6 +7,7 @@ Chain-of-Thought (CoT) prompting elicits step-by-step reasoning from LLMs, drama
## Core Techniques
### Zero-Shot CoT
Add a simple trigger phrase to elicit reasoning:
```python
@@ -29,6 +30,7 @@ prompt = zero_shot_cot(query)
```
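The diff elides the helper's body, but `zero_shot_cot` presumably just appends the trigger phrase; a minimal sketch:

```python
def zero_shot_cot(query: str) -> str:
    """Append the zero-shot CoT trigger phrase to a query."""
    return f"{query}\n\nLet's think step by step."
```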
### Few-Shot CoT
Provide examples with explicit reasoning chains:
```python
@@ -53,6 +55,7 @@ A: Let's think step by step:"""
```
### Self-Consistency
Generate multiple reasoning paths and take the majority vote:
```python
@@ -85,6 +88,7 @@ def self_consistency_cot(query, n=5, temperature=0.7):
## Advanced Patterns
### Least-to-Most Prompting
Break complex problems into simpler subproblems:
```python
@@ -125,6 +129,7 @@ Final Answer:"""
```
### Tree-of-Thought (ToT)
Explore multiple reasoning branches:
```python
@@ -176,6 +181,7 @@ Score:"""
```
### Verification Step
Add explicit verification to catch errors:
```python
@@ -220,6 +226,7 @@ Corrected solution:"""
## Domain-Specific CoT
### Math Problems
```python
math_cot_template = """
Problem: {problem}
@@ -248,6 +255,7 @@ Answer: {final_answer}
```
### Code Debugging
```python
debug_cot_template = """
Code with error:
@@ -278,6 +286,7 @@ Fixed code:
```
### Logical Reasoning
```python
logic_cot_template = """
Premises:
@@ -305,6 +314,7 @@ Answer: {final_answer}
## Performance Optimization
### Caching Reasoning Patterns
```python
class ReasoningCache:
def __init__(self):
@@ -328,6 +338,7 @@ class ReasoningCache:
```
### Adaptive Reasoning Depth
```python
def adaptive_cot(problem, initial_depth=3):
depth = initial_depth
@@ -378,6 +389,7 @@ def evaluate_cot_quality(reasoning_chain):
## When to Use CoT
**Use CoT for:**
- Math and arithmetic problems
- Logical reasoning tasks
- Multi-step planning
@@ -385,6 +397,7 @@ def evaluate_cot_quality(reasoning_chain):
- Complex decision making
**Skip CoT for:**
- Simple factual queries
- Direct lookups
- Creative writing

View File

@@ -7,6 +7,7 @@ Few-shot learning enables LLMs to perform tasks by providing a small number of e
## Example Selection Strategies
### 1. Semantic Similarity
Select examples most similar to the input query using embedding-based retrieval.
```python
@@ -29,6 +30,7 @@ class SemanticExampleSelector:
**Best For**: Question answering, text classification, extraction tasks
### 2. Diversity Sampling
Maximize coverage of different patterns and edge cases.
```python
@@ -58,6 +60,7 @@ class DiversityExampleSelector:
**Best For**: Demonstrating task variability, edge case handling
### 3. Difficulty-Based Selection
Gradually increase example complexity to scaffold learning.
```python
@@ -75,6 +78,7 @@ class ProgressiveExampleSelector:
**Best For**: Complex reasoning tasks, code generation
### 4. Error-Based Selection
Include examples that address common failure modes.
```python
@@ -98,6 +102,7 @@ class ErrorGuidedSelector:
## Example Construction Best Practices
### Format Consistency
All examples should follow identical formatting:
```python
@@ -121,6 +126,7 @@ examples = [
```
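A helper that enforces one identical layout for every example might look like this (hypothetical sketch, assuming examples are dicts with `input`/`output` keys):

```python
def format_examples(examples: list[dict]) -> str:
    """Render every few-shot example with the identical Input/Output layout."""
    return "\n\n".join(
        f"Input: {ex['input']}\nOutput: {ex['output']}" for ex in examples
    )
```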
### Input-Output Alignment
Ensure examples demonstrate the exact task you want the model to perform:
```python
@@ -138,6 +144,7 @@ example = {
```
### Complexity Balance
Include examples spanning the expected difficulty range:
```python
@@ -156,6 +163,7 @@ examples = [
## Context Window Management
### Token Budget Allocation
Typical distribution for a 4K context window:
```
@@ -166,6 +174,7 @@ Response: 1500 tokens (38%)
```
### Dynamic Example Truncation
```python
class TokenAwareSelector:
def __init__(self, examples, tokenizer, max_tokens=1500):
@@ -197,6 +206,7 @@ class TokenAwareSelector:
## Edge Case Handling
### Include Boundary Examples
```python
edge_case_examples = [
# Empty input
@@ -216,6 +226,7 @@ edge_case_examples = [
## Few-Shot Prompt Templates
### Classification Template
```python
def build_classification_prompt(examples, query, labels):
prompt = f"Classify the text into one of these categories: {', '.join(labels)}\n\n"
@@ -228,6 +239,7 @@ def build_classification_prompt(examples, query, labels):
```
### Extraction Template
```python
def build_extraction_prompt(examples, query):
prompt = "Extract structured information from the text.\n\n"
@@ -240,6 +252,7 @@ def build_extraction_prompt(examples, query):
```
### Transformation Template
```python
def build_transformation_prompt(examples, query):
prompt = "Transform the input according to the pattern shown in examples.\n\n"
@@ -254,6 +267,7 @@ def build_transformation_prompt(examples, query):
## Evaluation and Optimization
### Example Quality Metrics
```python
def evaluate_example_quality(example, validation_set):
metrics = {
@@ -266,6 +280,7 @@ def evaluate_example_quality(example, validation_set):
```
### A/B Testing Example Sets
```python
class ExampleSetTester:
def __init__(self, llm_client):
@@ -295,6 +310,7 @@ class ExampleSetTester:
## Advanced Techniques
### Meta-Learning (Learning to Select)
Train a small model to predict which examples will be most effective:
```python
@@ -334,6 +350,7 @@ class LearnedExampleSelector:
```
### Adaptive Example Count
Dynamically adjust the number of examples based on task difficulty:
```python

View File

@@ -3,6 +3,7 @@
## Systematic Refinement Process
### 1. Baseline Establishment
```python
def establish_baseline(prompt, test_cases):
results = {
@@ -26,6 +27,7 @@ def establish_baseline(prompt, test_cases):
```
### 2. Iterative Refinement Workflow
```
Initial Prompt → Test → Analyze Failures → Refine → Test → Repeat
```
@@ -64,6 +66,7 @@ class PromptOptimizer:
```
### 3. A/B Testing Framework
```python
class PromptABTest:
def __init__(self, variant_a, variant_b):
@@ -116,6 +119,7 @@ class PromptABTest:
## Optimization Strategies
### Token Reduction
```python
def optimize_for_tokens(prompt):
optimizations = [
@@ -144,6 +148,7 @@ def optimize_for_tokens(prompt):
```
### Latency Reduction
```python
def optimize_for_latency(prompt):
strategies = {
@@ -167,6 +172,7 @@ def optimize_for_latency(prompt):
```
### Accuracy Improvement
```python
def improve_accuracy(prompt, failure_cases):
improvements = []
@@ -194,6 +200,7 @@ def improve_accuracy(prompt, failure_cases):
## Performance Metrics
### Core Metrics
```python
class PromptMetrics:
@staticmethod
@@ -230,6 +237,7 @@ class PromptMetrics:
```
### Automated Evaluation
```python
def evaluate_prompt_comprehensively(prompt, test_suite):
results = {
@@ -274,6 +282,7 @@ def evaluate_prompt_comprehensively(prompt, test_suite):
## Failure Analysis
### Categorizing Failures
```python
class FailureAnalyzer:
def categorize_failures(self, test_results):
@@ -326,6 +335,7 @@ class FailureAnalyzer:
## Versioning and Rollback
### Prompt Version Control
```python
class PromptVersionControl:
def __init__(self, storage_path):
@@ -381,24 +391,28 @@ class PromptVersionControl:
## Common Optimization Patterns
### Pattern 1: Add Structure
```
Before: "Analyze this text"
After: "Analyze this text for:\n1. Main topic\n2. Key arguments\n3. Conclusion"
```
### Pattern 2: Add Examples
```
Before: "Extract entities"
After: "Extract entities\n\nExample:\nText: Apple released iPhone\nEntities: {company: Apple, product: iPhone}"
```
### Pattern 3: Add Constraints
```
Before: "Summarize this"
After: "Summarize in exactly 3 bullet points, 15 words each"
```
### Pattern 4: Add Verification
```
Before: "Calculate..."
After: "Calculate... Then verify your calculation is correct before responding."

View File

@@ -3,6 +3,7 @@
## Template Architecture
### Basic Template Structure
```python
class PromptTemplate:
def __init__(self, template_string, variables=None):
@@ -30,6 +31,7 @@ prompt = template.render(
```
### Conditional Templates
```python
class ConditionalTemplate(PromptTemplate):
def render(self, **kwargs):
@@ -84,6 +86,7 @@ Reference examples:
```
### Modular Template Composition
```python
class ModularTemplate:
def __init__(self):
@@ -133,6 +136,7 @@ advanced_prompt = builder.render(
## Common Template Patterns
### Classification Template
```python
CLASSIFICATION_TEMPLATE = """
Classify the following {content_type} into one of these categories: {categories}
@@ -153,6 +157,7 @@ Category:"""
```
### Extraction Template
```python
EXTRACTION_TEMPLATE = """
Extract structured information from the {content_type}.
@@ -171,6 +176,7 @@ Extracted information (JSON):"""
```
### Generation Template
```python
GENERATION_TEMPLATE = """
Generate {output_type} based on the following {input_type}.
@@ -198,6 +204,7 @@ Examples:
```
### Transformation Template
```python
TRANSFORMATION_TEMPLATE = """
Transform the input {source_format} to {target_format}.
@@ -219,6 +226,7 @@ Output {target_format}:"""
## Advanced Features
### Template Inheritance
```python
class TemplateRegistry:
def __init__(self):
@@ -251,6 +259,7 @@ registry.register('sentiment_analysis', {
```
### Variable Validation
```python
class ValidatedTemplate:
def __init__(self, template, schema):
@@ -294,6 +303,7 @@ template = ValidatedTemplate(
```
### Template Caching
```python
class CachedTemplate:
def __init__(self, template):
@@ -323,6 +333,7 @@ class CachedTemplate:
## Multi-Turn Templates
### Conversation Template
```python
class ConversationTemplate:
def __init__(self, system_prompt):
@@ -349,6 +360,7 @@ class ConversationTemplate:
```
### State-Based Templates
```python
class StatefulTemplate:
def __init__(self):
@@ -406,6 +418,7 @@ Here's the result: {result}
## Template Libraries
### Question Answering
```python
QA_TEMPLATES = {
'factual': """Answer the question based on the context.
@@ -432,6 +445,7 @@ Assistant:"""
```
### Content Generation
```python
GENERATION_TEMPLATES = {
'blog_post': """Write a blog post about {topic}.

View File

@@ -11,6 +11,7 @@ System prompts set the foundation for LLM behavior. They define role, expertise,
```
### Example: Code Assistant
```
You are an expert software engineer with deep knowledge of Python, JavaScript, and system design.
@@ -36,6 +37,7 @@ Output format:
## Pattern Library
### 1. Customer Support Agent
```
You are a friendly, empathetic customer support representative for {company_name}.
@@ -59,6 +61,7 @@ Constraints:
```
### 2. Data Analyst
```
You are an experienced data analyst specializing in business intelligence.
@@ -85,6 +88,7 @@ Output:
```
### 3. Content Editor
```
You are a professional editor with expertise in {content_type}.
@@ -112,6 +116,7 @@ Format your feedback as:
## Advanced Techniques
### Dynamic Role Adaptation
```python
def build_adaptive_system_prompt(task_type, difficulty):
base = "You are an expert assistant"
@@ -136,6 +141,7 @@ Expertise level: {difficulty}
```
### Constraint Specification
```
Hard constraints (MUST follow):
- Never generate harmful, biased, or illegal content

View File

@@ -20,9 +20,11 @@ Master Retrieval-Augmented Generation (RAG) to build LLM applications that provi
## Core Components
### 1. Vector Databases
**Purpose**: Store and retrieve document embeddings efficiently
**Options:**
- **Pinecone**: Managed, scalable, serverless
- **Weaviate**: Open-source, hybrid search, GraphQL
- **Milvus**: High performance, on-premise
@@ -31,6 +33,7 @@ Master Retrieval-Augmented Generation (RAG) to build LLM applications that provi
- **pgvector**: PostgreSQL extension, SQL integration
### 2. Embeddings
**Purpose**: Convert text to numerical vectors for similarity search
**Models (2026):**
@@ -44,7 +47,9 @@ Master Retrieval-Augmented Generation (RAG) to build LLM applications that provi
| **multilingual-e5-large** | 1024 | Multi-language support |
### 3. Retrieval Strategies
**Approaches:**
- **Dense Retrieval**: Semantic similarity via embeddings
- **Sparse Retrieval**: Keyword matching (BM25, TF-IDF)
- **Hybrid Search**: Combine dense + sparse with weighted fusion
@@ -52,9 +57,11 @@ Master Retrieval-Augmented Generation (RAG) to build LLM applications that provi
- **HyDE**: Generate hypothetical documents for better retrieval
### 4. Reranking
**Purpose**: Improve retrieval quality by reordering results
**Methods:**
- **Cross-Encoders**: BERT-based reranking (ms-marco-MiniLM)
- **Cohere Rerank**: API-based reranking
- **Maximal Marginal Relevance (MMR)**: Diversity + relevance
@@ -255,6 +262,7 @@ hyde_rag = builder.compile()
## Document Chunking Strategies
### Recursive Character Text Splitter
```python
from langchain_text_splitters import RecursiveCharacterTextSplitter
@@ -269,6 +277,7 @@ chunks = splitter.split_documents(documents)
```
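At its core the splitter produces fixed-size windows with overlap; a framework-free sketch that ignores the separator hierarchy the real `RecursiveCharacterTextSplitter` uses:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Naive fixed-size chunking with overlap between consecutive chunks."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping `overlap` chars shared
    return chunks
```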
### Token-Based Splitting
```python
from langchain_text_splitters import TokenTextSplitter
@@ -280,6 +289,7 @@ splitter = TokenTextSplitter(
```
### Semantic Chunking
```python
from langchain_experimental.text_splitter import SemanticChunker
@@ -291,6 +301,7 @@ splitter = SemanticChunker(
```
### Markdown Header Splitter
```python
from langchain_text_splitters import MarkdownHeaderTextSplitter
@@ -309,6 +320,7 @@ splitter = MarkdownHeaderTextSplitter(
## Vector Store Configurations
### Pinecone (Serverless)
```python
from pinecone import Pinecone, ServerlessSpec
from langchain_pinecone import PineconeVectorStore
@@ -331,6 +343,7 @@ vectorstore = PineconeVectorStore(index=index, embedding=embeddings)
```
### Weaviate
```python
import weaviate
from langchain_weaviate import WeaviateVectorStore
@@ -346,6 +359,7 @@ vectorstore = WeaviateVectorStore(
```
### Chroma (Local Development)
```python
from langchain_chroma import Chroma
@@ -357,6 +371,7 @@ vectorstore = Chroma(
```
### pgvector (PostgreSQL)
```python
from langchain_postgres.vectorstores import PGVector
@@ -372,6 +387,7 @@ vectorstore = PGVector(
## Retrieval Optimization
### 1. Metadata Filtering
```python
from langchain_core.documents import Document
@@ -394,6 +410,7 @@ results = await vectorstore.asimilarity_search(
```
### 2. Maximal Marginal Relevance (MMR)
```python
# Balance relevance with diversity
results = await vectorstore.amax_marginal_relevance_search(
@@ -405,6 +422,7 @@ results = await vectorstore.amax_marginal_relevance_search(
```
### 3. Reranking with Cross-Encoder
```python
from sentence_transformers import CrossEncoder
@@ -424,6 +442,7 @@ async def retrieve_and_rerank(query: str, k: int = 5) -> list[Document]:
```
### 4. Cohere Rerank
```python
from langchain.retrievers import ContextualCompressionRetriever
from langchain_cohere import CohereRerank
@@ -440,6 +459,7 @@ reranked_retriever = ContextualCompressionRetriever(
## Prompt Engineering for RAG
### Contextual Prompt with Citations
```python
rag_prompt = ChatPromptTemplate.from_template(
"""Answer the question based on the context below. Include citations using [1], [2], etc.
@@ -461,6 +481,7 @@ rag_prompt = ChatPromptTemplate.from_template(
```
### Structured Output for RAG
```python
from pydantic import BaseModel, Field

View File

@@ -20,12 +20,12 @@ Patterns for implementing efficient similarity search in production systems.
### 1. Distance Metrics
| Metric             | Formula            | Best For              |
| ------------------ | ------------------ | --------------------- |
| **Cosine**         | 1 - (A·B)/(‖A‖‖B‖) | Normalized embeddings |
| **Euclidean (L2)** | √Σ(a-b)²           | Raw embeddings        |
| **Dot Product**    | A·B                | Magnitude matters     |
| **Manhattan (L1)** | Σ\|a-b\|           | Sparse vectors        |
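The four metrics in the table above, written out in plain Python (a reference sketch; production systems use vectorized library implementations):

```python
import math

def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cos(theta): 0 for identical directions, 2 for opposite."""
    return 1 - dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a: list[float], b: list[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))
```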
### 2. Index Types
@@ -538,6 +538,7 @@ class WeaviateVectorStore:
## Best Practices
### Do's
- **Use appropriate index** - HNSW for most cases
- **Tune parameters** - ef_search, nprobe for recall/speed
- **Implement hybrid search** - Combine with keyword search
@@ -545,6 +546,7 @@ class WeaviateVectorStore:
- **Pre-filter when possible** - Reduce search space
### Don'ts
- **Don't skip evaluation** - Measure before optimizing
- **Don't over-index** - Start with flat, scale up
- **Don't ignore latency** - P99 matters for UX

View File

@@ -31,11 +31,11 @@ Data Size Recommended Index
### 2. HNSW Parameters
| Parameter          | Default | Effect                                                |
| ------------------ | ------- | ----------------------------------------------------- |
| **M**              | 16      | Connections per node, ↑ = better recall, more memory  |
| **efConstruction** | 100     | Build quality, ↑ = better index, slower build         |
| **efSearch**       | 50      | Search quality, ↑ = better recall, slower search      |
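When tuning `M` or `efSearch`, recall is measured against exact (flat) search. A minimal sketch, assuming both the ANN index and the exact baseline return ranked neighbor ids per query:

```python
def recall_at_k(approx: list[list[int]], exact: list[list[int]], k: int = 10) -> float:
    """Fraction of true top-k neighbors the ANN index returned, averaged over queries."""
    hits = 0
    for a, e in zip(approx, exact):
        hits += len(set(a[:k]) & set(e[:k]))
    return hits / (k * len(exact))
```

Sweep `efSearch` upward until this recall plateaus, then back off to the smallest value that meets your target.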
### 3. Quantization Types
@@ -502,6 +502,7 @@ def profile_index_build(
## Best Practices
### Do's
- **Benchmark with real queries** - Synthetic may not represent production
- **Monitor recall continuously** - Can degrade with data drift
- **Start with defaults** - Tune only when needed
@@ -509,6 +510,7 @@ def profile_index_build(
- **Consider tiered storage** - Hot/cold data separation
### Don'ts
- **Don't over-optimize early** - Profile first
- **Don't ignore build time** - Index updates have cost
- **Don't forget reindexing** - Plan for maintenance