mirror of
https://github.com/wshobson/agents.git
synced 2026-03-18 17:47:16 +00:00
feat(llm-application-dev): modernize to LangGraph and latest models v2.0.0
- Migrate from LangChain 0.x to LangChain 1.x/LangGraph patterns
- Update model references to Claude 4.5 and GPT-5.2
- Add Voyage AI as primary embedding recommendation
- Add structured outputs with Pydantic
- Replace deprecated initialize_agent() with StateGraph
- Fix security: use AST-based safe math instead of unsafe execution
- Add plugin.json and README.md for consistency
- Bump marketplace version to 1.3.3

**Purpose**: Store and retrieve document embeddings efficiently

**Options:**
- **Pinecone**: Managed, scalable, serverless
- **Weaviate**: Open-source, hybrid search, GraphQL
- **Milvus**: High performance, on-premise
- **Chroma**: Lightweight, easy to use, local development
- **Qdrant**: Fast, filtered search, Rust-based
- **FAISS**: Meta's library, local deployment
- **pgvector**: PostgreSQL extension, SQL integration

### 2. Embeddings

**Purpose**: Convert text to numerical vectors for similarity search

**Models (2026):**

| Model | Dimensions | Best For |
|-------|------------|----------|
| **voyage-3-large** | 1024 | Claude apps (Anthropic recommended) |
| **voyage-code-3** | 1024 | Code search |
| **text-embedding-3-large** | 3072 | OpenAI apps, high accuracy |
| **text-embedding-3-small** | 1536 | OpenAI apps, cost-effective |
| **bge-large-en-v1.5** | 1024 | Open source, local deployment |
| **multilingual-e5-large** | 1024 | Multi-language support |
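
Because all of these models sit behind the same LangChain embeddings interface, switching providers is a one-line change. A minimal sketch, assuming the `langchain_voyageai` and `langchain_huggingface` integration packages are installed and the relevant API keys are set:

```python
from langchain_voyageai import VoyageAIEmbeddings
from langchain_huggingface import HuggingFaceEmbeddings

# Hosted: Anthropic-recommended Voyage model (requires VOYAGE_API_KEY)
voyage_embeddings = VoyageAIEmbeddings(model="voyage-3-large")

# Local / open source: bge-large-en-v1.5 via sentence-transformers
local_embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-large-en-v1.5")

# Both expose the same interface, so the rest of the pipeline is unchanged
vector = local_embeddings.embed_query("What is hybrid search?")
print(len(vector))  # 1024 dimensions for bge-large-en-v1.5
```
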
### 3. Retrieval Strategies

**Approaches:**
- **Dense Retrieval**: Semantic similarity via embeddings
- **Sparse Retrieval**: Keyword matching (BM25, TF-IDF)
- **Hybrid Search**: Combine dense + sparse with weighted fusion
- **Multi-Query**: Generate multiple query variations
- **HyDE**: Generate hypothetical documents for better retrieval

### 4. Reranking

**Purpose**: Improve retrieval quality by reordering results

**Methods:**
- **Cross-Encoders**: BERT-based reranking (ms-marco-MiniLM)
- **Cohere Rerank**: API-based reranking
- **Maximal Marginal Relevance (MMR)**: Diversity + relevance
- **LLM-based**: Use LLM to score relevance (see the sketch after this list)
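
The first three methods have dedicated examples later in this document; LLM-based scoring does not, so here is a minimal sketch. It assumes an already-initialized chat model `llm` (as in the Quick Start below); the `RelevanceScore` schema and prompt wording are illustrative, not a fixed API:

```python
from langchain_core.documents import Document
from pydantic import BaseModel, Field


class RelevanceScore(BaseModel):
    """Illustrative schema for LLM-as-judge relevance scoring."""
    score: float = Field(description="Relevance of the document to the query, 0-1")


async def llm_rerank(query: str, docs: list[Document], top_k: int = 5) -> list[Document]:
    """Score each candidate with the LLM, then keep the highest-scoring docs."""
    scorer = llm.with_structured_output(RelevanceScore)  # `llm` as initialized in the Quick Start
    scored = []
    for doc in docs:
        result = await scorer.ainvoke(
            f"Query: {query}\n\nDocument:\n{doc.page_content}\n\n"
            "Rate how relevant this document is to the query."
        )
        scored.append((doc, result.score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in scored[:top_k]]
```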

## Quick Start with LangGraph

```python
from langgraph.graph import StateGraph, START, END
from langchain_anthropic import ChatAnthropic
from langchain_voyageai import VoyageAIEmbeddings
from langchain_pinecone import PineconeVectorStore
from langchain_core.documents import Document
from langchain_core.prompts import ChatPromptTemplate
from langchain_text_splitters import RecursiveCharacterTextSplitter
from typing import TypedDict, Annotated


class RAGState(TypedDict):
    question: str
    context: list[Document]
    answer: str


# Initialize components
llm = ChatAnthropic(model="claude-sonnet-4-5")
embeddings = VoyageAIEmbeddings(model="voyage-3-large")
vectorstore = PineconeVectorStore(index_name="docs", embedding=embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

# RAG prompt
rag_prompt = ChatPromptTemplate.from_template(
    """Answer based on the context below. If you cannot answer, say so.

Context:
{context}

Question: {question}

Answer:"""
)


async def retrieve(state: RAGState) -> RAGState:
    """Retrieve relevant documents."""
    docs = await retriever.ainvoke(state["question"])
    return {"context": docs}


async def generate(state: RAGState) -> RAGState:
    """Generate answer from context."""
    context_text = "\n\n".join(doc.page_content for doc in state["context"])
    messages = rag_prompt.format_messages(
        context=context_text,
        question=state["question"]
    )
    response = await llm.ainvoke(messages)
    return {"answer": response.content}


# Build RAG graph
builder = StateGraph(RAGState)
builder.add_node("retrieve", retrieve)
builder.add_node("generate", generate)
builder.add_edge(START, "retrieve")
builder.add_edge("retrieve", "generate")
builder.add_edge("generate", END)

rag_chain = builder.compile()

# Use
result = await rag_chain.ainvoke({"question": "What are the main features?"})
print(result["answer"])
```

## Advanced RAG Patterns

### Pattern 1: Hybrid Search with RRF

```python
from langchain_community.retrievers import BM25Retriever
from langchain.retrievers import EnsembleRetriever

# Sparse retriever (BM25 for keyword matching)
bm25_retriever = BM25Retriever.from_documents(documents)
bm25_retriever.k = 10

# Dense retriever (embeddings for semantic search)
dense_retriever = vectorstore.as_retriever(search_kwargs={"k": 10})

# Combine with Reciprocal Rank Fusion weights
ensemble_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, dense_retriever],
    weights=[0.3, 0.7]  # 30% keyword, 70% semantic
)
```
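
`EnsembleRetriever` applies weighted Reciprocal Rank Fusion internally; if you want to see (or customize) the fusion step itself, a small standalone sketch looks like this. The `k=60` smoothing constant is the commonly used default, not something mandated by LangChain, and keying documents by `page_content` is an assumption for illustration:

```python
from collections import defaultdict

from langchain_core.documents import Document


def reciprocal_rank_fusion(
    result_lists: list[list[Document]],
    weights: list[float],
    k: int = 60,
) -> list[Document]:
    """Fuse several ranked result lists into one, weighting each source."""
    scores: dict[str, float] = defaultdict(float)
    docs_by_key: dict[str, Document] = {}
    for results, weight in zip(result_lists, weights):
        for rank, doc in enumerate(results):
            key = doc.page_content  # assumes page_content uniquely identifies a chunk
            docs_by_key[key] = doc
            scores[key] += weight / (k + rank + 1)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [docs_by_key[key] for key in ranked]


# Usage: fuse BM25 and dense results with the same 30/70 weighting
# fused = reciprocal_rank_fusion(
#     [await bm25_retriever.ainvoke(query), await dense_retriever.ainvoke(query)],
#     weights=[0.3, 0.7],
# )
```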

### Pattern 2: Multi-Query Retrieval

```python
from langchain.retrievers.multi_query import MultiQueryRetriever

# Generate multiple query perspectives for better recall
multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=vectorstore.as_retriever(search_kwargs={"k": 5}),
    llm=llm
)

# Single query → multiple variations → combined results
results = await multi_query_retriever.ainvoke("What is the main topic?")
```

### Pattern 3: Contextual Compression

```python
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

# Compressor extracts only relevant portions
compressor = LLMChainExtractor.from_llm(llm)

compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vectorstore.as_retriever(search_kwargs={"k": 10})
)

# Returns only relevant parts of documents
compressed_docs = await compression_retriever.ainvoke("specific query")
```

### Pattern 4: Parent Document Retriever

```python
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Small chunks for precise retrieval, large chunks for context
child_splitter = RecursiveCharacterTextSplitter(chunk_size=400, chunk_overlap=50)
parent_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200)

# Store for parent documents
docstore = InMemoryStore()

parent_retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=docstore,
    child_splitter=child_splitter,
    parent_splitter=parent_splitter
)

# Add documents (splits children, stores parents)
await parent_retriever.aadd_documents(documents)

# Retrieval returns parent documents with full context
results = await parent_retriever.ainvoke("query")
```

### Pattern 5: HyDE (Hypothetical Document Embeddings)

```python
from langchain_core.prompts import ChatPromptTemplate


class HyDEState(TypedDict):
    question: str
    hypothetical_doc: str
    context: list[Document]
    answer: str


hyde_prompt = ChatPromptTemplate.from_template(
    """Write a detailed passage that would answer this question:

Question: {question}

Passage:"""
)


async def generate_hypothetical(state: HyDEState) -> HyDEState:
    """Generate hypothetical document for better retrieval."""
    messages = hyde_prompt.format_messages(question=state["question"])
    response = await llm.ainvoke(messages)
    return {"hypothetical_doc": response.content}


async def retrieve_with_hyde(state: HyDEState) -> HyDEState:
    """Retrieve using hypothetical document."""
    # Use hypothetical doc for retrieval instead of original query
    docs = await retriever.ainvoke(state["hypothetical_doc"])
    return {"context": docs}


# Build HyDE RAG graph
builder = StateGraph(HyDEState)
builder.add_node("hypothetical", generate_hypothetical)
builder.add_node("retrieve", retrieve_with_hyde)
builder.add_node("generate", generate)
builder.add_edge(START, "hypothetical")
builder.add_edge("hypothetical", "retrieve")
builder.add_edge("retrieve", "generate")
builder.add_edge("generate", END)

hyde_rag = builder.compile()
```

## Document Chunking Strategies

### Recursive Character Text Splitter

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    length_function=len,
    separators=["\n\n", "\n", ". ", " ", ""]  # Try in order
)

chunks = splitter.split_documents(documents)
```

### Token-Based Splitting

```python
from langchain_text_splitters import TokenTextSplitter

splitter = TokenTextSplitter(
    chunk_size=512,
    chunk_overlap=50,
    encoding_name="cl100k_base"  # OpenAI tiktoken encoding
)
```
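
To sanity-check that the resulting chunks fit a model's context budget, you can count tokens with the same `cl100k_base` encoding. A quick sketch, assuming the `tiktoken` package is installed and `documents` are already loaded:

```python
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")

chunks = splitter.split_documents(documents)
token_counts = [len(encoding.encode(chunk.page_content)) for chunk in chunks]
print(f"max tokens per chunk: {max(token_counts)}")  # should stay at or under chunk_size=512
```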

### Semantic Chunking

```python
from langchain_experimental.text_splitter import SemanticChunker

splitter = SemanticChunker(
    embeddings=embeddings,
    breakpoint_threshold_type="percentile",
    breakpoint_threshold_amount=95
)
```

### Markdown Header Splitter

```python
from langchain_text_splitters import MarkdownHeaderTextSplitter

headers_to_split_on = [
    ("#", "Header 1"),
    ("##", "Header 2"),
    ("###", "Header 3"),
]

splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=headers_to_split_on,
    strip_headers=False
)
```

## Vector Store Configurations

### Pinecone (Serverless)

```python
import os

from pinecone import Pinecone, ServerlessSpec
from langchain_pinecone import PineconeVectorStore

# Initialize Pinecone client
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# Create index if needed
if "my-index" not in pc.list_indexes().names():
    pc.create_index(
        name="my-index",
        dimension=1024,  # voyage-3-large dimensions
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1")
    )

# Create vector store
index = pc.Index("my-index")
vectorstore = PineconeVectorStore(index=index, embedding=embeddings)
```

### Weaviate

```python
import weaviate
from langchain_weaviate import WeaviateVectorStore

client = weaviate.connect_to_local()  # or connect_to_weaviate_cloud()

vectorstore = WeaviateVectorStore(
    client=client,
    index_name="Documents",
    text_key="content",
    embedding=embeddings
)
```

### Chroma (Local Development)

```python
from langchain_chroma import Chroma

vectorstore = Chroma(
    collection_name="my_collection",
    embedding_function=embeddings,
    persist_directory="./chroma_db"
)
```

### pgvector (PostgreSQL)

```python
from langchain_postgres.vectorstores import PGVector

connection_string = "postgresql+psycopg://user:pass@localhost:5432/vectordb"

vectorstore = PGVector(
    embeddings=embeddings,
    collection_name="documents",
    connection=connection_string,
)
```

## Retrieval Optimization

### 1. Metadata Filtering

```python
from datetime import datetime

from langchain_core.documents import Document

# Add metadata during indexing
docs_with_metadata = []
for doc in documents:
    doc.metadata.update({
        "source": doc.metadata.get("source", "unknown"),
        "category": determine_category(doc.page_content),
        "date": datetime.now().isoformat()
    })
    docs_with_metadata.append(doc)

# Filter during retrieval
results = await vectorstore.asimilarity_search(
    "query",
    filter={"category": "technical"},
    k=5
)
```
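
`determine_category` is left to the application; a minimal keyword-based sketch (the category names and keywords are placeholders, not part of any library):

```python
def determine_category(text: str) -> str:
    """Very rough keyword-based categorization; replace with a real classifier as needed."""
    lowered = text.lower()
    if any(term in lowered for term in ("api", "endpoint", "schema", "config")):
        return "technical"
    if any(term in lowered for term in ("price", "invoice", "billing")):
        return "billing"
    return "general"
```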

### 2. Maximal Marginal Relevance (MMR)

```python
# Balance relevance with diversity
results = await vectorstore.amax_marginal_relevance_search(
    "query",
    k=5,
    fetch_k=20,  # Fetch 20, return top 5 diverse
)
```

### 3. Cross-Encoder Reranking

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')


async def retrieve_and_rerank(query: str, k: int = 5) -> list[Document]:
    # Get initial results
    candidates = await vectorstore.asimilarity_search(query, k=20)

    # Rerank with cross-encoder scores
    pairs = [[query, doc.page_content] for doc in candidates]
    scores = reranker.predict(pairs)

    # Sort by score and take top k
    ranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
    return [doc for doc, score in ranked[:k]]
```

### 4. Cohere Rerank

```python
from langchain_cohere import CohereRerank

reranker = CohereRerank(model="rerank-english-v3.0", top_n=5)

# Wrap retriever with reranking
reranked_retriever = ContextualCompressionRetriever(
    base_compressor=reranker,
    base_retriever=vectorstore.as_retriever(search_kwargs={"k": 20})
)
```

## Prompt Engineering for RAG

### Contextual Prompt with Citations

```python
rag_prompt = ChatPromptTemplate.from_template(
    """Answer the question based on the context below. Include citations using [1], [2], etc.

If you cannot answer based on the context, say "I don't have enough information."

Context:
{context}

Question: {question}

Instructions:
1. Use only information from the context
2. Cite sources with [1], [2] format
3. If uncertain, express uncertainty

Answer (with citations):"""
)
```
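
For the `[1]`, `[2]` citations to mean anything, the retrieved chunks need to be numbered when they are stuffed into `{context}`. A small sketch of a generate step that does this, assuming the `RAGState` and `llm` from the Quick Start and the `rag_prompt` defined above (the numbering scheme is illustrative):

```python
async def generate_with_citations(state: RAGState) -> RAGState:
    """Number each chunk so the model can cite it as [1], [2], ..."""
    context_text = "\n\n".join(
        f"[{i + 1}] {doc.page_content}"
        for i, doc in enumerate(state["context"])
    )
    messages = rag_prompt.format_messages(
        context=context_text,
        question=state["question"]
    )
    response = await llm.ainvoke(messages)
    return {"answer": response.content}
```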

### Structured Output for RAG

```python
from pydantic import BaseModel, Field


class RAGResponse(BaseModel):
    answer: str = Field(description="The answer based on context")
    confidence: float = Field(description="Confidence score 0-1")
    sources: list[str] = Field(description="Source document IDs used")
    reasoning: str = Field(description="Brief reasoning for the answer")


# Use with structured output
structured_llm = llm.with_structured_output(RAGResponse)
```
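
Wired into the graph, the structured model simply replaces the plain `llm.ainvoke` call in the generate step. A sketch, assuming the `RAGState` and `rag_prompt` from the Quick Start:

```python
async def generate_structured(state: RAGState) -> RAGState:
    """Generate a validated RAGResponse instead of free-form text."""
    context_text = "\n\n".join(doc.page_content for doc in state["context"])
    messages = rag_prompt.format_messages(
        context=context_text,
        question=state["question"]
    )
    response: RAGResponse = await structured_llm.ainvoke(messages)
    return {"answer": response.answer}


# response.confidence, response.sources, and response.reasoning are also
# available for logging or for routing low-confidence answers to a fallback.
```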

## Evaluation Metrics

```python
from typing import TypedDict


class RAGEvalMetrics(TypedDict):
    retrieval_precision: float  # Relevant docs / retrieved docs
    retrieval_recall: float     # Retrieved relevant / total relevant
    answer_relevance: float     # Answer addresses question
    faithfulness: float         # Answer grounded in context
    context_relevance: float    # Context relevant to question


async def evaluate_rag_system(
    rag_chain,
    test_cases: list[dict]
) -> RAGEvalMetrics:
    """Evaluate RAG system on test cases."""
    metrics = {k: [] for k in RAGEvalMetrics.__annotations__}

    for test in test_cases:
        result = await rag_chain.ainvoke({"question": test["question"]})

        # Retrieval metrics
        retrieved_ids = {doc.metadata["id"] for doc in result["context"]}
        relevant_ids = set(test["relevant_doc_ids"])

        precision = len(retrieved_ids & relevant_ids) / len(retrieved_ids)
        recall = len(retrieved_ids & relevant_ids) / len(relevant_ids)

        metrics["retrieval_precision"].append(precision)
        metrics["retrieval_recall"].append(recall)

        # Use LLM-as-judge for quality metrics
        quality = await evaluate_answer_quality(
            question=test["question"],
            answer=result["answer"],
            context=result["context"],
            expected=test.get("expected_answer")
        )
        metrics["answer_relevance"].append(quality["relevance"])
        metrics["faithfulness"].append(quality["faithfulness"])
        metrics["context_relevance"].append(quality["context_relevance"])

    return {k: sum(v) / len(v) for k, v in metrics.items()}
```
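
`evaluate_answer_quality` is the LLM-as-judge step. One way to implement it is with the structured-output pattern used earlier; this is a sketch, and the `QualityScores` schema and prompt wording are illustrative:

```python
from langchain_core.documents import Document
from pydantic import BaseModel, Field


class QualityScores(BaseModel):
    relevance: float = Field(description="Does the answer address the question? 0-1")
    faithfulness: float = Field(description="Is the answer grounded in the context? 0-1")
    context_relevance: float = Field(description="Is the context relevant to the question? 0-1")


async def evaluate_answer_quality(
    question: str,
    answer: str,
    context: list[Document],
    expected: str | None = None,
) -> dict[str, float]:
    """Ask the LLM to judge answer quality against the retrieved context."""
    judge = llm.with_structured_output(QualityScores)
    context_text = "\n\n".join(doc.page_content for doc in context)
    scores = await judge.ainvoke(
        f"Question: {question}\n\nContext:\n{context_text}\n\n"
        f"Answer: {answer}\n\n"
        + (f"Reference answer: {expected}\n\n" if expected else "")
        + "Score relevance, faithfulness, and context relevance from 0 to 1."
    )
    return scores.model_dump()
```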

## Resources

- **references/vector-databases.md**: Detailed comparison of vector DBs
- **references/embeddings.md**: Embedding model selection guide
- **references/retrieval-strategies.md**: Advanced retrieval techniques
- **references/reranking.md**: Reranking methods and when to use them
- **references/context-window.md**: Managing context limits
- **assets/vector-store-config.yaml**: Configuration templates
- **assets/retriever-pipeline.py**: Complete RAG pipeline
- **assets/embedding-models.md**: Model comparison and benchmarks
- [LangChain RAG Tutorial](https://python.langchain.com/docs/tutorials/rag/)
- [LangGraph RAG Examples](https://langchain-ai.github.io/langgraph/tutorials/rag/)
- [Pinecone Best Practices](https://docs.pinecone.io/guides/get-started/overview)
- [Voyage AI Embeddings](https://docs.voyageai.com/)
- [RAG Evaluation Guide](https://docs.ragas.io/)

## Best Practices

1. **Chunk Size**: Balance between context (larger) and specificity (smaller) - typically 500-1000 tokens
2. **Overlap**: Use 10-20% overlap to preserve context at boundaries
3. **Metadata**: Include source, page, timestamp for filtering and debugging
4. **Hybrid Search**: Combine semantic and keyword search for best recall
5. **Reranking**: Use cross-encoder reranking for precision-critical applications
6. **Citations**: Always return source documents for transparency
7. **Evaluation**: Continuously test retrieval quality and answer accuracy
8. **Monitoring**: Track retrieval metrics and latency in production

## Common Issues

- **Poor Retrieval**: Check embedding quality, chunk size, query formulation
- **Irrelevant Results**: Add metadata filtering, use hybrid search, rerank
- **Missing Information**: Ensure documents are properly indexed, check chunking
- **Slow Queries**: Optimize vector store, use caching, reduce k
- **Hallucinations**: Improve grounding prompt, add verification step (see the sketch after this list)
- **Context Too Long**: Use compression or parent document retriever
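
For the hallucination issue, one mitigation is an explicit verification node after `generate` that checks groundedness before returning the answer. A sketch reusing the Quick Start's `RAGState` and `llm`; the threshold behavior, fallback message, and prompt wording are illustrative:

```python
async def verify(state: RAGState) -> RAGState:
    """Flag answers that are not supported by the retrieved context."""
    context_text = "\n\n".join(doc.page_content for doc in state["context"])
    verdict = await llm.ainvoke(
        f"Context:\n{context_text}\n\nAnswer:\n{state['answer']}\n\n"
        "Is every claim in the answer supported by the context? Reply YES or NO."
    )
    if verdict.content.strip().upper().startswith("NO"):
        return {"answer": "I don't have enough information to answer reliably."}
    return {}


# Wiring it into the Quick Start graph (replacing the generate -> END edge):
# builder.add_node("verify", verify)
# builder.add_edge("generate", "verify")
# builder.add_edge("verify", END)
```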