feat(llm-application-dev): modernize to LangGraph and latest models v2.0.0

- Migrate from LangChain 0.x to LangChain 1.x/LangGraph patterns
- Update model references to Claude 4.5 and GPT-5.2
- Add Voyage AI as primary embedding recommendation
- Add structured outputs with Pydantic
- Replace deprecated initialize_agent() with StateGraph
- Fix security: use AST-based safe math instead of unsafe execution
- Add plugin.json and README.md for consistency
- Bump marketplace version to 1.3.3
- Creating reusable prompt templates with variable interpolation
- Debugging and refining prompts that produce inconsistent outputs
- Implementing system prompts for specialized AI assistants
- Using structured outputs (JSON mode) for reliable parsing

## Core Capabilities
- Self-consistency techniques (sampling multiple reasoning paths)
- Verification and validation steps

### 3. Structured Outputs
- JSON mode for reliable parsing
- Pydantic schema enforcement
- Type-safe response handling
- Error handling for malformed outputs

### 4. Prompt Optimization
- Iterative refinement workflows
- A/B testing prompt variations
- Measuring prompt performance metrics (accuracy, consistency, latency)
- Reducing token usage while maintaining quality
- Handling edge cases and failure modes

### 5. Template Systems
- Variable interpolation and formatting
- Conditional prompt sections
- Multi-turn conversation templates
- Role-based prompt composition
- Modular prompt components

### 6. System Prompt Design
- Setting model behavior and constraints
- Defining output formats and structure
- Establishing role and expertise
## Quick Start

```python
from langchain_anthropic import ChatAnthropic
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field

# Define structured output schema
class SQLQuery(BaseModel):
    query: str = Field(description="The SQL query")
    explanation: str = Field(description="Brief explanation of what the query does")
    tables_used: list[str] = Field(description="List of tables referenced")

# Initialize model with structured output
llm = ChatAnthropic(model="claude-sonnet-4-5")
structured_llm = llm.with_structured_output(SQLQuery)

# Create prompt template
prompt = ChatPromptTemplate.from_messages([
    ("system", """You are an expert SQL developer. Generate efficient, secure SQL queries.
Always use parameterized queries to prevent SQL injection.
Explain your reasoning briefly."""),
    ("user", "Convert this to SQL: {query}")
])

# Create chain
chain = prompt | structured_llm

# Use (inside an async context)
result = await chain.ainvoke({
    "query": "Find all users who registered in the last 30 days"
})
print(result.query)
print(result.explanation)
```
## Key Patterns

### Pattern 1: Structured Output with Pydantic
```python
from anthropic import AsyncAnthropic
from pydantic import BaseModel, Field
from typing import Literal
import json

class SentimentAnalysis(BaseModel):
    sentiment: Literal["positive", "negative", "neutral"]
    confidence: float = Field(ge=0, le=1)
    key_phrases: list[str]
    reasoning: str

async def analyze_sentiment(text: str) -> SentimentAnalysis:
    """Analyze sentiment with structured output."""
    client = AsyncAnthropic()  # async client so the await below is valid

    message = await client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=500,
        messages=[{
            "role": "user",
            "content": f"""Analyze the sentiment of this text.

Text: {text}

Respond with JSON matching this schema:
{{
    "sentiment": "positive" | "negative" | "neutral",
    "confidence": 0.0-1.0,
    "key_phrases": ["phrase1", "phrase2"],
    "reasoning": "brief explanation"
}}"""
        }]
    )

    return SentimentAnalysis(**json.loads(message.content[0].text))
```

### Pattern 2: Chain-of-Thought with Self-Verification

```python
from langchain_core.prompts import ChatPromptTemplate

cot_prompt = ChatPromptTemplate.from_template("""
Solve this problem step by step.

Problem: {problem}

Instructions:
1. Break down the problem into clear steps
2. Work through each step showing your reasoning
3. State your final answer
4. Verify your answer by checking it against the original problem

Format your response as:
## Steps
[Your step-by-step reasoning]

## Answer
[Your final answer]

## Verification
[Check that your answer is correct]
""")
```
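
A minimal usage sketch: pipe the template into a model (the model name and sample problem are illustrative):

```python
from langchain_anthropic import ChatAnthropic

# Chain the CoT template into a model call
llm = ChatAnthropic(model="claude-sonnet-4-5")
cot_chain = cot_prompt | llm

result = cot_chain.invoke({
    "problem": "A train travels 120 km in 1.5 hours. What is its average speed?"
})
print(result.content)  # Contains the Steps, Answer, and Verification sections
```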

### Pattern 3: Few-Shot with Dynamic Example Selection

```python
from langchain_voyageai import VoyageAIEmbeddings
from langchain_core.example_selectors import SemanticSimilarityExampleSelector
from langchain_chroma import Chroma

# Create example selector with semantic similarity
example_selector = SemanticSimilarityExampleSelector.from_examples(
    examples=[
        {"input": "How do I reset my password?", "output": "Go to Settings > Security > Reset Password"},
        {"input": "Where can I see my order history?", "output": "Navigate to Account > Orders"},
        {"input": "How do I contact support?", "output": "Click Help > Contact Us or email support@example.com"},
    ],
    embeddings=VoyageAIEmbeddings(model="voyage-3-large"),
    vectorstore_cls=Chroma,
    k=2  # Select the 2 most similar examples
)

async def get_few_shot_prompt(query: str) -> str:
    """Build prompt with dynamically selected examples."""
    examples = await example_selector.aselect_examples({"input": query})

    examples_text = "\n".join(
        f"User: {ex['input']}\nAssistant: {ex['output']}"
        for ex in examples
    )

    return f"""You are a helpful customer support assistant.

Here are some example interactions:
{examples_text}

Now respond to this query:
User: {query}
Assistant:"""
```

### Pattern 4: Progressive Disclosure

Start with simple prompts, add complexity only when needed:

```python
PROMPT_LEVELS = {
    # Level 1: Direct instruction
    "simple": "Summarize this article: {text}",

    # Level 2: Add constraints
    "constrained": """Summarize this article in 3 bullet points, focusing on:
- Key findings
- Main conclusions
- Practical implications

Article: {text}""",

    # Level 3: Add reasoning
    "reasoning": """Read this article carefully.
1. First, identify the main topic and thesis
2. Then, extract the key supporting points
3. Finally, summarize in 3 bullet points

Article: {text}

Summary:""",

    # Level 4: Add examples
    "few_shot": """Read articles and provide concise summaries.

Example:
Article: "New research shows that regular exercise can reduce anxiety by up to 40%..."
Summary:
• Regular exercise reduces anxiety by up to 40%
• 30 minutes of moderate activity 3x/week is sufficient
• Benefits appear within 2 weeks of starting

Now summarize this article:
Article: {text}

Summary:"""
}
```
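
One way to apply these levels is to escalate only when a cheaper prompt fails a quality gate. A sketch, where `is_acceptable` is a hypothetical check you would replace with your own evaluation:

```python
from langchain_anthropic import ChatAnthropic

llm = ChatAnthropic(model="claude-sonnet-4-5")

def is_acceptable(summary: str) -> bool:
    # Hypothetical quality gate: require at least 3 bullet-like lines
    bullets = sum(1 for line in summary.splitlines() if line.strip().startswith(("-", "•")))
    return bullets >= 3

def summarize_progressively(text: str) -> str:
    """Try the cheapest prompt first; escalate only when the output fails the gate."""
    for level in ("simple", "constrained", "reasoning", "few_shot"):
        response = llm.invoke(PROMPT_LEVELS[level].format(text=text))
        if is_acceptable(response.content):
            return response.content
    return response.content  # Return the last attempt even if it failed the gate
```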

### Pattern 5: Error Recovery and Fallback

```python
from pydantic import BaseModel, ValidationError
import json

class ResponseWithConfidence(BaseModel):
    answer: str
    confidence: float
    sources: list[str]
    alternative_interpretations: list[str] = []

ERROR_RECOVERY_PROMPT = """
Answer the question based on the context provided.

Context: {context}
Question: {question}

Instructions:
1. If you can answer confidently (>0.8), provide a direct answer
2. If you're somewhat confident (0.5-0.8), provide your best answer with caveats
3. If you're uncertain (<0.5), explain what information is missing
4. Always provide alternative interpretations if the question is ambiguous

Respond in JSON:
{{
    "answer": "your answer or 'I cannot determine this from the context'",
    "confidence": 0.0-1.0,
    "sources": ["relevant context excerpts"],
    "alternative_interpretations": ["if question is ambiguous"]
}}
"""

async def answer_with_fallback(
    context: str,
    question: str,
    llm
) -> ResponseWithConfidence:
    """Answer with error recovery and fallback."""
    prompt = ERROR_RECOVERY_PROMPT.format(context=context, question=question)

    try:
        response = await llm.ainvoke(prompt)
        return ResponseWithConfidence(**json.loads(response.content))
    except (json.JSONDecodeError, ValidationError):
        # Fallback: try to extract an answer without structure
        simple_prompt = f"Based on: {context}\n\nAnswer: {question}"
        simple_response = await llm.ainvoke(simple_prompt)
        return ResponseWithConfidence(
            answer=simple_response.content,
            confidence=0.5,
            sources=["fallback extraction"],
            alternative_interpretations=[]
        )
```

### Pattern 6: Role-Based System Prompts

```python
SYSTEM_PROMPTS = {
    "analyst": """You are a senior data analyst with expertise in SQL, Python, and business intelligence.

Your responsibilities:
- Write efficient, well-documented queries
- Explain your analysis methodology
- Highlight key insights and recommendations
- Flag any data quality concerns

Communication style:
- Be precise and technical when discussing methodology
- Translate technical findings into business impact
- Use clear visualizations when helpful""",

    "assistant": """You are a helpful AI assistant focused on accuracy and clarity.

Core principles:
- Always cite sources when making factual claims
- Acknowledge uncertainty rather than guessing
- Ask clarifying questions when the request is ambiguous
- Provide step-by-step explanations for complex topics

Constraints:
- Do not provide medical, legal, or financial advice
- Redirect harmful requests appropriately
- Protect user privacy""",

    "code_reviewer": """You are a senior software engineer conducting code reviews.

Review criteria:
- Correctness: Does the code work as intended?
- Security: Are there any vulnerabilities?
- Performance: Are there efficiency concerns?
- Maintainability: Is the code readable and well-structured?
- Best practices: Does it follow language idioms?

Output format:
1. Summary assessment (approve/request changes)
2. Critical issues (must fix)
3. Suggestions (nice to have)
4. Positive feedback (what's done well)"""
}
```
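
A minimal sketch of routing queries through these roles (the helper and model name are illustrative, not part of the library):

```python
from anthropic import Anthropic

client = Anthropic()

def ask_as(role: str, user_query: str) -> str:
    """Route a query through one of the predefined role prompts."""
    response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1000,
        system=SYSTEM_PROMPTS[role],
        messages=[{"role": "user", "content": user_query}],
    )
    return response.content[0].text

print(ask_as("code_reviewer", "def add(a,b): return a+b  # review this"))
```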

## Integration Patterns

### With RAG Systems

```python
RAG_PROMPT = """You are a knowledgeable assistant that answers questions based on provided context.

Context (retrieved from knowledge base):
{context}

Instructions:
1. Answer ONLY based on the provided context
2. If the context doesn't contain the answer, say "I don't have information about that in my knowledge base"
3. Cite specific passages using [1], [2] notation
4. If the question is ambiguous, ask for clarification

Question: {question}

Answer:"""
```
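
A sketch of wiring this prompt to a retriever; the `retriever` object and the `[n]` numbering helper are assumptions, standing in for any retriever with an `invoke` method:

```python
def answer_with_rag(question: str, retriever, llm) -> str:
    """Retrieve passages, number them for citation, and fill the RAG prompt."""
    docs = retriever.invoke(question)  # assumed retriever API
    context = "\n\n".join(f"[{i + 1}] {d.page_content}" for i, d in enumerate(docs))
    prompt = RAG_PROMPT.format(context=context, question=question)
    return llm.invoke(prompt).content
```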

### With Validation and Verification

```python
VALIDATED_PROMPT = """Complete the following task:

Task: {task}

After generating your response, verify it meets ALL these criteria:
✓ Directly addresses the original request
✓ Contains no factual errors
✓ Is appropriately detailed (not too brief, not too verbose)
✓ Uses proper formatting
✓ Is safe and appropriate

If verification fails on any criterion, revise before responding.

Response:"""
```

## Performance Optimization

### Token Efficiency

```python
# Before: verbose prompt
verbose_prompt = """
I would like you to please take the following text and provide me with a comprehensive
summary of the main points. The summary should capture the key ideas and important details
while being concise and easy to understand.
"""

# After: concise prompt, same intent in a fraction of the tokens
concise_prompt = """Summarize the key points concisely:

{text}

Summary:"""
```
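
To measure the savings rather than guess, the Anthropic SDK exposes a token-counting endpoint (sketch; adapt to your provider's equivalent):

```python
from anthropic import Anthropic

client = Anthropic()

def count_tokens(prompt: str) -> int:
    """Count input tokens for a prompt via the token-counting endpoint."""
    result = client.messages.count_tokens(
        model="claude-sonnet-4-5",
        messages=[{"role": "user", "content": prompt}],
    )
    return result.input_tokens

print(count_tokens(verbose_prompt), "vs", count_tokens(concise_prompt.format(text="...")))
```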

### Caching Common Prefixes

```python
from anthropic import Anthropic

client = Anthropic()

# Use prompt caching for repeated system prompts
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1000,
    system=[
        {
            "type": "text",
            "text": LONG_SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[{"role": "user", "content": user_query}]
)
```

## Best Practices

1. **Be Specific**: Vague prompts produce inconsistent results
2. **Show, Don't Tell**: Examples are more effective than descriptions
3. **Use Structured Outputs**: Enforce schemas with Pydantic for reliability
4. **Test Extensively**: Evaluate on diverse, representative inputs
5. **Iterate Rapidly**: Small changes can have large impacts
6. **Monitor Performance**: Track metrics in production
7. **Version Control**: Treat prompts as code with proper versioning (see the sketch below)
8. **Document Intent**: Explain why prompts are structured as they are
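
A minimal sketch of practice 7, treating prompts as versioned artifacts; the registry shape is an assumption, not a library API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PromptVersion:
    name: str
    version: str      # bump on every change, like code
    template: str
    changelog: str    # documents intent (practice 8)

PROMPT_REGISTRY = {
    ("summarize", "1.1.0"): PromptVersion(
        name="summarize",
        version="1.1.0",
        template="Summarize the key points concisely:\n\n{text}\n\nSummary:",
        changelog="1.1.0: trimmed verbose preamble to cut tokens",
    ),
}

def get_prompt(name: str, version: str) -> str:
    return PROMPT_REGISTRY[(name, version)].template
```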

## Common Pitfalls

- **Context overflow**: Exceeding token limits with excessive examples
- **Ambiguous instructions**: Leaving room for multiple interpretations
- **Ignoring edge cases**: Not testing on unusual or boundary inputs
- **No error handling**: Assuming outputs will always be well-formed
- **Hardcoded values**: Not parameterizing prompts for reuse

## Success Metrics

Track these KPIs for your prompts:

- **Consistency**: Reproducibility across similar inputs
- **Latency**: Response time (P50, P95, P99)
- **Token Usage**: Average tokens per request
- **Success Rate**: Percentage of valid, parseable outputs
- **User Satisfaction**: Ratings and feedback
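
A sketch of capturing these KPIs per request; the field names and the `call_fn`/`parse_fn` hooks are illustrative:

```python
import json
import time

def log_prompt_metrics(prompt_name: str, call_fn, parse_fn, request: dict) -> dict:
    """Time one LLM call and record latency plus parse success."""
    start = time.perf_counter()
    response = call_fn(request)
    latency_ms = (time.perf_counter() - start) * 1000
    try:
        parse_fn(response)
        parsed_ok = True
    except Exception:
        parsed_ok = False
    record = {
        "prompt": prompt_name,
        "latency_ms": round(latency_ms, 1),
        "success": parsed_ok,  # feeds the Success Rate metric
    }
    print(json.dumps(record))  # replace with your telemetry sink
    return record
```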

## Resources

- [Anthropic Prompt Engineering Guide](https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering)
- [Claude Prompt Caching](https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching)
- [OpenAI Prompt Engineering](https://platform.openai.com/docs/guides/prompt-engineering)
- [LangChain Prompts](https://python.langchain.com/docs/concepts/prompts/)