feat(llm-application-dev): modernize to LangGraph and latest models v2.0.0

- Migrate from LangChain 0.x to LangChain 1.x/LangGraph patterns
- Update model references to Claude 4.5 and GPT-5.2
- Add Voyage AI as primary embedding recommendation
- Add structured outputs with Pydantic
- Replace deprecated initialize_agent() with StateGraph
- Fix security: use AST-based safe math instead of unsafe execution
- Add plugin.json and README.md for consistency
- Bump marketplace version to 1.3.3
- Creating reusable prompt templates with variable interpolation
- Debugging and refining prompts that produce inconsistent outputs
- Implementing system prompts for specialized AI assistants
- Using structured outputs (JSON mode) for reliable parsing

## Core Capabilities
- Self-consistency techniques (sampling multiple reasoning paths)
- Verification and validation steps

### 3. Structured Outputs
- JSON mode for reliable parsing
- Pydantic schema enforcement
- Type-safe response handling
- Error handling for malformed outputs

### 4. Prompt Optimization
- Iterative refinement workflows
- A/B testing prompt variations
- Measuring prompt performance metrics (accuracy, consistency, latency)
- Reducing token usage while maintaining quality
- Handling edge cases and failure modes

### 5. Template Systems
- Variable interpolation and formatting
- Conditional prompt sections
- Multi-turn conversation templates
- Role-based prompt composition
- Modular prompt components

### 6. System Prompt Design
- Setting model behavior and constraints
- Defining output formats and structure
- Establishing role and expertise
## Quick Start

```python
from langchain_anthropic import ChatAnthropic
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field

# Define structured output schema
class SQLQuery(BaseModel):
    query: str = Field(description="The SQL query")
    explanation: str = Field(description="Brief explanation of what the query does")
    tables_used: list[str] = Field(description="List of tables referenced")

# Initialize model with structured output
llm = ChatAnthropic(model="claude-sonnet-4-5")
structured_llm = llm.with_structured_output(SQLQuery)

# Create prompt template
prompt = ChatPromptTemplate.from_messages([
    ("system", """You are an expert SQL developer. Generate efficient, secure SQL queries.
Always use parameterized queries to prevent SQL injection.
Explain your reasoning briefly."""),
    ("user", "Convert this to SQL: {query}")
])

# Create chain
chain = prompt | structured_llm

# Use (inside an async context)
result = await chain.ainvoke({
    "query": "Find all users who registered in the last 30 days"
})
print(result.query)
print(result.explanation)
```
## Key Patterns

### Pattern 1: Structured Output with Pydantic
```python
from anthropic import AsyncAnthropic
from pydantic import BaseModel, Field
from typing import Literal
import json

class SentimentAnalysis(BaseModel):
    sentiment: Literal["positive", "negative", "neutral"]
    confidence: float = Field(ge=0, le=1)
    key_phrases: list[str]
    reasoning: str

async def analyze_sentiment(text: str) -> SentimentAnalysis:
    """Analyze sentiment with structured output."""
    client = AsyncAnthropic()  # async client so the await below is valid

    message = await client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=500,
        messages=[{
            "role": "user",
            "content": f"""Analyze the sentiment of this text.

Text: {text}

Respond with JSON matching this schema:
{{
    "sentiment": "positive" | "negative" | "neutral",
    "confidence": 0.0-1.0,
    "key_phrases": ["phrase1", "phrase2"],
    "reasoning": "brief explanation"
}}"""
        }]
    )

    return SentimentAnalysis(**json.loads(message.content[0].text))
```

### Pattern 2: Chain-of-Thought with Self-Verification

```python
from langchain_core.prompts import ChatPromptTemplate

cot_prompt = ChatPromptTemplate.from_template("""
Solve this problem step by step.

Problem: {problem}

Instructions:
1. Break down the problem into clear steps
2. Work through each step showing your reasoning
3. State your final answer
4. Verify your answer by checking it against the original problem

Format your response as:
## Steps
[Your step-by-step reasoning]

## Answer
[Your final answer]

## Verification
[Check that your answer is correct]
""")
```
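
A minimal usage sketch: pipe the template into a model (the model name and sample problem are illustrative):

```python
from langchain_anthropic import ChatAnthropic

# Chain the CoT template into a model call
llm = ChatAnthropic(model="claude-sonnet-4-5")
cot_chain = cot_prompt | llm

result = cot_chain.invoke({
    "problem": "A train travels 120 km in 1.5 hours. What is its average speed?"
})
print(result.content)  # Contains the Steps, Answer, and Verification sections
```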

### Pattern 3: Few-Shot with Dynamic Example Selection

```python
from langchain_voyageai import VoyageAIEmbeddings
from langchain_core.example_selectors import SemanticSimilarityExampleSelector
from langchain_chroma import Chroma

# Create example selector with semantic similarity
example_selector = SemanticSimilarityExampleSelector.from_examples(
    examples=[
        {"input": "How do I reset my password?", "output": "Go to Settings > Security > Reset Password"},
        {"input": "Where can I see my order history?", "output": "Navigate to Account > Orders"},
        {"input": "How do I contact support?", "output": "Click Help > Contact Us or email support@example.com"},
    ],
    embeddings=VoyageAIEmbeddings(model="voyage-3-large"),
    vectorstore_cls=Chroma,
    k=2  # Select the 2 most similar examples
)

async def get_few_shot_prompt(query: str) -> str:
    """Build prompt with dynamically selected examples."""
    examples = await example_selector.aselect_examples({"input": query})

    examples_text = "\n".join(
        f"User: {ex['input']}\nAssistant: {ex['output']}"
        for ex in examples
    )

    return f"""You are a helpful customer support assistant.

Here are some example interactions:
{examples_text}

Now respond to this query:
User: {query}
Assistant:"""
```

### Pattern 4: Progressive Disclosure

Start with simple prompts, add complexity only when needed:

```python
PROMPT_LEVELS = {
    # Level 1: Direct instruction
    "simple": "Summarize this article: {text}",

    # Level 2: Add constraints
    "constrained": """Summarize this article in 3 bullet points, focusing on:
- Key findings
- Main conclusions
- Practical implications

Article: {text}""",

    # Level 3: Add reasoning
    "reasoning": """Read this article carefully.
1. First, identify the main topic and thesis
2. Then, extract the key supporting points
3. Finally, summarize in 3 bullet points

Article: {text}

Summary:""",

    # Level 4: Add examples
    "few_shot": """Read articles and provide concise summaries.

Example:
Article: "New research shows that regular exercise can reduce anxiety by up to 40%..."
Summary:
• Regular exercise reduces anxiety by up to 40%
• 30 minutes of moderate activity 3x/week is sufficient
• Benefits appear within 2 weeks of starting

Now summarize this article:
Article: {text}

Summary:"""
}
```
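
One way to apply these levels is to escalate only when a cheaper prompt fails a quality gate. A sketch, where `is_acceptable` is a hypothetical check you would replace with your own evaluation:

```python
from langchain_anthropic import ChatAnthropic

llm = ChatAnthropic(model="claude-sonnet-4-5")

def is_acceptable(summary: str) -> bool:
    # Hypothetical quality gate: require at least 3 bullet-like lines
    bullets = sum(1 for line in summary.splitlines() if line.strip().startswith(("-", "•")))
    return bullets >= 3

def summarize_progressively(text: str) -> str:
    """Try the cheapest prompt first; escalate only when the output fails the gate."""
    for level in ("simple", "constrained", "reasoning", "few_shot"):
        response = llm.invoke(PROMPT_LEVELS[level].format(text=text))
        if is_acceptable(response.content):
            return response.content
    return response.content  # Return the last attempt even if it failed the gate
```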

### Pattern 5: Error Recovery and Fallback

```python
from pydantic import BaseModel, ValidationError
import json

class ResponseWithConfidence(BaseModel):
    answer: str
    confidence: float
    sources: list[str]
    alternative_interpretations: list[str] = []

ERROR_RECOVERY_PROMPT = """
Answer the question based on the context provided.

Context: {context}
Question: {question}

Instructions:
1. If you can answer confidently (>0.8), provide a direct answer
2. If you're somewhat confident (0.5-0.8), provide your best answer with caveats
3. If you're uncertain (<0.5), explain what information is missing
4. Always provide alternative interpretations if the question is ambiguous

Respond in JSON:
{{
    "answer": "your answer or 'I cannot determine this from the context'",
    "confidence": 0.0-1.0,
    "sources": ["relevant context excerpts"],
    "alternative_interpretations": ["if question is ambiguous"]
}}
"""

async def answer_with_fallback(
    context: str,
    question: str,
    llm
) -> ResponseWithConfidence:
    """Answer with error recovery and fallback."""
    prompt = ERROR_RECOVERY_PROMPT.format(context=context, question=question)

    try:
        response = await llm.ainvoke(prompt)
        return ResponseWithConfidence(**json.loads(response.content))
    except (json.JSONDecodeError, ValidationError):
        # Fallback: try to extract an answer without structure
        simple_prompt = f"Based on: {context}\n\nAnswer: {question}"
        simple_response = await llm.ainvoke(simple_prompt)
        return ResponseWithConfidence(
            answer=simple_response.content,
            confidence=0.5,
            sources=["fallback extraction"],
            alternative_interpretations=[]
        )
```

### Pattern 6: Role-Based System Prompts

```python
SYSTEM_PROMPTS = {
    "analyst": """You are a senior data analyst with expertise in SQL, Python, and business intelligence.

Your responsibilities:
- Write efficient, well-documented queries
- Explain your analysis methodology
- Highlight key insights and recommendations
- Flag any data quality concerns

Communication style:
- Be precise and technical when discussing methodology
- Translate technical findings into business impact
- Use clear visualizations when helpful""",

    "assistant": """You are a helpful AI assistant focused on accuracy and clarity.

Core principles:
- Always cite sources when making factual claims
- Acknowledge uncertainty rather than guessing
- Ask clarifying questions when the request is ambiguous
- Provide step-by-step explanations for complex topics

Constraints:
- Do not provide medical, legal, or financial advice
- Redirect harmful requests appropriately
- Protect user privacy""",

    "code_reviewer": """You are a senior software engineer conducting code reviews.

Review criteria:
- Correctness: Does the code work as intended?
- Security: Are there any vulnerabilities?
- Performance: Are there efficiency concerns?
- Maintainability: Is the code readable and well-structured?
- Best practices: Does it follow language idioms?

Output format:
1. Summary assessment (approve/request changes)
2. Critical issues (must fix)
3. Suggestions (nice to have)
4. Positive feedback (what's done well)"""
}
```
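
A minimal sketch of routing queries through these roles (the helper and model name are illustrative, not part of the library):

```python
from anthropic import Anthropic

client = Anthropic()

def ask_as(role: str, user_query: str) -> str:
    """Route a query through one of the predefined role prompts."""
    response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1000,
        system=SYSTEM_PROMPTS[role],
        messages=[{"role": "user", "content": user_query}],
    )
    return response.content[0].text

print(ask_as("code_reviewer", "def add(a,b): return a+b  # review this"))
```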

## Integration Patterns

### With RAG Systems

```python
RAG_PROMPT = """You are a knowledgeable assistant that answers questions based on provided context.

Context (retrieved from knowledge base):
{context}

Instructions:
1. Answer ONLY based on the provided context
2. If the context doesn't contain the answer, say "I don't have information about that in my knowledge base"
3. Cite specific passages using [1], [2] notation
4. If the question is ambiguous, ask for clarification

Question: {question}

Answer:"""
```
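
A sketch of wiring this prompt to a retriever; the `retriever` object and the `[n]` numbering helper are assumptions, standing in for any retriever with an `invoke` method:

```python
def answer_with_rag(question: str, retriever, llm) -> str:
    """Retrieve passages, number them for citation, and fill the RAG prompt."""
    docs = retriever.invoke(question)  # assumed retriever API
    context = "\n\n".join(f"[{i + 1}] {d.page_content}" for i, d in enumerate(docs))
    prompt = RAG_PROMPT.format(context=context, question=question)
    return llm.invoke(prompt).content
```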

### With Validation and Verification

```python
VALIDATED_PROMPT = """Complete the following task:

Task: {task}

After generating your response, verify it meets ALL these criteria:
✓ Directly addresses the original request
✓ Contains no factual errors
✓ Is appropriately detailed (not too brief, not too verbose)
✓ Uses proper formatting
✓ Is safe and appropriate

If verification fails on any criterion, revise before responding.

Response:"""
```

## Performance Optimization

### Token Efficiency

```python
# Before: verbose prompt
verbose_prompt = """
I would like you to please take the following text and provide me with a comprehensive
summary of the main points. The summary should capture the key ideas and important details
while being concise and easy to understand.
"""

# After: concise prompt, same intent in a fraction of the tokens
concise_prompt = """Summarize the key points concisely:

{text}

Summary:"""
```
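
To measure the savings rather than guess, the Anthropic SDK exposes a token-counting endpoint (sketch; adapt to your provider's equivalent):

```python
from anthropic import Anthropic

client = Anthropic()

def count_tokens(prompt: str) -> int:
    """Count input tokens for a prompt via the token-counting endpoint."""
    result = client.messages.count_tokens(
        model="claude-sonnet-4-5",
        messages=[{"role": "user", "content": prompt}],
    )
    return result.input_tokens

print(count_tokens(verbose_prompt), "vs", count_tokens(concise_prompt.format(text="...")))
```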

### Caching Common Prefixes

```python
from anthropic import Anthropic

client = Anthropic()

# Use prompt caching for repeated system prompts
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1000,
    system=[
        {
            "type": "text",
            "text": LONG_SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[{"role": "user", "content": user_query}]
)
```

## Best Practices

1. **Be Specific**: Vague prompts produce inconsistent results
2. **Show, Don't Tell**: Examples are more effective than descriptions
3. **Use Structured Outputs**: Enforce schemas with Pydantic for reliability
4. **Test Extensively**: Evaluate on diverse, representative inputs
5. **Iterate Rapidly**: Small changes can have large impacts
6. **Monitor Performance**: Track metrics in production
7. **Version Control**: Treat prompts as code with proper versioning (see the sketch below)
8. **Document Intent**: Explain why prompts are structured as they are
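
A minimal sketch of practice 7, treating prompts as versioned artifacts; the registry shape is an assumption, not a library API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PromptVersion:
    name: str
    version: str      # bump on every change, like code
    template: str
    changelog: str    # documents intent (practice 8)

PROMPT_REGISTRY = {
    ("summarize", "1.1.0"): PromptVersion(
        name="summarize",
        version="1.1.0",
        template="Summarize the key points concisely:\n\n{text}\n\nSummary:",
        changelog="1.1.0: trimmed verbose preamble to cut tokens",
    ),
}

def get_prompt(name: str, version: str) -> str:
    return PROMPT_REGISTRY[(name, version)].template
```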

## Common Pitfalls

- **Context overflow**: Exceeding token limits with excessive examples
- **Ambiguous instructions**: Leaving room for multiple interpretations
- **Ignoring edge cases**: Not testing on unusual or boundary inputs
- **No error handling**: Assuming outputs will always be well-formed
- **Hardcoded values**: Not parameterizing prompts for reuse

## Success Metrics

Track these KPIs for your prompts:

- **Consistency**: Reproducibility across similar inputs
- **Latency**: Response time (P50, P95, P99)
- **Token Usage**: Average tokens per request
- **Success Rate**: Percentage of valid, parseable outputs
- **User Satisfaction**: Ratings and feedback
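
A sketch of capturing these KPIs per request; the field names and the `call_fn`/`parse_fn` hooks are illustrative:

```python
import json
import time

def log_prompt_metrics(prompt_name: str, call_fn, parse_fn, request: dict) -> dict:
    """Time one LLM call and record latency plus parse success."""
    start = time.perf_counter()
    response = call_fn(request)
    latency_ms = (time.perf_counter() - start) * 1000
    try:
        parse_fn(response)
        parsed_ok = True
    except Exception:
        parsed_ok = False
    record = {
        "prompt": prompt_name,
        "latency_ms": round(latency_ms, 1),
        "success": parsed_ok,  # feeds the Success Rate metric
    }
    print(json.dumps(record))  # replace with your telemetry sink
    return record
```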

## Resources

- [Anthropic Prompt Engineering Guide](https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering)
- [Claude Prompt Caching](https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching)
- [OpenAI Prompt Engineering](https://platform.openai.com/docs/guides/prompt-engineering)
- [LangChain Prompts](https://python.langchain.com/docs/concepts/prompts/)