mirror of
https://github.com/wshobson/agents.git
synced 2026-03-18 09:37:15 +00:00
style: format all files with prettier
This commit is contained in:
@@ -12,12 +12,14 @@ Build production-ready LLM applications, advanced RAG systems, and intelligent a
|
||||
## Features
|
||||
|
||||
### Core Capabilities
|
||||
|
||||
- **RAG Systems**: Production retrieval-augmented generation with hybrid search
|
||||
- **Vector Search**: Pinecone, Qdrant, Weaviate, Milvus, pgvector optimization
|
||||
- **Agent Architectures**: LangGraph-based agents with memory and tool use
|
||||
- **Prompt Engineering**: Advanced prompting techniques with model-specific optimization
|
||||
|
||||
### Key Technologies
|
||||
|
||||
- LangChain 1.x / LangGraph for agent workflows
|
||||
- Voyage AI, OpenAI, and open-source embedding models
|
||||
- HNSW, IVF, and Product Quantization index strategies
|
||||
@@ -25,31 +27,31 @@ Build production-ready LLM applications, advanced RAG systems, and intelligent a
|
||||
|
||||
## Agents
|
||||
|
||||
| Agent | Description |
|
||||
|-------|-------------|
|
||||
| `ai-engineer` | Production-grade LLM applications, RAG systems, and agent architectures |
|
||||
| `prompt-engineer` | Advanced prompting techniques, constitutional AI, and model optimization |
|
||||
| Agent | Description |
|
||||
| -------------------------- | -------------------------------------------------------------------------- |
|
||||
| `ai-engineer` | Production-grade LLM applications, RAG systems, and agent architectures |
|
||||
| `prompt-engineer` | Advanced prompting techniques, constitutional AI, and model optimization |
|
||||
| `vector-database-engineer` | Vector search implementation, embedding strategies, and semantic retrieval |
|
||||
|
||||
## Skills
|
||||
|
||||
| Skill | Description |
|
||||
|-------|-------------|
|
||||
| `langchain-architecture` | LangGraph StateGraph patterns, memory, and tool integration |
|
||||
| `rag-implementation` | RAG systems with hybrid search and reranking |
|
||||
| `llm-evaluation` | Evaluation frameworks for LLM applications |
|
||||
| `prompt-engineering-patterns` | Chain-of-thought, few-shot, and structured outputs |
|
||||
| `embedding-strategies` | Embedding model selection and optimization |
|
||||
| `similarity-search-patterns` | Vector similarity search implementation |
|
||||
| `vector-index-tuning` | HNSW, IVF, and quantization optimization |
|
||||
| `hybrid-search-implementation` | Vector + keyword search fusion |
|
||||
| Skill | Description |
|
||||
| ------------------------------ | ----------------------------------------------------------- |
|
||||
| `langchain-architecture` | LangGraph StateGraph patterns, memory, and tool integration |
|
||||
| `rag-implementation` | RAG systems with hybrid search and reranking |
|
||||
| `llm-evaluation` | Evaluation frameworks for LLM applications |
|
||||
| `prompt-engineering-patterns` | Chain-of-thought, few-shot, and structured outputs |
|
||||
| `embedding-strategies` | Embedding model selection and optimization |
|
||||
| `similarity-search-patterns` | Vector similarity search implementation |
|
||||
| `vector-index-tuning` | HNSW, IVF, and quantization optimization |
|
||||
| `hybrid-search-implementation` | Vector + keyword search fusion |
|
||||
|
||||
## Commands
|
||||
|
||||
| Command | Description |
|
||||
|---------|-------------|
|
||||
| `/llm-application-dev:langchain-agent` | Create LangGraph-based agent |
|
||||
| `/llm-application-dev:ai-assistant` | Build AI assistant application |
|
||||
| Command | Description |
|
||||
| -------------------------------------- | ------------------------------- |
|
||||
| `/llm-application-dev:langchain-agent` | Create LangGraph-based agent |
|
||||
| `/llm-application-dev:ai-assistant` | Build AI assistant application |
|
||||
| `/llm-application-dev:prompt-optimize` | Optimize prompts for production |
|
||||
|
||||
## Installation
|
||||
@@ -69,6 +71,7 @@ Or copy to your project's `.claude-plugin/` directory.
|
||||
## Changelog
|
||||
|
||||
### 2.0.0 (January 2026)
|
||||
|
||||
- **Breaking**: Migrated from LangChain 0.x to LangChain 1.x/LangGraph
|
||||
- **Breaking**: Updated model references to Claude 4.5 and GPT-5.2
|
||||
- Added Voyage AI as primary embedding recommendation for Claude apps
|
||||
@@ -79,6 +82,7 @@ Or copy to your project's `.claude-plugin/` directory.
|
||||
- Updated hybrid search with modern Pinecone client API
|
||||
|
||||
### 1.2.2
|
||||
|
||||
- Minor bug fixes and documentation updates
|
||||
|
||||
## License
|
||||
|
||||
@@ -7,11 +7,13 @@ model: inherit
|
||||
You are an AI engineer specializing in production-grade LLM applications, generative AI systems, and intelligent agent architectures.
|
||||
|
||||
## Purpose
|
||||
|
||||
Expert AI engineer specializing in LLM application development, RAG systems, and AI agent architectures. Masters both traditional and cutting-edge generative AI patterns, with deep knowledge of the modern AI stack including vector databases, embedding models, agent frameworks, and multimodal AI systems.
|
||||
|
||||
## Capabilities
|
||||
|
||||
### LLM Integration & Model Management
|
||||
|
||||
- OpenAI GPT-5.2/GPT-5.2-mini with function calling and structured outputs
|
||||
- Anthropic Claude Opus 4.5, Claude Sonnet 4.5, Claude Haiku 4.5 with tool use and computer use
|
||||
- Open-source models: Llama 3.3, Mixtral 8x22B, Qwen 2.5, DeepSeek-V3
|
||||
@@ -21,6 +23,7 @@ Expert AI engineer specializing in LLM application development, RAG systems, and
|
||||
- Cost optimization through model selection and caching strategies
|
||||
|
||||
### Advanced RAG Systems
|
||||
|
||||
- Production RAG architectures with multi-stage retrieval pipelines
|
||||
- Vector databases: Pinecone, Qdrant, Weaviate, Chroma, Milvus, pgvector
|
||||
- Embedding models: Voyage AI voyage-3-large (recommended for Claude), OpenAI text-embedding-3-large/small, Cohere embed-v3, BGE-large
|
||||
@@ -32,6 +35,7 @@ Expert AI engineer specializing in LLM application development, RAG systems, and
|
||||
- Advanced RAG patterns: GraphRAG, HyDE, RAG-Fusion, self-RAG
|
||||
|
||||
### Agent Frameworks & Orchestration
|
||||
|
||||
- LangGraph (LangChain 1.x) for complex agent workflows with StateGraph and durable execution
|
||||
- LlamaIndex for data-centric AI applications and advanced retrieval
|
||||
- CrewAI for multi-agent collaboration and specialized agent roles
|
||||
@@ -42,6 +46,7 @@ Expert AI engineer specializing in LLM application development, RAG systems, and
|
||||
- Agent evaluation and monitoring with LangSmith
|
||||
|
||||
### Vector Search & Embeddings
|
||||
|
||||
- Embedding model selection and fine-tuning for domain-specific tasks
|
||||
- Vector indexing strategies: HNSW, IVF, LSH for different scale requirements
|
||||
- Similarity metrics: cosine, dot product, Euclidean for various use cases
|
||||
@@ -50,6 +55,7 @@ Expert AI engineer specializing in LLM application development, RAG systems, and
|
||||
- Vector database optimization: indexing, sharding, and caching strategies
|
||||
|
||||
### Prompt Engineering & Optimization
|
||||
|
||||
- Advanced prompting techniques: chain-of-thought, tree-of-thoughts, self-consistency
|
||||
- Few-shot and in-context learning optimization
|
||||
- Prompt templates with dynamic variable injection and conditioning
|
||||
@@ -59,6 +65,7 @@ Expert AI engineer specializing in LLM application development, RAG systems, and
|
||||
- Multi-modal prompting for vision and audio models
|
||||
|
||||
### Production AI Systems
|
||||
|
||||
- LLM serving with FastAPI, async processing, and load balancing
|
||||
- Streaming responses and real-time inference optimization
|
||||
- Caching strategies: semantic caching, response memoization, embedding caching
|
||||
@@ -68,6 +75,7 @@ Expert AI engineer specializing in LLM application development, RAG systems, and
|
||||
- Observability: logging, metrics, tracing with LangSmith, Phoenix, Weights & Biases
|
||||
|
||||
### Multimodal AI Integration
|
||||
|
||||
- Vision models: GPT-4V, Claude 4 Vision, LLaVA, CLIP for image understanding
|
||||
- Audio processing: Whisper for speech-to-text, ElevenLabs for text-to-speech
|
||||
- Document AI: OCR, table extraction, layout understanding with models like LayoutLM
|
||||
@@ -75,6 +83,7 @@ Expert AI engineer specializing in LLM application development, RAG systems, and
|
||||
- Cross-modal embeddings and unified vector spaces
|
||||
|
||||
### AI Safety & Governance
|
||||
|
||||
- Content moderation with OpenAI Moderation API and custom classifiers
|
||||
- Prompt injection detection and prevention strategies
|
||||
- PII detection and redaction in AI workflows
|
||||
@@ -83,6 +92,7 @@ Expert AI engineer specializing in LLM application development, RAG systems, and
|
||||
- Responsible AI practices and ethical considerations
|
||||
|
||||
### Data Processing & Pipeline Management
|
||||
|
||||
- Document processing: PDF extraction, web scraping, API integrations
|
||||
- Data preprocessing: cleaning, normalization, deduplication
|
||||
- Pipeline orchestration with Apache Airflow, Dagster, Prefect
|
||||
@@ -91,6 +101,7 @@ Expert AI engineer specializing in LLM application development, RAG systems, and
|
||||
- ETL/ELT processes for AI data preparation
|
||||
|
||||
### Integration & API Development
|
||||
|
||||
- RESTful API design for AI services with FastAPI, Flask
|
||||
- GraphQL APIs for flexible AI data querying
|
||||
- Webhook integration and event-driven architectures
|
||||
@@ -99,6 +110,7 @@ Expert AI engineer specializing in LLM application development, RAG systems, and
|
||||
- API security: OAuth, JWT, API key management
|
||||
|
||||
## Behavioral Traits
|
||||
|
||||
- Prioritizes production reliability and scalability over proof-of-concept implementations
|
||||
- Implements comprehensive error handling and graceful degradation
|
||||
- Focuses on cost optimization and efficient resource utilization
|
||||
@@ -111,6 +123,7 @@ Expert AI engineer specializing in LLM application development, RAG systems, and
|
||||
- Balances cutting-edge techniques with proven, stable solutions
|
||||
|
||||
## Knowledge Base
|
||||
|
||||
- Latest LLM developments and model capabilities (GPT-5.2, Claude 4.5, Llama 3.3)
|
||||
- Modern vector database architectures and optimization techniques
|
||||
- Production AI system design patterns and best practices
|
||||
@@ -123,6 +136,7 @@ Expert AI engineer specializing in LLM application development, RAG systems, and
|
||||
- Prompt engineering and optimization methodologies
|
||||
|
||||
## Response Approach
|
||||
|
||||
1. **Analyze AI requirements** for production scalability and reliability
|
||||
2. **Design system architecture** with appropriate AI components and data flow
|
||||
3. **Implement production-ready code** with comprehensive error handling
|
||||
@@ -133,6 +147,7 @@ Expert AI engineer specializing in LLM application development, RAG systems, and
|
||||
8. **Provide testing strategies** including adversarial and edge cases
|
||||
|
||||
## Example Interactions
|
||||
|
||||
- "Build a production RAG system for enterprise knowledge base with hybrid search"
|
||||
- "Implement a multi-agent customer service system with escalation workflows"
|
||||
- "Design a cost-optimized LLM inference pipeline with caching and load balancing"
|
||||
@@ -140,4 +155,4 @@ Expert AI engineer specializing in LLM application development, RAG systems, and
|
||||
- "Build an AI agent that can browse the web and perform research tasks"
|
||||
- "Implement semantic search with reranking for improved retrieval accuracy"
|
||||
- "Design an A/B testing framework for comparing different LLM prompts"
|
||||
- "Create a real-time AI content moderation system with custom classifiers"
|
||||
- "Create a real-time AI content moderation system with custom classifiers"
|
||||
|
||||
@@ -9,6 +9,7 @@ You are an expert prompt engineer specializing in crafting effective prompts for
|
||||
IMPORTANT: When creating prompts, ALWAYS display the complete prompt text in a clearly marked section. Never describe a prompt without showing it. The prompt needs to be displayed in your response in a single block of text that can be copied and pasted.
|
||||
|
||||
## Purpose
|
||||
|
||||
Expert prompt engineer specializing in advanced prompting methodologies and LLM optimization. Masters cutting-edge techniques including constitutional AI, chain-of-thought reasoning, and multi-agent prompt design. Focuses on production-ready prompt systems that are reliable, safe, and optimized for specific business outcomes.
|
||||
|
||||
## Capabilities
|
||||
@@ -16,6 +17,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
|
||||
### Advanced Prompting Techniques
|
||||
|
||||
#### Chain-of-Thought & Reasoning
|
||||
|
||||
- Chain-of-thought (CoT) prompting for complex reasoning tasks
|
||||
- Few-shot chain-of-thought with carefully crafted examples
|
||||
- Zero-shot chain-of-thought with "Let's think step by step"
|
||||
@@ -25,6 +27,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
|
||||
- Program-aided language models (PAL) for computational tasks
|
||||
|
||||
#### Constitutional AI & Safety
|
||||
|
||||
- Constitutional AI principles for self-correction and alignment
|
||||
- Critique and revise patterns for output improvement
|
||||
- Safety prompting techniques to prevent harmful outputs
|
||||
@@ -34,6 +37,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
|
||||
- Red teaming prompts for adversarial testing
|
||||
|
||||
#### Meta-Prompting & Self-Improvement
|
||||
|
||||
- Meta-prompting for prompt optimization and generation
|
||||
- Self-reflection and self-evaluation prompt patterns
|
||||
- Auto-prompting for dynamic prompt generation
|
||||
@@ -45,6 +49,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
|
||||
### Model-Specific Optimization
|
||||
|
||||
#### OpenAI Models (GPT-5.2, GPT-5.2-mini)
|
||||
|
||||
- Function calling optimization and structured outputs
|
||||
- JSON mode utilization for reliable data extraction
|
||||
- System message design for consistent behavior
|
||||
@@ -54,6 +59,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
|
||||
- Image and multimodal prompt engineering
|
||||
|
||||
#### Anthropic Claude (Claude Opus 4.5, Sonnet 4.5, Haiku 4.5)
|
||||
|
||||
- Constitutional AI alignment with Claude's training
|
||||
- Tool use optimization for complex workflows
|
||||
- Computer use prompting for automation tasks
|
||||
@@ -63,6 +69,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
|
||||
- Safety considerations specific to Claude's capabilities
|
||||
|
||||
#### Open Source Models (Llama, Mixtral, Qwen)
|
||||
|
||||
- Model-specific prompt formatting and special tokens
|
||||
- Fine-tuning prompt strategies for domain adaptation
|
||||
- Instruction-following optimization for different architectures
|
||||
@@ -74,6 +81,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
|
||||
### Production Prompt Systems
|
||||
|
||||
#### Prompt Templates & Management
|
||||
|
||||
- Dynamic prompt templating with variable injection
|
||||
- Conditional prompt logic based on context
|
||||
- Multi-language prompt adaptation and localization
|
||||
@@ -83,6 +91,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
|
||||
- Rollback strategies for prompt deployments
|
||||
|
||||
#### RAG & Knowledge Integration
|
||||
|
||||
- Retrieval-augmented generation prompt optimization
|
||||
- Context compression and relevance filtering
|
||||
- Query understanding and expansion prompts
|
||||
@@ -92,6 +101,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
|
||||
- Knowledge graph integration prompts
|
||||
|
||||
#### Agent & Multi-Agent Prompting
|
||||
|
||||
- Agent role definition and persona creation
|
||||
- Multi-agent collaboration and communication protocols
|
||||
- Task decomposition and workflow orchestration
|
||||
@@ -103,6 +113,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
|
||||
### Specialized Applications
|
||||
|
||||
#### Business & Enterprise
|
||||
|
||||
- Customer service chatbot optimization
|
||||
- Sales and marketing copy generation
|
||||
- Legal document analysis and generation
|
||||
@@ -112,6 +123,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
|
||||
- Compliance and regulatory content generation
|
||||
|
||||
#### Creative & Content
|
||||
|
||||
- Creative writing and storytelling prompts
|
||||
- Content marketing and SEO optimization
|
||||
- Brand voice and tone consistency
|
||||
@@ -121,6 +133,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
|
||||
- Translation and localization prompts
|
||||
|
||||
#### Technical & Code
|
||||
|
||||
- Code generation and optimization prompts
|
||||
- Technical documentation and API documentation
|
||||
- Debugging and error analysis assistance
|
||||
@@ -132,6 +145,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
|
||||
### Evaluation & Testing
|
||||
|
||||
#### Performance Metrics
|
||||
|
||||
- Task-specific accuracy and quality metrics
|
||||
- Response time and efficiency measurements
|
||||
- Cost optimization and token usage analysis
|
||||
@@ -141,6 +155,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
|
||||
- Edge case and robustness assessment
|
||||
|
||||
#### Testing Methodologies
|
||||
|
||||
- Red team testing for prompt vulnerabilities
|
||||
- Adversarial prompt testing and jailbreak attempts
|
||||
- Cross-model performance comparison
|
||||
@@ -152,6 +167,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
|
||||
### Advanced Patterns & Architectures
|
||||
|
||||
#### Prompt Chaining & Workflows
|
||||
|
||||
- Sequential prompt chaining for complex tasks
|
||||
- Parallel prompt execution and result aggregation
|
||||
- Conditional branching based on intermediate outputs
|
||||
@@ -161,6 +177,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
|
||||
- Workflow optimization and performance tuning
|
||||
|
||||
#### Multimodal & Cross-Modal
|
||||
|
||||
- Vision-language model prompt optimization
|
||||
- Image understanding and analysis prompts
|
||||
- Document AI and OCR integration prompts
|
||||
@@ -170,6 +187,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
|
||||
- Multimodal creative and generative prompts
|
||||
|
||||
## Behavioral Traits
|
||||
|
||||
- Always displays complete prompt text, never just descriptions
|
||||
- Focuses on production reliability and safety over experimental techniques
|
||||
- Considers token efficiency and cost optimization in all prompt designs
|
||||
@@ -182,6 +200,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
|
||||
- Emphasizes reproducibility and version control for prompt systems
|
||||
|
||||
## Knowledge Base
|
||||
|
||||
- Latest research in prompt engineering and LLM optimization
|
||||
- Model-specific capabilities and limitations across providers
|
||||
- Production deployment patterns and best practices
|
||||
@@ -194,6 +213,7 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
|
||||
- Emerging trends in AI and prompt engineering
|
||||
|
||||
## Response Approach
|
||||
|
||||
1. **Understand the specific use case** and requirements for the prompt
|
||||
2. **Analyze target model capabilities** and optimization opportunities
|
||||
3. **Design prompt architecture** with appropriate techniques and patterns
|
||||
@@ -208,27 +228,32 @@ Expert prompt engineer specializing in advanced prompting methodologies and LLM
|
||||
When creating any prompt, you MUST include:
|
||||
|
||||
### The Prompt
|
||||
|
||||
```
|
||||
[Display the complete prompt text here - this is the most important part]
|
||||
```
|
||||
|
||||
### Implementation Notes
|
||||
|
||||
- Key techniques used and why they were chosen
|
||||
- Model-specific optimizations and considerations
|
||||
- Expected behavior and output format
|
||||
- Parameter recommendations (temperature, max tokens, etc.)
|
||||
|
||||
### Testing & Evaluation
|
||||
|
||||
- Suggested test cases and evaluation metrics
|
||||
- Edge cases and potential failure modes
|
||||
- A/B testing recommendations for optimization
|
||||
|
||||
### Usage Guidelines
|
||||
|
||||
- When and how to use this prompt effectively
|
||||
- Customization options and variable parameters
|
||||
- Integration considerations for production systems
|
||||
|
||||
## Example Interactions
|
||||
|
||||
- "Create a constitutional AI prompt for content moderation that self-corrects problematic outputs"
|
||||
- "Design a chain-of-thought prompt for financial analysis that shows clear reasoning steps"
|
||||
- "Build a multi-agent prompt system for customer service with escalation workflows"
|
||||
@@ -248,4 +273,4 @@ Verify you have:
|
||||
☐ Included testing and evaluation recommendations
|
||||
☐ Considered safety and ethical implications
|
||||
|
||||
Remember: The best prompt is one that consistently produces the desired output with minimal post-processing. ALWAYS show the prompt, never just describe it.
|
||||
Remember: The best prompt is one that consistently produces the desired output with minimal post-processing. ALWAYS show the prompt, never just describe it.
|
||||
|
||||
@@ -15,6 +15,7 @@ Specializes in designing and implementing production-grade vector search systems
|
||||
## Capabilities
|
||||
|
||||
### Vector Database Selection & Architecture
|
||||
|
||||
- **Pinecone**: Managed serverless, auto-scaling, metadata filtering
|
||||
- **Qdrant**: High-performance, Rust-based, complex filtering
|
||||
- **Weaviate**: GraphQL API, hybrid search, multi-tenancy
|
||||
@@ -23,6 +24,7 @@ Specializes in designing and implementing production-grade vector search systems
|
||||
- **Chroma**: Lightweight, local development, embeddings built-in
|
||||
|
||||
### Embedding Model Selection
|
||||
|
||||
- **Voyage AI**: voyage-3-large (recommended for Claude apps), voyage-code-3, voyage-finance-2, voyage-law-2
|
||||
- **OpenAI**: text-embedding-3-large (3072 dims), text-embedding-3-small (1536 dims)
|
||||
- **Open Source**: BGE-large-en-v1.5, E5-large-v2, multilingual-e5-large
|
||||
@@ -30,6 +32,7 @@ Specializes in designing and implementing production-grade vector search systems
|
||||
- Domain-specific fine-tuning strategies
|
||||
|
||||
### Index Configuration & Optimization
|
||||
|
||||
- **HNSW**: High recall, adjustable M and efConstruction parameters
|
||||
- **IVF**: Large-scale datasets, nlist/nprobe tuning
|
||||
- **Product Quantization (PQ)**: Memory optimization for billions of vectors
|
||||
@@ -37,6 +40,7 @@ Specializes in designing and implementing production-grade vector search systems
|
||||
- Index selection based on recall/latency/memory tradeoffs
|
||||
|
||||
### Hybrid Search Implementation
|
||||
|
||||
- Vector + BM25 keyword search fusion
|
||||
- Reciprocal Rank Fusion (RRF) scoring
|
||||
- Weighted combination strategies
|
||||
@@ -44,6 +48,7 @@ Specializes in designing and implementing production-grade vector search systems
|
||||
- Reranking with cross-encoders
|
||||
|
||||
### Document Processing Pipeline
|
||||
|
||||
- Chunking strategies: recursive, semantic, token-based
|
||||
- Metadata extraction and enrichment
|
||||
- Embedding batching and async processing
|
||||
@@ -51,6 +56,7 @@ Specializes in designing and implementing production-grade vector search systems
|
||||
- Document versioning and deduplication
|
||||
|
||||
### Production Operations
|
||||
|
||||
- Monitoring: latency percentiles, recall metrics
|
||||
- Scaling: sharding, replication, auto-scaling
|
||||
- Backup and disaster recovery
|
||||
@@ -71,24 +77,28 @@ Specializes in designing and implementing production-grade vector search systems
|
||||
## Best Practices
|
||||
|
||||
### Embedding Selection
|
||||
|
||||
- Use Voyage AI for Claude-based applications (officially recommended by Anthropic)
|
||||
- Match embedding dimensions to use case (512-1024 for most, 3072 for maximum quality)
|
||||
- Consider domain-specific models for code, legal, finance
|
||||
- Test embedding quality on representative queries
|
||||
|
||||
### Chunking
|
||||
|
||||
- Chunk size 500-1000 tokens for most use cases
|
||||
- 10-20% overlap to preserve context boundaries
|
||||
- Use semantic chunking for complex documents
|
||||
- Include metadata for filtering and debugging
|
||||
|
||||
### Index Tuning
|
||||
|
||||
- Start with HNSW for most use cases (good recall/latency balance)
|
||||
- Use IVF+PQ for >10M vectors with memory constraints
|
||||
- Benchmark recall@10 vs latency for your specific queries
|
||||
- Monitor and re-tune as data grows
|
||||
|
||||
### Production
|
||||
|
||||
- Implement metadata filtering to reduce search space
|
||||
- Cache frequent queries and embeddings
|
||||
- Plan for index rebuilding (blue-green deployments)
|
||||
|
||||
File diff suppressed because it is too large
Load Diff
@@ -24,6 +24,7 @@ Build sophisticated AI agent system for: $ARGUMENTS
|
||||
## Essential Architecture
|
||||
|
||||
### LangGraph State Management
|
||||
|
||||
```python
|
||||
from langgraph.graph import StateGraph, MessagesState, START, END
|
||||
from langgraph.prebuilt import create_react_agent
|
||||
@@ -35,6 +36,7 @@ class AgentState(TypedDict):
|
||||
```
|
||||
|
||||
### Model & Embeddings
|
||||
|
||||
- **Primary LLM**: Claude Sonnet 4.5 (`claude-sonnet-4-5`)
|
||||
- **Embeddings**: Voyage AI (`voyage-3-large`) - officially recommended by Anthropic for Claude
|
||||
- **Specialized**: `voyage-code-3` (code), `voyage-finance-2` (finance), `voyage-law-2` (legal)
|
||||
@@ -84,6 +86,7 @@ base_retriever = vectorstore.as_retriever(
|
||||
```
|
||||
|
||||
### Advanced RAG Patterns
|
||||
|
||||
- **HyDE**: Generate hypothetical documents for better retrieval
|
||||
- **RAG Fusion**: Multiple query perspectives for comprehensive results
|
||||
- **Reranking**: Use Cohere Rerank for relevance optimization
|
||||
@@ -117,6 +120,7 @@ tool = StructuredTool.from_function(
|
||||
## Production Deployment
|
||||
|
||||
### FastAPI Server with Streaming
|
||||
|
||||
```python
|
||||
from fastapi import FastAPI
|
||||
from fastapi.responses import StreamingResponse
|
||||
@@ -132,12 +136,14 @@ async def invoke_agent(request: AgentRequest):
|
||||
```
|
||||
|
||||
### Monitoring & Observability
|
||||
|
||||
- **LangSmith**: Trace all agent executions
|
||||
- **Prometheus**: Track metrics (requests, latency, errors)
|
||||
- **Structured Logging**: Use `structlog` for consistent logs
|
||||
- **Health Checks**: Validate LLM, tools, memory, and external services
|
||||
|
||||
### Optimization Strategies
|
||||
|
||||
- **Caching**: Redis for response caching with TTL
|
||||
- **Connection Pooling**: Reuse vector DB connections
|
||||
- **Load Balancing**: Multiple agent workers with round-robin routing
|
||||
@@ -165,6 +171,7 @@ results = await evaluate(
|
||||
## Key Patterns
|
||||
|
||||
### State Graph Pattern
|
||||
|
||||
```python
|
||||
builder = StateGraph(MessagesState)
|
||||
builder.add_node("node1", node1_func)
|
||||
@@ -176,6 +183,7 @@ agent = builder.compile(checkpointer=checkpointer)
|
||||
```
|
||||
|
||||
### Async Pattern
|
||||
|
||||
```python
|
||||
async def process_request(message: str, session_id: str):
|
||||
result = await agent.ainvoke(
|
||||
@@ -186,6 +194,7 @@ async def process_request(message: str, session_id: str):
|
||||
```
|
||||
|
||||
### Error Handling Pattern
|
||||
|
||||
```python
|
||||
from tenacity import retry, stop_after_attempt, wait_exponential
|
||||
|
||||
|
||||
@@ -22,12 +22,14 @@ $ARGUMENTS
|
||||
Evaluate the prompt across key dimensions:
|
||||
|
||||
**Assessment Framework**
|
||||
|
||||
- Clarity score (1-10) and ambiguity points
|
||||
- Structure: logical flow and section boundaries
|
||||
- Model alignment: capability utilization and token efficiency
|
||||
- Performance: success rate, failure modes, edge case handling
|
||||
|
||||
**Decomposition**
|
||||
|
||||
- Core objective and constraints
|
||||
- Output format requirements
|
||||
- Explicit vs implicit expectations
|
||||
@@ -36,6 +38,7 @@ Evaluate the prompt across key dimensions:
|
||||
### 2. Apply Chain-of-Thought Enhancement
|
||||
|
||||
**Standard CoT Pattern**
|
||||
|
||||
```python
|
||||
# Before: Simple instruction
|
||||
prompt = "Analyze this customer feedback and determine sentiment"
|
||||
@@ -56,11 +59,13 @@ Step 1 - Key emotional phrases:
|
||||
```
|
||||
|
||||
**Zero-Shot CoT**
|
||||
|
||||
```python
|
||||
enhanced = original + "\n\nLet's approach this step-by-step, breaking down the problem into smaller components and reasoning through each carefully."
|
||||
```
|
||||
|
||||
**Tree-of-Thoughts**
|
||||
|
||||
```python
|
||||
tot_prompt = """
|
||||
Explore multiple solution paths:
|
||||
@@ -79,6 +84,7 @@ Select best approach and implement.
|
||||
### 3. Implement Few-Shot Learning
|
||||
|
||||
**Strategic Example Selection**
|
||||
|
||||
```python
|
||||
few_shot = """
|
||||
Example 1 (Simple case):
|
||||
@@ -100,6 +106,7 @@ Now apply to: {actual_input}
|
||||
### 4. Apply Constitutional AI Patterns
|
||||
|
||||
**Self-Critique Loop**
|
||||
|
||||
```python
|
||||
constitutional = """
|
||||
{initial_instruction}
|
||||
@@ -119,7 +126,8 @@ Final Response: [Refined]
|
||||
### 5. Model-Specific Optimization
|
||||
|
||||
**GPT-5.2**
|
||||
```python
|
||||
|
||||
````python
|
||||
gpt5_optimized = """
|
||||
##CONTEXT##
|
||||
{structured_context}
|
||||
@@ -134,12 +142,13 @@ gpt5_optimized = """
|
||||
##OUTPUT FORMAT##
|
||||
```json
|
||||
{"structured": "response"}
|
||||
```
|
||||
````
|
||||
|
||||
##EXAMPLES##
|
||||
{few_shot_examples}
|
||||
"""
|
||||
```
|
||||
|
||||
````
|
||||
|
||||
**Claude 4.5/4**
|
||||
```python
|
||||
@@ -162,9 +171,10 @@ claude_optimized = """
|
||||
{xml_structured_response}
|
||||
</output_format>
|
||||
"""
|
||||
```
|
||||
````
|
||||
|
||||
**Gemini Pro/Ultra**
|
||||
|
||||
```python
|
||||
gemini_optimized = """
|
||||
**System Context:** {background}
|
||||
@@ -188,6 +198,7 @@ gemini_optimized = """
|
||||
### 6. RAG Integration
|
||||
|
||||
**RAG-Optimized Prompt**
|
||||
|
||||
```python
|
||||
rag_prompt = """
|
||||
## Context Documents
|
||||
@@ -210,6 +221,7 @@ Example: "Based on [Source 1], {answer}. [Source 3] corroborates: {detail}. No i
|
||||
### 7. Evaluation Framework
|
||||
|
||||
**Testing Protocol**
|
||||
|
||||
```python
|
||||
evaluation = """
|
||||
## Test Cases (20 total)
|
||||
@@ -227,6 +239,7 @@ evaluation = """
|
||||
```
|
||||
|
||||
**LLM-as-Judge**
|
||||
|
||||
```python
|
||||
judge_prompt = """
|
||||
Evaluate AI response quality.
|
||||
@@ -252,6 +265,7 @@ Recommendation: Accept/Revise/Reject
|
||||
### 8. Production Deployment
|
||||
|
||||
**Prompt Versioning**
|
||||
|
||||
```python
|
||||
class PromptVersion:
|
||||
def __init__(self, base_prompt):
|
||||
@@ -270,6 +284,7 @@ class PromptVersion:
|
||||
```
|
||||
|
||||
**Error Handling**
|
||||
|
||||
```python
|
||||
robust_prompt = """
|
||||
{main_instruction}
|
||||
@@ -291,15 +306,18 @@ Provide partial solution with boundaries and next steps if full task cannot be c
|
||||
### Example 1: Customer Support
|
||||
|
||||
**Before**
|
||||
|
||||
```
|
||||
Answer customer questions about our product.
|
||||
```
|
||||
|
||||
**After**
|
||||
```markdown
|
||||
|
||||
````markdown
|
||||
You are a senior customer support specialist for TechCorp with 5+ years experience.
|
||||
|
||||
## Context
|
||||
|
||||
- Product: {product_name}
|
||||
- Customer Tier: {tier}
|
||||
- Issue Category: {category}
|
||||
@@ -307,9 +325,11 @@ You are a senior customer support specialist for TechCorp with 5+ years experien
|
||||
## Framework
|
||||
|
||||
### 1. Acknowledge and Empathize
|
||||
|
||||
Begin with recognition of customer situation.
|
||||
|
||||
### 2. Diagnostic Reasoning
|
||||
|
||||
<thinking>
|
||||
1. Identify core issue
|
||||
2. Consider common causes
|
||||
@@ -318,23 +338,27 @@ Begin with recognition of customer situation.
|
||||
</thinking>
|
||||
|
||||
### 3. Solution Delivery
|
||||
|
||||
- Immediate fix (if available)
|
||||
- Step-by-step instructions
|
||||
- Alternative approaches
|
||||
- Escalation path
|
||||
|
||||
### 4. Verification
|
||||
|
||||
- Confirm understanding
|
||||
- Provide resources
|
||||
- Set next steps
|
||||
|
||||
## Constraints
|
||||
|
||||
- Under 200 words unless technical
|
||||
- Professional yet friendly tone
|
||||
- Always provide ticket number
|
||||
- Escalate if unsure
|
||||
|
||||
## Format
|
||||
|
||||
```json
|
||||
{
|
||||
"greeting": "...",
|
||||
@@ -343,14 +367,18 @@ Begin with recognition of customer situation.
|
||||
"follow_up": "..."
|
||||
}
|
||||
```
|
||||
````
|
||||
|
||||
```
|
||||
|
||||
### Example 2: Data Analysis
|
||||
|
||||
**Before**
|
||||
```
|
||||
|
||||
Analyze this sales data and provide insights.
|
||||
```
|
||||
|
||||
````
|
||||
|
||||
**After**
|
||||
```python
|
||||
@@ -404,16 +432,20 @@ recommendations:
|
||||
immediate: []
|
||||
short_term: []
|
||||
long_term: []
|
||||
```
|
||||
````
|
||||
|
||||
"""
|
||||
|
||||
```
|
||||
|
||||
### Example 3: Code Generation
|
||||
|
||||
**Before**
|
||||
```
|
||||
|
||||
Write a Python function to process user data.
|
||||
```
|
||||
|
||||
````
|
||||
|
||||
**After**
|
||||
```python
|
||||
@@ -473,15 +505,17 @@ def process_user_data(raw_data: Dict[str, Any]) -> Union[ProcessedUser, Dict[str
|
||||
name=sanitize_string(raw_data['name'], 100),
|
||||
metadata={k: v for k, v in raw_data.items() if k not in required}
|
||||
)
|
||||
```
|
||||
````
|
||||
|
||||
### Self-Review
|
||||
|
||||
✓ Input validation and sanitization
|
||||
✓ Injection prevention
|
||||
✓ Error handling
|
||||
✓ Performance: O(n) complexity
|
||||
"""
|
||||
```
|
||||
|
||||
````
|
||||
|
||||
### Example 4: Meta-Prompt Generator
|
||||
|
||||
@@ -530,18 +564,20 @@ ELSE: APPLY hybrid
|
||||
Overall: []/50
|
||||
Recommendation: use_as_is | iterate | redesign
|
||||
"""
|
||||
```
|
||||
````
|
||||
|
||||
## Output Format
|
||||
|
||||
Deliver comprehensive optimization report:
|
||||
|
||||
### Optimized Prompt
|
||||
|
||||
```markdown
|
||||
[Complete production-ready prompt with all enhancements]
|
||||
```
|
||||
|
||||
### Optimization Report
|
||||
|
||||
```yaml
|
||||
analysis:
|
||||
original_assessment:
|
||||
@@ -583,6 +619,7 @@ next_steps:
|
||||
```
|
||||
|
||||
### Usage Guidelines
|
||||
|
||||
1. **Implementation**: Use optimized prompt exactly
|
||||
2. **Parameters**: Apply recommended settings
|
||||
3. **Testing**: Run test cases before production
|
||||
|
||||
@@ -20,18 +20,18 @@ Guide to selecting and optimizing embedding models for vector search application
|
||||
|
||||
### 1. Embedding Model Comparison (2026)
|
||||
|
||||
| Model | Dimensions | Max Tokens | Best For |
|
||||
|-------|------------|------------|----------|
|
||||
| **voyage-3-large** | 1024 | 32000 | Claude apps (Anthropic recommended) |
|
||||
| **voyage-3** | 1024 | 32000 | Claude apps, cost-effective |
|
||||
| **voyage-code-3** | 1024 | 32000 | Code search |
|
||||
| **voyage-finance-2** | 1024 | 32000 | Financial documents |
|
||||
| **voyage-law-2** | 1024 | 32000 | Legal documents |
|
||||
| **text-embedding-3-large** | 3072 | 8191 | OpenAI apps, high accuracy |
|
||||
| **text-embedding-3-small** | 1536 | 8191 | OpenAI apps, cost-effective |
|
||||
| **bge-large-en-v1.5** | 1024 | 512 | Open source, local deployment |
|
||||
| **all-MiniLM-L6-v2** | 384 | 256 | Fast, lightweight |
|
||||
| **multilingual-e5-large** | 1024 | 512 | Multi-language |
|
||||
| Model | Dimensions | Max Tokens | Best For |
|
||||
| -------------------------- | ---------- | ---------- | ----------------------------------- |
|
||||
| **voyage-3-large** | 1024 | 32000 | Claude apps (Anthropic recommended) |
|
||||
| **voyage-3** | 1024 | 32000 | Claude apps, cost-effective |
|
||||
| **voyage-code-3** | 1024 | 32000 | Code search |
|
||||
| **voyage-finance-2** | 1024 | 32000 | Financial documents |
|
||||
| **voyage-law-2** | 1024 | 32000 | Legal documents |
|
||||
| **text-embedding-3-large** | 3072 | 8191 | OpenAI apps, high accuracy |
|
||||
| **text-embedding-3-small** | 1536 | 8191 | OpenAI apps, cost-effective |
|
||||
| **bge-large-en-v1.5** | 1024 | 512 | Open source, local deployment |
|
||||
| **all-MiniLM-L6-v2** | 384 | 256 | Fast, lightweight |
|
||||
| **multilingual-e5-large** | 1024 | 512 | Multi-language |
|
||||
|
||||
### 2. Embedding Pipeline
|
||||
|
||||
@@ -583,6 +583,7 @@ def compare_embedding_models(
|
||||
## Best Practices
|
||||
|
||||
### Do's
|
||||
|
||||
- **Match model to use case**: Code vs prose vs multilingual
|
||||
- **Chunk thoughtfully**: Preserve semantic boundaries
|
||||
- **Normalize embeddings**: For cosine similarity search
|
||||
@@ -591,6 +592,7 @@ def compare_embedding_models(
|
||||
- **Use Voyage AI for Claude apps**: Recommended by Anthropic
|
||||
|
||||
### Don'ts
|
||||
|
||||
- **Don't ignore token limits**: Truncation loses information
|
||||
- **Don't mix embedding models**: Incompatible vector spaces
|
||||
- **Don't skip preprocessing**: Garbage in, garbage out
|
||||
|
||||
@@ -27,12 +27,12 @@ Query → ┬─► Vector Search ──► Candidates ─┐
|
||||
|
||||
### 2. Fusion Methods
|
||||
|
||||
| Method | Description | Best For |
|
||||
|--------|-------------|----------|
|
||||
| **RRF** | Reciprocal Rank Fusion | General purpose |
|
||||
| **Linear** | Weighted sum of scores | Tunable balance |
|
||||
| Method | Description | Best For |
|
||||
| ----------------- | ------------------------ | --------------- |
|
||||
| **RRF** | Reciprocal Rank Fusion | General purpose |
|
||||
| **Linear** | Weighted sum of scores | Tunable balance |
|
||||
| **Cross-encoder** | Rerank with neural model | Highest quality |
|
||||
| **Cascade** | Filter then rerank | Efficiency |
|
||||
| **Cascade** | Filter then rerank | Efficiency |
|
||||
|
||||
## Templates
|
||||
|
||||
@@ -549,6 +549,7 @@ class HybridRAGPipeline:
|
||||
## Best Practices
|
||||
|
||||
### Do's
|
||||
|
||||
- **Tune weights empirically** - Test on your data
|
||||
- **Use RRF for simplicity** - Works well without tuning
|
||||
- **Add reranking** - Significant quality improvement
|
||||
@@ -556,6 +557,7 @@ class HybridRAGPipeline:
|
||||
- **A/B test** - Measure real user impact
|
||||
|
||||
### Don'ts
|
||||
|
||||
- **Don't assume one size fits all** - Different queries need different weights
|
||||
- **Don't skip keyword search** - Handles exact matches better
|
||||
- **Don't over-fetch** - Balance recall vs latency
|
||||
|
||||
@@ -33,9 +33,11 @@ langchain-pinecone # Pinecone vector store
|
||||
## Core Concepts
|
||||
|
||||
### 1. LangGraph Agents
|
||||
|
||||
LangGraph is the standard for building agents in 2026. It provides:
|
||||
|
||||
**Key Features:**
|
||||
|
||||
- **StateGraph**: Explicit state management with typed state
|
||||
- **Durable Execution**: Agents persist through failures
|
||||
- **Human-in-the-Loop**: Inspect and modify state at any point
|
||||
@@ -43,12 +45,14 @@ LangGraph is the standard for building agents in 2026. It provides:
|
||||
- **Checkpointing**: Save and resume agent state
|
||||
|
||||
**Agent Patterns:**
|
||||
|
||||
- **ReAct**: Reasoning + Acting with `create_react_agent`
|
||||
- **Plan-and-Execute**: Separate planning and execution nodes
|
||||
- **Multi-Agent**: Supervisor routing between specialized agents
|
||||
- **Tool-Calling**: Structured tool invocation with Pydantic schemas
|
||||
|
||||
### 2. State Management
|
||||
|
||||
LangGraph uses TypedDict for explicit state:
|
||||
|
||||
```python
|
||||
@@ -69,6 +73,7 @@ class CustomState(TypedDict):
|
||||
```
|
||||
|
||||
### 3. Memory Systems
|
||||
|
||||
Modern memory implementations:
|
||||
|
||||
- **ConversationBufferMemory**: Stores all messages (short conversations)
|
||||
@@ -78,15 +83,18 @@ Modern memory implementations:
|
||||
- **LangGraph Checkpointers**: Persistent state across sessions
|
||||
|
||||
### 4. Document Processing
|
||||
|
||||
Loading, transforming, and storing documents:
|
||||
|
||||
**Components:**
|
||||
|
||||
- **Document Loaders**: Load from various sources
|
||||
- **Text Splitters**: Chunk documents intelligently
|
||||
- **Vector Stores**: Store and retrieve embeddings
|
||||
- **Retrievers**: Fetch relevant documents
|
||||
|
||||
### 5. Callbacks & Tracing
|
||||
|
||||
LangSmith is the standard for observability:
|
||||
|
||||
- Request/response logging
|
||||
|
||||
@@ -20,9 +20,11 @@ Master comprehensive evaluation strategies for LLM applications, from automated
|
||||
## Core Evaluation Types
|
||||
|
||||
### 1. Automated Metrics
|
||||
|
||||
Fast, repeatable, scalable evaluation using computed scores.
|
||||
|
||||
**Text Generation:**
|
||||
|
||||
- **BLEU**: N-gram overlap (translation)
|
||||
- **ROUGE**: Recall-oriented (summarization)
|
||||
- **METEOR**: Semantic similarity
|
||||
@@ -30,21 +32,25 @@ Fast, repeatable, scalable evaluation using computed scores.
|
||||
- **Perplexity**: Language model confidence
|
||||
|
||||
**Classification:**
|
||||
|
||||
- **Accuracy**: Percentage correct
|
||||
- **Precision/Recall/F1**: Class-specific performance
|
||||
- **Confusion Matrix**: Error patterns
|
||||
- **AUC-ROC**: Ranking quality
|
||||
|
||||
**Retrieval (RAG):**
|
||||
|
||||
- **MRR**: Mean Reciprocal Rank
|
||||
- **NDCG**: Normalized Discounted Cumulative Gain
|
||||
- **Precision@K**: Relevant in top K
|
||||
- **Recall@K**: Coverage in top K
|
||||
|
||||
### 2. Human Evaluation
|
||||
|
||||
Manual assessment for quality aspects difficult to automate.
|
||||
|
||||
**Dimensions:**
|
||||
|
||||
- **Accuracy**: Factual correctness
|
||||
- **Coherence**: Logical flow
|
||||
- **Relevance**: Answers the question
|
||||
@@ -53,9 +59,11 @@ Manual assessment for quality aspects difficult to automate.
|
||||
- **Helpfulness**: Useful to the user
|
||||
|
||||
### 3. LLM-as-Judge
|
||||
|
||||
Use stronger LLMs to evaluate weaker model outputs.
|
||||
|
||||
**Approaches:**
|
||||
|
||||
- **Pointwise**: Score individual responses
|
||||
- **Pairwise**: Compare two responses
|
||||
- **Reference-based**: Compare to gold standard
|
||||
@@ -134,6 +142,7 @@ results = await suite.evaluate(model=your_model, test_cases=test_cases)
|
||||
## Automated Metrics Implementation
|
||||
|
||||
### BLEU Score
|
||||
|
||||
```python
|
||||
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
|
||||
|
||||
@@ -149,6 +158,7 @@ def calculate_bleu(reference: str, hypothesis: str, **kwargs) -> float:
|
||||
```
|
||||
|
||||
### ROUGE Score
|
||||
|
||||
```python
|
||||
from rouge_score import rouge_scorer
|
||||
|
||||
@@ -168,6 +178,7 @@ def calculate_rouge(reference: str, hypothesis: str, **kwargs) -> dict:
|
||||
```
|
||||
|
||||
### BERTScore
|
||||
|
||||
```python
|
||||
from bert_score import score
|
||||
|
||||
@@ -192,6 +203,7 @@ def calculate_bertscore(
|
||||
```
|
||||
|
||||
### Custom Metrics
|
||||
|
||||
```python
|
||||
def calculate_groundedness(response: str, context: str, **kwargs) -> float:
|
||||
"""Check if response is grounded in provided context."""
|
||||
@@ -232,6 +244,7 @@ def calculate_factuality(claim: str, sources: list[str], **kwargs) -> float:
|
||||
## LLM-as-Judge Patterns
|
||||
|
||||
### Single Output Evaluation
|
||||
|
||||
```python
|
||||
from anthropic import Anthropic
|
||||
from pydantic import BaseModel, Field
|
||||
@@ -280,6 +293,7 @@ Provide ratings in JSON format:
|
||||
```
|
||||
|
||||
### Pairwise Comparison
|
||||
|
||||
```python
|
||||
from pydantic import BaseModel, Field
|
||||
from typing import Literal
|
||||
@@ -324,6 +338,7 @@ Answer with JSON:
|
||||
```
|
||||
|
||||
### Reference-Based Evaluation
|
||||
|
||||
```python
|
||||
class ReferenceEvaluation(BaseModel):
|
||||
semantic_similarity: float = Field(ge=0, le=1)
|
||||
@@ -371,6 +386,7 @@ Respond in JSON:
|
||||
## Human Evaluation Frameworks
|
||||
|
||||
### Annotation Guidelines
|
||||
|
||||
```python
|
||||
from dataclasses import dataclass, field
|
||||
from typing import Optional
|
||||
@@ -412,6 +428,7 @@ class AnnotationTask:
|
||||
```
|
||||
|
||||
### Inter-Rater Agreement
|
||||
|
||||
```python
|
||||
from sklearn.metrics import cohen_kappa_score
|
||||
|
||||
@@ -444,6 +461,7 @@ def calculate_agreement(
|
||||
## A/B Testing
|
||||
|
||||
### Statistical Testing Framework
|
||||
|
||||
```python
|
||||
from scipy import stats
|
||||
import numpy as np
|
||||
@@ -504,6 +522,7 @@ class ABTest:
|
||||
## Regression Testing
|
||||
|
||||
### Regression Detection
|
||||
|
||||
```python
|
||||
from dataclasses import dataclass
|
||||
|
||||
@@ -595,6 +614,7 @@ print(f"Mean score: {experiment_results.aggregate_metrics['qa']['mean']}")
|
||||
## Benchmarking
|
||||
|
||||
### Running Benchmarks
|
||||
|
||||
```python
|
||||
from dataclasses import dataclass
|
||||
import numpy as np
|
||||
|
||||
@@ -21,6 +21,7 @@ Master advanced prompt engineering techniques to maximize LLM performance, relia
|
||||
## Core Capabilities
|
||||
|
||||
### 1. Few-Shot Learning
|
||||
|
||||
- Example selection strategies (semantic similarity, diversity sampling)
|
||||
- Balancing example count with context window constraints
|
||||
- Constructing effective demonstrations with input-output pairs
|
||||
@@ -28,6 +29,7 @@ Master advanced prompt engineering techniques to maximize LLM performance, relia
|
||||
- Handling edge cases through strategic example selection
|
||||
|
||||
### 2. Chain-of-Thought Prompting
|
||||
|
||||
- Step-by-step reasoning elicitation
|
||||
- Zero-shot CoT with "Let's think step by step"
|
||||
- Few-shot CoT with reasoning traces
|
||||
@@ -35,12 +37,14 @@ Master advanced prompt engineering techniques to maximize LLM performance, relia
|
||||
- Verification and validation steps
|
||||
|
||||
### 3. Structured Outputs
|
||||
|
||||
- JSON mode for reliable parsing
|
||||
- Pydantic schema enforcement
|
||||
- Type-safe response handling
|
||||
- Error handling for malformed outputs
|
||||
|
||||
### 4. Prompt Optimization
|
||||
|
||||
- Iterative refinement workflows
|
||||
- A/B testing prompt variations
|
||||
- Measuring prompt performance metrics (accuracy, consistency, latency)
|
||||
@@ -48,6 +52,7 @@ Master advanced prompt engineering techniques to maximize LLM performance, relia
|
||||
- Handling edge cases and failure modes
|
||||
|
||||
### 5. Template Systems
|
||||
|
||||
- Variable interpolation and formatting
|
||||
- Conditional prompt sections
|
||||
- Multi-turn conversation templates
|
||||
@@ -55,6 +60,7 @@ Master advanced prompt engineering techniques to maximize LLM performance, relia
|
||||
- Modular prompt components
|
||||
|
||||
### 6. System Prompt Design
|
||||
|
||||
- Setting model behavior and constraints
|
||||
- Defining output formats and structure
|
||||
- Establishing role and expertise
|
||||
@@ -395,6 +401,7 @@ Response:"""
|
||||
## Performance Optimization
|
||||
|
||||
### Token Efficiency
|
||||
|
||||
```python
|
||||
# Before: Verbose prompt (150+ tokens)
|
||||
verbose_prompt = """
|
||||
@@ -457,6 +464,7 @@ response = client.messages.create(
|
||||
## Success Metrics
|
||||
|
||||
Track these KPIs for your prompts:
|
||||
|
||||
- **Accuracy**: Correctness of outputs
|
||||
- **Consistency**: Reproducibility across similar inputs
|
||||
- **Latency**: Response time (P50, P95, P99)
|
||||
|
||||
@@ -3,6 +3,7 @@
|
||||
## Classification Templates
|
||||
|
||||
### Sentiment Analysis
|
||||
|
||||
```
|
||||
Classify the sentiment of the following text as Positive, Negative, or Neutral.
|
||||
|
||||
@@ -12,6 +13,7 @@ Sentiment:
|
||||
```
|
||||
|
||||
### Intent Detection
|
||||
|
||||
```
|
||||
Determine the user's intent from the following message.
|
||||
|
||||
@@ -23,6 +25,7 @@ Intent:
|
||||
```
|
||||
|
||||
### Topic Classification
|
||||
|
||||
```
|
||||
Classify the following article into one of these categories: {categories}
|
||||
|
||||
@@ -35,6 +38,7 @@ Category:
|
||||
## Extraction Templates
|
||||
|
||||
### Named Entity Recognition
|
||||
|
||||
```
|
||||
Extract all named entities from the text and categorize them.
|
||||
|
||||
@@ -50,6 +54,7 @@ Entities (JSON format):
|
||||
```
|
||||
|
||||
### Structured Data Extraction
|
||||
|
||||
```
|
||||
Extract structured information from the job posting.
|
||||
|
||||
@@ -70,6 +75,7 @@ Extracted Information (JSON):
|
||||
## Generation Templates
|
||||
|
||||
### Email Generation
|
||||
|
||||
```
|
||||
Write a professional {email_type} email.
|
||||
|
||||
@@ -84,6 +90,7 @@ Body:
|
||||
```
|
||||
|
||||
### Code Generation
|
||||
|
||||
```
|
||||
Generate {language} code for the following task:
|
||||
|
||||
@@ -101,6 +108,7 @@ Code:
|
||||
```
|
||||
|
||||
### Creative Writing
|
||||
|
||||
```
|
||||
Write a {length}-word {style} story about {topic}.
|
||||
|
||||
@@ -115,6 +123,7 @@ Story:
|
||||
## Transformation Templates
|
||||
|
||||
### Summarization
|
||||
|
||||
```
|
||||
Summarize the following text in {num_sentences} sentences.
|
||||
|
||||
@@ -125,6 +134,7 @@ Summary:
|
||||
```
|
||||
|
||||
### Translation with Context
|
||||
|
||||
```
|
||||
Translate the following {source_lang} text to {target_lang}.
|
||||
|
||||
@@ -137,6 +147,7 @@ Translation:
|
||||
```
|
||||
|
||||
### Format Conversion
|
||||
|
||||
```
|
||||
Convert the following {source_format} to {target_format}.
|
||||
|
||||
@@ -149,6 +160,7 @@ Output ({target_format}):
|
||||
## Analysis Templates
|
||||
|
||||
### Code Review
|
||||
|
||||
```
|
||||
Review the following code for:
|
||||
1. Bugs and errors
|
||||
@@ -163,6 +175,7 @@ Review:
|
||||
```
|
||||
|
||||
### SWOT Analysis
|
||||
|
||||
```
|
||||
Conduct a SWOT analysis for: {subject}
|
||||
|
||||
@@ -185,6 +198,7 @@ Threats:
|
||||
## Question Answering Templates
|
||||
|
||||
### RAG Template
|
||||
|
||||
```
|
||||
Answer the question based on the provided context. If the context doesn't contain enough information, say so.
|
||||
|
||||
@@ -197,6 +211,7 @@ Answer:
|
||||
```
|
||||
|
||||
### Multi-Turn Q&A
|
||||
|
||||
```
|
||||
Previous conversation:
|
||||
{conversation_history}
|
||||
@@ -209,6 +224,7 @@ Answer (continue naturally from conversation):
|
||||
## Specialized Templates
|
||||
|
||||
### SQL Query Generation
|
||||
|
||||
```
|
||||
Generate a SQL query for the following request.
|
||||
|
||||
@@ -221,6 +237,7 @@ SQL Query:
|
||||
```
|
||||
|
||||
### Regex Pattern Creation
|
||||
|
||||
```
|
||||
Create a regex pattern to match: {requirement}
|
||||
|
||||
@@ -234,6 +251,7 @@ Regex pattern:
|
||||
```
|
||||
|
||||
### API Documentation
|
||||
|
||||
```
|
||||
Generate API documentation for this function:
|
||||
|
||||
|
||||
@@ -7,6 +7,7 @@ Chain-of-Thought (CoT) prompting elicits step-by-step reasoning from LLMs, drama
|
||||
## Core Techniques
|
||||
|
||||
### Zero-Shot CoT
|
||||
|
||||
Add a simple trigger phrase to elicit reasoning:
|
||||
|
||||
```python
|
||||
@@ -29,6 +30,7 @@ prompt = zero_shot_cot(query)
|
||||
```
|
||||
|
||||
### Few-Shot CoT
|
||||
|
||||
Provide examples with explicit reasoning chains:
|
||||
|
||||
```python
|
||||
@@ -53,6 +55,7 @@ A: Let's think step by step:"""
|
||||
```
|
||||
|
||||
### Self-Consistency
|
||||
|
||||
Generate multiple reasoning paths and take the majority vote:
|
||||
|
||||
```python
|
||||
@@ -85,6 +88,7 @@ def self_consistency_cot(query, n=5, temperature=0.7):
|
||||
## Advanced Patterns
|
||||
|
||||
### Least-to-Most Prompting
|
||||
|
||||
Break complex problems into simpler subproblems:
|
||||
|
||||
```python
|
||||
@@ -125,6 +129,7 @@ Final Answer:"""
|
||||
```
|
||||
|
||||
### Tree-of-Thought (ToT)
|
||||
|
||||
Explore multiple reasoning branches:
|
||||
|
||||
```python
|
||||
@@ -176,6 +181,7 @@ Score:"""
|
||||
```
|
||||
|
||||
### Verification Step
|
||||
|
||||
Add explicit verification to catch errors:
|
||||
|
||||
```python
|
||||
@@ -220,6 +226,7 @@ Corrected solution:"""
|
||||
## Domain-Specific CoT
|
||||
|
||||
### Math Problems
|
||||
|
||||
```python
|
||||
math_cot_template = """
|
||||
Problem: {problem}
|
||||
@@ -248,6 +255,7 @@ Answer: {final_answer}
|
||||
```
|
||||
|
||||
### Code Debugging
|
||||
|
||||
```python
|
||||
debug_cot_template = """
|
||||
Code with error:
|
||||
@@ -278,6 +286,7 @@ Fixed code:
|
||||
```
|
||||
|
||||
### Logical Reasoning
|
||||
|
||||
```python
|
||||
logic_cot_template = """
|
||||
Premises:
|
||||
@@ -305,6 +314,7 @@ Answer: {final_answer}
|
||||
## Performance Optimization
|
||||
|
||||
### Caching Reasoning Patterns
|
||||
|
||||
```python
|
||||
class ReasoningCache:
|
||||
def __init__(self):
|
||||
@@ -328,6 +338,7 @@ class ReasoningCache:
|
||||
```
|
||||
|
||||
### Adaptive Reasoning Depth
|
||||
|
||||
```python
|
||||
def adaptive_cot(problem, initial_depth=3):
|
||||
depth = initial_depth
|
||||
@@ -378,6 +389,7 @@ def evaluate_cot_quality(reasoning_chain):
|
||||
## When to Use CoT
|
||||
|
||||
**Use CoT for:**
|
||||
|
||||
- Math and arithmetic problems
|
||||
- Logical reasoning tasks
|
||||
- Multi-step planning
|
||||
@@ -385,6 +397,7 @@ def evaluate_cot_quality(reasoning_chain):
|
||||
- Complex decision making
|
||||
|
||||
**Skip CoT for:**
|
||||
|
||||
- Simple factual queries
|
||||
- Direct lookups
|
||||
- Creative writing
|
||||
|
||||
@@ -7,6 +7,7 @@ Few-shot learning enables LLMs to perform tasks by providing a small number of e
|
||||
## Example Selection Strategies
|
||||
|
||||
### 1. Semantic Similarity
|
||||
|
||||
Select examples most similar to the input query using embedding-based retrieval.
|
||||
|
||||
```python
|
||||
@@ -29,6 +30,7 @@ class SemanticExampleSelector:
|
||||
**Best For**: Question answering, text classification, extraction tasks
|
||||
|
||||
### 2. Diversity Sampling
|
||||
|
||||
Maximize coverage of different patterns and edge cases.
|
||||
|
||||
```python
|
||||
@@ -58,6 +60,7 @@ class DiversityExampleSelector:
|
||||
**Best For**: Demonstrating task variability, edge case handling
|
||||
|
||||
### 3. Difficulty-Based Selection
|
||||
|
||||
Gradually increase example complexity to scaffold learning.
|
||||
|
||||
```python
|
||||
@@ -75,6 +78,7 @@ class ProgressiveExampleSelector:
|
||||
**Best For**: Complex reasoning tasks, code generation
|
||||
|
||||
### 4. Error-Based Selection
|
||||
|
||||
Include examples that address common failure modes.
|
||||
|
||||
```python
|
||||
@@ -98,6 +102,7 @@ class ErrorGuidedSelector:
|
||||
## Example Construction Best Practices
|
||||
|
||||
### Format Consistency
|
||||
|
||||
All examples should follow identical formatting:
|
||||
|
||||
```python
|
||||
@@ -121,6 +126,7 @@ examples = [
|
||||
```
|
||||
|
||||
### Input-Output Alignment
|
||||
|
||||
Ensure examples demonstrate the exact task you want the model to perform:
|
||||
|
||||
```python
|
||||
@@ -138,6 +144,7 @@ example = {
|
||||
```
|
||||
|
||||
### Complexity Balance
|
||||
|
||||
Include examples spanning the expected difficulty range:
|
||||
|
||||
```python
|
||||
@@ -156,6 +163,7 @@ examples = [
|
||||
## Context Window Management
|
||||
|
||||
### Token Budget Allocation
|
||||
|
||||
Typical distribution for a 4K context window:
|
||||
|
||||
```
|
||||
@@ -166,6 +174,7 @@ Response: 1500 tokens (38%)
|
||||
```
|
||||
|
||||
### Dynamic Example Truncation
|
||||
|
||||
```python
|
||||
class TokenAwareSelector:
|
||||
def __init__(self, examples, tokenizer, max_tokens=1500):
|
||||
@@ -197,6 +206,7 @@ class TokenAwareSelector:
|
||||
## Edge Case Handling
|
||||
|
||||
### Include Boundary Examples
|
||||
|
||||
```python
|
||||
edge_case_examples = [
|
||||
# Empty input
|
||||
@@ -216,6 +226,7 @@ edge_case_examples = [
|
||||
## Few-Shot Prompt Templates
|
||||
|
||||
### Classification Template
|
||||
|
||||
```python
|
||||
def build_classification_prompt(examples, query, labels):
|
||||
prompt = f"Classify the text into one of these categories: {', '.join(labels)}\n\n"
|
||||
@@ -228,6 +239,7 @@ def build_classification_prompt(examples, query, labels):
|
||||
```
|
||||
|
||||
### Extraction Template
|
||||
|
||||
```python
|
||||
def build_extraction_prompt(examples, query):
|
||||
prompt = "Extract structured information from the text.\n\n"
|
||||
@@ -240,6 +252,7 @@ def build_extraction_prompt(examples, query):
|
||||
```
|
||||
|
||||
### Transformation Template
|
||||
|
||||
```python
|
||||
def build_transformation_prompt(examples, query):
|
||||
prompt = "Transform the input according to the pattern shown in examples.\n\n"
|
||||
@@ -254,6 +267,7 @@ def build_transformation_prompt(examples, query):
|
||||
## Evaluation and Optimization
|
||||
|
||||
### Example Quality Metrics
|
||||
|
||||
```python
|
||||
def evaluate_example_quality(example, validation_set):
|
||||
metrics = {
|
||||
@@ -266,6 +280,7 @@ def evaluate_example_quality(example, validation_set):
|
||||
```
|
||||
|
||||
### A/B Testing Example Sets
|
||||
|
||||
```python
|
||||
class ExampleSetTester:
|
||||
def __init__(self, llm_client):
|
||||
@@ -295,6 +310,7 @@ class ExampleSetTester:
|
||||
## Advanced Techniques
|
||||
|
||||
### Meta-Learning (Learning to Select)
|
||||
|
||||
Train a small model to predict which examples will be most effective:
|
||||
|
||||
```python
|
||||
@@ -334,6 +350,7 @@ class LearnedExampleSelector:
|
||||
```
|
||||
|
||||
### Adaptive Example Count
|
||||
|
||||
Dynamically adjust the number of examples based on task difficulty:
|
||||
|
||||
```python
|
||||
|
||||
@@ -3,6 +3,7 @@
|
||||
## Systematic Refinement Process
|
||||
|
||||
### 1. Baseline Establishment
|
||||
|
||||
```python
|
||||
def establish_baseline(prompt, test_cases):
|
||||
results = {
|
||||
@@ -26,6 +27,7 @@ def establish_baseline(prompt, test_cases):
|
||||
```
|
||||
|
||||
### 2. Iterative Refinement Workflow
|
||||
|
||||
```
|
||||
Initial Prompt → Test → Analyze Failures → Refine → Test → Repeat
|
||||
```
|
||||
@@ -64,6 +66,7 @@ class PromptOptimizer:
|
||||
```
|
||||
|
||||
### 3. A/B Testing Framework
|
||||
|
||||
```python
|
||||
class PromptABTest:
|
||||
def __init__(self, variant_a, variant_b):
|
||||
@@ -116,6 +119,7 @@ class PromptABTest:
|
||||
## Optimization Strategies
|
||||
|
||||
### Token Reduction
|
||||
|
||||
```python
|
||||
def optimize_for_tokens(prompt):
|
||||
optimizations = [
|
||||
@@ -144,6 +148,7 @@ def optimize_for_tokens(prompt):
|
||||
```
|
||||
|
||||
### Latency Reduction
|
||||
|
||||
```python
|
||||
def optimize_for_latency(prompt):
|
||||
strategies = {
|
||||
@@ -167,6 +172,7 @@ def optimize_for_latency(prompt):
|
||||
```
|
||||
|
||||
### Accuracy Improvement
|
||||
|
||||
```python
|
||||
def improve_accuracy(prompt, failure_cases):
|
||||
improvements = []
|
||||
@@ -194,6 +200,7 @@ def improve_accuracy(prompt, failure_cases):
|
||||
## Performance Metrics
|
||||
|
||||
### Core Metrics
|
||||
|
||||
```python
|
||||
class PromptMetrics:
|
||||
@staticmethod
|
||||
@@ -230,6 +237,7 @@ class PromptMetrics:
|
||||
```
|
||||
|
||||
### Automated Evaluation
|
||||
|
||||
```python
|
||||
def evaluate_prompt_comprehensively(prompt, test_suite):
|
||||
results = {
|
||||
@@ -274,6 +282,7 @@ def evaluate_prompt_comprehensively(prompt, test_suite):
|
||||
## Failure Analysis
|
||||
|
||||
### Categorizing Failures
|
||||
|
||||
```python
|
||||
class FailureAnalyzer:
|
||||
def categorize_failures(self, test_results):
|
||||
@@ -326,6 +335,7 @@ class FailureAnalyzer:
|
||||
## Versioning and Rollback
|
||||
|
||||
### Prompt Version Control
|
||||
|
||||
```python
|
||||
class PromptVersionControl:
|
||||
def __init__(self, storage_path):
|
||||
@@ -381,24 +391,28 @@ class PromptVersionControl:
|
||||
## Common Optimization Patterns
|
||||
|
||||
### Pattern 1: Add Structure
|
||||
|
||||
```
|
||||
Before: "Analyze this text"
|
||||
After: "Analyze this text for:\n1. Main topic\n2. Key arguments\n3. Conclusion"
|
||||
```
|
||||
|
||||
### Pattern 2: Add Examples
|
||||
|
||||
```
|
||||
Before: "Extract entities"
|
||||
After: "Extract entities\\n\\nExample:\\nText: Apple released iPhone\\nEntities: {company: Apple, product: iPhone}"
|
||||
```
|
||||
|
||||
### Pattern 3: Add Constraints
|
||||
|
||||
```
|
||||
Before: "Summarize this"
|
||||
After: "Summarize in exactly 3 bullet points, 15 words each"
|
||||
```
|
||||
|
||||
### Pattern 4: Add Verification
|
||||
|
||||
```
|
||||
Before: "Calculate..."
|
||||
After: "Calculate... Then verify your calculation is correct before responding."
|
||||
|
||||
@@ -3,6 +3,7 @@
|
||||
## Template Architecture
|
||||
|
||||
### Basic Template Structure
|
||||
|
||||
```python
|
||||
class PromptTemplate:
|
||||
def __init__(self, template_string, variables=None):
|
||||
@@ -30,6 +31,7 @@ prompt = template.render(
|
||||
```
|
||||
|
||||
### Conditional Templates
|
||||
|
||||
```python
|
||||
class ConditionalTemplate(PromptTemplate):
|
||||
def render(self, **kwargs):
|
||||
@@ -84,6 +86,7 @@ Reference examples:
|
||||
```
|
||||
|
||||
### Modular Template Composition
|
||||
|
||||
```python
|
||||
class ModularTemplate:
|
||||
def __init__(self):
|
||||
@@ -133,6 +136,7 @@ advanced_prompt = builder.render(
|
||||
## Common Template Patterns
|
||||
|
||||
### Classification Template
|
||||
|
||||
```python
|
||||
CLASSIFICATION_TEMPLATE = """
|
||||
Classify the following {content_type} into one of these categories: {categories}
|
||||
@@ -153,6 +157,7 @@ Category:"""
|
||||
```
|
||||
|
||||
### Extraction Template
|
||||
|
||||
```python
|
||||
EXTRACTION_TEMPLATE = """
|
||||
Extract structured information from the {content_type}.
|
||||
@@ -171,6 +176,7 @@ Extracted information (JSON):"""
|
||||
```
|
||||
|
||||
### Generation Template
|
||||
|
||||
```python
|
||||
GENERATION_TEMPLATE = """
|
||||
Generate {output_type} based on the following {input_type}.
|
||||
@@ -198,6 +204,7 @@ Examples:
|
||||
```
|
||||
|
||||
### Transformation Template
|
||||
|
||||
```python
|
||||
TRANSFORMATION_TEMPLATE = """
|
||||
Transform the input {source_format} to {target_format}.
|
||||
@@ -219,6 +226,7 @@ Output {target_format}:"""
|
||||
## Advanced Features
|
||||
|
||||
### Template Inheritance
|
||||
|
||||
```python
|
||||
class TemplateRegistry:
|
||||
def __init__(self):
|
||||
@@ -251,6 +259,7 @@ registry.register('sentiment_analysis', {
|
||||
```
|
||||
|
||||
### Variable Validation
|
||||
|
||||
```python
|
||||
class ValidatedTemplate:
|
||||
def __init__(self, template, schema):
|
||||
@@ -294,6 +303,7 @@ template = ValidatedTemplate(
|
||||
```
|
||||
|
||||
### Template Caching
|
||||
|
||||
```python
|
||||
class CachedTemplate:
|
||||
def __init__(self, template):
|
||||
@@ -323,6 +333,7 @@ class CachedTemplate:
|
||||
## Multi-Turn Templates
|
||||
|
||||
### Conversation Template
|
||||
|
||||
```python
|
||||
class ConversationTemplate:
|
||||
def __init__(self, system_prompt):
|
||||
@@ -349,6 +360,7 @@ class ConversationTemplate:
|
||||
```

### State-Based Templates

```python
class StatefulTemplate:
    def __init__(self):
@@ -406,6 +418,7 @@ Here's the result: {result}
## Template Libraries

### Question Answering

```python
QA_TEMPLATES = {
    'factual': """Answer the question based on the context.
@@ -432,6 +445,7 @@ Assistant:"""
```

### Content Generation

```python
GENERATION_TEMPLATES = {
    'blog_post': """Write a blog post about {topic}.

@@ -11,6 +11,7 @@ System prompts set the foundation for LLM behavior. They define role, expertise,
```

### Example: Code Assistant

```
You are an expert software engineer with deep knowledge of Python, JavaScript, and system design.

@@ -36,6 +37,7 @@ Output format:
## Pattern Library

### 1. Customer Support Agent

```
You are a friendly, empathetic customer support representative for {company_name}.

@@ -59,6 +61,7 @@ Constraints:
```

### 2. Data Analyst

```
You are an experienced data analyst specializing in business intelligence.

@@ -85,6 +88,7 @@ Output:
```

### 3. Content Editor

```
You are a professional editor with expertise in {content_type}.

@@ -112,6 +116,7 @@ Format your feedback as:
## Advanced Techniques

### Dynamic Role Adaptation

```python
def build_adaptive_system_prompt(task_type, difficulty):
    base = "You are an expert assistant"
@@ -136,6 +141,7 @@ Expertise level: {difficulty}
```

### Constraint Specification

```
Hard constraints (MUST follow):
- Never generate harmful, biased, or illegal content

@@ -20,9 +20,11 @@ Master Retrieval-Augmented Generation (RAG) to build LLM applications that provi
## Core Components

### 1. Vector Databases

**Purpose**: Store and retrieve document embeddings efficiently

**Options:**

- **Pinecone**: Managed, scalable, serverless
- **Weaviate**: Open-source, hybrid search, GraphQL
- **Milvus**: High performance, on-premise
@@ -31,6 +33,7 @@ Master Retrieval-Augmented Generation (RAG) to build LLM applications that provi
- **pgvector**: PostgreSQL extension, SQL integration

### 2. Embeddings

**Purpose**: Convert text to numerical vectors for similarity search

**Models (2026):**

@@ -44,7 +47,9 @@ Master Retrieval-Augmented Generation (RAG) to build LLM applications that provi
| **multilingual-e5-large** | 1024 | Multi-language support |

### 3. Retrieval Strategies

**Approaches:**

- **Dense Retrieval**: Semantic similarity via embeddings
- **Sparse Retrieval**: Keyword matching (BM25, TF-IDF)
- **Hybrid Search**: Combine dense + sparse with weighted fusion
@@ -52,9 +57,11 @@ Master Retrieval-Augmented Generation (RAG) to build LLM applications that provi
- **HyDE**: Generate hypothetical documents for better retrieval
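The hybrid-search bullet above reduces to a weighted score fusion. A minimal pure-Python sketch, assuming scores are already min-max normalized per retriever (`hybrid_fuse` and the document IDs are illustrative, not from the source):

```python
def hybrid_fuse(dense_scores, sparse_scores, alpha=0.7):
    """Combine dense and sparse retrieval scores with weighted fusion.

    dense_scores / sparse_scores: dicts mapping doc_id -> normalized score.
    alpha: weight on the dense (semantic) side; 1 - alpha on sparse.
    """
    doc_ids = set(dense_scores) | set(sparse_scores)
    fused = {
        doc_id: alpha * dense_scores.get(doc_id, 0.0)
        + (1 - alpha) * sparse_scores.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


ranking = hybrid_fuse(
    dense_scores={"doc1": 0.9, "doc2": 0.4},
    sparse_scores={"doc2": 1.0, "doc3": 0.8},
)
```

Documents missing from one retriever simply score zero on that side, so a strong keyword hit can still surface a document the dense retriever missed.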

### 4. Reranking

**Purpose**: Improve retrieval quality by reordering results

**Methods:**

- **Cross-Encoders**: BERT-based reranking (ms-marco-MiniLM)
- **Cohere Rerank**: API-based reranking
- **Maximal Marginal Relevance (MMR)**: Diversity + relevance
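MMR from the list above is simple to implement directly over precomputed similarities. A minimal greedy sketch in pure Python (function and document names are illustrative):

```python
def mmr(query_sim, doc_sim, k=2, lambda_mult=0.5):
    """Greedy Maximal Marginal Relevance selection.

    query_sim: dict doc_id -> similarity to the query.
    doc_sim: dict (doc_id, doc_id) -> pairwise document similarity.
    lambda_mult: 1.0 = pure relevance, 0.0 = pure diversity.
    """
    selected = []
    candidates = set(query_sim)
    while candidates and len(selected) < k:
        def score(d):
            # Penalize similarity to anything already selected.
            redundancy = max(
                (doc_sim.get((d, s), doc_sim.get((s, d), 0.0)) for s in selected),
                default=0.0,
            )
            return lambda_mult * query_sim[d] - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


picked = mmr(
    query_sim={"a": 0.9, "b": 0.85, "c": 0.5},
    doc_sim={("a", "b"): 0.95, ("a", "c"): 0.1, ("b", "c"): 0.1},
    k=2,
)
# "a" wins on relevance; "c" then beats the near-duplicate "b" on diversity.
```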

@@ -255,6 +262,7 @@ hyde_rag = builder.compile()
## Document Chunking Strategies

### Recursive Character Text Splitter

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

@@ -269,6 +277,7 @@ chunks = splitter.split_documents(documents)
```

### Token-Based Splitting

```python
from langchain_text_splitters import TokenTextSplitter

@@ -280,6 +289,7 @@ splitter = TokenTextSplitter(
```

### Semantic Chunking

```python
from langchain_experimental.text_splitter import SemanticChunker

@@ -291,6 +301,7 @@ splitter = SemanticChunker(
```

### Markdown Header Splitter

```python
from langchain_text_splitters import MarkdownHeaderTextSplitter

@@ -309,6 +320,7 @@ splitter = MarkdownHeaderTextSplitter(
## Vector Store Configurations

### Pinecone (Serverless)

```python
from pinecone import Pinecone, ServerlessSpec
from langchain_pinecone import PineconeVectorStore
@@ -331,6 +343,7 @@ vectorstore = PineconeVectorStore(index=index, embedding=embeddings)
```

### Weaviate

```python
import weaviate
from langchain_weaviate import WeaviateVectorStore
@@ -346,6 +359,7 @@ vectorstore = WeaviateVectorStore(
```

### Chroma (Local Development)

```python
from langchain_chroma import Chroma

@@ -357,6 +371,7 @@ vectorstore = Chroma(
```

### pgvector (PostgreSQL)

```python
from langchain_postgres.vectorstores import PGVector

@@ -372,6 +387,7 @@ vectorstore = PGVector(
## Retrieval Optimization

### 1. Metadata Filtering

```python
from langchain_core.documents import Document

@@ -394,6 +410,7 @@ results = await vectorstore.asimilarity_search(
```

### 2. Maximal Marginal Relevance (MMR)

```python
# Balance relevance with diversity
results = await vectorstore.amax_marginal_relevance_search(
@@ -405,6 +422,7 @@ results = await vectorstore.amax_marginal_relevance_search(
```

### 3. Reranking with Cross-Encoder

```python
from sentence_transformers import CrossEncoder

@@ -424,6 +442,7 @@ async def retrieve_and_rerank(query: str, k: int = 5) -> list[Document]:
```

### 4. Cohere Rerank

```python
from langchain_cohere import CohereRerank
@@ -440,6 +459,7 @@ reranked_retriever = ContextualCompressionRetriever(
## Prompt Engineering for RAG

### Contextual Prompt with Citations

```python
rag_prompt = ChatPromptTemplate.from_template(
    """Answer the question based on the context below. Include citations using [1], [2], etc.
@@ -461,6 +481,7 @@ rag_prompt = ChatPromptTemplate.from_template(
```

### Structured Output for RAG

```python
from pydantic import BaseModel, Field

@@ -20,12 +20,12 @@ Patterns for implementing efficient similarity search in production systems.

### 1. Distance Metrics

| Metric | Formula | Best For |
|--------|---------|----------|
| **Cosine** | 1 - (A·B)/(‖A‖‖B‖) | Normalized embeddings |
| **Euclidean (L2)** | √Σ(a-b)² | Raw embeddings |
| **Dot Product** | A·B | Magnitude matters |
| **Manhattan (L1)** | Σ\|a-b\| | Sparse vectors |
| Metric | Formula | Best For |
| ------------------ | ------------------ | --------------------- |
| **Cosine** | 1 - (A·B)/(‖A‖‖B‖) | Normalized embeddings |
| **Euclidean (L2)** | √Σ(a-b)² | Raw embeddings |
| **Dot Product** | A·B | Magnitude matters |
| **Manhattan (L1)** | Σ\|a-b\| | Sparse vectors |
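The four metrics in the table translate directly to code; a small stdlib-only sketch matching the formulas above:

```python
import math

def cosine_distance(a, b):
    # 1 - (A·B) / (‖A‖‖B‖)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1 - dot / (norm_a * norm_b)

def euclidean(a, b):
    # √Σ(a-b)²
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dot_product(a, b):
    # A·B — larger is more similar, so negate if a distance is needed
    return sum(x * y for x, y in zip(a, b))

def manhattan(a, b):
    # Σ|a-b|
    return sum(abs(x - y) for x, y in zip(a, b))
```

For two orthogonal unit vectors such as `[1, 0]` and `[0, 1]`, these give cosine distance 1, Euclidean √2, dot product 0, and Manhattan 2.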

### 2. Index Types

@@ -538,6 +538,7 @@ class WeaviateVectorStore:
## Best Practices

### Do's

- **Use appropriate index** - HNSW for most cases
- **Tune parameters** - ef_search, nprobe for recall/speed
- **Implement hybrid search** - Combine with keyword search
@@ -545,6 +546,7 @@ class WeaviateVectorStore:
- **Pre-filter when possible** - Reduce search space

### Don'ts

- **Don't skip evaluation** - Measure before optimizing
- **Don't over-index** - Start with flat, scale up
- **Don't ignore latency** - P99 matters for UX

@@ -31,11 +31,11 @@ Data Size Recommended Index

### 2. HNSW Parameters

| Parameter | Default | Effect |
|-----------|---------|--------|
| **M** | 16 | Connections per node, ↑ = better recall, more memory |
| **efConstruction** | 100 | Build quality, ↑ = better index, slower build |
| **efSearch** | 50 | Search quality, ↑ = better recall, slower search |
| Parameter | Default | Effect |
| ------------------ | ------- | ---------------------------------------------------- |
| **M** | 16 | Connections per node, ↑ = better recall, more memory |
| **efConstruction** | 100 | Build quality, ↑ = better index, slower build |
| **efSearch** | 50 | Search quality, ↑ = better recall, slower search |

### 3. Quantization Types

@@ -502,6 +502,7 @@ def profile_index_build(
## Best Practices

### Do's

- **Benchmark with real queries** - Synthetic may not represent production
- **Monitor recall continuously** - Can degrade with data drift
- **Start with defaults** - Tune only when needed
@@ -509,6 +510,7 @@ def profile_index_build(
- **Consider tiered storage** - Hot/cold data separation
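Recall monitoring from the Do's above reduces to comparing approximate results against an exact brute-force baseline; a minimal recall@k sketch in pure Python (helper name and IDs are illustrative, not from the source):

```python
def recall_at_k(approx_ids, exact_ids, k=10):
    """Fraction of the true top-k neighbors the ANN index returned."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k


# Hypothetical example: the ANN index missed one of the ten true neighbors.
exact = [3, 7, 1, 9, 4, 0, 8, 2, 6, 5]
approx = [3, 7, 1, 9, 4, 0, 8, 2, 6, 42]
score = recall_at_k(approx, exact, k=10)
```

Sampling live queries and recomputing this against periodic brute-force runs is what catches recall degradation from data drift.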

### Don'ts

- **Don't over-optimize early** - Profile first
- **Don't ignore build time** - Index updates have cost
- **Don't forget reindexing** - Plan for maintenance