Mirror of https://github.com/wshobson/agents.git, synced 2026-03-18 09:37:15 +00:00
fix(skills): remove phantom resource references and fix CoC links (#447)
Remove references to non-existent resource files (references/, assets/, scripts/, examples/) from 115 skill SKILL.md files. These sections pointed to directories and files that were never created, causing confusion when users install skills. Also fix broken Code of Conduct links in issue templates to use absolute GitHub URLs instead of relative paths that 404.
@@ -598,11 +598,3 @@ def compare_embedding_models(
 - **Don't skip preprocessing**: Garbage in, garbage out
 - **Don't over-chunk**: Lose important context
 - **Don't forget metadata**: Essential for filtering and debugging
-
-## Resources
-
-- [Voyage AI Documentation](https://docs.voyageai.com/)
-- [OpenAI Embeddings Guide](https://platform.openai.com/docs/guides/embeddings)
-- [Sentence Transformers](https://www.sbert.net/)
-- [MTEB Benchmark](https://huggingface.co/spaces/mteb/leaderboard)
-- [LangChain Embedding Models](https://python.langchain.com/docs/integrations/text_embedding/)
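The "Don't over-chunk" advice in the hunk above can be illustrated with a minimal sketch of overlapping chunking. It assumes the text is already tokenized into a list; the function name `chunk_tokens` and the parameters `chunk_size` and `overlap` are hypothetical and do not come from the skill file, and the default values are illustrative only.

```python
def chunk_tokens(tokens, chunk_size=500, overlap=50):
    """Split a token list into windows of `chunk_size` tokens.

    Consecutive windows share `overlap` tokens so that text near a
    boundary keeps some surrounding context. Names and defaults are
    illustrative assumptions, not values taken from the skill file.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window has reached the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return chunks
```

For example, `chunk_tokens(list(range(10)), chunk_size=4, overlap=1)` produces three windows, each sharing one token with its neighbor.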
@@ -562,9 +562,3 @@ class HybridRAGPipeline:
 - **Don't skip keyword search** - Handles exact matches better
 - **Don't over-fetch** - Balance recall vs latency
 - **Don't ignore edge cases** - Empty results, single word queries
-
-## Resources
-
-- [RRF Paper](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf)
-- [Vespa Hybrid Search](https://blog.vespa.ai/improving-text-ranking-with-few-shot-prompting/)
-- [Cohere Rerank](https://docs.cohere.com/docs/reranking)
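As a rough illustration of how keyword and vector results might be combined, here is a minimal sketch of Reciprocal Rank Fusion, the method described in the RRF paper listed in this hunk. It is not the `HybridRAGPipeline` code from the skill file: the function name `reciprocal_rank_fusion` is hypothetical, the inputs are assumed to be ranked lists of string document IDs, and `k=60` is the constant used in the paper. Empty result lists, one of the edge cases named above, simply contribute nothing.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears
    in, with ranks starting at 1. Ties are broken by document ID, which
    assumes the IDs are mutually comparable (for example, all strings).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ID as a deterministic tie-breaker.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

A document that appears in both lists tends to outrank one that appears high in only a single list, which is why the fusion needs no score normalization between the two retrievers.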
@@ -632,35 +632,3 @@ index = pc.Index("my-index")
 # Create vector store with existing index
 vectorstore = PineconeVectorStore(index=index, embedding=embeddings)
 ```
-
-## Resources
-
-- [LangChain Documentation](https://python.langchain.com/docs/)
-- [LangGraph Documentation](https://langchain-ai.github.io/langgraph/)
-- [LangSmith Platform](https://smith.langchain.com/)
-- [LangChain GitHub](https://github.com/langchain-ai/langchain)
-- [LangGraph GitHub](https://github.com/langchain-ai/langgraph)
-
-## Common Pitfalls
-
-1. **Using Deprecated APIs**: Use LangGraph for agents, not `initialize_agent`
-2. **Memory Overflow**: Use checkpointers with TTL for long-running agents
-3. **Poor Tool Descriptions**: Clear descriptions help LLM select correct tools
-4. **Context Window Exceeded**: Use summarization or sliding window memory
-5. **No Error Handling**: Wrap tool functions with try/except
-6. **Blocking Operations**: Use async methods (`ainvoke`, `astream`)
-7. **Missing Observability**: Always enable LangSmith tracing in production
-
-## Production Checklist
-
-- [ ] Use LangGraph StateGraph for agent orchestration
-- [ ] Implement async patterns throughout (`ainvoke`, `astream`)
-- [ ] Add production checkpointer (PostgreSQL, Redis)
-- [ ] Enable LangSmith tracing
-- [ ] Implement structured tools with Pydantic schemas
-- [ ] Add timeout limits for agent execution
-- [ ] Implement rate limiting
-- [ ] Add comprehensive error handling
-- [ ] Set up health checks
-- [ ] Version control prompts and configurations
-- [ ] Write integration tests for agent workflows
@@ -664,32 +664,3 @@ class BenchmarkRunner:
 for metric, scores in results.items()
 }
 ```
-
-## Resources
-
-- [LangSmith Evaluation Guide](https://docs.smith.langchain.com/evaluation)
-- [RAGAS Framework](https://docs.ragas.io/)
-- [DeepEval Library](https://docs.deepeval.com/)
-- [Arize Phoenix](https://docs.arize.com/phoenix/)
-- [HELM Benchmark](https://crfm.stanford.edu/helm/)
-
-## Best Practices
-
-1. **Multiple Metrics**: Use diverse metrics for comprehensive view
-2. **Representative Data**: Test on real-world, diverse examples
-3. **Baselines**: Always compare against baseline performance
-4. **Statistical Rigor**: Use proper statistical tests for comparisons
-5. **Continuous Evaluation**: Integrate into CI/CD pipeline
-6. **Human Validation**: Combine automated metrics with human judgment
-7. **Error Analysis**: Investigate failures to understand weaknesses
-8. **Version Control**: Track evaluation results over time
-
-## Common Pitfalls
-
-- **Single Metric Obsession**: Optimizing for one metric at the expense of others
-- **Small Sample Size**: Drawing conclusions from too few examples
-- **Data Contamination**: Testing on training data
-- **Ignoring Variance**: Not accounting for statistical uncertainty
-- **Metric Mismatch**: Using metrics not aligned with business goals
-- **Position Bias**: In pairwise evals, randomize order
-- **Overfitting Prompts**: Optimizing for test set instead of real use
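The "Position Bias" item in the hunk above can be sketched as follows. This is a minimal illustration under stated assumptions, not code from the skill file: `judge` is a hypothetical callable that receives `(prompt, first, second)` and returns the string `"first"` or `"second"`, and `pairwise_winner` is a hypothetical helper name. The presentation order is shuffled per comparison and the verdict is mapped back to the original labels.

```python
import random


def pairwise_winner(judge, prompt, answer_a, answer_b, rng=None):
    """Compare two answers while randomizing which one is shown first.

    `judge` is a hypothetical callable returning "first" or "second".
    The return value is "a" or "b", referring to the original answers,
    so a judge that favors one position cannot favor one system.
    """
    rng = rng or random.Random()
    swapped = rng.random() < 0.5
    first, second = (answer_b, answer_a) if swapped else (answer_a, answer_b)

    verdict = judge(prompt, first, second)
    if verdict not in ("first", "second"):
        raise ValueError(f"unexpected verdict: {verdict!r}")

    first_won = verdict == "first"
    # If the order was swapped, the first slot held answer_b.
    return "b" if first_won == swapped else "a"
```

Passing a seeded `random.Random` makes a run reproducible. With a judge that always prefers the first slot, wins split between `"a"` and `"b"` across comparisons instead of accumulating on one side.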
@@ -471,10 +471,3 @@ Track these KPIs for your prompts:
 - **Token Usage**: Average tokens per request
 - **Success Rate**: Percentage of valid, parseable outputs
 - **User Satisfaction**: Ratings and feedback
-
-## Resources
-
-- [Anthropic Prompt Engineering Guide](https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering)
-- [Claude Prompt Caching](https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching)
-- [OpenAI Prompt Engineering](https://platform.openai.com/docs/guides/prompt-engineering)
-- [LangChain Prompts](https://python.langchain.com/docs/concepts/prompts/)
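One way the "Success Rate" KPI above might be computed is sketched below. It assumes, purely for illustration, that a "valid, parseable" output means a string that parses as JSON; the skill file does not specify the output format, and the function name `success_rate` is hypothetical.

```python
import json


def success_rate(outputs):
    """Return the percentage of outputs that parse as JSON.

    Treating "parseable" as "valid JSON" is an assumption made for this
    sketch; substitute whatever validation the prompt's format requires.
    """
    if not outputs:
        return 0.0

    valid = 0
    for text in outputs:
        try:
            json.loads(text)
        except (json.JSONDecodeError, TypeError):
            # Malformed text or a non-string value counts as a failure.
            continue
        valid += 1
    return 100.0 * valid / len(outputs)
```

Tracking this figure per prompt version makes regressions in output formatting visible before they reach downstream parsers.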
@@ -540,31 +540,3 @@ async def evaluate_rag_system(
 
 return {k: sum(v) / len(v) for k, v in metrics.items()}
 ```
-
-## Resources
-
-- [LangChain RAG Tutorial](https://python.langchain.com/docs/tutorials/rag/)
-- [LangGraph RAG Examples](https://langchain-ai.github.io/langgraph/tutorials/rag/)
-- [Pinecone Best Practices](https://docs.pinecone.io/guides/get-started/overview)
-- [Voyage AI Embeddings](https://docs.voyageai.com/)
-- [RAG Evaluation Guide](https://docs.ragas.io/)
-
-## Best Practices
-
-1. **Chunk Size**: Balance between context (larger) and specificity (smaller) - typically 500-1000 tokens
-2. **Overlap**: Use 10-20% overlap to preserve context at boundaries
-3. **Metadata**: Include source, page, timestamp for filtering and debugging
-4. **Hybrid Search**: Combine semantic and keyword search for best recall
-5. **Reranking**: Use cross-encoder reranking for precision-critical applications
-6. **Citations**: Always return source documents for transparency
-7. **Evaluation**: Continuously test retrieval quality and answer accuracy
-8. **Monitoring**: Track retrieval metrics and latency in production
-
-## Common Issues
-
-- **Poor Retrieval**: Check embedding quality, chunk size, query formulation
-- **Irrelevant Results**: Add metadata filtering, use hybrid search, rerank
-- **Missing Information**: Ensure documents are properly indexed, check chunking
-- **Slow Queries**: Optimize vector store, use caching, reduce k
-- **Hallucinations**: Improve grounding prompt, add verification step
-- **Context Too Long**: Use compression or parent document retriever
@@ -551,10 +551,3 @@ class WeaviateVectorStore:
 - **Don't over-index** - Start with flat, scale up
 - **Don't ignore latency** - P99 matters for UX
 - **Don't forget costs** - Vector storage adds up
-
-## Resources
-
-- [Pinecone Docs](https://docs.pinecone.io/)
-- [Qdrant Docs](https://qdrant.tech/documentation/)
-- [pgvector](https://github.com/pgvector/pgvector)
-- [Weaviate Docs](https://weaviate.io/developers/weaviate)
@@ -515,9 +515,3 @@ def profile_index_build(
 - **Don't ignore build time** - Index updates have cost
 - **Don't forget reindexing** - Plan for maintenance
 - **Don't skip warming** - Cold indexes are slow
-
-## Resources
-
-- [HNSW Paper](https://arxiv.org/abs/1603.09320)
-- [Faiss Wiki](https://github.com/facebookresearch/faiss/wiki)
-- [ANN Benchmarks](https://ann-benchmarks.com/)