agents/tools/data-pipeline.md
Seth Hobson 3802bca865 Refine plugin marketplace for launch readiness
Plugin Scope Improvements:
- Remove language-specialists plugin (not task-focused)
- Split specialized-domains into 5 focused plugins:
  * blockchain-web3 - Smart contract development only
  * quantitative-trading - Financial modeling and trading only
  * payment-processing - Payment gateway integration only
  * game-development - Unity and Minecraft only
  * accessibility-compliance - WCAG auditing only
- Split business-operations into 3 focused plugins:
  * business-analytics - Metrics and reporting only
  * hr-legal-compliance - HR and legal docs only
  * customer-sales-automation - Support and sales workflows only
- Fix infrastructure-devops scope:
  * Remove database concerns (db-migrate, database-admin)
  * Remove observability concerns (observability-engineer)
  * Move slo-implement to incident-response
  * Focus purely on container orchestration (K8s, Docker, Terraform)
- Fix customer-sales-automation scope:
  * Remove content-marketer (unrelated to customer/sales workflows)

Marketplace Statistics:
- Total plugins: 27 (was 22)
- Tool coverage: 100% (42/42 tools referenced)
- Fat plugins removed: 3 (language-specialists, specialized-domains, business-operations)
- All plugins now have clear, focused tasks

Model Migration:
- Migrate all 42 tools from claude-sonnet-4-0/opus-4-1 to model: sonnet
- Migrate all 15 workflows from claude-opus-4-1 to model: sonnet
- Use short model syntax consistent with agent files

Documentation Updates:
- Update README.md with refined plugin structure
- Update plugin descriptions to be task-focused
- Remove anthropomorphic and marketing language
- Improve category organization (now 16 distinct categories)

Ready for October 9, 2025 @ 9am PST launch
2025-10-08 20:54:29 -04:00

model: sonnet

Data Pipeline Architecture

Design and implement a scalable data pipeline for: $ARGUMENTS

Create a production-ready data pipeline including:

  1. Data Ingestion:

    • Multiple source connectors (APIs, databases, files, streams)
    • Schema evolution handling
    • Incremental/batch loading
    • Data quality checks at ingestion
    • Dead letter queue for failures
  2. Transformation Layer:

    • ETL/ELT architecture decision
    • Apache Beam/Spark transformations
    • Data cleansing and normalization
    • Feature engineering pipeline
    • Business logic implementation
  3. Orchestration:

    • Airflow/Prefect DAGs
    • Dependency management
    • Retry and failure handling
    • SLA monitoring
    • Dynamic pipeline generation
  4. Storage Strategy:

    • Data lake architecture
    • Partitioning strategy
    • Compression choices
    • Retention policies
    • Hot/cold storage tiers
  5. Streaming Pipeline:

    • Kafka/Kinesis integration
    • Real-time processing
    • Windowing strategies
    • Late data handling
    • Exactly-once semantics
  6. Data Quality:

    • Automated testing
    • Data profiling
    • Anomaly detection
    • Lineage tracking
    • Quality metrics and dashboards
  7. Performance & Scale:

    • Horizontal scaling
    • Resource optimization
    • Caching strategies
    • Query optimization
    • Cost management
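
As an illustration of how the ingestion and failure-handling items above can fit together, here is a minimal sketch in plain Python. It combines a quality check at ingestion, bounded retries with exponential backoff, and a dead letter queue. The names `validate`, `ingest`, `IngestResult`, and the `sink` callable are hypothetical, and the validation rule (an `id` plus a numeric `amount`) is an assumption chosen only for the example.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class IngestResult:
    loaded: list = field(default_factory=list)        # records written to the sink
    dead_letters: list = field(default_factory=list)  # (record, reason) pairs


def validate(record: dict) -> None:
    """Hypothetical quality check: require an 'id' and a numeric 'amount'."""
    if "id" not in record:
        raise ValueError("missing id")
    if not isinstance(record.get("amount"), (int, float)):
        raise ValueError("amount is not numeric")


def ingest(
    records: Iterable[dict],
    sink: Callable[[dict], None],
    max_attempts: int = 3,
    backoff_seconds: float = 0.0,
) -> IngestResult:
    """Validate each record, write it with retries, and dead-letter failures."""
    result = IngestResult()
    for record in records:
        # Invalid records are never retried; they go straight to the DLQ.
        try:
            validate(record)
        except ValueError as exc:
            result.dead_letters.append((record, f"invalid: {exc}"))
            continue

        # Transient sink errors are retried up to max_attempts times.
        for attempt in range(1, max_attempts + 1):
            try:
                sink(record)
                result.loaded.append(record)
                break
            except ConnectionError as exc:
                if attempt == max_attempts:
                    result.dead_letters.append((record, f"sink failed: {exc}"))
                else:
                    # Exponential backoff: base, 2x base, 4x base, ...
                    time.sleep(backoff_seconds * 2 ** (attempt - 1))
    return result
```

In a real pipeline the dead letter list would more likely be a durable queue or table, and the orchestrator (for example Airflow or Prefect) would typically own task-level retries; the in-process loop here only shows the control flow.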

Include monitoring, alerting, and data governance considerations. Keep the design cloud-agnostic, with specific implementation examples for AWS, GCP, and Azure.
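
One way to read the cloud-agnostic requirement is to have pipeline code depend on a small storage interface and supply one adapter per provider. The sketch below is an assumption about how that could look, not a prescribed design: `ObjectStore`, `InMemoryStore`, `partition_key`, and `write_partitioned` are hypothetical names, and the in-memory class stands in for adapters that would wrap S3, GCS, or Azure Blob Storage clients. The key layout uses Hive-style `year=/month=/day=` date partitions.

```python
from abc import ABC, abstractmethod
from datetime import date


class ObjectStore(ABC):
    """Hypothetical provider-neutral interface for object storage."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def list_keys(self, prefix: str) -> list[str]: ...


class InMemoryStore(ObjectStore):
    """Stand-in adapter for tests; a cloud adapter would call the provider SDK."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def list_keys(self, prefix: str) -> list[str]:
        return sorted(k for k in self._objects if k.startswith(prefix))


def partition_key(dataset: str, day: date, filename: str) -> str:
    """Build a Hive-style date-partitioned object key."""
    return (
        f"{dataset}/year={day.year:04d}/month={day.month:02d}"
        f"/day={day.day:02d}/{filename}"
    )


def write_partitioned(
    store: ObjectStore, dataset: str, day: date, filename: str, data: bytes
) -> str:
    """Write data under its date partition and return the key used."""
    key = partition_key(dataset, day, filename)
    store.put(key, data)
    return key
```

Because readers list by prefix, a query for one month touches only that month's keys, which is the point of choosing a partitioning strategy up front.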