agents/tools/data-pipeline.md
Seth Hobson d2f3886ae1 Consolidate workflows and tools from commands repository
2025-10-08 08:28:33 -04:00

model: claude-sonnet-4-0

Data Pipeline Architecture

Design and implement a scalable data pipeline for: $ARGUMENTS

Create a production-ready data pipeline including:

  1. Data Ingestion:

    • Multiple source connectors (APIs, databases, files, streams)
    • Schema evolution handling
    • Incremental/batch loading
    • Data quality checks at ingestion
    • Dead letter queue for failures
  2. Transformation Layer:

    • ETL/ELT architecture decision
    • Apache Beam/Spark transformations
    • Data cleansing and normalization
    • Feature engineering pipeline
    • Business logic implementation
  3. Orchestration:

    • Airflow/Prefect DAGs
    • Dependency management
    • Retry and failure handling
    • SLA monitoring
    • Dynamic pipeline generation
  4. Storage Strategy:

    • Data lake architecture
    • Partitioning strategy
    • Compression choices
    • Retention policies
    • Hot/cold storage tiers
  5. Streaming Pipeline:

    • Kafka/Kinesis integration
    • Real-time processing
    • Windowing strategies
    • Late data handling
    • Exactly-once semantics
  6. Data Quality:

    • Automated testing
    • Data profiling
    • Anomaly detection
    • Lineage tracking
    • Quality metrics and dashboards
  7. Performance & Scale:

    • Horizontal scaling
    • Resource optimization
    • Caching strategies
    • Query optimization
    • Cost management
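
As a point of reference for the ingestion and orchestration items above (quality checks at ingestion, retry and failure handling, dead letter queue), here is a minimal sketch. It assumes in-memory records and a caller-supplied sink. The names `validate`, `with_retry`, `ingest`, and `IngestResult`, and the `id`/`amount` fields, are hypothetical and only illustrate the pattern; a real pipeline would route dead letters to a durable queue or table rather than a list.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable

Record = dict[str, Any]


@dataclass
class IngestResult:
    """Outcome of one ingestion run (hypothetical container for this sketch)."""
    loaded: list[Record] = field(default_factory=list)
    # Each dead letter keeps the original record plus the reason it failed.
    dead_letters: list[tuple[Record, str]] = field(default_factory=list)


def validate(record: Record) -> None:
    """Assumed quality rule: an integer id and a non-negative numeric amount."""
    if not isinstance(record.get("id"), int):
        raise ValueError("missing or non-integer id")
    amount = record.get("amount")
    if not isinstance(amount, (int, float)) or amount < 0:
        raise ValueError("amount must be a non-negative number")


def with_retry(sink: Callable[[Record], None], record: Record,
               attempts: int = 3, base_delay: float = 0.01) -> None:
    """Call the sink, retrying transient ConnectionError with exponential backoff."""
    for attempt in range(attempts):
        try:
            sink(record)
            return
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # retries exhausted; let the caller dead-letter the record
            time.sleep(base_delay * 2 ** attempt)


def ingest(records: list[Record], sink: Callable[[Record], None],
           attempts: int = 3) -> IngestResult:
    """Validate each record, write it with retries, and dead-letter any failure."""
    result = IngestResult()
    for record in records:
        try:
            validate(record)
            with_retry(sink, record, attempts=attempts)
            result.loaded.append(record)
        except (ValueError, ConnectionError) as exc:
            # One bad record never stops the batch; it is kept for later replay.
            result.dead_letters.append((record, str(exc)))
    return result
```

Keeping the failure reason next to each dead-lettered record makes replay and root-cause analysis easier. Validation errors are not retried, because retrying cannot fix bad data.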

Include monitoring, alerting, and data governance considerations. Keep the core design cloud-agnostic, and provide specific implementation examples for AWS, GCP, and Azure.
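
One way to keep the design cloud-agnostic is to code the pipeline against a small storage interface and supply one adapter per provider. The sketch below is an assumption-laden illustration, not a prescribed design: `ObjectStore`, `LocalStore`, `partition_key`, and `write_partition` are hypothetical names, and only a local-filesystem adapter is shown. Adapters for S3, GCS, or Azure Blob Storage would implement the same two methods using each provider's SDK. The key layout uses Hive-style date partitions (`dt=YYYY-MM-DD`), a common choice for the partitioning strategy item.

```python
import tempfile
from datetime import date
from pathlib import Path
from typing import Protocol


class ObjectStore(Protocol):
    """Minimal storage contract the pipeline depends on (hypothetical)."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class LocalStore:
    """Filesystem-backed adapter, standing in for a cloud object store."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def put(self, key: str, data: bytes) -> None:
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self.root / key).read_bytes()


def partition_key(dataset: str, day: date, filename: str) -> str:
    """Build a Hive-style, date-partitioned object key."""
    return f"{dataset}/dt={day.isoformat()}/{filename}"


def write_partition(store: ObjectStore, dataset: str, day: date,
                    filename: str, data: bytes) -> str:
    """Write one file into its date partition and return the key used."""
    key = partition_key(dataset, day, filename)
    store.put(key, data)
    return key


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = LocalStore(Path(tmp))
        key = write_partition(store, "orders", date(2025, 10, 8),
                              "part-0000.json", b"{}")
        print(key)  # orders/dt=2025-10-08/part-0000.json
```

Because the pipeline code only sees `ObjectStore`, swapping providers becomes a configuration change rather than a rewrite, and the local adapter doubles as a test fixture.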