style: format all files with prettier

2026-03-18 09:37:15 +00:00 · 2026-01-19 17:07:03 -05:00
parent 8d37048deb
commit 56848874a2
355 changed files with 15215 additions and 10241 deletions
--- a/plugins/data-engineering/commands/data-pipeline.md
+++ b/plugins/data-engineering/commands/data-pipeline.md
@@ -20,26 +20,32 @@ $ARGUMENTS
 ## Instructions

 ### 1. Architecture Design
+
 - Assess: sources, volume, latency requirements, targets
 - Select pattern: ETL (transform before load), ELT (load then transform), Lambda (batch + speed layers), Kappa (stream-only), Lakehouse (unified)
 - Design flow: sources → ingestion → processing → storage → serving
 - Add observability touchpoints

 ### 2. Ingestion Implementation
+
 **Batch**
+
 - Incremental loading with watermark columns
 - Retry logic with exponential backoff
 - Schema validation and dead letter queue for invalid records
-- Metadata tracking (_extracted_at, _source)
+- Metadata tracking (\_extracted_at, \_source)

 **Streaming**
+
 - Kafka consumers with exactly-once semantics
 - Manual offset commits within transactions
 - Windowing for time-based aggregations
 - Error handling and replay capability

 ### 3. Orchestration
+
 **Airflow**
+
 - Task groups for logical organization
 - XCom for inter-task communication
 - SLA monitoring and email alerts
@@ -47,12 +53,14 @@ $ARGUMENTS
 - Retry with exponential backoff

 **Prefect**
+
 - Task caching for idempotency
 - Parallel execution with .submit()
 - Artifacts for visibility
 - Automatic retries with configurable delays

 ### 4. Transformation with dbt
+
 - Staging layer: incremental materialization, deduplication, late-arriving data handling
 - Marts layer: dimensional models, aggregations, business logic
 - Tests: unique, not_null, relationships, accepted_values, custom data quality tests
@@ -60,7 +68,9 @@ $ARGUMENTS
 - Incremental strategy: merge or delete+insert

 ### 5. Data Quality Framework
+
 **Great Expectations**
+
 - Table-level: row count, column count
 - Column-level: uniqueness, nullability, type validation, value sets, ranges
 - Checkpoints for validation execution
@@ -68,12 +78,15 @@ $ARGUMENTS
 - Failure notifications

 **dbt Tests**
+
 - Schema tests in YAML
 - Custom data quality tests with dbt-expectations
 - Test results tracked in metadata

 ### 6. Storage Strategy
+
 **Delta Lake**
+
 - ACID transactions with append/overwrite/merge modes
 - Upsert with predicate-based matching
 - Time travel for historical queries
@@ -81,6 +94,7 @@ $ARGUMENTS
 - Vacuum to remove old files

 **Apache Iceberg**
+
 - Partitioning and sort order optimization
 - MERGE INTO for upserts
 - Snapshot isolation and time travel
@@ -88,7 +102,9 @@ $ARGUMENTS
 - Snapshot expiration for cleanup

 ### 7. Monitoring & Cost Optimization
+
 **Monitoring**
+
 - Track: records processed/failed, data size, execution time, success/failure rates
 - CloudWatch metrics and custom namespaces
 - SNS alerts for critical/warning/info events
@@ -96,6 +112,7 @@ $ARGUMENTS
 - Performance trend analysis

 **Cost Optimization**
+
 - Partitioning: date/entity-based, avoid over-partitioning (keep >1GB)
 - File sizes: 512MB-1GB for Parquet
 - Lifecycle policies: hot (Standard) → warm (IA) → cold (Glacier)
@@ -144,12 +161,14 @@ ingester.save_dead_letter_queue('s3://lake/dlq/orders')
 ## Output Deliverables

 ### 1. Architecture Documentation
+
 - Architecture diagram with data flow
 - Technology stack with justification
 - Scalability analysis and growth patterns
 - Failure modes and recovery strategies

 ### 2. Implementation Code
+
 - Ingestion: batch/streaming with error handling
 - Transformation: dbt models (staging → marts) or Spark jobs
 - Orchestration: Airflow/Prefect DAGs with dependencies
@@ -157,18 +176,21 @@ ingester.save_dead_letter_queue('s3://lake/dlq/orders')
 - Data quality: Great Expectations suites and dbt tests

 ### 3. Configuration Files
+
 - Orchestration: DAG definitions, schedules, retry policies
 - dbt: models, sources, tests, project config
 - Infrastructure: Docker Compose, K8s manifests, Terraform
 - Environment: dev/staging/prod configs

 ### 4. Monitoring & Observability
+
 - Metrics: execution time, records processed, quality scores
 - Alerts: failures, performance degradation, data freshness
 - Dashboards: Grafana/CloudWatch for pipeline health
 - Logging: structured logs with correlation IDs

 ### 5. Operations Guide
+
 - Deployment procedures and rollback strategy
 - Troubleshooting guide for common issues
 - Scaling guide for increased volume
@@ -176,6 +198,7 @@ ingester.save_dead_letter_queue('s3://lake/dlq/orders')
 - Disaster recovery and backup procedures

 ## Success Criteria
+
 - Pipeline meets defined SLA (latency, throughput)
 - Data quality checks pass with >99% success rate
 - Automatic retry and alerting on failures