style: format all files with prettier

2026-03-18 09:37:15 +00:00 · 2026-01-19 17:07:03 -05:00
parent 8d37048deb
commit 56848874a2
355 changed files with 15215 additions and 10241 deletions
--- a/plugins/data-engineering/commands/data-driven-feature.md
+++ b/plugins/data-engineering/commands/data-driven-feature.md
@@ -7,17 +7,20 @@ Build features guided by data insights, A/B testing, and continuous measurement
 ## Phase 1: Data Analysis and Hypothesis Formation

 ### 1. Exploratory Data Analysis
+
 - Use Task tool with subagent_type="machine-learning-ops::data-scientist"
 - Prompt: "Perform exploratory data analysis for feature: $ARGUMENTS. Analyze existing user behavior data, identify patterns and opportunities, segment users by behavior, and calculate baseline metrics. Use modern analytics tools (Amplitude, Mixpanel, Segment) to understand current user journeys, conversion funnels, and engagement patterns."
 - Output: EDA report with visualizations, user segments, behavioral patterns, baseline metrics

 ### 2. Business Hypothesis Development
+
 - Use Task tool with subagent_type="business-analytics::business-analyst"
 - Context: Data scientist's EDA findings and behavioral patterns
 - Prompt: "Formulate business hypotheses for feature: $ARGUMENTS based on data analysis. Define clear success metrics, expected impact on key business KPIs, target user segments, and minimum detectable effects. Create measurable hypotheses using frameworks like ICE scoring or RICE prioritization."
 - Output: Hypothesis document, success metrics definition, expected ROI calculations

 ### 3. Statistical Experiment Design
+
 - Use Task tool with subagent_type="machine-learning-ops::data-scientist"
 - Context: Business hypotheses and success metrics
 - Prompt: "Design statistical experiment for feature: $ARGUMENTS. Calculate required sample size for statistical power, define control and treatment groups, specify randomization strategy, and plan for multiple testing corrections. Consider Bayesian A/B testing approaches for faster decision making. Design for both primary and guardrail metrics."
@@ -26,18 +29,21 @@ Build features guided by data insights, A/B testing, and continuous measurement
 ## Phase 2: Feature Architecture and Analytics Design

 ### 4. Feature Architecture Planning
+
 - Use Task tool with subagent_type="data-engineering::backend-architect"
 - Context: Business requirements and experiment design
 - Prompt: "Design feature architecture for: $ARGUMENTS with A/B testing capability. Include feature flag integration (LaunchDarkly, Split.io, or Optimizely), gradual rollout strategy, circuit breakers for safety, and clean separation between control and treatment logic. Ensure architecture supports real-time configuration updates."
 - Output: Architecture diagrams, feature flag schema, rollout strategy

 ### 5. Analytics Instrumentation Design
+
 - Use Task tool with subagent_type="data-engineering::data-engineer"
 - Context: Feature architecture and success metrics
 - Prompt: "Design comprehensive analytics instrumentation for: $ARGUMENTS. Define event schemas for user interactions, specify properties for segmentation and analysis, design funnel tracking and conversion events, plan cohort analysis capabilities. Implement using modern SDKs (Segment, Amplitude, Mixpanel) with proper event taxonomy."
 - Output: Event tracking plan, analytics schema, instrumentation guide

 ### 6. Data Pipeline Architecture
+
 - Use Task tool with subagent_type="data-engineering::data-engineer"
 - Context: Analytics requirements and existing data infrastructure
 - Prompt: "Design data pipelines for feature: $ARGUMENTS. Include real-time streaming for live metrics (Kafka, Kinesis), batch processing for detailed analysis, data warehouse integration (Snowflake, BigQuery), and feature store for ML if applicable. Ensure proper data governance and GDPR compliance."
@@ -46,18 +52,21 @@ Build features guided by data insights, A/B testing, and continuous measurement
 ## Phase 3: Implementation with Instrumentation

 ### 7. Backend Implementation
+
 - Use Task tool with subagent_type="backend-development::backend-architect"
 - Context: Architecture design and feature requirements
 - Prompt: "Implement backend for feature: $ARGUMENTS with full instrumentation. Include feature flag checks at decision points, comprehensive event tracking for all user actions, performance metrics collection, error tracking and monitoring. Implement proper logging for experiment analysis."
 - Output: Backend code with analytics, feature flag integration, monitoring setup

 ### 8. Frontend Implementation
+
 - Use Task tool with subagent_type="frontend-mobile-development::frontend-developer"
 - Context: Backend APIs and analytics requirements
 - Prompt: "Build frontend for feature: $ARGUMENTS with analytics tracking. Implement event tracking for all user interactions, session recording integration if applicable, performance metrics (Core Web Vitals), and proper error boundaries. Ensure consistent experience between control and treatment groups."
 - Output: Frontend code with analytics, A/B test variants, performance monitoring

 ### 9. ML Model Integration (if applicable)
+
 - Use Task tool with subagent_type="machine-learning-ops::ml-engineer"
 - Context: Feature requirements and data pipelines
 - Prompt: "Integrate ML models for feature: $ARGUMENTS if needed. Implement online inference with low latency, A/B testing between model versions, model performance tracking, and automatic fallback mechanisms. Set up model monitoring for drift detection."
@@ -66,12 +75,14 @@ Build features guided by data insights, A/B testing, and continuous measurement
 ## Phase 4: Pre-Launch Validation

 ### 10. Analytics Validation
+
 - Use Task tool with subagent_type="data-engineering::data-engineer"
 - Context: Implemented tracking and event schemas
 - Prompt: "Validate analytics implementation for: $ARGUMENTS. Test all event tracking in staging, verify data quality and completeness, validate funnel definitions, ensure proper user identification and session tracking. Run end-to-end tests for data pipeline."
 - Output: Validation report, data quality metrics, tracking coverage analysis

 ### 11. Experiment Setup
+
 - Use Task tool with subagent_type="cloud-infrastructure::deployment-engineer"
 - Context: Feature flags and experiment design
 - Prompt: "Configure experiment infrastructure for: $ARGUMENTS. Set up feature flags with proper targeting rules, configure traffic allocation (start with 5-10%), implement kill switches, set up monitoring alerts for key metrics. Test randomization and assignment logic."
@@ -80,12 +91,14 @@ Build features guided by data insights, A/B testing, and continuous measurement
 ## Phase 5: Launch and Experimentation

 ### 12. Gradual Rollout
+
 - Use Task tool with subagent_type="cloud-infrastructure::deployment-engineer"
 - Context: Experiment configuration and monitoring setup
 - Prompt: "Execute gradual rollout for feature: $ARGUMENTS. Start with internal dogfooding, then beta users (1-5%), gradually increase to target traffic. Monitor error rates, performance metrics, and early indicators. Implement automated rollback on anomalies."
 - Output: Rollout execution, monitoring alerts, health metrics

 ### 13. Real-time Monitoring
+
 - Use Task tool with subagent_type="observability-monitoring::observability-engineer"
 - Context: Deployed feature and success metrics
 - Prompt: "Set up comprehensive monitoring for: $ARGUMENTS. Create real-time dashboards for experiment metrics, configure alerts for statistical significance, monitor guardrail metrics for negative impacts, track system performance and error rates. Use tools like Datadog, New Relic, or custom dashboards."
@@ -94,18 +107,21 @@ Build features guided by data insights, A/B testing, and continuous measurement
 ## Phase 6: Analysis and Decision Making

 ### 14. Statistical Analysis
+
 - Use Task tool with subagent_type="machine-learning-ops::data-scientist"
 - Context: Experiment data and original hypotheses
 - Prompt: "Analyze A/B test results for: $ARGUMENTS. Calculate statistical significance with confidence intervals, check for segment-level effects, analyze secondary metrics impact, investigate any unexpected patterns. Use both frequentist and Bayesian approaches. Account for multiple testing if applicable."
 - Output: Statistical analysis report, significance tests, segment analysis

 ### 15. Business Impact Assessment
+
 - Use Task tool with subagent_type="business-analytics::business-analyst"
 - Context: Statistical analysis and business metrics
 - Prompt: "Assess business impact of feature: $ARGUMENTS. Calculate actual vs expected ROI, analyze impact on key business metrics, evaluate cost-benefit including operational overhead, project long-term value. Make recommendation on full rollout, iteration, or rollback."
 - Output: Business impact report, ROI analysis, recommendation document

 ### 16. Post-Launch Optimization
+
 - Use Task tool with subagent_type="machine-learning-ops::data-scientist"
 - Context: Launch results and user feedback
 - Prompt: "Identify optimization opportunities for: $ARGUMENTS based on data. Analyze user behavior patterns in treatment group, identify friction points in user journey, suggest improvements based on data, plan follow-up experiments. Use cohort analysis for long-term impact."
@@ -118,7 +134,7 @@ experiment_config:
  min_sample_size: 10000
  confidence_level: 0.95
  runtime_days: 14
-  traffic_allocation: "gradual"  # gradual, fixed, or adaptive
+  traffic_allocation: "gradual" # gradual, fixed, or adaptive

 analytics_platforms:
  - amplitude
@@ -126,7 +142,7 @@ analytics_platforms:
  - mixpanel

 feature_flags:
-  provider: "launchdarkly"  # launchdarkly, split, optimizely, unleash
+  provider: "launchdarkly" # launchdarkly, split, optimizely, unleash

 statistical_methods:
  - frequentist
@@ -157,4 +173,4 @@ monitoring:
 - Statistical rigor balanced with business practicality and speed to market
 - Continuous learning loop feeds back into next feature development cycle

-Feature to develop with data-driven approach: $ARGUMENTS
+Feature to develop with data-driven approach: $ARGUMENTS
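
Step 3 of `data-driven-feature.md` asks for a required sample size, and its `experiment_config` block carries `min_sample_size` and `confidence_level`. The sketch below shows one way such a number might be derived. It is not part of this commit. It uses the standard normal-approximation formula for comparing two proportions. The function name `sample_size_per_group`, its parameters, and the 80% power default are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def sample_size_per_group(baseline_rate, min_detectable_effect,
                          confidence_level=0.95, power=0.8):
    """Approximate users needed per group for a two-sided two-proportion test.

    baseline_rate: conversion rate in the control group (hypothetical input).
    min_detectable_effect: absolute lift to detect, e.g. 0.02 for +2 points.
    """
    p1 = baseline_rate
    p2 = baseline_rate + min_detectable_effect
    if not (0 < p1 < 1 and 0 < p2 < 1):
        raise ValueError("rates must lie strictly between 0 and 1")
    if min_detectable_effect == 0:
        raise ValueError("min_detectable_effect must be non-zero")

    alpha = 1 - confidence_level
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power

    # Sum of the Bernoulli variances of the two groups.
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical inputs: 10% baseline conversion, detect a 2-point lift.
    print(sample_size_per_group(0.10, 0.02))
```

Comparing the result with a fixed `min_sample_size` such as the 10000 in the config shows whether that floor is generous or too small for the effect being tested. Halving the detectable effect roughly quadruples the required sample.
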
--- a/plugins/data-engineering/commands/data-pipeline.md
+++ b/plugins/data-engineering/commands/data-pipeline.md
@@ -20,26 +20,32 @@ $ARGUMENTS
 ## Instructions

 ### 1. Architecture Design
+
 - Assess: sources, volume, latency requirements, targets
 - Select pattern: ETL (transform before load), ELT (load then transform), Lambda (batch + speed layers), Kappa (stream-only), Lakehouse (unified)
 - Design flow: sources → ingestion → processing → storage → serving
 - Add observability touchpoints

 ### 2. Ingestion Implementation
+
 **Batch**
+
 - Incremental loading with watermark columns
 - Retry logic with exponential backoff
 - Schema validation and dead letter queue for invalid records
-- Metadata tracking (_extracted_at, _source)
+- Metadata tracking (\_extracted_at, \_source)

 **Streaming**
+
 - Kafka consumers with exactly-once semantics
 - Manual offset commits within transactions
 - Windowing for time-based aggregations
 - Error handling and replay capability

 ### 3. Orchestration
+
 **Airflow**
+
 - Task groups for logical organization
 - XCom for inter-task communication
 - SLA monitoring and email alerts
@@ -47,12 +53,14 @@ $ARGUMENTS
 - Retry with exponential backoff

 **Prefect**
+
 - Task caching for idempotency
 - Parallel execution with .submit()
 - Artifacts for visibility
 - Automatic retries with configurable delays

 ### 4. Transformation with dbt
+
 - Staging layer: incremental materialization, deduplication, late-arriving data handling
 - Marts layer: dimensional models, aggregations, business logic
 - Tests: unique, not_null, relationships, accepted_values, custom data quality tests
@@ -60,7 +68,9 @@ $ARGUMENTS
 - Incremental strategy: merge or delete+insert

 ### 5. Data Quality Framework
+
 **Great Expectations**
+
 - Table-level: row count, column count
 - Column-level: uniqueness, nullability, type validation, value sets, ranges
 - Checkpoints for validation execution
@@ -68,12 +78,15 @@ $ARGUMENTS
 - Failure notifications

 **dbt Tests**
+
 - Schema tests in YAML
 - Custom data quality tests with dbt-expectations
 - Test results tracked in metadata

 ### 6. Storage Strategy
+
 **Delta Lake**
+
 - ACID transactions with append/overwrite/merge modes
 - Upsert with predicate-based matching
 - Time travel for historical queries
@@ -81,6 +94,7 @@ $ARGUMENTS
 - Vacuum to remove old files

 **Apache Iceberg**
+
 - Partitioning and sort order optimization
 - MERGE INTO for upserts
 - Snapshot isolation and time travel
@@ -88,7 +102,9 @@ $ARGUMENTS
 - Snapshot expiration for cleanup

 ### 7. Monitoring & Cost Optimization
+
 **Monitoring**
+
 - Track: records processed/failed, data size, execution time, success/failure rates
 - CloudWatch metrics and custom namespaces
 - SNS alerts for critical/warning/info events
@@ -96,6 +112,7 @@ $ARGUMENTS
 - Performance trend analysis

 **Cost Optimization**
+
 - Partitioning: date/entity-based, avoid over-partitioning (keep >1GB)
 - File sizes: 512MB-1GB for Parquet
 - Lifecycle policies: hot (Standard) → warm (IA) → cold (Glacier)
@@ -144,12 +161,14 @@ ingester.save_dead_letter_queue('s3://lake/dlq/orders')
 ## Output Deliverables

 ### 1. Architecture Documentation
+
 - Architecture diagram with data flow
 - Technology stack with justification
 - Scalability analysis and growth patterns
 - Failure modes and recovery strategies

 ### 2. Implementation Code
+
 - Ingestion: batch/streaming with error handling
 - Transformation: dbt models (staging → marts) or Spark jobs
 - Orchestration: Airflow/Prefect DAGs with dependencies
@@ -157,18 +176,21 @@ ingester.save_dead_letter_queue('s3://lake/dlq/orders')
 - Data quality: Great Expectations suites and dbt tests

 ### 3. Configuration Files
+
 - Orchestration: DAG definitions, schedules, retry policies
 - dbt: models, sources, tests, project config
 - Infrastructure: Docker Compose, K8s manifests, Terraform
 - Environment: dev/staging/prod configs

 ### 4. Monitoring & Observability
+
 - Metrics: execution time, records processed, quality scores
 - Alerts: failures, performance degradation, data freshness
 - Dashboards: Grafana/CloudWatch for pipeline health
 - Logging: structured logs with correlation IDs

 ### 5. Operations Guide
+
 - Deployment procedures and rollback strategy
 - Troubleshooting guide for common issues
 - Scaling guide for increased volume
@@ -176,6 +198,7 @@ ingester.save_dead_letter_queue('s3://lake/dlq/orders')
 - Disaster recovery and backup procedures

 ## Success Criteria
+
 - Pipeline meets defined SLA (latency, throughput)
 - Data quality checks pass with >99% success rate
 - Automatic retry and alerting on failures