mirror of
https://github.com/wshobson/agents.git
synced 2026-03-18 17:47:16 +00:00
style: format all files with prettier
@@ -20,12 +20,12 @@ Production-ready patterns for Apache Airflow including DAG design, operators, se
 
 ### 1. DAG Design Principles
 
-| Principle | Description |
-|-----------|-------------|
-| **Idempotent** | Running twice produces same result |
-| **Atomic** | Tasks succeed or fail completely |
-| **Incremental** | Process only new/changed data |
-| **Observable** | Logs, metrics, alerts at every step |
+| Principle       | Description                         |
+| --------------- | ----------------------------------- |
+| **Idempotent**  | Running twice produces same result  |
+| **Atomic**      | Tasks succeed or fail completely    |
+| **Incremental** | Process only new/changed data       |
+| **Observable**  | Logs, metrics, alerts at every step |
 
 ### 2. Task Dependencies
 
@@ -503,6 +503,7 @@ airflow/
 ## Best Practices
 
 ### Do's
+
 - **Use TaskFlow API** - Cleaner code, automatic XCom
 - **Set timeouts** - Prevent zombie tasks
 - **Use `mode='reschedule'`** - For sensors, free up workers
@@ -510,6 +511,7 @@ airflow/
 - **Idempotent tasks** - Safe to retry
 
 ### Don'ts
+
 - **Don't use `depends_on_past=True`** - Creates bottlenecks
 - **Don't hardcode dates** - Use `{{ ds }}` macros
 - **Don't use global state** - Tasks should be stateless
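The idempotent + atomic principles in the Airflow hunks above can be sketched without Airflow itself. This is a library-free illustration, not code from the repo: `process_partition` and its file layout are hypothetical, and in a real DAG the `ds` argument would come from the `{{ ds }}` macro rather than being passed by hand.

```python
import json
import tempfile
from pathlib import Path

def process_partition(ds: str, records: list, out_dir: Path) -> Path:
    """Idempotent: output is keyed by the logical date `ds`, so retrying
    the same interval overwrites one file instead of duplicating rows."""
    out = out_dir / f"orders_{ds}.json"
    # Atomic: write to a temp file first, then rename in one step, so a
    # failure mid-write never leaves a partial output behind.
    tmp = out.with_suffix(".tmp")
    tmp.write_text(json.dumps(records))
    tmp.replace(out)
    return out

out_dir = Path(tempfile.mkdtemp())
first = process_partition("2024-01-01", [{"id": 1}], out_dir)
second = process_partition("2024-01-01", [{"id": 1}], out_dir)  # safe retry
assert first == second
assert len(list(out_dir.iterdir())) == 1  # no duplicate output
```

Because the task is stateless and keyed by its interval, Airflow retries and backfills are safe by construction.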
@@ -20,14 +20,14 @@ Production patterns for implementing data quality with Great Expectations, dbt t
 
 ### 1. Data Quality Dimensions
 
-| Dimension | Description | Example Check |
-|-----------|-------------|---------------|
-| **Completeness** | No missing values | `expect_column_values_to_not_be_null` |
-| **Uniqueness** | No duplicates | `expect_column_values_to_be_unique` |
-| **Validity** | Values in expected range | `expect_column_values_to_be_in_set` |
-| **Accuracy** | Data matches reality | Cross-reference validation |
-| **Consistency** | No contradictions | `expect_column_pair_values_A_to_be_greater_than_B` |
-| **Timeliness** | Data is recent | `expect_column_max_to_be_between` |
+| Dimension        | Description              | Example Check                                      |
+| ---------------- | ------------------------ | -------------------------------------------------- |
+| **Completeness** | No missing values        | `expect_column_values_to_not_be_null`              |
+| **Uniqueness**   | No duplicates            | `expect_column_values_to_be_unique`                |
+| **Validity**     | Values in expected range | `expect_column_values_to_be_in_set`                |
+| **Accuracy**     | Data matches reality     | Cross-reference validation                         |
+| **Consistency**  | No contradictions        | `expect_column_pair_values_A_to_be_greater_than_B` |
+| **Timeliness**   | Data is recent           | `expect_column_max_to_be_between`                  |
 
 ### 2. Testing Pyramid for Data
 
@@ -191,7 +191,7 @@ validations:
       data_connector_name: default_inferred_data_connector_name
       data_asset_name: orders
       data_connector_query:
-        index: -1  # Latest batch
+        index: -1 # Latest batch
     expectation_suite_name: orders_suite
 
     action_list:
@@ -270,7 +270,8 @@ models:
       - name: order_status
         tests:
           - accepted_values:
-              values: ['pending', 'processing', 'shipped', 'delivered', 'cancelled']
+              values:
+                ["pending", "processing", "shipped", "delivered", "cancelled"]
 
       - name: total_amount
         tests:
@@ -566,6 +567,7 @@ if not all(r.passed for r in results.values()):
 ## Best Practices
 
 ### Do's
+
 - **Test early** - Validate source data before transformations
 - **Test incrementally** - Add tests as you find issues
 - **Document expectations** - Clear descriptions for each test
@@ -573,6 +575,7 @@ if not all(r.passed for r in results.values()):
 - **Version contracts** - Track schema changes
 
 ### Don'ts
+
 - **Don't test everything** - Focus on critical columns
 - **Don't ignore warnings** - They often precede failures
 - **Don't skip freshness** - Stale data is bad data
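The expectation names in the data-quality table above map onto simple predicates. A library-free sketch of the first three dimensions; the helper names and the toy `orders` rows are illustrative, not Great Expectations API:

```python
def expect_not_null(rows: list, col: str) -> bool:
    """Completeness: no missing values in `col`."""
    return all(r.get(col) is not None for r in rows)

def expect_unique(rows: list, col: str) -> bool:
    """Uniqueness: no duplicate values in `col`."""
    vals = [r[col] for r in rows]
    return len(vals) == len(set(vals))

def expect_in_set(rows: list, col: str, allowed: set) -> bool:
    """Validity: every value drawn from an allowed set."""
    return all(r[col] in allowed for r in rows)

orders = [
    {"order_id": 1, "status": "pending"},
    {"order_id": 2, "status": "shipped"},
]
assert expect_not_null(orders, "status")
assert expect_unique(orders, "order_id")
assert expect_in_set(
    orders, "status",
    {"pending", "processing", "shipped", "delivered", "cancelled"},
)
```

Great Expectations packages the same predicates with reporting, sampling, and the checkpoint machinery shown in the YAML hunks.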
@@ -32,19 +32,19 @@ marts/ Final analytics tables
 
 ### 2. Naming Conventions
 
-| Layer | Prefix | Example |
-|-------|--------|---------|
-| Staging | `stg_` | `stg_stripe__payments` |
-| Intermediate | `int_` | `int_payments_pivoted` |
-| Marts | `dim_`, `fct_` | `dim_customers`, `fct_orders` |
+| Layer        | Prefix         | Example                       |
+| ------------ | -------------- | ----------------------------- |
+| Staging      | `stg_`         | `stg_stripe__payments`        |
+| Intermediate | `int_`         | `int_payments_pivoted`        |
+| Marts        | `dim_`, `fct_` | `dim_customers`, `fct_orders` |
 
 ## Quick Start
 
 ```yaml
 # dbt_project.yml
-name: 'analytics'
-version: '1.0.0'
-profile: 'analytics'
+name: "analytics"
+version: "1.0.0"
+profile: "analytics"
 
 model-paths: ["models"]
 analysis-paths: ["analyses"]
@@ -53,7 +53,7 @@ seed-paths: ["seeds"]
 macro-paths: ["macros"]
 
 vars:
-  start_date: '2020-01-01'
+  start_date: "2020-01-01"
 
 models:
   analytics:
@@ -107,8 +107,8 @@ sources:
     loader: fivetran
     loaded_at_field: _fivetran_synced
     freshness:
-      warn_after: {count: 12, period: hour}
-      error_after: {count: 24, period: hour}
+      warn_after: { count: 12, period: hour }
+      error_after: { count: 24, period: hour }
     tables:
       - name: customers
         description: Stripe customer records
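The `warn_after` / `error_after` thresholds in the freshness hunk above reduce to comparing data age against two cutoffs. A plain-Python sketch of that logic; `freshness_status` is a hypothetical helper, not dbt's implementation:

```python
from datetime import datetime, timedelta, timezone

def freshness_status(loaded_at: datetime, now: datetime,
                     warn_after: timedelta = timedelta(hours=12),
                     error_after: timedelta = timedelta(hours=24)) -> str:
    """Mirror of dbt's source freshness semantics: warn past the first
    threshold, error past the second, otherwise pass."""
    age = now - loaded_at
    if age > error_after:
        return "error"
    if age > warn_after:
        return "warn"
    return "pass"

now = datetime(2024, 1, 2, tzinfo=timezone.utc)
assert freshness_status(now - timedelta(hours=6), now) == "pass"
assert freshness_status(now - timedelta(hours=18), now) == "warn"
assert freshness_status(now - timedelta(hours=30), now) == "error"
```

In dbt, `loaded_at_field` (here `_fivetran_synced`) supplies the `loaded_at` timestamp per source table.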
@@ -409,7 +409,7 @@ models:
         description: Customer value tier based on lifetime value
         tests:
           - accepted_values:
-              values: ['high', 'medium', 'low']
+              values: ["high", "medium", "low"]
 
       - name: lifetime_value
         description: Total amount paid by customer
@@ -540,6 +540,7 @@ dbt ls --select tag:critical # List models by tag
 ## Best Practices
 
 ### Do's
+
 - **Use staging layer** - Clean data once, use everywhere
 - **Test aggressively** - Not null, unique, relationships
 - **Document everything** - Column descriptions, model descriptions
@@ -547,6 +548,7 @@ dbt ls --select tag:critical # List models by tag
 - **Version control** - dbt project in Git
 
 ### Don'ts
+
 - **Don't skip staging** - Raw → mart is tech debt
 - **Don't hardcode dates** - Use `{{ var('start_date') }}`
 - **Don't repeat logic** - Extract to macros
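The naming convention table in the dbt hunks above lends itself to a mechanical check. A hypothetical lint helper, not part of dbt or this repo, sketching how the prefix-to-layer mapping could be enforced:

```python
from typing import Optional

# Layer prefixes taken from the naming-conventions table above.
PREFIXES = {
    "staging": ("stg_",),
    "intermediate": ("int_",),
    "marts": ("dim_", "fct_"),
}

def layer_for(model: str) -> Optional[str]:
    """Return the dbt layer implied by the model name's prefix,
    or None if the name follows no recognized convention."""
    for layer, prefixes in PREFIXES.items():
        if model.startswith(prefixes):
            return layer
    return None

assert layer_for("stg_stripe__payments") == "staging"
assert layer_for("fct_orders") == "marts"
assert layer_for("orders_raw") is None  # violates the convention
```

A check like this could run in CI over `models/` to catch raw-to-mart shortcuts before they become tech debt.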
@@ -32,13 +32,13 @@ Tasks (one per partition)
 
 ### 2. Key Performance Factors
 
-| Factor | Impact | Solution |
-|--------|--------|----------|
-| **Shuffle** | Network I/O, disk I/O | Minimize wide transformations |
-| **Data Skew** | Uneven task duration | Salting, broadcast joins |
-| **Serialization** | CPU overhead | Use Kryo, columnar formats |
-| **Memory** | GC pressure, spills | Tune executor memory |
-| **Partitions** | Parallelism | Right-size partitions |
+| Factor            | Impact                | Solution                      |
+| ----------------- | --------------------- | ----------------------------- |
+| **Shuffle**       | Network I/O, disk I/O | Minimize wide transformations |
+| **Data Skew**     | Uneven task duration  | Salting, broadcast joins      |
+| **Serialization** | CPU overhead          | Use Kryo, columnar formats    |
+| **Memory**        | GC pressure, spills   | Tune executor memory          |
+| **Partitions**    | Parallelism           | Right-size partitions         |
 
 ## Quick Start
 
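"Right-size partitions" from the table above is back-of-envelope arithmetic: divide total data size by a 128-256 MB target (the figure the skill file's Do's list uses). A library-free sketch; `num_partitions` is an illustrative helper, not a Spark API:

```python
def num_partitions(total_bytes: int, target_mb: int = 128) -> int:
    """Partitions needed so each holds at most `target_mb` megabytes."""
    target = target_mb * 1024 * 1024
    return max(1, -(-total_bytes // target))  # ceiling division

assert num_partitions(10 * 1024**3) == 80  # 10 GB at 128 MB each
assert num_partitions(50 * 1024**2) == 1   # small data: one partition
```

In Spark the result would feed `repartition()` or `spark.sql.shuffle.partitions`; with AQE enabled, post-shuffle coalescing does a similar calculation automatically.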
@@ -395,6 +395,7 @@ spark_configs = {
 ## Best Practices
 
 ### Do's
+
 - **Enable AQE** - Adaptive query execution handles many issues
 - **Use Parquet/Delta** - Columnar formats with compression
 - **Broadcast small tables** - Avoid shuffle for small joins
@@ -402,6 +403,7 @@ spark_configs = {
 - **Right-size partitions** - 128MB - 256MB per partition
 
 ### Don'ts
+
 - **Don't collect large data** - Keep data distributed
 - **Don't use UDFs unnecessarily** - Use built-in functions
 - **Don't over-cache** - Memory is limited
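The "Salting" remedy for data skew in the Spark table above splits one hot key across N salted sub-keys so its rows spread over N tasks. A library-free sketch of the idea; `salt_key` and the `hot_customer` data are illustrative (in Spark you would concatenate a random salt column before the shuffle and strip it after aggregation):

```python
import random
from collections import Counter

def salt_key(key: str, n_salts: int = 4) -> str:
    """Append a random salt so one hot key becomes n_salts sub-keys."""
    return f"{key}#{random.randrange(n_salts)}"

random.seed(0)  # deterministic for the assertions below
keys = ["hot_customer"] * 1000 + ["other"] * 10
buckets = Counter(salt_key(k) for k in keys)

hot = {k: v for k, v in buckets.items() if k.startswith("hot_customer#")}
assert sum(hot.values()) == 1000   # all hot rows accounted for
assert len(hot) == 4               # spread over 4 sub-keys, i.e. 4 tasks
```

Without salting, the straggler task handling `hot_customer` dominates stage runtime; with it, each sub-key carries roughly a quarter of the load.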