style: format all files with prettier

This commit is contained in:
Seth Hobson
2026-01-19 17:07:03 -05:00
parent 8d37048deb
commit 56848874a2
355 changed files with 15215 additions and 10241 deletions

View File

@@ -32,13 +32,13 @@ Tasks (one per partition)
### 2. Key Performance Factors
| Factor | Impact | Solution |
|--------|--------|----------|
| **Shuffle** | Network I/O, disk I/O | Minimize wide transformations |
| **Data Skew** | Uneven task duration | Salting, broadcast joins |
| **Serialization** | CPU overhead | Use Kryo, columnar formats |
| **Memory** | GC pressure, spills | Tune executor memory |
| **Partitions** | Parallelism | Right-size partitions |
| Factor | Impact | Solution |
| ----------------- | --------------------- | ----------------------------- |
| **Shuffle** | Network I/O, disk I/O | Minimize wide transformations |
| **Data Skew** | Uneven task duration | Salting, broadcast joins |
| **Serialization** | CPU overhead | Use Kryo, columnar formats |
| **Memory** | GC pressure, spills | Tune executor memory |
| **Partitions** | Parallelism | Right-size partitions |
## Quick Start
@@ -395,6 +395,7 @@ spark_configs = {
## Best Practices
### Do's
- **Enable AQE** - Adaptive query execution handles many issues
- **Use Parquet/Delta** - Columnar formats with compression
- **Broadcast small tables** - Avoid shuffle for small joins
@@ -402,6 +403,7 @@ spark_configs = {
- **Right-size partitions** - 128MB - 256MB per partition
### Don'ts
- **Don't collect large data** - Keep data distributed
- **Don't use UDFs unnecessarily** - Use built-in functions
- **Don't over-cache** - Memory is limited