mirror of
https://github.com/wshobson/agents.git
synced 2026-03-18 09:37:15 +00:00
style: format all files with prettier
@@ -32,13 +32,13 @@ Tasks (one per partition)
### 2. Key Performance Factors
| Factor            | Impact                | Solution                      |
| ----------------- | --------------------- | ----------------------------- |
| **Shuffle**       | Network I/O, disk I/O | Minimize wide transformations |
| **Data Skew**     | Uneven task duration  | Salting, broadcast joins      |
| **Serialization** | CPU overhead          | Use Kryo, columnar formats    |
| **Memory**        | GC pressure, spills   | Tune executor memory          |
| **Partitions**    | Parallelism           | Right-size partitions         |
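Salting, listed above as a fix for data skew, spreads a single hot join key across several tasks. The large side gets a random salt appended to its key. The small side is replicated once per salt value so every salted key still finds its match.

The following is a minimal sketch, assuming a plain-Python model of a hash join rather than Spark itself. `NUM_SALTS`, `salt_large_side`, `explode_small_side`, `hash_join` and `rows_per_key` are hypothetical names used only for illustration. In PySpark the same idea is normally expressed with column expressions on the join key.

```python
import random
from collections import defaultdict

# Hypothetical number of salt buckets per key (an assumption, tune per workload).
NUM_SALTS = 4


def salt_large_side(rows, num_salts=NUM_SALTS, seed=0):
    """Turn each (key, value) on the skewed side into ((key, salt), value)."""
    rng = random.Random(seed)
    return [((key, rng.randrange(num_salts)), value) for key, value in rows]


def explode_small_side(rows, num_salts=NUM_SALTS):
    """Replicate each small-side row once per salt so every salted key matches."""
    return [((key, salt), value) for key, value in rows for salt in range(num_salts)]


def hash_join(left, right):
    """Inner join of two (key, value) lists; returns (left_key, left_value, right_value)."""
    index = defaultdict(list)
    for key, value in right:
        index[key].append(value)
    return [(key, lv, rv) for key, lv in left for rv in index.get(key, [])]


def rows_per_key(rows):
    """Count rows per join key: a proxy for how much work one task would get."""
    counts = defaultdict(int)
    for key, _ in rows:
        counts[key] += 1
    return dict(counts)


if __name__ == "__main__":
    large = [("hot", i) for i in range(1000)] + [("cold", i) for i in range(10)]
    small = [("hot", "H"), ("cold", "C")]

    salted_large = salt_large_side(large)
    print("before salting:", rows_per_key(large))
    print("after salting: ", rows_per_key(salted_large))

    # Dropping the salt from the key recovers exactly the unsalted join result.
    joined = hash_join(salted_large, explode_small_side(small))
    unsalted = hash_join(large, small)
    print(sorted((k[0], lv, rv) for k, lv, rv in joined) == sorted(unsalted))
```

The trade-off is visible in `explode_small_side`: the small side grows by a factor of `NUM_SALTS`. Salting therefore pays off only when one side is small enough to replicate cheaply but too large to broadcast outright.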
## Quick Start
@@ -395,6 +395,7 @@ spark_configs = {
## Best Practices

### Do's

- **Enable AQE** - Adaptive query execution re-optimizes the plan at runtime: it coalesces small shuffle partitions, switches to broadcast joins when one side turns out to be small, and splits skewed partitions
- **Use Parquet/Delta** - Columnar formats with compression
- **Broadcast small tables** - Avoid shuffling the large side when the other join input is small
@@ -402,6 +403,7 @@ spark_configs = {
- **Right-size partitions** - Aim for roughly 128-256 MB per partition
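The Do's above map onto a handful of Spark SQL settings. The sketch below sizes the shuffle partition count from an estimated data volume and a target partition size, then collects the related settings into a config dictionary that can be rendered as `spark-submit --conf` flags.

Several parts are assumptions for illustration:

- The helper names `suggest_num_partitions`, `build_spark_configs` and `to_submit_args` are hypothetical.
- The 128 MiB target, the 10 MiB broadcast threshold and the shuffle-size estimate are assumed values, not recommendations from the source.

The config keys themselves are standard Spark settings.

```python
import math

MIB = 1024 * 1024


def suggest_num_partitions(total_bytes: int, target_partition_bytes: int = 128 * MIB) -> int:
    """Partitions needed so each holds about target_partition_bytes (never fewer than 1)."""
    if total_bytes <= 0:
        return 1
    return max(1, math.ceil(total_bytes / target_partition_bytes))


def build_spark_configs(shuffle_bytes: int, target_partition_bytes: int = 128 * MIB) -> dict:
    """Hypothetical helper: bundle AQE, partitioning, broadcast and serializer settings."""
    return {
        # Enable AQE and the runtime optimizations it provides
        "spark.sql.adaptive.enabled": "true",
        "spark.sql.adaptive.coalescePartitions.enabled": "true",
        "spark.sql.adaptive.skewJoin.enabled": "true",
        # Size AQE aims for when coalescing shuffle partitions (bytes)
        "spark.sql.adaptive.advisoryPartitionSizeInBytes": str(target_partition_bytes),
        # Maximum bytes packed into one partition when reading files
        "spark.sql.files.maxPartitionBytes": str(target_partition_bytes),
        # Initial shuffle partition count; with coalescing enabled AQE may reduce it
        "spark.sql.shuffle.partitions": str(
            suggest_num_partitions(shuffle_bytes, target_partition_bytes)
        ),
        # Join inputs below this size (bytes) are broadcast instead of shuffled
        "spark.sql.autoBroadcastJoinThreshold": str(10 * MIB),
        # Kryo for serialized RDD data (DataFrames use their own encoders)
        "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    }


def to_submit_args(configs: dict) -> list:
    """Render a config dict as a flat list of spark-submit --conf arguments."""
    args = []
    for key, value in sorted(configs.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Assumed estimate: about 50 GiB of shuffle data for the largest stage.
    configs = build_spark_configs(shuffle_bytes=50 * 1024 * MIB)
    print(" ".join(to_submit_args(configs)))
```

Computing the partition count from a byte estimate, rather than hard-coding it, keeps the "128-256 MB per partition" guideline intact as input sizes grow. AQE then corrects the estimate at runtime using actual shuffle statistics.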
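
### Don'ts

- **Don't collect large data** - `collect()` pulls every row onto the driver; keep data distributed and write results out instead
- **Don't use UDFs unnecessarily** - Prefer built-in functions, which the Catalyst optimizer can analyze; Python UDFs are opaque to it and add serialization overhead
- **Don't over-cache** - Cached data competes with execution memory; cache only DataFrames that are reused and `unpersist()` them when done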