style: format all files with prettier

This commit is contained in:
Seth Hobson
2026-01-19 17:07:03 -05:00
parent 8d37048deb
commit 56848874a2
355 changed files with 15215 additions and 10241 deletions


@@ -19,12 +19,14 @@ The analysis scope may include specific error messages, stack traces, log files,
Classify errors into these categories to inform your debugging strategy:
**By Severity:**
- **Critical**: System down, data loss, security breach, complete service unavailability
- **High**: Major feature broken, significant user impact, data corruption risk
- **Medium**: Partial feature degradation, workarounds available, performance issues
- **Low**: Minor bugs, cosmetic issues, edge cases with minimal impact
**By Type:**
- **Runtime Errors**: Exceptions, crashes, segmentation faults, null pointer dereferences
- **Logic Errors**: Incorrect behavior, wrong calculations, invalid state transitions
- **Integration Errors**: API failures, network timeouts, external service issues
@@ -33,6 +35,7 @@ Classify errors into these categories to inform your debugging strategy:
- **Security Errors**: Authentication failures, authorization violations, injection attempts
**By Observability:**
- **Deterministic**: Consistently reproducible with known inputs
- **Intermittent**: Occurs sporadically, often timing or race condition related
- **Environmental**: Only happens in specific environments or configurations
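This taxonomy can also drive automated triage. A minimal sketch (the category names mirror the severity list above; the mapping itself is illustrative, not prescribed here):

```python
# Hypothetical mapping from the severity taxonomy above to an initial
# response; the wording of each response is illustrative.
SEVERITY_RESPONSE = {
    "critical": "page on-call immediately",
    "high": "notify team, create urgent ticket",
    "medium": "create ticket, note available workaround",
    "low": "log and batch for periodic review",
}

def triage(severity: str) -> str:
    """Return the initial response for a classified error severity."""
    return SEVERITY_RESPONSE.get(severity.lower(), "classify manually")
```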
@@ -106,6 +109,7 @@ For errors in microservices and distributed systems:
Extract maximum information from stack traces:
**Key Elements:**
- **Error Type**: What kind of exception/error occurred
- **Error Message**: Contextual information about the failure
- **Origin Point**: The deepest frame where the error was thrown
@@ -114,6 +118,7 @@ Extract maximum information from stack traces:
- **Async Boundaries**: Identify where asynchronous operations break the trace
**Analysis Strategy:**
1. Start at the top of the stack (origin of error)
2. Identify the first frame in your application code (not framework/library)
3. Examine that frame's context: input parameters, local variables, state
@@ -134,28 +139,34 @@ Modern error tracking tools provide enhanced stack traces:
### Common Stack Trace Patterns
**Pattern: Null Pointer Exception Deep in Framework Code**
```
NullPointerException
at java.util.HashMap.hash(HashMap.java:339)
at java.util.HashMap.get(HashMap.java:556)
at com.myapp.service.UserService.findUser(UserService.java:45)
```
Root Cause: Application passed null to framework code. Focus on UserService.java:45.
**Pattern: Timeout After Long Wait**
```
TimeoutException: Operation timed out after 30000ms
at okhttp3.internal.http2.Http2Stream.waitForIo
at com.myapp.api.PaymentClient.processPayment(PaymentClient.java:89)
```
Root Cause: External service slow/unresponsive. Need retry logic and circuit breaker.
**Pattern: Race Condition in Concurrent Code**
```
ConcurrentModificationException
at java.util.ArrayList$Itr.checkForComodification
at com.myapp.processor.BatchProcessor.process(BatchProcessor.java:112)
```
Root Cause: Collection modified while being iterated. Need thread-safe data structures or synchronization.
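The fix for this last pattern is to stop mutating a collection mid-iteration. A minimal Python sketch of the same bug and a safe alternative (iterate over a snapshot, mutate under a lock):

```python
import threading

items = [1, 2, 3, 4]
lock = threading.Lock()

def remove_even_unsafe():
    # Buggy: mutating `items` while iterating silently skips elements
    # for lists (and raises RuntimeError for dicts/sets) -- the Python
    # analogue of ConcurrentModificationException.
    for x in items:
        if x % 2 == 0:
            items.remove(x)

def remove_even_safe():
    # Safe: iterate over a snapshot and mutate under a lock.
    with lock:
        for x in list(items):
            if x % 2 == 0:
                items.remove(x)

remove_even_safe()  # items is now [1, 3]
```

In Java, the equivalent fixes are `CopyOnWriteArrayList`, `ConcurrentHashMap`, or explicit synchronization around the iteration.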
## Log Aggregation and Pattern Matching
@@ -165,6 +176,7 @@ Root Cause: Collection modified while being iterated. Need thread-safe data stru
Implement JSON-based structured logging for machine-readable logs:
**Standard Log Schema:**
```json
{
"timestamp": "2025-10-11T14:23:45.123Z",
@@ -203,6 +215,7 @@ Implement JSON-based structured logging for machine-readable logs:
```
**Key Fields to Always Include:**
- `timestamp`: ISO 8601 format in UTC
- `level`: ERROR, WARN, INFO, DEBUG, TRACE
- `correlation_id`: Unique ID for the entire request chain
@@ -216,48 +229,52 @@ Implement JSON-based structured logging for machine-readable logs:
Implement correlation IDs to track requests across distributed systems:
**Node.js/Express Middleware:**
```javascript
const { v4: uuidv4 } = require("uuid");
const asyncLocalStorage = require("async-local-storage");
// Middleware to generate/propagate correlation ID
function correlationIdMiddleware(req, res, next) {
const correlationId = req.headers["x-correlation-id"] || uuidv4();
req.correlationId = correlationId;
res.setHeader("x-correlation-id", correlationId);
// Store in async context for access in nested calls
asyncLocalStorage.run(new Map(), () => {
asyncLocalStorage.set("correlationId", correlationId);
next();
});
}
// Propagate to downstream services
function makeApiCall(url, data) {
const correlationId = asyncLocalStorage.get("correlationId");
return axios.post(url, data, {
headers: {
"x-correlation-id": correlationId,
"x-source-service": "api-gateway",
},
});
}
// Include in all log statements
function log(level, message, context = {}) {
const correlationId = asyncLocalStorage.get("correlationId");
console.log(
JSON.stringify({
timestamp: new Date().toISOString(),
level,
correlation_id: correlationId,
message,
...context,
}),
);
}
```
**Python/Flask Implementation:**
```python
import uuid
import logging
@@ -302,6 +319,7 @@ def log_structured(level, message, **context):
### Log Aggregation Architecture
**Centralized Logging Pipeline:**
1. **Application**: Outputs structured JSON logs to stdout/stderr
2. **Log Shipper**: Fluentd/Fluent Bit/Vector collects logs from containers
3. **Log Aggregator**: Elasticsearch/Loki/DataDog receives and indexes logs
@@ -309,6 +327,7 @@ def log_structured(level, message, **context):
5. **Alerting**: Trigger alerts on error patterns and thresholds
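Steps 1–3 can be wired together with a small shipper config. A minimal Fluent Bit sketch (the container log path, Elasticsearch host, and index name are assumptions):

```ini
[INPUT]
    Name   tail
    Path   /var/log/containers/*.log
    Parser docker

[FILTER]
    Name   modify
    Match  *
    Add    cluster prod-us-east

[OUTPUT]
    Name   es
    Match  *
    Host   elasticsearch.logging.svc
    Port   9200
    Index  app-logs
```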
**Log Query Examples (Elasticsearch DSL):**
```json
// Find all errors for a specific correlation ID
{
@@ -382,6 +401,7 @@ Use log analysis to identify patterns:
For deterministic errors in development:
**Debugger Setup:**
1. Set breakpoint before the error occurs
2. Step through code execution line by line
3. Inspect variable values and object state
@@ -390,6 +410,7 @@ For deterministic errors in development:
6. Modify variables to test hypotheses
**Modern Debugging Tools:**
- **VS Code Debugger**: Integrated debugging for JavaScript, Python, Go, Java, C++
- **Chrome DevTools**: Frontend debugging with network, performance, and memory profiling
- **pdb/ipdb (Python)**: Interactive debugger with post-mortem analysis
@@ -412,6 +433,7 @@ For errors in production environments where debuggers aren't available:
8. **Traffic Mirroring**: Replay production traffic in staging for safe investigation
**Remote Debugging (Use Cautiously):**
- Attach debugger to running process only in non-critical services
- Use read-only breakpoints that don't pause execution
- Time-box debugging sessions strictly
@@ -420,10 +442,11 @@ For errors in production environments where debuggers aren't available:
### Memory and Performance Debugging
**Memory Leak Detection:**
```javascript
// Node.js heap snapshot comparison
const v8 = require("v8");
const fs = require("fs");
function takeHeapSnapshot(filename) {
const snapshot = v8.writeHeapSnapshot(filename);
@@ -431,15 +454,16 @@ function takeHeapSnapshot(filename) {
}
// Take snapshots at intervals
takeHeapSnapshot("heap-before.heapsnapshot");
// ... run operations that might leak ...
takeHeapSnapshot("heap-after.heapsnapshot");
// Analyze in Chrome DevTools Memory profiler
// Look for objects with increasing retained size
```
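The same before/after comparison is available in Python via the standard-library `tracemalloc` module (a sketch):

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

# ... run operations that might leak ...
leaked = [bytes(1024) for _ in range(1000)]  # simulated leak

after = tracemalloc.take_snapshot()
# Rank allocation sites by memory growth between the two snapshots
stats = after.compare_to(before, "lineno")
for stat in stats[:3]:
    print(stat)
```

As with heap-snapshot diffs, look for allocation sites whose retained size keeps growing between snapshots.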
**Performance Profiling:**
```python
# Python profiling with cProfile
import cProfile
@@ -465,6 +489,7 @@ def profile_function():
### Input Validation and Type Safety
**Defensive Programming:**
```typescript
// TypeScript: Leverage type system for compile-time safety
interface PaymentRequest {
@@ -477,19 +502,19 @@ interface PaymentRequest {
function processPayment(request: PaymentRequest): PaymentResult {
// Runtime validation for external inputs
if (request.amount <= 0) {
throw new ValidationError("Amount must be positive");
}
if (!["USD", "EUR", "GBP"].includes(request.currency)) {
throw new ValidationError("Unsupported currency");
}
// Use Zod or Yup for complex validation
const schema = z.object({
amount: z.number().positive().max(1000000),
currency: z.enum(["USD", "EUR", "GBP"]),
customerId: z.string().uuid(),
paymentMethodId: z.string().min(1),
});
const validated = schema.parse(request);
@@ -500,6 +525,7 @@ function processPayment(request: PaymentRequest): PaymentResult {
```
**Python Type Hints and Validation:**
```python
from typing import Optional
from pydantic import BaseModel, validator, Field
@@ -532,6 +558,7 @@ def process_payment(request: PaymentRequest) -> PaymentResult:
### Error Boundaries and Graceful Degradation
**React Error Boundaries:**
```typescript
import React, { Component, ErrorInfo, ReactNode } from 'react';
import * as Sentry from '@sentry/react';
@@ -589,6 +616,7 @@ export default ErrorBoundary;
```
**Circuit Breaker Pattern:**
```python
from datetime import datetime, timedelta
from enum import Enum
@@ -672,8 +700,8 @@ async function retryWithBackoff<T>(
maxAttempts: 3,
baseDelayMs: 1000,
maxDelayMs: 30000,
exponentialBase: 2,
},
): Promise<T> {
let lastError: Error;
@@ -684,23 +712,27 @@ async function retryWithBackoff<T>(
lastError = error as Error;
// Check if error is retryable
if (
options.retryableErrors &&
!options.retryableErrors.includes(error.name)
) {
throw error; // Don't retry non-retryable errors
}
if (attempt < options.maxAttempts - 1) {
const delay = Math.min(
options.baseDelayMs * Math.pow(options.exponentialBase, attempt),
options.maxDelayMs,
);
// Add jitter to prevent thundering herd
const jitter = Math.random() * 0.1 * delay;
const actualDelay = delay + jitter;
console.log(
`Attempt ${attempt + 1} failed, retrying in ${actualDelay}ms`,
);
await new Promise((resolve) => setTimeout(resolve, actualDelay));
}
}
}
@@ -710,14 +742,14 @@ async function retryWithBackoff<T>(
// Usage
const result = await retryWithBackoff(
() => fetch("https://api.example.com/data"),
{
maxAttempts: 3,
baseDelayMs: 1000,
maxDelayMs: 10000,
exponentialBase: 2,
retryableErrors: ["NetworkError", "TimeoutError"],
},
);
```
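Retry pairs naturally with the circuit breaker introduced earlier. For completeness, a minimal self-contained sketch of the pattern in Python (thresholds, names, and the clock source are illustrative):

```python
import time
from enum import Enum

class State(Enum):
    CLOSED = "closed"        # normal operation
    OPEN = "open"            # failing fast, calls rejected
    HALF_OPEN = "half_open"  # probing whether the dependency recovered

class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.state = State.CLOSED
        self.opened_at = 0.0

    def call(self, fn, *args, **kwargs):
        if self.state is State.OPEN:
            if time.monotonic() - self.opened_at >= self.reset_timeout:
                self.state = State.HALF_OPEN  # allow one probe request
            else:
                raise RuntimeError("circuit open: failing fast")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold or self.state is State.HALF_OPEN:
                self.state = State.OPEN
                self.opened_at = time.monotonic()
            raise
        else:
            self.failures = 0
            self.state = State.CLOSED
            return result
```

Wrap each outbound call in `breaker.call(...)`; once the failure threshold trips, callers fail fast instead of piling up on a dead dependency.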
@@ -726,6 +758,7 @@ const result = await retryWithBackoff(
### Modern Observability Stack (2025)
**Recommended Architecture:**
- **Metrics**: Prometheus + Grafana or DataDog
- **Logs**: Elasticsearch/Loki + Fluentd or DataDog Logs
- **Traces**: OpenTelemetry + Jaeger/Tempo or DataDog APM
@@ -736,9 +769,10 @@ const result = await retryWithBackoff(
### Sentry Integration
**Node.js/Express Setup:**
```javascript
const Sentry = require("@sentry/node");
const { ProfilingIntegration } = require("@sentry/profiling-node");
Sentry.init({
dsn: process.env.SENTRY_DSN,
@@ -766,11 +800,11 @@ Sentry.init({
event.tags = {
...event.tags,
region: process.env.AWS_REGION,
instance_id: process.env.INSTANCE_ID,
};
return event;
},
});
// Express middleware
@@ -790,19 +824,19 @@ function processOrder(orderId) {
} catch (error) {
Sentry.captureException(error, {
tags: {
operation: "process_order",
order_id: orderId,
},
contexts: {
order: {
id: orderId,
status: order?.status,
amount: order?.amount,
},
},
user: {
id: order?.customerId,
},
});
throw error;
}
@@ -812,6 +846,7 @@ function processOrder(orderId) {
### DataDog APM Integration
**Python/Flask Setup:**
```python
from ddtrace import patch_all, tracer
from ddtrace.contrib.flask import TraceMiddleware
@@ -854,6 +889,7 @@ def charge_payment():
### OpenTelemetry Implementation
**Go Service with OpenTelemetry:**
```go
package main
@@ -968,7 +1004,7 @@ monitors:
- name: "New Error Type Detected"
type: log
query: 'logs("level:ERROR service:payment-service").rollup("count").by("error.fingerprint").last("5m") > 0'
message: |
New error type detected in payment service: {{error.fingerprint}}
@@ -1001,6 +1037,7 @@ monitors:
### Incident Response Workflow
**Phase 1: Detection and Triage (0-5 minutes)**
1. Acknowledge the alert/incident
2. Check incident severity and user impact
3. Assign incident commander
@@ -1008,6 +1045,7 @@ monitors:
5. Update status page if customer-facing
**Phase 2: Investigation (5-30 minutes)**
1. Gather observability data:
- Error rates from Sentry/DataDog
- Traces showing failed requests
@@ -1022,6 +1060,7 @@ monitors:
4. Document findings in incident log
**Phase 3: Mitigation (Immediate)**
1. Implement immediate fix based on hypothesis:
- Rollback recent deployment
- Scale up resources
@@ -1032,6 +1071,7 @@ monitors:
3. Monitor for 15-30 minutes to ensure stability
**Phase 4: Recovery and Validation**
1. Verify all systems operational
2. Check data consistency
3. Process queued/failed requests
@@ -1039,6 +1079,7 @@ monitors:
5. Notify stakeholders
**Phase 5: Post-Incident Review**
1. Schedule postmortem within 48 hours
2. Create detailed timeline of events
3. Identify root cause (may differ from initial hypothesis)
@@ -1090,6 +1131,7 @@ GET /logs-*/_search
### Communication Templates
**Initial Incident Notification:**
```
🚨 INCIDENT: Payment Processing Errors
@@ -1113,6 +1155,7 @@ Status Page: https://status.company.com/incident/abc123
```
**Mitigation Notification:**
```
✅ INCIDENT UPDATE: Mitigation Applied


@@ -5,6 +5,7 @@ You are an expert AI-assisted debugging specialist with deep knowledge of modern
Process issue from: $ARGUMENTS
Parse for:
- Error messages/stack traces
- Reproduction steps
- Affected components/services
@@ -15,7 +16,9 @@ Parse for:
## Workflow
### 1. Initial Triage
Use Task tool (subagent_type="debugger") for AI-powered analysis:
- Error pattern recognition
- Stack trace analysis with probable causes
- Component dependency analysis
@@ -24,7 +27,9 @@ Use Task tool (subagent_type="debugger") for AI-powered analysis:
- Recommend debugging strategy
### 2. Observability Data Collection
For production/staging issues, gather:
- Error tracking (Sentry, Rollbar, Bugsnag)
- APM metrics (DataDog, New Relic, Dynatrace)
- Distributed traces (Jaeger, Zipkin, Honeycomb)
@@ -32,6 +37,7 @@ For production/staging issues, gather:
- Session replays (LogRocket, FullStory)
Query for:
- Error frequency/trends
- Affected user cohorts
- Environment-specific patterns
@@ -40,7 +46,9 @@ Query for:
- Deployment timeline correlation
### 3. Hypothesis Generation
For each hypothesis include:
- Probability score (0-100%)
- Supporting evidence from logs/traces/code
- Falsification criteria
@@ -48,6 +56,7 @@ For each hypothesis include:
- Expected symptoms if true
Common categories:
- Logic errors (race conditions, null handling)
- State management (stale cache, incorrect transitions)
- Integration failures (API changes, timeouts, auth)
@@ -56,6 +65,7 @@ Common categories:
- Data corruption (schema mismatches, encoding)
### 4. Strategy Selection
Select based on issue characteristics:
**Interactive Debugging**: Reproducible locally → VS Code/Chrome DevTools, step-through
@@ -65,7 +75,9 @@ Select based on issue characteristics:
**Statistical**: Small % of cases → Delta debugging, compare success vs failure
### 5. Intelligent Instrumentation
AI suggests optimal breakpoint/logpoint locations:
- Entry points to affected functionality
- Decision nodes where behavior diverges
- State mutation points
@@ -75,6 +87,7 @@ AI suggests optimal breakpoint/logpoint locations:
Use conditional breakpoints and logpoints for production-like environments.
### 6. Production-Safe Techniques
**Dynamic Instrumentation**: OpenTelemetry spans, non-invasive attributes
**Feature-Flagged Debug Logging**: Conditional logging for specific users
**Sampling-Based Profiling**: Continuous profiling with minimal overhead (Pyroscope)
@@ -82,7 +95,9 @@ Use conditional breakpoints and logpoints for production-like environments.
**Gradual Traffic Shifting**: Canary deploy debug version to 10% traffic
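Feature-flagged debug logging from the list above can be as simple as a per-user allowlist gate in front of the logger (a sketch; the flag store and field names are assumptions):

```python
import json

# Hypothetical flag store: users with verbose debug logging enabled.
DEBUG_USERS = {"user-123"}

def debug_log(user_id, message, **context):
    """Emit structured debug output only for flagged users, keeping
    production log volume flat for everyone else."""
    if user_id not in DEBUG_USERS:
        return False
    print(json.dumps({"level": "DEBUG", "user_id": user_id,
                      "message": message, **context}))
    return True
```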
### 7. Root Cause Analysis
AI-powered code flow analysis:
- Full execution path reconstruction
- Variable state tracking at decision points
- External dependency interaction analysis
@@ -92,7 +107,9 @@ AI-powered code flow analysis:
- Fix complexity estimation
### 8. Fix Implementation
AI generates fix with:
- Code changes required
- Impact assessment
- Risk level
@@ -100,19 +117,23 @@ AI generates fix with:
- Rollback strategy
### 9. Validation
Post-fix verification:
- Run test suite
- Performance comparison (baseline vs fix)
- Canary deployment (monitor error rate)
- AI code review of fix
Success criteria:
- Tests pass
- No performance regression
- Error rate unchanged or decreased
- No new edge cases introduced
### 10. Prevention
- Generate regression tests using AI
- Update knowledge base with root cause
- Add monitoring/alerts for similar issues
@@ -127,7 +148,7 @@ Success criteria:
const analysis = await aiAnalyze({
error: "Payment processing timeout",
frequency: "5% of checkouts",
environment: "production",
});
// AI suggests: "Likely N+1 query or external API timeout"
@@ -136,7 +157,7 @@ const sentryData = await getSentryIssue("CHECKOUT_TIMEOUT");
const ddTraces = await getDataDogTraces({
service: "checkout",
operation: "process_payment",
duration: ">5000ms",
});
// 3. Analyze traces
@@ -144,8 +165,8 @@ const ddTraces = await getDataDogTraces({
// Hypothesis: N+1 query in payment method loading
// 4. Add instrumentation
span.setAttribute("debug.queryCount", queryCount);
span.setAttribute("debug.paymentMethodId", methodId);
// 5. Deploy to 10% traffic, monitor
// Confirmed: N+1 pattern in payment verification
@@ -162,6 +183,7 @@ span.setAttribute('debug.paymentMethodId', methodId);
## Output Format
Provide structured report:
1. **Issue Summary**: Error, frequency, impact
2. **Root Cause**: Detailed diagnosis with evidence
3. **Fix Proposal**: Code changes, risk, impact