style: format all files with prettier

2026-03-18 09:37:15 +00:00 · 2026-01-19 17:07:03 -05:00
parent 8d37048deb
commit 56848874a2
355 changed files with 15215 additions and 10241 deletions
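
The table hunks in this diff pad every cell to the width of the widest entry in its column and stretch the separator row to match. A minimal Python sketch of that padding rule follows. `align_table` is a hypothetical helper written for illustration, not Prettier's implementation, and it assumes plain ASCII cells, a separator on the second row, and no escaped pipes or alignment colons.

```python
def align_table(lines):
    """Pad a simple Markdown table so its columns line up.

    Assumptions (illustrative only): row 0 is the header, row 1 is the
    separator, every row has the same number of cells, cells are plain
    ASCII, and no cell contains an escaped pipe.
    """
    # Split each row into trimmed cells, dropping the outer pipes.
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in lines
    ]
    # Column widths come from the header and data rows, not the separator.
    body = [row for i, row in enumerate(rows) if i != 1]
    widths = [
        max(3, *(len(row[col]) for row in body))
        for col in range(len(rows[0]))
    ]
    out = []
    for i, row in enumerate(rows):
        if i == 1:
            # Rebuild the separator row to the computed widths.
            cells = ["-" * w for w in widths]
        else:
            cells = [cell.ljust(w) for cell, w in zip(row, widths)]
        out.append("| " + " | ".join(cells) + " |")
    return out


# Rows taken from the "Initial Classification" table in the diff.
table = [
    "| Symptom | Likely Cause | Go To Section |",
    "|---------|--------------|---------------|",
    "| All requests failing | Service down | Section 4.1 |",
]
for line in align_table(table):
    print(line)
```

Excluding the separator row from the width calculation keeps the dashes from inflating a column that has since become narrower.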
--- a/plugins/incident-response/skills/incident-runbook-templates/SKILL.md
+++ b/plugins/incident-response/skills/incident-runbook-templates/SKILL.md
@@ -20,12 +20,12 @@ Production-ready templates for incident response runbooks covering detection, tr

 ### 1. Incident Severity Levels

-| Severity | Impact | Response Time | Example |
-|----------|--------|---------------|---------|
-| **SEV1** | Complete outage, data loss | 15 min | Production down |
-| **SEV2** | Major degradation | 30 min | Critical feature broken |
-| **SEV3** | Minor impact | 2 hours | Non-critical bug |
-| **SEV4** | Minimal impact | Next business day | Cosmetic issue |
+| Severity | Impact                     | Response Time     | Example                 |
+| -------- | -------------------------- | ----------------- | ----------------------- |
+| **SEV1** | Complete outage, data loss | 15 min            | Production down         |
+| **SEV2** | Major degradation          | 30 min            | Critical feature broken |
+| **SEV3** | Minor impact               | 2 hours           | Non-critical bug        |
+| **SEV4** | Minimal impact             | Next business day | Cosmetic issue          |

 ### 2. Runbook Structure

@@ -45,28 +45,33 @@ Production-ready templates for incident response runbooks covering detection, tr

 ### Template 1: Service Outage Runbook

-```markdown
+````markdown
 # [Service Name] Outage Runbook

 ## Overview
+
 **Service**: Payment Processing Service
 **Owner**: Platform Team
 **Slack**: #payments-incidents
 **PagerDuty**: payments-oncall

 ## Impact Assessment
+
 - [ ] Which customers are affected?
 - [ ] What percentage of traffic is impacted?
 - [ ] Are there financial implications?
 - [ ] What's the blast radius?

 ## Detection
+
 ### Alerts
+
 - `payment_error_rate > 5%` (PagerDuty)
 - `payment_latency_p99 > 2s` (Slack)
 - `payment_success_rate < 95%` (PagerDuty)

 ### Dashboards
+
 - [Payment Service Dashboard](https://grafana/d/payments)
 - [Error Tracking](https://sentry.io/payments)
 - [Dependency Status](https://status.stripe.com)
@@ -74,6 +79,7 @@ Production-ready templates for incident response runbooks covering detection, tr
 ## Initial Triage (First 5 Minutes)

 ### 1. Assess Scope
+
 ```bash
 # Check service health
 kubectl get pods -n payments -l app=payment-service
@@ -84,24 +90,28 @@ kubectl rollout history deployment/payment-service -n payments
 # Check error rates
 curl -s "http://prometheus:9090/api/v1/query?query=sum(rate(http_requests_total{status=~'5..'}[5m]))"
 ```
+````

 ### 2. Quick Health Checks
+
 - [ ] Can you reach the service? `curl -I https://api.company.com/payments/health`
 - [ ] Database connectivity? Check connection pool metrics
 - [ ] External dependencies? Check Stripe, bank API status
 - [ ] Recent changes? Check deploy history

 ### 3. Initial Classification
-| Symptom | Likely Cause | Go To Section |
-|---------|--------------|---------------|
-| All requests failing | Service down | Section 4.1 |
-| High latency | Database/dependency | Section 4.2 |
-| Partial failures | Code bug | Section 4.3 |
-| Spike in errors | Traffic surge | Section 4.4 |
+
+| Symptom              | Likely Cause        | Go To Section |
+| -------------------- | ------------------- | ------------- |
+| All requests failing | Service down        | Section 4.1   |
+| High latency         | Database/dependency | Section 4.2   |
+| Partial failures     | Code bug            | Section 4.3   |
+| Spike in errors      | Traffic surge       | Section 4.4   |

 ## Mitigation Procedures

 ### 4.1 Service Completely Down
+
 ```bash
 # Step 1: Check pod status
 kubectl get pods -n payments
@@ -123,6 +133,7 @@ kubectl rollout status deployment/payment-service -n payments
 ```

 ### 4.2 High Latency
+
 ```bash
 # Step 1: Check database connections
 kubectl exec -n payments deploy/payment-service -- \
@@ -147,6 +158,7 @@ kubectl set env deployment/payment-service \
 ```

 ### 4.3 Partial Failures (Specific Errors)
+
 ```bash
 # Step 1: Identify error pattern
 kubectl logs -n payments -l app=payment-service --tail=500 | \
@@ -167,6 +179,7 @@ psql -h $DB_HOST -c "
 ```

 ### 4.4 Traffic Surge
+
 ```bash
 # Step 1: Check current request rate
 kubectl top pods -n payments
@@ -200,6 +213,7 @@ EOF
 ```

 ## Verification Steps
+
 ```bash
 # Verify service is healthy
 curl -s https://api.company.com/payments/health | jq
@@ -215,6 +229,7 @@ curl -s "http://prometheus:9090/api/v1/query?query=histogram_quantile(0.99,sum(r
 ```

 ## Rollback Procedures
+
 ```bash
 # Rollback Kubernetes deployment
 kubectl rollout undo deployment/payment-service -n payments
@@ -229,16 +244,17 @@ curl -X POST https://api.company.com/internal/feature-flags \

 ## Escalation Matrix

-| Condition | Escalate To | Contact |
-|-----------|-------------|---------|
-| > 15 min unresolved SEV1 | Engineering Manager | @manager (Slack) |
-| Data breach suspected | Security Team | #security-incidents |
-| Financial impact > $10k | Finance + Legal | @finance-oncall |
-| Customer communication needed | Support Lead | @support-lead |
+| Condition                     | Escalate To         | Contact             |
+| ----------------------------- | ------------------- | ------------------- |
+| > 15 min unresolved SEV1      | Engineering Manager | @manager (Slack)    |
+| Data breach suspected         | Security Team       | #security-incidents |
+| Financial impact > $10k       | Finance + Legal     | @finance-oncall     |
+| Customer communication needed | Support Lead        | @support-lead       |

 ## Communication Templates

 ### Initial Notification (Internal)
+
 ```
 🚨 INCIDENT: Payment Service Degradation

@@ -257,6 +273,7 @@ Updates in #payments-incidents
 ```

 ### Status Update
+
 ```
 📊 UPDATE: Payment Service Incident

@@ -276,6 +293,7 @@ ETA to Resolution: ~15 minutes
 ```

 ### Resolution Notification
+
 ```
 ✅ RESOLVED: Payment Service Incident

@@ -291,7 +309,8 @@ Follow-up:
 - Postmortem scheduled for [DATE]
 - Bug fix in progress
 ```
-```
+
+````

 ### Template 2: Database Incident Runbook

@@ -325,9 +344,10 @@ SELECT pg_terminate_backend(pid)
 FROM pg_stat_activity
 WHERE state = 'idle'
 AND query_start < now() - interval '10 minutes';
-```
+````

 ## Replication Lag
+
 ```sql
 -- Check lag on replica
 SELECT
@@ -343,6 +363,7 @@ SELECT
 ```

 ## Disk Space Critical
+
 ```bash
 # Check disk usage
 df -h /var/lib/postgresql/data
@@ -358,6 +379,7 @@ psql -c "VACUUM FULL large_table;"

 # If emergency, delete old data or expand disk
 ```
+
 ```

 ## Best Practices
@@ -381,3 +403,4 @@ psql -c "VACUUM FULL large_table;"
 - [Google SRE Book - Incident Management](https://sre.google/sre-book/managing-incidents/)
 - [PagerDuty Incident Response](https://response.pagerduty.com/)
 - [Atlassian Incident Management](https://www.atlassian.com/incident-management)
+```
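
The fence changes in this diff (three backticks becoming four around blocks that contain their own fenced code) follow the CommonMark rule that a closing fence must use the same character as the opening fence and be at least as long. A wrapper therefore needs to be longer than the longest backtick fence it contains. The sketch below computes such a wrapper. `min_outer_fence` is a hypothetical name, and the `longest + 1` choice is a conservative assumption rather than a description of what Prettier does internally.

```python
import re

# A fence line: up to three spaces of indentation, then 3+ backticks.
_FENCE = re.compile(r"^ {0,3}(`{3,})", re.MULTILINE)


def min_outer_fence(content: str) -> str:
    """Return a backtick fence long enough to wrap `content` safely."""
    longest = max(
        (len(m.group(1)) for m in _FENCE.finditer(content)),
        default=0,
    )
    # At least three backticks, and strictly longer than any inner fence.
    return "`" * max(3, longest + 1)


# Build the inner block without writing a literal fence at line start.
tick3 = "`" * 3
inner = f"{tick3}bash\nkubectl get pods -n payments\n{tick3}"

fence = min_outer_fence(inner)
wrapped = f"{fence}markdown\n{inner}\n{fence}"
print(wrapped)
```

Running a check like this over a template before committing it can show whether an outer fence will end at the first inner closing fence instead of at the end of the template.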