agents/workflows/incident-response.md at ce7a5938c16057785df9d18386c11319ac5ca4e1

mirror of https://github.com/wshobson/agents.git synced 2026-03-18 09:37:15 +00:00


Seth Hobson ce7a5938c1 Consolidate workflows and tools from commands repository

Repository Restructure:
- Move all 83 agent .md files to agents/ subdirectory
- Add 15 workflow orchestrators from commands repo to workflows/
- Add 42 development tools from commands repo to tools/
- Update README for unified repository structure

This prepares the repository for unified plugin marketplace integration.
The commands repository functionality is now fully integrated, providing
complete workflow orchestration and development tooling alongside agents.

Directory Structure:
- agents/    - 83 specialized AI agents
- workflows/ - 15 multi-agent orchestration commands
- tools/     - 42 focused development utilities

No breaking changes to agent functionality - all agents remain accessible
with same names and behavior. Adds workflow and tool commands for enhanced
multi-agent coordination capabilities.

2025-10-08 08:25:17 -04:00

4.3 KiB


model: claude-opus-4-1

Respond to production incidents with coordinated agent expertise for rapid resolution:

[Extended thinking: This workflow handles production incidents with urgency and precision. Multiple specialized agents work together to identify root causes, implement fixes, and prevent recurrence.]

Phase 1: Immediate Response

1. Incident Assessment

Use Task tool with subagent_type="incident-responder"
Prompt: "URGENT: Assess production incident: $ARGUMENTS. Determine severity, impact, and immediate mitigation steps. Time is critical."
Output: Incident severity, impact assessment, immediate actions
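The `$ARGUMENTS` placeholder in each prompt stands for the incident description supplied when the workflow is invoked. As a rough illustration of that substitution (not a description of how the Task tool performs it), Python's `string.Template` happens to use the same `$NAME` syntax. The `render_prompt` helper and the incident text are made up for this sketch.

```python
from string import Template

# Prompt text copied from step 1; $ARGUMENTS is the only placeholder.
ASSESSMENT_PROMPT = Template(
    "URGENT: Assess production incident: $ARGUMENTS. "
    "Determine severity, impact, and immediate mitigation steps. "
    "Time is critical."
)


def render_prompt(template: Template, incident: str) -> str:
    """Fill the $ARGUMENTS placeholder with the incident description."""
    return template.substitute(ARGUMENTS=incident)


# Hypothetical incident description, for illustration only.
prompt = render_prompt(ASSESSMENT_PROMPT, "checkout API returning 502s")
print(prompt)
```

`substitute` raises `KeyError` if a placeholder is left unfilled, which is a convenient guard against sending an agent a prompt that still contains a literal `$ARGUMENTS`.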

2. Initial Troubleshooting

Use Task tool with subagent_type="devops-troubleshooter"
Prompt: "Investigate production issue: $ARGUMENTS. Check logs, metrics, recent deployments, and system health. Identify potential root causes."
Output: Initial findings, suspicious patterns, potential causes
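Steps 1 and 2 both take only `$ARGUMENTS` as input, so they are natural candidates for the parallel execution the coordination notes ask for. The sketch below shows one way that could look with `asyncio.gather`. `run_agent` is a hypothetical stand-in for a Task tool call and simply echoes its input; the prompts are shortened.

```python
import asyncio


async def run_agent(subagent_type: str, prompt: str) -> dict:
    """Hypothetical stand-in for a Task tool call; it only echoes its input."""
    await asyncio.sleep(0)  # placeholder for the real agent round trip
    return {"agent": subagent_type, "prompt": prompt}


async def immediate_response(incident: str) -> list:
    """Launch steps 1 and 2 together and wait for both results."""
    return await asyncio.gather(
        run_agent("incident-responder",
                  f"URGENT: Assess production incident: {incident}."),
        run_agent("devops-troubleshooter",
                  f"Investigate production issue: {incident}."),
    )


# Hypothetical incident description, for illustration only.
results = asyncio.run(immediate_response("checkout API returning 502s"))
```

`asyncio.gather` returns results in the order the awaitables were passed, so the assessment is always `results[0]` regardless of which agent finishes first.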

Phase 2: Root Cause Analysis

3. Deep Debugging

Use Task tool with subagent_type="debugger"
Prompt: "Debug production issue: $ARGUMENTS using findings from initial investigation. Analyze stack traces, reproduce issue if possible, identify exact root cause."
Output: Root cause identification, reproduction steps, debug analysis
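The step 3 prompt refers to "findings from initial investigation", which implies that Phase 1 outputs are carried into the debugger's prompt. A minimal sketch of that hand-off, assuming the outputs are available as plain strings keyed by agent name (the `with_findings` helper and the sample findings are hypothetical):

```python
def with_findings(prompt: str, findings: dict) -> str:
    """Append earlier step outputs to a prompt under a labelled section."""
    if not findings:
        return prompt
    lines = [f"- {agent}: {summary}" for agent, summary in findings.items()]
    return (
        prompt
        + "\n\nFindings from initial investigation:\n"
        + "\n".join(lines)
    )


# Hypothetical outputs from steps 1 and 2.
phase1 = {
    "incident-responder": "SEV1, checkout unavailable for all users",
    "devops-troubleshooter": "error spike began right after the latest deploy",
}
debug_prompt = with_findings(
    "Debug production issue: checkout API returning 502s using findings "
    "from initial investigation.",
    phase1,
)
print(debug_prompt)
```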

4. Performance Analysis (if applicable)

Use Task tool with subagent_type="performance-engineer"
Prompt: "Analyze performance aspects of incident: $ARGUMENTS. Check for resource exhaustion, bottlenecks, or performance degradation."
Output: Performance metrics, resource analysis, bottleneck identification

5. Database Investigation (if applicable)

Use Task tool with subagent_type="database-optimizer"
Prompt: "Investigate database-related aspects of incident: $ARGUMENTS. Check for locks, slow queries, connection issues, or data corruption."
Output: Database health report, query analysis, data integrity check
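Steps 4 and 5 are marked "if applicable", but the workflow does not say how applicability is decided. In practice the orchestrating model would probably make that call. The sketch below shows a deliberately naive alternative, keyword matching against the earlier findings, purely to make the branching concrete. The hint words are assumptions, not part of the workflow.

```python
# Hypothetical keyword hints; substring matching is crude (for example,
# "lock" also matches "clock"), so treat this as an illustration only.
OPTIONAL_STEPS = {
    "performance-engineer": ("latency", "timeout", "cpu", "memory", "slow"),
    "database-optimizer": ("database", "query", "lock", "connection pool"),
}


def applicable_agents(findings: str) -> list:
    """Return the optional Phase 2 agents whose hint words appear in the findings."""
    text = findings.lower()
    return [
        agent
        for agent, hints in OPTIONAL_STEPS.items()
        if any(hint in text for hint in hints)
    ]


print(applicable_agents("p99 latency tripled; DB connection pool exhausted"))
```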

Phase 3: Resolution Implementation

6. Fix Development

Use Task tool with subagent_type="backend-architect"
Prompt: "Design and implement fix for incident: $ARGUMENTS based on root cause analysis. Ensure fix is safe for immediate production deployment."
Output: Fix implementation, safety analysis, rollout strategy

7. Emergency Deployment

Use Task tool with subagent_type="deployment-engineer"
Prompt: "Deploy emergency fix for incident: $ARGUMENTS. Implement with minimal risk, include rollback plan, and monitor deployment closely."
Output: Deployment execution, rollback procedures, monitoring setup
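The deployment prompt asks for a rollback plan and close monitoring. One way to picture that contract is a wrapper that reverts automatically when the fix cannot be applied or a post-deploy health check fails. This is a toy sketch: `deploy_with_rollback` and its three callables are hypothetical, and a real deployment would go through whatever tooling the team already uses.

```python
from typing import Callable


def deploy_with_rollback(
    apply_fix: Callable[[], None],
    rollback: Callable[[], None],
    is_healthy: Callable[[], bool],
) -> bool:
    """Apply a fix; roll back if applying raises or the health check fails."""
    try:
        apply_fix()
    except Exception:
        rollback()
        raise
    if not is_healthy():
        rollback()
        return False
    return True


# Toy usage: an in-memory "version" stands in for a real deployment.
state = {"version": "v1"}
ok = deploy_with_rollback(
    apply_fix=lambda: state.update(version="v2"),
    rollback=lambda: state.update(version="v1"),
    is_healthy=lambda: False,  # simulate a failed post-deploy check
)
```

Here the simulated health check fails, so the function returns `False` and `state` is back at `"v1"`.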

Phase 4: Stabilization and Prevention

8. System Stabilization

Use Task tool with subagent_type="devops-troubleshooter"
Prompt: "Stabilize system after incident fix: $ARGUMENTS. Monitor system health, clear any backlogs, and ensure full recovery."
Output: System health report, recovery metrics, stability confirmation

9. Security Review (if applicable)

Use Task tool with subagent_type="security-auditor"
Prompt: "Review security implications of incident: $ARGUMENTS. Check for any security breaches, data exposure, or vulnerabilities exploited."
Output: Security assessment, breach analysis, hardening recommendations

Phase 5: Post-Incident Activities

10. Monitoring Enhancement

Use Task tool with subagent_type="devops-troubleshooter"
Prompt: "Enhance monitoring to prevent recurrence of: $ARGUMENTS. Add alerts, improve observability, and set up early warning systems."
Output: New monitoring rules, alert configurations, observability improvements
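As one illustration of an "early warning" rule, the sketch below tracks the error rate over a sliding window of recent requests and fires when it exceeds a threshold. The class name, window size, and threshold are assumptions chosen for the example; real alerting would normally live in the monitoring system rather than in application code.

```python
from collections import deque


class ErrorRateAlert:
    """Fire when the error rate over the last `window` requests exceeds `threshold`."""

    def __init__(self, window: int = 100, threshold: float = 0.05) -> None:
        # deque with maxlen discards the oldest outcome automatically.
        self.outcomes = deque(maxlen=window)
        self.threshold = threshold

    def record(self, is_error: bool) -> bool:
        """Record one request outcome and return True if the alert should fire."""
        self.outcomes.append(is_error)
        error_rate = sum(self.outcomes) / len(self.outcomes)
        return error_rate > self.threshold


# Toy traffic: eight successes followed by three errors.
alert = ErrorRateAlert(window=10, threshold=0.2)
fired = [alert.record(e) for e in [False] * 8 + [True] * 3]
```

With this input the alert fires only on the final request, when three of the last ten outcomes are errors.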

11. Test Coverage

Use Task tool with subagent_type="test-automator"
Prompt: "Create tests to prevent regression of incident: $ARGUMENTS. Include unit tests, integration tests, and chaos engineering scenarios."
Output: Test implementations, regression prevention, chaos tests

12. Documentation

Use Task tool with subagent_type="incident-responder"
Prompt: "Document incident postmortem for: $ARGUMENTS. Include timeline, root cause, impact, resolution, and lessons learned. No blame, focus on improvement."
Output: Postmortem document, action items, process improvements
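The sections requested in the step 12 prompt (timeline, root cause, impact, resolution, lessons learned) map naturally onto a small record type. The sketch below is one possible shape for that record, with a Markdown renderer. The `Postmortem` class and its layout are assumptions, not a format the workflow prescribes.

```python
from dataclasses import dataclass, field


@dataclass
class Postmortem:
    """Fields mirror the sections requested in the step 12 prompt."""

    incident: str
    timeline: list = field(default_factory=list)
    root_cause: str = ""
    impact: str = ""
    resolution: str = ""
    lessons_learned: list = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the postmortem as a small Markdown document."""

        def bullets(items: list) -> str:
            return "\n".join(f"- {item}" for item in items) or "- (none recorded)"

        return "\n\n".join([
            f"# Postmortem: {self.incident}",
            "## Timeline\n" + bullets(self.timeline),
            f"## Root cause\n{self.root_cause}",
            f"## Impact\n{self.impact}",
            f"## Resolution\n{self.resolution}",
            "## Lessons learned\n" + bullets(self.lessons_learned),
        ])


# Hypothetical values, for illustration only.
report = Postmortem(
    incident="checkout API returning 502s",
    timeline=["Alert fired", "Fix deployed"],
    root_cause="Connection pool exhausted",
)
print(report.to_markdown())
```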

Coordination Notes

Speed is critical in the early phases; run agents in parallel where possible
Communication between agents must be clear and rapid
All changes must be safe and reversible
Document everything for postmortem analysis
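Taken together, the twelve steps form an ordered pipeline of five phases. Writing them down as data makes the structure easy to inspect and gives an orchestrator something to iterate over. The sketch below transcribes the step numbers, names, and `subagent_type` values from the workflow above; the `Step` type and `by_phase` helper are hypothetical.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Step:
    number: int
    phase: int
    name: str
    subagent_type: str
    optional: bool = False  # True for steps marked "(if applicable)"


# Transcribed from the workflow above, in execution order.
STEPS = [
    Step(1, 1, "Incident Assessment", "incident-responder"),
    Step(2, 1, "Initial Troubleshooting", "devops-troubleshooter"),
    Step(3, 2, "Deep Debugging", "debugger"),
    Step(4, 2, "Performance Analysis", "performance-engineer", optional=True),
    Step(5, 2, "Database Investigation", "database-optimizer", optional=True),
    Step(6, 3, "Fix Development", "backend-architect"),
    Step(7, 3, "Emergency Deployment", "deployment-engineer"),
    Step(8, 4, "System Stabilization", "devops-troubleshooter"),
    Step(9, 4, "Security Review", "security-auditor", optional=True),
    Step(10, 5, "Monitoring Enhancement", "devops-troubleshooter"),
    Step(11, 5, "Test Coverage", "test-automator"),
    Step(12, 5, "Documentation", "incident-responder"),
]


def by_phase(steps: list) -> dict:
    """Group steps by phase number; assumes steps are already in phase order."""
    return {
        phase: list(group)
        for phase, group in groupby(steps, key=lambda s: s.phase)
    }


for phase, steps in by_phase(STEPS).items():
    print(phase, [s.subagent_type for s in steps])
```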

Production incident: $ARGUMENTS
