mirror of https://github.com/wshobson/agents.git synced 2026-03-18 17:47:16 +00:00

Files

Seth Hobson 65e5cb093a feat: add Agent Skills and restructure documentation

- Add 47 Agent Skills across 14 plugins following Anthropic's specification
  - Python (5): async patterns, testing, packaging, performance, UV package manager
  - JavaScript/TypeScript (4): advanced types, Node.js patterns, testing, modern JS
  - Kubernetes (4): manifests, Helm charts, GitOps, security policies
  - Cloud Infrastructure (4): Terraform, multi-cloud, hybrid networking, cost optimization
  - CI/CD (4): pipeline design, GitHub Actions, GitLab CI, secrets management
  - Backend (3): API design, architecture patterns, microservices
  - LLM Applications (4): LangChain, prompt engineering, RAG, evaluation
  - Blockchain/Web3 (4): DeFi protocols, NFT standards, Solidity security, Web3 testing
  - Framework Migration (4): React, Angular, database, dependency upgrades
  - Observability (4): Prometheus, Grafana, distributed tracing, SLO
  - Payment Processing (4): Stripe, PayPal, PCI compliance, billing
  - API Scaffolding (1): FastAPI templates
  - ML Operations (1): ML pipeline workflow
  - Security (1): SAST configuration

- Restructure documentation into /docs directory
  - agent-skills.md: Complete guide to all 47 skills
  - agents.md: All 85 agents with model configuration
  - plugins.md: Complete catalog of 63 plugins
  - usage.md: Commands, workflows, and best practices
  - architecture.md: Design principles and patterns

- Update README.md
  - Add Agent Skills banner announcement
  - Reduce length by ~75% with links to detailed docs
  - Add What's New section showcasing Agent Skills
  - Add Popular Use Cases with real examples
  - Improve navigation with Core Guides and Quick Links

- Update marketplace.json with skills arrays for 14 plugins

All 47 skills follow Agent Skills Specification:
- Required YAML frontmatter (name, description)
- Use when activation clauses
- Progressive disclosure architecture
- Under 1024 character descriptions

2025-10-16 20:33:27 -04:00

8.1 KiB

Raw Blame History

name, description

name	description
grafana-dashboards	Create and manage production Grafana dashboards for real-time visualization of system and application metrics. Use when building monitoring dashboards, visualizing metrics, or creating operational observability interfaces.

Grafana Dashboards

Create and manage production-ready Grafana dashboards for comprehensive system observability.

Purpose

Design effective Grafana dashboards for monitoring applications, infrastructure, and business metrics.

When to Use

Visualize Prometheus metrics
Create custom dashboards
Implement SLO dashboards
Monitor infrastructure
Track business KPIs

Dashboard Design Principles

1. Hierarchy of Information

┌─────────────────────────────────────┐
│  Critical Metrics (Big Numbers)     │
├─────────────────────────────────────┤
│  Key Trends (Time Series)           │
├─────────────────────────────────────┤
│  Detailed Metrics (Tables/Heatmaps) │
└─────────────────────────────────────┘

2. RED Method (Services)

Rate - Requests per second
Errors - Error rate
Duration - Latency/response time

3. USE Method (Resources)

Utilization - % time resource is busy
Saturation - Queue length/wait time
Errors - Error count

Dashboard Structure

API Monitoring Dashboard

{
  "dashboard": {
    "title": "API Monitoring",
    "tags": ["api", "production"],
    "timezone": "browser",
    "refresh": "30s",
    "panels": [
      {
        "title": "Request Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "sum(rate(http_requests_total[5m])) by (service)",
            "legendFormat": "{{service}}"
          }
        ],
        "gridPos": {"x": 0, "y": 0, "w": 12, "h": 8}
      },
      {
        "title": "Error Rate %",
        "type": "graph",
        "targets": [
          {
            "expr": "(sum(rate(http_requests_total{status=~\"5..\"}[5m])) / sum(rate(http_requests_total[5m]))) * 100",
            "legendFormat": "Error Rate"
          }
        ],
        "alert": {
          "conditions": [
            {
              "evaluator": {"params": [5], "type": "gt"},
              "operator": {"type": "and"},
              "query": {"params": ["A", "5m", "now"]},
              "type": "query"
            }
          ]
        },
        "gridPos": {"x": 12, "y": 0, "w": 12, "h": 8}
      },
      {
        "title": "P95 Latency",
        "type": "graph",
        "targets": [
          {
            "expr": "histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service))",
            "legendFormat": "{{service}}"
          }
        ],
        "gridPos": {"x": 0, "y": 8, "w": 24, "h": 8}
      }
    ]
  }
}

Reference: See assets/api-dashboard.json

Panel Types

1. Stat Panel (Single Value)

{
  "type": "stat",
  "title": "Total Requests",
  "targets": [{
    "expr": "sum(http_requests_total)"
  }],
  "options": {
    "reduceOptions": {
      "values": false,
      "calcs": ["lastNotNull"]
    },
    "orientation": "auto",
    "textMode": "auto",
    "colorMode": "value"
  },
  "fieldConfig": {
    "defaults": {
      "thresholds": {
        "mode": "absolute",
        "steps": [
          {"value": 0, "color": "green"},
          {"value": 80, "color": "yellow"},
          {"value": 90, "color": "red"}
        ]
      }
    }
  }
}

2. Time Series Graph

{
  "type": "graph",
  "title": "CPU Usage",
  "targets": [{
    "expr": "100 - (avg by (instance) (rate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100)"
  }],
  "yaxes": [
    {"format": "percent", "max": 100, "min": 0},
    {"format": "short"}
  ]
}

3. Table Panel

{
  "type": "table",
  "title": "Service Status",
  "targets": [{
    "expr": "up",
    "format": "table",
    "instant": true
  }],
  "transformations": [
    {
      "id": "organize",
      "options": {
        "excludeByName": {"Time": true},
        "indexByName": {},
        "renameByName": {
          "instance": "Instance",
          "job": "Service",
          "Value": "Status"
        }
      }
    }
  ]
}

4. Heatmap

{
  "type": "heatmap",
  "title": "Latency Heatmap",
  "targets": [{
    "expr": "sum(rate(http_request_duration_seconds_bucket[5m])) by (le)",
    "format": "heatmap"
  }],
  "dataFormat": "tsbuckets",
  "yAxis": {
    "format": "s"
  }
}

Variables

Query Variables

{
  "templating": {
    "list": [
      {
        "name": "namespace",
        "type": "query",
        "datasource": "Prometheus",
        "query": "label_values(kube_pod_info, namespace)",
        "refresh": 1,
        "multi": false
      },
      {
        "name": "service",
        "type": "query",
        "datasource": "Prometheus",
        "query": "label_values(kube_service_info{namespace=\"$namespace\"}, service)",
        "refresh": 1,
        "multi": true
      }
    ]
  }
}

Use Variables in Queries

sum(rate(http_requests_total{namespace="$namespace", service=~"$service"}[5m]))

Alerts in Dashboards

{
  "alert": {
    "name": "High Error Rate",
    "conditions": [
      {
        "evaluator": {
          "params": [5],
          "type": "gt"
        },
        "operator": {"type": "and"},
        "query": {
          "params": ["A", "5m", "now"]
        },
        "reducer": {"type": "avg"},
        "type": "query"
      }
    ],
    "executionErrorState": "alerting",
    "for": "5m",
    "frequency": "1m",
    "message": "Error rate is above 5%",
    "noDataState": "no_data",
    "notifications": [
      {"uid": "slack-channel"}
    ]
  }
}

Dashboard Provisioning

dashboards.yml:

apiVersion: 1

providers:
  - name: 'default'
    orgId: 1
    folder: 'General'
    type: file
    disableDeletion: false
    updateIntervalSeconds: 10
    allowUiUpdates: true
    options:
      path: /etc/grafana/dashboards

Common Dashboard Patterns

Infrastructure Dashboard

Key Panels:

CPU utilization per node
Memory usage per node
Disk I/O
Network traffic
Pod count by namespace
Node status

Reference: See assets/infrastructure-dashboard.json

Database Dashboard

Key Panels:

Queries per second
Connection pool usage
Query latency (P50, P95, P99)
Active connections
Database size
Replication lag
Slow queries

Reference: See assets/database-dashboard.json

Application Dashboard

Key Panels:

Request rate
Error rate
Response time (percentiles)
Active users/sessions
Cache hit rate
Queue length

Best Practices

Start with templates (Grafana community dashboards)
Use consistent naming for panels and variables
Group related metrics in rows
Set appropriate time ranges (default: Last 6 hours)
Use variables for flexibility
Add panel descriptions for context
Configure units correctly
Set meaningful thresholds for colors
Use consistent colors across dashboards
Test with different time ranges

Dashboard as Code

Terraform Provisioning

resource "grafana_dashboard" "api_monitoring" {
  config_json = file("${path.module}/dashboards/api-monitoring.json")
  folder      = grafana_folder.monitoring.id
}

resource "grafana_folder" "monitoring" {
  title = "Production Monitoring"
}

Ansible Provisioning

- name: Deploy Grafana dashboards
  copy:
    src: "{{ item }}"
    dest: /etc/grafana/dashboards/
  with_fileglob:
    - "dashboards/*.json"
  notify: restart grafana

Reference Files

assets/api-dashboard.json - API monitoring dashboard
assets/infrastructure-dashboard.json - Infrastructure dashboard
assets/database-dashboard.json - Database monitoring dashboard
references/dashboard-design.md - Dashboard design guide

prometheus-configuration - For metric collection
slo-implementation - For SLO dashboards

8.1 KiB Raw Blame History

Grafana Dashboards

Purpose

When to Use

Dashboard Design Principles

1. Hierarchy of Information

2. RED Method (Services)

3. USE Method (Resources)

Dashboard Structure

API Monitoring Dashboard

Panel Types

1. Stat Panel (Single Value)

2. Time Series Graph

3. Table Panel

4. Heatmap

Variables

Query Variables

Use Variables in Queries

Alerts in Dashboards

Dashboard Provisioning

Common Dashboard Patterns

Infrastructure Dashboard

Database Dashboard

Application Dashboard

Best Practices

Dashboard as Code

Terraform Provisioning

Ansible Provisioning

Reference Files

Related Skills

8.1 KiB

Raw Blame History