Add Comprehensive Python Development Skills (#419)

* Add extra python skills covering code style, design patterns, resilience, resource management, testing patterns, and type safety ...etc

* fix: correct code examples in Python skills
  - Clarify Python version requirements for type statement (3.10+ vs 3.12+)
  - Add missing ValidationError import in configuration example
  - Add missing httpx import and url parameter in async example

---------

Co-authored-by: Seth Hobson <wshobson@gmail.com>
2026-03-18 09:37:15 +00:00 · 2026-01-30 17:52:14 +01:00
parent f9e9598241
commit cbb60494b1
15 changed files with 4311 additions and 18 deletions
--- a/plugins/python-development/skills/python-resilience/SKILL.md
+++ b/plugins/python-development/skills/python-resilience/SKILL.md
@@ -0,0 +1,376 @@
+---
+name: python-resilience
+description: Python resilience patterns including automatic retries, exponential backoff, timeouts, and fault-tolerant decorators. Use when adding retry logic, implementing timeouts, building fault-tolerant services, or handling transient failures.
+---
+
+# Python Resilience Patterns
+
+Build fault-tolerant Python applications that gracefully handle transient failures, network issues, and service outages. Resilience patterns keep systems running when dependencies are unreliable.
+
+## When to Use This Skill
+
+- Adding retry logic to external service calls
+- Implementing timeouts for network operations
+- Building fault-tolerant microservices
+- Handling rate limiting and backpressure
+- Creating infrastructure decorators
+- Designing circuit breakers
+
+## Core Concepts
+
+### 1. Transient vs Permanent Failures
+
+Retry transient errors (network timeouts, temporary service issues). Don't retry permanent errors (invalid credentials, bad requests).
+
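One way to make the distinction concrete is a small classifier that a retry layer can consult. This is only a sketch: `is_transient` and `TRANSIENT_STATUS_CODES` are hypothetical names, and the status codes are an assumed set chosen to match the patterns later in this skill.

```python
from __future__ import annotations

# Assumed set of status codes that usually signal a temporary condition.
TRANSIENT_STATUS_CODES = {429, 502, 503, 504}


def is_transient(error: Exception | None = None, status_code: int | None = None) -> bool:
    """Return True when a failure is likely to succeed on a later attempt."""
    if status_code is not None:
        return status_code in TRANSIENT_STATUS_CODES
    # Network-level problems are usually temporary.
    if isinstance(error, (ConnectionError, TimeoutError)):
        return True
    # Everything else (ValueError, TypeError, auth failures, ...) is treated as permanent.
    return False


print(is_transient(ConnectionError("connection reset")))  # True
print(is_transient(ValueError("bad input")))              # False
print(is_transient(status_code=401))                      # False
```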
+### 2. Exponential Backoff
+
+Increase the wait time between retries, typically by doubling it after each failed attempt, to avoid overwhelming a service that is still recovering.
+
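The resulting schedule is easy to compute by hand. The following sketch assumes a doubling factor and a 10-second cap; `backoff_delays` is a hypothetical helper, not a tenacity function.

```python
def backoff_delays(
    initial: float = 1.0,
    factor: float = 2.0,
    max_delay: float = 10.0,
    attempts: int = 5,
) -> list:
    """Delay before each retry: initial * factor**n, capped at max_delay."""
    return [min(initial * factor**n, max_delay) for n in range(attempts)]


print(backoff_delays())  # [1.0, 2.0, 4.0, 8.0, 10.0]
```

The cap matters: without it, the fifth delay would already be 16 seconds and later delays would keep doubling.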
+### 3. Jitter
+
+Add randomness to the backoff delay to prevent a thundering herd when many clients retry at the same moment.
+
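A minimal sketch of one common variant, often called "full jitter", where the delay is drawn uniformly between zero and the exponential cap. `full_jitter_delay` is a hypothetical helper, and this is not necessarily the formula `wait_exponential_jitter` uses internally.

```python
from __future__ import annotations

import random


def full_jitter_delay(
    attempt: int,
    initial: float = 1.0,
    max_delay: float = 10.0,
    rng: random.Random | None = None,
) -> float:
    """Pick a random delay in [0, min(initial * 2**attempt, max_delay)]."""
    source = rng or random  # allow a seeded generator for reproducible tests
    cap = min(initial * 2**attempt, max_delay)
    return source.uniform(0, cap)


# Two clients that fail at the same time will most likely wait different amounts.
seeded = random.Random(42)
delays = [full_jitter_delay(attempt, rng=seeded) for attempt in range(4)]
```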
+### 4. Bounded Retries
+
+Cap both attempt count and total duration to prevent infinite retry loops.
+
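Both bounds can be enforced in a plain loop. The sketch below uses only the standard library; `call_with_bounds` and its parameters are hypothetical, and the `sleep` argument exists only so tests can skip the real waiting.

```python
import time
from collections.abc import Callable
from typing import TypeVar

T = TypeVar("T")


def call_with_bounds(
    func: Callable[[], T],
    max_attempts: int = 3,
    max_seconds: float = 60.0,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry func on transient errors, bounded by attempt count and total time."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    deadline = time.monotonic() + max_seconds
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except (ConnectionError, TimeoutError):
            delay = base_delay * 2 ** (attempt - 1)
            out_of_attempts = attempt == max_attempts
            out_of_time = time.monotonic() + delay > deadline
            if out_of_attempts or out_of_time:
                raise  # give up and surface the last error
            sleep(delay)
    raise AssertionError("unreachable")
```

Checking the deadline before sleeping avoids waiting for a retry that could never start in time.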
+## Quick Start
+
+```python
+import httpx
+from tenacity import retry, stop_after_attempt, wait_exponential_jitter
+
+@retry(
+    stop=stop_after_attempt(3),
+    wait=wait_exponential_jitter(initial=1, max=10),
+)
+def call_external_service(request: dict) -> dict:
+    return httpx.post("https://api.example.com", json=request).json()
+```
+
+## Fundamental Patterns
+
+### Pattern 1: Basic Retry with Tenacity
+
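+Use the `tenacity` library for production-grade retry logic. For simpler cases, consider the retry support built into your HTTP client (for example, the `retries` option of `httpx.HTTPTransport`, which only retries failed connections) or a lightweight custom implementation.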
+
+```python
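+import httpx
+from tenacity import (
+    retry,
+    stop_after_attempt,
+    stop_after_delay,
+    wait_exponential_jitter,
+    retry_if_exception_type,
+)
+
+# httpx network failures derive from httpx.TransportError, not the built-in
+# ConnectionError. A bare OSError is left out because it would also retry
+# permanent errors such as FileNotFoundError or PermissionError.
+TRANSIENT_ERRORS = (httpx.TransportError, ConnectionError, TimeoutError)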
+
+@retry(
+    retry=retry_if_exception_type(TRANSIENT_ERRORS),
+    stop=stop_after_attempt(5) | stop_after_delay(60),
+    wait=wait_exponential_jitter(initial=1, max=30),
+)
+def fetch_data(url: str) -> dict:
+    """Fetch data with automatic retry on transient failures."""
+    response = httpx.get(url, timeout=30)
+    response.raise_for_status()
+    return response.json()
+```
+
+### Pattern 2: Retry Only Appropriate Errors
+
+Whitelist specific transient exceptions. Never retry:
+
+- `ValueError`, `TypeError` - These are bugs, not transient issues
+- `AuthenticationError` - Invalid credentials won't become valid
+- HTTP 4xx errors (except 429) - Client errors are permanent
+
+```python
+from tenacity import (
+    retry,
+    retry_if_exception_type,
+    stop_after_attempt,
+    wait_exponential_jitter,
+)
+import httpx
+
+# Define what's retryable
+RETRYABLE_EXCEPTIONS = (
+    ConnectionError,
+    TimeoutError,
+    httpx.ConnectTimeout,
+    httpx.ReadTimeout,
+)
+
+@retry(
+    retry=retry_if_exception_type(RETRYABLE_EXCEPTIONS),
+    stop=stop_after_attempt(3),
+    wait=wait_exponential_jitter(initial=1, max=10),
+)
+def resilient_api_call(endpoint: str) -> dict:
+    """Make API call with retry on network issues."""
+    return httpx.get(endpoint, timeout=10).json()
+```
+
+### Pattern 3: HTTP Status Code Retries
+
+Retry specific HTTP status codes that indicate transient issues.
+
+```python
+from tenacity import retry, retry_if_result, stop_after_attempt, wait_exponential_jitter
+import httpx
+
+RETRY_STATUS_CODES = {429, 502, 503, 504}
+
+def should_retry_response(response: httpx.Response) -> bool:
+    """Check if response indicates a retryable error."""
+    return response.status_code in RETRY_STATUS_CODES
+
+@retry(
+    retry=retry_if_result(should_retry_response),
+    stop=stop_after_attempt(3),
+    wait=wait_exponential_jitter(initial=1, max=10),
+)
+def http_request(method: str, url: str, **kwargs) -> httpx.Response:
+    """Make HTTP request with retry on transient status codes.
+
+    If every attempt returns a retryable status, tenacity raises RetryError.
+    """
+    return httpx.request(method, url, timeout=30, **kwargs)
+```
+
+### Pattern 4: Combined Exception and Status Retry
+
+Handle both network exceptions and HTTP status codes.
+
+```python
+from tenacity import (
+    retry,
+    retry_if_exception_type,
+    retry_if_result,
+    stop_after_attempt,
+    wait_exponential_jitter,
+    before_sleep_log,
+)
+import logging
+import httpx
+
+logger = logging.getLogger(__name__)
+
+TRANSIENT_EXCEPTIONS = (
+    ConnectionError,
+    TimeoutError,
+    httpx.ConnectError,
+    httpx.ReadTimeout,
+)
+RETRY_STATUS_CODES = {429, 500, 502, 503, 504}
+
+def is_retryable_response(response: httpx.Response) -> bool:
+    return response.status_code in RETRY_STATUS_CODES
+
+@retry(
+    retry=(
+        retry_if_exception_type(TRANSIENT_EXCEPTIONS) |
+        retry_if_result(is_retryable_response)
+    ),
+    stop=stop_after_attempt(5),
+    wait=wait_exponential_jitter(initial=1, max=30),
+    before_sleep=before_sleep_log(logger, logging.WARNING),
+)
+def robust_http_call(
+    method: str,
+    url: str,
+    **kwargs,
+) -> httpx.Response:
+    """HTTP call with comprehensive retry handling."""
+    return httpx.request(method, url, timeout=30, **kwargs)
+```
+
+## Advanced Patterns
+
+### Pattern 5: Logging Retry Attempts
+
+Track retry behavior for debugging and alerting.
+
+```python
+from tenacity import retry, stop_after_attempt, wait_exponential
+import structlog
+
+logger = structlog.get_logger()
+
+def log_retry_attempt(retry_state):
+    """Log detailed retry information."""
+    exception = retry_state.outcome.exception()
+    logger.warning(
+        "Retrying operation",
+        attempt=retry_state.attempt_number,
+        exception_type=type(exception).__name__,
+        exception_message=str(exception),
+        next_wait_seconds=retry_state.next_action.sleep if retry_state.next_action else None,
+    )
+
+@retry(
+    stop=stop_after_attempt(3),
+    wait=wait_exponential(multiplier=1, max=10),
+    before_sleep=log_retry_attempt,
+)
+def call_with_logging(request: dict) -> dict:
+    """External call with retry logging."""
+    ...
+```
+
+### Pattern 6: Timeout Decorator
+
+Create reusable timeout decorators for consistent timeout handling.
+
+```python
+import asyncio
+from collections.abc import Awaitable, Callable
+from functools import wraps
+from typing import TypeVar
+
+import httpx
+
+T = TypeVar("T")
+
+def with_timeout(seconds: float):
+    """Decorator to add timeout to async functions."""
+    def decorator(func: Callable[..., Awaitable[T]]) -> Callable[..., Awaitable[T]]:
+        @wraps(func)
+        async def wrapper(*args, **kwargs) -> T:
+            # Cancels the wrapped coroutine and raises asyncio.TimeoutError
+            # when the deadline passes.
+            return await asyncio.wait_for(
+                func(*args, **kwargs),
+                timeout=seconds,
+            )
+        return wrapper
+    return decorator
+
+@with_timeout(30)
+async def fetch_with_timeout(url: str) -> dict:
+    """Fetch URL with 30 second timeout."""
+    async with httpx.AsyncClient() as client:
+        response = await client.get(url)
+        return response.json()
+```
+
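To see what a caller observes when the deadline passes, the following stdlib-only sketch re-declares a decorator of the same shape and applies it to a deliberately slow coroutine. `slow_operation` and `fetch_or_none` are hypothetical names used only for illustration.

```python
from __future__ import annotations

import asyncio
from functools import wraps


def with_timeout(seconds: float):
    """Same shape as the decorator above, re-declared so this block runs alone."""
    def decorator(func):
        @wraps(func)
        async def wrapper(*args, **kwargs):
            return await asyncio.wait_for(func(*args, **kwargs), timeout=seconds)
        return wrapper
    return decorator


@with_timeout(0.05)
async def slow_operation() -> str:
    await asyncio.sleep(1)  # stands in for a slow network call
    return "done"


async def fetch_or_none() -> str | None:
    """Translate a timeout into a None result for the caller."""
    try:
        return await slow_operation()
    except asyncio.TimeoutError:
        return None


result = asyncio.run(fetch_or_none())
print(result)  # None
```

Catching `asyncio.TimeoutError` works across Python versions; since Python 3.11 it is an alias of the built-in `TimeoutError`.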
+### Pattern 7: Cross-Cutting Concerns via Decorators
+
+Stack decorators to separate infrastructure from business logic.
+
+```python
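+from collections.abc import Awaitable, Callable
+from functools import wraps
+from typing import TypeVar
+
+import structlog
+from tenacity import retry, stop_after_attempt, wait_exponential_jitter
+
+# with_timeout is the decorator defined in Pattern 6.
+logger = structlog.get_logger()
+T = TypeVar("T")
+
+def traced(name: str | None = None):
+    """Add tracing to function calls."""
+    def decorator(func: Callable[..., Awaitable[T]]) -> Callable[..., Awaitable[T]]: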
+        span_name = name or func.__name__
+
+        @wraps(func)
+        async def wrapper(*args, **kwargs) -> T:
+            logger.info("Operation started", operation=span_name)
+            try:
+                result = await func(*args, **kwargs)
+                logger.info("Operation completed", operation=span_name)
+                return result
+            except Exception as e:
+                logger.error("Operation failed", operation=span_name, error=str(e))
+                raise
+        return wrapper
+    return decorator
+
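+# Stack multiple concerns. Decorators apply bottom-up: retry wraps the call,
+# the 30-second timeout bounds all attempts combined, and tracing wraps both.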
+@traced("fetch_user_data")
+@with_timeout(30)
+@retry(stop=stop_after_attempt(3), wait=wait_exponential_jitter())
+async def fetch_user_data(user_id: str) -> dict:
+    """Fetch user with tracing, timeout, and retry."""
+    ...
+```
+
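The order of stacked decorators changes what the timeout covers. The sketch below illustrates this with the standard library only; `simple_retry` is a hypothetical stand-in for tenacity's `retry` (no backoff, retries on timeout only), and `with_timeout` is re-declared so the block runs on its own.

```python
import asyncio
from functools import wraps


def with_timeout(seconds: float):
    def decorator(func):
        @wraps(func)
        async def wrapper(*args, **kwargs):
            return await asyncio.wait_for(func(*args, **kwargs), timeout=seconds)
        return wrapper
    return decorator


def simple_retry(attempts: int):
    """Hypothetical retry: re-run the coroutine when it times out."""
    def decorator(func):
        @wraps(func)
        async def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return await func(*args, **kwargs)
                except asyncio.TimeoutError:
                    if attempt == attempts:
                        raise
        return wrapper
    return decorator


per_attempt_calls = []
overall_calls = []


@simple_retry(3)      # outermost: retries each timed-out attempt
@with_timeout(0.01)   # innermost: every attempt gets its own deadline
async def timeout_per_attempt() -> None:
    per_attempt_calls.append(1)
    await asyncio.sleep(0.2)


@with_timeout(0.01)   # outermost: one deadline for all attempts combined
@simple_retry(3)
async def timeout_overall() -> None:
    overall_calls.append(1)
    await asyncio.sleep(0.2)


async def run_and_ignore_timeout(func) -> None:
    try:
        await func()
    except asyncio.TimeoutError:
        pass


asyncio.run(run_and_ignore_timeout(timeout_per_attempt))
asyncio.run(run_and_ignore_timeout(timeout_overall))
print(len(per_attempt_calls), len(overall_calls))  # 3 1
```

With the timeout innermost, all three attempts run; with the timeout outermost, the single deadline expires during the first attempt and no retry happens.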
+### Pattern 8: Dependency Injection for Testability
+
+Pass infrastructure components through constructors for easy testing.
+
+```python
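+import time
+from dataclasses import dataclass
+from typing import Protocol
+
+# UserRepository, User, and the Fake* classes used below are assumed to be defined elsewhere.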
+
+class Logger(Protocol):
+    def info(self, msg: str, **kwargs) -> None: ...
+    def error(self, msg: str, **kwargs) -> None: ...
+
+class MetricsClient(Protocol):
+    def increment(self, metric: str, tags: dict | None = None) -> None: ...
+    def timing(self, metric: str, value: float) -> None: ...
+
+@dataclass
+class UserService:
+    """Service with injected infrastructure."""
+
+    repository: UserRepository
+    logger: Logger
+    metrics: MetricsClient
+
+    async def get_user(self, user_id: str) -> User:
+        self.logger.info("Fetching user", user_id=user_id)
+        start = time.perf_counter()
+
+        try:
+            user = await self.repository.get(user_id)
+            self.metrics.increment("user.fetch.success")
+            return user
+        except Exception as e:
+            self.metrics.increment("user.fetch.error")
+            self.logger.error("Failed to fetch user", user_id=user_id, error=str(e))
+            raise
+        finally:
+            elapsed = time.perf_counter() - start
+            self.metrics.timing("user.fetch.duration", elapsed)
+
+# Easy to test with fakes
+service = UserService(
+    repository=FakeRepository(),
+    logger=FakeLogger(),
+    metrics=FakeMetrics(),
+)
+```
+
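The fakes themselves can be very small. The following is a sketch of what they might look like, assuming the repository exposes an async `get` method as in the service above; the attribute names (`records`, `counters`, `timings`) are hypothetical.

```python
import asyncio


class FakeLogger:
    """Records log calls instead of emitting them."""

    def __init__(self):
        self.records = []

    def info(self, msg, **kwargs):
        self.records.append(("info", msg, kwargs))

    def error(self, msg, **kwargs):
        self.records.append(("error", msg, kwargs))


class FakeMetrics:
    """Keeps counters and timings in memory for assertions."""

    def __init__(self):
        self.counters = {}
        self.timings = []

    def increment(self, metric, tags=None):
        self.counters[metric] = self.counters.get(metric, 0) + 1

    def timing(self, metric, value):
        self.timings.append((metric, value))


class FakeRepository:
    """In-memory repository; raises KeyError for unknown ids."""

    def __init__(self, users):
        self._users = dict(users)

    async def get(self, user_id):
        return self._users[user_id]


repo = FakeRepository({"u1": {"name": "Ada"}})
logger = FakeLogger()
metrics = FakeMetrics()

logger.info("Fetching user", user_id="u1")
metrics.increment("user.fetch.success")
user = asyncio.run(repo.get("u1"))
```

Because the fakes record what they receive, a test can assert on `logger.records` and `metrics.counters` without patching any module.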
+### Pattern 9: Fail-Safe Defaults
+
+Degrade gracefully when non-critical operations fail.
+
+```python
+from collections.abc import Awaitable, Callable
+from functools import wraps
+from typing import TypeVar
+
+import structlog
+
+logger = structlog.get_logger()
+T = TypeVar("T")
+
+def fail_safe(default: T, log_failure: bool = True):
+    """Return default value on failure instead of raising."""
+    def decorator(func: Callable[..., Awaitable[T]]) -> Callable[..., Awaitable[T]]:
+        @wraps(func)
+        async def wrapper(*args, **kwargs) -> T:
+            try:
+                return await func(*args, **kwargs)
+            except Exception as e:
+                if log_failure:
+                    logger.warning(
+                        "Operation failed, using default",
+                        function=func.__name__,
+                        error=str(e),
+                    )
+                return default
+        return wrapper
+    return decorator
+
+# The same default object is returned on every failure, so callers should not mutate it.
+@fail_safe(default=[])
+async def get_recommendations(user_id: str) -> list[str]:
+    """Get recommendations, return empty list on failure."""
+    ...
+```
+
+## Best Practices Summary
+
+1. **Retry only transient errors** - Don't retry bugs or authentication failures
+2. **Use exponential backoff** - Give services time to recover
+3. **Add jitter** - Prevent thundering herd from synchronized retries
+4. **Cap total duration** - `stop_after_attempt(5) | stop_after_delay(60)`
+5. **Log every retry** - Silent retries hide systemic problems
+6. **Use decorators** - Keep retry logic separate from business logic
+7. **Inject dependencies** - Make infrastructure testable
+8. **Set timeouts everywhere** - Every network call needs a timeout
+9. **Fail gracefully** - Return cached/default values for non-critical paths
+10. **Monitor retry rates** - High retry rates indicate underlying issues