Files
agents/tools/tdd-red.md
Seth Hobson a58a9addd9 feat: comprehensive upgrade of 32 tools and workflows
Major quality improvements across all tools and workflows:
- Expanded from 1,952 to 23,686 lines (12.1x growth)
- Added 89 complete code examples with production-ready implementations
- Integrated modern 2024/2025 technologies and best practices
- Established consistent structure across all files
- Added 64 reference workflows with real-world scenarios

Phase 1 - Critical Workflows (4 files):
- git-workflow: 9→118 lines - Complete git workflow orchestration
- legacy-modernize: 10→110 lines - Strangler fig pattern implementation
- multi-platform: 10→181 lines - API-first cross-platform development
- improve-agent: 13→292 lines - Systematic agent optimization

Phase 2 - Unstructured Tools (8 files):
- issue: 33→636 lines - GitHub issue resolution expert
- prompt-optimize: 49→1,207 lines - Advanced prompt engineering
- data-pipeline: 56→2,312 lines - Production-ready pipeline architecture
- data-validation: 56→1,674 lines - Comprehensive validation framework
- error-analysis: 56→1,154 lines - Modern observability and debugging
- langchain-agent: 56→2,735 lines - LangChain 0.1+ with LangGraph
- ai-review: 63→1,597 lines - AI-powered code review system
- deploy-checklist: 71→1,631 lines - GitOps and progressive delivery

Phase 3 - Mid-Length Tools (4 files):
- tdd-red: 111→1,763 lines - Property-based testing and decision frameworks
- tdd-green: 130→842 lines - Implementation patterns and type-driven development
- tdd-refactor: 174→1,860 lines - SOLID examples and architecture refactoring
- refactor-clean: 267→886 lines - AI code review and static analysis integration

Phase 4 - Short Workflows (7 files):
- ml-pipeline: 43→292 lines - MLOps with experiment tracking
- smart-fix: 44→834 lines - Intelligent debugging with AI assistance
- full-stack-feature: 58→113 lines - API-first full-stack development
- security-hardening: 63→118 lines - DevSecOps with zero-trust
- data-driven-feature: 70→160 lines - A/B testing and analytics
- performance-optimization: 70→111 lines - APM and Core Web Vitals
- full-review: 76→124 lines - Multi-phase comprehensive review

Phase 5 - Small Files (9 files):
- onboard: 24→394 lines - Remote-first onboarding specialist
- multi-agent-review: 63→194 lines - Multi-agent orchestration
- context-save: 65→155 lines - Context management with vector DBs
- context-restore: 65→157 lines - Context restoration and RAG
- smart-debug: 65→1,727 lines - AI-assisted debugging with observability
- standup-notes: 68→765 lines - Async-first with Git integration
- multi-agent-optimize: 85→189 lines - Performance optimization framework
- incident-response: 80→146 lines - SRE practices and incident command
- feature-development: 84→144 lines - End-to-end feature workflow

Technologies integrated:
- AI/ML: GitHub Copilot, Claude Code, LangChain 0.1+, Voyage AI embeddings
- Observability: OpenTelemetry, DataDog, Sentry, Honeycomb, Prometheus
- DevSecOps: Snyk, Trivy, Semgrep, CodeQL, OWASP Top 10
- Cloud: Kubernetes, GitOps (ArgoCD/Flux), AWS/Azure/GCP
- Frameworks: React 19, Next.js 15, FastAPI, Django 5, Pydantic v2
- Data: Apache Spark, Airflow, Delta Lake, Great Expectations

All files now include:
- Clear role statements and expertise definitions
- Structured Context/Requirements sections
- 6-8 major instruction sections (tools) or 3-4 phases (workflows)
- Multiple complete code examples in various languages
- Modern framework integrations
- Real-world reference implementations
2025-10-11 15:33:18 -04:00

54 KiB

Write comprehensive failing tests following TDD red phase principles:

[Extended thinking: This tool uses the test-automator agent to generate comprehensive failing tests that properly define expected behavior. It ensures tests fail for the right reasons and establishes a solid foundation for implementation.]

Test Generation Process

Use Task tool with subagent_type="test-automator" to generate failing tests.

Prompt: "Generate comprehensive FAILING tests for: $ARGUMENTS. Follow TDD red phase principles:

  1. Test Structure Setup

    • Choose appropriate testing framework for the language/stack
    • Set up test fixtures and necessary imports
    • Configure test runners and assertion libraries
    • Establish test naming conventions (should_X_when_Y format)
  2. Behavior Definition

    • Define clear expected behaviors from requirements
    • Cover happy path scenarios thoroughly
    • Include edge cases and boundary conditions
    • Add error handling and exception scenarios
    • Consider null/undefined/empty input cases
  3. Test Implementation

    • Write descriptive test names that document intent
    • Keep tests focused on single behaviors (one assertion per test when possible)
    • Use Arrange-Act-Assert (AAA) pattern consistently
    • Implement test data builders for complex objects
    • Avoid test interdependencies - each test must be isolated
  4. Failure Verification

    • Ensure tests actually fail when run
    • Verify failure messages are meaningful and diagnostic
    • Confirm tests fail for the RIGHT reasons (not syntax/import errors)
    • Check that error messages guide implementation
    • Validate test isolation - no cascading failures
  5. Test Categories

    • Unit Tests: Isolated component behavior
    • Integration Tests: Component interaction scenarios
    • Contract Tests: API and interface contracts
    • Property Tests: Invariants and mathematical properties
    • Acceptance Tests: User story validation
  6. Framework-Specific Patterns

    • JavaScript/TypeScript: Jest, Mocha, Vitest patterns
    • Python: pytest fixtures and parameterization
    • Java: JUnit5 annotations and assertions
    • C#: NUnit/xUnit attributes and theory data
    • Go: Table-driven tests and subtests
    • Ruby: RSpec expectations and contexts
  7. Test Quality Checklist ✓ Tests are readable and self-documenting ✓ Failure messages clearly indicate what went wrong ✓ Tests follow DRY principle with appropriate abstractions ✓ Coverage includes positive, negative, and edge cases ✓ Tests can serve as living documentation ✓ No implementation details leaked into tests ✓ Tests use meaningful test data, not 'foo' and 'bar'

  8. Common Anti-Patterns to Avoid

    • Writing tests that pass immediately
    • Testing implementation instead of behavior
    • Overly complex test setup
    • Brittle tests tied to specific implementations
    • Tests with multiple responsibilities
    • Ignored or commented-out tests
    • Tests without clear assertions

Output should include:

  • Complete test file(s) with all necessary imports
  • Clear documentation of what each test validates
  • Verification commands to run tests and see failures
  • Metrics: number of tests, coverage areas, test categories
  • Next steps for moving to green phase"

Validation Steps

After test generation:

  1. Run tests to confirm they fail
  2. Verify failure messages are helpful
  3. Check test independence and isolation
  4. Ensure comprehensive coverage
  5. Document any assumptions made

Recovery Process

If tests don't fail properly:

  • Debug import/syntax issues first
  • Ensure test framework is properly configured
  • Verify assertions are actually checking behavior
  • Add more specific assertions if needed
  • Consider missing test categories

Integration Points

  • Links to tdd-green.md for implementation phase
  • Coordinates with tdd-refactor.md for improvement phase
  • Integrates with CI/CD for automated verification
  • Connects to test coverage reporting tools

Best Practices

  • Start with the simplest failing test
  • One behavior change at a time
  • Tests should tell a story of the feature
  • Prefer many small tests over few large ones
  • Use test naming as documentation
  • Keep test code as clean as production code

Complete Code Examples

Example 1: Test-First API Design (TypeScript/Jest)

Scenario: Designing a user authentication service from tests first

// auth.service.test.ts - RED PHASE
describe('AuthenticationService', () => {
  let authService: AuthenticationService;
  let mockUserRepository: jest.Mocked<UserRepository>;
  let mockHashingService: jest.Mocked<HashingService>;
  let mockTokenGenerator: jest.Mocked<TokenGenerator>;

  beforeEach(() => {
    mockUserRepository = {
      findByEmail: jest.fn(),
      save: jest.fn()
    } as any;
    mockHashingService = {
      hash: jest.fn(),
      verify: jest.fn()
    } as any;
    mockTokenGenerator = {
      generate: jest.fn()
    } as any;

    authService = new AuthenticationService(
      mockUserRepository,
      mockHashingService,
      mockTokenGenerator
    );
  });

  describe('authenticate', () => {
    it('should_return_token_when_credentials_are_valid', async () => {
      // Arrange
      const email = 'user@example.com';
      const password = 'SecurePass123!';
      const hashedPassword = 'hashed_password';
      const expectedToken = 'jwt.token.here';

      const mockUser = {
        id: '123',
        email,
        passwordHash: hashedPassword,
        isActive: true
      };

      mockUserRepository.findByEmail.mockResolvedValue(mockUser);
      mockHashingService.verify.mockResolvedValue(true);
      mockTokenGenerator.generate.mockReturnValue(expectedToken);

      // Act
      const result = await authService.authenticate(email, password);

      // Assert
      expect(result.success).toBe(true);
      expect(result.token).toBe(expectedToken);
      expect(result.userId).toBe('123');
      expect(mockUserRepository.findByEmail).toHaveBeenCalledWith(email);
      expect(mockHashingService.verify).toHaveBeenCalledWith(password, hashedPassword);
    });

    it('should_fail_when_user_does_not_exist', async () => {
      // Arrange
      mockUserRepository.findByEmail.mockResolvedValue(null);

      // Act
      const result = await authService.authenticate('nonexistent@example.com', 'password');

      // Assert
      expect(result.success).toBe(false);
      expect(result.error).toBe('INVALID_CREDENTIALS');
      expect(result.token).toBeUndefined();
      expect(mockHashingService.verify).not.toHaveBeenCalled();
    });

    it('should_fail_when_password_is_incorrect', async () => {
      // Arrange
      const mockUser = {
        id: '123',
        email: 'user@example.com',
        passwordHash: 'hashed',
        isActive: true
      };
      mockUserRepository.findByEmail.mockResolvedValue(mockUser);
      mockHashingService.verify.mockResolvedValue(false);

      // Act
      const result = await authService.authenticate('user@example.com', 'wrong');

      // Assert
      expect(result.success).toBe(false);
      expect(result.error).toBe('INVALID_CREDENTIALS');
      expect(mockTokenGenerator.generate).not.toHaveBeenCalled();
    });

    it('should_fail_when_account_is_inactive', async () => {
      // Arrange
      const mockUser = {
        id: '123',
        email: 'user@example.com',
        passwordHash: 'hashed',
        isActive: false
      };
      mockUserRepository.findByEmail.mockResolvedValue(mockUser);

      // Act
      const result = await authService.authenticate('user@example.com', 'password');

      // Assert
      expect(result.success).toBe(false);
      expect(result.error).toBe('ACCOUNT_INACTIVE');
      expect(mockHashingService.verify).not.toHaveBeenCalled();
    });

    it('should_handle_repository_errors_gracefully', async () => {
      // Arrange
      mockUserRepository.findByEmail.mockRejectedValue(new Error('Database connection failed'));

      // Act & Assert
      await expect(
        authService.authenticate('user@example.com', 'password')
      ).rejects.toThrow('Authentication service unavailable');
    });
  });
});

Key Patterns:

  • Comprehensive mocking strategy for dependencies
  • Clear test naming documenting expected behavior
  • AAA pattern consistently applied
  • Edge cases covered (inactive account, errors)
  • Tests guide the API design (return structure, error handling)

Example 2: Property-Based Testing (Python/Hypothesis)

Scenario: Testing mathematical properties of a sorting algorithm

# test_sorting.py - RED PHASE with property-based testing
from hypothesis import given, strategies as st, assume
from hypothesis.stateful import RuleBasedStateMachine, rule, invariant
import pytest

class TestSortFunction:
    """Property-based tests for custom sorting implementation"""

    @given(st.lists(st.integers()))
    def test_sorted_list_length_unchanged(self, input_list):
        """Property: Sorting doesn't change the number of elements"""
        # Act
        result = custom_sort(input_list)

        # Assert
        assert len(result) == len(input_list), \
            f"Expected {len(input_list)} elements, got {len(result)}"

    @given(st.lists(st.integers()))
    def test_sorted_list_is_ordered(self, input_list):
        """Property: Each element <= next element"""
        # Act
        result = custom_sort(input_list)

        # Assert
        for i in range(len(result) - 1):
            assert result[i] <= result[i + 1], \
                f"Elements at {i} and {i+1} are out of order: {result[i]} > {result[i+1]}"

    @given(st.lists(st.integers()))
    def test_sorted_list_contains_same_elements(self, input_list):
        """Property: Sorting is a permutation (same elements, different order)"""
        # Act
        result = custom_sort(input_list)

        # Assert
        assert sorted(input_list) == sorted(result), \
            f"Result contains different elements than input"

    @given(st.lists(st.integers(), min_size=1))
    def test_minimum_element_is_first(self, input_list):
        """Property: First element is the minimum"""
        # Act
        result = custom_sort(input_list)

        # Assert
        assert result[0] == min(input_list), \
            f"First element {result[0]} is not minimum {min(input_list)}"

    @given(st.lists(st.integers(), min_size=1))
    def test_maximum_element_is_last(self, input_list):
        """Property: Last element is the maximum"""
        # Act
        result = custom_sort(input_list)

        # Assert
        assert result[-1] == max(input_list), \
            f"Last element {result[-1]} is not maximum {max(input_list)}"

    @given(st.lists(st.integers()))
    def test_sorting_is_idempotent(self, input_list):
        """Property: Sorting twice gives same result as sorting once"""
        # Act
        sorted_once = custom_sort(input_list)
        sorted_twice = custom_sort(sorted_once)

        # Assert
        assert sorted_once == sorted_twice, \
            "Sorting is not idempotent"

    def test_empty_list_returns_empty_list(self):
        """Edge case: Empty list"""
        assert custom_sort([]) == []

    def test_single_element_unchanged(self):
        """Edge case: Single element"""
        assert custom_sort([42]) == [42]

    def test_already_sorted_list_unchanged(self):
        """Edge case: Already sorted"""
        input_list = [1, 2, 3, 4, 5]
        assert custom_sort(input_list) == input_list

    def test_reverse_sorted_list(self):
        """Edge case: Reverse order"""
        assert custom_sort([5, 4, 3, 2, 1]) == [1, 2, 3, 4, 5]

    def test_duplicates_preserved(self):
        """Edge case: Duplicate elements"""
        assert custom_sort([3, 1, 2, 1, 3]) == [1, 1, 2, 3, 3]

Key Patterns:

  • Property-based testing for algorithmic correctness
  • Mathematical invariants as test oracles
  • Hypothesis generates hundreds of test cases automatically
  • Edge cases still tested explicitly
  • Tests define correctness properties, not specific outputs

Example 3: Test-Driven Bug Fixing (Go)

Scenario: Reproducing and fixing a reported bug in date calculation

// date_calculator_test.go - RED PHASE for bug fix
package timecalc

import (
    "testing"
    "time"
)

// Bug Report: AddBusinessDays fails across month boundaries
// Expected: Adding 5 business days to Friday Jan 27, 2023 should give Feb 3, 2023
// Actual: Returns Feb 1, 2023 (incorrect)

func TestAddBusinessDays_BugReproduction(t *testing.T) {
    tests := []struct {
        name          string
        startDate     time.Time
        daysToAdd     int
        expectedDate  time.Time
        description   string
    }{
        {
            name:         "bug_report_original_case",
            startDate:    time.Date(2023, 1, 27, 0, 0, 0, 0, time.UTC), // Friday
            daysToAdd:    5,
            expectedDate: time.Date(2023, 2, 3, 0, 0, 0, 0, time.UTC),  // Next Friday
            description:  "5 business days from Jan 27 (Fri) should be Feb 3 (Fri), skipping weekend",
        },
        {
            name:         "single_day_within_month",
            startDate:    time.Date(2023, 1, 10, 0, 0, 0, 0, time.UTC), // Tuesday
            daysToAdd:    1,
            expectedDate: time.Date(2023, 1, 11, 0, 0, 0, 0, time.UTC), // Wednesday
            description:  "Simple case: 1 business day, same month",
        },
        {
            name:         "friday_plus_one_skips_weekend",
            startDate:    time.Date(2023, 1, 6, 0, 0, 0, 0, time.UTC),  // Friday
            daysToAdd:    1,
            expectedDate: time.Date(2023, 1, 9, 0, 0, 0, 0, time.UTC),  // Monday
            description:  "1 business day from Friday should be Monday",
        },
        {
            name:         "thursday_plus_three_crosses_weekend",
            startDate:    time.Date(2023, 1, 5, 0, 0, 0, 0, time.UTC),  // Thursday
            daysToAdd:    3,
            expectedDate: time.Date(2023, 1, 10, 0, 0, 0, 0, time.UTC), // Tuesday
            description:  "3 business days from Thursday crosses weekend",
        },
        {
            name:         "crosses_month_boundary_no_weekend",
            startDate:    time.Date(2023, 1, 30, 0, 0, 0, 0, time.UTC), // Monday
            daysToAdd:    3,
            expectedDate: time.Date(2023, 2, 2, 0, 0, 0, 0, time.UTC),  // Thursday
            description:  "Crosses month boundary without weekend interaction",
        },
        {
            name:         "crosses_year_boundary",
            startDate:    time.Date(2023, 12, 28, 0, 0, 0, 0, time.UTC), // Thursday
            daysToAdd:    3,
            expectedDate: time.Date(2024, 1, 2, 0, 0, 0, 0, time.UTC),   // Tuesday
            description:  "Crosses year boundary and weekend",
        },
        {
            name:         "leap_year_february_crossing",
            startDate:    time.Date(2024, 2, 27, 0, 0, 0, 0, time.UTC), // Tuesday
            daysToAdd:    5,
            expectedDate: time.Date(2024, 3, 4, 0, 0, 0, 0, time.UTC),  // Monday (leap year)
            description:  "Crosses leap year February boundary",
        },
        {
            name:         "zero_days_returns_same_date",
            startDate:    time.Date(2023, 1, 15, 0, 0, 0, 0, time.UTC),
            daysToAdd:    0,
            expectedDate: time.Date(2023, 1, 15, 0, 0, 0, 0, time.UTC),
            description:  "Edge case: adding 0 days",
        },
    }

    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            // Act
            result := AddBusinessDays(tt.startDate, tt.daysToAdd)

            // Assert
            if !result.Equal(tt.expectedDate) {
                t.Errorf("%s\nAddBusinessDays(%v, %d)\nExpected: %v\nGot:      %v",
                    tt.description,
                    tt.startDate.Format("Mon Jan 2, 2006"),
                    tt.daysToAdd,
                    tt.expectedDate.Format("Mon Jan 2, 2006"),
                    result.Format("Mon Jan 2, 2006"))
            }
        })
    }
}

func TestAddBusinessDays_StartingOnWeekend(t *testing.T) {
    tests := []struct {
        name      string
        startDate time.Time
        daysToAdd int
        shouldErr bool
    }{
        {
            name:      "saturday_start_should_error",
            startDate: time.Date(2023, 1, 7, 0, 0, 0, 0, time.UTC), // Saturday
            daysToAdd: 1,
            shouldErr: true,
        },
        {
            name:      "sunday_start_should_error",
            startDate: time.Date(2023, 1, 8, 0, 0, 0, 0, time.UTC), // Sunday
            daysToAdd: 1,
            shouldErr: true,
        },
    }

    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            // Act
            _, err := AddBusinessDaysWithError(tt.startDate, tt.daysToAdd)

            // Assert
            if tt.shouldErr && err == nil {
                t.Errorf("Expected error for weekend start date, got nil")
            }
            if !tt.shouldErr && err != nil {
                t.Errorf("Unexpected error: %v", err)
            }
        })
    }
}

func TestAddBusinessDays_NegativeDays(t *testing.T) {
    // Edge case: negative days should error or subtract
    startDate := time.Date(2023, 1, 15, 0, 0, 0, 0, time.UTC)

    t.Run("negative_days_should_error", func(t *testing.T) {
        _, err := AddBusinessDaysWithError(startDate, -5)
        if err == nil {
            t.Error("Expected error for negative days, got nil")
        }
    })
}

Key Patterns:

  • Table-driven tests (idiomatic Go)
  • Bug reproduction test as first priority
  • Comprehensive edge case coverage discovered through debugging
  • Clear test naming and descriptions
  • Tests document the expected behavior precisely

Example 4: Integration Test with Database (Python/pytest)

Scenario: Testing a repository layer with real database interactions

# test_user_repository_integration.py - RED PHASE
import pytest
from decimal import Decimal
from datetime import datetime, timedelta
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from models import Base, User, Order, OrderStatus
from repositories import UserRepository

@pytest.fixture(scope="function")
def db_session():
    """Create a fresh database session for each test"""
    # Use in-memory SQLite for fast integration tests
    engine = create_engine('sqlite:///:memory:')
    Base.metadata.create_all(engine)
    Session = sessionmaker(bind=engine)
    session = Session()

    yield session

    session.close()
    engine.dispose()

@pytest.fixture
def user_repository(db_session):
    """Provide a UserRepository instance with test database"""
    return UserRepository(db_session)

@pytest.fixture
def sample_user(db_session):
    """Create a sample user for tests"""
    user = User(
        email='test@example.com',
        name='Test User',
        created_at=datetime.utcnow()
    )
    db_session.add(user)
    db_session.commit()
    return user

class TestUserRepository_FindByEmail:
    """Integration tests for finding users by email"""

    def test_should_return_user_when_email_exists(self, user_repository, sample_user):
        # Act
        result = user_repository.find_by_email('test@example.com')

        # Assert
        assert result is not None, "Expected user to be found"
        assert result.email == 'test@example.com'
        assert result.name == 'Test User'
        assert result.id == sample_user.id

    def test_should_return_none_when_email_not_found(self, user_repository):
        # Act
        result = user_repository.find_by_email('nonexistent@example.com')

        # Assert
        assert result is None, "Expected None for non-existent email"

    def test_should_be_case_insensitive(self, user_repository, sample_user):
        # Act
        result = user_repository.find_by_email('TEST@EXAMPLE.COM')

        # Assert
        assert result is not None, "Email search should be case-insensitive"
        assert result.id == sample_user.id

    def test_should_handle_email_with_leading_trailing_spaces(self, user_repository, sample_user):
        # Act
        result = user_repository.find_by_email('  test@example.com  ')

        # Assert
        assert result is not None, "Should trim spaces from email"
        assert result.id == sample_user.id

class TestUserRepository_GetUserWithOrders:
    """Integration tests for eager loading user orders"""

    def test_should_load_user_with_orders(self, user_repository, sample_user, db_session):
        # Arrange
        order1 = Order(
            user_id=sample_user.id,
            total=Decimal('99.99'),
            status=OrderStatus.COMPLETED,
            created_at=datetime.utcnow()
        )
        order2 = Order(
            user_id=sample_user.id,
            total=Decimal('149.99'),
            status=OrderStatus.PENDING,
            created_at=datetime.utcnow()
        )
        db_session.add_all([order1, order2])
        db_session.commit()

        # Act
        user = user_repository.get_user_with_orders(sample_user.id)

        # Assert
        assert user is not None
        assert len(user.orders) == 2, f"Expected 2 orders, got {len(user.orders)}"
        assert any(o.total == Decimal('99.99') for o in user.orders)
        assert any(o.total == Decimal('149.99') for o in user.orders)

    def test_should_return_user_with_empty_orders_when_no_orders(self, user_repository, sample_user):
        # Act
        user = user_repository.get_user_with_orders(sample_user.id)

        # Assert
        assert user is not None
        assert len(user.orders) == 0, "Expected empty orders list"

    def test_should_return_none_when_user_not_found(self, user_repository):
        # Act
        user = user_repository.get_user_with_orders(99999)

        # Assert
        assert user is None

class TestUserRepository_GetActiveUsers:
    """Integration tests for querying active users"""

    def test_should_return_users_active_within_timeframe(self, user_repository, db_session):
        # Arrange
        active_user = User(
            email='active@example.com',
            name='Active User',
            last_login=datetime.utcnow() - timedelta(days=5)
        )
        inactive_user = User(
            email='inactive@example.com',
            name='Inactive User',
            last_login=datetime.utcnow() - timedelta(days=35)
        )
        never_logged_in = User(
            email='new@example.com',
            name='New User',
            last_login=None
        )
        db_session.add_all([active_user, inactive_user, never_logged_in])
        db_session.commit()

        # Act
        active_users = user_repository.get_active_users(days=30)

        # Assert
        assert len(active_users) == 1, f"Expected 1 active user, got {len(active_users)}"
        assert active_users[0].email == 'active@example.com'

    def test_should_order_by_last_login_desc(self, user_repository, db_session):
        # Arrange
        user1 = User(email='user1@example.com', last_login=datetime.utcnow() - timedelta(days=1))
        user2 = User(email='user2@example.com', last_login=datetime.utcnow() - timedelta(days=5))
        user3 = User(email='user3@example.com', last_login=datetime.utcnow() - timedelta(days=3))
        db_session.add_all([user1, user2, user3])
        db_session.commit()

        # Act
        active_users = user_repository.get_active_users(days=30)

        # Assert
        assert len(active_users) == 3
        assert active_users[0].email == 'user1@example.com', "Most recent should be first"
        assert active_users[1].email == 'user3@example.com'
        assert active_users[2].email == 'user2@example.com', "Least recent should be last"

class TestUserRepository_TransactionBehavior:
    """Integration tests for transaction handling"""

    def test_should_rollback_on_constraint_violation(self, user_repository, sample_user, db_session):
        # Arrange: sample_user already has email 'test@example.com'
        duplicate_user = User(
            email='test@example.com',  # Duplicate email
            name='Duplicate User'
        )

        # Act & Assert
        with pytest.raises(Exception) as exc_info:
            user_repository.save(duplicate_user)

        # Verify database state unchanged
        users = db_session.query(User).filter_by(email='test@example.com').all()
        assert len(users) == 1, "Should only have original user after rollback"

    def test_should_handle_concurrent_modifications(self, user_repository, sample_user, db_session):
        # This test would fail initially, driving implementation of optimistic locking

        # Arrange: Get same user in two "sessions"
        user_v1 = user_repository.find_by_email('test@example.com')
        user_v2 = user_repository.find_by_email('test@example.com')

        # Act: Modify and save first version
        user_v1.name = 'Updated Name V1'
        user_repository.save(user_v1)

        # Try to save second version (stale data)
        user_v2.name = 'Updated Name V2'

        # Assert: Should detect concurrent modification
        with pytest.raises(Exception) as exc_info:
            user_repository.save(user_v2)

        assert 'concurrent' in str(exc_info.value).lower() or 'stale' in str(exc_info.value).lower()

Key Patterns:

  • Fixture-based test isolation with fresh database per test
  • Real database interactions (in-memory for speed)
  • Transaction behavior testing
  • Complex query scenarios
  • Eager loading verification
  • Concurrent modification testing

Decision Frameworks

Test Level Selection Matrix

Use this matrix to decide which test type to write first:

Scenario Unit Test Integration Test E2E Test Rationale
Pure business logic ✓ PRIMARY - Optional - No Fast feedback, isolated logic
Database queries - Mocks OK ✓ PRIMARY - No Need real DB behavior
External API calls ✓ with mocks ✓ with test server - Optional Balance speed vs realism
User workflows - No ✓ backend only ✓ PRIMARY End-to-end validation needed
Algorithm correctness ✓ PRIMARY - No - No Pure logic, no dependencies
Performance requirements - No ✓ PRIMARY ✓ if UI involved Realistic environment needed
Security requirements ✓ logic only ✓ PRIMARY ✓ for auth flows Multiple layers needed
UI components (React/Vue) ✓ PRIMARY ✓ with routing - Optional Component behavior + integration
Microservice boundaries ✓ per service ✓ CONTRACT ✓ full flow Contract tests prevent breaks
Bug reproduction ✓ if unit-level ✓ if integration-level ✓ if workflow-level Test at failure level

Test Granularity Decision Tree

Is the functionality complex with multiple branches?
├─ YES: Multiple granular tests (one per branch)
└─ NO: Single test may suffice
    │
    ├─ Does it involve external dependencies?
    │   ├─ YES: Integration test preferred
    │   └─ NO: Unit test sufficient
    │
    └─ Is it user-facing behavior?
        ├─ YES: Consider E2E test
        └─ NO: Unit/Integration test

Mock/Stub/Fake Selection Criteria

When to use MOCKS (behavior verification):

  • Verifying methods were called with correct parameters
  • Testing event emission and callbacks
  • Validating side effects occurred
  • Example: Verifying email service was called with correct recipient

When to use STUBS (state verification):

  • Need to control return values for testing paths
  • Simulating error conditions
  • Replacing slow external dependencies
  • Example: Stubbing API response to test error handling

When to use FAKES (realistic implementation):

  • Need realistic behavior without external dependencies
  • Testing complex interactions
  • In-memory database for integration tests
  • Example: Fake email service that stores emails in memory

When to use REAL implementations:

  • Integration tests requiring actual behavior
  • Performance characteristics matter
  • Edge cases only real system can produce
  • Example: Testing actual database transaction behavior

Test Data Strategy Selection

Data Type Strategy Use Case
Simple values Inline literals Quick, obvious test cases
Complex objects Builder pattern Reusable, readable object creation
Large datasets Factory pattern Generate many variations
Realistic data Fixture files API responses, complex structures
Random data Property-based Discovering edge cases
Time-sensitive Fixed timestamps Reproducible time-based tests
User scenarios Scenario builders Multi-step workflows

Framework-Specific Modern Patterns (2024/2025)

Jest/Vitest (JavaScript/TypeScript)

// Modern patterns with Vitest (faster than Jest)
import { describe, it, expect, vi, beforeEach } from 'vitest';
import { render, screen, waitFor } from '@testing-library/react';
import { userEvent } from '@testing-library/user-event';

describe('UserProfileForm', () => {
  // Use vi.fn() for mocks (Vitest API)
  const mockOnSubmit = vi.fn();

  beforeEach(() => {
    vi.clearAllMocks();
  });

  it('should_validate_email_format_before_submission', async () => {
    // Arrange
    render(<UserProfileForm onSubmit={mockOnSubmit} />);
    const emailInput = screen.getByLabelText(/email/i);
    const submitButton = screen.getByRole('button', { name: /submit/i });

    // Act
    await userEvent.type(emailInput, 'invalid-email');
    await userEvent.click(submitButton);

    // Assert
    expect(await screen.findByText(/invalid email format/i)).toBeInTheDocument();
    expect(mockOnSubmit).not.toHaveBeenCalled();
  });

  // Property-based test with fast-check
  it.prop([fc.emailAddress()])('should_accept_any_valid_email', async (email) => {
    render(<UserProfileForm onSubmit={mockOnSubmit} />);
    const emailInput = screen.getByLabelText(/email/i);

    await userEvent.type(emailInput, email);
    await userEvent.click(screen.getByRole('button', { name: /submit/i }));

    await waitFor(() => {
      expect(mockOnSubmit).toHaveBeenCalledWith(
        expect.objectContaining({ email })
      );
    });
  });
});

Pytest (Python)

# Modern pytest patterns with async support
import pytest
from hypothesis import given, strategies as st

# Pytest fixtures with scopes
@pytest.fixture(scope="session")
async def async_client():
    """Async HTTP client for API tests"""
    async with httpx.AsyncClient(base_url="http://testserver") as client:
        yield client

@pytest.fixture
def api_key_header():
    """Reusable authentication header"""
    return {"Authorization": "Bearer test_token_123"}

# Parametrized tests (cleaner than loops)
@pytest.mark.parametrize("status_code,expected_retry", [
    (500, True),
    (502, True),
    (503, True),
    (400, False),
    (404, False),
    (200, False),
])
async def test_should_retry_on_server_errors(
    async_client, status_code, expected_retry
):
    # This test will fail until retry logic is implemented
    with mock.patch('httpx.AsyncClient.get') as mock_get:
        mock_get.return_value.status_code = status_code

        client = RetryableClient(async_client)
        await client.fetch_data("/api/resource")

        if expected_retry:
            assert mock_get.call_count > 1, \
                f"Expected retries for {status_code}"
        else:
            assert mock_get.call_count == 1, \
                f"Should not retry for {status_code}"

# Property-based test
@given(st.lists(st.integers(), min_size=1, max_size=100))
def test_median_calculation_properties(numbers):
    """Test mathematical properties of median function"""
    result = calculate_median(numbers)

    # Property: median should be in the list or between two values
    sorted_nums = sorted(numbers)
    if len(numbers) % 2 == 1:
        assert result in numbers
    else:
        # Even length: median is average of middle two
        mid = len(sorted_nums) // 2
        expected = (sorted_nums[mid-1] + sorted_nums[mid]) / 2
        assert result == expected

Go Testing (Table-Driven + Subtests)

// Modern Go testing patterns (Go 1.23+)
package calculator_test

import (
    "testing"
    "github.com/stretchr/testify/assert"
    "github.com/stretchr/testify/require"
)

func TestCalculator_Divide(t *testing.T) {
    t.Parallel() // Enable parallel execution

    tests := []struct {
        name          string
        numerator     float64
        denominator   float64
        want          float64
        wantErr       bool
        errContains   string
    }{
        {
            name:        "positive_numbers",
            numerator:   10.0,
            denominator: 2.0,
            want:        5.0,
            wantErr:     false,
        },
        {
            name:        "negative_numerator",
            numerator:   -10.0,
            denominator: 2.0,
            want:        -5.0,
            wantErr:     false,
        },
        {
            name:        "divide_by_zero",
            numerator:   10.0,
            denominator: 0.0,
            wantErr:     true,
            errContains: "division by zero",
        },
        {
            name:        "very_small_denominator",
            numerator:   1.0,
            denominator: 0.0000001,
            want:        10000000.0,
            wantErr:     false,
        },
    }

    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            t.Parallel() // Each subtest runs in parallel

            // Act
            got, err := Divide(tt.numerator, tt.denominator)

            // Assert
            if tt.wantErr {
                require.Error(t, err, "expected error but got none")
                assert.Contains(t, err.Error(), tt.errContains)
                return
            }

            require.NoError(t, err)
            assert.InDelta(t, tt.want, got, 0.0001, "result outside acceptable delta")
        })
    }
}

// Fuzzing support (Go 1.18+)
func FuzzDivide(f *testing.F) {
    // Seed corpus with interesting cases
    f.Add(10.0, 2.0)
    f.Add(-5.0, 3.0)
    f.Add(0.0, 1.0)

    f.Fuzz(func(t *testing.T, a, b float64) {
        // Property: Division should never panic
        defer func() {
            if r := recover(); r != nil {
                t.Errorf("Divide panicked with inputs (%v, %v): %v", a, b, r)
            }
        }()

        result, err := Divide(a, b)

        // Property: If no error, result * denominator ≈ numerator
        if err == nil && b != 0 {
            reconstructed := result * b
            if !floatsEqual(reconstructed, a, 0.0001) {
                t.Errorf("Property violated: (%v / %v) * %v = %v, expected %v",
                    a, b, b, reconstructed, a)
            }
        }
    })
}

RSpec (Ruby)

# Modern RSpec patterns with let! and shared contexts
RSpec.describe UserService do
  # Lazy-loaded test data
  let(:user_repository) { instance_double(UserRepository) }
  let(:email_service) { instance_double(EmailService) }
  let(:service) { described_class.new(user_repository, email_service) }

  # Eagerly evaluated (runs before each test)
  let!(:test_user) do
    User.new(
      email: 'test@example.com',
      name: 'Test User',
      verified: false
    )
  end

  describe '#send_verification_email' do
    context 'when user exists and is unverified' do
      before do
        allow(user_repository).to receive(:find_by_email)
          .with('test@example.com')
          .and_return(test_user)
        allow(email_service).to receive(:send_verification)
          .and_return(true)
      end

      it 'sends verification email' do
        service.send_verification_email('test@example.com')

        expect(email_service).to have_received(:send_verification)
          .with(
            to: 'test@example.com',
            token: a_string_matching(/^[A-Za-z0-9]{32}$/)
          )
      end

      it 'updates user verification_sent_at timestamp' do
        expect {
          service.send_verification_email('test@example.com')
        }.to change { test_user.verification_sent_at }.from(nil)
      end
    end

    context 'when user is already verified' do
      before do
        test_user.verified = true
        allow(user_repository).to receive(:find_by_email)
          .and_return(test_user)
      end

      it 'raises AlreadyVerifiedError' do
        expect {
          service.send_verification_email('test@example.com')
        }.to raise_error(UserService::AlreadyVerifiedError)
      end

      it 'does not send email' do
        begin
          service.send_verification_email('test@example.com')
        rescue UserService::AlreadyVerifiedError
          # Expected
        end

        expect(email_service).not_to have_received(:send_verification)
      end
    end
  end
end

Edge Case Identification Strategies

Systematic Edge Case Discovery

  1. Boundary Value Analysis

    • Test at, just below, and just above boundaries
    • Empty collections, single item, maximum capacity
    • Min/max numeric values, zero, negative
    • Start/end of time ranges
  2. Equivalence Partitioning

    • Divide input domain into valid/invalid classes
    • Test one value from each partition
    • Example: age groups (child, adult, senior) + invalid (negative, too large)
  3. State Transition Edge Cases

    • Invalid state transitions
    • Concurrent state modifications
    • State after errors/rollbacks
    • Idempotency of operations
  4. Data Type Edge Cases

    • Strings: empty, whitespace-only, very long, special characters, Unicode
    • Numbers: zero, negative, infinity, NaN, precision limits
    • Dates: leap years, timezone boundaries, DST transitions
    • Collections: empty, single element, duplicates, null elements
  5. Error Condition Edge Cases

    • Network failures mid-operation
    • Timeout scenarios
    • Out of memory conditions
    • Permission denied scenarios
    • Resource exhaustion (connections, file handles)

Edge Case Checklist Template

For any function/feature, systematically test:

  • Null/undefined/None inputs (if applicable)
  • Empty inputs (empty string, empty array, empty object)
  • Single element/minimum viable input
  • Maximum size/length inputs
  • Boundary values (min, max, min-1, max+1)
  • Special characters (if string input)
  • Unicode/internationalization (if text handling)
  • Concurrent access (if shared state)
  • Repeated operations (idempotency)
  • Invalid type/format inputs
  • Partial/incomplete inputs
  • Mutually exclusive options
  • Time-dependent behavior (if applicable)
  • Resource exhaustion scenarios
  • Error recovery paths

Test Isolation Patterns

Isolation Techniques by Test Type

Unit Test Isolation:

// BEFORE: Tests with shared state (BAD - tests can interfere)
let sharedCart: ShoppingCart;

beforeAll(() => {
  sharedCart = new ShoppingCart();
});

test('add item increases count', () => {
  sharedCart.addItem(product1);
  expect(sharedCart.itemCount).toBe(1);
});

test('remove item decreases count', () => {
  sharedCart.removeItem(product1); // Depends on previous test!
  expect(sharedCart.itemCount).toBe(0);
});

// AFTER: Isolated tests (GOOD)
describe('ShoppingCart', () => {
  let cart: ShoppingCart;

  beforeEach(() => {
    cart = new ShoppingCart(); // Fresh instance per test
  });

  test('add item increases count', () => {
    cart.addItem(product1);
    expect(cart.itemCount).toBe(1);
  });

  test('remove item decreases count', () => {
    cart.addItem(product1);
    cart.removeItem(product1);
    expect(cart.itemCount).toBe(0);
  });
});

Database Test Isolation:

# Pattern: Transaction rollback for isolation
@pytest.fixture
def db_session(db_engine):
    """Each test gets a transaction that's rolled back"""
    connection = db_engine.connect()
    transaction = connection.begin()
    session = Session(bind=connection)

    yield session

    session.close()
    transaction.rollback()  # Undo all changes
    connection.close()

# Pattern: Database truncation between tests
@pytest.fixture(autouse=True)
def truncate_tables(db_session):
    """Clear all tables before each test"""
    yield
    for table in reversed(Base.metadata.sorted_tables):
        db_session.execute(table.delete())
    db_session.commit()

Time-Based Test Isolation:

// Use dependency injection for time
type Clock interface {
    Now() time.Time
}

type RealClock struct{}
func (c RealClock) Now() time.Time { return time.Now() }

type FakeClock struct {
    CurrentTime time.Time
}
func (c *FakeClock) Now() time.Time { return c.CurrentTime }

// In tests
func TestExpiration(t *testing.T) {
    fakeClock := &FakeClock{
        CurrentTime: time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC),
    }
    service := NewService(fakeClock)

    // Test time-dependent behavior with full control
    assert.False(t, service.IsExpired(item))

    fakeClock.CurrentTime = fakeClock.CurrentTime.Add(48 * time.Hour)
    assert.True(t, service.IsExpired(item))
}

File System Test Isolation:

# Use temporary directories
import tempfile
import pytest

@pytest.fixture
def temp_dir():
    with tempfile.TemporaryDirectory() as tmpdir:
        yield Path(tmpdir)
    # Automatically cleaned up

def test_file_processing(temp_dir):
    input_file = temp_dir / "input.txt"
    input_file.write_text("test data")

    process_file(input_file)

    output_file = temp_dir / "output.txt"
    assert output_file.exists()
    assert output_file.read_text() == "processed: test data"

Modern Testing Practices (2024/2025)

Mutation Testing Integration

Mutation testing ensures your tests actually catch bugs by introducing deliberate code mutations:

// stryker.conf.js - Mutation testing configuration
module.exports = {
  mutator: "javascript",
  packageManager: "npm",
  reporters: ["html", "clear-text", "progress"],
  testRunner: "jest",
  coverageAnalysis: "perTest",
  mutate: [
    "src/**/*.js",
    "!src/**/*.test.js"
  ],
  thresholds: {
    high: 80,
    low: 60,
    break: 50  // Fail build if mutation score below 50%
  }
};

// CI/CD integration
// .github/workflows/test.yml
- name: Run mutation tests
  run: npx stryker run
  continue-on-error: false  // Fail build on low mutation score

AI-Assisted Test Generation

# .github/workflows/ai-test-generation.yml
name: AI Test Suggestions
on: pull_request

jobs:
  suggest-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Analyze code coverage
        run: npm run test:coverage

      - name: Generate test suggestions
        uses: ai-test-generator-action@v1
        with:
          coverage-file: coverage/coverage-summary.json
          min-coverage: 80
          focus-areas: "uncovered-lines,complex-functions"

      - name: Post suggestions as comment
        uses: actions/github-script@v6
        with:
          script: |
            const suggestions = require('./test-suggestions.json');
            const body = formatSuggestions(suggestions);
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: body
            });

Contract Testing for Microservices

// Using Pact for consumer-driven contract testing
const { Pact } = require('@pact-foundation/pact');
const { UserApiClient } = require('../src/api-client');

describe('User API Contract', () => {
  const provider = new Pact({
    consumer: 'FrontendApp',
    provider: 'UserService',
    port: 1234,
  });

  beforeAll(() => provider.setup());
  afterAll(() => provider.finalize());

  describe('GET /users/:id', () => {
    it('should_return_user_when_id_exists', async () => {
      // Define expected interaction
      await provider.addInteraction({
        state: 'user 123 exists',
        uponReceiving: 'a request for user 123',
        withRequest: {
          method: 'GET',
          path: '/users/123',
          headers: {
            Accept: 'application/json',
          },
        },
        willRespondWith: {
          status: 200,
          headers: {
            'Content-Type': 'application/json',
          },
          body: {
            id: 123,
            name: 'Test User',
            email: 'test@example.com',
          },
        },
      });

      // Test consumer code against contract
      const client = new UserApiClient('http://localhost:1234');
      const user = await client.getUser(123);

      expect(user.id).toBe(123);
      expect(user.name).toBe('Test User');

      await provider.verify();
    });
  });
});

Snapshot Testing for Complex Output

// React component snapshot test
import { render } from '@testing-library/react';
import { UserProfile } from './UserProfile';

describe('UserProfile', () => {
  it('should_match_snapshot_for_complete_profile', () => {
    const user = {
      name: 'John Doe',
      email: 'john@example.com',
      avatar: 'https://example.com/avatar.jpg',
      bio: 'Software developer',
      joinDate: '2024-01-15',
    };

    const { container } = render(<UserProfile user={user} />);

    expect(container.firstChild).toMatchSnapshot();
  });

  it('should_match_snapshot_for_minimal_profile', () => {
    const user = {
      name: 'Jane Doe',
      email: 'jane@example.com',
    };

    const { container } = render(<UserProfile user={user} />);

    expect(container.firstChild).toMatchSnapshot();
  });
});

// API response snapshot test
describe('GET /api/users', () => {
  it('should_match_response_structure', async () => {
    const response = await request(app).get('/api/users?page=1&limit=10');

    // Snapshot with dynamic data masked
    expect(response.body).toMatchSnapshot({
      data: expect.arrayContaining([
        expect.objectContaining({
          id: expect.any(String),
          createdAt: expect.any(String),
        }),
      ]),
      pagination: {
        page: 1,
        limit: 10,
        total: expect.any(Number),
      },
    });
  });
});

Performance Testing in TDD

# pytest-benchmark for performance testing
def test_search_performance(benchmark):
    """Search should complete within 100ms for 10k items"""
    dataset = generate_test_data(10000)
    search_engine = SearchEngine(dataset)

    # Benchmark the function
    result = benchmark(search_engine.search, query="test")

    # Assertions on performance
    assert benchmark.stats.mean < 0.1, "Mean search time exceeds 100ms"
    assert benchmark.stats.max < 0.5, "Max search time exceeds 500ms"

    # Functional assertions
    assert len(result) > 0
    assert all(item.matches_query("test") for item in result)

# Load testing integration
def test_concurrent_request_handling():
    """System should handle 100 concurrent requests"""
    import concurrent.futures

    def make_request():
        response = client.get('/api/search?q=test')
        return response.status_code, response.elapsed.total_seconds()

    with concurrent.futures.ThreadPoolExecutor(max_workers=100) as executor:
        futures = [executor.submit(make_request) for _ in range(100)]
        results = [f.result() for f in concurrent.futures.as_completed(futures)]

    success_count = sum(1 for status, _ in results if status == 200)
    avg_response_time = sum(elapsed for _, elapsed in results) / len(results)

    assert success_count >= 95, "Less than 95% success rate"
    assert avg_response_time < 1.0, "Average response time exceeds 1 second"

CI/CD Integration Patterns

GitHub Actions TDD Workflow

# .github/workflows/tdd-workflow.yml
name: TDD Workflow

on: [push, pull_request]

jobs:
  test-red-phase:
    name: Verify Tests Fail
    runs-on: ubuntu-latest
    if: contains(github.event.head_commit.message, '[RED]')

    steps:
      - uses: actions/checkout@v3

      - name: Setup environment
        uses: actions/setup-node@v3
        with:
          node-version: '20'
          cache: 'npm'

      - name: Install dependencies
        run: npm ci

      - name: Run tests (should fail)
        id: test-run
        run: npm test
        continue-on-error: true

      - name: Verify tests failed
        if: steps.test-run.outcome == 'success'
        run: |
          echo "ERROR: Tests passed but should fail in RED phase"
          exit 1

      - name: Check test output
        run: |
          echo "Tests correctly failing in RED phase ✓"

  test-green-phase:
    name: Verify Tests Pass
    runs-on: ubuntu-latest
    if: contains(github.event.head_commit.message, '[GREEN]')

    steps:
      - uses: actions/checkout@v3

      - name: Setup environment
        uses: actions/setup-node@v3
        with:
          node-version: '20'
          cache: 'npm'

      - name: Install dependencies
        run: npm ci

      - name: Run tests (must pass)
        run: npm test

      - name: Generate coverage
        run: npm run test:coverage

      - name: Upload coverage
        uses: codecov/codecov-action@v3
        with:
          files: ./coverage/coverage-final.json
          fail_ci_if_error: true

      - name: Check coverage thresholds
        run: |
          npm run check-coverage -- --lines 80 --branches 75

  test-refactor-phase:
    name: Verify Refactor Safety
    runs-on: ubuntu-latest
    if: contains(github.event.head_commit.message, '[REFACTOR]')

    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0  # Need full history for comparison

      - name: Setup environment
        uses: actions/setup-node@v3
        with:
          node-version: '20'
          cache: 'npm'

      - name: Install dependencies
        run: npm ci

      - name: Run tests
        run: npm test

      - name: Run mutation tests
        run: npm run test:mutation

      - name: Verify no behavior changes
        run: |
          # Compare test results with previous commit
          git checkout HEAD~1
          npm ci
          npm test -- --json > /tmp/before.json
          git checkout -
          npm test -- --json > /tmp/after.json
          node scripts/compare-test-results.js /tmp/before.json /tmp/after.json

  full-tdd-cycle:
    name: Complete TDD Cycle
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3

      - name: Setup
        uses: actions/setup-node@v3
        with:
          node-version: '20'

      - run: npm ci

      - name: Unit Tests
        run: npm run test:unit

      - name: Integration Tests
        run: npm run test:integration

      - name: E2E Tests
        run: npm run test:e2e

      - name: Mutation Testing
        run: npm run test:mutation
        continue-on-error: true

      - name: Coverage Report
        run: npm run coverage:report

      - name: Quality Gates
        run: |
          node scripts/quality-gates.js \
            --min-coverage 80 \
            --min-mutation-score 60 \
            --max-test-time 300

Pre-commit Hook for TDD Discipline

#!/bin/bash
# .git/hooks/pre-commit - Enforce TDD discipline

# Check if commit message indicates TDD phase
commit_msg=$(cat .git/COMMIT_EDITMSG 2>/dev/null || echo "")

# Run tests before allowing commit
echo "Running tests before commit..."
npm test

if [ $? -ne 0 ]; then
    if [[ $commit_msg == *"[RED]"* ]]; then
        echo "✓ Tests failing as expected for RED phase"
        exit 0
    else
        echo "✗ Tests failing. Use [RED] in commit message if this is intentional."
        echo "  Or fix the tests before committing."
        exit 1
    fi
else
    if [[ $commit_msg == *"[RED]"* ]]; then
        echo "✗ Tests passing but commit marked as [RED] phase"
        echo "  Remove [RED] tag or ensure tests actually fail"
        exit 1
    else
        echo "✓ All tests passing"
        exit 0
    fi
fi

Test Quality Metrics

Key Metrics to Track

  1. Test Coverage

    • Line coverage: % of code lines executed
    • Branch coverage: % of decision branches taken
    • Function coverage: % of functions called
    • Target: >80% line, >75% branch
  2. Mutation Score

    • % of introduced bugs caught by tests
    • Target: >60% mutation score
    • Measures test effectiveness, not just coverage
  3. Test Execution Time

    • Unit tests: <1s total
    • Integration tests: <30s total
    • E2E tests: <5min total
    • Track trends over time
  4. Test Maintainability

    • Lines of test code / lines of production code ratio
    • Target: 1:1 to 2:1
    • Number of assertion per test (prefer 1-3)
  5. Test Flakiness

    • % of tests that fail intermittently
    • Target: <1% flaky tests
    • Track and fix immediately

Dashboard Example

// scripts/tdd-metrics-dashboard.js
const metrics = {
  coverage: {
    lines: 87.5,
    branches: 82.3,
    functions: 91.2,
    statements: 87.5
  },
  mutation: {
    score: 68.5,
    killed: 137,
    survived: 63,
    noCoverage: 12
  },
  performance: {
    unit: { count: 245, time: 0.8, avgTime: 0.003 },
    integration: { count: 67, time: 18.5, avgTime: 0.276 },
    e2e: { count: 23, time: 145.3, avgTime: 6.317 }
  },
  quality: {
    testToCodeRatio: 1.4,
    avgAssertionsPerTest: 2.1,
    flakyTests: 2,
    flakinessRate: 0.6  // 2/335 = 0.6%
  }
};

console.log(`
TDD Metrics Dashboard
=====================

Coverage:
  Lines:      ${metrics.coverage.lines}% ${status(metrics.coverage.lines, 80)}
  Branches:   ${metrics.coverage.branches}% ${status(metrics.coverage.branches, 75)}
  Functions:  ${metrics.coverage.functions}% ${status(metrics.coverage.functions, 80)}

Mutation Testing:
  Score:      ${metrics.mutation.score}% ${status(metrics.mutation.score, 60)}
  Killed:     ${metrics.mutation.killed}
  Survived:   ${metrics.mutation.survived}

Performance:
  Unit:        ${metrics.performance.unit.count} tests in ${metrics.performance.unit.time}s
  Integration: ${metrics.performance.integration.count} tests in ${metrics.performance.integration.time}s
  E2E:         ${metrics.performance.e2e.count} tests in ${metrics.performance.e2e.time}s

Quality:
  Test/Code Ratio:    ${metrics.quality.testToCodeRatio}:1
  Flaky Tests:        ${metrics.quality.flakyTests} (${metrics.quality.flakinessRate}%)
  Avg Assertions:     ${metrics.quality.avgAssertionsPerTest}
`);

function status(value, threshold) {
  return value >= threshold ? '✓' : '✗ BELOW THRESHOLD';
}

Test requirements: $ARGUMENTS