---
title: "Database Testing Strategies"
description: "Decision framework for database testing — test data management, fixtures vs factories, database containers with Testcontainers, migration testing, performance regression testing, and data masking for test environments."
url: https://agent-zone.ai/knowledge/databases/database-testing-patterns/
section: knowledge
date: 2026-02-22
categories: ["databases"]
tags: ["testing","testcontainers","fixtures","factories","migration-testing","performance-testing","data-masking","test-data"]
skills: ["database-testing","test-data-management","migration-validation","performance-regression-testing"]
tools: ["testcontainers","docker","pg_dump","faker","factory-bot","flyway","liquibase"]
levels: ["intermediate"]
word_count: 957
formats:
  json: https://agent-zone.ai/knowledge/databases/database-testing-patterns/index.json
  html: https://agent-zone.ai/knowledge/databases/database-testing-patterns/?format=html
  api: https://api.agent-zone.ai/api/v1/knowledge/search?q=Database+Testing+Strategies
---


# Database Testing Strategies

Database tests are the tests most teams get wrong. They either skip them entirely (testing with mocks, then discovering schema mismatches in production), or they build a fragile suite sharing a single database where tests interfere with each other. The right approach depends on what you are testing and what tradeoffs you can accept.

## Fixtures vs Factories

### Fixtures

Fixtures are static SQL files loaded before a test suite runs:

```sql
-- fixtures/users.sql
INSERT INTO users (id, email, role) VALUES
  (1, 'admin@test.com', 'admin'),
  (2, 'user@test.com', 'member');

-- fixtures/orders.sql
INSERT INTO orders (id, user_id, total, status) VALUES
  (100, 2, 49.99, 'completed'),
  (101, 2, 129.00, 'pending');
```

**Use fixtures when:** the domain model is stable, tests need complex interconnected data, and you want fast loading. **Fixtures break when:** schema changes require updating every fixture file -- a new required column means editing every INSERT across all files.
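Loading fixtures is usually a one-liner per file. A minimal loader, sketched here against SQLite for illustration (the directory layout and schema are assumptions, not part of any framework):

```python
import sqlite3
import tempfile
from pathlib import Path

def load_fixtures(conn, fixture_dir):
    """Execute every .sql fixture file in name order before the suite runs."""
    for path in sorted(Path(fixture_dir).glob("*.sql")):
        conn.executescript(path.read_text())

# demo: write a fixture file, load it, query the result
with tempfile.TemporaryDirectory() as d:
    Path(d, "users.sql").write_text(
        "INSERT INTO users (id, email, role) VALUES"
        " (1, 'admin@test.com', 'admin'), (2, 'user@test.com', 'member');"
    )
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, role TEXT)")
    load_fixtures(conn, d)
    n = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]  # 2
```

Sorting by filename gives a cheap way to control load order when fixtures depend on each other (e.g. `01_users.sql` before `02_orders.sql`).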

### Factories

Factories generate test data on demand with sensible defaults:

```python
import factory
from myapp.models import User, Order

class UserFactory(factory.Factory):
    class Meta:
        model = User
    email = factory.Sequence(lambda n: f"user{n}@test.com")
    role = "member"

class OrderFactory(factory.Factory):
    class Meta:
        model = Order
    user = factory.SubFactory(UserFactory)
    total = factory.Faker("pydecimal", left_digits=3, right_digits=2, positive=True)
    status = "pending"

# Each test builds exactly what it needs
def test_order_completion():
    order = OrderFactory(status="pending", total=49.99)
    complete_order(order.id)
    assert Order.get(order.id).status == "completed"
```

**Use factories when:** schema changes frequently and each test should be self-documenting. **Factories break when:** you need complex graphs of related data -- building an order that requires a user, product, inventory, warehouse, and shipping zone becomes a chain of factory calls.

**The pragmatic approach:** use both. Factories for test-specific data, fixtures for reference data that rarely changes (countries, currencies, permission definitions).
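The split looks roughly like this in practice. A self-contained sketch using SQLite and a plain factory function (table names, defaults, and the id sequence are illustrative assumptions):

```python
import itertools
import sqlite3

# fixture: stable reference data, loaded once and rarely edited
REFERENCE_FIXTURE = """
CREATE TABLE currencies (code TEXT PRIMARY KEY, name TEXT);
INSERT INTO currencies VALUES ('USD', 'US Dollar'), ('EUR', 'Euro');
CREATE TABLE orders (id INTEGER PRIMARY KEY, currency TEXT, total REAL, status TEXT);
"""

_seq = itertools.count(100)

def order_factory(conn, **overrides):
    """Insert an order with sensible defaults; the test overrides only what it cares about."""
    row = {"id": next(_seq), "currency": "USD", "total": 10.0, "status": "pending"}
    row.update(overrides)
    conn.execute("INSERT INTO orders VALUES (:id, :currency, :total, :status)", row)
    return row

conn = sqlite3.connect(":memory:")
conn.executescript(REFERENCE_FIXTURE)            # fixture path: reference data
order = order_factory(conn, status="completed")  # factory path: test-specific data
```

Each test reads as a statement of intent: the override (`status="completed"`) is the only detail that matters to the test, and everything else falls back to defaults.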

## Database Containers with Testcontainers

Testcontainers spins up real database instances in Docker for each test run. No shared test databases, no version mismatches between local and CI.

```java
@Testcontainers
class OrderRepositoryTest {
    @Container
    static PostgreSQLContainer<?> postgres = new PostgreSQLContainer<>("postgres:16")
        .withDatabaseName("testdb")
        .withInitScript("schema.sql");

    @Test
    void shouldCreateOrder() {
        OrderRepository repo = new OrderRepository(dataSource);  // dataSource built from postgres.getJdbcUrl()
        Order order = repo.create(new Order(userId, 49.99));
        assertThat(order.getId()).isNotNull();
    }
}
```

```go
func TestOrderRepository(t *testing.T) {
    ctx := context.Background()
    container, err := postgres.Run(ctx, "postgres:16",
        postgres.WithDatabase("testdb"),
    )
    require.NoError(t, err)
    defer container.Terminate(ctx)

    connStr, err := container.ConnectionString(ctx, "sslmode=disable")
    require.NoError(t, err)
    db, err := sql.Open("pgx", connStr)
    require.NoError(t, err)
    runMigrations(db)
    // test against real PostgreSQL
}
```

**Use Testcontainers for:** integration tests against real database engines, testing database-specific features (jsonb queries, full-text search), CI pipelines needing reproducible state. **Skip it for:** unit tests where in-memory databases or mocks suffice and container startup time (2-5 seconds) hurts fast feedback.

### Container Lifecycle

One container per suite with transaction rollback between tests balances isolation and speed:

```python
import pytest

@pytest.fixture(autouse=True)
def db_transaction(db_connection):  # db_connection: a suite-scoped connection to the container
    transaction = db_connection.begin()
    yield db_connection
    transaction.rollback()  # every test starts clean
```
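The isolation property this buys can be demonstrated end to end. A sketch using SQLite in place of the container (in a real suite the connection would point at the shared Testcontainers database):

```python
import sqlite3

# autocommit mode so BEGIN/ROLLBACK are under explicit control
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")

def run_isolated(test_fn):
    """Wrap a test in a transaction and roll it back, so no state leaks between tests."""
    conn.execute("BEGIN")
    try:
        test_fn(conn)
    finally:
        conn.execute("ROLLBACK")

run_isolated(lambda c: c.execute("INSERT INTO users VALUES (1, 'a@test.com')"))
# same primary key, no conflict: the previous insert was rolled back
run_isolated(lambda c: c.execute("INSERT INTO users VALUES (1, 'b@test.com')"))
leftover = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]  # 0
```

The second insert reusing primary key `1` succeeds only because rollback undid the first one; a shared-database suite without per-test transactions would fail here.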

## Migration Testing

Schema migrations are deployments that run DDL against production data. Test them like deployments.

### Forward and Rollback Testing

```bash
#!/bin/bash
# Run in CI on every PR with migrations
docker run -d --name migration-test -p 5432:5432 \
  -e POSTGRES_DB=testdb -e POSTGRES_HOST_AUTH_METHOD=trust postgres:16
until pg_isready -h localhost -p 5432 >/dev/null 2>&1; do sleep 1; done

# Apply all migrations up to previous release
flyway -url=jdbc:postgresql://localhost:5432/testdb -user=postgres \
       -target=previous_release migrate

# Load representative test data
psql -h localhost -U postgres -d testdb -f test-data/representative-dataset.sql

# Apply new migrations -- this is what we are testing
flyway -url=jdbc:postgresql://localhost:5432/testdb -user=postgres migrate

# Validate schema matches expected state
pg_dump -h localhost -U postgres -d testdb --schema-only > actual_schema.sql
diff expected_schema.sql actual_schema.sql
```

Every migration should have a rollback script. Test both directions:

```sql
-- V5__add_order_status.sql (forward)
ALTER TABLE orders ADD COLUMN status VARCHAR(20) DEFAULT 'pending';

-- V5__add_order_status_rollback.sql (reverse)
ALTER TABLE orders DROP COLUMN status;
```
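A two-direction check can itself be an automated test. A sketch against SQLite, mirroring the V5 pair above; the rollback is written as a table rebuild so the same pattern also works on engines without `DROP COLUMN` support:

```python
import sqlite3

FORWARD = "ALTER TABLE orders ADD COLUMN status VARCHAR(20) DEFAULT 'pending';"
ROLLBACK = """
CREATE TABLE orders_new (id INTEGER PRIMARY KEY, total REAL);
INSERT INTO orders_new SELECT id, total FROM orders;
DROP TABLE orders;
ALTER TABLE orders_new RENAME TO orders;
"""

def columns(conn):
    """Column names of the orders table, via the engine's schema introspection."""
    return [row[1] for row in conn.execute("PRAGMA table_info(orders)")]

conn = sqlite3.connect(":memory:")
conn.executescript("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL);"
                   "INSERT INTO orders VALUES (1, 49.99);")

conn.executescript(FORWARD)
assert "status" in columns(conn)       # forward adds the column

conn.executescript(ROLLBACK)
assert "status" not in columns(conn)   # rollback removes it
assert conn.execute("SELECT total FROM orders").fetchone()[0] == 49.99  # data survives the round trip
```

The last assertion is the important one: a rollback that restores the schema but loses rows is not a rollback.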

### Data Migration Testing

When migrations transform data, test with production-scale volumes. A migration taking 200ms on 1000 rows might take 45 minutes on 10 million rows, locking the table the entire time. Run data migrations against a masked copy of production data to discover this before the maintenance window.
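A cheap first probe, sketched here with SQLite and an invented backfill migration: time the transformation on a sample and extrapolate linearly. Treat the projection as a lower bound, since lock contention and index maintenance usually make the real run worse.

```python
import sqlite3
import time

def time_migration(n_rows):
    """Run the data migration on n_rows synthetic rows and return elapsed seconds."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL, total_cents INTEGER)")
    conn.executemany("INSERT INTO orders (total) VALUES (?)", [(9.99,)] * n_rows)
    start = time.perf_counter()
    # the migration under test: backfill an integer cents column (illustrative)
    conn.execute("UPDATE orders SET total_cents = CAST(total * 100 AS INTEGER)")
    return time.perf_counter() - start

sample = time_migration(10_000)
projected_10m = sample * (10_000_000 / 10_000)  # linear extrapolation, lower bound
print(f"sample: {sample * 1000:.1f}ms, projected for 10M rows: {projected_10m:.1f}s")
```

Even this crude estimate catches the 200ms-to-45-minutes surprise before the maintenance window; the masked production copy then confirms the real number.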

## Performance Regression Testing

Database performance degrades gradually. A query that took 5ms last month takes 50ms now because the table grew and nobody noticed the missing index.

### Baseline and Regression Detection

```bash
# Capture EXPLAIN ANALYZE for critical queries in CI
psql -h localhost -U postgres -d testdb -qAt <<'EOF' > baseline/order_lookup.json
EXPLAIN (ANALYZE, FORMAT JSON)
SELECT o.*, u.email FROM orders o
JOIN users u ON u.id = o.user_id
WHERE o.status = 'pending'
AND o.created_at > NOW() - INTERVAL '7 days';
EOF
```

```python
import json

def check_regression(baseline_file, current_file, threshold=2.0):
    """Fail when the query is more than `threshold` times slower than its baseline."""
    with open(baseline_file) as f:
        baseline = json.load(f)
    with open(current_file) as f:
        current = json.load(f)
    baseline_time = baseline[0]["Execution Time"]  # ms, from EXPLAIN (ANALYZE, FORMAT JSON)
    current_time = current[0]["Execution Time"]
    if current_time > baseline_time * threshold:
        print(f"REGRESSION: {baseline_time:.1f}ms -> {current_time:.1f}ms")
        return False
    return True
```

Run with a dataset large enough to reveal performance differences. A 1000-row test database will not expose a missing index on a low-cardinality column.

## Data Masking for Test Environments

Test environments need realistic data. Production data contains PII. Data masking bridges the gap.

```sql
-- Mask emails (preserve domain distribution)
UPDATE users SET email = 'user_' || id || '@' ||
  CASE (id % 3)
    WHEN 0 THEN 'gmail.com'
    WHEN 1 THEN 'company.com'
    WHEN 2 THEN 'outlook.com'
  END;

-- Mask names
UPDATE users SET first_name = 'First' || id, last_name = 'Last' || id;

-- Mask phone numbers (preserve format)
UPDATE users SET phone = '+1555' || LPAD((id % 10000000)::TEXT, 7, '0');
```

Mask in dependency order (parent tables first) and verify referential integrity after masking. Automate the pipeline: nightly dump of production, mask in an isolated database, validate, restore to staging.
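The validation step can be a small script run against the masked copy. A sketch against SQLite (the synthetic email pattern and table names are assumptions matching the masking SQL above):

```python
import re
import sqlite3

# stand-in for the masked copy of production
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER);
INSERT INTO users VALUES (1, 'user_1@gmail.com'), (2, 'user_2@company.com');
INSERT INTO orders VALUES (100, 1), (101, 2);
""")

# 1) every email must match the synthetic pattern -- any miss means raw PII leaked through
masked = re.compile(r"^user_\d+@")
bad_emails = [e for (e,) in conn.execute("SELECT email FROM users")
              if not masked.match(e)]

# 2) every order must still reference an existing user
orphans = conn.execute("""
    SELECT COUNT(*) FROM orders o
    LEFT JOIN users u ON u.id = o.user_id
    WHERE u.id IS NULL
""").fetchone()[0]

assert not bad_emails and orphans == 0  # masking preserved both anonymity and integrity
```

Wire this as a gate in the pipeline: if either check fails, the masked dump never reaches staging.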

## Decision Matrix

| Scenario | Approach | Why |
|---|---|---|
| Unit testing repository logic | Factories + Testcontainers | Isolated, each test defines its own state |
| Testing DB-specific features | Testcontainers | Real engine, reproducible |
| Testing migrations | Forward + rollback against representative data | Catches data bugs before production |
| Performance regression detection | Baseline capture in CI with production-scale data | Prevents gradual degradation |
| Staging environment data | Masked production dump | Realistic without PII exposure |
| Reference/seed data | Fixtures | Stable data loaded once |

The common thread: test against real databases, with realistic data, in isolated environments. Mocks catch logic errors but miss the schema mismatches, query performance issues, and migration failures that cause production incidents.

