---
title: "Temporal Workflow Example: Container Lifecycle Management with Docker"
description: "End-to-end Temporal workflow example implementing container lifecycle management: inspect, stop, snapshot, and tag Docker containers with compensation on failure and dependency injection for testability."
url: https://agent-zone.ai/knowledge/workflow-orchestration/temporal-container-lifecycle-workflow/
section: knowledge
date: 2026-02-22
categories: ["workflow-orchestration"]
tags: ["temporal","docker","container-lifecycle","workflow-example","compensation","child-workflow","azure-vm","snapshots"]
skills: ["container-lifecycle-management","workflow-compensation","docker-api-integration","snapshot-management"]
tools: ["temporal","go","docker"]
levels: ["intermediate"]
word_count: 1729
formats:
  json: https://agent-zone.ai/knowledge/workflow-orchestration/temporal-container-lifecycle-workflow/index.json
  html: https://agent-zone.ai/knowledge/workflow-orchestration/temporal-container-lifecycle-workflow/?format=html
  api: https://api.agent-zone.ai/api/v1/knowledge/search?q=Temporal+Workflow+Example%3A+Container+Lifecycle+Management+with+Docker
---


# Container Lifecycle Workflow

This article builds a complete Temporal workflow that manages Docker container lifecycle operations: inspect a container, stop it if running, create a snapshot (commit), and handle failures by restarting the container. It demonstrates every pattern from [Multi-Stage Temporal Workflows](../temporal-multi-stage-workflows/) in a concrete, runnable example.

The full source is in the [companion repo](https://github.com/statherm/temporal-examples) under `container-lifecycle/`.

## The Use Case

You need to automate container maintenance: take a snapshot of a running container for backup or migration purposes. The sequence is:

1. **Inspect** the container to determine its current state.
2. **Stop** the container if it is running (committing a running container risks capturing inconsistent in-flight application state).
3. **Commit** (snapshot) the stopped container to create an image.
4. Return the snapshot image ID.

This must be idempotent -- running the workflow twice on the same container should not cause errors. If the container is already stopped, skip the stop step. If the commit fails after the container was stopped, restart the container so it is not left in a stopped state.

## Docker Client Interface

The activities need a Docker client. Rather than using the Docker SDK directly, define an interface. This lets you inject a mock client for tests and a real client for production.

```go
type ContainerClient interface {
    Inspect(ctx context.Context, id string) (ContainerInfo, error)
    Stop(ctx context.Context, id string) error
    Start(ctx context.Context, id string) error
    Commit(ctx context.Context, id string, ref string) (string, error)
}

type ContainerInfo struct {
    ID    string
    State string // "running", "exited", "paused", etc.
    Image string
}
```

The production implementation wraps the Docker SDK:

```go
type DockerContainerClient struct {
    client *docker.Client
}

func NewDockerContainerClient() (*DockerContainerClient, error) {
    cli, err := docker.NewClientWithOpts(docker.FromEnv, docker.WithAPIVersionNegotiation())
    if err != nil {
        return nil, fmt.Errorf("create docker client: %w", err)
    }
    return &DockerContainerClient{client: cli}, nil
}

func (d *DockerContainerClient) Inspect(ctx context.Context, id string) (ContainerInfo, error) {
    resp, err := d.client.ContainerInspect(ctx, id)
    if err != nil {
        return ContainerInfo{}, fmt.Errorf("inspect container %s: %w", id, err)
    }
    return ContainerInfo{
        ID:    resp.ID,
        State: resp.State.Status,
        Image: resp.Config.Image,
    }, nil
}

func (d *DockerContainerClient) Stop(ctx context.Context, id string) error {
    timeout := 30 // seconds to wait for graceful shutdown before SIGKILL
    return d.client.ContainerStop(ctx, id, container.StopOptions{Timeout: &timeout})
}

func (d *DockerContainerClient) Start(ctx context.Context, id string) error {
    return d.client.ContainerStart(ctx, id, container.StartOptions{})
}

func (d *DockerContainerClient) Commit(ctx context.Context, id string, ref string) (string, error) {
    resp, err := d.client.ContainerCommit(ctx, id, container.CommitOptions{Reference: ref})
    if err != nil {
        return "", fmt.Errorf("commit container %s: %w", id, err)
    }
    return resp.ID, nil
}
```

The mock for testing:

```go
type MockContainerClient struct {
    mock.Mock
}

func (m *MockContainerClient) Inspect(ctx context.Context, id string) (ContainerInfo, error) {
    args := m.Called(ctx, id)
    return args.Get(0).(ContainerInfo), args.Error(1)
}

func (m *MockContainerClient) Stop(ctx context.Context, id string) error {
    return m.Called(ctx, id).Error(0)
}

func (m *MockContainerClient) Start(ctx context.Context, id string) error {
    return m.Called(ctx, id).Error(0)
}

func (m *MockContainerClient) Commit(ctx context.Context, id string, ref string) (string, error) {
    args := m.Called(ctx, id, ref)
    return args.String(0), args.Error(1)
}
```

## Activities

Each activity wraps a single Docker operation. Activities are methods on a struct that holds the `ContainerClient`, enabling dependency injection.

```go
type ContainerActivities struct {
    client ContainerClient
}

func NewContainerActivities(client ContainerClient) *ContainerActivities {
    return &ContainerActivities{client: client}
}
```

**InspectContainer** -- Returns the container's current state. No side effects, no idempotency concerns.

```go
type InspectRequest struct {
    ContainerID string
}

func (a *ContainerActivities) InspectContainer(ctx context.Context, req InspectRequest) (ContainerInfo, error) {
    return a.client.Inspect(ctx, req.ContainerID)
}
```

**StopContainer** -- Idempotent. Checks state before stopping. If already stopped, returns success.

```go
type StopRequest struct {
    ContainerID string
}

func (a *ContainerActivities) StopContainer(ctx context.Context, req StopRequest) error {
    info, err := a.client.Inspect(ctx, req.ContainerID)
    if err != nil {
        return fmt.Errorf("inspect before stop: %w", err)
    }

    if info.State != "running" {
        // Already stopped, nothing to do
        return nil
    }

    return a.client.Stop(ctx, req.ContainerID)
}
```

**CommitContainer** -- Creates a snapshot image from the stopped container.

```go
type CommitRequest struct {
    ContainerID string
    Reference   string
}

type CommitResult struct {
    ImageID string
}

func (a *ContainerActivities) CommitContainer(ctx context.Context, req CommitRequest) (CommitResult, error) {
    imageID, err := a.client.Commit(ctx, req.ContainerID, req.Reference)
    if err != nil {
        return CommitResult{}, err
    }
    return CommitResult{ImageID: imageID}, nil
}
```

**StartContainer** -- Idempotent. Used for compensation. Checks state before starting.

```go
type StartRequest struct {
    ContainerID string
}

func (a *ContainerActivities) StartContainer(ctx context.Context, req StartRequest) error {
    info, err := a.client.Inspect(ctx, req.ContainerID)
    if err != nil {
        return fmt.Errorf("inspect before start: %w", err)
    }

    if info.State == "running" {
        return nil
    }

    return a.client.Start(ctx, req.ContainerID)
}
```

## The Workflow

The top-level workflow inspects the container and delegates to a child workflow if the container needs to be stopped and snapshotted.

```go
type LifecycleRequest struct {
    ContainerID string
    SnapshotRef string
}

type LifecycleResult struct {
    ContainerID   string
    PreviousState string
    ImageID       string
}

func ContainerLifecycleWorkflow(ctx workflow.Context, req LifecycleRequest) (LifecycleResult, error) {
    actCtx := workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
        StartToCloseTimeout: time.Minute,
        RetryPolicy: &temporal.RetryPolicy{
            MaximumAttempts: 3,
        },
    })

    // The zero-value struct is only used to reference activity methods by
    // name; the worker's registered instance (with a real client) executes.
    activities := &ContainerActivities{}

    // Step 1: Inspect
    var info ContainerInfo
    err := workflow.ExecuteActivity(actCtx, activities.InspectContainer, InspectRequest{
        ContainerID: req.ContainerID,
    }).Get(ctx, &info)
    if err != nil {
        return LifecycleResult{}, fmt.Errorf("inspect container: %w", err)
    }

    result := LifecycleResult{
        ContainerID:   req.ContainerID,
        PreviousState: info.State,
    }

    if info.State == "running" {
        // Child workflow: stop and snapshot
        childCtx := workflow.WithChildOptions(ctx, workflow.ChildWorkflowOptions{
            WorkflowID:         fmt.Sprintf("stop-snapshot-%s", req.ContainerID),
            ParentClosePolicy:  enums.PARENT_CLOSE_POLICY_TERMINATE,
            WorkflowRunTimeout: 10 * time.Minute,
        })

        var childResult StopAndSnapshotResult
        err = workflow.ExecuteChildWorkflow(childCtx, StopAndSnapshotWorkflow, StopAndSnapshotRequest{
            ContainerID: req.ContainerID,
            SnapshotRef: req.SnapshotRef,
        }).Get(ctx, &childResult)
        if err != nil {
            return LifecycleResult{}, fmt.Errorf("stop-and-snapshot: %w", err)
        }
        result.ImageID = childResult.ImageID
    } else {
        // Already stopped, just snapshot
        var commitResult CommitResult
        err = workflow.ExecuteActivity(actCtx, activities.CommitContainer, CommitRequest{
            ContainerID: req.ContainerID,
            Reference:   req.SnapshotRef,
        }).Get(ctx, &commitResult)
        if err != nil {
            return LifecycleResult{}, fmt.Errorf("commit stopped container: %w", err)
        }
        result.ImageID = commitResult.ImageID
    }

    return result, nil
}
```

The child workflow handles the stop-then-commit sequence with compensation:

```go
type StopAndSnapshotRequest struct {
    ContainerID string
    SnapshotRef string
}

type StopAndSnapshotResult struct {
    ImageID string
}

func StopAndSnapshotWorkflow(ctx workflow.Context, req StopAndSnapshotRequest) (StopAndSnapshotResult, error) {
    actCtx := workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
        StartToCloseTimeout: 2 * time.Minute,
        RetryPolicy: &temporal.RetryPolicy{
            MaximumAttempts: 3,
        },
    })

    // Zero-value struct: used only for activity method references.
    activities := &ContainerActivities{}

    // Stop the container
    err := workflow.ExecuteActivity(actCtx, activities.StopContainer, StopRequest{
        ContainerID: req.ContainerID,
    }).Get(ctx, nil)
    if err != nil {
        return StopAndSnapshotResult{}, fmt.Errorf("stop container: %w", err)
    }

    // Commit (snapshot) the container
    var commitResult CommitResult
    err = workflow.ExecuteActivity(actCtx, activities.CommitContainer, CommitRequest{
        ContainerID: req.ContainerID,
        Reference:   req.SnapshotRef,
    }).Get(ctx, &commitResult)
    if err != nil {
        // Compensation: restart the container since we stopped it
        compensateCtx := workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
            StartToCloseTimeout: time.Minute,
            RetryPolicy: &temporal.RetryPolicy{
                MaximumAttempts: 5,
            },
        })
        compErr := workflow.ExecuteActivity(compensateCtx, activities.StartContainer, StartRequest{
            ContainerID: req.ContainerID,
        }).Get(ctx, nil)
        if compErr != nil {
            return StopAndSnapshotResult{}, fmt.Errorf("commit failed (%v); restart compensation also failed: %w", err, compErr)
        }

        return StopAndSnapshotResult{}, fmt.Errorf("commit failed, container restarted: %w", err)
    }

    return StopAndSnapshotResult{ImageID: commitResult.ImageID}, nil
}
```

## Compensation in Action

The compensation flow deserves a closer look. Consider what happens when the commit fails:

1. The workflow calls `StopContainer`. It succeeds. The container is now stopped.
2. The workflow calls `CommitContainer`. It fails after all retries.
3. The workflow enters the error branch. It calls `StartContainer` as compensation.
4. `StartContainer` is idempotent -- it checks if the container is already running before calling `Start`.
5. The workflow returns an error that includes both the original failure and the fact that compensation ran.

The compensation activity has its own retry policy with 5 attempts. If the container cannot be restarted, the workflow still fails, but now you have a clear audit trail in Temporal's event history showing what happened and what was attempted.

Without compensation, the container would be left stopped -- a silent failure that might not be noticed until someone investigates why a service is down.
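This workflow has a single compensation step, but the same idea scales to longer sequences: each successful step registers an undo, and a failure runs the undo stack in reverse (LIFO) order. A minimal sketch of that saga-style pattern outside Temporal -- in real workflow code each undo would be an `ExecuteActivity` call, and `compensations` is a hypothetical helper, not part of the SDK:

```go
package main

import (
	"errors"
	"fmt"
)

// compensations is a LIFO stack of undo functions. Steps push their
// inverse as they succeed; on failure, run() unwinds them newest-first.
type compensations []func() error

func (c *compensations) add(undo func() error) { *c = append(*c, undo) }

func (c compensations) run() {
	for i := len(c) - 1; i >= 0; i-- {
		if err := c[i](); err != nil {
			// Log and keep unwinding: earlier undos should still run.
			fmt.Println("compensation failed:", err)
		}
	}
}

func main() {
	var undo compensations

	// Step 1: stop succeeds, so register its inverse.
	fmt.Println("stop container")
	undo.add(func() error {
		fmt.Println("compensate: start container")
		return nil
	})

	// Step 2: commit fails, so unwind everything registered so far.
	if err := errors.New("disk full"); err != nil {
		undo.run()
	}
}
```

The reverse ordering matters once there is more than one step: undoing in the opposite order of execution restores intermediate states the later steps depended on.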

## Testing the Workflow

Test the happy path, the idempotent path, and the compensation path. See [Testing Temporal Workflows](../temporal-workflow-testing/) for the full test suite setup.

**Happy path -- running container:**

```go
func (s *LifecycleTestSuite) TestRunningContainer_StopsAndSnapshots() {
    s.env.OnActivity(activities.InspectContainer, mock.Anything, mock.Anything).
        Return(ContainerInfo{ID: "abc123", State: "running"}, nil)
    s.env.OnActivity(activities.StopContainer, mock.Anything, mock.Anything).Return(nil)
    s.env.OnActivity(activities.CommitContainer, mock.Anything, mock.Anything).
        Return(CommitResult{ImageID: "sha256:snapshot1"}, nil)

    s.env.ExecuteWorkflow(ContainerLifecycleWorkflow, LifecycleRequest{
        ContainerID: "abc123",
        SnapshotRef: "backup:latest",
    })

    s.Require().True(s.env.IsWorkflowCompleted())
    s.Require().NoError(s.env.GetWorkflowError())

    var result LifecycleResult
    s.Require().NoError(s.env.GetWorkflowResult(&result))
    s.Require().Equal("sha256:snapshot1", result.ImageID)
    s.Require().Equal("running", result.PreviousState)
}
```

**Already stopped -- skips stop:**

```go
func (s *LifecycleTestSuite) TestAlreadyStopped_SkipsStop() {
    s.env.OnActivity(activities.InspectContainer, mock.Anything, mock.Anything).
        Return(ContainerInfo{ID: "abc123", State: "exited"}, nil)
    s.env.OnActivity(activities.CommitContainer, mock.Anything, mock.Anything).
        Return(CommitResult{ImageID: "sha256:snapshot2"}, nil)

    s.env.ExecuteWorkflow(ContainerLifecycleWorkflow, LifecycleRequest{
        ContainerID: "abc123",
        SnapshotRef: "backup:latest",
    })

    s.Require().True(s.env.IsWorkflowCompleted())
    s.Require().NoError(s.env.GetWorkflowError())
}
```

**Commit fails -- compensation restarts container:**

```go
func (s *LifecycleTestSuite) TestCommitFails_RestartsContainer() {
    s.env.OnActivity(activities.InspectContainer, mock.Anything, mock.Anything).
        Return(ContainerInfo{ID: "abc123", State: "running"}, nil)
    s.env.OnActivity(activities.StopContainer, mock.Anything, mock.Anything).Return(nil)
    s.env.OnActivity(activities.CommitContainer, mock.Anything, mock.Anything).
        Return(CommitResult{}, errors.New("disk full"))
    s.env.OnActivity(activities.StartContainer, mock.Anything, mock.Anything).Return(nil)

    s.env.ExecuteWorkflow(ContainerLifecycleWorkflow, LifecycleRequest{
        ContainerID: "abc123",
        SnapshotRef: "backup:latest",
    })

    s.Require().True(s.env.IsWorkflowCompleted())
    s.Require().Error(s.env.GetWorkflowError())
}
```

## Running with Docker

The companion repo includes a Docker Compose setup for running the workflow against real containers:

```yaml
# docker-compose.yaml
services:
  temporal:
    image: temporalio/auto-setup:latest
    ports:
      - "7233:7233"

  test-container:
    image: nginx:alpine
    container_name: lifecycle-test-target

  worker:
    build: .
    depends_on:
      - temporal
      - test-container
    environment:
      - TEMPORAL_ADDRESS=temporal:7233
      - DOCKER_HOST=unix:///var/run/docker.sock
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
```

```bash
# Start everything
docker compose up -d

# Run the workflow
make start-container-workflow CONTAINER_ID=lifecycle-test-target REF=backup:latest
```

The Makefile target uses the Temporal CLI to start the workflow:

```bash
temporal workflow start \
    --type ContainerLifecycleWorkflow \
    --task-queue container-lifecycle \
    --input '{"ContainerID":"lifecycle-test-target","SnapshotRef":"backup:latest"}'
```

## Appendix: Azure VM Equivalent

The dependency injection pattern makes it straightforward to adapt this workflow for different infrastructure providers. The workflow logic does not change -- only the client interface implementation.

| Docker | Azure VM |
|--------|----------|
| `docker inspect` | `az vm show` |
| `docker stop` | `az vm deallocate` (stops the VM and releases its compute) |
| `docker commit` | `az snapshot create` |
| `docker start` | `az vm start` |

The Azure client interface:

```go
type VMClient interface {
    GetVM(ctx context.Context, resourceGroup, vmName string) (VMInfo, error)
    StopVM(ctx context.Context, resourceGroup, vmName string) error
    StartVM(ctx context.Context, resourceGroup, vmName string) error
    CreateSnapshot(ctx context.Context, resourceGroup, vmName, snapshotName string) (string, error)
}

type VMInfo struct {
    Name          string
    PowerState    string // "running", "deallocated", "stopped"
    ResourceGroup string
    OSDiskID      string
}
```
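One wrinkle the Docker version does not have: the Azure compute API reports power state via instance-view status codes such as `PowerState/running` or `PowerState/deallocated`, so a `GetVM` implementation typically normalizes the code into the bare state the workflow compares against. A hypothetical helper (where exactly this normalization lives depends on your `VMClient` implementation):

```go
package main

import (
	"fmt"
	"strings"
)

// normalizePowerState maps an Azure instance-view status code like
// "PowerState/running" to the bare state string ("running") so the
// workflow's comparisons stay provider-agnostic. Codes without the
// prefix pass through unchanged.
func normalizePowerState(code string) string {
	return strings.TrimPrefix(code, "PowerState/")
}

func main() {
	fmt.Println(normalizePowerState("PowerState/running"))     // running
	fmt.Println(normalizePowerState("PowerState/deallocated")) // deallocated
}
```

Keeping provider quirks like this inside the client implementation is what lets the workflow body stay byte-for-byte identical across Docker and Azure.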

The activities struct changes to accept the different client:

```go
type VMActivities struct {
    client VMClient
}

func (a *VMActivities) InspectVM(ctx context.Context, req VMInspectRequest) (VMInfo, error) {
    return a.client.GetVM(ctx, req.ResourceGroup, req.VMName)
}

func (a *VMActivities) StopVM(ctx context.Context, req VMStopRequest) error {
    info, err := a.client.GetVM(ctx, req.ResourceGroup, req.VMName)
    if err != nil {
        return err
    }
    if info.PowerState != "running" {
        return nil // Already stopped, idempotent
    }
    return a.client.StopVM(ctx, req.ResourceGroup, req.VMName)
}
```

The workflow itself only changes in which activities it calls. The structure -- inspect, conditionally stop, snapshot, compensate on failure -- remains identical. This is the power of separating workflow logic from infrastructure interaction through interfaces.

The Azure implementation is documented in the companion repo but not integration-tested, since it requires an Azure subscription. The mock-based unit tests cover the workflow logic identically.

## Key Takeaways

This workflow demonstrates several production patterns working together: dependency injection for testability, idempotent activities that check state before acting, child workflows for reusable sub-processes, compensation that restores system state on failure, and interface-based design that enables provider swapping. These patterns apply to any multi-step infrastructure automation -- the container lifecycle is just one instance. For more on signal-based workflows where a human approves or rejects the snapshot, see [Temporal Signals](../temporal-signals-manual/).

