Production Deployment Guide
This guide walks through every step of deploying a Shaperail application to production, from building the release binary to monitoring a running cluster.
1. Building for production
Shaperail provides two build paths: a native release binary and a Docker image.
Native release binary
shaperail build
This runs `cargo build --release` and produces an optimized binary at `target/release/<project-name>`. Use this when you deploy to bare-metal servers or have your own container build pipeline.
Docker image
shaperail build --docker
This generates a multi-stage Dockerfile and builds a Docker image tagged with your project name. The generated Dockerfile uses:
- Builder stage – `rust:1.85-slim` with the musl toolchain for static linking
- Runtime stage – `FROM scratch` (no OS layer)

The result is a statically linked binary on a minimal image. The build target is `x86_64-unknown-linux-musl`.
# Generated by: shaperail build --docker
FROM rust:1.85-slim AS builder
RUN apt-get update && apt-get install -y musl-tools pkg-config ca-certificates \
&& rm -rf /var/lib/apt/lists/*
RUN rustup target add x86_64-unknown-linux-musl
WORKDIR /app
COPY . .
RUN cargo build --release --target x86_64-unknown-linux-musl
FROM scratch
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt
COPY --from=builder /app/target/x86_64-unknown-linux-musl/release/my-app /my-app
USER 10001:10001
EXPOSE 3000
ENTRYPOINT ["/my-app"]
Size targets:
| Metric | Target |
|---|---|
| Final Docker image | < 25 MB |
| Release binary | < 20 MB |
| Idle memory at runtime | < 60 MB |
The scratch base image contains no shell, no package manager, and no libc – the attack surface is minimal. CA certificates are copied from the builder stage so outbound TLS (database connections, webhook delivery) works correctly.
2. Environment configuration
Shaperail reads configuration from two sources:
- `shaperail.config.yaml` – project configuration (checked into source control)
- Environment variables – secrets and environment-specific overrides
Required environment variables
| Variable | Example | Description |
|---|---|---|
| `DATABASE_URL` | `postgresql://user:pass@db.example.com:5432/myapp` | Primary database connection string |
| `REDIS_URL` | `redis://:password@redis.example.com:6379` | Redis connection string |
| `JWT_SECRET` | `a-long-random-string-at-least-32-chars` | HMAC signing secret for JWT tokens |
Optional environment variables
| Variable | Default | Description |
|---|---|---|
| `SHAPERAIL_PORT` | 3000 | Process-level override for the HTTP server port |
| `RUST_LOG` | info | Log level filter (e.g. `warn`, `info,shaperail_runtime=debug`) |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | unset | OTLP gRPC endpoint; unset disables tracing |
| `OTEL_SERVICE_NAME` | shaperail | Service name in distributed traces |
| `SHAPERAIL_SLOW_QUERY_MS` | unset | Log a warning for queries exceeding this threshold (ms) |
| `WEBHOOK_SECRET` | unset | HMAC secret for outbound webhook signatures |
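`WEBHOOK_SECRET` drives outbound webhook signing. The exact scheme is framework-defined; as an illustration only, this sketch shows how a receiver would verify an HMAC-SHA256 signature over the raw request body (the hex encoding and the verification flow here are assumptions for the example, not a confirmed Shaperail API):

```python
import hashlib
import hmac

def verify_webhook(secret: str, body: bytes, signature_hex: str) -> bool:
    """Recompute HMAC-SHA256 over the raw body and compare in constant time."""
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)

# Sign on the sending side, verify on the receiving side.
secret = "your-webhook-signing-secret"
body = b'{"event":"user.created","id":42}'
sig = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()

print(verify_webhook(secret, body, sig))              # True
print(verify_webhook(secret, b'{"tampered":1}', sig)) # False
```

Constant-time comparison (`hmac.compare_digest`) matters here: a naive `==` leaks timing information that an attacker can use to forge signatures byte by byte.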
.env files vs environment variables
In development, Shaperail reads .env from the project root. In production, do not ship a .env file. Instead, inject variables through your platform:
# Kubernetes: use Secrets (see section 6)
# Docker Compose: use environment: block or env_file:
# Systemd: use Environment= or EnvironmentFile=
# Cloud Run / ECS: use the platform's secret manager integration
Config file with environment interpolation
Use ${VAR} and ${VAR:default} in shaperail.config.yaml to reference environment variables without hardcoding secrets:
project: my-app
port: ${SHAPERAIL_PORT:3000}
workers: auto

databases:
  default:
    engine: postgres
    url: ${DATABASE_URL}
    pool_size: ${DB_POOL_SIZE:20}

cache:
  type: redis
  url: ${REDIS_URL}

auth:
  provider: jwt
  secret_env: JWT_SECRET
  expiry: 24h
  refresh_expiry: 30d

logging:
  level: info
  format: json
If a referenced variable has no default and is unset, the parser halts with an error naming the missing variable. This is intentional – production apps should never start with missing configuration.
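The `${VAR}` / `${VAR:default}` behaviour can be sketched in a few lines of Python (illustrative only — the real parser lives inside Shaperail):

```python
import os
import re

PATTERN = re.compile(r"\$\{([A-Z0-9_]+)(?::([^}]*))?\}")

def interpolate(text: str, env=os.environ) -> str:
    """Replace ${VAR} and ${VAR:default}; fail fast if a variable with no default is unset."""
    def repl(match):
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            return default
        raise RuntimeError(f"missing required environment variable: {name}")
    return PATTERN.sub(repl, text)

print(interpolate("port: ${SHAPERAIL_PORT:3000}", env={}))  # prints: port: 3000
```

The key property is the last branch: a variable with no default and no value raises immediately instead of silently producing an empty string, which mirrors the fail-fast startup behaviour described above.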
3. Database setup
Connection pooling
Shaperail uses sqlx connection pools. The pool_size setting controls the maximum number of simultaneous connections to PostgreSQL.
Tuning guidelines:
| Deployment size | Recommended pool_size | Notes |
|---|---|---|
| Single instance | 10-20 | Default of 20 is fine for most workloads |
| 2-5 replicas | 10 per replica | Total connections = replicas x pool_size; keep under Postgres max_connections |
| 5+ replicas | 5-10 per replica | Use PgBouncer in front of Postgres to multiplex connections |
Key rule: The total connections across all replicas must stay below your PostgreSQL max_connections setting (default: 100). Leave headroom for migrations, monitoring, and admin connections.
# For a 4-replica deployment with Postgres max_connections=100
databases:
  default:
    engine: postgres
    url: ${DATABASE_URL}
    pool_size: 20  # 4 replicas x 20 = 80 connections, leaving 20 for admin
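The headroom rule is plain arithmetic and worth checking before every scale-up; nothing here is Shaperail-specific:

```python
def pool_headroom(replicas: int, pool_size: int,
                  max_connections: int = 100, reserved: int = 20) -> int:
    """Connections left over after all replicas fill their pools.

    Raises if the deployment would leave fewer than `reserved` connections
    for migrations, monitoring, and admin sessions.
    """
    total = replicas * pool_size
    if total > max_connections - reserved:
        raise ValueError(
            f"{total} app connections leave less than {reserved} for admin work"
        )
    return max_connections - total

print(pool_headroom(replicas=4, pool_size=20))  # 4 x 20 = 80 used, prints: 20
```

Scaling to 5 replicas with the same `pool_size: 20` would request 100 connections and trip the check — which is exactly the point at which the table above recommends dropping per-replica pool size or adding PgBouncer.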
For multi-database setups, configure each connection independently:
databases:
  default:
    engine: postgres
    url: ${DATABASE_URL}
    pool_size: 15
  analytics:
    engine: postgres
    url: ${ANALYTICS_DATABASE_URL}
    pool_size: 5
Migrations in production
Run migrations before deploying new application code:
# Apply pending SQL migrations (uses sqlx-cli under the hood)
DATABASE_URL=postgresql://user:pass@db.example.com:5432/myapp shaperail migrate
Migration workflow for production:
- Keep reviewed SQL files in `migrations/` under source control
- For brand-new resources, `shaperail migrate` can generate the missing initial `create_<resource>` files
- For later schema changes, write the follow-up SQL migration files manually
- Apply pending migrations against production as a separate deployment step
- Deploy the new application code after migrations succeed
Rollback strategy:
# Revert the last applied migration
shaperail migrate --rollback
Rollbacks revert one migration at a time. For safe rollbacks:
- Always write backward-compatible migrations (add columns as nullable first, backfill, then add constraints)
- Test rollback in staging before applying migrations to production
- Keep the previous application version ready to redeploy if a migration causes issues
Connection string format
postgresql://username:password@hostname:5432/database_name?sslmode=require
Always use sslmode=require (or sslmode=verify-full) for production database connections.
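A CI guard for the TLS rule can be written with the standard library alone (a sketch, not part of Shaperail):

```python
from urllib.parse import parse_qs, urlparse

def requires_tls(database_url: str) -> bool:
    """True if the Postgres URL pins sslmode to require or verify-full."""
    query = parse_qs(urlparse(database_url).query)
    return query.get("sslmode", [""])[0] in ("require", "verify-full")

print(requires_tls("postgresql://u:p@db.example.com:5432/myapp?sslmode=require"))  # True
print(requires_tls("postgresql://u:p@db.example.com:5432/myapp"))                  # False
```

Dropping a check like this into the deploy pipeline catches the common mistake of copying a staging URL (where `sslmode=disable` is typical) into a production secret.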
4. Redis setup
Cache configuration
cache:
  type: redis
  url: ${REDIS_URL}
The REDIS_URL should include authentication for production:
redis://:your-password@redis.example.com:6379/0
Cache sizing
Shaperail uses Redis for endpoint-level caching (configured per-endpoint with cache: { ttl: 60 }) and background job queues.
Sizing guidelines:
| Workload | Recommended Redis memory |
|---|---|
| Light caching, few jobs | 256 MB |
| Moderate caching, active job queue | 1-2 GB |
| Heavy caching with large payloads | 4+ GB |
Set a maxmemory policy on your Redis instance to prevent unbounded growth:
# redis.conf or managed Redis parameter group
maxmemory 1gb
maxmemory-policy allkeys-lru
Connection pool
The Redis connection pool is managed by deadpool-redis. For most deployments, the default pool size is sufficient. If you see connection timeouts under high load, increase the Redis instance’s maxclients setting and ensure your deployment has enough file descriptors.
5. Health checks
Shaperail registers two health endpoints automatically.
GET /health – liveness probe
Returns 200 OK if the process is running. Does not check dependencies.
{ "status": "ok" }
Use this for liveness probes. If this endpoint stops responding, the process is hung or dead and should be restarted.
GET /health/ready – readiness probe
Checks database and Redis connectivity. Returns 200 OK when all backends are reachable, 503 Service Unavailable when any check fails.
{
  "status": "ok",
  "checks": {
    "database": { "status": "ok" },
    "redis": { "status": "ok" }
  }
}
Degraded response (503):
{
  "status": "degraded",
  "checks": {
    "database": { "status": "ok" },
    "redis": { "status": "error", "message": "Redis PING failed: ..." }
  }
}
Use this for readiness probes. Only route traffic to instances where /health/ready returns 200.
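A deploy script polling `/health/ready` only needs the status code, but the JSON body pinpoints which backend failed. A sketch of parsing the degraded payload shown above:

```python
import json

payload = json.loads("""
{
  "status": "degraded",
  "checks": {
    "database": { "status": "ok" },
    "redis": { "status": "error", "message": "Redis PING failed: ..." }
  }
}
""")

# Collect every check that is not reporting "ok".
failing = [name for name, check in payload["checks"].items()
           if check["status"] != "ok"]

print(f"ready={payload['status'] == 'ok'} failing={failing}")  # ready=False failing=['redis']
```

Surfacing `failing` in deploy logs or alerts saves a round of log-spelunking when a rollout stalls on readiness.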
6. Kubernetes deployment
Namespace and Secret
apiVersion: v1
kind: Namespace
metadata:
  name: my-app
---
apiVersion: v1
kind: Secret
metadata:
  name: my-app-secrets
  namespace: my-app
type: Opaque
stringData:
  DATABASE_URL: "postgresql://user:password@db.example.com:5432/myapp?sslmode=require"
  REDIS_URL: "redis://:password@redis.example.com:6379/0"
  JWT_SECRET: "your-production-jwt-secret-at-least-32-characters"
  WEBHOOK_SECRET: "your-webhook-signing-secret"
ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-app-config
  namespace: my-app
data:
  RUST_LOG: "info"
  OTEL_EXPORTER_OTLP_ENDPOINT: "http://otel-collector.monitoring:4317"
  OTEL_SERVICE_NAME: "my-app"
  SHAPERAIL_SLOW_QUERY_MS: "100"
Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: my-app
  labels:
    app: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  template:
    metadata:
      labels:
        app: my-app
    spec:
      securityContext:
        runAsUser: 10001
        runAsGroup: 10001
        runAsNonRoot: true
      containers:
        - name: my-app
          image: registry.example.com/my-app:v1.0.0
          ports:
            - containerPort: 3000
              name: http
          envFrom:
            - secretRef:
                name: my-app-secrets
            - configMapRef:
                name: my-app-config
          resources:
            requests:
              cpu: "100m"
              memory: "64Mi"
            limits:
              cpu: "1000m"
              memory: "256Mi"
          livenessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 5
            periodSeconds: 10
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /health/ready
              port: http
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 3
          startupProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 2
            periodSeconds: 2
            failureThreshold: 15
Service
apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: my-app
spec:
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: http
      protocol: TCP
      name: http
  type: ClusterIP
Horizontal Pod Autoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
  namespace: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
Ingress (optional)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  namespace: my-app
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  tls:
    - hosts:
        - api.example.com
      secretName: api-tls
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app
                port:
                  number: 80
7. Docker Compose for staging
Use this Compose file for staging environments that mirror production topology:
# docker-compose.staging.yml
services:
  app:
    image: registry.example.com/my-app:latest
    ports:
      - "3000:3000"
    environment:
      DATABASE_URL: "postgresql://myapp:secret@postgres:5432/myapp?sslmode=disable"
      REDIS_URL: "redis://redis:6379/0"
      JWT_SECRET: "staging-jwt-secret-change-in-production"
      RUST_LOG: "info"
      OTEL_EXPORTER_OTLP_ENDPOINT: "http://otel-collector:4317"
      OTEL_SERVICE_NAME: "my-app-staging"
      SHAPERAIL_SLOW_QUERY_MS: "50"
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    healthcheck:
      # NOTE: a scratch-based image contains no wget (or any shell). For this
      # check to work, build the staging image from a base that includes wget,
      # or drop the healthcheck and probe from outside the container.
      test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:3000/health/ready"]
      interval: 10s
      timeout: 5s
      retries: 5
    restart: unless-stopped
  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: myapp
      POSTGRES_PASSWORD: secret
      POSTGRES_DB: myapp
    volumes:
      - pgdata:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U myapp"]
      interval: 5s
      timeout: 3s
      retries: 5
  redis:
    image: redis:7-alpine
    command: redis-server --maxmemory 256mb --maxmemory-policy allkeys-lru
    volumes:
      - redisdata:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 5
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    ports:
      - "4317:4317"
volumes:
  pgdata:
  redisdata:
Supporting Prometheus config for the staging stack:
# prometheus.yml
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: shaperail
    static_configs:
      - targets: ["app:3000"]
8. Logging in production
JSON log format
Shaperail outputs structured JSON logs by default. Every log line includes a request_id for correlation.
{"timestamp":"2026-03-17T12:00:00Z","level":"INFO","request_id":"abc-123","target":"shaperail_runtime::handlers","message":"GET /v1/users 200 12ms"}
Log level configuration
Set log levels via RUST_LOG:
# Production default -- info level
RUST_LOG=info
# Quieter -- warnings and errors only
RUST_LOG=warn
# Debug a specific module without flooding logs
RUST_LOG=info,shaperail_runtime::handlers=debug
# Trace-level for deep debugging (not recommended in production)
RUST_LOG=trace
PII redaction
Fields marked sensitive: true in resource schemas are automatically replaced with "[REDACTED]" in all log output. Mark email, password, SSN, and similar fields as sensitive:
schema:
  email: { type: string, format: email, sensitive: true }
  password: { type: string, sensitive: true }
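The redaction behaviour amounts to masking every schema field flagged `sensitive: true` before the log line is emitted; a minimal illustration of the idea (not Shaperail's actual implementation):

```python
# Fields that would carry sensitive: true in the resource schema.
SENSITIVE = {"email", "password"}

def redact(record: dict) -> dict:
    """Replace sensitive values with the literal "[REDACTED]" before logging."""
    return {k: ("[REDACTED]" if k in SENSITIVE else v) for k, v in record.items()}

print(redact({"id": 7, "email": "a@example.com", "password": "hunter2"}))
# {'id': 7, 'email': '[REDACTED]', 'password': '[REDACTED]'}
```

The important detail is that redaction happens on the structured record, before serialization — grepping rendered log strings for PII after the fact is far less reliable.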
Slow query logging
Enable slow query warnings to catch performance regressions:
SHAPERAIL_SLOW_QUERY_MS=100
This logs a warning for any database query exceeding 100ms.
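Because the warnings are ordinary JSON log lines on stdout, they can be counted downstream in any aggregator. A sketch of a filter (the exact wording of the slow-query message is an assumption for the example; only `level` and `message` are documented fields):

```python
import json

def is_slow_query(line: str) -> bool:
    """True for WARN-level lines that look like slow-query reports."""
    record = json.loads(line)
    return record["level"] == "WARN" and "slow query" in record["message"]

lines = [
    '{"level":"WARN","request_id":"abc","message":"slow query: SELECT ... (142ms)"}',
    '{"level":"INFO","request_id":"abc","message":"GET /v1/users 200 12ms"}',
]
print(sum(is_slow_query(l) for l in lines))  # 1
```

A count of matches over time is the "Slow queries" panel suggested in the Grafana tips below: a rising trend usually precedes a latency alert.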
Shipping logs to external systems
Since logs are JSON to stdout, pipe them to any log aggregator:
Docker / Kubernetes (stdout is captured automatically):
- ELK: Use Filebeat or Fluentd DaemonSet to ship container logs
- Datadog: Install the Datadog Agent; it tails container stdout
- Grafana Loki: Use Promtail DaemonSet
Systemd (bare metal):
# Forward to a file for Filebeat
ExecStart=/usr/local/bin/my-app
StandardOutput=append:/var/log/my-app/app.log
Fluentd example config for Kubernetes:
<source>
@type tail
path /var/log/containers/my-app-*.log
pos_file /var/log/fluentd-my-app.pos
tag my-app
<parse>
@type json
</parse>
</source>
<match my-app>
@type elasticsearch
host elasticsearch.monitoring
port 9200
index_name my-app-logs
</match>
9. Monitoring
Prometheus metrics endpoint
Shaperail exposes Prometheus metrics at GET /metrics. Key metrics:
| Metric | Type | Description | Alert threshold |
|---|---|---|---|
| `shaperail_http_requests_total` | counter | Total HTTP requests by method, path, status | Rate of 5xx > 1% of total |
| `shaperail_http_request_duration_seconds` | histogram | Request latency | P99 > 500ms |
| `shaperail_db_pool_size` | gauge | Active DB connections | Approaching pool_size limit |
| `shaperail_cache_total` | counter | Cache hits and misses | Hit rate < 80% |
| `shaperail_job_queue_depth` | gauge | Pending background jobs | Sustained > 1000 |
| `shaperail_errors_total` | counter | Errors by type | Spike above baseline |
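The 5xx alert threshold in the table is just a ratio of counter rates. Computed by hand from two scrape samples it looks like this (illustrative numbers):

```python
def error_rate(prev: dict, curr: dict) -> float:
    """Fraction of requests in the window that were 5xx, from two counter samples."""
    delta_5xx = curr["5xx"] - prev["5xx"]
    delta_total = curr["total"] - prev["total"]
    return delta_5xx / delta_total if delta_total else 0.0

prev = {"total": 10_000, "5xx": 40}   # sample at t
curr = {"total": 12_000, "5xx": 70}   # sample at t + window
rate = error_rate(prev, curr)
print(f"{rate:.3f}")  # 0.015 -> above the 1% alert threshold
```

This is what Prometheus's `rate()` does for you, with the added care of handling counter resets; the sketch only shows why the alert divides one rate by the other rather than comparing raw counts.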
Prometheus scrape config
# prometheus.yml
scrape_configs:
  - job_name: shaperail
    scrape_interval: 15s
    static_configs:
      # The Service defined in section 6 lives in the my-app namespace and
      # exposes port 80; note that scraping through a ClusterIP Service only
      # samples one pod per scrape -- prefer pod discovery for multi-replica setups.
      - targets: ["my-app.my-app.svc.cluster.local:80"]
    # For Kubernetes service discovery:
    # kubernetes_sd_configs:
    #   - role: pod
    # relabel_configs:
    #   - source_labels: [__meta_kubernetes_pod_label_app]
    #     regex: my-app
    #     action: keep
Recommended alerting rules
# prometheus-alerts.yml
groups:
  - name: shaperail
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(shaperail_http_requests_total{status=~"5.."}[5m]))
            / sum(rate(shaperail_http_requests_total[5m])) > 0.01
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Error rate exceeds 1% for 5 minutes"
      - alert: HighLatency
        expr: |
          histogram_quantile(0.99,
            sum by (le) (rate(shaperail_http_request_duration_seconds_bucket[5m]))) > 0.5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "P99 latency exceeds 500ms"
      - alert: DBPoolExhaustion
        expr: shaperail_db_pool_size > 18
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Database pool nearing capacity"
      - alert: JobQueueBacklog
        expr: shaperail_job_queue_depth > 1000
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Background job queue backlog exceeding 1000"
OpenTelemetry distributed tracing
Enable trace export by setting the OTLP endpoint:
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
OTEL_SERVICE_NAME=my-app
Shaperail creates spans for HTTP requests, database queries, cache operations, and job execution. Point the exporter at any OTLP-compatible backend (Jaeger, Grafana Tempo, Honeycomb, Datadog).
Grafana dashboard tips
Build dashboards around these panels:
- Request rate – `rate(shaperail_http_requests_total[5m])` broken down by status code
- Latency heatmap – `shaperail_http_request_duration_seconds_bucket` as a heatmap
- Error rate – 5xx requests as a percentage of total
- DB pool usage – `shaperail_db_pool_size` over time
- Cache hit ratio – `rate(shaperail_cache_total{result="hit"}[5m]) / rate(shaperail_cache_total[5m])`
- Job queue depth – `shaperail_job_queue_depth` as a time series
- Slow queries – count of slow query log lines (from your log aggregator)
10. Zero-downtime deployments
Rolling updates
The Kubernetes Deployment above uses maxUnavailable: 0 and maxSurge: 1. This means:
- A new pod starts and must pass its readiness probe before receiving traffic
- Old pods continue serving until new pods are ready
- At no point are there fewer than the desired number of ready pods
Migration ordering
Deploy database migrations before new application code:
1. Check in the migration SQL first (generated initial create files or manual follow-up SQL)
2. Run shaperail migrate to apply pending files
3. Deploy new application pods (rolling update)
4. Verify health checks pass on new pods
5. Old pods drain and terminate
Write migrations to be backward-compatible with the currently running code:
- Adding a column: Make it nullable or provide a default. The old code ignores it.
- Removing a column: First deploy code that stops reading it, then drop the column in a later migration.
- Renaming a column: Add the new column, backfill, deploy code using the new name, then drop the old column.
Graceful shutdown
When Shaperail receives a SIGTERM signal (sent by Kubernetes before pod termination), it:
- Stops accepting new connections
- Finishes in-flight requests
- Flushes pending OpenTelemetry spans
- Closes database and Redis connections
- Exits cleanly
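The same stop-accepting-then-drain pattern reduces to a signal handler; a minimal Python illustration of step 1 (this is not the Rust runtime, just the mechanism):

```python
import os
import signal

shutting_down = False

def on_sigterm(signum, frame):
    # Flip a flag so the accept loop stops taking new work; in-flight
    # requests then drain before the process exits.
    global shutting_down
    shutting_down = True

signal.signal(signal.SIGTERM, on_sigterm)
os.kill(os.getpid(), signal.SIGTERM)  # simulate Kubernetes sending SIGTERM
print(shutting_down)  # True
```

The part that cannot be sketched in a few lines is the drain: the process must keep the event loop alive until in-flight requests finish, which is why the grace period below matters.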
Set the Kubernetes terminationGracePeriodSeconds to allow enough time for in-flight requests to complete:
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 30
Pre-stop hook (optional)
If your load balancer needs time to deregister the pod, add a pre-stop delay:
lifecycle:
  preStop:
    # A scratch image has no /bin/sleep, so an exec-based sleep would fail.
    # On Kubernetes 1.30+ use the built-in sleep action instead:
    sleep:
      seconds: 5
This gives the load balancer 5 seconds to stop sending new traffic before the application starts its shutdown sequence.
11. Production checklist
Go through each item before launching to production.
Security
- `JWT_SECRET` is set via a secret manager (not in source control, not in `.env`)
- `DATABASE_URL` uses `sslmode=require` or `sslmode=verify-full`
- `REDIS_URL` uses authentication (password in the URL)
- `WEBHOOK_SECRET` is set for outbound webhook signing
- Sensitive schema fields are marked `sensitive: true` for PII redaction
- The Docker image runs as a non-root user (UID 10001, set by default)
- The OpenAPI spec has been reviewed before exposing it publicly (`shaperail export openapi`)
Database
- `pool_size` is tuned for the replica count (total connections < Postgres `max_connections`)
- Migrations have been tested in staging
- The rollback procedure has been verified (`shaperail migrate --rollback`)
- Backups are configured and tested
- Indexes defined in resource files cover production query patterns
Infrastructure
- The liveness probe points at `/health`
- The readiness probe points at `/health/ready`
- Resource limits (CPU, memory) are set on containers
- The HPA is configured for auto-scaling
- `terminationGracePeriodSeconds` is set (30s recommended)
- The container image is pinned to a specific version tag (not `latest`)
Observability
- `RUST_LOG` is set to `info` (not `debug` or `trace`)
- `SHAPERAIL_SLOW_QUERY_MS` is set (100ms recommended)
- Prometheus is scraping `/metrics`
- Alerting rules are configured for error rate, latency, and pool exhaustion
- Log aggregation is set up (logs ship from stdout to your log platform)
- OTLP tracing is configured if distributed tracing is needed
Deployment pipeline
- Migrations run before application deployment
- The rolling update strategy uses `maxUnavailable: 0`
- The staging environment mirrors production topology
- `shaperail validate` runs in CI before every deploy
- `shaperail build --docker` produces an image under 25 MB