Production Deployment Guide
This guide walks through every step of deploying a Shaperail application to production, from building the release binary to monitoring a running cluster.
1. Building for production
Shaperail provides two build paths: a native release binary and a Docker image.
Native release binary
shaperail build
This runs `cargo build --release` and produces an optimized binary at `target/release/<project-name>`. Use this when you deploy to bare-metal servers or have your own container build pipeline.
Docker image
shaperail build --docker
This generates a multi-stage Dockerfile and builds a Docker image tagged with your project name. The generated Dockerfile uses:
- Builder stage – `rust:1.85-slim` with the musl toolchain for static linking
- Runtime stage – `FROM scratch` (no OS layer)

The result is a statically linked binary on a minimal image. The build target is `x86_64-unknown-linux-musl`.
# Generated by: shaperail build --docker
FROM rust:1.85-slim AS builder
RUN apt-get update && apt-get install -y musl-tools pkg-config ca-certificates \
&& rm -rf /var/lib/apt/lists/*
RUN rustup target add x86_64-unknown-linux-musl
WORKDIR /app
COPY . .
RUN cargo build --release --target x86_64-unknown-linux-musl
FROM scratch
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt
COPY --from=builder /app/target/x86_64-unknown-linux-musl/release/my-app /my-app
USER 10001:10001
EXPOSE 3000
ENTRYPOINT ["/my-app"]
Size targets:
| Metric | Target |
|---|---|
| Final Docker image | < 25 MB |
| Release binary | < 20 MB |
| Idle memory at runtime | < 60 MB |
The scratch base image contains no shell, no package manager, and no libc – the attack surface is minimal. CA certificates are copied from the builder stage so outbound TLS (database connections, webhook delivery) works correctly.
2. Environment configuration
Shaperail reads configuration from two sources:
- `shaperail.config.yaml` – project configuration (checked into source control)
- Environment variables – secrets and environment-specific overrides
Required environment variables
| Variable | Example | Description |
|---|---|---|
| `DATABASE_URL` | `postgresql://user:pass@db.example.com:5432/myapp` | Primary database connection string |
| `REDIS_URL` | `redis://:password@redis.example.com:6379` | Redis connection string |
| `JWT_SECRET` | `a-long-random-string-at-least-32-chars` | HMAC signing secret for JWT tokens |
Optional environment variables
| Variable | Default | Description |
|---|---|---|
| `SHAPERAIL_PORT` | 3000 | Process-level override for the HTTP server port |
| `RUST_LOG` | info | Log level filter (e.g. `warn`, `info,shaperail_runtime=debug`) |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | unset | OTLP gRPC endpoint; unset disables tracing |
| `OTEL_SERVICE_NAME` | shaperail | Service name in distributed traces |
| `SHAPERAIL_SLOW_QUERY_MS` | unset | Log a warning for queries exceeding this threshold (ms) |
| `WEBHOOK_SECRET` | unset | HMAC secret for outbound webhook signatures |
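`WEBHOOK_SECRET` drives outbound webhook signing. The exact scheme is framework-defined; as an illustration only, this sketch shows how a receiver would verify an HMAC-SHA256 signature over the raw request body (the hex encoding and the verification flow here are assumptions for the example, not a confirmed Shaperail API):

```python
import hashlib
import hmac

def verify_webhook(secret: str, body: bytes, signature_hex: str) -> bool:
    """Recompute HMAC-SHA256 over the raw body and compare in constant time."""
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)

# Sign on the sending side, verify on the receiving side.
secret = "your-webhook-signing-secret"
body = b'{"event":"user.created","id":42}'
sig = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()

print(verify_webhook(secret, body, sig))              # True
print(verify_webhook(secret, b'{"tampered":1}', sig)) # False
```

Constant-time comparison (`hmac.compare_digest`) matters here: a naive `==` leaks timing information that an attacker can use to forge signatures byte by byte.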
.env files vs environment variables
In development, Shaperail reads .env from the project root. In production, do not ship a .env file. Instead, inject variables through your platform:
# Kubernetes: use Secrets (see section 6)
# Docker Compose: use environment: block or env_file:
# Systemd: use Environment= or EnvironmentFile=
# Cloud Run / ECS: use the platform's secret manager integration
Config file with environment interpolation
Use ${VAR} and ${VAR:default} in shaperail.config.yaml to reference environment variables without hardcoding secrets:
project: my-app
port: ${SHAPERAIL_PORT:3000}
workers: auto

databases:
  default:
    engine: postgres
    url: ${DATABASE_URL}
    pool_size: ${DB_POOL_SIZE:20}

cache:
  type: redis
  url: ${REDIS_URL}

auth:
  provider: jwt
  secret_env: JWT_SECRET
  expiry: 24h
  refresh_expiry: 30d

logging:
  level: info
  format: json
If a referenced variable has no default and is unset, the parser halts with an error naming the missing variable. This is intentional – production apps should never start with missing configuration.
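The `${VAR}` / `${VAR:default}` behaviour can be sketched in a few lines of Python (illustrative only — the real parser lives inside Shaperail):

```python
import os
import re

PATTERN = re.compile(r"\$\{([A-Z0-9_]+)(?::([^}]*))?\}")

def interpolate(text: str, env=os.environ) -> str:
    """Replace ${VAR} and ${VAR:default}; fail fast if a variable with no default is unset."""
    def repl(match):
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            return default
        raise RuntimeError(f"missing required environment variable: {name}")
    return PATTERN.sub(repl, text)

print(interpolate("port: ${SHAPERAIL_PORT:3000}", env={}))  # prints: port: 3000
```

The key property is the last branch: a variable with no default and no value raises immediately instead of silently producing an empty string, which mirrors the fail-fast startup behaviour described above.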
3. Database setup
Connection pooling
Shaperail uses sqlx connection pools. The pool_size setting controls the maximum number of simultaneous connections to PostgreSQL.
Tuning guidelines:
| Deployment size | Recommended pool_size | Notes |
|---|---|---|
| Single instance | 10-20 | Default of 20 is fine for most workloads |
| 2-5 replicas | 10 per replica | Total connections = replicas x pool_size; keep under Postgres max_connections |
| 5+ replicas | 5-10 per replica | Use PgBouncer in front of Postgres to multiplex connections |
Key rule: The total connections across all replicas must stay below your PostgreSQL max_connections setting (default: 100). Leave headroom for migrations, monitoring, and admin connections.
# For a 4-replica deployment with Postgres max_connections=100
databases:
  default:
    engine: postgres
    url: ${DATABASE_URL}
    pool_size: 20  # 4 replicas x 20 = 80 connections, leaving 20 for admin
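The headroom rule is plain arithmetic and worth checking before every scale-up; nothing here is Shaperail-specific:

```python
def pool_headroom(replicas: int, pool_size: int,
                  max_connections: int = 100, reserved: int = 20) -> int:
    """Connections left over after all replicas fill their pools.

    Raises if the deployment would leave fewer than `reserved` connections
    for migrations, monitoring, and admin sessions.
    """
    total = replicas * pool_size
    if total > max_connections - reserved:
        raise ValueError(
            f"{total} app connections leave less than {reserved} for admin work"
        )
    return max_connections - total

print(pool_headroom(replicas=4, pool_size=20))  # 4 x 20 = 80 used, prints: 20
```

Scaling to 5 replicas with the same `pool_size: 20` would request 100 connections and trip the check — which is exactly the point at which the table above recommends dropping per-replica pool size or adding PgBouncer.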
For multi-database setups, configure each connection independently:
databases:
  default:
    engine: postgres
    url: ${DATABASE_URL}
    pool_size: 15
  analytics:
    engine: postgres
    url: ${ANALYTICS_DATABASE_URL}
    pool_size: 5
Migrations in production
Run migrations before deploying new application code:
# Apply pending SQL migrations (uses sqlx-cli under the hood)
DATABASE_URL=postgresql://user:pass@db.example.com:5432/myapp shaperail migrate
Migration workflow for production:
- Keep reviewed SQL files in `migrations/` under source control
- For brand-new resources, `shaperail migrate` can generate the missing initial `create_<resource>` files
- For later schema changes, write the follow-up SQL migration files manually
- Apply pending migrations against production as a separate deployment step
- Deploy the new application code after migrations succeed
Rollback strategy:
# Revert the last applied migration
shaperail migrate --rollback
Rollbacks revert one migration at a time. For safe rollbacks:
- Always write backward-compatible migrations (add columns as nullable first, backfill, then add constraints)
- Test rollback in staging before applying migrations to production
- Keep the previous application version ready to redeploy if a migration causes issues
Connection string format
postgresql://username:password@hostname:5432/database_name?sslmode=require
Always use sslmode=require (or sslmode=verify-full) for production database connections.
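A CI guard for the TLS rule can be written with the standard library alone (a sketch, not part of Shaperail):

```python
from urllib.parse import parse_qs, urlparse

def requires_tls(database_url: str) -> bool:
    """True if the Postgres URL pins sslmode to require or verify-full."""
    query = parse_qs(urlparse(database_url).query)
    return query.get("sslmode", [""])[0] in ("require", "verify-full")

print(requires_tls("postgresql://u:p@db.example.com:5432/myapp?sslmode=require"))  # True
print(requires_tls("postgresql://u:p@db.example.com:5432/myapp"))                  # False
```

Dropping a check like this into the deploy pipeline catches the common mistake of copying a staging URL (where `sslmode=disable` is typical) into a production secret.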
4. Redis setup
Cache configuration
cache:
  type: redis
  url: ${REDIS_URL}
The REDIS_URL should include authentication for production:
redis://:your-password@redis.example.com:6379/0
Cache sizing
Shaperail uses Redis for endpoint-level caching (configured per-endpoint with cache: { ttl: 60 }) and background job queues.
Sizing guidelines:
| Workload | Recommended Redis memory |
|---|---|
| Light caching, few jobs | 256 MB |
| Moderate caching, active job queue | 1-2 GB |
| Heavy caching with large payloads | 4+ GB |
Set a maxmemory policy on your Redis instance to prevent unbounded growth:
# redis.conf or managed Redis parameter group
maxmemory 1gb
maxmemory-policy allkeys-lru
Connection pool
The Redis connection pool is managed by deadpool-redis. For most deployments, the default pool size is sufficient. If you see connection timeouts under high load, increase the Redis instance’s maxclients setting and ensure your deployment has enough file descriptors.
5. Health checks
Shaperail registers two health endpoints automatically.
GET /health – liveness probe
Returns 200 OK if the process is running. Does not check dependencies.
{ "status": "ok" }
Use this for liveness probes. If this endpoint stops responding, the process is hung or dead and should be restarted.
GET /health/ready – readiness probe
Checks database and Redis connectivity. Returns 200 OK when all backends are reachable, 503 Service Unavailable when any check fails.
{
  "status": "ok",
  "checks": {
    "database": { "status": "ok" },
    "redis": { "status": "ok" }
  }
}
Degraded response (503):
{
  "status": "degraded",
  "checks": {
    "database": { "status": "ok" },
    "redis": { "status": "error", "message": "Redis PING failed: ..." }
  }
}
Use this for readiness probes. Only route traffic to instances where /health/ready returns 200.
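A deploy script polling `/health/ready` only needs the status code, but the JSON body pinpoints which backend failed. A sketch of parsing the degraded payload shown above:

```python
import json

payload = json.loads("""
{
  "status": "degraded",
  "checks": {
    "database": { "status": "ok" },
    "redis": { "status": "error", "message": "Redis PING failed: ..." }
  }
}
""")

# Collect every check that is not reporting "ok".
failing = [name for name, check in payload["checks"].items()
           if check["status"] != "ok"]

print(f"ready={payload['status'] == 'ok'} failing={failing}")  # ready=False failing=['redis']
```

Surfacing `failing` in deploy logs or alerts saves a round of log-spelunking when a rollout stalls on readiness.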
6. Kubernetes deployment
Namespace and Secret
apiVersion: v1
kind: Namespace
metadata:
  name: my-app
---
apiVersion: v1
kind: Secret
metadata:
  name: my-app-secrets
  namespace: my-app
type: Opaque
stringData:
  DATABASE_URL: "postgresql://user:password@db.example.com:5432/myapp?sslmode=require"
  REDIS_URL: "redis://:password@redis.example.com:6379/0"
  JWT_SECRET: "your-production-jwt-secret-at-least-32-characters"
  WEBHOOK_SECRET: "your-webhook-signing-secret"
ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-app-config
  namespace: my-app
data:
  RUST_LOG: "info"
  OTEL_EXPORTER_OTLP_ENDPOINT: "http://otel-collector.monitoring:4317"
  OTEL_SERVICE_NAME: "my-app"
  SHAPERAIL_SLOW_QUERY_MS: "100"
Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: my-app
  labels:
    app: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  template:
    metadata:
      labels:
        app: my-app
    spec:
      securityContext:
        runAsUser: 10001
        runAsGroup: 10001
        runAsNonRoot: true
      containers:
        - name: my-app
          image: registry.example.com/my-app:v1.0.0
          ports:
            - containerPort: 3000
              name: http
          envFrom:
            - secretRef:
                name: my-app-secrets
            - configMapRef:
                name: my-app-config
          resources:
            requests:
              cpu: "100m"
              memory: "64Mi"
            limits:
              cpu: "1000m"
              memory: "256Mi"
          livenessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 5
            periodSeconds: 10
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /health/ready
              port: http
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 3
          startupProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 2
            periodSeconds: 2
            failureThreshold: 15
Service
apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: my-app
spec:
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: http
      protocol: TCP
      name: http
  type: ClusterIP
Horizontal Pod Autoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
  namespace: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
Ingress (optional)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  namespace: my-app
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  tls:
    - hosts:
        - api.example.com
      secretName: api-tls
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app
                port:
                  number: 80
7. Docker Compose for staging
Use this Compose file for staging environments that mirror production topology:
# docker-compose.staging.yml
services:
  app:
    image: registry.example.com/my-app:latest
    ports:
      - "3000:3000"
    environment:
      DATABASE_URL: "postgresql://myapp:secret@postgres:5432/myapp?sslmode=disable"
      REDIS_URL: "redis://redis:6379/0"
      JWT_SECRET: "staging-jwt-secret-change-in-production"
      RUST_LOG: "info"
      OTEL_EXPORTER_OTLP_ENDPOINT: "http://otel-collector:4317"
      OTEL_SERVICE_NAME: "my-app-staging"
      SHAPERAIL_SLOW_QUERY_MS: "50"
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    healthcheck:
      # NOTE: a scratch-based image contains no wget (or any shell). For this
      # check to work, build the staging image from a base that includes wget,
      # or drop the healthcheck and probe from outside the container.
      test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:3000/health/ready"]
      interval: 10s
      timeout: 5s
      retries: 5
    restart: unless-stopped
  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: myapp
      POSTGRES_PASSWORD: secret
      POSTGRES_DB: myapp
    volumes:
      - pgdata:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U myapp"]
      interval: 5s
      timeout: 3s
      retries: 5
  redis:
    image: redis:7-alpine
    command: redis-server --maxmemory 256mb --maxmemory-policy allkeys-lru
    volumes:
      - redisdata:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 5
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    ports:
      - "4317:4317"
volumes:
  pgdata:
  redisdata:
Supporting Prometheus config for the staging stack:
# prometheus.yml
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: shaperail
    static_configs:
      - targets: ["app:3000"]
8. Logging in production
JSON log format
Shaperail outputs structured JSON logs by default. Every log line includes a request_id for correlation.
{"timestamp":"2026-03-17T12:00:00Z","level":"INFO","request_id":"abc-123","target":"shaperail_runtime::handlers","message":"GET /v1/users 200 12ms"}
Log level configuration
Set log levels via RUST_LOG:
# Production default -- info level
RUST_LOG=info
# Quieter -- warnings and errors only
RUST_LOG=warn
# Debug a specific module without flooding logs
RUST_LOG=info,shaperail_runtime::handlers=debug
# Trace-level for deep debugging (not recommended in production)
RUST_LOG=trace
PII redaction
Fields marked sensitive: true in resource schemas are automatically replaced with "[REDACTED]" in all log output. Mark email, password, SSN, and similar fields as sensitive:
schema:
  email: { type: string, format: email, sensitive: true }
  password: { type: string, sensitive: true }
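The redaction behaviour amounts to masking every schema field flagged `sensitive: true` before the log line is emitted; a minimal illustration of the idea (not Shaperail's actual implementation):

```python
# Fields that would carry sensitive: true in the resource schema.
SENSITIVE = {"email", "password"}

def redact(record: dict) -> dict:
    """Replace sensitive values with the literal "[REDACTED]" before logging."""
    return {k: ("[REDACTED]" if k in SENSITIVE else v) for k, v in record.items()}

print(redact({"id": 7, "email": "a@example.com", "password": "hunter2"}))
# {'id': 7, 'email': '[REDACTED]', 'password': '[REDACTED]'}
```

The important detail is that redaction happens on the structured record, before serialization — grepping rendered log strings for PII after the fact is far less reliable.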
Slow query logging
Enable slow query warnings to catch performance regressions:
SHAPERAIL_SLOW_QUERY_MS=100
This logs a warning for any database query exceeding 100ms.
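Because the warnings are ordinary JSON log lines on stdout, they can be counted downstream in any aggregator. A sketch of a filter (the exact wording of the slow-query message is an assumption for the example; only `level` and `message` are documented fields):

```python
import json

def is_slow_query(line: str) -> bool:
    """True for WARN-level lines that look like slow-query reports."""
    record = json.loads(line)
    return record["level"] == "WARN" and "slow query" in record["message"]

lines = [
    '{"level":"WARN","request_id":"abc","message":"slow query: SELECT ... (142ms)"}',
    '{"level":"INFO","request_id":"abc","message":"GET /v1/users 200 12ms"}',
]
print(sum(is_slow_query(l) for l in lines))  # 1
```

A count of matches over time is the "Slow queries" panel suggested in the Grafana tips below: a rising trend usually precedes a latency alert.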
Shipping logs to external systems
Since logs are JSON to stdout, pipe them to any log aggregator:
Docker / Kubernetes (stdout is captured automatically):
- ELK: Use Filebeat or Fluentd DaemonSet to ship container logs
- Datadog: Install the Datadog Agent; it tails container stdout
- Grafana Loki: Use Promtail DaemonSet
Systemd (bare metal):
# Forward to a file for Filebeat
ExecStart=/usr/local/bin/my-app
StandardOutput=append:/var/log/my-app/app.log
Fluentd example config for Kubernetes:
<source>
@type tail
path /var/log/containers/my-app-*.log
pos_file /var/log/fluentd-my-app.pos
tag my-app
<parse>
@type json
</parse>
</source>
<match my-app>
@type elasticsearch
host elasticsearch.monitoring
port 9200
index_name my-app-logs
</match>
9. Monitoring
Prometheus metrics endpoint
Shaperail exposes Prometheus metrics at GET /metrics. Key metrics:
| Metric | Type | Description | Alert threshold |
|---|---|---|---|
| `shaperail_http_requests_total` | counter | Total HTTP requests by method, path, status | Rate of 5xx > 1% of total |
| `shaperail_http_request_duration_seconds` | histogram | Request latency | P99 > 500ms |
| `shaperail_db_pool_size` | gauge | Active DB connections | Approaching pool_size limit |
| `shaperail_cache_total` | counter | Cache hits and misses | Hit rate < 80% |
| `shaperail_job_queue_depth` | gauge | Pending background jobs | Sustained > 1000 |
| `shaperail_errors_total` | counter | Errors by type | Spike above baseline |
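The 5xx alert threshold in the table is just a ratio of counter rates. Computed by hand from two scrape samples it looks like this (illustrative numbers):

```python
def error_rate(prev: dict, curr: dict) -> float:
    """Fraction of requests in the window that were 5xx, from two counter samples."""
    delta_5xx = curr["5xx"] - prev["5xx"]
    delta_total = curr["total"] - prev["total"]
    return delta_5xx / delta_total if delta_total else 0.0

prev = {"total": 10_000, "5xx": 40}   # sample at t
curr = {"total": 12_000, "5xx": 70}   # sample at t + window
rate = error_rate(prev, curr)
print(f"{rate:.3f}")  # 0.015 -> above the 1% alert threshold
```

This is what Prometheus's `rate()` does for you, with the added care of handling counter resets; the sketch only shows why the alert divides one rate by the other rather than comparing raw counts.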
Prometheus scrape config
# prometheus.yml
scrape_configs:
  - job_name: shaperail
    scrape_interval: 15s
    static_configs:
      # The Service defined in section 6 lives in the my-app namespace and
      # exposes port 80; note that scraping through a ClusterIP Service only
      # samples one pod per scrape -- prefer pod discovery for multi-replica setups.
      - targets: ["my-app.my-app.svc.cluster.local:80"]
    # For Kubernetes service discovery:
    # kubernetes_sd_configs:
    #   - role: pod
    # relabel_configs:
    #   - source_labels: [__meta_kubernetes_pod_label_app]
    #     regex: my-app
    #     action: keep
Recommended alerting rules
# prometheus-alerts.yml
groups:
  - name: shaperail
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(shaperail_http_requests_total{status=~"5.."}[5m]))
            / sum(rate(shaperail_http_requests_total[5m])) > 0.01
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Error rate exceeds 1% for 5 minutes"
      - alert: HighLatency
        expr: |
          histogram_quantile(0.99,
            sum by (le) (rate(shaperail_http_request_duration_seconds_bucket[5m]))) > 0.5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "P99 latency exceeds 500ms"
      - alert: DBPoolExhaustion
        expr: shaperail_db_pool_size > 18
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Database pool nearing capacity"
      - alert: JobQueueBacklog
        expr: shaperail_job_queue_depth > 1000
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Background job queue backlog exceeding 1000"
OpenTelemetry distributed tracing
Enable trace export by setting the OTLP endpoint:
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
OTEL_SERVICE_NAME=my-app
Shaperail creates spans for HTTP requests, database queries, cache operations, and job execution. Point the exporter at any OTLP-compatible backend (Jaeger, Grafana Tempo, Honeycomb, Datadog).
Grafana dashboard tips
Build dashboards around these panels:
- Request rate – `rate(shaperail_http_requests_total[5m])` broken down by status code
- Latency heatmap – `shaperail_http_request_duration_seconds_bucket` as a heatmap
- Error rate – 5xx requests as a percentage of total
- DB pool usage – `shaperail_db_pool_size` over time
- Cache hit ratio – `rate(shaperail_cache_total{result="hit"}[5m]) / rate(shaperail_cache_total[5m])`
- Job queue depth – `shaperail_job_queue_depth` as a time series
- Slow queries – count of slow query log lines (from your log aggregator)
10. Zero-downtime deployments
Rolling updates
The Kubernetes Deployment above uses maxUnavailable: 0 and maxSurge: 1. This means:
- A new pod starts and must pass its readiness probe before receiving traffic
- Old pods continue serving until new pods are ready
- At no point are there fewer than the desired number of ready pods
Migration ordering
Deploy database migrations before new application code:
1. Check in the migration SQL first (generated initial create files or manual follow-up SQL)
2. Run shaperail migrate to apply pending files
3. Deploy new application pods (rolling update)
4. Verify health checks pass on new pods
5. Old pods drain and terminate
Write migrations to be backward-compatible with the currently running code:
- Adding a column: Make it nullable or provide a default. The old code ignores it.
- Removing a column: First deploy code that stops reading it, then drop the column in a later migration.
- Renaming a column: Add the new column, backfill, deploy code using the new name, then drop the old column.
Graceful shutdown
When Shaperail receives a SIGTERM signal (sent by Kubernetes before pod termination), it:
- Stops accepting new connections
- Finishes in-flight requests
- Flushes pending OpenTelemetry spans
- Closes database and Redis connections
- Exits cleanly
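The same stop-accepting-then-drain pattern reduces to a signal handler; a minimal Python illustration of step 1 (this is not the Rust runtime, just the mechanism):

```python
import os
import signal

shutting_down = False

def on_sigterm(signum, frame):
    # Flip a flag so the accept loop stops taking new work; in-flight
    # requests then drain before the process exits.
    global shutting_down
    shutting_down = True

signal.signal(signal.SIGTERM, on_sigterm)
os.kill(os.getpid(), signal.SIGTERM)  # simulate Kubernetes sending SIGTERM
print(shutting_down)  # True
```

The part that cannot be sketched in a few lines is the drain: the process must keep the event loop alive until in-flight requests finish, which is why the grace period below matters.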
Set the Kubernetes terminationGracePeriodSeconds to allow enough time for in-flight requests to complete:
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 30
Pre-stop hook (optional)
If your load balancer needs time to deregister the pod, add a pre-stop delay:
lifecycle:
  preStop:
    # A scratch image has no /bin/sleep, so an exec-based sleep would fail.
    # On Kubernetes 1.30+ use the built-in sleep action instead:
    sleep:
      seconds: 5
This gives the load balancer 5 seconds to stop sending new traffic before the application starts its shutdown sequence.
11. Production checklist
Go through each item before launching to production.
Security
- `JWT_SECRET` is set via a secret manager (not in source control, not in `.env`)
- `DATABASE_URL` uses `sslmode=require` or `sslmode=verify-full`
- `REDIS_URL` uses authentication (password in the URL)
- `WEBHOOK_SECRET` is set for outbound webhook signing
- Sensitive schema fields are marked `sensitive: true` for PII redaction
- The Docker image runs as a non-root user (UID 10001, set by default)
- The OpenAPI spec has been reviewed before exposing it publicly (`shaperail export openapi`)
Database
- `pool_size` is tuned for the replica count (total connections < Postgres `max_connections`)
- Migrations have been tested in staging
- The rollback procedure has been verified (`shaperail migrate --rollback`)
- Backups are configured and tested
- Indexes defined in resource files cover production query patterns
Infrastructure
- The liveness probe points at `/health`
- The readiness probe points at `/health/ready`
- Resource limits (CPU, memory) are set on containers
- The HPA is configured for auto-scaling
- `terminationGracePeriodSeconds` is set (30s recommended)
- The container image is pinned to a specific version tag (not `latest`)
Observability
- `RUST_LOG` is set to `info` (not `debug` or `trace`)
- `SHAPERAIL_SLOW_QUERY_MS` is set (100ms recommended)
- Prometheus is scraping `/metrics`
- Alerting rules are configured for error rate, latency, and pool exhaustion
- Log aggregation is set up (logs ship from stdout to your log platform)
- OTLP tracing is configured if distributed tracing is needed
Deployment pipeline
- Migrations run before application deployment
- The rolling update strategy uses `maxUnavailable: 0`
- The staging environment mirrors production topology
- `shaperail validate` runs in CI before every deploy
- `shaperail build --docker` produces an image under 25 MB