Performance tuning
Shaperail is built on Actix-web and Tokio, which provide a high-performance async foundation. This guide covers the framework’s performance targets, how to measure your app against them, and practical tuning strategies.
Performance targets
These targets are mandated by the Shaperail PRD and must pass benchmarks before any release:
| Metric | Target | How to measure |
|---|---|---|
| Simple JSON response | 150,000+ req/s | cargo bench -p shaperail-runtime |
| DB read (cached) | 80,000+ req/s, P99 < 2ms | Bench with Redis running |
| DB write | 20,000+ req/s, P99 < 10ms | Bench with Postgres running |
| Idle memory | <= 60 MB | ps -o rss on a running instance |
| Release binary size | < 20 MB | ls -lh target/release/shaperail |
| Cold start | < 100ms | Time from process start to first /health 200 |
If your app falls short of these numbers, the sections below cover the most common causes and fixes.
Database optimization
Connection pool sizing
The pool_size setting in shaperail.config.yaml controls the maximum number of connections in the sqlx pool:
databases:
  default:
    engine: postgres
    url: ${DATABASE_URL:postgresql://localhost/my_db}
    pool_size: 20
Guidelines for sizing:
- Start with pool_size: 20 (the default). This handles most workloads.
- CPU-bound apps – keep the pool close to the number of CPU cores. Extra connections sit idle and waste Postgres memory.
- IO-bound apps (many concurrent slow queries) – increase to 2-4x the core count, but never exceed max_connections on your Postgres server.
- Multi-database setups – each named database in databases: has its own pool. Size each independently based on its query load.
A pool that is too large wastes Postgres memory (~10 MB per connection). A pool that is too small causes requests to queue waiting for a free connection.
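The sizing guidelines above can be turned into a quick back-of-the-envelope calculation. The sketch below is illustrative only: suggest_pool_size and postgres_memory_mb are hypothetical helpers, not part of Shaperail, and the app_instances parameter is an added assumption that every running instance opens its own pool against the same Postgres server.

```python
def suggest_pool_size(cpu_cores: int, io_bound: bool, pg_max_connections: int,
                      app_instances: int = 1, io_multiplier: int = 2) -> int:
    """Suggest a pool_size from the guidelines above (hypothetical helper)."""
    if not 2 <= io_multiplier <= 4:
        raise ValueError("io_multiplier should stay in the 2-4x range")
    # CPU-bound: stay close to the core count. IO-bound: 2-4x the core count.
    target = cpu_cores * io_multiplier if io_bound else cpu_cores
    # Assumption: each app instance has its own pool, so the server-wide
    # max_connections limit is split evenly between instances.
    ceiling = pg_max_connections // app_instances
    return max(1, min(target, ceiling))


def postgres_memory_mb(pool_size: int, mb_per_connection: int = 10) -> int:
    """Rough Postgres memory cost, using the ~10 MB per connection figure."""
    return pool_size * mb_per_connection


if __name__ == "__main__":
    size = suggest_pool_size(cpu_cores=8, io_bound=True,
                             pg_max_connections=100, app_instances=4)
    print(size, postgres_memory_mb(size))
```

Treat the result as a starting point and confirm it against measured connection wait times before changing pool_size.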
Indexes
Declare indexes in the resource YAML for new resources, and mirror them in manual follow-up SQL when you add indexes to an existing table:
indexes:
  - { fields: [org_id, role] }
  - { fields: [created_at], order: desc }
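For an existing table, the follow-up SQL has to be written by hand. As a rough sketch, the helper below renders a Postgres CREATE INDEX statement from one indexes: entry. The function name, the idx_<table>_<fields> naming scheme, and the choice to apply order to every column are assumptions made for illustration; they are not taken from Shaperail.

```python
def create_index_sql(table: str, fields: list[str], order: str = "asc") -> str:
    """Render a CREATE INDEX statement for one entry of an indexes: list."""
    if order not in ("asc", "desc"):
        raise ValueError(f"unsupported order: {order}")
    # Hypothetical naming scheme: idx_<table>_<field>_<field>...
    name = "idx_" + "_".join([table, *fields])
    # Assumption: the declared order applies to every column in the index.
    suffix = " DESC" if order == "desc" else ""
    columns = ", ".join(f"{field}{suffix}" for field in fields)
    # CONCURRENTLY builds the index without blocking writes on a live table.
    return f"CREATE INDEX CONCURRENTLY IF NOT EXISTS {name} ON {table} ({columns});"


if __name__ == "__main__":
    print(create_index_sql("users", ["org_id", "role"]))
    print(create_index_sql("users", ["created_at"], order="desc"))
```

Note that Postgres does not allow CREATE INDEX CONCURRENTLY inside a transaction block, so run such statements outside of a transactional migration. The helper does not quote or validate identifiers, so only feed it trusted names.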
When to add indexes:
- Filter fields – every field listed in filters: on an endpoint should be indexed, either individually or as part of a composite index.
- Search fields – fields in search: often need database-specific tuning. Shaperail uses PostgreSQL full-text search clauses, but it does not auto-generate GIN/trigram indexes for you.
- Sort fields – if you sort by created_at descending, an index with order: desc avoids a sequential scan plus sort.
- Foreign keys – any ref: field (e.g., org_id: { type: uuid, ref: organizations.id }) should be indexed if you filter or join on it frequently. Declare those indexes explicitly under indexes:.
When NOT to add indexes:
- Tables with fewer than a few thousand rows. The Postgres planner usually chooses a sequential scan for tables this small, so the index goes unused.
- Write-heavy tables where every insert must update many indexes. Each index adds overhead to writes.
Query analysis
Enable slow query logging to find expensive queries:
SHAPERAIL_SLOW_QUERY_MS=50 shaperail serve
Any query exceeding 50ms will produce a warning in the log output with the full SQL statement. Use EXPLAIN ANALYZE in psql to inspect the query plan:
EXPLAIN ANALYZE SELECT * FROM users WHERE org_id = '...' AND role = 'admin';
Look for:
- Seq Scan on large tables – add an index on the filtered columns.
- Sort with high cost – add an index with the correct order.
- Nested Loop with many rows – check that join columns are indexed.
Caching strategies
TTL tuning
The cache: { ttl: N } value on GET endpoints controls how long responses stay in Redis:
endpoints:
  list:
    method: GET
    path: /products
    cache: { ttl: 300 }
TTL guidelines:
| Data pattern | Suggested TTL | Rationale |
|---|---|---|
| Rarely changes (categories, config) | 300-3600s | Low invalidation rate, high cache hit ratio |
| Changes a few times per hour (listings) | 60-300s | Balance between freshness and hit ratio |
| Changes frequently (dashboards, feeds) | 10-30s | Short enough to feel fresh, still offloads DB |
| User-specific or real-time data | No cache | Omit the cache block entirely |
Invalidation patterns
Shaperail uses cache-aside with auto-invalidation. The flow is:
- GET request arrives. Framework checks Redis for a cached response.
- Cache hit: return the cached response (no DB query).
- Cache miss: query the database, store the result in Redis, return to client.
- Any write (POST/PATCH/DELETE) deletes all cache keys for that resource.
This is the only caching pattern Shaperail supports. There is no write-through or write-behind mode. The design is intentional: one canonical pattern means fewer bugs and predictable invalidation.
For finer control, use invalidate_on to limit which write operations clear the cache:
cache:
  ttl: 300
  invalidate_on: [create, delete]
With this configuration, PATCH (update) operations do not invalidate the cache. Use this when updates are frequent but the cached list view does not need to reflect every change immediately.
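To make the cache-aside flow and the effect of invalidate_on concrete, here is a minimal in-memory sketch. It is not Shaperail's implementation: a plain dict stands in for Redis, and the CacheAside class, its method names, and the create/update/delete operation labels are hypothetical.

```python
import time


class CacheAside:
    """In-memory stand-in for the Redis cache; all names are hypothetical."""

    def __init__(self, ttl, invalidate_on=("create", "update", "delete"),
                 clock=time.monotonic):
        self.ttl = ttl
        self.invalidate_on = set(invalidate_on)
        self.clock = clock
        self.entries = {}  # cache key -> (expires_at, response)

    def get(self, key, load_from_db):
        entry = self.entries.get(key)
        if entry is not None and entry[0] > self.clock():
            return entry[1], "hit"       # cache hit: no DB query
        response = load_from_db()        # cache miss or expired: query the DB
        self.entries[key] = (self.clock() + self.ttl, response)
        return response, "miss"

    def on_write(self, operation):
        # A write drops every cached key for the resource, but only when
        # the operation is listed in invalidate_on.
        if operation in self.invalidate_on:
            self.entries.clear()


if __name__ == "__main__":
    cache = CacheAside(ttl=300, invalidate_on=["create", "delete"])
    load = lambda: ["product-a"]
    print(cache.get("/products", load)[1])  # first read populates the cache
    print(cache.get("/products", load)[1])  # second read is served from it
    cache.on_write("update")                # PATCH is not in invalidate_on
    print(cache.get("/products", load)[1])
    cache.on_write("create")                # POST clears the resource's keys
    print(cache.get("/products", load)[1])
```

The trade-off is visible in the third read: after an update the cached list is still served, so clients may see stale data until the TTL expires or a create/delete occurs.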
Monitoring cache effectiveness
Check the shaperail_cache_total Prometheus metric at GET /metrics:
shaperail_cache_total{result="hit"} 12450
shaperail_cache_total{result="miss"} 830
A healthy cache hit ratio for list endpoints is above 80%. If the miss rate is high, either the TTL is too short or writes are invalidating too aggressively.
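The hit ratio is hits / (hits + misses). A small sketch of that calculation, applied to the sample output above, follows. The cache_hit_ratio function is a hypothetical helper, and the parsing assumes each sample line ends with the value (no trailing timestamp).

```python
def cache_hit_ratio(metrics_text: str) -> float:
    """Compute hits / (hits + misses) from shaperail_cache_total samples."""
    counts = {"hit": 0.0, "miss": 0.0}
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line.startswith("shaperail_cache_total{"):
            continue  # skip comments, blank lines, and other metrics
        labels, value = line.rsplit(" ", 1)
        for result in counts:
            if f'result="{result}"' in labels:
                counts[result] += float(value)
    total = counts["hit"] + counts["miss"]
    return counts["hit"] / total if total else 0.0


sample = """
shaperail_cache_total{result="hit"} 12450
shaperail_cache_total{result="miss"} 830
"""
print(f"{cache_hit_ratio(sample):.2%}")  # prints 93.75%
```

The counters are cumulative, so to judge recent behavior compare two scrapes and compute the ratio over the difference rather than over the lifetime totals.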
Pagination
Shaperail supports two pagination strategies: cursor and offset.
Cursor pagination
pagination: cursor
Performance characteristics:
- Near-constant cost regardless of page depth. Each page is an index seek from the cursor position, so page 1 and page 1000 cost about the same.
- Uses an indexed column (typically created_at + id) as the cursor.
- Best for infinite scroll, feeds, and any endpoint where users page through large datasets.
Offset pagination
pagination: offset
Performance characteristics:
- Linear degradation with depth. OFFSET 10000 requires Postgres to scan and discard 10,000 rows before returning the page.
- Simpler for clients that need “go to page N” behavior.
- Acceptable for small datasets (under ~10,000 rows) or when users rarely go past page 5.
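The difference between the two strategies comes down to the shape of the query. The sketch below contrasts them using Python's built-in sqlite3 module as a stand-in for Postgres. The items table, the function names, and the (created_at, id) cursor tuple are assumptions for illustration; how Shaperail actually encodes its cursors is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, created_at INTEGER NOT NULL)")
conn.execute("CREATE INDEX idx_items_created_at_id ON items (created_at DESC, id DESC)")
# Two rows share each timestamp, so id is needed as a tie-breaker.
conn.executemany("INSERT INTO items (id, created_at) VALUES (?, ?)",
                 [(i, i // 2) for i in range(1, 101)])


def page_by_offset(limit, offset):
    # The database walks past `offset` rows before returning anything.
    return conn.execute(
        "SELECT id, created_at FROM items "
        "ORDER BY created_at DESC, id DESC LIMIT ? OFFSET ?",
        (limit, offset)).fetchall()


def page_by_cursor(limit, cursor=None):
    # The cursor is the (created_at, id) of the last row on the previous page.
    if cursor is None:
        return conn.execute(
            "SELECT id, created_at FROM items "
            "ORDER BY created_at DESC, id DESC LIMIT ?",
            (limit,)).fetchall()
    created_at, last_id = cursor
    # Seek directly to the rows that sort after the cursor position.
    return conn.execute(
        "SELECT id, created_at FROM items "
        "WHERE created_at < ? OR (created_at = ? AND id < ?) "
        "ORDER BY created_at DESC, id DESC LIMIT ?",
        (created_at, created_at, last_id, limit)).fetchall()


if __name__ == "__main__":
    first = page_by_cursor(10)
    last_id, last_created_at = first[-1]
    second = page_by_cursor(10, cursor=(last_created_at, last_id))
    print(second == page_by_offset(10, 10))  # both strategies return the same rows
```

Both functions return the same pages; the cursor version simply lets the index skip straight to the starting row instead of counting past everything before it.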
When to use which
| Use case | Recommendation |
|---|---|
| API consumed by mobile/SPA with infinite scroll | cursor |
| Admin dashboard with “page 1 of 50” UI | offset (if dataset is small) |
| Public API with unknown access patterns | cursor (safest default) |
| Reports or exports | cursor (datasets are often large) |
If in doubt, use cursor. It performs well in all cases.
Worker count tuning
The workers setting controls the number of Actix-web worker threads:
workers: auto
Auto mode (default)
auto sets the worker count to the number of logical CPU cores. This is correct for most workloads because Shaperail handlers are async and non-blocking.
Fixed worker count
Set a fixed number when you need predictable resource usage:
workers: 4
Guidelines:
- CPU-bound workloads (heavy JSON serialization, complex validation) – match the core count. More workers than cores causes contention.
- IO-bound workloads (most CRUD apps waiting on Postgres/Redis) – the default auto is optimal. Tokio handles thousands of concurrent connections per worker thread.
- Memory-constrained environments (small containers) – reduce workers to lower memory usage. Each worker thread adds ~5-10 MB.
Do not set workers higher than your CPU core count unless you have measured a specific benefit. Extra threads add scheduling overhead without improving throughput for async workloads.
Benchmarking
Running benchmarks
Shaperail includes Criterion benchmarks in the runtime crate:
cargo bench -p shaperail-runtime
This runs without a database or Redis connection. The benchmarks measure raw handler throughput, serialization speed, and routing overhead.
Results are written to target/criterion/ and include HTML reports with statistical analysis.
Interpreting results
Criterion reports look like this:
simple_json_response time: [6.21 us 6.28 us 6.35 us]
thrpt: [157.48 Kreq/s 159.24 Kreq/s 161.03 Kreq/s]
Key values:
- time – the [lower bound, estimate, upper bound] for a single request.
- thrpt – throughput in thousands of requests per second. This is the inverse of time.
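Because throughput is the inverse of the per-request time, either line can be derived from the other. The following sketch reproduces the thrpt values from the time values in the sample report; the helper names are hypothetical, and the conservative check against the slowest time bound is a suggested convention rather than something Criterion enforces.

```python
def throughput_kreq_per_s(time_us: float) -> float:
    """Convert a per-request time in microseconds to thousands of req/s."""
    return 1_000_000 / time_us / 1_000


def meets_target(time_upper_us: float, target_req_per_s: float) -> bool:
    # Use the slowest bound of the time interval for a conservative check.
    return 1_000_000 / time_upper_us >= target_req_per_s


# The slowest time gives the lowest throughput, so the order is reversed.
for bound_us in (6.35, 6.28, 6.21):
    print(f"{throughput_kreq_per_s(bound_us):.2f} Kreq/s")

print(meets_target(6.35, 150_000))
```

The loop prints 157.48, 159.24, and 161.03 Kreq/s, matching the thrpt line in the sample report.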
Compare against the targets:
| Benchmark | Target | What to check if below target |
|---|---|---|
| simple_json_response | 150K req/s | Check that you built with --release |
| cached_db_read | 80K req/s | Check Redis connectivity and pool size |
| db_write | 20K req/s | Check Postgres pool size and index overhead |
Load testing a running server
For end-to-end benchmarks with a live database, use a tool like wrk or oha:
# Start the server in release mode
cargo run --release -p shaperail-cli -- serve
# In another terminal, run a load test
wrk -t4 -c100 -d30s http://localhost:3000/v1/health
For endpoint-specific tests:
wrk -t4 -c100 -d30s -H "Authorization: Bearer <token>" \
http://localhost:3000/v1/users
Common performance antipatterns
1. Missing indexes on filter columns
Symptom: list endpoints slow down as the table grows.
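Fix: index every field used in filters:, and create GIN or trigram indexes by hand for search: fields, since Shaperail does not generate those. A composite index for the filter fields looks like this: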
indexes:
- fields: [org_id, role]
2. Offset pagination on large tables
Symptom: deep pages (page 50+) take several seconds.
Fix: switch to cursor pagination:
pagination: cursor
3. Cache TTL too short
Symptom: high cache miss rate, database load not reduced.
Fix: increase the TTL. A 5-second TTL on a list endpoint that changes hourly wastes Redis operations without meaningful freshness gain.
4. Oversized connection pool
Symptom: Postgres memory usage climbs; idle in transaction connections accumulate.
Fix: reduce pool_size to match your actual concurrency. Start at 20 and increase only if you see connection wait times in the metrics.
5. No cache on frequently-read endpoints
Symptom: database handles the same query thousands of times per minute.
Fix: add a cache block to GET endpoints that serve repeated queries:
cache: { ttl: 60 }
6. Too many workers on a small container
Symptom: high memory usage, threads competing for CPU.
Fix: set workers to match the container’s CPU limit:
workers: 2
7. Not building in release mode
Symptom: benchmark numbers are 5-10x below targets.
Fix: always benchmark and deploy with release builds:
cargo build --release --workspace
cargo bench -p shaperail-runtime
Debug builds compile with optimizations turned off (opt-level 0 in Cargo's default dev profile), so their numbers are not representative of production performance.