ALB weighted routing for a zero-downtime API cutover

A fintech payments service migrated from Node 16 to Node 20 behind an Application Load Balancer, moving from a 5% canary to full cutover with connection draining and no failed transactions.

By Simplileap · Published May 28, 2025 · 9 min read

A payments orchestration API (a Node 16 monolith on EC2, backed by PostgreSQL, with Redis-based rate limits) needed Node 20 and OpenSSL 3 compatibility without a maintenance window. Regulatory constraints required an auditable rollback within 15 minutes.

Prior big-bang deploys had caused brief 502 spikes during target registration. Leadership mandated canary validation on a fraction of real traffic.

Architecture: a second Auto Scaling group (green) behind the existing ALB; weighted target group routing starting at 5% green / 95% blue; health checks on /healthz with dependency probes (DB, Redis); deregistration delay set to 120s for connection draining.
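The weighted split described above maps onto the "forward" action of an ALB listener, which accepts a list of target groups with integer weights. The following is a minimal sketch of building that action for a given green percentage, not the partner's actual tooling: the ARNs are hypothetical placeholders and the helper name `buildForwardAction` is our own. The returned object follows the shape the ELBv2 ModifyListener API expects for a default action; applying it through an SDK or CLI is left out.

```typescript
// Shape of a weighted "forward" action as used by ALB listeners.
interface WeightedTargetGroup {
  TargetGroupArn: string;
  Weight: number;
}

interface ForwardAction {
  Type: "forward";
  ForwardConfig: { TargetGroups: WeightedTargetGroup[] };
}

// Hypothetical placeholder ARNs; substitute the real blue and green target groups.
const BLUE_TG_ARN = "arn:aws:elasticloadbalancing:REGION:ACCOUNT:targetgroup/blue/EXAMPLE";
const GREEN_TG_ARN = "arn:aws:elasticloadbalancing:REGION:ACCOUNT:targetgroup/green/EXAMPLE";

// Build the forward action that sends `greenPercent` of requests to green
// and the remainder to blue.
function buildForwardAction(greenPercent: number): ForwardAction {
  if (!Number.isInteger(greenPercent) || greenPercent < 0 || greenPercent > 100) {
    throw new RangeError("greenPercent must be an integer from 0 to 100");
  }
  return {
    Type: "forward",
    ForwardConfig: {
      TargetGroups: [
        { TargetGroupArn: BLUE_TG_ARN, Weight: 100 - greenPercent },
        { TargetGroupArn: GREEN_TG_ARN, Weight: greenPercent },
      ],
    },
  };
}

// Initial canary: 95 to blue, 5 to green.
const canaryAction = buildForwardAction(5);
console.log(JSON.stringify(canaryAction.ForwardConfig.TargetGroups.map((tg) => tg.Weight)));
// prints [95,5]
```

Keeping the two weights derived from a single number means blue and green always sum to 100, so a rollback is simply `buildForwardAction(0)`.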

Problems encountered: sticky sessions on an older mobile client pinned some users to blue for days, mitigated by reducing the stickiness TTL and forcing a rebalance after 24h; green instances failed health checks until we increased the grace period to cover native module warmup; idempotency keys on POST /transfer deduplicated client retries during a 30-second routing flap, which we validated via a reconciliation job.
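To illustrate why idempotency keys made the retries on POST /transfer harmless, here is a minimal sketch of the general technique. It is an assumption-laden illustration, not the service's implementation: the `IdempotencyStore` class, the `TransferResult` type, and the in-memory `Map` are all hypothetical. A production store would need a durable, atomic insert (for example a unique constraint in the database) so that concurrent retries landing on different instances cannot both execute.

```typescript
// Hypothetical result type for a transfer request.
type TransferResult = { transferId: string; status: "accepted" };

// In-memory stand-in for a durable idempotency store.
class IdempotencyStore {
  private readonly results = new Map<string, TransferResult>();

  // Execute the transfer once per key; replay the stored result on retries.
  run(key: string, execute: () => TransferResult): { result: TransferResult; replayed: boolean } {
    const existing = this.results.get(key);
    if (existing !== undefined) {
      return { result: existing, replayed: true };
    }
    const result = execute();
    this.results.set(key, result);
    return { result, replayed: false };
  }
}

// Usage: a client retries the same request with the same key.
let executions = 0;
const store = new IdempotencyStore();
const doTransfer = (): TransferResult => {
  executions += 1;
  return { transferId: `t-${executions}`, status: "accepted" };
};

const first = store.run("key-123", doTransfer);
const retry = store.run("key-123", doTransfer);
console.log(executions, retry.replayed, retry.result.transferId === first.result.transferId);
// prints: 1 true true
```

The retry receives the original response and the transfer body runs only once, which is the property a reconciliation job can then confirm against settlement records.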

Runbook: promote the green weight 5% → 25% → 50% → 100% over four hours, with automatic rollback if the 5xx rate exceeds 0.1% for five minutes; CloudWatch alarms on p99 latency and error budget.
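The promotion and rollback rule in the runbook can be sketched as a small decision function. This is an illustration under stated assumptions rather than the team's automation: we interpret "exceeds 0.1% for five minutes" as five consecutive one-minute datapoints above the threshold, and the `soakComplete` flag (whether the current step has run long enough) is a hypothetical input, since the source does not give the spacing between steps. Names such as `decide` and `shouldRollback` are our own.

```typescript
// Promotion steps for the green weight, in percent.
const STEPS: readonly number[] = [5, 25, 50, 100];
const ERROR_RATE_THRESHOLD = 0.001; // 0.1% expressed as a fraction
const BREACH_MINUTES = 5; // assumed: five consecutive one-minute datapoints

type Decision = { action: "rollback" | "promote" | "hold"; greenPercent: number };

// True when the most recent BREACH_MINUTES datapoints all exceed the threshold.
function shouldRollback(perMinute5xxRates: number[]): boolean {
  if (perMinute5xxRates.length < BREACH_MINUTES) return false;
  return perMinute5xxRates.slice(-BREACH_MINUTES).every((rate) => rate > ERROR_RATE_THRESHOLD);
}

// Decide the next green weight from the current one and recent error rates.
function decide(currentPercent: number, perMinute5xxRates: number[], soakComplete: boolean): Decision {
  if (shouldRollback(perMinute5xxRates)) {
    return { action: "rollback", greenPercent: 0 };
  }
  const index = STEPS.indexOf(currentPercent);
  if (index === -1) {
    throw new RangeError(`unexpected green weight: ${currentPercent}`);
  }
  if (!soakComplete || index === STEPS.length - 1) {
    return { action: "hold", greenPercent: currentPercent };
  }
  return { action: "promote", greenPercent: STEPS[index + 1] };
}

// Healthy canary that has soaked long enough: move from 5% to 25%.
console.log(decide(5, [0, 0, 0.0002, 0, 0], true).greenPercent); // prints 25
// Sustained 0.2% 5xx rate at 25%: roll back to 0%.
console.log(decide(25, [0.002, 0.002, 0.002, 0.002, 0.002], true).action); // prints rollback
```

Checking the rollback condition before anything else means a sustained breach always wins over a pending promotion.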

Outcome: full cutover with zero failed transactions in settlement reconciliation; p99 latency improved 18% on green due to Node 20 event loop gains. The client is a regulated fintech API partner; name withheld.
