Autoscale

CPU-based horizontal scaling with asymmetric cooldowns — fast up, slow down.

autoscale {} lets voodu adjust replica count based on mean CPU. Asymmetric cooldown is the key feature — scale up aggressively (don't queue 503s), scale down conservatively (don't collapse capacity during a quiet window).

Source: examples/autoscale/

Two profiles, two shapes

Different workload classes need different knobs. The two examples below show a worker tier (queue-driven, latency-tolerant) and an HTTP tier (request-driven, latency-sensitive).

Worker tier (sidekiq)

sidekiq-worker.hcl
deployment "prod" "sidekiq" {
  image   = "ghcr.io/acme/api:1.4"
  command = ["bundle", "exec", "sidekiq"]

  env = {
    RAILS_ENV           = "production"
    RAILS_LOG_TO_STDOUT = "1"
  }

  env_from = ["prod/shared"]

  autoscale {
    min = 2
    max = 20

    cpu_target = 70

    cooldown_up   = "15s"
    cooldown_down = "2m"
  }
}

Why these knobs:

Knob | Value | Rationale
min = 2 | Two baseline workers | Single-host hiccup doesn't drain the queue to zero throughput. Revenue-touching jobs (payments, notifications) need ≥ 2.
max = 20 | Hard ceiling | 20 sidekiq replicas × concurrency 10 = 200 in-flight jobs. Lower if DB is the bottleneck.
cpu_target = 70 | Higher than HTTP | Workers don't have a latency SLA. 200ms wait in the queue is fine.
cooldown_up = "15s" | Aggressive | Queue depth changes fast. A backed-up queue costs real money.
cooldown_down = "2m" | Tighter than default | Scaling down idle workers is cheap to undo (no traffic = no 503 risk).
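The ceiling arithmetic in the max row can be checked with a short sketch. The concurrency of 10 comes from the table; the 150-connection database budget and both function names are hypothetical, chosen only to illustrate "lower if DB is the bottleneck". The sketch also assumes each worker thread holds at most one database connection.

```python
def in_flight_jobs(replicas: int, concurrency: int) -> int:
    """Jobs that can run at once: one per worker thread across all replicas."""
    return replicas * concurrency


def max_replicas_for_db(db_connection_budget: int, concurrency: int) -> int:
    """Largest replica count whose threads fit in the connection budget.

    Assumption: each worker thread holds at most one database connection.
    """
    return db_connection_budget // concurrency


CONCURRENCY = 10  # per-replica concurrency, as used in the table

print(in_flight_jobs(2, CONCURRENCY))         # at the floor, min = 2
print(in_flight_jobs(20, CONCURRENCY))        # at the ceiling, max = 20
print(max_replicas_for_db(150, CONCURRENCY))  # hypothetical 150-connection budget
```

With these numbers the floor allows 20 concurrent jobs, the ceiling 200, and a 150-connection budget would argue for max = 15 rather than 20.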

HTTP tier (web API)

Different shape — web tiers care about P95/P99 latency, not average throughput.

web-api-burst.hcl
deployment "prod" "api" {
  image = "ghcr.io/acme/api:1.4"
  ports = ["3000"]

  env = {
    RAILS_ENV           = "production"
    RAILS_LOG_TO_STDOUT = "1"
  }

  env_from = ["prod/shared"]

  autoscale {
    min = 3
    max = 15

    cpu_target = 60

    # cooldown_up omitted — 30s default is correct
    cooldown_down = "10m"
  }
}

ingress "prod" "api" {
  service = "api"
  host    = "api.example.com"

  tls {
    email = "ops@example.com"
  }
}

Why these knobs differ from the worker:

Knob | Value | Why different from worker
min = 3 | Higher floor | Web needs quorum during single-replica restarts (rolling deploys, OOM). Drop to 2, lose one to restart, you're down to 1.
cpu_target = 60 | Lower target | HTTP latency degrades non-linearly with CPU saturation. At 85% CPU, P99 is already in the seconds. Headroom per replica is the point.
cooldown_down = "10m" | Way longer | HTTP bursts come in clusters: campaign drives a 5-minute burst, 90-second lull, then mobile clients retry. Collapse capacity during the lull → 503s on the next wave.
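The effect of cooldown_down on a 90-second lull can be estimated with a rough simulation. It is a sketch, not voodu's implementation. It assumes the rule from the Decision band section (one replica removed per step, gated by the time since the last scale-down), the 15-second tick stated there, a lull that begins on a tick, and no recent scale-down before the lull. The starting count of 9 replicas and the 20-second comparison value are hypothetical.

```python
from typing import Optional

TICK_SECONDS = 15  # autoscaler tick rate stated in this document


def replicas_lost_in_lull(lull_seconds: int, cooldown_down: int,
                          replicas: int, min_replicas: int) -> int:
    """Count the -1 steps taken while mean CPU stays under the scale-down threshold.

    Assumptions: the lull starts exactly on a tick, and no scale-down happened
    recently, so the first tick is free to remove a replica.
    """
    lost = 0
    last_down: Optional[int] = None
    for now in range(0, lull_seconds, TICK_SECONDS):
        cooled = last_down is None or now - last_down > cooldown_down
        if cooled and replicas - lost > min_replicas:
            lost += 1
            last_down = now
    return lost


# Hypothetical starting point: 9 replicas after the first burst, min = 3.
print(replicas_lost_in_lull(90, cooldown_down=600, replicas=9, min_replicas=3))  # "10m"
print(replicas_lost_in_lull(90, cooldown_down=20, replicas=9, min_replicas=3))   # hypothetical short cooldown
```

Under these assumptions the 10-minute cooldown gives up one replica during the lull, while the short cooldown gives up three. Each lost replica then has to be re-added one step at a time, gated by cooldown_up, while the next wave is already arriving.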

Decision band

For both profiles, voodu's autoscaler uses hysteresis to avoid flapping:

      scale down               hold (deadband)               scale up
   ─────────────────|─────────────────|─────────────────|─────────────────►  mean CPU
              target × 0.7         target         target × 1.1

  • Mean CPU > target × 1.1 AND replicas < max AND last scale-up > cooldown_up ago → +1 replica
  • Mean CPU < target × 0.7 AND replicas > min AND last scale-down > cooldown_down ago → −1 replica
  • Anywhere in the deadband → hold

The autoscaler ticks every 15 seconds. CPU is sampled per replica via Docker stats and averaged across replicas.
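The three rules above can be written as a single decision function. This is a minimal sketch of the logic as described here, not voodu's source: the function and parameter names are hypothetical, times are in seconds, and a missing previous action (None) is assumed to count as "cooled down".

```python
from statistics import mean
from typing import Optional, Sequence

UP_FACTOR = 1.1    # scale up when mean CPU > target × 1.1
DOWN_FACTOR = 0.7  # scale down when mean CPU < target × 0.7


def band(cpu_target: float) -> tuple[float, float]:
    """Return the (scale-down, scale-up) CPU thresholds for a target."""
    return cpu_target * DOWN_FACTOR, cpu_target * UP_FACTOR


def cooled(now: float, last: Optional[float], cooldown: float) -> bool:
    """True if there is no previous action or it is more than `cooldown` seconds old."""
    return last is None or now - last > cooldown


def decide(cpu_samples: Sequence[float], replicas: int, *, cpu_target: float,
           min_replicas: int, max_replicas: int, now: float,
           last_up: Optional[float], last_down: Optional[float],
           cooldown_up: float, cooldown_down: float) -> int:
    """Return +1 (add a replica), -1 (remove one) or 0 (hold) for one tick."""
    low, high = band(cpu_target)
    avg = mean(cpu_samples)  # per-replica samples, averaged
    if avg > high and replicas < max_replicas and cooled(now, last_up, cooldown_up):
        return 1
    if avg < low and replicas > min_replicas and cooled(now, last_down, cooldown_down):
        return -1
    return 0  # inside the deadband, at a limit, or still cooling down


for name, target in (("sidekiq", 70), ("api", 60)):
    low, high = band(target)
    print(f"{name}: scale down below {low:.1f}%, scale up above {high:.1f}%")

worker = dict(cpu_target=70, min_replicas=2, max_replicas=20,
              cooldown_up=15, cooldown_down=120)
# Two replicas at 85% and 75% CPU average 80%, above the upper threshold.
print(decide([85, 75], 2, now=100, last_up=None, last_down=None, **worker))
```

For the two profiles on this page the band works out to 49%/77% for the worker (cpu_target = 70) and 42%/66% for the API (cpu_target = 60).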

Mutex with replicas

replicas and autoscale {} are mutually exclusive. A deployment that declares both is rejected at apply time:

deployment "prod" "api" {
  replicas = 3        # ❌
  autoscale { ... }   # ❌
}

To switch from fixed to autoscaled, remove the replicas attribute and add an autoscale {} block. To pin a fixed count, do the inverse.

What it doesn't do

  • No memory-based scaling: CPU only. Memory-bound workloads need a different approach.
  • No request-rate scaling: voodu doesn't see request metrics. Wire Prometheus + a custom controller for that shape.
  • No scale-to-zero: min = 1 is the lowest. Cold-start latency from zero would need a separate dispatcher layer.
  • No cross-host scheduling: autoscaling is single-host. For multi-host, add each host as a separate remote and run apply against each one in a loop.

Apply

voodu apply -f voodu.hcl
voodu describe deployment prod/api -r prod   # see current replica count + autoscale state
