If your calls fail for some users but not others, you probably have a TURN strategy problem. Production stability depends on treating relay usage as a capacity and cost signal, not a surprise.

Question

When is STUN enough, and when must I budget TURN relay capacity?

Quick answer

  • STUN lets each peer discover its public address and port as seen from outside its NAT, so peers can attempt a direct connection.
  • TURN relays traffic when direct peer paths fail.

STUN is cheaper. TURN is what saves difficult networks.
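
One way to see the difference in practice is to look at the candidate type in each ICE candidate line: "srflx" candidates come from a STUN lookup, while "relay" candidates are allocated on a TURN server. The sketch below is a minimal parser, not a full SDP implementation. The function names and the sample candidate lines (which use documentation-range IP addresses) are illustrative assumptions.

```python
from typing import Optional

# ICE candidate types: host (local interface), srflx (learned via STUN),
# prflx (learned during connectivity checks), relay (allocated on a TURN server).
PATH_BY_TYPE = {
    "host": "direct",
    "srflx": "direct (STUN-discovered address)",
    "prflx": "direct (peer-reflexive address)",
    "relay": "TURN relay",
}


def candidate_type(candidate_line: str) -> Optional[str]:
    """Return the token that follows 'typ' in a candidate line, or None."""
    tokens = candidate_line.split()
    if "typ" in tokens:
        idx = tokens.index("typ")
        if idx + 1 < len(tokens):
            return tokens[idx + 1]
    return None


def describe_path(candidate_line: str) -> str:
    """Map a candidate line to a coarse path label, or 'unknown'."""
    return PATH_BY_TYPE.get(candidate_type(candidate_line) or "", "unknown")


# Hypothetical sample lines, for illustration only.
srflx_line = (
    "candidate:842163049 1 udp 1677729535 203.0.113.7 46154 "
    "typ srflx raddr 192.168.1.5 rport 46154"
)
relay_line = (
    "candidate:3745641921 1 udp 41885439 198.51.100.20 50000 "
    "typ relay raddr 203.0.113.7 rport 46154"
)

if __name__ == "__main__":
    print(describe_path(srflx_line))
    print(describe_path(relay_line))
```

If a failing session never gathers a relay candidate at all, that usually points at TURN configuration or reachability rather than at the peers themselves.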

Production rule

Assume some percentage of sessions will require TURN, especially in enterprise, campus, or strict NAT environments.

If you do not budget for TURN relay load, relayed sessions will compete for saturated relay bandwidth at peak usage, and both call quality and connection success will degrade.

STUN vs TURN decision table

Network condition                          | Primary path                       | Expected outcome
Open consumer network with predictable NAT | STUN/direct                        | Low cost and good call quality
Enterprise firewall or symmetric NAT       | TURN relay                         | Higher reliability with added relay cost
Mobile network with frequent path changes  | STUN + TURN fallback               | Better continuity during network churn
Unknown global traffic mix                 | Hybrid baseline with TURN headroom | Fewer surprise failures at peak load

What to monitor

  1. Session success rate by network type.
  2. Share of sessions using relay candidates.
  3. TURN egress bandwidth and cost trend.

If relay usage climbs and you do not track it, surprises hit both reliability and budget.
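
As a rough illustration of the second metric, the sketch below computes relay share per network segment from session records. It assumes you already log, for each session, a segment label plus the local and remote candidate types of the selected candidate pair. The record shape and segment names are hypothetical, not taken from any particular stats API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (network_segment, local_candidate_type, remote_candidate_type)
SessionRecord = Tuple[str, str, str]


def relay_share_by_segment(sessions: Iterable[SessionRecord]) -> Dict[str, float]:
    """Return the fraction of sessions per segment whose selected pair used a relay."""
    totals: Dict[str, int] = defaultdict(int)
    relayed: Dict[str, int] = defaultdict(int)
    for segment, local_type, remote_type in sessions:
        totals[segment] += 1
        # A session is relayed if either side of the selected pair is a relay candidate.
        if "relay" in (local_type, remote_type):
            relayed[segment] += 1
    return {segment: relayed[segment] / totals[segment] for segment in totals}


# Hypothetical sample data, for illustration only.
sample_sessions = [
    ("enterprise", "relay", "srflx"),
    ("enterprise", "srflx", "relay"),
    ("enterprise", "srflx", "srflx"),
    ("enterprise", "relay", "relay"),
    ("consumer", "host", "srflx"),
    ("consumer", "srflx", "srflx"),
    ("consumer", "relay", "host"),
    ("consumer", "srflx", "host"),
]

if __name__ == "__main__":
    for segment, share in relay_share_by_segment(sample_sessions).items():
        print(f"{segment}: {share:.0%} relayed")
```

Tracking this number per segment, rather than as one global average, is what makes the capacity estimate defensible when the traffic mix shifts.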

TURN capacity sanity check

Plan TURN with this back-of-envelope model:

peak_turn_egress ~= concurrent_relay_sessions * avg_bitrate_per_session

Then add headroom for burst and regional imbalance. If you cannot estimate relay share by network segment, your TURN budget is a guess.
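
The back-of-envelope model above can be written as a small function. This is a sketch under stated assumptions: the parameter names, the Mbps unit, and the default headroom factor are illustrative choices, and the input numbers in the example are made up rather than measured.

```python
def peak_turn_egress_mbps(
    concurrent_sessions: int,
    relay_share: float,
    avg_bitrate_mbps: float,
    headroom: float = 0.3,
) -> float:
    """Estimate peak TURN egress in Mbps, including a headroom margin.

    relay_share is the fraction of sessions expected to use TURN (0.0 to 1.0).
    headroom is the extra margin for burst and regional imbalance (0.3 = 30%).
    """
    if not 0.0 <= relay_share <= 1.0:
        raise ValueError("relay_share must be between 0.0 and 1.0")
    if concurrent_sessions < 0 or avg_bitrate_mbps < 0 or headroom < 0:
        raise ValueError("inputs must be non-negative")
    concurrent_relay_sessions = concurrent_sessions * relay_share
    base_egress = concurrent_relay_sessions * avg_bitrate_mbps
    return base_egress * (1.0 + headroom)


if __name__ == "__main__":
    # Hypothetical inputs: 2000 concurrent sessions, 20% relayed,
    # 1.5 Mbps per relayed session, 50% headroom.
    estimate = peak_turn_egress_mbps(2000, 0.2, 1.5, headroom=0.5)
    print(f"Plan for about {estimate:.0f} Mbps of TURN egress")
```

With those example inputs the base load is 400 relayed sessions at 1.5 Mbps, or 600 Mbps, and the 50% headroom brings the planning figure to roughly 900 Mbps. Running the function per region instead of globally is one way to account for regional imbalance.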

10-minute action step

  1. Capture one failing and one successful session log from your signaling layer.
  2. Trace offer/answer, ICE, and candidate events in strict timestamp order.
  3. Mark the first divergence point and tie it to one concrete fix.
  4. Re-test with the same network path and verify behavior is deterministic.
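
Steps 2 and 3 can be partly automated. The sketch below sorts two event logs by timestamp and reports the first position where the sequence of event names differs. The log shape (timestamp, event name) and the event names themselves are hypothetical; adapt them to whatever your signaling layer actually records.

```python
from typing import List, Optional, Tuple

Event = Tuple[float, str]  # (timestamp_seconds, event_name)
Divergence = Tuple[int, Optional[str], Optional[str]]


def ordered_names(events: List[Event]) -> List[str]:
    """Return event names in strict timestamp order."""
    return [name for _, name in sorted(events, key=lambda event: event[0])]


def first_divergence(good: List[Event], bad: List[Event]) -> Optional[Divergence]:
    """Return (index, good_event, bad_event) at the first mismatch, or None if identical."""
    good_names, bad_names = ordered_names(good), ordered_names(bad)
    for i in range(max(len(good_names), len(bad_names))):
        good_event = good_names[i] if i < len(good_names) else None
        bad_event = bad_names[i] if i < len(bad_names) else None
        if good_event != bad_event:
            return i, good_event, bad_event
    return None


# Hypothetical logs, for illustration only.
good_log = [
    (0.00, "offer"),
    (0.12, "answer"),
    (0.30, "candidate:srflx"),
    (0.45, "ice:checking"),
    (0.90, "ice:connected"),
]
bad_log = [
    (0.00, "offer"),
    (0.15, "answer"),
    (0.33, "candidate:srflx"),
    (0.50, "ice:checking"),
    (30.5, "ice:failed"),
]

if __name__ == "__main__":
    print(first_divergence(good_log, bad_log))
```

The comparison deliberately ignores timestamps after sorting, since two sessions rarely share exact timings. Only the order and names of events are compared.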

Success signal

You can identify exactly where negotiation broke and prove the same class of failure no longer reproduces.