"How many participants can we support in a single call?"
This question inevitably arises when developing WebRTC applications, and the answer is rarely straightforward. WebRTC's peer-to-peer nature makes it perfect for one-to-one communication, but scaling to multiple participants introduces significant challenges.
I remember the first time I encountered this scaling problem. We had built a video conferencing application that worked beautifully for two people. When we tested it with four participants, it still performed well. But when we tried eight participants, the application ground to a halt—video froze, audio cut out, and some users couldn't connect at all. Our simple peer-to-peer architecture had hit its limits.
This experience taught me that scaling WebRTC isn't just about adding more connections—it requires fundamentally rethinking your architecture as you grow. The approaches that work for two participants often fail completely at twenty or two hundred.
In this article, we'll explore the different architectures for scaling WebRTC applications, from simple mesh networks to sophisticated media servers. Drawing from my experience building systems that support thousands of concurrent users, I'll share practical strategies for scaling your WebRTC applications while maintaining quality and performance.
Understanding the Scaling Challenge
Before diving into solutions, let's understand why scaling WebRTC is challenging:
The Resource Problem
WebRTC's peer-to-peer model creates resource demands that grow quadratically with the number of participants:
- Bandwidth: In a full mesh network, each participant sends their media stream to every other participant. With N participants, each sends N-1 streams and receives N-1 streams.
- CPU Usage: Each participant must encode their outgoing stream once for each recipient and decode incoming streams from each participant.
- Connection Management: Each peer-to-peer connection requires separate negotiation, ICE candidates, and monitoring.
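To make that growth concrete, here's a back-of-the-envelope estimator. The 2.5 Mbps per-stream figure is an illustrative assumption for decent-quality video, not a measurement:

```javascript
// Rough mesh-topology resource estimate for N participants.
// Assumes every participant sends a separately encoded stream to every peer.
function meshResourceEstimate(participantCount, bitratePerStreamMbps = 2.5) {
  const peers = participantCount - 1;
  return {
    // Total peer-to-peer connections in the room: N * (N - 1) / 2
    connections: (participantCount * peers) / 2,
    streamsSentPerClient: peers,
    streamsReceivedPerClient: peers,
    uploadMbpsPerClient: peers * bitratePerStreamMbps
  };
}
```

At 8 participants this already implies 28 connections and roughly 17.5 Mbps of upload per client—more than many home connections can sustain, which is exactly the wall we hit.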
I once consulted for a company that tried to build a 16-person video conferencing solution using a pure mesh topology. On paper, it seemed feasible. In reality, even high-end computers struggled with encoding and decoding so many simultaneous streams, and most home internet connections couldn't handle the upload bandwidth requirements.
The Practical Limits
Based on my experience, here are the practical limits for different WebRTC architectures:
- Full Mesh (Peer-to-Peer): 4-6 participants maximum for video, slightly more for audio-only
- Selective Forwarding Unit (SFU): 25-50 participants with video, hundreds with audio
- Multipoint Control Unit (MCU): 50-100+ participants with video, depending on server capacity
- Hybrid Approaches: Potentially thousands of participants with the right architecture
These aren't hard limits—they depend on many factors including device capabilities, network conditions, and quality expectations. But they provide a useful framework for choosing the right architecture for your needs.
WebRTC Scaling Architectures
Let's explore the main architectural approaches for scaling WebRTC applications:
Mesh Architecture: The Starting Point
In a mesh architecture, every participant connects directly to every other participant:
```javascript
// Simple implementation of a mesh network
function createMeshNetwork(participants) {
  const connections = {};

  // For each pair of participants, create a connection
  for (let i = 0; i < participants.length; i++) {
    for (let j = i + 1; j < participants.length; j++) {
      const participant1 = participants[i];
      const participant2 = participants[j];

      // Create unique connection ID
      const connectionId = `${participant1.id}-${participant2.id}`;

      // Create and store the connection
      connections[connectionId] = createPeerConnection(participant1, participant2);
    }
  }

  return connections;
}

function createPeerConnection(participant1, participant2) {
  // Create RTCPeerConnection
  const peerConnection = new RTCPeerConnection(configuration);

  // Add participant1's tracks to the connection
  participant1.stream.getTracks().forEach(track => {
    peerConnection.addTrack(track, participant1.stream);
  });

  // Set up event handlers for receiving participant2's tracks
  peerConnection.ontrack = event => {
    participant1.receiveTrack(event.track, participant2.id);
  };

  // Handle signaling between participants
  // (simplified - in reality, this would go through your signaling server)
  peerConnection.onicecandidate = event => {
    if (event.candidate) {
      sendSignalingMessage(participant2.id, {
        type: 'candidate',
        candidate: event.candidate
      });
    }
  };

  return peerConnection;
}
```
Advantages of Mesh Architecture:
- Simplicity: No server-side media handling required
- Low Latency: Direct connections minimize delay
- Privacy: Media flows directly between participants without a central server
Disadvantages of Mesh Architecture:
- Poor Scalability: Resource requirements grow quadratically with participant count
- Device Limitations: Mobile devices quickly become overwhelmed
- Network Constraints: Most home internet connections have limited upload bandwidth
Despite these limitations, mesh architecture is still appropriate for small group calls (2-4 participants) where simplicity and low operational costs are priorities.
Selective Forwarding Unit (SFU): The Scalable Middle Ground
An SFU acts as a router for WebRTC streams. Each participant sends their media stream once to the SFU, which then forwards it to other participants without decoding or re-encoding:
```javascript
// Client-side code for connecting to an SFU
function connectToSFU(sfuUrl, localStream, roomId) {
  // Create connection to the SFU
  const peerConnection = new RTCPeerConnection(configuration);

  // Add local tracks to the connection
  localStream.getTracks().forEach(track => {
    peerConnection.addTrack(track, localStream);
  });

  // Set up handlers for remote tracks from the SFU
  peerConnection.ontrack = event => {
    // The SFU sends streams from other participants. We assume here that
    // the SFU sets each stream's ID to the participant's ID; real SFUs
    // typically communicate this mapping via signaling instead.
    const participantId = event.streams[0].id;
    displayRemoteStream(event.streams[0], participantId);
  };

  // Create data channel for control messages
  const controlChannel = peerConnection.createDataChannel('control');
  controlChannel.onopen = () => {
    // Join the room
    controlChannel.send(JSON.stringify({
      type: 'join',
      roomId: roomId
    }));
  };

  // Handle incoming control messages
  controlChannel.onmessage = event => {
    const message = JSON.parse(event.data);
    handleControlMessage(message);
  };

  // Establish connection to the SFU
  peerConnection.createOffer()
    .then(offer => peerConnection.setLocalDescription(offer))
    .then(() => {
      // Send the offer to the SFU via your signaling mechanism
      return fetch(`${sfuUrl}/join`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          sdp: peerConnection.localDescription,
          roomId: roomId
        })
      });
    })
    .then(response => response.json())
    .then(answer => peerConnection.setRemoteDescription(
      new RTCSessionDescription(answer)
    ))
    .catch(error => console.error('SFU connection failed:', error));

  return peerConnection;
}
```
Advantages of SFU Architecture:
- Better Scalability: Each participant sends only one outgoing stream
- Reduced Client Load: Clients receive only the streams they need
- Bandwidth Efficiency: Upload bandwidth requirements remain constant regardless of participant count
Disadvantages of SFU Architecture:
- Server Requirements: Requires deploying and maintaining SFU servers
- Increased Latency: Adding a server hop increases delay slightly
- Server Bandwidth: SFU servers need substantial bandwidth
I've found SFU architecture to be the sweet spot for most multi-party WebRTC applications. It scales well without the complexity of an MCU, and the slight increase in latency is rarely noticeable to users.
Multipoint Control Unit (MCU): Maximum Control
An MCU takes scaling a step further by not only routing media but also processing it. The MCU receives streams from all participants, decodes them, combines them (often into a single composite view), re-encodes the result, and sends it to participants:
```javascript
// Client-side code for connecting to an MCU is similar to SFU
// The main difference is in how streams are handled
function connectToMCU(mcuUrl, localStream, roomId) {
  const peerConnection = new RTCPeerConnection(configuration);

  // Add local tracks
  localStream.getTracks().forEach(track => {
    peerConnection.addTrack(track, localStream);
  });

  // With an MCU, we typically receive just one stream
  // containing a composite view of all participants
  peerConnection.ontrack = event => {
    // Display the composite stream
    displayCompositeStream(event.streams[0]);
  };

  // Control channel for layout changes, etc.
  const controlChannel = peerConnection.createDataChannel('control');
  controlChannel.onopen = () => {
    // Join the room
    controlChannel.send(JSON.stringify({
      type: 'join',
      roomId: roomId
    }));
  };

  // We can send layout preferences to the MCU
  function changeLayout(layout) {
    controlChannel.send(JSON.stringify({
      type: 'layout',
      layout: layout // e.g., 'grid', 'spotlight', 'presentation'
    }));
  }

  // Establish connection similar to SFU example
  // ...

  return {
    peerConnection,
    changeLayout
  };
}
```
Advantages of MCU Architecture:
- Maximum Scalability: Clients receive only one stream regardless of participant count
- Consistent Quality: The MCU can adapt the composite stream to each client's capabilities
- Layout Control: The server can control how participants are displayed
Disadvantages of MCU Architecture:
- High Server Requirements: Decoding and encoding streams is CPU-intensive
- Higher Latency: Processing media adds delay
- Complexity and Cost: MCUs are more complex and expensive to deploy and maintain
MCU architecture shines in scenarios with large numbers of participants or where you need precise control over the user experience. For example, I worked on a virtual event platform that used an MCU to create a "TV-like" experience with professional transitions and layouts, something that wouldn't be possible with an SFU.
Hybrid Architectures: The Best of All Worlds
In practice, many large-scale WebRTC applications use hybrid architectures that combine elements of mesh, SFU, and MCU approaches:
Cascaded SFUs
For geographic distribution and massive scale:
```
Participants ──> Regional SFU ──> Core SFU ──> Regional SFU ──> Participants
```
This approach reduces latency by keeping media routing close to participants while enabling global scale.
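The routing decision behind a cascaded deployment can be sketched as follows. The server names and participant shape are illustrative, not taken from any particular SFU:

```javascript
// Decide how media travels between two participants in a cascaded-SFU
// deployment: stay on one regional SFU when possible, and cross the
// core SFU only for inter-region traffic.
function routeStream(sender, receiver) {
  if (sender.region === receiver.region) {
    // Single regional hop keeps latency low
    return [`sfu-${sender.region}`];
  }
  // Inter-region traffic is relayed through the core
  return [`sfu-${sender.region}`, 'sfu-core', `sfu-${receiver.region}`];
}
```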
SFU with Selective MCU
Use an SFU for most participants but an MCU for specific features:
```
                         ┌─> Participant A
                         │
Participant X ──> SFU ───┼─> Participant B
                         │
                         └─> MCU ──> Recording Service
```
This gives you the efficiency of an SFU with the control of an MCU where needed.
Dynamic Mesh-to-Server Switching
Start with mesh for small groups, then transition to server-based architecture as more participants join:
```javascript
function createScalableRoom(roomId) {
  let participants = [];
  let connections = {};
  let serverConnection = null;

  function addParticipant(participant) {
    participants.push(participant);

    // If we're below the threshold, use mesh
    if (participants.length <= 4) {
      // Create peer connections to all existing participants
      participants.forEach(existingParticipant => {
        if (existingParticipant.id !== participant.id) {
          const connectionId = `${participant.id}-${existingParticipant.id}`;
          connections[connectionId] = createPeerConnection(
            participant,
            existingParticipant
          );
        }
      });
    }
    // If we've crossed the threshold, switch to server architecture
    else if (participants.length === 5) {
      // Disconnect all mesh connections
      Object.values(connections).forEach(conn => conn.close());
      connections = {};

      // Connect everyone to the server
      serverConnection = connectToMediaServer(participants);
    }
    // If we're already using the server, just add the new participant
    else if (serverConnection) {
      serverConnection.addParticipant(participant);
    }
  }

  // Rest of the implementation...

  return {
    addParticipant,
    removeParticipant: function(participantId) { /* ... */ },
    // Other room management functions
  };
}
```
I implemented a similar approach for a collaborative workspace application. Small team meetings (2-4 people) used direct mesh connections for minimal latency, but when additional participants joined or screen sharing began, we seamlessly transitioned to an SFU architecture.
Real-World Scaling Strategies
Beyond choosing the right architecture, here are practical strategies I've used to scale WebRTC applications:
Bandwidth Management
Control bandwidth usage to support more participants:
```javascript
// Dynamically adjust video quality based on participant count
function adjustVideoQuality(participantCount, localStream) {
  const videoTrack = localStream.getVideoTracks()[0];
  if (!videoTrack) return;

  // Define quality levels
  const qualityLevels = {
    high: { width: 1280, height: 720, frameRate: 30 },
    medium: { width: 640, height: 480, frameRate: 24 },
    low: { width: 320, height: 240, frameRate: 15 },
    minimal: { width: 160, height: 120, frameRate: 10 }
  };

  // Select quality based on participant count
  let quality;
  if (participantCount <= 4) {
    quality = qualityLevels.high;
  } else if (participantCount <= 9) {
    quality = qualityLevels.medium;
  } else if (participantCount <= 16) {
    quality = qualityLevels.low;
  } else {
    quality = qualityLevels.minimal;
  }

  // Apply constraints (applyConstraints returns a promise, so callers
  // can await it or handle rejection if the device can't comply)
  return videoTrack.applyConstraints(quality);
}
```
Simulcast and Layered Coding
Use simulcast to send multiple quality levels, allowing the server or receivers to select the appropriate one:
```javascript
// Enable simulcast for scalability.
// Simulcast layers must be declared when the transceiver is created
// (via sendEncodings); they cannot be added later with setParameters,
// which can only adjust encodings that were already negotiated.
function addVideoWithSimulcast(peerConnection, videoTrack, stream) {
  return peerConnection.addTransceiver(videoTrack, {
    direction: 'sendonly',
    streams: [stream],
    sendEncodings: [
      { rid: 'high', maxBitrate: 900000, scaleResolutionDownBy: 1 },
      { rid: 'medium', maxBitrate: 300000, scaleResolutionDownBy: 2 },
      { rid: 'low', maxBitrate: 100000, scaleResolutionDownBy: 4 }
    ]
  });
}

// Later, individual layers can be toggled or retuned via setParameters
function setLayerActive(sender, rid, active) {
  const parameters = sender.getParameters();
  const encoding = parameters.encodings.find(e => e.rid === rid);
  if (!encoding) return Promise.resolve();
  encoding.active = active;
  return sender.setParameters(parameters);
}
```
Intelligent Stream Subscription
Only subscribe to streams that are relevant to the user:
```javascript
// In an SFU-based system, selectively subscribe to streams
function manageSubscriptions(visibleParticipants, allParticipants, sfuConnection) {
  // Determine which participants are currently visible in the UI
  const visibleIds = visibleParticipants.map(p => p.id);

  // For each participant, decide whether to subscribe to their stream
  allParticipants.forEach(participant => {
    const isVisible = visibleIds.includes(participant.id);
    const isActiveSpeaker = participant.isActiveSpeaker;

    // Subscribe to visible participants and active speakers
    if (isVisible || isActiveSpeaker) {
      sfuConnection.subscribe(participant.id, { video: true, audio: true });
    }
    // For non-visible participants, only subscribe to audio
    else {
      sfuConnection.subscribe(participant.id, { video: false, audio: true });
    }
  });
}
```
This approach is particularly effective for large meetings where only a few participants are visible at any given time. By subscribing only to the streams that are actually being displayed, you can support many more participants.
Load Balancing and Geographic Distribution
For global applications, distribute your media servers geographically:
```javascript
// Select the optimal media server based on user location
async function selectOptimalMediaServer(userRegion) {
  // Get list of available media servers
  const response = await fetch('/api/media-servers');
  const servers = await response.json();

  // Filter servers by health status
  const healthyServers = servers.filter(server => server.status === 'healthy');

  // First try: find a server in the user's region
  const regionalServer = healthyServers.find(
    server => server.region === userRegion
  );
  if (regionalServer) return regionalServer;

  // Second try: find a server in a nearby region
  const nearbyRegions = getNearbyRegions(userRegion);
  const nearbyServer = healthyServers.find(
    server => nearbyRegions.includes(server.region)
  );
  if (nearbyServer) return nearbyServer;

  // Fallback: select the server with the lowest load
  return healthyServers.sort((a, b) => a.currentLoad - b.currentLoad)[0];
}
```
I implemented a similar system for a global education platform. By routing users to the nearest media server, we reduced latency and improved the user experience, particularly for international connections.
Scaling Challenges and Solutions
Even with the right architecture, scaling WebRTC applications presents unique challenges. Here are some I've encountered and the solutions I've implemented:
Challenge: Signaling Server Scalability
As your user base grows, your signaling server can become a bottleneck.
Solution: Horizontally Scaled Signaling
```javascript
// Server-side pseudocode for scalable signaling
function createScalableSignalingServer() {
  // Use Redis for pub/sub across multiple server instances
  const redisClient = createRedisClient();
  const redisPublisher = redisClient.duplicate();

  // Subscribe to messages for all rooms
  redisClient.psubscribe('room:*');

  // When a message arrives from Redis, send it to connected clients
  redisClient.on('pmessage', (pattern, channel, rawMessage) => {
    // Redis delivers strings, so deserialize before routing
    const message = JSON.parse(rawMessage);
    const roomId = channel.split(':')[1];
    const connectedClients = getClientsInRoom(roomId);

    connectedClients.forEach(client => {
      if (client.id !== message.senderId) {
        client.send(message.data);
      }
    });
  });

  // When a client sends a message, publish it to Redis
  function handleClientMessage(client, message) {
    const roomId = client.roomId;
    redisPublisher.publish(`room:${roomId}`, JSON.stringify({
      senderId: client.id,
      data: message
    }));
  }

  // Rest of the signaling server implementation...
}
```
This approach allows you to run multiple signaling server instances behind a load balancer, with Redis (or another pub/sub system) ensuring messages are properly routed between instances.
Challenge: TURN Server Capacity
TURN servers can become overloaded as your user base grows.
Solution: TURN Server Pools and Monitoring
```javascript
// Client-side code for TURN server selection
function getTurnServers() {
  return fetch('/api/turn-credentials')
    .then(response => response.json())
    .then(data => {
      // The server provides a list of TURN servers with credentials
      return data.turnServers.map(server => ({
        urls: server.urls,
        username: data.username,
        credential: data.credential
      }));
    });
}

// Server-side pseudocode for TURN server management
function allocateTurnServer(request) {
  // Get all available TURN servers
  const turnServers = getTurnServerPool();

  // Filter by region for lower latency
  const userRegion = geolocateUser(request.ip);
  const regionalServers = turnServers.filter(
    server => server.region === userRegion
  );

  // Select servers with the lowest current load
  const selectedServers = (regionalServers.length > 0 ? regionalServers : turnServers)
    .sort((a, b) => a.currentLoad - b.currentLoad)
    .slice(0, 3); // Provide multiple servers for redundancy

  // Generate time-limited credentials
  const credentials = generateTurnCredentials();

  return {
    turnServers: selectedServers,
    username: credentials.username,
    credential: credentials.password
  };
}
```
By monitoring TURN server usage and dynamically allocating servers based on load and geographic proximity, you can ensure reliable connectivity even as your application scales.
Challenge: Recording and Archiving at Scale
Recording WebRTC sessions becomes increasingly complex at scale.
Solution: Distributed Recording Architecture
```javascript
// Server-side pseudocode for scalable recording
function startRecording(roomId) {
  // Allocate a recording worker
  const recordingWorker = allocateRecordingWorker();

  // Create a special participant that joins the room
  const recordingParticipant = createRecordingParticipant(roomId);

  // The recording participant subscribes to all streams
  subscribeToAllStreams(recordingParticipant, roomId);

  // Start recording process on the worker
  recordingWorker.startRecording(recordingParticipant.streams, {
    roomId: roomId,
    timestamp: Date.now(),
    format: 'mp4',
    layout: 'grid'
  });

  // Store recording metadata
  storeRecordingMetadata(roomId, {
    workerId: recordingWorker.id,
    startTime: Date.now(),
    status: 'recording'
  });

  return {
    recordingId: generateRecordingId(roomId),
    status: 'started'
  };
}
```
This approach separates recording concerns from your main media servers, allowing you to scale recording capacity independently.
Case Studies in WebRTC Scaling
Let me share some real-world examples of how I've scaled WebRTC applications for different use cases:
Case Study 1: Virtual Classroom (50-100 Participants)
For an educational platform, we needed to support classes with one teacher and up to 100 students. Our solution:
- Hybrid Architecture: SFU for routing with selective MCU for recordings
- Role-Based Quality: Teacher stream at high quality, student streams at lower quality
- Dynamic Subscription: Only subscribe to video for active speakers and "spotlighted" students
- Bandwidth Classes: Different quality tiers based on participant's network capabilities
This architecture supported classes with 100+ participants while maintaining a smooth experience for all users.
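The role-based quality idea boils down to a small policy function. The tiers and numbers below are illustrative, not the exact values from that deployment:

```javascript
// Pick capture constraints by role: the teacher's stream gets priority,
// spotlighted students mid quality, everyone else thumbnail quality.
function constraintsForRole(role, isSpotlighted = false) {
  if (role === 'teacher') {
    return { width: 1280, height: 720, frameRate: 30 };
  }
  if (isSpotlighted) {
    return { width: 640, height: 480, frameRate: 24 };
  }
  return { width: 320, height: 240, frameRate: 15 };
}
```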
Case Study 2: Company All-Hands (1000+ Viewers)
For company-wide meetings, we implemented:
- Broadcast Model: Small group of presenters connected via SFU
- HLS Integration: WebRTC streams were converted to HLS for large-scale viewing
- Selective Interactivity: Questions from the audience were selectively brought into the WebRTC session
- Regional Distribution: Media servers in multiple regions with intelligent routing
This hybrid WebRTC/HLS approach supported thousands of viewers while maintaining the interactive elements essential for company meetings.
Case Study 3: Virtual Conference (Parallel Sessions)
For a virtual conference platform with multiple simultaneous sessions:
- Isolated Media Servers: Dedicated media server instances for each active session
- Dynamic Provisioning: Automatically scaling infrastructure based on scheduled sessions
- Shared TURN Infrastructure: Common TURN server pool across all sessions
- Optimized Recording: Separate recording infrastructure to avoid impacting live sessions
This architecture supported dozens of parallel sessions with 50-100 participants each, providing a reliable experience throughout a multi-day conference.
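The dynamic-provisioning logic reduced to a simple capacity calculation over the session schedule. The per-server capacity figure is an assumption for illustration:

```javascript
// Estimate how many media-server instances to provision for a set of
// scheduled sessions, given an assumed per-server participant capacity.
// Each session gets dedicated instances, so we round up per session.
function serversNeeded(sessions, capacityPerServer = 100) {
  return sessions.reduce(
    (total, session) =>
      total + Math.ceil(session.expectedParticipants / capacityPerServer),
    0
  );
}
```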
The Future of WebRTC Scaling
As WebRTC continues to evolve, several emerging technologies promise to further improve scalability:
WebTransport and QUIC
The WebTransport API, built on QUIC, offers faster connection establishment and low-latency transport, which could enhance scaling capabilities.
WebRTC Insertable Streams
The Insertable Streams API enables processing of media before encoding or after decoding, opening possibilities for more efficient client-side processing and custom scaling solutions.
WebAssembly for Media Processing
WebAssembly enables more efficient media processing directly in the browser, potentially allowing for more complex client-side operations without sacrificing performance.
Balancing Scale, Quality, and Cost
Throughout my career implementing WebRTC solutions, I've learned that scaling is always a balance between three factors:
- Scale: How many participants you need to support
- Quality: The audio/video quality and interactivity you want to maintain
- Cost: The infrastructure and operational expenses you can justify
There's no one-size-fits-all solution. A virtual classroom has different requirements than a webinar platform, which differs from a social media live streaming service.
The key is to start with a clear understanding of your specific requirements and constraints, then choose the architecture and scaling strategies that best align with your goals. Be prepared to evolve your approach as your application grows and user needs change.
Putting It All Together
Scaling WebRTC applications beyond simple peer-to-peer connections requires careful architecture selection and implementation. By understanding the strengths and limitations of mesh networks, SFUs, MCUs, and hybrid approaches, you can build applications that support the number of participants you need while maintaining quality and performance.
Remember that scaling isn't just about technology—it's about creating the best possible experience for your users. Sometimes, a simpler architecture with thoughtful optimizations can outperform a more complex solution.
In our next article, we'll explore another crucial aspect of WebRTC: mobile development. We'll see how to adapt WebRTC applications for mobile devices, addressing the unique challenges and opportunities they present.
---
This article is part of our WebRTC Essentials series, where we explore the technologies that power modern real-time communication. Join us in the next installment as we dive into Mobile WebRTC Development.
