"How many participants can we support in a single call?"
This question inevitably arises when developing WebRTC applications, and the answer is rarely straightforward. WebRTC's peer-to-peer nature makes it perfect for one-to-one communication, but scaling to multiple participants introduces significant challenges.
I remember the first time I encountered this scaling problem. We had built a video conferencing application that worked beautifully for two people. When we tested it with four participants, it still performed well. But when we tried eight participants, the application ground to a halt—video froze, audio cut out, and some users couldn't connect at all. Our simple peer-to-peer architecture had hit its limits.
This experience taught me that scaling WebRTC isn't just about adding more connections—it requires fundamentally rethinking your architecture as you grow. The approaches that work for two participants often fail completely at twenty or two hundred.
In this article, we'll explore the different architectures for scaling WebRTC applications, from simple mesh networks to sophisticated media servers. Drawing from my experience building systems that support thousands of concurrent users, I'll share practical strategies for scaling your WebRTC applications while maintaining quality and performance.
Understanding the Scaling Challenge
Before diving into solutions, let's understand why scaling WebRTC is challenging:
The Resource Problem
WebRTC's peer-to-peer model creates resource demands that grow quadratically with the number of participants:
- Bandwidth: In a full mesh network, each participant sends their media stream to every other participant. With N participants, each sends N-1 streams and receives N-1 streams.
- CPU Usage: Each participant must encode their outgoing stream once for each recipient and decode incoming streams from each participant.
- Connection Management: Each peer-to-peer connection requires separate negotiation, ICE candidates, and monitoring.
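To make that growth concrete, here's a back-of-the-envelope estimator. The 2.5 Mbps per-stream figure is an illustrative assumption for decent-quality video, not a measurement:

```javascript
// Rough mesh-topology resource estimate for N participants.
// Assumes every participant sends a separately encoded stream to every peer.
function meshResourceEstimate(participantCount, bitratePerStreamMbps = 2.5) {
  const peers = participantCount - 1;
  return {
    // Total peer-to-peer connections in the room: N * (N - 1) / 2
    connections: (participantCount * peers) / 2,
    streamsSentPerClient: peers,
    streamsReceivedPerClient: peers,
    uploadMbpsPerClient: peers * bitratePerStreamMbps
  };
}
```

At 8 participants this already implies 28 connections and roughly 17.5 Mbps of upload per client—more than many home connections can sustain, which is exactly the wall we hit.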
I once consulted for a company that tried to build a 16-person video conferencing solution using a pure mesh topology. On paper, it seemed feasible. In reality, even high-end computers struggled with encoding and decoding so many simultaneous streams, and most home internet connections couldn't handle the upload bandwidth requirements.
The Practical Limits
Based on my experience, here are the practical limits for different WebRTC architectures:
- Full Mesh (Peer-to-Peer): 4-6 participants maximum for video, slightly more for audio-only
- Selective Forwarding Unit (SFU): 25-50 participants with video, hundreds with audio
- Multipoint Control Unit (MCU): 50-100+ participants with video, depending on server capacity
- Hybrid Approaches: Potentially thousands of participants with the right architecture
These aren't hard limits—they depend on many factors including device capabilities, network conditions, and quality expectations. But they provide a useful framework for choosing the right architecture for your needs.
WebRTC Scaling Architectures
Let's explore the main architectural approaches for scaling WebRTC applications:
Mesh Architecture: The Starting Point
In a mesh architecture, every participant connects directly to every other participant:
```javascript
// Simple implementation of a mesh network
function createMeshNetwork(participants) {
  const connections = {};

  // For each pair of participants, create a connection
  for (let i = 0; i < participants.length; i++) {
    for (let j = i + 1; j < participants.length; j++) {
      const participant1 = participants[i];
      const participant2 = participants[j];

      // Create unique connection ID
      const connectionId = `${participant1.id}-${participant2.id}`;

      // Create and store the connection
      connections[connectionId] = createPeerConnection(participant1, participant2);
    }
  }

  return connections;
}

function createPeerConnection(participant1, participant2) {
  // Create RTCPeerConnection
  const peerConnection = new RTCPeerConnection(configuration);

  // Add participant1's tracks to the connection
  participant1.stream.getTracks().forEach(track => {
    peerConnection.addTrack(track, participant1.stream);
  });

  // Set up event handlers for receiving participant2's tracks
  peerConnection.ontrack = event => {
    participant1.receiveTrack(event.track, participant2.id);
  };

  // Handle signaling between participants
  // (simplified - in reality, this would go through your signaling server)
  peerConnection.onicecandidate = event => {
    if (event.candidate) {
      sendSignalingMessage(participant2.id, {
        type: 'candidate',
        candidate: event.candidate
      });
    }
  };

  return peerConnection;
}
```
Advantages of Mesh Architecture:
- Simplicity: No server-side media handling required
- Low Latency: Direct connections minimize delay
- Privacy: Media flows directly between participants without a central server
Disadvantages of Mesh Architecture:
- Poor Scalability: Resource requirements grow quadratically with participant count
- Device Limitations: Mobile devices quickly become overwhelmed
- Network Constraints: Most home internet connections have limited upload bandwidth
Despite these limitations, mesh architecture is still appropriate for small group calls (2-4 participants) where simplicity and low operational costs are priorities.
Selective Forwarding Unit (SFU): The Scalable Middle Ground
An SFU acts as a router for WebRTC streams. Each participant sends their media stream once to the SFU, which then forwards it to other participants without decoding or re-encoding:
```javascript
// Client-side code for connecting to an SFU
function connectToSFU(sfuUrl, localStream, roomId) {
  // Create connection to the SFU
  const peerConnection = new RTCPeerConnection(configuration);

  // Add local tracks to the connection
  localStream.getTracks().forEach(track => {
    peerConnection.addTrack(track, localStream);
  });

  // Set up handlers for remote tracks from the SFU
  peerConnection.ontrack = event => {
    // The SFU sends streams from other participants. We assume here that
    // the SFU sets each stream's ID to the participant's ID; real SFUs
    // typically communicate this mapping via signaling instead.
    const participantId = event.streams[0].id;
    displayRemoteStream(event.streams[0], participantId);
  };

  // Create data channel for control messages
  const controlChannel = peerConnection.createDataChannel('control');
  controlChannel.onopen = () => {
    // Join the room
    controlChannel.send(JSON.stringify({
      type: 'join',
      roomId: roomId
    }));
  };

  // Handle incoming control messages
  controlChannel.onmessage = event => {
    const message = JSON.parse(event.data);
    handleControlMessage(message);
  };

  // Establish connection to the SFU
  peerConnection.createOffer()
    .then(offer => peerConnection.setLocalDescription(offer))
    .then(() => {
      // Send the offer to the SFU via your signaling mechanism
      return fetch(`${sfuUrl}/join`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          sdp: peerConnection.localDescription,
          roomId: roomId
        })
      });
    })
    .then(response => response.json())
    .then(answer => peerConnection.setRemoteDescription(
      new RTCSessionDescription(answer)
    ))
    .catch(error => console.error('SFU connection failed:', error));

  return peerConnection;
}
```
Advantages of SFU Architecture:
- Better Scalability: Each participant sends only one outgoing stream
- Reduced Client Load: Clients receive only the streams they need
- Bandwidth Efficiency: Upload bandwidth requirements remain constant regardless of participant count
Disadvantages of SFU Architecture:
- Server Requirements: Requires deploying and maintaining SFU servers
- Increased Latency: Adding a server hop increases delay slightly
- Server Bandwidth: SFU servers need substantial bandwidth
I've found SFU architecture to be the sweet spot for most multi-party WebRTC applications. It scales well without the complexity of an MCU, and the slight increase in latency is rarely noticeable to users.
Multipoint Control Unit (MCU): Maximum Control
An MCU takes scaling a step further by not only routing media but also processing it. The MCU receives streams from all participants, decodes them, combines them (often into a single composite view), re-encodes the result, and sends it to participants:
```javascript
// Client-side code for connecting to an MCU is similar to SFU
// The main difference is in how streams are handled
function connectToMCU(mcuUrl, localStream, roomId) {
  const peerConnection = new RTCPeerConnection(configuration);

  // Add local tracks
  localStream.getTracks().forEach(track => {
    peerConnection.addTrack(track, localStream);
  });

  // With an MCU, we typically receive just one stream
  // containing a composite view of all participants
  peerConnection.ontrack = event => {
    // Display the composite stream
    displayCompositeStream(event.streams[0]);
  };

  // Control channel for layout changes, etc.
  const controlChannel = peerConnection.createDataChannel('control');
  controlChannel.onopen = () => {
    // Join the room
    controlChannel.send(JSON.stringify({
      type: 'join',
      roomId: roomId
    }));
  };

  // We can send layout preferences to the MCU
  function changeLayout(layout) {
    controlChannel.send(JSON.stringify({
      type: 'layout',
      layout: layout // e.g., 'grid', 'spotlight', 'presentation'
    }));
  }

  // Establish connection similar to SFU example
  // ...

  return {
    peerConnection,
    changeLayout
  };
}
```
Advantages of MCU Architecture:
- Maximum Scalability: Clients receive only one stream regardless of participant count
- Consistent Quality: The MCU can adapt the composite stream to each client's capabilities
- Layout Control: The server can control how participants are displayed
Disadvantages of MCU Architecture:
- High Server Requirements: Decoding and encoding streams is CPU-intensive
- Higher Latency: Processing media adds delay
- Complexity and Cost: MCUs are more complex and expensive to deploy and maintain
MCU architecture shines in scenarios with large numbers of participants or where you need precise control over the user experience. For example, I worked on a virtual event platform that used an MCU to create a "TV-like" experience with professional transitions and layouts, something that wouldn't be possible with an SFU.
Hybrid Architectures: The Best of All Worlds
In practice, many large-scale WebRTC applications use hybrid architectures that combine elements of mesh, SFU, and MCU approaches:
Cascaded SFUs
For geographic distribution and massive scale:
```
Participants ──> Regional SFU ──> Core SFU ──> Regional SFU ──> Participants
```
This approach reduces latency by keeping media routing close to participants while enabling global scale.
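The routing decision behind a cascaded deployment can be sketched as follows. The server names and participant shape are illustrative, not taken from any particular SFU:

```javascript
// Decide how media travels between two participants in a cascaded-SFU
// deployment: stay on one regional SFU when possible, and cross the
// core SFU only for inter-region traffic.
function routeStream(sender, receiver) {
  if (sender.region === receiver.region) {
    // Single regional hop keeps latency low
    return [`sfu-${sender.region}`];
  }
  // Inter-region traffic is relayed through the core
  return [`sfu-${sender.region}`, 'sfu-core', `sfu-${receiver.region}`];
}
```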
SFU with Selective MCU
Use an SFU for most participants but an MCU for specific features:
```
                         ┌─> Participant A
                         │
Participant X ──> SFU ───┼─> Participant B
                         │
                         └─> MCU ──> Recording Service
```
This gives you the efficiency of an SFU with the control of an MCU where needed.
Dynamic Mesh-to-Server Switching
Start with mesh for small groups, then transition to server-based architecture as more participants join:
```javascript
function createScalableRoom(roomId) {
  let participants = [];
  let connections = {};
  let serverConnection = null;

  function addParticipant(participant) {
    participants.push(participant);

    // If we're below the threshold, use mesh
    if (participants.length <= 4) {
      // Create peer connections to all existing participants
      participants.forEach(existingParticipant => {
        if (existingParticipant.id !== participant.id) {
          const connectionId = `${participant.id}-${existingParticipant.id}`;
          connections[connectionId] = createPeerConnection(
            participant,
            existingParticipant
          );
        }
      });
    }
    // If we've crossed the threshold, switch to server architecture
    else if (participants.length === 5) {
      // Disconnect all mesh connections
      Object.values(connections).forEach(conn => conn.close());
      connections = {};

      // Connect everyone to the server
      serverConnection = connectToMediaServer(participants);
    }
    // If we're already using the server, just add the new participant
    else if (serverConnection) {
      serverConnection.addParticipant(participant);
    }
  }

  // Rest of the implementation...

  return {
    addParticipant,
    removeParticipant: function(participantId) { /* ... */ },
    // Other room management functions
  };
}
```
I implemented a similar approach for a collaborative workspace application. Small team meetings (2-4 people) used direct mesh connections for minimal latency, but when additional participants joined or screen sharing began, we seamlessly transitioned to an SFU architecture.
Real-World Scaling Strategies
Beyond choosing the right architecture, here are practical strategies I've used to scale WebRTC applications:
Bandwidth Management
Control bandwidth usage to support more participants:
```javascript
// Dynamically adjust video quality based on participant count
function adjustVideoQuality(participantCount, localStream) {
  const videoTrack = localStream.getVideoTracks()[0];
  if (!videoTrack) return;

  // Define quality levels
  const qualityLevels = {
    high: { width: 1280, height: 720, frameRate: 30 },
    medium: { width: 640, height: 480, frameRate: 24 },
    low: { width: 320, height: 240, frameRate: 15 },
    minimal: { width: 160, height: 120, frameRate: 10 }
  };

  // Select quality based on participant count
  let quality;
  if (participantCount <= 4) {
    quality = qualityLevels.high;
  } else if (participantCount <= 9) {
    quality = qualityLevels.medium;
  } else if (participantCount <= 16) {
    quality = qualityLevels.low;
  } else {
    quality = qualityLevels.minimal;
  }

  // Apply constraints (applyConstraints returns a promise, so callers
  // can await it or handle rejection if the device can't comply)
  return videoTrack.applyConstraints(quality);
}
```
Simulcast and Layered Coding
Use simulcast to send multiple quality levels, allowing the server or receivers to select the appropriate one:
```javascript
// Enable simulcast for scalability.
// Simulcast layers must be declared when the transceiver is created
// (via sendEncodings); they cannot be added later with setParameters,
// which can only adjust encodings that were already negotiated.
function addVideoWithSimulcast(peerConnection, videoTrack, stream) {
  return peerConnection.addTransceiver(videoTrack, {
    direction: 'sendonly',
    streams: [stream],
    sendEncodings: [
      { rid: 'high', maxBitrate: 900000, scaleResolutionDownBy: 1 },
      { rid: 'medium', maxBitrate: 300000, scaleResolutionDownBy: 2 },
      { rid: 'low', maxBitrate: 100000, scaleResolutionDownBy: 4 }
    ]
  });
}

// Later, individual layers can be toggled or retuned via setParameters
function setLayerActive(sender, rid, active) {
  const parameters = sender.getParameters();
  const encoding = parameters.encodings.find(e => e.rid === rid);
  if (!encoding) return Promise.resolve();
  encoding.active = active;
  return sender.setParameters(parameters);
}
```
Intelligent Stream Subscription
Only subscribe to streams that are relevant to the user:
```javascript
// In an SFU-based system, selectively subscribe to streams
function manageSubscriptions(visibleParticipants, allParticipants, sfuConnection) {
  // Determine which participants are currently visible in the UI
  const visibleIds = visibleParticipants.map(p => p.id);

  // For each participant, decide whether to subscribe to their stream
  allParticipants.forEach(participant => {
    const isVisible = visibleIds.includes(participant.id);
    const isActiveSpeaker = participant.isActiveSpeaker;

    // Subscribe to visible participants and active speakers
    if (isVisible || isActiveSpeaker) {
      sfuConnection.subscribe(participant.id, { video: true, audio: true });
    }
    // For non-visible participants, only subscribe to audio
    else {
      sfuConnection.subscribe(participant.id, { video: false, audio: true });
    }
  });
}
```
This approach is particularly effective for large meetings where only a few participants are visible at any given time. By subscribing only to the streams that are actually being displayed, you can support many more participants.
Load Balancing and Geographic Distribution
For global applications, distribute your media servers geographically:
```javascript
// Select the optimal media server based on user location
async function selectOptimalMediaServer(userRegion) {
  // Get list of available media servers
  const response = await fetch('/api/media-servers');
  const servers = await response.json();

  // Filter servers by health status
  const healthyServers = servers.filter(server => server.status === 'healthy');

  // First try: find a server in the user's region
  const regionalServer = healthyServers.find(
    server => server.region === userRegion
  );
  if (regionalServer) return regionalServer;

  // Second try: find a server in a nearby region
  const nearbyRegions = getNearbyRegions(userRegion);
  const nearbyServer = healthyServers.find(
    server => nearbyRegions.includes(server.region)
  );
  if (nearbyServer) return nearbyServer;

  // Fallback: select the server with the lowest load
  return healthyServers.sort((a, b) => a.currentLoad - b.currentLoad)[0];
}
```
I implemented a similar system for a global education platform. By routing users to the nearest media server, we reduced latency and improved the user experience, particularly for international connections.
Scaling Challenges and Solutions
Even with the right architecture, scaling WebRTC applications presents unique challenges. Here are some I've encountered and the solutions I've implemented:
Challenge: Signaling Server Scalability
As your user base grows, your signaling server can become a bottleneck.
Solution: Horizontally Scaled Signaling
```javascript
// Server-side pseudocode for scalable signaling
function createScalableSignalingServer() {
  // Use Redis for pub/sub across multiple server instances
  const redisClient = createRedisClient();
  const redisPublisher = redisClient.duplicate();

  // Subscribe to messages for all rooms
  redisClient.psubscribe('room:*');

  // When a message arrives from Redis, send it to connected clients
  redisClient.on('pmessage', (pattern, channel, rawMessage) => {
    // Redis delivers strings, so deserialize before routing
    const message = JSON.parse(rawMessage);
    const roomId = channel.split(':')[1];
    const connectedClients = getClientsInRoom(roomId);

    connectedClients.forEach(client => {
      if (client.id !== message.senderId) {
        client.send(message.data);
      }
    });
  });

  // When a client sends a message, publish it to Redis
  function handleClientMessage(client, message) {
    const roomId = client.roomId;
    redisPublisher.publish(`room:${roomId}`, JSON.stringify({
      senderId: client.id,
      data: message
    }));
  }

  // Rest of the signaling server implementation...
}
```
This approach allows you to run multiple signaling server instances behind a load balancer, with Redis (or another pub/sub system) ensuring messages are properly routed between instances.
Challenge: TURN Server Capacity
TURN servers can become overloaded as your user base grows.
Solution: TURN Server Pools and Monitoring
```javascript
// Client-side code for TURN server selection
function getTurnServers() {
  return fetch('/api/turn-credentials')
    .then(response => response.json())
    .then(data => {
      // The server provides a list of TURN servers with credentials
      return data.turnServers.map(server => ({
        urls: server.urls,
        username: data.username,
        credential: data.credential
      }));
    });
}

// Server-side pseudocode for TURN server management
function allocateTurnServer(request) {
  // Get all available TURN servers
  const turnServers = getTurnServerPool();

  // Filter by region for lower latency
  const userRegion = geolocateUser(request.ip);
  const regionalServers = turnServers.filter(
    server => server.region === userRegion
  );

  // Select servers with the lowest current load
  const selectedServers = (regionalServers.length > 0 ? regionalServers : turnServers)
    .sort((a, b) => a.currentLoad - b.currentLoad)
    .slice(0, 3); // Provide multiple servers for redundancy

  // Generate time-limited credentials
  const credentials = generateTurnCredentials();

  return {
    turnServers: selectedServers,
    username: credentials.username,
    credential: credentials.password
  };
}
```
By monitoring TURN server usage and dynamically allocating servers based on load and geographic proximity, you can ensure reliable connectivity even as your application scales.
Challenge: Recording and Archiving at Scale
Recording WebRTC sessions becomes increasingly complex at scale.
Solution: Distributed Recording Architecture
```javascript
// Server-side pseudocode for scalable recording
function startRecording(roomId) {
  // Allocate a recording worker
  const recordingWorker = allocateRecordingWorker();

  // Create a special participant that joins the room
  const recordingParticipant = createRecordingParticipant(roomId);

  // The recording participant subscribes to all streams
  subscribeToAllStreams(recordingParticipant, roomId);

  // Start recording process on the worker
  recordingWorker.startRecording(recordingParticipant.streams, {
    roomId: roomId,
    timestamp: Date.now(),
    format: 'mp4',
    layout: 'grid'
  });

  // Store recording metadata
  storeRecordingMetadata(roomId, {
    workerId: recordingWorker.id,
    startTime: Date.now(),
    status: 'recording'
  });

  return {
    recordingId: generateRecordingId(roomId),
    status: 'started'
  };
}
```
This approach separates recording concerns from your main media servers, allowing you to scale recording capacity independently.
Case Studies in WebRTC Scaling
Let me share some real-world examples of how I've scaled WebRTC applications for different use cases:
Case Study 1: Virtual Classroom (50-100 Participants)
For an educational platform, we needed to support classes with one teacher and up to 100 students. Our solution:
- Hybrid Architecture: SFU for routing with selective MCU for recordings
- Role-Based Quality: Teacher stream at high quality, student streams at lower quality
- Dynamic Subscription: Only subscribe to video for active speakers and "spotlighted" students
- Bandwidth Classes: Different quality tiers based on participant's network capabilities
This architecture supported classes with 100+ participants while maintaining a smooth experience for all users.
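The role-based quality idea boils down to a small policy function. The tiers and numbers below are illustrative, not the exact values from that deployment:

```javascript
// Pick capture constraints by role: the teacher's stream gets priority,
// spotlighted students mid quality, everyone else thumbnail quality.
function constraintsForRole(role, isSpotlighted = false) {
  if (role === 'teacher') {
    return { width: 1280, height: 720, frameRate: 30 };
  }
  if (isSpotlighted) {
    return { width: 640, height: 480, frameRate: 24 };
  }
  return { width: 320, height: 240, frameRate: 15 };
}
```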
Case Study 2: Company All-Hands (1000+ Viewers)
For company-wide meetings, we implemented:
- Broadcast Model: Small group of presenters connected via SFU
- HLS Integration: WebRTC streams were converted to HLS for large-scale viewing
- Selective Interactivity: Questions from the audience were selectively brought into the WebRTC session
- Regional Distribution: Media servers in multiple regions with intelligent routing
This hybrid WebRTC/HLS approach supported thousands of viewers while maintaining the interactive elements essential for company meetings.
Case Study 3: Virtual Conference (Parallel Sessions)
For a virtual conference platform with multiple simultaneous sessions:
- Isolated Media Servers: Dedicated media server instances for each active session
- Dynamic Provisioning: Automatically scaling infrastructure based on scheduled sessions
- Shared TURN Infrastructure: Common TURN server pool across all sessions
- Optimized Recording: Separate recording infrastructure to avoid impacting live sessions
This architecture supported dozens of parallel sessions with 50-100 participants each, providing a reliable experience throughout a multi-day conference.
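The dynamic-provisioning logic reduced to a simple capacity calculation over the session schedule. The per-server capacity figure is an assumption for illustration:

```javascript
// Estimate how many media-server instances to provision for a set of
// scheduled sessions, given an assumed per-server participant capacity.
// Each session gets dedicated instances, so we round up per session.
function serversNeeded(sessions, capacityPerServer = 100) {
  return sessions.reduce(
    (total, session) =>
      total + Math.ceil(session.expectedParticipants / capacityPerServer),
    0
  );
}
```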
The Future of WebRTC Scaling
As WebRTC continues to evolve, several emerging technologies promise to further improve scalability:
WebTransport and QUIC
The WebTransport API, built on QUIC, offers faster connection establishment and low-latency transport, which could enhance scaling capabilities.
WebRTC Insertable Streams
The Insertable Streams API enables processing of media before encoding or after decoding, opening possibilities for more efficient client-side processing and custom scaling solutions.
WebAssembly for Media Processing
WebAssembly enables more efficient media processing directly in the browser, potentially allowing for more complex client-side operations without sacrificing performance.
Balancing Scale, Quality, and Cost
Throughout my career implementing WebRTC solutions, I've learned that scaling is always a balance between three factors:
- Scale: How many participants you need to support
- Quality: The audio/video quality and interactivity you want to maintain
- Cost: The infrastructure and operational expenses you can justify
There's no one-size-fits-all solution. A virtual classroom has different requirements than a webinar platform, which differs from a social media live streaming service.
The key is to start with a clear understanding of your specific requirements and constraints, then choose the architecture and scaling strategies that best align with your goals. Be prepared to evolve your approach as your application grows and user needs change.
Putting It All Together
Scaling WebRTC applications beyond simple peer-to-peer connections requires careful architecture selection and implementation. By understanding the strengths and limitations of mesh networks, SFUs, MCUs, and hybrid approaches, you can build applications that support the number of participants you need while maintaining quality and performance.
Remember that scaling isn't just about technology—it's about creating the best possible experience for your users. Sometimes, a simpler architecture with thoughtful optimizations can outperform a more complex solution.
In our next article, we'll explore another crucial aspect of WebRTC: mobile development. We'll see how to adapt WebRTC applications for mobile devices, addressing the unique challenges and opportunities they present.
---
This article is part of our WebRTC Essentials series, where we explore the technologies that power modern real-time communication. Join us in the next installment as we dive into Mobile WebRTC Development.
