WebRTC - How does it work?

— Originally published at gitlab.com

What is WebRTC?

WebRTC stands for Web Real-Time Communication. It lets audio, video and data streams work inside webpages through
direct peer-to-peer communication, eliminating the need to install plugins or download native apps.

It is supported by Apple, Google, Microsoft, Mozilla and Opera, and is a standard way to implement peer-to-peer
communication.

High level flow of WebRTC

  • A wants to connect to B
  • A finds out all possible ways that the public can connect to it
  • B finds out all possible ways that the public can connect to it
  • A and B exchange session information (supported codecs, mode of communication) over a separate signaling channel (long polling, short polling, WebSockets)
  • A connects to B via the best available path
  • A & B can now exchange their supported media

Components at work

WebRTC is fairly complicated: it needs a lot of components working in harmony to establish a P2P
connection. We should know about the following components before we go ahead and create a WebRTC app.

  1. NAT, types of NAT
  2. STUN / TURN servers
  3. Signaling, Session Description Protocol (SDP)
  4. ICE, trickle ICE
  5. JS API: RTCPeerConnection, RTCDataChannel, createOffer(), createAnswer(), setLocalDescription(), setRemoteDescription(), and so on.

What is NAT?

NAT stands for Network Address Translation. A device at home or in an office usually reaches the internet through a
router, so two networks are at play: the public network (the internet) and the private network (your LAN). Each device
on the LAN has its own private IP, and all of them share the router's public IP when they talk to the internet.

You can check your public IP by visiting whatismyipaddress.com.
To check your private IP, open a terminal and run ipconfig (Windows) or ifconfig / ip addr (Linux),
and you will get your local IP address.

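So, if your device (for ex. private IP 10.0.0.2, port 8992) wants to talk to a server or another device on the internet
(public IP 4.4.4.4, port 80), the packet needs a public source address: private addresses are not routable on the
internet, so a reply would have nowhere to go. In the other direction, the router does not let outsiders reach into the
private network unless a device inside started the conversation. This blocks unsolicited connections, so it works like a
basic firewall.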

Who gives your packets a public IP? Who makes sure the response from the internet gets back to your device?

The router. It acts as a middleman between your device and the public network / internet.

Here is how it works:

A (private IP 10.0.0.2, port 8992) wants to connect to B (public IP 4.4.4.4, port 80).

A creates a request packet like this:

Source port | Source IP | Request | Destination IP | Destination port
8992 | 10.0.0.2 | GET / | 4.4.4.4 | 80

A forwards the packet to the router, and the router has to decide: is the destination on the same network or on an external one?

The router checks this with a process called subnet matching.

If the two devices are on the same subnet (for example both 10.0.0.y), there is no need to create a NAT table entry. If they are not, an entry is created in the NAT table. The entry records that 10.0.0.2:8992 was rewritten to the router's public address and an external port (say 5.5.5.5:3333), so that the reply can be translated back.

The router makes this decision with a bitwise AND of an address and the subnet mask:

10.0.0.1 & 255.255.255.0 -> 10.0.0.0 (say 10.0.0.1 is the gateway and 255.255.255.0 is the mask, so the subnet is 10.0.0.0)

If the destination IP belongs to the same network, the same operation gives the same subnet, 10.0.0.0. Otherwise it gives
a different one (4.4.4.4 & 255.255.255.0 -> 4.4.4.0), which means we are trying to connect to an IP outside the private network.
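
The subnet check above can be sketched in a few lines. This is only an illustration of the bitwise AND, not how any particular router implements it; the function names (ipToInt, networkOf, needsNat) are made up for this example and it handles IPv4 only.

```typescript
// Convert a dotted-quad IPv4 string into an unsigned 32-bit number.
function ipToInt(ip: string): number {
  const parts = ip.split(".");
  if (parts.length !== 4 || !parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255)) {
    throw new Error("invalid IPv4 address: " + ip);
  }
  return parts.reduce((acc, p) => acc * 256 + Number(p), 0);
}

// Convert an unsigned 32-bit number back into dotted-quad form.
function intToIp(n: number): string {
  return [24, 16, 8, 0].map((shift) => (n >>> shift) & 255).join(".");
}

// Bitwise AND of an address with the subnet mask gives the network address.
function networkOf(ip: string, mask: string): string {
  return intToIp((ipToInt(ip) & ipToInt(mask)) >>> 0);
}

// The destination is outside the LAN (and needs a NAT entry)
// when its network differs from the gateway's network.
function needsNat(gateway: string, mask: string, destination: string): boolean {
  return networkOf(gateway, mask) !== networkOf(destination, mask);
}
```

With the addresses from the example, needsNat("10.0.0.1", "255.255.255.0", "10.0.0.2") is false (same LAN), while needsNat("10.0.0.1", "255.255.255.0", "4.4.4.4") is true.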

Types of NAT

  1. One-to-one NAT (Full Cone NAT)
  2. Address Restricted NAT
  3. Port Restricted NAT
  4. Symmetric NAT -> WebRTC does not like this one, and neither does any real-time gaming app

One to One NAT (Full Cone NAT)

Int port | Int IP | External IP | External port | Dest IP | Dest port
8992 | 10.0.0.2 | 5.5.5.5 | 3333 | 4.4.4.4 | 80

Outbound: You send a packet to any server.
Inbound: Any external IP & port can send to your mapped public IP / port, once it's open.

✅ Most permissive
Good for P2P, pretty bad for security

Address Restricted NAT

Int port | Int IP | External IP | External port | Dest IP | Dest port
8992 | 10.0.0.2 | 5.5.5.5 | 3333 | 4.4.4.4 | 80

Outbound: You send to an external IP.
Inbound: Only that same IP can respond to your public IP/port - any port from that IP is allowed.

✅ Moderate restriction
Safer than full cone

Port Restricted NAT

Int port | Int IP | External IP | External port | Dest IP | Dest port
8992 | 10.0.0.2 | 5.5.5.5 | 3333 | 4.4.4.4 | 80

Outbound: You send to an external IP & port.
Inbound: Only that exact IP & port can respond.

✅ More restrictive
Tighter control on incoming traffic

Symmetric NAT

Int port | Int IP | External IP | External port | Dest IP | Dest port
8992 | 10.0.0.2 | 5.5.5.5 | 3333 | 4.4.4.4 | 80

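Outbound: Every distinct destination (IP:port) gets its own mapping, so the external port used towards one server differs from the one used towards another.
Inbound: Only the exact IP & port you contacted can respond, and only via the mapping that was created for it.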

Most restrictive
❌ Hardest for P2P
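
The inbound rules of the four NAT types can be compared side by side with a small model. This is a simplified sketch, not router code: the types and function names (NatEntry, allowsInbound, mappingKey) are invented for illustration, and the entry fields mirror the columns of the tables above.

```typescript
type NatType = "full-cone" | "address-restricted" | "port-restricted" | "symmetric";

interface Endpoint {
  ip: string;
  port: number;
}

// One NAT table row, matching the columns used in the tables above.
interface NatEntry {
  internal: Endpoint; // Int IP / Int port
  external: Endpoint; // External IP / External port
  dest: Endpoint;     // Dest IP / Dest port
}

// Would the router forward an inbound packet from `from` to the internal host?
function allowsInbound(nat: NatType, entry: NatEntry, from: Endpoint): boolean {
  if (nat === "full-cone") return true;
  const sameIp = from.ip === entry.dest.ip;
  if (nat === "address-restricted") return sameIp;
  // port-restricted and symmetric both require the exact IP and port.
  return sameIp && from.port === entry.dest.port;
}

// Key under which a mapping is stored. Cone NATs reuse one external port per
// internal endpoint; symmetric NAT allocates a separate one per destination.
function mappingKey(nat: NatType, internal: Endpoint, dest: Endpoint): string {
  const base = internal.ip + ":" + internal.port;
  return nat === "symmetric" ? base + "->" + dest.ip + ":" + dest.port : base;
}
```

The second function is what hurts P2P: under symmetric NAT the key for "me -> STUN server" differs from the key for "me -> peer", so the two destinations see different public ports.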

STUN (Session Traversal Utilities for NAT)

  • A STUN server has a single role: it receives a request from a peer and reflects that peer's public IP:port back to it.
  • Works well with full cone and address / port restricted NAT types
  • Does not work well with symmetric NAT: every new destination gets a new mapping in the NAT table, so the public
    IP:port pair returned by the STUN server is only valid for traffic to and from the STUN server, and cannot be used
    reliably to create a P2P connection.

  • Roughly 80 - 85% of devices are behind NATs where STUN works just fine. Only a minority (corporate networks, some
    mobile carriers, hotel / public wifi) sit behind symmetric NAT, and for those we need TURN servers.

A simple way to remember this:

You ask a STUN server: Who am I? (what is my public IP:port?)
STUN server: This is who you are (public IP:port)

  • STUN servers are cheap to run and maintain, and there are plenty of public STUN servers that you can use in your
    application. Implementing a STUN server from scratch would not be too complex either.
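
To make the "Who am I?" exchange concrete, here is a rough sketch of the two byte layouts involved, following the STUN specification (RFC 5389): the 20-byte Binding Request a client sends, and the XOR-MAPPED-ADDRESS value a server sends back. It covers IPv4 only and skips the actual UDP transport; the function names are made up for this example.

```typescript
// Fixed value defined by the STUN specification (RFC 5389).
const MAGIC_COOKIE = [0x21, 0x12, 0xa4, 0x42];

// The question: a 20-byte Binding Request header with no attributes.
function buildBindingRequest(transactionId: number[]): number[] {
  if (transactionId.length !== 12) throw new Error("transaction ID must be 12 bytes");
  return [
    0x00, 0x01, // message type: Binding Request
    0x00, 0x00, // message length: no attributes follow
    ...MAGIC_COOKIE,
    ...transactionId,
  ];
}

interface MappedAddress {
  ip: string;
  port: number;
}

// The answer: decode the 8-byte value of an IPv4 XOR-MAPPED-ADDRESS attribute.
// Layout: reserved byte, family byte (0x01 = IPv4), 2 port bytes, 4 address bytes.
function decodeXorMappedAddress(value: number[]): MappedAddress {
  if (value.length !== 8 || value[1] !== 0x01) {
    throw new Error("expected an 8-byte IPv4 XOR-MAPPED-ADDRESS value");
  }
  // The port is XORed with the top 16 bits of the cookie, the address with all of it.
  const port = ((value[2] ^ MAGIC_COOKIE[0]) << 8) | (value[3] ^ MAGIC_COOKIE[1]);
  const ip = value.slice(4).map((byte, i) => byte ^ MAGIC_COOKIE[i]).join(".");
  return { ip, port };
}
```

For the mapping used in the NAT tables above, the bytes 00 01 2c 17 24 17 a1 47 decode to 5.5.5.5 and port 3333.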

TURN (Traversal Using Relays around NAT)

  • As discussed above, in roughly 10 - 15% of cases a peer is behind a symmetric NAT, which makes it unreliable or
    outright impossible to create a P2P connection with STUN alone
  • TURN comes into the picture in such cases. A TURN server acts as a relay between the 2 peers, essentially
    • Peer A sends data to the TURN server
    • TURN server forwards that data to peer B
    • Peer B sends data to the TURN server
    • TURN server forwards that to peer A
  • The connection is no longer truly peer-to-peer: latency is higher, and as you can expect, TURN servers are costly
    to run because all the data flows through them and they pay for that bandwidth.

  • Typically a WebRTC client tries to connect the peers directly using STUN, and if that fails it falls back to TURN
    servers
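
In the browser, the STUN and TURN servers are handed to the connection as configuration. The sketch below builds such an object. The two interfaces are local stand-ins that mirror the field names of the browser's RTCConfiguration and RTCIceServer dictionaries, and the hostnames and credentials are placeholders, not real servers.

```typescript
// Local stand-ins for the browser's RTCIceServer / RTCConfiguration dictionaries,
// declared here so the sketch also compiles outside a browser.
interface IceServerEntry {
  urls: string | string[];
  username?: string;
  credential?: string;
}

interface PeerConfig {
  iceServers: IceServerEntry[];
  iceTransportPolicy: "all" | "relay";
}

// forceRelay = true restricts ICE to TURN relay candidates only,
// which is handy for testing that the TURN fallback really works.
function buildPeerConfig(forceRelay: boolean): PeerConfig {
  return {
    iceServers: [
      // Placeholder hosts: replace with your own STUN / TURN deployment.
      { urls: "stun:stun.example.org:3478" },
      { urls: "turn:turn.example.org:3478", username: "demo-user", credential: "demo-secret" },
    ],
    iceTransportPolicy: forceRelay ? "relay" : "all",
  };
}
```

In a page this would be used as new RTCPeerConnection(buildPeerConfig(false)); the browser then tries direct paths first and only uses the relay when nothing else connects.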

ICE (Interactive Connectivity Establishment) candidates

  • So we have a bunch of STUN and TURN servers set up now
  • We want to find out what works, i.e. is STUN enough or do we need to fall back on TURN?
  • Each peer first collects every address it might be reachable at: its local (host) addresses, the public address
    reported by STUN, and a relay address on the TURN server. This step is ICE gathering, and each address found is an
    ICE candidate
  • The candidates need to be shared with the other peer via signaling
  • The browsers then try out candidate pairs until they find ones that work, and use the best working pair for the
    connection
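
A candidate travels as a single line of text, and its priority number encodes the preference for direct paths over relayed ones. The sketch below parses such a line and computes a priority with the formula and type preferences recommended in the ICE specification (RFC 8445). The sample line and the function names are made up for illustration, and real candidate lines can carry extra fields that this parser ignores.

```typescript
type CandidateType = "host" | "srflx" | "prflx" | "relay";

// Type preferences recommended by RFC 8445: direct paths rank above relayed ones.
const TYPE_PREFERENCE: Record<CandidateType, number> = { host: 126, prflx: 110, srflx: 100, relay: 0 };

// priority = 2^24 * type preference + 2^8 * local preference + (256 - component ID)
function candidatePriority(type: CandidateType, localPreference: number, component: number): number {
  return 16777216 * TYPE_PREFERENCE[type] + 256 * localPreference + (256 - component);
}

interface ParsedCandidate {
  type: CandidateType;
  transport: string;
  address: string;
  port: number;
  priority: number;
}

// Field order: foundation component transport priority address port "typ" type ...
function parseCandidate(line: string): ParsedCandidate {
  const fields = line.replace(/^candidate:/, "").split(" ");
  const known: CandidateType[] = ["host", "srflx", "prflx", "relay"];
  const type = known.find((t) => t === fields[7]);
  if (fields.length < 8 || fields[6] !== "typ" || type === undefined) {
    throw new Error("not a recognizable ICE candidate line");
  }
  return {
    type,
    transport: fields[2],
    address: fields[4],
    port: Number(fields[5]),
    priority: Number(fields[3]),
  };
}

// Hypothetical server-reflexive candidate: the public address learned via STUN.
const SAMPLE_CANDIDATE =
  "candidate:1 1 udp 1694498815 5.5.5.5 3333 typ srflx raddr 10.0.0.2 rport 8992";
```

Here "host" is a local address, "srflx" (server reflexive) is the address a STUN server reported, and "relay" is an address on a TURN server.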

SDP (Session Description Protocol)

  • Session Description Protocol (SDP) is a plain-text format that describes one peer to the other
  • It lists the media and security options a peer supports, and what kind of connection is being requested
  • It is one of the most important parts of WebRTC: peers must exchange it via signaling before the connection setup
    (including ICE) can proceed
  • The goal is to generate an SDP and send it to the other party. How that is done depends on your method of signaling.
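
To give a feel for what that plain text looks like, here is a heavily trimmed, hand-written SDP and a small parser that pulls out the media sections and their codecs. A real offer from a browser is far longer (ICE credentials, fingerprints, many codecs); the sample and the function name are assumptions made for this example.

```typescript
interface MediaSection {
  kind: string;     // "audio", "video" or "application"
  codecs: string[]; // codec names taken from the a=rtpmap lines
}

// Walk the SDP line by line: "m=" opens a media section,
// "a=rtpmap:" lines inside it name the codecs that section offers.
function parseMediaSections(sdp: string): MediaSection[] {
  const sections: MediaSection[] = [];
  let current: MediaSection | undefined;
  for (const line of sdp.split(/\r?\n/)) {
    const media = /^m=(\S+) /.exec(line);
    const rtpmap = /^a=rtpmap:\d+ ([^/\s]+)\//.exec(line);
    if (media) {
      current = { kind: media[1], codecs: [] };
      sections.push(current);
    } else if (rtpmap && current) {
      current.codecs.push(rtpmap[1]);
    }
  }
  return sections;
}

// Hand-written sample, trimmed to the lines this sketch cares about.
const SAMPLE_SDP = [
  "v=0",
  "o=- 4611731400430051336 2 IN IP4 127.0.0.1",
  "s=-",
  "t=0 0",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111",
  "a=rtpmap:111 opus/48000/2",
  "m=video 9 UDP/TLS/RTP/SAVPF 96",
  "a=rtpmap:96 VP8/90000",
].join("\r\n");
```

Calling parseMediaSections(SAMPLE_SDP) yields an audio section offering opus and a video section offering VP8.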

Signaling

  • Signaling means exchanging the required information between both peers over some channel
  • This can be done via almost anything (WhatsApp, QR code, WebSockets, HTTP). WebSockets are a popular choice
  • What do we typically signal? SDP and ICE candidates. These need to be exchanged before a connection can be successfully
    established between two peers.
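
WebRTC does not define a signaling format, so every app invents its own. One possible shape, purely as an assumption for illustration, is a small JSON envelope with three message kinds. The transport (WebSocket, HTTP, anything else) only has to carry the resulting string.

```typescript
// A made-up envelope: WebRTC leaves the signaling format entirely to the app.
type SignalMessage =
  | { kind: "offer"; sdp: string }
  | { kind: "answer"; sdp: string }
  | { kind: "candidate"; candidate: string };

// Serialize a message before handing it to the signaling channel.
function encodeSignal(msg: SignalMessage): string {
  return JSON.stringify(msg);
}

// Parse and validate a message received from the signaling channel.
function decodeSignal(raw: string): SignalMessage {
  const data = JSON.parse(raw) as Record<string, unknown>;
  const kind = data.kind;
  const sdp = data.sdp;
  const candidate = data.candidate;
  if ((kind === "offer" || kind === "answer") && typeof sdp === "string") {
    return { kind, sdp };
  }
  if (kind === "candidate" && typeof candidate === "string") {
    return { kind: "candidate", candidate };
  }
  throw new Error("unrecognized signaling message");
}
```

Validating on receipt matters because the signaling channel is ordinary application traffic, and anything that arrives on it should be treated as untrusted input.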

Putting everything together... So close to connecting the peers now!

Finally we understand how all the components play in the WebRTC game, now it's time to connect

  • A wants to connect to B
  • A creates an "offer", an SDP holding all the information about the session (media, data channels)
  • A sets the offer as its local description and signals the SDP to B
  • At the same time, the ICE gathering process starts for A. The onicecandidate event handler is called each time an
    ICE candidate is discovered, and each candidate needs to be signaled to B separately
  • B sets A's offer as its remote description, creates the answer, sets it as its local description and signals it back to A
  • B starts its own ICE gathering process and signals its ICE candidates to A
  • A sets B's answer as its remote description
  • Once a candidate pair passes the connectivity checks, the connection is created.
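
The order of the offer / answer calls matters, and it can be modeled as a tiny state machine. The state names below are the ones RTCPeerConnection exposes through its signalingState property; the rest (the Step labels, applyStep, runSteps) is a simplified model written for this example. It leaves out provisional answers, rollback and the closed state.

```typescript
type SignalingState = "stable" | "have-local-offer" | "have-remote-offer";
type Step = "setLocal(offer)" | "setRemote(offer)" | "setLocal(answer)" | "setRemote(answer)";

// Which step is allowed in which state, and where it leads.
const TRANSITIONS: Record<SignalingState, Partial<Record<Step, SignalingState>>> = {
  "stable": { "setLocal(offer)": "have-local-offer", "setRemote(offer)": "have-remote-offer" },
  "have-local-offer": { "setRemote(answer)": "stable" },
  "have-remote-offer": { "setLocal(answer)": "stable" },
};

function applyStep(state: SignalingState, step: Step): SignalingState {
  const next = TRANSITIONS[state][step];
  if (next === undefined) throw new Error(step + " is not valid in state " + state);
  return next;
}

// Run a sequence of steps starting from a fresh connection ("stable").
function runSteps(steps: Step[]): SignalingState {
  let state: SignalingState = "stable";
  for (const step of steps) state = applyStep(state, step);
  return state;
}

// Peer A (caller) and peer B (callee) from the list above.
const CALLER_STEPS: Step[] = ["setLocal(offer)", "setRemote(answer)"];
const CALLEE_STEPS: Step[] = ["setRemote(offer)", "setLocal(answer)"];
```

Both runSteps(CALLER_STEPS) and runSteps(CALLEE_STEPS) end in "stable", which is the point where the descriptions are agreed and only the ICE connectivity checks remain. Applying an answer before any offer, for example applyStep("stable", "setRemote(answer)"), throws.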