WebRTC fundamentals

radioman edited this page Jan 28, 2016 · 2 revisions

What is WebRTC?

In general it's a set of protocols and APIs that can be used to transfer (live) audio and/or video content between native clients and/or web clients over the network. It can also establish data channels. The native WebRTC code is largely built on libjingle, Google's library that implements Jingle, the XMPP extension used by Google Talk.

How do the peers connect to each other in general?

  • both peers use the same signaling service;
  • one of the peers calls the other (remote) peer;
  • the other peer accepts the call;
  • the peers negotiate the WebRTC connection details;
  • the peers establish a direct channel between each other (this is the optimal case) to transmit audio/video.

Briefly about signaling

  • signaling protocols and mechanisms are not defined by WebRTC standards;
  • a signaling service is required to initiate a WebRTC session;

What is a signaling service?

Two peers that want to communicate may be located in two different places on Earth. They must somehow get in touch with each other to begin the WebRTC negotiation. A signaling service helps the peers locate each other and exchange the handshake data. WebRTC doesn't contain a built-in signaling service; you can write your own (as I did) or use any other infrastructure that suits the purpose, e.g. libjingle.

The signaling service matters most until the WebRTC negotiation has succeeded, since until then it is the only way the peers can communicate. The negotiation requires exchanging an offer, an answer and ICE candidates (the candidates are the key to establishing a direct connection between the peers), and all of this travels over the signaling service. Once the negotiation has succeeded, the peers can begin communicating over the WebRTC infrastructure. In the optimal case there is a direct connection between the peers at this point.
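
To make the role of the signaling service more concrete, here is a minimal sketch of an in-memory relay that only forwards offers, answers and ICE candidates between registered peers. Everything in it (`SignalingRelay`, `SignalMessage`, the peer ids) is hypothetical and made up for illustration; it is not part of WebRTC or libjingle, and a real service would sit behind a network transport such as WebSocket or XMPP.

```cpp
#include <deque>
#include <map>
#include <string>
#include <utility>

// Kinds of messages the peers exchange during WebRTC negotiation.
enum class SignalType { Offer, Answer, IceCandidate };

// One signaling message. The payload (SDP text or a serialized ICE
// candidate) is opaque to the relay: it only forwards it.
struct SignalMessage {
    std::string from;     // sender's peer id (hypothetical naming scheme)
    SignalType type;
    std::string payload;
};

// Hypothetical in-memory signaling relay with one mailbox per peer.
class SignalingRelay {
public:
    // Creates an empty mailbox for the peer if it does not exist yet.
    void Register(const std::string& peer_id) { mailboxes_[peer_id]; }

    // Queues a message for `to`; returns false if that peer is unknown.
    bool Send(const std::string& to, SignalMessage msg) {
        auto it = mailboxes_.find(to);
        if (it == mailboxes_.end()) return false;
        it->second.push_back(std::move(msg));
        return true;
    }

    // Pops the oldest pending message for `peer_id` into `*out`.
    // Returns false if the peer is unknown or has no pending messages.
    bool Poll(const std::string& peer_id, SignalMessage* out) {
        auto it = mailboxes_.find(peer_id);
        if (it == mailboxes_.end() || it->second.empty()) return false;
        *out = std::move(it->second.front());
        it->second.pop_front();
        return true;
    }

private:
    std::map<std::string, std::deque<SignalMessage>> mailboxes_;
};
```

In this model the caller would `Send("callee", {"caller", SignalType::Offer, sdp})` and the callee would `Poll("callee", &msg)` to pick the offer up. The relay never inspects the payload, which matches the point that signaling is outside the WebRTC standards.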

What is the role of STUN/TURN servers?

Most computers are not directly connected to the internet but sit behind a firewall or NAT, so two peers can't easily connect to each other unless they're on the same subnet. This is where STUN/TURN servers come into the picture.

In my understanding a STUN server is just a machine that tells the caller its public IP address and port as seen from outside its NAT. In the optimal case this is enough for the peers to find a working path to each other (using the ICE protocol). If that's not possible for some reason, a TURN server acts as a relay: both peers connect to the TURN server, and it makes sure that all data intended for each peer reaches its destination. This obviously requires more resources than a direct connection, so TURN servers are used to relay traffic only if the direct (peer-to-peer) connection fails.
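
One way to see why ICE tries direct paths first and falls back to TURN last is the candidate priority formula from RFC 8445 (section 5.1.2). The sketch below computes it with the type preference values that the RFC recommends. The function and enum names are my own for illustration and are not taken from the WebRTC native API; an actual implementation may pick different preference values.

```cpp
#include <cstdint>

// Candidate types in ICE terms:
//   Host            - address of a local network interface
//   PeerReflexive   - address discovered during connectivity checks
//   ServerReflexive - public address learned from a STUN server
//   Relayed         - address allocated on a TURN server
enum class CandidateType { Host, PeerReflexive, ServerReflexive, Relayed };

// Type preferences recommended by RFC 8445: the more direct the path,
// the higher the value. Relayed (TURN) candidates get the lowest one.
inline std::uint32_t TypePreference(CandidateType type) {
    switch (type) {
        case CandidateType::Host:            return 126;
        case CandidateType::PeerReflexive:   return 110;
        case CandidateType::ServerReflexive: return 100;
        case CandidateType::Relayed:         return 0;
    }
    return 0;
}

// priority = 2^24 * type_pref + 2^8 * local_pref + (256 - component_id)
// local_pref is in [0, 65535]; component_id is in [1, 256] (1 = RTP).
inline std::uint32_t CandidatePriority(CandidateType type,
                                       std::uint32_t local_pref,
                                       std::uint32_t component_id) {
    return (TypePreference(type) << 24) + (local_pref << 8) +
           (256 - component_id);
}
```

With `local_pref = 65535` and `component_id = 1`, a host candidate gets priority 2130706431, a server reflexive (STUN) candidate 1694498815 and a relayed (TURN) candidate 16777215, so candidate pairs that use the relay are tried after the direct ones.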

What are the exact steps for two peers to connect and start exchanging data?

The native C/C++ (hybrid) application I wrote connects to another native WebRTC application. The peer that initiates the connection is called the 'caller', and the remote peer that accepts the connection is the 'callee'. The goal is simple: the caller must transfer the microphone signal to the callee. These are the exact steps the peers have to perform:

  • caller instantiates a PeerConnectionFactory and a PeerConnectionInterface with ICE server, username and password;
  • callee instantiates a PeerConnectionFactory and a PeerConnectionInterface with ICE server, username and password;
  • caller creates audio/video tracks;
  • caller generates an offer;
  • caller sends the offer to the callee;
  • callee sets the received offer as its remote description (kOffer);
  • callee generates an answer, sends it back to the caller and sets it as its local description (kAnswer);
  • callee begins receiving its local ICE candidates and stores them;
  • caller sets the offer (which was sent to the callee above) as its local description (kOffer);
  • caller begins receiving its local ICE candidates and stores them;
  • caller sets the callee's answer as its remote description (kAnswer);
  • both caller and callee wait until no more ICE candidates arrive, and then they exchange their candidates over the signaling service;
  • from this point on, the caller has to process the ICE candidates from the callee, and the callee those from the caller;
  • both caller and callee call AddIceCandidate(...) on the received candidates until it succeeds;
  • when the connection is up, the connection state flags change to indicate it;
  • if the connection breaks, callback functions indicate it, e.g. OnIceConnectionChange(kIceConnectionDisconnected).
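
The ordering constraints in the steps above can be sketched as a small state model. This is a simplified stand-in written for illustration, not the real PeerConnectionInterface: the class `NegotiationModel` and its methods are hypothetical, it tracks only three signaling states, and it assumes that a remote candidate can be added only after a remote description has been set.

```cpp
#include <string>
#include <vector>

enum class SdpType { Offer, Answer };
enum class SignalingState { Stable, HaveLocalOffer, HaveRemoteOffer };

// Hypothetical model of one side of the offer/answer exchange.
class NegotiationModel {
public:
    // Caller: Stable -> HaveLocalOffer (own offer).
    // Callee: HaveRemoteOffer -> Stable (own answer).
    bool SetLocalDescription(SdpType type) {
        if (type == SdpType::Offer && state_ == SignalingState::Stable) {
            state_ = SignalingState::HaveLocalOffer;
            return true;
        }
        if (type == SdpType::Answer &&
            state_ == SignalingState::HaveRemoteOffer) {
            state_ = SignalingState::Stable;
            return true;
        }
        return false;  // description does not fit the current state
    }

    // Callee: Stable -> HaveRemoteOffer (received offer).
    // Caller: HaveLocalOffer -> Stable (received answer).
    bool SetRemoteDescription(SdpType type) {
        bool ok = false;
        if (type == SdpType::Offer && state_ == SignalingState::Stable) {
            state_ = SignalingState::HaveRemoteOffer;
            ok = true;
        } else if (type == SdpType::Answer &&
                   state_ == SignalingState::HaveLocalOffer) {
            state_ = SignalingState::Stable;
            ok = true;
        }
        if (ok) has_remote_description_ = true;
        return ok;
    }

    // Accepts a candidate received from the other peer. Fails until a
    // remote description is set, so callers may need to retry later.
    bool AddIceCandidate(const std::string& candidate) {
        if (!has_remote_description_) return false;
        remote_candidates_.push_back(candidate);
        return true;
    }

    SignalingState state() const { return state_; }
    const std::vector<std::string>& remote_candidates() const {
        return remote_candidates_;
    }

private:
    SignalingState state_ = SignalingState::Stable;
    bool has_remote_description_ = false;
    std::vector<std::string> remote_candidates_;
};
```

In this model the callee calls `SetRemoteDescription(SdpType::Offer)` and then `SetLocalDescription(SdpType::Answer)`, while the caller calls `SetLocalDescription(SdpType::Offer)` and then `SetRemoteDescription(SdpType::Answer)`, after which both sides are back in `Stable`. A candidate passed to `AddIceCandidate` before the remote description is set is rejected, which is one plausible reason for retrying the call "until it succeeds".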