Getting Unable to protect RTP: :replay_old #186

Open
asib opened this issue Jan 29, 2025 · 12 comments

@asib

asib commented Jan 29, 2025

Hey there! Firstly, thank you for all your work on this - I'm totally uneducated with respect to WebRTC, and yet even I was able to put together a PoC.

Context

I have an app to which I'm trying to add a voice chat feature. Specifically, it's a multiplayer game where I want it to be possible for all players to send and receive audio. I'm essentially copying the two components in live_ex_webrtc and stripping out the bits related to video to do this. That is to say, the architecture is:

  • there's a publisher liveview which has a peer connection for publishing that is connected to the browser
  • the publisher liveview broadcasts packets received on that PC over pubsub
  • there's a player liveview that is subscribed to the pubsub channel, and has its own PC, connected to the browser
  • when the player liveview receives a packet via pubsub, it sends it down to the browser via its PC, using pc.send_rtp(...) (see the sketch after this list)
  • both these liveviews are rendered on the same page (using live_render())
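A minimal sketch of that relay step in the player liveview, assuming the publisher broadcasts each packet over Phoenix.PubSub as {:rtp, packet} and that the outbound audio track id is kept in assigns (the message shape and assign names are illustrative, not from the actual app):

```elixir
# Player LiveView: forward every RTP packet received via pubsub to the
# browser over this liveview's peer connection.
def handle_info({:rtp, packet}, socket) do
  :ok =
    ExWebRTC.PeerConnection.send_rtp(
      socket.assigns.pc,
      socket.assigns.audio_track_id,
      packet
    )

  {:noreply, socket}
end
```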

The publisher PC isn't established until the user presses a button on the page to start streaming their microphone audio. They can press it again to stop streaming, by which I mean the PC is closed and all state is destroyed. If they press it a third time, the publisher PC is established completely fresh.

The player PC is established when the page is loaded, and never destroyed.

The problem

The problem I have is that, in testing, I noticed that if I start/stop the publisher stream (by pressing the aforementioned button repeatedly) enough times, I eventually see the message Unable to protect RTP: :replay_old in the server logs, and packets stop being sent to any player PCs connected to the server. This usually happens the third time I try to start the publisher stream; I'm not sure if that's significant.

I put a dbg(packet) into my code just before the pc.send_rtp() call, and noticed that I see the :replay_old message only when the sequence_number field of the ExRTP.Packet structs is lower than it was when the previous publisher PC was disconnected. Indeed, on one occasion I just left the publisher streaming long enough that the sequence number caught back up to where it had been previously, at which point the warning ceased and I could hear audio again.

I added some state to the player liveview to keep track of the last seen sequence number, and to rewrite any packets to use a sequence number one greater than the last seen, to ensure the player PC never sees a sequence number twice. This solved the problem, but it seems a bit haphazard.
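For reference, a minimal sketch of that workaround as a private helper in the player liveview (the helper name is hypothetical; the last seen sequence number lives in the liveview state, and 16-bit wraparound is handled with rem/2):

```elixir
# Rewrite the packet to carry the next sequence number after the last one
# the player PC has seen, so it never observes an older/repeated number.
defp rewrite_seq(%ExRTP.Packet{} = packet, last_seq) do
  new_seq = rem(last_seq + 1, 65_536)
  {%{packet | sequence_number: new_seq}, new_seq}
end
```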

So I'm wondering if I'm doing something wrong to begin with?

Appreciate any help and guidance you're able to give!

@LVala
Member

LVala commented Jan 29, 2025

Hi, appreciate the detailed description!

In general, you shouldn't try to feed the same track of the player PCs with different audio streams. Because you create/tear down the publisher PC every time you start/stop, the publisher produces a logically new stream each time, in a new sequence number and timestamp space (as you've noticed yourself).

Possible remedies:

  • using the enabled property of the publisher's audio track in the browser. Instead of tearing the PC down, along with its Elixir counterpart, you can just mute the track - the browser will keep producing (empty) audio packets, so the timestamps and sequence numbers will be as they should be,
  • if you need to tear down the publisher PC, you can remove the audio track from the player PCs (when the publisher is being destroyed), then add new tracks (when the publisher is created again) and push the audio to those tracks. This requires renegotiation though,
  • you can rewrite the timestamps and sequence numbers yourself - this is a bit more advanced and I would generally recommend against it (borderline anti-pattern in this case), but it's possible given that your publisher produces audio with the same codec and clock rate each time. You might be able to do that using ExWebRTC.RTP.Munger - it was created for a different purpose, but there's a chance it will work here as well: just create a munger instance and feed it the packets before sending them to the player PCs (see the sketch after this list). You can also rewrite the sequence numbers manually by adding the diff between the old and new audio streams (so the packets' sequence numbers appear monotonic); if you're lucky it will even work without rewriting the timestamps, but again, not recommended.
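A rough sketch of the Munger option, assuming Munger.new/1 takes the codec clock rate (48_000 for Opus) and Munger.munge/2 returns the rewritten packet together with the updated munger state - double-check the module docs for the exact signatures:

```elixir
alias ExWebRTC.{PeerConnection, RTP.Munger}

# Create one munger per player PC and keep it in the liveview state.
munger = Munger.new(48_000)

# Then, for every packet received from the publisher:
{packet, munger} = Munger.munge(munger, packet)
:ok = PeerConnection.send_rtp(player_pc, audio_track_id, packet)
```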

I haven't been actively involved in the project for a few months and I'm starting to forget things - try these, check if the issue is solved, and otherwise @mickel8 might be able to offer better assistance.

@mickel8
Member

mickel8 commented Jan 29, 2025

Perfect answer @LVala! 🎉

What else can I add? Maybe just two more words about the :replay_old error. It's thrown by the SRTP library, which relies on a sliding window that determines which sequence numbers can be encrypted. In particular, as @LVala said, when you create a new RTP stream, its sequence numbers and timestamps start from a random value. If this random value falls within the current SRTP sliding window, SRTP will accept the packet and encrypt it. Otherwise, it will drop it. If you wait long enough, the sequence numbers will eventually roll over and re-enter the sliding window - that's why you can hear the audio again after some time.
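A toy model of that window check (illustrative only - the real logic lives in the SRTP library and also tracks sequence number rollovers, which this deliberately ignores):

```elixir
defmodule ReplayWindowToy do
  # Accept a packet if its sequence number is newer than the highest seen
  # so far, or falls within the trailing replay window.
  def within_window?(seq, highest_seq, window_size \\ 128) do
    seq > highest_seq or seq >= highest_seq - window_size
  end
end

# A fresh stream starting at a random low number while the window sits
# near 60_000 is rejected until its numbers catch back up:
ReplayWindowToy.within_window?(3_500, 60_000)
#=> false
```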

I agree that muting the track or just renegotiating the connection is the easiest way of making the solution stable.

One more question, is your game one on one, or there can be more than two players in a single session?

I also wonder what the role of the timestamp is in the case of audio, because even if a new RTP stream fits into the SRTP sliding window, there is a high chance it will create a gap in the timestamps, which, I would have thought, should cause some problems too 🤔

@asib
Author

asib commented Jan 29, 2025

Hey thanks both for the quick responses!

  • using the enabled property of the publisher's audio track in the browser. Instead of tearing the PC down, along with its Elixir counterpart, you can just mute the track - the browser will keep producing (empty) audio packets, so the timestamps and sequence numbers will be as they should be

The reason I opted against doing something like this is that, as long as the peer connection remains open, the browser indicates to the user that the microphone is in use (red recording dot on the tab and microphone icon in the URL bar):

[Screenshot: browser tab showing the recording dot and the microphone icon in the URL bar]

One more question, is your game one on one, or there can be more than two players in a single session?

There could be any number of players. Based on @LVala's comment:

In general, you shouldn't try to feed the same track of the player PCs with different audio streams.

It would seem that if, e.g., there are 4 players all talking, then for each player I need a player PC with 3 different tracks, one for each of the other 3 players' microphone streams? Which in turn I think means I'd need 3 <audio> elements, one per track? I'd hoped I might be able to bypass this complexity (renegotiation in particular!) by simply shoving any packets I got into the player PCs 😄

I think I need to go back to the drawing board and have a look at the Nexus example code, which seems like exactly what I'm doing, except without the video tracks. Thank you both for your patience!

@mickel8
Member

mickel8 commented Jan 29, 2025

Nexus relies on a single peer connection, which makes it a somewhat more complex example. In your case, where you have a separate PC for sending and a separate PC for receiving, you can do the following (I hope 🤞):

  • When there is a new track, add it to the pc, exchange SDP and that's it
  • When some track disappears, remove it, exchange SDP and that's it
  • When you add or remove a track, the negotiation_needed event should be fired, IIRC
  • Alternatively, if remove_track does not trigger negotiation_needed, stop the transceiver that owns the given track. That triggers the event for sure

No need to exchange ice candidates. I would recommend trying this approach first, without looking at Nexus, as Nexus operates on transceivers and directions, which you might not need. A rough sketch of the add-track path is below.
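For instance, the add-track-and-renegotiate flow on a player PC might look roughly like this (a sketch only; how the offer/answer is relayed to the browser over the liveview socket is omitted):

```elixir
alias ExWebRTC.{MediaStreamTrack, PeerConnection}

# A new publisher appeared: add a fresh audio track to the player PC...
track = MediaStreamTrack.new(:audio)
{:ok, sender} = PeerConnection.add_track(player_pc, track)

# ...and renegotiate: create and apply an offer, push it to the browser,
# then apply the answer that comes back.
{:ok, offer} = PeerConnection.create_offer(player_pc)
:ok = PeerConnection.set_local_description(player_pc, offer)
# :ok = PeerConnection.set_remote_description(player_pc, answer)

# When the publisher goes away, remove the track (or stop the transceiver)
# and renegotiate again - check the docs for the exact remove_track call:
# :ok = PeerConnection.remove_track(player_pc, sender.id)
```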

@mickel8
Member

mickel8 commented Jan 29, 2025

Btw, I checked your app on fly, really nice! And I was able to talk to myself :D What about writing a short blogpost once you finish the work on the audio part?

@asib
Author

asib commented Feb 5, 2025

Ok I've added code to create one track per publisher PC on each listener PC (except for the listener's own corresponding publisher, as I don't want to "echo" the user back to themself). I've also added code to handle renegotiation on all player PCs whenever a publisher starts or stops streaming.

One question around this:

No need to exchange ice candidates

Am I supposed to keep state to avoid exchanging ice candidates after the initial setup of the PC? I see a handful of the following logs on the server side every time there's a renegotiation:

[warning] Received remote candidate after end-of-candidates. Ignoring.

I tried adding code on the browser side of the player to stop pushing ICE candidates to the server after the initial setup, but the logs were still appearing, so I'm not really sure what's producing the candidates.

I've definitely got some bugs to iron out beyond this, though.


edit: one thing in particular I'm having trouble debugging: if I open a tab and turn on the microphone (i.e. start streaming) and then open a second tab, the audio doesn't play, even though ontrack is triggered in that second tab by the first tab's publisher pushing RTP packets.

I think the problem might be that my code doesn't wait for renegotiation to complete for any new tracks before trying to send packets to them. Is that a known problem? I can't think what else might be the issue - I can't reproduce it locally, so it's slow to iterate on potential solutions because I have to redeploy every time.


What about writing a short blogpost once you finish the work on the audio part?

I don't have a blog! I'm also painfully aware of how little of the WebRTC stuff I truly have a grasp of, so I worry that anything I write will come across as poorly informed.

@asib
Author

asib commented Feb 5, 2025

Oh one other thing I'm having trouble with, if y'all have any tips: when I try to listen on an iOS mobile device, the <audio> elements created by the ontrack handler (when a new publisher begins streaming) are created in a muted state, despite my code setting newAudio.muted = false.

This is a built-in iOS protection (unmuting requires user interaction, in order to prevent audio spam on a page), but I know there's some way that the OS will let me create an autoplayed, unmuted track; I'm just not sure what that is exactly. Appreciate any steer you might be able to give me, no worries if you've got no clue! :)

@mickel8
Member

mickel8 commented Feb 9, 2025

I tried adding code on the browser side of the player to stop pushing ICE candidates to the server after the initial setup, but the logs were still appearing, so I'm not really sure what's producing the candidates.

It might be that ICE candidates are included in the SDP offer/answer once they are gathered. I wouldn't worry about this too much. Alternatively, you can check if a similar log appears in Nexus when a new person joins.

I think the problem might be that my code doesn't wait for renegotiation to complete for any new tracks before trying to send packets to them. Is that a known problem? I can't think what else might be the issue - I can't reproduce it locally, so it's slow to iterate on potential solutions because I have to redeploy every time.

Take a look at chrome://webrtc-internals. You will find all the information about incoming RTP streams there, in particular whether the browser receives any RTP packets and decodes them.
I don't think sending data too early should be a problem - the PC should just drop it. However, in Nexus we wait until we receive an SDP answer - https://github.com/elixir-webrtc/apps/blob/master/nexus/lib/nexus/peer.ex#L144-L152. You can try this too.

Also make sure that you assign a MediaStream to your audio element. The track event in the JS API does not always include a MediaStream object (sometimes only a MediaStreamTrack). This depends on your backend code - if you add a track with a media stream id, then there will be a stream on the FE too (see the sketch below). If the stream is null/undefined, just create one, add the track to it, and assign the stream to the audio element.
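On the Elixir side, that means creating the track with a stream id attached (a sketch using ex_webrtc's MediaStreamTrack helpers, assuming generate_stream_id/0 and new/2 as in the library's examples):

```elixir
alias ExWebRTC.{MediaStreamTrack, PeerConnection}

# Attaching a stream id here means the `track` event on the FE will carry
# a MediaStream, which can be assigned to the <audio> element directly.
stream_id = MediaStreamTrack.generate_stream_id()
track = MediaStreamTrack.new(:audio, [stream_id])
{:ok, _sender} = PeerConnection.add_track(pc, track)
```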

I don't have a blog! I'm also painfully aware of how little of the WebRTC stuff I truly have a grasp of, so I worry that anything I write will come across as poorly informed.

No worries, we have! We can help :)

This is a built-in iOS protection (unmuting requires user interaction, in order to prevent audio spam on a page), but I know there's some way that the OS will let me create an autoplayed, unmuted track; I'm just not sure what that is exactly. Appreciate any steer you might be able to give me, no worries if you've got no clue! :)

The only thing I know is that web browsers come with a similar protection, and it is not applied if the user somehow interacts with the website (e.g. clicks a button that joins them to a videoconferencing room). Unfortunately, I have no experience with iOS devices, so that's all I know :(

@asib
Author

asib commented Feb 10, 2025

Regarding the mobile issue I mentioned: it seems that once microphone audio is being streamed, new <audio> elements can be created and autoplayed (in an unmuted state). I guess this is why video/audio conferencing apps ask you to "join" a call before you can hear anything - they can't autoplay without user interaction. In light of this, I need to do a bit of a rewrite!

No worries, we have! We can help :)

Okay, I'm game! I'll hold off on drafting anything until I get something working in my app that I'm happy with, but I'll keep you posted.


Thanks for the tip about chrome://webrtc-internals, I'll give that a go :)

@mickel8
Member

mickel8 commented Feb 10, 2025

Regarding the mobile issue I mentioned: it seems that once microphone audio is being streamed, new <audio> elements can be created and autoplayed (in an unmuted state). I guess this is why video/audio conferencing apps ask you to "join" a call before you can hear anything - they can't autoplay without user interaction. In light of this, I need to do a bit of a rewrite!

I think a simple button that just activates voice chat should be enough? Something like what you have right now in the bottom right corner.

@asib
Author

asib commented Feb 16, 2025

I think a simple button that just activates voice chat should be enough? Something like what you have right now in the bottom right corner.

Yeah, I just want to make the UX a bit clearer. My original idea was that as soon as you hop into a game, you can already hear anyone who might have their microphone enabled. Given I don't think this is possible, I'm going to change the UX to match that of other conference-style apps, so that you have to more explicitly "join" the voice chat to be able to hear anyone.

Related to this, I realised that other conferencing apps do not fully destroy their peer connections when a user mutes their audio (or disables their video, if it's a video conferencing app). E.g. I still see the pulsing "recording" icon on my browser tab when I mute myself in a Google Meet call. So I think my fear about that UX upthread isn't as big a problem as I thought - users are probably already conditioned to seeing the recording indicator even after muting themselves.

@mickel8
Member

mickel8 commented Feb 17, 2025

My original idea was that as soon as you hop into a game, you can already hear anyone who might have their microphone enabled.

Tbh I think that an explicit join is better UX. Opening a website that can immediately play something is considered undesirable :)

Related to this, I realised that other conferencing apps do not fully destroy their peer connections when a user mutes their audio (or disables their video, if it's a video conferencing app). E.g. I still see the pulsing "recording" icon on my browser tab when I mute myself in a Google Meet call. So I think my fear about that UX upthread isn't as big a problem as I thought - users are probably already conditioned to seeing the recording indicator even after muting themselves.

I would leave this for the next iteration. This would be a good improvement but correct negotiation seems to be more important 😉
