Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RTC-93] Add some RTP metrics #142

Merged
merged 18 commits into from
Mar 15, 2023
Merged

[RTC-93] Add some RTP metrics #142

merged 18 commits into from
Mar 15, 2023

Conversation

daniel-jodlos
Copy link
Contributor

This PR adds a few of the metrics described in RTC-93:

  1. Inbound frames (estimated based on the marker)
  2. Outbound frames (estimated based on the marker)
  3. Inbound RTP NACK counter
  4. Inbound RTCP Packets
  5. Inbound RTCP Bytes Received
  6. Outbound RTCP Packets
  7. Outbound RTCP Bytes Sent

@daniel-jodlos daniel-jodlos marked this pull request as ready for review February 9, 2023 04:43
@daniel-jodlos daniel-jodlos requested a review from mickel8 February 9, 2023 04:43
"inbound-rtp.ssrc",
event_name: [Membrane.RTP, :inbound_track, :new],
measurement: :ssrc
),
Metrics.counter(
"inbound-rtp.nack",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be sure, I'd include _sent in the name to leave no questions

Suggested change
"inbound-rtp.nack",
"inbound-rtp.nack_sent",

@@ -58,6 +80,14 @@ defmodule Membrane.RTCP.Parser do
@impl true
def handle_event(:output, %RTCPEvent{} = event, _ctx, state) do
buffer = %Buffer{payload: RTCP.Packet.serialize(event.rtcp)}

Membrane.TelemetryMetrics.execute(
@rtcp_sent_telemetry_event,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be a bit problematic, as it includes FIRs, PLIs, NACKs, REMB (when/if implemented) etc in addition to receiver reports.

  1. Do you thing such aggregated statistic is helpful?
  2. I guess the number of SRs can be estimated by substraction RTCP - FIR - NACK, but maybe we should have a separate metric for that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I think this comment answers it pretty well: [RTC-93] Add some RTP metrics #142 (comment)

@@ -65,6 +72,8 @@ defmodule Membrane.RTP.OutboundTrackingSerializer do

@impl true
def handle_init(_ctx, options) do
Membrane.TelemetryMetrics.register(@frame_sent_telemetry_event, options.telemetry_label)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are ignoring RTCP packets sent by this element, a metric for that is missing

Comment on lines 37 to 58
Metrics.counter(
"inbound-rtp.nack",
event_name: [Membrane.RTP, :rtcp, :nack, :sent]
),
Metrics.counter(
"inbound-rtcp.packets",
event_name: [Membrane.RTP, :rtcp, :arrival]
),
Metrics.sum(
"inbound-rtcp.bytes_received",
event_name: [Membrane.RTP, :rtcp, :arrival],
measurement: :bytes
),
Metrics.counter(
"outbound-rtcp.packets",
event_name: [Membrane.RTP, :rtcp, :sent]
),
Metrics.sum(
"outbound-rtcp.bytes_sent",
event_name: [Membrane.RTP, :rtcp, :sent],
measurement: :bytes
),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The metrics you've added are on a receiver (inbound) end, so I would rename them, as they might be confused with RTCP sent as sender feedback (mostly in OutboundTrackingSerializer for now). I think the pattern should rather be inbound-rtp.rtcp_sent or inbound.rtcp_sent. This resembles stats in wertc-internals where such info is aggregated under inbount-rtp and remote-inbound-rtp (we may also follow that convention).

+ label RTCP by the track type (outbound/inbound) instead of RTCP
  direction
+ Add sender report sent counter metric
Copy link
Member

@mickel8 mickel8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please upload some example report with new metrics?

Comment on lines 20 to 24
event_name: [Membrane.RTP, :packet, :arrival]
),
Telemetry.Metrics.sum(
Metrics.sum(
"inbound-rtp.bytes_received",
event_name: [Membrane.RTP, :packet, :arrival],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be consistent with other events I would change this one to Membrane.RTP, :inbound, :packet, :arrival

"inbound-rtp.bytes_received",
event_name: [Membrane.RTP, :packet, :arrival],
measurement: :bytes
),
Telemetry.Metrics.last_value(
Metrics.last_value(
"inbound-rtp.encoding",
event_name: [Membrane.RTP, :inbound_track, :new],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about [Membrane.RTP, :inbound, :new]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seemed a little weird to me as well, but I have no idea what this will break.

Marker doesn't necessarily indicate a frame boundary. The fact that the
frame could be damaged but we would still count it would also cause
problems. `markers` is therefore more explicit about what this metric
represents.
@daniel-jodlos
Copy link
Contributor Author

After some discussions, I ended up completely reworking all RTCP metrics. They are now grouped per track and they include: NACK sent/received, PLI received (I don't think we're sending any), FIR sent/received, Receiver Reports sent/received and Sender Reports sent/received.

I also had to make a choice of where to count the specific feedback messages and I decided to count them wherever they are being handled. Eg, FIR is counted in OutboundTrackingSerializer which also sends a Keyframe Request upstream as a result of receiving FIR. The other option was to count them in one central location, which is where the overall counter is incremented, but it occurred to me that if they are counted wherever they are handled, the metric would indicate that we're loosing the packet somewhere, eg, if we have 100 NACK sent by the browser and 0 received.

Example report generated from this version:

Reporter.scrape(VideoRoomReporter) #=> %{
  :"inbound-rtp.frames" => 2856,
  :"inbound-rtp.keyframes" => 14,
  {:room_id, "test"} => %{
    {:peer_id, "44896930-ea5c-43d2-87fd-7401aabd810d"} => %{
      :"ice.binding_requests_received" => 19,
      :"ice.binding_responses_sent" => 19,
      :"ice.bytes_received" => 8120624,
      :"ice.bytes_sent" => 5478743,
      :"ice.packets_received" => 9919,
      :"ice.packets_sent" => 6243,
      :"peer.bandwidth" => 2084164.3378765725,
      :"peer.metadata" => nil,
      {:track_id,
       "44896930-ea5c-43d2-87fd-7401aabd810d:54d55d81-f6c5-4a12-8ea4-b44e649b7396:"} => %{
        "inbound-rtp.bytes_received": 160546,
        "inbound-rtp.encoding": :OPUS,
        "inbound-rtp.markers_received": 1,
        "inbound-rtp.packets": 1477,
        "inbound-rtp.ssrc": 1523586566,
        "rtcp.receiver_reports_sent": 5,
        "rtcp.sender_reports_received": 7,
        "rtcp.total_packets_received": 7,
        "rtcp.total_packets_sent": 122,
        "track.metadata": %{"active" => true, "type" => "audio"}
      },
      {:track_id,
       "44896930-ea5c-43d2-87fd-7401aabd810d:acb17d95-5f2a-46ea-87c8-6b0d1aec8111:h"} => %{
        "inbound-rtp.bytes_received": 5710780,
        "inbound-rtp.encoding": :H264,
        "inbound-rtp.markers_received": 700,
        "inbound-rtp.packets": 5166,
        "inbound-rtp.ssrc": 4010095982,
        "rtcp.fir_sent": 1,
        "rtcp.receiver_reports_sent": 5,
        "rtcp.sender_reports_received": 120,
        "rtcp.total_packets_received": 120,
        "rtcp.total_packets_sent": 6,
        "track.metadata": %{"active" => true, "type" => "camera"}
      },
      {:track_id,
       "44896930-ea5c-43d2-87fd-7401aabd810d:acb17d95-5f2a-46ea-87c8-6b0d1aec8111:l"} => %{
        "inbound-rtp.bytes_received": 362315,
        "inbound-rtp.encoding": :H264,
        "inbound-rtp.markers_received": 691,
        "inbound-rtp.packets": 744,
        "inbound-rtp.ssrc": 1335899795,
        "rtcp.receiver_reports_sent": 5,
        "rtcp.sender_reports_received": 27,
        "rtcp.total_packets_received": 27,
        "rtcp.total_packets_sent": 5,
        "track.metadata": %{"active" => true, "type" => "camera"}
      },
      {:track_id,
       "44896930-ea5c-43d2-87fd-7401aabd810d:acb17d95-5f2a-46ea-87c8-6b0d1aec8111:m"} => %{
        "inbound-rtp.bytes_received": 1852102,
        "inbound-rtp.encoding": :H264,
        "inbound-rtp.markers_received": 698,
        "inbound-rtp.packets": 1866,
        "inbound-rtp.ssrc": 1055108907,
        "rtcp.receiver_reports_sent": 5,
        "rtcp.sender_reports_received": 38,
        "rtcp.total_packets_received": 38,
        "rtcp.total_packets_sent": 5,
        "track.metadata": %{"active" => true, "type" => "camera"}
      },
      {:track_id,
       "92bf850f-ff9b-4b43-825f-d160c91656ca:72794a35-c51f-46c2-9f3c-66c668bceb22"} => %{
        "inbound-rtp.ssrc": 2593640056,
        "outbound-rtp.markers_sent": 650,
        "outbound-rtp.variant": :high,
        "outbound-rtp.variant-reason": :other,
        "rtcp.sender_reports_sent": 5,
        "rtcp.total_packets_received": 26,
        "rtcp.total_packets_sent": 5
      },
      {:track_id,
       "92bf850f-ff9b-4b43-825f-d160c91656ca:ba9d1b41-2f4e-4d6e-bb1c-7ce28126c1ae"} => %{
        "inbound-rtp.ssrc": 251360988,
        "outbound-rtp.variant": :high,
        "outbound-rtp.variant-reason": :other,
        "rtcp.sender_reports_sent": 5,
        "rtcp.total_packets_received": 6,
        "rtcp.total_packets_sent": 5
      }
    },
    {:peer_id, "92bf850f-ff9b-4b43-825f-d160c91656ca"} => %{
      :"ice.binding_requests_received" => 17,
      :"ice.binding_responses_sent" => 17,
      :"ice.bytes_received" => 7630666,
      :"ice.bytes_sent" => 5567842,
      :"ice.packets_received" => 9288,
      :"ice.packets_sent" => 6406,
      :"peer.bandwidth" => 2084966.4894894434,
      :"peer.metadata" => nil,
      {:track_id,
       "44896930-ea5c-43d2-87fd-7401aabd810d:54d55d81-f6c5-4a12-8ea4-b44e649b7396"} => %{
        "inbound-rtp.ssrc": 3034914721,
        "outbound-rtp.variant": :high,
        "outbound-rtp.variant-reason": :other,
        "rtcp.sender_reports_sent": 5,
        "rtcp.total_packets_received": 5,
        "rtcp.total_packets_sent": 5
      },
      {:track_id,
       "44896930-ea5c-43d2-87fd-7401aabd810d:acb17d95-5f2a-46ea-87c8-6b0d1aec8111"} => %{
        "inbound-rtp.ssrc": 1784617135,
        "outbound-rtp.markers_sent": 660,
        "outbound-rtp.variant": :high,
        "outbound-rtp.variant-reason": :other,
        "rtcp.sender_reports_sent": 5,
        "rtcp.total_packets_received": 25,
        "rtcp.total_packets_sent": 5
      },
      {:track_id,
       "92bf850f-ff9b-4b43-825f-d160c91656ca:72794a35-c51f-46c2-9f3c-66c668bceb22:h"} => %{
        "inbound-rtp.bytes_received": 5360183,
        "inbound-rtp.encoding": :H264,
        "inbound-rtp.markers_received": 653,
        "inbound-rtp.packets": 4759,
        "inbound-rtp.ssrc": 746097665,
        "rtcp.fir_sent": 1,
        "rtcp.receiver_reports_sent": 5,
        "rtcp.sender_reports_received": 116,
        "rtcp.total_packets_received": 116,
        "rtcp.total_packets_sent": 6,
        "track.metadata": %{"active" => true, "type" => "camera"}
      },
      {:track_id,
       "92bf850f-ff9b-4b43-825f-d160c91656ca:72794a35-c51f-46c2-9f3c-66c668bceb22:l"} => %{
        "inbound-rtp.bytes_received": 352955,
        "inbound-rtp.encoding": :H264,
        "inbound-rtp.markers_received": 651,
        "inbound-rtp.packets": 712,
        "inbound-rtp.ssrc": 3519910266,
        "rtcp.receiver_reports_sent": 5,
        "rtcp.sender_reports_received": 26,
        "rtcp.total_packets_received": 26,
        "rtcp.total_packets_sent": 5,
        "track.metadata": %{"active" => true, "type" => "camera"}
      },
      {:track_id,
       "92bf850f-ff9b-4b43-825f-d160c91656ca:72794a35-c51f-46c2-9f3c-66c668bceb22:m"} => %{
        "inbound-rtp.bytes_received": 1749322,
        "inbound-rtp.encoding": :H264,
        "inbound-rtp.markers_received": 651,
        "inbound-rtp.packets": 1778,
        "inbound-rtp.ssrc": 2961107475,
        "rtcp.receiver_reports_sent": 5,
        "rtcp.sender_reports_received": 38,
        "rtcp.total_packets_received": 38,
        "rtcp.total_packets_sent": 5,
        "track.metadata": %{"active" => true, "type" => "camera"}
      },
      {:track_id,
       "92bf850f-ff9b-4b43-825f-d160c91656ca:ba9d1b41-2f4e-4d6e-bb1c-7ce28126c1ae:"} => %{
        "inbound-rtp.bytes_received": 133465,
        "inbound-rtp.encoding": :OPUS,
        "inbound-rtp.markers_received": 1,
        "inbound-rtp.packets": 1379,
        "inbound-rtp.ssrc": 54295950,
        "rtcp.receiver_reports_sent": 5,
        "rtcp.sender_reports_received": 6,
        "rtcp.total_packets_received": 6,
        "rtcp.total_packets_sent": 115,
        "track.metadata": %{"active" => true, "type" => "audio"}
      }
    }
  }
}

Also, making note of some bugs:

  • outbound tracks have inbound-rtp.ssrc metric and it seems a little weird
  • there are 2 metrics that ended up on the top-level and I think they originate from RTC Engine

Copy link
Member

@mickel8 mickel8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The report starts looking really nice! Also, great work with RTCP metrics. Now we get so much useful info 👀

A couple of notes:

  • Looking at RTC-93 some metrics are still missing:
    • RTX has been merged so would be nice to add metric about number of retransmitted packets
    • outgoing RTP packets
    • outgoing RTP bytes
    • incoming RTCP bytes
    • outgoing RTCP bytes
    • incoming padding packets
  • when some event doesn't occur its metric is not present in the report at all. Would be nice to initilize metrics with default values
  • maybe instead of identifying metrics as inbound or outbound we should identify track as inbound or outbound and keep metric name generic when possible. This way we would have
{:track_id, id} => {
  type: :outbound,
  ssrc: 1234
  packets: 1000,
  fir: 100
}

We might consider getting rid of RTCP bytes as we have detailed numbers about specific RTCP messages but I belive that such a metric can still be useful

Copy link
Member

@mickel8 mickel8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more concern. Currently total RTCP packets are counted in multiple places e.g. in SSRCRouter and OutboundTrackingSerializer. Both processes register as owners of the same event and I am not sure if that's correct.

If that's correct, then I think we should go further and increment total RTCP packets in every place that some specific RTCP messag is sent or received.
With current RTP plugin complexity it's very easy either not to count in some metric in total statistics or count it in two or more times

@daniel-jodlos
Copy link
Contributor Author

daniel-jodlos commented Feb 28, 2023

  • incoming RTCP bytes

This metric may fall victim to the architecture of the RTP Plugin. This value is known in RTCP Parser, but it has no way of knowing to which track this particular RTCP belongs. SSRC Router can know that information, but it receives the RTCP in a parsed form, so it doesn't know the size 🤷 For symmetry, I didn't include outgoing RTCP bytes

when some event doesn't occur its metric is not present in the report at all. Would be nice to initilize metrics with default values

I agree, but this is how the library works, but that's the wrong PR, this needs to be done in membrane_telemetry_metrics

maybe instead of identifying metrics as inbound or outbound we should identify track as inbound or outbound and keep metric name generic when possible. This way we would have

I was thinking about that as well, but I came to a conclusion that this is outside the scope of this Pull Request

@daniel-jodlos
Copy link
Contributor Author

daniel-jodlos commented Feb 28, 2023

Both processes register as owners of the same event and I am not sure if that's correct.

It's fine, we do that in a lot of places. We can only have one owner process of the {metric, label} pair, but since these processes hold different labels, I don't think this causes any problems

@mickel8
Copy link
Member

mickel8 commented Feb 28, 2023

Ok, then let's do those two things (metric default value and some renamings) in different PRs. Could you please continue this topic?

@daniel-jodlos daniel-jodlos requested a review from mickel8 March 2, 2023 08:57
Copy link
Member

@mickel8 mickel8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about other metrics I mentiond in the previous CR?

@daniel-jodlos daniel-jodlos requested a review from mickel8 March 6, 2023 05:58
links_generator = fn twcc? ->
links_generator = fn twcc?, ctx ->
ssrc_router_pad_options = [
encoding: :rtx,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that needed? It looks like SSRC router doesn't make any use of encoding...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does. Only to emit new_inbound_track_event but it does

@daniel-jodlos daniel-jodlos requested a review from mickel8 March 10, 2023 09:51
mickel8
mickel8 previously approved these changes Mar 10, 2023
Comment on lines 353 to 354
if packet_size == 254,
do: Membrane.TelemetryMetrics.execute(@padding_packet_arrival_event, %{}, %{}, label)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC this is not the correct way to detect padding packets, it will only work for encrypted packets, regular RTP will have the padding stripped

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it will work a little too well 🤦 It will also count non-padding packets that happen to have that exact size

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should drop this metric for now and get this Pull Request through without it.

Given that the size is not enough to determine if the packet is a padding packet AND the padding also seems to be encrypted, we need something that lives after the decryptor to detect padding packets.
I'm also not convinced it would be a very useful metric on its own in the RTP plugin outside of WebRTC, so I'm considering implementing it in the RTC Engine. Buffers that pass through RTC Engine are decrypted and they have their headers removed, so it should be a simple matter of byte_size(packet) == 0 || packet == <<0::binary-size(254), 255::8>>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also not convinced it would be a very useful metric on its own in the RTP plugin outside of WebRTC, so I'm considering implementing it in the RTC Engine.

This sounds like a good idea, padding packets seem specific to WebRTC, but probably not possible without refactoring - after inbound RTX has been added, the padding packets are sent with retransmission SSRC and RTXParser drops them (they cannot be parsed as they do not contain a payload with the original sequence number)

Co-authored-by: Bartosz Błaszków <[email protected]>
%{},
state.telemetry_label
)
emit_sender_report_sent_telemetry_event(state)
Copy link
Member

@bblaszkow06 bblaszkow06 Mar 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. This function name is invalid (it sends @sender_report_received_telemetry_event)
  2. I rather had this in mind:
Suggested change
emit_sender_report_sent_telemetry_event(state)
emit_telemetry_event(@sender_report_received, state)

This way the private function could extract the common part. Now you still have duplication, although grouped. However, I won't insist on that change

@daniel-jodlos daniel-jodlos merged commit 568194c into master Mar 15, 2023
@daniel-jodlos daniel-jodlos deleted the RTC-93-metrics branch March 15, 2023 13:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants