MSC3401: Native Group VoIP Signalling #3401
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -107,7 +107,11 @@ The fields within the item in the `m.calls` contents are: | |
|
||
* `m.call_id` - the ID of the conference the user is claiming to participate in. If this doesn't match an unterminated `m.call` event, it should be ignored. | ||
* `m.foci` - Optionally, if the user wants to be contacted via an SFU rather than called directly (either 1:1 or full mesh), the user can also specify the SFUs their client(s) are connecting to. | ||
* `m.sources` - Optionally, the user can list the various media streams (and tracks within the streams) they are able to send. This is important if connecting to an SFU, as it lets the SFU know what simulcast tracks the sender can send. In theory the offered SDP should include this, but if we are multiplexing all streams into the same SDP it seems likely that this will get lost, hence publishing it here. If the conference has no SFU, this list defines the devices which other devices should connect to full-mesh in order to participate. | ||
* `m.devices` - The list of the member's active devices in the call. A member may join from one or more devices at a time, but they may not have two active sessions from the same device. Each device contains the following properties: | ||
* `device_id` - The device id to use for to-device messages when establishing a call | ||
* `session_id` - A unique identifier used for resolving duplicate sessions from a given device. When an incoming `m.call.member` event carries a different `session_id` for a device, any existing calls with that device in this group call should be terminated. `session_id` should be generated once per client session on application load.
Review comment: I have trouble understanding what this is about. Is it perhaps about protecting against, for example, a user opening their same Element Web session in multiple tabs? In any case, it might be good to spell out more explicitly what exactly this is supposed to guard against.

Review comment: It's mostly to deal with users hitting the refresh button, I believe, so we can ignore anything from previous instances of the app and terminate calls with the old instance when we see a new one. |
||
* `feeds` - Contains an array of feeds the member is sharing and the opponent member may reference when setting up their WebRTC connection. | ||
* `purpose` - Either `m.usermedia` or `m.screenshare`; feeds with any other value should be ignored. | |
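
As a rough illustration of the `session_id` and `purpose` rules above, a client could remember the last `session_id` seen for each device and tear down calls when it changes. This is only a sketch: the field names come from this proposal, but the `SessionTracker` class, the `usableFeeds` helper and the `onSessionReplaced` callback are hypothetical names that do not exist in the MSC or in any SDK.

```typescript
// Shapes follow the fields described in this proposal; the type names are illustrative only.
interface CallFeed {
    purpose: string; // "m.usermedia" or "m.screenshare"
}

interface CallDevice {
    device_id: string;
    session_id: string;
    feeds: CallFeed[];
}

const KNOWN_PURPOSES = ["m.usermedia", "m.screenshare"];

// Feeds with an unrecognised purpose are dropped, as the field description asks.
function usableFeeds(device: CallDevice): CallFeed[] {
    return device.feeds.filter((feed) => KNOWN_PURPOSES.indexOf(feed.purpose) !== -1);
}

// Hypothetical helper: remembers the last session_id seen per (user, device) pair.
class SessionTracker {
    private sessions = new Map<string, string>();

    constructor(private onSessionReplaced: (userId: string, deviceId: string) => void) {}

    // Call once for every device entry in an incoming m.call.member event.
    update(userId: string, device: CallDevice): void {
        const key = userId + "|" + device.device_id;
        const previous = this.sessions.get(key);
        if (previous !== undefined && previous !== device.session_id) {
            // The session changed (e.g. the app was reloaded), so calls with the old one are stale.
            this.onSessionReplaced(userId, device.device_id);
        }
        this.sessions.set(key, device.session_id);
    }
}
```

The callback is where a real client would hang up any existing call with that device before accepting signalling from the new session.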
|
||
For instance: | ||
|
||
|
@@ -123,88 +127,49 @@ For instance: | |
"@sfu-lon:matrix.org", | ||
"@sfu-nyc:matrix.org", | ||
], | ||
"m.sources": [ | ||
"m.devices": [ | ||
{ | ||
"id": "qegwy64121wqw", // WebRTC MediaStream id | ||
"purpose": "m.usermedia", | ||
"name": "Webcam", // optional, just to help users understand what multiple streams from the same person mean. | ||
"device_id": "ASDUHDGFYUW", // just in case people ending up dialing this directly for full mesh or 1:1 | ||
"audio": [ | ||
"device_id": "ASDUHDGFYUW", // Used to target to-device messages | ||
"session_id": "GHKJFKLJLJ", // Used to resolve duplicate calls from a device | ||
"feeds": [ | ||
{ | ||
"id": "zbhsbdhwe", // WebRTC MediaStreamTrack id | ||
"settings": { // WebRTC MediaTrackSettings object | ||
"channelCount": 2, | ||
"sampleRate": 48000, | ||
"m.maxbr": 32000, // Matrix-specific extension to advertise the max bitrate of this track | ||
} | ||
"purpose": "m.usermedia" | ||
// TODO: Add tracks | ||
Review comment: What info do we need in here to describe tracks? Maybe just an array containing the track
Review comment: i was envisaging stealing the mediastream description straight out of WebRTC. So the |
||
// TODO: Available bitrates etc. should be listed here | ||
Review comment: How should we describe the available audio/video streams that clients can request via the SFU datachannel? |
||
}, | ||
], | ||
"video": [ | ||
{ | ||
"id": "zbhsbdhzs", | ||
"settings": { | ||
"width": 1280, | ||
"height": 720, | ||
"facingMode": "user", | ||
"frameRate": 30.0, | ||
"m.maxbr": 512000, | ||
} | ||
}, | ||
{ | ||
"id": "zbhsbdhzx", | ||
"settings": { | ||
"width": 320, | ||
"height": 240, | ||
"facingMode": "user", | ||
"frameRate": 15.0, | ||
"m.maxbr": 64000, | ||
} | ||
}, | ||
], | ||
"mosaic": {}, // for composited video streams? | ||
}, | ||
{ | ||
"id": "suigv372y8378", | ||
"name": "Screenshare", // optional | ||
"purpose": "m.screenshare", | ||
"device_id": "ASDUHDGFYUW", | ||
"video": [ | ||
{ | ||
"id": "xbhsbdhzs", | ||
"settings": { | ||
"width": 3072, | ||
"height": 1920, | ||
"cursor": "moving", | ||
"displaySurface": "monitor", | ||
"frameRate": 30.0, | ||
"m.maxbr": 768000, | ||
} | ||
}, | ||
"purpose": "m.screenshare" | ||
// TODO: Add tracks | ||
// TODO: Available bitrates etc. should be listed here | ||
} | ||
] | ||
}, | ||
} | ||
] | ||
} | ||
] | ||
} | ||
} | ||
``` | ||
|
||
This builds on MSC #3077, which describes streams in `m.call.*` events via a `sdp_stream_metadata` field, but providing the full set of information needed for all devices in the room to know what streams are available in the group call without having to independently discover them from the SFU. | ||
This builds on MSC #3077, which describes streams in `m.call.*` events via a `sdp_stream_metadata` field, but provides the full set of information needed for all devices in the room to know what feeds are available in the group call without having to independently discover them from the SFU. | |
SimonBrandner marked this conversation as resolved.
|
||
|
||
It's acceptable to advertise rigid formats here rather than dynamically negotiating resolution, bitrate etc, as in a group call we should just pick plausible desirable formats rather than try to please everyone. | ||
** TODO: Add tracks field ** | ||
** TODO: Add bitrate/format fields ** | ||
|
||
If a device loses connectivity, it is not particularly problematic that the membership data will be stale: all that will happen is that calls to the disconnected device will fail due to media or data-channel keepalive timeouts, and then subsequent attempts to call that device will fail. Therefore (unlike the earlier demos) we don't need to spot timeouts by constantly re-posting the state event. | ||
Clients should do their best to ensure that their entries in `m.call.member` state are removed when the member leaves the call. However, there will be cases where the device loses network connectivity or power, or the application is force-closed or crashes. If the `m.call.member` state contains stale device data, call setup with that device will fail. Clients should re-attempt invites up to 3 times before giving up on calling a member. | |
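
The retry rule above might be sketched as follows. This is a hedged illustration, not the proposal's or any SDK's implementation: `SendInvite` and `inviteWithRetry` are hypothetical names, and reading "re-attempt up to 3 times" as one initial invite plus at most three re-attempts is an assumption (the text could also be read as three attempts in total).

```typescript
// Hypothetical signature: resolves to true if the remote device answered the invite.
type SendInvite = (userId: string, deviceId: string) => Promise<boolean>;

// Assumption: 3 re-attempts on top of the initial invite, i.e. at most 4 sends.
const MAX_REATTEMPTS = 3;

// Tries to set up a call with one device and gives up once the re-attempts are used up.
async function inviteWithRetry(
    sendInvite: SendInvite,
    userId: string,
    deviceId: string,
): Promise<boolean> {
    for (let attempt = 0; attempt <= MAX_REATTEMPTS; attempt++) {
        try {
            if (await sendInvite(userId, deviceId)) {
                return true; // call established
            }
        } catch {
            // Treat a thrown error (e.g. a network failure) like an unanswered invite.
        }
    }
    return false; // probably stale m.call.member data; stop calling this device
}
```

A real client would likely add a delay between attempts; none is specified in the text, so the sketch omits it.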
|
||
### Call setup | ||
|
||
Call setup then uses the normal `m.call.*` events, except they are sent over to-device messages to the relevant devices (encrypted via Olm). This means: | ||
robertlong marked this conversation as resolved.
Review comment: I guess the idea here is that clients should identify the sender (user_id and device_id) of the to_device through the

Review comment: I've updated the spec to include this now, we send the sender's

Review comment: You can read about Perhaps the current impl doesn't encrypt with olm yet, but does it make sense to spec that? Is there a good reason to offer a non-encrypted version of the signalling? |
||
|
||
* When initiating a 1:1 call, the `m.call.invite` is sent to `*` devices of the intended target user. | ||
* Once the user answers the call from the device, the sender should rescind the other pending to-device messages, ensuring that other devices don't get spammed about long-obsolete 1:1 calls. XXX: We will need a way to rescind pending to-device msgs. | ||
* Subsequent candidates and other events are sent only to the device who answered. | ||
* XXX: do we still need MSC2746's `party_id` and `m.call.select_answer`? | ||
* We will need to include the `m.call_id` and room_id so that peers can map the call to the right room. | ||
* However, especially for 1:1 calls, we might want to let the to-device messages flow and cause the client to ring even before the `m.call` event propagates, to minimise latency. Therefore we'll need to include an `m.intent` on the `m.call.invite` too. | ||
* When initiating a 1:1 call, the `m.call.invite` is sent to the devices listed in `m.call.member` event's `m.devices` array using the `device_id` field. | ||
Review comment: If we are using

Review comment: i was assuming that

Review comment: Ah, right. So send to-device messages to each of the users and use

Review comment: Doesn't the ringing happen by the mere fact that there is a non-terminated m.call state event with that intent? The to_device messages only start flowing once there are at least two

Review comment: A "1:1 call" does not seem to be well defined within the context of this MSC. I assume what is meant here is a
SimonBrandner marked this conversation as resolved.
|
||
* `m.call.*` events sent via to-device messages should also include the following properties in their content: | ||
Review comment: we seem to have completely missed

Review comment: Maybe a dumb question here: does it mean that the order of the To-Device messages is not guaranteed? I'm asking since I've processed the To-Device messages on the SFU under the assumption that they come in the same order in which they were sent by the client. If the order is not guaranteed, it may cause some interesting (undesired) effects, e.g. if the "Invite -> Hangup" sequence comes as "Hangup -> Invite" on a server, the end effect will not be what the user expects. Or

Review comment: Yes, it's not guaranteed and we should rely on the

Review comment: Oh, that's interesting. Should we document how to deal with it and what's the semantics? - I've noticed that a new commit has been pushed recently to add the It also has some practical implications: how are we (as receivers) expected to handle it in a proper way? - I.e. imagine that we receive a "New ICE candidates message" on the SFU with a However, this poses certain questions, namely if we're communicating with 1000 devices (SFU use case), this means we would need to store the Another issue is that the sender can attack the receiver by sending a message with |
||
* `conf_id` - The group call id listed in `m.call` | ||
Review comment: Do we want to use a name other than

Review comment: yeah, much prefer

Review comment: I wonder if

Review comment: Currently the to_device events sent between members of the group calls also have a It would also be good to use the same property name in both the

Review comment: Imo on top of the raised mentioned improvements, this proposal and eventual resulting spec should stick to the wording "call" even in text to clarify that "conference" is not another concept at work here. |
||
* `dest_session_id` - The recipient's session id. Incoming messages with a `dest_session_id` that doesn't match your current session id should be discarded. | ||
* In addition to the fields above, `m.call.invite` events sent via to-device messages should include the following properties:
* `device_id` - The message sender's device id. Used by the opponent member to send response to-device signalling messages even if the `m.call.member` event has not been received yet. | ||
Review comment: empirically EC seems to expect

Review comment: EC also sends device_id, sender_session_id and dest_session_id for every toDevice event: https://github.com/matrix-org/matrix-js-sdk/blob/353d6bab47ab928aab089e897f5475942fcfa0ac/src/webrtc/call.ts#L2008-L2010 |
||
* `sender_session_id` - Like `device_id`, `sender_session_id` is used by the opponent member to filter out messages unrelated to the sender's session, even if the `m.call.member` event has not been received yet.
Review comment: I think I need a better way of explaining both the device_id and sender_session_id here. |
||
* For 1:1 calls, we might want to let the to-device messages flow and cause the client to ring even before the `m.call` event propagates, to minimise latency. Therefore we'll need to include an `m.intent` on the `m.call.invite` too. | ||
* When initiating a group call, we need to decide which devices to actually talk to. | ||
* If the client has no SFU configured, we try to use the `m.foci` in the `m.call` event. | ||
* If there are multiple `m.foci`, we select the closest one based on latency, e.g. by trying to connect to all of them simultaneously and discarding all but the first call to answer. | ||
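
The "connect to all foci and keep the first to answer" idea in the last bullet could look roughly like this. It is a sketch under assumptions: `ConnectToFocus` and `selectFastestFocus` are hypothetical names, and how a client actually places a call to a focus is outside what this excerpt specifies.

```typescript
// Hypothetical: starts a call to one focus and resolves once that focus answers.
type ConnectToFocus = (focus: string) => Promise<void>;

// Resolves with the first focus to answer; rejects only if every attempt fails.
function selectFastestFocus(foci: string[], connect: ConnectToFocus): Promise<string> {
    return new Promise<string>((resolve, reject) => {
        if (foci.length === 0) {
            reject(new Error("no foci advertised"));
            return;
        }
        let failures = 0;
        for (const focus of foci) {
            connect(focus).then(
                () => resolve(focus), // once settled, later resolve() calls are no-ops
                () => {
                    failures++;
                    if (failures === foci.length) {
                        reject(new Error("no focus answered"));
                    }
                },
            );
        }
    });
}
```

The sketch only picks the winner. Discarding the other calls, as the bullet describes, would still require hanging up the connections that answer later.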
|
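
The `dest_session_id` discard rule and the extra `m.call.invite` fields from the list above can be expressed as two small checks. This is an illustrative sketch only: the interface and function names are hypothetical, and rejecting an invite that lacks `device_id` or `sender_session_id` is an assumption, since the text says such events "should" carry them without saying how receivers treat their absence.

```typescript
// Fields this proposal adds to m.call.* to-device content; all other fields are omitted.
interface GroupCallSignallingContent {
    conf_id: string;
    dest_session_id: string;
    device_id?: string;         // expected on m.call.invite
    sender_session_id?: string; // expected on m.call.invite
}

// Messages addressed to a different (e.g. pre-reload) session of this device are discarded.
function isForThisSession(content: GroupCallSignallingContent, localSessionId: string): boolean {
    return content.dest_session_id === localSessionId;
}

// Assumption: an invite without the sender's device and session ids cannot be answered.
function hasInviteSenderFields(content: GroupCallSignallingContent): boolean {
    return typeof content.device_id === "string" && typeof content.sender_session_id === "string";
}
```

A receiver would run `isForThisSession` on every incoming `m.call.*` to-device event and `hasInviteSenderFields` only on `m.call.invite`.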
Review comment: This probably ought to be `m.conf_id` to differentiate it from IDs of 1:1 calls and match the conf_id field in m.call.* to-device events?

Review comment: See previous discussion at https://github.com/matrix-org/matrix-spec-proposals/pull/3401/files#r823313876

Review comment: Note: currently the `call_id` and `conf_id` are not identical. This seems to be confusing if we're talking about the SFU calls (not sure how it's handled in a full-mesh).

When working on an SFU recently, I realized that `conf_id` was the ID of a conference (or a call if you will) which was quite logical and expected. However, what I did not expect is that in addition to the `conf_id`, each To-Device message has a `call_id` which does not match the `conf_id` and which seems to be uniquely generated by each participant.

The thing is: `call_id` field does not make any sense for the SFU at the moment (see the SFU MSC), since the SFU does not know what the `call_id` is (it looks like a randomly generated string that is different for each participant who tries to join a conference), but at the same time, the SFU is essentially obligated to store the `call_id` because the To-Device messages from the SFU to the participants are expected to have the `call_id` that matches the `call_id` value sent from participants to the SFU when they contact the SFU (I tried setting the `call_id` to match `conf_id` when sending a message from the SFU to the client, but the client discarded the message if the `call_id` did not match the `call_id` that the client sent to the SFU). So essentially, there is a `conf_id` the semantics of which is defined (it's the unique ID of a conference/call) and the `call_id` (which does not have any meaning for the SFU).