
Supporting batch uploads from the client (and routing reports through the collector) #64

Closed
csharrison opened this issue Jun 30, 2021 · 4 comments
Labels
parking-lot Parking lot for future discussions

Comments

@csharrison
Contributor

In today's design call we discussed the collector receiving encrypted reports from clients and forwarding them to the leader. This aligns with the design we have in the WICG with some of the reasoning documented here.

I also brought this up for discussion on our regular calls in the WICG (minutes), where there was some agreement that this was a good idea.

Pros / Cons of routing reports through the collector

These are probably non-exhaustive.
Pros:

  • Doesn't require aggregation servers to be highly online / available
  • Supports graceful failure ("If something goes wrong, we could re-query")
  • Gives some indication that "the API is working" on the server without needing to wait until query time, or via some other side-channel.
  • Distributes state out of the aggregation servers (modulo protection from replay attacks). Arguably this also aligns better, in our API, with who "owns" the data at a fundamental level.
  • Adds query flexibility "for free" without explicitly adding support in the aggregation servers by allowing querying subsets of reports (for instance)
  • Allows some level of report authentication by the collector, who (in our model) is the entity best positioned to validate reports. This could be done entirely outside the protocol.

Cons:

  • Adds query flexibility, which could be detrimental to privacy
  • Leaks some metadata about each request that may otherwise be visible only to the leader (e.g. IP address), unless an anonymizing proxy is used
  • Introduces a new vector for replay attacks

Protocol solution

It seems there is a fairly simple solution to this problem:

  • Instantiate the protocol where the collector is also the client, where the interactions between the "real clients" and the "client/collector" is unspecified by the protocol.
  • Allow the "client" to optionally batch upload reports in the protocol rather than sending them one by one.

In the existing protocol there is no client authentication, so it is technically possible to have a collector that simply collects encrypted reports from clients and forwards them on to the leader. Of course, the actual clients would need to be set up to do this, but it is permitted by the protocol. By allowing batch uploads, we just optimize this already-permitted configuration.
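The collector-as-client configuration described above can be sketched in a few lines. This is an editorial illustration, not part of any specified protocol: the `BatchingCollector` class, its endpoint URL, and the batch-size threshold are all hypothetical. The key property is that the collector only buffers and forwards encrypted blobs; it never decrypts them.

```python
# Hypothetical sketch: a collector acting as the protocol's "client",
# accepting already-encrypted reports from real clients and forwarding
# them to the leader in batches. Names/endpoints are illustrative.
from dataclasses import dataclass, field


@dataclass
class BatchingCollector:
    leader_upload_url: str            # where flushed batches would be POSTed
    batch_size: int = 100             # illustrative threshold
    pending: list = field(default_factory=list)  # buffered encrypted blobs

    def accept_report(self, encrypted_report: bytes) -> None:
        # The collector never decrypts reports; it only buffers them.
        self.pending.append(encrypted_report)

    def ready(self) -> bool:
        return len(self.pending) >= self.batch_size

    def flush(self) -> list:
        # In a real deployment this would upload the batch to
        # self.leader_upload_url; here we just return it.
        batch, self.pending = self.pending, []
        return batch
```

A caller would poll `ready()` (or flush on a timer) and send the result of `flush()` to the leader as a single batch upload.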

Alternatively, if we deem collector-clients to be bad for the protocol, we ought to have a mechanism which actually forbids them (e.g. by authenticating clients). However, I think that it is pretty reasonable to have this allowed by the protocol and leave it up to specific instantiations how the "client" is configured/trusted.

@csharrison changed the title from "Supporting batch" to "Supporting batch uploads from the client (and routing reports through the collector)" on Jun 30, 2021
@tgeoghegan
Collaborator

Gives some indication that "the API is working" on the server without needing to wait until query time, or via some other side-channel.

Can you elaborate on this? Is the idea here that when a client (that is, a real end-user client, not a batching one) gets a 200 OK after posting a report to a batching client, the client can be assured that its report has been durably persisted somewhere? I think a leader server could provide a similar guarantee at the end of the upload phase, so I'm trying to understand what extra assurances the batching client provides.

Adds query flexibility "for free" without explicitly adding support in the aggregation servers by allowing querying subsets of reports (for instance)

IIUC the query flexibility is because the batching client can submit the same reports multiple times, in different-sized batches. If we decide this query flexibility is bad, we could mitigate this by having the original client include a report timestamp in the encrypted input, where it can't be tampered with by the batching client. Aggregators would then maintain query/privacy budgets per aggregation window and would be able to refuse queries on reports that fall in an aggregation window whose budget is already spent.
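The per-window query budget tgeoghegan describes could be sketched as follows. This is an illustrative sketch, not anything in the spec: the window size, the one-query budget, and the `WindowBudget` class are all assumptions. The report timestamp is assumed to be inside the encrypted input, where the batching client can't tamper with it.

```python
# Sketch of a per-aggregation-window query budget: an aggregator
# refuses queries on reports whose window's budget is already spent.
# Window size and budget are illustrative assumptions.
WINDOW_SECONDS = 3600          # assumed aggregation window
MAX_QUERIES_PER_WINDOW = 1     # each window may be queried only once


class WindowBudget:
    def __init__(self):
        self.spent = {}  # window start -> number of queries served

    def window_of(self, report_timestamp: int) -> int:
        # Truncate the (encryption-protected) timestamp to its window.
        return report_timestamp - report_timestamp % WINDOW_SECONDS

    def try_query(self, report_timestamp: int) -> bool:
        w = self.window_of(report_timestamp)
        if self.spent.get(w, 0) >= MAX_QUERIES_PER_WINDOW:
            return False  # budget spent: refuse the query
        self.spent[w] = self.spent.get(w, 0) + 1
        return True
```

Resubmitting the same reports in a differently sized batch would land them in an already-spent window, so the aggregator would refuse the query.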

@csharrison
Contributor Author

csharrison commented Jul 7, 2021

Can you elaborate on this? Is the idea here that when a client (that is, a real end user client, not a batching one) gets a 200 OK after posting a report to a batching client, the client can be assured that its report has been durably persisted somewhere? I think a leader server could provide a similar guarantee at the end of the upload phase so i'm trying to understand what extra assurances the batching client provides.

I think the use case is more that the collector is assured the system is working without requiring an interaction with the helpers. It is possible this case could be met by introducing some "do I have some reports" functionality, though.

IIUC the query flexibility is because the batching client can submit the same reports multiple times, in different-sized batches. If we decide this query flexibility is bad, we could mitigate this by having the original client include a report timestamp in the encrypted input, where it can't be tampered with by the batching client. Aggregators would then maintain query/privacy budgets per aggregation window and would be able to refuse queries on reports that fall in an aggregation window whose budget is already spent.

I think this is one part of it. There are a few ways this batching introduces flexibility, even if reports can only be queried once. Mainly this is via separating/combining reports across multiple in-the-clear dimensions (in our design we expose some info in the clear, like the advertiser site a user converted on). A collector could combine multiple small advertisers' reports together if they are individually too small to receive aggregate data. This is recoverable with a robust query model in the helpers, though it adds complexity.

Another example along these lines is time-based querying. One collector might want data on hour boundaries, another might want on 4 hour boundaries etc.
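The hour-boundary vs. 4-hour-boundary flexibility amounts to bucketing reports by an in-the-clear timestamp with different window sizes. A minimal illustration follows; the `bucket` helper is hypothetical, not protocol machinery, and assumes each report carries a cleartext timestamp alongside its encrypted payload.

```python
# Sketch of time-based query flexibility: the same stream of reports
# can be grouped on different boundaries (hourly for one consumer,
# 4-hourly for another) without any support in the aggregators.
from collections import defaultdict


def bucket(reports, window_seconds):
    """Group (timestamp, encrypted_report) pairs by window start."""
    buckets = defaultdict(list)
    for ts, blob in reports:
        buckets[ts - ts % window_seconds].append(blob)
    return dict(buckets)
```

One collector could call `bucket(reports, 3600)` and another `bucket(reports, 14400)` over the same stream, each getting its preferred boundaries.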

@cjpatton
Collaborator

Closed the PR, but here's where we left the discussion: #78 (comment)

@cjpatton
Collaborator

Seems like the protocol already has everything needed to address this issue. In a combined Collector-Leader deployment, the details of the upload protocol in the spec can probably just be disregarded. What matters for interop in that case is the aggregation and collection flows running between the Leader and Helper.

Closing as "won't fix". @csharrison please feel free to re-open if there's more to discuss.

@cjpatton closed this as not planned on Sep 20, 2023