WIP API #1

Merged
merged 3 commits into from
Sep 11, 2024
187 changes: 181 additions & 6 deletions api.bs
@@ -4,6 +4,7 @@ Shortname: Attribution
Repository: private-attribution/api
URL: https://private-attribution.github.io/api/
Editor: Martin Thomson, w3cid 68503, Mozilla https://mozilla.org/, [email protected]
Editor: Andy Leiserson, w3cid 147715, Mozilla https://mozilla.org/, [email protected]
Abstract: This specifies a browser API for the measurement of advertising performance. The goal is to produce aggregate statistics about how advertising leads to conversions, without creating a risk to the privacy of individual web users. This API collates information about people from multiple web origins, which could be a significant risk to their privacy. To manage this risk, the information that is gathered is aggregated using an aggregation service that is chosen by websites and trusted to perform aggregation within strict limits. Noise is added to the aggregates produced by this service to provide differential privacy.
Status Text: This specification is a proposal that is intended to be migrated to the W3C standards track. It is not a standard.
Text Macro: LICENSE <a href=http://www.w3.org/Consortium/Legal/2015/copyright-software-and-document>W3C Software and Document License</a>
@@ -97,23 +98,47 @@ New additions to the

TODO explain why we use histograms

* Compatibility with privacy-preserving aggregation systems
* Flexibility to assign buckets

* As histogram size increases, noise becomes a problem


# Overview of Operation # {#overview}

At impression time, information about an advertisement is saved by the browser in a write-only store.
This includes an identifier for the ad and some metadata about the ad,
such as whether the impression was an ad view or an ad click.
The private attribution API provides aggregate information about the
association between two classes of events: [=impressions=] and [=conversions=].

An <dfn>impression</dfn>, sometimes called a *source event*, is the
event to which [=conversion=]s are being attributed. Selection of impression
events is left to the consumer of the API. Examples include:

* Displaying an advertisement to a user.
* Viewing a particular web page.

A <dfn>conversion</dfn>, sometimes called a *trigger event*, is the
event being attributed to [=impression=]s. Selection of conversion events
is again left to the consumer of the API. Examples include:

* Signing up for an account.
* Making a purchase.

At conversion time, information for aggregation is created based on the impressions that were previously stored.
A site can request that the browser select impressions based on a simple query.
When an [=impression=] occurs, information about the impression is saved by the
browser in a write-only store. This includes an identifier for the impression
and some metadata about the impression, such as whether the impression was an
ad view or an ad click.
Member

Suggested change
ad view or an ad click.
ad view or an ad click.

The write-only piece is negotiable. I can see why a site couldn't modify its store of impressions, but it's also not really necessary if the reading by other sites is controlled properly.

The whole business with "what does the e-Privacy Directive say about storing information" is silly. It matters, of course, but not every regulatory system will grow a PECR-like piece of baggage.

Collaborator Author

The write-only term was actually in the original and I also had some doubts about it. I think it's an oversimplification. We should discuss the characteristics of the store somewhere (I think I added a section for that), but it takes more than just saying "write only" to capture the important characteristics.


At [=conversion=] time, information for aggregation is created based on the
impressions that were previously stored. A site can request that the browser
select impressions based on a simple query.

* If there was no matching impression,
or the [=privacy budget=] for the site is exhausted,
a histogram consisting entirely of zeros (0) is constructed.

* If a matching impression is found,
the specified value is added to a histogram
at the bucket that was specified for the ad at the time of the impression.
at the bucket that was specified at the time of the impression.
All other buckets are set to zero.
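
The two cases above can be sketched in JavaScript. This is an illustration only: `buildHistogram` and its parameter names are hypothetical and are not part of the proposed API, and the handling of an out-of-range index is an assumption.

```javascript
// Hypothetical helper, not part of the proposed API.
// `impression` is the matching impression (or null if none was found),
// `value` is the value to attribute, and `size` is the number of buckets.
function buildHistogram(impression, value, size, budgetExhausted) {
  const histogram = new Array(size).fill(0);
  if (impression === null || budgetExhausted) {
    // No match, or no privacy budget left: the histogram is all zeros.
    return histogram;
  }
  // Assumption: an index outside the histogram contributes nothing.
  if (impression.index < size) {
    histogram[impression.index] = value;
  }
  return histogram;
}

const matched = buildHistogram({ index: 3 }, 1, 5, false);
const unmatched = buildHistogram(null, 1, 5, false);
const exhausted = buildHistogram({ index: 3 }, 1, 5, true);
```

In this sketch both failure cases produce a histogram of the same size as a successful match, so the output shape alone does not reveal whether an impression was found.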

The resulting histogram is prepared for aggregation according to the requirements
@@ -142,8 +167,147 @@ The aggregation service:

# API Details # {#api}

Open questions:
* Filter/query language
* Reports are sent to aggregation system directly, or via conversion site? Or
option of either? => via conversion site
* Epochs

TODO

## ListAggregationSystems API ## {#list-aggregation-systems-api}

navigator.privateAttribution.listAggregationSystems()

<xmp class=idl>
dictionary PrivateAttributionAggregationSystem {
required DOMString id;
};
Member

TODO: Work out whether a PrivateAttributionAggregationSystem (wow, right name, hard to type) needs any additional attributes.

Collaborator Author

Is it possible that we can omit the PrivateAttribution prefix from these definitions? I'm not sure where all they might flow to and whether there is namespacing other than from the prefix.

I ask because I think PrivateAttributionImpressionOptions needs to be further refined as something like PrivateAttributionImpressionOptionsLevel1 (to specify the bag of attributes for the level 1 API), which is really quite a mouthful.

</xmp>

## SaveImpression API ## {#save-impression-api}

<pre>
navigator.privateAttribution.saveImpression({
type: "view", // either "view" or "click"
index: 3, // the histogram index for counting this impression
ad: "sample-campaign-eijb", // a unique identifier for the ad placement
target: "advertiser.example", // the advertiser site where a conversion will occur
});
</pre>

Add:
* attribution system
* TTL
* DP parameters

Questions:
* Revisit the set of impression types. Can we get rid of it, and put it in the
ad ID? Or generalize to "attribution constraint"?

<xmp class=idl>
enum PrivateAttributionImpressionType { "view", "click" };

dictionary PrivateAttributionImpressionOptions {
PrivateAttributionImpressionType type = "view";
required unsigned long index;
required DOMString ad;
required DOMString target;
};

[SecureContext, Exposed=Window]
interface PrivateAttribution {
[Throws] undefined saveImpression(DOMString aggregationSystemId, PrivateAttributionImpressionOptions options);
Member

Suggested change
[Throws] undefined saveImpression(DOMString aggregationSystemId, PrivateAttributionImpressionOptions options);
[Throws] undefined saveImpression(PrivateAttributionImpressionOptions options);

Given that we have a bucket of fields, some of which are mandatory, let's move the aggregation system into the bucket.

Except... We don't need to know which system is being used, do we? I don't think we care: budgeting happens at the point of conversion, so we probably don't care where conversion data is sent as long as the aggregation system is trustworthy. But at impression time, we really don't care. Yet. In the full API we will. But the bag of attributes method lets us solve that problem neatly.

Collaborator Author

I guess we don't strictly need it here in the early binding case, but given that we expect to have it eventually, it seems simpler to include it. It seems better to encourage users to register an impression for each aggregator than to set a precedent of supporting the model where the same impression can be referenced by multiple different aggregators at conversion time. There isn't a lot of complexity in the impression data right now, but what happens if the aggregators want to set conditions (like a limit on impression TTL, or the structure of ad IDs)?

};
</xmp>

Implicit saveImpression API inputs:
* Timestamp (epoch?)
* Source site


### Operation ### {#save-impression-api-operation}

1. Validate inputs
Member

Suggested change
1. Validate inputs
To <dfn>save an impression</dfn> given `impressionOptions`:
1. Validate inputs.

2. If the private attribution API is not enabled, discard the impression data.
3. Save the impression to the store.
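
A minimal sketch of these three steps, assuming a plain in-memory array as the store. The function signature, the `enabled` flag, and the stored record layout are hypothetical; only the dictionary member names (`type`, `index`, `ad`, `target`) come from the IDL above. Real WebIDL bindings convert argument types rather than rejecting them, so the strict checks here are a simplification.

```javascript
// Hypothetical in-memory sketch; not the browser implementation.
const IMPRESSION_TYPES = ["view", "click"];

// Simplified stand-in for WebIDL `unsigned long` handling.
function isUnsignedLong(v) {
  return Number.isInteger(v) && v >= 0 && v <= 0xffffffff;
}

function saveImpression(store, enabled, sourceSite, options, now = Date.now()) {
  // 1. Validate inputs (type defaults to "view", as in the IDL).
  const { type = "view", index, ad, target } = options;
  if (!IMPRESSION_TYPES.includes(type)) throw new TypeError("invalid type");
  if (!isUnsignedLong(index)) throw new TypeError("invalid index");
  if (typeof ad !== "string" || typeof target !== "string") {
    throw new TypeError("ad and target are required strings");
  }
  // 2. If the API is not enabled, discard the data without signalling it.
  if (!enabled) return;
  // 3. Save the impression, together with the implicit inputs
  //    (source site and timestamp).
  store.push({ type, index, ad, target, source: sourceSite, timestamp: now });
}

const store = [];
saveImpression(store, true, "publisher.example", {
  index: 3,
  ad: "sample-campaign-eijb",
  target: "advertiser.example",
}, 1000);
```

Validation runs before the enabled check in this sketch, so a caller sees the same errors whether or not the user has opted out.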


## MeasureConversion API ## {#measure-conversion-api}

TODO:
* Add conversion value
* Change filter data


navigator.privateAttribution.measureConversion({
// the number of buckets in the histogram
histogramSize: 20,

// only consider impressions within the last N days
lookbackDays: 30,
// the type of impression to match against (if omitted, match either)
impression: "view",
// a list of possible ad identifiers that can be attributed
ads: ["sample-campaign-eijb"],
// a list of sites where impressions might have been registered
sources: ["publisher.example"]
});
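
How these options might select an impression can be sketched as below, using the member names from the `PrivateAttributionConversionOptions` dictionary. The draft does not yet define the query semantics, so several points are assumptions: `findMatchingImpression` is a hypothetical name, an empty `ads` or `sources` list is treated as "no restriction", the stored `target` must equal the conversion site, and the most recent candidate wins.

```javascript
// Hypothetical matching sketch; the filter/query language is still an open question.
const DAY_MS = 24 * 60 * 60 * 1000;

function findMatchingImpression(store, targetSite, options, now) {
  const { lookbackDays = Infinity, impression, ads = [], sources = [] } = options;
  const candidates = store.filter((imp) =>
    // Assumption: only impressions registered for this conversion site count.
    imp.target === targetSite &&
    // Only consider impressions within the last N days.
    now - imp.timestamp <= lookbackDays * DAY_MS &&
    // If the impression type is omitted, match either type.
    (impression === undefined || imp.type === impression) &&
    // Assumption: an empty list places no restriction on that field.
    (ads.length === 0 || ads.includes(imp.ad)) &&
    (sources.length === 0 || sources.includes(imp.source)));
  // Assumption: pick the most recent candidate; null if none matched.
  return candidates.reduce(
    (best, imp) => (best === null || imp.timestamp > best.timestamp ? imp : best),
    null);
}

const now = 100 * DAY_MS;
const base = { ad: "sample-campaign-eijb", target: "advertiser.example", source: "publisher.example" };
const impressions = [
  { ...base, type: "view", index: 1, timestamp: now - 40 * DAY_MS },
  { ...base, type: "click", index: 2, timestamp: now - 10 * DAY_MS },
  { ...base, type: "view", index: 3, timestamp: now - 20 * DAY_MS },
];
const match = findMatchingImpression(impressions, "advertiser.example", {
  lookbackDays: 30,
  impression: "view",
  ads: ["sample-campaign-eijb"],
  sources: ["publisher.example"],
}, now);
```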


<xmp class=idl>
dictionary PrivateAttributionConversionOptions {
required unsigned long histogramSize;

unsigned long lookbackDays = Infinity;
PrivateAttributionImpressionType impression;
Member

Suggested change
PrivateAttributionImpressionType impression;
required DOMString aggregationSystemId;

Now, I don't think that this is a string that someone will be happy typing often:

let whatsit = await privateAttribution.measureConversion({aggregationSystemId: "joe", ...});

That's already unwieldy.

Collaborator Author

How about shortening it to aggregator?

sequence<DOMString> ads = [];
sequence<DOMString> sources = [];
};

[SecureContext, Exposed=Window]
partial interface PrivateAttribution {
[Throws] ArrayBufferView measureConversion(DOMString aggregationSystemId, PrivateAttributionConversionOptions options);
};
</xmp>


Implicit MeasureConversion API inputs:
* Timestamp (epoch?)
* Target site

### Operation ### {#measure-conversion-api-operation}

1. Validate inputs.
2. Set reportedConversionValue = 0.
3. If the private attribution API is enabled, search for a matching impression.
4. If a matching impression was found:
    1. Set histogramIndex to the value from the matching impression.
    2. Set reportedConversionValue to the smaller of the following:
        1. The conversion value passed to the MeasureConversion API.
        2. The limit on conversion value determined by the remaining privacy budget.
5. Update the privacy budget store to reflect the reported conversion value.
6. Construct a report from reportedConversionValue, histogramIndex, and histogramSize.
7. Encrypt the report.
8. Return the encrypted report.
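
Steps 2 through 6 can be sketched as follows. This is a rough illustration under stated assumptions: the privacy budget is modelled as a single number in the same units as the conversion value, the report is a plain array, and `measureConversionSketch` and its argument names are hypothetical. Input validation, impression matching, and encryption (steps 1, 3, 7, and 8) are out of scope here.

```javascript
// Hypothetical sketch of the budget-capping and report-construction steps.
// `match` is the matching impression, or null if none was found.
function measureConversionSketch({ enabled, match, conversionValue, histogramSize, budget }) {
  // Step 2: start from a reported value of zero.
  let reportedConversionValue = 0;
  // Assumption: the index is irrelevant while the reported value is zero.
  let histogramIndex = 0;
  // Steps 3-4: only a match found while the API is enabled contributes.
  if (enabled && match !== null) {
    histogramIndex = match.index;
    // Cap the value at whatever the remaining budget allows.
    reportedConversionValue = Math.min(conversionValue, budget.remaining);
  }
  // Step 5: charge the reported value against the budget.
  budget.remaining -= reportedConversionValue;
  // Step 6: build the (unencrypted) report as a histogram.
  const histogram = new Array(histogramSize).fill(0);
  if (histogramIndex < histogramSize) {
    histogram[histogramIndex] += reportedConversionValue;
  }
  return histogram;
}

const budget = { remaining: 5 };
const args = { enabled: true, match: { index: 2 }, conversionValue: 3, histogramSize: 4, budget };
const first = measureConversionSketch(args);  // full value fits in the budget
const second = measureConversionSketch(args); // value is capped by the remainder
const third = measureConversionSketch(args);  // budget is exhausted
```

Every call returns a histogram of the requested size, including when the API is disabled or the budget is exhausted, which is consistent with the goal that opting out should be undetectable.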


## Impression database ## {#impression-database}



## User control and visibility ## {#user-control}

* Users should be able to opt out. Opt out should be undetectable.
* User ability to view the impression store.

# Implementation Considerations # {#implementation-considerations}

* Management and distribution of values for the following:
* Histogram size
Member

Two things on this that are somewhat in tension:

  1. When it comes down to it, the browser shouldn't really care how big histograms are. Sites get to do basically whatever with CPU and network resources and it's hardly going to be that bad if a site wants a really big histogram.
  2. I don't like the idea of really big histograms that basically make a mockery of our aggregation aspirations.

However, I think that the resolution of that tension is in the MPC (DAP specifically). We might want to ask aggregation service operators to limit histogram size as some function of the number of reports they aggregate. (I still somewhat like the idea of refusing to return a value if an aggregate is smaller than some threshold, even though it is perverse from a strictly DP interpretation perspective.) If the aggregator supports more, then the API should not be where limits are applied. Also, different browsers will have different views on this.

Collaborator Author

If we're using DAP, then doesn't the browser need to know the histogram size to construct the report?

Member

Yes, we'll need to know what size to make. We'll ask the site to give us the size they want.

But we don't need to validate or limit that. If it is the wrong size, that's their problem.

* Target site for impressions
* Source site for conversions
* Ad IDs

# Aggregation # {#aggregation}

@@ -185,6 +349,17 @@ TODO

TODO

* Browser security
* Clearing of impression store
* Partitioning of impression store
* Interaction with private browsing modes
* Interaction with telemetry opt-outs
* Timing attacks on APIs

* Aggregation system security

* Fraud and abuse


# Acknowledgements # {#ack}
