@bmcase (Contributor) commented Sep 4, 2025

With @roxanageambasu and @tholop we have put together some text for the spec that states what assumptions are used in the system model in which the formal DP analysis is done.

The referenced paper is already available, but a few claims in the PR are not yet reflected there. We will update the arXiv version to align with the PR shortly. In the meantime, we confirm that the statements are accurate and supported by analyses we already have internally.



@martinthomson added the "discuss" (Needs working group discussion) label Sep 4, 2025
Comment on lines 2355 to 2362
Assumption 1 is necessary because the system involves multiple sites that could interact
with the same user over time and change the ads they show to the user, or impact the
conversions the user has, based on each other’s DP measurements. For example, if one advertiser
learns, from DP measurements, to make an ad more effective, a user may convert on their site
rather than a competitor’s. In this case, the first site’s DP outputs -- counted only against
its own per-site budget -- alter the data (or absence of data) visible to the competitor, yet
this impact is not reflected in the competitor’s per-site budget. When Assumption 1 is violated,
the analysis shows that per-site guarantees cannot be achieved.
Member

This is part of the assumption, but I think that the main challenge here is different. Sites might gain an understanding that a particular visitor to each site in a set is the same person (due to federated login, same email address, or anything including navigation tracking which we can't stop). AND THEN they decide to pool their per-site budgets to use the API to extract more information about that person. In that case, we have no defense from the per-site budget. Sites are only limited by their ability to link activity across sites (which is too easy, as noted) and then the global budget.

So we should acknowledge that limitation as well as the more theoretical one here.
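
As a rough illustration of the pooling concern, here is a toy sketch. It is not the spec's algorithm: it assumes a hypothetical per-device ledger with one epsilon budget per site, a single global limit, and plain additive composition. The names `BudgetLedger`, `try_spend`, and `pooled_spend`, the site names, and all numeric values are made up for illustration.

```python
from collections import defaultdict


class BudgetLedger:
    """Toy per-device ledger: one epsilon budget per site plus one global limit."""

    def __init__(self, per_site_epsilon, global_epsilon):
        self.per_site_epsilon = per_site_epsilon
        self.global_epsilon = global_epsilon
        self.spent_by_site = defaultdict(float)
        self.spent_globally = 0.0

    def try_spend(self, site, epsilon):
        """Deduct epsilon for `site` only if both limits allow it."""
        if self.spent_by_site[site] + epsilon > self.per_site_epsilon:
            return False  # per-site budget exhausted
        if self.spent_globally + epsilon > self.global_epsilon:
            return False  # global limit exhausted
        self.spent_by_site[site] += epsilon
        self.spent_globally += epsilon
        return True


def pooled_spend(ledger, sites, epsilon_per_query):
    """Total epsilon a set of colluding sites manages to spend on one device."""
    total = 0.0
    for site in sites:
        while ledger.try_spend(site, epsilon_per_query):
            total += epsilon_per_query
    return total


# One site is capped by its own per-site budget.
print(pooled_spend(BudgetLedger(1.0, 2.5), ["a.example"], 0.5))  # 1.0
# Three linked sites pool their budgets until the global limit stops them.
print(pooled_spend(BudgetLedger(1.0, 2.5),
                   ["a.example", "b.example", "c.example"], 0.5))  # 2.5
```

In this toy model the per-site budget does nothing to stop sites that can link a user, and the only remaining bound is the global limit, which is the point made above.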

Contributor Author

@martinthomson I added a paragraph at the end of this section to capture this additional challenge an adversary is faced with. Let me know if you think that still needs any adjusting.

In addition to facing safety limits discussed above, an attacker using multiple colluding sites to gain
more DP budget about users also faces the practical limitation of being able to link a user across sites.
This limitation does not itself provide a theoretical DP benefit but does impose a significant
challenge to the attacker when the user agent has made it difficult to link users across sites.

@bmcase removed the "discuss" (Needs working group discussion) label Oct 3, 2025
@tholop commented Oct 15, 2025

Our updated analysis is now available on arXiv: https://arxiv.org/pdf/2506.05290v2. Section 3 is about per-site guarantees, and Section 3.4 specifies the assumptions under which they hold.
cc @roxanageambasu @bmcase

@bmcase (Contributor Author) commented Dec 4, 2025

Updated the PR to address all the points of feedback above, including a summary of both papers and a paragraph to address the challenge of cross-site linking. Cleaned up the markdown format and bib so that all checks are passing now.

We previously discussed this PR in a PATWG meeting and all were supportive of it.

I think we should be ready to merge this PR now. cc @martinthomson, @csharrison, @apasel422, @andyleiserson

@martinthomson (Member) left a comment

Thanks for doing the work here.

I'm not sure about your presentation of assumptions ahead of what I consider to be the real results. Especially the major limitation in the analysis, which is that the threat model doesn't include sites that can coordinate. (To be clear, I think that it's a very reasonable limitation to assume when doing this analysis, anything else breaks down, but it's a big part of why the global safety limits are so important to the overall design.)

The assumptions are largely theoretical, but they sort of hide the main results. That is that global safety limits are effective and don't rely on any assumptions. I would move that right up to the top.

Then, I think that there are three assumptions you want to present. The first is the important one about coordinating sites, which isn't really a theoretical thing. You can present it more as I did, as more of a limitation of the threat model adopted in the analysis. But I think that it needs to be the second thing you say.

Then you can discharge the two theoretical points, which are very good overall, but I don't think that they have any practical effect, other than being necessary for someone to understand.

Comment on lines +2352 to +2354
which they hold). Per-site budgets include [=site=] in the [=privacy unit=], whereas safety
limits exclude it thereby enforcing a global individual DP guarantee. In Attribution Level 1
it is conversion sites that have per-site budgets tracked.
Member

I think you need to be more direct about the connection to the specification.

Given where we are, this might need to be:

Suggested change
which they hold). Per-site budgets include [=site=] in the [=privacy unit=], whereas safety
limits exclude it thereby enforcing a global individual DP guarantee. In Attribution Level 1
it is conversion sites that have per-site budgets tracked.
which they hold).
The per-site budgets, which include [=site=] in the [=privacy unit=],
are based on the restricted analysis in [[PPA-DP]].
The introduction of <dfn>global safety limits</dfn>, which exclude [=site=],
creates a global DP guarantee.
The current version of the document does not define
the application of [=global safety limits=].

1. *No leakage through cross-site shared limits.* Queries from one site must not affect which
reports are emitted to others.

Assumption 1 is necessary because the system involves multiple sites that could interact
Member

Suggested change
Assumption 1 is necessary because the system involves multiple sites that could interact
The assumption that sites cannot adapt their queries is necessary
because the system involves multiple sites that could interact

api.bs Outdated
Assumption 1 is necessary because the system involves multiple sites that could interact
with the same user over time and change the ads they show to the user, or impact the
conversions the user has, based on each other’s DP measurements. For example, if one advertiser
learns, from DP measurements, to make an ad more effective, a user may convert on their site
Member

Suggested change
learns, from DP measurements, to make an ad more effective, a user may convert on their site
learns generally-applicable information that helps them make their ads more effective,
that will make it more likely that their ads are attributed for conversions,
as opposed to a competitor.

Here, "DP measurements" refers to measureConversion() as well. Watch out for that.

Contributor Author

This is a case where we are talking about the aggregate results from many devices that you get back from the aggregation service.

I feel we need a term in the spec for the final aggregate DP results that go back to the advertiser. Maybe "query" is too general, but something like "DP attribution results" or "DP measurements" should be clear, and maybe we need to define that somewhere in the intro.

Contributor Author

Took a stab at defining "attribution result" in the intro and trying to use this whenever we mean the final outputs learned by the advertiser.

this impact is not reflected in the competitor’s per-site budget. When Assumption 1 is violated,
the analysis shows that per-site guarantees cannot be achieved.

Assumption 2 is necessary when we have shared limits that span multiple sites. An example of
Member

Suggested change
Assumption 2 is necessary when we have shared limits that span multiple sites. An example of
An assumption that sites are unable to coordinate their use of the API is necessary
when we have shared limits that span multiple sites. An example of

Contributor Author

What I'm trying to say here is just that if you want to have shared limits you have to make Assumption 2 for the per-site budgets to hold.

api.bs Outdated

Assumption 2 is necessary when we have shared limits that span multiple sites. An example of
such shared limits are the global safety limits that aim to provide a global DP guarantee.
If queries from some sites cause a shared limit to be reached, reports to other sites may be
Member

Again, queries... If you want to use that word, it might be necessary to explain it up front.

Contributor Author

Here we can use measureConversion, as this is talking about what happens on a single device.

Comment on lines +2390 to +2393
In addition to facing safety limits discussed above, an attacker using multiple colluding sites to gain
more DP budget about users also faces the practical limitation of being able to link a user across sites.
This limitation does not itself provide a theoretical DP benefit but does impose a significant
challenge in practice to the attacker when the user agent has made such cross-site linking difficult.
Member

Suggested change
In addition to facing safety limits discussed above, an attacker using multiple colluding sites to gain
more DP budget about users also faces the practical limitation of being able to link a user across sites.
This limitation does not itself provide a theoretical DP benefit but does impose a significant
challenge in practice to the attacker when the user agent has made such cross-site linking difficult.
An attacker that is able to use other information to link the activity of a user across multiple sites
can use the DP budgets of those sites to overcome the constraints of the per-site budgets.
There are many features in the web platform that allow this capability,
so this is a very plausible attack on the privacy design.
The privacy analysis does not consider this attack within its threat model,
relying exclusively on [=global safety limits=].
Any implementation needs to consider this limitation when selecting DP parameters.

This is the second thing I would say, which is a major limitation of the design and something that is important to understand about the interaction between per-site and global budgets.

This is something that implementations have to consider when they set parameters.

Contributor Author

I don't agree with this suggested text, but see my longer comment below for why.

An attacker that is able to use other information to link the activity of a user across multiple sites
can use the DP budgets of those sites to overcome the constraints of the per-site budgets.

An attacker who can cross-site identify the user across sites would be able to learn more about the user across sites because of that linkage; but not because of the API. The incremental information they learn by using the API is worst case bounded by the composition of the per-site budgets involved under the assumptions here.

Member

An attacker who can cross-site identify the user across sites would be able to learn more about the user across sites because of that linkage; but not because of the API.

This is true, but only for those sites on which the attacker can gather information. If there is another site that the attacker has no information about, having multiple vantage points -- or multiple budgets -- to use this API from gives them greater information than they would have from a single vantage point.

So both apply.

Comment on lines +2386 to +2388
By contrast, the analysis shows that *safety limits* -- which operate at global level,
excluding [=site=] from the [=privacy unit=] -- can be implemented to deliver *sound global individual
DP guarantees* regardless of whether either assumption is satisfied.
Member

Consider defining this as [=global safety limits=], per above.

Suggested change
By contrast, the analysis shows that *safety limits* -- which operate at global level,
excluding [=site=] from the [=privacy unit=] -- can be implemented to deliver *sound global individual
DP guarantees* regardless of whether either assumption is satisfied.
The analysis shows that [=global safety limits=] --
which do not have a [=site=]-specific [=privacy unit=] --
deliver sound individual DP guarantees
without relying on either of these assumptions.

Importantly, after introducing the analyses and some context, this is the first thing I would say. It's a simple statement that is easy to understand.

Contributor Author

sure, we can move this up top if we want to start with what holds without any assumptions.

@bmcase (Contributor Author) commented Dec 5, 2025

@martinthomson, let me clarify a little

I'm not sure about your presentation of assumptions ahead of what I consider to be the real results. Especially the major limitation in the analysis, which is that the threat model doesn't include sites that can coordinate. (To be clear, I think that it's a very reasonable limitation to assume when doing this analysis, anything else breaks down, but it's a big part of why the global safety limits are so important to the overall design.)

The threat model in the paper does not include a limitation that assumes sites can't collude. It is a worst-case threat model that does assume sites can collude. What these assumptions address is when the per-site filter provides a DP guarantee, which in turn tells you how budget composes when sites do collude.

Under the two assumptions here (1. no cross-site adaptivity in data generation and 2. no leakage through cross-site shared limits), the per-site filters provide an individual DP (IDP) guarantee. Because of this, two colluding sites will have their budgets compose to 2x the per-site filter. If these filters did not provide an IDP guarantee, collusion could yield even more than a 2x composition of budget.
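
One way to write down the 2x statement, assuming basic (additive) composition and the same per-site budget for every site. The symbols $\epsilon_{\text{site}}$, $\epsilon_{\text{colluding}}$, and $k$ are illustrative notation, not taken from the paper:

```latex
\epsilon_{\text{colluding}}(k) \;\le\; \sum_{i=1}^{k} \epsilon_{\text{site}} \;=\; k\,\epsilon_{\text{site}},
\qquad\text{so for } k = 2:\quad \epsilon_{\text{colluding}}(2) \;\le\; 2\,\epsilon_{\text{site}} .
```

Without the IDP guarantee from the per-site filters, this linear bound in the number of colluding sites $k$ would not be available.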

As for sites trying to track the user across sites, I think the situation is one of:

  1. They can cross-site identify the user, in which case why even bother using this API to try to gain more information?
  2. They can't cross-site identify the user, in which case colluding sites will have a hard time in practice putting their theoretical 2x composed budget to use against a single user.
  3. Somewhere in between 1) and 2), where cross-site signals give them some probabilistic ability to track the user across sites. In that case using the API may give more information, but the information gained is bounded in the worst case by their 2x composed budget.

@martinthomson (Member)

That 2 sites = 2x the budget is the message that I was looking for.

Comment on lines +2350 to +2351
(Section 3 is about per-site guarantees and Section 3.4 specifies the assumptions under
which they hold). Per-site budgets include [=site=] in the [=privacy unit=], whereas safety
Member

Suggested change
(Section 3 is about per-site guarantees and Section 3.4 specifies the assumptions under
which they hold). Per-site budgets include [=site=] in the [=privacy unit=], whereas safety
Section 3 of [[PPA-DP-2]] addresses per-site guarantees
and Section 3.4 specifies the assumptions under which those guarantees hold.
Per-site budgets include [=site=] in the [=privacy unit=], whereas safety

