Add text on DP formal analysis and its assumptions #271

bmcase · 2025-09-04T15:34:50Z

With @roxanageambasu and @tholop we have put together some text for the spec that states what assumptions are used in the system model in which the formal DP analysis is done.

The referenced paper is already available, but a few claims in the PR are not yet reflected there. We will update the arXiv version to align with the PR shortly. In the meantime, we confirm that the statements are accurate and supported by analyses we already have internally.


DP assumptions for formal proofs

api.bs

martinthomson · 2025-09-04T22:01:41Z

api.bs

+Assumption 1 is necessary because the system involves multiple sites that could interact
+with the same user over time and change the ads they show to the user, or impact the
+conversions the user has, based on each other’s DP measurements. For example, if one advertiser
+learns, from DP measurements, to make an ad more effective, a user may convert on their site
+rather than a competitor’s. In this case, the first site’s DP outputs -- counted only against
+its own per-site budget -- alter the data (or absence of data) visible to the competitor, yet
+this impact is not reflected in the competitor’s per-site budget. When Assumption 1 is violated,
+the analysis shows that per-site guarantees cannot be achieved.


This is part of the assumption, but I think that the main challenge here is different. Sites might gain an understanding that a particular visitor to each site in a set is the same person (due to federated login, same email address, or anything including navigation tracking which we can't stop). AND THEN they decide to pool their per-site budgets to use the API to extract more information about that person. In that case, we have no defense from the per-site budget. Sites are only limited by their ability to link activity across sites (which is too easy, as noted) and then the global budget.

So we should acknowledge that limitation as well as the more theoretical one here.

@martinthomson I added a paragraph at the end of this section to capture this additional challenge an adversary is faced with. Let me know if you think that still needs any adjusting.

In addition to facing safety limits discussed above, an attacker using multiple colluding sites to gain
more DP budget about users also face the practical limitation of being able to link a user across sites.
This is limitation does not itself provide a theoretical DP benefit but does impose a significant
challenge to the attacker when the user agent has made it difficult to link users across sites.

api.bs

tholop · 2025-10-15T01:42:37Z

Our updated analysis is now available on arXiv: https://arxiv.org/pdf/2506.05290v2. Section 3 is about per-site guarantees, and Section 3.4 specifies the assumptions under which they hold.
cc @roxanageambasu @bmcase

add a short description of both DP papers

bmcase · 2025-12-04T03:59:37Z

Updated the PR to address all the points of feedback above, including a summary of both papers and a paragraph addressing the challenge of cross-site linking. Cleaned up the markdown format and bib so that all checks are now passing.

We previously discussed this PR in a PATWG meeting and all were supportive of it.

I think we should be ready to merge this PR now. cc @martinthomson , @csharrison , @apasel422 , @andyleiserson

martinthomson

Thanks for doing the work here.

I'm not sure about your presentation of assumptions ahead of what I consider to be the real results. Especially the major limitation in the analysis, which is that the threat model doesn't include sites that can coordinate. (To be clear, I think that it's a very reasonable limitation to assume when doing this analysis, anything else breaks down, but it's a big part of why the global safety limits are so important to the overall design.)

The assumptions are largely theoretical, but they sort of hide the main results. That is that global safety limits are effective and don't rely on any assumptions. I would move that right up to the top.

Then, I think that there are three assumptions you want to present. The first is the important one about coordinating sites, which isn't really a theoretical thing. You can present it more as I did, as more of a limitation of the threat model adopted in the analysis. But I think that it needs to be the second thing you say.

Then you can discharge the two theoretical points, which are very good overall, but I don't think that they have any practical effect, other than being necessary for someone to understand.

api.bs

martinthomson · 2025-12-04T04:06:52Z

api.bs

+which they hold). Per-site budgets include [=site=] in the [=privacy unit=], whereas safety
+limits exclude it thereby enforcing a global individual DP guarantee.  In Attribution Level 1
+it is conversion sites that have per-site budgets tracked.


I think you need to be more direct about the connection to the specification.

Given where we are, this might need to be:

Suggested change

-which they hold). Per-site budgets include [=site=] in the [=privacy unit=], whereas safety
-limits exclude it thereby enforcing a global individual DP guarantee. In Attribution Level 1
-it is conversion sites that have per-site budgets tracked.
+which they hold).
+The per-site budgets include [=site=] in the [=privacy unit=]
+are based on the restricted analysis in [[PPA-DP]].
+The introduction of <dfn>global safety limits</dfn> exclude [=site=],
+which creates a global DP guarantee.
+The current version of the document does not define
+the application of [=global safety limits=].

api.bs

martinthomson · 2025-12-04T04:07:41Z

api.bs

+1.  *No leakage through cross-site shared limits.* Queries from one site must not affect which
+    reports are emitted to others.
+
+Assumption 1 is necessary because the system involves multiple sites that could interact


Suggested change

-Assumption 1 is necessary because the system involves multiple sites that could interact
+The assumption that sites cannot adapt their queries is necessary
+because the system involves multiple sites that could interact

martinthomson · 2025-12-04T04:11:02Z

api.bs

+Assumption 1 is necessary because the system involves multiple sites that could interact
+with the same user over time and change the ads they show to the user, or impact the
+conversions the user has, based on each other’s DP measurements. For example, if one advertiser
+learns, from DP measurements, to make an ad more effective, a user may convert on their site


Suggested change

-learns, from DP measurements, to make an ad more effective, a user may convert on their site
+learns generally-applicable information that helps them make their ads more effective,
+that will make it more likely that their ads are attributed for conversions,
+as opposed to a competitor.

Here, "DP measurements" refers to measureConversion() as well. Watch out for that.

This is a case where we are talking about the aggregate results, computed over many devices, that you get back from the aggregation service.

I feel we need a term in the spec for the final aggregate DP results that go back to the advertiser. Maybe "query" is too general, but something like "DP attribution results" or "DP measurements" should be clear, and maybe we need to define that somewhere in the intro.

Took a stab at defining "attribution result" in the intro and trying to use this whenever we mean the final outputs learned by the advertiser.

martinthomson · 2025-12-04T04:12:30Z

api.bs

+this impact is not reflected in the competitor’s per-site budget. When Assumption 1 is violated,
+the analysis shows that per-site guarantees cannot be achieved.
+
+Assumption 2 is necessary when we have shared limits that span multiple sites. An example of


Suggested change

-Assumption 2 is necessary when we have shared limits that span multiple sites. An example of
+An assumption that sites are unable to coordinate their use of the API is necessary
+when we have shared limits that span multiple sites. An example of

What I'm trying to say here is just that if you want to have shared limits you have to make Assumption 2 for the per-site budgets to hold.

martinthomson · 2025-12-04T04:13:05Z

api.bs

+
+Assumption 2 is necessary when we have shared limits that span multiple sites. An example of
+such shared limits are the global safety limits that aim to provide a global DP guarantee.
+If queries from some sites cause a shared limit to be reached, reports to other sites may be


Again, queries... If you want to use that word, it might be necessary to explain it up front.

Here we can use measureConversion, as this is talking about what happens on a single device.
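To make the single-device case concrete, here is a toy sketch of how a shared limit can let one site's calls affect what another site receives. This is not the algorithm in the spec or the paper: the `ToyDevice` class, its method name, the site names, and all epsilon values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDevice:
    """Hypothetical single-device budget tracker (illustration only)."""

    per_site_epsilon: float   # assumed per-site budget
    global_epsilon: float     # assumed shared limit spanning all sites
    spent_per_site: dict = field(default_factory=dict)
    spent_global: float = 0.0

    def measure_conversion(self, site: str, epsilon: float) -> bool:
        """Return True if the request fits within both budgets.

        A False result stands in for "the report to this site is affected".
        """
        site_spent = self.spent_per_site.get(site, 0.0)
        if site_spent + epsilon > self.per_site_epsilon:
            return False  # the site's own budget is exhausted
        if self.spent_global + epsilon > self.global_epsilon:
            return False  # the shared limit was reached, possibly by other sites
        self.spent_per_site[site] = site_spent + epsilon
        self.spent_global += epsilon
        return True


device = ToyDevice(per_site_epsilon=1.0, global_epsilon=1.5)
print(device.measure_conversion("a.example", 1.0))  # True
# b.example has spent nothing, yet its request is refused because
# a.example already consumed most of the shared limit.
print(device.measure_conversion("b.example", 1.0))  # False
```

In this toy model, the outcome for `b.example` depends on what `a.example` did, and that dependence is not charged to the per-site budget of `b.example`. That is the kind of cross-site effect the second assumption rules out.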

martinthomson · 2025-12-04T04:19:17Z

api.bs

+In addition to facing safety limits discussed above, an attacker using multiple colluding sites to gain
+more DP budget about users also face the practical limitation of being able to link a user across sites.
+This is limitation does not itself provide a theoretical DP benefit but does impose a significant
+challenge in practice to the attacker when the user agent has made such cross-site linking difficult.


Suggested change

-In addition to facing safety limits discussed above, an attacker using multiple colluding sites to gain
-more DP budget about users also face the practical limitation of being able to link a user across sites.
-This is limitation does not itself provide a theoretical DP benefit but does impose a significant
-challenge in practice to the attacker when the user agent has made such cross-site linking difficult.
+An attacker that is able to use other information to link the activity of a user across multiple sites
+can use the DP budgets of those sites to overcome the constraints of the per-site budgets.
+There are many features in the web platform that allow this capability,
+so this is a very plausible attack on the privacy design.
+The privacy analysis does not consider this attack within its threat model,
+relying exclusively on [=global safety limits=].
+Any implementation needs to consider this limitation when selecting DP parameters.

This is the second thing I would say, which is a major limitation of the design and something that is important to understand about the interaction between per-site and global budgets.

This is something that implementations have to consider when they set parameters.

I don't agree with this suggested text, but see my longer comment below for why.

An attacker that is able to use other information to link the activity of a user across multiple sites
can use the DP budgets of those sites to overcome the constraints of the per-site budgets.

An attacker who can cross-site identify the user across sites would be able to learn more about the user across sites because of that linkage; but not because of the API. The incremental information they learn by using the API is worst case bounded by the composition of the per-site budgets involved under the assumptions here.

An attacker who can cross-site identify the user across sites would be able to learn more about the user across sites because of that linkage; but not because of the API.

This is true, but only for those sites on which the attacker can gather information. If there is another site that the attacker has no information about, having multiple vantage points -- or multiple budgets -- to use this API from gives them greater information than they would have from a single vantage point.

So both apply.

martinthomson · 2025-12-04T04:21:06Z

api.bs

+By contrast, the analysis shows that *safety limits* -- which operate at global level,
+excluding [=site=] from the [=privacy unit=] -- can be implemented to deliver *sound global individual
+DP guarantees* regardless of whether either assumption is satisfied.


Consider defining this as [=global safety limits=], per above.

Suggested change

-By contrast, the analysis shows that *safety limits* -- which operate at global level,
-excluding [=site=] from the [=privacy unit=] -- can be implemented to deliver *sound global individual
-DP guarantees* regardless of whether either assumption is satisfied.
+The analysis shows that [=global safety limits=] --
+which do not have a [=site=]-specific [=privacy unit=] --
+deliver sound individual DP guarantees
+regardless without relying on either of these assumptions.

Importantly, after introducing the analyses and some context, this is the first thing I would say. It's a simple statement that is easy to understand.

sure, we can move this up top if we want to start with what holds without any assumptions.

bmcase · 2025-12-05T01:52:54Z

@martinthomson, let me clarify a little

I'm not sure about your presentation of assumptions ahead of what I consider to be the real results. Especially the major limitation in the analysis, which is that the threat model doesn't include sites that can coordinate. (To be clear, I think that it's a very reasonable limitation to assume when doing this analysis, anything else breaks down, but it's a big part of why the global safety limits are so important to the overall design.)

The paper's threat model does not assume that sites cannot collude. It is a worst-case threat model in which sites can collude. What these assumptions concern is when the per-site filter provides a DP guarantee, which in turn tells you how budget composes when sites do collude.

Under the two assumptions here (1. no cross-site adaptivity in data generation and 2. no leakage through cross-site shared limits), the per-site filters provide an individual DP (IDP) guarantee. Because of this, two sites that collude will have their budgets composed to 2x the per-site filter. If these filters did not provide an IDP guarantee, collusion could yield more than a 2x composition of budget.

As for sites trying to track the user across sites, I think the situation is one of:

1. They can cross-site identify the user, in which case why even bother using this API to try to gain more information.
2. They can't cross-site identify the user, in which case colluding sites will have a hard time in practice putting to use the theoretical 2x composed budget they have to target a single user.
3. Somewhere in between 1) and 2), where cross-site signals give them some probabilistic ability to track the user across sites, in which case using the API may give more information, but the information gained is bounded in the worst case by their 2x composed budget.
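As a rough illustration of the arithmetic (a sketch only, not taken from the paper or the spec), the worst-case composed budget for colluding sites can be written as below. The function names and epsilon values are hypothetical. Capping the result at a global limit is an assumption based on the discussion of global safety limits earlier in this thread.

```python
from typing import Optional


def composed_budget(per_site_epsilon: float, colluding_sites: int) -> float:
    """Worst-case loss when per-site budgets compose linearly across sites.

    Assumes the per-site filters give an individual DP guarantee, so that
    k colluding sites compose to k times the per-site budget.
    """
    return per_site_epsilon * colluding_sites


def effective_bound(
    per_site_epsilon: float,
    colluding_sites: int,
    global_epsilon: Optional[float] = None,
) -> float:
    """Composed per-site budget, capped by a global limit if one is given."""
    composed = composed_budget(per_site_epsilon, colluding_sites)
    if global_epsilon is None:
        return composed
    return min(composed, global_epsilon)


print(composed_budget(1.0, 2))       # 2.0: two sites, 2x the per-site budget
print(effective_bound(1.0, 5, 3.0))  # 3.0: the global limit caps the total
```

The first call is the "2 sites = 2x the budget" case. The second shows why a global limit matters once many sites are involved.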

martinthomson · 2025-12-05T02:22:48Z

That 2 sites = 2x the budget is the message that I was looking for.

nice bikeshed handles plural links

api.bs

martinthomson · 2025-12-08T00:25:18Z

api.bs

+(Section 3 is about per-site guarantees and Section 3.4 specifies the assumptions under
+which they hold). Per-site budgets include [=site=] in the [=privacy unit=], whereas safety


Suggested change

-(Section 3 is about per-site guarantees and Section 3.4 specifies the assumptions under
-which they hold). Per-site budgets include [=site=] in the [=privacy unit=], whereas safety
+Section 3 of [[PPA-DP-2]] addresses per-site guarantees
+and Section 3.4 specifies the assumptions under which those guarantees hold.
+Per-site budgets include [=site=] in the [=privacy unit=], whereas safety

martinthomson · 2025-12-08T00:28:18Z

api.bs

+In addition to facing safety limits discussed above, an attacker using multiple colluding sites to gain
+more DP budget about users also face the practical limitation of being able to link a user across sites.
+This is limitation does not itself provide a theoretical DP benefit but does impose a significant
+challenge in practice to the attacker when the user agent has made such cross-site linking difficult.


An attacker who can cross-site identify the user across sites would be able to learn more about the user across sites because of that linkage; but not because of the API.

This is true, but only for those sites on which the attacker can gather information. If there is another site that the attacker has no information about, having multiple vantage points -- or multiple budgets -- to use this API from gives them greater information than they would have from a single vantage point.

So both apply.

api.bs

bmcase added 4 commits September 3, 2025 12:16

add dp assumptions

5838518

DP assumptions for formal proofs

update DP assumptions text

2e27352

Update DP assumptions text

6fe5c2a

fix format

c997c37

martinthomson added the discuss Needs working group discussion label Sep 4, 2025

martinthomson reviewed Sep 4, 2025


fmt fixes

5b54331

martinthomson reviewed Sep 9, 2025


apasel422 reviewed Sep 9, 2025


bmcase removed the discuss Needs working group discussion label Oct 3, 2025

roxanageambasu mentioned this pull request Oct 15, 2025

Add global privacy budget and per-impression-site quotas #237

Open

bmcase added 2 commits November 3, 2025 12:15

Merge branch 'w3c:main' into origin/dp_assumptions

9c2a1ee

add DP paper summary

f36fac9

add a short description of both DP papers

Ren0652088124-bot approved these changes Nov 21, 2025


bmcase added 3 commits December 3, 2025 22:32

fix bib

fd4209c

fix markdown

750176b

cross-site linking challenge to attacker

430c710

martinthomson reviewed Dec 4, 2025


bmcase added 3 commits December 5, 2025 11:01

use "attribution result" for final aggregate output

a6ff01c

fix bikeshed links

93aaf12

fix links

68213be

nice bikeshed handles plural links

martinthomson reviewed Dec 11, 2025


martinthomson added 2 commits December 12, 2025 08:04

Some editorial fixes

ea2d795

Another one

ec9b250

martinthomson reviewed Dec 11, 2025


Line

efbcda5

-which they hold). Per-site budgets include [=site=] in the [=privacy unit=], whereas safety
-limits exclude it thereby enforcing a global individual DP guarantee.  In Attribution Level 1
-it is conversion sites that have per-site budgets tracked.
+which they hold).
+The per-site budgets include [=site=] in the [=privacy unit=]
+are based on the restricted analysis in [[PPA-DP]].
+The introduction of <dfn>global safety limits</dfn> exclude [=site=],
+which creates a global DP guarantee.
+The current version of the document does not define
+the application of [=global safety limits=].

	Assumption 1 is necessary because the system involves multiple sites that could interact
	The assumption that sites cannot adapt their queries is necessary
	because the system involves multiple sites that could interact

-learns, from DP measurements, to make an ad more effective, a user may convert on their site
+learns generally-applicable information that helps them make their ads more effective,
+that will make it more likely that their ads are attributed for conversions,
+as opposed to a competitor.

	Assumption 2 is necessary when we have shared limits that span multiple sites. An example of
	An assumption that sites are unable to coordinate their use of the API is necessary
	when we have shared limits that span multiple sites. An example of

-In addition to facing safety limits discussed above, an attacker using multiple colluding sites to gain
-more DP budget about users also face the practical limitation of being able to link a user across sites.
-This is limitation does not itself provide a theoretical DP benefit but does impose a significant
-challenge in practice to the attacker when the user agent has made such cross-site linking difficult.
+An attacker that is able to use other information to link the activity of a user across multiple sites
+can use the DP budgets of those sites to overcome the constraints of the per-site budgets.
+There are many features in the web platform that allow this capability,
+so this is a very plausible attack on the privacy design.
+The privacy analysis does not consider this attack within its threat model,
+relying exclusively on [=global safety limits=].
+Any implementation needs to consider this limitation when selecting DP parameters.

-By contrast, the analysis shows that *safety limits* -- which operate at global level,
-excluding [=site=] from the [=privacy unit=] -- can be implemented to deliver *sound global individual
-DP guarantees* regardless of whether either assumption is satisfied.
+The analysis shows that [=global safety limits=] --
+which do not have a [=site=]-specific [=privacy unit=] --
+deliver sound individual DP guarantees
+regardless without relying on either of these assumptions.

		(Section 3 is about per-site guarantees and Section 3.4 specifies the assumptions under
		which they hold). Per-site budgets include [=site=] in the [=privacy unit=], whereas safety
