Add text on DP formal analysis and its assumptions #271
DP assumptions for formal proofs
| Assumption 1 is necessary because the system involves multiple sites that could interact | ||
| with the same user over time and change the ads they show to the user, or impact the | ||
| conversions the user has, based on each other’s DP measurements. For example, if one advertiser | ||
| learns, from DP measurements, to make an ad more effective, a user may convert on their site | ||
| rather than a competitor’s. In this case, the first site’s DP outputs -- counted only against | ||
| its own per-site budget -- alter the data (or absence of data) visible to the competitor, yet | ||
| this impact is not reflected in the competitor’s per-site budget. When Assumption 1 is violated, | ||
| the analysis shows that per-site guarantees cannot be achieved. |
This is part of the assumption, but I think that the main challenge here is different. Sites might gain an understanding that a particular visitor to each site in a set is the same person (due to federated login, same email address, or anything including navigation tracking which we can't stop). AND THEN they decide to pool their per-site budgets to use the API to extract more information about that person. In that case, we have no defense from the per-site budget. Sites are only limited by their ability to link activity across sites (which is too easy, as noted) and then the global budget.
So we should acknowledge that limitation as well as the more theoretical one here.
@martinthomson I added a paragraph at the end of this section to capture this additional challenge an adversary is faced with. Let me know if you think that still needs any adjusting.
In addition to facing the safety limits discussed above, an attacker using multiple colluding sites to gain
more DP budget about users also faces the practical limitation of being able to link a user across sites.
This limitation does not itself provide a theoretical DP benefit but does impose a significant
challenge to the attacker when the user agent has made it difficult to link users across sites.
|
Our updated analysis is now available on arXiv: https://arxiv.org/pdf/2506.05290v2. Section 3 is about per-site guarantees, and Section 3.4 specifies the assumptions under which they hold. |
add a short description of both DP papers
|
Updated the PR to address all the points of feedback above, including a summary of both papers and a paragraph to address the challenge of cross-site linking. Cleaned up the markdown format and bib so that all checks are passing now. We previously discussed this PR in a PATWG meeting and all were supportive of it. I think we should be ready to merge this PR now. cc @martinthomson, @csharrison, @apasel422, @andyleiserson |
martinthomson
left a comment
Thanks for doing the work here.
I'm not sure about your presentation of assumptions ahead of what I consider to be the real results. Especially the major limitation in the analysis, which is that the threat model doesn't include sites that can coordinate. (To be clear, I think that it's a very reasonable limitation to assume when doing this analysis, anything else breaks down, but it's a big part of why the global safety limits are so important to the overall design.)
The assumptions are largely theoretical, but they sort of hide the main results. That is that global safety limits are effective and don't rely on any assumptions. I would move that right up to the top.
Then, I think that there are three assumptions you want to present. The first is the important one about coordinating sites, which isn't really a theoretical thing. You can present it more as I did, as more of a limitation of the threat model adopted in the analysis. But I think that it needs to be the second thing you say.
Then you can discharge the two theoretical points, which are very good overall, but I don't think that they have any practical effect, other than being necessary for someone to understand.
| which they hold). Per-site budgets include [=site=] in the [=privacy unit=], whereas safety | ||
| limits exclude it thereby enforcing a global individual DP guarantee. In Attribution Level 1 | ||
| it is conversion sites that have per-site budgets tracked. |
I think you need to be more direct about the connection to the specification.
Given where we are, this might need to be:
| which they hold). Per-site budgets include [=site=] in the [=privacy unit=], whereas safety | |
| limits exclude it thereby enforcing a global individual DP guarantee. In Attribution Level 1 | |
| it is conversion sites that have per-site budgets tracked. | |
| which they hold). | |
| The per-site budgets, which include [=site=] in the [=privacy unit=], | |
| are based on the restricted analysis in [[PPA-DP]]. | |
| The introduction of <dfn>global safety limits</dfn>, which exclude [=site=], | |
| creates a global DP guarantee. | |
| The current version of the document does not define | |
| the application of [=global safety limits=]. |
| 1. *No leakage through cross-site shared limits.* Queries from one site must not affect which | ||
| reports are emitted to others. | ||
|
|
||
| Assumption 1 is necessary because the system involves multiple sites that could interact |
| Assumption 1 is necessary because the system involves multiple sites that could interact | |
| The assumption that sites cannot adapt their queries is necessary | |
| because the system involves multiple sites that could interact |
api.bs
Outdated
| Assumption 1 is necessary because the system involves multiple sites that could interact | ||
| with the same user over time and change the ads they show to the user, or impact the | ||
| conversions the user has, based on each other’s DP measurements. For example, if one advertiser | ||
| learns, from DP measurements, to make an ad more effective, a user may convert on their site |
| learns, from DP measurements, to make an ad more effective, a user may convert on their site | |
| learns generally-applicable information that helps them make their ads more effective, | |
| that will make it more likely that their ads are attributed for conversions, | |
| as opposed to a competitor. |
Here, "DP measurements" refers to measureConversion() as well. Watch out for that.
This is a case where we are talking about aggregate results from many devices that you get back from the aggregation service.
I feel we need a term in the spec for the final aggregate DP results that go back to the advertiser. "Query" may be too general, but something like "DP attribution results" or "DP measurements" should be clear, and maybe we need to define that somewhere in the intro.
Took a stab at defining "attribution result" in the intro and trying to use this whenever we mean the final outputs learned by the advertiser.
| this impact is not reflected in the competitor’s per-site budget. When Assumption 1 is violated, | ||
| the analysis shows that per-site guarantees cannot be achieved. | ||
|
|
||
| Assumption 2 is necessary when we have shared limits that span multiple sites. An example of |
| Assumption 2 is necessary when we have shared limits that span multiple sites. An example of | |
| An assumption that sites are unable to coordinate their use of the API is necessary | |
| when we have shared limits that span multiple sites. An example of |
What I'm trying to say here is just that if you want to have shared limits you have to make Assumption 2 for the per-site budgets to hold.
api.bs
Outdated
|
|
||
| Assumption 2 is necessary when we have shared limits that span multiple sites. An example of | ||
| such shared limits are the global safety limits that aim to provide a global DP guarantee. | ||
| If queries from some sites cause a shared limit to be reached, reports to other sites may be |
Again, queries... If you want to use that word, it might be necessary to explain it up front.
here we can use measureConversion as this is talking about what happens on a single device.
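To make the concern behind Assumption 2 concrete, here is a minimal Python sketch of a device that enforces both a per-site budget and a shared global limit. It is an illustration only, not the algorithm in the specification: the class `BudgetFilters`, the method `try_consume`, the site names, and the epsilon values are all hypothetical. It shows how one site's use of the shared limit can change the outcome for another site whose own per-site budget is untouched.

```python
class BudgetFilters:
    """Hypothetical per-device budget state: one per-site budget per site, one shared global limit."""

    def __init__(self, per_site_epsilon: float, global_epsilon: float) -> None:
        self.per_site_epsilon = per_site_epsilon
        self.global_remaining = global_epsilon
        self.site_remaining: dict[str, float] = {}

    def try_consume(self, site: str, epsilon: float) -> bool:
        """Deduct epsilon from both budgets, or deduct nothing and return False."""
        remaining = self.site_remaining.setdefault(site, self.per_site_epsilon)
        if epsilon > remaining or epsilon > self.global_remaining:
            return False
        self.site_remaining[site] = remaining - epsilon
        self.global_remaining -= epsilon
        return True


# Site A goes first and uses most of the shared limit.
filters = BudgetFilters(per_site_epsilon=1.0, global_epsilon=1.5)
a_first = filters.try_consume("a.example", 1.0)
b_after_a = filters.try_consume("b.example", 1.0)

# On a fresh device where site A never asked, site B succeeds.
fresh = BudgetFilters(per_site_epsilon=1.0, global_epsilon=1.5)
b_alone = fresh.try_consume("b.example", 1.0)

print(a_first, b_after_a, b_alone)
```

In this sketch, site B's result differs between the two runs only because of what site A did, which is the kind of cross-site dependence through a shared limit that Assumption 2 rules out.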
| In addition to facing the safety limits discussed above, an attacker using multiple colluding sites to gain | ||
| more DP budget about users also faces the practical limitation of being able to link a user across sites. | ||
| This limitation does not itself provide a theoretical DP benefit but does impose a significant | ||
| challenge in practice to the attacker when the user agent has made such cross-site linking difficult. |
| In addition to facing safety limits discussed above, an attacker using multiple colluding sites to gain | |
| more DP budget about users also face the practical limitation of being able to link a user across sites. | |
| This is limitation does not itself provide a theoretical DP benefit but does impose a significant | |
| challenge in practice to the attacker when the user agent has made such cross-site linking difficult. | |
| An attacker that is able to use other information to link the activity of a user across multiple sites | |
| can use the DP budgets of those sites to overcome the constraints of the per-site budgets. | |
| There are many features in the web platform that allow this capability, | |
| so this is a very plausible attack on the privacy design. | |
| The privacy analysis does not consider this attack within its threat model, | |
| relying exclusively on [=global safety limits=]. | |
| Any implementation needs to consider this limitation when selecting DP parameters. |
This is the second thing I would say, which is a major limitation of the design and something that is important to understand about the interaction between per-site and global budgets.
This is something that implementations have to consider when they set parameters.
I don't agree with this suggested text, but see my longer comment below for why.
An attacker that is able to use other information to link the activity of a user across multiple sites
can use the DP budgets of those sites to overcome the constraints of the per-site budgets.
An attacker who can cross-site identify the user across sites would be able to learn more about the user across sites because of that linkage; but not because of the API. The incremental information they learn by using the API is worst case bounded by the composition of the per-site budgets involved under the assumptions here.
An attacker who can cross-site identify the user across sites would be able to learn more about the user across sites because of that linkage; but not because of the API.
This is true, but only for those sites on which the attacker can gather information. If there is another site that the attacker has no information about, having multiple vantage points -- or multiple budgets -- to use this API from gives them greater information than they would have from a single vantage point.
So both apply.
| By contrast, the analysis shows that *safety limits* -- which operate at global level, | ||
| excluding [=site=] from the [=privacy unit=] -- can be implemented to deliver *sound global individual | ||
| DP guarantees* regardless of whether either assumption is satisfied. |
Consider defining this as [=global safety limits=], per above.
| By contrast, the analysis shows that *safety limits* -- which operate at global level, | |
| excluding [=site=] from the [=privacy unit=] -- can be implemented to deliver *sound global individual | |
| DP guarantees* regardless of whether either assumption is satisfied. | |
| The analysis shows that [=global safety limits=] -- | |
| which do not have a [=site=]-specific [=privacy unit=] -- | |
| deliver sound individual DP guarantees | |
| without relying on either of these assumptions. |
Importantly, after introducing the analyses and some context, this is the first thing I would say. It's a simple statement that is easy to understand.
sure, we can move this up top if we want to start with what holds without any assumptions.
|
@martinthomson, let me clarify a little
There is no limitation in the paper's threat model that assumes sites can't collude. It is a worst-case threat model that does assume sites can collude. What these assumptions address is when the per-site filter constitutes a DP guarantee, which tells you how budget composes when sites do collude. Under the two assumptions here (1. no cross-site adaptivity in data generation and 2. no leakage through cross-site shared limits), the per-site filters provide an individual DP guarantee. Because of this, two sites that collude will have their budgets composed to 2x the per-site filter. If these filters did not provide an IDP guarantee, collusion could yield even more than a 2x composition of budget. As for sites trying to track the user across sites, I think the situation is either:
|
|
That 2 sites = 2x the budget is the message that I was looking for. |
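The "2 sites = 2x the budget" point can be written out as a small worked example. The Python sketch below assumes basic additive composition and assumes that both assumptions hold, so that each per-site filter is a valid individual DP guarantee. The function name `composed_epsilon` and the epsilon values are hypothetical and are not taken from the specification or the paper. The optional cap reflects the point made earlier in the thread that colluding sites are ultimately limited by the global budget.

```python
from typing import Optional


def composed_epsilon(per_site_epsilon: float,
                     colluding_sites: int,
                     global_epsilon: Optional[float] = None) -> float:
    """Worst-case epsilon spent on one user by colluding sites, under basic composition."""
    if per_site_epsilon < 0 or colluding_sites < 0:
        raise ValueError("budgets and site counts must be non-negative")
    # Each colluding site contributes at most its own per-site budget.
    total = per_site_epsilon * colluding_sites
    # A global safety limit, if one is enforced, caps the total.
    if global_epsilon is not None:
        total = min(total, global_epsilon)
    return total


two_sites = composed_epsilon(per_site_epsilon=1.0, colluding_sites=2)
five_sites_capped = composed_epsilon(per_site_epsilon=1.0, colluding_sites=5, global_epsilon=3.0)
print(two_sites, five_sites_capped)
```

With these illustrative numbers, two colluding sites compose to twice the per-site budget, and five sites are held to the hypothetical global limit of 3.0 rather than 5.0.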
| (Section 3 is about per-site guarantees and Section 3.4 specifies the assumptions under | ||
| which they hold). Per-site budgets include [=site=] in the [=privacy unit=], whereas safety |
| (Section 3 is about per-site guarantees and Section 3.4 specifies the assumptions under | |
| which they hold). Per-site budgets include [=site=] in the [=privacy unit=], whereas safety | |
| Section 3 of [[PPA-DP-2]] addresses per-site guarantees | |
| and Section 3.4 specifies the assumptions under which those guarantees hold. | |
| Per-site budgets include [=site=] in the [=privacy unit=], whereas safety |
With @roxanageambasu and @tholop we have put together some text for the spec that states what assumptions are used in the system model in which the formal DP analysis is done.
The referenced paper is already available, but a few claims in the PR are not yet reflected there. We will update the arXiv version to align with the PR shortly. In the meantime, we confirm that the statements are accurate and supported by analyses we already have internally.