Fix: Mirror URI in manifest /index/files response is set for MA files (#7687) #7693
nadove-ucsc wants to merge 20 commits into develop
52a4ca7 to d8627ae
Codecov Report
❌ Patch coverage is
@@            Coverage Diff             @@
##           develop    #7693     +/-   ##
===========================================
+ Coverage    84.80%   84.83%   +0.03%
===========================================
  Files          157      157
  Lines        23067    23124     +57
===========================================
+ Hits         19561    19617     +56
- Misses        3506     3507      +1
39b3cd7 to 87d7dea
src/azul/plugins/__init__.py
Outdated
    Implementations should raise PermissionError if the provided
    authentication is insufficient to access the repository.
"Implementations" implies "implementations of this method". No authentication is provided to those implementations, so I don't understand this sentence.
src/azul/service/source_service.py
Outdated
    all_sources = set()
    public_sources = set()
We typically use tuple assignments when they correlate like this.
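A minimal sketch of the tuple-assignment style the reviewer is asking for. Only the two variable names come from the snippet under review; the sample data and the loop are hypothetical.

```python
# Hypothetical sample data: (source name, is_public) pairs.
sources = [('a', True), ('b', False), ('c', True)]

# The two correlated initializations are written as one tuple assignment
# instead of two separate statements. Each name is bound to its own set.
all_sources, public_sources = set(), set()

for name, is_public in sources:
    all_sources.add(name)
    if is_public:
        public_sources.add(name)

print(sorted(all_sources), sorted(public_sources))
```

Note that `set(), set()` creates two distinct objects, unlike `a = b = set()`, which would bind both names to the same set.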
src/azul/service/source_service.py
Outdated
        return int(time())

    @cache
    def _configured_sources(self) -> Mapping[str, AbstractSet[SourceRef]]:
The return type hint is a text-book case for a typed dict.
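One way the suggested typed dict could look, as a sketch only. The class name `ConfiguredSources`, its keys, and the `str` stand-in for `SourceRef` are assumptions made for illustration; none of them appear in the PR.

```python
from typing import AbstractSet, TypedDict

# Stand-in for the real SourceRef class, which is not shown in the review.
SourceRef = str


class ConfiguredSources(TypedDict):
    # Hypothetical keys, named after the two local variables in the
    # snippet under review.
    all_sources: AbstractSet[SourceRef]
    public_sources: AbstractSet[SourceRef]


def configured_sources() -> ConfiguredSources:
    # Hypothetical data; the real method would read the configuration.
    all_sources, public_sources = {'a', 'b'}, {'a'}
    return ConfiguredSources(all_sources=frozenset(all_sources),
                             public_sources=frozenset(public_sources))


result = configured_sources()
print(sorted(result['public_sources']))
```

Compared to `Mapping[str, AbstractSet[SourceRef]]`, the typed dict lets a type checker verify that only the declared keys are accessed.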
Also, please add unit test coverage for the newly uncovered lines:
https://app.codecov.io/gh/DataBiosphere/azul/pull/7693/blob/src/azul/service/source_service.py
It wasn't caught because
87d7dea to 7464fca
7464fca to 73ab983
    Specifically, this means that subclasses may not add fields without a
    default or modify whether a field is initialized via a keyword-only or
    positional-only argument.
Suggested change
    - Specifically, this means that subclasses may not add fields without a
    - default or modify whether a field is initialized via a keyword-only or
    - positional-only argument.
    + For example, this means that subclasses may not add fields without a
    + default value or modify the constructor to accept positional arguments.
Modifying the constructor to accept positional arguments would not prevent the constructor from being invoked with keyword arguments. I maintain that my original wording is more accurate and precise.
src/azul/service/source_service.py
Outdated
    """
    List source IDs in the underlying repository that are accessible using
    the provided authentication. May require a roundtrip to the underlying
    repository, but results are cached in DynamoDB for up to 1 minute.
Suggested change
    - repository, but results are cached in DynamoDB for up to 1 minute.
    + repository, but results are cached in DynamoDB for a short time.
Implementation detail. It's actually 5 min now.
src/azul/service/source_service.py
Outdated
        }

    @property
    def configured_sources(self) -> AbstractSet[SourceRef]:
Let's remove this for now. Let's also remove the outer dictionary in sources.json for now, and the TypedDict, and let's rename sources.json to public_sources.json.
I think we should organize the contents of sources.json by catalog. Then we can use configured_public_sources in list_accessible_sources if authentication is None. We can also make this method protected and have the mirror service use the public list_accessible_sources instead. This would create infinite recursion outside of a Lambda context. I think that can be broken by having list_accessible_sources and _list_accessible_sources. I can elaborate in PL.
Somewhat unrelated: Please add a log statement to the auth fallback for the case when it falls back to no auth. I want to be able to tell from the logs how frequently that occurs.
I also have the feeling that a set of SourceRef instances (what this method returns) doesn't work as one would expect. I think it can contain two SourceRef instances with the same source_id.
In essence what this PR then effectively does is cache public sources for much longer and ensure that the plugin layer restricts the set of sources to those configured.
In another PR we can add back the functionality of this removed method, and use it in list_accessible_source_ids() to further subset the return value.
Another point of tension that we need to address in the long run (not in this PR) is that post_deploy and outsourcing of sources do similar things.
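The requested log statement for the auth fallback might look roughly like the sketch below. Everything here is hypothetical: the function `list_with_fallback`, its parameters, and the assumption that the fallback is triggered by a `PermissionError` are illustrative and not taken from the PR.

```python
import logging
from typing import Callable, Optional

log = logging.getLogger(__name__)


def list_with_fallback(list_sources: Callable[[Optional[str]], list[str]],
                       authentication: Optional[str]
                       ) -> list[str]:
    """
    Try the provided authentication first. If it is rejected, log that
    fact and retry without authentication.
    """
    try:
        return list_sources(authentication)
    except PermissionError:
        if authentication is None:
            # Nothing to fall back to.
            raise
        # A fixed message makes occurrences easy to count in the logs.
        log.warning('Authentication rejected, falling back to no authentication')
        return list_sources(None)


def fake_list_sources(authentication: Optional[str]) -> list[str]:
    # Hypothetical repository stub that rejects every credential.
    if authentication is not None:
        raise PermissionError(authentication)
    return ['public-source']


print(list_with_fallback(fake_list_sources, 'some-token'))
```

Keeping the message free of variable content means a simple log filter on that string is enough to see how often the fallback occurs.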
> Let's remove this for now. Let's also remove the outer dictionary in sources.json for now, and the TypedDict, and let's rename sources.json to public_sources.json.
Agreed to remove this method since it is unused in this PR. It will be trivial to re-add it later.
> I think we should organize the contents of sources.json by catalog. Then we can use configured_public_sources in list_accessible_sources if authentication is None. We can also make this method protected and have the mirror service use the public list_accessible_sources instead. This would create infinite recursion outside of a Lambda context. I think that can be broken by having list_accessible_sources and _list_accessible_sources. I can elaborate in PL.
Agreed to postpone organizing by catalog and unifying list_accessible_sources with configured_public_sources.
> Somewhat unrelated: Please add a log statement to the auth fallback for the case when it falls back to no auth. I want to be able to tell from the logs how frequently that occurs.
This is easy, I can do this.
> I also have the feeling that a set of SourceRef instances (what this method returns) doesn't work as one would expect. I think it can contain two SourceRef instances with the same source_id.
Agreed to change return type annotation to Iterable[SourceRef] and skip the conversion to set.
> In essence what this PR then effectively does is cache public sources for much longer and ensure that the plugin layer restricts the set of sources to those configured.
> In another PR we can add back the functionality of this removed method, and use it in list_accessible_source_ids() to further subset the return value.
> Another point of tension that we need to address in the long run (not in this PR) is that post_deploy and outsourcing of sources do similar things.
> Agreed to change return type annotation to Iterable[SourceRef] and skip the conversion to set.
Something I missed was that it is necessary to store the sources in a set when preparing for outsourcing to avoid duplicate entries in cases where the same source occurs in multiple catalogs (e.g. dcp55 and dcp56).
Two sources with the same ID can coexist in a set only if they differ by some other attribute (e.g. have different prefixes). Therefore, using a set here is still very useful for eliminating duplicates, which would otherwise be very common.
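The set behavior described here can be illustrated with a small sketch. The `SourceRef` below is a hypothetical frozen dataclass standing in for the real class; only the notions of a source ID and a prefix come from the discussion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRef:
    # Hypothetical stand-in. A frozen dataclass compares and hashes by
    # all of its fields, not by source_id alone.
    source_id: str
    prefix: str


refs = [
    SourceRef('id-1', ''),    # e.g. as configured in one catalog
    SourceRef('id-1', ''),    # the same source in another catalog
    SourceRef('id-1', '42'),  # same ID, but a different prefix
]

unique = set(refs)

# The exact duplicate is eliminated ...
assert len(unique) == 2
# ... but two entries with the same source_id remain.
assert {ref.source_id for ref in unique} == {'id-1'}
```

Under these assumptions, both observations in the thread hold: the set removes exact duplicates across catalogs, yet it can still contain more than one entry per source ID.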
3ebeec1 to c41db2c
beb364a to e8b472a
e8b472a to 75108db
Linked issues: #7687