-
Notifications
You must be signed in to change notification settings - Fork 88
[SVCS-552] folder_file_op cut down revalidate calls. #311
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
[SVCS-552] folder_file_op cut down revalidate calls. #311
Conversation
cslzchen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good and I will take over the ticket/PR. Two minor issues:
- Need to update Google Cloud as well which has been added recently.
- This change affects all path-based provider and will cause conflicts for sure. Proceed with caution 🚔 .
|
|
||
| @property | ||
| @abc.abstractmethod | ||
| def id(self) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update the DocStr to mention what this id is for both types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| folder = await dest_provider.create_folder(dest_path, folder_precheck=False) | ||
|
|
||
| dest_path = await dest_provider.revalidate_path(dest_path.parent, dest_path.name, folder=dest_path.is_dir) | ||
| folder_path = dest_provider.path_from_metadata(dest_path.parent, folder) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename folder to folder_metadata (including the assign statement two lines above).
|
|
||
| @property | ||
| def id(self): | ||
| return self.build_path() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since id is exactly the same, why not using return self.path? This is what other providers do.
- This id consolidates both path-based ID and id- based ID under a single property. - It is identical to the path object identifier which is used to revalidate paths.
- In addititon, remove superfluous condition statement `if not futures: continue`
6732325 to
a27b568
Compare
- `self.id` is identical to `self.path` since GCS is a path-based provider
- Intermediate commit. Please rebase or ammend. - Discuss the todos and comments, fix issues if any. Update docstring, comments and remove todos. - Need travis ci.
8adfb5c to
b61b2ec
Compare
cslzchen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Johnetordoff and CC @felliott
I rebased and updated the code with extra comments for each provider affected. Need some further discussion with you before I can continue. Thanks.
| return 'googlecloud' | ||
|
|
||
| @property | ||
| def id(self) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add GoogleCloud to pass tests.
|
|
||
| @property | ||
| @abc.abstractmethod | ||
| def id(self) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| return 'googlecloud' | ||
|
|
||
| # GoogleCloud is a path-based provider. However, it never calls ``revalidate_path()`` since it | ||
| # is the backend storage provider for OSFStorage. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to CloudFiles (when it was the storage provider), GoogleCloud never uses revalidate_path(). We should set the id to be None or empty so we know it is never used accidentally or cause any confusion in the further.
| return 'github' | ||
|
|
||
| # GitHub is a path-based provider. However, it has its own ``revalidate_path()``. | ||
| # TODO: Investigate if this is an issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GitHub may not work because your fix calls (functionally, not literally) the default revalidate_path() instead of the customized one.
| return 'dropbox' | ||
|
|
||
| # TODO: Is Dropbox path-based instead of id-based? | ||
| # TODO: Dropbox does not have a ``revalidate_path()``, should we use path instead of id? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no revalidate_path() for dropbox which makes guess that it is a path based (or at lease both)?
|
|
||
| # TODO: Should this be in ``DataverseFileMetadata``? | ||
| # TODO: Does ``self.raw['id']`` exists for ``DataverseDatasetMetadata``? | ||
| # TODO: DataVerse's ``revalidate_path()`` looks very different. Is this an issue? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to GitHub, your fix calls (functionally, not literally) revalidate_path() differently than the original way. This might be an issue.
|
|
||
| # CloudFiles is a path-based provider. However, it never called ``revalidate_path()`` even when | ||
| # it was the backend storage provider for OSFStorage. The provider is no longer active until it | ||
| # is upgraded to an addon provider for the users. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See GoogleCloud.
Ticket
https://openscience.atlassian.net/browse/SVCS-552
This PR replaces #287
Purpose
folder_file_op makes roughly a trillion metadata calls every time it moves a folder's children. This fix makes it easy to get a path from it's metadata object by giving the metadata an id which corresponds with the path identifier. Since we can now convert a metadata object into a path easily we no longer need to re-validate each time.
Changes
Side effects
None that I know of.
QA Notes
Copy/moves between providers should be unchanged or slightly faster.
Deployment Notes
None.