-
Notifications
You must be signed in to change notification settings - Fork 8
Fix issue with progress directory not found in progress sync #37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR addresses a race condition on Windows where rmtree and gh repo clone operations can conflict. The issue occurs because shutil.rmtree on Windows may retry deletion by changing permissions, allowing the clone command to start before deletion completes, potentially leaving the system in an inconsistent state where the progress folder is deleted.
Changes:
- Added a retry mechanism with polling to ensure the progress folder is fully deleted before cloning
- Added verification after cloning to ensure the operation succeeded, with recovery logic to restore local progress if the clone fails
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
app/commands/progress/sync/on.py
Outdated
|
|
||
| clone_with_custom_name(f"{username}/{fork_name}", PROGRESS_LOCAL_FOLDER_NAME) | ||
|
|
||
| # Verify clone succeeded by checking folder exists |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm adding this as an additional precaution, in case clone fails, so we can re-create the progress folder, ensuring student's are not left in the state where progress doesn't exist at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Core reasoning behind race condition: https://bugs.python.org/issue40143 (discovered by @jovnc ) |
| local_progress = json.load(file) | ||
|
|
||
| rmtree(PROGRESS_LOCAL_FOLDER_NAME) | ||
| os.makedirs(os.path.dirname(local_progress_filepath), exist_ok=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can I understand the rationale behind this? Is it because rmtree might not fully delete it on Windows, and as a result, we need to re-create the folder explicitly, rather than allowing open(..., "a") to create it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is to address an existing bug for all OSes. It's more of a separate issue, but I just placed it here for convenience, since it's sort of related.
Idea is because we do a rmtree at the top deletes the progress folder, but below we instantly open it
with open(local_progress_filepath, "a") as progress_file:
This line throws a FileNotFoundError. The problem is that open with "a" mode creates the file if it doesn't exist, but does NOT create the parent directory.
| clone_with_custom_name(f"{username}/{fork_name}", PROGRESS_LOCAL_FOLDER_NAME) | ||
|
|
||
| # Verify clone succeeded, else restore local progress before failing | ||
| if not os.path.exists(PROGRESS_LOCAL_FOLDER_NAME): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory, isn't it still possible for this to work even though the deletion still trails?
- rmtree fails, retry
- Clone runs
- Path exists, skip over if
- rmtree works, deletes the new folder
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If rmtree keeps failing after 20 retries, we raise an exception at rmtree already and won't proceed.
This check here is more of a safety check to check that clone succeeds rather than for rmtree.
It's more of a safety net, as if clone fails, we can still restore the existing local progress.json file for the user (otherwise if it fails, it will leave users in unexpected state without progress folder).
But I'm also ok with leaving this check out to be honest, as i don't feel its entirely necessary.
| from pathlib import Path | ||
| from typing import Union | ||
|
|
||
| MAX_DELETE_RETRIES = 20 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we might stall for 4 seconds in total? Seems like a pretty high upper bound. Do we want to lower it for now? I think a max retry of 5 is somewhat reasonable unless we hit an obvious issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like we should be safer with a higher number, I increased it from 10 initially. Rationale is if we set too low a threshold, there is a higher likelihood of failure and we will need to update the app again for students, which we want to avoid.
Likely cause:
Race condition in Windows in the following lines of code:
This is because of issues with
shutil.rmtreein Windows due to permissions issue, causing it to retry by changing permissions to write, and trying the delete again.It is likely
gh repo clonecontinues to run, which fails asprogressrepository already exists at that point of time. Only afterwards isshutil.rmtreedeletes the repository, causing us to land in a state whereprogressfolder is deleted from the student's folder.Solution:
progressfolder is present, unless student deletes it themselves)