@erwindouna
Contributor

@erwindouna erwindouna commented Oct 3, 2025

Breaking change

Proposed change

Proof of Concept for an architectural discussion. Please treat it as such; it doesn't require a review at the moment.

Architecture discussion: home-assistant/architecture#1286

update_coordinator.py now accepts a retry_after parameter on the UpdateFailed exception. When it is set, the coordinator defers its next scheduled refresh by that many seconds and then resumes the normal cadence.
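
A minimal, self-contained sketch of how an integration might report a server-requested backoff under this proposal. The UpdateFailed class below is a simplified stand-in written for illustration, not the Home Assistant class itself; RateLimited, fetch_status and update_data are hypothetical names for an API-client error, an API call and an update method.

```python
from __future__ import annotations


class UpdateFailed(Exception):
    """Stand-in for the coordinator's UpdateFailed (assumption: simplified)."""

    def __init__(self, message: str, retry_after: int | None = None) -> None:
        super().__init__(message)
        # Seconds the coordinator should wait before its next refresh.
        self.retry_after = retry_after


class RateLimited(Exception):
    """Hypothetical API-client error carrying the server's backoff in seconds."""

    def __init__(self, retry_after: int) -> None:
        super().__init__(f"rate limited, retry after {retry_after}s")
        self.retry_after = retry_after


def fetch_status(rate_limited: bool) -> dict[str, str]:
    """Hypothetical API call; raises RateLimited when the server says so."""
    if rate_limited:
        raise RateLimited(retry_after=60)
    return {"state": "ok"}


def update_data(rate_limited: bool) -> dict[str, str]:
    """What an integration's update method might do."""
    try:
        return fetch_status(rate_limited)
    except RateLimited as err:
        # The integration decides the backoff and passes it along.
        raise UpdateFailed("API rate limited", retry_after=err.retry_after) from err
```

The point of the sketch is the division of labor: the API client detects the rate limit, the integration translates it into retry_after, and the coordinator only consumes the value.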

Considerations:

  • It is not a breaking change.
  • Currently it accepts only an integer. Open for discussion whether this should be a float/timedelta to allow more jitter-like control by the integration owners.
  • Integration owners are responsible for triggering retry_after and setting a proper value. This means the integrations and the API clients are, or need to be, able to detect that they are rate-limited.
    Effectively, integration owners need to sanitize the Retry-After header (be it an integer or a datetime), or find out the desired backoff from the API server in some other way. The integration owner dictates the backoff period.
  • The retry_after isn't capped. Perhaps allow a maximum?
  • The retry_after only works after a successful config entry setup. If UpdateFailed is raised from async_config_entry_first_refresh, the default behavior still applies: ConfigEntryNotReady is raised. From a functional view I think this is right, and from a technical view as well, since _schedule_refresh() would not be hit in this scenario.
    For this I (mis)use raise_on_entry_error to determine whether the coordinator is in a setup phase.
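
As an illustration of the sanitizing an integration would have to do, here is a rough sketch that turns a Retry-After header value (either delay-seconds or an HTTP-date) into whole seconds. The helper name parse_retry_after and the MAX_RETRY_AFTER cap are assumptions made for this example; neither is part of the PR, and the cap only shows what an optional maximum could look like.

```python
from __future__ import annotations

from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

MAX_RETRY_AFTER = 3600  # Assumption: an illustrative cap, not part of the PR.


def parse_retry_after(value: str, now: datetime | None = None) -> int | None:
    """Convert a Retry-After header value to whole seconds, or None if unusable."""
    now = now or datetime.now(timezone.utc)
    value = value.strip()
    if value.isascii() and value.isdigit():
        # delay-seconds form, e.g. "120"
        seconds = int(value)
    else:
        try:
            # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
            when = parsedate_to_datetime(value)
        except (TypeError, ValueError):
            return None
        if when.tzinfo is None:
            # Treat a date without zone information as UTC.
            when = when.replace(tzinfo=timezone.utc)
        seconds = int((when - now).total_seconds())
    # Never return a negative delay, and clamp to the illustrative maximum.
    return max(0, min(seconds, MAX_RETRY_AFTER))
```

An integration could pass the result straight into retry_after, and fall back to the normal update interval when the function returns None.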

Two tests are submitted:

  • To demonstrate that retry_after is not used when UpdateFailed is raised via async_config_entry_first_refresh.
  • To demonstrate that UpdateFailed is raised both without and with retry_after. Without it, the code should take the else branch and show the normal reschedule behavior of 10 seconds.
    With retry_after, the test checks that the 60-second reschedule is applied and then cleared, and that a new schedule_refresh falls back to the 10 seconds of the mock default. In short, it verifies that the reschedule behavior is reset properly.
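
The behavior these two tests exercise can be modeled in a few lines. This is a toy model under stated assumptions, not the coordinator code from the PR: ToyCoordinator, its refresh method and the stand-in exception classes are hypothetical, and only the scheduling decision is modeled (no timers, no I/O).

```python
from __future__ import annotations


class UpdateFailed(Exception):
    """Stand-in exception (assumption: simplified from the real class)."""

    def __init__(self, message: str = "", retry_after: int | None = None) -> None:
        super().__init__(message)
        self.retry_after = retry_after


class ConfigEntryNotReady(Exception):
    """Stand-in for the error raised when setup cannot complete."""


class ToyCoordinator:
    """Hypothetical model of the scheduling decision only."""

    def __init__(self, update_interval: int = 10) -> None:
        self.update_interval = update_interval
        self._retry_after: int | None = None

    def refresh(
        self, error: UpdateFailed | None, raise_on_entry_error: bool = False
    ) -> None:
        """Record the outcome of one refresh attempt."""
        if error is None:
            return
        if raise_on_entry_error:
            # Setup phase: retry_after is ignored and setup fails as usual.
            raise ConfigEntryNotReady from error
        self._retry_after = error.retry_after

    def schedule_refresh(self) -> int:
        """Return the delay in seconds until the next refresh."""
        if self._retry_after is not None:
            # One-shot: use the requested backoff once, then clear it.
            delay, self._retry_after = self._retry_after, None
            return delay
        return self.update_interval
```

With a 10-second interval, a failure without retry_after schedules the next refresh in 10 seconds, a failure with retry_after=60 schedules it in 60 seconds once, and the schedule after that is back to 10 seconds.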

Type of change

  • Dependency upgrade
  • Bugfix (non-breaking change which fixes an issue)
  • New integration (thank you!)
  • New feature (which adds functionality to an existing integration)
  • Deprecation (breaking change to happen in the future)
  • Breaking change (fix/feature causing existing functionality to break)
  • Code quality improvements to existing code or addition of tests

Additional information

Checklist

  • I understand the code I am submitting and can explain how it works.
  • The code change is tested and works locally.
  • Local tests pass. Your PR cannot be merged unless tests pass
  • There is no commented out code in this PR.
  • I have followed the development checklist
  • I have followed the perfect PR recommendations
  • The code has been formatted using Ruff (ruff format homeassistant tests)
  • Tests have been added to verify that the new code works.
  • Any generated code has been carefully reviewed for correctness and compliance with project standards.

If user exposed functionality or configuration variables are added/changed:

If the code communicates with devices, web services, or third-party tools:

  • The manifest file has all fields filled out correctly.
    Updated and included derived files by running: python3 -m script.hassfest.
  • New or updated dependencies have been added to requirements_all.txt.
    Updated by running python3 -m script.gen_requirements_all.
  • For the updated dependencies - a link to the changelog, or at minimum a diff between library versions is added to the PR description.

To help with the load of incoming pull requests:

@erwindouna erwindouna marked this pull request as ready for review October 5, 2025 20:36
@erwindouna erwindouna requested a review from a team as a code owner October 5, 2025 20:36
Copilot AI review requested due to automatic review settings October 5, 2025 20:36
Contributor

Copilot AI left a comment

Pull Request Overview

This PR introduces a proof-of-concept backoff strategy mechanism for Home Assistant's DataUpdateCoordinator to handle rate limiting from APIs. The feature allows integrations to specify a delay period when raising UpdateFailed exceptions, enabling the coordinator to defer its next scheduled refresh appropriately.

Key changes:

  • Added retry_after parameter to UpdateFailed exception for specifying backoff delays
  • Modified coordinator's scheduling logic to honor retry delays after failed updates
  • Added comprehensive test coverage for the new backoff behavior

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| homeassistant/helpers/update_coordinator.py | Core implementation of retry_after mechanism in UpdateFailed exception and DataUpdateCoordinator scheduling logic |
| tests/helpers/test_update_coordinator.py | Comprehensive test cases covering retry_after behavior during setup and normal operation phases |
| homeassistant/components/portainer/coordinator.py | Minor debug log message change from detailed endpoint info to simple "Finished" |

@MartinHjelmare MartinHjelmare changed the title from "POC for backoff strategy" to "Add retry_after to UpdateFailed in update coordinator" Oct 6, 2025
@home-assistant

home-assistant bot commented Oct 6, 2025

Please take a look at the requested changes, and use the Ready for review button when you are done, thanks 👍

Learn more about our pull request process.

@home-assistant home-assistant bot marked this pull request as draft October 6, 2025 06:23
@erwindouna erwindouna marked this pull request as ready for review October 6, 2025 18:54
@home-assistant home-assistant bot marked this pull request as draft October 15, 2025 13:16
@erwindouna erwindouna marked this pull request as ready for review October 16, 2025 16:45
@MartinHjelmare MartinHjelmare marked this pull request as draft October 19, 2025 05:12
@erwindouna erwindouna marked this pull request as ready for review October 19, 2025 18:53
@erwindouna
Contributor Author

@MartinHjelmare did I miss a review comment to be worked out? Or can we continue this PR? :)


except UpdateFailed as err:
    self.last_exception = err
    # We can only honor a retry_after, after the config entry has been set up
Member

We should make this limitation clear in the dev blog, I think. We could consider allowing the retry_after parameter to influence the config entry retry interval, but I think that requires a new discussion.

Member

@MartinHjelmare MartinHjelmare left a comment

Thanks!

@MartinHjelmare
Member

Please link a dev docs update and blog post.

Please also fill in the PR template checklist.

@MartinHjelmare MartinHjelmare marked this pull request as draft November 3, 2025 15:19
@erwindouna
Contributor Author

Thanks @MartinHjelmare! I'll work out a dev blog. :)

@erwindouna
Contributor Author

Blog and docs PR provided. Marking this as ready for review to bring it to attention.

@erwindouna erwindouna marked this pull request as ready for review November 7, 2025 08:37
@MartinHjelmare MartinHjelmare merged commit 84f8e57 into home-assistant:dev Nov 14, 2025
121 of 122 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Nov 15, 2025
@erwindouna erwindouna deleted the architecture-duc-backoff branch November 19, 2025 08:58