@potiuk potiuk commented Nov 8, 2025



@jason810496 jason810496 left a comment

May I know when (and how) we should bump the version?

The only doc I can find related to generated/provider_metadata.json is dev/README_RELEASE_PROVIDERS.md.

Should the release manager bump them manually, or is this already included in a breeze release-management command?
Thanks!

potiuk commented Nov 9, 2025

It's run as part of the release process, so the release manager (in this case me, stepping in for Elad) has to do it after providers are released:

https://github.com/apache/airflow/blob/main/dev/README_RELEASE_PROVIDERS.md#update-providers-metadata

A few words on why we have this file at all:

The main reason this is done is that it aids SBOM generation and speeds it up, without pulling all the information from PyPI when we generate many of those in parallel processes:

https://github.com/apache/airflow/blob/bc3a750af476b991ca34e6c696208f8e75ff99a7/dev/breeze/doc/09_release_management_tasks.rst#sbom-generation-tasks

The SBOMs are currently generated as part of the "release docs building process for airflow" https://github.com/apache/airflow/blob/main/dev/README_RELEASE_AIRFLOW.md#publish-final-documentation -> this "workflow_run" takes the "main" version of the "provider's metadata" and uses it to determine which versions of providers we should be "matching" with the released Airflow version. It's done this way so that we can also regenerate SBOMs for historical versions of Airflow at any time: this file (pulled from main) is used to find which versions of providers were used in a given version of Airflow.
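
To make the lookup concrete, here is a minimal sketch (not the actual breeze code). It assumes the file maps provider name -> provider version -> an entry with an "associated_airflow_version" field; that layout, the function name, and all version numbers and dates in the sample are made up for illustration.

```python
import json
from collections import defaultdict

# Hypothetical excerpt of the metadata file. The layout, field names,
# versions and dates are illustrative assumptions, not real release data.
SAMPLE_METADATA = json.loads("""
{
  "amazon": {
    "9.0.0": {"associated_airflow_version": "2.10.3", "date_released": "2024-10-01T00:00:00Z"},
    "9.1.0": {"associated_airflow_version": "2.10.3", "date_released": "2024-11-01T00:00:00Z"},
    "9.2.0": {"associated_airflow_version": "2.10.4", "date_released": "2024-12-20T00:00:00Z"}
  },
  "ftp": {
    "3.12.0": {"associated_airflow_version": "2.10.4", "date_released": "2024-12-20T00:00:00Z"}
  }
}
""")


def providers_for_airflow(metadata, airflow_version):
    """Return {provider: [provider versions]} associated with one Airflow version."""
    result = defaultdict(list)
    for provider, versions in metadata.items():
        for provider_version, info in versions.items():
            if info["associated_airflow_version"] == airflow_version:
                result[provider].append(provider_version)
    return dict(result)


if __name__ == "__main__":
    print(providers_for_airflow(SAMPLE_METADATA, "2.10.3"))
```

Because the lookup only reads a local file, the same function works for any historical Airflow version without touching constraints files or PyPI.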

But the real use of it is when we want to regenerate the SBOMs, for example when a new version of the tool (cdxgen) is released. Then we can massively parallelise it, and it's been easier to just have the "provider's metadata" in the image already rather than trying to download all the constraints and interact with PyPI to find it out over and over.
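
A rough sketch of why the local copy helps with parallel regeneration, under the same assumed file layout as in the file itself may not match exactly: every worker resolves its provider versions from an in-memory mapping, so no worker needs the network. The real process uses parallel processes; threads are used here only to keep the example short, and METADATA, plan_sbom_job and plan_all are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory copy of the metadata; layout and values are assumptions.
METADATA = {
    "amazon": {
        "9.0.0": {"associated_airflow_version": "2.10.3"},
        "9.2.0": {"associated_airflow_version": "2.10.4"},
    },
    "ftp": {
        "3.12.0": {"associated_airflow_version": "2.10.4"},
    },
}


def plan_sbom_job(airflow_version):
    """Resolve (provider, version) pairs for one Airflow version, locally only."""
    pairs = sorted(
        (provider, version)
        for provider, versions in METADATA.items()
        for version, info in versions.items()
        if info["associated_airflow_version"] == airflow_version
    )
    return airflow_version, pairs


def plan_all(airflow_versions, workers=4):
    """Fan the lookups out to a pool; map() keeps results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(plan_sbom_job, airflow_versions))


if __name__ == "__main__":
    print(plan_all(["2.10.3", "2.10.4"]))
```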

So this is largely a glorified cache that we update after release.

There is also some ambiguity here that we try to resolve. Sometimes we release providers several times after a version of Airflow, so the mapping is not 1-1, it's usually many-1 (many provider versions matching the same Airflow version). It also means that sometimes a provider is released but there is "no" version of Airflow that actually used it (yet); this code handles that by matching the released provider with the latest released Airflow, even if it is not found in constraints yet. So the "cache" is built with slightly more complex logic, reflecting the temporary nature of the "tip" of provider versions.
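
The matching rule can be sketched roughly like this. It is a simplified illustration, not the real implementation: the function name and both inputs are hypothetical, and picking the earliest Airflow release whose constraints pin the provider version is my assumption for the example.

```python
from typing import Dict, List, Optional


def associate_airflow_version(
    provider: str,
    provider_version: str,
    constraints: Dict[str, Dict[str, str]],
    airflow_releases: List[str],
) -> Optional[str]:
    """Pick an Airflow version for a provider release.

    constraints maps Airflow version -> {provider: pinned provider version};
    airflow_releases is ordered oldest to newest. Both are hypothetical inputs.
    """
    # Assumption: prefer the earliest Airflow release whose constraints
    # pin exactly this provider version.
    for airflow_version in airflow_releases:
        if constraints.get(airflow_version, {}).get(provider) == provider_version:
            return airflow_version
    # Not in any constraints yet: fall back to the latest released Airflow.
    return airflow_releases[-1] if airflow_releases else None


if __name__ == "__main__":
    # Made-up versions, for illustration only.
    constraints = {"3.0.0": {"ftp": "3.12.0"}, "3.1.0": {"ftp": "3.13.0"}}
    releases = ["3.0.0", "3.1.0"]
    for version in ("3.12.0", "3.13.0", "3.13.1"):
        print(version, "->", associate_airflow_version("ftp", version, constraints, releases))
```

In this sketch both 3.13.0 (pinned in constraints) and 3.13.1 (released later, not pinned anywhere yet) end up associated with the same Airflow version, which is the many-1 case; the second association is provisional until a newer Airflow release picks the provider up.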

@potiuk potiuk merged commit cbf9417 into apache:main Nov 9, 2025
53 checks passed
@potiuk potiuk deleted the update-providers-metadata-2025-11-08 branch November 9, 2025 13:26
profgrammer pushed a commit to profgrammer/airflow that referenced this pull request Nov 9, 2025