@yannforget
Member

@yannforget yannforget commented Sep 19, 2025

The Python library we previously used to interact with the Climate Data Store has been deprecated by ECMWF (https://github.com/ecmwf-projects/datapi). They recommend switching to ecmwf-datastores-client (https://github.com/ecmwf/ecmwf-datastores-client).

We needed a big update to accommodate the change, and we have always had issues with the previous version of the package anyway.

The rewrite uses the new ecmwf-datastores-client instead of the deprecated datapi and reworks the data acquisition pipeline:

  • data requests are optimally chunked in the time dimension and processed as they are completed
  • raw GRIB data is only used as a temporary artifact; data is converted to a Zarr store (https://zarr.dev/), which is a lot better for partial reads and appends
  • spatial & temporal aggregation is also rewritten to be a lot faster
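As an illustration of chunking requests in the time dimension, here is a minimal sketch that splits a date range into per-calendar-month chunks. The function name `month_chunks` and the choice of calendar months as the chunk size are assumptions for illustration only; the chunking rule actually used in the rewrite is not described in this conversation.

```python
from datetime import date, timedelta


def month_chunks(start, end):
    """Split the inclusive range [start, end] into (first_day, last_day) pairs,
    one per calendar month. Hypothetical helper, not the toolbox API."""
    chunks = []
    cursor = start
    while cursor <= end:
        # First day of the month following `cursor`.
        if cursor.month == 12:
            next_month = date(cursor.year + 1, 1, 1)
        else:
            next_month = date(cursor.year, cursor.month + 1, 1)
        # A chunk ends on the last day of the month or on `end`, whichever is first.
        chunk_end = min(end, next_month - timedelta(days=1))
        chunks.append((cursor, chunk_end))
        cursor = next_month
    return chunks


chunks = month_chunks(date(2024, 11, 15), date(2025, 2, 10))
for first_day, last_day in chunks:
    print(first_day.isoformat(), "->", last_day.isoformat())
```

Each pair could then be submitted as its own request, so that a chunk can be converted and appended to the store as soon as it completes.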

See README: https://github.com/BLSQ/openhexa-toolbox/tree/feat/era5-rewrite/openhexa/toolbox/era5

@EstebanMontandon can you have a look at the README and tell me if the API makes sense to you? Or if there is any missing feature?

@nazarfil this implies 2 new dependencies for the toolbox: zarr and ecmwf-datastores-client. I wonder if we should make the toolbox more modular to avoid having to install all dependencies when you only want to use the toolbox for the hexa or dhis2 modules for example.

@yannforget yannforget marked this pull request as draft September 19, 2025 13:20
@yannforget yannforget requested review from EstebanMontandon and nazarfil and removed request for nazarfil September 19, 2025 13:20
@nazarfil
Contributor

> @nazarfil this implies 2 new dependencies for the toolbox: zarr and ecmwf-datastores-client. I wonder if we should make the toolbox more modular to avoid having to install all dependencies when you only want to use the toolbox for the hexa or dhis2 modules for example.

Hey, yeah, we could add modular installation options. I can do a PR for this.
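One way modular installation is often paired with optional dependencies is a guard that fails with a helpful message when an extra is missing. The sketch below is only an assumption about how this could look: the helper `require_extra`, the extra name `era5`, and the install hint are hypothetical and not taken from the toolbox.

```python
import importlib.util


def require_extra(module_name, extra):
    """Raise a descriptive ImportError if an optional top-level module is missing.

    `extra` is the (hypothetical) name of the optional dependency group
    that would provide `module_name`.
    """
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"'{module_name}' is required for this feature. "
            f"Install the optional '{extra}' dependencies to use it."
        )


# A module that is always available passes silently.
require_extra("json", "era5")

# A missing module produces an actionable error instead of a bare traceback.
try:
    require_extra("module_that_is_not_installed_xyz", "era5")
except ImportError as exc:
    message = str(exc)
    print(message)
```

Calling such a guard at the top of the ERA5 module would let users of the hexa or dhis2 modules import the toolbox without zarr or ecmwf-datastores-client installed.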

msg = "Dataset still contains 'step' dimension. Please aggregate to daily data first."
raise ValueError(msg)
da = ds[variable]
area_weights = np.cos(np.deg2rad(ds.latitude))

I believe area weighting was not applied in the previous implementation, right?
Just wondering whether this could introduce a noticeable difference from the climate data that has already been imported to DHIS2 PNLP. I guess the difference will not be big, so it should be fine.

Member Author

The difference grows with the distance to the equator, so it should be quite small for DRC.
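To make the latitude effect concrete, here is a small standard-library sketch comparing a plain mean with a cos(latitude)-weighted mean over two latitude bands. The field `toy_field`, the 0.25° grid step, and the band limits are made-up assumptions for illustration, not ERA5 data or the toolbox implementation.

```python
import math


def plain_mean(values):
    return sum(values) / len(values)


def cos_weighted_mean(values, latitudes):
    # Same weighting idea as np.cos(np.deg2rad(latitude)): grid cells
    # cover less area as latitude increases.
    weights = [math.cos(math.radians(lat)) for lat in latitudes]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def band(lat_south, lat_north, step=0.25):
    """Regularly spaced latitudes from lat_south to lat_north (inclusive)."""
    n = int(round((lat_north - lat_south) / step)) + 1
    return [lat_south + i * step for i in range(n)]


def toy_field(lat):
    # Hypothetical field: a value that decreases linearly away from the equator.
    return 30.0 - 0.5 * abs(lat)


def weighting_effect(lat_south, lat_north):
    """Absolute difference between the weighted and unweighted means."""
    lats = band(lat_south, lat_north)
    values = [toy_field(lat) for lat in lats]
    return abs(cos_weighted_mean(values, lats) - plain_mean(values))


tropical = weighting_effect(-13.0, 5.0)  # a band roughly spanning DRC's latitudes
northern = weighting_effect(50.0, 68.0)  # a band of the same height, far from the equator
print(f"tropical band: {tropical:.4f}, northern band: {northern:.4f}")
```

For a real comparison, the same two means would need to be computed on the actual boundaries and variables already imported.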

download_format="unarchived",
)

max_requests = 100

Just wondering, is this max request value based on some ERA5 daily download limit that gets reset after 24 hours?

Member Author

I think the max limit is 200 active requests in the queue. But it can change depending on the load... :/
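A cap like `max_requests` can be pictured as a bounded set of active jobs that is topped up as jobs finish. The sketch below simulates that pattern with the standard library only; `run_with_queue_limit`, `submit`, and `is_done` are hypothetical stand-ins, not functions from ecmwf-datastores-client or the toolbox.

```python
import time
from collections import deque


def run_with_queue_limit(requests, submit, is_done, max_active=100, poll_interval=0.0):
    """Submit requests while keeping at most `max_active` jobs in flight.

    `submit(request)` returns a job handle and `is_done(job)` reports completion;
    both are caller-supplied placeholders for a real client.
    """
    pending = deque(requests)
    active, finished = [], []
    peak_active = 0
    while pending or active:
        # Top up the queue without exceeding the cap.
        while pending and len(active) < max_active:
            active.append(submit(pending.popleft()))
        peak_active = max(peak_active, len(active))
        # Poll once and separate finished jobs from those still running.
        still_running = []
        for job in active:
            if is_done(job):
                finished.append(job)
            else:
                still_running.append(job)
        active = still_running
        if active:
            time.sleep(poll_interval)
    return finished, peak_active


# Simulated run: 250 requests that complete on the first poll.
finished, peak_active = run_with_queue_limit(
    range(250), submit=lambda r: r, is_done=lambda job: True, max_active=100
)
print(len(finished), peak_active)
```

Keeping the cap below the server-side limit leaves headroom when that limit changes with load; a real loop would also use a non-zero `poll_interval`.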

weekly_data = aggregate_in_time(
results,
period=Period.WEEK,
agg="mean"

For the existing implementations (DRC PNLP and SNT), this covers all the aggregation functionality we need ;)

Now I'm afraid I'll have to change the era5 pipelines a bit 🤪
Congrats monsieur, nice job ! 🥇

@yannforget yannforget marked this pull request as ready for review October 1, 2025 17:23

4 participants