checkpoint_storage_concurrent_gb flag is only respected when load_parameters_path is passed #2829

@mkmg

Bug report

I set checkpoint_storage_concurrent_gb to 900.
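For reference, here is a quick sketch of the unit arithmetic. It assumes the flag is interpreted as decimal gigabytes (10^9 bytes), which is what the byte counts in the logs suggest. The helper names `gb_to_bytes` and `format_gib` are hypothetical and only for illustration; they are not part of MaxText or Orbax.

```python
GIB = 2**30  # bytes per gibibyte (binary unit used in the log's parenthetical)


def gb_to_bytes(gb: int) -> int:
    """Convert decimal gigabytes to bytes (assumed interpretation of the flag)."""
    return gb * 10**9


def format_gib(num_bytes: int) -> str:
    """Render a byte count in GiB with one decimal place, as the log line does."""
    return f"{num_bytes / GIB:.1f} GiB"


configured = gb_to_bytes(900)
print(configured, format_gib(configured))  # the value I expect to see in every run

fallback = gb_to_bytes(96)
print(fallback, format_gib(fallback))  # the value that shows up when the flag is ignored
```

Under that assumption, 900 corresponds to 900000000000 bytes (838.2 GiB) and 96 corresponds to 96000000000 bytes (89.4 GiB), matching the two figures reported in the logs.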

When I run with load_parameters_path set, I see the following logs:
Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=True, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7934458b4c50>, enable_pinned_host_transfer=False, save_concurrent_bytes: 900000000000 (838.2 GiB), restore_concurrent_bytes: 900000000000 (838.2 GiB)

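This is as expected. However, for runs where A) load_full_state_path is set, or B) neither load_full_state_path nor load_parameters_path is set, the configured value is not applied. The handler reports 96000000000 bytes instead of 900000000000, and I see the following logs: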
Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7f107ea088f0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
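To check a run quickly, something like the following sketch could be used. It only parses the `save_concurrent_bytes` and `restore_concurrent_bytes` fields out of a log line in the format shown above and compares them to the configured value, again assuming decimal gigabytes. The function names are hypothetical, and the sample strings are just the tails of the two log lines from this report.

```python
import re

# Matches "save_concurrent_bytes: 123" and "restore_concurrent_bytes: 123".
_LIMIT_RE = re.compile(r"(save|restore)_concurrent_bytes: (\d+)")


def concurrent_bytes(log_line: str) -> dict[str, int]:
    """Extract the concurrency limits (in bytes) reported in a handler log line."""
    return {
        f"{kind}_concurrent_bytes": int(value)
        for kind, value in _LIMIT_RE.findall(log_line)
    }


def flag_respected(log_line: str, configured_gb: int) -> bool:
    """True if every limit found in the line equals configured_gb * 10**9 bytes."""
    expected = configured_gb * 10**9  # assumption: flag is in decimal gigabytes
    limits = concurrent_bytes(log_line)
    return bool(limits) and all(v == expected for v in limits.values())


# Tails of the two log lines from this report.
WITH_LOAD_PARAMETERS_PATH = (
    "save_concurrent_bytes: 900000000000 (838.2 GiB), "
    "restore_concurrent_bytes: 900000000000 (838.2 GiB)"
)
WITHOUT_LOAD_PARAMETERS_PATH = (
    "save_concurrent_bytes: 96000000000 (89.4 GiB), "
    "restore_concurrent_bytes: 96000000000 (89.4 GiB)"
)

print(flag_respected(WITH_LOAD_PARAMETERS_PATH, 900))
print(flag_respected(WITHOUT_LOAD_PARAMETERS_PATH, 900))
```

With the flag set to 900, the first line passes the check and the second does not, which is the discrepancy this issue describes.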

Environment Information

orbax-checkpoint version: 0.11.30

Metadata

Labels

bug: Something isn't working
