
Historical Segment Cache Loading Strategy on Start-up #18446

@GWphua

Description

  • Propose a configurable startup strategy that eagerly loads only recent (“hot”) segments, while leaving older (“cold”) segments to load lazily on first access.
  • Propose to deprecate druid.segmentCache.lazyLoadOnStart in favour of configs that give more flexibility over how a Historical loads its segment cache during startup.

Motivation

  • Non-lazy (eager) segment loading is slow when a Historical holds many segments (observed ~22 minutes per Historical; ~39 hours cluster-wide).
  • Lazy loading improves startup time, but the first queries over hot data can be slow because those segments are only loaded on first access.
  • Many clusters primarily query the last N days/weeks; we can make that slice eager at startup to maintain query performance.

Proposal

Deprecate druid.segmentCache.lazyLoadOnStart in favor of a single strategy-driven config:

New: startupCacheLoadStrategy with options:

  1. loadLazily (all segments lazy)
  2. loadAllEagerly (all segments eager)
  3. loadEagerlyForPeriod (recent window eager, older lazy)

When loadEagerlyForPeriod is selected, require a loadPeriod config (ISO-8601 period, e.g., P7D, P30D).
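
To make the three options concrete, here is a minimal sketch of how the per-segment decision could look. All class, enum, and method names are hypothetical and not part of Druid's current API. It also assumes that a segment counts as "hot" when its interval ends after now minus loadPeriod, which the proposal does not pin down.

```java
import java.time.OffsetDateTime;
import java.time.Period;
import java.time.ZoneOffset;

// Hypothetical enum mirroring the three proposed options.
enum StartupCacheLoadStrategy { LOAD_LAZILY, LOAD_ALL_EAGERLY, LOAD_EAGERLY_FOR_PERIOD }

// Hypothetical helper; not an existing Druid class.
final class StartupLoadDecider {
    private final StartupCacheLoadStrategy strategy;
    private final Period loadPeriod; // null unless a period was configured

    StartupLoadDecider(StartupCacheLoadStrategy strategy, String loadPeriod) {
        // loadPeriod is mandatory only for the period-based strategy.
        if (strategy == StartupCacheLoadStrategy.LOAD_EAGERLY_FOR_PERIOD && loadPeriod == null) {
            throw new IllegalArgumentException("loadPeriod is required for loadEagerlyForPeriod");
        }
        this.strategy = strategy;
        // Period.parse accepts ISO-8601 periods such as P7D or P30D.
        this.loadPeriod = loadPeriod == null ? null : Period.parse(loadPeriod);
    }

    // Returns true if the segment should be loaded during startup.
    boolean shouldLoadEagerly(OffsetDateTime segmentEnd, OffsetDateTime now) {
        switch (strategy) {
            case LOAD_LAZILY:
                return false;
            case LOAD_ALL_EAGERLY:
                return true;
            default:
                // Assumption: "recent" means the segment interval ends inside the window.
                return segmentEnd.isAfter(now.minus(loadPeriod));
        }
    }
}
```

With loadPeriod set to P7D, a segment ending yesterday would be loaded at startup, while one ending 30 days ago would be left to load on first access.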

Backward compatibility and migration

Keep reading druid.segmentCache.lazyLoadOnStart for at least a few more releases with a deprecation warning.
We can map true -> loadLazily, false -> loadAllEagerly.
Setting the new startupCacheLoadStrategy overrides the lazyLoadOnStart setting, [Optional: and a warning is logged if both settings are configured].
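
The precedence rules above could be resolved roughly as follows. This is only a sketch: the full property name for the new config is a guess (the proposal only names startupCacheLoadStrategy), the class is hypothetical, and the fallback when neither property is set assumes the existing behaviour of lazyLoadOnStart defaulting to false.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical resolver; not an existing Druid class.
final class StartupStrategyResolver {
    static final String LEGACY_KEY = "druid.segmentCache.lazyLoadOnStart";
    // Assumed property name; the final name is still open for discussion.
    static final String STRATEGY_KEY = "druid.segmentCache.startupCacheLoadStrategy";

    // Returns the effective strategy name and collects warnings to be logged.
    static String resolve(Properties props, List<String> warnings) {
        String strategy = props.getProperty(STRATEGY_KEY);
        String legacy = props.getProperty(LEGACY_KEY);

        if (legacy != null) {
            warnings.add(LEGACY_KEY + " is deprecated; use " + STRATEGY_KEY + " instead");
        }
        if (strategy != null) {
            if (legacy != null) {
                // Optional warning from the proposal: both settings are configured.
                warnings.add("Both properties are set; " + STRATEGY_KEY + " takes precedence");
            }
            return strategy;
        }
        if (legacy != null) {
            // Migration mapping: true -> loadLazily, false -> loadAllEagerly.
            return Boolean.parseBoolean(legacy) ? "loadLazily" : "loadAllEagerly";
        }
        // Assumption: with nothing configured, keep eager loading as the default.
        return "loadAllEagerly";
    }
}
```

Collecting warnings in a list rather than logging directly keeps the sketch free of any logging framework; a real implementation would presumably use the Historical's own logger.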

The advantage of relying on the new config is that it lets us add more load strategies later.

Config names are open for discussion; do drop some suggestions!
