Historical Startup -- Configurable loading strategy #18687
FrankChen021
left a comment
LGTM with minor suggestions
abhishekrb19
left a comment
Thanks for the feature, @GWphua! I've left some comments.
|--------|-----------|
|`loadAllEagerly`|The default startup strategy. The Historical service will load all segment column metadata immediately during the initial startup process.|
|`loadAllLazily`|To significantly improve historical system startup time, segments are not loaded during the initial startup sequence. Instead, the loading cost is deferred, and will be incurred the first time a segment is referenced by a query.|
|`loadEagerlyBeforePeriod`|Provides a balance between fast startup and query performance. The Historical service will eagerly load column metadata only for segments that fall within the most recent period defined by `druid.segmentCache.startupLoadPeriod`. Segments outside this recent period will be loaded on-demand when first queried.|
How feasible/extensible is it to accept a map of datasource to load period, to allow configurable periods per datasource? (similar to the loadByPeriod - load rules config where each datasource can have different load retention rules)
I think having that option would allow a lot more flexibility to operators as the query workloads can be vastly different.
This is workable. I can change startupLoadStrategy.period to startupLoadStrategy.datasourceToPeriodMapping, which would accept a JSON object, e.g.
{"DS1": "P7D", "DS2": "P2D", ".": "P7D"}
Where . refers to the default configuration (since datasources cannot start with .)
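As a rough illustration of the proposal above, the lookup could work as sketched below. This is not code from the PR: the class and method names are hypothetical, and it uses `java.time.Period` from the JDK rather than whatever period type Druid would use internally.

```java
import java.time.Period;
import java.util.Map;

// Hypothetical sketch of a per-datasource period lookup with "." as the default key.
final class DatasourcePeriodLookup {
    // Assumption: "." marks the fallback entry, as proposed in the comment above.
    static final String DEFAULT_KEY = ".";

    // Returns the period configured for the datasource, or the default entry if absent.
    static Period periodFor(Map<String, String> mapping, String datasource) {
        String raw = mapping.getOrDefault(datasource, mapping.get(DEFAULT_KEY));
        if (raw == null) {
            throw new IllegalArgumentException(
                "No period for datasource [" + datasource + "] and no default entry");
        }
        return Period.parse(raw); // parses ISO-8601 periods such as "P7D"
    }

    public static void main(String[] args) {
        Map<String, String> mapping = Map.of("DS1", "P7D", "DS2", "P2D", ".", "P7D");
        System.out.println(periodFor(mapping, "DS2"));     // explicit entry
        System.out.println(periodFor(mapping, "unknown")); // falls back to "."
    }
}
```

A missing default entry is treated as an error here; silently falling back to eager loading would be an equally plausible design choice.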
In my opinion, let's keep the change in this PR small. As for datasource-level configuration, if there's a real need for this feature, we can implement it by defining a datasource-level configuration.
> How feasible/extensible is it to accept a map of datasource to load period, to allow configurable periods per datasource? (similar to the `loadByPeriod` load rules config where each datasource can have different load retention rules)
>
> I think having that option would allow a lot more flexibility to operators as the query workloads can be vastly different.
I feel we can leave this for another PR, since it is out of scope of this intended PR. WDYT? @abhishekrb19
return startupLoadStrategy == null
       ? isLazyLoadOnStart()
Do we have validation to ensure that incompatible configurations aren't enabled accidentally? For example, when lazyLoadOnStart = true and startupLoadStrategy are both set.
The current implementation ensures that a declared startupLoadStrategy overrides whatever isLazyLoadOnStart is set to; otherwise, isLazyLoadOnStart is used for backward compatibility. Personally, I see no need for validation given this override behavior, though it would be relatively simple to implement if you feel it's needed.
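To make the precedence described above concrete, here is a minimal standalone sketch. It is not the PR's code: the enum, class, and method names are hypothetical stand-ins, and the `validate` method only illustrates what the optional strict check discussed in this thread might look like.

```java
// Hypothetical sketch of "explicit strategy wins, legacy flag is the fallback".
final class StartupStrategyResolution {
    enum Strategy { LOAD_ALL_EAGERLY, LOAD_ALL_LAZILY, LOAD_EAGERLY_BEFORE_PERIOD }

    // An explicitly configured strategy takes precedence; otherwise the
    // deprecated lazyLoadOnStart flag decides between lazy and eager loading.
    static Strategy resolve(Strategy explicit, boolean lazyLoadOnStart) {
        if (explicit != null) {
            return explicit;
        }
        return lazyLoadOnStart ? Strategy.LOAD_ALL_LAZILY : Strategy.LOAD_ALL_EAGERLY;
    }

    // Optional strict check (an assumption, not part of the PR): reject a
    // configuration where the legacy flag asks for lazy loading but the
    // explicit strategy asks for something else.
    static void validate(Strategy explicit, boolean lazyLoadOnStart) {
        if (explicit != null && lazyLoadOnStart && explicit != Strategy.LOAD_ALL_LAZILY) {
            throw new IllegalStateException(
                "lazyLoadOnStart=true conflicts with startupLoadStrategy=" + explicit);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, true));                      // legacy flag used
        System.out.println(resolve(Strategy.LOAD_ALL_EAGERLY, true)); // explicit value wins
    }
}
```

Whether a conflict should fail startup or only log a warning is a judgment call; the override-only behavior described in the comment corresponds to calling `resolve` without `validate`.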
Co-authored-by: Frank Chen <[email protected]>
public HistoricalStartupCacheLoadStrategy getStartupCacheLoadStrategy()
{
  return startupLoadStrategy == null
         ? isLazyLoadOnStart() ? new LoadAllLazilyStrategy() : new LoadAllEagerlyStrategy()
Check notice: Code scanning / CodeQL
Deprecated method or constructor invocation (Note): SegmentLoaderConfig.isLazyLoadOnStart
I guess it is natural to allow more start up strategies since we already support loadAllLazily and loadAllEagerly.
IIUC, the use case here is that we frequently query the data of the last
So, maybe the solution could be a mix of the recently added "Virtual Storage Fabric" + some load rules. Load rules would perhaps look something like:
Rule 1:
@FrankChen021, @GWphua, would this satisfy your use case?
I think virtual storage + load rules is different from what this PR is doing. If the goal of virtual storage will be the dominant mode in future, introducing
Virtual storage was just merged and is still experimental; I don't know when it will be production ready or what the roadmap for it is. Making a small change to the existing segment cache loading is still worthwhile.
Thanks for the clarification, @FrankChen021 !
True, as mentioned, I am not opposed to the idea of new startup cache load strategies. It only seems natural. Since load rules already work well with the concept of period-based loading, I hoped it would be more useful for the future to just extend that concept to cover such use cases as well. But if that doesn't cover your use case, I can understand.
I think @clintropolis would have some insights there but I imagine it should see good adoption in the near future. I was hoping you would be one of the early birds! 😉
Fixes #18446
Description
This PR belongs to a set of PRs that aim to optimize the start-up time of the Historical service. I came across this problem while running 100+ Historical servers, each needing to process a large number of segments (~85k) during start-up. Whenever a Historical is updated, each segment takes 15.3ms to load, so the start-up time for one Historical easily exceeds 20 minutes (meaning 1.5 days to complete an update across all Historical servers!).
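The figures above can be sanity-checked with a quick calculation. The sketch below is only illustrative; it assumes exactly 100 servers restarted one at a time, which the description implies but does not state outright.

```java
// Back-of-the-envelope check of the start-up numbers quoted above.
final class StartupTimeEstimate {
    // Minutes needed to load `segments` segments at `msPerSegment` each.
    static double minutes(int segments, double msPerSegment) {
        return segments * msPerSegment / 1000.0 / 60.0;
    }

    public static void main(String[] args) {
        double perServerMinutes = minutes(85_000, 15.3);    // roughly 21.7 minutes
        // Assumption: 100 servers, updated sequentially.
        double fleetDays = perServerMinutes * 100 / 60.0 / 24.0; // roughly 1.5 days
        System.out.printf("per server: %.1f min, fleet: %.1f days%n",
                          perServerMinutes, fleetDays);
    }
}
```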
I looked into using `lazyLoadOnStart` to speed up startup. Lazy loading processes each segment's metadata in 3.23ms, which shortens the start-up time to ~4min. However, this strategy causes some hiccups in query latency while an upgrade is in progress. I plan to solve this by selectively choosing which segments to load during Historical startup.
Finding a middle ground
By studying the usage pattern of my clusters, I noticed that each Historical stores 1 month's worth of data, but heavy querying is restricted to the most recent 7 days, with only occasional queries for data older than that. Hence, I changed the segment loading logic during Historical startup to provide a configurable time period within which segments are loaded eagerly (the rest are loaded lazily). The right period depends on the querying habits of the cluster.
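The idea can be sketched as a single predicate on each segment's time range. The code below is an illustration under assumptions, not the PR's `LoadEagerlyBeforePeriod` implementation: the class and method names are hypothetical, it uses `java.time` types from the JDK, and it treats a segment as "recent" when its interval ends after the cutoff.

```java
import java.time.Instant;
import java.time.Period;
import java.time.ZoneOffset;

// Hypothetical sketch: load eagerly only if the segment reaches into the recent window.
final class EagerWindowSketch {
    // Assumption: a segment qualifies when its interval end is after (now - window).
    static boolean shouldLoadEagerly(Instant segmentEnd, Instant now, Period window) {
        // Convert to a zoned date-time so calendar-based periods can be subtracted.
        Instant cutoff = now.atZone(ZoneOffset.UTC).minus(window).toInstant();
        return segmentEnd.isAfter(cutoff);
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2025-01-31T00:00:00Z");
        Period window = Period.ofDays(7);
        // Segment ending one day ago: inside the 7-day window, so loaded eagerly.
        System.out.println(shouldLoadEagerly(Instant.parse("2025-01-30T00:00:00Z"), now, window));
        // Segment ending three weeks ago: outside the window, so loaded lazily.
        System.out.println(shouldLoadEagerly(Instant.parse("2025-01-10T00:00:00Z"), now, window));
    }
}
```

With a 7-day window on a node holding a month of data, roughly a quarter of the segments would take the eager path, assuming segments are spread evenly over time.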
Here's a benchmark of the improvements. I also included a test for #18489, which helps me to shave 10s off the start-up time. (Total of 78.5% improvement in loading time)
Release note
Segment loading during Historical service startup is now configurable with `druid.segmentCache.startupLoadStrategy`. This new setting allows users to choose between the existing eager loading (`loadAllEagerly`), a new lazy loading (`loadAllLazily`) option for faster startups, and a hybrid strategy (`loadEagerlyBeforePeriod`) that ensures low query latency for the most recent data while deferring the loading cost of older data.
Deprecated `isLazyLoadOnStart`.
Key changed/added classes in this PR
- `docs/configuration/index.md`
- `SegmentStatsMonitor`
- `SegmentLoaderConfig`
- `SegmentLocalCacheManager`
- `startup/HistoricalStartupCacheLoadStrategy`
- `startup/HistoricalStartupCacheLoadStrategyFactory`
- `startup/LoadAllEagerlyStrategy`
- `startup/LoadAllLazilyStrategy`
- `startup/LoadEagerlyBeforePeriod`

This PR has: