[Feature Request] AsyncIO and IO concurrency in Fetch phase #18780

@abiesps

Description

Is your feature request related to a problem? Please describe

During the fetch phase of a query, we iterate over all requested document IDs and load each document sequentially through StoredFieldReader. This is probably fine on fast storage devices, but on slower storage the per-document latency adds up significantly.

I measured the impact on a "slower" storage device by fetching 10 documents from a very small data set, after making sure none of the data was in the page cache.
I executed a simple term query on this 11 GB data set. With a cold page cache the query took ~200 ms, of which roughly 150 ms was attributed to the fetch phase.

Describe the solution you'd like

Lucene supports prefetching in its stored fields reader: the prefetch call takes a docId as input, which enables us to prefetch the data for the required docIds asynchronously. Depending on how many documents need to be fetched, we can also consider fetching them concurrently (for example, when returning 100 documents rather than 10 or so).
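To make the two ideas above concrete, here is a minimal sketch of a two-pass "prefetch, then read" loop and a concurrent variant. Everything in it is an assumption for illustration: `DocStore`, `InMemoryStore`, `fetchWithPrefetch`, `fetchConcurrently`, and the `CONCURRENT_THRESHOLD` cutoff are hypothetical names, and `DocStore` is only a stand-in for the shape of a stored fields reader, not Lucene's or OpenSearch's actual API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class FetchPhaseSketch {

    /** Hypothetical stand-in for a stored fields reader; not a Lucene type. */
    interface DocStore {
        void prefetch(int docId);    // hint: start loading docId in the background
        String document(int docId);  // blocking read of the stored document
    }

    /** Toy store that records prefetch hints and returns synthetic documents. */
    static final class InMemoryStore implements DocStore {
        final Map<Integer, Boolean> prefetched = new ConcurrentHashMap<>();
        public void prefetch(int docId) { prefetched.put(docId, Boolean.TRUE); }
        public String document(int docId) { return "doc-" + docId; }
    }

    /** Hypothetical cutoff for switching to concurrent fetch; would need benchmarking. */
    static final int CONCURRENT_THRESHOLD = 50;

    /** Pass 1 issues every hint, pass 2 reads, so I/O for later docs can overlap earlier reads. */
    static List<String> fetchWithPrefetch(DocStore store, int[] docIds) {
        for (int docId : docIds) {
            store.prefetch(docId);
        }
        List<String> docs = new ArrayList<>(docIds.length);
        for (int docId : docIds) {
            docs.add(store.document(docId));
        }
        return docs;
    }

    /** Fans blocking reads out to a pool; the result order matches docIds. */
    static List<String> fetchConcurrently(DocStore store, int[] docIds, ExecutorService pool)
            throws InterruptedException, ExecutionException {
        List<Future<String>> futures = new ArrayList<>(docIds.length);
        for (int docId : docIds) {
            Callable<String> task = () -> store.document(docId);
            futures.add(pool.submit(task));
        }
        List<String> docs = new ArrayList<>(docIds.length);
        for (Future<String> future : futures) {
            docs.add(future.get());
        }
        return docs;
    }

    /** Picks a strategy based on how many documents were requested. */
    static List<String> fetch(DocStore store, int[] docIds, ExecutorService pool)
            throws InterruptedException, ExecutionException {
        return docIds.length >= CONCURRENT_THRESHOLD
                ? fetchConcurrently(store, docIds, pool)
                : fetchWithPrefetch(store, docIds);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            System.out.println(fetch(new InMemoryStore(), new int[] {7, 3, 9}, pool));
        } finally {
            pool.shutdown();
        }
    }
}
```

The two-pass form only helps if the underlying prefetch is truly non-blocking; whether a thread pool adds anything on top of that for larger result sizes is the open question this issue raises.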

In some cases fdt files are opened through niofs. We can also implement prefetching in niofs using POSIX fadvise (posix_fadvise). We would probably have to maintain a separate buffer in the niofs index input for prefetched data.
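The "separate buffer for prefetched data" idea can be sketched roughly as follows. Note the swapped-in technique: the Java standard library does not expose posix_fadvise, so this sketch uses an asynchronous read via `AsynchronousFileChannel` into a dedicated buffer instead. `PrefetchingFileReader` and its methods are hypothetical names, not part of Lucene's NIO directory implementation, and the sketch handles only a single outstanding prefetch from a single thread.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

/** Hypothetical reader with one prefetch buffer; not Lucene's niofs index input. */
final class PrefetchingFileReader implements AutoCloseable {
    private final AsynchronousFileChannel channel;
    private ByteBuffer prefetchBuffer;   // separate buffer holding prefetched bytes
    private long prefetchOffset = -1;    // file offset of prefetchBuffer[0]
    private Future<Integer> pending;     // completes with the number of bytes read

    PrefetchingFileReader(Path path) throws IOException {
        channel = AsynchronousFileChannel.open(path, StandardOpenOption.READ);
    }

    /** Starts a background read of [offset, offset + length) and returns immediately. */
    void prefetch(long offset, int length) {
        prefetchBuffer = ByteBuffer.allocate(length);
        prefetchOffset = offset;
        pending = channel.read(prefetchBuffer, offset);
    }

    /** Reads length bytes at offset, from the prefetch buffer when it covers the range. */
    byte[] read(long offset, int length)
            throws IOException, InterruptedException, ExecutionException {
        byte[] out = new byte[length];
        if (pending != null && offset >= prefetchOffset) {
            int available = Math.max(pending.get(), 0); // waits for the prefetch to finish
            if (offset + length <= prefetchOffset + available) {
                int start = (int) (offset - prefetchOffset);
                System.arraycopy(prefetchBuffer.array(), start, out, 0, length);
                return out;
            }
        }
        // Fallback: blocking read straight from the file, looping over short reads.
        ByteBuffer dst = ByteBuffer.wrap(out);
        long pos = offset;
        while (dst.hasRemaining()) {
            int n = channel.read(dst, pos).get();
            if (n < 0) {
                throw new EOFException("read past end of file at offset " + pos);
            }
            pos += n;
        }
        return out;
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("prefetch-sketch", ".bin");
        try {
            Files.write(file, "0123456789abcdef".getBytes(StandardCharsets.US_ASCII));
            try (PrefetchingFileReader reader = new PrefetchingFileReader(file)) {
                reader.prefetch(4, 8);               // hint: bytes 4..11 will be needed
                byte[] bytes = reader.read(6, 4);    // served from the prefetch buffer
                System.out.println(new String(bytes, StandardCharsets.US_ASCII));
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

A real implementation would also need a policy for buffer sizing, eviction, and multiple in-flight prefetches, none of which this sketch attempts.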

Related component

No response

Describe alternatives you've considered

No response

Additional context

No response

Labels

Search, discuss, enhancement, lucene
