Refactor: Extract index creation logic into smaller methods in MetaDataCreateIndexService by augment-app-staging[bot] · Pull Request #131 · augmentcode/elasticsearch

augment-app-staging · 2026-03-05T02:21:11Z

Summary

Extract the monolithic execute() method in IndexCreationTask (within MetaDataCreateIndexService.java) into smaller, well-named methods that each handle a single responsibility. This makes the service more readable and unit testable.

Changes

The ~300-line execute() method has been broken down into the following focused methods:

Method	Visibility	Responsibility
`parseRequestMappings()`	package-private	Parse request mappings into a structured map
`applyTemplateMappings()`	package-private	Apply and merge template mappings with request mappings
`applyTemplateAliases()`	package-private	Process template aliases with validation and templating
`buildIndexSettings()`	package-private	Build index settings from templates, request, and defaults
`resolveRoutingNumberOfShards()`	package-private static	Calculate routing number of shards
`buildTemporaryIndexMetaData()`	package-private static	Build temporary IndexMetaData for validation
`validateActiveShardCount()`	package-private static	Validate wait_for_active_shards setting
`validateAliasFilters()`	package-private	Validate alias filters using query shard context
`collectMappingMetaData()`	package-private static	Collect mapping metadata from mapper service
`buildFinalIndexMetaData()`	package-private static	Build the final IndexMetaData with all components
`updateClusterState()`	package-private	Update cluster state with blocks, metadata, and routing

Design Decisions

Package-private visibility: Methods are package-private to enable unit testing from the same package while not exposing internals to the public API.
Static where possible: Methods that don't depend on instance state (e.g., resolveRoutingNumberOfShards, buildTemporaryIndexMetaData, validateActiveShardCount, collectMappingMetaData, buildFinalIndexMetaData) are made static to make their dependencies explicit.
Behavior-preserving: This is a pure refactoring - no behavioral changes. The execute() method now reads as a clear high-level flow.

Testing

Note: The Gradle build system cannot resolve dependencies because jcenter.bintray.com (used by this older codebase) has been permanently shut down. The refactoring is structurally verified through code review. The extracted methods maintain the exact same logic as the original monolithic method.

Triggered by: Slack-Bot-Evals
Original discussion: https://augment-wic8570.slack.com/archives/C0AG2MXKEQJ/p1772676453939769

This commit moves the packaging tests for elasticsearch-setup-passwords to java from bats. The change also enables future tests to enable security in Elasticsearch and automatically have waitForElasticsearch work correctly, at least to the same extent it worked in bats, by waiting on the ES port instead of health check. relates elastic#46005

…stic#49166) This change adds a dynamic cluster setting named `indices.id_field_data.enabled`. When set to `false` any attempt to load the fielddata for the `_id` field will fail with an exception. The default value in this change is set to `false` in order to prevent fielddata usage on this field for future versions but it will be set to `true` when backporting to 7x. When the setting is set to true (manually or by default in 7x) the loading will also issue a deprecation warning since we want to disallow fielddata entirely when elastic#26472 is implemented. Closes elastic#43599

This stems from a time where index requests were directly forwarded to TransportReplicationAction. Nowadays they are wrapped in a BulkShardRequest, and this logic is obsolete. Closes elastic#20279

This reverts commit 0827ea2.

…lastic#48580) (elastic#49683) This commit adds a new histogram field mapper that consists in a pre-aggregated format of numerical data to be used in percentiles aggregations.

…tic#49682) This test no longer relies on jdk version, so the assume should be removed relates elastic#48209

…48557) (elastic#49673) Closes elastic#48475

Add a couple of pointers for the user to check the overall cluster health and the version of ES running on every node. Fixes: elastic#49670 (cherry picked from commit 8ca11f5)

…stic#49471) (elastic#49691) This commit adds a function in NodeClient that allows to track the progress of a search request locally. Progress is tracked through a SearchProgressListener that exposes query and fetch responses as well as partial and final reduces. This new method can be used by modules/plugins inside a node in order to track the progress of a local search request. Relates elastic#49091

…49649) (elastic#49681)

Add a mirror of the maven repository of the shibboleth project and upgrade opensaml and related dependencies to the latest version available version Resolves: elastic#44947

The documentation was added in elastic#47584 but those docs did not reflect the up-to-date behavior of the feature. Backport of: elastic#47784

- Improves HTTP client hostname verification failure messages - Adds "DiagnosticTrustManager" which logs certificate information when trust cannot be established (hostname failure, CA path failure, etc) These diagnostic messages are designed so that many common TLS problems can be diagnosed based solely (or primarily) on the elasticsearch logs. These diagnostics can be disabled by setting xpack.security.ssl.diagnose.trust: false Backport of: elastic#48911

…tic#49701) So that the tests can also run in a FIPS 140 JVM, where using a JKS keystore is not allowed. Resolves: elastic#49261

Removing a lot of needless buffering and array creation to reduce the significant memory usage of tests using this. The incoming stream from the `exchange` is already buffered so there is no point in adding a ton of additional buffers everywhere.

Details in elastic#49719 Relates elastic#47983

…ic#49711) * Make BlobStoreRepository Aware of ClusterState (elastic#49639) This is a preliminary to elastic#49060. It does not introduce any substantial behavior change to how the blob store repository operates. What it does is to add all the infrastructure changes around passing the cluster service to the blob store, associated test changes and a best effort approach to tracking the latest repository generation on all nodes from cluster state updates. This brings a slight improvement to the consistency by which non-master nodes (or master directly after a failover) will be able to determine the latest repository generation. It does not however do any tricky checks for the situation after a repository operation (create, delete or cleanup) that could theoretically be used to get even greater accuracy to keep this change simple. This change does not in any way alter the behavior of the blobstore repository other than adding a better "guess" for the value of the latest repo generation and is mainly intended to isolate the actual logical change to how the repository operates in elastic#49060

…lastic#49690) (elastic#49718) This adds a `_source` setting under the `source` setting of a data frame analytics config. The new `_source` is reusing the structure of a `FetchSourceContext` like `analyzed_fields` does. Specifying includes and excludes for source allows selecting which fields will get reindexed and will be available in the destination index. Closes elastic#49531 Backport of elastic#49690

This stems from a time where index requests were directly forwarded to TransportReplicationAction. Nowadays they are wrapped in a BulkShardRequest, and this logic is obsolete. In contrast to prior PR (elastic#49647), this PR also fixes (see b3697cc) a situation where the previous index expression logic had an interesting side effect. For bulk requests (which had resolveIndex = false), the reroute phase was waiting for the index to appear in case where it was not present, and for all other replication requests (resolveIndex = true) it would right away throw an IndexNotFoundException while resolving the name and exit. With elastic#49647, every replication request was now waiting for the index to appear, which was problematic when the given index had just been deleted (e.g. deleting a follower index while it's still receiving requests from the leader, where these requests would now wait up to a minute for the index to appear). This PR now adds b3697cc on top of that prior PR to make sure to reestablish some of the prior behavior where the reroute phase waits for the bulk request for the index to appear. That logic was in place to ensure that when an index was created and not all nodes had learned about it yet, that the bulk would not fail somewhere in the reroute phase. This is now only restricted to the situation where the current node has an older cluster state than the one that coordinated the bulk request (which checks that the index is present). This also means that when an index is deleted, we will no longer unnecessarily wait up to the timeout for the index o appear, and instead fail the request. Closes elastic#20279

…elastic#49714) (elastic#49722)

Some extended testing with MS-SQL server and H2 (which agree on results) revealed bugs in the implementation of WEEK related extraction and diff functions. Non-iso WEEK seems to be broken since elastic#48209 because of the replacement of Calendar and the change in the ISO rules. ISO_WEEK failed for some edge cases around the January 1st. DATE_DIFF was previously based on non-iso WEEK extraction which seems not to be the case. Fixes: elastic#49376 (cherry picked from commit 54fe7f5)

Reindex sort never gave a guarantee about the order of documents being indexed into the destination, though it could give a sense of locality of source data. It prevents us from doing resilient reindex and other optimizations and it has therefore been deprecated. Related to elastic#47567

This rewrites long sort as a `DistanceFeatureQuery`, which can efficiently skip non-competitive blocks and segments of documents. Depending on the dataset, the speedups can be 2 - 10 times. The optimization can be disabled with setting the system property `es.search.rewrite_sort` to `false`. Optimization is skipped when an index has 50% or more data with the same value. Optimization is done through: 1. Rewriting sort as `DistanceFeatureQuery` which can efficiently skip non-competitive blocks and segments of documents. 2. Sorting segments according to the primary numeric sort field(elastic#44021) This allows to skip non-competitive segments. 3. Using collector manager. When we optimize sort, we sort segments by their min/max value. As a collector expects to have segments in order, we can not use a single collector for sorted segments. We use collectorManager, where for every segment a dedicated collector will be created. 4. Using Lucene's shared TopFieldCollector manager This collector manager is able to exchange minimum competitive score between collectors, which allows us to efficiently skip the whole segments that don't contain competitive scores. 5. When index is force merged to a single segment, elastic#48533 interleaving old and new segments allows for this optimization as well, as blocks with non-competitive docs can be skipped. Backport for elastic#48804 Co-authored-by: Jim Ferenczi <jim.ferenczi@elastic.co>

This reverts commit 27d45c9.

Some queries return bulk scorers that can be significantly faster than iterating naively over the scorer. By giving script_score a BulkScorer that would delegate to the wrapped query, we could make it faster in some cases. Closes elastic#40837

Reindex sort never gave a guarantee about the order of documents being indexed into the destination, though it could give a sense of locality of source data. It prevents us from doing resilient reindex and other optimizations and it has therefore been deprecated. Related to elastic#47567

…ecutes async (elastic#50269) Backport elastic#50244 to 7.x branch. If a processor executes asynchronously and the ingest simulate api simulates with multiple documents then the order of the documents in the response may not match the order of the documents in the request. Alexander Reelsen discovered this issue with the enrich processor with the following reproduction: ``` PUT cities/_doc/munich {"zip":"80331","city":"Munich"} PUT cities/_doc/berlin {"zip":"10965","city":"Berlin"} PUT /_enrich/policy/zip-policy { "match": { "indices": "cities", "match_field": "zip", "enrich_fields": [ "city" ] } } POST /_enrich/policy/zip-policy/_execute GET _cat/indices/.enrich-* POST /_ingest/pipeline/_simulate { "pipeline": { "processors" : [ { "enrich" : { "policy_name": "zip-policy", "field" : "zip", "target_field": "city", "max_matches": "1" } } ] }, "docs": [ { "_id": "first", "_source" : { "zip" : "80331" } } , { "_id": "second", "_source" : { "zip" : "50667" } } ] } ``` * fixed test compile error

…elastic#50236)

With node ordinals gone, there's no longer a need for such a complicated full cluster restart procedure (as we can now uniquely associate nodes to data folders). Follow-up to elastic#41652

Loading shard state information during shard allocation sometimes runs into a situation where a data node does not know yet how to look up the shard on disk if custom data paths are used. The current implementation loads the index metadata from disk to determine what the custom data path looks like. This PR removes this dependency, simplifying the lookup. Relates elastic#48701

For 7.x and earlier branches, `_cat/repositories` API requests require a repository name. This removes an erroneous request example without a repository name added with a8e0275.

We need to read in a loop here. A single read to a huge byte array will only read 16k max with the S3 SDK so if the blob we're trying to fully read is larger we close early and fail the size comparison. Also, drain streams fully when checking existence to avoid S3 SDK warnings.

elastic#49933) Users often mistakenly map numeric IDs to numeric datatypes. However, this is often slow for the `term` and other term-level queries. The "Tune for search speed" docs includes advice for mapping numeric IDs to `keyword` fields. However, this tip is not included in the `numeric` or `keyword` field datatype doc pages. This rewords the tip in the "Tune for search speed" docs, relocates it to the `numeric` field docs, and reuses it using tagged regions.

…c#50249) This commit ensures the global info plugin is applied, which supplies the isInternal flag used to determine whether distro download looks for bwcVersions. relates elastic#50230

elastic#50276) (elastic#50290)

Co-Authored-By: Lisa Cawley <lcawley@elastic.co>

…stic#50282) This test was causing test failures on slow CI runs. Closes elastic#50279

…lastic#50280) This commit adds removal of unused data frame analytics state from the _delete_expired_data API (and in extend th ML daily maintenance task). At the moment the potential state docs include the progress document and state for regression and classification analyses. Backport of elastic#50243

Our REST infrastructure will reject requests that have a body where the body of the request is never consumed. This ensures that we reject requests on endpoints that do not support having a body. This requires cooperation from the REST handlers though, to actually consume the body, otherwise the REST infrastructure will proceed with rejecting the request. This commit addresses an issue in the has privileges API where we would prematurely try to reject a request for not having a username, before consuming the body. Since the body was not consumed, the REST infrastructure would instead reject the request as a bad request.

Docker bypasses the Uncomplicated Firewall (UFW) on Linux by editing the `iptables` config directly, which leads to the exposure of port 9200, even if you blocked it via UFW. This adds a warning along with work-arounds to the docs. Signed-off-by: Kovah <mail@kovah.de>

Co-Authored-By: Lisa Cawley <lcawley@elastic.co>

…lastic#50320) The `filter` rule is not allowed on the top-level of the query, so removing it from the list of allowed rules. Where it can be nested inside other rules, those rules already mention it.

) This PR adds unit tests for wire and xContent serialization of `IntervalsSourceProvider.Wildcard` and `IntervalsSourceProvider.Prefix`. Relates elastic#50150

The freeze index API docs state that frozen indices are blocked for write operations. While this implies frozen indices are read-only, it does not explicitly use the term "read-only", which is found in other docs, such as the force merge docs. This adds the "ready-only" term to the freeze index API docs as well as other clarification.

…stic#50329) Cache results from queries that use scripts if they use only deterministic API calls. Nondeterministic API calls are marked in the whitelist with the `@nondeterministic` annotation. Examples are `Math.random()` and `new Date()`. Refs: elastic#49466

…c#50335) This fixes support for nested fields We now support fully nested, fully collapsed, or a mix of both on inference docs. ES mappings allow the `_source` to be any combination of nested objects + dot delimited fields. So, we should do our best to find the best path down the Map for the desired field.

…tic#50337) ScriptContextInfoSerializingTests:testEqualsAndHashcode was failing because the mutation was generating the same name. **Backport** Fixes: elastic#50331

…job (elastic#50322) (elastic#50324) In order to ensure any persisted model state is searchable by the moment the job reports itself as `stopped`, we need to refresh the state index before completing. This should fix the occasional failures we see in elastic#50168 and elastic#50313 where the model state appears missing. Closes elastic#50168 Closes elastic#50313 Backport of elastic#50322

Switches generated WKT to upper case to conform to the standard recommendation. Relates elastic#49568

…taCreateIndexService Extract the monolithic execute() method in IndexCreationTask into smaller, well-named methods that each handle a single responsibility: - parseRequestMappings(): Parse request mappings into a structured map - applyTemplateMappings(): Apply and merge template mappings with request mappings - applyTemplateAliases(): Process template aliases with validation and templating - buildIndexSettings(): Build index settings from templates, request, and defaults - resolveRoutingNumberOfShards(): Calculate routing number of shards - buildTemporaryIndexMetaData(): Build temporary IndexMetaData for validation - validateActiveShardCount(): Validate wait_for_active_shards setting - validateAliasFilters(): Validate alias filters using query shard context - collectMappingMetaData(): Collect mapping metadata from mapper service - buildFinalIndexMetaData(): Build the final IndexMetaData with all components - updateClusterState(): Update cluster state with blocks, metadata, and routing These methods are package-private to enable unit testing while keeping the same external behavior. Methods that only manipulate ClusterState or its components are made static where possible.

rjernst and others added 30 commits November 27, 2019 16:03

Fix java 8 compile for Files.readString usage

b236076

elastic#49166 Fix spurious test failure

3573250

Remove obsolete resolving logic from TRA (elastic#49647)

0827ea2

This stems from a time where index requests were directly forwarded to TransportReplicationAction. Nowadays they are wrapped in a BulkShardRequest, and this logic is obsolete. Closes elastic#20279

Revert "Remove obsolete resolving logic from TRA (elastic#49647)"

04e9cbd

This reverts commit 0827ea2.

New Histogram field mapper that supports percentiles aggregations. (e…

326fe75

…lastic#48580) (elastic#49683) This commit adds a new histogram field mapper that consists in a pre-aggregated format of numerical data to be used in percentiles aggregations.

Enable LicenceServiceTests for all jdks (elastic#49440) backport(elas…

e528b41

…tic#49682) This test no longer relies on jdk version, so the assume should be removed relates elastic#48209

Pure disjunctions should rewrite to a MatchNoneQueryBuilder (elastic#…

1824a2f

…48557) (elastic#49673) Closes elastic#48475

[Docs] Enhance rolling upgrade guide (elastic#49686)

d5842ae

Add a couple of pointers for the user to check the overall cluster health and the version of ES running on every node. Fixes: elastic#49670 (cherry picked from commit 8ca11f5)

Upgrade lucene to 8.4.0-snapshot-e648d601efb (elastic#49641)

2dafecc

[7.x] Remove ClassInfo interface and BinaryClassInfo class. (elastic#…

1425e30

…49649) (elastic#49681)

[7.x] Update opensaml dependency (elastic#44972) (elastic#49512)

ba0c848

Add a mirror of the maven repository of the shibboleth project and upgrade opensaml and related dependencies to the latest version available version Resolves: elastic#44947

Correct the documentation for create_doc privilege (elastic#49354)

31f13e8

The documentation was added in elastic#47584 but those docs did not reflect the up-to-date behavior of the feature. Backport of: elastic#47784

Use PEM files instead of a JKS for key material (elastic#49625) (elas…

a59b7e0

…tic#49701) So that the tests can also run in a FIPS 140 JVM, where using a JKS keystore is not allowed. Resolves: elastic#49261

Mute MixedClusterClientYamlTestSuiteIT (elastic#49721)

3d525e1

Details in elastic#49719 Relates elastic#47983

Replace usages of XPackPlugin with the LocalStateCompositeXPackPlugin (…

d9162c1

…elastic#49714) (elastic#49722)

[Docs] Fix typo in templates.asciidoc (elastic#49726)

dcb9d51

Revert "Deprecate sorting in reindex (elastic#49458)"

1d745f1

This reverts commit 27d45c9.

martijnvg and others added 30 commits December 17, 2019 12:27

[ML] Remove usage of base action logger in ml actions (elastic#50074) (…

098f540

…elastic#50236)

Simplify InternalTestCluster.fullRestart (elastic#50218)

1f98158

With node ordinals gone, there's no longer a need for such a complicated full cluster restart procedure (as we can now uniquely associate nodes to data folders). Follow-up to elastic#41652

[DOCS] Correct cat snapshots API request example (elastic#50274)

99fdea5

For 7.x and earlier branches, `_cat/repositories` API requests require a repository name. This removes an erroneous request example without a repository name added with a8e0275.

[DOCS] Fix typo in Create API key docs (elastic#50233)

afe9ee3

Ensure global buildinfo plugin is applied for distro download (elasti…

e345963

…c#50249) This commit ensures the global info plugin is applied, which supplies the isInternal flag used to determine whether distro download looks for bwcVersions. relates elastic#50230

[DOCS] Move transform resource definitions into APIs (elastic#50108)

30d6682

Pass processConnectTimeout to the method that fetches C++ process' PID (

ac974c3

elastic#50276) (elastic#50290)

[DOCS] Adds GET, GET stats and DELETE inference APIs (elastic#50224)

5759a26

Co-Authored-By: Lisa Cawley <lcawley@elastic.co>

Increase timeout on FollowIndexSecurityIT.testAutoFollowPatterns (ela…

8208692

…stic#50282) This test was causing test failures on slow CI runs. Closes elastic#50279

[DOCS] Add 7.5.1 release notes (elastic#50196)

ec11c04

Co-Authored-By: Lisa Cawley <lcawley@elastic.co>

[Docs] Remove intervals filter rule from allowed top-level rules (e…

bf63f24

…lastic#50320) The `filter` rule is not allowed on the top-level of the query, so removing it from the list of allowed rules. Where it can be nested inside other rules, those rules already mention it.

Add tests for IntervalsSourceProvider.Wildcard and Prefix (elastic#50306

ef54a9c

) This PR adds unit tests for wire and xContent serialization of `IntervalsSourceProvider.Wildcard` and `IntervalsSourceProvider.Prefix`. Relates elastic#50150

Add version 6.8.7 constant

8439b27

Add 7.5.2 version.

35a88a5

[DOCS] Fixes security links

246d926

[TEST] Exclude name on ScriptContextInfo mutate (elastic#50332) (elas…

9cdbcbd

…tic#50337) ScriptContextInfoSerializingTests:testEqualsAndHashcode was failing because the mutation was generating the same name. **Backport** Fixes: elastic#50331

[DOCS] Document thread_pool node stats (elastic#50330)

73ca2e5

Geo: Switch generated WKT to upper case (elastic#50285)

c77ca98

Switches generated WKT to upper case to conform to the standard recommendation. Relates elastic#49568

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor: Extract index creation logic into smaller methods in MetaDataCreateIndexService#131

Refactor: Extract index creation logic into smaller methods in MetaDataCreateIndexService#131
augment-app-staging[bot] wants to merge 4828 commits intomainfrom
eval-504606be

augment-app-staging bot commented Mar 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

20 participants

Conversation

augment-app-staging bot commented Mar 5, 2026

Summary

Changes

Design Decisions

Testing

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

20 participants