Refactor: Extract index creation logic into smaller methods in MetaDataCreateIndexService#131
Open
augment-app-staging[bot] wants to merge 4828 commits intomainfrom
Open
Refactor: Extract index creation logic into smaller methods in MetaDataCreateIndexService#131augment-app-staging[bot] wants to merge 4828 commits intomainfrom
augment-app-staging[bot] wants to merge 4828 commits intomainfrom
Conversation
This commit moves the packaging tests for elasticsearch-setup-passwords to java from bats. The change also enables future tests to enable security in Elasticsearch and automatically have waitForElasticsearch work correctly, at least to the same extent it worked in bats, by waiting on the ES port instead of health check. relates elastic#46005
…stic#49166) This change adds a dynamic cluster setting named `indices.id_field_data.enabled`. When set to `false` any attempt to load the fielddata for the `_id` field will fail with an exception. The default value in this change is set to `false` in order to prevent fielddata usage on this field for future versions but it will be set to `true` when backporting to 7x. When the setting is set to true (manually or by default in 7x) the loading will also issue a deprecation warning since we want to disallow fielddata entirely when elastic#26472 is implemented. Closes elastic#43599
This stems from a time where index requests were directly forwarded to TransportReplicationAction. Nowadays they are wrapped in a BulkShardRequest, and this logic is obsolete. Closes elastic#20279
This reverts commit 0827ea2.
…lastic#48580) (elastic#49683) This commit adds a new histogram field mapper that consists in a pre-aggregated format of numerical data to be used in percentiles aggregations.
…tic#49682) This test no longer relies on jdk version, so the assume should be removed relates elastic#48209
Add a couple of pointers for the user to check the overall cluster health and the version of ES running on every node. Fixes: elastic#49670 (cherry picked from commit 8ca11f5)
…stic#49471) (elastic#49691) This commit adds a function in NodeClient that allows to track the progress of a search request locally. Progress is tracked through a SearchProgressListener that exposes query and fetch responses as well as partial and final reduces. This new method can be used by modules/plugins inside a node in order to track the progress of a local search request. Relates elastic#49091
Add a mirror of the maven repository of the shibboleth project and upgrade opensaml and related dependencies to the latest version available version Resolves: elastic#44947
The documentation was added in elastic#47584 but those docs did not reflect the up-to-date behavior of the feature. Backport of: elastic#47784
- Improves HTTP client hostname verification failure messages
- Adds "DiagnosticTrustManager" which logs certificate information
when trust cannot be established (hostname failure, CA path failure,
etc)
These diagnostic messages are designed so that many common TLS
problems can be diagnosed based solely (or primarily) on the
elasticsearch logs.
These diagnostics can be disabled by setting
xpack.security.ssl.diagnose.trust: false
Backport of: elastic#48911
…tic#49701) So that the tests can also run in a FIPS 140 JVM, where using a JKS keystore is not allowed. Resolves: elastic#49261
Removing a lot of needless buffering and array creation to reduce the significant memory usage of tests using this. The incoming stream from the `exchange` is already buffered so there is no point in adding a ton of additional buffers everywhere.
Details in elastic#49719 Relates elastic#47983
…ic#49711) * Make BlobStoreRepository Aware of ClusterState (elastic#49639) This is a preliminary to elastic#49060. It does not introduce any substantial behavior change to how the blob store repository operates. What it does is to add all the infrastructure changes around passing the cluster service to the blob store, associated test changes and a best effort approach to tracking the latest repository generation on all nodes from cluster state updates. This brings a slight improvement to the consistency by which non-master nodes (or master directly after a failover) will be able to determine the latest repository generation. It does not however do any tricky checks for the situation after a repository operation (create, delete or cleanup) that could theoretically be used to get even greater accuracy to keep this change simple. This change does not in any way alter the behavior of the blobstore repository other than adding a better "guess" for the value of the latest repo generation and is mainly intended to isolate the actual logical change to how the repository operates in elastic#49060
…lastic#49690) (elastic#49718) This adds a `_source` setting under the `source` setting of a data frame analytics config. The new `_source` is reusing the structure of a `FetchSourceContext` like `analyzed_fields` does. Specifying includes and excludes for source allows selecting which fields will get reindexed and will be available in the destination index. Closes elastic#49531 Backport of elastic#49690
This stems from a time where index requests were directly forwarded to TransportReplicationAction. Nowadays they are wrapped in a BulkShardRequest, and this logic is obsolete. In contrast to prior PR (elastic#49647), this PR also fixes (see b3697cc) a situation where the previous index expression logic had an interesting side effect. For bulk requests (which had resolveIndex = false), the reroute phase was waiting for the index to appear in case where it was not present, and for all other replication requests (resolveIndex = true) it would right away throw an IndexNotFoundException while resolving the name and exit. With elastic#49647, every replication request was now waiting for the index to appear, which was problematic when the given index had just been deleted (e.g. deleting a follower index while it's still receiving requests from the leader, where these requests would now wait up to a minute for the index to appear). This PR now adds b3697cc on top of that prior PR to make sure to reestablish some of the prior behavior where the reroute phase waits for the bulk request for the index to appear. That logic was in place to ensure that when an index was created and not all nodes had learned about it yet, that the bulk would not fail somewhere in the reroute phase. This is now only restricted to the situation where the current node has an older cluster state than the one that coordinated the bulk request (which checks that the index is present). This also means that when an index is deleted, we will no longer unnecessarily wait up to the timeout for the index o appear, and instead fail the request. Closes elastic#20279
Some extended testing with MS-SQL server and H2 (which agree on results) revealed bugs in the implementation of WEEK related extraction and diff functions. Non-iso WEEK seems to be broken since elastic#48209 because of the replacement of Calendar and the change in the ISO rules. ISO_WEEK failed for some edge cases around the January 1st. DATE_DIFF was previously based on non-iso WEEK extraction which seems not to be the case. Fixes: elastic#49376 (cherry picked from commit 54fe7f5)
Reindex sort never gave a guarantee about the order of documents being indexed into the destination, though it could give a sense of locality of source data. It prevents us from doing resilient reindex and other optimizations and it has therefore been deprecated. Related to elastic#47567
This rewrites long sort as a `DistanceFeatureQuery`, which can efficiently skip non-competitive blocks and segments of documents. Depending on the dataset, the speedups can be 2 - 10 times. The optimization can be disabled with setting the system property `es.search.rewrite_sort` to `false`. Optimization is skipped when an index has 50% or more data with the same value. Optimization is done through: 1. Rewriting sort as `DistanceFeatureQuery` which can efficiently skip non-competitive blocks and segments of documents. 2. Sorting segments according to the primary numeric sort field(elastic#44021) This allows to skip non-competitive segments. 3. Using collector manager. When we optimize sort, we sort segments by their min/max value. As a collector expects to have segments in order, we can not use a single collector for sorted segments. We use collectorManager, where for every segment a dedicated collector will be created. 4. Using Lucene's shared TopFieldCollector manager This collector manager is able to exchange minimum competitive score between collectors, which allows us to efficiently skip the whole segments that don't contain competitive scores. 5. When index is force merged to a single segment, elastic#48533 interleaving old and new segments allows for this optimization as well, as blocks with non-competitive docs can be skipped. Backport for elastic#48804 Co-authored-by: Jim Ferenczi <jim.ferenczi@elastic.co>
This reverts commit 27d45c9.
Some queries return bulk scorers that can be significantly faster than iterating naively over the scorer. By giving script_score a BulkScorer that would delegate to the wrapped query, we could make it faster in some cases. Closes elastic#40837
Reindex sort never gave a guarantee about the order of documents being indexed into the destination, though it could give a sense of locality of source data. It prevents us from doing resilient reindex and other optimizations and it has therefore been deprecated. Related to elastic#47567
…ecutes async (elastic#50269) Backport elastic#50244 to 7.x branch. If a processor executes asynchronously and the ingest simulate api simulates with multiple documents then the order of the documents in the response may not match the order of the documents in the request. Alexander Reelsen discovered this issue with the enrich processor with the following reproduction: ``` PUT cities/_doc/munich {"zip":"80331","city":"Munich"} PUT cities/_doc/berlin {"zip":"10965","city":"Berlin"} PUT /_enrich/policy/zip-policy { "match": { "indices": "cities", "match_field": "zip", "enrich_fields": [ "city" ] } } POST /_enrich/policy/zip-policy/_execute GET _cat/indices/.enrich-* POST /_ingest/pipeline/_simulate { "pipeline": { "processors" : [ { "enrich" : { "policy_name": "zip-policy", "field" : "zip", "target_field": "city", "max_matches": "1" } } ] }, "docs": [ { "_id": "first", "_source" : { "zip" : "80331" } } , { "_id": "second", "_source" : { "zip" : "50667" } } ] } ``` * fixed test compile error
With node ordinals gone, there's no longer a need for such a complicated full cluster restart procedure (as we can now uniquely associate nodes to data folders). Follow-up to elastic#41652
Loading shard state information during shard allocation sometimes runs into a situation where a data node does not know yet how to look up the shard on disk if custom data paths are used. The current implementation loads the index metadata from disk to determine what the custom data path looks like. This PR removes this dependency, simplifying the lookup. Relates elastic#48701
For 7.x and earlier branches, `_cat/repositories` API requests require a repository name. This removes an erroneous request example without a repository name added with a8e0275.
We need to read in a loop here. A single read to a huge byte array will only read 16k max with the S3 SDK so if the blob we're trying to fully read is larger we close early and fail the size comparison. Also, drain streams fully when checking existence to avoid S3 SDK warnings.
elastic#49933) Users often mistakenly map numeric IDs to numeric datatypes. However, this is often slow for the `term` and other term-level queries. The "Tune for search speed" docs includes advice for mapping numeric IDs to `keyword` fields. However, this tip is not included in the `numeric` or `keyword` field datatype doc pages. This rewords the tip in the "Tune for search speed" docs, relocates it to the `numeric` field docs, and reuses it using tagged regions.
…c#50249) This commit ensures the global info plugin is applied, which supplies the isInternal flag used to determine whether distro download looks for bwcVersions. relates elastic#50230
Co-Authored-By: Lisa Cawley <lcawley@elastic.co>
…stic#50282) This test was causing test failures on slow CI runs. Closes elastic#50279
…lastic#50280) This commit adds removal of unused data frame analytics state from the _delete_expired_data API (and in extend th ML daily maintenance task). At the moment the potential state docs include the progress document and state for regression and classification analyses. Backport of elastic#50243
Our REST infrastructure will reject requests that have a body where the body of the request is never consumed. This ensures that we reject requests on endpoints that do not support having a body. This requires cooperation from the REST handlers though, to actually consume the body, otherwise the REST infrastructure will proceed with rejecting the request. This commit addresses an issue in the has privileges API where we would prematurely try to reject a request for not having a username, before consuming the body. Since the body was not consumed, the REST infrastructure would instead reject the request as a bad request.
Docker bypasses the Uncomplicated Firewall (UFW) on Linux by editing the `iptables` config directly, which leads to the exposure of port 9200, even if you blocked it via UFW. This adds a warning along with work-arounds to the docs. Signed-off-by: Kovah <mail@kovah.de>
Co-Authored-By: Lisa Cawley <lcawley@elastic.co>
…lastic#50320) The `filter` rule is not allowed on the top-level of the query, so removing it from the list of allowed rules. Where it can be nested inside other rules, those rules already mention it.
) This PR adds unit tests for wire and xContent serialization of `IntervalsSourceProvider.Wildcard` and `IntervalsSourceProvider.Prefix`. Relates elastic#50150
The freeze index API docs state that frozen indices are blocked for write operations. While this implies frozen indices are read-only, it does not explicitly use the term "read-only", which is found in other docs, such as the force merge docs. This adds the "ready-only" term to the freeze index API docs as well as other clarification.
…stic#50329) Cache results from queries that use scripts if they use only deterministic API calls. Nondeterministic API calls are marked in the whitelist with the `@nondeterministic` annotation. Examples are `Math.random()` and `new Date()`. Refs: elastic#49466
…c#50335) This fixes support for nested fields We now support fully nested, fully collapsed, or a mix of both on inference docs. ES mappings allow the `_source` to be any combination of nested objects + dot delimited fields. So, we should do our best to find the best path down the Map for the desired field.
…tic#50337) ScriptContextInfoSerializingTests:testEqualsAndHashcode was failing because the mutation was generating the same name. **Backport** Fixes: elastic#50331
…job (elastic#50322) (elastic#50324) In order to ensure any persisted model state is searchable by the moment the job reports itself as `stopped`, we need to refresh the state index before completing. This should fix the occasional failures we see in elastic#50168 and elastic#50313 where the model state appears missing. Closes elastic#50168 Closes elastic#50313 Backport of elastic#50322
Switches generated WKT to upper case to conform to the standard recommendation. Relates elastic#49568
…taCreateIndexService Extract the monolithic execute() method in IndexCreationTask into smaller, well-named methods that each handle a single responsibility: - parseRequestMappings(): Parse request mappings into a structured map - applyTemplateMappings(): Apply and merge template mappings with request mappings - applyTemplateAliases(): Process template aliases with validation and templating - buildIndexSettings(): Build index settings from templates, request, and defaults - resolveRoutingNumberOfShards(): Calculate routing number of shards - buildTemporaryIndexMetaData(): Build temporary IndexMetaData for validation - validateActiveShardCount(): Validate wait_for_active_shards setting - validateAliasFilters(): Validate alias filters using query shard context - collectMappingMetaData(): Collect mapping metadata from mapper service - buildFinalIndexMetaData(): Build the final IndexMetaData with all components - updateClusterState(): Update cluster state with blocks, metadata, and routing These methods are package-private to enable unit testing while keeping the same external behavior. Methods that only manipulate ClusterState or its components are made static where possible.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Extract the monolithic
execute()method inIndexCreationTask(withinMetaDataCreateIndexService.java) into smaller, well-named methods that each handle a single responsibility. This makes the service more readable and unit testable.Changes
The ~300-line
execute()method has been broken down into the following focused methods:parseRequestMappings()applyTemplateMappings()applyTemplateAliases()buildIndexSettings()resolveRoutingNumberOfShards()buildTemporaryIndexMetaData()validateActiveShardCount()validateAliasFilters()collectMappingMetaData()buildFinalIndexMetaData()updateClusterState()Design Decisions
resolveRoutingNumberOfShards,buildTemporaryIndexMetaData,validateActiveShardCount,collectMappingMetaData,buildFinalIndexMetaData) are made static to make their dependencies explicit.execute()method now reads as a clear high-level flow.Testing
Note: The Gradle build system cannot resolve dependencies because
jcenter.bintray.com(used by this older codebase) has been permanently shut down. The refactoring is structurally verified through code review. The extracted methods maintain the exact same logic as the original monolithic method.Triggered by: Slack-Bot-Evals
Original discussion: https://augment-wic8570.slack.com/archives/C0AG2MXKEQJ/p1772676453939769