* Add retention for file request logs * Spelling
* Fixes for the Avatica JDBC driver Correctly implement regular and prepared statements Correctly implement result sets Fix race condition with contexts Clarify when parameters are used Prepare for single-pass through the planner * Addressed review comments * Addressed review comment
* remove maxRowsPerSegment where appropriate
* fix tutorial, accept suggestions
* Update docs/design/coordinator.md
* additional tutorial file
* fix initial index spec
* accept comments
* Update docs/tutorials/tutorial-compaction.md (multiple revisions) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
* add back comment on maxRowsPerSegment
* rm duplicate entry
* Update native-batch-simple-task.md: remove ref to `maxrowspersegment`
* Update native-batch.md: remove ref to `maxrowspersegment`
* final tentacles
* Apply suggestions from code review Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
* Druid planner now makes only one pass through Calcite planner Resolves the issue that required two parse/plan cycles: one for validate, another for plan. Creates a clone of the Calcite planner and validator to resolve the conflict that prevented the merger.
add NumericRangeIndex interface and BoundFilter support changes: * NumericRangeIndex interface, like LexicographicalRangeIndex but for numbers * BoundFilter now uses NumericRangeIndex if comparator is numeric and there is no extractionFn * NestedFieldLiteralColumnIndexSupplier.java now supports supplying NumericRangeIndex for single typed numeric nested literal columns * better faster stronger and (ever so slightly) more understandable * more tests, fix bug * fix style
…teralColumnIndexSupplier (apache#12837)
Historicals and middle managers crash with an `UnknownHostException` when trying to load `druid-parquet-extensions` with an ephemeral Hadoop cluster. This happens because the `fs.defaultFS` URI value cannot be resolved at startup, since the Hadoop cluster may not exist yet. This commit fixes the error by performing initialization of the filesystem in `ParquetInputFormat.createReader()` whenever a new reader is requested.
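The deferred-initialization pattern behind this fix can be sketched as follows. This is an illustrative Python model with hypothetical names, not Druid's actual Java code: the expensive, failure-prone connection is made only when the first reader is requested, never at service startup.

```python
# Sketch of lazy resource initialization: `connect` may fail until the
# (ephemeral) cluster exists, so it must not run at construction time.
class LazyReaderFactory:
    def __init__(self, connect):
        self._connect = connect   # callable; hypothetical stand-in for filesystem setup
        self._fs = None           # nothing resolved at startup

    def create_reader(self):
        # Resolve the filesystem on demand, the first time a reader is needed.
        if self._fs is None:
            self._fs = self._connect()
        return {"fs": self._fs}
```

The key property is that constructing the factory never touches the endpoint, so services can start before the cluster is reachable.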
* Python 3 support for post-index-task. Useful when running on macOS or any other system that doesn't have Python 2. * Encode JSON returned by read_task_file. * Adjust. * Skip needless loads. * Add a decode. * Additional decodes needed.
* Use nonzero default value of maxQueuedBytes. The purpose of this parameter is to prevent the Broker from running out of memory. The prior default is unlimited; this patch changes it to a relatively conservative 25MB. This may be too low for larger clusters. The risk is that throughput can decrease for queries with large result sets or large amounts of intermediate data. However, I think this is better than the risk of the prior default, which is that these queries can cause the Broker to go OOM. * Alter calculation.
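The idea behind a byte-budgeted buffer can be sketched like this (a hypothetical Python model, not the Broker's actual implementation): once queued bytes would exceed the limit, the producer is refused instead of letting memory grow without bound.

```python
# Illustrative sketch of bounding buffered result data by bytes rather than
# by item count; 25_000_000 mirrors the ~25MB default described above.
class ByteBudgetQueue:
    def __init__(self, max_queued_bytes=25_000_000):
        self.max_queued_bytes = max_queued_bytes
        self.queued = 0
        self.items = []

    def offer(self, chunk: bytes) -> bool:
        # Refuse the chunk when it would blow the budget: the caller must
        # wait (backpressure) instead of the process going OOM.
        if self.queued + len(chunk) > self.max_queued_bytes:
            return False
        self.items.append(chunk)
        self.queued += len(chunk)
        return True
```

The trade-off named in the patch falls out directly: a smaller budget means producers block more often (lower throughput), a larger one risks memory pressure.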
* Log4j bump to 2.18 due to [LOG4J2-3419] * Fixing license issues
* fix nested column sql operator return type inference * oops, final
* Improved Java 17 support and Java runtime docs.
1) Add a "Java runtime" doc page with information about supported Java versions, garbage collection, and strong encapsulation.
2) Update asm and equalsverifier to versions that support Java 17.
3) Add additional "--add-opens" lines to surefire configuration, so tests can pass successfully under Java 17.
4) Switch openjdk15 tests to openjdk17.
5) Update FrameFile to specifically mention Java runtime incompatibility as the cause of not being able to use Memory.map.
6) Update SegmentLoadDropHandler to log an error for Errors too, not just Exceptions. This is important because an IllegalAccessError is encountered when the correct "--add-opens" line is not provided, and it would otherwise be silently ignored.
7) Update example configs to use druid.indexer.runner.javaOptsArray instead of druid.indexer.runner.javaOpts. (The latter is deprecated.)
* Adjustments. * Use run-java in more places. * Add run-java. * Update .gitignore. * Exclude hadoop-client-api. Brought in when building on Java 17. * Swap one more usage of java. * Fix the run-java script. * Fix flag. * Include link to Temurin. * Spelling. * Update examples/bin/run-java Co-authored-by: Xavier Léauté <xl+github@xvrl.net>
* Refactor Guice initialization Builders for various module collections Revise the extensions loader Injector builders for server startup Move Hadoop init to indexer Clean up server node role filtering Calcite test injector builder * Revisions from review comments * Build fixes * Revisions from review comments
* Frame processing and channels. Follow-up to apache#12745. This patch adds three new concepts:
1) Frame channels are interfaces for doing nonblocking reads and writes of frames.
2) Frame processors are interfaces for doing nonblocking processing of frames received from input channels and sent to output channels.
3) Cluster-by keys, which can be used for sorting or partitioning.
The patch also adds SuperSorter, a user of these concepts, both to illustrate how they are used and because it is going to be useful in future work. Central classes:
- ReadableFrameChannel. Implementations include BlockingQueueFrameChannel (in-memory channel that implements both interfaces), ReadableFileFrameChannel (file-based channel), ReadableByteChunksFrameChannel (byte-stream-based channel), and others.
- WritableFrameChannel. Implementations include BlockingQueueFrameChannel and WritableStreamFrameChannel (byte-stream-based channel).
- ClusterBy, a sorting or partitioning key.
- FrameProcessor, nonblocking processor of frames. Implementations include FrameChannelBatcher, FrameChannelMerger, and FrameChannelMuxer.
- FrameProcessorExecutor, an executor service that runs FrameProcessors.
- SuperSorter, a class that uses frame channels and processors to do parallel external merge sort of any amount of data (as long as there is enough disk space).
* Additional tests, fixes. * Changes from review. * Better implementation for ReadableInputStreamFrameChannel. * Rename getFrameFileReference -> newFrameFileReference. * Add InterruptedException to runIncrementally; add more tests. * Cancellation adjustments. * Review adjustments. * Refactor BlockingQueueFrameChannel, rename doneReading and doneWriting to close. * Additional changes from review. * Additional changes. * Fix test. * Adjustments.
…he#12844) * Add check for eternity time segment to SqlSegmentsMetadataQuery * Add check for half eternities * Add multiple segments test * Add failing test to document known issue
Kinesis ingestion requires all shards to have at least 1 record at the required position in Druid. Even if this is satisfied initially, resharding the stream can lead to empty intermediate shards. A significant delay in writing to newly created shards was also problematic.

Kinesis shard sequence numbers are big integers. Introduce two more custom sequence tokens, UNREAD_TRIM_HORIZON and UNREAD_LATEST, to indicate that a shard has not been read from and that it needs to be read from the start or the end respectively. These values can be used to avoid the need to read at least one record to obtain a sequence number for ingesting a newly discovered shard. If a record cannot be obtained immediately, use a marker to obtain the relevant shardIterator and use this shardIterator to obtain a valid sequence number. As long as a valid sequence number is not obtained, continue storing the token as the offset.

These tokens (UNREAD_TRIM_HORIZON and UNREAD_LATEST) are logically ordered to be earlier than any valid sequence number. However, the ordering requires a few subtle changes to the existing mechanism for record sequence validation:
- The sequence availability check ensures that the current offset is before the earliest available sequence in the shard. However, the current token being an UNREAD token indicates that any sequence number in the shard is valid (despite the ordering).
- Kinesis sequence numbers are inclusive, i.e. if current sequence == end sequence, there are more records left to read. However, the equality check is exclusive when dealing with UNREAD tokens.
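The ordering described above can be modeled compactly. This is a hedged Python sketch of the stated semantics only (the real implementation is Java inside the Kinesis indexing service): UNREAD tokens sort before every numeric sequence number, and the usual inclusive end-sequence check becomes permissive for UNREAD tokens.

```python
# Illustrative model of the custom sequence-token ordering.
UNREAD_TRIM_HORIZON = "UNREAD_TRIM_HORIZON"
UNREAD_LATEST = "UNREAD_LATEST"
UNREAD_TOKENS = {UNREAD_TRIM_HORIZON, UNREAD_LATEST}

def is_before(current, other):
    """True if `current` sorts strictly before `other`.
    UNREAD tokens sort before any valid (big integer) sequence number."""
    if current in UNREAD_TOKENS:
        return other not in UNREAD_TOKENS
    if other in UNREAD_TOKENS:
        return False
    return int(current) < int(other)

def more_to_read(current, end):
    # Kinesis sequence numbers are inclusive: current == end still has records.
    # An UNREAD current token means the shard has not been read at all yet.
    if current in UNREAD_TOKENS:
        return True
    return int(current) <= int(end)
```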
…e old query view, and updating d3 (apache#13169) * remove old query view * update tests * add filter * fix test * bump d3 things to latest versions * went too far into the future with d3 * make config dialogs load * goodies * update snapshots * only compute duration when running or pending
…gy. (apache#13177) Error messages can be null. If the incoming error message is null, then return null.
* Clean up dependency in extensions * Bump protobuf/aws.sdk * Bump aws-sdk to 1.12.317 * Fix CI * Fix CI * Update license * Update license
* process: update PR template to include release notes * Update .github/pull_request_template.md [ci skip] Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update .github/pull_request_template.md Co-authored-by: Clint Wylie <cjwylie@gmail.com> * incorporate feedback from paul Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Clint Wylie <cjwylie@gmail.com>
* Add inline descriptor Protobuf bytes decoder * PR comments * Update tests, check for IllegalArgumentException * Fix license, add equals test * Update extensions-core/protobuf-extensions/src/main/java/org/apache/druid/data/input/protobuf/InlineDescriptorProtobufBytesDecoder.java Co-authored-by: Frank Chen <frankchen@apache.org> Co-authored-by: Frank Chen <frankchen@apache.org>
1) Better support for Java 9+ in RuntimeInfo. This means that in many cases, an actual validation can be done. 2) Clearer log message in cases where an actual validation cannot be done.
This adds min/max functions for CompressedBigDecimal. It exposes these functions via SQL (BIG_MAX, BIG_MIN; see the SqlAggFunction implementations). It also includes various bug fixes and cleanup to the original CompressedBigDecimal code, including the AggregatorFactories. Various null handling was improved. Additional test cases were added for both new and existing code, including a base test case for AggregationFactories. Other tests common across sum, min, and max may also be refactored in the future to share the various cases.
…ent with vector object selectors (apache#13209) * use object[] instead of string[] for vector expressions to be consistent with vector object selectors * simplify
…pache#13144) * Idle Seekable stream supervisor changes. * nit * nit * nit * Adds unit tests * Supervisor decides its idle state instead of AutoScaler * docs update * nit * nit * docs update * Adds Kafka unit test * Adds Kafka Integration test. * Updates travis config. * Updates kafka-indexing-service dependencies. * updates previous offsets snapshot & doc * Doesn't act if supervisor is suspended. * Fixes highest current offsets fetch bug, adds new Kafka UT tests, doc changes. * Reverts Kinesis Supervisor idle behaviour changes. * nit * nit * Corrects SeekableStreamSupervisorSpec check on idle behaviour config, adds tests. * Fixes getHighestCurrentOffsets to fetch offsets of publishing tasks too * Adds Kafka Supervisor UT * Improves test coverage in druid-server * Corrects IT override config * Doc updates and syntactic changes * nit * supervisorSpec.ioConfig.idleConfig changes
…ion math null value handling in default mode (apache#13214) * fix json_value sql planning with decimal type, fix vectorized expression math null value handling in default mode changes: * json_value 'returning' decimal will now plan to native double typed query instead of ending up with default string typing, allowing decimal vector math expressions to work with this type * vector math expressions now zero out 'null' values even in 'default' mode (druid.generic.useDefaultValueForNull=false) to prevent downstream things that do not check the null vector from producing incorrect results * more better * test and why not vectorize * more test, more fix
…ata (apache#13217) * Add more information to exceptions when writing tmp data to disk * Better error message
This is in preparation for eventually retiring the flag `useMaxMemoryEstimates`, after which the footprint of a value in the dimension dictionary will always be estimated using the `estimateSizeOfValue()` method.
It was found that the namespace/cache/heapSizeInBytes metric that tracks the total heap size in bytes of all lookup caches loaded on a service instance was being under-reported. We were not accounting for the memory overhead of the String object, which I've found in testing to be ~40 bytes. While this overhead may be Java-version dependent, it should not vary much, and accounting for it provides a better estimate. Also fixed some logging, and made reading bytes from the JDBI result set a little more efficient by saving hash table lookups. Also added some of the lookup metrics to the default statsD emitter metric whitelist.
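A back-of-envelope version of the corrected estimate might look like this. The ~40-byte object overhead comes from the testing described above; the 2-bytes-per-character term is my own assumption (modeling UTF-16 backing arrays), so treat the whole function as illustrative rather than Druid's actual accounting.

```python
# Hypothetical heap-size model for a lookup cache entry: object overhead
# plus character storage, for both the key and the value String.
STRING_OVERHEAD_BYTES = 40   # observed ~40 bytes; may vary by JVM version

def estimated_heap_bytes(key: str, value: str) -> int:
    # Assumption: 2 bytes per char (UTF-16 char[] backing, pre-compact-strings).
    return sum(STRING_OVERHEAD_BYTES + 2 * len(s) for s in (key, value))
```

Under this model even an empty key/value pair costs 80 bytes, which is exactly the kind of fixed overhead the original metric was missing.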
Minor doc update for `BroadcastTablesTooLarge`. Now the user will know what to do in case this fault is encountered.
* update description of default for query priority * update order * update terms * standardize to query context parameters
Corrected a typo.
…he#13195) * follow RFC7232 * Only unquoted strings are processed according to RFC7232. * Add help method and test cases.
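RFC 7232 entity-tag handling of the kind described can be sketched as follows; this is a hedged illustration of the RFC's syntax (`W/` weak prefix, DQUOTE-wrapped opaque tag), not the patch's actual Java helper. Quotes are stripped only when the tag really is a quoted string.

```python
# Minimal RFC 7232 entity-tag parsing sketch.
def parse_etag(raw: str):
    """Return (is_weak, opaque_tag) for an ETag like 'W/"abc"' or '"abc"'.
    Unquoted input is passed through unchanged, per the fix above."""
    weak = raw.startswith('W/')
    tag = raw[2:] if weak else raw
    if len(tag) >= 2 and tag[0] == '"' and tag[-1] == '"':
        tag = tag[1:-1]          # strip the surrounding DQUOTEs only
    return weak, tag

def weak_match(a: str, b: str) -> bool:
    # Weak comparison (used by If-None-Match): ignore the W/ prefix.
    return parse_etag(a)[1] == parse_etag(b)[1]
```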
We introduce two new configuration keys that refine the query context security model controlled by druid.auth.authorizeQueryContextParams. When that value is set to true, two other configuration options become available:

- druid.auth.unsecuredContextKeys: The set of query context keys that do not require a security check. Use this as the allow-list of keys: all other keys go through the existing context key security checks.
- druid.auth.securedContextKeys: The set of query context keys that do require a security check. Use this when you want to allow all but a specific set of keys: only these keys go through the existing context key security checks.

Both are set using JSON list format: druid.auth.securedContextKeys=["secretKey1", "secretKey2"]

You generally set one or the other value. If both are set, unsecuredContextKeys acts as exceptions to securedContextKeys. In addition, Druid defines two query context keys which always bypass checks because Druid uses them internally: sqlQueryId and sqlStringifyArrays.
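The decision rule stated above can be encoded in a few lines. This is a model of the described semantics only, not Druid's Java auth code; the internal key names come from the description.

```python
# Which query context keys require an authorization check?
INTERNAL_KEYS = {"sqlQueryId", "sqlStringifyArrays"}  # always bypass checks

def requires_check(key, unsecured=None, secured=None):
    """unsecured/secured model druid.auth.unsecuredContextKeys and
    druid.auth.securedContextKeys (None = option not set)."""
    if key in INTERNAL_KEYS:
        return False
    # Checked first, so when both are set it acts as the exception list.
    if unsecured is not None and key in unsecured:
        return False
    if secured is not None:
        return key in secured    # only the listed keys are checked
    return True                  # default: every other key is checked
```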
…che#13172) Overlord leader election can sometimes fail due to task lock re-acquisition issues. This commit solves the issue by failing such tasks and clearing all their locks.
* SQL: Use timestamp_floor when granularity is not safe. PR apache#12944 added a check at the execution layer to avoid materializing excessive amounts of time-granular buckets. This patch modifies the SQL planner to avoid generating queries that would throw such errors, by switching certain plans to use the timestamp_floor function instead of granularities. This applies both to the Timeseries query type, and the GroupBy timestampResultFieldGranularity feature. The patch also goes one step further: we switch to timestamp_floor not just in the ETERNITY + non-ALL case, but also if the estimated number of time-granular buckets exceeds 100,000. Finally, the patch modifies the timestampResultFieldGranularity field to consistently be a String rather than a Granularity. This ensures that it can be round-trip serialized and deserialized, which is useful when trying to execute the results of "EXPLAIN PLAN FOR" with GroupBy queries that use the timestampResultFieldGranularity feature. * Fix test, address PR comments. * Fix ControllerImpl. * Fix test. * Fix unused import.
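The planner-side guard can be approximated like this. It is a deliberate simplification (real granularities are calendar-aware, not fixed-width milliseconds), hedged as a sketch of the stated rule: fall back to timestamp_floor once the estimated bucket count passes the 100,000 threshold mentioned above.

```python
# Hypothetical bucket-count estimate driving the timestamp_floor fallback.
MAX_TIME_BUCKETS = 100_000   # threshold per the patch description

def use_timestamp_floor(interval_millis: int, bucket_millis: int) -> bool:
    """True when the plan should use the timestamp_floor function instead of
    a native granularity, to avoid materializing too many time buckets."""
    estimated_buckets = interval_millis // bucket_millis
    return estimated_buckets > MAX_TIME_BUCKETS
```

For example, a year of daily buckets (365) stays well under the threshold, while a day granularity over an effectively unbounded (ETERNITY-like) interval trips it.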
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Frank Chen <frankchen@apache.org>
Async reads for JDBC: Prevents JDBC timeouts on long queries by returning empty batches when a batch fetch takes too long. Uses an async model to run the result fetch concurrently with JDBC requests. Fixed race condition in Druid's Avatica server-side handler Fixed issue with no-user connections
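The keep-alive mechanism can be sketched with a small async model. This is Python with hypothetical names, not the Avatica server-side handler: the fetch runs on an executor, and when it misses the deadline the caller gets an empty batch (meaning "still working, poll again") instead of a connection timeout.

```python
# Illustrative async-fetch sketch: empty batches instead of JDBC timeouts.
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FetchTimeout

def make_fetcher(fetch, timeout_sec=0.1):
    executor = ThreadPoolExecutor(max_workers=1)
    state = {"future": None}

    def next_batch():
        if state["future"] is None:
            state["future"] = executor.submit(fetch)   # start the slow fetch once
        try:
            rows = state["future"].result(timeout=timeout_sec)
            state["future"] = None                     # done; next call starts fresh
            return rows
        except FetchTimeout:
            return []   # deadline missed: keep the connection alive, retain the future
        return next_batch

    return next_batch
```

Retaining the pending future across calls is the essential design point: the fetch keeps running between polls rather than restarting each time.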
Tracking additional improvements requested by @paul-rogers: apache#13239 * api: refactor page so that indented bullet is child and unindented portion is parent * get rid of post etc headings and combine them with the endpoint * Update docs/operations/api-reference.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * fix broken links * fix typo Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
… by the user (apache#13198) In MSQ, there can be an upper limit to the number of worker warnings. For example, for parseExceptions encountered while parsing the external data, the user can specify an upper limit to the number of parse exceptions that can be allowed before it throws an error of type TooManyWarnings. This PR makes it so that if the user disallows warnings of a certain type, i.e. the limit is 0 (or the task is executing in strict mode), then instead of throwing an error of type TooManyWarnings, we directly surface the warning as the error, saving the user the hassle of going through the warning reports.
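The behavior change reduces to one branch, sketched here with hypothetical names (only TooManyWarnings comes from the description): when the limit is 0, the underlying warning itself becomes the error.

```python
# Sketch of the limit-0 fast path for worker warnings.
class TooManyWarnings(Exception):
    pass

def on_warning(warning: Exception, limit: int, seen: int):
    """Handle one newly reported warning, given `seen` warnings so far.
    limit == 0 means this warning type is disallowed outright."""
    if limit == 0:
        raise warning            # surface the root cause directly
    if seen + 1 > limit:
        raise TooManyWarnings(f"exceeded limit of {limit} warnings")
    # otherwise: record the warning and continue
```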
* Refactor Calcite test "framework" for planner tests Refactors the current Calcite tests to make it a bit easier to adjust the set of runtime objects used within a test. * Move data creation out of CalciteTests into TestDataBuilder * Move "framework" creation out of CalciteTests into a QueryFramework * Move injector-dependent functions from CalciteTests into QueryFrameworkUtils * Wrapper around the planner factory, etc. to allow customization. * Bulk of the "framework" created once per class rather than once per test. * Refactor tests to use a test builder * Change all testQuery() methods to use the test builder. Move test execution & verification into a test runner.