Skip to content

Iteration Order Dependency in MovingAverageIterableTest #18651

@EdwinIngJ

Description

@EdwinIngJ

I was able to reproduce similar errors while running the test org.apache.druid.query.movingaverage.MovingAverageIterableTest.testMissingDataAtMiddle as the errors mentioned in this previous PR for org.apache.druid.query.movingaverage.MovingAverageIterableTest.testMissingDataAtTheEnd here: #17264.

Digging deeper I found the root cause to be line 227 in MovingAverageIterable.java where the averagerKey.iterator() is used to traverse the Set<Map<String,Object>> averagerKeys. The iteration order of the Set is not guaranteed to have any specific ordering. For example if I modify lines 225 like such:

-            Set<Map<String, Object>> averagerKeys = new HashSet<>(averagers.keySet());
+            Set<Map<String, Object>> averagerKeys = new HashSet<>(101); // Set initial capacity
+            averagerKeys.addAll(averagers.keySet());
             averagerKeys.removeAll(seenKeys);
             averagersKeysIter = averagerKeys.iterator();
             cacheIter = null;

Initializing the HashSet to some initial capacity (101), both the extensions-contrib/moving-average-query,org.apache.druid.query.movingaverage.MovingAverageIterableTest.testMissingDataAtMiddle and org.apache.druid.query.movingaverage.MovingAverageIterableTest.testMissingDataAtTheEndfail consistently with the following errors:

[INFO] Running org.apache.druid.query.movingaverage.MovingAverageIterableTest
[ERROR] Tests run: 4, Failures: 4, Errors: 0, Skipped: 0, Time elapsed: 0.811 s <<< FAILURE! -- in org.apache.druid.query.movingaverage.MovingAverageIterableTest
[ERROR] org.apache.druid.query.movingaverage.MovingAverageIterableTest.testMissingDataAtMiddle -- Time elapsed: 0.734 s <<< FAILURE!
org.junit.ComparisonFailure: expected:<[u]> but was:<[f]>
	at org.junit.Assert.assertEquals(Assert.java:117)
	at org.junit.Assert.assertEquals(Assert.java:146)
	at org.apache.druid.query.movingaverage.MovingAverageIterableTest.testMissingDataAtMiddle(MovingAverageIterableTest.java:521)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)

and

[INFO] Running org.apache.druid.query.movingaverage.MovingAverageIterableTest
[ERROR] Tests run: 4, Failures: 4, Errors: 0, Skipped: 0, Time elapsed: 0.763 s <<< FAILURE! -- in org.apache.druid.query.movingaverage.MovingAverageIterableTest
[ERROR] org.apache.druid.query.movingaverage.MovingAverageIterableTest.testMissingDataAtTheEnd -- Time elapsed: 0.691 s <<< FAILURE!
org.junit.ComparisonFailure: expected:<[u]> but was:<[f]>
	at org.junit.Assert.assertEquals(Assert.java:117)
	at org.junit.Assert.assertEquals(Assert.java:146)
	at org.apache.druid.query.movingaverage.MovingAverageIterableTest.testMissingDataAtTheEnd(MovingAverageIterableTest.java:441)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)

, respectively.

Should the iteration order change both this test and the test mentioned in the PR above would fail.

According to the observations above, I propose solution to make these tests more robust by using a set comparison for the lines where the test asserts for the presence of the injections such as

Set<String> expectedGenders = new HashSet<>(Arrays.asList("u", "f"));
Set<String> actualGenders = new HashSet<>();

Assert.assertTrue(iter.hasNext());
result = iter.next();
actualGenders.add((result.getDimension("gender")).get(0));
Assert.assertEquals(JAN_2, (result.getTimestamp()));

Assert.assertTrue(iter.hasNext());
result = iter.next();
actualGenders.add((result.getDimension("gender")).get(0));
Assert.assertEquals(JAN_2, (result.getTimestamp()));

Assert.assertEquals(expectedGenders, actualGenders);

This would not reduce the coverage of these tests.

@kfaraz Would you like me to open a PR for this?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions