-
Notifications
You must be signed in to change notification settings - Fork 3.8k
Auto switch cgroup monitors from v1 to v2 #18705
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
processing/src/test/java/org/apache/druid/java/util/metrics/cgroups/CpuSetV2Test.java
Show resolved
Hide resolved
processing/src/test/java/org/apache/druid/java/util/metrics/cgroups/CpuV2Test.java
Fixed
Show fixed
Hide fixed
capistrant
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do we want to do about documentation in https://druid.apache.org/docs/latest/configuration/#metrics-monitors? Do we want to explicitly call out that the auto conversion will happen for the v1 monitors? Do we want to remove experimental from the v2 monitors docs?
|
|
||
| package org.apache.druid.java.util.metrics.cgroups; | ||
|
|
||
| import org.junit.Assert; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should new test file be written with junit5?
| } | ||
|
|
||
| @Test | ||
| public void testBackwardsCompatibilityWithV1Parsing() throws IOException |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you point out how this format is different than all the tests above? I'm missing what is unique about it compared to the rest of the v2 tests
processing/src/test/java/org/apache/druid/java/util/metrics/cgroups/CpuSetV2Test.java
Show resolved
Hide resolved
|
|
||
| package org.apache.druid.java.util.metrics.cgroups; | ||
|
|
||
| import org.junit.Assert; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as other new UT file, should we write it in junit5?
| } | ||
|
|
||
| @Test | ||
| public void testMicrosecondToJiffiesConversion() throws IOException |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| public void testMicrosecondToJiffiesConversion() throws IOException | |
| public void testV2doesNotConvertMicrosecondToJiffies() throws IOException |
nit: existing name feels misleading since there is no conversion, jiffies just aren't provided
| } | ||
| } | ||
| catch (Exception e) { | ||
| LOG.debug(e, "Could not determine cgroups version, assuming v1"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this actually be a warn so operators can investigate and get things properly configured?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is removed
| private static final String | ||
| CPU_MAX_FILE = "cpu.max"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| private static final String | |
| CPU_MAX_FILE = "cpu.max"; | |
| private static final String CPU_MAX_FILE = "cpu.max"; |
| * @param cgroupDiscoverer the discoverer to check | ||
| * @return true if running on cgroups v2, false for v1 or unknown | ||
| */ | ||
| public static boolean isRunningOnCgroupsV2(CgroupDiscoverer cgroupDiscoverer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method looks like it will use cgroupv1 if both exist. in processing/src/main/java/org/apache/druid/java/util/metrics/cgroups/ProcSelfCgroupDiscoverer.java detectCgroupVersion it looks like it prefers v2 even if both exist. Is this discrepancy intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is no longer needed. Adjusted the code.
| } | ||
| } | ||
|
|
||
| public static boolean doMonitorInternal( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
short javadoc might be helpful to note that this is a static utility method used by v1 and v2 monitor. I don't fully understand the "internal" part of the name either
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this require an entry in https://druid.apache.org/docs/latest/configuration/#metrics-monitors? Will cover the docs topic more in the top level review comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
cryptoe
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review @capistrant .
Responded to all your queries.
Regarding junit 4 vs junit5 frameworks, I hope that change may not block this PR.
Will need to adjust lot of the testing boiler plate code for that to work which I was hoping can be punted down the line :)
| * @param cgroupDiscoverer the discoverer to check | ||
| * @return true if running on cgroups v2, false for v1 or unknown | ||
| */ | ||
| public static boolean isRunningOnCgroupsV2(CgroupDiscoverer cgroupDiscoverer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is no longer needed. Adjusted the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
| } | ||
| } | ||
| catch (Exception e) { | ||
| LOG.debug(e, "Could not determine cgroups version, assuming v1"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is removed
|
|
||
| // doMonitor should return true but skip actual monitoring | ||
| Assert.assertTrue(monitor.doMonitor(emitter)); | ||
| Assert.assertEquals(4, emitter.getNumEmittedEvents()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
| // doMonitor should return true | ||
| Assert.assertTrue(monitor.doMonitor(emitter)); | ||
|
|
||
| Assert.assertEquals(2, emitter.getEvents().size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
| } | ||
|
|
||
| @Test | ||
| public void testZeroMicrosecondValues() throws IOException |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wanted to test if the values are initialized properly.
Added more asserts to check that.
capistrant
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a suggestion that you should be able to batch accept to fix spell checker GHA
docs/configuration/index.md
Outdated
| |`org.apache.druid.java.util.metrics.CgroupV2CpuMonitor`| **EXPERIMENTAL** Reports CPU usage from `cpu.stat` file. Only applicable to `cgroupv2`.|Any| | ||
| |`org.apache.druid.java.util.metrics.CgroupV2DiskMonitor`| **EXPERIMENTAL** Reports disk usage from `io.stat` file. Only applicable to `cgroupv2`.|Any| | ||
| |`org.apache.druid.java.util.metrics.CgroupV2MemoryMonitor`| **EXPERIMENTAL** Reports memory usage from `memory.current` and `memory.max` files. Only applicable to `cgroupv2`.|Any| | ||
| |`org.apache.druid.java.util.metrics.CgroupCpuMonitor`|Reports CPU shares and quotas as per the `cpu` cgroup. Automaticaly switches to `CgroupV2CpuMonitor` in case `cgroupv2` type is detected.|Any| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| |`org.apache.druid.java.util.metrics.CgroupCpuMonitor`|Reports CPU shares and quotas as per the `cpu` cgroup. Automaticaly switches to `CgroupV2CpuMonitor` in case `cgroupv2` type is detected.|Any| | |
| |`org.apache.druid.java.util.metrics.CgroupCpuMonitor`|Reports CPU shares and quotas as per the `cpu` cgroup. Automatically switches to `CgroupV2CpuMonitor` in case `cgroupv2` type is detected.|Any| |
docs/configuration/index.md
Outdated
| |`org.apache.druid.java.util.metrics.CgroupV2DiskMonitor`| **EXPERIMENTAL** Reports disk usage from `io.stat` file. Only applicable to `cgroupv2`.|Any| | ||
| |`org.apache.druid.java.util.metrics.CgroupV2MemoryMonitor`| **EXPERIMENTAL** Reports memory usage from `memory.current` and `memory.max` files. Only applicable to `cgroupv2`.|Any| | ||
| |`org.apache.druid.java.util.metrics.CgroupCpuMonitor`|Reports CPU shares and quotas as per the `cpu` cgroup. Automaticaly switches to `CgroupV2CpuMonitor` in case `cgroupv2` type is detected.|Any| | ||
| |`org.apache.druid.java.util.metrics.CgroupCpuSetMonitor`|Reports CPU core/HT and memory node allocations as per the `cpuset` cgroup. Automaticaly switches to `CgroupV2CpuSetMonitor` in case `cgroupv2` type is detected.|Any| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| |`org.apache.druid.java.util.metrics.CgroupCpuSetMonitor`|Reports CPU core/HT and memory node allocations as per the `cpuset` cgroup. Automaticaly switches to `CgroupV2CpuSetMonitor` in case `cgroupv2` type is detected.|Any| | |
| |`org.apache.druid.java.util.metrics.CgroupCpuSetMonitor`|Reports CPU core/HT and memory node allocations as per the `cpuset` cgroup. Automatically switches to `CgroupV2CpuSetMonitor` in case `cgroupv2` type is detected.|Any| |
docs/configuration/index.md
Outdated
| |`org.apache.druid.java.util.metrics.CgroupV2MemoryMonitor`| **EXPERIMENTAL** Reports memory usage from `memory.current` and `memory.max` files. Only applicable to `cgroupv2`.|Any| | ||
| |`org.apache.druid.java.util.metrics.CgroupCpuMonitor`|Reports CPU shares and quotas as per the `cpu` cgroup. Automaticaly switches to `CgroupV2CpuMonitor` in case `cgroupv2` type is detected.|Any| | ||
| |`org.apache.druid.java.util.metrics.CgroupCpuSetMonitor`|Reports CPU core/HT and memory node allocations as per the `cpuset` cgroup. Automaticaly switches to `CgroupV2CpuSetMonitor` in case `cgroupv2` type is detected.|Any| | ||
| |`org.apache.druid.java.util.metrics.CgroupDiskMonitor`|Reports disk statistic as per the blkio cgroup. Automaticaly switches to `CgroupV2DiskMonitor` in case `cgroupv2` type is detected.|Any| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| |`org.apache.druid.java.util.metrics.CgroupDiskMonitor`|Reports disk statistic as per the blkio cgroup. Automaticaly switches to `CgroupV2DiskMonitor` in case `cgroupv2` type is detected.|Any| | |
| |`org.apache.druid.java.util.metrics.CgroupDiskMonitor`|Reports disk statistic as per the blkio cgroup. Automatically switches to `CgroupV2DiskMonitor` in case `cgroupv2` type is detected.|Any| |
docs/configuration/index.md
Outdated
| |`org.apache.druid.java.util.metrics.CgroupCpuMonitor`|Reports CPU shares and quotas as per the `cpu` cgroup. Automaticaly switches to `CgroupV2CpuMonitor` in case `cgroupv2` type is detected.|Any| | ||
| |`org.apache.druid.java.util.metrics.CgroupCpuSetMonitor`|Reports CPU core/HT and memory node allocations as per the `cpuset` cgroup. Automaticaly switches to `CgroupV2CpuSetMonitor` in case `cgroupv2` type is detected.|Any| | ||
| |`org.apache.druid.java.util.metrics.CgroupDiskMonitor`|Reports disk statistic as per the blkio cgroup. Automaticaly switches to `CgroupV2DiskMonitor` in case `cgroupv2` type is detected.|Any| | ||
| |`org.apache.druid.java.util.metrics.CgroupMemoryMonitor`|Reports memory statistic as per the memory cgroup. Automaticaly switches to `CgroupV2MemoryMonitor` in case `cgroupv2` type is detected.|Any| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| |`org.apache.druid.java.util.metrics.CgroupMemoryMonitor`|Reports memory statistic as per the memory cgroup. Automaticaly switches to `CgroupV2MemoryMonitor` in case `cgroupv2` type is detected.|Any| | |
| |`org.apache.druid.java.util.metrics.CgroupMemoryMonitor`|Reports memory statistic as per the memory cgroup. Automatically switches to `CgroupV2MemoryMonitor` in case `cgroupv2` type is detected.|Any| |
Summary
This PR introduces auto-switching cgroup monitors that can dynamically detect and switch between cgroup v1 and v2 implementations, along with necessary code style fixes.
Changes
-Now CgroupCpuMonitor , CgroupCpuSetMonitor , CgroupDiskMonitor ,MemoryMonitor autoswitch to cgroups v2 and no longer fail in case cgroups v2 is detected.
cgroup/cpu/sharesandcgroup/cpu/cores_quotaTest Plan
This enhancement allows Druid to seamlessly work across different containerization environments that use either cgroup v1 or v2, improving compatibility and reducing configuration overhead.
Release notes
cgroup/cpu/sharesandcgroup/cpu/cores_quota