Add preliminary instructions for energy measurement #38
Conversation
thisdir=`dirname "$0"`
...
jbang wrk2@hyperfoil -t2 -c100 -d20s --rate 2000 --timeout 1s http://localhost:8080/fruits
Are you sure this is ok with any laptop?
If the disk is slow it won't be able to sustain such load, so it would be preferable to make it parametric.
I'm quite sure it's not ok with any laptop. :)
So yes, it could use the same parameterising logic as the other scripts. We should probably also add some instructions to say something like "do the throughput test and then set a load which is x% of the lowest max when you do the energy test."
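For illustration, here is a minimal sketch of what that parameterisation could look like, assuming hypothetical RATE/CONNECTIONS/DURATION environment variables rather than whatever convention the repo's other scripts actually use:

```bash
#!/usr/bin/env bash
# Hypothetical sketch: read the load parameters from the environment instead of
# hard-coding --rate 2000. RATE, CONNECTIONS and DURATION are illustrative names,
# not the repo's actual convention.
RATE="${RATE:-2000}"              # requests/second; pick a value the test machine can sustain
CONNECTIONS="${CONNECTIONS:-100}"
DURATION="${DURATION:-20s}"

jbang wrk2@hyperfoil -t2 -c"${CONNECTIONS}" -d"${DURATION}" \
  --rate "${RATE}" --timeout 1s http://localhost:8080/fruits
```

A caller on a slower laptop could then lower the load with, say, RATE=800, without editing the script.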
When I had a load which was too low, the test was boring, because everything was almost the same. And you're right that a load which is too high will make for an incorrect test.
Yep, the "educational" part is related:
- if you want to measure energy consumption and compare 2 applications, both should have the same capacity and the work performed should be the same (there are many car analogies which work the same way for this)
- the work performed should be "enough" to use the given capacity: an external bottleneck (e.g. a database) could make the application perform little work, meaning that its cost will be dominated by going idle/waking up - making it more complex to draw conclusions about its efficiency
Both are concepts which can be challenging to grok...
The latter, in particular: many industrial benchmarks refer to this as modelling power efficiency at different "operational ranges".
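To make that last point concrete, here is a minimal sketch of the "x% of the lowest measured max" idea discussed above; MAX_RPS and UTILISATION_PCT are hypothetical names and defaults, not anything this PR or the repo's scripts define:

```bash
#!/usr/bin/env bash
# Hypothetical sketch: derive the energy-test load from a previously measured max throughput.
# MAX_RPS and UTILISATION_PCT are illustrative names; use the lowest max across the applications compared.
MAX_RPS="${MAX_RPS:?set to the lowest max throughput (req/s) seen in the throughput test}"
UTILISATION_PCT="${UTILISATION_PCT:-60}"   # the "operational range" to measure, as a % of that max

RATE=$(( MAX_RPS * UTILISATION_PCT / 100 ))
echo "Measuring energy at ${RATE} req/s (${UTILISATION_PCT}% of ${MAX_RPS} req/s)"

jbang wrk2@hyperfoil -t2 -c100 -d20s --rate "${RATE}" --timeout 1s http://localhost:8080/fruits
```

Driving every application at the same derived rate keeps the work performed identical, which is the first bullet above.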
I'm going to defer my review here for the others. They know more about this stuff than me.
Why are we adding energy measurements in this repo? I'm not strictly against it, but it seems like a Pandora's box, unless we have a very clear reason, and a measurement system which we can stand behind.
We know Quarkus has excellent energy efficiency, and we talk about it (for example, https://www.redhat.com/en/resources/greener-java-applications-detail). So we want to be able to back up the figures we present publicly with (a) rigorous, and regular, measurements in our performance lab and (b) less rigorous, but more accessible, 'home' measurements. I think it would be unwieldy (and unconvincing) to have one application we use for measuring performance, and another we use for measuring energy. It would just contribute to the proliferation of benchmarks that we're currently struggling with, and it would raise questions about why we don't feel confident to measure the energy consumption of the 'main' application. It would also mean we couldn't point out the strong correlation between low energy consumption and high maximum throughput (the "vrrrooooom model"), because we'd be comparing apples and oranges.
I love the idea at a high level, but I'm troubled about the "home measurements" - we need to avoid creating another situation in which lots of people will be extrapolating the wrong conclusions from running things the wrong way on their laptop. I agree that ideally it would be easy to reproduce by people at home, but if that's not possible I think we'll need to accept that not everyone will be able to run such measurements w/o access to adequate hardware and a little more effort.
We're working with @franz1981 on these, the same as we are for the performance scripts. The advantage of having the energy scripts in the same repo as the performance scripts is that it makes it easier to have the @franz1981 oversight, rather than asking him to review independent scripts across two repos.
I don't believe the situation is worse (i.e. riskier) for energy measurement than it is for performance measurements. In both cases, our strategy is damage reduction. We know people run these kinds of tests on laptops. There's a proliferation of home-rolled benchmarks. We see other people do it at conferences, members of the Quarkus team do it at conferences, and we see blogs which do it. I don't believe that assuming people won't do it if we make it hard is a viable strategy; instead, we need to make it easier for the easy thing to do to be the right thing to do. So we're providing scripts of varying degrees of rigour, and also providing a set of documents which articulate the tradeoffs and explain some of the anti-patterns - and we're going to make sure that even the 'quick and easy' level of our scripts is more rigorous than the 'average' script used for DIY performance measurement.
Just to reiterate, at the moment, this kind of code is tucked away in personal repos. Putting it in a central place makes it much easier for @franz1981 to help us with it and spot pitfalls. That makes it easier for us to coalesce on a more-rigorous set of instructions, instead of having every developer independently invent their own buggy version and show that at conferences. :)
I was thinking about this more in the shower (where all thinking happens™). I think there are three parts to @Sanne's original question:
- Should we be enabling/encouraging amateur-hour performance measurement? While it's possible for a reasonable person to form a different opinion, I do have strong feelings about this. There are two sub-aspects to the question. The first is pragmatism and damage reduction. Amateur-hour performance measurement is already happening. I know of at least four members of the Quarkus team who run these kinds of mini-measurements in conference talks. We could ban them from doing it, but that wouldn't stop the wider community. Many people trying Quarkus out for the first time will do some sort of performance comparison, and they won't have access to a proper lab. So what we should do is make the measurements less bad by fixing common gotchas (use hyperfoil, not jmeter, etc). Making something common available also reduces duplicated effort, so that we don't have multiple team members wasting time re-inventing similarly imperfect scripts. It also allows us to provide an on-ramp to better-quality techniques. We don't have it yet, but I want to add some extra context + discussion along the lines of "this is the weakness of doing the measurement in this way; for a more professional result you could consider doing this extra thing." The second sub-aspect is one of trust and robustness. We know Quarkus has better performance than Spring, on every metric, including throughput. We can't have a story that goes something like "Quarkus is faster than Spring, but in order to see this, you have to do your measurements during a full moon, while stood on one leg, using a server with Himalayan silver connectors." Even if that is the best way to do the measurement, it makes the result look fragile - if "Quarkus being faster than Spring" doesn't reproduce in a range of measurement contexts (including severely suboptimal ones), how can people trust it will reproduce for them? This is why in the
- Does energy measurement belong in the same place as throughput + memory measurement? To me, this is an easy yes. Energy consumption is so closely related to other performance metrics, and the best practice for measuring energy is extremely similar to the best practice for measuring other performance metrics. If we had a separate repo for the energy aspect, it would have the same application, and many of the scripts would be the same. So it would be a dual-maintenance headache, for no gain.
- Do we have the expertise to do energy measurement? @Sanne is right that this is a specialised skill, and hard. Not many people (anywhere) know how to do it at the moment. But I think it's important, and I'd like to try and make it more accessible. We're in a unique position, because @metacosm is developing strong expertise in this area, and he's also building out some really nice cross-platform tooling. So we should leverage that.
I've written down what I did at Øredev to show energy consumption. It's not a perfect process, but I think documenting what we have is best.
cc @metacosm
cc @franz1981