Add preliminary instructions for energy measurement #38
Conversation
thisdir=`dirname "$0"`
...
jbang wrk2@hyperfoil -t2 -c100 -d20s --rate 2000 --timeout 1s http://localhost:8080/fruits
Are you sure this is ok with any laptop?
If the disk is slow it won't be able to sustain such load, so it would be preferable to make it parametric.
I'm quite sure it's not ok with any laptop. :)
So yes, it could use the same parameterising logic as the other scripts. We should probably also add some instructions to say something like "do the throughput test and then set a load which is x% of the lowest max when you do the energy test."
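For illustration, here is a minimal sketch of what that parameterisation could look like, assuming hypothetical RATE/CONNECTIONS/DURATION environment variables rather than whatever convention the repo's other scripts actually use:

```bash
#!/usr/bin/env bash
# Hypothetical sketch: read the load parameters from the environment instead of
# hard-coding --rate 2000. RATE, CONNECTIONS and DURATION are illustrative names,
# not the repo's actual convention.
RATE="${RATE:-2000}"              # requests/second; pick a value the test machine can sustain
CONNECTIONS="${CONNECTIONS:-100}"
DURATION="${DURATION:-20s}"

jbang wrk2@hyperfoil -t2 -c"${CONNECTIONS}" -d"${DURATION}" \
  --rate "${RATE}" --timeout 1s http://localhost:8080/fruits
```

A caller on a slower laptop could then lower the load with, say, RATE=800, without editing the script.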
When I had a load which was too low, the test was boring, because everything was almost the same. And you're right that a load which is too high will make for an incorrect test.
Yep, the "educational" part is related:
- if you want to measure energy consumption and compare 2 applications, both should have the same capacity and the work performed should be the same (there are many car analogies which work the same way for this)
- the work performed should be "enough" to use the given capacity: an external bottleneck (e.g. a database) could make the application perform little work, meaning that its cost will be dominated by going idle/waking up - making it more complex to draw conclusions about its efficiency
Both are concepts which can be challenging to grok...
The latter, in particular: many industrial benchmarks refer to this as modelling power efficiency at different "operational ranges".
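To make that last point concrete, here is a minimal sketch of the "x% of the lowest measured max" idea discussed above; MAX_RPS and UTILISATION_PCT are hypothetical names and defaults, not anything this PR or the repo's scripts define:

```bash
#!/usr/bin/env bash
# Hypothetical sketch: derive the energy-test load from a previously measured max throughput.
# MAX_RPS and UTILISATION_PCT are illustrative names; use the lowest max across the applications compared.
MAX_RPS="${MAX_RPS:?set to the lowest max throughput (req/s) seen in the throughput test}"
UTILISATION_PCT="${UTILISATION_PCT:-60}"   # the "operational range" to measure, as a % of that max

RATE=$(( MAX_RPS * UTILISATION_PCT / 100 ))
echo "Measuring energy at ${RATE} req/s (${UTILISATION_PCT}% of ${MAX_RPS} req/s)"

jbang wrk2@hyperfoil -t2 -c100 -d20s --rate "${RATE}" --timeout 1s http://localhost:8080/fruits
```

Driving every application at the same derived rate keeps the work performed identical, which is the first bullet above.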
I'm going to defer my review here for the others. They know more about this stuff than me.
Why are we adding energy measurements in this repo? I'm not strictly against it, but it seems like a Pandora's box, unless we have a very clear reason, and a measurement system which we can stand behind.
We know Quarkus has excellent energy efficiency, and we talk about it (for example, https://www.redhat.com/en/resources/greener-java-applications-detail). So we want to be able to back up the figures we present publicly with (a) rigorous, and regular, measurements in our performance lab and (b) less rigorous, but more accessible, 'home' measurements. I think it would be unwieldy (and unconvincing) to have one application we use for measuring performance, and another we use for measuring energy. It would just contribute to the proliferation of benchmarks that we're currently struggling with, and it would raise questions about why we don't feel confident to measure the energy consumption of the 'main' application. It would also mean we couldn't point out the strong correlation between low energy consumption and high maximum throughput (the "vrrrooooom model"), because we'd be comparing apples and oranges.
I love the idea at a high level, but I'm troubled about the "home measurements" - we need to avoid creating another situation in which lots of people will be extrapolating the wrong conclusions from running things the wrong way on their laptop. I agree that ideally it would be easy to reproduce by people at home, but if that's not possible I think we'll need to accept that not everyone will be able to run such measurements w/o access to adequate hardware and a little more effort.
We're working with @franz1981 on these, the same as we are for the performance scripts. The advantage of having the energy scripts in the same repo as the performance scripts is that it makes it easier to have the @franz1981 oversight, rather than asking him to review independent scripts across two repos.
I don't believe the situation is worse (i.e. riskier) for energy measurement than it is for performance measurements. In both cases, our strategy is damage reduction. We know people run these kinds of tests on laptops. There's a proliferation of home-rolled benchmarks. We see other people do it at conferences, members of the Quarkus team do it at conferences, and we see blogs which do it. I don't believe that assuming people won't do it if we make it hard is a viable strategy; instead, we need to make it easier for the easy thing to do to be the right thing to do. So we're providing scripts of varying degrees of rigour, and also providing a set of documents which articulate the tradeoffs and explain some of the anti-patterns - and we're going to make sure that even the 'quick and easy' level of our scripts is more rigorous than the 'average' script used for DIY performance measurement.
Just to reiterate, at the moment, this kind of code is tucked away in personal repos. Putting it in a central place makes it much easier for @franz1981 to help us with it and spot pitfalls. That makes it easier for us to coalesce on a more-rigorous set of instructions, instead of having every developer independently invent their own buggy version and show that at conferences. :)
I was thinking about this more in the shower (where all thinking happens™). I think there are three parts to @Sanne's original question:
- Should we be enabling/encouraging amateur-hour performance measurement? While it's possible for a reasonable person to form a different opinion, I do have strong feelings about this. There are two sub-aspects to the question. The first is pragmatism and damage reduction. Amateur-hour performance measurement is already happening. I know of at least four members of the Quarkus team who run these kinds of mini-measurements in conference talks. We could ban them from doing it, but that wouldn't stop the wider community. Many people trying Quarkus out for the first time will do some sort of performance comparison, and they won't have access to a proper lab. So what we should do is make the measurements less bad by fixing common gotchas (use hyperfoil, not jmeter, etc). Making something common available also reduces duplicated effort, so that we don't have multiple team members wasting time re-inventing similarly imperfect scripts. It also allows us to provide an on-ramp to better-quality techniques. We don't have it yet, but I want to add some extra context + discussion along the lines of "this is the weakness of doing the measurement in this way; for a more professional result you could consider doing this extra thing." The second sub-aspect is one of trust and robustness. We know Quarkus has better performance than Spring, on every metric, including throughput. We can't have a story that goes something like "Quarkus is faster than Spring, but in order to see this, you have to do your measurements during a full moon, while stood on one leg, using a server with Himalayan silver connectors." Even if that is the best way to do the measurement, it makes the result look fragile - if "Quarkus being faster than Spring" doesn't reproduce in a range of measurement contexts (including severely suboptimal ones), how can people trust it will reproduce for them? This is why in the
- Does energy measurement belong in the same place as throughput + memory measurement? To me, this is an easy yes. Energy consumption is so closely related to other performance metrics, and the best practice for measuring energy is extremely similar to the best practice for measuring other performance metrics. If we had a separate repo for the energy aspect, it would have the same application, and many of the scripts would be the same. So it would be a dual-maintenance headache, for no gain.
- Do we have the expertise to do energy measurement? @Sanne is right that this is a specialised skill, and hard. Not many people (anywhere) know how to do it at the moment. But I think it's important, and I'd like to try and make it more accessible. We're in a unique position, because @metacosm is developing strong expertise in this area, and he's also building out some really nice cross-platform tooling. So we should leverage that.
I've written down what I did at Øredev to show energy consumption. It's not a perfect process, but I think documenting what we have is best.
cc @metacosm
cc @franz1981