This commit exposes a common API for interacting with physical counters on Arm and x86. The x86 logic was introduced to improve the performance of the timer driver on x86 (see commit 6454026). By definition these are user-space counters, so they may be useful in various other circumstances on Arm and x86, and likely on RISC-V as well (which has not been implemented here). Since this code is intrinsically architecture-specific, handling it in a library will reduce repeated #define checks in other applications.
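As a rough illustration of what such a common API could look like, here is a minimal sketch. It is not the code from this PR: the names `counter_read`, `counter_elapsed` and the `COUNTER_USE_PHYSICAL` switch are hypothetical. It assumes a GCC- or Clang-compatible compiler and that the kernel permits user-space access to the counter. On AArch64 the sketch reads the virtual counter (`CNTVCT_EL0`) by default, since that is the one most hosts expose to user space. Defining `COUNTER_USE_PHYSICAL` selects the physical counter (`CNTPCT_EL0`) instead.

```c
#include <stdint.h>

/*
 * Hypothetical common counter API (illustrative names, not from the PR).
 * All architecture checks live here so callers need no #if of their own.
 */
static inline uint64_t counter_read(void)
{
#if defined(__x86_64__) || defined(__i386__)
    /* RDTSC returns the time-stamp counter split across EDX:EAX. */
    uint32_t lo, hi;
    __asm__ volatile("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
#elif defined(__aarch64__)
    uint64_t value;
#ifdef COUNTER_USE_PHYSICAL
    /* Physical counter; traps unless the kernel enables EL0 access. */
    __asm__ volatile("isb\n\tmrs %0, cntpct_el0" : "=r"(value) : : "memory");
#else
    /* Virtual counter; assumed here to be readable from user space. */
    __asm__ volatile("isb\n\tmrs %0, cntvct_el0" : "=r"(value) : : "memory");
#endif
    return value;
#else
#error "counter_read: architecture not covered by this sketch"
#endif
}

/* Ticks between two reads; unsigned subtraction tolerates wrap-around. */
static inline uint64_t counter_elapsed(uint64_t start, uint64_t end)
{
    return end - start;
}
```

A caller would take `uint64_t start = counter_read();` before the region of interest and pass a second reading to `counter_elapsed` afterwards. Converting ticks to time needs the counter frequency, which is platform-specific and left out of this sketch.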
Got distracted from my thesis and investigated why TCP performance was so bad relative to Linux.
I have a lot more data that I collected on the zcu102 driver.
It should be possible, with some work, to saturate the line rate given the number of idle cycles I was measuring on the zcu102, but I don't have the time at the moment.
I have no intention of merging this PR in its current state, but it seemed sensible to include the data in this draft.
TCP optimisation results.
These results were all measured using the benchmark.py script with four TCP
clients.
They demonstrate significant performance improvements.
On imx8mm throughput improves from 800 Mb/s -> 900 Mb/s.
On maaxboard throughput improves from 700 Mb/s -> 900 Mb/s.
On odroidc4 throughput improves from 500 Mb/s -> 900 Mb/s.
On zcu102 throughput improves from 725 Mb/s -> 910 Mb/s.
Median RTTs saw slight improvements.
Future work: minimising the presence of idle cycles when the system
is not achieving line rate.