TCP Performance Optimisations #706

Draft
cazb2 wants to merge 3 commits into main from callumb/tcp_optimisations

cazb2 (Contributor) commented Apr 15, 2026

Got distracted from my thesis and investigated why TCP performance was so bad relative to Linux.
I have a lot more data that I collected on the zcu102 driver.
Given the number of idle cycles I was measuring on the zcu102, it should be possible with some work to saturate the line rate, but I don't have the time at the moment.

I have no intention of merging this PR in its current state, but it seemed sensible to include the data in this draft.

TCP optimisation results.
These results were all measured using the benchmark.py script with four TCP
clients.

They demonstrate significant performance improvements.

On imx8mm throughput improves from 800 Mb/s -> 900 Mb/s.
On maaxboard throughput improves from 700 Mb/s -> 900 Mb/s.
On odroidc4 throughput improves from 500 Mb/s -> 900 Mb/s.
On zcu102 throughput improves from 725 Mb/s -> 910 Mb/s.

Median RTTs saw slight improvements.

Future work: minimising idle cycles when the system
is not achieving line rate.
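
The relative gains implied by the figures above can be worked out as follows. This is only a sketch over the numbers quoted in this comment; the struct and function names (`tcp_result`, `relative_gain_percent`, `print_gains`) are made up for illustration and do not come from benchmark.py or this PR.

```c
#include <stddef.h>
#include <stdio.h>

/* One row per board: throughput before and after the optimisations,
 * in Mb/s, copied from the PR description. */
struct tcp_result {
    const char *board;
    double before_mbps;
    double after_mbps;
};

static const struct tcp_result tcp_results[] = {
    { "imx8mm",    800.0, 900.0 },
    { "maaxboard", 700.0, 900.0 },
    { "odroidc4",  500.0, 900.0 },
    { "zcu102",    725.0, 910.0 },
};

/* Relative throughput gain as a percentage of the baseline. */
static double relative_gain_percent(double before, double after)
{
    return (after - before) / before * 100.0;
}

/* Print one line per board with its relative gain. */
static void print_gains(void)
{
    size_t n = sizeof(tcp_results) / sizeof(tcp_results[0]);
    for (size_t i = 0; i < n; i++) {
        const struct tcp_result *r = &tcp_results[i];
        printf("%-10s %+.1f%%\n", r->board,
               relative_gain_percent(r->before_mbps, r->after_mbps));
    }
}
```

Calling `print_gains()` from a test program prints one line per board, which makes it easier to see that the boards with the lowest baseline gain the most.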

cazb2 added 3 commits April 15, 2026 21:42
This commit exposes a common API for interacting with physical counters
on arm and x86.

The x86 logic was introduced to improve the performance of the timer driver
on x86 (see commit 6454026).

By definition these are user-space counters and may therefore be useful in various
other circumstances on arm, x86 and likely risc-v as well (which has not been
implemented here).

Since this code is intrinsically architecture-specific, handling it in a library
will reduce repeated #define checks in other applications.
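
To illustrate the kind of architecture dispatch this commit message describes, here is a minimal sketch of a counter read hidden behind a single function. It is not the API added by this PR: the names `read_cycle_counter` and `counter_delta` are hypothetical. On AArch64 the sketch reads the virtual counter `cntvct_el0`, which is an assumption made because that register is usually readable from user space; the PR itself speaks of physical counters. On x86 it reads the TSC with `rdtsc`.

```c
#include <stdint.h>

/* Hypothetical helper: read a free-running hardware counter.
 * The tick rate differs per architecture (generic timer frequency on
 * AArch64, TSC frequency on x86), so values are only comparable with
 * other reads taken on the same machine. */
static inline uint64_t read_cycle_counter(void)
{
#if defined(__aarch64__)
    uint64_t value;
    /* isb keeps the counter read from being executed ahead of
     * earlier instructions. */
    __asm__ volatile("isb\n\tmrs %0, cntvct_el0" : "=r"(value) : : "memory");
    return value;
#elif defined(__x86_64__) || defined(__i386__)
    uint32_t lo, hi;
    /* rdtsc returns the 64-bit timestamp counter in edx:eax. */
    __asm__ volatile("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
#else
#error "read_cycle_counter: no implementation for this architecture"
#endif
}

/* Hypothetical helper: ticks elapsed between two reads.
 * Unsigned subtraction stays correct across a single wrap-around. */
static inline uint64_t counter_delta(uint64_t start, uint64_t end)
{
    return end - start;
}
```

A caller would take one reading before and one after the code being measured and pass both to `counter_delta`, without needing any `#if` checks of its own.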
cazb2 requested a review from Courtney3141 April 15, 2026 13:12
cazb2 changed the title from "Callumb/tcp optimisations" to "TCP Performance Optimisations" Apr 15, 2026