
@Menkib64
Contributor

This is a proposal for how the ONNX backend could be refactored to add CUDA graph support. It should also simplify similar changes for other backends in the future.

Improvement ideas are welcome. Design changes are easier to make before we add support for more backend-specific optimizations.

This builds on top of #2307.

borg323 and others added 30 commits October 1, 2025 22:32
Very small batches require a separate optimisation. Optimising down to batch size 1 costs too much performance at small batch sizes. Adding a special optimisation for very small batches won't be a simple change, so it should be left for a future change.
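CUDA graphs replay a fixed sequence of kernel launches, so a backend using them typically needs one captured graph per input shape and has to map each incoming batch onto one of those shapes. The sketch below is a hypothetical illustration of that mapping, not code from this PR or from lc0: the class name `CapturedBatchSizes` and the choice of rounding up to the nearest captured size (padding the rest) are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical helper (not part of lc0): maps a requested batch size to one of
// the batch sizes for which a CUDA graph is assumed to have been captured.
class CapturedBatchSizes {
 public:
  explicit CapturedBatchSizes(std::vector<int> sizes) : sizes_(std::move(sizes)) {
    assert(!sizes_.empty() && "need at least one captured batch size");
    // Sort and deduplicate so lower_bound finds the smallest fitting size.
    std::sort(sizes_.begin(), sizes_.end());
    sizes_.erase(std::unique(sizes_.begin(), sizes_.end()), sizes_.end());
  }

  // Returns the smallest captured size that can hold `batch` entries (the
  // unused slots would be padded), or std::nullopt when `batch` exceeds every
  // captured size and the caller must fall back to a non-graph path.
  std::optional<int> Select(int batch) const {
    auto it = std::lower_bound(sizes_.begin(), sizes_.end(), batch);
    if (it == sizes_.end()) return std::nullopt;
    return *it;
  }

 private:
  std::vector<int> sizes_;
};
```

With captured sizes {8, 16, 32}, a batch of 1 would run as a padded batch of 8. This illustrates why very small batches may need separate handling rather than simply adding more captured sizes.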