Conversation

@IntegratedQuantum
Member

The compiler is poor at optimizing search loops like this because it doesn't have enough information: it must not access memory beyond the last entry, so it is stuck iterating element by element.

So I decided to throw some SIMD at this. The easiest way to use SIMD here is to align the vector and blindly access memory beyond the length limit, and it is actually not much more complex than the non-SIMD code (see godbolt).
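For readers without the godbolt link, here is a minimal sketch of the over-read idea in C. It is not the code from this PR. The names `palette_find` and `LANES` are hypothetical, and it assumes the palette allocation is padded to a multiple of `LANES` elements so that reading a full final block never leaves the allocation. The inner loop has a fixed trip count and no early exit, which is the shape a compiler can typically turn into vector compares.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 8 /* hypothetical block width, in 32-bit elements */

/* Returns the index of the first occurrence of `value` in palette[0..len),
 * or -1 if it is absent.
 * Assumption: the allocation behind `palette` holds a multiple of LANES
 * elements, so reading the whole last block is safe even when len is not
 * a multiple of LANES. */
ptrdiff_t palette_find(const uint32_t *palette, size_t len, uint32_t value) {
    for (size_t base = 0; base < len; base += LANES) {
        unsigned mask = 0;
        /* Compare every lane of the block, including padding lanes. */
        for (size_t lane = 0; lane < LANES; lane++) {
            mask |= (unsigned)(palette[base + lane] == value) << lane;
        }
        /* Discard matches that fall in the padding beyond len. */
        size_t remaining = len - base;
        if (remaining < LANES) {
            mask &= (1u << remaining) - 1u;
        }
        if (mask != 0) {
            /* The lowest set bit marks the first matching lane. */
            size_t lane = 0;
            while (((mask >> lane) & 1u) == 0) {
                lane++;
            }
            return (ptrdiff_t)(base + lane);
        }
    }
    return -1;
}
```

With a buffer of 8 entries of which only the first 4 are in use, a search for a value that appears only in the padding reports "not found", because the mask step drops those lanes.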

I also did some measurements. Getting the palette is not a common operation, so the overall performance impact is below 1%. Measured alone, though, the function is about 17-50% faster depending on the use case.
But to be honest, this is mostly about fixing the worst-case performance (which of course I did not care to measure :P)

  • cleanup, move function, orelse, maybe some name improvements

fixes #318

@Argmaster
Collaborator

How did you measure performance impact?

@Argmaster Argmaster moved this to WIP/not ready for review in PRs to review Dec 15, 2025
@IntegratedQuantum
Member Author

I looked at it with a sampling profiler, not terribly precise I know. Maybe I'll make a better worst-case benchmark for this in the future.

@Argmaster Argmaster moved this from WIP/not ready for review to Easy to Review in PRs to review Dec 20, 2025
@Argmaster Argmaster moved this from Easy to Review to WIP/not ready for review in PRs to review Dec 20, 2025
@IntegratedQuantum IntegratedQuantum moved this from WIP/not ready for review to In review in PRs to review Dec 26, 2025
Development

Successfully merging this pull request may close these issues.

Searching the palette of the palette compressed light chunks uses a slow linear search