vermicelli: correct AVX-512 nvermicelli tail scan masking bug #378

Open
byeonguk-jeong wants to merge 1 commit into VectorCamp:develop from AhnLab-OSSG:vermicelli-avx512-fix

@byeonguk-jeong

In nvermicelliExecReal, the tail path loads a vector from (buf_end - S) so the unprocessed tail bytes sit at the END of the vector (high offsets). However, first_zero_match_inverted<64> applies a mask selecting only the FIRST 'len' bytes (low offsets), which means it re-checks the already-scanned overlap region and completely misses the actual tail bytes.
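
The mismatch between the lanes the mask keeps and the lanes that hold the tail can be written as plain bit arithmetic. This is a minimal sketch, not the project's code: `low_lanes_mask` and `tail_lanes_mask` are hypothetical helpers, bit `i` stands for byte offset `i` in the 64-byte vector, and `len` is assumed to be in the range 1..64.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical model: one bit per byte lane of a 64-byte (AVX-512) vector.
constexpr std::size_t S = 64;

// Lanes a "first len bytes" mask keeps: offsets 0 .. len-1 (low offsets).
constexpr std::uint64_t low_lanes_mask(std::size_t len) {
    return len >= S ? ~0ULL : ((1ULL << len) - 1);
}

// Lanes that hold the unscanned tail when the vector is loaded from
// (buf_end - S): offsets S-len .. S-1 (high offsets). Assumes len <= S.
constexpr std::uint64_t tail_lanes_mask(std::size_t len) {
    return len == 0 ? 0ULL : (~0ULL << (S - len));
}

// For a 5-byte tail the two lane sets do not intersect at all.
static_assert((low_lanes_mask(5) & tail_lanes_mask(5)) == 0,
              "low-offset mask covers none of the tail lanes");
```

In this model the two sets are disjoint whenever `len` is at most `S / 2`, and they coincide only when `len == S`, which is the value the fix passes.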

This only affects AVX-512 (S=64) because the 16-byte and 32-byte specializations of first_zero_match_inverted mark 'len' as UNUSED and always check the full vector.

Fix by passing S instead of (buf_end - d) as the length, so the full vector is checked. The overlap bytes were already scanned and are known to match, so rechecking them cannot produce a false positive, and the existing (rv < buf_end) guard prevents out-of-range results.
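
The effect of the length argument can be reproduced with a scalar stand-in for the tail step. This is a sketch under stated assumptions, not the actual implementation: `first_mismatch_low` and `scan_tail` are hypothetical names, the masked vector check is modeled as a byte loop over the first `len` lanes, and returning `buf_end` for "nothing found" is a convention of this model.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t S = 64;  // models the AVX-512 vector width in bytes

// Hypothetical scalar model of a length-masked check: inspect only the
// first `len` lanes of an S-byte block and return the first byte != c.
inline const std::uint8_t *first_mismatch_low(const std::uint8_t *block,
                                              std::uint8_t c,
                                              std::size_t len) {
    for (std::size_t i = 0; i < len && i < S; ++i) {
        if (block[i] != c) {
            return block + i;
        }
    }
    return nullptr;
}

// Hypothetical model of the tail step. `d` is the first unscanned byte;
// every byte before `d` is already known to equal `c`. The caller must
// guarantee at least S bytes before buf_end.
// use_full_vector == false models the old length (buf_end - d);
// use_full_vector == true models the fix (length S).
inline const std::uint8_t *scan_tail(const std::uint8_t *d,
                                     const std::uint8_t *buf_end,
                                     std::uint8_t c,
                                     bool use_full_vector) {
    const std::uint8_t *block = buf_end - S;  // tail lands at high offsets
    const std::size_t len =
        use_full_vector ? S : static_cast<std::size_t>(buf_end - d);
    const std::uint8_t *rv = first_mismatch_low(block, c, len);
    if (rv && rv < buf_end) {  // guard against out-of-range results
        return rv;
    }
    return buf_end;  // model's convention for "no mismatch found"
}
```

With a 100-byte buffer of `'a'`, a single `'b'` at offset 97, and `d` at offset 64, the old length of 36 inspects only offsets 36..71 and reports nothing, while length `S` inspects offsets 36..99 and returns the byte at offset 97.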

Fixes: 87d8b35 ("Feature/refactor fdr (#251)")

@AhnLab-OSS @AhnLab-OSSG

Signed-off-by: Byeonguk Jeong <jungbu2855@gmail.com>