Skip to content

Conversation

@mdisibio
Copy link
Contributor

@mdisibio mdisibio commented Oct 31, 2025

What this PR does:
This fixes a potential hash collision of StringArray static MapKey. Similar to #5716 (comment) we need to write a separator between values, so that "foo","bar" hashes to a different value than "foobar". Also ports the missing clone of attribute buffers to vparquet5, which was causing panics that caused me to come across this potential hash collision.

A few other changes:

  • Removes unnecessary computations of MapKey in the compare() function (we only need to do it once). This is about a 4% improvement in memory.
  • Replaces unsafe.Slice/unsafe.StringData with []byte(string) which the compiler can optimize in this case to be zero alloc (confirmed with benchmark).

Which issue(s) this PR fixes:
Fixes n/a

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

…re-computations in compare() function, replace usage of unsafe.StringData
Comment on lines 755 to 756
_, _ = h.Write(seedBytes)
_, _ = h.Write([]byte(str))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use a similar approach for avoiding hash collisions for the dedicated columns in block_meta.go:L377
I'm not sure what the best method is, but it would be nice to use the same in both places

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify - are you thinking the same seed/separator bytes is enough? Or perhaps shared code. There is a small difference here in that an empty string is ok for attribute values, but not for attribute name in the dedicated columns config.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, just suggesting to use the same separator and seed for consistency, not necessarily sharing the same code. It could also be a nice touch to switch from xxhash to fnv (as we don't use xxhash anywhere else).
Mainly suggesting this to make things a bit more consistent. Totally fine if you think it’s out of scope for this PR - don’t want to block it

@mdisibio mdisibio merged commit 454dd73 into grafana:main Nov 12, 2025
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants