BinaryMemoTable.Exists causes heap allocation per call due to closure escape #736

@laskoviymishka

Description

Describe the enhancement requested

Found this during research around apache/iceberg-go#818 (comment)
The is_in kernel for binary types allocates once per input element because the []byte value escapes to the heap through a closure chain. This makes is_in on binary arrays ~2x slower than a manual map[string]struct{} lookup.

The call chain in the is_in hot path:

  1. visitBinary slices rawBytes[offsets[pos]:offsets[pos+1]] and passes it to valid(slice) callback
  2. The callback in isInKernelExec calls state.Lookup.Exists(v)
  3. BinaryMemoTable.Exists calls b.lookup(hash, val)
  4. lookup calls b.tbl.Lookup(h, func(i int32) bool { return bytes.Equal(val, ...) })

The closure at step 4 captures val []byte. Since it's passed to HashTable.Lookup (which takes cmp func(T) bool), Go's escape analysis conservatively moves val to the heap. This propagates up — the slice created at step 1 also escapes.

Result: 1 heap allocation per input element, regardless of whether the value exists in the set.
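The escape chain can be reproduced in a contrived but self-contained sketch (not Arrow code; the `sink` assignment stands in for the deeper call chain that makes the closure leak, and `testing.AllocsPerRun` makes the per-call allocation visible):

```go
package main

import (
	"bytes"
	"fmt"
	"testing"
)

// table mimics the HashTable in the issue: lookup takes a comparison closure.
type table struct {
	entries [][]byte
}

// sink stands in for the deeper call chain: anything that makes the
// closure leak has the same effect on escape analysis.
var sink func(int) bool

func (t *table) lookup(cmp func(int) bool) bool {
	sink = cmp // forces cmp (and everything it captures) to the heap
	for i := range t.entries {
		if cmp(i) {
			return true
		}
	}
	return false
}

// exists mirrors BinaryMemoTable.Exists: val is captured by the closure,
// so the sub-slice the caller passes in escapes too.
func (t *table) exists(val []byte) bool {
	return t.lookup(func(i int) bool { return bytes.Equal(val, t.entries[i]) })
}

// measure reports heap allocations per exists call.
func measure() float64 {
	t := &table{entries: [][]byte{[]byte("a"), []byte("b")}}
	raw := []byte("xab")
	return testing.AllocsPerRun(1000, func() {
		_ = t.exists(raw[1:2]) // per-element sub-slice, like visitBinary
	})
}

func main() {
	fmt.Printf("allocs per exists call: %.0f\n", measure())
}
```

Running this shows at least one allocation per call even though the probed value never outlives the lookup.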

Evidence

Profiling is_in on a 1M-element Binary array with a 10-element value set:

255203025 94.92%  BinaryMemoTable.Exists    (flat alloc_objects)
 10726303  3.99%  BinaryMemoTable.InsertOrGet

~1M allocs from Exists alone, matching the input size exactly.

Validated Fix

Using the noescape unsafe.Pointer trick (the same pattern as the Go runtime's runtime/stubs.go):

// noescape hides p from the compiler's escape analysis, exactly as in
// runtime/stubs.go: the xor with 0 is an identity the compiler cannot
// see through, so the escape edge is broken. Only safe when the pointer
// is not retained past the call.
func noescape(p unsafe.Pointer) unsafe.Pointer {
    x := uintptr(p)
    return unsafe.Pointer(x ^ 0)
}

// noescapeBytes rebuilds val through noescape so that capturing the
// result in a closure no longer forces the original slice to the heap.
// Requires Go 1.20+ for unsafe.SliceData.
func noescapeBytes(val []byte) []byte {
    p := noescape(unsafe.Pointer(unsafe.SliceData(val)))
    return unsafe.Slice((*byte)(p), len(val))
}

Combined with isInBinaryDirect that iterates the binary array directly (bypassing visitBinary/VisitBitBlocksShort closure chain) and calls ExistsDirect which inlines the hash probe loop:

Before: 1,000,231 allocs/op at 1M rows
After: 225 allocs/op at 1M rows (4,445x reduction)

Runtime also dropped ~30% (51ms -> 35ms for 1M binary keys against a 10-element value set).
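For illustration, a minimal sketch of how ExistsDirect and noescapeBytes might combine. This is not the real BinaryMemoTable: the mock memoTable below uses a linear scan in place of the actual hash probe loop, and everything except the names taken from the description above is hypothetical:

```go
package main

import (
	"bytes"
	"fmt"
	"unsafe"
)

// noescape hides p from escape analysis (runtime/stubs.go pattern).
// Only safe when the pointer is not retained past the call.
func noescape(p unsafe.Pointer) unsafe.Pointer {
	x := uintptr(p)
	return unsafe.Pointer(x ^ 0)
}

// noescapeBytes rebuilds val so capturing it no longer forces a heap move.
func noescapeBytes(val []byte) []byte {
	if len(val) == 0 {
		return nil
	}
	p := noescape(unsafe.Pointer(unsafe.SliceData(val)))
	return unsafe.Slice((*byte)(p), len(val))
}

// memoTable is a stand-in for BinaryMemoTable.
type memoTable struct {
	values [][]byte
}

// ExistsDirect sketches the proposed API: the probe loop is inlined here
// instead of going through HashTable.Lookup's cmp closure, and val is
// wrapped with noescapeBytes so the caller's sub-slice does not escape.
func (m *memoTable) ExistsDirect(val []byte) bool {
	v := noescapeBytes(val)
	for _, cand := range m.values { // stand-in for the real hash probe loop
		if bytes.Equal(v, cand) {
			return true
		}
	}
	return false
}

func main() {
	m := &memoTable{values: [][]byte{[]byte("ab"), []byte("cd")}}
	raw := []byte("xxabyy")
	fmt.Println(m.ExistsDirect(raw[2:4])) // hit
	fmt.Println(m.ExistsDirect(raw[0:2])) // miss
}
```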

Suggested Fix Options

  1. noescape trick (validated, minimal change) — add ExistsDirect to BinaryMemoTable + isInBinaryDirect to the kernel
  2. Restructure HashTable.Lookup to avoid the cmp func(T) bool closure — e.g., add LookupEqual(hash uint64, val T, eq func(a, b T) bool) where the comparator is a package-level function (won't capture locals)
  3. Change BinaryMemoTable.Exists signature to accept (buf []byte, offset, length int) to avoid sub-slicing
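Option 2 could look roughly like the following sketch (a mock hashTable with a linear scan standing in for the real probe, and LookupEqual instantiated for []byte rather than a generic T). Because eqBytes is package-level, taking its address creates no closure object and nothing is captured:

```go
package main

import (
	"bytes"
	"fmt"
)

// hashTable is a stand-in for the real hash table.
type hashTable struct {
	keys [][]byte
}

// Lookup mirrors the current API: cmp is a closure, so any locals it
// captures (such as the probed value) may be forced to the heap.
func (h *hashTable) Lookup(hash uint64, cmp func(i int32) bool) bool {
	for i := range h.keys {
		if cmp(int32(i)) {
			return true
		}
	}
	return false
}

// LookupEqual is the option-2 shape: the value travels as an explicit
// argument and eq is a plain function value, so no locals are captured.
func (h *hashTable) LookupEqual(hash uint64, val []byte, eq func(a, b []byte) bool) bool {
	for i := range h.keys {
		if eq(val, h.keys[i]) {
			return true
		}
	}
	return false
}

// eqBytes is package-level: passing it allocates no closure.
func eqBytes(a, b []byte) bool { return bytes.Equal(a, b) }

func main() {
	h := &hashTable{keys: [][]byte{[]byte("k1"), []byte("k2")}}
	fmt.Println(h.LookupEqual(0, []byte("k2"), eqBytes)) // hit
	fmt.Println(h.LookupEqual(0, []byte("zz"), eqBytes)) // miss
}
```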

Reproduction

Any is_in call with Binary type input will show ~1 alloc per element in benchmarks.

Component(s)

Benchmarking
