@puzpuzpuz puzpuzpuz commented Jan 9, 2026

The race condition occurs when both the per-CPU arena and background thread features are enabled:

  1. Thread A calls malloc() first -> starts malloc_init_hard()
  2. While Thread A is still in malloc_init_hard(), Threads B, C, D also call malloc()
  3. These threads trigger arena creation (with percpu_arena, each CPU gets its own arena)
  4. arena_init() -> arena_new_create_background_thread() -> background_thread_create(arena_ind=12, etc.)
  5. Thread A hasn't reached line 2232 (background_thread_create(tsd, 0)) yet
  6. So a non-zero arena can mark thread slot 0 as "started" before arena 0 does

Initialization is protected by locks, but arenas can be created in parallel as soon as malloc_init_state = initialized and init_lock is released. The background_thread_create(tsd, 0) call happens late in malloc_init_hard(), after other threads have already started creating arenas.
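For context, here is a sketch of how the two features involved in the race are commonly switched on at run time. It assumes a stock jemalloc build that reads the MALLOC_CONF environment variable; builds configured with a symbol prefix may read a differently named variable, and QuestDB's rt binary may configure jemalloc some other way. The option names percpu_arena and background_thread are jemalloc's documented settings.

```shell
# Assumption: an unprefixed jemalloc build that reads MALLOC_CONF at startup.
# percpu_arena:percpu     -> threads use an arena tied to the CPU they run on
# background_thread:true  -> enable jemalloc's background purging threads
export MALLOC_CONF="percpu_arena:percpu,background_thread:true"
```

With only one of the two options set, the sequence described above should not apply, since the PR states that the race needs both features.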

This bug won't be noticeable in applications that have a single-threaded initialization path, but in the case of the JVM I observed the following threads allocating concurrently at startup:

| Thread | Description |
| --- | --- |
| java | Main JVM thread |
| C1 CompilerThre | JIT C1 compiler thread |
| JVMCI-native Co | GraalVM JVMCI compiler thread |
| GC Thread#5, 10, 15 | G1 garbage collector threads |
| shared-network_ | QuestDB network worker threads |
| Thread-1 | Generic Java thread |

Reproducer

Can be reproduced with QuestDB's rt binary on Linux and the following script:

#!/bin/bash

cd <project_dir>/target/questdb-9.2.4-SNAPSHOT-rt-linux-x86-64

for i in {1..50}; do
  echo "=== Run $i ==="
  ./bin/questdb.sh start -d <qdb_root>
  sleep 2
  # pgrep may match several processes; inspect only the first one.
  pid=$(pgrep -f QuestDB-Runtime | head -n 1)
  if [ -n "$pid" ]; then
    count=$(sudo gdb -p "$pid" -batch -ex "info threads" 2>/dev/null | grep -c jemalloc_bg_thd)
    echo "jemalloc_bg_thd threads: $count"
  else
    echo "QuestDB not running"
  fi
  ./bin/questdb.sh stop
  sleep 1
done

When "jemalloc_bg_thd threads: 0" is printed, jemalloc's background threads failed to start because of the race.
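The same check can be sketched without gdb or sudo by reading thread names from procfs. This is only an illustration: count_named_threads is a hypothetical helper, not part of the PR, and it assumes a Linux /proc layout in which each thread's name is exposed at /proc/<pid>/task/<tid>/comm and that the process belongs to the current user.

```shell
#!/bin/bash

# Hypothetical helper: count the threads of process $1 whose name equals $2.
# $3 optionally overrides the procfs root (defaults to /proc); it exists only
# so the function can be exercised against a fake directory tree.
count_named_threads() {
  local pid=$1 name=$2 proc_root=${3:-/proc}
  local count=0 comm_file thread_name
  for comm_file in "$proc_root/$pid"/task/*/comm; do
    # Skips the unexpanded glob (no such process) and threads that exited mid-scan.
    [ -r "$comm_file" ] || continue
    thread_name=$(cat "$comm_file" 2>/dev/null) || continue
    if [ "$thread_name" = "$name" ]; then
      count=$((count + 1))
    fi
  done
  echo "$count"
}
```

Inside the loop above, the gdb line could then be swapped for count=$(count_named_threads "$pid" jemalloc_bg_thd). The comparison is an exact match on the thread name, whereas the original grep -c counts matching lines of gdb output.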

@puzpuzpuz puzpuzpuz self-assigned this Jan 9, 2026
@puzpuzpuz
Author

To run tests:

./autogen.sh
make
make check

@puzpuzpuz puzpuzpuz added the bug Something isn't working label Jan 9, 2026
@puzpuzpuz puzpuzpuz marked this pull request as ready for review January 9, 2026 21:23
@puzpuzpuz puzpuzpuz force-pushed the puzpuzpuz_fix_startup_race branch from c2fb6f9 to 0260423 Compare January 12, 2026 10:47
