Skip to content

Conversation

@yaananth
Copy link

@yaananth yaananth commented Nov 26, 2025

Disclaimer: Used copilot to investigate the crash we faced and took it's help to write this

Summary

Fixes a SIGSEGV crash in the Azure Kusto output plugin when processing buffered backlog data on startup.

Root Cause

The ingest_all_chunks() function had nested mk_list_foreach_safe loops that both used the same tmp variable as the iterator:

mk_list_foreach_safe(head, tmp, &ctx->fs->streams) {     // outer loop
    ...
    mk_list_foreach_safe(f_head, tmp, &fs_stream->files) {  // inner loop - SAME tmp!

The mk_list_foreach_safe macro stores the "next" pointer in its second argument for safe iteration during list modification. When the inner loop overwrote tmp, it corrupted the outer loop's iteration state, causing undefined behavior and eventually a SIGSEGV when the outer loop tried to continue iteration with a corrupted pointer.

Crash Stack Trace (from production)

[engine] caught signal (SIGSEGV)
#0 flush_init() at plugins/out_azure_kusto/azure_kusto.c:1169
#1 cb_azure_kusto_flush() at plugins/out_azure_kusto/azure_kusto.c:1257
#2 co_init() at lib/monkey/deps/flb_libco/amd64.c:117

Fix

Add a dedicated f_tmp variable for the inner loop to prevent iterator corruption:

struct mk_list *tmp;
struct mk_list *f_tmp;  // NEW: dedicated inner loop iterator
...
mk_list_foreach_safe(head, tmp, &ctx->fs->streams) {
    ...
    mk_list_foreach_safe(f_head, f_tmp, &fs_stream->files) {  // Use f_tmp

Testing

  • Added flb_test_azure_kusto_buffering_backlog regression test that exercises the buffering/backlog restart code path
  • All existing azure_kusto tests continue to pass
  • Verified build compiles without warnings
Test json_invalid...                            [ OK ]
Test managed_identity_system...                 [ OK ]
Test managed_identity_user...                   [ OK ]
Test service_principal...                       [ OK ]
Test workload_identity...                       [ OK ]
Test buffering_backlog...                       [ OK ]
SUCCESS: All unit tests have passed.

Changes

  • plugins/out_azure_kusto/azure_kusto.c: Add f_tmp iterator variable for inner loop (+2 lines, -1 line)
  • tests/runtime/out_azure_kusto.c: Add buffering backlog regression test (+73 lines)

Summary by CodeRabbit

  • Refactor

    • Improved internal iteration safety and stability; no public API changes.
  • Tests

    • Added an end-to-end buffering/backlog regression test that verifies two-phase write-and-restart behavior with cleanup.
    • Added runtime helpers and test scaffolding to support buffered-backlog testing and temporary resource cleanup.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link

coderabbitai bot commented Nov 26, 2025

Walkthrough

Updated the Azure Kusto output plugin to use a dedicated inner safety iterator variable; added a new runtime test that verifies buffering/backlog processing with a two-phase Fluent Bit run and accompanying recursive cleanup helpers.

Changes

Cohort / File(s) Summary
Iterator variable fix
plugins/out_azure_kusto/azure_kusto.c
Introduced a local f_tmp variable and switched the inner mk_list_foreach_safe to use f_tmp as the safety iterator while retaining f_head for element access. No public API changes.
Buffering backlog test & helpers
tests/runtime/out_azure_kusto.c
Added portability guards and new system headers, implemented flb_kusto_unlink_cb and flb_kusto_rm_rf helpers for recursive removal, declared and implemented flb_test_azure_kusto_buffering_backlog, and registered the test in TEST_LIST. The test runs twice (create disk-backed backlog, restart to process backlog) and performs buffer-dir cleanup.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Areas to focus on:
    • Correctness of mk_list_foreach_safe usage with the new f_tmp iterator.
    • Test reliability: timing, deterministic waits, and teardown.
    • Portability and correctness of recursive file removal helpers (flb_kusto_rm_rf).

Suggested labels

backport to v4.1.x

Suggested reviewers

  • edsiper
  • koleini

Poem

🐇 I hopped through loops and found a tidy spot,

f_tmp kept watch where list edges are fraught.
I tucked some carrots on disk with care,
Restarted the burrow — backlog cleared from there.
Happy hops for tests and tidy code, so spry!

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and accurately summarizes the main change: fixing a SIGSEGV crash caused by nested mk_list_foreach_safe loops in the Azure Kusto plugin.
Docstring Coverage ✅ Passed Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 03323cf and 5602e1f.

📒 Files selected for processing (1)
  • tests/runtime/out_azure_kusto.c (5 hunks)
🧰 Additional context used
🧠 Learnings (6)
📚 Learning: 2025-11-21T06:23:29.770Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11171
File: include/fluent-bit/flb_lib.h:52-53
Timestamp: 2025-11-21T06:23:29.770Z
Learning: In Fluent Bit core (fluent/fluent-bit repository), function descriptions/documentation are not required for newly added functions in header files.

Applied to files:

  • tests/runtime/out_azure_kusto.c
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.

Applied to files:

  • tests/runtime/out_azure_kusto.c
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.

Applied to files:

  • tests/runtime/out_azure_kusto.c
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like #ifdef FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.

Applied to files:

  • tests/runtime/out_azure_kusto.c
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.

Applied to files:

  • tests/runtime/out_azure_kusto.c
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit, the correct CMake flag for using system librdkafka is `FLB_PREFER_SYSTEM_LIB_KAFKA=ON`.

Applied to files:

  • tests/runtime/out_azure_kusto.c
🔇 Additional comments (6)
tests/runtime/out_azure_kusto.c (6)

3-5: LGTM!

Proper portability guard for POSIX.1-2008 features (nftw, stat, etc.) used by the test helpers.


27-31: LGTM!

All required headers for the test helpers and buffering_backlog test are present.


35-38: LGTM!

The nftw callback signature is correct and the implementation is appropriate for test cleanup.


40-49: LGTM!

The recursive removal logic is correct for test cleanup. The stat() check makes it idempotent, and the FTW_DEPTH | FTW_PHYS flags are appropriate for safe deletion.


59-59: LGTM!

Test prototype and registration follow the standard Fluent Bit test pattern.

Also applies to: 68-68


243-330: Well-structured regression test for the nested loop SIGSEGV fix.

The two-phase approach effectively exercises the backlog processing path (ingest_all_chunks) that previously crashed due to nested mk_list_foreach_safe with shared iterators. Key strengths:

  • PID-unique buffer path ensures test isolation
  • Cleanup before and after prevents stale data interference
  • First run buffers data to disk, second run processes backlog on startup
  • If the SIGSEGV occurs, the test will crash, validating the fix
  • Past review concerns about directory cleanup and mkdir error handling have been addressed

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@yaananth yaananth force-pushed the fix/azure-kusto-nested-loop-sigsegv branch from 8d08559 to 55130bb Compare November 26, 2025 11:15
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
tests/runtime/out_azure_kusto.c (1)

217-286: Consider adding test directory cleanup for reliability.

The test uses a hardcoded buffer directory /tmp/fluent-bit-kusto-test/ without cleanup. Leftover files from previous runs could cause flaky behavior or mask test failures.

Consider adding cleanup before and after the test:

 void flb_test_azure_kusto_buffering_backlog(void)
 {
     int i;
     int ret;
     int bytes;
     char sample[] = "{\"k\":\"v\"}";
     size_t sample_size = sizeof(sample) - 1;
     flb_ctx_t *ctx;
     int in_ffd;
     int out_ffd;
+    
+    /* Clean up any leftover files from previous runs */
+    system("rm -rf /tmp/fluent-bit-kusto-test/");
 
     /* First run: enable buffering and write data to disk */

And at the end:

     flb_stop(ctx);
     flb_destroy(ctx);
+
+    /* Clean up test directory */
+    system("rm -rf /tmp/fluent-bit-kusto-test/");
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ae14454 and 8d08559.

📒 Files selected for processing (2)
  • plugins/out_azure_kusto/azure_kusto.c (2 hunks)
  • tests/runtime/out_azure_kusto.c (3 hunks)
🔇 Additional comments (2)
plugins/out_azure_kusto/azure_kusto.c (1)

410-431: LGTM! Correct fix for the nested loop iterator collision.

The introduction of f_tmp as a dedicated temporary iterator for the inner mk_list_foreach_safe loop properly resolves the SIGSEGV. Previously, both loops shared tmp, causing the inner loop to overwrite the outer loop's saved "next" pointer, corrupting iteration state upon returning to the outer loop.

tests/runtime/out_azure_kusto.c (1)

33-33: LGTM! Forward declaration and test list entry properly added.

Also applies to: 42-42

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
tests/runtime/out_azure_kusto.c (1)

244-245: Consider test isolation and cleanup for the buffer directory.

The hardcoded path /tmp/fluent-bit-kusto-test/ may cause issues with parallel test runs or leave stale files between runs. Consider:

  1. Using a unique temp directory (e.g., with mkdtemp pattern or PID suffix).
  2. Cleaning up the buffer directory at test start and/or end.

For a regression test focused on preventing SIGSEGV, this is a minor concern, but cleanup would prevent accumulation of test artifacts.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8d08559 and 55130bb.

📒 Files selected for processing (2)
  • plugins/out_azure_kusto/azure_kusto.c (2 hunks)
  • tests/runtime/out_azure_kusto.c (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • plugins/out_azure_kusto/azure_kusto.c
🧰 Additional context used
🧬 Code graph analysis (1)
tests/runtime/out_azure_kusto.c (1)
src/flb_lib.c (10)
  • flb_stop (1011-1055)
  • flb_destroy (228-263)
  • flb_create (143-225)
  • flb_service_set (652-678)
  • flb_input (266-276)
  • flb_input_set (305-335)
  • flb_output (279-289)
  • flb_output_set (520-551)
  • flb_start (983-994)
  • flb_lib_push (843-870)
🔇 Additional comments (2)
tests/runtime/out_azure_kusto.c (2)

33-33: LGTM!

The function declaration and TEST_LIST registration follow the existing patterns in this file.

Also applies to: 42-42


260-285: Regression test structure looks correct for its purpose.

The two-phase approach (write buffered data, restart to process backlog) correctly exercises the ingest_all_chunks code path where the nested loop SIGSEGV occurred. If the test completes without crashing, the fix is validated.

For enhanced verification, you could optionally add a comment clarifying that the test's success criterion is "no crash" rather than output validation.

@yaananth yaananth force-pushed the fix/azure-kusto-nested-loop-sigsegv branch from 55130bb to ea67b3e Compare November 26, 2025 11:19
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/runtime/out_azure_kusto.c (1)

280-283: Consider adding verification that backlog processing occurred.

While the test successfully exercises the buffering/backlog code path (its primary goal is crash prevention), it could be more robust by verifying that:

  • Buffered data was actually loaded from disk
  • Backlog processing completed
  • The buffer directory state changed appropriately

This is optional since the test's main purpose—ensuring no SIGSEGV—is achieved if execution completes successfully.

Example verification (optional):

/* After line 283, verify buffer directory state or add logging check */
/* Note: This would require inspecting flb logs or buffer directory contents */
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 55130bb and ea67b3e.

📒 Files selected for processing (2)
  • plugins/out_azure_kusto/azure_kusto.c (2 hunks)
  • tests/runtime/out_azure_kusto.c (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • plugins/out_azure_kusto/azure_kusto.c
🔇 Additional comments (1)
tests/runtime/out_azure_kusto.c (1)

33-33: LGTM!

The function prototype and test list entry follow the established pattern in the file.

Also applies to: 42-42

The ingest_all_chunks() function had nested mk_list_foreach_safe loops
that both used the same 'tmp' variable as the iterator. The macro stores
the 'next' pointer in its second argument for safe iteration during list
modification. When the inner loop overwrote 'tmp', it corrupted the outer
loop's iteration state, causing undefined behavior and a SIGSEGV crash
when processing buffered backlog data on startup.

Fix: Add a dedicated 'f_tmp' variable for the inner loop to prevent
iterator corruption.

Also adds a regression test (buffering_backlog) that exercises the
buffering/backlog restart code path to guard against future regressions.

Signed-off-by: Yash Ananth <[email protected]>
Signed-off-by: Yashwanth Anantharaju <[email protected]>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
tests/runtime/out_azure_kusto.c (2)

40-49: Consider checking nftw return value.

The nftw return value is not checked. While this is acceptable for cleanup code, checking it and logging failures would improve debuggability if cleanup doesn't complete.

Apply this diff to add return value checking:

 static void flb_kusto_rm_rf(const char *path)
 {
+    int ret;
     struct stat st;
 
     if (stat(path, &st) != 0) {
         return;
     }
 
-    nftw(path, flb_kusto_unlink_cb, 64, FTW_DEPTH | FTW_PHYS);
+    ret = nftw(path, flb_kusto_unlink_cb, 64, FTW_DEPTH | FTW_PHYS);
+    if (ret != 0) {
+        fprintf(stderr, "Warning: nftw cleanup failed for %s\n", path);
+    }
 }

286-291: Consider verifying that buffered chunks were created.

The test could verify that files were written to buffer_dir after phase 1 to confirm buffering actually occurred. This would make the test more robust and explicit about what it's testing.

Add verification after phase 1 (after line 294):

    /* Verify that buffering wrote files to disk */
    struct stat st;
    ret = stat(buffer_dir, &st);
    TEST_CHECK(ret == 0);
    /* Could also check that directory is not empty using readdir */
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e9e2389 and 03323cf.

📒 Files selected for processing (1)
  • tests/runtime/out_azure_kusto.c (5 hunks)
🧰 Additional context used
🧠 Learnings (7)
📚 Learning: 2025-11-21T06:23:29.770Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11171
File: include/fluent-bit/flb_lib.h:52-53
Timestamp: 2025-11-21T06:23:29.770Z
Learning: In Fluent Bit core (fluent/fluent-bit repository), function descriptions/documentation are not required for newly added functions in header files.

Applied to files:

  • tests/runtime/out_azure_kusto.c
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.

Applied to files:

  • tests/runtime/out_azure_kusto.c
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.

Applied to files:

  • tests/runtime/out_azure_kusto.c
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like #ifdef FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.

Applied to files:

  • tests/runtime/out_azure_kusto.c
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.

Applied to files:

  • tests/runtime/out_azure_kusto.c
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit, the correct CMake flag for using system librdkafka is `FLB_PREFER_SYSTEM_LIB_KAFKA=ON`.

Applied to files:

  • tests/runtime/out_azure_kusto.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.

Applied to files:

  • tests/runtime/out_azure_kusto.c
🧬 Code graph analysis (1)
tests/runtime/out_azure_kusto.c (1)
src/flb_lib.c (10)
  • flb_create (143-225)
  • flb_service_set (652-678)
  • flb_input (266-276)
  • flb_input_set (305-335)
  • flb_output (279-289)
  • flb_output_set (520-551)
  • flb_start (983-994)
  • flb_lib_push (843-870)
  • flb_stop (1011-1055)
  • flb_destroy (228-263)
🔇 Additional comments (4)
tests/runtime/out_azure_kusto.c (4)

3-5: LGTM!

The _XOPEN_SOURCE 700 definition is necessary for nftw() (used in the cleanup helpers) and properly guarded against redefinition.


27-31: LGTM!

All includes are necessary for the new cleanup helpers and test implementation (getpid, stat, mkdir, nftw, PATH_MAX).


59-59: LGTM!

Test prototype and registration follow the established pattern in this test file.


243-325: Effective regression test for the SIGSEGV fix.

The two-phase approach correctly simulates the restart scenario that triggered the nested loop bug. The test exercises the ingest_all_chunks() code path with buffered backlog, which is the critical path for this regression.

Using PID for the buffer directory effectively addresses the previous review concern about hardcoded paths and test isolation.

Signed-off-by: Yashwanth Anantharaju <[email protected]>
@yaananth yaananth force-pushed the fix/azure-kusto-nested-loop-sigsegv branch from 03323cf to 5602e1f Compare December 4, 2025 21:27
@yaananth
Copy link
Author

@edsiper @cosmo0920 👋 any feedback on this PR? thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant