Skip to content

Conversation

@Abhijeet94
Copy link

@Abhijeet94 Abhijeet94 commented Nov 18, 2025

Description

Fixes log rotation reliability by ensuring rotation checks occur even during periods with no log traffic. Previously, log files would only rotate when writes occurred, causing unpredictable rotation timing during zero-traffic periods.

This implementation consolidates log maintenance operations (rotation and flush) into a single unified thread, improving efficiency and maintainability.

Main Changes

Unified maintenance thread: Consolidates rotation and flush operations into a single log-maintenance thread that:

  • Checks rotation every 1 second by calling Write("", 0) on registered contexts
  • Performs flush operations at the configured heartbeat.output-flush-interval (if enabled)
  • Wakes every 500ms for responsive shutdown handling

Registration system: Added RegisterContextForLogRotation() to manage contexts requiring rotation checks, with a linked list tracked via log_rotation_next pointers

Skip I/O for empty buffers: Modified write functions to perform I/O only when buffer_len > 0, allowing rotation checks without actual writes

Fixed double rotation bug: Ensured rotate_time is updated for both rotation_flag and interval-based rotations

Architecture

The solution reuses existing rotation mechanisms without duplicating logic, ensuring compatibility with all current rotation features (SIGHUP, rotate-interval, filename patterns). The heartbeat thread to flush is re-used to trigger log rotation as well.

Ticket

https://redmine.openinfosecfoundation.org/issues/8115

Make sure these boxes are checked accordingly before submitting your Pull Request -- thank you.

Contribution style:

Our Contribution agreements:

Changes (if applicable):

@github-actions
Copy link

NOTE: This PR may contain new authors.

@catenacyber
Copy link
Contributor

Thanks for the conribution, could you check the CI failure ? (SV test about a big dataset)

@github-actions
Copy link

NOTE: This PR may contain new authors.

…iods

Description

Fixes time-based log rotation reliability by ensuring rotation checks occur even during periods with no log traffic. Previously, log files would only rotate when writes occurred, causing unpredictable rotation timing during zero-traffic periods.

Main changes:

- Background rotation thread: Periodically triggers rotation checks by calling Write("", 0) on registered contexts which helps during zero-traffic periods

- Fixed double rotation bug: Ensured rotate_time is updated for both rotation_flag and interval-based rotations

- Registration system: Added RegisterTimeBasedLogRotation() to manage contexts requiring time-based rotation checks

- Skip I/O for empty buffers: Modified write functions to perform I/O only when buffer_len > 0, allowing rotation checks without actual writes

The solution reuses existing rotation mechanisms without duplicating logic, ensuring compatibility with all current rotation features (SIGHUP, rotate-interval, filename patterns).

Ticket: OISF#8115
@victorjulien
Copy link
Member

Hi @Abhijeet94 , there seems to be quite a bit of overlap with this work that was merged in 8.0: #12527. Can that be used/extended address the issue?

@github-actions
Copy link

NOTE: This PR may contain new authors.

@Abhijeet94
Copy link
Author

Hi @Abhijeet94 , there seems to be quite a bit of overlap with this work that was merged in 8.0: #12527. Can that be used/extended address the issue?

Hi @victorjulien , it seems like the other PR is more about flushing of in memory data to disk rather than dealing with file close/log rotation. These two things seem to be independent since former can be required without any log rotation and log rotation may need to be triggered even if there is nothing new to flush.

I checked if there is something that can be re-used. The other PR introduces a function WorkerFlushLogs in log-flush.c that is called periodically. My understanding is that log rotation should be different from log-flush, plus it needs objects/processes defined in util-logopenfile.c, hence can't merge the two functionalities in that same worker thread.

Let me know if you think otherwise, thank you!

@codecov
Copy link

codecov bot commented Nov 18, 2025

Codecov Report

❌ Patch coverage is 85.00000% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.21%. Comparing base (626027a) to head (823c31c).

Additional details and impacted files
@@           Coverage Diff           @@
##             main   #14343   +/-   ##
=======================================
  Coverage   84.20%   84.21%           
=======================================
  Files        1012     1012           
  Lines      261769   261804   +35     
=======================================
+ Hits       220415   220470   +55     
+ Misses      41354    41334   -20     
Flag Coverage Δ
fuzzcorpus 63.29% <25.00%> (-0.01%) ⬇️
livemode 18.77% <85.00%> (+0.04%) ⬆️
pcap 44.65% <85.00%> (-0.01%) ⬇️
suricata-verify 65.00% <86.44%> (+0.03%) ⬆️
unittests 59.22% <10.16%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@victorjulien
Copy link
Member

@jlucovsky any thoughts on this? In my mind both efforts revolve around doing house keeping task at some interval. I'm wondering if we can just expand the flush logic to issue the "flush packets" to do more than only flush, but also trigger rotation and possibly other things in the future.

@Abhijeet94
Copy link
Author

Let me make an attempt to re-use the housekeeping thread now that I know that I can refactor to increase its scope considerably.

…iods

Consolidates rotation (1s) and flush (configurable) into single maintenance
thread. Rotation via Write("", 0) on registered contexts with I/O skipped
for empty buffers. Fixes rotate_time update for all rotation types.

Redmine Ticket: 8115
@Abhijeet94
Copy link
Author

Do you know why the approval workflows are not running? Previously they used to start within a few hours. After my recent commit it hasn't started in a day

@victorjulien
Copy link
Member

Do you know why the approval workflows are not running? Previously they used to start within a few hours. After my recent commit it hasn't started in a day

We're all at suricon so paying less attention to github this week. I'll have a look now.

@github-actions
Copy link

NOTE: This PR may contain new authors.

@victorjulien victorjulien marked this pull request as draft November 24, 2025 09:40
@victorjulien
Copy link
Member

"Fixed double rotation bug: Ensured rotate_time is updated for both rotation_flag and interval-based rotations"

This should go into it's own commit.

@victorjulien
Copy link
Member

I've set the PR to draft as the git history is messy and in general we need a new PR when feedback is implemented.

// Threaded eve.json identifier
static SC_ATOMIC_DECL_AND_INIT_WITH_VAL(uint16_t, eve_file_id, 1);

// Log rotation globals
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we reuse / update output.cs

static SCMutex output_file_rotation_mutex = SCMUTEX_INITIALIZER;
 
TAILQ_HEAD(, OutputFileRolloverFlag_) output_file_rotation_flags =
    TAILQ_HEAD_INITIALIZER(output_file_rotation_flags);

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so (please correct me if I am wrong!). There are several log_ctx in the system. Some are configured to use rotation flag, some are using rotate interval only, some both, some none. The log_ctxs that use rotation flag are kept in the output_file_rotation_flags. But the issue of "Log rotation doesn't work during zero traffic" can happen with any kind of log_ctx.

So I think the list of output_file_rotation_flags will be separate from the one that would hold which ones need log rotation.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are injecting a pseudo packet to rotate as per below comment, I think we may be able to do without log_rotation_list as well. I will check that part.

LogFileCtx *ctx = log_rotation_list;

while (ctx != NULL) {
ctx->Write("", 0, ctx);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would still want to consider reusing more of the flush logic, so instead of flushing from the side here use the same inject flush packet logic so that the thread that owns the file can do the rotate itself

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am trying to understand the code for that but I am not sure how the flush itself is working.

In output.c, the FlushFunc inside typedef struct RootLogger_ never seems to get initialized. Further, module->PacketFlushFunc never seems to be used in the code. Am I missing something? Thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jlucovsky can you help explain things?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants