Skip to content

Conversation

@RomanSofyin
Copy link

When fsync=N parameter specified and the final IO block issued happens to be Nth the fsync should follow it is missed. This update checks if it needs to loop again in do_io to issue that final fsync when all the IO work is done.

Signed-off-by: Roman Sofin [email protected]

Copy link
Author

@RomanSofyin RomanSofyin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed the issues.

@RomanSofyin RomanSofyin requested a review from axboe August 7, 2024 17:36
@axboe
Copy link
Owner

axboe commented Aug 7, 2024

Addressed the issues.

Please squash the two commits into one, we should not have fix-up commits in the history. Once you do, I'll take a look again.

@RomanSofyin RomanSofyin force-pushed the r.sofin/fsync-fixup branch from b1c0bb5 to 2b11473 Compare August 8, 2024 07:45
@RomanSofyin
Copy link
Author

Hi @axboe, can you look at this again?

@axboe
Copy link
Owner

axboe commented Aug 12, 2024

You still have the should_fsync() change in there. If you want to have it take a const td, then please do a prep patch with just that. Like I said initially, it's not related to your fix at all outside of you adding a new caller that needs it done. That's by definition a prep patch, should not be included with a fix.

@RomanSofyin
Copy link
Author

I created PR #1805 which I guess we merge first, then I update the current one in case there will be a conflict.

@axboe
Copy link
Owner

axboe commented Aug 13, 2024

Since the change depends on the other one, please just have both commits in a single pr.

@RomanSofyin
Copy link
Author

Should be fine now. If so, I'll close #1805.

@RomanSofyin
Copy link
Author

@axboe does this look fine?

Copy link
Collaborator

@sitsofe sitsofe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RomanSofyin Looks good! Some additional comments:

Can you start your commit titles with a section? So for the first commit

io_u: make should_fsync() take a const

Make should_fsync() safer by using a pointer to a constant thread_data
structure instance.

<your signed off by>

and for the second commit just a bit of minor rewording:

io_u/backend: fix missing final fsync() when using fsync=N

When fsync=N is specified and the final I/O issued happens to be the Nth, the 
following fsync() is missed. 

This update checks if it needs to loop again in do_io() to issue that final
fsync when all the IO work is done.

<your signed off by>

@sitsofe sitsofe added the needreporterinfo Waiting on information from the issue reporter label Aug 23, 2025
Make should_fsync() safer by using a pointer to a constant thread_data
structure instance.

Signed-off-by: Roman Sofin [email protected]
@RomanSofyin RomanSofyin requested a review from sitsofe August 25, 2025 07:18
Copy link
Collaborator

@sitsofe sitsofe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vincentkfu, @axboe This LGTM.

@sitsofe sitsofe removed the needreporterinfo Waiting on information from the issue reporter label Aug 25, 2025
@RomanSofyin
Copy link
Author

I see build-containers (oracle, oraclelinux, 9, x86_64) has failed, is this smth expected?

@sitsofe
Copy link
Collaborator

sitsofe commented Aug 25, 2025

@RomanSofyin I kicked off a rebuild of that job and it passed so it seems like it was a transient error.

int do_io_u_sync(const struct thread_data *, struct io_u *);
int do_io_u_trim(struct thread_data *, struct io_u *);

bool ddir_is_sync(const struct thread_data *);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd personally make that static inline, and require that get_sync_ddir() is called only if should_fsync() is true already. That skips a function call in the hot issue path loop for each IO, even if fsync hasn't been set at all.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you suggested I could call get_sync_ddir() only if should_fsync() is true:

bool ddir_is_sync(const struct thread_data *td)
{
	return should_fsync(td) && get_sync_ddir(td) != DDIR_INVAL;
}

But I don't quite get how you want me to make ddir_is_sync static. Since ddir_is_sync gets called from backend.c it needs to be either in backend.c or in io_u.h to have it static. But ddir_is_sync calls should_fsync and get_sync_ddir which are static as well so can't be referred from anywhere but io_u.c.

Can you please clarify this?

Copy link
Collaborator

@sitsofe sitsofe Aug 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RomanSofyin if you make ddir_is_sync static inline the whole definition can live in the io_u.h header (rather than being in io_u.c) and the whole thing will be copied inline into every place it used (see dprint_io_u() in io_u.h as an example).

Scratch that - your point about about not being able to see should_fsync and get_sync_ddir from the io_u.h header stands.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RomanSofyin I've chatted to @axboe and it should be sufficient to guard the call sites to ddir_is_sync() by a should_fsync() pre-check.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was also thinking if it should be a bit nicer condition being wrapped in a macro or inline function. I mean both components together: (should_fsync(td) && ddir_is_sync(td)).

backend.c Outdated
!td->o.read_iolog_file &&
(!td->o.time_based ||
(td->o.time_based && td->o.verify != VERIFY_NONE)))
((!td->o.time_based && !ddir_is_sync(td)) ||
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're a time based job won't we have already exited due to the runtime_exceeded check further up? Further, wouldn't we want to avoid checking that we need to add a trailing fsync in the case that we are a non-time based job but still had a non-zero runtime limit (but again wouldn't we have already exited by now)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

runtime_exceeded condition hits if timeout (aka runtime) limit of a job is exceeded, no matter if it's a time based job or not. So if a job gets this line of code, it's allowed to do all its stuff.

As I see this, we can't avoid checking that because for a non-time based (i.e. "size based") job we terminate the loop when the number of bytes requested is exceeded even if there is an fsync call to issue left.

int do_io_u_sync(const struct thread_data *, struct io_u *);
int do_io_u_trim(struct thread_data *, struct io_u *);

bool ddir_is_sync(const struct thread_data *);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RomanSofyin I've chatted to @axboe and it should be sufficient to guard the call sites to ddir_is_sync() by a should_fsync() pre-check.

backend.c Outdated

while ((td->o.read_iolog_file && !flist_empty(&td->io_log_list)) ||
(!flist_empty(&td->trim_list)) || !io_issue_bytes_exceeded(td) ||
ddir_is_sync(td) ||
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps move this to be the last case as it's hopefully less common.

When fsync=N is specified and the final I/O issued happens to be the Nth, the
following fsync() is missed.

This update checks if it needs to loop again in do_io() to issue that final
fsync when all the IO work is done.

Signed-off-by: Roman Sofin [email protected]
@RomanSofyin
Copy link
Author

@sitsofe a CI job seems to hang again, can you rerun it please? link

Also would you mind looking at the changes I delivered in 7b7f18c and tell if it's sufficient?

@sitsofe
Copy link
Collaborator

sitsofe commented Sep 11, 2025

@RomanSofyin

Also would you mind looking at the changes I delivered in 7b7f18c and tell if it's sufficient?

I'm currently on holiday but I'll take a proper look when they're over.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants