
Conversation

@Daksh14 Daksh14 commented Oct 19, 2025

Motivation

We are slowly adopting io_uring everywhere, so here I made the simple change of supporting fs::read; however, it might not be as simple as it looks. Let me know whether the unsafe code I used is correct.

We currently use the blocking std::fs::metadata call to obtain the file size for the buffer capacity, and extend the length of the vector by the number of bytes read reported in the CQE. This implementation sounds good on paper to me.
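
A minimal sketch of that shape (all names are illustrative stand-ins, not the PR's actual code; uring_read_at is a hypothetical helper standing in for the io_uring submission):

    use std::io;
    use std::path::Path;

    // Hypothetical stand-in for the io_uring read submission; it resolves
    // with the CQE result, i.e. how many bytes landed in the buffer.
    async fn uring_read_at(_buf: &mut Vec<u8>, _offset: u64) -> io::Result<usize> {
        unimplemented!("sketch only")
    }

    // Size the buffer from the blocking metadata call, then grow its
    // length by whatever each completion reports.
    async fn read_sketch(path: &Path) -> io::Result<Vec<u8>> {
        let size = std::fs::metadata(path)?.len();
        let mut buf = Vec::with_capacity(size as usize);
        let mut offset = 0u64;
        while (buf.len() as u64) < size {
            let n = uring_read_at(&mut buf, offset).await?;
            if n == 0 {
                break; // EOF earlier than the metadata promised
            }
            offset += n as u64;
        }
        Ok(buf)
    }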

Later we should implement an internal statx helper, either in this PR or a separate one, to make our uring implementation less painful to use, as this PR failed #7616.

Let's put the statx helper in a different PR to avoid merging an inefficient read implementation, given that io_uring is all about more efficient file IO.

Solution

Continue adopting io_uring.

strace of a tokio::fs::read call after this change:

io_uring_setup(256, {flags=0, sq_thread_cpu=0, sq_thread_idle=0, sq_entries=256, cq_entries=512, features=IORING_FEAT_SINGLE_MMAP|IORING_FEAT_NODROP|IORING_FEAT_SUBMIT_STABLE|IORING_FEAT_RW_CUR_POS|IORING_FEAT_CUR_PERSONALITY|IORING_FEAT_FAST_POLL|IORING_FEAT_POLL_32BITS|IORING_FEAT_SQPOLL_NONFIXED|IORING_FEAT_EXT_ARG|IORING_FEAT_NATIVE_WORKERS|IORING_FEAT_RSRC_TAGS|IORING_FEAT_CQE_SKIP|IORING_FEAT_LINKED_FILE|IORING_FEAT_REG_REG_RING|IORING_FEAT_RECVSEND_BUNDLE|IORING_FEAT_MIN_TIMEOUT|IORING_FEAT_RW_ATTR, sq_off={head=0, tail=4, ring_mask=16, ring_entries=24, flags=36, dropped=32, array=8256, user_addr=0}, cq_off={head=8, tail=12, ring_mask=20, ring_entries=28, overflow=44, cqes=64, flags=40, user_addr=0}}) = 9
mmap(NULL, 16384, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 9, 0x10000000) = 0xfaf0bf1e2000
mmap(NULL, 9280, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 9, 0) = 0xfaf0be71d000
epoll_ctl(5, EPOLL_CTL_ADD, 9, {events=EPOLLIN|EPOLLRDHUP|EPOLLET, data=0}) = 0
io_uring_enter(9, 1, 0, 0, NULL, 128)   = 1
futex(0xfaf0bf2557f0, FUTEX_WAIT_PRIVATE, 1, NULL) = 0
mmap(NULL, 2162688, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0xfaf0be50d000
mprotect(0xfaf0be51d000, 2097152, PROT_READ|PROT_WRITE) = 0
rt_sigprocmask(SIG_BLOCK, ~[], [], 8)   = 0
clone3({flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, child_tid=0xfaf0be71c150, parent_tid=0xfaf0be71c150, exit_signal=0, stack=0xfaf0be50d000, stack_size=0x20e960, tls=0xfaf0be71c7a0} => {parent_tid=[746758]}, 88) = 746758
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
futex(0xfaf0be71c850, FUTEX_WAKE_PRIVATE, 1) = 1
io_uring_enter(9, 1, 0, 0, NULL, 128)   = 1
futex(0xfaf0bf2557f0, FUTEX_WAIT_PRIVATE, 1, NULL) = -1 EAGAIN (Resource temporarily unavailable)
close(10)                               = 0

@Darksonn Darksonn added A-tokio Area: The main tokio crate M-fs Module: tokio/fs labels Oct 19, 2025
@Darksonn Darksonn changed the title from "Fs read io uring" to "fs: support io_uring with tokio::fs::read" Oct 20, 2025
let mut offset = 0;

while size_read < size {
    let left_to_read = (size - size_read) as u32;

Suggested change
    let left_to_read = (size - size_read) as u32;
    let left_to_read = u32::try_from(size - size_read).unwrap_or(u32::MAX);

To properly support files bigger than 4 GiB.
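
For context, a quick demonstration of the difference (a fact about Rust casts, not PR code): a plain `as` cast wraps modulo 2^32, while the suggested try_from form clamps.

    fn main() {
        // Pretend 5 GB remain to be read, i.e. a file bigger than 4 GiB.
        let left: u64 = 5_000_000_000;
        // `as` wraps modulo 2^32 and silently shrinks the read.
        assert_eq!(left as u32, 705_032_704);
        // The suggested form clamps to u32::MAX; the surrounding loop then
        // simply issues further reads for the remainder.
        assert_eq!(u32::try_from(left).unwrap_or(u32::MAX), u32::MAX);
    }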

Contributor Author

The max read size at a time is u32::MAX; we read the rest in subsequent iterations.

Contributor Author

In the future, if we know we're reading more than u32::MAX bytes, we can batch two read requests to avoid extra syscalls.

@Daksh14 Daksh14 force-pushed the fs_read_io_uring branch 7 times, most recently from 6237a4c to 636cfb8 Compare October 27, 2025 13:36
Member

@ADD-SP ADD-SP left a comment

This looks complicated, I will review it incrementally.

Member

@mox692 mox692 left a comment

I haven't checked all of the details in read_uring.rs, but I've left some comments on what I've noticed so far.

@Daksh14 Daksh14 force-pushed the fs_read_io_uring branch 2 times, most recently from e6c6ce7 to b9c3885 Compare November 2, 2025 17:12
@Daksh14 Daksh14 requested review from ADD-SP, martin-g and mox692 November 3, 2025 06:52
@Daksh14 Daksh14 requested a review from ADD-SP November 8, 2025 12:42
@Daksh14 Daksh14 force-pushed the fs_read_io_uring branch 4 times, most recently from 19bda66 to c1f8dc9 Compare November 12, 2025 11:35
@Daksh14 Daksh14 force-pushed the fs_read_io_uring branch 2 times, most recently from b29ed97 to 2acee44 Compare November 12, 2025 11:45
@Daksh14 Daksh14 force-pushed the fs_read_io_uring branch 2 times, most recently from 6261e0e to 974be76 Compare November 12, 2025 11:56

let (size_read, r_fd, mut r_buf) = op_read(fd, buf, offset, &mut read_len).await?;

r_buf.splice(back_bytes_len..back_bytes_len, temp_arr);
Contributor

We can handle allocation failure with an error rather than a panic by allocating before calling splice. Note that this doesn't allocate if size_read == 0, since the capacity is guaranteed to be sufficient in that case.

Suggested change
r_buf.splice(back_bytes_len..back_bytes_len, temp_arr);
r_buf.try_reserve(PROBE_SIZE)?;
r_buf.splice(back_bytes_len..back_bytes_len, temp_arr);
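
A self-contained sketch of that pattern (temp_arr and PROBE_SIZE are stand-ins for the PR's identifiers):

    use std::collections::TryReserveError;

    // Insert `extra` at index `at`, reporting OOM as an error instead of a
    // panic: try_reserve performs the fallible allocation up front, so the
    // splice below does not allocate as long as probe >= extra.len().
    fn splice_fallible(
        buf: &mut Vec<u8>,
        at: usize,
        extra: [u8; 4], // stand-in for `temp_arr`
        probe: usize,   // stand-in for `PROBE_SIZE`
    ) -> Result<(), TryReserveError> {
        buf.try_reserve(probe)?;
        buf.splice(at..at, extra);
        Ok(())
    }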

Contributor Author

@Daksh14 Daksh14 Nov 12, 2025

@Darksonn Will this splice write into the reserved area? Because we can only index within the length, not the capacity.

Contributor

No. The length of r_buf before the op_read is exactly back_bytes_len, so after the op_read we have back_bytes_len <= r_buf.len(), which is the requirement for using splice.
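
A tiny illustration of that requirement (values made up): the insertion index only has to satisfy index <= len, and the empty range i..i removes nothing.

    fn main() {
        let mut r_buf = vec![1u8, 2, 3, 4];
        let back_bytes_len = 2; // valid: back_bytes_len <= r_buf.len()
        // Splicing over the empty range inserts in place at that index.
        r_buf.splice(back_bytes_len..back_bytes_len, [9u8, 9]);
        assert_eq!(r_buf, [1, 2, 9, 9, 3, 4]);
    }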

Comment on lines 73 to 79
std::thread::spawn(move || {
    let rt: Runtime = rx.recv().unwrap();
    rt.shutdown_timeout(Duration::from_millis(300));
    done_tx.send(()).unwrap();
});

tx.send(rt).unwrap();
Contributor

This is equivalent to just moving the runtime.

Suggested change
std::thread::spawn(move || {
    let rt: Runtime = rx.recv().unwrap();
    rt.shutdown_timeout(Duration::from_millis(300));
    done_tx.send(()).unwrap();
});
tx.send(rt).unwrap();
std::thread::spawn(move || {
    rt.shutdown_timeout(Duration::from_millis(300));
    done_tx.send(()).unwrap();
});

If you wanted an actual sleep for the loop to make some progress, then what you have now does not achieve that.

Contributor Author

@Darksonn Should I sleep in the std::thread so the loop can make progress? Is that allowed, or should I avoid it?

Member

@ADD-SP ADD-SP Nov 13, 2025

  1. Poll the read(path).await manually to submit it to the kernel.
  2. Then increase the semaphore.
  3. Keep the spawned task pending forever.

So that we can wait on the semaphore and then shut down the runtime.

Member

@ADD-SP ADD-SP Nov 13, 2025

Does this work for the multi-thread runtime? For the current-thread runtime this won't work, but I believe we can apply a similar pattern there.

        rt.spawn(async move {
            let path = path[0].clone();
            let mut futs = vec![];

            // Spawn a bunch of uring operations, polling each once so it
            // is actually submitted to the kernel.
            for _ in 0..N {
                let path = path.clone();
                let cl = Arc::clone(&cl);
                let mut fut = Box::pin(read(path));
                poll_fn(|cx| {
                    assert_pending!(fut.as_mut().poll(cx));
                    Poll::Ready(())
                })
                .await;
                futs.push(fut);
            }

            pending_forever().await;
        });

        std::thread::spawn(move || {
            rt.shutdown_timeout(Duration::from_millis(300));
            done_tx.send(()).unwrap();
        });

        done_rx.recv().unwrap();

Comment on lines 136 to 141
poll_fn(|cx| {
    let fut = read(&path[0]);

    // If io_uring is enabled (and not falling back to the thread pool),
    // the first poll should return Pending.
    let _pending = Box::pin(fut).poll_unpin(cx);
Contributor

This code currently drops the read future right away. If you want the future to stay around until the handle.abort() call, then you need to modify this code.

Suggested change
poll_fn(|cx| {
    let fut = read(&path[0]);
    // If io_uring is enabled (and not falling back to the thread pool),
    // the first poll should return Pending.
    let _pending = Box::pin(fut).poll_unpin(cx);
let fut = read(&path[0]);
tokio::pin!(fut);
poll_fn(|cx| {
    // If io_uring is enabled (and not falling back to the thread pool),
    // the first poll should return Pending.
    let _pending = fut.as_mut().poll(cx);

Contributor Author

I have pushed a slightly different version of this.

@Daksh14 Daksh14 force-pushed the fs_read_io_uring branch 3 times, most recently from b599115 to 44f8f98 Compare November 14, 2025 19:40
let start_cap = buf.capacity();

// if buffer has no room and no size_hint, start with a small probe_read from 0 offset
if (size_hint.is_none() || size_hint == Some(0)) && buf.capacity() - buf.len() < PROBE_SIZE {
Contributor

I think we can just delete this `if`. If the size hint is zero or missing, then we always have buf.capacity() == 0, so deleting it falls through to the loop, which does the same thing as this block.

feature = "rt",
feature = "fs",
target_os = "linux"
))]
Member

Any reason why cfg_io_uring! { ... } is not used here?

Contributor Author

@martin-augment it's following the cfg-block style currently used in tokio/src/fs/open_options.rs; I can change those in a different "refactor" PR: https://github.com/tokio-rs/tokio/blob/master/tokio/src/fs/open_options.rs#L592
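
For readers unfamiliar with the convention: tokio's cfg wrapper macros are generally defined along these lines (a sketch; the exact predicate behind cfg_io_uring! is an assumption mirroring the attributes in the diff above):

    // Wraps items in a cfg gate so call sites stay tidy.
    macro_rules! cfg_io_uring {
        ($($item:item)*) => {
            $(
                #[cfg(all(feature = "rt", feature = "fs", target_os = "linux"))]
                $item
            )*
        };
    }

    cfg_io_uring! {
        fn only_compiled_on_uring_builds() {}
    }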


// Max bytes we can read using io uring submission at a time
// SAFETY: cannot be higher than u32::MAX for safe cast
// Set to read max 64 blocks at time
Member

What kind of block is meant here? File system blocks are usually around 4096 bytes, but here we are talking about blocks of 1 MiB.

Suggested change
// Set to read max 64 blocks at time
// Set to read max 64 MiB at time

?

Contributor Author

@martin-augment this number came from benchmarking io_uring against std::fs::read; it's a reasonable amount of data for the kernel to copy.
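
Putting the two comments together, the constant presumably works out to something like this (names and layout are illustrative, not the PR's code):

    // One "block" in this discussion is 1 MiB, not a filesystem block.
    const BLOCK_SIZE: u32 = 1024 * 1024;
    // At most 64 MiB per submission; comfortably below u32::MAX, so a
    // chunk length always fits the u32 the SQE expects.
    const MAX_READ_PER_SUBMISSION: u32 = 64 * BLOCK_SIZE;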


// Wait for the first poll

let _ = rx.try_recv();
Member

Why try_recv()? This may return Empty before the task has had a chance to run.

Contributor Author

I added an assert with recv().await.
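
Roughly the shape of that fix (channel type and names are assumptions, not the PR's code):

    use tokio::sync::oneshot;

    // The task under test sends on the channel after its first poll;
    // awaiting the receiver cannot race the way try_recv() can.
    async fn wait_for_first_poll(rx: oneshot::Receiver<()>) {
        assert!(rx.await.is_ok(), "task was never polled");
    }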

    Fix tests fail on allocation failure

    Doc fix and fix double subtraction of offset

    Fix typos and use assert the stopped task's first poll