Sync updates #34
Merged
When apic=verbose is specified, the LAPIC timer calibration prints its results
to the console. At least while debugging virtualization code, the CPU and bus
frequencies are printed incorrectly.
Specifically, for a 1.7 GHz CPU with 1 GHz bus frequency and HZ=1000,
the log includes a superfluous 0 after the period:
..... calibration result: 999978
..... CPU clock speed is 1696.0783 MHz.
..... host bus clock speed is 999.0978 MHz.
Looking at the code, this only worked as intended for HZ=100. After the fix,
the correct frequency is printed:
..... calibration result: 999828
..... CPU clock speed is 1696.507 MHz.
..... host bus clock speed is 999.828 MHz.
There is no functional change to the LAPIC calibration here, beyond the
printing format changes.
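A minimal standalone sketch of the formatting pitfall and one HZ-independent way to print the value (illustrative only, not the apic.c code; the helper below is hypothetical):

  #include <stdio.h>

  #define HZ 1000         /* as in the example above */

  /* 'result' is the calibration result: bus-clock ticks per jiffy,
   * so the bus frequency in Hz is result * HZ. */
  static void print_bus_freq(unsigned int result)
  {
          /* Buggy pattern: hard-codes four fractional digits, which is only
           * correct when (1000000 / HZ) == 10000, i.e. HZ == 100. With
           * HZ=1000 and result 999978 this prints "999.0978". */
          printf("buggy: %u.%04u MHz\n",
                 result / (1000000 / HZ), result % (1000000 / HZ));

          /* HZ-independent: convert to kHz first, then split into MHz and
           * the kHz remainder. */
          unsigned long long khz = (unsigned long long)result * HZ / 1000;
          printf("fixed: %llu.%03llu MHz\n", khz / 1000, khz % 1000);
  }

  int main(void)
  {
          print_bus_freq(999828);         /* fixed form prints 999.828 MHz */
          return 0;
  }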
[ bp: - Massage commit message
- Figures it should apply this patch about ~4 years later
- Massage it into the current code ]
Suggested-by: Markus Napierkowski <[email protected]>
Signed-off-by: Julian Stecklina <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://patch.msgid.link/[email protected]
The third argument of div_Xsig() is the output of the division, but is marked
'const', which means the compiler is not expecting it to be updated and may
generate bad code around the call. clang-21 now warns about the pattern since
an uninitialized variable is passed into two 'const' arguments by reference:
arch/x86/math-emu/poly_atan.c:93:28: error: variable 'argSignif' is uninitialized \
when passed as a const pointer argument here [-Werror,-Wuninitialized-const-pointer]
93 | div_Xsig(&Numer, &Denom, &argSignif);
| ^~~~~~~~~
arch/x86/math-emu/poly_l2.c:195:29: error: variable 'argSignif' is uninitialized \
when passed as a const pointer argument here [-Werror,-Wuninitialized-const-pointer]
195 | div_Xsig(&Numer, &Denom, &argSignif);
| ^~~~~~~~~
The implementation is in assembly, so the problem has gone unnoticed since the
code was added in the linux-1.1 days. Remove the 'const' marker here.
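A standalone illustration of the prototype problem (the Xsig stand-in and the placeholder body are invented for this example; the real routine lives in assembly):

  /* Illustrative stand-ins, not the real math-emu types or code. */
  typedef struct { unsigned int lsw, midw, msw; } Xsig;

  /* Wrong: the destination is the *output* of the division, yet it is
   * declared const. The compiler may assume it is never written, and
   * clang-21 warns when an uninitialized variable is passed for it. */
  void div_Xsig_bad(const Xsig *num, const Xsig *denom, const Xsig *dest);

  /* Right: only the genuine inputs stay const. */
  void div_Xsig_ok(const Xsig *num, const Xsig *denom, Xsig *dest)
  {
          /* placeholder body; the real routine is written in assembly */
          dest->lsw = denom->lsw ? num->lsw / denom->lsw : 0;
          dest->midw = 0;
          dest->msw = 0;
  }

  int main(void)
  {
          Xsig num = { 6, 0, 0 }, denom = { 3, 0, 0 };
          Xsig result;    /* intentionally uninitialized: it is an output */

          div_Xsig_ok(&num, &denom, &result);
          return (int)result.lsw;         /* 2 */
  }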
Fixes: e19a1bd ("Import 1.1.38")
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://patch.msgid.link/[email protected]
Copy from 54da6a0 ("locking: Introduce __cleanup() based infrastructure")
the bits which mark the variable with a cleanup attribute unused so that
my clang 15 can dispose of it properly instead of warning that it is
unused which then fails the build due to -Werror.
Suggested-by: Nathan Chancellor <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>
Link: https://lore.kernel.org/r/20251031114919.GBaQSiPxZrziOs3RCW@fat_crate.local
When executing a task in proxy context, handle yields as if they were
requested by the donor task. This matches the traditional PI semantics
of yield() as well. This avoids scenarios like the proxy task yielding,
pick-next-task selecting the same previously blocked donor, running the
proxy task again, and so on.
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-lkp/[email protected]
Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Fernand Sieber <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://patch.msgid.link/[email protected]
Early return true if the core cookie matches. This avoids the SMT mask
loop to check for an idle core, which might be more expensive on wide
platforms.
Signed-off-by: Fernand Sieber <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: K Prateek Nayak <[email protected]>
Reviewed-by: Madadi Vineeth Reddy <[email protected]>
Link: https://patch.msgid.link/[email protected]
I always end up having to re-read these emails every time I look at this
code. And a future patch is going to change this story a little. This
means it is past time to stick them in a comment so it can be modified
and stay current.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://patch.msgid.link/[email protected]
Basically, from the constraint that the sum of lag is zero, you can
infer that the 0-lag point is the weighted average of the individual
vruntime, which is what we're trying to compute:
        \Sum w_i * v_i
  avg = --------------
           \Sum w_i
Now, since vruntime takes the whole u64 (worse, it wraps), this
multiplication term in the numerator is not something we can compute;
instead we do the min_vruntime (v0 henceforth) thing like:
v_i = (v_i - v0) + v0
This does two things:
- it keeps the key: (v_i - v0) 'small';
- it creates a relative 0-point in the modular space.
If you do that substitution and work it all out, you end up with:
        \Sum w_i * (v_i - v0)
  avg = --------------------- + v0
              \Sum w_i
Since you cannot very well track a ratio like that (and not suffer
terrible numerical problems) we simply track the numerator and
denominator individually and only perform the division when strictly
needed.
Notably, the numerator lives in cfs_rq->avg_vruntime and the denominator
lives in cfs_rq->avg_load.
The one extra 'funny' is that these numbers track the entities in the
tree, and current is typically outside of the tree, so avg_vruntime()
adds current when needed before doing the division.
(vruntime_eligible() elides the division by cross-wise multiplication)
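A simplified C sketch of that bookkeeping, under stated assumptions (no weight scaling, no special handling of 'current', plain truncating division); it is not the kernel's cfs_rq code, just the idea of tracking the numerator and denominator separately:

  #include <stdint.h>

  struct rq_avg {
          int64_t  avg_vruntime;  /* \Sum w_i * (v_i - v0) */
          uint64_t avg_load;      /* \Sum w_i              */
          uint64_t v0;            /* reference point        */
  };

  void entity_add(struct rq_avg *rq, uint64_t vruntime, uint64_t weight)
  {
          int64_t key = (int64_t)(vruntime - rq->v0);  /* modular, stays 'small' */

          rq->avg_vruntime += key * (int64_t)weight;
          rq->avg_load     += weight;
  }

  void entity_del(struct rq_avg *rq, uint64_t vruntime, uint64_t weight)
  {
          int64_t key = (int64_t)(vruntime - rq->v0);

          rq->avg_vruntime -= key * (int64_t)weight;
          rq->avg_load     -= weight;
  }

  /* The weighted average, i.e. the 0-lag point derived above; only divide
   * when the value is actually needed. */
  uint64_t avg_vruntime(const struct rq_avg *rq)
  {
          if (!rq->avg_load)
                  return rq->v0;
          return rq->v0 + (uint64_t)(rq->avg_vruntime / (int64_t)rq->avg_load);
  }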
Anyway, as mentioned above, we currently use the CFS era min_vruntime
for this purpose. However, this thing can only move forward, while the
above avg can in fact move backward (when a non-eligible task leaves,
the average becomes smaller); this can cause trouble when, through
happenstance (or construction), these values drift far enough apart to
wreck the game.
Replace cfs_rq::min_vruntime with cfs_rq::zero_vruntime which is kept
near/at avg_vruntime, following its motion.
The down-side is that this requires computing the avg more often.
Fixes: 147f3ef ("sched/fair: Implement an EEVDF-like scheduling policy")
Reported-by: Zicheng Qu <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://patch.msgid.link/[email protected]
Cc: [email protected]
…_locked()
Since commit d4c6420 ("sched: Cleanup the sched_change NOCLOCK usage"),
update_rq_clock() is called in do_set_cpus_allowed() ->
sched_change_begin() to update the rq clock. This results in a duplicate
call to update_rq_clock() in __set_cpus_allowed_ptr_locked(). While
holding the rq lock and before calling do_set_cpus_allowed(), there is
nothing that depends on an updated rq_clock. Therefore, remove the
redundant update_rq_clock() in __set_cpus_allowed_ptr_locked() to avoid
the warning about double rq clock updates.
Fixes: d4c6420 ("sched: Cleanup the sched_change NOCLOCK usage")
Signed-off-by: Hao Jia <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: K Prateek Nayak <[email protected]>
Link: https://patch.msgid.link/[email protected]
The dl_server time accounting code is a little odd. The normal scheduler
pattern is to update curr before doing something, such that the old
state is fully accounted before changing state.
Notably, the dl_server_timer() needs to propagate the current time
accounting since the current task could be run by the dl_server and thus
this can affect dl_se->runtime. Similarly for dl_server_start().
And since the (deferred) dl_server wants idle time accounted, rework
sched_idle_class time accounting to be more like all the others.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://patch.msgid.link/[email protected]
Gabriel reported that the dl_server doesn't stop as expected.
The problem was found to be the fact that idle time and fair runtime are
treated equally. Both will count towards dl_server runtime and push the
activation forwards when it is in the zero-laxity wait state.
Notably:
  dl_server_update_idle()
    update_curr_dl_se()
      if (dl_defer && dl_throttled && dl_runtime_exceeded())
        hrtimer_try_to_cancel();        // stop timer
        replenish_dl_new_period()
          deadline = now + dl_deadline; // fwd period
          runtime = dl_runtime;
        start_dl_timer();               // restart timer
And while we do want idle time accounted towards the *current* activation of
the dl_server -- after all, a fair task could've run if we had any -- we don't
necessarily want idle time to cause or push forward an activation.
Introduce dl_defer_idle to make this distinction. It will be set once idle time
has pushed the activation forward; once set, idle time will only be allowed to
consume runtime but not push the activation. This will then cause
dl_server_timer() to fire, which will stop the dl_server.
Any non-idle time accounting during this phase will clear dl_defer_idle, so
only a full period of idle will cause the dl_server to stop.
Reported-by: Gabriele Monaco <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://patch.msgid.link/[email protected]
Place the notes that resulted from going through the dl_server code in a
comment.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
cpumask_subset(a,b) -> cpumask_weight(a) should be same as cpumask_weight_and(a,b)
for_each_cpu_and(a,b) to count cpus could be replaced by cpumask_weight_and(a,b)
No Functional Change. It could save a few cycles since cpumask_weight_and
would be more efficient. Plus one less stack variable.
Signed-off-by: Shrikanth Hegde <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Juri Lelli <[email protected]>
Link: https://patch.msgid.link/[email protected]
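A before/after sketch of the shape of this conversion (kernel-style and illustrative; not the actual dl_bw_cpus() body):

  #include <linux/cpumask.h>

  /* Before: special-case the subset, otherwise count by iterating. */
  static int count_cpus_old(const struct cpumask *a, const struct cpumask *b)
  {
          int i, cpus = 0;

          if (cpumask_subset(a, b))
                  return cpumask_weight(a);

          for_each_cpu_and(i, a, b)
                  cpus++;

          return cpus;
  }

  /* After: one population count over the intersection, no loop and no
   * extra stack variable. */
  static int count_cpus_new(const struct cpumask *a, const struct cpumask *b)
  {
          return cpumask_weight_and(a, b);
  }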
In select_task_rq_dl, there is only one goto statement, so there is no
need for it. No functional changes.
Signed-off-by: Shrikanth Hegde <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Juri Lelli <[email protected]>
Link: https://patch.msgid.link/[email protected]
__break_lease() currently overrides the flc_flags field in the lease
after allocating it. A forthcoming patch will add the ability to request
a FL_DELEG type lease. Instead of overriding the flags field, add a
flags argument to lease_alloc() and lease_init() so it's set correctly
after allocating.
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: NeilBrown <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
Currently __break_lease takes both a type and an openmode. With the
addition of directory leases, that makes less sense. Declare a set of
LEASE_BREAK_* flags that can be used to control how lease breaks work
instead of requiring a type and an openmode.
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: NeilBrown <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
The current API requires a pointer to an inode pointer. It's easy for
callers to get this wrong. Add a new delegated_inode structure and use
that to pass back any inode that needs to be waited on.
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: NeilBrown <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
When nfsd starts requesting directory delegations, setlease handlers may
see requests for leases on directories. Push the !S_ISREG check down
into the non-trivial setlease handlers, so we can selectively enable
them where they're supported.
FUSE is special: It's the only filesystem that supports atomic_open and
allows kernel-internal leases. atomic_open is issued when the VFS
doesn't know the state of the dentry being opened. If the file doesn't
exist, it may be created, in which case the dir lease should be broken.
The existing kernel-internal lease implementation has no provision for
this. Ensure that we don't allow directory leases by default going
forward by explicitly disabling them there.
Reviewed-by: NeilBrown <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
In order to add directory delegation support, we need to break
delegations on the parent whenever there is going to be a change in the
directory. vfs_link, vfs_unlink, and vfs_rename all have existing
delegation break handling for the children in the rename. Add the
necessary calls for breaking delegations in the parent(s) as well.
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: NeilBrown <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
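For context, a sketch of the long-standing caller-side retry pattern used for regular-file delegation breaks, which this series extends to parent directories (shown against the pre-series vfs_unlink() signature with a raw inode pointer; real callers also redo the lookup and locking on retry):

  #include <linux/fs.h>
  #include <linux/namei.h>

  static int example_unlink(struct mnt_idmap *idmap, struct inode *dir,
                            struct dentry *dentry)
  {
          struct inode *delegated_inode = NULL;
          int error;

  retry:
          error = vfs_unlink(idmap, dir, dentry, &delegated_inode);
          if (delegated_inode) {
                  /* Wait for the lease holder to return the delegation,
                   * then redo the operation. */
                  error = break_deleg_wait(&delegated_inode);
                  if (!error)
                          goto retry;
          }
          return error;
  }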
In order to add directory delegation support, we need to break
delegations on the parent whenever there is going to be a change in the
directory. Add a new delegated_inode parameter to vfs_mkdir. All of the
existing callers set that to NULL for now, except for do_mkdirat which
will properly block until the lease is gone.
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: NeilBrown <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
In order to add directory delegation support, we need to break
delegations on the parent whenever there is going to be a change in the
directory. Add a delegated_inode struct to vfs_rmdir() and populate that
pointer with the parent inode if it's non-NULL. Most existing in-kernel
callers pass in a NULL pointer.
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: NeilBrown <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
In order to add directory delegation support, we need to break
delegations on the parent whenever there is going to be a change in the
directory. Add a delegated_inode parameter to lookup_open and have it
break the delegation. Then, open_last_lookups can wait for the
delegation break and retry the call to lookup_open once it's done.
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: NeilBrown <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
As Neil points out:
"I would be in favour of dropping the "dir" arg because it is always
d_inode(dentry->d_parent) which is stable."
...and...
"Also *every* caller of vfs_create() passes ".excl = true". So maybe we
don't need that arg at all."
Drop both arguments from vfs_create() and fix up the callers.
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: NeilBrown <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
In order to add directory delegation support, we need to break
delegations on the parent whenever there is going to be a change in the
directory. Add a delegated_inode parameter to vfs_create. Most callers
are converted to pass in NULL, but do_mknodat() is changed to wait for a
delegation break if there is one.
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: NeilBrown <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
In order to add directory delegation support, we need to break
delegations on the parent whenever there is going to be a change in the
directory. Add a new delegated_inode pointer to vfs_mknod() and have the
appropriate callers wait when there is an outstanding delegation. All
other callers just set the pointer to NULL.
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: NeilBrown <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
In order to add directory delegation support, we must break delegations
on the parent on any change to the directory. Add a delegated_inode
parameter to vfs_symlink() and have it break the delegation.
do_symlinkat() can then wait on the delegation break before proceeding.
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: NeilBrown <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
With the addition of the try_break_lease calls in directory changing
operations, allow generic_setlease to hand them out. Write leases on
directories are never allowed however, so continue to reject them.
For now, there is no API for requesting delegations from userland, so
ensure that userland is prevented from acquiring a lease on a directory.
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: NeilBrown <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
The filecache infrastructure will only handle S_IFREG files at the
moment. Directory delegations will require adding support for opening
S_IFDIR inodes.
Plumb a "type" argument into nfsd_file_do_acquire() and have all of the
existing callers set it to S_IFREG. Add a new nfsd_file_acquire_dir()
wrapper that nfsd can call to request a nfsd_file that holds a directory
open.
For now, there is no need for a fsnotify_mark for directories, as
CB_NOTIFY is not yet supported. Change nfsd_file_do_acquire() to avoid
allocating one for non-S_IFREG inodes.
Reviewed-by: Chuck Lever <[email protected]>
Reviewed-by: NeilBrown <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
As Trond pointed out:
"...provided that the presented stateid is actually valid, it is also
sufficient to uniquely identify the file to which it is associated (see
RFC8881 Section 8.2.4), so the filehandle should be considered mostly
irrelevant for operations like DELEGRETURN."
Don't ask fh_verify to filter on file type.
Reviewed-by: Chuck Lever <[email protected]>
Reviewed-by: NeilBrown <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
Add a new routine for acquiring a read delegation on a directory. These
are recallable-only delegations with no support for CB_NOTIFY. That will
be added in a later phase.
Since the same CB_RECALL/DELEGRETURN infrastructure is used for regular
and directory delegations, a normal nfs4_delegation is used to represent
a directory delegation.
Reviewed-by: NeilBrown <[email protected]>
Reviewed-by: Chuck Lever <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
Now that support for recallable directory delegations is available,
expose this functionality to userland with new F_SETDELEG and F_GETDELEG
commands for fcntl().
Note that this also allows userland to request a FL_DELEG type lease on
files too. Userland applications that do will get signalled when there
are metadata changes in addition to just data changes (which is a
limitation of FL_LEASE leases).
These commands accept a new "struct delegation" argument that contains a
flags field for future expansion.
Signed-off-by: Jeff Layton <[email protected]>
Link: https://patch.msgid.link/[email protected]
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
Link: https://patch.msgid.link/[email protected] Signed-off-by: Christian Brauner <[email protected]>
Link: https://patch.msgid.link/[email protected] Signed-off-by: Christian Brauner <[email protected]>
Link: https://patch.msgid.link/[email protected] Signed-off-by: Christian Brauner <[email protected]>
Link: https://patch.msgid.link/[email protected] Signed-off-by: Christian Brauner <[email protected]>
Link: https://patch.msgid.link/[email protected] Signed-off-by: Christian Brauner <[email protected]>
Link: https://patch.msgid.link/[email protected] Signed-off-by: Christian Brauner <[email protected]>
Christian Brauner <[email protected]> says:
The fix sent in [1] was squashed into this commit.
Fixes: https://lore.kernel.org/[email protected] [1]
Reported-by: Mark Brown <[email protected]> [1]
Suggested-by: Linus Torvalds <[email protected]> [1]
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
Link: https://patch.msgid.link/[email protected] Signed-off-by: Christian Brauner <[email protected]>
Link: https://patch.msgid.link/[email protected] Signed-off-by: Christian Brauner <[email protected]>
Link: https://patch.msgid.link/[email protected] Signed-off-by: Christian Brauner <[email protected]>
Christian Brauner <[email protected]> says:

This now removes roughly double the code that it adds.

I've been playing with this to allow for moderately flexible usage of the
get_unused_fd_flags() + create file + fd_install() pattern that's used
quite extensively and requires cumbersome cleanup paths. How callers
allocate files is really heterogeneous, so it's not really convenient to
fold them into a single class. It's possible to split them into
subclasses like for anon inodes; I think that's not necessarily nice
either.

This adds two primitives:

(1) FD_ADD() for the simple cases where a file is installed:

        fd = FD_ADD(O_CLOEXEC, vfio_device_open_file(device));
        if (fd < 0)
                vfio_device_put_registration(device);
        return fd;

(2) FD_PREPARE() that captures all the cases where access to the fd or
    file, or additional work before publishing the fd, is needed:

        FD_PREPARE(fdf, O_CLOEXEC, sync_file->file);
        if (fdf.err) {
                fput(sync_file->file);
                return fdf.err;
        }
        data.fence = fd_prepare_fd(fdf);
        if (copy_to_user((void __user *)arg, &data, sizeof(data)))
                return -EFAULT;
        return fd_publish(fdf);

I've converted all of the easy cases over to it and it gets rid of an
awful lot of convoluted cleanup logic. There are a bunch of other cases
that can also be converted after a bit of massaging.

It's centered around a simple struct. FD_PREPARE() encapsulates all of
the allocation and cleanup logic and must be followed by a call to
fd_publish(), which associates the fd with the file and installs it into
the caller's fdtable. If fd_publish() isn't called, both are
deallocated. FD_ADD() is a shorthand that does the fd_publish() and
never exposes the struct to the caller. That's often the case when
callers don't need access to anything after installing the fd.

It mandates a specific order, namely that first we allocate the fd and
then instantiate the file. But that shouldn't be a problem; nearly
everyone I've converted used this order anyway.

There's a bunch of additional cases where it would be easy to convert
them to this pattern. For example, the whole sync file stuff in dma
currently returns the containing structure of the file instead of the
file itself even though it's only used to allocate files. Changing that
would make it fall into the FD_PREPARE() pattern easily. I've not done
that work yet.

There's room for extending this in a way that we'd have subclasses for
some particularly often used patterns, but as I said I'm not even sure
that's worth it.
* patches from https://patch.msgid.link/[email protected]: (47 commits)
kvm: convert kvm_vcpu_ioctl_get_stats_fd() to FD_PREPARE()
kvm: convert kvm_arch_supports_gmem_init_shared() to FD_PREPARE()
io_uring: convert io_create_mock_file() to FD_PREPARE()
file: convert replace_fd() to FD_PREPARE()
vfio: convert vfio_group_ioctl_get_device_fd() to FD_PREPARE()
tty: convert ptm_open_peer() to FD_PREPARE()
ntsync: convert ntsync_obj_get_fd() to FD_PREPARE()
media: convert media_request_alloc() to FD_PREPARE()
hv: convert mshv_ioctl_create_partition() to FD_PREPARE()
gpio: convert linehandle_create() to FD_PREPARE()
dma: port sw_sync_ioctl_create_fence() to FD_PREPARE()
pseries: port papr_rtas_setup_file_interface() to FD_PREPARE()
pseries: convert papr_platform_dump_create_handle() to FD_PREPARE()
spufs: convert spufs_gang_open() to FD_PREPARE()
papr-hvpipe: convert papr_hvpipe_dev_create_handle() to FD_PREPARE()
spufs: convert spufs_context_open() to FD_PREPARE()
net/socket: convert __sys_accept4_file() to FD_PREPARE()
net/socket: convert sock_map_fd() to FD_PREPARE()
net/sctp: convert sctp_getsockopt_peeloff_common() to FD_PREPARE()
net/kcm: convert kcm_ioctl() to FD_PREPARE()
...
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
mutex_init() invokes __mutex_init(), providing the name of the lock and
a pointer to the lock class. With LOCKDEP enabled this information is
useful, but without LOCKDEP it is not used at all. Passing the pointer
to the lock class might be considered negligible, but the name of the
lock is passed as well and the string is stored. This information
wastes storage.
Split __mutex_init() into a _generic() variant doing the initialisation
of the lock and a _lockdep() version which does _generic() plus the
lockdep bits. Restrict the lockdep version to lockdep-enabled builds,
allowing the compiler to remove the unused parameter.
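A condensed sketch of the shape of such a split (function names and the Kconfig guard are simplified assumptions, not the exact patch; the real change also covers the PREEMPT_RT variants):

  struct mutex;
  struct lock_class_key;

  void __mutex_init_generic(struct mutex *lock);

  #ifdef CONFIG_DEBUG_LOCK_ALLOC
  void __mutex_init_lockdep(struct mutex *lock, const char *name,
                            struct lock_class_key *key);

  #define mutex_init(m)                                   \
  do {                                                    \
          static struct lock_class_key __key;             \
          __mutex_init_lockdep((m), #m, &__key);          \
  } while (0)
  #else
  /* Without lockdep, neither the name string nor the class key is ever
   * referenced, so they are no longer passed (or stored) at all. */
  #define mutex_init(m)   __mutex_init_generic(m)
  #endif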
This results in the following size reduction:
      text     data       bss       dec  filename
  30237599  8161430   1176624  39575653  vmlinux.defconfig
  30233269  8149142   1176560  39558971  vmlinux.defconfig.patched
   -4.2KiB   -12KiB
  32455099  8471098  12934684  53860881  vmlinux.defconfig.lockdep
  32455100  8471098  12934684  53860882  vmlinux.defconfig.patched.lockdep
  27152407  7191822   2068040  36412269  vmlinux.defconfig.preempt_rt
  27145937  7183630   2067976  36397543  vmlinux.defconfig.patched.preempt_rt
   -6.3KiB    -8KiB
  29382020  7505742  13784608  50672370  vmlinux.defconfig.preempt_rt.lockdep
  29376229  7505742  13784544  50666515  vmlinux.defconfig.patched.preempt_rt.lockdep
   -5.6KiB
[peterz: folded fix from boqun]
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Boqun Feng <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Reviewed-by: Waiman Long <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Link: https://patch.msgid.link/[email protected]
The local_lock_t was never added to the MAINTAINERS file since its
inclusion. Add local_lock_t to the locking primitives section.
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Waiman Long <[email protected]>
Link: https://patch.msgid.link/[email protected]
…dowing
The Linux kernel coding style advises to avoid common variable names in
function-like macros to reduce the risk of namespace collisions.
Throughout local_lock_internal.h, several macros use the rather common
variable names 'l' and 'tl'.
This already resulted in an actual collision: the __local_lock_acquire()
function-like macro is currently shadowing the parameter 'l' of the:
  class_##_name##_t class_##_name##_constructor(_type *l)
function factory from <linux/cleanup.h>.
Rename the variable 'l' to '__l' and the variable 'tl' to '__tl'
throughout the file to fix the current namespace collision and to
prevent future ones.
[ bigeasy: Rebase, update all l and tl instances in macros ]
Signed-off-by: Vincent Mailhol <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Waiman Long <[email protected]>
Link: https://patch.msgid.link/[email protected]
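A small generic demonstration of the hazard (plain C with GNU statement expressions, not the local_lock macros themselves): a macro-internal 'l' silently captures a caller's 'l' used in another argument.

  #include <stdio.h>

  #define ADD_BAD(a, b)   ({ typeof(a) l = (a); l + (b); })
  #define ADD_GOOD(a, b)  ({ typeof(a) __l = (a); __l + (b); })

  int main(void)
  {
          int l = 10;

          /* ADD_BAD(5, l) expands to ({ typeof(5) l = (5); l + (l); }):
           * the macro-internal 'l' shadows ours, so 'b' becomes 5. */
          printf("bad:  %d\n", ADD_BAD(5, l));    /* prints 10, not 15 */
          printf("good: %d\n", ADD_GOOD(5, l));   /* prints 15 */
          return 0;
  }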
Modify kernel-doc comments in local_lock.h to prevent warnings:
Warning: include/linux/local_lock.h:9 function parameter 'lock' not described in 'local_lock_init'
Warning: include/linux/local_lock.h:56 function parameter 'lock' not described in 'local_trylock_init'
Warning: include/linux/local_lock.h:56 expecting prototype for local_lock_init(). Prototype was for local_trylock_init() instead
Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://patch.msgid.link/[email protected]
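The shape of the fix, illustratively (not the exact wording of the patch): the kernel-doc comment has to name the macro it precedes and describe its parameter, e.g.:

  /**
   * local_trylock_init - Runtime initialize a local_trylock_t instance
   * @lock: The lock instance to be initialized
   */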
So 'objtool --link -d vmlinux.o' gets surprised by this endbr64+endbr64
pattern in ___bpf_prog_run():
___bpf_prog_run:
  1e7680: ___bpf_prog_run+0x0   push   %r12
  1e7682: ___bpf_prog_run+0x2   mov    %rdi,%r12
  1e7685: ___bpf_prog_run+0x5   push   %rbp
  1e7686: ___bpf_prog_run+0x6   xor    %ebp,%ebp
  1e7688: ___bpf_prog_run+0x8   push   %rbx
  1e7689: ___bpf_prog_run+0x9   mov    %rsi,%rbx
  1e768c: ___bpf_prog_run+0xc   movzbl (%rbx),%esi
  1e768f: ___bpf_prog_run+0xf   movzbl %sil,%edx
  1e7693: ___bpf_prog_run+0x13  mov    %esi,%eax
  1e7695: ___bpf_prog_run+0x15  mov    0x0(,%rdx,8),%rdx
  1e769d: ___bpf_prog_run+0x1d  jmp    0x1e76a2 <__x86_indirect_thunk_rdx>
  1e76a2: ___bpf_prog_run+0x22  endbr64
  1e76a6: ___bpf_prog_run+0x26  endbr64
  1e76aa: ___bpf_prog_run+0x2a  mov    0x4(%rbx),%edx
And crashes due to blindly dereferencing alt->insn->alt_group.
Bail out on NULL ->alt_group, which produces this warning and continues
with the disassembly, instead of a segfault:
  .git/O/vmlinux.o: warning: objtool: <alternative.1e769d>: failed to disassemble alternative
Cc: Alexandre Chartre <[email protected]>
Cc: Peter Zijlstra (Intel) <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>
…g/pub/scm/linux/kernel/git/vfs/vfs
Pull directory delegations update from Christian Brauner:
"This contains the work for recall-only directory delegations for
knfsd.
Add support for simple, recallable-only directory delegations. This
was decided at the fall NFS Bakeathon where the NFS client and server
maintainers discussed how to merge directory delegation support.
The approach starts with recallable-only delegations for several reasons:
1. RFC8881 has gaps that are being addressed in RFC8881bis. In
particular, it requires directory position information for
CB_NOTIFY callbacks, which is difficult to implement properly
under Linux. The spec is being extended to allow that information
to be omitted.
2. Client-side support for CB_NOTIFY still lags. The client side
involves heuristics about when to request a delegation.
3. Early indication shows simple, recallable-only delegations can
help performance. Anna Schumaker mentioned seeing a multi-minute
speedup in xfstests runs with them enabled.
With these changes, userspace can also request a read lease on a
directory that will be recalled on conflicting accesses. This may be
useful for applications like Samba. Users can disable leases
altogether via the fs.leases-enable sysctl if needed.
VFS changes:
- Dedicated Type for Delegations
Introduce struct delegated_inode to track inodes that may have
delegations that need to be broken. This replaces the previous
approach of passing raw inode pointers through the delegation
breaking code paths, providing better type safety and clearer
semantics for the delegation machinery.
- Break parent directory delegations in open(..., O_CREAT) codepath
- Allow mkdir to wait for delegation break on parent
- Allow rmdir to wait for delegation break on parent
- Add try_break_deleg calls for parents to vfs_link(), vfs_rename(),
and vfs_unlink()
- Make vfs_create(), vfs_mknod(), and vfs_symlink() break delegations
on parent directory
- Clean up argument list for vfs_create()
- Expose delegation support to userland
Filelock changes:
- Make lease_alloc() take a flags argument
- Rework the __break_lease API to use flags
- Add struct delegated_inode
- Push the S_ISREG check down to ->setlease handlers
- Lift the ban on directory leases in generic_setlease
NFSD changes:
- Allow filecache to hold S_IFDIR files
- Allow DELEGRETURN on directories
- Wire up GET_DIR_DELEGATION handling
Fixes:
- Fix kernel-doc warnings in __fcntl_getlease
- Add needed headers for new struct delegation definition"
* tag 'vfs-6.19-rc1.directory.delegations' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
vfs: add needed headers for new struct delegation definition
filelock: __fcntl_getlease: fix kernel-doc warnings
vfs: expose delegation support to userland
nfsd: wire up GET_DIR_DELEGATION handling
nfsd: allow DELEGRETURN on directories
nfsd: allow filecache to hold S_IFDIR files
filelock: lift the ban on directory leases in generic_setlease
vfs: make vfs_symlink break delegations on parent dir
vfs: make vfs_mknod break delegations on parent directory
vfs: make vfs_create break delegations on parent directory
vfs: clean up argument list for vfs_create()
vfs: break parent dir delegations in open(..., O_CREAT) codepath
vfs: allow rmdir to wait for delegation break on parent
vfs: allow mkdir to wait for delegation break on parent
vfs: add try_break_deleg calls for parents to vfs_{link,rename,unlink}
filelock: push the S_ISREG check down to ->setlease handlers
filelock: add struct delegated_inode
filelock: rework the __break_lease API to use flags
filelock: make lease_alloc() take a flags argument
…b/scm/linux/kernel/git/vfs/vfs
Pull directory locking updates from Christian Brauner:
"This contains the work to add centralized APIs for directory locking
operations.
This series is part of a larger effort to change directory operation
locking to allow multiple concurrent operations in a directory. The
ultimate goal is to lock the target dentry(s) rather than the whole
parent directory.
To help with changing the locking protocol, this series centralizes
locking and lookup in new helper functions. The helpers establish a
pattern where it is the dentry that is being locked and unlocked
(currently the lock is held on dentry->d_parent->d_inode, but that can
change in the future).
This also changes vfs_mkdir() to unlock the parent on failure, as well
as dput()ing the dentry. This allows end_creating() to only require the
target dentry (which may be IS_ERR() after vfs_mkdir()), not the parent"
* tag 'vfs-6.19-rc1.directory.locking' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
nfsd: fix end_creating() conversion
VFS: introduce end_creating_keep()
VFS: change vfs_mkdir() to unlock on failure.
ecryptfs: use new start_creating/start_removing APIs
Add start_renaming_two_dentries()
VFS/ovl/smb: introduce start_renaming_dentry()
VFS/nfsd/ovl: introduce start_renaming() and end_renaming()
VFS: add start_creating_killable() and start_removing_killable()
VFS: introduce start_removing_dentry()
smb/server: use end_removing_noperm for for target of smb2_create_link()
VFS: introduce start_creating_noperm() and start_removing_noperm()
VFS/nfsd/cachefiles/ovl: introduce start_removing() and end_removing()
VFS/nfsd/cachefiles/ovl: add start_creating() and end_creating()
VFS: tidy up do_unlinkat()
VFS: introduce start_dirop() and end_dirop()
debugfs: rename end_creating() to debugfs_end_creating()
…rnel/git/vfs/vfs
Pull overlayfs cred guard conversion from Christian Brauner:
"This converts all of overlayfs to use credential guards, eliminating
manual credential management throughout the filesystem.
Credential guard conversion:
- Convert all of overlayfs to use credential guards, replacing the
manual ovl_override_creds()/ovl_revert_creds() pattern with scoped
guards.
This makes credential handling visually explicit and eliminates a
class of potential bugs from mismatched override/revert calls.
(1) Basic credential guard (with_ovl_creds)
(2) Creator credential guard (ovl_override_creator_creds):
Introduced a specialized guard for file creation operations
that handles the two-phase credential override (mounter
credentials, then fs{g,u}id override). The new pattern is much
clearer:
      with_ovl_creds(dentry->d_sb) {
              scoped_class(prepare_creds_ovl, cred, dentry, inode, mode) {
                      if (IS_ERR(cred))
                              return PTR_ERR(cred);
                      /* creation operations */
              }
      }
(3) Copy-up credential guard (ovl_cu_creds):
Introduced a specialized guard for copy-up operations,
simplifying the previous struct ovl_cu_creds helper and
associated functions.
Ported ovl_copy_up_workdir() and ovl_copy_up_tmpfile() to this
pattern.
Cleanups:
- Remove ovl_revert_creds() after all callers converted to guards
- Remove struct ovl_cu_creds and associated functions
- Drop ovl_setup_cred_for_create() after conversion
- Refactor ovl_fill_super(), ovl_lookup(), ovl_iterate(),
ovl_rename() for cleaner credential guard scope
- Introduce struct ovl_renamedata to simplify rename handling
- Don't override credentials for ovl_check_whiteouts() (unnecessary)
- Remove unneeded semicolon"
* tag 'vfs-6.19-rc1.ovl' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (54 commits)
ovl: remove unneeded semicolon
ovl: remove struct ovl_cu_creds and associated functions
ovl: port ovl_copy_up_tmpfile() to cred guard
ovl: mark *_cu_creds() as unused temporarily
ovl: port ovl_copy_up_workdir() to cred guard
ovl: add copy up credential guard
ovl: drop ovl_setup_cred_for_create()
ovl: port ovl_create_or_link() to new ovl_override_creator_creds cleanup guard
ovl: mark ovl_setup_cred_for_create() as unused temporarily
ovl: reflow ovl_create_or_link()
ovl: port ovl_create_tmpfile() to new ovl_override_creator_creds cleanup guard
ovl: add ovl_override_creator_creds cred guard
ovl: remove ovl_revert_creds()
ovl: port ovl_fill_super() to cred guard
ovl: refactor ovl_fill_super()
ovl: port ovl_lower_positive() to cred guard
ovl: port ovl_lookup() to cred guard
ovl: refactor ovl_lookup()
ovl: port ovl_copyfile() to cred guard
ovl: port ovl_rename() to cred guard
...
…/kernel/git/vfs/vfs
Pull autofs update from Christian Brauner:
"Prevent futile mount triggers in private mount namespaces.
Fix a problematic loop in autofs when a mount namespace contains autofs
mounts that are propagation private and there is no namespace-specific
automount daemon to handle possible automounting. Previously, attempted
path resolution would loop until MAXSYMLINKS was reached before failing,
causing significant noise in the log.
The fix adds a check in autofs ->d_automount() so that the VFS can
immediately return EPERM in this case. Since the mount is propagation
private, EPERM is the most appropriate error code"
* tag 'vfs-6.19-rc1.autofs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
autofs: dont trigger mount if it cant succeed
…m/linux/kernel/git/vfs/vfs
Pull fd prepare updates from Christian Brauner:
"This adds the FD_ADD() and FD_PREPARE() primitive. They simplify the
common pattern of get_unused_fd_flags() + create file + fd_install()
that is used extensively throughout the kernel and currently requires
cumbersome cleanup paths.
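For reference, a sketch of that traditional pattern (create_foo_file() is a hypothetical helper); every error path after the fd has been reserved needs explicit unwinding:

  #include <linux/err.h>
  #include <linux/fcntl.h>
  #include <linux/file.h>
  #include <linux/fs.h>

  struct file *create_foo_file(void);     /* hypothetical */

  static int foo_install_fd(void)
  {
          struct file *file;
          int fd;

          fd = get_unused_fd_flags(O_CLOEXEC);
          if (fd < 0)
                  return fd;

          file = create_foo_file();
          if (IS_ERR(file)) {
                  put_unused_fd(fd);      /* manual cleanup */
                  return PTR_ERR(file);
          }

          fd_install(fd, file);           /* publish; no going back */
          return fd;
  }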
FD_ADD() - For simple cases where a file is installed immediately:
  fd = FD_ADD(O_CLOEXEC, vfio_device_open_file(device));
  if (fd < 0)
          vfio_device_put_registration(device);
  return fd;
FD_PREPARE() - For cases requiring access to the fd or file, or
additional work before publishing:
  FD_PREPARE(fdf, O_CLOEXEC, sync_file->file);
  if (fdf.err) {
          fput(sync_file->file);
          return fdf.err;
  }
  data.fence = fd_prepare_fd(fdf);
  if (copy_to_user((void __user *)arg, &data, sizeof(data)))
          return -EFAULT;
  return fd_publish(fdf);
The primitives are centered around struct fd_prepare. FD_PREPARE()
encapsulates all allocation and cleanup logic and must be followed by
a call to fd_publish() which associates the fd with the file and
installs it into the caller's fdtable. If fd_publish() isn't called,
both are deallocated automatically. FD_ADD() is a shorthand that does
fd_publish() immediately and never exposes the struct to the caller.
I've implemented this in a way that it's compatible with the cleanup
infrastructure while also being usable separately. IOW, it's centered
around struct fd_prepare which is aliased to class_fd_prepare_t and so
we can make use of all the basic guard infrastructure"
* tag 'vfs-6.19-rc1.fd_prepare.fs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (42 commits)
io_uring: convert io_create_mock_file() to FD_PREPARE()
file: convert replace_fd() to FD_PREPARE()
vfio: convert vfio_group_ioctl_get_device_fd() to FD_ADD()
tty: convert ptm_open_peer() to FD_ADD()
ntsync: convert ntsync_obj_get_fd() to FD_PREPARE()
media: convert media_request_alloc() to FD_PREPARE()
hv: convert mshv_ioctl_create_partition() to FD_ADD()
gpio: convert linehandle_create() to FD_PREPARE()
pseries: port papr_rtas_setup_file_interface() to FD_ADD()
pseries: convert papr_platform_dump_create_handle() to FD_ADD()
spufs: convert spufs_gang_open() to FD_PREPARE()
papr-hvpipe: convert papr_hvpipe_dev_create_handle() to FD_PREPARE()
spufs: convert spufs_context_open() to FD_PREPARE()
net/socket: convert __sys_accept4_file() to FD_ADD()
net/socket: convert sock_map_fd() to FD_ADD()
net/kcm: convert kcm_ioctl() to FD_PREPARE()
net/handshake: convert handshake_nl_accept_doit() to FD_PREPARE()
secretmem: convert memfd_secret() to FD_ADD()
memfd: convert memfd_create() to FD_ADD()
bpf: convert bpf_token_create() to FD_PREPARE()
...
…inux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
"Mutexes:
- Redo __mutex_init() to reduce generated code size (Sebastian
Andrzej Siewior)
Seqlocks:
- Introduce scoped_seqlock_read() (Peter Zijlstra)
- Change thread_group_cputime() to use scoped_seqlock_read() (Oleg
Nesterov)
- Change do_task_stat() to use scoped_seqlock_read() (Oleg Nesterov)
- Change do_io_accounting() to use scoped_seqlock_read() (Oleg
Nesterov)
- Fix the incorrect documentation of read_seqbegin_or_lock() /
need_seqretry() (Oleg Nesterov)
- Allow KASAN to fail optimizing (Peter Zijlstra)
Local lock updates:
- Fix all kernel-doc warnings (Randy Dunlap)
- Add the <linux/local_lock*.h> headers to MAINTAINERS (Sebastian
Andrzej Siewior)
- Reduce the risk of shadowing via s/l/__l/ and s/tl/__tl/ (Vincent
Mailhol)
Lock debugging:
- spinlock/debug: Fix data-race in do_raw_write_lock (Alexander
Sverdlin)
Atomic primitives infrastructure:
- atomic: Skip alignment check for try_cmpxchg() old arg (Arnd
Bergmann)
Rust runtime integration:
- sync: atomic: Enable generated Atomic<T> usage (Boqun Feng)
- sync: atomic: Implement Debug for Atomic<Debug> (Boqun Feng)
- debugfs: Remove Rust native atomics and replace them with Linux
versions (Boqun Feng)
- debugfs: Implement Reader for Mutex<T> only when T is Unpin (Boqun
Feng)
- lock: guard: Add T: Unpin bound to DerefMut (Daniel Almeida)
- lock: Pin the inner data (Daniel Almeida)
- lock: Add a Pin<&mut T> accessor (Daniel Almeida)"
* tag 'locking-core-2025-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/local_lock: Fix all kernel-doc warnings
locking/local_lock: s/l/__l/ and s/tl/__tl/ to reduce the risk of shadowing
locking/local_lock: Add the <linux/local_lock*.h> headers to MAINTAINERS
locking/mutex: Redo __mutex_init() to reduce generated code size
rust: debugfs: Replace the usage of Rust native atomics
rust: sync: atomic: Implement Debug for Atomic<Debug>
rust: sync: atomic: Make Atomic*Ops pub(crate)
seqlock: Allow KASAN to fail optimizing
rust: debugfs: Implement Reader for Mutex<T> only when T is Unpin
seqlock: Change do_io_accounting() to use scoped_seqlock_read()
seqlock: Change do_task_stat() to use scoped_seqlock_read()
seqlock: Change thread_group_cputime() to use scoped_seqlock_read()
seqlock: Introduce scoped_seqlock_read()
documentation: seqlock: fix the wrong documentation of read_seqbegin_or_lock/need_seqretry
atomic: Skip alignment check for try_cmpxchg() old arg
rust: lock: Add a Pin<&mut T> accessor
rust: lock: Pin the inner data
rust: lock: guard: Add T: Unpin bound to DerefMut
locking/spinlock/debug: Fix data-race in do_raw_write_lock
…inux/kernel/git/tip/tip
Pull objtool updates from Ingo Molnar:
- klp-build livepatch module generation (Josh Poimboeuf)
Introduce new objtool features and a klp-build script to generate
livepatch modules using a source .patch as input.
This builds on concepts from the longstanding out-of-tree kpatch
project which began in 2012 and has been used for many years to
generate livepatch modules for production kernels. However, this is a
complete rewrite which incorporates hard-earned lessons from 12+
years of maintaining kpatch.
Key improvements compared to kpatch-build:
- Integrated with objtool: Leverages objtool's existing control-flow
graph analysis to help detect changed functions.
- Works on vmlinux.o: Supports late-linked objects, making it
compatible with LTO, IBT, and similar.
- Simplified code base: ~3k fewer lines of code.
- Upstream: No more out-of-tree #ifdef hacks, far less cruft.
- Cleaner internals: Vastly simplified logic for
symbol/section/reloc inclusion and special section extraction.
- Robust __LINE__ macro handling: Avoids false positive binary diffs
caused by the __LINE__ macro by introducing a fix-patch-lines
script which injects #line directives into the source .patch to
preserve the original line numbers at compile time.
- Disassemble code with libopcodes instead of running objdump
(Alexandre Chartre)
- Disassemble support (-d option to objtool) by Alexandre Chartre,
which supports the decoding of various Linux kernel code generation
specials such as alternatives:
17ef: sched_balance_find_dst_group+0x62f mov 0x34(%r9),%edx
17f3: sched_balance_find_dst_group+0x633 | <alternative.17f3> | X86_FEATURE_POPCNT
17f3: sched_balance_find_dst_group+0x633 | call 0x17f8 <__sw_hweight64> | popcnt %rdi,%rax
17f8: sched_balance_find_dst_group+0x638 cmp %eax,%edx
... jump table alternatives:
1895: sched_use_asym_prio+0x5 test $0x8,%ch
1898: sched_use_asym_prio+0x8 je 0x18a9 <sched_use_asym_prio+0x19>
189a: sched_use_asym_prio+0xa | <jump_table.189a> | JUMP
189a: sched_use_asym_prio+0xa | jmp 0x18ae <sched_use_asym_prio+0x1e> | nop2
189c: sched_use_asym_prio+0xc mov $0x1,%eax
18a1: sched_use_asym_prio+0x11 and $0x80,%ecx
... exception table alternatives:
native_read_msr:
5b80: native_read_msr+0x0 mov %edi,%ecx
5b82: native_read_msr+0x2 | <ex_table.5b82> | EXCEPTION
5b82: native_read_msr+0x2 | rdmsr | resume at 0x5b84 <native_read_msr+0x4>
5b84: native_read_msr+0x4 shl $0x20,%rdx
.... x86 feature flag decoding (also see the X86_FEATURE_POPCNT
example in sched_balance_find_dst_group() above):
2faaf: start_thread_common.constprop.0+0x1f jne 0x2fba4 <start_thread_common.constprop.0+0x114>
2fab5: start_thread_common.constprop.0+0x25 | <alternative.2fab5> | X86_FEATURE_ALWAYS | X86_BUG_NULL_SEG
2fab5: start_thread_common.constprop.0+0x25 | jmp 0x2faba <.altinstr_aux+0x2f4> | jmp 0x4b0 <start_thread_common.constprop.0+0x3f> | nop5
2faba: start_thread_common.constprop.0+0x2a mov $0x2b,%eax
... NOP sequence shortening:
1048e2: snapshot_write_finalize+0xc2 je 0x104917 <snapshot_write_finalize+0xf7>
1048e4: snapshot_write_finalize+0xc4 nop6
1048ea: snapshot_write_finalize+0xca nop11
1048f5: snapshot_write_finalize+0xd5 nop11
104900: snapshot_write_finalize+0xe0 mov %rax,%rcx
104903: snapshot_write_finalize+0xe3 mov 0x10(%rdx),%rax
... and much more.
- Function validation tracing support (Alexandre Chartre)
- Various -ffunction-sections fixes (Josh Poimboeuf)
- Clang AutoFDO (Automated Feedback-Directed Optimizations) support
(Josh Poimboeuf)
- Misc fixes and cleanups (Borislav Petkov, Chen Ni, Dylan Hatch, Ingo
Molnar, John Wang, Josh Poimboeuf, Pankaj Raghav, Peter Zijlstra,
Thorsten Blum)
* tag 'objtool-core-2025-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (129 commits)
objtool: Fix segfault on unknown alternatives
objtool: Build with disassembly can fail when including bdf.h
objtool: Trim trailing NOPs in alternative
objtool: Add wide output for disassembly
objtool: Compact output for alternatives with one instruction
objtool: Improve naming of group alternatives
objtool: Add Function to get the name of a CPU feature
objtool: Provide access to feature and flags of group alternatives
objtool: Fix address references in alternatives
objtool: Disassemble jump table alternatives
objtool: Disassemble exception table alternatives
objtool: Print addresses with alternative instructions
objtool: Disassemble group alternatives
objtool: Print headers for alternatives
objtool: Preserve alternatives order
objtool: Add the --disas=<function-pattern> action
objtool: Do not validate IBT for .return_sites and .call_sites
objtool: Improve tracing of alternative instructions
objtool: Add functions to better name alternatives
objtool: Identify the different types of alternatives
...
…x/kernel/git/tip/tip
Pull performance events updates from Ingo Molnar:
"Callchain support:
- Add support for deferred user-space stack unwinding for perf,
enabled on x86. (Peter Zijlstra, Steven Rostedt)
- unwind_user/x86: Enable frame pointer unwinding on x86 (Josh
Poimboeuf)
x86 PMU support and infrastructure:
- x86/insn: Simplify for_each_insn_prefix() (Peter Zijlstra)
- x86/insn,uprobes,alternative: Unify insn_is_nop() (Peter Zijlstra)
Intel PMU driver:
- Large series to prepare for and implement architectural PEBS
support for Intel platforms such as Clearwater Forest (CWF) and
Panther Lake (PTL). (Dapeng Mi, Kan Liang)
- Check dynamic constraints (Kan Liang)
- Optimize PEBS extended config (Peter Zijlstra)
- cstates:
- Remove PC3 support from LunarLake (Zhang Rui)
- Add Pantherlake support (Zhang Rui)
- Clearwater Forest support (Zide Chen)
AMD PMU driver:
- x86/amd: Check event before enable to avoid GPF (George Kennedy)
Fixes and cleanups:
- task_work: Fix NMI race condition (Peter Zijlstra)
- perf/x86: Fix NULL event access and potential PEBS record loss
(Dapeng Mi)
- Misc other fixes and cleanups (Dapeng Mi, Ingo Molnar, Peter
Zijlstra)"
* tag 'perf-core-2025-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits)
perf/x86/intel: Fix and clean up intel_pmu_drain_arch_pebs() type use
perf/x86/intel: Optimize PEBS extended config
perf/x86/intel: Check PEBS dyn_constraints
perf/x86/intel: Add a check for dynamic constraints
perf/x86/intel: Add counter group support for arch-PEBS
perf/x86/intel: Setup PEBS data configuration and enable legacy groups
perf/x86/intel: Update dyn_constraint base on PEBS event precise level
perf/x86/intel: Allocate arch-PEBS buffer and initialize PEBS_BASE MSR
perf/x86/intel: Process arch-PEBS records or record fragments
perf/x86/intel/ds: Factor out PEBS group processing code to functions
perf/x86/intel/ds: Factor out PEBS record processing code to functions
perf/x86/intel: Initialize architectural PEBS
perf/x86/intel: Correct large PEBS flag check
perf/x86/intel: Replace x86_pmu.drain_pebs calling with static call
perf/x86: Fix NULL event access and potential PEBS record loss
perf/x86: Remove redundant is_x86_event() prototype
entry,unwind/deferred: Fix unwind_reset_info() placement
unwind_user/x86: Fix arch=um build
perf: Support deferred user unwind
unwind_user/x86: Teach FP unwind about start of function
...
…ux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
"Scalability and load-balancing improvements:
- Enable scheduler feature NEXT_BUDDY (Mel Gorman)
- Reimplement NEXT_BUDDY to align with EEVDF goals (Mel Gorman)
- Skip sched_balance_running cmpxchg when balance is not due (Tim
Chen)
- Implement generic code for architecture specific sched domain NUMA
distances (Tim Chen)
- Optimize the NUMA distances of the sched-domains builds of Intel
Granite Rapids (GNR) and Clearwater Forest (CWF) platforms (Tim
Chen)
- Implement proportional newidle balance: a randomized algorithm that
runs newidle balancing proportional to its success rate. (Peter
Zijlstra)
Scheduler infrastructure changes:
- Implement the 'sched_change' scoped_guard() pattern for the entire
scheduler (Peter Zijlstra)
- More broadly utilize the sched_change guard (Peter Zijlstra)
- Add support to pick functions to take runqueue-flags (Joel
Fernandes)
- Provide and use set_need_resched_current() (Peter Zijlstra)
Fair scheduling enhancements:
- Forfeit vruntime on yield (Fernand Sieber)
- Only update stats for allowed CPUs when looking for dst group (Adam
Li)
CPU-core scheduling enhancements:
- Optimize core cookie matching check (Fernand Sieber)
Deadline scheduler fixes:
- Only set free_cpus for online runqueues (Doug Berger)
- Fix dl_server time accounting (Peter Zijlstra)
- Fix dl_server stop condition (Peter Zijlstra)
Proxy scheduling fixes:
- Yield the donor task (Fernand Sieber)
Fixes and cleanups:
- Fix do_set_cpus_allowed() locking (Peter Zijlstra)
- Fix migrate_disable_switch() locking (Peter Zijlstra)
- Remove double update_rq_clock() in __set_cpus_allowed_ptr_locked()
(Hao Jia)
- Increase sched_tick_remote timeout (Phil Auld)
- sched/deadline: Use cpumask_weight_and() in dl_bw_cpus() (Shrikanth
Hegde)
- sched/deadline: Clean up select_task_rq_dl() (Shrikanth Hegde)"
* tag 'sched-core-2025-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits)
sched: Provide and use set_need_resched_current()
sched/fair: Proportional newidle balance
sched/fair: Small cleanup to update_newidle_cost()
sched/fair: Small cleanup to sched_balance_newidle()
sched/fair: Revert max_newidle_lb_cost bump
sched/fair: Reimplement NEXT_BUDDY to align with EEVDF goals
sched/fair: Enable scheduler feature NEXT_BUDDY
sched: Increase sched_tick_remote timeout
sched/fair: Have SD_SERIALIZE affect newidle balancing
sched/fair: Skip sched_balance_running cmpxchg when balance is not due
sched/deadline: Minor cleanup in select_task_rq_dl()
sched/deadline: Use cpumask_weight_and() in dl_bw_cpus
sched/deadline: Document dl_server
sched/deadline: Fix dl_server stop condition
sched/deadline: Fix dl_server time accounting
sched/core: Remove double update_rq_clock() in __set_cpus_allowed_ptr_locked()
sched/eevdf: Fix min_vruntime vs avg_vruntime
sched/core: Add comment explaining force-idle vruntime snapshots
sched/core: Optimize core cookie matching check
sched/proxy: Yield the donor task
...
…/kernel/git/tip/tip
Pull x86 apic updates from Ingo Molnar:
- x86/apic: Fix the frequency in apic=verbose log output (Julian Stecklina)
- Simplify mp_irqdomain_alloc() slightly (Christophe JAILLET)
* tag 'x86-apic-2025-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/apic: Fix frequency in apic=verbose log output
x86/ioapic: Simplify mp_irqdomain_alloc() slightly
…x/kernel/git/tip/tip
Pull x86 math-emu fix from Ingo Molnar:
"A single fix for an ancient prototype in the math-emu code, by Arnd Bergmann"
* tag 'x86-build-2025-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/math-emu: Fix div_Xsig() prototype
…/kernel/git/tip/tip
Pull core x86 updates from Ingo Molnar:
- x86/alternatives: Drop unnecessary test after call to alt_replace_call() (Juergen Gross)
- x86/dumpstack: Prevent KASAN false positive warnings in __show_regs() (Tengda Wu)
* tag 'x86-core-2025-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/dumpstack: Prevent KASAN false positive warnings in __show_regs()
x86/alternative: Drop not needed test after call of alt_replace_call()
…x/kernel/git/tip/tip
Pull bug handling infrastructure updates from Ingo Molnar:
"Core updates:
- Improve WARN(), which has vararg printf like arguments, to work
with the x86 #UD based WARN-optimizing infrastructure by hiding the
format in the bug_table and replacing this first argument with the
address of the bug-table entry, while making the actual function
that's called a UD1 instruction (Peter Zijlstra)
- Introduce the CONFIG_DEBUG_BUGVERBOSE_DETAILED Kconfig switch (Ingo
Molnar, s390 support by Heiko Carstens)
Fixes and cleanups:
- bugs/s390: Remove private WARN_ON() implementation (Heiko Carstens)
- <asm/bugs.h>: Make i386 use GENERIC_BUG_RELATIVE_POINTERS (Peter
Zijlstra)"
* tag 'core-bugs-2025-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits)
x86/bugs: Make i386 use GENERIC_BUG_RELATIVE_POINTERS
x86/bug: Fix BUG_FORMAT vs KASLR
x86_64/bug: Inline the UD1
x86/bug: Implement WARN_ONCE()
x86_64/bug: Implement __WARN_printf()
x86/bug: Use BUG_FORMAT for DEBUG_BUGVERBOSE_DETAILED
x86/bug: Add BUG_FORMAT basics
bug: Allow architectures to provide __WARN_printf()
bug: Implement WARN_ON() using __WARN_FLAGS()
bug: Add report_bug_entry()
bug: Add BUG_FORMAT_ARGS infrastructure
bug: Clean up CONFIG_GENERIC_BUG_RELATIVE_POINTERS
bug: Add BUG_FORMAT infrastructure
x86: Rework __bug_table helpers
bugs/s390: Remove private WARN_ON() implementation
bugs/core: Reorganize fields in the first line of WARNING output, add ->comm[] output
bugs/sh: Concatenate 'cond_str' with '__FILE__' in __WARN_FLAGS(), to extend WARN_ON/BUG_ON output
bugs/parisc: Concatenate 'cond_str' with '__FILE__' in __WARN_FLAGS(), to extend WARN_ON/BUG_ON output
bugs/riscv: Concatenate 'cond_str' with '__FILE__' in __BUG_FLAGS(), to extend WARN_ON/BUG_ON output
bugs/riscv: Pass in 'cond_str' to __BUG_FLAGS()
...