Skip to content

Conversation

@sunflower2333
Copy link
Owner

No description provided.

blitz and others added 30 commits November 7, 2025 17:48
When apic=verbose is specified, the LAPIC timer calibration prints its results
to the console. At least while debugging virtualization code, the CPU and bus
frequencies are printed incorrectly.

Specifically, for a 1.7 GHz CPU with 1 GHz bus frequency and HZ=1000,
the log includes a superfluous 0 after the period:

  ..... calibration result: 999978
  ..... CPU clock speed is 1696.0783 MHz.
  ..... host bus clock speed is 999.0978 MHz.

Looking at the code, this only worked as intended for HZ=100. After the fix,
the correct frequency is printed:

  ..... calibration result: 999828
  ..... CPU clock speed is 1696.507 MHz.
  ..... host bus clock speed is 999.828 MHz.

There is no functional change to the LAPIC calibration here, beyond the
printing format changes.

  [ bp: - Massage commit message
        - Figures it should apply this patch about ~4 years later
        - Massage it into the current code ]

Suggested-by: Markus Napierkowski <[email protected]>
Signed-off-by: Julian Stecklina <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://patch.msgid.link/[email protected]
The third argument of div_Xsig() is the output of the division, but is marked
'const', which means the compiler is not expecting it to be updated and may
generate bad code around the call. clang-21 now warns about the pattern since
an uninitialized variable is passed into two 'const' arguments by reference:

  arch/x86/math-emu/poly_atan.c:93:28: error: variable 'argSignif' is uninitialized \
  when passed as a const pointer argument here [-Werror,-Wuninitialized-const-pointer]
     93 |         div_Xsig(&Numer, &Denom, &argSignif);
        |                                   ^~~~~~~~~
  arch/x86/math-emu/poly_l2.c:195:29: error: variable 'argSignif' is uninitialized \
  when passed as a const pointer argument here [-Werror,-Wuninitialized-const-pointer]
    195 |                 div_Xsig(&Numer, &Denom, &argSignif);
        |                                           ^~~~~~~~~

The implementation is in assembly, so the problem has gone unnoticed since the
code was added in the linux-1.1 days. Remove the 'const' marker here.

Fixes: e19a1bd ("Import 1.1.38")
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://patch.msgid.link/[email protected]
Copy from

  54da6a0 ("locking: Introduce __cleanup() based infrastructure")

the bits which mark the variable with a cleanup attribute unused so that my
clang 15 can dispose of it properly instead of warning that it is unused which
then fails the build due to -Werror.

Suggested-by: Nathan Chancellor <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>
Link: https://lore.kernel.org/r/20251031114919.GBaQSiPxZrziOs3RCW@fat_crate.local
When executing a task in proxy context, handle yields as if they were
requested by the donor task. This matches the traditional PI semantics
of yield() as well.

This avoids scenario like proxy task yielding, pick next task selecting the
same previous blocked donor, running the proxy task again, etc.

Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-lkp/[email protected]
Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Fernand Sieber <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://patch.msgid.link/[email protected]
Early return true if the core cookie matches. This avoids the SMT mask
loop to check for an idle core, which might be more expensive on wide
platforms.

Signed-off-by: Fernand Sieber <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: K Prateek Nayak <[email protected]>
Reviewed-by: Madadi Vineeth Reddy <[email protected]>
Link: https://patch.msgid.link/[email protected]
I always end up having to re-read these emails every time I look at
this code. And a future patch is going to change this story a little.
This means it is past time to stick them in a comment so it can be
modified and stay current.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://patch.msgid.link/[email protected]
Basically, from the constraint that the sum of lag is zero, you can
infer that the 0-lag point is the weighted average of the individual
vruntime, which is what we're trying to compute:

        \Sum w_i * v_i
  avg = --------------
           \Sum w_i

Now, since vruntime takes the whole u64 (worse, it wraps), this
multiplication term in the numerator is not something we can compute;
instead we do the min_vruntime (v0 henceforth) thing like:

  v_i = (v_i - v0) + v0

This does two things:
 - it keeps the key: (v_i - v0) 'small';
 - it creates a relative 0-point in the modular space.

If you do that subtitution and work it all out, you end up with:

        \Sum w_i * (v_i - v0)
  avg = --------------------- + v0
              \Sum w_i

Since you cannot very well track a ratio like that (and not suffer
terrible numerical problems) we simpy track the numerator and
denominator individually and only perform the division when strictly
needed.

Notably, the numerator lives in cfs_rq->avg_vruntime and the denominator
lives in cfs_rq->avg_load.

The one extra 'funny' is that these numbers track the entities in the
tree, and current is typically outside of the tree, so avg_vruntime()
adds current when needed before doing the division.

(vruntime_eligible() elides the division by cross-wise multiplication)

Anyway, as mentioned above, we currently use the CFS era min_vruntime
for this purpose. However, this thing can only move forward, while the
above avg can in fact move backward (when a non-eligible task leaves,
the average becomes smaller), this can cause trouble when through
happenstance (or construction) these values drift far enough apart to
wreck the game.

Replace cfs_rq::min_vruntime with cfs_rq::zero_vruntime which is kept
near/at avg_vruntime, following its motion.

The down-side is that this requires computing the avg more often.

Fixes: 147f3ef ("sched/fair: Implement an EEVDF-like scheduling policy")
Reported-by: Zicheng Qu <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://patch.msgid.link/[email protected]
Cc: [email protected]
…_locked()

Since commit d4c6420 ("sched: Cleanup the sched_change NOCLOCK usage"),
update_rq_clock() is called in do_set_cpus_allowed() -> sched_change_begin()
to update the rq clock. This results in a duplicate call update_rq_clock()
in __set_cpus_allowed_ptr_locked().

While holding the rq lock and before calling do_set_cpus_allowed(),
there is nothing that depends on an updated rq_clock.

Therefore, remove the redundant update_rq_clock() in
__set_cpus_allowed_ptr_locked() to avoid the warning about double
rq clock updates.

Fixes: d4c6420 ("sched: Cleanup the sched_change NOCLOCK usage")
Signed-off-by: Hao Jia <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: K Prateek Nayak <[email protected]>
Link: https://patch.msgid.link/[email protected]
The dl_server time accounting code is a little odd. The normal scheduler
pattern is to update curr before doing something, such that the old state is
fully accounted before changing state.

Notably, the dl_server_timer() needs to propagate the current time accounting
since the current task could be ran by dl_server and thus this can affect
dl_se->runtime. Similarly for dl_server_start().

And since the (deferred) dl_server wants idle time accounted, rework
sched_idle_class time accounting to be more like all the others.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://patch.msgid.link/[email protected]
Gabriel reported that the dl_server doesn't stop as expected.

The problem was found to be the fact that idle time and fair runtime are
treated equally. Both will count towards dl_server runtime and push the
activation forwards when it is in the zero-laxity wait state.

Notably:

  dl_server_update_idle()
    update_curr_dl_se()
      if (dl_defer && dl_throttled && dl_runtime_exceeded())
        hrtimer_try_to_cancel(); // stop timer
	replenish_dl_new_period()
	  deadline = now + dl_deadline; // fwd period
	  runtime = dl_runtime;
        start_dl_timer(); // restart timer

And while we do want idle time accounted towards the *current* activation of
the dl_server -- after all, a fair task could've ran if we had any -- we don't
necessarily want idle time to cause or push forward an activation.

Introduce dl_defer_idle to make this distinction. It will be set once idle time
pushed the activation forward, once set idle time will only be allowed to
consume any runtime but not push the activation. This will then cause
dl_server_timer() to fire, which will stop the dl_server.

Any non-idle time accounting during this phase will clear dl_defer_idle, so
only a full period of idle will cause the dl_server to stop.

Reported-by: Gabriele Monaco <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://patch.msgid.link/[email protected]
Place the notes that resulted from going through the dl_server code in a
comment.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
cpumask_subset(a,b) -> cpumask_weight(a) should be same as cpumask_weight_and(a,b)
for_each_cpu_and(a,b) to count cpus could be replaced by cpumask_weight_and(a,b)

No Functional Change. It could save a few cycles since cpumask_weight_and
would be more efficient. Plus one less stack variable.

Signed-off-by: Shrikanth Hegde <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Juri Lelli <[email protected]>
Link: https://patch.msgid.link/[email protected]
In select_task_rq_dl, there is only one goto statement, there is no
need for it.

No functional changes.

Signed-off-by: Shrikanth Hegde <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Juri Lelli <[email protected]>
Link: https://patch.msgid.link/[email protected]
__break_lease() currently overrides the flc_flags field in the lease
after allocating it. A forthcoming patch will add the ability to request
a FL_DELEG type lease.

Instead of overriding the flags field, add a flags argument to
lease_alloc() and lease_init() so it's set correctly after allocating.

Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: NeilBrown <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
Currently __break_lease takes both a type and an openmode. With the
addition of directory leases, that makes less sense. Declare a set of
LEASE_BREAK_* flags that can be used to control how lease breaks work
instead of requiring a type and an openmode.

Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: NeilBrown <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
The current API requires a pointer to an inode pointer. It's easy for
callers to get this wrong. Add a new delegated_inode structure and use
that to pass back any inode that needs to be waited on.

Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: NeilBrown <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
When nfsd starts requesting directory delegations, setlease handlers may
see requests for leases on directories. Push the !S_ISREG check down
into the non-trivial setlease handlers, so we can selectively enable
them where they're supported.

FUSE is special: It's the only filesystem that supports atomic_open and
allows kernel-internal leases. atomic_open is issued when the VFS
doesn't know the state of the dentry being opened. If the file doesn't
exist, it may be created, in which case the dir lease should be broken.

The existing kernel-internal lease implementation has no provision for
this. Ensure that we don't allow directory leases by default going
forward by explicitly disabling them there.

Reviewed-by: NeilBrown <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
In order to add directory delegation support, we need to break
delegations on the parent whenever there is going to be a change in the
directory.

vfs_link, vfs_unlink, and vfs_rename all have existing delegation break
handling for the children in the rename. Add the necessary calls for
breaking delegations in the parent(s) as well.

Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: NeilBrown <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
In order to add directory delegation support, we need to break
delegations on the parent whenever there is going to be a change in the
directory.

Add a new delegated_inode parameter to vfs_mkdir. All of the existing
callers set that to NULL for now, except for do_mkdirat which will
properly block until the lease is gone.

Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: NeilBrown <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
In order to add directory delegation support, we need to break
delegations on the parent whenever there is going to be a change in the
directory.

Add a delegated_inode struct to vfs_rmdir() and populate that
pointer with the parent inode if it's non-NULL. Most existing in-kernel
callers pass in a NULL pointer.

Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: NeilBrown <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
In order to add directory delegation support, we need to break
delegations on the parent whenever there is going to be a change in the
directory.

Add a delegated_inode parameter to lookup_open and have it break the
delegation. Then, open_last_lookups can wait for the delegation break
and retry the call to lookup_open once it's done.

Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: NeilBrown <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
As Neil points out:

"I would be in favour of dropping the "dir" arg because it is always
d_inode(dentry->d_parent) which is stable."

...and...

"Also *every* caller of vfs_create() passes ".excl = true".  So maybe we
don't need that arg at all."

Drop both arguments from vfs_create() and fix up the callers.

Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: NeilBrown <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
In order to add directory delegation support, we need to break
delegations on the parent whenever there is going to be a change in the
directory.

Add a delegated_inode parameter to vfs_create. Most callers are
converted to pass in NULL, but do_mknodat() is changed to wait for a
delegation break if there is one.

Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: NeilBrown <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
In order to add directory delegation support, we need to break
delegations on the parent whenever there is going to be a change in the
directory.

Add a new delegated_inode pointer to vfs_mknod() and have the
appropriate callers wait when there is an outstanding delegation. All
other callers just set the pointer to NULL.

Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: NeilBrown <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
In order to add directory delegation support, we must break delegations
on the parent on any change to the directory.

Add a delegated_inode parameter to vfs_symlink() and have it break the
delegation. do_symlinkat() can then wait on the delegation break before
proceeding.

Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: NeilBrown <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
With the addition of the try_break_lease calls in directory changing
operations, allow generic_setlease to hand them out. Write leases on
directories are never allowed however, so continue to reject them.

For now, there is no API for requesting delegations from userland, so
ensure that userland is prevented from acquiring a lease on a directory.

Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: NeilBrown <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
The filecache infrastructure will only handle S_IFREG files at the
moment. Directory delegations will require adding support for opening
S_IFDIR inodes.

Plumb a "type" argument into nfsd_file_do_acquire() and have all of the
existing callers set it to S_IFREG. Add a new nfsd_file_acquire_dir()
wrapper that nfsd can call to request a nfsd_file that holds a directory
open.

For now, there is no need for a fsnotify_mark for directories, as
CB_NOTIFY is not yet supported. Change nfsd_file_do_acquire() to avoid
allocating one for non-S_IFREG inodes.

Reviewed-by: Chuck Lever <[email protected]>
Reviewed-by: NeilBrown <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
As Trond pointed out: "...provided that the presented stateid is
actually valid, it is also sufficient to uniquely identify the file to
which it is associated (see RFC8881 Section 8.2.4), so the filehandle
should be considered mostly irrelevant for operations like DELEGRETURN."

Don't ask fh_verify to filter on file type.

Reviewed-by: Chuck Lever <[email protected]>
Reviewed-by: NeilBrown <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
Add a new routine for acquiring a read delegation on a directory. These
are recallable-only delegations with no support for CB_NOTIFY. That will
be added in a later phase.

Since the same CB_RECALL/DELEGRETURN infrastructure is used for regular
and directory delegations, a normal nfs4_delegation is used to represent
a directory delegation.

Reviewed-by: NeilBrown <[email protected]>
Reviewed-by: Chuck Lever <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
Now that support for recallable directory delegations is available,
expose this functionality to userland with new F_SETDELEG and F_GETDELEG
commands for fcntl().

Note that this also allows userland to request a FL_DELEG type lease on
files too. Userland applications that do will get signalled when there
are metadata changes in addition to just data changes (which is a
limitation of FL_LEASE leases).

These commands accept a new "struct delegation" argument that contains a
flags field for future expansion.

Signed-off-by: Jeff Layton <[email protected]>
Link: https://patch.msgid.link/[email protected]
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
brauner and others added 29 commits November 28, 2025 12:42
Christian Brauner <[email protected]> says:

The fix sent in [1] was squashed into this commit.

Fixes: https://lore.kernel.org/[email protected] [1]
Reported-by: Mark Brown <[email protected]> [1]
Suggested-by: Linus Torvalds <[email protected]> [1]
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
Christian Brauner <[email protected]> says:

This now removes roughly double the code that it adds.

I've been playing with this to allow for moderately flexible usage of
the get_unused_fd_flags() + create file + fd_install() pattern that's
used quite extensively and requires cumbersome cleanup paths.

How callers allocate files is really heterogenous so it's not really
convenient to fold them into a single class. It's possibe to split them
into subclasses like for anon inodes. I think that's not necessarily
nice as well. This adds two primitives:

(1) FD_ADD() the simple cases a file is installed:

    fd = FD_ADD(O_CLOEXEC, vfio_device_open_file(device));
    if (fd < 0)
            vfio_device_put_registration(device);
    return fd;

(2) FD_PREPARE() that captures all the cases where access to fd or file
    or additional work before publishing the fd is needed:

    FD_PREPARE(fdf, O_CLOEXEC, sync_file->file);
    if (fdf.err) {
            fput(sync_file->file);
            return fdf.err;
    }

    data.fence = fd_prepare_fd(fdf);
    if (copy_to_user((void __user *)arg, &data, sizeof(data)))
            return -EFAULT;

    return fd_publish(fdf);

I've converted all of the easy cases over to it and it gets rid of an
aweful lot of convoluted cleanup logic. There are a bunch of other cases
that can also be converted after a bit of massaging.

It's centered around a simple struct. FD_PREPARE() encapsulates all of
allocation and cleanup logic and must be followed by a call to
fd_publish() which associates the fd with the file and installs it into
the callers fdtable. If fd_publish() isn't called both are deallocated.
FD_ADD() is a shorthand that does the fd_publish() and never exposes the
struct to the caller. That's often the case when they don't need access
to anything after installing the fd.

It mandates a specific order namely that first we allocate the fd and
then instantiate the file. But that shouldn't be a problem. Nearly
everyone I've converted used this order anyway.

There's a bunch of additional cases where it would be easy to convert
them to this pattern. For example, the whole sync file stuff in dma
currently returns the containing structure of the file instead of the
file itself even though it's only used to allocate files. Changing that
would make it fall into the FD_PREPARE() pattern easily. I've not done
that work yet.

There's room for extending this in a way that wed'd have subclasses for
some particularly often use patterns but as I said I'm not even sure
that's worth it.

* patches from https://patch.msgid.link/[email protected]: (47 commits)
  kvm: convert kvm_vcpu_ioctl_get_stats_fd() to FD_PREPARE()
  kvm: convert kvm_arch_supports_gmem_init_shared() to FD_PREPARE()
  io_uring: convert io_create_mock_file() to FD_PREPARE()
  file: convert replace_fd() to FD_PREPARE()
  vfio: convert vfio_group_ioctl_get_device_fd() to FD_PREPARE()
  tty: convert ptm_open_peer() to FD_PREPARE()
  ntsync: convert ntsync_obj_get_fd() to FD_PREPARE()
  media: convert media_request_alloc() to FD_PREPARE()
  hv: convert mshv_ioctl_create_partition() to FD_PREPARE()
  gpio: convert linehandle_create() to FD_PREPARE()
  dma: port sw_sync_ioctl_create_fence() to FD_PREPARE()
  pseries: port papr_rtas_setup_file_interface() to FD_PREPARE()
  pseries: convert papr_platform_dump_create_handle() to FD_PREPARE()
  spufs: convert spufs_gang_open() to FD_PREPARE()
  papr-hvpipe: convert papr_hvpipe_dev_create_handle() to FD_PREPARE()
  spufs: convert spufs_context_open() to FD_PREPARE()
  net/socket: convert __sys_accept4_file() to FD_PREPARE()
  net/socket: convert sock_map_fd() to FD_PREPARE()
  net/sctp: convert sctp_getsockopt_peeloff_common() to FD_PREPARE()
  net/kcm: convert kcm_ioctl() to FD_PREPARE()
  ...

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
mutex_init() invokes __mutex_init() providing the name of the lock and
a pointer to a the lock class. With LOCKDEP enabled this information is
useful but without LOCKDEP it not used at all. Passing the pointer
information of the lock class might be considered negligible but the
name of the lock is passed as well and the string is stored. This
information is wasting storage.

Split __mutex_init() into a _genereic() variant doing the initialisation
of the lock and a _lockdep() version which does _genereic() plus the
lockdep bits. Restrict the lockdep version to lockdep enabled builds
allowing the compiler to remove the unused parameter.

This results in the following size reduction:

        text     data       bss        dec  filename
  | 30237599  8161430   1176624   39575653  vmlinux.defconfig
  | 30233269  8149142   1176560   39558971  vmlinux.defconfig.patched
     -4.2KiB   -12KiB

  | 32455099  8471098  12934684   53860881  vmlinux.defconfig.lockdep
  | 32455100  8471098  12934684   53860882  vmlinux.defconfig.patched.lockdep

  | 27152407  7191822   2068040   36412269  vmlinux.defconfig.preempt_rt
  | 27145937  7183630   2067976   36397543  vmlinux.defconfig.patched.preempt_rt
     -6.3KiB    -8KiB

  | 29382020  7505742  13784608   50672370  vmlinux.defconfig.preempt_rt.lockdep
  | 29376229  7505742  13784544   50666515  vmlinux.defconfig.patched.preempt_rt.lockdep
     -5.6KiB

[peterz: folded fix from boqun]

Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Boqun Feng <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Reviewed-by: Waiman Long <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Link: https://patch.msgid.link/[email protected]
The local_lock_t was never added to the MAINTAINERS file since its
inclusion.

Add local_lock_t to the locking primitives section.

Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Waiman Long <[email protected]>
Link: https://patch.msgid.link/[email protected]
…dowing

The Linux kernel coding style advises to avoid common variable
names in function-like macros to reduce the risk of namespace
collisions.

Throughout local_lock_internal.h, several macros use the rather common
variable names 'l' and 'tl'. This already resulted in an actual
collision: the __local_lock_acquire() function like macro is currently
shadowing the parameter 'l' of the:

  class_##_name##_t class_##_name##_constructor(_type *l)

function factory from <linux/cleanup.h>.

Rename the variable 'l' to '__l' and the variable 'tl' to '__tl'
throughout the file to fix the current namespace collision and
to prevent future ones.

[ bigeasy: Rebase, update all l and tl instances in macros ]

Signed-off-by: Vincent Mailhol <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Waiman Long <[email protected]>
Link: https://patch.msgid.link/[email protected]
Modify kernel-doc comments in local_lock.h to prevent warnings:

  Warning: include/linux/local_lock.h:9 function parameter 'lock' not described in 'local_lock_init'
  Warning: include/linux/local_lock.h:56 function parameter 'lock' not described in 'local_trylock_init'
  Warning: include/linux/local_lock.h:56 expecting prototype for local_lock_init(). Prototype was for local_trylock_init() instead

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://patch.msgid.link/[email protected]
So 'objtool --link -d vmlinux.o' gets surprised by this endbr64+endbr64 pattern
in ___bpf_prog_run():

	___bpf_prog_run:
	1e7680:  ___bpf_prog_run+0x0                                                     push   %r12
	1e7682:  ___bpf_prog_run+0x2                                                     mov    %rdi,%r12
	1e7685:  ___bpf_prog_run+0x5                                                     push   %rbp
	1e7686:  ___bpf_prog_run+0x6                                                     xor    %ebp,%ebp
	1e7688:  ___bpf_prog_run+0x8                                                     push   %rbx
	1e7689:  ___bpf_prog_run+0x9                                                     mov    %rsi,%rbx
	1e768c:  ___bpf_prog_run+0xc                                                     movzbl (%rbx),%esi
	1e768f:  ___bpf_prog_run+0xf                                                     movzbl %sil,%edx
	1e7693:  ___bpf_prog_run+0x13                                                    mov    %esi,%eax
	1e7695:  ___bpf_prog_run+0x15                                                    mov    0x0(,%rdx,8),%rdx
	1e769d:  ___bpf_prog_run+0x1d                                                    jmp    0x1e76a2 <__x86_indirect_thunk_rdx>
	1e76a2:  ___bpf_prog_run+0x22                                                    endbr64
	1e76a6:  ___bpf_prog_run+0x26                                                    endbr64
	1e76aa:  ___bpf_prog_run+0x2a                                                    mov    0x4(%rbx),%edx

And crashes due to blindly dereferencing alt->insn->alt_group.

Bail out on NULL ->alt_group, which produces this warning and continues
with the disassembly, instead of a segfault:

  .git/O/vmlinux.o: warning: objtool: <alternative.1e769d>: failed to disassemble alternative

Cc: Alexandre Chartre <[email protected]>
Cc: Peter Zijlstra (Intel) <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>
…g/pub/scm/linux/kernel/git/vfs/vfs

Pull directory delegations update from Christian Brauner:
 "This contains the work for recall-only directory delegations for
  knfsd.

  Add support for simple, recallable-only directory delegations. This
  was decided at the fall NFS Bakeathon where the NFS client and server
  maintainers discussed how to merge directory delegation support.

  The approach starts with recallable-only delegations for several reasons:

   1. RFC8881 has gaps that are being addressed in RFC8881bis. In
      particular, it requires directory position information for
      CB_NOTIFY callbacks, which is difficult to implement properly
      under Linux. The spec is being extended to allow that information
      to be omitted.

   2. Client-side support for CB_NOTIFY still lags. The client side
      involves heuristics about when to request a delegation.

   3. Early indication shows simple, recallable-only delegations can
      help performance. Anna Schumaker mentioned seeing a multi-minute
      speedup in xfstests runs with them enabled.

  With these changes, userspace can also request a read lease on a
  directory that will be recalled on conflicting accesses. This may be
  useful for applications like Samba. Users can disable leases
  altogether via the fs.leases-enable sysctl if needed.

  VFS changes:

   - Dedicated Type for Delegations

     Introduce struct delegated_inode to track inodes that may have
     delegations that need to be broken. This replaces the previous
     approach of passing raw inode pointers through the delegation
     breaking code paths, providing better type safety and clearer
     semantics for the delegation machinery.

   - Break parent directory delegations in open(..., O_CREAT) codepath

   - Allow mkdir to wait for delegation break on parent

   - Allow rmdir to wait for delegation break on parent

   - Add try_break_deleg calls for parents to vfs_link(), vfs_rename(),
     and vfs_unlink()

   - Make vfs_create(), vfs_mknod(), and vfs_symlink() break delegations
     on parent directory

   - Clean up argument list for vfs_create()

   - Expose delegation support to userland

  Filelock changes:

   - Make lease_alloc() take a flags argument

   - Rework the __break_lease API to use flags

   - Add struct delegated_inode

   - Push the S_ISREG check down to ->setlease handlers

   - Lift the ban on directory leases in generic_setlease

  NFSD changes:

   - Allow filecache to hold S_IFDIR files

   - Allow DELEGRETURN on directories

   - Wire up GET_DIR_DELEGATION handling

  Fixes:

   - Fix kernel-doc warnings in __fcntl_getlease

   - Add needed headers for new struct delegation definition"

* tag 'vfs-6.19-rc1.directory.delegations' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  vfs: add needed headers for new struct delegation definition
  filelock: __fcntl_getlease: fix kernel-doc warnings
  vfs: expose delegation support to userland
  nfsd: wire up GET_DIR_DELEGATION handling
  nfsd: allow DELEGRETURN on directories
  nfsd: allow filecache to hold S_IFDIR files
  filelock: lift the ban on directory leases in generic_setlease
  vfs: make vfs_symlink break delegations on parent dir
  vfs: make vfs_mknod break delegations on parent directory
  vfs: make vfs_create break delegations on parent directory
  vfs: clean up argument list for vfs_create()
  vfs: break parent dir delegations in open(..., O_CREAT) codepath
  vfs: allow rmdir to wait for delegation break on parent
  vfs: allow mkdir to wait for delegation break on parent
  vfs: add try_break_deleg calls for parents to vfs_{link,rename,unlink}
  filelock: push the S_ISREG check down to ->setlease handlers
  filelock: add struct delegated_inode
  filelock: rework the __break_lease API to use flags
  filelock: make lease_alloc() take a flags argument
…b/scm/linux/kernel/git/vfs/vfs

Pull directory locking updates from Christian Brauner:
 "This contains the work to add centralized APIs for directory locking
  operations.

  This series is part of a larger effort to change directory operation
  locking to allow multiple concurrent operations in a directory. The
  ultimate goal is to lock the target dentry(s) rather than the whole
  parent directory.

  To help with changing the locking protocol, this series centralizes
  locking and lookup in new helper functions. The helpers establish a
  pattern where it is the dentry that is being locked and unlocked
  (currently the lock is held on dentry->d_parent->d_inode, but that can
  change in the future).

  This also changes vfs_mkdir() to unlock the parent on failure, as well
  as dput()ing the dentry. This allows end_creating() to only require
  the target dentry (which may be IS_ERR() after vfs_mkdir()), not the
  parent"

* tag 'vfs-6.19-rc1.directory.locking' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  nfsd: fix end_creating() conversion
  VFS: introduce end_creating_keep()
  VFS: change vfs_mkdir() to unlock on failure.
  ecryptfs: use new start_creating/start_removing APIs
  Add start_renaming_two_dentries()
  VFS/ovl/smb: introduce start_renaming_dentry()
  VFS/nfsd/ovl: introduce start_renaming() and end_renaming()
  VFS: add start_creating_killable() and start_removing_killable()
  VFS: introduce start_removing_dentry()
  smb/server: use end_removing_noperm for for target of smb2_create_link()
  VFS: introduce start_creating_noperm() and start_removing_noperm()
  VFS/nfsd/cachefiles/ovl: introduce start_removing() and end_removing()
  VFS/nfsd/cachefiles/ovl: add start_creating() and end_creating()
  VFS: tidy up do_unlinkat()
  VFS: introduce start_dirop() and end_dirop()
  debugfs: rename end_creating() to debugfs_end_creating()
…rnel/git/vfs/vfs

Pull overlayfs cred guard conversion from Christian Brauner:
 "This converts all of overlayfs to use credential guards, eliminating
  manual credential management throughout the filesystem.

  Credential guard conversion:

   - Convert all of overlayfs to use credential guards, replacing the
     manual ovl_override_creds()/ovl_revert_creds() pattern with scoped
     guards.

     This makes credential handling visually explicit and eliminates a
     class of potential bugs from mismatched override/revert calls.

     (1) Basic credential guard (with_ovl_creds)
     (2) Creator credential guard (ovl_override_creator_creds):

         Introduced a specialized guard for file creation operations
         that handles the two-phase credential override (mounter
         credentials, then fs{g,u}id override). The new pattern is much
         clearer:

         with_ovl_creds(dentry->d_sb) {
                 scoped_class(prepare_creds_ovl, cred, dentry, inode, mode) {
                         if (IS_ERR(cred))
                                 return PTR_ERR(cred);
                         /* creation operations */
                 }
         }

     (3) Copy-up credential guard (ovl_cu_creds):

         Introduced a specialized guard for copy-up operations,
         simplifying the previous struct ovl_cu_creds helper and
         associated functions.

         Ported ovl_copy_up_workdir() and ovl_copy_up_tmpfile() to this
         pattern.

  Cleanups:

   - Remove ovl_revert_creds() after all callers converted to guards

   - Remove struct ovl_cu_creds and associated functions

   - Drop ovl_setup_cred_for_create() after conversion

   - Refactor ovl_fill_super(), ovl_lookup(), ovl_iterate(),
     ovl_rename() for cleaner credential guard scope

   - Introduce struct ovl_renamedata to simplify rename handling

   - Don't override credentials for ovl_check_whiteouts() (unnecessary)

   - Remove unneeded semicolon"

* tag 'vfs-6.19-rc1.ovl' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (54 commits)
  ovl: remove unneeded semicolon
  ovl: remove struct ovl_cu_creds and associated functions
  ovl: port ovl_copy_up_tmpfile() to cred guard
  ovl: mark *_cu_creds() as unused temporarily
  ovl: port ovl_copy_up_workdir() to cred guard
  ovl: add copy up credential guard
  ovl: drop ovl_setup_cred_for_create()
  ovl: port ovl_create_or_link() to new ovl_override_creator_creds cleanup guard
  ovl: mark ovl_setup_cred_for_create() as unused temporarily
  ovl: reflow ovl_create_or_link()
  ovl: port ovl_create_tmpfile() to new ovl_override_creator_creds cleanup guard
  ovl: add ovl_override_creator_creds cred guard
  ovl: remove ovl_revert_creds()
  ovl: port ovl_fill_super() to cred guard
  ovl: refactor ovl_fill_super()
  ovl: port ovl_lower_positive() to cred guard
  ovl: port ovl_lookup() to cred guard
  ovl: refactor ovl_lookup()
  ovl: port ovl_copyfile() to cred guard
  ovl: port ovl_rename() to cred guard
  ...
…/kernel/git/vfs/vfs

Pull autofs update from Christian Brauner:
 "Prevent futile mount triggers in private mount namespaces.

  Fix a problematic loop in autofs when a mount namespace contains
  autofs mounts that are propagation private and there is no
  namespace-specific automount daemon to handle possible automounting.

  Previously, attempted path resolution would loop until MAXSYMLINKS was
  reached before failing, causing significant noise in the log.

 The fix adds a check in autofs ->d_automount() so that the VFS can
 immediately return EPERM in this case. Since the mount is propagation
 private, EPERM is the most appropriate error code"

* tag 'vfs-6.19-rc1.autofs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  autofs: dont trigger mount if it cant succeed
…m/linux/kernel/git/vfs/vfs

Pull fd prepare updates from Christian Brauner:
 "This adds the FD_ADD() and FD_PREPARE() primitive. They simplify the
  common pattern of get_unused_fd_flags() + create file + fd_install()
  that is used extensively throughout the kernel and currently requires
  cumbersome cleanup paths.

  FD_ADD() - For simple cases where a file is installed immediately:

      fd = FD_ADD(O_CLOEXEC, vfio_device_open_file(device));
      if (fd < 0)
          vfio_device_put_registration(device);
      return fd;

  FD_PREPARE() - For cases requiring access to the fd or file, or
  additional work before publishing:

      FD_PREPARE(fdf, O_CLOEXEC, sync_file->file);
      if (fdf.err) {
          fput(sync_file->file);
          return fdf.err;
      }

      data.fence = fd_prepare_fd(fdf);
      if (copy_to_user((void __user *)arg, &data, sizeof(data)))
          return -EFAULT;

      return fd_publish(fdf);

  The primitives are centered around struct fd_prepare. FD_PREPARE()
  encapsulates all allocation and cleanup logic and must be followed by
  a call to fd_publish() which associates the fd with the file and
  installs it into the caller's fdtable. If fd_publish() isn't called,
  both are deallocated automatically. FD_ADD() is a shorthand that does
  fd_publish() immediately and never exposes the struct to the caller.

  I've implemented this in a way that it's compatible with the cleanup
  infrastructure while also being usable separately. IOW, it's centered
  around struct fd_prepare which is aliased to class_fd_prepare_t and so
  we can make use of all the basica guard infrastructure"

* tag 'vfs-6.19-rc1.fd_prepare.fs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (42 commits)
  io_uring: convert io_create_mock_file() to FD_PREPARE()
  file: convert replace_fd() to FD_PREPARE()
  vfio: convert vfio_group_ioctl_get_device_fd() to FD_ADD()
  tty: convert ptm_open_peer() to FD_ADD()
  ntsync: convert ntsync_obj_get_fd() to FD_PREPARE()
  media: convert media_request_alloc() to FD_PREPARE()
  hv: convert mshv_ioctl_create_partition() to FD_ADD()
  gpio: convert linehandle_create() to FD_PREPARE()
  pseries: port papr_rtas_setup_file_interface() to FD_ADD()
  pseries: convert papr_platform_dump_create_handle() to FD_ADD()
  spufs: convert spufs_gang_open() to FD_PREPARE()
  papr-hvpipe: convert papr_hvpipe_dev_create_handle() to FD_PREPARE()
  spufs: convert spufs_context_open() to FD_PREPARE()
  net/socket: convert __sys_accept4_file() to FD_ADD()
  net/socket: convert sock_map_fd() to FD_ADD()
  net/kcm: convert kcm_ioctl() to FD_PREPARE()
  net/handshake: convert handshake_nl_accept_doit() to FD_PREPARE()
  secretmem: convert memfd_secret() to FD_ADD()
  memfd: convert memfd_create() to FD_ADD()
  bpf: convert bpf_token_create() to FD_PREPARE()
  ...
…inux/kernel/git/tip/tip

Pull locking updates from Ingo Molnar:
 "Mutexes:

   - Redo __mutex_init() to reduce generated code size (Sebastian
     Andrzej Siewior)

  Seqlocks:

   - Introduce scoped_seqlock_read() (Peter Zijlstra)

   - Change thread_group_cputime() to use scoped_seqlock_read() (Oleg
     Nesterov)

   - Change do_task_stat() to use scoped_seqlock_read() (Oleg Nesterov)

   - Change do_io_accounting() to use scoped_seqlock_read() (Oleg
     Nesterov)

   - Fix the incorrect documentation of read_seqbegin_or_lock() /
     need_seqretry() (Oleg Nesterov)

   - Allow KASAN to fail optimizing (Peter Zijlstra)

  Local lock updates:

   - Fix all kernel-doc warnings (Randy Dunlap)

   - Add the <linux/local_lock*.h> headers to MAINTAINERS (Sebastian
     Andrzej Siewior)

   - Reduce the risk of shadowing via s/l/__l/ and s/tl/__tl/ (Vincent
     Mailhol)

  Lock debugging:

   - spinlock/debug: Fix data-race in do_raw_write_lock (Alexander
     Sverdlin)

  Atomic primitives infrastructure:

   - atomic: Skip alignment check for try_cmpxchg() old arg (Arnd
     Bergmann)

  Rust runtime integration:

   - sync: atomic: Enable generated Atomic<T> usage (Boqun Feng)

   - sync: atomic: Implement Debug for Atomic<Debug> (Boqun Feng)

   - debugfs: Remove Rust native atomics and replace them with Linux
     versions (Boqun Feng)

   - debugfs: Implement Reader for Mutex<T> only when T is Unpin (Boqun
     Feng)

   - lock: guard: Add T: Unpin bound to DerefMut (Daniel Almeida)

   - lock: Pin the inner data (Daniel Almeida)

   - lock: Add a Pin<&mut T> accessor (Daniel Almeida)"

* tag 'locking-core-2025-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/local_lock: Fix all kernel-doc warnings
  locking/local_lock: s/l/__l/ and s/tl/__tl/ to reduce the risk of shadowing
  locking/local_lock: Add the <linux/local_lock*.h> headers to MAINTAINERS
  locking/mutex: Redo __mutex_init() to reduce generated code size
  rust: debugfs: Replace the usage of Rust native atomics
  rust: sync: atomic: Implement Debug for Atomic<Debug>
  rust: sync: atomic: Make Atomic*Ops pub(crate)
  seqlock: Allow KASAN to fail optimizing
  rust: debugfs: Implement Reader for Mutex<T> only when T is Unpin
  seqlock: Change do_io_accounting() to use scoped_seqlock_read()
  seqlock: Change do_task_stat() to use scoped_seqlock_read()
  seqlock: Change thread_group_cputime() to use scoped_seqlock_read()
  seqlock: Introduce scoped_seqlock_read()
  documentation: seqlock: fix the wrong documentation of read_seqbegin_or_lock/need_seqretry
  atomic: Skip alignment check for try_cmpxchg() old arg
  rust: lock: Add a Pin<&mut T> accessor
  rust: lock: Pin the inner data
  rust: lock: guard: Add T: Unpin bound to DerefMut
  locking/spinlock/debug: Fix data-race in do_raw_write_lock
…inux/kernel/git/tip/tip

Pull objtool updates from Ingo Molnar:

 - klp-build livepatch module generation (Josh Poimboeuf)

   Introduce new objtool features and a klp-build script to generate
   livepatch modules using a source .patch as input.

   This builds on concepts from the longstanding out-of-tree kpatch
   project which began in 2012 and has been used for many years to
   generate livepatch modules for production kernels. However, this is a
   complete rewrite which incorporates hard-earned lessons from 12+
   years of maintaining kpatch.

   Key improvements compared to kpatch-build:

    - Integrated with objtool: Leverages objtool's existing control-flow
      graph analysis to help detect changed functions.

    - Works on vmlinux.o: Supports late-linked objects, making it
      compatible with LTO, IBT, and similar.

    - Simplified code base: ~3k fewer lines of code.

    - Upstream: No more out-of-tree #ifdef hacks, far less cruft.

    - Cleaner internals: Vastly simplified logic for
      symbol/section/reloc inclusion and special section extraction.

    - Robust __LINE__ macro handling: Avoids false positive binary diffs
      caused by the __LINE__ macro by introducing a fix-patch-lines
      script which injects #line directives into the source .patch to
      preserve the original line numbers at compile time.

 - Disassemble code with libopcodes instead of running objdump
   (Alexandre Chartre)

 - Disassemble support (-d option to objtool) by Alexandre Chartre,
   which supports the decoding of various Linux kernel code generation
   specials such as alternatives:

      17ef:  sched_balance_find_dst_group+0x62f                 mov    0x34(%r9),%edx
      17f3:  sched_balance_find_dst_group+0x633               | <alternative.17f3>             | X86_FEATURE_POPCNT
      17f3:  sched_balance_find_dst_group+0x633               | call   0x17f8 <__sw_hweight64> | popcnt %rdi,%rax
      17f8:  sched_balance_find_dst_group+0x638                 cmp    %eax,%edx

   ... jump table alternatives:

      1895:  sched_use_asym_prio+0x5                            test   $0x8,%ch
      1898:  sched_use_asym_prio+0x8                            je     0x18a9 <sched_use_asym_prio+0x19>
      189a:  sched_use_asym_prio+0xa                          | <jump_table.189a>                        | JUMP
      189a:  sched_use_asym_prio+0xa                          | jmp    0x18ae <sched_use_asym_prio+0x1e> | nop2
      189c:  sched_use_asym_prio+0xc                            mov    $0x1,%eax
      18a1:  sched_use_asym_prio+0x11                           and    $0x80,%ecx

   ... exception table alternatives:

    native_read_msr:
      5b80:  native_read_msr+0x0                                                     mov    %edi,%ecx
      5b82:  native_read_msr+0x2                                                   | <ex_table.5b82> | EXCEPTION
      5b82:  native_read_msr+0x2                                                   | rdmsr           | resume at 0x5b84 <native_read_msr+0x4>
      5b84:  native_read_msr+0x4                                                     shl    $0x20,%rdx

   .... x86 feature flag decoding (also see the X86_FEATURE_POPCNT
        example in sched_balance_find_dst_group() above):

      2faaf:  start_thread_common.constprop.0+0x1f                                    jne    0x2fba4 <start_thread_common.constprop.0+0x114>
      2fab5:  start_thread_common.constprop.0+0x25                                  | <alternative.2fab5>                  | X86_FEATURE_ALWAYS                                  | X86_BUG_NULL_SEG
      2fab5:  start_thread_common.constprop.0+0x25                                  | jmp    0x2faba <.altinstr_aux+0x2f4> | jmp    0x4b0 <start_thread_common.constprop.0+0x3f> | nop5
      2faba:  start_thread_common.constprop.0+0x2a                                    mov    $0x2b,%eax

   ... NOP sequence shortening:

      1048e2:  snapshot_write_finalize+0xc2                                            je     0x104917 <snapshot_write_finalize+0xf7>
      1048e4:  snapshot_write_finalize+0xc4                                            nop6
      1048ea:  snapshot_write_finalize+0xca                                            nop11
      1048f5:  snapshot_write_finalize+0xd5                                            nop11
      104900:  snapshot_write_finalize+0xe0                                            mov    %rax,%rcx
      104903:  snapshot_write_finalize+0xe3                                            mov    0x10(%rdx),%rax

   ... and much more.

 - Function validation tracing support (Alexandre Chartre)

 - Various -ffunction-sections fixes (Josh Poimboeuf)

 - Clang AutoFDO (Automated Feedback-Directed Optimizations) support
   (Josh Poimboeuf)

 - Misc fixes and cleanups (Borislav Petkov, Chen Ni, Dylan Hatch, Ingo
   Molnar, John Wang, Josh Poimboeuf, Pankaj Raghav, Peter Zijlstra,
   Thorsten Blum)

* tag 'objtool-core-2025-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (129 commits)
  objtool: Fix segfault on unknown alternatives
  objtool: Build with disassembly can fail when including bdf.h
  objtool: Trim trailing NOPs in alternative
  objtool: Add wide output for disassembly
  objtool: Compact output for alternatives with one instruction
  objtool: Improve naming of group alternatives
  objtool: Add Function to get the name of a CPU feature
  objtool: Provide access to feature and flags of group alternatives
  objtool: Fix address references in alternatives
  objtool: Disassemble jump table alternatives
  objtool: Disassemble exception table alternatives
  objtool: Print addresses with alternative instructions
  objtool: Disassemble group alternatives
  objtool: Print headers for alternatives
  objtool: Preserve alternatives order
  objtool: Add the --disas=<function-pattern> action
  objtool: Do not validate IBT for .return_sites and .call_sites
  objtool: Improve tracing of alternative instructions
  objtool: Add functions to better name alternatives
  objtool: Identify the different types of alternatives
  ...
…x/kernel/git/tip/tip

Pull performance events updates from Ingo Molnar:
 "Callchain support:

   - Add support for deferred user-space stack unwinding for perf,
     enabled on x86. (Peter Zijlstra, Steven Rostedt)

   - unwind_user/x86: Enable frame pointer unwinding on x86 (Josh
     Poimboeuf)

  x86 PMU support and infrastructure:

   - x86/insn: Simplify for_each_insn_prefix() (Peter Zijlstra)

   - x86/insn,uprobes,alternative: Unify insn_is_nop() (Peter Zijlstra)

  Intel PMU driver:

   - Large series to prepare for and implement architectural PEBS
     support for Intel platforms such as Clearwater Forest (CWF) and
     Panther Lake (PTL). (Dapeng Mi, Kan Liang)

   - Check dynamic constraints (Kan Liang)

   - Optimize PEBS extended config (Peter Zijlstra)

   - cstates:
      - Remove PC3 support from LunarLake (Zhang Rui)
      - Add Pantherlake support (Zhang Rui)
      - Clearwater Forest support (Zide Chen)

  AMD PMU driver:

   - x86/amd: Check event before enable to avoid GPF (George Kennedy)

  Fixes and cleanups:

   - task_work: Fix NMI race condition (Peter Zijlstra)

   - perf/x86: Fix NULL event access and potential PEBS record loss
     (Dapeng Mi)

   - Misc other fixes and cleanups (Dapeng Mi, Ingo Molnar, Peter
     Zijlstra)"

* tag 'perf-core-2025-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits)
  perf/x86/intel: Fix and clean up intel_pmu_drain_arch_pebs() type use
  perf/x86/intel: Optimize PEBS extended config
  perf/x86/intel: Check PEBS dyn_constraints
  perf/x86/intel: Add a check for dynamic constraints
  perf/x86/intel: Add counter group support for arch-PEBS
  perf/x86/intel: Setup PEBS data configuration and enable legacy groups
  perf/x86/intel: Update dyn_constraint base on PEBS event precise level
  perf/x86/intel: Allocate arch-PEBS buffer and initialize PEBS_BASE MSR
  perf/x86/intel: Process arch-PEBS records or record fragments
  perf/x86/intel/ds: Factor out PEBS group processing code to functions
  perf/x86/intel/ds: Factor out PEBS record processing code to functions
  perf/x86/intel: Initialize architectural PEBS
  perf/x86/intel: Correct large PEBS flag check
  perf/x86/intel: Replace x86_pmu.drain_pebs calling with static call
  perf/x86: Fix NULL event access and potential PEBS record loss
  perf/x86: Remove redundant is_x86_event() prototype
  entry,unwind/deferred: Fix unwind_reset_info() placement
  unwind_user/x86: Fix arch=um build
  perf: Support deferred user unwind
  unwind_user/x86: Teach FP unwind about start of function
  ...
…ux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:
 "Scalability and load-balancing improvements:

   - Enable scheduler feature NEXT_BUDDY (Mel Gorman)

   - Reimplement NEXT_BUDDY to align with EEVDF goals (Mel Gorman)

   - Skip sched_balance_running cmpxchg when balance is not due (Tim
     Chen)

   - Implement generic code for architecture specific sched domain NUMA
     distances (Tim Chen)

   - Optimize the NUMA distances of the sched-domains builds of Intel
     Granite Rapids (GNR) and Clearwater Forest (CWF) platforms (Tim
     Chen)

   - Implement proportional newidle balance: a randomized algorithm that
     runs newidle balancing proportional to its success rate. (Peter
     Zijlstra)

  Scheduler infrastructure changes:

   - Implement the 'sched_change' scoped_guard() pattern for the entire
     scheduler (Peter Zijlstra)

   - More broadly utilize the sched_change guard (Peter Zijlstra)

   - Add support to pick functions to take runqueue-flags (Joel
     Fernandes)

   - Provide and use set_need_resched_current() (Peter Zijlstra)

  Fair scheduling enhancements:

   - Forfeit vruntime on yield (Fernand Sieber)

   - Only update stats for allowed CPUs when looking for dst group (Adam
     Li)

  CPU-core scheduling enhancements:

   - Optimize core cookie matching check (Fernand Sieber)

  Deadline scheduler fixes:

   - Only set free_cpus for online runqueues (Doug Berger)

   - Fix dl_server time accounting (Peter Zijlstra)

   - Fix dl_server stop condition (Peter Zijlstra)

  Proxy scheduling fixes:

   - Yield the donor task (Fernand Sieber)

  Fixes and cleanups:

   - Fix do_set_cpus_allowed() locking (Peter Zijlstra)

   - Fix migrate_disable_switch() locking (Peter Zijlstra)

   - Remove double update_rq_clock() in __set_cpus_allowed_ptr_locked()
     (Hao Jia)

   - Increase sched_tick_remote timeout (Phil Auld)

   - sched/deadline: Use cpumask_weight_and() in dl_bw_cpus() (Shrikanth
     Hegde)

   - sched/deadline: Clean up select_task_rq_dl() (Shrikanth Hegde)"

* tag 'sched-core-2025-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits)
  sched: Provide and use set_need_resched_current()
  sched/fair: Proportional newidle balance
  sched/fair: Small cleanup to update_newidle_cost()
  sched/fair: Small cleanup to sched_balance_newidle()
  sched/fair: Revert max_newidle_lb_cost bump
  sched/fair: Reimplement NEXT_BUDDY to align with EEVDF goals
  sched/fair: Enable scheduler feature NEXT_BUDDY
  sched: Increase sched_tick_remote timeout
  sched/fair: Have SD_SERIALIZE affect newidle balancing
  sched/fair: Skip sched_balance_running cmpxchg when balance is not due
  sched/deadline: Minor cleanup in select_task_rq_dl()
  sched/deadline: Use cpumask_weight_and() in dl_bw_cpus
  sched/deadline: Document dl_server
  sched/deadline: Fix dl_server stop condition
  sched/deadline: Fix dl_server time accounting
  sched/core: Remove double update_rq_clock() in __set_cpus_allowed_ptr_locked()
  sched/eevdf: Fix min_vruntime vs avg_vruntime
  sched/core: Add comment explaining force-idle vruntime snapshots
  sched/core: Optimize core cookie matching check
  sched/proxy: Yield the donor task
  ...
…/kernel/git/tip/tip

Pull x86 apic updates from Ingo Molnar:

 - x86/apic: Fix the frequency in apic=verbose log output (Julian
   Stecklina)

 - Simplify mp_irqdomain_alloc() slightly (Christophe JAILLET)

* tag 'x86-apic-2025-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apic: Fix frequency in apic=verbose log output
  x86/ioapic: Simplify mp_irqdomain_alloc() slightly
…x/kernel/git/tip/tip

Pull x86 math-emu fix from Ingo Molnar:
 "A single fix for an ancient prototype in the math-emu code, by Arnd
  Bergmann"

* tag 'x86-build-2025-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/math-emu: Fix div_Xsig() prototype
…/kernel/git/tip/tip

Pull core x86 updates from Ingo Molnar:

 - x86/alternatives: Drop unnecessary test after call to
   alt_replace_call() (Juergen Gross)

 - x86/dumpstack: Prevent KASAN false positive warnings in
   __show_regs() (Tengda Wu)

* tag 'x86-core-2025-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/dumpstack: Prevent KASAN false positive warnings in __show_regs()
  x86/alternative: Drop not needed test after call of alt_replace_call()
…x/kernel/git/tip/tip

Pull bug handling infrastructure updates from Ingo Molnar:
 "Core updates:

   - Improve WARN(), which has vararg printf like arguments, to work
     with the x86 #UD based WARN-optimizing infrastructure by hiding the
     format in the bug_table and replacing this first argument with the
     address of the bug-table entry, while making the actual function
     that's called a UD1 instruction (Peter Zijlstra)

   - Introduce the CONFIG_DEBUG_BUGVERBOSE_DETAILED Kconfig switch (Ingo
     Molnar, s390 support by Heiko Carstens)

  Fixes and cleanups:

   - bugs/s390: Remove private WARN_ON() implementation (Heiko Carstens)

   - <asm/bugs.h>: Make i386 use GENERIC_BUG_RELATIVE_POINTERS (Peter
     Zijlstra)"

* tag 'core-bugs-2025-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits)
  x86/bugs: Make i386 use GENERIC_BUG_RELATIVE_POINTERS
  x86/bug: Fix BUG_FORMAT vs KASLR
  x86_64/bug: Inline the UD1
  x86/bug: Implement WARN_ONCE()
  x86_64/bug: Implement __WARN_printf()
  x86/bug: Use BUG_FORMAT for DEBUG_BUGVERBOSE_DETAILED
  x86/bug: Add BUG_FORMAT basics
  bug: Allow architectures to provide __WARN_printf()
  bug: Implement WARN_ON() using __WARN_FLAGS()
  bug: Add report_bug_entry()
  bug: Add BUG_FORMAT_ARGS infrastructure
  bug: Clean up CONFIG_GENERIC_BUG_RELATIVE_POINTERS
  bug: Add BUG_FORMAT infrastructure
  x86: Rework __bug_table helpers
  bugs/s390: Remove private WARN_ON() implementation
  bugs/core: Reorganize fields in the first line of WARNING output, add ->comm[] output
  bugs/sh: Concatenate 'cond_str' with '__FILE__' in __WARN_FLAGS(), to extend WARN_ON/BUG_ON output
  bugs/parisc: Concatenate 'cond_str' with '__FILE__' in __WARN_FLAGS(), to extend WARN_ON/BUG_ON output
  bugs/riscv: Concatenate 'cond_str' with '__FILE__' in __BUG_FLAGS(), to extend WARN_ON/BUG_ON output
  bugs/riscv: Pass in 'cond_str' to __BUG_FLAGS()
  ...
@sunflower2333 sunflower2333 merged commit 01e89ff into sunflower2333:master Dec 2, 2025
1 check failed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.