summaryrefslogtreecommitdiff
path: root/include/uapi/linux
AgeCommit message (Collapse)Author
2026-01-22rseq: Allow registering RSEQ with slice extensionPeter Zijlstra
Since glibc cares about the number of syscalls required to initialize a new thread, allow initializing rseq with slice extension on. This avoids having to do another prctl(). Requested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260121143207.814193010@infradead.org
2026-01-22rseq: Add prctl() to enable time slice extensionsThomas Gleixner
Implement a prctl() so that tasks can enable the time slice extension mechanism. This fails, when time slice extensions are disabled at compile time or on the kernel command line and when no rseq pointer is registered in the kernel. That allows to implement a single trivial check in the exit to user mode hotpath, to decide whether the whole mechanism needs to be invoked. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20251215155708.858717691@linutronix.de
2026-01-22rseq: Add fields and constants for time slice extensionThomas Gleixner
Aside of a Kconfig knob add the following items: - Two flag bits for the rseq user space ABI, which allow user space to query the availability and enablement without a syscall. - A new member to the user space ABI struct rseq, which is going to be used to communicate request and grant between kernel and user space. - A rseq state struct to hold the kernel state of this - Documentation of the new mechanism Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20251215155708.669472597@linutronix.de
2026-01-21block: make the new blkzoned UAPI constants discoverableChristoph Hellwig
The Linux 6.19 merge window added the new BLKREPORTZONESV2 ioctl, and with it the new BLK_ZONE_REP_CACHED and BLK_ZONE_COND_ACTIVE constants. The two constants are defined as part of enums, which makes it very painful for userspace to discover if they are present in the installed system headers. Use the #define to the same name trick to make them trivially discoverable using CPP directives. Fixes: 0bf0e2e46668 ("block: track zone conditions") Fixes: b30ffcdc0c15 ("block: introduce BLKREPORTZONESV2 ioctl") Reported-by: Andrey Albershteyn <aalbersh@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-21media: v4l2-ctrls: Add hevc_ext_sps_[ls]t_rps controlsDetlev Casanova
The vdpu381 decoder found on newer Rockchip SoC need the information from the long term and short term ref pic sets from the SPS. So far, it wasn't included in the v4l2 API, so add it with new dynamic sized controls. Each element of the hevc_ext_sps_lt_rps array contains the long term ref pic set at that index. Each element of the hevc_ext_sps_st_rps contains the short term ref pic set at that index, as the raw data. It is the role of the drivers to calculate the reference sets values. Reviewed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com> Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com> Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com> Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
2026-01-20mm/block/fs: remove laptop_modeJohannes Weiner
Laptop mode was introduced to save battery, by delaying and consolidating writes and thereby maximize the time rotating hard drives wouldn't have to spin. Luckily, rotating hard drives, with their high spin-up times and power draw, are a thing of the past for battery-powered devices. Reclaim has also since changed to not write single filesystem pages anymore, and regular filesystem writeback is lumpy by design. The juice doesn't appear worth the squeeze anymore. The footprint of the feature is small, but nevertheless it's a complicating factor in mm, block, filesystems. Developers don't think about it, and it likely hasn't been tested with new reclaim and writeback changes in years. Let's sunset it. Keep the sysctl with a deprecation warning around for a few more cycles, but remove all functionality behind it. [akpm@linux-foundation.org: fix Documentation/admin-guide/laptops/index.rst] Link: https://lkml.kernel.org/r/20251216185201.GH905277@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Suggested-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Deepanshu Kartikey <kartikey406@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-20Revert "Merge branch 'netkit-support-for-io_uring-zero-copy-and-af_xdp'"Jakub Kicinski
This reverts commit 77b9c4a438fc66e2ab004c411056b3fb71a54f2c, reversing changes made to 4515ec4ad58a37e70a9e1256c0b993958c9b7497: 931420a2fc36 ("selftests/net: Add netkit container tests") ab771c938d9a ("selftests/net: Make NetDrvContEnv support queue leasing") 6be87fbb2776 ("selftests/net: Add env for container based tests") 61d99ce3dfc2 ("selftests/net: Add bpf skb forwarding program") 920da3634194 ("netkit: Add xsk support for af_xdp applications") eef51113f8af ("netkit: Add netkit notifier to check for unregistering devices") b5ef109d22d4 ("netkit: Implement rtnl_link_ops->alloc and ndo_queue_create") b5c3fa4a0b16 ("netkit: Add single device mode for netkit") 0073d2fd679d ("xsk: Proxy pool management for leased queues") 1ecea95dd3b5 ("xsk: Extend xsk_rcv_check validation") 804bf334d08a ("net: Proxy netdev_queue_get_dma_dev for leased queues") 0caa9a8ddec3 ("net: Proxy net_mp_{open,close}_rxq for leased queues") ff8889ff9107 ("net, ethtool: Disallow leased real rxqs to be resized") 9e2103f36110 ("net: Add lease info to queue-get response") 31127deddef4 ("net: Implement netdev_nl_queue_create_doit") a5546e18f77c ("net: Add queue-create operation") The series will conflict with io_uring work, and the code needs more polish. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20netkit: Add single device mode for netkitDaniel Borkmann
Add a single device mode for netkit instead of netkit pairs. The primary target for the paired devices is to connect network namespaces, of course, and support has been implemented in projects like Cilium [0]. For the rxq leasing the plan is to support two main scenarios related to single device mode: * For the use-case of io_uring zero-copy, the control plane can either set up a netkit pair where the peer device can perform rxq leasing which is then tied to the lifetime of the peer device, or the control plane can use a regular netkit pair to connect the hostns to a Pod/container and dynamically add/remove rxq leasing through a single device without having to interrupt the device pair. In the case of io_uring, the memory pool is used as skb non-linear pages, and thus the skb will go its way through the regular stack into netkit. Things like the netkit policy when no BPF is attached or skb scrubbing etc apply as-is in case the paired devices are used, or if the backend memory is tied to the single device and traffic goes through a paired device. * For the use-case of AF_XDP, the control plane needs to use netkit in the single device mode. The single device mode currently enforces only a pass policy when no BPF is attached, and does not yet support BPF link attachments for AF_XDP. skbs sent to that device get dropped at the moment. Given AF_XDP operates at a lower layer of the stack tying this to the netkit pair did not make sense. In future, the plan is to allow BPF at the XDP layer which can: i) process traffic coming from the AF_XDP application (e.g. QEMU with AF_XDP backend) to filter egress traffic or to push selected egress traffic up to the single netkit device to the local stack (e.g. DHCP requests), and ii) vice-versa skbs sent to the single netkit into the AF_XDP application (e.g. DHCP replies). Also, the control-plane can dynamically manage rxq leasing for the single netkit device without having to interrupt (e.g. down/up cycle) the main netkit pair for the Pod which has traffic going in and out. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Co-developed-by: David Wei <dw@davidwei.uk> Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Jordan Rife <jordan@jrife.io> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://docs.cilium.io/en/stable/operations/performance/tuning/#netkit-device-mode [0] Link: https://patch.msgid.link/20260115082603.219152-10-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-01-20net: Add queue-create operationDaniel Borkmann
Add a ynl netdev family operation called queue-create that creates a new queue on a netdevice: name: queue-create attribute-set: queue flags: [admin-perm] do: request: attributes: - ifindex - type - lease reply: &queue-create-op attributes: - id This is a generic operation such that it can be extended for various use cases in future. Right now it is mandatory to specify ifindex, the queue type which is enforced to rx and a lease. The newly created queue id is returned to the caller. A queue from a virtual device can have a lease which refers to another queue from a physical device. This is useful for memory providers and AF_XDP operations which take an ifindex and queue id to allow applications to bind against virtual devices in containers. The lease couples both queues together and allows to proxy the operations from a virtual device in a container to the physical device. In future, the nested lease attribute can be lifted and made optional for other use-cases such as dynamic queue creation for physical netdevs. The lack of lease and the specification of the physical device as an ifindex will imply that we need a real queue to be allocated. Similarly, the queue type enforcement to rx can then be lifted as well to support tx. An early implementation had only driver-specific integration [0], but in order for other virtual devices to reuse, it makes sense to have this as a generic API in core net. For leasing queues, the virtual netdev must have real_num_rx_queue less than num_rx_queues at the time of calling queue-create. The queue-type must be rx as only rx queues are supported for leasing for now. We also enforce that the queue-create ifindex must point to a virtual device, and that the nested lease attribute's ifindex must point to a physical device. The nested lease attribute set contains a netns-id attribute which is currently only intended for dumping as part of the queue-get operation. Also, it is modeled as an s32 type similarly as done elsewhere in the stack. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Co-developed-by: David Wei <dw@davidwei.uk> Signed-off-by: David Wei <dw@davidwei.uk> Link: https://bpfconf.ebpf.io/bpfconf2025/bpfconf2025_material/lsfmmbpf_2025_netkit_borkmann.pdf [0] Acked-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20260115082603.219152-2-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-01-20wifi: cfg80211: ignore link disabled flag from userspaceBenjamin Berg
When the AP has an advertised TID to Link Mapping (TTLM) it shall include the element in the association response. As such, when this element is present it needs to be used for the currently dormant links. See Draft P802.11REVmf_D1.0 section 35.3.7.2.3 ("Negotiation of TTLM") for the details. The flag is also not usable in case userspace wants to specify a negotiated TTLM during association. Note that for the link reconfiguration case, mac80211 did not use the information. Draft P802.11REVmf_D1.0 states in section 35.3.6.4 ("Link reconfiguration to the setup links) that we "shall operate with all the TIDs mapped to the newly added links ..." All this means that the flag is not needed. The implementation should parse the information from the association response. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Reviewed-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20260118093904.754e057896a5.Ifd06f5ef839a93bfd54d0593dc932870f95f3242@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-01-19net: ethtool: Add support for 80Gbps speedMika Westerberg
USB4 v2 link used in peer-to-peer networking is symmetric 80Gbps so in order to support reading this link speed, add support for it to ethtool. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20260115115646.328898-3-mika.westerberg@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-19wifi: nl80211: ignore cluster id after NAN startedMiri Korenblit
After NAN was started, cluster id updates from the user space should not happen, since the device already started a cluster with the previousely provided id. Since NL80211_CMD_CHANGE_NAN_CONFIG requires to set the full NAN configuration, we can't require that NL80211_NAN_CONF_CLUSTER_ID won't be included in this command, and keeping the last confgiured value just to be able to compare it against the new one seems a bit overkill. Therefore, just ignore cluster id in this command and clarify the documentation. Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20260107142229.fb55e5853269.I10d18c8f69d98b28916596d6da4207c15ea4abb5@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-01-18Merge tag 'landlock-6.19-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux Pull landlock fixes from Mickaël Salaün: "This fixes TCP handling, tests, documentation, non-audit elided code, and minor cosmetic changes" * tag 'landlock-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux: landlock: Clarify documentation for the IOCTL access right selftests/landlock: Properly close a file descriptor landlock: Improve the comment for domain_is_scoped selftests/landlock: Use scoped_base_variants.h for ptrace_test selftests/landlock: Fix missing semicolon selftests/landlock: Fix typo in fs_test landlock: Optimize stack usage when !CONFIG_AUDIT landlock: Fix spelling landlock: Clean up hook_ptrace_access_check() landlock: Improve erratum documentation landlock: Remove useless include landlock: Fix wrong type usage selftests/landlock: NULL-terminate unix pathname addresses selftests/landlock: Remove invalid unix socket bind() selftests/landlock: Add missing connect(minimal AF_UNSPEC) test selftests/landlock: Fix TCP bind(AF_UNSPEC) test case landlock: Fix TCP handling of short AF_UNSPEC addresses landlock: Fix formatting
2026-01-18Merge tag 'ext4_for_linus-6.19-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: - Fix an inconsistency in structure size on 32-bit platforms caused by padding differences for the new EXT4_IOC_[GS]ET_TUNE_SB_PARAM ioctls - Fix a buffer leak on the error path when dropping the refcount an xattr value stored in an inode - Fix missing locking on the error path for the file defragmentation ioctl leading to a BUG * tag 'ext4_for_linus-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix iloc.bh leak in ext4_xattr_inode_update_ref ext4: add missing down_write_data_sem in mext_move_extent(). ext4: fix ext4_tune_sb_params padding
2026-01-18ext4: fix ext4_tune_sb_params paddingArnd Bergmann
The padding at the end of struct ext4_tune_sb_params is architecture specific and in particular is different between x86-32 and x86-64, since the __u64 member only enforces struct alignment on the latter. This shows up as a new warning when test-building the headers with -Wpadded: include/linux/ext4.h:144:1: error: padding struct size to alignment boundary with 4 bytes [-Werror=padded] All members inside the structure are naturally aligned, so the only difference here is the amount of padding at the end. Make the padding explicit, to have a consistent sizeof(struct ext4_tune_sb_params) of 232 on all architectures and avoid adding compat ioctl handling for EXT4_IOC_GET_TUNE_SB_PARAM/EXT4_IOC_SET_TUNE_SB_PARAM. This is an ABI break on x86-32 but hopefully this can go into 6.18.y early enough as a fixup so no actual users will be affected. Alternatively, the kernel could handle the ioctl commands for both sizes (232 and 228 bytes) on all architectures. Fixes: 04a91570ac67 ("ext4: implemet new ioctls to set and get superblock parameters") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20251204101914.1037148-1-arnd@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2026-01-18iommufd: Introduce data struct for AMD nested domain allocationSuravee Suthikulpanit
Introduce IOMMU_HWPT_DATA_AMD_GUEST data type for IOMMU guest page table, which is used for stage-1 in nested translation. The data structure contains information necessary for setting up the AMD HW-vIOMMU support. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-01-18iommu/amd: Add support for hw_info for iommu capability querySuravee Suthikulpanit
AMD IOMMU Extended Feature (EFR) and Extended Feature 2 (EFR2) registers specify features supported by each IOMMU hardware instance. The IOMMU driver checks each feature-specific bits before enabling each feature at run time. For IOMMUFD, the hypervisor passes the raw value of amd_iommu_efr and amd_iommu_efr2 to VMM via iommufd IOMMU_DEVICE_GET_HW_INFO ioctl. Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-01-17netfilter: uapi: Use UAPI definition of INT_MAX and INT_MINThomas Weißschuh
Using <limits.h> to gain access to INT_MAX and INT_MIN introduces a dependency on a libc, which UAPI headers should not do. Use the equivalent UAPI constants. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Link: https://patch.msgid.link/20260113-uapi-limits-v2-3-93c20f4b2c1a@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-17ethtool: uapi: Use UAPI definition of INT_MAXThomas Weißschuh
Using <limits.h> to gain access to INT_MAX introduces a dependency on a libc, which UAPI headers should not do. Use the equivalent UAPI constant. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Link: https://patch.msgid.link/20260113-uapi-limits-v2-2-93c20f4b2c1a@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-17uapi: add INT_MAX and INT_MIN constantsThomas Weißschuh
Some UAPI headers use INT_MAX and INT_MIN. Currently they include <limits.h> for their definitions, which introduces a problematic dependency on libc. Add custom, namespaced definitions of INT_MAX and INT_MIN using the same values as the regular kernel code. These definitions are not added to uapi/linux/limits.h, as that header will conflict with libc definitions on some platforms. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Link: https://patch.msgid.link/20260113-uapi-limits-v2-1-93c20f4b2c1a@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-17compiler_types.h: Attributes: Add __counted_by_ptr macroBill Wendling
Introduce __counted_by_ptr(), which works like __counted_by(), but for pointer struct members. struct foo { int a, b, c; char *buffer __counted_by_ptr(bytes); short nr_bars; struct bar *bars __counted_by_ptr(nr_bars); size_t bytes; }; Because "counted_by" can only be applied to pointer members in very recent compiler versions, its application ends up needing to be distinct from flexibe array "counted_by" annotations, hence a separate macro. This is a reworking of Kees' previous patch [1]. Link: https://lore.kernel.org/all/20251020220118.1226740-1-kees@kernel.org/ [1] Co-developed-by: Kees Cook <kees@kernel.org> Signed-off-by: Bill Wendling <morbo@google.com> Link: https://patch.msgid.link/20260116005838.2419118-1-morbo@google.com Signed-off-by: Kees Cook <kees@kernel.org>
2026-01-16virt: vbox: uapi: Mark inner unions in packed structs as packedThomas Weißschuh
The unpacked unions within a packed struct generates alignment warnings on clang for 32-bit ARM: ./usr/include/linux/vbox_vmmdev_types.h:239:4: error: field u within 'struct vmmdev_hgcm_function_parameter32' is less aligned than 'union (unnamed union at ./usr/include/linux/vbox_vmmdev_types.h:223:2)' and is usually due to 'struct vmmdev_hgcm_function_parameter32' being packed, which can lead to unaligned accesses [-Werror,-Wunaligned-access] 239 | } u; | ^ ./usr/include/linux/vbox_vmmdev_types.h:254:6: error: field u within 'struct vmmdev_hgcm_function_parameter64::(anonymous union)::(unnamed at ./usr/include/linux/vbox_vmmdev_types.h:249:3)' is less aligned than 'union (unnamed union at ./usr/include/linux/vbox_vmmdev_types.h:251:4)' and is usually due to 'struct vmmdev_hgcm_function_parameter64::(anonymous union)::(unnamed at ./usr/include/linux/vbox_vmmdev_types.h:249:3)' being packed, which can lead to unaligned accesses [-Werror,-Wunaligned-access] With the recent changes to compile-test the UAPI headers in more cases, these warning in combination with CONFIG_WERROR breaks the build. Fix the warnings. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202512140314.DzDxpIVn-lkp@intel.com/ Reported-by: Nathan Chancellor <nathan@kernel.org> Closes: https://lore.kernel.org/linux-kbuild/20260110-uapi-test-disable-headers-arm-clang-unaligned-access-v1-1-b7b0fa541daa@kernel.org/ Suggested-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/linux-kbuild/29b2e736-d462-45b7-a0a9-85f8d8a3de56@app.fastmail.com/ Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Tested-by: Nicolas Schier <nsc@kernel.org> Reviewed-by: Nicolas Schier <nsc@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://patch.msgid.link/20260115-kbuild-alignment-vbox-v1-2-076aed1623ff@linutronix.de Signed-off-by: Nathan Chancellor <nathan@kernel.org>
2026-01-16hyper-v: Mark inner union in hv_kvp_exchg_msg_value as packedThomas Weißschuh
The unpacked union within a packed struct generates alignment warnings on clang for 32-bit ARM: ./usr/include/linux/hyperv.h:361:2: error: field within 'struct hv_kvp_exchg_msg_value' is less aligned than 'union hv_kvp_exchg_msg_value::(anonymous at ./usr/include/linux/hyperv.h:361:2)' and is usually due to 'struct hv_kvp_exchg_msg_value' being packed, which can lead to unaligned accesses [-Werror,-Wunaligned-access] 361 | union { | ^ With the recent changes to compile-test the UAPI headers in more cases, this warning in combination with CONFIG_WERROR breaks the build. Fix the warning. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202512140314.DzDxpIVn-lkp@intel.com/ Reported-by: Nathan Chancellor <nathan@kernel.org> Closes: https://lore.kernel.org/linux-kbuild/20260110-uapi-test-disable-headers-arm-clang-unaligned-access-v1-1-b7b0fa541daa@kernel.org/ Suggested-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/linux-kbuild/29b2e736-d462-45b7-a0a9-85f8d8a3de56@app.fastmail.com/ Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Acked-by: Wei Liu (Microsoft) <wei.liu@kernel.org> Tested-by: Nicolas Schier <nsc@kernel.org> Reviewed-by: Nicolas Schier <nsc@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://patch.msgid.link/20260115-kbuild-alignment-vbox-v1-1-076aed1623ff@linutronix.de Signed-off-by: Nathan Chancellor <nathan@kernel.org>
2026-01-16Merge tag 'pm-6.19-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix an error path memory leak in the energy model management code, fix a kerneldoc comment in it, and fix and revamp the energy model YNL specification added recently along with the new energy model management netlink interface (that received feedback after being added): - Fix a memory leak in em_create_pd() error path (Malaya Kumar Rout) - Fix stale description of the cost field in struct em_perf_state to reflect the current code (Yaxiong Tian) - Fix and revamp the energy model YNL specification added recently along with the energy model netlink interface (Changwoo Min)" * tag 'pm-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM: EM: Add dump to get-perf-domains in the EM YNL spec PM: EM: Change cpus' type from string to u64 array in the EM YNL spec PM: EM: Rename em.yaml to dev-energymodel.yaml PM: EM: Fix yamllint warnings in the EM YNL spec PM: EM: Fix memory leak in em_create_pd() error path PM: EM: Fix incorrect description of the cost field in struct em_perf_state
2026-01-16mount: add OPEN_TREE_NAMESPACEChristian Brauner
When creating containers the setup usually involves using CLONE_NEWNS via clone3() or unshare(). This copies the caller's complete mount namespace. The runtime will also assemble a new rootfs and then use pivot_root() to switch the old mount tree with the new rootfs. Afterward it will recursively umount the old mount tree thereby getting rid of all mounts. On a basic system here where the mount table isn't particularly large this still copies about 30 mounts. Copying all of these mounts only to get rid of them later is pretty wasteful. This is exacerbated if intermediary mount namespaces are used that only exist for a very short amount of time and are immediately destroyed again causing a ton of mounts to be copied and destroyed needlessly. With a large mount table and a system where thousands or ten-thousands of containers are spawned in parallel this quickly becomes a bottleneck increasing contention on the semaphore. Extend open_tree() with a new OPEN_TREE_NAMESPACE flag. Similar to OPEN_TREE_CLONE only the indicated mount tree is copied. Instead of returning a file descriptor referring to that mount tree OPEN_TREE_NAMESPACE will cause open_tree() to return a file descriptor to a new mount namespace. In that new mount namespace the copied mount tree has been mounted on top of a copy of the real rootfs. The caller can setns() into that mount namespace and perform any additionally required setup such as move_mount() detached mounts in there. This allows OPEN_TREE_NAMESPACE to function as a combined unshare(CLONE_NEWNS) and pivot_root(). A caller may for example choose to create an extremely minimal rootfs: fd_mntns = open_tree(-EBADF, "/var/lib/containers/wootwoot", OPEN_TREE_NAMESPACE); This will create a mount namespace where "wootwoot" has become the rootfs mounted on top of the real rootfs. The caller can now setns() into this new mount namespace and assemble additional mounts. This also works with user namespaces: unshare(CLONE_NEWUSER); fd_mntns = open_tree(-EBADF, "/var/lib/containers/wootwoot", OPEN_TREE_NAMESPACE); which creates a new mount namespace owned by the earlier created user namespace with "wootwoot" as the rootfs mounted on top of the real rootfs. Link: https://patch.msgid.link/20251229-work-empty-namespace-v1-1-bfb24c7b061f@kernel.org Tested-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Aleksa Sarai <cyphar@cyphar.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Suggested-by: Christian Brauner <brauner@kernel.org> Suggested-by: Aleksa Sarai <cyphar@cyphar.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-16comedi: Fix getting range information for subdevices 16 to 255Ian Abbott
The `COMEDI_RANGEINFO` ioctl does not work properly for subdevice indices above 15. Currently, the only in-tree COMEDI drivers that support more than 16 subdevices are the "8255" driver and the "comedi_bond" driver. Making the ioctl work for subdevice indices up to 255 is achievable. It needs minor changes to the handling of the `COMEDI_RANGEINFO` and `COMEDI_CHANINFO` ioctls that should be mostly harmless to user-space, apart from making them less broken. Details follow... The `COMEDI_RANGEINFO` ioctl command gets the list of supported ranges (usually with units of volts or milliamps) for a COMEDI subdevice or channel. (Only some subdevices have per-channel range tables, indicated by the `SDF_RANGETYPE` flag in the subdevice information.) It uses a `range_type` value and a user-space pointer, both supplied by user-space, but the `range_type` value should match what was obtained using the `COMEDI_CHANINFO` ioctl (if the subdevice has per-channel range tables) or `COMEDI_SUBDINFO` ioctl (if the subdevice uses a single range table for all channels). Bits 15 to 0 of the `range_type` value contain the length of the range table, which is the only part that user-space should care about (so it can use a suitably sized buffer to fetch the range table). Bits 23 to 16 store the channel index, which is assumed to be no more than 255 if the subdevice has per-channel range tables, and is set to 0 if the subdevice has a single range table. For `range_type` values produced by the `COMEDI_SUBDINFO` ioctl, bits 31 to 24 contain the subdevice index, which is assumed to be no more than 255. But for `range_type` values produced by the `COMEDI_CHANINFO` ioctl, bits 27 to 24 contain the subdevice index, which is assumed to be no more than 15, and bits 31 to 28 contain the COMEDI device's minor device number for some unknown reason lost in the mists of time. The `COMEDI_RANGEINFO` ioctl extract the length from bits 15 to 0 of the user-supplied `range_type` value, extracts the channel index from bits 23 to 16 (only used if the subdevice has per-channel range tables), extracts the subdevice index from bits 27 to 24, and ignores bits 31 to 28. So for subdevice indices 16 to 255, the `COMEDI_SUBDINFO` or `COMEDI_CHANINFO` ioctl will report a `range_type` value that doesn't work with the `COMEDI_RANGEINFO` ioctl. It will either get the range table for the subdevice index modulo 16, or will fail with `-EINVAL`. To fix this, always use bits 31 to 24 of the `range_type` value to hold the subdevice index (assumed to be no more than 255). This affects the `COMEDI_CHANINFO` and `COMEDI_RANGEINFO` ioctls. There should not be anything in user-space that depends on the old, broken usage, although it may now see different values in bits 31 to 28 of the `range_type` values reported by the `COMEDI_CHANINFO` ioctl for subdevices that have per-channel subdevices. User-space should not be trying to decode bits 31 to 16 of the `range_type` values anyway. Fixes: ed9eccbe8970 ("Staging: add comedi core") Cc: stable@vger.kernel.org #5.17+ Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Link: https://patch.msgid.link/20251203162438.176841-1-abbotti@mev.co.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-16Merge branch 'pm-em'Rafael J. Wysocki
Merge fixes related to the energy model management for 6.19-rc6: - Fix a memory leak in em_create_pd() error path (Malaya Kumar Rout) - Fix stale description of the cost field in struct em_perf_state to reflect the current code (Yaxiong Tian) - Fix and revamp the energy model YNL specification added recently along with the energy model netlink interface (Changwoo Min) * pm-em: PM: EM: Add dump to get-perf-domains in the EM YNL spec PM: EM: Change cpus' type from string to u64 array in the EM YNL spec PM: EM: Rename em.yaml to dev-energymodel.yaml PM: EM: Fix yamllint warnings in the EM YNL spec PM: EM: Fix memory leak in em_create_pd() error path PM: EM: Fix incorrect description of the cost field in struct em_perf_state
2026-01-16media: v4l: ctrls: add a control for enabling strobe outputRichard Leitner
Add a control V4L2_CID_FLASH_STROBE_OE to en- or disable the strobe output of v4l2 devices (most likely sensors). Signed-off-by: Richard Leitner <richard.leitner@linux.dev> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
2026-01-16media: v4l: ctrls: add a control for flash/strobe durationRichard Leitner
Add a V4L2_CID_FLASH_DURATION control to set the duration of a flash/strobe pulse. This controls the length of the flash/strobe pulse output by device (typically a camera sensor) and connected to the flash controller. This is different to the V4L2_CID_FLASH_TIMEOUT control, which is implemented by the flash controller and defines a limit after which the flash is "forcefully" turned off again. Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Richard Leitner <richard.leitner@linux.dev> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
2026-01-15ethtool: Clarify len/n_stats fields in/out semanticsGal Pressman
Document that the 'len' field in ethtool_gstrings and 'n_stats' field in ethtool_stats optionally serve dual purposes: on entry they specify the number of items requested, and on return they indicate the number actually returned (which is not necessarily the same). Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Link: https://patch.msgid.link/20260115060544.481550-1-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.19-rc6). No conflicts, or adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-15perf/x86/intel: Add support for PEBS memory auxiliary info field in DMRDapeng Mi
With the introduction of the OMR feature, the PEBS memory auxiliary info field for load and store latency events has been restructured for DMR. The memory auxiliary info field's bit[8] indicates whether a L2 cache miss occurred for a memory load or store instruction. If bit[8] is 0, it signifies no L2 cache miss, and bits[7:0] specify the exact cache data source (up to the L2 cache level). If bit[8] is 1, bits[7:0] represent the OMR encoding, indicating the specific L3 cache or memory region involved in the memory access. A significant enhancement is OMR encoding provides up to 8 fine-grained memory regions besides the cache region. A significant enhancement for OMR encoding is the ability to provide up to 8 fine-grained memory regions in addition to the cache region, offering more detailed insights into memory access regions. For detailed information on the memory auxiliary info encoding, please refer to section 16.2 "PEBS LOAD LATENCY AND STORE LATENCY FACILITY" in the ISE documentation. This patch ensures that the PEBS memory auxiliary info field is correctly interpreted and utilized in DMR. Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260114011750.350569-3-dapeng1.mi@linux.intel.com
2026-01-15Merge tag 'amd-drm-next-6.20-2026-01-09' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.20-2026-01-09: amdgpu: - GPUVM updates - Initial support for larger GPU address spaces - Initial SMUIO 15.x support - Documentation updates - Initial PSP 15.x support - Initial IH 7.1 support - Initial IH 6.1.1 support - SMU 13.0.12 updates - RAS updates - Initial MMHUB 3.4 support - Initial MMHUB 4.2 support - Initial GC 12.1 support - Initial GC 11.5.4 support - HDMI fixes - Panel replay improvements - DML updates - DC FP fixes - Initial SDMA 6.1.4 support - Initial SDMA 7.1 support - Userq updates - DC HPD refactor - SwSMU cleanups and refactoring - TTM memory ops parallelization - DCN 3.5 fixes - DP audio fixes - Clang fixes - Misc spelling fixes and cleanups - Initial SDMA 7.11.4 support - Convert legacy DRM logging helpers to new drm logging helpers - Initial JPEG 5.3 support - Add support for changing UMA size via the driver - DC analog fixes - GC 9 gfx queue reset support - Initial SMU 15.x support amdkfd: - Reserved SDMA rework - Refactor SPM - Initial GC 12.1 support - Initial GC 11.5.4 support - Initial SDMA 7.1 support - Initial SDMA 6.1.4 support - Increase the kfd process hash table - Per context support - Topology fixes radeon: - Convert legacy DRM logging helpers to new drm logging helpers - Use devm for i2c adapters - Variable sized array fix - Misc cleanups UAPI: - KFD context support. Proposed userspace: https://github.com/ROCm/rocm-systems/pull/1705 https://github.com/ROCm/rocm-systems/pull/1701 - Add userq metadata queries for more queue types. Proposed userspace: https://gitlab.freedesktop.org/yogeshmohan/mesa/-/commits/userq_query From: Alex Deucher <alexander.deucher@amd.com> Link: https://patch.msgid.link/20260109154713.3242957-1-alexander.deucher@amd.com Signed-off-by: Dave Airlie <airlied@redhat.com>
2026-01-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf after rc5Alexei Starovoitov
Cross-merge BPF and other fixes after downstream PR. No conflicts. Adjacent: Auto-merging MAINTAINERS Auto-merging Makefile Auto-merging kernel/bpf/verifier.c Auto-merging kernel/sched/ext.c Auto-merging mm/memcontrol.c Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-14crypto: af_alg - Annotate struct af_alg_iv with __counted_byThorsten Blum
Add the __counted_by() compiler attribute to the flexible array member 'iv' to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and CONFIG_FORTIFY_SOURCE. Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Link: https://patch.msgid.link/20260105122402.2685-2-thorsten.blum@linux.dev Signed-off-by: Kees Cook <kees@kernel.org>
2026-01-14Merge tag 'media/v6.19-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media fixes from Mauro Carvalho Chehab: - ov02c10: some fixes related to preserving bayer pattern and horizontal control - ipu-bridge: Add quirks for some Dell XPS laptops with inverted sensors - mali-c55: Fix version identifier logic - rzg2l-cru: csi-2: fix RZ/V2H input sizes on some variants * tag 'media/v6.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: media: ov02c10: Remove unnecessary hflip and vflip pointers media: ipu-bridge: Add DMI quirk for Dell XPS laptops with upside down sensors media: ov02c10: Fix the horizontal flip control media: ov02c10: Adjust x-win/y-win when changing flipping to preserve bayer-pattern media: ov02c10: Fix bayer-pattern change after default vflip change media: rzg2l-cru: csi-2: Support RZ/V2H input sizes media: uapi: mali-c55-config: Remove version identifier media: mali-c55: Remove duplicated version check media: Documentation: mali-c55: Use v4l2-isp version identifier
2026-01-14wifi: nl80211: Add support for EPP peer indicationSai Pratyusha Magam
Introduce a new netlink attribute NL80211_ATTR_EPP_PEER to be used with NL80211_CMD_NEW_STA and NL80211_CMD_ADD_LINK_STA for the userspace to indicate that a non-AP STA is an Enhanced Privacy Protection (EPP) peer. Co-developed-by: Rohan Dutta <quic_drohan@quicinc.com> Signed-off-by: Rohan Dutta <quic_drohan@quicinc.com> Signed-off-by: Sai Pratyusha Magam <sai.magam@oss.qualcomm.com> Signed-off-by: Kavita Kavita <kavita.kavita@oss.qualcomm.com> Link: https://patch.msgid.link/20260114111900.2196941-5-kavita.kavita@oss.qualcomm.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-01-14wifi: cfg80211: add feature flag for (re)association frame encryptionAiny Kumari
Introduce an extended feature flag that allows drivers to signal support for encryption of (Re)Association Request and Response frames in both non-AP STA and AP mode, as specified in specification "IEEE P802.11bi/D3.0, 12.16.6". Signed-off-by: Ainy Kumari <ainy.kumari@oss.qualcomm.com> Signed-off-by: Kavita Kavita <kavita.kavita@oss.qualcomm.com> Link: https://patch.msgid.link/20260114111900.2196941-3-kavita.kavita@oss.qualcomm.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-01-14wifi: cfg80211: add support for EPPKE Authentication ProtocolAiny Kumari
Add an extended feature flag NL80211_EXT_FEATURE_EPPKE to allow a driver to indicate support for the Enhanced Privacy Protection Key Exchange (EPPKE) authentication protocol in non-AP STA mode, as defined in "IEEE P802.11bi/D3.0, 12.16.9". In case of SME in userspace, the Authentication frame body is prepared in userspace while the driver finalizes the Authentication frame once it receives the required fields and elements. The driver indicates support for EPPKE using the extended feature flag so that userspace can initiate EPPKE authentication. When the feature flag is set, process EPPKE Authentication frames from userspace in non-AP STA mode. If the flag is not set, reject EPPKE Authentication frames. Define a new authentication type NL80211_AUTHTYPE_EPPKE for EPPKE. Signed-off-by: Ainy Kumari <ainy.kumari@oss.qualcomm.com> Co-developed-by: Kavita Kavita <kavita.kavita@oss.qualcomm.com> Signed-off-by: Kavita Kavita <kavita.kavita@oss.qualcomm.com> Link: https://patch.msgid.link/20260114111900.2196941-2-kavita.kavita@oss.qualcomm.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-01-13net/sched: sch_cake: share shaper state across sub-instances of cake_mqJonas Köppeler
This commit adds shared shaper state across the cake instances beneath a cake_mq qdisc. It works by periodically tracking the number of active instances, and scaling the configured rate by the number of active queues. The scan is lockless and simply reads the qlen and the last_active state variable of each of the instances configured beneath the parent cake_mq instance. Locking is not required since the values are only updated by the owning instance, and eventual consistency is sufficient for the purpose of estimating the number of active queues. The interval for scanning the number of active queues is set to 200 us. We found this to be a good tradeoff between overhead and response time. For a detailed analysis of this aspect see the Netdevconf talk: https://netdevconf.info/0x19/docs/netdev-0x19-paper16-talk-paper.pdf Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jonas Köppeler <j.koeppeler@tu-berlin.de> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://patch.msgid.link/20260109-mq-cake-sub-qdisc-v8-5-8d613fece5d8@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-01-14media: uapi: videodev2: Add support for AV1 stateful decoderDeepa Guthyappa Madivalara
Introduce a new pixel format, V4L2_PIX_FMT_AV1, to the Video4Linux2(V4L2) API. This format is intended for AV1 bitstreams in stateful decoding/encoding workflows. The fourcc code 'AV10' is used to distinguish this format from the existing V4L2_PIX_FMT_AV1_FRAME, which is used for stateless AV1 decoder implementation. Reviewed-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org> Reviewed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com> Reviewed-by: Hans Verkuil <hverkuil+cisco@kernel.org> Signed-off-by: Deepa Guthyappa Madivalara <deepa.madivalara@oss.qualcomm.com> Tested-by: Val Packett <val@packett.cool> Signed-off-by: Bryan O'Donoghue <bod@kernel.org> Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
2026-01-12Merge tag 'wireless-next-2026-01-12' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Johannes Berg says: ==================== First set of changes for the current -next cycle, of note: - ath12k gets an overhaul to support multi-wiphy device wiphy and pave the way for future device support in the same driver (rather than splitting to ath13k) - mac80211 gets some better iteration macros * tag 'wireless-next-2026-01-12' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (120 commits) wifi: mac80211: remove width argument from ieee80211_parse_bitrates wifi: mac80211_hwsim: remove NAN by default wifi: mac80211: improve station iteration ergonomics wifi: mac80211: improve interface iteration ergonomics wifi: cfg80211: include S1G_NO_PRIMARY flag when sending channel wifi: mac80211: unexport ieee80211_get_bssid() wl1251: Replace strncpy with strscpy in wl1251_acx_fw_version wifi: iwlegacy: 3945-rs: remove redundant pointer check in il3945_rs_tx_status() and il3945_rs_get_rate() wifi: mac80211: don't send an unused argument to ieee80211_check_combinations wifi: libertas: fix WARNING in usb_tx_block wifi: mwifiex: Allocate dev name earlier for interface workqueue name wifi: wlcore: sdio: Use pm_ptr instead of #ifdef CONFIG_PM wifi: cfg80211: Fix use_for flag update on BSS refresh wifi: brcmfmac: rename function that frees vif wifi: brcmfmac: fix/add kernel-doc comments wifi: mac80211: Update csa_finalize to use link_id wifi: cfg80211: add cfg80211_stop_link() for per-link teardown wifi: ath12k: Skip DP peer creation for scan vdev wifi: ath12k: move firmware stats request outside of atomic context wifi: ath12k: add the missing RCU lock in ath12k_dp_tx_free_txbuf() ... ==================== Link: https://patch.msgid.link/20260112185836.378736-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-12ublk: add UBLK_CMD_TRY_STOP_DEV commandYoav Cohen
Add a best-effort stop command, UBLK_CMD_TRY_STOP_DEV, which only stops a ublk device when it has no active openers. Unlike UBLK_CMD_STOP_DEV, this command does not disrupt existing users. New opens are blocked only after disk_openers has reached zero; if the device is busy, the command returns -EBUSY and leaves it running. The ub->block_open flag is used only to close a race with an in-progress open and does not otherwise change open behavior. Advertise support via the UBLK_F_SAFE_STOP_DEV feature flag. Signed-off-by: Yoav Cohen <yoav@nvidia.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-12wifi: cfg80211: include S1G_NO_PRIMARY flag when sending channelLachlan Hodges
When sending a channel ensure we include the IEEE80211_CHAN_S1G_NO_PRIMARY flag. Signed-off-by: Lachlan Hodges <lachlan.hodges@morsemicro.com> Link: https://patch.msgid.link/20260109081439.3168-1-lachlan.hodges@morsemicro.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-01-12init: remove /proc/sys/kernel/real-root-devAskar Safin
It is not used anymore. Signed-off-by: Askar Safin <safinaskar@gmail.com> Link: https://patch.msgid.link/20251119222407.3333257-4-safinaskar@gmail.com Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-12ublk: implement integrity user copyStanley Zhang
Add a function ublk_copy_user_integrity() to copy integrity information between a request and a user iov_iter. This mirrors the existing ublk_copy_user_pages() but operates on request integrity data instead of regular data. Check UBLKSRV_IO_INTEGRITY_FLAG in iocb->ki_pos in ublk_user_copy() to choose between copying data or integrity data. [csander: change offset units from data bytes to integrity data bytes, fix CONFIG_BLK_DEV_INTEGRITY=n build, rebase on user copy refactor] Signed-off-by: Stanley Zhang <stazhang@purestorage.com> Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-12ublk: set UBLK_IO_F_INTEGRITY in ublksrv_io_descCaleb Sander Mateos
Indicate to the ublk server when an incoming request has integrity data by setting UBLK_IO_F_INTEGRITY in the ublksrv_io_desc's op_flags field. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-12ublk: support UBLK_PARAM_TYPE_INTEGRITY in device creationStanley Zhang
Add a feature flag UBLK_F_INTEGRITY for a ublk server to request integrity/metadata support when creating a ublk device. The ublk server can also check for the feature flag on the created device or the result of UBLK_U_CMD_GET_FEATURES to tell if the ublk driver supports it. UBLK_F_INTEGRITY requires UBLK_F_USER_COPY, as user copy is the only data copy mode initially supported for integrity data. Add UBLK_PARAM_TYPE_INTEGRITY and struct ublk_param_integrity to struct ublk_params to specify the integrity params of a ublk device. UBLK_PARAM_TYPE_INTEGRITY requires UBLK_F_INTEGRITY and a nonzero metadata_size. The LBMD_PI_CAP_* and LBMD_PI_CSUM_* values from the linux/fs.h UAPI header are used for the flags and csum_type fields. If the UBLK_PARAM_TYPE_INTEGRITY flag is set, validate the integrity parameters and apply them to the blk_integrity limits. The struct ublk_param_integrity validations are based on the checks in blk_validate_integrity_limits(). Any invalid parameters should be rejected before being applied to struct blk_integrity. [csander: drop redundant pi_tuple_size field, use block metadata UAPI constants, add param validation] Signed-off-by: Stanley Zhang <stazhang@purestorage.com> Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-12landlock: Clarify documentation for the IOCTL access rightGünther Noack
Move the description of the LANDLOCK_ACCESS_FS_IOCTL_DEV access right together with the file access rights. This group of access rights applies to files (in this case device files), and they can be added to file or directory inodes using landlock_add_rule(2). The check for that works the same for all file access rights, including LANDLOCK_ACCESS_FS_IOCTL_DEV. Invoking ioctl(2) on directory FDs can not currently be restricted with Landlock. Having it grouped separately in the documentation is a remnant from earlier revisions of the LANDLOCK_ACCESS_FS_IOCTL_DEV patch set. Link: https://lore.kernel.org/all/20260108.Thaex5ruach2@digikod.net/ Signed-off-by: Günther Noack <gnoack3000@gmail.com> Link: https://lore.kernel.org/r/20260111175203.6545-2-gnoack3000@gmail.com Signed-off-by: Mickaël Salaün <mic@digikod.net>
2026-01-12fs: add immutable rootfsChristian Brauner
Currently pivot_root() doesn't work on the real rootfs because it cannot be unmounted. Userspace has to do a recursive removal of the initramfs contents manually before continuing the boot. Really all we want from the real rootfs is to serve as the parent mount for anything that is actually useful such as the tmpfs or ramfs for initramfs unpacking or the rootfs itself. There's no need for the real rootfs to actually be anything meaningful or useful. Add a immutable rootfs called "nullfs" that can be selected via the "nullfs_rootfs" kernel command line option. The kernel will mount a tmpfs/ramfs on top of it, unpack the initramfs and fire up userspace which mounts the rootfs and can then just do: chdir(rootfs); pivot_root(".", "."); umount2(".", MNT_DETACH); and be done with it. (Ofc, userspace can also choose to retain the initramfs contents by using something like pivot_root(".", "/initramfs") without unmounting it.) Technically this also means that the rootfs mount in unprivileged namespaces doesn't need to become MNT_LOCKED anymore as it's guaranteed that the immutable rootfs remains permanently empty so there cannot be anything revealed by unmounting the covering mount. In the future this will also allow us to create completely empty mount namespaces without risking to leak anything. systemd already handles this all correctly as it tries to pivot_root() first and falls back to MS_MOVE only when that fails. This goes back to various discussion in previous years and a LPC 2024 presentation about this very topic. Link: https://patch.msgid.link/20260112-work-immutable-rootfs-v2-3-88dd1c34a204@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>