|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
- Fix netfs_limit_iter() hitting BUG() when an ITER_KVEC iterator
reaches it via core dump writes to 9P filesystems. Add ITER_KVEC
handling following the same pattern as the existing ITER_BVEC code.
- Fix a NULL pointer dereference in the netfs unbuffered write retry
path when the filesystem (e.g., 9P) doesn't set the prepare_write
operation.
- Clear I_DIRTY_TIME in sync_lazytime for filesystems implementing
->sync_lazytime. Without this the flag stays set and may cause
additional unnecessary calls during inode deactivation.
- Increase tmpfs size in mount_setattr selftests. A recent commit
bumped the ext4 image size to 2 GB but didn't adjust the tmpfs
backing store, so mkfs.ext4 fails with ENOSPC writing metadata.
- Fix an invalid folio access in iomap when i_blkbits matches the folio
size but differs from the I/O granularity. The cur_folio pointer
would not get invalidated and iomap_read_end() would still be called
on it despite the IO helper owning it.
- Fix hash_name() docstring.
- Fix read abandonment during netfs retry where the subreq variable
used for abandonment could be uninitialized on the first pass or
point to a deleted subrequest on later passes.
- Don't block sync for filesystems with no data integrity guarantees.
Add a SB_I_NO_DATA_INTEGRITY superblock flag replacing the per-inode
AS_NO_DATA_INTEGRITY mapping flag so sync kicks off writeback but
doesn't wait for flusher threads. This fixes a suspend-to-RAM hang on
fuse-overlayfs where the flusher thread blocks when the fuse daemon
is frozen.
- Fix a lockdep splat in iomap when reads fail. iomap_read_end_io()
invokes fserror_report() which calls igrab() taking i_lock in hardirq
context while i_lock is normally held with interrupts enabled. Kick
failed read handling to a workqueue.
- Remove the redundant netfs_io_stream::front member and use
stream->subrequests.next instead, fixing a potential issue in the
direct write code path.
* tag 'vfs-7.0-rc6.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
netfs: Fix the handling of stream->front by removing it
iomap: fix lockdep complaint when reads fail
writeback: don't block sync for filesystems with no data integrity guarantees
netfs: Fix read abandonment during retry
vfs: fix docstring of hash_name()
iomap: fix invalid folio access when i_blkbits differs from I/O granularity
selftests/mount_setattr: increase tmpfs size for idmapped mount tests
fs: clear I_DIRTY_TIME in sync_lazytime
netfs: Fix NULL pointer dereference in netfs_unbuffered_write() on retry
netfs: Fix kernel BUG in netfs_limit_iter() for ITER_KVEC iterators
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull futex fixes from Ingo Molnar:
- Tighten up the sys_futex_requeue() ABI a bit, to disallow dissimilar
futex flags and potential UaF access (Peter Zijlstra)
- Fix UaF between futex_key_to_node_opt() and vma_replace_policy()
(Hao-Yu Yang)
- Clear stale exiting pointer in futex_lock_pi() retry path, which
triggered a warning (and potential misbehavior) in stress-testing
(Davidlohr Bueso)
* tag 'locking-urgent-2026-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
futex: Clear stale exiting pointer in futex_lock_pi() retry path
futex: Fix UaF between futex_key_to_node_opt() and vma_replace_policy()
futex: Require sys_futex_requeue() to have identical flags
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A few more fixes. There's one that stands out in size as it fixes an
edge case in fsync.
- fix issue on fsync where a file with zero size appears to have a
non-zero size after log replay
- in zlib compression, handle a crash when data alignment causes
folio reference issues
- fix possible crash with enabled tracepoints on an overlayfs mount
- handle device stats update error
- on zoned filesystems, fix kobject leak on sub-block groups
- fix super block offset in an error message in validation"
* tag 'for-7.0-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix lost error when running device stats on multiple devices fs
btrfs: tracepoints: get correct superblock from dentry in event btrfs_sync_file()
btrfs: zlib: handle page aligned compressed size correctly
btrfs: fix leak of kobject name for sub-group space_info
btrfs: fix zero size inode with non-zero size after log replay
btrfs: fix super block offset in error message in btrfs_validate_super()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"10 hotfixes. 8 are cc:stable. 9 are for MM.
There's a 3-patch series of DAMON fixes from Josh Law and SeongJae
Park. The rest are singletons - please see the changelogs for details"
* tag 'mm-hotfixes-stable-2026-03-28-10-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm/mseal: update VMA end correctly on merge
bug: avoid format attribute warning for clang as well
mm/pagewalk: fix race between concurrent split and refault
mm/memory: fix PMD/PUD checks in follow_pfnmap_start()
mm/damon/sysfs: check contexts->nr in repeat_call_fn
mm/damon/sysfs: check contexts->nr before accessing contexts_arr[0]
mm/damon/sysfs: fix param_ctx leak on damon_sysfs_new_test_ctx() failure
mm/swap: fix swap cache memcg accounting
MAINTAINERS, mailmap: update email address for Harry Yoo
mm/huge_memory: fix folio isn't locked in softleaf_to_folio()
|
|
On an arm64 server, we found that a folio obtained from a migration entry
isn't locked in softleaf_to_folio(). The issue triggers when mTHP splitting
races with zap_nonpresent_ptes(), and the root cause is a missing memory
barrier in softleaf_to_folio(). The race is as follows:
        CPU0                                              CPU1

deferred_split_scan()                               zap_nonpresent_ptes()
  lock folio
  split_folio()
    unmap_folio()
      change ptes to migration entries
    __split_folio_to_order()                          softleaf_to_folio()
      set flags(including PG_locked) for tail pages     folio = pfn_folio(softleaf_to_pfn(entry))
      smp_wmb()                                         VM_WARN_ON_ONCE(!folio_test_locked(folio))
      prep_compound_page() for tail pages
In __split_folio_to_order(), smp_wmb() guarantees that the page flags of
tail pages are visible before the tail page becomes non-compound. smp_wmb()
should be paired with smp_rmb() in softleaf_to_folio(), which is missing.
As a result, if zap_nonpresent_ptes() accesses a migration entry that stores
a tail pfn, softleaf_to_folio() may see the updated compound_head of the
tail page before page->flags.
This issue will trigger VM_WARN_ON_ONCE() in pfn_swap_entry_folio()
because of the race between folio split and zap_nonpresent_ptes()
leading to a folio incorrectly undergoing modification without a folio
lock being held.
This was a BUG_ON() before commit 93976a20345b ("mm: eliminate further
swapops predicates"), which was merged in v6.19-rc1.
To fix it, add the missing smp_rmb() when the softleaf entry is a migration
entry in softleaf_to_folio() and softleaf_to_page().
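The pairing described above can be sketched in userspace with C11 fences. This is only an analogy, not kernel code: `struct tail_page_model`, `MODEL_PG_LOCKED` and the `model_*` functions are hypothetical names, the release fence stands in for smp_wmb() and the acquire fence for the smp_rmb() that the fix adds.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PG_LOCKED 0x1u

/* Hypothetical stand-in for a tail page; not the kernel's struct page. */
struct tail_page_model {
	atomic_uint flags;      /* plays the role of page->flags */
	atomic_bool split_done; /* plays the role of the compound_head update */
};

static void model_page_init(struct tail_page_model *p)
{
	assert(p != NULL);
	atomic_init(&p->flags, 0u);
	atomic_init(&p->split_done, false);
}

/* Writer side: publish the flags first, then the "split done" state. */
static void model_split_publish(struct tail_page_model *p)
{
	atomic_store_explicit(&p->flags, MODEL_PG_LOCKED, memory_order_relaxed);
	/* Release fence: stands in for smp_wmb() on the split side. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&p->split_done, true, memory_order_relaxed);
}

/*
 * Reader side. Returns -1 if the split has not been observed yet,
 * otherwise 1 if the lock bit is visible and 0 if it is not.
 */
static int model_reader_sees_locked(struct tail_page_model *p)
{
	if (!atomic_load_explicit(&p->split_done, memory_order_relaxed))
		return -1;
	/* Acquire fence: stands in for the smp_rmb() added by the fix. */
	atomic_thread_fence(memory_order_acquire);
	return (atomic_load_explicit(&p->flags, memory_order_relaxed) &
		MODEL_PG_LOCKED) ? 1 : 0;
}
```

Without the acquire fence on the reader side, a weakly ordered CPU such as arm64 may observe `split_done` as set while still reading stale `flags`, which is the shape of the reported warning.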
[tujinjiang@huawei.com: update function name and comments]
Link: https://lkml.kernel.org/r/20260321075214.3305564-1-tujinjiang@huawei.com
Link: https://lkml.kernel.org/r/20260319012541.4158561-1-tujinjiang@huawei.com
Fixes: e9b61f19858a ("thp: reintroduce split_huge_page()")
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Barry Song <baohua@kernel.org>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nanyong Sun <sunnanyong@huawei.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"There are two core fixes here. One is from Johan dealing with an issue
introduced by a devm_ API usage update causing things to be freed
earlier than before when we fail to register a device,
another from Danilo avoids unlocked access to data by converting to
use a driver core API.
We also have a few relatively minor driver specific fixes"
* tag 'spi-fix-v7.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: spi-fsl-lpspi: fix teardown order issue (UAF)
spi: fix use-after-free on managed registration failure
spi: use generic driver_override infrastructure
spi: meson-spicc: Fix double-put in remove path
spi: sn-f-ospi: Use devm_mutex_init() to simplify code
spi: sn-f-ospi: Fix resource leak in f_ospi_probe()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"This became slightly big partly due to my time off in the last week.
But all changes are about device-specific fixes, so it should be
safely applicable.
ASoC:
- Fix double free in sma1307
- Fix uninitialized variables in simple-card-utils/imx-card
- Address clock leaks and error propagation in ADAU1372
- Add DMI quirks and ACP/SDW support for ASUS
- Fix Intel CATPT DMA mask
- Fix SOF topology parsing
- Fix DT bindings for RK3576 SPDIF, STM32 SAI and WCD934x
HD-audio:
- Quirks for Lenovo, ASUS, and various HP models, as well as
a speaker pop fix on Star Labs StarFighter
- Revert MSI X870E Tomahawk denylist again
USB-Audio:
- Fix distorted audio on Focusrite Scarlett 2i2/2i4 1st Gen
- Add iface reset quirk for AB17X
- Update Qualcomm USB audio Kconfig dependencies and license
Misc:
- Fix minor compile warnings for firewire and asihpi drivers"
* tag 'sound-7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (35 commits)
Revert "ALSA: hda/intel: Add MSI X870E Tomahawk to denylist"
ALSA: usb-audio: Add iface reset and delay quirk for AB17X USB Audio
ALSA: hda/realtek: add HP Laptop 15-fd0xxx mute LED quirk
ALSA: usb-audio: Exclude Scarlett 2i4 1st Gen from SKIP_IFACE_SETUP
ALSA: hda/realtek: Add mute LED quirk for HP Pavilion 15-eg0xxx
ALSA: hda/realtek - Fixed Speaker Mute LED for HP EliteBoard G1a platform
ASoC: SOF: ipc4-topology: Allow bytes controls without initial payload
ASoC: adau1372: Fix clock leak on PLL lock failure
ASoC: adau1372: Fix unchecked clk_prepare_enable() return value
ASoC: SDCA: fix finding wrong entity
ASoC: SDCA: remove the max count of initialization table
ASoC: codecs: wcd934x: fix typo in dt parsing
ASoC: dt-bindings: stm32: Fix incorrect compatible string in stm32h7-sai match
ASoC: Intel: catpt: Fix the device initialization
ASoC: amd: acp: add ASUS HN7306EA quirk for legacy SDW machine
ASoC: SOF: topology: reject invalid vendor array size in token parser
ASoC: tas2781: Add null check for calibration data
ALSA: asihpi: avoid write overflow check warning
ASoC: fsl: imx-card: initialize playback_only and capture_only
ASoC: simple-card-utils: Check value of is_playback_only and is_capture_only
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from Bluetooth, CAN, IPsec and Netfilter.
Notably, this includes the fix for the Bluetooth regression that you
were notified about. I'm not aware of any other pending regressions.
Current release - regressions:
- bluetooth:
- fix stack-out-of-bounds read in l2cap_ecred_conn_req
- fix regressions caused by reusing ident
- netfilter: revisit array resize logic
- eth: ice: set max queues in alloc_etherdev_mqs()
Previous releases - regressions:
- core: correctly handle tunneled traffic on IPV6_CSUM GSO fallback
- bluetooth:
- fix dangling pointer on mgmt_add_adv_patterns_monitor_complete
- fix deadlock in l2cap_conn_del()
- sched: codel: fix stale state for empty flows in fq_codel
- ipv6: remove permanent routes from tb6_gc_hlist when all exceptions expire
- xfrm: fix skb_put() panic on non-linear skb during reassembly
- openvswitch:
- avoid releasing netdev before teardown completes
- validate MPLS set/set_masked payload length
- eth: iavf: fix out-of-bounds writes in iavf_get_ethtool_stats()
Previous releases - always broken:
- bluetooth: fix null-ptr-deref on l2cap_sock_ready_cb
- udp: fix wildcard bind conflict check when using hash2
- netfilter: fix use of uninitialized rtp_addr in process_sdp
- tls: Purge async_hold in tls_decrypt_async_wait()
- xfrm:
- prevent policy_hthresh.work from racing with netns teardown
- fix skb leak with espintcp and async crypto
- smc: fix double-free of smc_spd_priv when tee() duplicates splice pipe buffer
- can:
- add missing error handling to call can_ctrlmode_changelink()
- fix OOB heap access in cgw_csum_crc8_rel()
- eth:
- mana: fix use-after-free in add_adev() error path
- virtio-net: fix for VIRTIO_NET_F_GUEST_HDRLEN
- bcmasp: fix double free of WoL irq"
* tag 'net-7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (90 commits)
net: macb: use the current queue number for stats
netfilter: ctnetlink: use netlink policy range checks
netfilter: nf_conntrack_sip: fix use of uninitialized rtp_addr in process_sdp
netfilter: nf_conntrack_expect: skip expectations in other netns via proc
netfilter: nf_conntrack_expect: store netns and zone in expectation
netfilter: ctnetlink: ensure safe access to master conntrack
netfilter: nf_conntrack_expect: use expect->helper
netfilter: nf_conntrack_expect: honor expectation helper field
netfilter: nft_set_rbtree: revisit array resize logic
netfilter: ip6t_rt: reject oversized addrnr in rt_mt6_check()
netfilter: nfnetlink_log: fix uninitialized padding leak in NFULA_PAYLOAD
tls: Purge async_hold in tls_decrypt_async_wait()
selftests: netfilter: nft_concat_range.sh: add check for flush+reload bug
netfilter: nft_set_pipapo_avx2: don't return non-matching entry on expiry
Bluetooth: btusb: clamp SCO altsetting table indices
Bluetooth: L2CAP: Fix ERTM re-init and zero pdu_len infinite loop
Bluetooth: L2CAP: Fix deadlock in l2cap_conn_del()
Bluetooth: btintel: serialize btintel_hw_error() with hci_req_sync_lock
Bluetooth: L2CAP: Fix send LE flow credits in ACL link
net: mana: fix use-after-free in add_adev() error path
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux
Pull dma-mapping fixes from Marek Szyprowski:
"A set of fixes for DMA-mapping subsystem, which resolve false-
positive warnings from KMSAN and DMA-API debug (Shigeru Yoshida
and Leon Romanovsky) as well as a simple build fix (Miguel Ojeda)"
* tag 'dma-mapping-7.0-2026-03-25' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux:
dma-mapping: add missing `inline` for `dma_free_attrs`
mm/hmm: Indicate that HMM requires DMA coherency
RDMA/umem: Tell DMA mapping that UMEM requires coherency
iommu/dma: add support for DMA_ATTR_REQUIRE_COHERENT attribute
dma-direct: prevent SWIOTLB path when DMA_ATTR_REQUIRE_COHERENT is set
dma-mapping: Introduce DMA require coherency attribute
dma-mapping: Clarify valid conditions for CPU cache line overlap
dma-mapping: handle DMA_ATTR_CPU_CACHE_CLEAN in trace output
dma-debug: Allow multiple invocations of overlapping entries
dma: swiotlb: add KMSAN annotations to swiotlb_bounce()
|
|
During futex_key_to_node_opt() execution, vma->vm_policy is read under
speculative mmap lock and RCU. Concurrently, mbind() may call
vma_replace_policy() which frees the old mempolicy immediately via
kmem_cache_free().
This creates a race where __futex_key_to_node() dereferences a freed
mempolicy pointer, causing a use-after-free read of mpol->mode.
[ 151.412631] BUG: KASAN: slab-use-after-free in __futex_key_to_node (kernel/futex/core.c:349)
[ 151.414046] Read of size 2 at addr ffff888001c49634 by task e/87
[ 151.415969] Call Trace:
[ 151.416732] __asan_load2 (mm/kasan/generic.c:271)
[ 151.416777] __futex_key_to_node (kernel/futex/core.c:349)
[ 151.416822] get_futex_key (kernel/futex/core.c:374 kernel/futex/core.c:386 kernel/futex/core.c:593)
Fix by adding rcu to __mpol_put().
Fixes: c042c505210d ("futex: Implement FUTEX2_MPOL")
Reported-by: Hao-Yu Yang <naup96721@gmail.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Hao-Yu Yang <naup96721@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Link: https://patch.msgid.link/20260324174418.GB1850007@noisy.programming.kicks-ass.net
|
|
The netfs_io_stream::front member is meant to point to the subrequest
currently being collected on a stream, but it isn't actually used this way
by direct write (which mostly ignores it). However, there's a tracepoint
which looks at it. Further, stream->front is actually redundant with
stream->subrequests.next.
Fix the potential problem in the direct code by just removing the member
and using stream->subrequests.next instead, thereby also simplifying the
code.
Fixes: a0b4c7a49137 ("netfs: Fix unbuffered/DIO writes to dispatch subrequests in strict sequence")
Reported-by: Paulo Alcantara <pc@manguebit.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://patch.msgid.link/4158599.1774426817@warthog.procyon.org.uk
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Replace manual range and mask validations with netlink policy
annotations in ctnetlink code paths, so that the netlink core rejects
invalid values early and can generate extack errors.
- CTA_PROTOINFO_TCP_STATE: reject values > TCP_CONNTRACK_SYN_SENT2 at
policy level, removing the manual >= TCP_CONNTRACK_MAX check.
- CTA_PROTOINFO_TCP_WSCALE_ORIGINAL/REPLY: reject values > TCP_MAX_WSCALE
(14). The normal TCP option parsing path already clamps to this value,
but the ctnetlink path accepted 0-255, causing undefined behavior when
used as a u32 shift count.
- CTA_FILTER_ORIG_FLAGS/REPLY_FLAGS: use NLA_POLICY_MASK with
CTA_FILTER_F_ALL, removing the manual mask checks.
- CTA_EXPECT_FLAGS: use NLA_POLICY_MASK with NF_CT_EXPECT_MASK, adding
a new mask define grouping all valid expect flags.
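A minimal userspace sketch of the two kinds of check listed above (upper bound and bit mask), assuming a much simplified policy table. The `model_*` names are hypothetical and are not the netlink core API. The value 14 for the window scale limit comes from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum model_check { MODEL_CHECK_MAX, MODEL_CHECK_MASK };

/* Hypothetical, simplified policy entry. */
struct model_policy {
	enum model_check check;
	uint32_t limit; /* maximum value, or mask of permitted bits */
};

#define MODEL_TCP_MAX_WSCALE 14u

/* Reject a value before any code that consumes it can run. */
static bool model_policy_ok(const struct model_policy *pol, uint32_t value)
{
	assert(pol != NULL);
	switch (pol->check) {
	case MODEL_CHECK_MAX:
		return value <= pol->limit;
	case MODEL_CHECK_MASK:
		return (value & ~pol->limit) == 0;
	}
	return false;
}

/*
 * Shifting a 32-bit value by 32 or more is undefined behavior in C,
 * so the shift count is validated first.
 */
static bool model_scale_window(uint32_t window, uint32_t wscale,
			       uint32_t *out)
{
	static const struct model_policy wscale_policy = {
		MODEL_CHECK_MAX, MODEL_TCP_MAX_WSCALE
	};

	if (!model_policy_ok(&wscale_policy, wscale))
		return false;
	*out = window << wscale;
	return true;
}
```

Declaring the limit next to the attribute, rather than in each handler, is what lets a central validator reject bad input early and report which attribute failed.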
Extracted from a broader nf-next patch by Florian Westphal, scoped to
ctnetlink for the fixes tree.
Fixes: c8e2078cfe41 ("[NETFILTER]: ctnetlink: add support for internal tcp connection tracking flags handling")
Signed-off-by: David Carlier <devnexen@gmail.com>
Co-developed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
__nf_ct_expect_find() and nf_ct_expect_find_get() are called under
rcu_read_lock() but they dereference the master conntrack via
exp->master.
Since the expectation does not hold a reference on the master conntrack,
this could be a dying conntrack or a different, recycled conntrack than the
real master due to SLAB_TYPESAFE_RCU.
Store the netns, the master_tuple and the zone in struct
nf_conntrack_expect as a safety measure.
This patch is required by the follow up fix not to dump expectations
that do not belong to this netns.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Holding a reference on the expectation is not sufficient: the master
conntrack object can just go away, making exp->master invalid.
To access exp->master safely:
- Grab the nf_conntrack_expect_lock, this gets serialized with
clean_from_lists() which also holds this lock when the master
conntrack goes away.
- Hold a reference on the master conntrack via nf_conntrack_find_get().
This is not so easy, since the master tuple needed to look up the master
conntrack is not available in the existing problematic paths.
For simplicity, this patch extends the nf_conntrack_expect_lock section
to address this issue. In the cases described below, this only slightly
extends the lock section.
The add expectation command already holds a reference to the master
conntrack from ctnetlink_create_expect().
However, the delete expectation command needs to grab the spinlock
before looking up the expectation. Expand the existing spinlock
section so that it covers the expectation lookup. Note that
the nf_ct_expect_iterate_net() calls already grab the spinlock while
iterating over the expectation table, which is correct.
The get expectation command needs to grab the spinlock to ensure master
conntrack does not go away. This also expands the existing spinlock
section to cover the expectation lookup too. I needed to move the
netlink skb allocation out of the spinlock to keep it GFP_KERNEL.
For the expectation events, the IPEXP_DESTROY event is already delivered
under the spinlock, just move the delivery of IPEXP_NEW under the
spinlock too because the master conntrack event cache is reached through
exp->master.
While at it, add lockdep annotations to help identify what codepaths need
to grab the spinlock.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The expectation helper field is mostly unused. As a result, the
netfilter codebase relies on accessing the helper through exp->master.
Always set the expectation helper field so it can be used to reach
the helper.
nf_ct_expect_init() is called from packet path where the skb owns
the ct object, therefore accessing exp->master for the newly created
expectation is safe. This saves a lot of updates in all callsites
to pass the ct object as parameter to nf_ct_expect_init().
This is a preparatory patch for follow-up fixes.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Jihed Chaibi <jihed.chaibi.dev@gmail.com> says:
adau1372_set_power() had two related error handling issues in its enable
path: clk_prepare_enable() was called but its return value discarded, and
adau1372_enable_pll() was a void function that silently swallowed lock
failures, leaving mclk enabled and adau1372->enabled set to true despite
the device being in a broken state.
Patch 1 fixes the unchecked clk_prepare_enable() by making
adau1372_set_power() return int and propagating the error.
Patch 2 converts adau1372_enable_pll() to return int and adds a full
unwind in adau1372_set_power() if PLL lock fails, reversing the regcache,
GPIO power-down, and clock state.
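The unwind pattern that the two patches describe can be sketched as follows. This is a self-contained model with fault injection, not the driver's code: `struct model_codec`, the error constants and every `model_*` function are hypothetical, and the real driver's steps and their order may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_ERR_CLK (-1)
#define MODEL_ERR_PLL (-2)

/* Hypothetical device state plus fault-injection switches. */
struct model_codec {
	bool clk_on;     /* master clock enabled */
	bool powered;    /* power-down GPIO released */
	bool cache_live; /* register cache synced to hardware */
	bool enabled;
	bool fail_clk;   /* make the clock enable step fail */
	bool fail_pll;   /* make the PLL lock step fail */
};

static int model_clk_enable(struct model_codec *c)
{
	if (c->fail_clk)
		return MODEL_ERR_CLK;
	c->clk_on = true;
	return 0;
}

static int model_enable_pll(struct model_codec *c)
{
	return c->fail_pll ? MODEL_ERR_PLL : 0;
}

/* Every step's result is checked; a late failure undoes earlier steps. */
static int model_set_power_on(struct model_codec *c)
{
	int ret;

	assert(c != NULL);
	if (c->enabled)
		return 0;

	ret = model_clk_enable(c);
	if (ret)
		return ret; /* nothing to undo yet */

	c->powered = true;
	c->cache_live = true;

	ret = model_enable_pll(c);
	if (ret)
		goto err_unwind;

	c->enabled = true;
	return 0;

err_unwind:
	/* Reverse order of setup: cache, power, then clock. */
	c->cache_live = false;
	c->powered = false;
	c->clk_on = false;
	return ret;
}
```

The point of the model is that `enabled` is only set after the last fallible step, so a failed PLL lock can no longer leave the clock running behind a device that reports itself as enabled.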
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux
Pull RCU fixes from Boqun Feng:
"Fix a regression introduced by commit c27cea4416a3 ("rcu: Re-implement
RCU Tasks Trace in terms of SRCU-fast"): BPF contexts can run with
preemption disabled or scheduler locks held, so call_srcu() must work
in all such contexts.
Fix this by converting SRCU's spinlocks to raw spinlocks and avoiding
scheduler lock acquisition in call_srcu() by deferring to an irq_work
(similar to call_rcu_tasks_generic()), for both tree SRCU and tiny
SRCU.
Also fix a follow-on lockdep splat caused by srcu_node allocation
under the newly introduced raw spinlock by deferring the allocation to
grace-period worker context"
* tag 'rcu-fixes.v7.0-20260325a' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux:
srcu: Use irq_work to start GP in tiny SRCU
rcu: Use an intermediate irq_work to start process_srcu()
srcu: Push srcu_node allocation to GP when non-preemptible
srcu: Use raw spinlocks so call_srcu() can be used under preempt_disable()
|
|
Tiny SRCU's srcu_gp_start_if_needed() directly calls schedule_work(),
which acquires the workqueue pool->lock.
This causes a lockdep splat when call_srcu() is called with a scheduler
lock held, due to:
  call_srcu() [holding pi_lock]
    srcu_gp_start_if_needed()
      schedule_work() -> pool->lock

  workqueue_init() / create_worker() [holding pool->lock]
    wake_up_process() -> try_to_wake_up() -> pi_lock
Also add irq_work_sync() to cleanup_srcu_struct() to prevent a
use-after-free if a queued irq_work fires after cleanup begins.
Tested with rcutorture SRCU-T and no lockdep warnings.
[ Thanks to Boqun for similar fix in patch "rcu: Use an intermediate irq_work
to start process_srcu()" ]
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Boqun Feng <boqun@kernel.org>
|
|
Commit c27cea4416a3 ("rcu: Re-implement RCU Tasks Trace in terms
of SRCU-fast") switched BPF to SRCU. However, as BPF instrumentation can
happen basically anywhere (including where a scheduler lock is held),
call_srcu() now needs to avoid acquiring scheduler locks, because
otherwise it could cause a deadlock [1]. Fix this by following what the
previous RCU Tasks Trace did: using an irq_work to delay the queuing of
the work to start process_srcu().
[boqun: Apply Joel's feedback]
[boqun: Apply Andrea's test feedback]
Reported-by: Andrea Righi <arighi@nvidia.com>
Closes: https://lore.kernel.org/all/abjzvz_tL_siV17s@gpd4/
Fixes: c27cea4416a3 ("rcu: Re-implement RCU Tasks Trace in terms of SRCU-fast")
Link: https://lore.kernel.org/rcu/3c4c5a29-24ea-492d-aeee-e0d9605b4183@nvidia.com/ [1]
Suggested-by: Zqiang <qiang.zhang@linux.dev>
Tested-by: Andrea Righi <arighi@nvidia.com>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Boqun Feng <boqun@kernel.org>
|
|
Tree SRCU has used non-raw spinlocks for many years, motivated by a desire
to avoid unnecessary real-time latency and the absence of any reason to
use raw spinlocks. However, the recent use of SRCU in tracing as the
underlying implementation of RCU Tasks Trace means that call_srcu()
is invoked from preemption-disabled regions of code, which in turn
requires that any locks acquired by call_srcu() or its callees must be
raw spinlocks.
This commit therefore converts SRCU's spinlocks to raw spinlocks.
[boqun: Add Fixes tag]
Reported-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Fixes: c27cea4416a3 ("rcu: Re-implement RCU Tasks Trace in terms of SRCU-fast")
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Boqun Feng <boqun@kernel.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
The number of entries in the initialization table may exceed 2048.
Therefore, this patch removes the limitation and allows the driver to
allocate memory dynamically based on the size of the initialization table.
Signed-off-by: Shuming Fan <shumingf@realtek.com>
Reviewed-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Link: https://patch.msgid.link/20260325092017.3221640-1-shumingf@realtek.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Under a UML build for an upcoming series [1], I got `-Wstatic-in-inline`
for `dma_free_attrs`:
      BINDGEN rust/bindings/bindings_generated.rs - due to target missing
    In file included from rust/helpers/helpers.c:59:
    rust/helpers/dma.c:17:2: warning: static function 'dma_free_attrs' is used in an inline function with external linkage [-Wstatic-in-inline]
       17 |         dma_free_attrs(dev, size, cpu_addr, dma_handle, attrs);
          |         ^
    rust/helpers/dma.c:12:1: note: use 'static' to give inline function 'rust_helper_dma_free_attrs' internal linkage
       12 | __rust_helper void rust_helper_dma_free_attrs(struct device *dev, size_t size,
          | ^
          | static
The issue is that `dma_free_attrs` was not marked `inline` when it was
introduced alongside the rest of the stubs.
Thus mark it.
Fixes: ed6ccf10f24b ("dma-mapping: properly stub out the DMA API for !CONFIG_HAS_DMA")
Closes: https://lore.kernel.org/rust-for-linux/20260322194616.89847-1-ojeda@kernel.org/ [1]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20260325015548.70912-1-ojeda@kernel.org
|
|
When codel_dequeue() finds an empty queue, it resets vars->dropping
but does not reset vars->first_above_time. The reference CoDel
algorithm (Nichols & Jacobson, ACM Queue 2012) resets both:
    dodeque_result codel_queue_t::dodeque(time_t now) {
        ...
        if (r.p == NULL) {
            first_above_time = 0;   // <-- Linux omits this
        }
        ...
    }
Note that codel_should_drop() does reset first_above_time when called
with a NULL skb, but codel_dequeue() returns early before ever calling
codel_should_drop() in the empty-queue case. The post-drop code paths
do reach codel_should_drop(NULL) and correctly reset the timer, so a
dropped packet breaks the cycle -- but the next delivered packet
re-arms first_above_time and the cycle repeats.
For sparse flows such as ICMP ping (one packet every 200ms-1s), the
first packet arms first_above_time, the flow goes empty, and the
second packet arrives after the interval has elapsed and gets dropped.
The pattern repeats, producing sustained loss on flows that are not
actually congested.
Test: veth pair, fq_codel, BQL disabled, 30000 iptables rules in the
consumer namespace (NAPI-64 cycle ~14ms, well above fq_codel's 5ms
target), ping at 5 pps under UDP flood:
Before fix: 26% ping packet loss
After fix: 0% ping packet loss
Fix by resetting first_above_time to zero in the empty-queue path
of codel_dequeue(), matching the reference algorithm.
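The sparse-flow cycle described above can be reproduced with a toy model. It is a sketch under stated assumptions, not the kernel's implementation: all `model_*` names are hypothetical, time is in milliseconds, the interval is assumed to be 100 ms, the backlog check and the dropping-state machinery are left out, and each packet is assumed to see the same sojourn time.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_TARGET_MS   5   /* fq_codel target quoted in the text */
#define MODEL_INTERVAL_MS 100 /* assumed interval for this model */

struct model_codel {
	long first_above_time; /* 0 means "not armed" */
};

/* Simplified drop decision; have_pkt == false models a NULL skb. */
static bool model_should_drop(struct model_codel *v, long now,
			      long sojourn_ms, bool have_pkt)
{
	if (!have_pkt || sojourn_ms < MODEL_TARGET_MS) {
		v->first_above_time = 0;
		return false;
	}
	if (v->first_above_time == 0) {
		/* First packet above target: arm the timer, do not drop. */
		v->first_above_time = now + MODEL_INTERVAL_MS;
		return false;
	}
	return now > v->first_above_time;
}

/*
 * A flow that sends one packet every gap_ms and is empty in between.
 * Returns the number of dropped packets.
 */
static int model_sparse_flow_drops(bool fixed, int npackets, long gap_ms,
				   long sojourn_ms)
{
	struct model_codel v = { 0 };
	long now = 1000; /* start above 0 so an armed timer is never 0 */
	int dropped = 0;

	assert(npackets >= 0);
	for (int i = 0; i < npackets; i++, now += gap_ms) {
		if (model_should_drop(&v, now, sojourn_ms, true)) {
			dropped++;
			/* Post-drop path: the next dequeue yields no packet
			 * and goes through the drop decision, which resets
			 * the timer. */
			model_should_drop(&v, now, 0, false);
		} else if (fixed) {
			/* Empty-queue early return, with the fix applied. */
			v.first_above_time = 0;
		}
		/* Without the fix, the early return leaves the stale
		 * first_above_time in place for the next packet. */
	}
	return dropped;
}
```

With ten packets 200 ms apart and a 14 ms sojourn time, this model drops every second packet without the reset and none with it. The exact ratio is an artifact of the simplifications and is not meant to match the 26% measured in the test above.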
Fixes: 76e3cc126bb2 ("codel: Controlled Delay AQM")
Fixes: d068ca2ae2e6 ("codel: split into multiple files")
Co-developed-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: Jonas Köppeler <j.koeppeler@tu-berlin.de>
Reported-by: Chris Arges <carges@cloudflare.com>
Tested-by: Jonas Köppeler <j.koeppeler@tu-berlin.de>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/all/20260318134826.1281205-7-hawk@kernel.org/
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260323174920.253526-1-hawk@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM fixes from Andrew Morton:
"6 hotfixes. 2 are cc:stable. All are for MM.
All are singletons - please see the changelogs for details"
* tag 'mm-hotfixes-stable-2026-03-23-17-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm/damon/stat: monitor all System RAM resources
mm/zswap: add missing kunmap_local()
mailmap: update email address for Muhammad Usama Anjum
zram: do not slot_free() written-back slots
mm/damon/core: avoid use of half-online-committed context
mm/rmap: clear vma->anon_vma on error
|
|
When a driver is probed through __driver_attach(), the bus' match()
callback is called without the device lock held, thus accessing the
driver_override field without a lock, which can cause a UAF.
Fix this by using the driver-core driver_override infrastructure taking
care of proper locking internally.
Note that calling match() from __driver_attach() without the device lock
held is intentional. [1]
Also note that we do not enable the driver_override feature of struct
bus_type, as SPI - in contrast to most other buses - passes "" to
sysfs_emit() when the driver_override pointer is NULL, thus printing
"\n" instead of "(null)\n".
Link: https://lore.kernel.org/driver-core/DGRGTIRHA62X.3RY09D9SOK77P@kernel.org/ [1]
Reported-by: Gui-Dong Han <hanguidong02@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220789
Fixes: 5039563e7c25 ("spi: Add driver_override SPI device attribute")
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patch.msgid.link/20260324005919.2408620-12-dakr@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:
====================
pull request (net): ipsec 2026-03-23
1) Add missing extack for XFRMA_SA_PCPU in add_acquire and allocspi.
From Sabrina Dubroca.
2) Fix the condition on x->pcpu_num in xfrm_sa_len by using the
proper check. From Sabrina Dubroca.
3) Call xdo_dev_state_delete during state update to properly cleanup
the xdo device state. From Sabrina Dubroca.
4) Fix a potential skb leak in espintcp when async crypto is used.
From Sabrina Dubroca.
5) Validate inner IPv4 header length in IPTFS payload to avoid
parsing malformed packets. From Roshan Kumar.
6) Fix skb_put() panic on non-linear skb during IPTFS reassembly.
From Fernando Fernandez Mancera.
7) Silence various sparse warnings related to RCU, state, and policy
handling. From Sabrina Dubroca.
8) Fix work re-schedule race after cancel in xfrm_nat_keepalive_net_fini().
From Hyunwoo Kim.
9) Prevent policy_hthresh.work from racing with netns teardown by using
a proper cleanup mechanism. From Minwoo Ra.
10) Validate that the family of the source and destination addresses match
in pfkey_send_migrate(). From Eric Dumazet.
11) Only publish mode_data after the clone is setup in the IPTFS receive path.
This prevents leaving x->mode_data pointing at freed memory on error.
From Paul Moses.
Please pull or let me know if there are problems.
ipsec-2026-03-23
* tag 'ipsec-2026-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
xfrm: iptfs: only publish mode_data after clone setup
af_key: validate families in pfkey_send_migrate()
xfrm: prevent policy_hthresh.work from racing with netns teardown
xfrm: Fix work re-schedule after cancel in xfrm_nat_keepalive_net_fini()
xfrm: avoid RCU warnings around the per-netns netlink socket
xfrm: add rcu_access_pointer to silence sparse warning for xfrm_input_afinfo
xfrm: policy: silence sparse warning in xfrm_policy_unregister_afinfo
xfrm: policy: fix sparse warnings in xfrm_policy_{init,fini}
xfrm: state: silence sparse warnings during netns exit
xfrm: remove rcu/state_hold from xfrm_state_lookup_spi_proto
xfrm: state: add xfrm_state_deref_prot to state_by* walk under lock
xfrm: state: fix sparse warnings around XFRM_STATE_INSERT
xfrm: state: fix sparse warnings in xfrm_state_init
xfrm: state: fix sparse warnings on xfrm_state_hold_rcu
xfrm: iptfs: fix skb_put() panic on non-linear skb during reassembly
xfrm: iptfs: validate inner IPv4 header length in IPTFS payload
esp: fix skb leak with espintcp and async crypto
xfrm: call xdo_dev_state_delete during state update
xfrm: fix the condition on x->pcpu_num in xfrm_sa_len
xfrm: add missing extack for XFRMA_SA_PCPU in add_acquire and allocspi
====================
Link: https://patch.msgid.link/20260323083440.2741292-1-steffen.klassert@secunet.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The commit a2fb4bc4e2a6a03 ("net: implement virtio helpers to handle UDP
GSO tunneling.") introduces support for the UDP GSO tunnel feature in
virtio-net.
The virtio spec says:
If the \field{gso_type} has the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or
VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit set, \field{hdr_len} accounts for
all the headers up to and including the inner transport.
The commit did not update the hdr_len to include the inner transport.
I observed that the "hdr_len" is 116 for this packet:
17:36:18.241105 52:55:00:d1:27:0a > 2e:2c:df:46:a9:e1, ethertype IPv4 (0x0800), length 2912: (tos 0x0, ttl 64, id 45197, offset 0, flags [none], proto UDP (17), length 2898)
192.168.122.100.50613 > 192.168.122.1.4789: [bad udp cksum 0x8106 -> 0x26a0!] VXLAN, flags [I] (0x08), vni 1
fa:c3:ba:82:05:ee > ce:85:0c:31:77:e5, ethertype IPv4 (0x0800), length 2862: (tos 0x0, ttl 64, id 14678, offset 0, flags [DF], proto TCP (6), length 2848)
192.168.3.1.49880 > 192.168.3.2.9898: Flags [P.], cksum 0x9266 (incorrect -> 0xaa20), seq 515667:518463, ack 1, win 64, options [nop,nop,TS val 2990048824 ecr 2798801412], length 2796
116 = 14(mac) + 20(ip) + 8(udp) + 8(vxlan) + 14(inner mac) + 20(inner ip) + 32(inner tcp)
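The breakdown on the line above can be reproduced with a small sketch. The
struct and function names are hypothetical and are not the virtio-net
helpers; the sizes are the ones from the captured packet.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one tunneled packet's header stack.
 * These fields are illustrative only; they are not kernel structures. */
struct tunnel_hdr_sizes {
	size_t outer_mac;
	size_t outer_ip;
	size_t outer_udp;
	size_t vxlan;
	size_t inner_mac;
	size_t inner_ip;
	size_t inner_transport;
};

/* hdr_len as described by the quoted spec text: all headers up to and
 * including the inner transport header. */
static size_t tunnel_hdr_len(const struct tunnel_hdr_sizes *h)
{
	assert(h != NULL);
	return h->outer_mac + h->outer_ip + h->outer_udp + h->vxlan +
	       h->inner_mac + h->inner_ip + h->inner_transport;
}

/* Sizes taken from the capture: the inner TCP header is 20 bytes plus
 * 12 bytes of options (nop, nop, timestamp). */
static const struct tunnel_hdr_sizes example_pkt = {
	.outer_mac = 14, .outer_ip = 20, .outer_udp = 8, .vxlan = 8,
	.inner_mac = 14, .inner_ip = 20, .inner_transport = 32,
};
```

Calling tunnel_hdr_len(&example_pkt) sums the seven terms of the breakdown.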
Fixes: a2fb4bc4e2a6a03 ("net: implement virtio helpers to handle UDP GSO tunneling.")
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Link: https://patch.msgid.link/20260320021818.111741-3-xuanzhuo@linux.alibaba.com
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The commit be50da3e9d4a ("net: virtio_net: implement exact header length
guest feature") introduces support for the VIRTIO_NET_F_GUEST_HDRLEN
feature in virtio-net.
This feature requires virtio-net to set hdr_len to the actual header
length of the packet when transmitting, that is, the number of
bytes from the start of the packet to the beginning of the
transport-layer payload.
However, in practice, hdr_len was being set using skb_headlen(skb),
which is clearly incorrect. This commit fixes that issue.
Fixes: be50da3e9d4a ("net: virtio_net: implement exact header length guest feature")
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Link: https://patch.msgid.link/20260320021818.111741-2-xuanzhuo@linux.alibaba.com
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"Restrict the xen privcmd driver in unprivileged domU to only allow
hypercalls to target domain when using secure boot"
* tag 'xsa482-7.0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/privcmd: add boot control for restricted usage in domU
xen/privcmd: restrict usage in unprivileged domU
|
|
When binding a udp_sock to a local address and port, UDP uses
two hashes (udptable->hash and udptable->hash2) for collision
detection. The current code switches to "hash2" when
hslot->count > 10.
"hash2" is keyed by local address and local port.
"hash" is keyed by local port only.
The issue can be shown in the following bind sequence (pseudo code):
bind(fd1, "[fd00::1]:8888")
bind(fd2, "[fd00::2]:8888")
bind(fd3, "[fd00::3]:8888")
bind(fd4, "[fd00::4]:8888")
bind(fd5, "[fd00::5]:8888")
bind(fd6, "[fd00::6]:8888")
bind(fd7, "[fd00::7]:8888")
bind(fd8, "[fd00::8]:8888")
bind(fd9, "[fd00::9]:8888")
bind(fd10, "[fd00::10]:8888")
/* Correctly return -EADDRINUSE because "hash" is used
* instead of "hash2". udp_lib_lport_inuse() detects the
* conflict.
*/
bind(fail_fd, "[::]:8888")
/* After one more socket is bound to "[fd00::11]:8888",
* hslot->count exceeds 10 and "hash2" is used instead.
*/
bind(fd11, "[fd00::11]:8888")
bind(fail_fd, "[::]:8888") /* succeeds unexpectedly */
The same issue applies to the IPv4 wildcard address "0.0.0.0"
and the IPv4-mapped wildcard address "::ffff:0.0.0.0". For
example, if there are existing sockets bound to
"192.168.1.[1-11]:8888", then binding "0.0.0.0:8888" or
"[::ffff:0.0.0.0]:8888" can also miss the conflict when
hslot->count > 10.
TCP inet_csk_get_port() already has the correct check in
inet_use_bhash2_on_bind(). Rename it to
inet_use_hash2_on_bind() and move it to inet_hashtables.h
so udp.c can reuse it in this fix.
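The missed conflict can be modeled in a few lines. This is a simplified
sketch, not the code in udp.c: addresses are small integers, 0 stands for
the wildcard address, all sockets share one port, the socket count stands
in for hslot->count, and SO_REUSEADDR/SO_REUSEPORT are ignored. All
function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ADDR_ANY_MODEL 0	/* stands in for "::" or "0.0.0.0" */

/* Port-only table ("hash"): every socket on the port is compared. */
static bool conflict_hash(const int *bound, size_t n, int addr)
{
	assert(bound != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (addr == ADDR_ANY_MODEL || bound[i] == ADDR_ANY_MODEL ||
		    bound[i] == addr)
			return true;
	return false;
}

/* Address+port table ("hash2"): only the bucket for addr and the
 * wildcard bucket are visited, so sockets bound to other specific
 * addresses are never seen. */
static bool conflict_hash2(const int *bound, size_t n, int addr)
{
	for (size_t i = 0; i < n; i++)
		if (bound[i] == addr || bound[i] == ADDR_ANY_MODEL)
			return true;
	return false;
}

/* Behavior described as the bug: switch tables on count alone. */
static bool bind_conflict_old(const int *bound, size_t n, int addr)
{
	if (n > 10)
		return conflict_hash2(bound, n, addr);
	return conflict_hash(bound, n, addr);
}

/* Behavior described as the fix: never use "hash2" for a wildcard bind. */
static bool bind_conflict_new(const int *bound, size_t n, int addr)
{
	if (n > 10 && addr != ADDR_ANY_MODEL)
		return conflict_hash2(bound, n, addr);
	return conflict_hash(bound, n, addr);
}
```

With eleven sockets bound to distinct specific addresses, the old check
misses a wildcard bind while the new one still reports the conflict.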
Fixes: 30fff9231fad ("udp: bind() optimisation")
Reported-by: Andrew Onyshchuk <oandrew@meta.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260319181817.1901357-1-martin.lau@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The cited commit mechanically put fib6_remove_gc_list()
just after every fib6_clean_expires() call.
When a temporary route is promoted to a permanent route,
there may already be exception routes tied to it.
If fib6_remove_gc_list() removes the route from tb6_gc_hlist,
such exception routes will no longer be aged.
Let's replace fib6_remove_gc_list() with a new helper
fib6_may_remove_gc_list() and use fib6_age_exceptions() there.
Note that net->ipv6 is only compiled when CONFIG_IPV6 is
enabled, so fib6_{add,remove,may_remove}_gc_list() are guarded.
Fixes: 5eb902b8e719 ("net/ipv6: Remove expired routes with a separated list of routes.")
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20260320072317.2561779-3-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
btrfs_sync_file()
If overlay is used on top of btrfs, dentry->d_sb translates to overlay's
super block and fsid assignment will lead to a crash.
Use file_inode(file)->i_sb to always get btrfs_sb.
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Revert "tracing: Remove pid in task_rename tracing output"
A change was made to remove the pid field from the task_rename event
because it was thought that it was always done for the current task
and recording the pid would be redundant. This turned out to be
incorrect and there are a few corner cases where this is not true,
which caused some regressions in tooling.
- Fix the reading from user space for migration
The reading of user space uses a seq lock type of logic where it uses
a per-cpu temporary buffer and disables migration, then enables
preemption, does the copy from user space, disables preemption,
enables migration and checks if there was any schedule switches while
preemption was enabled. If there was a context switch, then it is
considered that the per-cpu buffer could be corrupted and it tries
again. There's a protection check: if it takes a hundred tries, it
issues a warning and exits out to prevent a live lock.
This was triggered because the task was selected by the load balancer
to be migrated to another CPU. Every time preemption was enabled, the
migration task would schedule in and try to migrate the task, but
couldn't because migration was disabled, and would let it run again.
This caused the scheduler to schedule out the task every time it
enabled preemption and made the loop never exit (until the 100
iteration test triggered).
Fix this by enabling and disabling preemption and keeping migration
enabled if the reading from user space needs to be done again. This
will let the migration thread migrate the task and the copy from user
space will likely pass on the next iteration.
- Fix trace_marker copy option freeing
The "copy_trace_marker" option allows a tracing instance to get a
copy of a write to the trace_marker file of the top level instance.
This is managed by a linked list protected by RCU. When an instance is
removed, a check is made if the option is set, and if so
synchronize_rcu() is called.
The problem is that the iteration that resets all the flags to
what they were when the instance was created (to perform clean ups)
was done before the check of the copy_trace_marker option, so that
option was already cleared and synchronize_rcu() was never called.
Move the clearing of all the flags after the check of
copy_trace_marker to do synchronize_rcu() so that the option is still
set if it was before and the synchronization is performed.
- Fix entries setting when validating the persistent ring buffer
When validating the persistent ring buffer on boot up, the number of
events per sub-buffer is added to the sub-buffer meta page. The
validator was updating cpu_buffer->head_page (the first sub-buffer of
the per-cpu buffer) and not the "head_page" variable that was
iterating the sub-buffers. This was causing the first sub-buffer to
be assigned the entries for each sub-buffer and not the sub-buffer
that was supposed to be updated.
- Use "hash" value to update the direct callers
When updating the ftrace direct callers, it assigned a temporary
callback to all the callback functions of the ftrace ops and not just
the functions represented by the passed in hash. This causes an
unnecessary slow down of the functions of the ftrace_ops that are not
being modified. Only update the functions that are going to be
modified to call the ftrace loop function so that the update can be
made on those functions.
* tag 'trace-v7.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
ftrace: Use hash argument for tmp_ops in update_ftrace_direct_mod
ring-buffer: Fix to update per-subbuf entries of persistent ring buffer
tracing: Fix trace_marker copy link list updates
tracing: Fix failure to read user space from system call trace events
tracing: Revert "tracing: Remove pid in task_rename tracing output"
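The persistent ring buffer item above describes writing through the wrong
pointer while iterating. A minimal model of that bug shape follows; the
struct and function names are hypothetical and this is not the ring
buffer validator itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a sub-buffer meta page. */
struct subbuf_model {
	unsigned long entries;
};

/* Buggy shape: the loop walks the sub-buffers but always stores the
 * count through the fixed first sub-buffer. */
static void record_entries_buggy(struct subbuf_model *bufs,
				 const unsigned long *counted, size_t n)
{
	struct subbuf_model *first = &bufs[0];

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		first->entries = counted[i];
}

/* Fixed shape: store through the iterating pointer. */
static void record_entries_fixed(struct subbuf_model *bufs,
				 const unsigned long *counted, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		struct subbuf_model *head_page = &bufs[i];

		head_page->entries = counted[i];
	}
}
```

In the buggy variant only the first element ever changes, and it ends up
holding the count of the last sub-buffer visited.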
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fix from Ingo Molnar:
"Fix a sparse build error regression in <linux/local_lock_internal.h>
caused by the locking context-analysis changes"
* tag 'locking-urgent-2026-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
include/linux/local_lock_internal.h: Make this header file again compatible with sparse
|
|
One major usage of damon_call() is online DAMON parameters update. It is
done by calling damon_commit_ctx() inside the damon_call() callback
function. damon_commit_ctx() can fail for two reasons: 1) invalid
parameters and 2) internal memory allocation failures. In case of
failures, the damon_ctx that attempted to be updated (commit destination)
can be partially updated (or, corrupted from a perspective), and therefore
shouldn't be used anymore. The function only ensures the damon_ctx object
can be safely deallocated using damon_destroy_ctx().
The API callers are, however, calling damon_commit_ctx() only after
asserting the parameters are valid, to avoid damon_commit_ctx() failing due
to invalid input parameters. But it can still theoretically fail if the
internal memory allocation fails. In that case, DAMON may run with the
partially updated damon_ctx. This can result in unexpected behaviors
including even NULL pointer dereference in case of damos_commit_dests()
failure [1]. Such allocation failure is arguably too small to fail, so
the real world impact would be rare. But, given the bad consequence, this
needs to be fixed.
Avoid such partially-committed (maybe-corrupted) damon_ctx use by saving
the damon_commit_ctx() failure on the damon_ctx object. For this,
introduce damon_ctx->maybe_corrupted field. damon_commit_ctx() sets it
when it fails. kdamond_call() checks if the field is set after each
damon_call_control->fn() is executed. If it is set, ignore remaining
callback requests and return. All kdamond_call() callers including
kdamond_fn() also check the maybe_corrupted field right after
kdamond_call() invocations. If the field is set, break the kdamond_fn()
main loop so that DAMON still doesn't use the context that might be
corrupted.
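The flag-and-stop pattern described above can be sketched in userspace C.
This is a model under assumptions, not the DAMON code: ctx_model,
commit_model() and run_calls_model() are hypothetical names, and the
allocation failure is injected by a parameter.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the context being updated online. */
struct ctx_model {
	bool maybe_corrupted;
	int params;
};

/* Stand-in for a commit that can fail partway through. On failure the
 * destination is marked as possibly corrupted. */
static int commit_model(struct ctx_model *dst, int new_params,
			bool alloc_fails)
{
	assert(dst != NULL);
	if (alloc_fails) {
		dst->maybe_corrupted = true;
		return -ENOMEM;
	}
	dst->params = new_params;
	return 0;
}

typedef int (*call_fn)(struct ctx_model *ctx);

/* Run queued callbacks, checking the flag after each one and ignoring
 * the remaining requests once it is set. Returns how many ran. */
static size_t run_calls_model(struct ctx_model *ctx, call_fn *fns, size_t n)
{
	size_t done = 0;

	for (size_t i = 0; i < n; i++) {
		fns[i](ctx);
		done++;
		if (ctx->maybe_corrupted)
			break;
	}
	return done;
}

static int good_update(struct ctx_model *ctx)
{
	return commit_model(ctx, 42, false);
}

static int failing_update(struct ctx_model *ctx)
{
	return commit_model(ctx, 7, true);
}
```

A caller would check ctx->maybe_corrupted after run_calls_model() returns
and leave its main loop if the flag is set.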
[sj@kernel.org: let kdamond_call() with cancel regardless of maybe_corrupted]
Link: https://lkml.kernel.org/r/20260320031553.2479-1-sj@kernel.org
Link: https://sashiko.dev/#/patchset/20260319145218.86197-1-sj%40kernel.org
Link: https://lkml.kernel.org/r/20260319145218.86197-1-sj@kernel.org
Link: https://lore.kernel.org/20260319043309.97966-1-sj@kernel.org [1]
Fixes: 3301f1861d34 ("mm/damon/sysfs: handle commit command using damon_call()")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> [6.15+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core
Pull driver core fixes from Danilo Krummrich:
- Generalize driver_override in the driver core, providing a common
sysfs implementation and concurrency-safe accessors for bus
implementations
- Do not use driver_override as IRQ name in the hwmon axi-fan driver
- Remove an unnecessary driver_override check in sh platform_early
- Migrate the platform bus to use the generic driver_override
infrastructure, fixing a UAF condition caused by accessing the
driver_override field without proper locking in the platform_match()
callback
* tag 'driver-core-7.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core:
driver core: platform: use generic driver_override infrastructure
sh: platform_early: remove pdev->driver_override check
hwmon: axi-fan: don't use driver_override as IRQ name
docs: driver-model: document driver_override
driver core: generalize driver_override in struct device
|
|
This reverts commit e3f6a42272e028c46695acc83fc7d7c42f2750ad.
The commit says that the tracepoint only deals with the current task,
however the following case is not current task:
comm_write() {
p = get_proc_task(inode);
if (!p)
return -ESRCH;
if (same_thread_group(current, p))
set_task_comm(p, buffer);
}
where set_task_comm() calls __set_task_comm() which records
the update of p and not current.
So revert the patch to show pid.
Cc: <mhiramat@kernel.org>
Cc: <mathieu.desnoyers@efficios.com>
Cc: <elver@google.com>
Cc: <kees@kernel.org>
Link: https://patch.msgid.link/20260306075954.4533-1-xuewen.yan@unisoc.com
Fixes: e3f6a42272e0 ("tracing: Remove pid in task_rename tracing output")
Reported-by: Guohua Yan <guohua.yan@unisoc.com>
Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull execve fixes from Kees Cook:
- binfmt_elf_fdpic: fix AUXV size calculation (Andrei Vagin)
- fs/tests: exec: Remove bad test vector
* tag 'execve-v7.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
fs/tests: exec: Remove bad test vector
binfmt_elf_fdpic: fix AUXV size calculation for ELF_HWCAP3 and ELF_HWCAP4
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
"Here are some small tty/vt and serial driver fixes for 7.0-rc5.
Included in here are:
- 8250 driver fixes for reported problems
- serial core lockup fix
- uartlite driver bugfix
- vt save/restore bugfix
All of these have been in linux-next for over a week with no reported
problems"
* tag 'tty-7.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
vt: save/restore unicode screen buffer for alternate screen
serial: 8250_dw: Ensure BUSY is deasserted
serial: 8250: Add late synchronize_irq() to shutdown to handle DW UART BUSY
serial: 8250_dw: Rework IIR_NO_INT handling to stop interrupt storm
serial: 8250_dw: Rework dw8250_handle_irq() locking and IIR handling
serial: 8250: Add serial8250_handle_irq_locked()
serial: 8250_dw: Avoid unnecessary LCR writes
serial: 8250: Protect LCR write in shutdown
serial: 8250_pci: add support for the AX99100
serial: core: fix infinite loop in handle_tx() for PORT_UNKNOWN
serial: uartlite: fix PM runtime usage count underflow on probe
serial: 8250: always disable IRQ during THRE test
serial: 8250: Fix TX deadlock when using DMA
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull io_uring fixes from Jens Axboe:
- A bit of a work-around for AF_UNIX recv multishot, as the in-kernel
implementation doesn't properly signal EOF. We'll likely rework this
one going forward, but the fix is sufficient for now
- Two fixes for incrementally consumed buffers, for non-pollable files
and for 0 byte reads
* tag 'io_uring-7.0-20260320' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
io_uring/kbuf: propagate BUF_MORE through early buffer commit path
io_uring/kbuf: fix missing BUF_MORE for incremental buffers at EOF
io_uring/poll: fix multishot recv missing EOF on wakeup race
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux
Pull iommu fixes from Joerg Roedel:
"Intel VT-d:
- Abort all pending requests on dev_tlb_inv timeout to avoid
hardlockup
- Limit IOPF handling to PRI-capable device to avoid SVA attach
failure
AMD-Vi:
- Make sure identity domain is not used when SNP is active
Core fixes:
- Handle mapping IOVA 0x0 correctly
- Fix crash in SVA code
- Kernel-doc fix in IO-PGTable code"
* tag 'iommu-fixes-v7.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
iommu/amd: Block identity domain when SNP enabled
iommu/sva: Fix crash in iommu_sva_unbind_device()
iommu/io-pgtable: fix all kernel-doc warnings in io-pgtable.h
iommu: Fix mapping check for 0x0 to avoid re-mapping it
iommu/vt-d: Only handle IOPF for SVA when PRI is supported
iommu/vt-d: Fix intel iommu iotlb sync hardlockup and retry
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull Hyper-V fixes from Wei Liu:
- Fix ARM64 MSHV support (Anirudh Rayabharam)
- Fix MSHV driver memory handling issues (Stanislav Kinsburskii)
- Update maintainers for Hyper-V DRM driver (Saurabh Sengar)
- Misc clean up in MSHV crashdump code (Ard Biesheuvel, Uros Bizjak)
- Minor improvements to MSHV code (Mukesh R, Wei Liu)
- Revert not yet released MSHV scrub partition hypercall (Wei Liu)
* tag 'hyperv-fixes-signed-20260319' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
mshv: Fix error handling in mshv_region_pin
MAINTAINERS: Update maintainers for Hyper-V DRM driver
mshv: Fix use-after-free in mshv_map_user_memory error path
mshv: pass struct mshv_user_mem_region by reference
x86/hyperv: Use any general-purpose register when saving %cr2 and %cr8
x86/hyperv: Use current_stack_pointer to avoid asm() in hv_hvcrash_ctxt_save()
x86/hyperv: Save segment registers directly to memory in hv_hvcrash_ctxt_save()
x86/hyperv: Use __naked attribute to fix stackless C function
Revert "mshv: expose the scrub partition hypercall"
mshv: add arm64 support for doorbell & intercept SINTs
mshv: refactor synic init and cleanup
x86/hyperv: print out reserved vectors in hexadecimal
|
|
Add a SB_I_NO_DATA_INTEGRITY superblock flag for filesystems that cannot
guarantee data persistence on sync (eg fuse). For superblocks with this
flag set, sync kicks off writeback of dirty inodes but does not wait
for the flusher threads to complete the writeback.
This replaces the per-inode AS_NO_DATA_INTEGRITY mapping flag added in
commit f9a49aa302a0 ("fs/writeback: skip AS_NO_DATA_INTEGRITY mappings
in wait_sb_inodes()"). The flag belongs at the superblock level because
data integrity is a filesystem-wide property, not a per-inode one.
Having this flag at the superblock level also allows us to skip having
to iterate every dirty inode in wait_sb_inodes() only to skip each inode
individually.
Prior to this commit, mappings with no data integrity guarantees skipped
waiting on writeback completion but still waited on the flusher threads
to finish initiating the writeback. Waiting on the flusher threads is
unnecessary. This commit kicks off writeback but does not wait on the
flusher threads. This change properly addresses a recent report [1] for
a suspend-to-RAM hang seen on fuse-overlayfs that was caused by waiting
on the flusher threads to finish:
Workqueue: pm_fs_sync pm_fs_sync_work_fn
Call Trace:
<TASK>
__schedule+0x457/0x1720
schedule+0x27/0xd0
wb_wait_for_completion+0x97/0xe0
sync_inodes_sb+0xf8/0x2e0
__iterate_supers+0xdc/0x160
ksys_sync+0x43/0xb0
pm_fs_sync_work_fn+0x17/0xa0
process_one_work+0x193/0x350
worker_thread+0x1a1/0x310
kthread+0xfc/0x240
ret_from_fork+0x243/0x280
ret_from_fork_asm+0x1a/0x30
</TASK>
On fuse this is problematic because there are paths that may cause the
flusher thread to block (eg if systemd freezes the user session cgroups
first, which freezes the fuse daemon, before invoking the kernel
suspend. The kernel suspend triggers ->write_inode() which on fuse issues
a synchronous setattr request, which cannot be processed since the
daemon is frozen. Or if the daemon is buggy and cannot properly complete
writeback, initiating writeback on a dirty folio already under writeback
leads to writeback_get_folio() -> folio_prepare_writeback() ->
unconditional wait on writeback to finish, which will cause a hang).
This commit restores fuse to its prior behavior before tmp folios were
removed, where sync was essentially a no-op.
[1] https://lore.kernel.org/linux-fsdevel/CAJnrk1a-asuvfrbKXbEwwDSctvemF+6zfhdnuzO65Pt8HsFSRw@mail.gmail.com/T/#m632c4648e9cafc4239299887109ebd880ac6c5c1
Fixes: 0c58a97f919c ("fuse: remove tmp folio for writebacks and internal rb tree")
Reported-by: John <therealgraysky@proton.me>
Cc: stable@vger.kernel.org
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://patch.msgid.link/20260320005145.2483161-2-joannelkoong@gmail.com
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
When running in an unprivileged domU under Xen, the privcmd driver
is restricted to allow only hypercalls against a target domain, for
which the current domU is acting as a device model.
Add a boot parameter "unrestricted" to allow all hypercalls (the
hypervisor will still refuse destructive hypercalls affecting other
guests).
Make this new parameter effective only in case the domU wasn't started
using secure boot, as otherwise hypercalls targeting the domU itself
might result in violating the secure boot functionality.
This is achieved by adding another lockdown reason, which can be
tested for not being set when applying the "unrestricted" option.
This is part of XSA-482
Signed-off-by: Juergen Gross <jgross@suse.com>
---
V2:
- new patch
|
|
Mapping buffers which carry this attribute require a DMA coherent system.
This means that they can't take the SWIOTLB path, can perform CPU cache
overlap and don't perform cache flushing.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20260316-dma-debug-overlap-v3-4-1dde90a7f08b@nvidia.com
|
|
Rename the DMA_ATTR_CPU_CACHE_CLEAN attribute to better reflect that it
is a debugging aid to inform DMA core code that CPU cache line overlaps are
allowed, and refine the documentation describing its use.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20260316-dma-debug-overlap-v3-3-1dde90a7f08b@nvidia.com
|
|
Tracing prints decoded DMA attribute flags, but it does not yet
include the recently added DMA_ATTR_CPU_CACHE_CLEAN. Add support
for decoding and displaying this attribute in the trace output.
Fixes: 61868dc55a11 ("dma-mapping: add DMA_ATTR_CPU_CACHE_CLEAN")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20260316-dma-debug-overlap-v3-2-1dde90a7f08b@nvidia.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- hci_ll: Fix firmware leak on error path
- hci_sync: annotate data-races around hdev->req_status
- L2CAP: Fix null-ptr-deref on l2cap_sock_ready_cb
- L2CAP: Validate PDU length before reading SDU length in l2cap_ecred_data_rcv()
- L2CAP: Fix regressions caused by reusing ident
- L2CAP: Fix stack-out-of-bounds read in l2cap_ecred_conn_req
- MGMT: Fix dangling pointer on mgmt_add_adv_patterns_monitor_complete
- SCO: Fix use-after-free in sco_recv_frame() due to missing sock_hold
* tag 'for-net-2026-03-19' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: L2CAP: Fix regressions caused by reusing ident
Bluetooth: L2CAP: Fix null-ptr-deref on l2cap_sock_ready_cb
Bluetooth: hci_ll: Fix firmware leak on error path
Bluetooth: hci_sync: annotate data-races around hdev->req_status
Bluetooth: MGMT: Fix dangling pointer on mgmt_add_adv_patterns_monitor_complete
Bluetooth: SCO: Fix use-after-free in sco_recv_frame() due to missing sock_hold
Bluetooth: L2CAP: Validate PDU length before reading SDU length in l2cap_ecred_data_rcv()
Bluetooth: L2CAP: Fix stack-out-of-bounds read in l2cap_ecred_conn_req
====================
Link: https://patch.msgid.link/20260319190455.135302-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When io_should_commit() returns true (eg for non-pollable files), buffer
commit happens at buffer selection time and sel->buf_list is set to
NULL. When __io_put_kbufs() generates CQE flags at completion time, it
calls __io_put_kbuf_ring() which finds a NULL buffer_list and hence
cannot determine whether the buffer was consumed or not. This means that
IORING_CQE_F_BUF_MORE is never set for non-pollable input with
incrementally consumed buffers.
Likewise for io_buffers_select(), which always commits upfront and
discards the return value of io_kbuf_commit().
Add REQ_F_BUF_MORE to store the result of io_kbuf_commit() during early
commit. Then __io_put_kbuf_ring() can check this flag and set
IORING_CQE_F_BUF_MORE accordingly.
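The idea of remembering the early commit result in a request flag can be
modeled as follows. This is a sketch under assumptions, not the io_uring
code: the MODEL_* flag values, the structs and the function names are
hypothetical, and the helper's return convention (true means the buffer
still has room) is chosen for the model only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_REQ_F_BUF_MORE	(1u << 0)	/* request flag, illustrative value */
#define MODEL_CQE_F_BUF_MORE	(1u << 1)	/* completion flag, illustrative value */

struct buf_model {
	unsigned int len;		/* bytes still unconsumed */
};

struct req_model {
	unsigned int flags;
	struct buf_model *buf_list;	/* NULL after an early commit */
};

/* Consume nbytes; returns true if the buffer still has room left. */
static bool model_commit_has_more(struct buf_model *buf, unsigned int nbytes)
{
	assert(nbytes <= buf->len);
	buf->len -= nbytes;
	return buf->len != 0;
}

/* Early commit at selection time: the list pointer is dropped, so the
 * commit result is saved in a request flag instead of being discarded. */
static void early_commit_model(struct req_model *req, unsigned int nbytes)
{
	if (model_commit_has_more(req->buf_list, nbytes))
		req->flags |= MODEL_REQ_F_BUF_MORE;
	req->buf_list = NULL;
}

/* Completion time: commit now if the list is still known, otherwise
 * fall back to the saved flag. */
static unsigned int put_kbuf_model(struct req_model *req, unsigned int nbytes)
{
	unsigned int cflags = 0;

	if (req->buf_list) {
		if (model_commit_has_more(req->buf_list, nbytes))
			cflags |= MODEL_CQE_F_BUF_MORE;
	} else if (req->flags & MODEL_REQ_F_BUF_MORE) {
		cflags |= MODEL_CQE_F_BUF_MORE;
	}
	req->flags &= ~MODEL_REQ_F_BUF_MORE;
	return cflags;
}
```

Without the saved flag, the NULL buf_list branch would have no way to
report that the buffer was only partially consumed.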
Reported-by: Martin Michaelis <code@mgjm.de>
Cc: stable@vger.kernel.org
Fixes: ae98dbf43d75 ("io_uring/kbuf: add support for incremental buffer consumption")
Link: https://github.com/axboe/liburing/issues/1553
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This attempts to fix regressions caused by reusing ident, which apparently
is not handled well by certain stacks, causing them to not respond to
requests. Instead of simply returning the first unallocated id, this
stores the last used tx_ident and then attempts to use the next one until
all available ids are exhausted, and then cycles, starting over at 1.
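The allocation order described above can be sketched as a small model.
It is not the L2CAP implementation: the names are hypothetical and the
range of 8 idents is deliberately tiny and is an assumption made only to
keep the example short.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_IDENT_MAX 8	/* illustrative; the real range differs */

struct ident_model {
	bool in_use[MODEL_IDENT_MAX + 1];	/* index 0 is never handed out */
	int last;				/* last ident handed out, 0 if none */
};

/* Return the next free ident after the last one used, wrapping around
 * to 1. Returns 0 when every ident is in use. */
static int ident_get_model(struct ident_model *m)
{
	for (int step = 1; step <= MODEL_IDENT_MAX; step++) {
		int id = (m->last + step - 1) % MODEL_IDENT_MAX + 1;

		if (!m->in_use[id]) {
			m->in_use[id] = true;
			m->last = id;
			return id;
		}
	}
	return 0;
}

/* Release an ident once its request has completed. */
static void ident_put_model(struct ident_model *m, int id)
{
	assert(id >= 1 && id <= MODEL_IDENT_MAX);
	m->in_use[id] = false;
}
```

Because the search starts after the last ident handed out, a freed ident
is not reused until the allocator has cycled through the rest of the
range, which is the property the fix relies on.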
Link: https://bugzilla.kernel.org/show_bug.cgi?id=221120
Link: https://bugzilla.kernel.org/show_bug.cgi?id=221177
Fixes: 6c3ea155e5ee ("Bluetooth: L2CAP: Fix not tracking outstanding TX ident")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Tested-by: Christian Eggers <ceggers@arri.de>
|