|
commit b56ebe7c896dc78b5865ec2c4b1dae3c93537517 upstream.
commit ef8dd01538ea ("genirq/msi: Make interrupt allocation less
convoluted"), reworked the code so that the x86 specific quirk for affinity
setting of non-maskable PCI/MSI interrupts is not longer activated if
necessary.
This could be solved by restoring the original logic in the core MSI code,
but after a deeper analysis it turned out that the quirk flag is not
required at all.
The quirk is only required when the PCI/MSI device cannot mask the MSI
interrupts, which in turn also prevents reservation mode from being enabled
for the affected interrupt.
This allows removing the NOMASK quirk bit completely, as msi_set_affinity()
can instead check whether reservation mode is enabled for the interrupt,
which gives exactly the same answer.
Even in the currently non-existing case where reservation mode is not set
for a maskable MSI interrupt, this would not cause any harm: it would just
make msi_set_affinity() go needlessly through the functionally equivalent
slow path, which works perfectly fine with maskable interrupts as well.
Rework msi_set_affinity() to query the reservation mode and remove all
NOMASK quirk logic from the core code.
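A minimal sketch of the resulting check, assuming the existing
irqd_can_reserve() helper; the two path functions are illustrative names
for this sketch, not the verbatim patch:

    static int msi_set_affinity(struct irq_data *irqd,
                                const struct cpumask *mask, bool force)
    {
            /*
             * Reservation mode is never enabled for unmaskable MSI, so this
             * test answers the same question the NOMASK quirk bit did.
             */
            if (!irqd_can_reserve(irqd))
                    return msi_set_affinity_slow(irqd, mask, force);  /* illustrative */

            return msi_set_affinity_fast(irqd, mask, force);          /* illustrative */
    }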
[ tglx: Massaged changelog ]
Fixes: ef8dd01538ea ("genirq/msi: Make interrupt allocation less convoluted")
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Koichiro Den <den@valinux.co.jp>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20231026032036.2462428-1-den@valinux.co.jp
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 8d91f3f8ae57e6292142ca89f322e90fa0d6ac02 ]
There are a number of drivers (e.g. dw_mmc, meson-gx, mmci, sunxi) using
the same mechanism and a private vqmmc_enabled flag to deal with
enabling/disabling the vqmmc regulator.
Move this to the core and create new helpers mmc_regulator_enable_vqmmc
and mmc_regulator_disable_vqmmc.
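A rough sketch of what the enable helper can look like, assuming the
vqmmc_enabled flag moves into struct mmc_host (its exact placement is not
spelled out here):

    int mmc_regulator_enable_vqmmc(struct mmc_host *mmc)
    {
            int ret = 0;

            if (!IS_ERR(mmc->supply.vqmmc) && !mmc->vqmmc_enabled) {
                    ret = regulator_enable(mmc->supply.vqmmc);
                    if (ret < 0)
                            dev_err(mmc_dev(mmc),
                                    "enabling vqmmc regulator failed\n");
                    else
                            mmc->vqmmc_enabled = true;
            }
            return ret;
    }

The disable counterpart would mirror this, calling regulator_disable() and
clearing the flag only when it was previously set.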
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Acked-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Link: https://lore.kernel.org/r/71586432-360f-9b92-17f6-b05a8a971bc2@gmail.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Stable-dep-of: 477865af60b2 ("mmc: sdhci-sprd: Fix vqmmc not shutting down after the card was pulled")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit bef4a48f4ef798c4feddf045d49e53c8a97d5e37 ]
A race condition exists where a synchronous (noqueue) transfer can be
active during a system suspend. This can cause a null pointer
dereference exception to occur when the system resumes.
Example order of events leading to the exception:
1. spi_sync() calls __spi_transfer_message_noqueue() which sets
ctlr->cur_msg
2. SPI transfer begins via spi_transfer_one_message()
3. System is suspended, interrupting the transfer context
4. System is resumed
5. spi_controller_resume() calls spi_start_queue() which resets cur_msg
to NULL
6. SPI transfer context resumes and spi_finalize_current_message() is
called, which dereferences cur_msg (which is now NULL)
Wait for synchronous transfers to complete before suspending by
acquiring the bus mutex and setting/checking a suspend flag.
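A hedged sketch of the suspend side; the flag name here is illustrative,
only bus_lock_mutex and spi_stop_queue() are existing spi.c names:

    static int spi_controller_suspend(struct spi_controller *ctlr)
    {
            int ret = spi_stop_queue(ctlr);

            /*
             * spi_sync() holds bus_lock_mutex around the whole noqueue
             * transfer, so taking it here waits for any in-flight transfer.
             */
            mutex_lock(&ctlr->bus_lock_mutex);
            ctlr->suspended = true;                 /* illustrative flag */
            mutex_unlock(&ctlr->bus_lock_mutex);

            return ret;
    }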
Signed-off-by: Mark Hasemeyer <markhas@chromium.org>
Link: https://lore.kernel.org/r/20231107144743.v1.1.I7987f05f61901f567f7661763646cb7d7919b528@changeid
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 95ba893c9f4feb836ddce627efd0bb6af6667031 upstream.
It's valid to add the same fence multiple times to a dma-resv object and
we shouldn't need one extra slot for each.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Fixes: a3f7c10a269d5 ("dma-buf/dma-resv: check if the new fence is really later")
Cc: stable@vger.kernel.org # v5.19+
Link: https://patchwork.freedesktop.org/patch/msgid/20231115093035.1889-1-christian.koenig@amd.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit fc43e9c857b7aa55efba9398419b14d9e35dcc7d ]
hid_debug_events_release releases resources bound to the HID device instance.
hid_device_release releases the underlying HID device instance potentially
before hid_debug_events_release has completed releasing debug resources bound
to the same HID device instance.
Add a reference count to prevent the HID device instance from being torn
down prematurely when HID debugging support is used. When the count reaches
zero, release the core resources of the HID device instance using
hiddev_free.
The crash:
[ 120.728477][ T4396] kernel BUG at lib/list_debug.c:53!
[ 120.728505][ T4396] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[ 120.739806][ T4396] Modules linked in: bcmdhd dhd_static_buf 8822cu pcie_mhi r8168
[ 120.747386][ T4396] CPU: 1 PID: 4396 Comm: hidt_bridge Not tainted 5.10.110 #257
[ 120.754771][ T4396] Hardware name: Rockchip RK3588 EVB4 LP4 V10 Board (DT)
[ 120.761643][ T4396] pstate: 60400089 (nZCv daIf +PAN -UAO -TCO BTYPE=--)
[ 120.768338][ T4396] pc : __list_del_entry_valid+0x98/0xac
[ 120.773730][ T4396] lr : __list_del_entry_valid+0x98/0xac
[ 120.779120][ T4396] sp : ffffffc01e62bb60
[ 120.783126][ T4396] x29: ffffffc01e62bb60 x28: ffffff818ce3a200
[ 120.789126][ T4396] x27: 0000000000000009 x26: 0000000000980000
[ 120.795126][ T4396] x25: ffffffc012431000 x24: ffffff802c6d4e00
[ 120.801125][ T4396] x23: ffffff8005c66f00 x22: ffffffc01183b5b8
[ 120.807125][ T4396] x21: ffffff819df2f100 x20: 0000000000000000
[ 120.813124][ T4396] x19: ffffff802c3f0700 x18: ffffffc01d2cd058
[ 120.819124][ T4396] x17: 0000000000000000 x16: 0000000000000000
[ 120.825124][ T4396] x15: 0000000000000004 x14: 0000000000003fff
[ 120.831123][ T4396] x13: ffffffc012085588 x12: 0000000000000003
[ 120.837123][ T4396] x11: 00000000ffffbfff x10: 0000000000000003
[ 120.843123][ T4396] x9 : 455103d46b329300 x8 : 455103d46b329300
[ 120.849124][ T4396] x7 : 74707572726f6320 x6 : ffffffc0124b8cb5
[ 120.855124][ T4396] x5 : ffffffffffffffff x4 : 0000000000000000
[ 120.861123][ T4396] x3 : ffffffc011cf4f90 x2 : ffffff81fee7b948
[ 120.867122][ T4396] x1 : ffffffc011cf4f90 x0 : 0000000000000054
[ 120.873122][ T4396] Call trace:
[ 120.876259][ T4396] __list_del_entry_valid+0x98/0xac
[ 120.881304][ T4396] hid_debug_events_release+0x48/0x12c
[ 120.886617][ T4396] full_proxy_release+0x50/0xbc
[ 120.891323][ T4396] __fput+0xdc/0x238
[ 120.895075][ T4396] ____fput+0x14/0x24
[ 120.898911][ T4396] task_work_run+0x90/0x148
[ 120.903268][ T4396] do_exit+0x1bc/0x8a4
[ 120.907193][ T4396] do_group_exit+0x8c/0xa4
[ 120.911458][ T4396] get_signal+0x468/0x744
[ 120.915643][ T4396] do_signal+0x84/0x280
[ 120.919650][ T4396] do_notify_resume+0xd0/0x218
[ 120.924262][ T4396] work_pending+0xc/0x3f0
[ Rahul Rameshbabu <sergeantsagara@protonmail.com>: rework changelog ]
Fixes: cd667ce24796 ("HID: use debugfs for events/reports dumping")
Signed-off-by: Charles Yi <be286@163.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit bb32500fb9b78215e4ef6ee8b4345c5f5d7eafb4 upstream.
The following can crash the kernel:
# cd /sys/kernel/tracing
# echo 'p:sched schedule' > kprobe_events
# exec 5>>events/kprobes/sched/enable
# > kprobe_events
# exec 5>&-
The above commands:
1. Change directory to the tracefs directory
2. Create a kprobe event (doesn't matter what one)
3. Open bash file descriptor 5 on the enable file of the kprobe event
4. Delete the kprobe event (removes the files too)
5. Close the bash file descriptor 5
The above causes a crash!
BUG: kernel NULL pointer dereference, address: 0000000000000028
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 6 PID: 877 Comm: bash Not tainted 6.5.0-rc4-test-00008-g2c6b6b1029d4-dirty #186
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:tracing_release_file_tr+0xc/0x50
What happens here is that the kprobe event creates a trace_event_file
"file" descriptor that represents the file in tracefs to the event. It
maintains state of the event (is it enabled for the given instance?).
Opening the "enable" file gets a reference to the event "file" descriptor
via the open file descriptor. When the kprobe event is deleted, the file is
also deleted from the tracefs system which also frees the event "file"
descriptor.
But as the tracefs file is still opened by user space, it will not be
totally removed until the final dput() is called on it. But this is not
true with the event "file" descriptor that is already freed. If the user
does a write to or simply closes the file descriptor it will reference the
event "file" descriptor that was just freed, causing a use-after-free bug.
To solve this, add a ref count to the event "file" descriptor as well as a
new flag called "FREED". The "file" will not be freed until the last
reference is released. But the FREED flag will be set when the event is
removed to prevent any more modifications to that event from happening,
even if there's still a reference to the event "file" descriptor.
Link: https://lore.kernel.org/linux-trace-kernel/20231031000031.1e705592@gandalf.local.home/
Link: https://lore.kernel.org/linux-trace-kernel/20231031122453.7a48b923@gandalf.local.home
Cc: stable@vger.kernel.org
Cc: Mark Rutland <mark.rutland@arm.com>
Fixes: f5ca233e2e66d ("tracing: Increase trace array ref count on enable and filter files")
Reported-by: Beau Belgrave <beaub@linux.microsoft.com>
Tested-by: Beau Belgrave <beaub@linux.microsoft.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit f55d8e60f10909dbc5524e261041e1d28d7d20d8 upstream.
This function takes a pointer to a pointer, unlike sprintf() which is
passed a plain pointer. Fix up the documentation to make this clear.
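A small usage sketch (the stat names are made up): the helper advances the
caller's cursor itself, which is why it takes a pointer to the pointer.

    static void fill_stat_strings(u8 *data)
    {
            u8 *p = data;
            int i;

            for (i = 0; i < 4; i++)
                    ethtool_sprintf(&p, "rxq%d_packets", i);  /* note &p, not p */
    }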
Fixes: 7888fe53b706 ("ethtool: Add common function for filling out strings")
Cc: Alexander Duyck <alexanderduyck@fb.com>
Cc: Justin Stitt <justinstitt@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Justin Stitt <justinstitt@google.com>
Link: https://lore.kernel.org/r/20231028192511.100001-1-andrew@lunn.ch
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b36995b8609a5a8fe5cf259a1ee768fcaed919f8 upstream.
-EOPNOTSUPP is the return value that implements a "no-op" hook, not 0.
Without this fix, having only the BPF LSM enabled (with no programs
attached) can cause uninitialized variable reads in
nfsd4_encode_fattr(), because the BPF hook returns 0 without touching
the 'ctxlen' variable and the corresponding 'contextlen' variable in
nfsd4_encode_fattr() remains uninitialized, yet being treated as valid
based on the 0 return value.
Cc: stable@vger.kernel.org
Fixes: 98e828a0650f ("security: Refactor declaration of LSM hooks")
Reported-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 866d648059d5faf53f1cd960b43fe8365ad93ea7 upstream.
1 is the return value that implements a "no-op" hook, not 0.
Cc: stable@vger.kernel.org
Fixes: 98e828a0650f ("security: Refactor declaration of LSM hooks")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 9b6304c1d53745c300b86f202d0dcff395e2d2db upstream.
struct timespec64 has unused bits in the tv_nsec field that can be used
for other purposes. In future patches, we're going to change how the
inode->i_ctime is accessed in certain inodes in order to make use of
them. In order to do that safely though, we'll need to eradicate raw
accesses of the inode->i_ctime field from the kernel.
Add new accessor functions for the ctime that we use to replace them.
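A hedged example of the intended conversion in filesystem code, assuming
the accessor names added by this series (inode_get_ctime() and
inode_set_ctime_current()):

    /* before: raw access to the field */
    inode->i_ctime = current_time(inode);
    ts = inode->i_ctime;

    /* after: accessors only */
    inode_set_ctime_current(inode);
    ts = inode_get_ctime(inode);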
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Message-Id: <20230705185812.579118-2-jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ed9009ad300c0f15a3ecfe9613547b1962bde02c upstream.
Micron MTFC4GACAJCN eMMC supports cache but requires that a cache flush
operation be allowed only after a write has occurred. Otherwise, the
cache flush command or subsequent commands will time out.
Signed-off-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Rafael Beims <rafael.beims@toradex.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20231030224809.59245-1-beanhuo@iokpp.de
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 35f5d94187a6a3a8df2cba54beccca1c2379edb8 upstream.
Patch series "avoid divide-by-zero due to max_nr_accesses overflow".
The maximum nr_accesses of a given DAMON context can be calculated by
dividing the aggregation interval by the sampling interval. Some logic
in DAMON uses the maximum nr_accesses as a divisor. Hence, the value
shouldn't be zero. Such a case is avoided since DAMON avoids setting the
aggregation interval smaller than the sampling interval. However, since
nr_accesses is unsigned int while the intervals are unsigned long, the
maximum nr_accesses can become zero in the cast.
Avoid the divide-by-zero by implementing a function that handles the
corner case (first patch), and by replacing the vulnerable direct max
nr_accesses calculations (remaining patches).
Note that the patches for the replacements are split per broken commit,
to make backporting to the required trees easier. In particular, the last
patch is for a patch that is not yet merged into the mainline but is in
the mm tree.
This patch (of 4):
The maximum nr_accesses of a given DAMON context can be calculated by
dividing the aggregation interval by the sampling interval. Some logic
in DAMON uses the maximum nr_accesses as a divisor. Hence, the value
shouldn't be zero. Such a case is avoided since DAMON avoids setting the
aggregation interval smaller than the sampling interval. However, since
nr_accesses is unsigned int while the intervals are unsigned long, the
maximum nr_accesses can become zero in the cast. Implement a function
that handles the corner case.
Note that this commit is not fixing the real issue, since it only
introduces the safe function that will replace the problematic
divisions. The replacements will be made by follow-up commits, to make
backporting to stable series easier.
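A sketch of such a helper, assuming the DAMON attrs field names
(aggr_interval and sample_interval, both unsigned long):

    static inline unsigned int damon_max_nr_accesses(const struct damon_attrs *attrs)
    {
            /*
             * Clamp before the implicit narrowing to unsigned int so the
             * returned divisor can never become zero.
             */
            return min(attrs->aggr_interval / attrs->sample_interval,
                       (unsigned long)UINT_MAX);
    }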
Link: https://lkml.kernel.org/r/20231019194924.100347-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20231019194924.100347-2-sj@kernel.org
Fixes: 198f0f4c58b9 ("mm/damon/vaddr,paddr: support pageout prioritization")
Signed-off-by: SeongJae Park <sj@kernel.org>
Reported-by: Jakub Acs <acsjakub@amazon.de>
Cc: <stable@vger.kernel.org> [5.16+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8001f49394e353f035306a45bcf504f06fca6355 upstream.
The code that checks for unknown boot options is unaware of the sysctl
alias facility, which maps bootparams to sysctl values. If a user sets
an old value that has a valid alias, a message about an invalid
parameter will be printed during boot, and the parameter will get passed
to init. Fix by checking for the existence of aliased parameters in the
unknown boot parameter code. If an alias exists, don't return an error
or pass the value to init.
Signed-off-by: Krister Johansen <kjlx@templeofstupid.com>
Cc: stable@vger.kernel.org
Fixes: 0a477e1ae21b ("kernel/sysctl: support handling command line aliases")
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit bfca5fb4e97c46503ddfc582335917b0cc228264 ]
RPC client pipefs dentries cleanup is done in the separate
rpc_remove_pipedir() workqueue, which takes care of pipefs superblock
locking.
In some special scenarios, when the kernel frees the pipefs sb of the
current client and immediately allocates a new pipefs sb,
rpc_remove_pipedir() misjudges the existence of the pipefs sb, which is
no longer the one it used to hold. As a result, rpc_remove_pipedir()
would clean up the already-freed pipefs dentries.
To fix this issue, rpc_remove_pipedir() should check whether the
current pipefs sb is consistent with the original pipefs sb.
This error can be caught by KASAN:
=========================================================
[ 250.497700] BUG: KASAN: slab-use-after-free in dget_parent+0x195/0x200
[ 250.498315] Read of size 4 at addr ffff88800a2ab804 by task kworker/0:18/106503
[ 250.500549] Workqueue: events rpc_free_client_work
[ 250.501001] Call Trace:
[ 250.502880] kasan_report+0xb6/0xf0
[ 250.503209] ? dget_parent+0x195/0x200
[ 250.503561] dget_parent+0x195/0x200
[ 250.503897] ? __pfx_rpc_clntdir_depopulate+0x10/0x10
[ 250.504384] rpc_rmdir_depopulate+0x1b/0x90
[ 250.504781] rpc_remove_client_dir+0xf5/0x150
[ 250.505195] rpc_free_client_work+0xe4/0x230
[ 250.505598] process_one_work+0x8ee/0x13b0
...
[ 22.039056] Allocated by task 244:
[ 22.039390] kasan_save_stack+0x22/0x50
[ 22.039758] kasan_set_track+0x25/0x30
[ 22.040109] __kasan_slab_alloc+0x59/0x70
[ 22.040487] kmem_cache_alloc_lru+0xf0/0x240
[ 22.040889] __d_alloc+0x31/0x8e0
[ 22.041207] d_alloc+0x44/0x1f0
[ 22.041514] __rpc_lookup_create_exclusive+0x11c/0x140
[ 22.041987] rpc_mkdir_populate.constprop.0+0x5f/0x110
[ 22.042459] rpc_create_client_dir+0x34/0x150
[ 22.042874] rpc_setup_pipedir_sb+0x102/0x1c0
[ 22.043284] rpc_client_register+0x136/0x4e0
[ 22.043689] rpc_new_client+0x911/0x1020
[ 22.044057] rpc_create_xprt+0xcb/0x370
[ 22.044417] rpc_create+0x36b/0x6c0
...
[ 22.049524] Freed by task 0:
[ 22.049803] kasan_save_stack+0x22/0x50
[ 22.050165] kasan_set_track+0x25/0x30
[ 22.050520] kasan_save_free_info+0x2b/0x50
[ 22.050921] __kasan_slab_free+0x10e/0x1a0
[ 22.051306] kmem_cache_free+0xa5/0x390
[ 22.051667] rcu_core+0x62c/0x1930
[ 22.051995] __do_softirq+0x165/0x52a
[ 22.052347]
[ 22.052503] Last potentially related work creation:
[ 22.052952] kasan_save_stack+0x22/0x50
[ 22.053313] __kasan_record_aux_stack+0x8e/0xa0
[ 22.053739] __call_rcu_common.constprop.0+0x6b/0x8b0
[ 22.054209] dentry_free+0xb2/0x140
[ 22.054540] __dentry_kill+0x3be/0x540
[ 22.054900] shrink_dentry_list+0x199/0x510
[ 22.055293] shrink_dcache_parent+0x190/0x240
[ 22.055703] do_one_tree+0x11/0x40
[ 22.056028] shrink_dcache_for_umount+0x61/0x140
[ 22.056461] generic_shutdown_super+0x70/0x590
[ 22.056879] kill_anon_super+0x3a/0x60
[ 22.057234] rpc_kill_sb+0x121/0x200
Fixes: 0157d021d23a ("SUNRPC: handle RPC client pipefs dentries by network namespace aware routines")
Signed-off-by: felix <fuzhen5@huawei.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 87c3a5893e865739ce78aa7192d36011022e0af7 ]
Except on x86, preempt_count is always accessed with READ_ONCE().
Repeated invocations in macros like irq_count() produce repeated loads.
These redundant instructions appear in various fast paths. In the one
shown below, for example, irq_count() is evaluated during kernel entry
if !tick_nohz_full_cpu(smp_processor_id()).
0001ed0a <irq_enter_rcu>:
1ed0a: 4e56 0000 linkw %fp,#0
1ed0e: 200f movel %sp,%d0
1ed10: 0280 ffff e000 andil #-8192,%d0
1ed16: 2040 moveal %d0,%a0
1ed18: 2028 0008 movel %a0@(8),%d0
1ed1c: 0680 0001 0000 addil #65536,%d0
1ed22: 2140 0008 movel %d0,%a0@(8)
1ed26: 082a 0001 000f btst #1,%a2@(15)
1ed2c: 670c beqs 1ed3a <irq_enter_rcu+0x30>
1ed2e: 2028 0008 movel %a0@(8),%d0
1ed32: 2028 0008 movel %a0@(8),%d0
1ed36: 2028 0008 movel %a0@(8),%d0
1ed3a: 4e5e unlk %fp
1ed3c: 4e75 rts
This patch doesn't prevent the pointless btst and beqs instructions
above, but it does eliminate 2 of the 3 pointless move instructions
here and elsewhere.
On x86, preempt_count is per-cpu data and the problem does not arise
presumably because the compiler is free to optimize more effectively.
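A hedged illustration of the pattern (the helper name is made up): load
preempt_count() into a local once instead of re-reading it, via READ_ONCE(),
for every mask test.

    static inline bool in_any_interrupt_context(void)
    {
            unsigned long pc = preempt_count();     /* one load */

            return (pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_MASK)) != 0;
    }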
This patch was tested on m68k and x86. I was expecting no changes
to object code for x86 and mostly that's what I saw. However, there
were a few places where code generation was perturbed for some reason.
The performance issue addressed here is minor on uniprocessor m68k. I
got a 0.01% improvement from this patch for a simple "find /sys -false"
benchmark. For architectures and workloads susceptible to cache line bounce
the improvement is expected to be larger. The only SMP architecture I have
is x86, and as x86 is unaffected I have not done any further measurements.
Fixes: 15115830c887 ("preempt: Cleanup the macro maze a bit")
Signed-off-by: Finn Thain <fthain@linux-m68k.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/0a403120a682a525e6db2d81d1a3ffcc137c3742.1694756831.git.fthain@linux-m68k.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit d27abbfd4888d79dd24baf50e774631046ac4732 ]
These enums are passed to set/test_bit(). The set/test_bit() functions
take a bit number instead of a shifted value. Passing a shifted value
is a double shift bug like doing BIT(BIT(1)). The double shift bug
doesn't cause a problem here because we are only checking 0 and 1 but
if the value was 5 or above then it can lead to a buffer overflow.
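An illustration of the bug class (the 'chip' struct and flag names are made
up). BIT(1) is already a mask, but set_bit() expects a bit number:

    enum { CHIP_DUTY = BIT(1) };

    set_bit(CHIP_DUTY, &chip->flags);       /* wrong: passes 2 as the bit number,
                                               so bit 2 gets set -- BIT(BIT(1)) */

    /* Correct form: the enum must be a plain bit number. */
    enum { CHIP_DUTY_BIT = 1 };

    set_bit(CHIP_DUTY_BIT, &chip->flags);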
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Sam Protsenko <semen.protsenko@linaro.org>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 313ebe47d75558511aa1237b6e35c663b5c0ec6f ]
Currently, user array duplications are sometimes done without an
overflow check. Sometimes the checks are done manually; sometimes the
array size is calculated with array_size() and sometimes by calculating
n * size directly in code.
Introduce wrappers for arrays for memdup_user() and vmemdup_user() to
provide a standardized and safe way for duplicating user arrays.
This is both for new code as well as replacing usage of (v)memdup_user()
in existing code that uses, e.g., n * size to calculate array sizes.
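A sketch of what such a wrapper can look like, assuming the overflow check
is done with the existing check_mul_overflow() helper:

    static inline void *memdup_array_user(const void __user *src,
                                          size_t n, size_t size)
    {
            size_t nbytes;

            if (check_mul_overflow(n, size, &nbytes))
                    return ERR_PTR(-EOVERFLOW);

            return memdup_user(src, nbytes);
    }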
Suggested-by: David Airlie <airlied@redhat.com>
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Zack Rusin <zackr@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230920123612.16914-3-pstanner@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 49dbe25adac42d3e06f65d1420946bec65896222 ]
This adds handling of the MSG_ERRQUEUE input flag in the receive call. This
flag is used to read the socket's error queue instead of the data queue. A
possible scenario of error queue usage is receiving completions for
transmissions with the MSG_ZEROCOPY flag. This patch also adds new defines:
'SOL_VSOCK' and 'VSOCK_RECVERR'.
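A user-space sketch, by analogy with how MSG_ZEROCOPY completions are read
on other socket families; the exact cmsg payload layout is an assumption
here:

    char control[CMSG_SPACE(sizeof(struct sock_extended_err))];
    struct msghdr msg = {
            .msg_control = control,
            .msg_controllen = sizeof(control),
    };

    if (recvmsg(fd, &msg, MSG_ERRQUEUE) != -1) {
            struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);

            if (cm && cm->cmsg_level == SOL_VSOCK && cm->cmsg_type == VSOCK_RECVERR) {
                    struct sock_extended_err *serr = (void *)CMSG_DATA(cm);
                    /* serr describes the completed MSG_ZEROCOPY transmissions */
            }
    }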
Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 265f3ed077036f053981f5eea0b5b43e7c5b39ff ]
All callers of work_on_cpu() share the same lock class key for all the
functions queued. As a result the workqueue related locking scenario for
a function A may be spuriously accounted as an inversion against the
locking scenario of function B such as in the following model:
long A(void *arg)
{
mutex_lock(&mutex);
mutex_unlock(&mutex);
}
long B(void *arg)
{
}
void launchA(void)
{
work_on_cpu(0, A, NULL);
}
void launchB(void)
{
mutex_lock(&mutex);
work_on_cpu(1, B, NULL);
mutex_unlock(&mutex);
}
launchA and launchB running concurrently have no chance to deadlock.
However the above can be reported by lockdep as a possible locking
inversion because the works containing A() and B() are treated as
belonging to the same locking class.
The following shows an existing example of such a spurious lockdep splat:
======================================================
WARNING: possible circular locking dependency detected
6.6.0-rc1-00065-g934ebd6e5359 #35409 Not tainted
------------------------------------------------------
kworker/0:1/9 is trying to acquire lock:
ffffffff9bc72f30 (cpu_hotplug_lock){++++}-{0:0}, at: _cpu_down+0x57/0x2b0
but task is already holding lock:
ffff9e3bc0057e60 ((work_completion)(&wfc.work)){+.+.}-{0:0}, at: process_scheduled_works+0x216/0x500
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 ((work_completion)(&wfc.work)){+.+.}-{0:0}:
__flush_work+0x83/0x4e0
work_on_cpu+0x97/0xc0
rcu_nocb_cpu_offload+0x62/0xb0
rcu_nocb_toggle+0xd0/0x1d0
kthread+0xe6/0x120
ret_from_fork+0x2f/0x40
ret_from_fork_asm+0x1b/0x30
-> #1 (rcu_state.barrier_mutex){+.+.}-{3:3}:
__mutex_lock+0x81/0xc80
rcu_nocb_cpu_deoffload+0x38/0xb0
rcu_nocb_toggle+0x144/0x1d0
kthread+0xe6/0x120
ret_from_fork+0x2f/0x40
ret_from_fork_asm+0x1b/0x30
-> #0 (cpu_hotplug_lock){++++}-{0:0}:
__lock_acquire+0x1538/0x2500
lock_acquire+0xbf/0x2a0
percpu_down_write+0x31/0x200
_cpu_down+0x57/0x2b0
__cpu_down_maps_locked+0x10/0x20
work_for_cpu_fn+0x15/0x20
process_scheduled_works+0x2a7/0x500
worker_thread+0x173/0x330
kthread+0xe6/0x120
ret_from_fork+0x2f/0x40
ret_from_fork_asm+0x1b/0x30
other info that might help us debug this:
Chain exists of:
cpu_hotplug_lock --> rcu_state.barrier_mutex --> (work_completion)(&wfc.work)
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock((work_completion)(&wfc.work));
lock(rcu_state.barrier_mutex);
lock((work_completion)(&wfc.work));
lock(cpu_hotplug_lock);
*** DEADLOCK ***
2 locks held by kworker/0:1/9:
#0: ffff900481068b38 ((wq_completion)events){+.+.}-{0:0}, at: process_scheduled_works+0x212/0x500
#1: ffff9e3bc0057e60 ((work_completion)(&wfc.work)){+.+.}-{0:0}, at: process_scheduled_works+0x216/0x500
stack backtrace:
CPU: 0 PID: 9 Comm: kworker/0:1 Not tainted 6.6.0-rc1-00065-g934ebd6e5359 #35409
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
Workqueue: events work_for_cpu_fn
Call Trace:
rcu-torture: rcu_torture_read_exit: Start of episode
<TASK>
dump_stack_lvl+0x4a/0x80
check_noncircular+0x132/0x150
__lock_acquire+0x1538/0x2500
lock_acquire+0xbf/0x2a0
? _cpu_down+0x57/0x2b0
percpu_down_write+0x31/0x200
? _cpu_down+0x57/0x2b0
_cpu_down+0x57/0x2b0
__cpu_down_maps_locked+0x10/0x20
work_for_cpu_fn+0x15/0x20
process_scheduled_works+0x2a7/0x500
worker_thread+0x173/0x330
? __pfx_worker_thread+0x10/0x10
kthread+0xe6/0x120
? __pfx_kthread+0x10/0x10
ret_from_fork+0x2f/0x40
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
</TASK
Fix this by providing one lock class key per work_on_cpu() caller.
Reported-and-tested-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 9492261ff2460252cf2d8de89cdf854c7e2b28a0 ]
When we started spreading new inode numbers throughout most of the 64
bit inode space, that triggered some corner case bugs, in particular
some integer overflows related to the radix tree code. Oops.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 71dc9ec9ac7d3eee785cdc986c3daeb821381e20 ]
This commit changes virtio/vsock to use sk_buff instead of
virtio_vsock_pkt. Beyond better conforming to other net code, using
sk_buff allows vsock to use sk_buff-dependent features in the future
(such as sockmap) and improves throughput.
This patch introduces the following performance changes:
Tool: Uperf
Env: Phys Host + L1 Guest
Payload: 64k
Threads: 16
Test Runs: 10
Type: SOCK_STREAM
Before: commit b7bfaa761d760 ("Linux 6.2-rc3")
Before
------
g2h: 16.77Gb/s
h2g: 10.56Gb/s
After
-----
g2h: 21.04Gb/s
h2g: 10.76Gb/s
Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: 3a5cc90a4d17 ("vsock/virtio: remove socket from connected/bound list on shutdown")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit e8ae8ad479e2d037daa33756e5e72850a7bd37a9 ]
The comment for idr_for_each_entry_ul() states that
"after normal termination @entry is left with the value NULL".
This is not correct in the case where UINT_MAX has an entry in the idr.
In that case @entry will be non-NULL after termination.
No current code depends on the documentation being correct, but to
save future code we should fix it.
Also fix idr_for_each_entry_continue_ul(). While this is not documented
as leaving @entry as NULL, the mellanox driver appears to depend on
it doing so. So make that explicit in the documentation as well as in
the code.
Fixes: e33d2b74d805 ("idr: fix overflow case for idr_for_each_entry_ul()")
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Chris Mi <chrism@mellanox.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 247f9e1feef4e57911510c8f82348efb4491ea0e ]
This is the equivalent of find_get_pages_range_tag(), except for folios
instead of pages.
One noteable difference is filemap_get_folios_tag() does not take in a
maximum pages argument. It instead tries to fill a folio batch and stops
either once full (15 folios) or reaching the end of the search range.
The new function supports large folios, the initial function did not since
all callers don't use large folios.
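A typical writeback-style loop using the new helper; handle_folio() is a
stand-in for whatever per-folio work the caller does, and mapping/start/end
are assumed from context:

    struct folio_batch fbatch;
    pgoff_t index = start;
    unsigned int i, nr;

    folio_batch_init(&fbatch);
    while ((nr = filemap_get_folios_tag(mapping, &index, end,
                                        PAGECACHE_TAG_DIRTY, &fbatch))) {
            for (i = 0; i < nr; i++)
                    handle_folio(fbatch.folios[i]);
            folio_batch_release(&fbatch);
            cond_resched();
    }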
Link: https://lkml.kernel.org/r/20230104211448.4804-3-vishal.moola@gmail.com
Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Reviewed-by: Matthew Wilcow (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Stable-dep-of: c5d3f9b7649a ("f2fs: compress: fix deadloop in f2fs_write_cache_pages()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 3c70342f1f0045dc827bb2f02d814ce31e0e0d05 ]
Enable dynamically filling in the whole mfd_cell structure. All other
fields already allow that.
Fixes: 466a62d7642f ("mfd: core: Make a best effort attempt to match devices with the correct of_nodes")
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Link: https://lore.kernel.org/r/b73fe4bc4bd6ba1af90940a640ed65fe254c0408.1693253717.git.mirq-linux@rere.qmqm.pl
Signed-off-by: Lee Jones <lee@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 5831fc1fd4a578232fea708b82de0c666ed17153 ]
If the queue isolation feature is enabled, the number of queues
supported by the device changes. When the PF is enabled with the
current default number of queues, that default may be greater than
the number supported by the device. As a result, the PF fails to be
bound to the driver.
After this modification, if the queue isolation feature is enabled and
the default queue parameter is greater than the number supported by
the device, the number of enabled queues is reduced to the number
supported by the device, so that the PF and the driver can be properly
bound.
Fixes: 8bbecfb402f7 ("crypto: hisilicon/qm - add queue isolation support for Kunpeng930")
Signed-off-by: Longfang Liu <liulongfang@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit b40b62ed7b0ffe8eb2e6fe8bcfb47027c9a93e93 ]
The last register logic and different register logic are combined.
Use "u32" instead of 'int' in the regs function input parameter to
simplify some checks.
Signed-off-by: Kai Ye <yekai13@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Stable-dep-of: 5831fc1fd4a5 ("crypto: hisilicon/qm - fix PF queue parameter issue")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit b58a36008bfa1aadf55f516bcbfae40c779eb54b ]
The last RCU stall fix caused a massive throughput regression of the
hwrng on Raspberry Pi 0 - 3. hwrng_msleep doesn't sleep precisely enough
and usleep_range doesn't allow scheduling. So try to restore the
best possible throughput by introducing hwrng_yield(), which sleeps
interruptibly for one jiffy.
Some performance measurements on Raspberry Pi 3B+ (arm64/defconfig):
sudo dd if=/dev/hwrng of=/dev/null count=1 bs=10000
cpu_relax ~138025 Bytes / sec
hwrng_msleep(1000) ~13 Bytes / sec
hwrng_yield ~2510 Bytes / sec
Fixes: 96cb9d055445 ("hwrng: bcm2835 - use hwrng_msleep() instead of cpu_relax()")
Link: https://lore.kernel.org/linux-arm-kernel/bc97ece5-44a3-4c4e-77da-2db3eb66b128@gmx.net/
Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 0a596b0682a7ce37e26c36629816f105c6459d06 ]
Add inclusion of linux/errno.h as otherwise the reference to EINVAL
may be invalid.
Fixes: f3cf4134c5c6 ("bpf: Add bpf_lookup_*_key() and bpf_key_put() kfuncs")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202308261414.HKw1Mrip-lkp@intel.com/
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 20f3b8eafe0ba5d3c69d5011a9b07739e9645132 ]
When KPTI is in use, we cannot register a runstate region as XEN
requires that this is always a valid VA, which we cannot guarantee. Due
to this, xen_starting_cpu() must avoid registering each CPU's runstate
region, and xen_guest_init() must avoid setting up features that depend
upon it.
We tried to ensure that in commit:
f88af7229f6f22ce ("xen/arm: do not setup the runstate info page if kpti is enabled")
... where we added checks for xen_kernel_unmapped_at_usr(), which wraps
arm64_kernel_unmapped_at_el0() on arm64 and is always false on 32-bit
arm.
Unfortunately, as xen_guest_init() is an early_initcall, this happens
before secondary CPUs are booted and arm64 has finalized the
ARM64_UNMAP_KERNEL_AT_EL0 cpucap which backs
arm64_kernel_unmapped_at_el0(), and so this can subsequently be set as
secondary CPUs are onlined. On a big.LITTLE system where the boot CPU
does not require KPTI but some secondary CPUs do, this will result in
xen_guest_init() initializing features that depend on the runstate
region, and xen_starting_cpu() registering the runstate region on some
CPUs before KPTI is subsequently enabled, resulting in the problems the
aforementioned commit tried to avoid.
Handle this more robustly by deferring the initialization of the
runstate region until secondary CPUs have been initialized and the
ARM64_UNMAP_KERNEL_AT_EL0 cpucap has been finalized. The per-cpu work is
moved into a new hotplug starting function which is registered later
when we're certain that KPTI will not be used.
Fixes: f88af7229f6f ("xen/arm: do not setup the runstate info page if kpti is enabled")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Bertrand Marquis <bertrand.marquis@arm.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 84aefafe6b294041b7fa0757414c4a29c1bdeea2 ]
Fix spelling of "Structure".
Fix multiple kernel-doc warnings:
clk-provider.h:269: warning: Function parameter or member 'recalc_rate' not described in 'clk_ops'
clk-provider.h:468: warning: Function parameter or member 'parent_data' not described in 'clk_hw_register_fixed_rate_with_accuracy_parent_data'
clk-provider.h:468: warning: Excess function parameter 'parent_name' description in 'clk_hw_register_fixed_rate_with_accuracy_parent_data'
clk-provider.h:482: warning: Function parameter or member 'parent_data' not described in 'clk_hw_register_fixed_rate_parent_accuracy'
clk-provider.h:482: warning: Excess function parameter 'parent_name' description in 'clk_hw_register_fixed_rate_parent_accuracy'
clk-provider.h:687: warning: Function parameter or member 'flags' not described in 'clk_divider'
clk-provider.h:1164: warning: Function parameter or member 'flags' not described in 'clk_fractional_divider'
clk-provider.h:1164: warning: Function parameter or member 'approximation' not described in 'clk_fractional_divider'
clk-provider.h:1213: warning: Function parameter or member 'flags' not described in 'clk_multiplier'
Fixes: 9fba738a53dd ("clk: add duty cycle support")
Fixes: b2476490ef11 ("clk: introduce the common clock framework")
Fixes: 2d34f09e79c9 ("clk: fixed-rate: Add support for specifying parents via DT/pointers")
Fixes: f5290d8e4f0c ("clk: asm9260: use parent index to link the reference clock")
Fixes: 9d9f78ed9af0 ("clk: basic clock hardware types")
Fixes: e2d0e90fae82 ("clk: new basic clk type for fractional divider")
Fixes: f2e0a53271a4 ("clk: Add a basic multiplier clock")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Michael Turquette <mturquette@baylibre.com>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: linux-clk@vger.kernel.org
Link: https://lore.kernel.org/r/20230930221428.18463-1-rdunlap@infradead.org
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 0b068c714ca9479d2783cc333fff5bc2d4a6d45c ]
Companion of DEV_STATS_INC() & DEV_STATS_ADD().
This is going to be used in the series.
Use it in macsec_get_stats64().
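A sketch of the new companion macro, mirroring how DEV_STATS_INC() and
DEV_STATS_ADD() are built on the atomic_long stats fields; the usage line
is illustrative rather than the exact macsec hunk:

    #define DEV_STATS_READ(DEV, FIELD) atomic_long_read(&(DEV)->stats.__##FIELD)

    /* e.g. in a .ndo_get_stats64() implementation: */
    s->tx_dropped = DEV_STATS_READ(dev, tx_dropped);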
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: ff672b9ffeb3 ("ipvlan: properly track tx_errors")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 0e108725f6cc5b3be9e607f89c9fbcbb236367b7 ]
Arnd noticed we have a case where a shorter source string is being copied
into a destination byte array, but this results in a strnlen() call that
exceeds the size of the source. This is seen with -Wstringop-overread:
In file included from ../include/linux/uuid.h:11,
from ../include/linux/mod_devicetable.h:14,
from ../include/linux/cpufeature.h:12,
from ../arch/x86/coco/tdx/tdx.c:7:
../arch/x86/coco/tdx/tdx.c: In function 'tdx_panic.constprop':
../include/linux/string.h:284:9: error: 'strnlen' specified bound 64 exceeds source size 60 [-Werror=stringop-overread]
284 | memcpy_and_pad(dest, _dest_len, src, strnlen(src, _dest_len), pad); \
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../arch/x86/coco/tdx/tdx.c:124:9: note: in expansion of macro 'strtomem_pad'
124 | strtomem_pad(message.str, msg, '\0');
| ^~~~~~~~~~~~
Use the smaller of the two buffer sizes when calling strnlen(). When
src length is unknown (SIZE_MAX), it is adjusted to use dest length,
which is what the original code did.
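An illustration of the principle outside the macro (the function and its
name are illustrative, not the strtomem_pad() implementation itself):

    static void copy_and_pad_bounded(char *dest, size_t dest_len,
                                     const char *src, size_t src_len, int pad)
    {
            /* never let strnlen() walk past the smaller of the two buffers */
            size_t n = strnlen(src, src_len < dest_len ? src_len : dest_len);

            memcpy(dest, src, n);
            memset(dest + n, pad, dest_len - n);
    }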
Reported-by: Arnd Bergmann <arnd@arndb.de>
Fixes: dfbafa70bde2 ("string: Introduce strtomem() and strtomem_pad()")
Tested-by: Arnd Bergmann <arnd@arndb.de>
Cc: Andy Shevchenko <andy@kernel.org>
Cc: linux-hardening@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 8f012db27c9516be1a7aca93ea4a6ca9c75056c9 ]
numa_fill_memblks() fills in the gaps in numa_meminfo memblks
over a physical address range.
The ACPI driver will use numa_fill_memblks() to implement a new Linux
policy that prescribes extending proximity domains in a portion of a
CFMWS window to the entire window.
Dan Williams offered this explanation of the policy:
A CFMWS is an ACPI data structure that indicates *potential* locations
where CXL memory can be placed. It is the playground where the CXL
driver has free reign to establish regions. That space can be populated
by BIOS created regions, or driver created regions, after hotplug or
other reconfiguration.
When BIOS creates a region in a CXL Window it additionally describes
that subset of the Window range in the other typical ACPI tables SRAT,
SLIT, and HMAT. The rationale for BIOS not pre-describing the entire
CXL Window in SRAT, SLIT, and HMAT is that it can not predict the
future. I.e. there is nothing stopping higher or lower performance
devices being placed in the same Window. Compare that to ACPI memory
hotplug that just onlines additional capacity in the proximity domain
with little freedom for dynamic performance differentiation.
That leaves the OS with a choice, should unpopulated window capacity
match the proximity domain of an existing region, or should it allocate
a new one? This patch takes the simple position of minimizing proximity
domain proliferation by reusing any proximity domain intersection for
the entire Window. If the Window has no intersections then allocate a
new proximity domain. Note that SRAT, SLIT and HMAT information can be
enumerated dynamically in a standard way from device provided data.
Think of CXL as the end of ACPI needing to describe memory attributes,
CXL offers a standard discovery model for performance attributes, but
Linux still needs to interoperate with the old regime.
Reported-by: Derick Marks <derick.w.marks@intel.com>
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Derick Marks <derick.w.marks@intel.com>
Link: https://lore.kernel.org/all/ef078a6f056ca974e5af85997013c0fda9e3326d.1689018477.git.alison.schofield%40intel.com
Stable-dep-of: 8f1004679987 ("ACPI/NUMA: Apply SRAT proximity domain to entire CFMWS window")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 7e6f3b6d2c352b5fde37ce3fed83bdf6172eebd4 upstream.
The AMD VanGogh SoC contains a DesignWare USB3 Dual-Role Device that can be
operated as either a USB Host or a USB Device, similar to on the AMD Nolan
platform.
be6646bfbaec ("PCI: Prevent xHCI driver from claiming AMD Nolan USB3 DRD
device") added a quirk to let the dwc3 driver claim the Nolan device since
it provides more specific support.
Extend that quirk to include the VanGogh SoC USB3 device.
Link: https://lore.kernel.org/r/20230927202212.2388216-1-vi@endrift.com
Signed-off-by: Vicki Pfau <vi@endrift.com>
[bhelgaas: include be6646bfbaec reference, add stable tag]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org # v3.19+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit bbaa6ffa5b6c9609d3b3c431c389b407eea5441f ]
AMD PMF driver can cause the following warning:
[ 196.159546] ------------[ cut here ]------------
[ 196.159556] Voluntary context switch within RCU read-side critical section!
[ 196.159571] WARNING: CPU: 0 PID: 9 at kernel/rcu/tree_plugin.h:320 rcu_note_context_switch+0x43d/0x560
[ 196.159604] Modules linked in: nvme_fabrics ccm rfcomm snd_hda_scodec_cs35l41_spi cmac algif_hash algif_skcipher af_alg bnep joydev btusb btrtl uvcvideo btintel btbcm videobuf2_vmalloc intel_rapl_msr btmtk videobuf2_memops uvc videobuf2_v4l2 intel_rapl_common binfmt_misc hid_sensor_als snd_sof_amd_vangogh hid_sensor_trigger bluetooth industrialio_triggered_buffer videodev snd_sof_amd_rembrandt hid_sensor_iio_common amdgpu ecdh_generic kfifo_buf videobuf2_common hp_wmi kvm_amd sparse_keymap snd_sof_amd_renoir wmi_bmof industrialio ecc mc nls_iso8859_1 kvm snd_sof_amd_acp irqbypass snd_sof_xtensa_dsp crct10dif_pclmul crc32_pclmul mt7921e snd_sof_pci snd_ctl_led polyval_clmulni mt7921_common polyval_generic snd_sof ghash_clmulni_intel mt792x_lib mt76_connac_lib sha512_ssse3 snd_sof_utils aesni_intel snd_hda_codec_realtek crypto_simd mt76 snd_hda_codec_generic cryptd snd_soc_core snd_hda_codec_hdmi rapl ledtrig_audio input_leds snd_compress i2c_algo_bit drm_ttm_helper mac80211 snd_pci_ps hid_multitouch ttm drm_exec
[ 196.159970] drm_suballoc_helper snd_rpl_pci_acp6x amdxcp drm_buddy snd_hda_intel snd_acp_pci snd_hda_scodec_cs35l41_i2c serio_raw gpu_sched snd_hda_scodec_cs35l41 snd_acp_legacy_common snd_intel_dspcfg snd_hda_cs_dsp_ctls snd_hda_codec libarc4 drm_display_helper snd_pci_acp6x cs_dsp snd_hwdep snd_soc_cs35l41_lib video k10temp snd_pci_acp5x thunderbolt snd_hda_core drm_kms_helper cfg80211 snd_seq snd_rn_pci_acp3x snd_pcm snd_acp_config cec snd_soc_acpi snd_seq_device rc_core ccp snd_pci_acp3x snd_timer snd soundcore wmi amd_pmf platform_profile amd_pmc mac_hid serial_multi_instantiate wireless_hotkey hid_sensor_hub sch_fq_codel msr parport_pc ppdev lp parport efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx libcrc32c xor raid6_pq raid1 raid0 multipath linear dm_mirror dm_region_hash dm_log cdc_ether usbnet r8152 mii hid_generic nvme i2c_hid_acpi i2c_hid nvme_core i2c_piix4 xhci_pci amd_sfh drm xhci_pci_renesas nvme_common hid
[ 196.160382] CPU: 0 PID: 9 Comm: kworker/0:1 Not tainted 6.6.0-rc1 #4
[ 196.160397] Hardware name: HP HP EliteBook 845 14 inch G10 Notebook PC/8B6E, BIOS V82 Ver. 01.02.00 08/24/2023
[ 196.160405] Workqueue: events power_supply_changed_work
[ 196.160426] RIP: 0010:rcu_note_context_switch+0x43d/0x560
[ 196.160440] Code: 00 48 89 be 40 08 00 00 48 89 86 48 08 00 00 48 89 10 e9 63 fe ff ff 48 c7 c7 10 e7 b0 9e c6 05 e8 d8 20 02 01 e8 13 0f f3 ff <0f> 0b e9 27 fc ff ff a9 ff ff ff 7f 0f 84 cf fc ff ff 65 48 8b 3c
[ 196.160450] RSP: 0018:ffffc900001878f0 EFLAGS: 00010046
[ 196.160462] RAX: 0000000000000000 RBX: ffff88885e834040 RCX: 0000000000000000
[ 196.160470] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 196.160476] RBP: ffffc90000187910 R08: 0000000000000000 R09: 0000000000000000
[ 196.160482] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 196.160488] R13: 0000000000000000 R14: ffff888100990000 R15: ffff888100990000
[ 196.160495] FS: 0000000000000000(0000) GS:ffff88885e800000(0000) knlGS:0000000000000000
[ 196.160504] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 196.160512] CR2: 000055cb053c8246 CR3: 000000013443a000 CR4: 0000000000750ef0
[ 196.160520] PKRU: 55555554
[ 196.160526] Call Trace:
[ 196.160532] <TASK>
[ 196.160548] ? show_regs+0x72/0x90
[ 196.160570] ? rcu_note_context_switch+0x43d/0x560
[ 196.160580] ? __warn+0x8d/0x160
[ 196.160600] ? rcu_note_context_switch+0x43d/0x560
[ 196.160613] ? report_bug+0x1bb/0x1d0
[ 196.160637] ? handle_bug+0x46/0x90
[ 196.160658] ? exc_invalid_op+0x19/0x80
[ 196.160675] ? asm_exc_invalid_op+0x1b/0x20
[ 196.160709] ? rcu_note_context_switch+0x43d/0x560
[ 196.160727] __schedule+0xb9/0x15f0
[ 196.160746] ? srso_alias_return_thunk+0x5/0x7f
[ 196.160765] ? srso_alias_return_thunk+0x5/0x7f
[ 196.160778] ? acpi_ns_search_one_scope+0xbe/0x270
[ 196.160806] schedule+0x68/0x110
[ 196.160820] schedule_timeout+0x151/0x160
[ 196.160829] ? srso_alias_return_thunk+0x5/0x7f
[ 196.160842] ? srso_alias_return_thunk+0x5/0x7f
[ 196.160855] ? acpi_ns_lookup+0x3c5/0xa90
[ 196.160878] __down_common+0xff/0x220
[ 196.160905] __down_timeout+0x16/0x30
[ 196.160920] down_timeout+0x64/0x70
[ 196.160938] acpi_os_wait_semaphore+0x85/0x200
[ 196.160959] acpi_ut_acquire_mutex+0x9e/0x280
[ 196.160979] acpi_ex_enter_interpreter+0x2d/0xb0
[ 196.160992] acpi_ns_evaluate+0x2f0/0x5f0
[ 196.161005] acpi_evaluate_object+0x172/0x490
[ 196.161018] ? acpi_os_signal_semaphore+0x8a/0xd0
[ 196.161038] acpi_evaluate_integer+0x52/0xe0
[ 196.161055] ? kfree+0x79/0x120
[ 196.161071] ? srso_alias_return_thunk+0x5/0x7f
[ 196.161089] acpi_ac_get_state.part.0+0x27/0x80
[ 196.161110] get_ac_property+0x5c/0x70
[ 196.161127] ? __pfx___power_supply_is_system_supplied+0x10/0x10
[ 196.161146] __power_supply_is_system_supplied+0x44/0xb0
[ 196.161166] class_for_each_device+0x124/0x160
[ 196.161184] ? acpi_ac_get_state.part.0+0x27/0x80
[ 196.161203] ? srso_alias_return_thunk+0x5/0x7f
[ 196.161223] power_supply_is_system_supplied+0x3c/0x70
[ 196.161243] amd_pmf_get_power_source+0xe/0x20 [amd_pmf]
[ 196.161276] amd_pmf_power_slider_update_event+0x49/0x90 [amd_pmf]
[ 196.161310] amd_pmf_pwr_src_notify_call+0xe7/0x100 [amd_pmf]
[ 196.161340] notifier_call_chain+0x5f/0xe0
[ 196.161362] atomic_notifier_call_chain+0x33/0x60
[ 196.161378] power_supply_changed_work+0x84/0x110
[ 196.161394] process_one_work+0x178/0x360
[ 196.161412] ? __pfx_worker_thread+0x10/0x10
[ 196.161424] worker_thread+0x307/0x430
[ 196.161440] ? __pfx_worker_thread+0x10/0x10
[ 196.161451] kthread+0xf4/0x130
[ 196.161467] ? __pfx_kthread+0x10/0x10
[ 196.161486] ret_from_fork+0x43/0x70
[ 196.161502] ? __pfx_kthread+0x10/0x10
[ 196.161518] ret_from_fork_asm+0x1b/0x30
[ 196.161558] </TASK>
[ 196.161562] ---[ end trace 0000000000000000 ]---
Since there's no guarantee that all the callbacks can work in atomic
context, switch to use blocking_notifier_call_chain to relax the
constraint.
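A hedged sketch of the change in the power-supply core; both notifier APIs
shown are existing kernel interfaces:

    BLOCKING_NOTIFIER_HEAD(power_supply_notifier);   /* was ATOMIC_NOTIFIER_HEAD */

    /* callers may now sleep (e.g. take ACPI mutexes) in their callbacks */
    blocking_notifier_call_chain(&power_supply_notifier,
                                 PSY_EVENT_PROP_CHANGED, psy);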
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Reported-by: Allen Zhong <allen@atr.me>
Fixes: 4c71ae414474 ("platform/x86/amd/pmf: Add support SPS PMF feature")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217571
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20230913033233.602986-1-kai.heng.feng@canonical.com
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit babddbfb7d7d70ae7f10fedd75a45d8ad75fdddf upstream.
When the checked address is illegal, the corresponding shadow address from
kasan_mem_to_shadow() may have no mapping in the MMU table. Accessing such
a shadow address causes a kernel oops. Here is a sample oops on arm64 (VA
39 bit) with KASAN_SW_TAGS and KASAN_OUTLINE on:
[ffffffb80aaaaaaa] pgd=000000005d3ce003, p4d=000000005d3ce003,
pud=000000005d3ce003, pmd=0000000000000000
Internal error: Oops: 0000000096000006 [#1] PREEMPT SMP
Modules linked in:
CPU: 3 PID: 100 Comm: sh Not tainted 6.6.0-rc1-dirty #43
Hardware name: linux,dummy-virt (DT)
pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : __hwasan_load8_noabort+0x5c/0x90
lr : do_ib_ob+0xf4/0x110
ffffffb80aaaaaaa is the shadow address for efffff80aaaaaaaa.
The problem is reading an invalid shadow address in kasan_check_range().
The generic KASAN also has a similar oops.
It only reports the shadow address which causes the oops, but not
the original address.
Commit 2f004eea0fc8 ("x86/kasan: Print original address on #GP")
introduced kasan_non_canonical_hook() but limited it to KASAN_INLINE.
This patch extends it to KASAN_OUTLINE mode.
Link: https://lkml.kernel.org/r/20231009073748.159228-1-haibo.li@mediatek.com
Fixes: 2f004eea0fc8("x86/kasan: Print original address on #GP")
Signed-off-by: Haibo Li <haibo.li@mediatek.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Haibo Li <haibo.li@mediatek.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit bf4916922c60f43efaa329744b3eef539aa6a2b2 upstream.
Extend the locking scheme used to protect shared hugetlb mappings from
truncate vs page fault races, in order to protect private hugetlb mappings
(with resv_map) against MADV_DONTNEED.
Add a read-write semaphore to the resv_map data structure, and use that
from the hugetlb_vma_(un)lock_* functions, in preparation for closing the
race between MADV_DONTNEED and page faults.
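A hedged sketch (not the exact patch) of how a vma lock helper can fall
back to the reservation map's new semaphore for private mappings; the
rw_sema field in resv_map is the addition described above, and the
__vma_private_lock() test is assumed here as the "has a resv_map" check:

    void hugetlb_vma_lock_read(struct vm_area_struct *vma)
    {
            if (__vma_shareable_lock(vma)) {
                    struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;

                    down_read(&vma_lock->rw_sema);
            } else if (__vma_private_lock(vma)) {
                    struct resv_map *resv_map = vma_resv_map(vma);

                    down_read(&resv_map->rw_sema);
            }
    }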
Link: https://lkml.kernel.org/r/20231006040020.3677377-3-riel@surriel.com
Fixes: 04ada095dcfc ("hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing")
Signed-off-by: Rik van Riel <riel@surriel.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 4dc533e0f2c04174e1ae4aa98e7cffc1c04b9998 ]
Function kallsyms_on_each_symbol() traverses all symbols and submits each
symbol to the hook 'fn' for judgment and processing. For some cases, the
hook actually only handles the matched symbol, such as livepatch.
Because all symbols are currently sorted by name, all the symbols with the
same name are clustered together. Function kallsyms_lookup_names() gets
the start and end positions of the set corresponding to the specified
name. So we can easily and quickly traverse all the matches.
The test results are as follows (twice): (x86)
kallsyms_on_each_match_symbol: 7454, 7984
kallsyms_on_each_symbol : 11733809, 11785803
kallsyms_on_each_match_symbol() consumes only 0.066% of
kallsyms_on_each_symbol()'s time. In other words, 1523x better
performance.
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Stable-dep-of: b022f0c7e404 ("tracing/kprobes: Return EADDRNOTAVAIL when func matches several symbols")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 32671e3799ca2e4590773fd0e63aaa4229e50c06 upstream.
Because group consistency is non-atomic between parent (filedesc) and children
(inherited) events, it is possible for PERF_FORMAT_GROUP read() to try and sum
non-matching counter groups -- with non-sensical results.
Add group_generation to distinguish the case where a parent group removes and
adds an event and thus has the same number, but a different configuration of
events as inherited groups.
This became a problem when commit fa8c269353d5 ("perf/core: Invert
perf_read_group() loops") flipped the order of child_list and sibling_list.
Previously it would iterate the group (sibling_list) first, and for each
sibling traverse the child_list. In this order, only the group composition of
the parent is relevant. By flipping the order the group composition of the
child (inherited) events becomes an issue and the mis-match in group
composition becomes evident.
That said; even prior to this commit, while reading of a group that is not
equally inherited was not broken, it still made no sense.
(Ab)use ECHILD as error return to indicate issues with child process group
composition.
Fixes: fa8c269353d5 ("perf/core: Invert perf_read_group() loops")
Reported-by: Budimir Markovic <markovicbudimir@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20231018115654.GK33217@noisy.programming.kicks-ass.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a608dc1c06397dc50ab773498433432fb5938f92 upstream.
HID descriptors with Battery System (0x85) Charging (0x44) usage are
ignored and POWER_SUPPLY_STATUS_DISCHARGING is always reported to user
space, even when the device is charging.
Map this usage and when it is reported set the right charging status.
In addition, add KUnit tests to make sure that the charging status is
correctly set and reported. They can be run with the usual command:
$ ./tools/testing/kunit/kunit.py run --kunitconfig=drivers/hid
Signed-off-by: José Expósito <jose.exposito89@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 59a7a298565aa0ce44ce8e4fbcbb89a19730013a ]
Add nr_maxactive to specify the rethook_node pool size. This is the
maximum number of target functions that can be running concurrently
while probed with an exit_handler. Note that if a running function is
preempted or sleeps, it is still counted as 'active'.
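A usage sketch; the handler functions and the filter pattern are made-up
names:

    static struct fprobe fp = {
            .entry_handler = my_entry_handler,
            .exit_handler  = my_exit_handler,
            .nr_maxactive  = 64,    /* pool of 64 rethook nodes for in-flight calls */
    };

    ret = register_fprobe(&fp, "myfunc_*", NULL);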
Link: https://lkml.kernel.org/r/167526697917.433354.17779774988245113106.stgit@mhiramat.roam.corp.google.com
Cc: Florent Revest <revest@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Stable-dep-of: 700b2b439766 ("fprobe: Fix to ensure the number of active retprobes is not zero")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 76d0de5729c0569c4071e7f21fcab394e502f03a ]
Pass the private entry_data to the entry and exit handlers so that
they can share the context data, something like saved function
arguments etc.
Users must specify the private entry_data size via the @entry_data_size
field before registering the fprobe.
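A hedged sketch of sharing context between the handlers (exact handler
prototypes differ across kernel versions; all names below are illustrative):

  #include <linux/fprobe.h>
  #include <linux/ktime.h>
  #include <linux/printk.h>

  struct example_ctx {
      ktime_t entry_time; /* data handed from the entry to the exit handler */
  };

  static void example_entry(struct fprobe *fp, unsigned long entry_ip,
                            struct pt_regs *regs, void *entry_data)
  {
      struct example_ctx *ctx = entry_data;

      ctx->entry_time = ktime_get();
  }

  static void example_exit(struct fprobe *fp, unsigned long entry_ip,
                           struct pt_regs *regs, void *entry_data)
  {
      struct example_ctx *ctx = entry_data;

      pr_info("probed function took %lld ns\n",
              ktime_to_ns(ktime_sub(ktime_get(), ctx->entry_time)));
  }

  static struct fprobe example_fprobe = {
      .entry_handler   = example_entry,
      .exit_handler    = example_exit,
      .entry_data_size = sizeof(struct example_ctx), /* field added here */
  };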
Link: https://lkml.kernel.org/r/167526696173.433354.17408372048319432574.stgit@mhiramat.roam.corp.google.com
Cc: Florent Revest <revest@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Stable-dep-of: 700b2b439766 ("fprobe: Fix to ensure the number of active retprobes is not zero")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 0a8565425afd8ba0e1a0ea73e21da119ee6dacea ]
These APIs are analogous to iio_device_claim_direct_mode() and
iio_device_release_direct_mode() but, as the name suggests, with the
logic flipped. While this may look odd, it will have at least two
users (in following changes) and it will be important for moving the IIO
mlock to the private struct.
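A hedged usage sketch with the helpers introduced here (the device-specific
work is omitted; the function name is illustrative):

  #include <linux/iio/iio.h>

  /* Run an operation that is only valid while buffering is enabled. */
  static int example_buffered_op(struct iio_dev *indio_dev)
  {
      int ret;

      ret = iio_device_claim_buffer_mode(indio_dev);
      if (ret)
          return ret; /* device is not in buffer mode */

      /* ... work that requires the device to stay in buffer mode ... */

      iio_device_release_buffer_mode(indio_dev);
      return 0;
  }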
Signed-off-by: Nuno Sá <nuno.sa@analog.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Link: https://lore.kernel.org/r/20221012151620.1725215-2-nuno.sa@analog.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Stable-dep-of: 7771c8c80d62 ("iio: cros_ec: fix an use-after-free in cros_ec_sensors_push_data()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 03adc61edad49e1bbecfb53f7ea5d78f398fe368 upstream.
An io_uring openat operation can update an audit reference count
from multiple threads resulting in the call trace below.
A call to io_uring_submit() with a single openat op with a flag of
IOSQE_ASYNC results in the following reference count updates.
The first part of the system call performs two increments that do not race.
do_syscall_64()
  __do_sys_io_uring_enter()
    io_submit_sqes()
      io_openat_prep()
        __io_openat_prep()
          getname()
            getname_flags()   /* update 1 (increment) */
            __audit_getname() /* update 2 (increment) */
The openat op is queued to an io_uring worker thread which starts the
opportunity for a race. The system call exit performs one decrement.
do_syscall_64()
  syscall_exit_to_user_mode()
    syscall_exit_to_user_mode_prepare()
      __audit_syscall_exit()
        audit_reset_context()
          putname() /* update 3 (decrement) */
The io_uring worker thread performs one increment and two decrements.
These updates can race with the system call decrement.
io_wqe_worker()
  io_worker_handle_work()
    io_wq_submit_work()
      io_issue_sqe()
        io_openat()
          io_openat2()
            do_filp_open()
              path_openat()
                __audit_inode() /* update 4 (increment) */
            putname() /* update 5 (decrement) */
        __audit_uring_exit()
          audit_reset_context()
            putname() /* update 6 (decrement) */
The fix is to change the refcnt member of struct audit_names
from int to atomic_t.
kernel BUG at fs/namei.c:262!
Call Trace:
...
? putname+0x68/0x70
audit_reset_context.part.0.constprop.0+0xe1/0x300
__audit_uring_exit+0xda/0x1c0
io_issue_sqe+0x1f3/0x450
? lock_timer_base+0x3b/0xd0
io_wq_submit_work+0x8d/0x2b0
? __try_to_del_timer_sync+0x67/0xa0
io_worker_handle_work+0x17c/0x2b0
io_wqe_worker+0x10a/0x350
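The BUG above fires in putname(). A hedged sketch of the refcount pattern the
fix relies on (structure and helper names are simplified placeholders, not the
real audit/namei code):

  #include <linux/atomic.h>
  #include <linux/bug.h>

  struct example_name {
      atomic_t refcnt; /* was a plain int, racy across threads */
  };

  static void example_getname(struct example_name *n)
  {
      atomic_inc(&n->refcnt); /* safe from syscall and io-wq context */
  }

  static void example_putname(struct example_name *n)
  {
      BUG_ON(atomic_read(&n->refcnt) <= 0);
      if (atomic_dec_and_test(&n->refcnt)) {
          /* last reference: free the name */
      }
  }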
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/lkml/MW2PR2101MB1033FFF044A258F84AEAA584F1C9A@MW2PR2101MB1033.namprd21.prod.outlook.com/
Fixes: 5bd2182d58e9 ("audit,io_uring,io-wq: add some basic audit support to io_uring")
Signed-off-by: Dan Clash <daclash@linux.microsoft.com>
Link: https://lore.kernel.org/r/20231012215518.GA4048@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b83ce9cb4a465b8f9a3fa45561b721a9551f60e3 upstream.
When a fence signals there is a very small race window where the timestamp
isn't updated yet. sync_file solves this by busy waiting for the
timestamp to appear, but on other ocassions didn't handled this
correctly.
Provide a dma_fence_timestamp() helper function for this and use it in
all appropriate cases.
Another alternative would be to grab the spinlock when that happens.
v2 by teddy: add a wait parameter to wait for the timestamp to show up, in case
the accurate timestamp is needed and/or the timestamp is not based on
ktime (e.g. hw timestamp)
v3 chk: drop the parameter again for unified handling
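A hedged sketch of what such a helper can look like (the flag names follow the
dma_fence convention; treat this as an illustration rather than the exact
upstream code):

  #include <linux/dma-fence.h>
  #include <linux/ktime.h>
  #include <linux/processor.h>

  /* Return the signalling timestamp, busy-waiting out the small window
   * between the fence being signalled and its timestamp becoming visible. */
  static ktime_t example_fence_timestamp(struct dma_fence *fence)
  {
      if (WARN_ON(!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)))
          return ktime_get();

      while (!test_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags))
          cpu_relax();

      return fence->timestamp;
  }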
Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Fixes: 1774baa64f93 ("drm/scheduler: Change scheduled fence track v2")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
CC: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20230929104725.2358-1-christian.koenig@amd.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 0f28ada1fbf0054557cddcdb93ad17f767105208 upstream.
When calling mcb_bus_add_devices(), both mcb devices and the mcb
bus will attempt to attach a device to a driver because they share
the same bus_type. This causes an issue when trying to cast the
container of the device to mcb_device struct using to_mcb_device(),
leading to a wrong cast when the mcb_bus is added. A crash occurs
when freeing the ida resources as the bus numbering of mcb_bus gets
confused with the is_added flag on the mcb_device struct.
The only reason for this cast was to keep an is_added flag on the
mcb_device struct that does not seem necessary. The function
device_attach() handles already bound devices and the mcb subsystem
does nothing special with this is_added flag so remove it completely.
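A hedged, self-contained illustration of why the container cast went wrong
(simplified structs, not the real mcb definitions):

  #include <stddef.h>
  #include <stdio.h>

  struct device { int dummy; };

  struct mcb_device_ex { struct device dev; int is_added; };
  struct mcb_bus_ex    { struct device dev; int bus_nr; };

  #define to_mcb_device_ex(d) \
      ((struct mcb_device_ex *)((char *)(d) - offsetof(struct mcb_device_ex, dev)))

  int main(void)
  {
      struct mcb_bus_ex bus = { .bus_nr = 5 };

      /* The bus device and the mcb devices share one bus_type, so code
       * walking the bus may see &bus.dev and cast it as an mcb_device: */
      struct mcb_device_ex *wrong = to_mcb_device_ex(&bus.dev);

      wrong->is_added = 1; /* silently clobbers bus.bus_nr */
      printf("bus_nr is now %d\n", bus.bus_nr);
      return 0;
  }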
Fixes: 18d288198099 ("mcb: Correctly initialize the bus's device")
Cc: stable <stable@kernel.org>
Signed-off-by: Jorge Sanjuan Garcia <jorge.sanjuangarcia@duagon.com>
Co-developed-by: Jose Javier Rodriguez Barbarin <JoseJavier.Rodriguez@duagon.com>
Signed-off-by: Jose Javier Rodriguez Barbarin <JoseJavier.Rodriguez@duagon.com>
Link: https://lore.kernel.org/r/20230906114901.63174-2-JoseJavier.Rodriguez@duagon.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit aa3998dbeb3abce63653b7f6d4542e7dcd022590 upstream.
The introduction of a device link to create a consumer/supplier
relationship between the scsi device of an ATA device and the ATA port
of that ATA device fixes the ordering of system suspend and resume
operations. For suspend, the scsi device is suspended first and the ata
port after it. This is fine as this allows the synchronize cache and
START STOP UNIT commands issued by the scsi disk driver to be executed
before the ata port is disabled.
For resume operations, the ata port is resumed first, followed
by the scsi device. This allows the request queue of the scsi device
to be unfrozen after the ata port resume is scheduled in EH, thus
avoiding new requests being prematurely issued to the ATA device.
Since libata sets manage_system_start_stop to 1, the scsi disk resume
operation also results in issuing a START STOP UNIT command to the
device being resumed so that the device exits standby power mode.
However, restoring the ATA device to the active power mode must be
synchronized with libata EH processing of the port resume operation to
avoid either 1) seeing the start stop unit command being received too
early when the port is not yet resumed and ready to accept commands, or
2) seeing it received after the port resume process issues commands such
as IDENTIFY to
revalidate the device. In this last case, the risk is that the device
revalidation fails with timeout errors as the drive is still spun down.
Commit 0a8589055936 ("ata,scsi: do not issue START STOP UNIT on resume")
disabled issuing the START STOP UNIT command to avoid issues with it.
But this is incorrect as transitioning a device to the active power
mode from the standby power mode set on suspend requires a media access
command. The IDENTIFY, READ LOG and SET FEATURES commands executed in
libata EH context triggered by the ata port resume operation may thus
fail.
Fix these synchronization issues by handling device power mode
transitions for system suspend and resume directly in libata EH context,
without relying on the scsi disk driver management triggered with the
manage_system_start_stop flag.
To do this, the following libata helper functions are introduced:
1) ata_dev_power_set_standby():
This function issues a STANDBY IMMEDIATE command to transition a device
to the standby power mode. For HDDs, this spins down the disks. This
function applies only to ATA and ZAC devices and does nothing otherwise.
This function also does nothing for devices that have the
ATA_FLAG_NO_POWEROFF_SPINDOWN or ATA_FLAG_NO_HIBERNATE_SPINDOWN flag
set.
For suspend, call ata_dev_power_set_standby() in
ata_eh_handle_port_suspend() before the port is disabled and frozen.
ata_eh_unload() is also modified to transition all enabled devices to
the standby power mode when the system is shut down or devices are removed.
2) ata_dev_power_set_active():
This function applies to ATA or ZAC devices and issues a VERIFY command
for 1 sector at LBA 0 to transition the device to the active power mode.
For HDDs, this command completes only once the disk has spun up, so its
execution uses the same timeouts as for reset, to give the drive enough
time to complete spinup without triggering a command timeout.
For resume, call ata_dev_power_set_active() in
ata_eh_revalidate_and_attach() after the port has been enabled and
before any other command is issued to the device.
With these changes, the manage_system_start_stop and no_start_on_resume
scsi device flags do not need to be set in ata_scsi_dev_config(). The
flag manage_runtime_start_stop is still set to allow the sd driver to
spinup/spindown a disk through the sd runtime operations.
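A hedged sketch of the two helpers' intent (simplified; the real
implementation issues the taskfiles through libata's internal command path
and handles errors and timeouts there):

  #include <linux/ata.h>
  #include <linux/libata.h>

  /* Suspend path: STANDBY IMMEDIATE spins down HDDs before the port is
   * disabled and frozen. */
  static void example_power_set_standby(struct ata_device *dev)
  {
      struct ata_taskfile tf;

      if (dev->class != ATA_DEV_ATA && dev->class != ATA_DEV_ZAC)
          return; /* ATA and ZAC devices only */

      ata_tf_init(dev, &tf);
      tf.flags |= ATA_TFLAG_DEVICE | ATA_TFLAG_ISADDR;
      tf.protocol = ATA_PROT_NODATA;
      tf.command = ATA_CMD_STANDBYNOW1; /* STANDBY IMMEDIATE */
      /* ... issue tf internally and log failures ... */
  }

  /* Resume path: a 1-sector VERIFY at LBA 0 is a media access command, so
   * it completes only once the disk has spun up; reset-like timeouts give
   * the drive time to do so. */
  static void example_power_set_active(struct ata_device *dev)
  {
      struct ata_taskfile tf;

      ata_tf_init(dev, &tf);
      tf.flags |= ATA_TFLAG_DEVICE | ATA_TFLAG_ISADDR | ATA_TFLAG_LBA;
      tf.protocol = ATA_PROT_NODATA;
      tf.command = ATA_CMD_VERIFY;
      tf.nsect = 1;
      /* ... issue tf internally with reset-like timeouts ... */
  }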
Fixes: 0a8589055936 ("ata,scsi: do not issue START STOP UNIT on resume")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 869b6ea1609f655a43251bf41757aa44e5350a8f upstream.
Eric has reported that commit dabc8b207566 ("quota: fix dqput() to
follow the guarantees dquot_srcu should provide") heavily increases
runtime of generic/270 xfstest for ext4 in nojournal mode. The reason
for this is that ext4 in nojournal mode leaves dquots dirty until the last
dqput() and thus the cleanup done in quota_release_workfn() has to write
them all. Due to the way quota_release_workfn() is written this results
in synchronize_srcu() call for each dirty dquot which makes the dquot
cleanup when turning quotas off extremely slow.
To be able to avoid synchronize_srcu() for each dirty dquot we need to
rework how we track dquots to be cleaned up. Instead of keeping the last
dquot reference while it is on releasing_dquots list, we drop it right
away and mark the dquot with the new DQ_RELEASING_B bit instead. This way
we can remove the dquot from the releasing_dquots list when a new reference
to it is acquired and thus there's no need to call synchronize_srcu() each
time we drop dq_list_lock.
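A hedged sketch of the new bookkeeping (locking with dq_list_lock and the
actual release work are omitted; the flag and list are as described above,
function names are illustrative):

  #include <linux/bitops.h>
  #include <linux/list.h>
  #include <linux/quota.h>

  static LIST_HEAD(example_releasing_dquots);

  /* Last dqput(): drop the reference right away, mark the dquot as being
   * released and queue it for the cleanup worker. */
  static void example_queue_release(struct dquot *dquot)
  {
      set_bit(DQ_RELEASING_B, &dquot->dq_flags);
      list_add_tail(&dquot->dq_free, &example_releasing_dquots);
  }

  /* New reference acquired: just pull the dquot back off the releasing
   * list; no synchronize_srcu() is needed for this case any more. */
  static void example_reacquire(struct dquot *dquot)
  {
      if (test_and_clear_bit(DQ_RELEASING_B, &dquot->dq_flags))
          list_del_init(&dquot->dq_free);
  }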
References: https://lore.kernel.org/all/ZRytn6CxFK2oECUt@debian-BULLSEYE-live-builder-AMD64
Reported-by: Eric Whitney <enwlinux@gmail.com>
Fixes: dabc8b207566 ("quota: fix dqput() to follow the guarantees dquot_srcu should provide")
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
This reverts commit d4729af1c73cfacb64facda3d196e25940f0e7a5 which is
commit b193a78ddb5ee7dba074d3f28dc050069ba083c0 upstream.
There are reported NFS problems in the 6.1.56 release, so revert a set
of NFS patches to hopefully resolve the issue.
Reported-by: poester <poester@internetbrands.com>
Link: https://lore.kernel.org/r/20231012165439.137237-2-kernel@linuxace.com
Reported-by: Daniel Díaz <daniel.diaz@linaro.org>
Link: https://lore.kernel.org/r/2023100755-livestock-barcode-fe41@gregkh
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 8e56b063c86569e51eed1c5681ce6361fa97fc7a ]
In Scenarios A and B below, as the delayed INIT_ACK always changes the peer
vtag, an SCTP ct with the incorrect vtag may cause packet loss.
Scenario A: INIT_ACK is delayed until the peer receives its own INIT_ACK
192.168.1.2 > 192.168.1.1: [INIT] [init tag: 1328086772]
192.168.1.1 > 192.168.1.2: [INIT] [init tag: 1414468151]
192.168.1.2 > 192.168.1.1: [INIT ACK] [init tag: 1328086772]
192.168.1.1 > 192.168.1.2: [INIT ACK] [init tag: 1650211246] *
192.168.1.2 > 192.168.1.1: [COOKIE ECHO]
192.168.1.1 > 192.168.1.2: [COOKIE ECHO]
192.168.1.2 > 192.168.1.1: [COOKIE ACK]
Scenario B: INIT_ACK is delayed until the peer completes its own handshake
192.168.1.2 > 192.168.1.1: sctp (1) [INIT] [init tag: 3922216408]
192.168.1.1 > 192.168.1.2: sctp (1) [INIT] [init tag: 144230885]
192.168.1.2 > 192.168.1.1: sctp (1) [INIT ACK] [init tag: 3922216408]
192.168.1.1 > 192.168.1.2: sctp (1) [COOKIE ECHO]
192.168.1.2 > 192.168.1.1: sctp (1) [COOKIE ACK]
192.168.1.1 > 192.168.1.2: sctp (1) [INIT ACK] [init tag: 3914796021] *
This patch fixes it as below:
In SCTP_CID_INIT processing:
- clear ct->proto.sctp.init[!dir] if ct->proto.sctp.init[dir] &&
ct->proto.sctp.init[!dir]. (Scenario E)
- set ct->proto.sctp.init[dir].
In SCTP_CID_INIT_ACK processing:
- drop it if !ct->proto.sctp.init[!dir] && ct->proto.sctp.vtag[!dir] &&
ct->proto.sctp.vtag[!dir] != ih->init_tag. (Scenario B, Scenario C)
- drop it if ct->proto.sctp.init[dir] && ct->proto.sctp.init[!dir] &&
ct->proto.sctp.vtag[!dir] != ih->init_tag. (Scenario A)
In SCTP_CID_COOKIE_ACK processing:
- clear ct->proto.sctp.init[dir] and ct->proto.sctp.init[!dir].
(Scenario D)
Also, it's important to allow the ct state to move forward with cookie_echo
and cookie_ack from the opposite dir for the collision scenarios.
There are also other Scenarios where it should allow the packet through,
addressed by the processing above:
Scenario C: new CT is created by INIT_ACK.
Scenario D: start INIT on the existing ESTABLISHED ct.
Scenario E: start INIT after the old collision on the existing ESTABLISHED
ct.
192.168.1.2 > 192.168.1.1: sctp (1) [INIT] [init tag: 3922216408]
192.168.1.1 > 192.168.1.2: sctp (1) [INIT] [init tag: 144230885]
(both side are stopped, then start new connection again in hours)
192.168.1.2 > 192.168.1.1: sctp (1) [INIT] [init tag: 242308742]
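A hedged, self-contained sketch of the INIT/INIT_ACK decisions described
above (plain C with simplified per-direction state; not the actual
nf_conntrack_proto_sctp code):

  #include <stdbool.h>
  #include <stdint.h>

  struct sctp_ct_example {
      bool     init[2]; /* init[dir]: INIT seen from this direction */
      uint32_t vtag[2]; /* verification tag learned per direction */
  };

  /* SCTP_CID_INIT: detect a restart after an old collision (Scenario E)
   * and record that this direction sent an INIT. */
  static void example_handle_init(struct sctp_ct_example *ct, int dir)
  {
      if (ct->init[dir] && ct->init[!dir])
          ct->init[!dir] = false;
      ct->init[dir] = true;
  }

  /* SCTP_CID_INIT_ACK: return false to drop delayed INIT_ACKs whose tag
   * no longer matches the conntrack state. */
  static bool example_init_ack_ok(const struct sctp_ct_example *ct, int dir,
                                  uint32_t init_tag)
  {
      /* Scenarios B and C: no outstanding INIT from the other side. */
      if (!ct->init[!dir] && ct->vtag[!dir] && ct->vtag[!dir] != init_tag)
          return false;
      /* Scenario A: both INITs seen, the stale tag must be dropped. */
      if (ct->init[dir] && ct->init[!dir] && ct->vtag[!dir] != init_tag)
          return false;
      return true;
  }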
Fixes: 9fb9cbb1082d ("[NETFILTER]: Add nf_conntrack subsystem.")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|