| Age | Commit message (Collapse) | Author |
|
[ Upstream commit db6c4dee4c104f50ed163af71c53bfdb878a8318 ]
Add SolidRun vendor ID to pci_ids.h
The vendor ID is used in 2 different source files, the SNET vDPA driver
and PCI quirks.
Signed-off-by: Alvaro Karsz <alvaro.karsz@solid-run.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Message-Id: <20230110165638.123745-2-alvaro.karsz@solid-run.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit e16cab9c1596e251761d2bfb5e1467950d616963 ]
The color matching descriptors defined in the UVC Specification
contain 3 fields with discrete numeric values representing particular
settings. Enumerate those values so that later code setting them can
be more readable.
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Daniel Scally <dan.scally@ideasonboard.com>
Link: https://lore.kernel.org/r/20230202114142.300858-2-dan.scally@ideasonboard.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 4971c268b85e1c7a734a61622fc0813c86e2362e upstream.
Commit 98de59bfe4b2f ("take calculation of final prot in
security_mmap_file() into a helper") moved the code to update prot, to be
the actual protections applied to the kernel, to a new helper called
mmap_prot().
However, while without the helper ima_file_mmap() was getting the updated
prot, with the helper ima_file_mmap() gets the original prot, which
contains the protections requested by the application.
A possible consequence of this change is that, if an application calls
mmap() with only PROT_READ, and the kernel applies PROT_EXEC in addition,
that application would have access to executable memory without having this
event recorded in the IMA measurement list. This situation would occur for
example if the application, before mmap(), calls the personality() system
call with READ_IMPLIES_EXEC as the first argument.
Align ima_file_mmap() parameters with those of the mmap_file LSM hook, so
that IMA can receive both the requested prot and the final prot. Since the
requested protections are stored in a new variable, and the final
protections are stored in the existing variable, this effectively restores
the original behavior of the MMAP_CHECK hook.
Cc: stable@vger.kernel.org
Fixes: 98de59bfe4b2 ("take calculation of final prot in security_mmap_file() into a helper")
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit f1c97a1b4ef709e3f066f82e3ba3108c3b133ae6 upstream.
When arch_prepare_optimized_kprobe calculating jump destination address,
it copies original instructions from jmp-optimized kprobe (see
__recover_optprobed_insn), and calculated based on length of original
instruction.
arch_check_optimized_kprobe does not check KPROBE_FLAG_OPTIMATED when
checking whether jmp-optimized kprobe exists.
As a result, setup_detour_execution may jump to a range that has been
overwritten by jump destination address, resulting in an inval opcode error.
For example, assume that register two kprobes whose addresses are
<func+9> and <func+11> in "func" function.
The original code of "func" function is as follows:
0xffffffff816cb5e9 <+9>: push %r12
0xffffffff816cb5eb <+11>: xor %r12d,%r12d
0xffffffff816cb5ee <+14>: test %rdi,%rdi
0xffffffff816cb5f1 <+17>: setne %r12b
0xffffffff816cb5f5 <+21>: push %rbp
1.Register the kprobe for <func+11>, assume that is kp1, corresponding optimized_kprobe is op1.
After the optimization, "func" code changes to:
0xffffffff816cc079 <+9>: push %r12
0xffffffff816cc07b <+11>: jmp 0xffffffffa0210000
0xffffffff816cc080 <+16>: incl 0xf(%rcx)
0xffffffff816cc083 <+19>: xchg %eax,%ebp
0xffffffff816cc084 <+20>: (bad)
0xffffffff816cc085 <+21>: push %rbp
Now op1->flags == KPROBE_FLAG_OPTIMATED;
2. Register the kprobe for <func+9>, assume that is kp2, corresponding optimized_kprobe is op2.
register_kprobe(kp2)
register_aggr_kprobe
alloc_aggr_kprobe
__prepare_optimized_kprobe
arch_prepare_optimized_kprobe
__recover_optprobed_insn // copy original bytes from kp1->optinsn.copied_insn,
// jump address = <func+14>
3. disable kp1:
disable_kprobe(kp1)
__disable_kprobe
...
if (p == orig_p || aggr_kprobe_disabled(orig_p)) {
ret = disarm_kprobe(orig_p, true) // add op1 in unoptimizing_list, not unoptimized
orig_p->flags |= KPROBE_FLAG_DISABLED; // op1->flags == KPROBE_FLAG_OPTIMATED | KPROBE_FLAG_DISABLED
...
4. unregister kp2
__unregister_kprobe_top
...
if (!kprobe_disabled(ap) && !kprobes_all_disarmed) {
optimize_kprobe(op)
...
if (arch_check_optimized_kprobe(op) < 0) // because op1 has KPROBE_FLAG_DISABLED, here not return
return;
p->kp.flags |= KPROBE_FLAG_OPTIMIZED; // now op2 has KPROBE_FLAG_OPTIMIZED
}
"func" code now is:
0xffffffff816cc079 <+9>: int3
0xffffffff816cc07a <+10>: push %rsp
0xffffffff816cc07b <+11>: jmp 0xffffffffa0210000
0xffffffff816cc080 <+16>: incl 0xf(%rcx)
0xffffffff816cc083 <+19>: xchg %eax,%ebp
0xffffffff816cc084 <+20>: (bad)
0xffffffff816cc085 <+21>: push %rbp
5. if call "func", int3 handler call setup_detour_execution:
if (p->flags & KPROBE_FLAG_OPTIMIZED) {
...
regs->ip = (unsigned long)op->optinsn.insn + TMPL_END_IDX;
...
}
The code for the destination address is
0xffffffffa021072c: push %r12
0xffffffffa021072e: xor %r12d,%r12d
0xffffffffa0210731: jmp 0xffffffff816cb5ee <func+14>
However, <func+14> is not a valid start instruction address. As a result, an error occurs.
Link: https://lore.kernel.org/all/20230216034247.32348-3-yangjihong1@huawei.com/
Fixes: f66c0447cca1 ("kprobes: Set unoptimized flag after unoptimizing code")
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Cc: stable@vger.kernel.org
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 868a6fc0ca2407622d2833adefe1c4d284766c4c upstream.
Since the following commit:
commit f66c0447cca1 ("kprobes: Set unoptimized flag after unoptimizing code")
modified the update timing of the KPROBE_FLAG_OPTIMIZED, a optimized_kprobe
may be in the optimizing or unoptimizing state when op.kp->flags
has KPROBE_FLAG_OPTIMIZED and op->list is not empty.
The __recover_optprobed_insn check logic is incorrect, a kprobe in the
unoptimizing state may be incorrectly determined as unoptimizing.
As a result, incorrect instructions are copied.
The optprobe_queued_unopt function needs to be exported for invoking in
arch directory.
Link: https://lore.kernel.org/all/20230216034247.32348-2-yangjihong1@huawei.com/
Fixes: f66c0447cca1 ("kprobes: Set unoptimized flag after unoptimizing code")
Cc: stable@vger.kernel.org
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit c9d27c6be518b4ef2966d9564654ef99292ea1b3 ]
The MIPI DCS specification demands that brightness values are sent in
big endian byte order. It also states that one parameter (i.e. one byte)
shall be sent/received for 8 bit wide values, and two parameters shall
be used for values that are between 9 and 16 bits wide.
Add new functions to properly handle 16-bit brightness in big endian,
since the two 8- and 16-bit cases are distinct from each other.
[richard: use separate functions instead of switch/case]
[richard: split into 16-bit component]
Fixes: 1a9d759331b8 ("drm/dsi: Implement DCS set/get display brightness")
Signed-off-by: Daniel Mentz <danielmentz@google.com>
Link: https://android.googlesource.com/kernel/msm/+/754affd62d0ee268c686c53169b1dbb7deac8550
[richard: fix 16-bit brightness_get]
Signed-off-by: Richard Acayan <mailingradian@gmail.com>
Tested-by: Caleb Connolly <caleb@connolly.tech>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20230116224909.23884-2-mailingradian@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 47904aed898a08f028572b9b5a5cc101ddfb2d82 ]
The type of member ->irqs_sum is unsigned long, but kstat_cpu_irqs_sum()
returns int, which can result in truncation. Therefore, change the
kstat_cpu_irqs_sum() function's return value to unsigned long to avoid
truncation.
Fixes: f2c66cd8eedd ("/proc/stat: scalability of irq num per cpu")
Reported-by: Elliott, Robert (Servers) <elliott@hpe.com>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Josh Don <joshdon@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
Commit e88b2c6e5a4d9ce30d75391e4d950da74bb2bd90 upstream.
While reviewing a different fix, John and I noticed an oddity in one of the
BPF program dumps that stood out, for example:
# bpftool p d x i 13
0: (b7) r0 = 808464450
1: (b4) w4 = 808464432
2: (bc) w0 = w0
3: (15) if r0 == 0x0 goto pc+1
4: (9c) w4 %= w0
[...]
In line 2 we noticed that the mov32 would 32 bit truncate the original src
register for the div/mod operation. While for the two operations the dst
register is typically marked unknown e.g. from adjust_scalar_min_max_vals()
the src register is not, and thus verifier keeps tracking original bounds,
simplified:
0: R1=ctx(id=0,off=0,imm=0) R10=fp0
0: (b7) r0 = -1
1: R0_w=invP-1 R1=ctx(id=0,off=0,imm=0) R10=fp0
1: (b7) r1 = -1
2: R0_w=invP-1 R1_w=invP-1 R10=fp0
2: (3c) w0 /= w1
3: R0_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R1_w=invP-1 R10=fp0
3: (77) r1 >>= 32
4: R0_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R1_w=invP4294967295 R10=fp0
4: (bf) r0 = r1
5: R0_w=invP4294967295 R1_w=invP4294967295 R10=fp0
5: (95) exit
processed 6 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
Runtime result of r0 at exit is 0 instead of expected -1. Remove the
verifier mov32 src rewrite in div/mod and replace it with a jmp32 test
instead. After the fix, we result in the following code generation when
having dividend r1 and divisor r6:
div, 64 bit: div, 32 bit:
0: (b7) r6 = 8 0: (b7) r6 = 8
1: (b7) r1 = 8 1: (b7) r1 = 8
2: (55) if r6 != 0x0 goto pc+2 2: (56) if w6 != 0x0 goto pc+2
3: (ac) w1 ^= w1 3: (ac) w1 ^= w1
4: (05) goto pc+1 4: (05) goto pc+1
5: (3f) r1 /= r6 5: (3c) w1 /= w6
6: (b7) r0 = 0 6: (b7) r0 = 0
7: (95) exit 7: (95) exit
mod, 64 bit: mod, 32 bit:
0: (b7) r6 = 8 0: (b7) r6 = 8
1: (b7) r1 = 8 1: (b7) r1 = 8
2: (15) if r6 == 0x0 goto pc+1 2: (16) if w6 == 0x0 goto pc+1
3: (9f) r1 %= r6 3: (9c) w1 %= w6
4: (b7) r0 = 0 4: (b7) r0 = 0
5: (95) exit 5: (95) exit
x86 in particular can throw a 'divide error' exception for div
instruction not only for divisor being zero, but also for the case
when the quotient is too large for the designated register. For the
edx:eax and rdx:rax dividend pair it is not an issue in x86 BPF JIT
since we always zero edx (rdx). Hence really the only protection
needed is against divisor being zero.
[Salvatore Bonaccorso: This is an earlier version of the patch provided
by Daniel Borkmann which does not rely on availability of the BPF_JMP32
instruction class. This means it is not even strictly a backport of the
upstream commit mentioned but based on Daniel's and John's work to
address the issue.]
Fixes: 68fda450a7df ("bpf: fix 32-bit divide by zero")
Co-developed-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Salvatore Bonaccorso <carnil@debian.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Edward Liaw <edliaw@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 74e19ef0ff8061ef55957c3abd71614ef0f42f47 upstream.
The results of "access_ok()" can be mis-speculated. The result is that
you can end speculatively:
if (access_ok(from, size))
// Right here
even for bad from/size combinations. On first glance, it would be ideal
to just add a speculation barrier to "access_ok()" so that its results
can never be mis-speculated.
But there are lots of system calls just doing access_ok() via
"copy_to_user()" and friends (example: fstat() and friends). Those are
generally not problematic because they do not _consume_ data from
userspace other than the pointer. They are also very quick and common
system calls that should not be needlessly slowed down.
"copy_from_user()" on the other hand uses a user-controller pointer and
is frequently followed up with code that might affect caches. Take
something like this:
if (!copy_from_user(&kernelvar, uptr, size))
do_something_with(kernelvar);
If userspace passes in an evil 'uptr' that *actually* points to a kernel
addresses, and then do_something_with() has cache (or other)
side-effects, it could allow userspace to infer kernel data values.
Add a barrier to the common copy_from_user() code to prevent
mis-speculated values which happen after the copy.
Also add a stub for architectures that do not define barrier_nospec().
This makes the macro usable in generic code.
Since the barrier is now usable in generic code, the x86 #ifdef in the
BPF code can also go away.
Reported-by: Jordy Zomer <jordyzomer@google.com>
Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Daniel Borkmann <daniel@iogearbox.net> # BPF bits
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit d7bf7f3b813e3755226bcb5114ad2ac477514ebf ]
add_latent_entropy() is called every time a process forks, in
kernel_clone(). This in turn calls add_device_randomness() using the
latent entropy global state. add_device_randomness() does two things:
2) Mixes into the input pool the latent entropy argument passed; and
1) Mixes in a cycle counter, a sort of measurement of when the event
took place, the high precision bits of which are presumably
difficult to predict.
(2) is impossible without CONFIG_GCC_PLUGIN_LATENT_ENTROPY=y. But (1) is
always possible. However, currently CONFIG_GCC_PLUGIN_LATENT_ENTROPY=n
disables both (1) and (2), instead of just (2).
This commit causes the CONFIG_GCC_PLUGIN_LATENT_ENTROPY=n case to still
do (1) by passing NULL (len 0) to add_device_randomness() when add_latent_
entropy() is called.
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: PaX Team <pageexec@freemail.hu>
Cc: Emese Revfy <re.emese@gmail.com>
Fixes: 38addce8b600 ("gcc-plugins: Add latent_entropy plugin")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit ca43ccf41224b023fc290073d5603a755fd12eed upstream.
Eric Dumazet pointed out [0] that when we call skb_set_owner_r()
for ipv6_pinfo.pktoptions, sk_rmem_schedule() has not been called,
resulting in a negative sk_forward_alloc.
We add a new helper which clones a skb and sets its owner only
when sk_rmem_schedule() succeeds.
Note that we move skb_set_owner_r() forward in (dccp|tcp)_v6_do_rcv()
because tcp_send_synack() can make sk_forward_alloc negative before
ipv6_opt_accepted() in the crossed SYN-ACK or self-connect() cases.
[0]: https://lore.kernel.org/netdev/CANn89iK9oc20Jdi_41jb9URdF210r7d1Y-+uypbMSbOfY6jqrg@mail.gmail.com/
Fixes: 323fbd0edf3f ("net: dccp: Add handling of IPV6_PKTOPTIONS to dccp_v6_do_rcv()")
Fixes: 3df80d9320bc ("[DCCP]: Introduce DCCPv6")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ec4288fe63966b26d53907212ecd05dfa81dd2cc upstream.
Users can specify the hugetlb page size in the mmap, shmget and
memfd_create system calls. This is done by using 6 bits within the flags
argument to encode the base-2 logarithm of the desired page size. The
routine hstate_sizelog() uses the log2 value to find the corresponding
hugetlb hstate structure. Converting the log2 value (page_size_log) to
potential hugetlb page size is the simple statement:
1UL << page_size_log
Because only 6 bits are used for page_size_log, the left shift can not be
greater than 63. This is fine on 64 bit architectures where a long is 64
bits. However, if a value greater than 31 is passed on a 32 bit
architecture (where long is 32 bits) the shift will result in undefined
behavior. This was generally not an issue as the result of the undefined
shift had to exactly match hugetlb page size to proceed.
Recent improvements in runtime checking have resulted in this undefined
behavior throwing errors such as reported below.
Fix by comparing page_size_log to BITS_PER_LONG before doing shift.
Link: https://lkml.kernel.org/r/20230216013542.138708-1-mike.kravetz@oracle.com
Link: https://lore.kernel.org/lkml/CA+G9fYuei_Tr-vN9GS7SfFyU1y9hNysnf=PB7kT0=yv4MiPgVg@mail.gmail.com/
Fixes: 42d7395feb56 ("mm: support more pagesizes for MAP_HUGETLB/SHM_HUGETLB")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Reviewed-by: Jesper Juhl <jesperjuhl76@gmail.com>
Acked-by: Muchun Song <songmuchun@bytedance.com>
Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Cc: Anders Roxell <anders.roxell@linaro.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Sasha Levin <sashal@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3489dbb696d25602aea8c3e669a6d43b76bd5358 upstream.
Patch series "Fixes for hugetlb mapcount at most 1 for shared PMDs".
This issue of mapcount in hugetlb pages referenced by shared PMDs was
discussed in [1]. The following two patches address user visible behavior
caused by this issue.
[1] https://lore.kernel.org/linux-mm/Y9BF+OCdWnCSilEu@monkey/
This patch (of 2):
A hugetlb page will have a mapcount of 1 if mapped by multiple processes
via a shared PMD. This is because only the first process increases the
map count, and subsequent processes just add the shared PMD page to their
page table.
page_mapcount is being used to decide if a hugetlb page is shared or
private in /proc/PID/smaps. Pages referenced via a shared PMD were
incorrectly being counted as private.
To fix, check for a shared PMD if mapcount is 1. If a shared PMD is found
count the hugetlb page as shared. A new helper to check for a shared PMD
is added.
[akpm@linux-foundation.org: simplification, per David]
[akpm@linux-foundation.org: hugetlb.h: include page_ref.h for page_count()]
Link: https://lkml.kernel.org/r/20230126222721.222195-2-mike.kravetz@oracle.com
Fixes: 25ee01a2fca0 ("mm: hugetlb: proc: add hugetlb-related fields to /proc/PID/smaps")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Peter Xu <peterx@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: James Houghton <jthoughton@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 79cc1ba7badf9e7a12af99695a557e9ce27ee967 upstream.
Several run-time checkers (KASAN, UBSAN, KFENCE, KCSAN, sched) roll
their own warnings, and each check "panic_on_warn". Consolidate this
into a single function so that future instrumentation can be added in
a single location.
Cc: Marco Elver <elver@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Gow <davidgow@google.com>
Cc: tangmeng <tangmeng@uniontech.com>
Cc: Jann Horn <jannh@google.com>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: "Guilherme G. Piccoli" <gpiccoli@igalia.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: kasan-dev@googlegroups.com
Cc: linux-mm@kvack.org
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Marco Elver <elver@google.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Link: https://lore.kernel.org/r/20221117234328.594699-4-keescook@chromium.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 0e25498f8cd43c1b5aa327f373dd094e9a006da7 upstream.
There are two big uses of do_exit. The first is it's design use to be
the guts of the exit(2) system call. The second use is to terminate
a task after something catastrophic has happened like a NULL pointer
in kernel code.
Add a function make_task_dead that is initialy exactly the same as
do_exit to cover the cases where do_exit is called to handle
catastrophic failure. In time this can probably be reduced to just a
light wrapper around do_task_dead. For now keep it exactly the same so
that there will be no behavioral differences introducing this new
concept.
Replace all of the uses of do_exit that use it for catastraphic
task cleanup with make_task_dead to make it clear what the code
is doing.
As part of this rename rewind_stack_do_exit
rewind_stack_and_make_dead.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3ddd9a808cee7284931312f2f3e854c9617f44b2 upstream.
Patch series "sysctl: first set of kernel/sysctl cleanups", v2.
Finally had time to respin the series of the work we had started last
year on cleaning up the kernel/sysct.c kitchen sink. People keeps
stuffing their sysctls in that file and this creates a maintenance
burden. So this effort is aimed at placing sysctls where they actually
belong.
I'm going to split patches up into series as there is quite a bit of
work.
This first set adds register_sysctl_init() for uses of registerting a
sysctl on the init path, adds const where missing to a few places,
generalizes common values so to be more easy to share, and starts the
move of a few kernel/sysctl.c out where they belong.
The majority of rework on v2 in this first patch set is 0-day fixes.
Eric Biederman's feedback is later addressed in subsequent patch sets.
I'll only post the first two patch sets for now. We can address the
rest once the first two patch sets get completely reviewed / Acked.
This patch (of 9):
The kernel/sysctl.c is a kitchen sink where everyone leaves their dirty
dishes, this makes it very difficult to maintain.
To help with this maintenance let's start by moving sysctls to places
where they actually belong. The proc sysctl maintainers do not want to
know what sysctl knobs you wish to add for your own piece of code, we
just care about the core logic.
Today though folks heavily rely on tables on kernel/sysctl.c so they can
easily just extend this table with their needed sysctls. In order to
help users move their sysctls out we need to provide a helper which can
be used during code initialization.
We special-case the initialization use of register_sysctl() since it
*is* safe to fail, given all that sysctls do is provide a dynamic
interface to query or modify at runtime an existing variable. So the
use case of register_sysctl() on init should *not* stop if the sysctls
don't end up getting registered. It would be counter productive to stop
boot if a simple sysctl registration failed.
Provide a helper for init then, and document the recommended init levels
to use for callers of this routine. We will later use this in
subsequent patches to start slimming down kernel/sysctl.c tables and
moving sysctl registration to the code which actually needs these
sysctls.
[mcgrof@kernel.org: major commit log and documentation rephrasing also moved to fs/proc/proc_sysctl.c ]
Link: https://lkml.kernel.org/r/20211123202347.818157-1-mcgrof@kernel.org
Link: https://lkml.kernel.org/r/20211123202347.818157-2-mcgrof@kernel.org
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Paul Turner <pjt@google.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Sebastian Reichel <sre@kernel.org>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Qing Wang <wangqing@vivo.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jan Kara <jack@suse.cz>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: David Airlie <airlied@linux.ie>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Julia Lawall <julia.lawall@inria.fr>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Phillip Potter <phil@philpotter.co.uk>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Cc: James E.J. Bottomley <jejb@linux.ibm.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit b18cba09e374637a0a3759d856a6bca94c133952 ]
Commit 9130b8dbc6ac ("SUNRPC: allow for upcalls for the same uid
but different gss service") introduced `auth` argument to
__gss_find_upcall(), but in gss_pipe_downcall() it was left as NULL
since it (and auth->service) was not (yet) determined.
When multiple upcalls with the same uid and different service are
ongoing, it could happen that __gss_find_upcall(), which returns the
first match found in the pipe->in_downcall list, could not find the
correct gss_msg corresponding to the downcall we are looking for.
Moreover, it might return a msg which is not sent to rpc.gssd yet.
We could see mount.nfs process hung in D state with multiple mount.nfs
are executed in parallel. The call trace below is of CentOS 7.9
kernel-3.10.0-1160.24.1.el7.x86_64 but we observed the same hang w/
elrepo kernel-ml-6.0.7-1.el7.
PID: 71258 TASK: ffff91ebd4be0000 CPU: 36 COMMAND: "mount.nfs"
#0 [ffff9203ca3234f8] __schedule at ffffffffa3b8899f
#1 [ffff9203ca323580] schedule at ffffffffa3b88eb9
#2 [ffff9203ca323590] gss_cred_init at ffffffffc0355818 [auth_rpcgss]
#3 [ffff9203ca323658] rpcauth_lookup_credcache at ffffffffc0421ebc
[sunrpc]
#4 [ffff9203ca3236d8] gss_lookup_cred at ffffffffc0353633 [auth_rpcgss]
#5 [ffff9203ca3236e8] rpcauth_lookupcred at ffffffffc0421581 [sunrpc]
#6 [ffff9203ca323740] rpcauth_refreshcred at ffffffffc04223d3 [sunrpc]
#7 [ffff9203ca3237a0] call_refresh at ffffffffc04103dc [sunrpc]
#8 [ffff9203ca3237b8] __rpc_execute at ffffffffc041e1c9 [sunrpc]
#9 [ffff9203ca323820] rpc_execute at ffffffffc0420a48 [sunrpc]
The scenario is like this. Let's say there are two upcalls for
services A and B, A -> B in pipe->in_downcall, B -> A in pipe->pipe.
When rpc.gssd reads pipe to get the upcall msg corresponding to
service B from pipe->pipe and then writes the response, in
gss_pipe_downcall the msg corresponding to service A will be picked
because only uid is used to find the msg and it is before the one for
B in pipe->in_downcall. And the process waiting for the msg
corresponding to service A will be woken up.
Actual scheduing of that process might be after rpc.gssd processes the
next msg. In rpc_pipe_generic_upcall it clears msg->errno (for A).
The process is scheduled to see gss_msg->ctx == NULL and
gss_msg->msg.errno == 0, therefore it cannot break the loop in
gss_create_upcall and is never woken up after that.
This patch adds a simple check to ensure that a msg which is not
sent to rpc.gssd yet is not chosen as the matching upcall upon
receiving a downcall.
Signed-off-by: minoura makoto <minoura@valinux.co.jp>
Signed-off-by: Hiroshi Shimamoto <h-shimamoto@nec.com>
Tested-by: Hiroshi Shimamoto <h-shimamoto@nec.com>
Cc: Trond Myklebust <trondmy@hammerspace.com>
Fixes: 9130b8dbc6ac ("SUNRPC: allow for upcalls for same uid but different gss service")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit ab0377803dafc58f1e22296708c1c28e309414d6 ]
The caller of del_timer_sync must prevent restarting of the timer, If
we have no this synchronization, there is a small probability that the
cancellation will not be successful.
And syzbot report the fellowing crash:
==================================================================
BUG: KASAN: use-after-free in hlist_add_head include/linux/list.h:929 [inline]
BUG: KASAN: use-after-free in enqueue_timer+0x18/0xa4 kernel/time/timer.c:605
Write at addr f9ff000024df6058 by task syz-fuzzer/2256
Pointer tag: [f9], memory tag: [fe]
CPU: 1 PID: 2256 Comm: syz-fuzzer Not tainted 6.1.0-rc5-syzkaller-00008-
ge01d50cbd6ee #0
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace.part.0+0xe0/0xf0 arch/arm64/kernel/stacktrace.c:156
dump_backtrace arch/arm64/kernel/stacktrace.c:162 [inline]
show_stack+0x18/0x40 arch/arm64/kernel/stacktrace.c:163
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x68/0x84 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:284 [inline]
print_report+0x1a8/0x4a0 mm/kasan/report.c:395
kasan_report+0x94/0xb4 mm/kasan/report.c:495
__do_kernel_fault+0x164/0x1e0 arch/arm64/mm/fault.c:320
do_bad_area arch/arm64/mm/fault.c:473 [inline]
do_tag_check_fault+0x78/0x8c arch/arm64/mm/fault.c:749
do_mem_abort+0x44/0x94 arch/arm64/mm/fault.c:825
el1_abort+0x40/0x60 arch/arm64/kernel/entry-common.c:367
el1h_64_sync_handler+0xd8/0xe4 arch/arm64/kernel/entry-common.c:427
el1h_64_sync+0x64/0x68 arch/arm64/kernel/entry.S:576
hlist_add_head include/linux/list.h:929 [inline]
enqueue_timer+0x18/0xa4 kernel/time/timer.c:605
mod_timer+0x14/0x20 kernel/time/timer.c:1161
mrp_periodic_timer_arm net/802/mrp.c:614 [inline]
mrp_periodic_timer+0xa0/0xc0 net/802/mrp.c:627
call_timer_fn.constprop.0+0x24/0x80 kernel/time/timer.c:1474
expire_timers+0x98/0xc4 kernel/time/timer.c:1519
To fix it, we can introduce a new active flags to make sure the timer will
not restart.
Reported-by: syzbot+6fd64001c20aa99e34a4@syzkaller.appspotmail.com
Signed-off-by: Schspa Shi <schspa@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit defbab270d45e32b068e7e73c3567232d745c60f ]
Commit bc27fb68aaad ("include/uapi/linux/byteorder, swab: force inlining
of some byteswap operations") added __always_inline to swab functions
and commit 283d75737837 ("uapi/linux/stddef.h: Provide __always_inline to
userspace headers") added a definition of __always_inline for use in
exported headers when the kernel's compiler.h is not available.
However, since swab.h does not include stddef.h, if the header soup does
not indirectly include it, the definition of __always_inline is missing,
resulting in a compilation failure, which was observed compiling the
perf tool using exported headers containing this commit:
In file included from /usr/include/linux/byteorder/little_endian.h:12:0,
from /usr/include/asm/byteorder.h:14,
from tools/include/uapi/linux/perf_event.h:20,
from perf.h:8,
from builtin-bench.c:18:
/usr/include/linux/swab.h:160:8: error: unknown type name `__always_inline'
static __always_inline __u16 __swab16p(const __u16 *p)
Fix this by replacing the inclusion of linux/compiler.h with
linux/stddef.h to ensure that we pick up that definition if required,
without relying on it's indirect inclusion. compiler.h is then included
indirectly, via stddef.h.
Fixes: 283d75737837 ("uapi/linux/stddef.h: Provide __always_inline to userspace headers")
Signed-off-by: Matt Redfearn <matt.redfearn@mips.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Petr Vaněk <arkamar@atlas.cz>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit cf59e1e4c79bf741905484cdb13c130b53576a16 ]
Shifting signed 32-bit value by 31 bits is undefined, so changing
significant bit to unsigned. The UBSAN warning calltrace like below:
UBSAN: shift-out-of-bounds in sound/core/seq/seq_clientmgr.c:509:22
left shift of 1 by 31 places cannot be represented in type 'int'
...
Call Trace:
<TASK>
dump_stack_lvl+0x8d/0xcf
ubsan_epilogue+0xa/0x44
__ubsan_handle_shift_out_of_bounds+0x1e7/0x208
snd_seq_deliver_single_event.constprop.21+0x191/0x2f0
snd_seq_deliver_event+0x1a2/0x350
snd_seq_kernel_client_dispatch+0x8b/0xb0
snd_seq_client_notify_subscription+0x72/0xa0
snd_seq_ioctl_subscribe_port+0x128/0x160
snd_seq_kernel_client_ctl+0xce/0xf0
snd_seq_oss_create_client+0x109/0x15b
alsa_seq_oss_init+0x11c/0x1aa
do_one_initcall+0x80/0x440
kernel_init_freeable+0x370/0x3c3
kernel_init+0x1b/0x190
ret_from_fork+0x1f/0x30
</TASK>
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Baisong Zhong <zhongbaisong@huawei.com>
Link: https://lore.kernel.org/r/20221121111630.3119259-1-zhongbaisong@huawei.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit fd4e60bf0ef8eb9edcfa12dda39e8b6ee9060492 ]
Commit ee62c6b2dc93 ("eventfd: change int to __u64 in eventfd_signal()")
forgot to change int to __u64 in the CONFIG_EVENTFD=n stub function.
Link: https://lkml.kernel.org/r/20221124140154.104680-1-zhangqilong3@huawei.com
Fixes: ee62c6b2dc93 ("eventfd: change int to __u64 in eventfd_signal()")
Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
Cc: Dylan Yudaken <dylany@fb.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Sha Zhengju <handai.szj@taobao.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 2e41f274f9aa71cdcc69dc1f26a3f9304a651804 ]
Patch series "fix error when writing negative value to simple attribute
files".
The simple attribute files do not accept a negative value since the commit
488dac0c9237 ("libfs: fix error cast of negative value in
simple_attr_write()"), but some attribute files want to accept a negative
value.
This patch (of 3):
The simple attribute files do not accept a negative value since the commit
488dac0c9237 ("libfs: fix error cast of negative value in
simple_attr_write()"), so we have to use a 64-bit value to write a
negative value.
This adds DEFINE_SIMPLE_ATTRIBUTE_SIGNED for a signed value.
Link: https://lkml.kernel.org/r/20220919172418.45257-1-akinobu.mita@gmail.com
Link: https://lkml.kernel.org/r/20220919172418.45257-2-akinobu.mita@gmail.com
Fixes: 488dac0c9237 ("libfs: fix error cast of negative value in simple_attr_write()")
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reported-by: Zhao Gongyi <zhaogongyi@huawei.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Wei Yongjun <weiyongjun1@huawei.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 2f117484329b233455ee278f2d9b0a4356835060 ]
When `timerqueue_getnext()` is called on an empty timer queue, it will
use `rb_entry()` on a NULL pointer, which is invalid. Fix that by using
`rb_entry_safe()` which handles NULL pointers.
This has not caused any issues so far because the offset of the `rb_node`
member in `timerqueue_node` is 0, so `rb_entry()` is essentially a no-op.
Fixes: 511885d7061e ("lib/timerqueue: Rely on rbtree semantics for next timer")
Signed-off-by: Barnabás Pőcze <pobrn@protonmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20221114195421.342929-1-pobrn@protonmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 26e8f6a75248247982458e8237b98c9fb2ffcf9d ]
bitfield mode in ocr register has only 2 bits not 3, so correct
the OCR_MODE_MASK define.
Signed-off-by: Heiko Schocher <hs@denx.de>
Link: https://lore.kernel.org/all/20221123071636.2407823-1-hs@denx.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 2ba99c5e08812494bc57f319fb562f527d9bacd8 upstream.
Since commit 70cbc3cc78a99 ("mm: gup: fix the fast GUP race against THP
collapse"), the lockless_pages_from_mm() fastpath rechecks the pmd_t to
ensure that the page table was not removed by khugepaged in between.
However, lockless_pages_from_mm() still requires that the page table is
not concurrently freed. Fix it by sending IPIs (if the architecture uses
semi-RCU-style page table freeing) before freeing/reusing page tables.
Link: https://lkml.kernel.org/r/20221129154730.2274278-2-jannh@google.com
Link: https://lkml.kernel.org/r/20221128180252.1684965-2-jannh@google.com
Link: https://lkml.kernel.org/r/20221125213714.4115729-2-jannh@google.com
Fixes: ba76149f47d8 ("thp: khugepaged")
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[manual backport: two of the three places in khugepaged that can free
ptes were refactored into a common helper between 5.15 and 6.0;
TLB flushing was refactored between 5.4 and 5.10;
TLB flushing was refactored between 4.19 and 5.4;
pmd collapse for PTE-mapped THP was only added in 5.4;
ugly hack for s390 in <=4.19 and arm]
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 62c07983bef9d3e78e71189441e1a470f0d1e653 upstream.
Christophe Leroy reported a ~80ms latency spike
happening at first TCP connect() time.
This is because __inet_hash_connect() uses get_random_once()
to populate a perturbation table which became quite big
after commit 4c2c8f03a5ab ("tcp: increase source port perturb table to 2^16")
get_random_once() uses DO_ONCE(), which block hard irqs for the duration
of the operation.
This patch adds DO_ONCE_SLOW() which uses a mutex instead of a spinlock
for operations where we prefer to stay in process context.
Then __inet_hash_connect() can use get_random_slow_once()
to populate its perturbation table.
Fixes: 4c2c8f03a5ab ("tcp: increase source port perturb table to 2^16")
Fixes: 190cc82489f4 ("tcp: change source port randomizarion at connect() time")
Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Link: https://lore.kernel.org/netdev/CANn89iLAEYBaoYajy0Y9UmGFff5GPxDUoG-ErVB2jDdRNQ5Tug@mail.gmail.com/T/#t
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willy Tarreau <w@1wt.eu>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 4a7ba45b1a435e7097ca0f79a847d0949d0eb088 upstream.
memcg_write_event_control() accesses the dentry->d_name of the specified
control fd to route the write call. As a cgroup interface file can't be
renamed, it's safe to access d_name as long as the specified file is a
regular cgroup file. Also, as these cgroup interface files can't be
removed before the directory, it's safe to access the parent too.
Prior to 347c4a874710 ("memcg: remove cgroup_event->cft"), there was a
call to __file_cft() which verified that the specified file is a regular
cgroupfs file before further accesses. The cftype pointer returned from
__file_cft() was no longer necessary and the commit inadvertently dropped
the file type check with it allowing any file to slip through. With the
invarients broken, the d_name and parent accesses can now race against
renames and removals of arbitrary files and cause use-after-free's.
Fix the bug by resurrecting the file type check in __file_cft(). Now that
cgroupfs is implemented through kernfs, checking the file operations needs
to go through a layer of indirection. Instead, let's check the superblock
and dentry type.
Link: https://lkml.kernel.org/r/Y5FRm/cfcKPGzWwl@slm.duckdns.org
Fixes: 347c4a874710 ("memcg: remove cgroup_event->cft")
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Jann Horn <jannh@google.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: <stable@vger.kernel.org> [3.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 3aac580d5cc3001ca1627725b3b61edb529f341d ]
On some platforms, some data e.g., timestamps, can be retrieved from
the PMU driver. Usually, the data from the PMU driver is more accurate.
The current perf kernel should output the PMU-filled sample data if
it's available.
To check the availability of the PMU-filled sample data, the current
perf kernel initializes the related fields in the
perf_sample_data_init(). When outputting a sample, the perf checks
whether the field is updated by the PMU driver. If yes, the updated
value will be output. If not, the perf uses an SW way to calculate the
value or just outputs the initialized value if an SW way is unavailable
either.
With more and more data being provided by the PMU driver, more fields
has to be initialized in the perf_sample_data_init(). That will
increase the number of cache lines touched in perf_sample_data_init()
and be harmful to the performance.
Add new "sample_flags" to indicate the PMU-filled sample data. The PMU
driver should set the corresponding PERF_SAMPLE_ flag when the field is
updated. The initialization of the corresponding field is not required
anymore. The following patches will make use of it and remove the
corresponding fields from the perf_sample_data_init(), which will
further minimize the number of cache lines touched.
Only clear the sample flags that have already been done by the PMU
driver in the perf_prepare_sample() for the PERF_RECORD_SAMPLE. For the
other PERF_RECORD_ event type, the sample data is not available.
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220901130959.1285717-2-kan.liang@linux.intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 986d93f55bdeab1cac858d1e47b41fac10b2d7f6 ]
Shifting signed 32-bit value by 31 bits is undefined, so changing
significant bit to unsigned. The UBSAN warning calltrace like below:
UBSAN: shift-out-of-bounds in kernel/auditfilter.c:179:23
left shift of 1 by 31 places cannot be represented in type 'int'
Call Trace:
<TASK>
dump_stack_lvl+0x7d/0xa5
dump_stack+0x15/0x1b
ubsan_epilogue+0xe/0x4e
__ubsan_handle_shift_out_of_bounds+0x1e7/0x20c
audit_register_class+0x9d/0x137
audit_classes_init+0x4d/0xb8
do_one_initcall+0x76/0x430
kernel_init_freeable+0x3b3/0x422
kernel_init+0x24/0x1e0
ret_from_fork+0x1f/0x30
</TASK>
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
[PM: remove bad 'Fixes' tag as issue predates git, added in v2.6.6-rc1]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 46653972e3ea64f79e7f8ae3aa41a4d3fdb70a13 ]
Shifting signed 32-bit value by 31 bits is undefined, so changing
significant bit to unsigned. The UBSAN warning calltrace like below:
UBSAN: shift-out-of-bounds in security/commoncap.c:1252:2
left shift of 1 by 31 places cannot be represented in type 'int'
Call Trace:
<TASK>
dump_stack_lvl+0x7d/0xa5
dump_stack+0x15/0x1b
ubsan_epilogue+0xe/0x4e
__ubsan_handle_shift_out_of_bounds+0x1e7/0x20c
cap_task_prctl+0x561/0x6f0
security_task_prctl+0x5a/0xb0
__x64_sys_prctl+0x61/0x8f0
do_syscall_64+0x58/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
</TASK>
Fixes: e338d263a76a ("Add 64-bit capability support to the kernel")
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Acked-by: Andrew G. Morgan <morgan@kernel.org>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 95b980d62d52c4c1768ee719e8db3efe27ef52b2 upstream.
BIT(), GENMASK(), etc. are useful to define register bits of hardware.
However, low-level code is often written in assembly, where they are
not available due to the hard-coded 1UL, 0UL.
In fact, in-kernel headers such as arch/arm64/include/asm/sysreg.h
use _BITUL() instead of BIT() so that the register bit macros are
available in assembly.
Using macros in include/uapi/linux/const.h have two reasons:
[1] For use in uapi headers
We should use underscore-prefixed variants for user-space.
[2] For use in assembly code
Since _BITUL() uses UL(1) instead of 1UL, it can be used as an
alternative of BIT().
For [2], it is pretty easy to change BIT() etc. for use in assembly.
This allows to replace _BUTUL() in kernel-space headers with BIT().
Link: http://lkml.kernel.org/r/20190609153941.17249-1-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 2dd8a62c647691161a2346546834262597739872 upstream.
ARM, ARM64 and UniCore32 duplicate the definition of UL():
#define UL(x) _AC(x, UL)
This is not actually arch-specific, so it will be useful to move it to a
common header. Currently, we only have the uapi variant for
linux/const.h, so I am creating include/linux/const.h.
I also added _UL(), _ULL() and ULL() because _AC() is mostly used in
the form either _AC(..., UL) or _AC(..., ULL). I expect they will be
replaced in follow-up cleanups. The underscore-prefixed ones should
be used for exported headers.
Link: http://lkml.kernel.org/r/1519301715-31798-4-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Guan Xuetao <gxt@mprc.pku.edu.cn>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 2a6cc8a6c0cb44baf7df2f64e5090aaf726002c3 upstream.
Patch series "linux/const.h: cleanups of macros such as UL(), _BITUL(),
BIT() etc", v3.
ARM, ARM64, UniCore32 define UL() as a shorthand of _AC(..., UL). More
architectures may introduce it in the future.
UL() is arch-agnostic, and useful. So let's move it to
include/linux/const.h
Currently, <asm/memory.h> must be included to use UL(). It pulls in more
bloats just for defining some bit macros.
I posted V2 one year ago.
The previous posts are:
https://patchwork.kernel.org/patch/9498273/
https://patchwork.kernel.org/patch/9498275/
https://patchwork.kernel.org/patch/9498269/
https://patchwork.kernel.org/patch/9498271/
At that time, what blocked this series was a comment from
David Howells:
You need to be very careful doing this. Some userspace stuff
depends on the guard macro names on the kernel header files.
(https://patchwork.kernel.org/patch/9498275/)
Looking at the code closer, I noticed this is not a problem.
See the following line.
https://github.com/torvalds/linux/blob/v4.16-rc2/scripts/headers_install.sh#L40
scripts/headers_install.sh rips off _UAPI prefix from guard macro names.
I ran "make headers_install" and confirmed the result is what I expect.
So, we can prefix the include guard of include/uapi/linux/const.h,
and add a new include/linux/const.h.
This patch (of 4):
I am going to add include/linux/const.h for the kernel space.
Add _UAPI to the include guard of include/uapi/linux/const.h to
prepare for that.
Please notice the guard name of the exported one will be kept as-is.
So, this commit has no impact to the userspace even if some userspace
stuff depends on the guard macro names.
scripts/headers_install.sh processes exported headers by SED, and
rips off "_UAPI" from guard macro names.
#ifndef _UAPI_LINUX_CONST_H
#define _UAPI_LINUX_CONST_H
will be turned into
#ifndef _LINUX_CONST_H
#define _LINUX_CONST_H
Link: http://lkml.kernel.org/r/1519301715-31798-2-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[nd: resolve trivial conflict due to b732e14e6218b being backported before this]
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 161a438d730dade2ba2b1bf8785f0759aba4ca5f upstream.
We no longer need at least 64 bytes of random seed to permit the early
crng init to complete. The RNG is now based on Blake2s, so reduce the
EFI seed size to the Blake2s hash size, which is sufficient for our
purposes.
While at it, drop the READ_ONCE(), which was supposed to prevent size
from being evaluated after seed was unmapped. However, this cannot
actually happen, so READ_ONCE() is unnecessary here.
Cc: <stable@vger.kernel.org> # v4.14+
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 11052589cf5c0bab3b4884d423d5f60c38fcf25d upstream.
Commit e21145a9871a ("ipv4: namespacify ip_early_demux sysctl knob") made
it possible to enable/disable early_demux on a per-netns basis. Then, we
introduced two knobs, tcp_early_demux and udp_early_demux, to switch it for
TCP/UDP in commit dddb64bcb346 ("net: Add sysctl to toggle early demux for
tcp and udp"). However, the .proc_handler() was wrong and actually
disabled us from changing the behaviour in each netns.
We can execute early_demux if net.ipv4.ip_early_demux is on and each proto
.early_demux() handler is not NULL. When we toggle (tcp|udp)_early_demux,
the change itself is saved in each netns variable, but the .early_demux()
handler is a global variable, so the handler is switched based on the
init_net's sysctl variable. Thus, netns (tcp|udp)_early_demux knobs have
nothing to do with the logic. Whether we CAN execute proto .early_demux()
is always decided by init_net's sysctl knob, and whether we DO it or not is
by each netns ip_early_demux knob.
This patch namespacifies (tcp|udp)_early_demux again. For now, the users
of the .early_demux() handler are TCP and UDP only, and they are called
directly to avoid retpoline. So, we can remove the .early_demux() handler
from inet6?_protos and need not dereference them in ip6?_rcv_finish_core().
If another proto needs .early_demux(), we can restore it at that time.
Fixes: dddb64bcb346 ("net: Add sysctl to toggle early demux for tcp and udp")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20220713175207.7727-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 8da7f0976b9071b528c545008de9d10cc81883b1 ]
If it is a progressive (non-interlaced) format, then ignore the
interlaced timing values.
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Fixes: 7f68127fa11f ([media] videodev2.h: defines to calculate blanking and frame sizes)
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 6b80b59b3555706508008f1f127b5412c89c7fd8 upstream.
Report that AMD x86 CPUs are vulnerable to the RETBleed (Arbitrary
Speculative Code Execution with Return Instructions) attack.
[peterz: add hygon]
[kim: invert parity; fam15h]
Co-developed-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
[ bp: Adjust context ]
Signed-off-by: Suraj Jitindar Singh <surajjs@amazon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e9d7144597b10ff13ff2264c059f7d4a7fbc89ac upstream
Intel uses the same family/model for several CPUs. Sometimes the
stepping must be checked to tell them apart.
On x86 there can be at most 16 steppings. Add a steppings bitmask to
x86_cpu_id and a X86_MATCH_VENDOR_FAMILY_MODEL_STEPPING_FEATURE macro
and support for matching against family/model/stepping.
[ bp: Massage.
tglx: Lightweight variant for backporting ]
Signed-off-by: Mark Gross <mgross@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ba5bade4cc0d2013cdf5634dae554693c968a090 upstream
There is no reason that this gunk is in a generic header file. The wildcard
defines need to stay as they are required by file2alias.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lkml.kernel.org/r/20200320131508.736205164@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
This reverts commit ae585de4296413ae4bbb8f2ac09faa38ff78f4cd.
This is commit e9d7144597b10ff13ff2264c059f7d4a7fbc89ac upstream. Reverting this
commit makes the following patches apply cleanly. This patch is then reapplied.
Signed-off-by: Suraj Jitindar Singh <surajjs@amazon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8f905c0e7354ef261360fb7535ea079b1082c105 upstream.
syzbot reported various issues around early demux,
one being included in this changelog [1]
sk->sk_rx_dst is using RCU protection without clearly
documenting it.
And following sequences in tcp_v4_do_rcv()/tcp_v6_do_rcv()
are not following standard RCU rules.
[a] dst_release(dst);
[b] sk->sk_rx_dst = NULL;
They look wrong because a delete operation of RCU protected
pointer is supposed to clear the pointer before
the call_rcu()/synchronize_rcu() guarding actual memory freeing.
In some cases indeed, dst could be freed before [b] is done.
We could cheat by clearing sk_rx_dst before calling
dst_release(), but this seems the right time to stick
to standard RCU annotations and debugging facilities.
[1]
BUG: KASAN: use-after-free in dst_check include/net/dst.h:470 [inline]
BUG: KASAN: use-after-free in tcp_v4_early_demux+0x95b/0x960 net/ipv4/tcp_ipv4.c:1792
Read of size 2 at addr ffff88807f1cb73a by task syz-executor.5/9204
CPU: 0 PID: 9204 Comm: syz-executor.5 Not tainted 5.16.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description.constprop.0.cold+0x8d/0x320 mm/kasan/report.c:247
__kasan_report mm/kasan/report.c:433 [inline]
kasan_report.cold+0x83/0xdf mm/kasan/report.c:450
dst_check include/net/dst.h:470 [inline]
tcp_v4_early_demux+0x95b/0x960 net/ipv4/tcp_ipv4.c:1792
ip_rcv_finish_core.constprop.0+0x15de/0x1e80 net/ipv4/ip_input.c:340
ip_list_rcv_finish.constprop.0+0x1b2/0x6e0 net/ipv4/ip_input.c:583
ip_sublist_rcv net/ipv4/ip_input.c:609 [inline]
ip_list_rcv+0x34e/0x490 net/ipv4/ip_input.c:644
__netif_receive_skb_list_ptype net/core/dev.c:5508 [inline]
__netif_receive_skb_list_core+0x549/0x8e0 net/core/dev.c:5556
__netif_receive_skb_list net/core/dev.c:5608 [inline]
netif_receive_skb_list_internal+0x75e/0xd80 net/core/dev.c:5699
gro_normal_list net/core/dev.c:5853 [inline]
gro_normal_list net/core/dev.c:5849 [inline]
napi_complete_done+0x1f1/0x880 net/core/dev.c:6590
virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline]
virtnet_poll+0xca2/0x11b0 drivers/net/virtio_net.c:1557
__napi_poll+0xaf/0x440 net/core/dev.c:7023
napi_poll net/core/dev.c:7090 [inline]
net_rx_action+0x801/0xb40 net/core/dev.c:7177
__do_softirq+0x29b/0x9c2 kernel/softirq.c:558
invoke_softirq kernel/softirq.c:432 [inline]
__irq_exit_rcu+0x123/0x180 kernel/softirq.c:637
irq_exit_rcu+0x5/0x20 kernel/softirq.c:649
common_interrupt+0x52/0xc0 arch/x86/kernel/irq.c:240
asm_common_interrupt+0x1e/0x40 arch/x86/include/asm/idtentry.h:629
RIP: 0033:0x7f5e972bfd57
Code: 39 d1 73 14 0f 1f 80 00 00 00 00 48 8b 50 f8 48 83 e8 08 48 39 ca 77 f3 48 39 c3 73 3e 48 89 13 48 8b 50 f8 48 89 38 49 8b 0e <48> 8b 3e 48 83 c3 08 48 83 c6 08 eb bc 48 39 d1 72 9e 48 39 d0 73
RSP: 002b:00007fff8a413210 EFLAGS: 00000283
RAX: 00007f5e97108990 RBX: 00007f5e97108338 RCX: ffffffff81d3aa45
RDX: ffffffff81d3aa45 RSI: 00007f5e97108340 RDI: ffffffff81d3aa45
RBP: 00007f5e97107eb8 R08: 00007f5e97108d88 R09: 0000000093c2e8d9
R10: 0000000000000000 R11: 0000000000000000 R12: 00007f5e97107eb0
R13: 00007f5e97108338 R14: 00007f5e97107ea8 R15: 0000000000000019
</TASK>
Allocated by task 13:
kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:46 [inline]
set_alloc_info mm/kasan/common.c:434 [inline]
__kasan_slab_alloc+0x90/0xc0 mm/kasan/common.c:467
kasan_slab_alloc include/linux/kasan.h:259 [inline]
slab_post_alloc_hook mm/slab.h:519 [inline]
slab_alloc_node mm/slub.c:3234 [inline]
slab_alloc mm/slub.c:3242 [inline]
kmem_cache_alloc+0x202/0x3a0 mm/slub.c:3247
dst_alloc+0x146/0x1f0 net/core/dst.c:92
rt_dst_alloc+0x73/0x430 net/ipv4/route.c:1613
ip_route_input_slow+0x1817/0x3a20 net/ipv4/route.c:2340
ip_route_input_rcu net/ipv4/route.c:2470 [inline]
ip_route_input_noref+0x116/0x2a0 net/ipv4/route.c:2415
ip_rcv_finish_core.constprop.0+0x288/0x1e80 net/ipv4/ip_input.c:354
ip_list_rcv_finish.constprop.0+0x1b2/0x6e0 net/ipv4/ip_input.c:583
ip_sublist_rcv net/ipv4/ip_input.c:609 [inline]
ip_list_rcv+0x34e/0x490 net/ipv4/ip_input.c:644
__netif_receive_skb_list_ptype net/core/dev.c:5508 [inline]
__netif_receive_skb_list_core+0x549/0x8e0 net/core/dev.c:5556
__netif_receive_skb_list net/core/dev.c:5608 [inline]
netif_receive_skb_list_internal+0x75e/0xd80 net/core/dev.c:5699
gro_normal_list net/core/dev.c:5853 [inline]
gro_normal_list net/core/dev.c:5849 [inline]
napi_complete_done+0x1f1/0x880 net/core/dev.c:6590
virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline]
virtnet_poll+0xca2/0x11b0 drivers/net/virtio_net.c:1557
__napi_poll+0xaf/0x440 net/core/dev.c:7023
napi_poll net/core/dev.c:7090 [inline]
net_rx_action+0x801/0xb40 net/core/dev.c:7177
__do_softirq+0x29b/0x9c2 kernel/softirq.c:558
Freed by task 13:
kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:46
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:366 [inline]
____kasan_slab_free mm/kasan/common.c:328 [inline]
__kasan_slab_free+0xff/0x130 mm/kasan/common.c:374
kasan_slab_free include/linux/kasan.h:235 [inline]
slab_free_hook mm/slub.c:1723 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1749
slab_free mm/slub.c:3513 [inline]
kmem_cache_free+0xbd/0x5d0 mm/slub.c:3530
dst_destroy+0x2d6/0x3f0 net/core/dst.c:127
rcu_do_batch kernel/rcu/tree.c:2506 [inline]
rcu_core+0x7ab/0x1470 kernel/rcu/tree.c:2741
__do_softirq+0x29b/0x9c2 kernel/softirq.c:558
Last potentially related work creation:
kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38
__kasan_record_aux_stack+0xf5/0x120 mm/kasan/generic.c:348
__call_rcu kernel/rcu/tree.c:2985 [inline]
call_rcu+0xb1/0x740 kernel/rcu/tree.c:3065
dst_release net/core/dst.c:177 [inline]
dst_release+0x79/0xe0 net/core/dst.c:167
tcp_v4_do_rcv+0x612/0x8d0 net/ipv4/tcp_ipv4.c:1712
sk_backlog_rcv include/net/sock.h:1030 [inline]
__release_sock+0x134/0x3b0 net/core/sock.c:2768
release_sock+0x54/0x1b0 net/core/sock.c:3300
tcp_sendmsg+0x36/0x40 net/ipv4/tcp.c:1441
inet_sendmsg+0x99/0xe0 net/ipv4/af_inet.c:819
sock_sendmsg_nosec net/socket.c:704 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:724
sock_write_iter+0x289/0x3c0 net/socket.c:1057
call_write_iter include/linux/fs.h:2162 [inline]
new_sync_write+0x429/0x660 fs/read_write.c:503
vfs_write+0x7cd/0xae0 fs/read_write.c:590
ksys_write+0x1ee/0x250 fs/read_write.c:643
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
The buggy address belongs to the object at ffff88807f1cb700
which belongs to the cache ip_dst_cache of size 176
The buggy address is located 58 bytes inside of
176-byte region [ffff88807f1cb700, ffff88807f1cb7b0)
The buggy address belongs to the page:
page:ffffea0001fc72c0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x7f1cb
flags: 0xfff00000000200(slab|node=0|zone=1|lastcpupid=0x7ff)
raw: 00fff00000000200 dead000000000100 dead000000000122 ffff8881413bb780
raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0x112a20(GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_HARDWALL), pid 5, ts 108466983062, free_ts 108048976062
prep_new_page mm/page_alloc.c:2418 [inline]
get_page_from_freelist+0xa72/0x2f50 mm/page_alloc.c:4149
__alloc_pages+0x1b2/0x500 mm/page_alloc.c:5369
alloc_pages+0x1a7/0x300 mm/mempolicy.c:2191
alloc_slab_page mm/slub.c:1793 [inline]
allocate_slab mm/slub.c:1930 [inline]
new_slab+0x32d/0x4a0 mm/slub.c:1993
___slab_alloc+0x918/0xfe0 mm/slub.c:3022
__slab_alloc.constprop.0+0x4d/0xa0 mm/slub.c:3109
slab_alloc_node mm/slub.c:3200 [inline]
slab_alloc mm/slub.c:3242 [inline]
kmem_cache_alloc+0x35c/0x3a0 mm/slub.c:3247
dst_alloc+0x146/0x1f0 net/core/dst.c:92
rt_dst_alloc+0x73/0x430 net/ipv4/route.c:1613
__mkroute_output net/ipv4/route.c:2564 [inline]
ip_route_output_key_hash_rcu+0x921/0x2d00 net/ipv4/route.c:2791
ip_route_output_key_hash+0x18b/0x300 net/ipv4/route.c:2619
__ip_route_output_key include/net/route.h:126 [inline]
ip_route_output_flow+0x23/0x150 net/ipv4/route.c:2850
ip_route_output_key include/net/route.h:142 [inline]
geneve_get_v4_rt+0x3a6/0x830 drivers/net/geneve.c:809
geneve_xmit_skb drivers/net/geneve.c:899 [inline]
geneve_xmit+0xc4a/0x3540 drivers/net/geneve.c:1082
__netdev_start_xmit include/linux/netdevice.h:4994 [inline]
netdev_start_xmit include/linux/netdevice.h:5008 [inline]
xmit_one net/core/dev.c:3590 [inline]
dev_hard_start_xmit+0x1eb/0x920 net/core/dev.c:3606
__dev_queue_xmit+0x299a/0x3650 net/core/dev.c:4229
page last free stack trace:
reset_page_owner include/linux/page_owner.h:24 [inline]
free_pages_prepare mm/page_alloc.c:1338 [inline]
free_pcp_prepare+0x374/0x870 mm/page_alloc.c:1389
free_unref_page_prepare mm/page_alloc.c:3309 [inline]
free_unref_page+0x19/0x690 mm/page_alloc.c:3388
qlink_free mm/kasan/quarantine.c:146 [inline]
qlist_free_all+0x5a/0xc0 mm/kasan/quarantine.c:165
kasan_quarantine_reduce+0x180/0x200 mm/kasan/quarantine.c:272
__kasan_slab_alloc+0xa2/0xc0 mm/kasan/common.c:444
kasan_slab_alloc include/linux/kasan.h:259 [inline]
slab_post_alloc_hook mm/slab.h:519 [inline]
slab_alloc_node mm/slub.c:3234 [inline]
kmem_cache_alloc_node+0x255/0x3f0 mm/slub.c:3270
__alloc_skb+0x215/0x340 net/core/skbuff.c:414
alloc_skb include/linux/skbuff.h:1126 [inline]
alloc_skb_with_frags+0x93/0x620 net/core/skbuff.c:6078
sock_alloc_send_pskb+0x783/0x910 net/core/sock.c:2575
mld_newpack+0x1df/0x770 net/ipv6/mcast.c:1754
add_grhead+0x265/0x330 net/ipv6/mcast.c:1857
add_grec+0x1053/0x14e0 net/ipv6/mcast.c:1995
mld_send_initial_cr.part.0+0xf6/0x230 net/ipv6/mcast.c:2242
mld_send_initial_cr net/ipv6/mcast.c:1232 [inline]
mld_dad_work+0x1d3/0x690 net/ipv6/mcast.c:2268
process_one_work+0x9b2/0x1690 kernel/workqueue.c:2298
worker_thread+0x658/0x11f0 kernel/workqueue.c:2445
Memory state around the buggy address:
ffff88807f1cb600: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88807f1cb680: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc
>ffff88807f1cb700: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88807f1cb780: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc
ffff88807f1cb800: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Fixes: 41063e9dd119 ("ipv4: Early TCP socket demux.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20211220143330.680945-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
[cmllamas: fixed trivial merge conflict]
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 30393181fdbc1608cc683b4ee99dcce05ffcc8c7 upstream.
This patch adds handling to return -EINVAL for an unknown addr type. The
current behaviour is to return 0 as successful but the size of an
unknown addr type is not defined and should return an error like -EINVAL.
Fixes: 94160108a70c ("net/ieee802154: fix uninit value bug in dgram_sendmsg")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 4f58330fcc8482aa90674e1f40f601e82f18ed4a ]
IOMMU_IOVA is intended to be an optional library for users to select as
and when they desire. Since it can be a module now, this means that
built-in code which has chosen not to select it should not fail to link
if it happens to have selected as a module by someone else. Replace
IS_ENABLED() with IS_REACHABLE() to do the right thing.
CC: Thierry Reding <thierry.reding@gmail.com>
Reported-by: John Garry <john.garry@huawei.com>
Fixes: 15bbdec3931e ("iommu: Make the iova library a module")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Thierry Reding <treding@nvidia.com>
Link: https://lore.kernel.org/r/548c2f683ca379aface59639a8f0cccc3a1ac050.1663069227.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 630624cb1b5826d753ac8e01a0e42de43d66dedf ]
ACS-5 section
7.13.6.36 Word 78: Serial ATA features supported
states that:
If word 76 is not 0000h or FFFFh, word 78 reports the features supported
by the device. If this word is not supported, the word shall be cleared
to zero.
(This text also exists in really old ACS standards, e.g. ACS-3.)
The problem with ata_id_has_dipm() is that the while it performs a
check against 0 and 0xffff, it performs the check against
ATA_ID_FEATURE_SUPP (word 78), the same word where the feature bit
is stored.
Fix this by performing the check against ATA_ID_SATA_CAPABILITY
(word 76), like required by the spec. The feature bit check itself
is of course still performed against ATA_ID_FEATURE_SUPP (word 78).
Additionally, move the macro to the other ATA_ID_FEATURE_SUPP macros
(which already have this check), thus making it more likely that the
next ATA_ID_FEATURE_SUPP macro that is added will include this check.
Fixes: ca77329fb713 ("[libata] Link power management infrastructure")
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit a5fb6bf853148974dbde092ec1bde553bea5e49f ]
ACS-5 section
7.13.6.36 Word 78: Serial ATA features supported
states that:
If word 76 is not 0000h or FFFFh, word 78 reports the features supported
by the device. If this word is not supported, the word shall be cleared
to zero.
(This text also exists in really old ACS standards, e.g. ACS-3.)
Additionally, move the macro to the other ATA_ID_FEATURE_SUPP macros
(which already have this check), thus making it more likely that the
next ATA_ID_FEATURE_SUPP macro that is added will include this check.
Fixes: 5b01e4b9efa0 ("libata: Implement NCQ autosense")
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 9c6e09a434e1317e09b78b3b69cd384022ec9a03 ]
ACS-5 section
7.13.6.36 Word 78: Serial ATA features supported
states that:
If word 76 is not 0000h or FFFFh, word 78 reports the features supported
by the device. If this word is not supported, the word shall be cleared
to zero.
(This text also exists in really old ACS standards, e.g. ACS-3.)
Additionally, move the macro to the other ATA_ID_FEATURE_SUPP macros
(which already have this check), thus making it more likely that the
next ATA_ID_FEATURE_SUPP macro that is added will include this check.
Fixes: 65fe1f0f66a5 ("ahci: implement aggressive SATA device sleep support")
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 690aa8c3ae308bc696ec8b1b357b995193927083 ]
ACS-5 section
7.13.6.41 Words 85..87, 120: Commands and feature sets supported or enabled
states that:
If bit 15 of word 86 is set to one, bit 14 of word 119 is set to one,
and bit 15 of word 119 is cleared to zero, then word 119 is valid.
If bit 15 of word 86 is set to one, bit 14 of word 120 is set to one,
and bit 15 of word 120 is cleared to zero, then word 120 is valid.
(This text also exists in really old ACS standards, e.g. ACS-3.)
Currently, ata_id_sense_reporting_enabled() and
ata_id_has_sense_reporting() both check bit 15 of word 86,
but neither of them check that bit 14 of word 119 is set to one,
or that bit 15 of word 119 is cleared to zero.
Additionally, make ata_id_sense_reporting_enabled() return false
if !ata_id_has_sense_reporting(), similar to how e.g.
ata_id_flush_ext_enabled() returns false if !ata_id_has_flush_ext().
Fixes: e87fd28cf9a2 ("libata: Implement support for sense data reporting")
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 85d6b66d31c35158364058ee98fb69ab5bb6a6b1 ]
For CONFIG_DYNAMIC_DEBUG=N, the ddebug_dyndbg_module_param_cb()
stub-fn is too permissive:
bash-5.1# modprobe drm JUNKdyndbg
bash-5.1# modprobe drm dyndbgJUNK
[ 42.933220] dyndbg param is supported only in CONFIG_DYNAMIC_DEBUG builds
[ 42.937484] ACPI: bus type drm_connector registered
This caused no ill effects, because unknown parameters are either
ignored by default with an "unknown parameter" warning, or ignored
because dyndbg allows its no-effect use on non-dyndbg builds.
But since the code has an explicit feedback message, it should be
issued accurately. Fix with strcmp for exact param-name match.
Fixes: b48420c1d301 dynamic_debug: make dynamic-debug work for module initialization
Reported-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Jason Baron <jbaron@akamai.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Link: https://lore.kernel.org/r/20220904214134.408619-3-jim.cromie@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit f4ce91ce12a7c6ead19b128ffa8cff6e3ded2a14 ]
This commit fixes a bug in the tracking of max_packets_out and
is_cwnd_limited. This bug can cause the connection to fail to remember
that is_cwnd_limited is true, causing the connection to fail to grow
cwnd when it should, causing throughput to be lower than it should be.
The following event sequence is an example that triggers the bug:
(a) The connection is cwnd_limited, but packets_out is not at its
peak due to TSO deferral deciding not to send another skb yet.
In such cases the connection can advance max_packets_seq and set
tp->is_cwnd_limited to true and max_packets_out to a small
number.
(b) Then later in the round trip the connection is pacing-limited (not
cwnd-limited), and packets_out is larger. In such cases the
connection would raise max_packets_out to a bigger number but
(unexpectedly) flip tp->is_cwnd_limited from true to false.
This commit fixes that bug.
One straightforward fix would be to separately track (a) the next
window after max_packets_out reaches a maximum, and (b) the next
window after tp->is_cwnd_limited is set to true. But this would
require consuming an extra u32 sequence number.
Instead, to save space we track only the most important
information. Specifically, we track the strongest available signal of
the degree to which the cwnd is fully utilized:
(1) If the connection is cwnd-limited then we remember that fact for
the current window.
(2) If the connection not cwnd-limited then we track the maximum
number of outstanding packets in the current window.
In particular, note that the new logic cannot trigger the buggy
(a)/(b) sequence above because with the new logic a condition where
tp->packets_out > tp->max_packets_out can only trigger an update of
tp->is_cwnd_limited if tp->is_cwnd_limited is false.
This first showed up in a testing of a BBRv2 dev branch, but this
buggy behavior highlighted a general issue with the
tcp_cwnd_validate() logic that can cause cwnd to fail to increase at
the proper rate for any TCP congestion control, including Reno or
CUBIC.
Fixes: ca8a22634381 ("tcp: make cwnd-limited checks measurement-based, and gentler")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Kevin(Yudong) Yang <yyd@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 6022f210461fef67e6e676fd8544ca02d1bcfa7a upstream.
The passthrough structure is declared off of the stack, so it needs to be
set to zero before copied back to userspace to prevent any unintentional
data leakage. Switch things to be statically allocated which will fill the
unused fields with 0 automatically.
Link: https://lore.kernel.org/r/YxrjN3OOw2HHl9tx@kroah.com
Cc: stable@kernel.org
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Reported-by: hdthky <hdthky0@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|