| Age | Commit message (Collapse) | Author |
|
|
|
commit 6330d5534786d5315d56d558aa6d20740f97d80a upstream.
When built as a module and running with update_ms >= 0, pstore will Oops
during module unload since the work timer is still running. This makes sure
the worker is stopped before unloading.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e9a330c4289f2ba1ca4bf98c2b430ab165a8931b upstream.
The per-prz spinlock should be using the dynamic initializer so that
lockdep can correctly track it. Without this, under lockdep, we get a
warning at boot that the lock is in non-static memory.
Fixes: 109704492ef6 ("pstore: Make spinlock per zone instead of global")
Fixes: 76d5692a5803 ("pstore: Correctly initialize spinlock and flags")
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 041939c1ec54208b42f5cd819209173d52a29d34 upstream.
After commit c950fd6f201a kernel registers pstore write based on flag set.
Pstore write for powerpc is broken as flags(PSTORE_FLAGS_DMESG) is not set for
powerpc architecture. On panic, kernel doesn't write message to
/fs/pstore/dmesg*(Entry doesn't gets created at all).
This patch enables pstore write for powerpc architecture by setting
PSTORE_FLAGS_DMESG flag.
Fixes: c950fd6f201a ("pstore: Split pstore fragile flags")
Signed-off-by: Ankit Kumar <ankit@linux.vnet.ibm.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d5483feda85a8f39ee2e940e279547c686aac30c upstream.
Fix failures to create namespaces due to the vmem_altmap not advertising
enough free space to store the memmap.
WARNING: CPU: 15 PID: 8022 at arch/x86/mm/init_64.c:656 arch_add_memory+0xde/0xf0
[..]
Call Trace:
dump_stack+0x63/0x83
__warn+0xcb/0xf0
warn_slowpath_null+0x1d/0x20
arch_add_memory+0xde/0xf0
devm_memremap_pages+0x244/0x440
pmem_attach_disk+0x37e/0x490 [nd_pmem]
nd_pmem_probe+0x7e/0xa0 [nd_pmem]
nvdimm_bus_probe+0x71/0x120 [libnvdimm]
driver_probe_device+0x2bb/0x460
bind_store+0x114/0x160
drv_attr_store+0x25/0x30
In commit 658922e57b84 "libnvdimm, pfn: fix memmap reservation sizing"
we arranged for the capacity to be allocated, but failed to also update
the 'npfns' parameter. This leads to cases where there is enough
capacity reserved to hold all the allocated sections, but
vmemmap_populate_hugepages() still encounters -ENOMEM from
altmap_alloc_block_buf().
This fix is a stop-gap until we can teach the core memory hotplug
implementation to permit sub-section hotplug.
Fixes: 658922e57b84 ("libnvdimm, pfn: fix memmap reservation sizing")
Reported-by: Anisha Allada <anisha.allada@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 452bae0aede774f87bf56c28b6dd50b72c78986c upstream.
A debug patch to turn the standard device_lock() into something that
lockdep can analyze yielded the following:
======================================================
[ INFO: possible circular locking dependency detected ]
4.11.0-rc4+ #106 Tainted: G O
-------------------------------------------------------
lt-libndctl/1898 is trying to acquire lock:
(&dev->nvdimm_mutex/3){+.+.+.}, at: [<ffffffffc023c948>] nd_attach_ndns+0x178/0x1b0 [libnvdimm]
but task is already holding lock:
(&nvdimm_bus->reconfig_mutex){+.+.+.}, at: [<ffffffffc022e0b1>] nvdimm_bus_lock+0x21/0x30 [libnvdimm]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&nvdimm_bus->reconfig_mutex){+.+.+.}:
lock_acquire+0xf6/0x1f0
__mutex_lock+0x88/0x980
mutex_lock_nested+0x1b/0x20
nvdimm_bus_lock+0x21/0x30 [libnvdimm]
nvdimm_namespace_capacity+0x1b/0x40 [libnvdimm]
nvdimm_namespace_common_probe+0x230/0x510 [libnvdimm]
nd_pmem_probe+0x14/0x180 [nd_pmem]
nvdimm_bus_probe+0xa9/0x260 [libnvdimm]
-> #0 (&dev->nvdimm_mutex/3){+.+.+.}:
__lock_acquire+0x1107/0x1280
lock_acquire+0xf6/0x1f0
__mutex_lock+0x88/0x980
mutex_lock_nested+0x1b/0x20
nd_attach_ndns+0x178/0x1b0 [libnvdimm]
nd_namespace_store+0x308/0x3c0 [libnvdimm]
namespace_store+0x87/0x220 [libnvdimm]
In this case '&dev->nvdimm_mutex/3' mirrors '&dev->mutex'.
Fix this by replacing the use of device_lock() with nvdimm_bus_lock() to protect
nd_{attach,detach}_ndns() operations.
Fixes: 8c2f7e8658df ("libnvdimm: infrastructure for btt devices")
Reported-by: Yi Zhang <yizhan@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b2518c78ce76896f0f8f7940bf02104b227e1709 upstream.
The following BUG was observed when nd_pmem_notify() was called
for a BTT device. The use of a pmem_device pointer is not valid
with BTT.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
IP: nd_pmem_notify+0x30/0xf0 [nd_pmem]
Call Trace:
nd_device_notify+0x40/0x50
child_notify+0x10/0x20
device_for_each_child+0x50/0x90
nd_region_notify+0x20/0x30
nd_device_notify+0x40/0x50
nvdimm_region_notify+0x27/0x30
acpi_nfit_scrub+0x341/0x590 [nfit]
process_one_work+0x197/0x450
worker_thread+0x4e/0x4a0
kthread+0x109/0x140
Fix nd_pmem_notify() by setting nd_region and badblocks pointers
properly for BTT.
Cc: Vishal Verma <vishal.l.verma@intel.com>
Fixes: 719994660c24 ("libnvdimm: async notification support")
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit bc042fdfbb92b5b13421316b4548e2d6e98eed37 upstream.
In the case where a dimm does not have any associated flush hints the
ndrd->flush_wpq array may be uninitialized leading to crashes with the
following signature:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
IP: region_visible+0x10f/0x160 [libnvdimm]
Call Trace:
internal_create_group+0xbe/0x2f0
sysfs_create_groups+0x40/0x80
device_add+0x2d8/0x650
nd_async_device_register+0x12/0x40 [libnvdimm]
async_run_entry_fn+0x39/0x170
process_one_work+0x212/0x6c0
? process_one_work+0x197/0x6c0
worker_thread+0x4e/0x4a0
kthread+0x10c/0x140
? process_one_work+0x6c0/0x6c0
? kthread_create_on_node+0x60/0x60
ret_from_fork+0x31/0x40
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Fixes: f284a4f23752 ("libnvdimm: introduce nvdimm_flush() and nvdimm_has_flush()")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 6de65fcfdb51835789b245203d1bfc8d14cb1e06 upstream.
msg_written_handler() may set ssif_info->multi_data to NULL
when using ipmitool to write fru.
Before setting ssif_info->multi_data to NULL, add new local
pointer "data_to_send" and store correct i2c data pointer to
it to fix NULL pointer kernel panic and incorrect ssif_info->multi_pos.
Signed-off-by: Joeseph Chang <joechang@codeaurora.org>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c6ade20f5e50e188d20b711a618b20dd1d50457e upstream.
The WRITE SAME to TRIM translation rewrites the DATA OUT buffer. While
the SCSI code accomodates for this by passing a read-writable buffer
userspace applications don't cater for this behavior. In fact it can
be used to rewrite e.g. a readonly file through mmap and should be
considered as a security fix.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a590b90d472f2c176c140576ee3ab44df7f67839 upstream.
cgroup_get() expected to be called only on live cgroups and triggers
warning on a dead cgroup; however, cgroup_sk_alloc() may be called
while cloning a socket which is left in an empty and removed cgroup
and thus may legitimately duplicate its reference on a dead cgroup.
This currently triggers the following warning spuriously.
WARNING: CPU: 14 PID: 0 at kernel/cgroup.c:490 cgroup_get+0x55/0x60
...
[<ffffffff8107e123>] __warn+0xd3/0xf0
[<ffffffff8107e20e>] warn_slowpath_null+0x1e/0x20
[<ffffffff810ff465>] cgroup_get+0x55/0x60
[<ffffffff81106061>] cgroup_sk_alloc+0x51/0xe0
[<ffffffff81761beb>] sk_clone_lock+0x2db/0x390
[<ffffffff817cce06>] inet_csk_clone_lock+0x16/0xc0
[<ffffffff817e8173>] tcp_create_openreq_child+0x23/0x4b0
[<ffffffff818601a1>] tcp_v6_syn_recv_sock+0x91/0x670
[<ffffffff817e8b16>] tcp_check_req+0x3a6/0x4e0
[<ffffffff81861ba3>] tcp_v6_rcv+0x693/0xa00
[<ffffffff81837429>] ip6_input_finish+0x59/0x3e0
[<ffffffff81837cb2>] ip6_input+0x32/0xb0
[<ffffffff81837387>] ip6_rcv_finish+0x57/0xa0
[<ffffffff81837ac8>] ipv6_rcv+0x318/0x4d0
[<ffffffff817778c7>] __netif_receive_skb_core+0x2d7/0x9a0
[<ffffffff81777fa6>] __netif_receive_skb+0x16/0x70
[<ffffffff81778023>] netif_receive_skb_internal+0x23/0x80
[<ffffffff817787d8>] napi_gro_frags+0x208/0x270
[<ffffffff8168a9ec>] mlx4_en_process_rx_cq+0x74c/0xf40
[<ffffffff8168b270>] mlx4_en_poll_rx_cq+0x30/0x90
[<ffffffff81778b30>] net_rx_action+0x210/0x350
[<ffffffff8188c426>] __do_softirq+0x106/0x2c7
[<ffffffff81082bad>] irq_exit+0x9d/0xa0 [<ffffffff8188c0e4>] do_IRQ+0x54/0xd0
[<ffffffff8188a63f>] common_interrupt+0x7f/0x7f <EOI>
[<ffffffff8173d7e7>] cpuidle_enter+0x17/0x20
[<ffffffff810bdfd9>] cpu_startup_entry+0x2a9/0x2f0
[<ffffffff8103edd1>] start_secondary+0xf1/0x100
This patch renames the existing cgroup_get() with the dead cgroup
warning to cgroup_get_live() after cgroup_kn_lock_live() and
introduces the new cgroup_get() which doesn't check whether the cgroup
is live or dead.
All existing cgroup_get() users except for cgroup_sk_alloc() are
converted to use cgroup_get_live().
Fixes: d979a39d7242 ("cgroup: duplicate cgroup reference when cloning sockets")
Cc: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Chris Mason <clm@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit dcb9cfaa5ea9aa0ec08aeb92582ccfe3e4c719a9 upstream.
Make sure to check the tty-device pointer before looking up the sibling
platform device to avoid dereferencing a NULL-pointer when the tty is
one end of a Unix98 pty.
Fixes: 74cdad37cd24 ("Bluetooth: hci_intel: Add runtime PM support")
Fixes: 1ab1f239bf17 ("Bluetooth: hci_intel: Add support for platform driver")
Cc: Loic Poulain <loic.poulain@intel.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 95065a61e9bf25fb85295127fba893200c2bbbd8 upstream.
Make sure to check the tty-device pointer before looking up the sibling
platform device to avoid dereferencing a NULL-pointer when the tty is
one end of a Unix98 pty.
Fixes: 0395ffc1ee05 ("Bluetooth: hci_bcm: Add PM for BCM devices")
Cc: Frederic Danis <frederic.danis@linux.intel.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ab89f0bdd63a3721f7cd3f064f39fc4ac7ca14d4 upstream.
Running 32bit userspace on 64bit kernel results in MSG_CMSG_COMPAT being
defined as 0x80000000. This results in sendmsg failure if used from 32bit
userspace running on 64bit kernel. Fix this by accounting for MSG_CMSG_COMPAT
in flags check in hci_sock_sendmsg.
Signed-off-by: Szymon Janc <szymon.janc@codecoup.pl>
Signed-off-by: Marko Kiiskila <marko@runtime.io>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 5a0722b898f851b9ef108ea7babc529e4efc773d upstream.
Define a new early console name for Qualcomm Datacenter Technologies
QDF2400 SOCs affected by erratum 44, instead of piggy-backing on "pl011".
Previously, to enable traditional (non-SPCR) earlycon, the documentation
said to specify "earlycon=pl011,<address>,qdf2400_e44", but the code was
broken and this didn't actually work.
So instead, the method for specifying the E44 work-around with traditional
earlycon is "earlycon=qdf2400_e44,<address>". Both methods of earlycon
are now enabled with the same function.
Fixes: e53e597fd4c4 ("tty: pl011: fix earlycon work-around for QDF2400 erratum 44")
Signed-off-by: Timur Tabi <timur@codeaurora.org>
Tested-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 77dae6134440420bac334581a3ccee94cee1c054 upstream.
While using emacs, cat or others' commands in konsole with recent
kernels, I have met many times that CTRL-C freeze konsole. After
konsole freeze I can't type anything, then I have to open a new one,
it is very annoying.
See bug report:
https://bugs.kde.org/show_bug.cgi?id=175283
The platform in that bug report is Solaris, but now the pty in linux
has the same problem or the same behavior as Solaris :)
It has high possibility to trigger the problem follow steps below:
Note: In my test, BigFile is a text file whose size is bigger than 1G
1:open konsole
1:cat BigFile
2:CTRL-C
After some digging, I find out the reason is that commit 1d1d14da12e7
("pty: Fix buffer flush deadlock") changes the behavior of pty_flush_buffer.
Thread A Thread B
-------- --------
1:n_tty_poll return POLLIN
2:CTRL-C trigger pty_flush_buffer
tty_buffer_flush
n_tty_flush_buffer
3:attempt to check count of chars:
ioctl(fd, TIOCINQ, &available)
available is equal to 0
4:read(fd, buffer, avaiable)
return 0
5:konsole close fd
Yes, I know we could use the same patch included in the BUG report as
a workaround for linux platform too. But I think the data in ldisc is
belong to application of another side, we shouldn't clear it when we
want to flush write buffer of this side in pty_flush_buffer. So I think
it is better to disable ldisc flush in pty_flush_buffer, because its new
hehavior bring no benefit except that it mess up the behavior between
POLLIN, and TIOCINQ or FIONREAD.
Also I find no flush_buffer function in others' tty driver has the
same behavior as current pty_flush_buffer.
Fixes: 1d1d14da12e7 ("pty: Fix buffer flush deadlock")
Signed-off-by: Wang YanQing <udknight@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 77e6fe7fd2b7cba0bf2f2dc8cde51d7b9a35bf74 upstream.
Make sure to actually suspend the device before returning after a failed
(or deferred) probe.
Note that autosuspend must be disabled before runtime pm is disabled in
order to balance the usage count due to a negative autosuspend delay as
well as to make the final put suspend the device synchronously.
Fixes: 388bc2622680 ("omap-serial: Fix the error handling in the omap_serial probe")
Cc: Shubhrajyoti D <shubhrajyoti@ti.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 099bd73dc17ed77aa8c98323e043613b6e8f54fc upstream.
An unbalanced and misplaced synchronous put was used to suspend the
device on driver unbind, something which with a likewise misplaced
pm_runtime_disable leads to external aborts when an open port is being
removed.
Unhandled fault: external abort on non-linefetch (0x1028) at 0xfa024010
...
[<c046e760>] (serial_omap_set_mctrl) from [<c046a064>] (uart_update_mctrl+0x50/0x60)
[<c046a064>] (uart_update_mctrl) from [<c046a400>] (uart_shutdown+0xbc/0x138)
[<c046a400>] (uart_shutdown) from [<c046bd2c>] (uart_hangup+0x94/0x190)
[<c046bd2c>] (uart_hangup) from [<c045b760>] (__tty_hangup+0x404/0x41c)
[<c045b760>] (__tty_hangup) from [<c045b794>] (tty_vhangup+0x1c/0x20)
[<c045b794>] (tty_vhangup) from [<c046ccc8>] (uart_remove_one_port+0xec/0x260)
[<c046ccc8>] (uart_remove_one_port) from [<c046ef4c>] (serial_omap_remove+0x40/0x60)
[<c046ef4c>] (serial_omap_remove) from [<c04845e8>] (platform_drv_remove+0x34/0x4c)
Fix this up by resuming the device before deregistering the port and by
suspending and disabling runtime pm only after the port has been
removed.
Also make sure to disable autosuspend before disabling runtime pm so
that the usage count is balanced and device actually suspended before
returning.
Note that due to a negative autosuspend delay being set in probe, the
unbalanced put would actually suspend the device on first driver unbind,
while rebinding and again unbinding would result in a negative
power.usage_count.
Fixes: 7e9c8e7dbf3b ("serial: omap: make sure to suspend device before remove")
Cc: Felipe Balbi <balbi@kernel.org>
Cc: Santosh Shilimkar <santosh.shilimkar@ti.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 500fcc08a32bfd54f11951ba81530775df15c474 upstream.
This patch adds missing checks for dma_map_single() failure and proper error
reporting. Although this issue was harmless on ARM architecture, it is always
good to use the DMA mapping API in a proper way. This patch fixes the following
DMA API debug warning:
WARNING: CPU: 1 PID: 3785 at lib/dma-debug.c:1171 check_unmap+0x8a0/0xf28
dma-pl330 121a0000.pdma: DMA-API: device driver failed to check map error[device address=0x000000006e0f9000] [size=4096 bytes] [mapped as single]
Modules linked in:
CPU: 1 PID: 3785 Comm: (agetty) Tainted: G W 4.11.0-rc1-00137-g07ca963-dirty #59
Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[<c011aaa4>] (unwind_backtrace) from [<c01127c0>] (show_stack+0x20/0x24)
[<c01127c0>] (show_stack) from [<c06ba5d8>] (dump_stack+0x84/0xa0)
[<c06ba5d8>] (dump_stack) from [<c0139528>] (__warn+0x14c/0x180)
[<c0139528>] (__warn) from [<c01395a4>] (warn_slowpath_fmt+0x48/0x50)
[<c01395a4>] (warn_slowpath_fmt) from [<c072a114>] (check_unmap+0x8a0/0xf28)
[<c072a114>] (check_unmap) from [<c072a834>] (debug_dma_unmap_page+0x98/0xc8)
[<c072a834>] (debug_dma_unmap_page) from [<c0803874>] (s3c24xx_serial_shutdown+0x314/0x52c)
[<c0803874>] (s3c24xx_serial_shutdown) from [<c07f5124>] (uart_port_shutdown+0x54/0x88)
[<c07f5124>] (uart_port_shutdown) from [<c07f522c>] (uart_shutdown+0xd4/0x110)
[<c07f522c>] (uart_shutdown) from [<c07f6a8c>] (uart_hangup+0x9c/0x208)
[<c07f6a8c>] (uart_hangup) from [<c07c426c>] (__tty_hangup+0x49c/0x634)
[<c07c426c>] (__tty_hangup) from [<c07c78ac>] (tty_ioctl+0xc88/0x16e4)
[<c07c78ac>] (tty_ioctl) from [<c03b5f2c>] (do_vfs_ioctl+0xc4/0xd10)
[<c03b5f2c>] (do_vfs_ioctl) from [<c03b6bf4>] (SyS_ioctl+0x7c/0x8c)
[<c03b6bf4>] (SyS_ioctl) from [<c010b4a0>] (ret_fast_syscall+0x0/0x3c)
Reported-by: Seung-Woo Kim <sw0312.kim@samsung.com>
Fixes: 62c37eedb74c8 ("serial: samsung: add dma reqest/release functions")
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Reviewed-by: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 768d64f491a530062ddad50e016fb27125f8bd7c upstream.
Driver should provide its own struct device for all DMA-mapping calls instead
of extracting device pointer from DMA engine channel. Although this is harmless
from the driver operation perspective on ARM architecture, it is always good
to use the DMA mapping API in a proper way. This patch fixes following DMA API
debug warning:
WARNING: CPU: 0 PID: 0 at lib/dma-debug.c:1241 check_sync+0x520/0x9f4
samsung-uart 12c20000.serial: DMA-API: device driver tries to sync DMA memory it has not allocated [device address=0x000000006df0f580] [size=64 bytes]
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.11.0-rc1-00137-g07ca963 #51
Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[<c011aaa4>] (unwind_backtrace) from [<c01127c0>] (show_stack+0x20/0x24)
[<c01127c0>] (show_stack) from [<c06ba5d8>] (dump_stack+0x84/0xa0)
[<c06ba5d8>] (dump_stack) from [<c0139528>] (__warn+0x14c/0x180)
[<c0139528>] (__warn) from [<c01395a4>] (warn_slowpath_fmt+0x48/0x50)
[<c01395a4>] (warn_slowpath_fmt) from [<c0729058>] (check_sync+0x520/0x9f4)
[<c0729058>] (check_sync) from [<c072967c>] (debug_dma_sync_single_for_device+0x88/0xc8)
[<c072967c>] (debug_dma_sync_single_for_device) from [<c0803c10>] (s3c24xx_serial_start_tx_dma+0x100/0x2f8)
[<c0803c10>] (s3c24xx_serial_start_tx_dma) from [<c0804338>] (s3c24xx_serial_tx_chars+0x198/0x33c)
Reported-by: Seung-Woo Kim <sw0312.kim@samsung.com>
Fixes: 62c37eedb74c8 ("serial: samsung: add dma reqest/release functions")
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org>
Reviewed-by: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 6b06cdee81d68a8a829ad8e8d0f31d6836744af9 upstream.
When accessing an encrypted directory without the key, userspace must
operate on filenames derived from the ciphertext names, which contain
arbitrary bytes. Since we must support filenames as long as NAME_MAX,
we can't always just base64-encode the ciphertext, since that may make
it too long. Currently, this is solved by presenting long names in an
abbreviated form containing any needed filesystem-specific hashes (e.g.
to identify a directory block), then the last 16 bytes of ciphertext.
This needs to be sufficient to identify the actual name on lookup.
However, there is a bug. It seems to have been assumed that due to the
use of a CBC (ciphertext block chaining)-based encryption mode, the last
16 bytes (i.e. the AES block size) of ciphertext would depend on the
full plaintext, preventing collisions. However, we actually use CBC
with ciphertext stealing (CTS), which handles the last two blocks
specially, causing them to appear "flipped". Thus, it's actually the
second-to-last block which depends on the full plaintext.
This caused long filenames that differ only near the end of their
plaintexts to, when observed without the key, point to the wrong inode
and be undeletable. For example, with ext4:
# echo pass | e4crypt add_key -p 16 edir/
# seq -f "edir/abcdefghijklmnopqrstuvwxyz012345%.0f" 100000 | xargs touch
# find edir/ -type f | xargs stat -c %i | sort | uniq | wc -l
100000
# sync
# echo 3 > /proc/sys/vm/drop_caches
# keyctl new_session
# find edir/ -type f | xargs stat -c %i | sort | uniq | wc -l
2004
# rm -rf edir/
rm: cannot remove 'edir/_A7nNFi3rhkEQlJ6P,hdzluhODKOeWx5V': Structure needs cleaning
...
To fix this, when presenting long encrypted filenames, encode the
second-to-last block of ciphertext rather than the last 16 bytes.
Although it would be nice to solve this without depending on a specific
encryption mode, that would mean doing a cryptographic hash like SHA-256
which would be much less efficient. This way is sufficient for now, and
it's still compatible with encryption modes like HEH which are strong
pseudorandom permutations. Also, changing the presented names is still
allowed at any time because they are only provided to allow applications
to do things like delete encrypted directories. They're not designed to
be used to persistently identify files --- which would be hard to do
anyway, given that they're encrypted after all.
For ease of backports, this patch only makes the minimal fix to both
ext4 and f2fs. It leaves ubifs as-is, since ubifs doesn't compare the
ciphertext block yet. Follow-on patches will clean things up properly
and make the filesystems use a shared helper function.
Fixes: 5de0b4d0cd15 ("ext4 crypto: simplify and speed up filename encryption")
Reported-by: Gwendal Grignou <gwendal@chromium.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 272f98f6846277378e1758a49a49d7bf39343c02 upstream.
To mitigate some types of offline attacks, filesystem encryption is
designed to enforce that all files in an encrypted directory tree use
the same encryption policy (i.e. the same encryption context excluding
the nonce). However, the fscrypt_has_permitted_context() function which
enforces this relies on comparing struct fscrypt_info's, which are only
available when we have the encryption keys. This can cause two
incorrect behaviors:
1. If we have the parent directory's key but not the child's key, or
vice versa, then fscrypt_has_permitted_context() returned false,
causing applications to see EPERM or ENOKEY. This is incorrect if
the encryption contexts are in fact consistent. Although we'd
normally have either both keys or neither key in that case since the
master_key_descriptors would be the same, this is not guaranteed
because keys can be added or removed from keyrings at any time.
2. If we have neither the parent's key nor the child's key, then
fscrypt_has_permitted_context() returned true, causing applications
to see no error (or else an error for some other reason). This is
incorrect if the encryption contexts are in fact inconsistent, since
in that case we should deny access.
To fix this, retrieve and compare the fscrypt_contexts if we are unable
to set up both fscrypt_infos.
While this slightly hurts performance when accessing an encrypted
directory tree without the key, this isn't a case we really need to be
optimizing for; access *with* the key is much more important.
Furthermore, the performance hit is barely noticeable given that we are
already retrieving the fscrypt_context and doing two keyring searches in
fscrypt_get_encryption_info(). If we ever actually wanted to optimize
this case we might start by caching the fscrypt_contexts.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 394e4f5d5834b610ee032cceb20a1b1f45b01d28 upstream.
Commit 17a9be317475 ("initramfs: Always do fput() and load modules after
rootfs populate") introduced an error for the
CONFIG_BLK_DEV_RAM=y
case, because even though the code looks fine, the compiler really wants
a statement after a label, or you'll get complaints:
init/initramfs.c: In function 'populate_rootfs':
init/initramfs.c:644:2: error: label at end of compound statement
That commit moved the subsequent statements to outside the compound
statement, leaving the label without any associated statements.
Reported-by: Jörg Otte <jrg.otte@gmail.com>
Fixes: 17a9be317475 ("initramfs: Always do fput() and load modules after rootfs populate")
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 17a9be31747535184f2af156b1f080ec4c92a952 upstream.
In OpenRISC we do not have a bootloader passed initrd, but the built in
initramfs does contain the /init and other binaries, including modules.
The previous commit 08865514805d2 ("initramfs: finish fput() before
accessing any binary from initramfs") made a change to only call fput()
if the bootloader initrd was available, this caused intermittent crashes
for OpenRISC.
This patch changes the fput() to happen unconditionally if any rootfs is
loaded. Also, I added some comments to make it a bit more clear why we
call unpack_to_rootfs() multiple times.
Fixes: 08865514805d2 ("initramfs: finish fput() before accessing any binary from initramfs")
Cc: Lokesh Vutla <lokeshvutla@ti.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Stafford Horne <shorne@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3adc5fcb7edf5f8dfe8d37dcb50ba6b30077c905 upstream.
Commit b685d3d65ac7 "block: treat REQ_FUA and REQ_PREFLUSH as
synchronous" removed REQ_SYNC flag from WRITE_{FUA|PREFLUSH|...}
definitions. generic_make_request_checks() however strips REQ_FUA and
REQ_PREFLUSH flags from a bio when the storage doesn't report volatile
write cache and thus write effectively becomes asynchronous which can
lead to performance regressions.
Fix the problem by making sure all bios which are synchronous are
properly marked with REQ_SYNC.
Fixes: b685d3d65ac791406e0dfd8779cc9b3707fea5a3
CC: Jaegeuk Kim <jaegeuk@kernel.org>
CC: linux-f2fs-devel@lists.sourceforge.net
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 6332cd32c8290a80e929fc044dc5bdba77396e33 upstream.
If user has no key under an encrypted dir, fscrypt gives digested dentries.
Previously, when looking up a dentry, f2fs only checks its hash value with
first 4 bytes of the digested dentry, which didn't handle hash collisions fully.
This patch enhances to check entire dentry bytes likewise ext4.
Eric reported how to reproduce this issue by:
# seq -f "edir/abcdefghijklmnopqrstuvwxyz012345%.0f" 100000 | xargs touch
# find edir -type f | xargs stat -c %i | sort | uniq | wc -l
100000
# sync
# echo 3 > /proc/sys/vm/drop_caches
# keyctl new_session
# find edir -type f | xargs stat -c %i | sort | uniq | wc -l
99999
Reported-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
(fixed f2fs_dentry_hash() to work even when the hash is 0)
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d3bb910c15d75ee3340311c64a1c05985bb663a3 upstream.
Commit 88c5c13a5027 (f2fs: fix multiple f2fs_add_link() calls having
same name) does not cover the scenario where inline dentry is enabled.
In that case, F2FS_I(dir)->task will be NULL, and __f2fs_add_link will
lookup dentries one more time.
This patch fixes it by moving the assigment of current task to a upper
level to cover both normal and inline dentry.
Fixes: 88c5c13a5027 (f2fs: fix multiple f2fs_add_link() calls having same name)
Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 9bb02c3627f46e50246bf7ab957b56ffbef623cb upstream.
This patch fixes the following scenario.
- f2fs_create/f2fs_mkdir - write_checkpoint
- f2fs_mark_inode_dirty_sync - block_operations
- f2fs_lock_all
- f2fs_sync_inode_meta
- f2fs_unlock_all
- sync_inode_metadata
- f2fs_lock_op
- f2fs_write_inode
- update_inode_page
- get_node_page
return -ENOENT
- new_inode_page
- fill_node_footer
- f2fs_mark_inode_dirty_sync
- ...
- f2fs_unlock_op
- f2fs_inode_synced
- f2fs_lock_all
- do_checkpoint
In this checkpoint, we can get an inode page which contains zeros having valid
node footer only.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c6f82fe90d7458e5fa190a6820bfc24f96b0de4e upstream.
This reverts commit 3436c4bdb30de421d46f58c9174669fbcfd40ce0.
This makes a leak to register dirty segments. I reproduced the issue by
modified postmark which injects a lot of file create/delete/update and
finally triggers huge number of SSR allocations.
[Jaegeuk Kim: Change missing incorrect comment]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c541a51b8ce81d003b02ed67ad3604a2e6220e3e upstream.
This patch fixes missing increased max cost caused by a patch that we increased
cose of data segments in greedy algorithm.
Fixes: b9cd20619 "f2fs: node segment is prior to data segment selected victim"
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 876f29460cbd4086b43475890c1bf2488fa11d40 upstream.
This is based on a patch from Jan Kara that fixed the equivalent race in
the DAX PTE fault path.
Currently DAX PMD read fault can race with write(2) in the following
way:
CPU1 - write(2) CPU2 - read fault
dax_iomap_pmd_fault()
->iomap_begin() - sees hole
dax_iomap_rw()
iomap_apply()
->iomap_begin - allocates blocks
dax_iomap_actor()
invalidate_inode_pages2_range()
- there's nothing to invalidate
grab_mapping_entry()
- we add huge zero page to the radix tree
and map it to page tables
The result is that hole page is mapped into page tables (and thus zeros
are seen in mmap) while file has data written in that place.
Fix the problem by locking exception entry before mapping blocks for the
fault. That way we are sure invalidate_inode_pages2_range() call for
racing write will either block on entry lock waiting for the fault to
finish (and unmap stale page tables after that) or read fault will see
already allocated blocks by write(2).
Fixes: 9f141d6ef6258 ("dax: Call ->iomap_begin without entry lock during dax fault")
Link: http://lkml.kernel.org/r/20170510172700.18991-1-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit fb26a1cbed8c90025572d48bc9eabe59f7571e88 upstream.
DAX will return to locking exceptional entry before mapping blocks for a
page fault to fix possible races with concurrent writes. To avoid lock
inversion between exceptional entry lock and transaction start, start
the transaction already in ext4_dax_huge_fault().
Fixes: 9f141d6ef6258a3a37a045842d9ba7e68f368956
Link: http://lkml.kernel.org/r/20170510085419.27601-4-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit cd656375f94632d7b5af57bf67b7b5c0270c591c upstream.
Currently, we didn't invalidate page tables during invalidate_inode_pages2()
for DAX. That could result in e.g. 2MiB zero page being mapped into
page tables while there were already underlying blocks allocated and
thus data seen through mmap were different from data seen by read(2).
The following sequence reproduces the problem:
- open an mmap over a 2MiB hole
- read from a 2MiB hole, faulting in a 2MiB zero page
- write to the hole with write(3p). The write succeeds but we
incorrectly leave the 2MiB zero page mapping intact.
- via the mmap, read the data that was just written. Since the zero
page mapping is still intact we read back zeroes instead of the new
data.
Fix the problem by unconditionally calling invalidate_inode_pages2_range()
in dax_iomap_actor() for new block allocations and by properly
invalidating page tables in invalidate_inode_pages2_range() for DAX
mappings.
Fixes: c6dcf52c23d2d3fb5235cec42d7dd3f786b87d55
Link: http://lkml.kernel.org/r/20170510085419.27601-3-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 4636e70bb0a8b871998b6841a2e4b205cf2bc863 upstream.
Patch series "mm,dax: Fix data corruption due to mmap inconsistency",
v4.
This series fixes data corruption that can happen for DAX mounts when
page faults race with write(2) and as a result page tables get out of
sync with block mappings in the filesystem and thus data seen through
mmap is different from data seen through read(2).
The series passes testing with t_mmap_stale test program from Ross and
also other mmap related tests on DAX filesystem.
This patch (of 4):
dax_invalidate_mapping_entry() currently removes DAX exceptional entries
only if they are clean and unlocked. This is done via:
invalidate_mapping_pages()
invalidate_exceptional_entry()
dax_invalidate_mapping_entry()
However, for page cache pages removed in invalidate_mapping_pages()
there is an additional criteria which is that the page must not be
mapped. This is noted in the comments above invalidate_mapping_pages()
and is checked in invalidate_inode_page().
For DAX entries this means that we can can end up in a situation where a
DAX exceptional entry, either a huge zero page or a regular DAX entry,
could end up mapped but without an associated radix tree entry. This is
inconsistent with the rest of the DAX code and with what happens in the
page cache case.
We aren't able to unmap the DAX exceptional entry because according to
its comments invalidate_mapping_pages() isn't allowed to block, and
unmap_mapping_range() takes a write lock on the mapping->i_mmap_rwsem.
Since we essentially never have unmapped DAX entries to evict from the
radix tree, just remove dax_invalidate_mapping_entry().
Fixes: c6dcf52c23d2 ("mm: Invalidate DAX radix tree entries only if appropriate")
Link: http://lkml.kernel.org/r/20170510085419.27601-2-jack@suse.cz
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Reported-by: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 565851c972b50612f3a4542e26879ffb3e906fc2 upstream.
Usage of device_lock() for dax_region attributes is unnecessary and
deadlock prone. It's unnecessary because the order of registration /
un-registration guarantees that drvdata is always valid. It's deadlock
prone because it sets up this situation:
ndctl D 0 2170 2082 0x00000000
Call Trace:
__schedule+0x31f/0x980
schedule+0x3d/0x90
schedule_preempt_disabled+0x15/0x20
__mutex_lock+0x402/0x980
? __mutex_lock+0x158/0x980
? align_show+0x2b/0x80 [dax]
? kernfs_seq_start+0x2f/0x90
mutex_lock_nested+0x1b/0x20
align_show+0x2b/0x80 [dax]
dev_attr_show+0x20/0x50
ndctl D 0 2186 2079 0x00000000
Call Trace:
__schedule+0x31f/0x980
schedule+0x3d/0x90
__kernfs_remove+0x1f6/0x340
? kernfs_remove_by_name_ns+0x45/0xa0
? remove_wait_queue+0x70/0x70
kernfs_remove_by_name_ns+0x45/0xa0
remove_files.isra.1+0x35/0x70
sysfs_remove_group+0x44/0x90
sysfs_remove_groups+0x2e/0x50
dax_region_unregister+0x25/0x40 [dax]
devm_action_release+0xf/0x20
release_nodes+0x16d/0x2b0
devres_release_all+0x3c/0x60
device_release_driver_internal+0x17d/0x220
device_release_driver+0x12/0x20
unbind_store+0x112/0x160
ndctl/2170 is trying to acquire the device_lock() to read an attribute,
and ndctl/2186 is holding the device_lock() while trying to drain all
active attribute readers.
Thanks to Yi Zhang for the reproduction script.
Fixes: d7fe1a67f658 ("dax: add region 'id', 'size', and 'align' attributes")
Reported-by: Yi Zhang <yizhan@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ed01e50acdd3e4a640cf9ebd28a7e810c3ceca97 upstream.
If device_add() fails, cleanup the cdev. Otherwise, we leak a kobj_map()
with a stale device number.
As Jason points out, there is a small possibility that userspace has
opened and mapped the device in the time between cdev_add() and the
device_add() failure. We need a new kill_dax_dev() helper to invalidate
any established mappings.
Fixes: ba09c01d2fa8 ("dax: convert to the cdev api")
Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 0c9d5b127f695818c2c5a3868c1f28ca2969e905 upstream.
fix_sync_read_error() modifies a bio on a newly faulty
device by setting bi_end_io to end_sync_write.
This ensure that put_buf() will still call rdev_dec_pending()
as required, but makes sure that subsequent code in
fix_sync_read_error() doesn't try to read from the device.
Unfortunately this interacts badly with sync_request_write()
which assumes that any bio with bi_end_io set to non-NULL
other than end_sync_read is safe to write to.
As the device is now faulty it doesn't make sense to write.
As the bio was recently used for a read, it is "dirty"
and not suitable for immediate submission.
In particular, ->bi_next might be non-NULL, which will cause
generic_make_request() to complain.
Break this interaction by refusing to write to devices
which are marked as Faulty.
Reported-and-tested-by: Michael Wang <yun.wang@profitbricks.com>
Fixes: 2e52d449bcec ("md/raid1: add failfast handling for reads.")
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 07a77929ba672d93642a56dc2255dd21e6e2290b upstream.
The author meant to free the variable that was just allocated, instead
of the one that failed to be allocated, but made a simple typo. This
patch rectifies that.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 4a99f3c83dc493c8ea84693d78cd792839c8aa64 upstream.
The optimization for opaque dir create was wrongly being applied
also to non-dir create.
Fixes: 97c684cc9110 ("ovl: create directories inside merged parent opaque")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 85435d7a15294f9f7ef23469e6aaf7c5dfcc54f0 upstream.
SFM is mapping doublequote to 0xF020
Without this patch creating files with doublequote fails to Windows/Mac
Signed-off-by: Bjoern Jacke <bjacke@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d8a6e505d6bba2250852fbc1c1c86fe68aaf9af3 upstream.
An open directory may have a NULL private_data pointer prior to readdir.
Fixes: 0de1f4c6f6c0 ("Add way to query server fs info for smb3")
Signed-off-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3998e6b87d4258a70df358296d6f1c7234012bfe upstream.
When the final cifsFileInfo_put() is called from cifsiod and an oplock
break work is queued, lockdep complains loudly:
=============================================
[ INFO: possible recursive locking detected ]
4.11.0+ #21 Not tainted
---------------------------------------------
kworker/0:2/78 is trying to acquire lock:
("cifsiod"){++++.+}, at: flush_work+0x215/0x350
but task is already holding lock:
("cifsiod"){++++.+}, at: process_one_work+0x255/0x8e0
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock("cifsiod");
lock("cifsiod");
*** DEADLOCK ***
May be due to missing lock nesting notation
2 locks held by kworker/0:2/78:
#0: ("cifsiod"){++++.+}, at: process_one_work+0x255/0x8e0
#1: ((&wdata->work)){+.+...}, at: process_one_work+0x255/0x8e0
stack backtrace:
CPU: 0 PID: 78 Comm: kworker/0:2 Not tainted 4.11.0+ #21
Workqueue: cifsiod cifs_writev_complete
Call Trace:
dump_stack+0x85/0xc2
__lock_acquire+0x17dd/0x2260
? match_held_lock+0x20/0x2b0
? trace_hardirqs_off_caller+0x86/0x130
? mark_lock+0xa6/0x920
lock_acquire+0xcc/0x260
? lock_acquire+0xcc/0x260
? flush_work+0x215/0x350
flush_work+0x236/0x350
? flush_work+0x215/0x350
? destroy_worker+0x170/0x170
__cancel_work_timer+0x17d/0x210
? ___preempt_schedule+0x16/0x18
cancel_work_sync+0x10/0x20
cifsFileInfo_put+0x338/0x7f0
cifs_writedata_release+0x2a/0x40
? cifs_writedata_release+0x2a/0x40
cifs_writev_complete+0x29d/0x850
? preempt_count_sub+0x18/0xd0
process_one_work+0x304/0x8e0
worker_thread+0x9b/0x6a0
kthread+0x1b2/0x200
? process_one_work+0x8e0/0x8e0
? kthread_create_on_node+0x40/0x40
ret_from_fork+0x31/0x40
This is a real warning. Since the oplock is queued on the same
workqueue this can deadlock if there is only one worker thread active
for the workqueue (which will be the case during memory pressure when
the rescuer thread is handling it).
Furthermore, there is at least one other kind of hang possible due to
the oplock break handling if there is only worker. (This can be
reproduced without introducing memory pressure by having passing 1 for
the max_active parameter of cifsiod.) cifs_oplock_break() can wait
indefintely in the filemap_fdatawait() while the cifs_writev_complete()
work is blocked:
sysrq: SysRq : Show Blocked State
task PC stack pid father
kworker/0:1 D 0 16 2 0x00000000
Workqueue: cifsiod cifs_oplock_break
Call Trace:
__schedule+0x562/0xf40
? mark_held_locks+0x4a/0xb0
schedule+0x57/0xe0
io_schedule+0x21/0x50
wait_on_page_bit+0x143/0x190
? add_to_page_cache_lru+0x150/0x150
__filemap_fdatawait_range+0x134/0x190
? do_writepages+0x51/0x70
filemap_fdatawait_range+0x14/0x30
filemap_fdatawait+0x3b/0x40
cifs_oplock_break+0x651/0x710
? preempt_count_sub+0x18/0xd0
process_one_work+0x304/0x8e0
worker_thread+0x9b/0x6a0
kthread+0x1b2/0x200
? process_one_work+0x8e0/0x8e0
? kthread_create_on_node+0x40/0x40
ret_from_fork+0x31/0x40
dd D 0 683 171 0x00000000
Call Trace:
__schedule+0x562/0xf40
? mark_held_locks+0x29/0xb0
schedule+0x57/0xe0
io_schedule+0x21/0x50
wait_on_page_bit+0x143/0x190
? add_to_page_cache_lru+0x150/0x150
__filemap_fdatawait_range+0x134/0x190
? do_writepages+0x51/0x70
filemap_fdatawait_range+0x14/0x30
filemap_fdatawait+0x3b/0x40
filemap_write_and_wait+0x4e/0x70
cifs_flush+0x6a/0xb0
filp_close+0x52/0xa0
__close_fd+0xdc/0x150
SyS_close+0x33/0x60
entry_SYSCALL_64_fastpath+0x1f/0xbe
Showing all locks held in the system:
2 locks held by kworker/0:1/16:
#0: ("cifsiod"){.+.+.+}, at: process_one_work+0x255/0x8e0
#1: ((&cfile->oplock_break)){+.+.+.}, at: process_one_work+0x255/0x8e0
Showing busy workqueues and worker pools:
workqueue cifsiod: flags=0xc
pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/1
in-flight: 16:cifs_oplock_break
delayed: cifs_writev_complete, cifs_echo_request
pool 0: cpus=0 node=0 flags=0x0 nice=0 hung=0s workers=3 idle: 750 3
Fix these problems by creating a a new workqueue (with a rescuer) for
the oplock break work.
Signed-off-by: Rabin Vincent <rabinv@axis.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 6026685de33b0db5b2b6b0e9b41b3a1a3261033c upstream.
As with 618763958b22, an open directory may have a NULL private_data
pointer prior to readdir. CIFS_ENUMERATE_SNAPSHOTS must check for this
before dereference.
Fixes: 834170c85978 ("Enable previous version support")
Signed-off-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 0e5c795592930d51fd30d53a2e7b73cba022a29b upstream.
The server may respond with success, and an output buffer less than
sizeof(struct smb_snapshot_array) in length. Do not leak the output
buffer in this case.
Fixes: 834170c85978 ("Enable previous version support")
Signed-off-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b704e70b7cf48f9b67c07d585168e102dfa30bb4 upstream.
- trailing space maps to 0xF028
- trailing period maps to 0xF029
This fix corrects the mapping of file names which have a trailing character
that would otherwise be illegal (period or space) but is allowed by POSIX.
Signed-off-by: Bjoern Jacke <bjacke@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 7db0a6efdc3e990cdfd4b24820d010e9eb7890ad upstream.
Macs send the maximum buffer size in response on ioctl to validate
negotiate security information, which causes us to fail the mount
as the response buffer is larger than the expected response.
Changed ioctl response processing to allow for padding of validate
negotiate ioctl response and limit the maximum response size to
maximum buffer size.
Signed-off-by: Steve French <steve.french@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 26c9cb668c7fbf9830516b75d8bee70b699ed449 upstream.
Mac requires the unicode flag to be set for cifs, even for the smb
echo request (which doesn't have strings).
Without this Mac rejects the periodic echo requests (when mounting
with cifs) that we use to check if server is down
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 7d0c234fd2e1c9ca3fa032696c0c58b1b74a9e0b upstream.
commit 620d8745b35d ("Introduce cifs_copy_file_range()") changes the
behaviour of the cifs ioctl call CIFS_IOC_COPYCHUNK_FILE. In case of
successful writes, it now returns the number of bytes written. This
return value is treated as an error by the xfstest cifs/001. Depending
on the errno set at that time, this may or may not result in the test
failing.
The patch fixes this by setting the return value to 0 in case of
successful writes.
Fixes: commit 620d8745b35d ("Introduce cifs_copy_file_range()")
Reported-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit cd8c42968ee651b69e00f8661caff32b0086e82d upstream.
Incorrect return value for shares not using the prefix path means that
we will never match superblocks for these shares.
Fixes: commit c1d8b24d1819 ("Compare prepaths when comparing superblocks")
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 62be1511b1db8066220b18b7d4da2e6b9fdc69fb upstream.
Patch series "more robust PF_MEMALLOC handling"
This series aims to unify the setting and clearing of PF_MEMALLOC, which
prevents recursive reclaim. There are some places that clear the flag
unconditionally from current->flags, which may result in clearing a
pre-existing flag. This already resulted in a bug report that Patch 1
fixes (without the new helpers, to make backporting easier). Patch 2
introduces the new helpers, modelled after existing memalloc_noio_* and
memalloc_nofs_* helpers, and converts mm core to use them. Patches 3
and 4 convert non-mm code.
This patch (of 4):
__alloc_pages_direct_compact() sets PF_MEMALLOC to prevent deadlock
during page migration by lock_page() (see the comment in
__unmap_and_move()). Then it unconditionally clears the flag, which can
clear a pre-existing PF_MEMALLOC flag and result in recursive reclaim.
This was not a problem until commit a8161d1ed609 ("mm, page_alloc:
restructure direct compaction handling in slowpath"), because direct
compation was called only after direct reclaim, which was skipped when
PF_MEMALLOC flag was set.
Even now it's only a theoretical issue, as the new callsite of
__alloc_pages_direct_compact() is reached only for costly orders and
when gfp_pfmemalloc_allowed() is true, which means either
__GFP_NOMEMALLOC is in gfp_flags or in_interrupt() is true. There is no
such known context, but let's play it safe and make
__alloc_pages_direct_compact() robust for cases where PF_MEMALLOC is
already set.
Fixes: a8161d1ed609 ("mm, page_alloc: restructure direct compaction handling in slowpath")
Link: http://lkml.kernel.org/r/20170405074700.29871-2-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Boris Brezillon <boris.brezillon@free-electrons.com>
Cc: Chris Leech <cleech@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: Lee Duncan <lduncan@suse.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Richard Weinberger <richard@nod.at>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|