<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include, branch v5.0.16</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.0.16</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.0.16'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2019-05-14T17:17:13Z</updated>
<entry>
<title>cpu/speculation: Add 'mitigations=' cmdline option</title>
<updated>2019-05-14T17:17:13Z</updated>
<author>
<name>Josh Poimboeuf</name>
<email>jpoimboe@redhat.com</email>
</author>
<published>2019-04-12T20:39:28Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=6d7407ef9272ad3b7d12ca9634075bf5b4aef49d'/>
<id>urn:sha1:6d7407ef9272ad3b7d12ca9634075bf5b4aef49d</id>
<content type='text'>
commit 98af8452945c55652de68536afdde3b520fec429 upstream

Keeping track of the number of mitigations for all the CPU speculation
bugs has become overwhelming for many users.  It's getting more and more
complicated to decide which mitigations are needed for a given
architecture.  Complicating matters is the fact that each arch tends to
have its own custom way to mitigate the same vulnerability.

Most users fall into a few basic categories:

a) they want all mitigations off;

b) they want all reasonable mitigations on, with SMT enabled even if
   it's vulnerable; or

c) they want all reasonable mitigations on, with SMT disabled if
   vulnerable.

Define a set of curated, arch-independent options, each of which is an
aggregation of existing options:

- mitigations=off: Disable all mitigations.

- mitigations=auto: [default] Enable all the default mitigations, but
  leave SMT enabled, even if it's vulnerable.

- mitigations=auto,nosmt: Enable all the default mitigations, disabling
  SMT if needed by a mitigation.

Currently, these options are placeholders which don't actually do
anything.  They will be fleshed out in upcoming patches.

Signed-off-by: Josh Poimboeuf &lt;jpoimboe@redhat.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Tested-by: Jiri Kosina &lt;jkosina@suse.cz&gt; (on x86)
Reviewed-by: Jiri Kosina &lt;jkosina@suse.cz&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: "H . Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Jiri Kosina &lt;jikos@kernel.org&gt;
Cc: Waiman Long &lt;longman@redhat.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Jon Masters &lt;jcm@redhat.com&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: linux-s390@vger.kernel.org
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Will Deacon &lt;will.deacon@arm.com&gt;
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-arch@vger.kernel.org
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Tyler Hicks &lt;tyhicks@canonical.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Cc: Steven Price &lt;steven.price@arm.com&gt;
Cc: Phil Auld &lt;pauld@redhat.com&gt;
Link: https://lkml.kernel.org/r/b07a8ef9b7c5055c3a4637c87d07c296d5016fe0.1555085500.git.jpoimboe@redhat.com
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>x86/speculation/mds: Add sysfs reporting for MDS</title>
<updated>2019-05-14T17:17:11Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2019-02-18T21:51:43Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3944139ce82897fca4b3c374cd5e7a3208af340b'/>
<id>urn:sha1:3944139ce82897fca4b3c374cd5e7a3208af340b</id>
<content type='text'>
commit 8a4b06d391b0a42a373808979b5028f5c84d9c6a upstream

Add the sysfs reporting file for MDS. It exposes the vulnerability and
mitigation state similar to the existing files for the other speculative
hardware vulnerabilities.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Reviewed-by: Borislav Petkov &lt;bp@suse.de&gt;
Reviewed-by: Jon Masters &lt;jcm@redhat.com&gt;
Tested-by: Jon Masters &lt;jcm@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>Bluetooth: Align minimum encryption key size for LE and BR/EDR connections</title>
<updated>2019-05-10T16:36:13Z</updated>
<author>
<name>Marcel Holtmann</name>
<email>marcel@holtmann.org</email>
</author>
<published>2019-04-24T20:19:17Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2c93762f4b38d894158bdf73a783685d39d6a981'/>
<id>urn:sha1:2c93762f4b38d894158bdf73a783685d39d6a981</id>
<content type='text'>
commit d5bb334a8e171b262e48f378bd2096c0ea458265 upstream.

The minimum encryption key size for LE connections is 56 bits and to
align LE with BR/EDR, enforce 56 bits of minimum encryption key size for
BR/EDR connections as well.

Signed-off-by: Marcel Holtmann &lt;marcel@holtmann.org&gt;
Signed-off-by: Johan Hedberg &lt;johan.hedberg@intel.com&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>nvmet: fix discover log page when offsets are used</title>
<updated>2019-05-10T16:36:11Z</updated>
<author>
<name>Keith Busch</name>
<email>keith.busch@intel.com</email>
</author>
<published>2019-04-09T16:03:59Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e6f2733f48cb0f11d30813b48965f21cae7cb8c6'/>
<id>urn:sha1:e6f2733f48cb0f11d30813b48965f21cae7cb8c6</id>
<content type='text'>
[ Upstream commit d808b7f759b50acf0784ce6230ffa63e12ef465d ]

The nvme target hadn't been taking the Get Log Page offset parameter
into consideration, and so has been returning corrupted log pages when
offsets are used. Since many tools, including nvme-cli, split the log
request to 4k, we've been breaking discovery log responses when more
than 3 subsystems exist.

Fix the returned data by internally generating the entire discovery
log page and copying only the requested bytes into the user buffer. The
command log page offset type has been modified to a native __le64 to
make it easier to extract the value from a command.

Signed-off-by: Keith Busch &lt;keith.busch@intel.com&gt;
Tested-by: Minwoo Im &lt;minwoo.im@samsung.com&gt;
Reviewed-by: Chaitanya Kulkarni &lt;chaitanya.kulkarni@wdc.com&gt;
Reviewed-by: Hannes Reinecke &lt;hare@suse.com&gt;
Reviewed-by: James Smart &lt;james.smart@broadcom.com&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>blk-mq: introduce blk_mq_complete_request_sync()</title>
<updated>2019-05-10T16:36:11Z</updated>
<author>
<name>Ming Lei</name>
<email>ming.lei@redhat.com</email>
</author>
<published>2019-04-08T22:31:21Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e62732d12bd97ec7a122299356c23e4948db11bc'/>
<id>urn:sha1:e62732d12bd97ec7a122299356c23e4948db11bc</id>
<content type='text'>
[ Upstream commit 1b8f21b74c3c9c82fce5a751d7aefb7cc0b8d33d ]

In NVMe's error handler, follows the typical steps of tearing down
hardware for recovering controller:

1) stop blk_mq hw queues
2) stop the real hw queues
3) cancel in-flight requests via
	blk_mq_tagset_busy_iter(tags, cancel_request, ...)
cancel_request():
	mark the request as abort
	blk_mq_complete_request(req);
4) destroy real hw queues

However, there may be race between #3 and #4, because blk_mq_complete_request()
may run q-&gt;mq_ops-&gt;complete(rq) remotelly and asynchronously, and
-&gt;complete(rq) may be run after #4.

This patch introduces blk_mq_complete_request_sync() for fixing the
above race.

Cc: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Cc: Bart Van Assche &lt;bvanassche@acm.org&gt;
Cc: James Smart &lt;james.smart@broadcom.com&gt;
Cc: linux-nvme@lists.infradead.org
Reviewed-by: Keith Busch &lt;keith.busch@intel.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>KEYS: trusted: fix -Wvarags warning</title>
<updated>2019-05-10T16:36:10Z</updated>
<author>
<name>ndesaulniers@google.com</name>
<email>ndesaulniers@google.com</email>
</author>
<published>2018-10-22T23:43:57Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=353392e5b9a56f59fd9b12e3d51446b7ef7f5119'/>
<id>urn:sha1:353392e5b9a56f59fd9b12e3d51446b7ef7f5119</id>
<content type='text'>
[ Upstream commit be24b37e22c20cbaa891971616784dd0f35211e8 ]

Fixes the warning reported by Clang:
security/keys/trusted.c:146:17: warning: passing an object that
undergoes default
      argument promotion to 'va_start' has undefined behavior [-Wvarargs]
        va_start(argp, h3);
                       ^
security/keys/trusted.c:126:37: note: parameter of type 'unsigned
char' is declared here
unsigned char *h2, unsigned char h3, ...)
                               ^
Specifically, it seems that both the C90 (4.8.1.1) and C11 (7.16.1.4)
standards explicitly call this out as undefined behavior:

The parameter parmN is the identifier of the rightmost parameter in
the variable parameter list in the function definition (the one just
before the ...). If the parameter parmN is declared with ... or with a
type that is not compatible with the type that results after
application of the default argument promotions, the behavior is
undefined.

Link: https://github.com/ClangBuiltLinux/linux/issues/41
Link: https://www.eskimo.com/~scs/cclass/int/sx11c.html
Suggested-by: David Laight &lt;David.Laight@aculab.com&gt;
Suggested-by: Denis Kenzior &lt;denkenz@gmail.com&gt;
Suggested-by: James Bottomley &lt;jejb@linux.vnet.ibm.com&gt;
Suggested-by: Nathan Chancellor &lt;natechancellor@gmail.com&gt;
Signed-off-by: Nick Desaulniers &lt;ndesaulniers@google.com&gt;
Reviewed-by: Nathan Chancellor &lt;natechancellor@gmail.com&gt;
Tested-by: Nathan Chancellor &lt;natechancellor@gmail.com&gt;
Reviewed-by: Jarkko Sakkinen &lt;jarkko.sakkinen@linux.intel.com&gt;
Signed-off-by: Jarkko Sakkinen &lt;jarkko.sakkinen@linux.intel.com&gt;
Signed-off-by: James Morris &lt;james.morris@microsoft.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>linux/kernel.h: Use parentheses around argument in u64_to_user_ptr()</title>
<updated>2019-05-10T16:36:08Z</updated>
<author>
<name>Jann Horn</name>
<email>jannh@google.com</email>
</author>
<published>2019-03-29T21:46:49Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2e94d4e8f2b9ea26e933b8cd1fef5b4b7a3f23db'/>
<id>urn:sha1:2e94d4e8f2b9ea26e933b8cd1fef5b4b7a3f23db</id>
<content type='text'>
[ Upstream commit a0fe2c6479aab5723239b315ef1b552673f434a3 ]

Use parentheses around uses of the argument in u64_to_user_ptr() to
ensure that the cast doesn't apply to part of the argument.

There are existing uses of the macro of the form

  u64_to_user_ptr(A + B)

which expands to

  (void __user *)(uintptr_t)A + B

(the cast applies to the first operand of the addition, the addition
is a pointer addition). This happens to still work as intended, the
semantic difference doesn't cause a difference in behavior.

But I want to use u64_to_user_ptr() with a ternary operator in the
argument, like so:

  u64_to_user_ptr(A ? B : C)

This currently doesn't work as intended.

Signed-off-by: Jann Horn &lt;jannh@google.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Reviewed-by: Mukesh Ojha &lt;mojha@codeaurora.org&gt;
Cc: Andrei Vagin &lt;avagin@openvz.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Dan Carpenter &lt;dan.carpenter@oracle.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Jani Nikula &lt;jani.nikula@intel.com&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Masahiro Yamada &lt;yamada.masahiro@socionext.com&gt;
Cc: NeilBrown &lt;neilb@suse.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Qiaowei Ren &lt;qiaowei.ren@intel.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: x86-ml &lt;x86@kernel.org&gt;
Link: https://lkml.kernel.org/r/20190329214652.258477-1-jannh@google.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>ASoC: dpcm: prevent snd_soc_dpcm use after free</title>
<updated>2019-05-10T16:36:05Z</updated>
<author>
<name>KaiChieh Chuang</name>
<email>kaichieh.chuang@mediatek.com</email>
</author>
<published>2019-03-08T05:05:53Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f4f4303c6d54d7d7238776ff9afba7fbaeddf384'/>
<id>urn:sha1:f4f4303c6d54d7d7238776ff9afba7fbaeddf384</id>
<content type='text'>
[ Upstream commit a9764869779081e8bf24da07ac040e8f3efcf13a ]

The dpcm get from fe_clients/be_clients
may be free before use

Add a spin lock at snd_soc_card level,
to protect the dpcm instance.
The lock may be used in atomic context, so use spin lock.

Use irq spin lock version,
since the lock may be used in interrupts.

possible race condition between
void dpcm_be_disconnect(
	...
	list_del(&amp;dpcm-&gt;list_be);
	list_del(&amp;dpcm-&gt;list_fe);
	kfree(dpcm);
	...

and
	for_each_dpcm_fe()
	for_each_dpcm_be*()

race condition example
Thread 1:
    snd_soc_dapm_mixer_update_power()
        -&gt; soc_dpcm_runtime_update()
            -&gt; dpcm_be_disconnect()
                -&gt; kfree(dpcm);
Thread 2:
    dpcm_fe_dai_trigger()
        -&gt; dpcm_be_dai_trigger()
            -&gt; snd_soc_dpcm_can_be_free_stop()
                -&gt; if (dpcm-&gt;fe == fe)

Excpetion Scenario:
	two FE link to same BE
	FE1 -&gt; BE
	FE2 -&gt;

	Thread 1: switch of mixer between FE2 -&gt; BE
	Thread 2: pcm_stop FE1

Exception:

Unable to handle kernel paging request at virtual address dead0000000000e0

pc=&lt;&gt; [&lt;ffffff8960e2cd10&gt;] dpcm_be_dai_trigger+0x29c/0x47c
	sound/soc/soc-pcm.c:3226
		if (dpcm-&gt;fe == fe)
lr=&lt;&gt; [&lt;ffffff8960e2f694&gt;] dpcm_fe_dai_do_trigger+0x94/0x26c

Backtrace:
[&lt;ffffff89602dba80&gt;] notify_die+0x68/0xb8
[&lt;ffffff896028c7dc&gt;] die+0x118/0x2a8
[&lt;ffffff89602a2f84&gt;] __do_kernel_fault+0x13c/0x14c
[&lt;ffffff89602a27f4&gt;] do_translation_fault+0x64/0xa0
[&lt;ffffff8960280cf8&gt;] do_mem_abort+0x4c/0xd0
[&lt;ffffff8960282ad0&gt;] el1_da+0x24/0x40
[&lt;ffffff8960e2cd10&gt;] dpcm_be_dai_trigger+0x29c/0x47c
[&lt;ffffff8960e2f694&gt;] dpcm_fe_dai_do_trigger+0x94/0x26c
[&lt;ffffff8960e2edec&gt;] dpcm_fe_dai_trigger+0x3c/0x44
[&lt;ffffff8960de5588&gt;] snd_pcm_do_stop+0x50/0x5c
[&lt;ffffff8960dded24&gt;] snd_pcm_action+0xb4/0x13c
[&lt;ffffff8960ddfdb4&gt;] snd_pcm_drop+0xa0/0x128
[&lt;ffffff8960de69bc&gt;] snd_pcm_common_ioctl+0x9d8/0x30f0
[&lt;ffffff8960de1cac&gt;] snd_pcm_ioctl_compat+0x29c/0x2f14
[&lt;ffffff89604c9d60&gt;] compat_SyS_ioctl+0x128/0x244
[&lt;ffffff8960283740&gt;] el0_svc_naked+0x34/0x38
[&lt;ffffffffffffffff&gt;] 0xffffffffffffffff

Signed-off-by: KaiChieh Chuang &lt;kaichieh.chuang@mediatek.com&gt;
Signed-off-by: Mark Brown &lt;broonie@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>clk: x86: Add system specific quirk to mark clocks as critical</title>
<updated>2019-05-08T05:22:59Z</updated>
<author>
<name>David Müller</name>
<email>dave.mueller@gmx.ch</email>
</author>
<published>2019-04-08T13:33:54Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ad5ef880a0e38162adea1b4c50597a395402814f'/>
<id>urn:sha1:ad5ef880a0e38162adea1b4c50597a395402814f</id>
<content type='text'>
commit 7c2e07130090ae001a97a6b65597830d6815e93e upstream.

Since commit 648e921888ad ("clk: x86: Stop marking clocks as
CLK_IS_CRITICAL"), the pmc_plt_clocks of the Bay Trail SoC are
unconditionally gated off. Unfortunately this will break systems where these
clocks are used for external purposes beyond the kernel's knowledge. Fix it
by implementing a system specific quirk to mark the necessary pmc_plt_clks as
critical.

Fixes: 648e921888ad ("clk: x86: Stop marking clocks as CLK_IS_CRITICAL")
Signed-off-by: David Müller &lt;dave.mueller@gmx.ch&gt;
Signed-off-by: Hans de Goede &lt;hdegoede@redhat.com&gt;
Reviewed-by: Andy Shevchenko &lt;andy.shevchenko@gmail.com&gt;
Signed-off-by: Stephen Boyd &lt;sboyd@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock</title>
<updated>2019-05-08T05:22:55Z</updated>
<author>
<name>Kirill Smelkov</name>
<email>kirr@nexedi.com</email>
</author>
<published>2019-03-26T22:20:43Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e42ae9c9b8640bc05a8ea1e223c7aae17eef3ef2'/>
<id>urn:sha1:e42ae9c9b8640bc05a8ea1e223c7aae17eef3ef2</id>
<content type='text'>
[ Upstream commit 10dce8af34226d90fa56746a934f8da5dcdba3df ]

Commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per POSIX") added
locking for file.f_pos access and in particular made concurrent read and
write not possible - now both those functions take f_pos lock for the
whole run, and so if e.g. a read is blocked waiting for data, write will
deadlock waiting for that read to complete.

This caused regression for stream-like files where previously read and
write could run simultaneously, but after that patch could not do so
anymore. See e.g. commit 581d21a2d02a ("xenbus: fix deadlock on writes
to /proc/xen/xenbus") which fixes such regression for particular case of
/proc/xen/xenbus.

The patch that added f_pos lock in 2014 did so to guarantee POSIX thread
safety for read/write/lseek and added the locking to file descriptors of
all regular files. In 2014 that thread-safety problem was not new as it
was already discussed earlier in 2006.

However even though 2006'th version of Linus's patch was adding f_pos
locking "only for files that are marked seekable with FMODE_LSEEK (thus
avoiding the stream-like objects like pipes and sockets)", the 2014
version - the one that actually made it into the tree as 9c225f2655e3 -
is doing so irregardless of whether a file is seekable or not.

See

    https://lore.kernel.org/lkml/53022DB1.4070805@gmail.com/
    https://lwn.net/Articles/180387
    https://lwn.net/Articles/180396

for historic context.

The reason that it did so is, probably, that there are many files that
are marked non-seekable, but e.g. their read implementation actually
depends on knowing current position to correctly handle the read. Some
examples:

	kernel/power/user.c		snapshot_read
	fs/debugfs/file.c		u32_array_read
	fs/fuse/control.c		fuse_conn_waiting_read + ...
	drivers/hwmon/asus_atk0110.c	atk_debugfs_ggrp_read
	arch/s390/hypfs/inode.c		hypfs_read_iter
	...

Despite that, many nonseekable_open users implement read and write with
pure stream semantics - they don't depend on passed ppos at all. And for
those cases where read could wait for something inside, it creates a
situation similar to xenbus - the write could be never made to go until
read is done, and read is waiting for some, potentially external, event,
for potentially unbounded time -&gt; deadlock.

Besides xenbus, there are 14 such places in the kernel that I've found
with semantic patch (see below):

	drivers/xen/evtchn.c:667:8-24: ERROR: evtchn_fops: .read() can deadlock .write()
	drivers/isdn/capi/capi.c:963:8-24: ERROR: capi_fops: .read() can deadlock .write()
	drivers/input/evdev.c:527:1-17: ERROR: evdev_fops: .read() can deadlock .write()
	drivers/char/pcmcia/cm4000_cs.c:1685:7-23: ERROR: cm4000_fops: .read() can deadlock .write()
	net/rfkill/core.c:1146:8-24: ERROR: rfkill_fops: .read() can deadlock .write()
	drivers/s390/char/fs3270.c:488:1-17: ERROR: fs3270_fops: .read() can deadlock .write()
	drivers/usb/misc/ldusb.c:310:1-17: ERROR: ld_usb_fops: .read() can deadlock .write()
	drivers/hid/uhid.c:635:1-17: ERROR: uhid_fops: .read() can deadlock .write()
	net/batman-adv/icmp_socket.c:80:1-17: ERROR: batadv_fops: .read() can deadlock .write()
	drivers/media/rc/lirc_dev.c:198:1-17: ERROR: lirc_fops: .read() can deadlock .write()
	drivers/leds/uleds.c:77:1-17: ERROR: uleds_fops: .read() can deadlock .write()
	drivers/input/misc/uinput.c:400:1-17: ERROR: uinput_fops: .read() can deadlock .write()
	drivers/infiniband/core/user_mad.c:985:7-23: ERROR: umad_fops: .read() can deadlock .write()
	drivers/gnss/core.c:45:1-17: ERROR: gnss_fops: .read() can deadlock .write()

In addition to the cases above another regression caused by f_pos
locking is that now FUSE filesystems that implement open with
FOPEN_NONSEEKABLE flag, can no longer implement bidirectional
stream-like files - for the same reason as above e.g. read can deadlock
write locking on file.f_pos in the kernel.

FUSE's FOPEN_NONSEEKABLE was added in 2008 in a7c1b990f715 ("fuse:
implement nonseekable open") to support OSSPD. OSSPD implements /dev/dsp
in userspace with FOPEN_NONSEEKABLE flag, with corresponding read and
write routines not depending on current position at all, and with both
read and write being potentially blocking operations:

See

    https://github.com/libfuse/osspd
    https://lwn.net/Articles/308445

    https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1406
    https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1438-L1477
    https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1479-L1510

Corresponding libfuse example/test also describes FOPEN_NONSEEKABLE as
"somewhat pipe-like files ..." with read handler not using offset.
However that test implements only read without write and cannot exercise
the deadlock scenario:

    https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L124-L131
    https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L146-L163
    https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L209-L216

I've actually hit the read vs write deadlock for real while implementing
my FUSE filesystem where there is /head/watch file, for which open
creates separate bidirectional socket-like stream in between filesystem
and its user with both read and write being later performed
simultaneously. And there it is semantically not easy to split the
stream into two separate read-only and write-only channels:

    https://lab.nexedi.com/kirr/wendelin.core/blob/f13aa600/wcfs/wcfs.go#L88-169

Let's fix this regression. The plan is:

1. We can't change nonseekable_open to include &amp;~FMODE_ATOMIC_POS -
   doing so would break many in-kernel nonseekable_open users which
   actually use ppos in read/write handlers.

2. Add stream_open() to kernel to open stream-like non-seekable file
   descriptors. Read and write on such file descriptors would never use
   nor change ppos. And with that property on stream-like files read and
   write will be running without taking f_pos lock - i.e. read and write
   could be running simultaneously.

3. With semantic patch search and convert to stream_open all in-kernel
   nonseekable_open users for which read and write actually do not
   depend on ppos and where there is no other methods in file_operations
   which assume @offset access.

4. Add FOPEN_STREAM to fs/fuse/ and open in-kernel file-descriptors via
   steam_open if that bit is present in filesystem open reply.

   It was tempting to change fs/fuse/ open handler to use stream_open
   instead of nonseekable_open on just FOPEN_NONSEEKABLE flags, but
   grepping through Debian codesearch shows users of FOPEN_NONSEEKABLE,
   and in particular GVFS which actually uses offset in its read and
   write handlers

	https://codesearch.debian.net/search?q=-%3Enonseekable+%3D
	https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1080
	https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1247-1346
	https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1399-1481

   so if we would do such a change it will break a real user.

5. Add stream_open and FOPEN_STREAM handling to stable kernels starting
   from v3.14+ (the kernel where 9c225f2655 first appeared).

   This will allow to patch OSSPD and other FUSE filesystems that
   provide stream-like files to return FOPEN_STREAM | FOPEN_NONSEEKABLE
   in their open handler and this way avoid the deadlock on all kernel
   versions. This should work because fs/fuse/ ignores unknown open
   flags returned from a filesystem and so passing FOPEN_STREAM to a
   kernel that is not aware of this flag cannot hurt. In turn the kernel
   that is not aware of FOPEN_STREAM will be &lt; v3.14 where just
   FOPEN_NONSEEKABLE is sufficient to implement streams without read vs
   write deadlock.

This patch adds stream_open, converts /proc/xen/xenbus to it and adds
semantic patch to automatically locate in-kernel places that are either
required to be converted due to read vs write deadlock, or that are just
safe to be converted because read and write do not use ppos and there
are no other funky methods in file_operations.

Regarding semantic patch I've verified each generated change manually -
that it is correct to convert - and each other nonseekable_open instance
left - that it is either not correct to convert there, or that it is not
converted due to current stream_open.cocci limitations.

The script also does not convert files that should be valid to convert,
but that currently have .llseek = noop_llseek or generic_file_llseek for
unknown reason despite file being opened with nonseekable_open (e.g.
drivers/input/mousedev.c)

Cc: Michael Kerrisk &lt;mtk.manpages@gmail.com&gt;
Cc: Yongzhi Pan &lt;panyongzhi@gmail.com&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: David Vrabel &lt;david.vrabel@citrix.com&gt;
Cc: Juergen Gross &lt;jgross@suse.com&gt;
Cc: Miklos Szeredi &lt;miklos@szeredi.hu&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Kirill Tkhai &lt;ktkhai@virtuozzo.com&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Julia Lawall &lt;Julia.Lawall@lip6.fr&gt;
Cc: Nikolaus Rath &lt;Nikolaus@rath.org&gt;
Cc: Han-Wen Nienhuys &lt;hanwen@google.com&gt;
Signed-off-by: Kirill Smelkov &lt;kirr@nexedi.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Sasha Levin (Microsoft) &lt;sashal@kernel.org&gt;
</content>
</entry>
</feed>
