<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/io_uring, branch v6.1.41</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.1.41</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.1.41'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2023-07-19T14:22:18Z</updated>
<entry>
<title>io_uring: Use io_schedule* in cqring wait</title>
<updated>2023-07-19T14:22:18Z</updated>
<author>
<name>Andres Freund</name>
<email>andres@anarazel.de</email>
</author>
<published>2023-07-16T18:13:06Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f32dfc802e8733028088edf54499d5669cb0ef69'/>
<id>urn:sha1:f32dfc802e8733028088edf54499d5669cb0ef69</id>
<content type='text'>
Commit 8a796565cec3601071cbbd27d6304e202019d014 upstream.

I observed poor performance of io_uring compared to synchronous IO. That
turns out to be caused by deeper CPU idle states entered with io_uring,
due to io_uring using plain schedule(), whereas synchronous IO uses
io_schedule().

The losses due to this are substantial. On my cascade lake workstation,
t/io_uring from the fio repository e.g. yields regressions between 20%
and 40% with the following command:
./t/io_uring -r 5 -X0 -d 1 -s 1 -c 1 -p 0 -S$use_sync -R 0 /mnt/t2/fio/write.0.0

This is repeatable with different filesystems, using raw block devices
and using different block devices.

Use io_schedule_prepare() / io_schedule_finish() in
io_cqring_wait_schedule() to address the difference.

After that using io_uring is on par or surpassing synchronous IO (using
registered files etc makes it reliably win, but arguably is a less fair
comparison).

There are other calls to schedule() in io_uring/, but none immediately
jump out to be similarly situated, so I did not touch them. Similarly,
it's possible that mutex_lock_io() should be used, but it's not clear if
there are cases where that matters.

Cc: stable@vger.kernel.org # 5.10+
Cc: Pavel Begunkov &lt;asml.silence@gmail.com&gt;
Cc: io-uring@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Andres Freund &lt;andres@anarazel.de&gt;
Link: https://lore.kernel.org/r/20230707162007.194068-1-andres@anarazel.de
[axboe: minor style fixup]
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>io_uring: wait interruptibly for request completions on exit</title>
<updated>2023-07-19T14:22:09Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2023-06-12T03:14:09Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b50d6e06cca7b67a3d73ca660dda27662b76e6ea'/>
<id>urn:sha1:b50d6e06cca7b67a3d73ca660dda27662b76e6ea</id>
<content type='text'>
commit 4826c59453b3b4677d6bf72814e7ababdea86949 upstream.

WHen the ring exits, cleanup is done and the final cancelation and
waiting on completions is done by io_ring_exit_work. That function is
invoked by kworker, which doesn't take any signals. Because of that, it
doesn't really matter if we wait for completions in TASK_INTERRUPTIBLE
or TASK_UNINTERRUPTIBLE state. However, it does matter to the hung task
detection checker!

Normally we expect cancelations and completions to happen rather
quickly. Some test cases, however, will exit the ring and park the
owning task stopped (eg via SIGSTOP). If the owning task needs to run
task_work to complete requests, then io_ring_exit_work won't make any
progress until the task is runnable again. Hence io_ring_exit_work can
trigger the hung task detection, which is particularly problematic if
panic-on-hung-task is enabled.

As the ring exit doesn't take signals to begin with, have it wait
interruptibly rather than uninterruptibly. io_uring has a separate
stuck-exit warning that triggers independently anyway, so we're not
really missing anything by making this switch.

Cc: stable@vger.kernel.org # 5.10+
Link: https://lore.kernel.org/r/b0e4aaef-7088-56ce-244c-976edeac0e66@kernel.dk
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>io_uring/net: use the correct msghdr union member in io_sendmsg_copy_hdr</title>
<updated>2023-06-28T09:12:33Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2023-06-20T22:11:51Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=eff07bf118411b9d2de8fd2f949d23c8dc24c3dc'/>
<id>urn:sha1:eff07bf118411b9d2de8fd2f949d23c8dc24c3dc</id>
<content type='text'>
[ Upstream commit 26fed83653d0154704cadb7afc418f315c7ac1f0 ]

Rather than assign the user pointer to msghdr-&gt;msg_control, assign it
to msghdr-&gt;msg_control_user to make sparse happy. They are in a union
so the end result is the same, but let's avoid new sparse warnings and
squash this one.

Reported-by: kernel test robot &lt;lkp@intel.com&gt;
Closes: https://lore.kernel.org/oe-kbuild-all/202306210654.mDMcyMuB-lkp@intel.com/
Fixes: cac9e4418f4c ("io_uring/net: save msghdr-&gt;msg_control for retries")
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>io_uring/poll: serialize poll linked timer start with poll removal</title>
<updated>2023-06-28T09:12:27Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2023-06-18T01:50:24Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=24f473769e7ecf35e2772469a063d5a8bbca6f63'/>
<id>urn:sha1:24f473769e7ecf35e2772469a063d5a8bbca6f63</id>
<content type='text'>
Commit ef7dfac51d8ed961b742218f526bd589f3900a59 upstream.

We selectively grab the ctx-&gt;uring_lock for poll update/removal, but
we really should grab it from the start to fully synchronize with
linked timeouts. Normally this is indeed the case, but if requests
are forced async by the application, we don't fully cover removal
and timer disarm within the uring_lock.

Make this simpler by having consistent locking state for poll removal.

Cc: stable@vger.kernel.org # 6.1+
Reported-by: Querijn Voet &lt;querijnqyn@gmail.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>io_uring/net: disable partial retries for recvmsg with cmsg</title>
<updated>2023-06-28T09:12:24Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2023-06-19T15:41:05Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1d9dc9bed9996f35bd7b1ceec690e617f760bc58'/>
<id>urn:sha1:1d9dc9bed9996f35bd7b1ceec690e617f760bc58</id>
<content type='text'>
commit 78d0d2063bab954d19a1696feae4c7706a626d48 upstream.

We cannot sanely handle partial retries for recvmsg if we have cmsg
attached. If we don't, then we'd just be overwriting the initial cmsg
header on retries. Alternatively we could increment and handle this
appropriately, but it doesn't seem worth the complication.

Move the MSG_WAITALL check into the non-multishot case while at it,
since MSG_WAITALL is explicitly disabled for multishot anyway.

Link: https://lore.kernel.org/io-uring/0b0d4411-c8fd-4272-770b-e030af6919a0@kernel.dk/
Cc: stable@vger.kernel.org # 5.10+
Reported-by: Stefan Metzmacher &lt;metze@samba.org&gt;
Reviewed-by: Stefan Metzmacher &lt;metze@samba.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>io_uring/net: clear msg_controllen on partial sendmsg retry</title>
<updated>2023-06-28T09:12:24Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2023-06-19T15:35:34Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4d729cc67b05b873055668c1868578f74ce17703'/>
<id>urn:sha1:4d729cc67b05b873055668c1868578f74ce17703</id>
<content type='text'>
commit b1dc492087db0f2e5a45f1072a743d04618dd6be upstream.

If we have cmsg attached AND we transferred partial data at least, clear
msg_controllen on retry so we don't attempt to send that again.

Cc: stable@vger.kernel.org # 5.10+
Fixes: cac9e4418f4c ("io_uring/net: save msghdr-&gt;msg_control for retries")
Reported-by: Stefan Metzmacher &lt;metze@samba.org&gt;
Reviewed-by: Stefan Metzmacher &lt;metze@samba.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>io_uring/net: save msghdr-&gt;msg_control for retries</title>
<updated>2023-06-21T14:00:55Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2023-06-12T19:51:36Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c9c3163c7ab901fb1f184a821285851534ba46e2'/>
<id>urn:sha1:c9c3163c7ab901fb1f184a821285851534ba46e2</id>
<content type='text'>
commit cac9e4418f4cbd548ccb065b3adcafe073f7f7d2 upstream.

If the application sets -&gt;msg_control and we have to later retry this
command, or if it got queued with IOSQE_ASYNC to begin with, then we
need to retain the original msg_control value. This is due to the net
stack overwriting this field with an in-kernel pointer, to copy it
in. Hitting that path for the second time will now fail the copy from
user, as it's attempting to copy from a non-user address.

Cc: stable@vger.kernel.org # 5.10+
Link: https://github.com/axboe/liburing/issues/880
Reported-and-tested-by: Marek Majkowski &lt;marek@cloudflare.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>io_uring: unlock sqd-&gt;lock before sq thread release CPU</title>
<updated>2023-06-21T14:00:53Z</updated>
<author>
<name>Wenwen Chen</name>
<email>wenwen.chen@samsung.com</email>
</author>
<published>2023-05-25T08:26:26Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c9c205945033a41f3e5e38ffefd24270bb2bc393'/>
<id>urn:sha1:c9c205945033a41f3e5e38ffefd24270bb2bc393</id>
<content type='text'>
[ Upstream commit 533ab73f5b5c95dcb4152b52d5482abcc824c690 ]

The sq thread actively releases CPU resources by calling the
cond_resched() and schedule() interfaces when it is idle. Therefore,
more resources are available for other threads to run.

There exists a problem in sq thread: it does not unlock sqd-&gt;lock before
releasing CPU resources every time. This makes other threads pending on
sqd-&gt;lock for a long time. For example, the following interfaces all
require sqd-&gt;lock: io_sq_offload_create(), io_register_iowq_max_workers()
and io_ring_exit_work().

Before the sq thread releases CPU resources, unlocking sqd-&gt;lock will
provide the user a better experience because it can respond quickly to
user requests.

Signed-off-by: Kanchan Joshi&lt;joshi.k@samsung.com&gt;
Signed-off-by: Wenwen Chen&lt;wenwen.chen@samsung.com&gt;
Link: https://lore.kernel.org/r/20230525082626.577862-1-wenwen.chen@samsung.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>io_uring: undeprecate epoll_ctl support</title>
<updated>2023-06-09T08:34:23Z</updated>
<author>
<name>Ben Noordhuis</name>
<email>info@bnoordhuis.nl</email>
</author>
<published>2023-05-06T09:55:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=6fb0b098f6905a9e4bcf37516773d04e98de6b17'/>
<id>urn:sha1:6fb0b098f6905a9e4bcf37516773d04e98de6b17</id>
<content type='text'>
commit 4ea0bf4b98d66a7a790abb285539f395596bae92 upstream.

Libuv recently started using it so there is at least one consumer now.

Cc: stable@vger.kernel.org
Fixes: 61a2732af4b0 ("io_uring: deprecate epoll_ctl support")
Link: https://github.com/libuv/libuv/pull/3979
Signed-off-by: Ben Noordhuis &lt;info@bnoordhuis.nl&gt;
Link: https://lore.kernel.org/r/20230506095502.13401-1-info@bnoordhuis.nl
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>io_uring/rsrc: use nospec'ed indexes</title>
<updated>2023-05-11T14:03:24Z</updated>
<author>
<name>Pavel Begunkov</name>
<email>asml.silence@gmail.com</email>
</author>
<published>2023-04-13T14:28:05Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5611be6c3d9c7d0c49d3e868892b237ab4e5fc63'/>
<id>urn:sha1:5611be6c3d9c7d0c49d3e868892b237ab4e5fc63</id>
<content type='text'>
[ Upstream commit 953c37e066f05a3dca2d74643574b8dfe8a83983 ]

We use array_index_nospec() for registered buffer indexes, but don't use
it while poking into rsrc tags, fix that.

Fixes: 634d00df5e1cf ("io_uring: add full-fledged dynamic buffers support")
Signed-off-by: Pavel Begunkov &lt;asml.silence@gmail.com&gt;
Link: https://lore.kernel.org/r/f02fafc5a9c0dd69be2b0618c38831c078232ff0.1681395792.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
</feed>
