<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/uapi/linux/io_uring.h, branch v5.18.19</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.18.19</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.18.19'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2022-04-11T01:08:18Z</updated>
<entry>
<title>io_uring: flag the fact that linked file assignment is sane</title>
<updated>2022-04-11T01:08:18Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2022-04-10T21:13:24Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c4212f3eb89fd5654f0a6ed2ee1d13fcb86cb664'/>
<id>urn:sha1:c4212f3eb89fd5654f0a6ed2ee1d13fcb86cb664</id>
<content type='text'>
Give applications a way to tell if the kernel supports sane linked files,
as in files being assigned at the right time to be able to reliably
do &lt;open file direct into slot X&gt;&lt;read file from slot X&gt; while using
IOSQE_IO_LINK to order them.

Not really a bug fix, but flag it as such so that it gets pulled in with
backports of the deferred file assignment.

Fixes: 6bf9c47a3989 ("io_uring: defer file assignment")
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring: remove IORING_CQE_F_MSG</title>
<updated>2022-03-24T12:53:18Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2022-03-24T12:53:18Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7ef66d186eb95f987a97fb3329b65c840e2dc9bf'/>
<id>urn:sha1:7ef66d186eb95f987a97fb3329b65c840e2dc9bf</id>
<content type='text'>
This was introduced with the message ring opcode, but isn't strictly
required for the request itself. The sender can encode what is needed
in user_data, which is passed to the receiver. It's unclear if having
a separate flag that essentially says "This CQE did not originate from
an SQE on this ring" provides any real utility to applications. While
we can always re-introduce a flag to provide this information, we cannot
take it away at a later point in time.

Remove the flag while we still can, before it's in a released kernel.

Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring: allow submissions to continue on error</title>
<updated>2022-03-10T20:05:25Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2022-03-10T19:59:35Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=bcbb7bf6ccde7cb969a5642879832bc84ebf06a3'/>
<id>urn:sha1:bcbb7bf6ccde7cb969a5642879832bc84ebf06a3</id>
<content type='text'>
By default, io_uring will stop submitting a batch of requests if we run
into an error submitting a request. This isn't strictly necessary, as
the error result is passed out-of-band via a CQE anyway. And it can be
a bit confusing for some applications.

Provide a way to setup a ring that will continue submitting on error,
when the error CQE has been posted.

There's still one case that will break out of submission. If we fail
allocating a request, then we'll still return -ENOMEM. We could in theory
post a CQE for that condition too even if we never got a request. Leave
that for a potential followup.

Reported-by: Dylan Yudaken &lt;dylany@fb.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring: add support for IORING_OP_MSG_RING command</title>
<updated>2022-03-10T16:16:04Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2022-03-10T13:27:26Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4f57f06ce2186c31c3da52386125dc57b1cd6f96'/>
<id>urn:sha1:4f57f06ce2186c31c3da52386125dc57b1cd6f96</id>
<content type='text'>
This adds support for IORING_OP_MSG_RING, which allows an SQE to signal
another ring. That allows either waking up someone waiting on the ring,
or even passing a 64-bit value via the user_data field in the CQE.

sqe-&gt;fd must contain the fd of a ring that should receive the CQE.
sqe-&gt;off will be propagated to the cqe-&gt;user_data on the target ring,
and sqe-&gt;len will be propagated to cqe-&gt;res. The results CQE will have
IORING_CQE_F_MSG set in its flags, to indicate that this CQE was generated
from a messaging request rather than a SQE issued locally on that ring.
This effectively allows passing a 64-bit and a 32-bit quantify between
the two rings.

This request type has the following request specific error cases:

- -EBADFD. Set if the sqe-&gt;fd doesn't point to a file descriptor that is
  of the io_uring type.
- -EOVERFLOW. Set if we were not able to deliver a request to the target
  ring.

Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring: add support for registering ring file descriptors</title>
<updated>2022-03-10T13:32:49Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2022-03-04T15:22:22Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e7a6c00dc77aedf27a601738ea509f1caea6d673'/>
<id>urn:sha1:e7a6c00dc77aedf27a601738ea509f1caea6d673</id>
<content type='text'>
Lots of workloads use multiple threads, in which case the file table is
shared between them. This makes getting and putting the ring file
descriptor for each io_uring_enter(2) system call more expensive, as it
involves an atomic get and put for each call.

Similarly to how we allow registering normal file descriptors to avoid
this overhead, add support for an io_uring_register(2) API that allows
to register the ring fds themselves:

1) IORING_REGISTER_RING_FDS - takes an array of io_uring_rsrc_update
   structs, and registers them with the task.
2) IORING_UNREGISTER_RING_FDS - takes an array of io_uring_src_update
   structs, and unregisters them.

When a ring fd is registered, it is internally represented by an offset.
This offset is returned to the application, and the application then
uses this offset and sets IORING_ENTER_REGISTERED_RING for the
io_uring_enter(2) system call. This works just like using a registered
file descriptor, rather than a real one, in an SQE, where
IOSQE_FIXED_FILE gets set to tell io_uring that we're using an internal
offset/descriptor rather than a real file descriptor.

In initial testing, this provides a nice bump in performance for
threaded applications in real world cases where the batch count (eg
number of requests submitted per io_uring_enter(2) invocation) is low.
In a microbenchmark, submitting NOP requests, we see the following
increases in performance:

Requests per syscall	Baseline	Registered	Increase
----------------------------------------------------------------
1			 ~7030K		 ~8080K		+15%
2			~13120K		~14800K		+13%
4			~22740K		~25300K		+11%

Co-developed-by: Xiaoguang Wang &lt;xiaoguang.wang@linux.alibaba.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring: add option to skip CQE posting</title>
<updated>2021-11-24T18:17:53Z</updated>
<author>
<name>Pavel Begunkov</name>
<email>asml.silence@gmail.com</email>
</author>
<published>2021-11-10T15:49:32Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=04c76b41ca974b508522831441dd7e5b1b59cbb0'/>
<id>urn:sha1:04c76b41ca974b508522831441dd7e5b1b59cbb0</id>
<content type='text'>
Emitting a CQE is expensive from the kernel perspective. Often, it's
also not convenient for the userspace, spends some cycles on processing
and just complicates the logic. A similar problems goes for linked
requests, where we post an CQE for each request in the link.

Introduce a new flags, IOSQE_CQE_SKIP_SUCCESS, trying to help with it.
When set and a request completed successfully, it won't generate a CQE.
When fails, it produces an CQE, but all following linked requests will
be CQE-less, regardless whether they have IOSQE_CQE_SKIP_SUCCESS or not.
The notion of "fail" is the same as for link failing-cancellation, where
it's opcode dependent, and _usually_ result &gt;= 0 is a success, but not
always.

Linked timeouts are a bit special. When the requests it's linked to was
not attempted to be executed, e.g. failing linked requests, it follows
the description above. Otherwise, whether a linked timeout will post a
completion or not solely depends on IOSQE_CQE_SKIP_SUCCESS of that
linked timeout request. Linked timeout never "fail" during execution, so
for them it's unconditional. It's expected for users to not really care
about the result of it but rely solely on the result of the master
request. Another reason for such a treatment is that it's racy, and the
timeout callback may be running awhile the master request posts its
completion.

use case 1:
If one doesn't care about results of some requests, e.g. normal
timeouts, just set IOSQE_CQE_SKIP_SUCCESS. Error result will still be
posted and need to be handled.

use case 2:
Set IOSQE_CQE_SKIP_SUCCESS for all requests of a link but the last,
and it'll post a completion only for the last one if everything goes
right, otherwise there will be one only one CQE for the first failed
request.

Signed-off-by: Pavel Begunkov &lt;asml.silence@gmail.com&gt;
Link: https://lore.kernel.org/r/0220fbe06f7cf99e6fc71b4297bb1cb6c0e89c2c.1636559119.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring: add flag to not fail link after timeout</title>
<updated>2021-10-19T11:49:54Z</updated>
<author>
<name>Pavel Begunkov</name>
<email>asml.silence@gmail.com</email>
</author>
<published>2021-10-02T18:36:14Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=6224590d242fc8c6b26941328e02a40b4384949b'/>
<id>urn:sha1:6224590d242fc8c6b26941328e02a40b4384949b</id>
<content type='text'>
For some reason non-off IORING_OP_TIMEOUT always fails links, it's
pretty inconvenient and unnecessary limits chaining after it to hard
linking, which is far from ideal, e.g. doesn't pair well with timeout
cancellation. Add a flag forcing it to not fail links on -ETIME.

Signed-off-by: Pavel Begunkov &lt;asml.silence@gmail.com&gt;
Link: https://lore.kernel.org/r/17c7ec0fb7a6113cc6be8cdaedcada0ba836ac0e.1633199723.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io-wq: provide IO_WQ_* constants for IORING_REGISTER_IOWQ_MAX_WORKERS arg items</title>
<updated>2021-09-13T16:38:13Z</updated>
<author>
<name>Eugene Syromiatnikov</name>
<email>esyr@redhat.com</email>
</author>
<published>2021-09-13T15:44:15Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=dd47c104533dedb90434a3f142e94a671ac623a6'/>
<id>urn:sha1:dd47c104533dedb90434a3f142e94a671ac623a6</id>
<content type='text'>
The items passed in the array pointed by the arg parameter
of IORING_REGISTER_IOWQ_MAX_WORKERS io_uring_register operation
carry certain semantics: they refer to different io-wq worker categories;
provide IO_WQ_* constants in the UAPI, so these categories can be referenced
in the user space code.

Suggested-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Complements: 2e480058ddc21ec5 ("io-wq: provide a way to limit max number of workers")
Signed-off-by: Eugene Syromiatnikov &lt;esyr@redhat.com&gt;
Link: https://lore.kernel.org/r/20210913154415.GA12890@asgard.redhat.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>Merge tag 'for-5.15/io_uring-vfs-2021-08-30' of git://git.kernel.dk/linux-block</title>
<updated>2021-08-31T02:39:59Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2021-08-31T02:39:59Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b91db6a0b52e019b6bdabea3f1dbe36d85c7e52c'/>
<id>urn:sha1:b91db6a0b52e019b6bdabea3f1dbe36d85c7e52c</id>
<content type='text'>
Pull io_uring mkdirat/symlinkat/linkat support from Jens Axboe:
 "This adds io_uring support for mkdirat, symlinkat, and linkat"

* tag 'for-5.15/io_uring-vfs-2021-08-30' of git://git.kernel.dk/linux-block:
  io_uring: add support for IORING_OP_LINKAT
  io_uring: add support for IORING_OP_SYMLINKAT
  io_uring: add support for IORING_OP_MKDIRAT
  namei: update do_*() helpers to return ints
  namei: make do_linkat() take struct filename
  namei: add getname_uflags()
  namei: make do_symlinkat() take struct filename
  namei: make do_mknodat() take struct filename
  namei: make do_mkdirat() take struct filename
  namei: change filename_parentat() calling conventions
  namei: ignore ERR/NULL names in putname()
</content>
</entry>
<entry>
<title>io_uring: allow updating linked timeouts</title>
<updated>2021-08-29T22:12:21Z</updated>
<author>
<name>Pavel Begunkov</name>
<email>asml.silence@gmail.com</email>
</author>
<published>2021-08-29T01:54:39Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f1042b6ccb887f07301f6b096b3d0cfcf9189323'/>
<id>urn:sha1:f1042b6ccb887f07301f6b096b3d0cfcf9189323</id>
<content type='text'>
We allow updating normal timeouts, add support for adjusting timings of
linked timeouts as well.

Reported-by: Victor Stewart &lt;v@nametag.social&gt;
Signed-off-by: Pavel Begunkov &lt;asml.silence@gmail.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
</feed>
