user/sven/linux.git/drivers/vhost, branch v4.4.266

vhost_net: fix ubuf refcount incorrectly when sendmsg fails

2021-01-12T18:47:54Z

[ Upstream commit 01e31bea7e622f1890c274f4aaaaf8bccd296aa5 ] Currently the vhost_zerocopy_callback() maybe be called to decrease the refcount when sendmsg fails in tun. The error handling in vhost handle_tx_zerocopy() will try to decrease the same refcount again. This is wrong. To fix this issue, we only call vhost_net_ubuf_put() when vq->heads[nvq->desc].len == VHOST_DMA_IN_PROGRESS. Fixes: bab632d69ee4 ("vhost: vhost TX zero-copy support") Signed-off-by: Yunjian Wang Acked-by: Willem de Bruijn Acked-by: Michael S. Tsirkin Acked-by: Jason Wang Link: https://lore.kernel.org/r/1609207308-20544-1-git-send-email-wangyunjian@huawei.com Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman

vringh: fix __vringh_iov() when riov and wiov are different

2020-11-10T09:22:17Z

commit 5745bcfbbf89b158416075374254d3c013488f21 upstream. If riov and wiov are both defined and they point to different objects, only riov is initialized. If the wiov is not initialized by the caller, the function fails returning -EINVAL and printing "Readable desc 0x... after writable" error message. This issue happens when descriptors have both readable and writable buffers (eg. virtio-blk devices has virtio_blk_outhdr in the readable buffer and status as last byte of writable buffer) and we call __vringh_iov() to get both type of buffers in two different iovecs. Let's replace the 'else if' clause with 'if' to initialize both riov and wiov if they are not NULL. As checkpatch pointed out, we also avoid crashing the kernel when riov and wiov are both NULL, replacing BUG() with WARN_ON() and returning -EINVAL. Fixes: f87d0fbb5798 ("vringh: host-side implementation of virtio rings.") Cc: stable@vger.kernel.org Signed-off-by: Stefano Garzarella Link: https://lore.kernel.org/r/20201008204256.162292-1-sgarzare@redhat.com Signed-off-by: Michael S. Tsirkin Signed-off-by: Greg Kroah-Hartman

vhost: Check docket sk_family instead of call getname

2020-04-02T17:02:34Z

[ Upstream commit 42d84c8490f9f0931786f1623191fcab397c3d64 ] Doing so, we save one call to get data we already have in the struct. Also, since there is no guarantee that getname use sockaddr_ll parameter beyond its size, we add a little bit of security here. It should do not do beyond MAX_ADDR_LEN, but syzbot found that ax25_getname writes more (72 bytes, the size of full_sockaddr_ax25, versus 20 + 32 bytes of sockaddr_ll + MAX_ADDR_LEN in syzbot repro). Fixes: 3a4d5c94e9593 ("vhost_net: a kernel-level virtio server") Reported-by: syzbot+f2a62d07a5198c819c7b@syzkaller.appspotmail.com Signed-off-by: Eugenio Pérez Acked-by: Michael S. Tsirkin Signed-off-by: David S. Miller Signed-off-by: Sasha Levin

vhost: make sure log_num < in_num

2019-09-16T06:13:36Z

commit 060423bfdee3f8bc6e2c1bac97de24d5415e2bc4 upstream. The code assumes log_num < in_num everywhere, and that is true as long as in_num is incremented by descriptor iov count, and log_num by 1. However this breaks if there's a zero sized descriptor. As a result, if a malicious guest creates a vring desc with desc.len = 0, it may cause the host kernel to crash by overflowing the log array. This bug can be triggered during the VM migration. There's no need to log when desc.len = 0, so just don't increment log_num in this case. Fixes: 3a4d5c94e959 ("vhost_net: a kernel-level virtio server") Cc: stable@vger.kernel.org Reviewed-by: Lidong Chen Signed-off-by: ruippan Signed-off-by: yongduan Acked-by: Michael S. Tsirkin Reviewed-by: Tyler Hicks Signed-off-by: Michael S. Tsirkin Signed-off-by: Greg Kroah-Hartman

vhost/test: fix build for vhost test

2019-09-16T06:13:35Z

commit 264b563b8675771834419057cbe076c1a41fb666 upstream. Since vhost_exceeds_weight() was introduced, callers need to specify the packet weight and byte weight in vhost_dev_init(). Note that, the packet weight isn't counted in this patch to keep the original behavior unchanged. Fixes: e82b9b0727ff ("vhost: introduce vhost_exceeds_weight()") Cc: stable@vger.kernel.org Signed-off-by: Tiwei Bie Signed-off-by: Michael S. Tsirkin Acked-by: Jason Wang Signed-off-by: Greg Kroah-Hartman

vhost: scsi: add weight support

2019-09-06T08:18:12Z

commit c1ea02f15ab5efb3e93fc3144d895410bf79fcf2 upstream. This patch will check the weight and exit the loop if we exceeds the weight. This is useful for preventing scsi kthread from hogging cpu which is guest triggerable. This addresses CVE-2019-3900. Cc: Paolo Bonzini Cc: Stefan Hajnoczi Fixes: 057cbf49a1f0 ("tcm_vhost: Initial merge for vhost level target fabric driver") Signed-off-by: Jason Wang Reviewed-by: Stefan Hajnoczi Signed-off-by: Michael S. Tsirkin Reviewed-by: Stefan Hajnoczi [bwh: Backported to 4.4: - Drop changes in vhost_scsi_ctl_handle_vq() - Adjust context] Signed-off-by: Ben Hutchings Signed-off-by: Sasha Levin

vhost_net: fix possible infinite loop

2019-09-06T08:18:12Z

commit e2412c07f8f3040593dfb88207865a3cd58680c0 upstream. When the rx buffer is too small for a packet, we will discard the vq descriptor and retry it for the next packet: while ((sock_len = vhost_net_rx_peek_head_len(net, sock->sk, &busyloop_intr))) { ... /* On overrun, truncate and discard */ if (unlikely(headcount > UIO_MAXIOV)) { iov_iter_init(&msg.msg_iter, READ, vq->iov, 1, 1); err = sock->ops->recvmsg(sock, &msg, 1, MSG_DONTWAIT | MSG_TRUNC); pr_debug("Discarded rx packet: len %zd\n", sock_len); continue; } ... } This makes it possible to trigger a infinite while..continue loop through the co-opreation of two VMs like: 1) Malicious VM1 allocate 1 byte rx buffer and try to slow down the vhost process as much as possible e.g using indirect descriptors or other. 2) Malicious VM2 generate packets to VM1 as fast as possible Fixing this by checking against weight at the end of RX and TX loop. This also eliminate other similar cases when: - userspace is consuming the packets in the meanwhile - theoretical TOCTOU attack if guest moving avail index back and forth to hit the continue after vhost find guest just add new buffers This addresses CVE-2019-3900. Fixes: d8316f3991d20 ("vhost: fix total length when packets are too short") Fixes: 3a4d5c94e9593 ("vhost_net: a kernel-level virtio server") Signed-off-by: Jason Wang Reviewed-by: Stefan Hajnoczi Signed-off-by: Michael S. Tsirkin [bwh: Backported to 4.4: - Both Tx modes are handled in one loop in handle_tx() - Adjust context] Signed-off-by: Ben Hutchings Signed-off-by: Sasha Levin

vhost: introduce vhost_exceeds_weight()

2019-09-06T08:18:12Z

commit e82b9b0727ff6d665fff2d326162b460dded554d upstream. We used to have vhost_exceeds_weight() for vhost-net to: - prevent vhost kthread from hogging the cpu - balance the time spent between TX and RX This function could be useful for vsock and scsi as well. So move it to vhost.c. Device must specify a weight which counts the number of requests, or it can also specific a byte_weight which counts the number of bytes that has been processed. Signed-off-by: Jason Wang Reviewed-by: Stefan Hajnoczi Signed-off-by: Michael S. Tsirkin [bwh: Backported to 4.4: - Drop changes to vhost_vsock - In vhost_net, both Tx modes are handled in one loop in handle_tx() - Adjust context] Signed-off-by: Ben Hutchings Signed-off-by: Sasha Levin

vhost_net: introduce vhost_exceeds_weight()

2019-09-06T08:18:12Z

commit 272f35cba53d088085e5952fd81d7a133ab90789 upstream. Signed-off-by: Jason Wang Signed-off-by: David S. Miller [bwh: Backported to 4.4: adjust context] Signed-off-by: Ben Hutchings Signed-off-by: Sasha Levin

vhost_net: use packet weight for rx handler, too

2019-09-06T08:18:11Z

commit db688c24eada63b1efe6d0d7d835e5c3bdd71fd3 upstream. Similar to commit a2ac99905f1e ("vhost-net: set packet weight of tx polling to 2 * vq size"), we need a packet-based limit for handler_rx, too - elsewhere, under rx flood with small packets, tx can be delayed for a very long time, even without busypolling. The pkt limit applied to handle_rx must be the same applied by handle_tx, or we will get unfair scheduling between rx and tx. Tying such limit to the queue length makes it less effective for large queue length values and can introduce large process scheduler latencies, so a constant valued is used - likewise the existing bytes limit. The selected limit has been validated with PVP[1] performance test with different queue sizes: queue size 256 512 1024 baseline 366 354 362 weight 128 715 723 670 weight 256 740 745 733 weight 512 600 460 583 weight 1024 423 427 418 A packet weight of 256 gives peek performances in under all the tested scenarios. No measurable regression in unidirectional performance tests has been detected. [1] https://developers.redhat.com/blog/2017/06/05/measuring-and-comparing-open-vswitch-performance/ Signed-off-by: Paolo Abeni Acked-by: Jason Wang Signed-off-by: David S. Miller Signed-off-by: Ben Hutchings Signed-off-by: Sasha Levin