user/sven/linux.git/drivers/vhost, branch v6.6.54

vhost_vdpa: assign irq bypass producer token correctly

2024-10-04T14:29:40Z

[ Upstream commit 02e9e9366fefe461719da5d173385b6685f70319 ]

We used to call irq_bypass_unregister_producer() in vhost_vdpa_setup_vq_irq(), which is problematic as we don't know whether the token pointer is still valid. Actually, we use the eventfd_ctx as the token, so the life cycle of the token should be bound to VHOST_SET_VRING_CALL instead of vhost_vdpa_setup_vq_irq(), which could be called by set_status().

Fix this by setting up the irq bypass producer's token when handling VHOST_SET_VRING_CALL and unregistering the producer before calling vhost_vring_ioctl(), to prevent a possible use after free, as the eventfd could have been released in vhost_vring_ioctl(). Such registering and unregistering will only be done if DRIVER_OK is set.

Reported-by: Dragos Tatulea
Tested-by: Dragos Tatulea
Reviewed-by: Dragos Tatulea
Fixes: 2cf1ba9a4d15 ("vhost_vdpa: implement IRQ offloading in vhost_vdpa")
Signed-off-by: Jason Wang
Message-Id: <20240816031900.18013-1-jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin
Signed-off-by: Sasha Levin
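
The ordering described in this commit can be illustrated with a small userspace model. This is only a sketch under stated assumptions: every `demo_*` name is hypothetical, `free()` stands in for releasing the old eventfd, and none of this is the kernel's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an eventfd context. */
struct demo_eventfd { int id; };

struct demo_vq {
    struct demo_eventfd *call_ctx; /* owned by the vring */
    const void *token;             /* irq bypass producer token */
    bool registered;               /* is the producer registered? */
};

static void demo_unregister(struct demo_vq *vq)
{
    vq->registered = false;
}

static void demo_register(struct demo_vq *vq)
{
    /* Without a token there is nothing to register. */
    if (vq->token)
        vq->registered = true;
}

/* Models the generic vring ioctl: it may release the old eventfd. */
static void demo_vring_set_call(struct demo_vq *vq, struct demo_eventfd *ctx)
{
    free(vq->call_ctx);
    vq->call_ctx = ctx;
}

/* Models the VHOST_SET_VRING_CALL handling order from the commit text. */
static void demo_set_vring_call(struct demo_vq *vq, struct demo_eventfd *ctx,
                                bool driver_ok)
{
    assert(vq != NULL);
    if (driver_ok)
        demo_unregister(vq);       /* stop using the old token first */
    demo_vring_set_call(vq, ctx);  /* old eventfd may be gone after this */
    vq->token = vq->call_ctx;      /* token now follows the new eventfd */
    if (driver_ok)
        demo_register(vq);
}
```

The point of the ordering is that the producer is never left registered with a token whose eventfd has already been released.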

vhost-vdpa: switch to use vmf_insert_pfn() in the fault handler

2024-08-14T11:58:55Z

commit 0823dc64586ba5ea13a7d200a5d33e4c5fa45950 upstream.

remap_pfn_range() should not be called in the fault handler, as it may change vma->vm_flags, which may trigger a lockdep warning since the vma write lock is not held. Actually, there's no need to modify vma->vm_flags, as it has already been set in mmap(). So this patch switches to vmf_insert_pfn() instead.

Reported-by: Dragos Tatulea
Tested-by: Dragos Tatulea
Fixes: ddd89d0a059d ("vhost_vdpa: support doorbell mapping via mmap")
Cc: stable@vger.kernel.org
Signed-off-by: Jason Wang
Message-Id: <20240701033159.18133-1-jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin
Reviewed-by: Michal Kubiak
Signed-off-by: Greg Kroah-Hartman

vhost/vsock: always initialize seqpacket_allow

2024-08-03T06:54:01Z

[ Upstream commit 1e1fdcbdde3b7663e5d8faeb2245b9b151417d22 ]

There are two issues around seqpacket_allow:
1. seqpacket_allow is not initialized when the socket is created. Thus, if features are never set, it will be read uninitialized.
2. If VIRTIO_VSOCK_F_SEQPACKET is set and then cleared, seqpacket_allow will not be cleared appropriately (existing apps I know about don't usually do this, but it's legal and there's no way to be sure no one relies on this).

To fix:
- initialize seqpacket_allow after allocation
- set it unconditionally in set_features

Reported-by: syzbot+6c21aeb59d0e82eb2782@syzkaller.appspotmail.com
Reported-by: Jeongjun Park
Fixes: ced7b713711f ("vhost/vsock: support SEQPACKET for transport")
Tested-by: Arseny Krasnov
Cc: David S. Miller
Cc: Stefan Hajnoczi
Message-ID: <20240422100010-mutt-send-email-mst@kernel.org>
Signed-off-by: Michael S. Tsirkin
Acked-by: Jason Wang
Reviewed-by: Stefano Garzarella
Reviewed-by: Eugenio Pérez
Acked-by: Jakub Kicinski
Signed-off-by: Sasha Levin
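
A minimal userspace sketch of the two fixes, assuming a hypothetical `demo_vsock` structure and a made-up feature bit `DEMO_F_SEQPACKET` (neither is the kernel's definition): the flag is initialized right after allocation, and set_features assigns it unconditionally so that a cleared feature bit also clears the flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical feature bit; the real bit value is not taken from the source. */
#define DEMO_F_SEQPACKET (1ULL << 1)

struct demo_vsock {
    bool seqpacket_allow;
};

/* malloc() returns uninitialized memory, so set the flag explicitly. */
static struct demo_vsock *demo_vsock_open(void)
{
    struct demo_vsock *vsock = malloc(sizeof(*vsock));

    if (!vsock)
        return NULL;
    vsock->seqpacket_allow = false;
    return vsock;
}

/* Assign unconditionally: setting and later clearing the bit both take effect. */
static void demo_set_features(struct demo_vsock *vsock, uint64_t features)
{
    assert(vsock != NULL);
    vsock->seqpacket_allow = (features & DEMO_F_SEQPACKET) != 0;
}
```

Writing the assignment as a plain boolean expression, rather than an `if` that only ever sets the flag, is what makes the set-then-clear sequence behave correctly.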

vhost-scsi: Handle vhost_vq_work_queue failures for events

2024-07-11T10:49:20Z

[ Upstream commit b1b2ce58ed23c5d56e0ab299a5271ac01f95b75c ]

Currently, we can try to queue an event's work before the vhost_task is created. When this happens, we just drop it in vhost_scsi_do_plug before even calling vhost_vq_work_queue. During a device shutdown we do the same thing after vhost_scsi_clear_endpoint has cleared the backends.

In the next patches we will be able to kill the vhost_task before we have cleared the endpoint. In that case, vhost_vq_work_queue can fail and we will leak the event's memory. This patch handles the failure by just freeing the event. This is safe to do, because vhost_vq_work_queue will only return failure for us when the vhost_task is killed, and so userspace would not be able to handle events even if we sent them.

Signed-off-by: Mike Christie
Message-Id: <20240316004707.45557-2-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin
Signed-off-by: Sasha Levin
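
The "free the event when queuing fails" pattern can be sketched in plain C. All names here (`demo_event`, `demo_worker`, `demo_work_queue`, `demo_send_evt`) are hypothetical stand-ins, and the queue is a simple linked list rather than the kernel's work list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct demo_event {
    int reason;
    struct demo_event *next;
};

struct demo_worker {
    bool killed;              /* set once the worker task has been killed */
    struct demo_event *head;  /* queued events, owned by the worker */
};

/* Fails only when the worker has been killed; on success takes ownership. */
static bool demo_work_queue(struct demo_worker *w, struct demo_event *evt)
{
    if (w->killed)
        return false;
    evt->next = w->head;
    w->head = evt;
    return true;
}

/* Allocate an event and queue it; free it if the queue rejects it. */
static bool demo_send_evt(struct demo_worker *w, int reason)
{
    struct demo_event *evt;

    assert(w != NULL);
    evt = malloc(sizeof(*evt));
    if (!evt)
        return false;
    evt->reason = reason;
    evt->next = NULL;
    if (!demo_work_queue(w, evt)) {
        free(evt);            /* nobody else will ever see this event */
        return false;
    }
    return true;
}
```

Ownership passes to the worker only on success, so the caller is the only party that can free the event on the failure path.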

vhost_task: Handle SIGKILL by flushing work and exiting

2024-07-11T10:49:10Z

[ Upstream commit db5247d9bf5c6ade9fd70b4e4897441e0269b233 ]

Instead of lingering until the device is closed, this has us handle SIGKILL by:
1. marking the worker as killed so we no longer try to use it with new virtqueues and new flush operations.
2. clearing the virtqueue-to-worker mapping so no new works are queued.
3. running all the existing works.

Suggested-by: Edward Adam Davis
Reported-and-tested-by: syzbot+98edc2df894917b3431f@syzkaller.appspotmail.com
Message-Id:
Signed-off-by: Mike Christie
Message-Id: <20240316004707.45557-9-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin
Signed-off-by: Sasha Levin
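
The three steps can be modelled in a few lines of single-threaded C. This is a sketch only: the `demo_*` types are hypothetical, there is no locking, and "running" a work is reduced to setting a `done` flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_work {
    struct demo_work *next;
    bool done;
};

struct demo_worker {
    bool killed;
    struct demo_work *head;   /* works queued but not yet run */
};

struct demo_vq {
    struct demo_worker *worker; /* NULL means no worker is attached */
};

/* Queuing goes through the vq-to-worker mapping and fails without one. */
static bool demo_queue(struct demo_vq *vq, struct demo_work *work)
{
    struct demo_worker *w = vq->worker;

    if (!w)
        return false;
    work->done = false;
    work->next = w->head;
    w->head = work;
    return true;
}

static void demo_handle_sigkill(struct demo_worker *w,
                                struct demo_vq *vqs, size_t nvqs)
{
    assert(w != NULL);

    /* 1. Mark the worker as killed. */
    w->killed = true;

    /* 2. Detach it from every virtqueue so nothing new is queued. */
    for (size_t i = 0; i < nvqs; i++)
        if (vqs[i].worker == w)
            vqs[i].worker = NULL;

    /* 3. Run everything that was already queued. */
    while (w->head) {
        struct demo_work *work = w->head;

        w->head = work->next;
        work->next = NULL;
        work->done = true;
    }
}
```

Detaching before draining means the list cannot grow while it is being emptied, so the drain loop is guaranteed to terminate.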

vhost: Release worker mutex during flushes

2024-07-11T10:49:10Z

[ Upstream commit ba704ff4e142fd3cfaf3379dd3b3b946754e06e3 ]

In the next patches where the worker can be killed while in use, we need to be able to take the worker mutex and kill queued works for new IO and flushes, and set some new flags to prevent new __vhost_vq_attach_worker calls from swapping in/out killed workers.

If we are holding the worker mutex during a flush and the flush's work is still in the queue, the worker code that will handle the SIGKILL cleanup won't be able to take the mutex and perform its cleanup. So this patch has us drop the worker mutex while waiting for the flush to complete.

Signed-off-by: Mike Christie
Message-Id: <20240316004707.45557-8-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin
Signed-off-by: Sasha Levin

vhost: Use virtqueue mutex for swapping worker

2024-07-11T10:49:10Z

[ Upstream commit 34cf9ba5f00a222dddd9fc71de7c68fdaac7fb97 ]

__vhost_vq_attach_worker uses the vhost_dev mutex to serialize the swapping of a virtqueue's worker. This was done for simplicity because we are already holding that mutex.

In the next patches where the worker can be killed while in use, we need finer-grained locking because some drivers will hold the vhost_dev mutex while flushing. However, in the SIGKILL handler in the next patches, we will need to be able to swap workers (set the current one to NULL), kill queued works and stop new flushes while flushes are in progress.

To prepare us, this has us use the virtqueue mutex for swapping workers instead of the vhost_dev one.

Signed-off-by: Mike Christie
Message-Id: <20240316004707.45557-7-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin
Signed-off-by: Sasha Levin

vhost: Add smp_rmb() in vhost_enable_notify()

2024-04-17T09:19:35Z

commit df9ace7647d4123209395bb9967e998d5758c645 upstream.

A smp_rmb() has been missed in vhost_enable_notify(), inspired by Will. Otherwise, it's not ensured that the available ring entries pushed by the guest can be observed by vhost in time, leading to stale available ring entries fetched by vhost in vhost_get_vq_desc(), as reported by Yihuang Yu on NVidia's grace-hopper (ARM64) platform.

  /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64      \
  -accel kvm -machine virt,gic-version=host -cpu host          \
  -smp maxcpus=1,cpus=1,sockets=1,clusters=1,cores=1,threads=1 \
  -m 4096M,slots=16,maxmem=64G                                 \
  -object memory-backend-ram,id=mem0,size=4096M                \
   :                                                           \
  -netdev tap,id=vnet0,vhost=true                              \
  -device virtio-net-pci,bus=pcie.8,netdev=vnet0,mac=52:54:00:f1:26:b0
   :
  guest# netperf -H 10.26.1.81 -l 60 -C -c -t UDP_STREAM
  virtio_net virtio0: output.0:id 100 is not a head!

Add the missed smp_rmb() in vhost_enable_notify(). When it returns true, it means there are still pending tx buffers. Since it might read indices, it can still bypass the smp_rmb() in vhost_get_vq_desc(). Note that it should be safe until vq->avail_idx is changed by commit d3bb267bbdcb ("vhost: cache avail index in vhost_enable_notify()").

Fixes: d3bb267bbdcb ("vhost: cache avail index in vhost_enable_notify()")
Cc: # v5.18+
Reported-by: Yihuang Yu
Suggested-by: Will Deacon
Signed-off-by: Gavin Shan
Acked-by: Jason Wang
Message-Id: <20240328002149.1141302-3-gshan@redhat.com>
Signed-off-by: Michael S. Tsirkin
Reviewed-by: Stefano Garzarella
Signed-off-by: Greg Kroah-Hartman

vhost: Add smp_rmb() in vhost_vq_avail_empty()

2024-04-17T09:19:35Z

commit 22e1992cf7b034db5325660e98c41ca5afa5f519 upstream.

A smp_rmb() has been missed in vhost_vq_avail_empty(), spotted by Will. Otherwise, it's not ensured that the available ring entries pushed by the guest can be observed by vhost in time, leading to stale available ring entries fetched by vhost in vhost_get_vq_desc(), as reported by Yihuang Yu on NVidia's grace-hopper (ARM64) platform.

  /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64      \
  -accel kvm -machine virt,gic-version=host -cpu host          \
  -smp maxcpus=1,cpus=1,sockets=1,clusters=1,cores=1,threads=1 \
  -m 4096M,slots=16,maxmem=64G                                 \
  -object memory-backend-ram,id=mem0,size=4096M                \
   :                                                           \
  -netdev tap,id=vnet0,vhost=true                              \
  -device virtio-net-pci,bus=pcie.8,netdev=vnet0,mac=52:54:00:f1:26:b0
   :
  guest# netperf -H 10.26.1.81 -l 60 -C -c -t UDP_STREAM
  virtio_net virtio0: output.0:id 100 is not a head!

Add the missed smp_rmb() in vhost_vq_avail_empty(). When tx_can_batch() returns true, it means there are still pending tx buffers. Since it might read indices, it can still bypass the smp_rmb() in vhost_get_vq_desc(). Note that it should be safe until vq->avail_idx is changed by commit 275bf960ac697 ("vhost: better detection of available buffers").

Fixes: 275bf960ac69 ("vhost: better detection of available buffers")
Cc: # v4.11+
Reported-by: Yihuang Yu
Suggested-by: Will Deacon
Signed-off-by: Gavin Shan
Acked-by: Jason Wang
Message-Id: <20240328002149.1141302-2-gshan@redhat.com>
Signed-off-by: Michael S. Tsirkin
Reviewed-by: Stefano Garzarella
Signed-off-by: Greg Kroah-Hartman
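
The ordering problem can be illustrated in userspace with C11 atomics. This is a rough analogue, not the vhost code: the `demo_*` names are hypothetical, and `atomic_thread_fence(memory_order_acquire)` plays the role of smp_rmb() here (it is a somewhat stronger barrier). The fence sits between reading the published index and reading the ring entry, so a consumer that sees the new index also sees the entry written before it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_RING_SIZE 8

struct demo_ring {
    _Atomic uint16_t avail_idx;     /* published by the producer */
    uint16_t ring[DEMO_RING_SIZE];  /* entries, written before avail_idx */
};

struct demo_consumer {
    uint16_t cached_avail_idx;      /* last value read from avail_idx */
    uint16_t last_avail_idx;        /* next entry to consume */
};

static void demo_ring_init(struct demo_ring *r)
{
    atomic_init(&r->avail_idx, 0);
}

/* Producer: write the entry, then publish the index with release order. */
static void demo_publish(struct demo_ring *r, uint16_t head)
{
    uint16_t idx = atomic_load_explicit(&r->avail_idx, memory_order_relaxed);

    r->ring[idx % DEMO_RING_SIZE] = head;
    atomic_store_explicit(&r->avail_idx, (uint16_t)(idx + 1),
                          memory_order_release);
}

/* Consumer: refresh the cached index; fence before entries may be read. */
static bool demo_avail_empty(struct demo_ring *r, struct demo_consumer *c)
{
    if (c->cached_avail_idx != c->last_avail_idx)
        return false;

    c->cached_avail_idx =
        atomic_load_explicit(&r->avail_idx, memory_order_relaxed);
    if (c->cached_avail_idx != c->last_avail_idx) {
        /* Without this fence the entry read below could be stale. */
        atomic_thread_fence(memory_order_acquire);
        return false;
    }
    return true;
}

/* Consumer: read the next entry; only valid after a non-empty check. */
static uint16_t demo_pop(struct demo_ring *r, struct demo_consumer *c)
{
    assert(c->cached_avail_idx != c->last_avail_idx);
    return r->ring[c->last_avail_idx++ % DEMO_RING_SIZE];
}
```

The barrier is needed in the emptiness check itself because it updates the cached index: a later reader that trusts the cache skips the index read, and with it any barrier that would normally follow that read.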

virtio/vsock: send credit update during setting SO_RCVLOWAT

2024-01-25T23:35:26Z

[ Upstream commit 0fe1798968115488c0c02f4633032a015b1faf97 ]

Send a credit update message when SO_RCVLOWAT is updated and it is bigger than the number of bytes in the rx queue. This is needed because 'poll()' will wait until the number of bytes in the rx queue is not smaller than SO_RCVLOWAT, so kick the sender to send more data. Otherwise a mutual hang between tx and rx is possible: the sender waits for free space and the receiver waits for data in 'poll()'.

Rename the 'set_rcvlowat' callback to 'notify_set_rcvlowat' and set 'sk->sk_rcvlowat' in only one place (i.e. 'vsock_set_rcvlowat'), so the transport doesn't need to do it.

Fixes: b89d882dc9fc ("vsock/virtio: reduce credit update messages")
Signed-off-by: Arseniy Krasnov
Reviewed-by: Stefano Garzarella
Acked-by: Michael S. Tsirkin
Signed-off-by: David S. Miller
Signed-off-by: Sasha Levin
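
A simplified model of the split described above, with hypothetical `demo_*` names and a counter in place of actually sending a credit update packet: the transport hook only decides whether to kick the sender, and the generic setter is the single place that stores the threshold.

```c
#include <assert.h>
#include <stddef.h>

struct demo_sock {
    size_t rx_bytes;               /* bytes currently in the rx queue */
    int rcvlowat;                  /* threshold that poll() waits for */
    unsigned credit_updates_sent;  /* stands in for sending a packet */
};

/* Transport hook: kick the sender if the new threshold exceeds queued data. */
static void demo_notify_set_rcvlowat(struct demo_sock *sk, int val)
{
    if (val > 0 && (size_t)val > sk->rx_bytes)
        sk->credit_updates_sent++;
}

/* Generic code: notify the transport, then store the value in one place. */
static void demo_set_rcvlowat(struct demo_sock *sk, int val)
{
    assert(sk != NULL);
    demo_notify_set_rcvlowat(sk, val);
    sk->rcvlowat = val;
}
```

With 100 bytes queued, raising the threshold to 50 sends nothing, while raising it to 200 triggers one credit update so the sender learns there is free space.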