user/sven/linux.git/virt/kvm, branch v3.16.66

kvm: Disallow wraparound in kvm_gfn_to_hva_cache_init

2019-04-04T15:14:04Z

commit f1b9dd5eb86cec1fcf66aad17e7701d98d024a9a upstream. Previously, in the case where (gpa + len) wrapped around, the entire region was not validated, as the comment claimed. It doesn't actually seem that wraparound should be allowed here at all. Furthermore, since some callers don't check the return code from this function, it seems prudent to clear ghc->memslot in the event of an error. Fixes: 8f964525a121f ("KVM: Allow cross page reads and writes from cached translations.") Reported-by: Cfir Cohen Signed-off-by: Jim Mattson Reviewed-by: Cfir Cohen Reviewed-by: Marc Orr Cc: Andrew Honig Signed-off-by: Radim Krčmář [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings

kvm: fix kvm_ioctl_create_device() reference counting (CVE-2019-6974)

2019-03-25T17:32:35Z

commit cfa39381173d5f969daf43582c95ad679189cbc9 upstream. kvm_ioctl_create_device() does the following: 1. creates a device that holds a reference to the VM object (with a borrowed reference, the VM's refcount has not been bumped yet) 2. initializes the device 3. transfers the reference to the device to the caller's file descriptor table 4. calls kvm_get_kvm() to turn the borrowed reference to the VM into a real reference The ownership transfer in step 3 must not happen before the reference to the VM becomes a proper, non-borrowed reference, which only happens in step 4. After step 3, an attacker can close the file descriptor and drop the borrowed reference, which can cause the refcount of the kvm object to drop to zero. This means that we need to grab a reference for the device before anon_inode_getfd(), otherwise the VM can disappear from under us. Fixes: 852b6d57dc7f ("kvm: add device control API") Signed-off-by: Jann Horn Signed-off-by: Paolo Bonzini Signed-off-by: Ben Hutchings

KVM: use after free in kvm_ioctl_create_device()

2019-03-25T17:32:35Z

commit a0f1d21c1ccb1da66629627a74059dd7f5ac9c61 upstream. We should move the ops->destroy(dev) after the list_del(&dev->vm_node) so that we don't use "dev" after freeing it. Fixes: a28ebea2adc4 ("KVM: Protect device ops->create and list_add with kvm->lock") Signed-off-by: Dan Carpenter Reviewed-by: David Hildenbrand Signed-off-by: Radim Krčmář Signed-off-by: Ben Hutchings

KVM: Protect device ops->create and list_add with kvm->lock

2019-03-25T17:32:35Z

commit a28ebea2adc4a2bef5989a5a181ec238f59fbcad upstream. KVM devices were manipulating list data structures without any form of synchronization, and some implementations of the create operations also suffered from a lack of synchronization. Now when we've split the xics create operation into create and init, we can hold the kvm->lock mutex while calling the create operation and when manipulating the devices list. The error path in the generic code gets slightly ugly because we have to take the mutex again and delete the device from the list, but holding the mutex during anon_inode_getfd or releasing/locking the mutex in the common non-error path seemed wrong. Signed-off-by: Christoffer Dall Reviewed-by: Paolo Bonzini Acked-by: Christian Borntraeger Signed-off-by: Radim Krčmář [bwh: Backported to 3.16: - Drop change to a failure path that doesn't exist in kvm_vgic_create() - Adjust filename, context] Signed-off-by: Ben Hutchings

KVM: PPC: Move xics_debugfs_init out of create

2019-03-25T17:32:34Z

commit 023e9fddc3616b005c3753fc1bb6526388cd7a30 upstream. As we are about to hold the kvm->lock during the create operation on KVM devices, we should move the call to xics_debugfs_init into its own function, since holding a mutex over extended amounts of time might not be a good idea. Introduce an init operation on the kvm_device_ops struct which cannot fail and call this, if configured, after the device has been created. Signed-off-by: Christoffer Dall Reviewed-by: Paolo Bonzini Signed-off-by: Radim Krčmář Signed-off-by: Ben Hutchings

KVM: mmu: Fix overlap between public and private memslots

2018-06-16T21:22:23Z

commit b28676bb8ae4569cced423dc2a88f7cb319d5379 upstream. Reported by syzkaller: pte_list_remove: ffff9714eb1f8078 0->BUG ------------[ cut here ]------------ kernel BUG at arch/x86/kvm/mmu.c:1157! invalid opcode: 0000 [#1] SMP RIP: 0010:pte_list_remove+0x11b/0x120 [kvm] Call Trace: drop_spte+0x83/0xb0 [kvm] mmu_page_zap_pte+0xcc/0xe0 [kvm] kvm_mmu_prepare_zap_page+0x81/0x4a0 [kvm] kvm_mmu_invalidate_zap_all_pages+0x159/0x220 [kvm] kvm_arch_flush_shadow_all+0xe/0x10 [kvm] kvm_mmu_notifier_release+0x6c/0xa0 [kvm] ? kvm_mmu_notifier_release+0x5/0xa0 [kvm] __mmu_notifier_release+0x79/0x110 ? __mmu_notifier_release+0x5/0x110 exit_mmap+0x15a/0x170 ? do_exit+0x281/0xcb0 mmput+0x66/0x160 do_exit+0x2c9/0xcb0 ? __context_tracking_exit.part.5+0x4a/0x150 do_group_exit+0x50/0xd0 SyS_exit_group+0x14/0x20 do_syscall_64+0x73/0x1f0 entry_SYSCALL64_slow_path+0x25/0x25 The reason is that when creates new memslot, there is no guarantee for new memslot not overlap with private memslots. This can be triggered by the following program: #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include long r[16]; int main() { void *p = valloc(0x4000); r[2] = open("/dev/kvm", 0); r[3] = ioctl(r[2], KVM_CREATE_VM, 0x0ul); uint64_t addr = 0xf000; ioctl(r[3], KVM_SET_IDENTITY_MAP_ADDR, &addr); r[6] = ioctl(r[3], KVM_CREATE_VCPU, 0x0ul); ioctl(r[3], KVM_SET_TSS_ADDR, 0x0ul); ioctl(r[6], KVM_RUN, 0); ioctl(r[6], KVM_RUN, 0); struct kvm_userspace_memory_region mr = { .slot = 0, .flags = KVM_MEM_LOG_DIRTY_PAGES, .guest_phys_addr = 0xf000, .memory_size = 0x4000, .userspace_addr = (uintptr_t) p }; ioctl(r[3], KVM_SET_USER_MEMORY_REGION, &mr); return 0; } This patch fixes the bug by not adding a new memslot even if it overlaps with private memslots. Reported-by: Dmitry Vyukov Cc: Paolo Bonzini Cc: Radim Krčmář Cc: Dmitry Vyukov Cc: Eric Biggers Signed-off-by: Wanpeng Li [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings

vfio: New external user group/file match

2017-10-12T14:27:55Z

commit 5d6dee80a1e94cc284d03e06d930e60e8d3ecf7d upstream. At the point where the kvm-vfio pseudo device wants to release its vfio group reference, we can't always acquire a new reference to make that happen. The group can be in a state where we wouldn't allow a new reference to be added. This new helper function allows a caller to match a file to a group to facilitate this. Given a file and group, report if they match. Thus the caller needs to already have a group reference to match to the file. This allows the deletion of a group without acquiring a new reference. Signed-off-by: Alex Williamson Reviewed-by: Eric Auger Reviewed-by: Paolo Bonzini Tested-by: Eric Auger Signed-off-by: Ben Hutchings

KVM: kvm_io_bus_unregister_dev() should never fail

2017-07-18T17:40:22Z

commit 90db10434b163e46da413d34db8d0e77404cc645 upstream. No caller currently checks the return value of kvm_io_bus_unregister_dev(). This is evil, as all callers silently go on freeing their device. A stale reference will remain in the io_bus, getting at least used again, when the iobus gets teared down on kvm_destroy_vm() - leading to use after free errors. There is nothing the callers could do, except retrying over and over again. So let's simply remove the bus altogether, print an error and make sure no one can access this broken bus again (returning -ENOMEM on any attempt to access it). Fixes: e93f8a0f821e ("KVM: convert io_bus to SRCU") Reported-by: Dmitry Vyukov Reviewed-by: Cornelia Huck Signed-off-by: David Hildenbrand Signed-off-by: Paolo Bonzini [bwh: Backported to 3.16: - Drop changes to kvm_io_bus_get_dev() - Adjust context] Signed-off-by: Ben Hutchings

KVM: x86: clear bus pointer when destroyed

2017-07-18T17:40:22Z

commit df630b8c1e851b5e265dc2ca9c87222e342c093b upstream. When releasing the bus, let's clear the bus pointers to mark it out. If any further device unregister happens on this bus, we know that we're done if we found the bus being released already. Signed-off-by: Peter Xu Signed-off-by: Radim Krčmář Signed-off-by: Ben Hutchings

KVM: nVMX: Fix memory corruption when using VMCS shadowing

2016-11-20T01:16:51Z

commit 2f1fe81123f59271bddda673b60116bde9660385 upstream. When freeing the nested resources of a vcpu, there is an assumption that the vcpu's vmcs01 is the current VMCS on the CPU that executes nested_release_vmcs12(). If this assumption is violated, the vcpu's vmcs01 may be made active on multiple CPUs at the same time, in violation of Intel's specification. Moreover, since the vcpu's vmcs01 is not VMCLEARed on every CPU on which it is active, it can linger in a CPU's VMCS cache after it has been freed and potentially repurposed. Subsequent eviction from the CPU's VMCS cache on a capacity miss can result in memory corruption. It is not sufficient for vmx_free_vcpu() to call vmx_load_vmcs01(). If the vcpu in question was last loaded on a different CPU, it must be migrated to the current CPU before calling vmx_load_vmcs01(). Signed-off-by: Jim Mattson Signed-off-by: Paolo Bonzini Signed-off-by: Ben Hutchings