user/sven/linux.git/virt/kvm, branch v3.2.96

KVM: nVMX: Fix memory corruption when using VMCS shadowing

2016-11-20T01:01:26Z

commit 2f1fe81123f59271bddda673b60116bde9660385 upstream. When freeing the nested resources of a vcpu, there is an assumption that the vcpu's vmcs01 is the current VMCS on the CPU that executes nested_release_vmcs12(). If this assumption is violated, the vcpu's vmcs01 may be made active on multiple CPUs at the same time, in violation of Intel's specification. Moreover, since the vcpu's vmcs01 is not VMCLEARed on every CPU on which it is active, it can linger in a CPU's VMCS cache after it has been freed and potentially repurposed. Subsequent eviction from the CPU's VMCS cache on a capacity miss can result in memory corruption. It is not sufficient for vmx_free_vcpu() to call vmx_load_vmcs01(). If the vcpu in question was last loaded on a different CPU, it must be migrated to the current CPU before calling vmx_load_vmcs01(). Signed-off-by: Jim Mattson Signed-off-by: Paolo Bonzini [bwh: Backported to 3.2: vcpu_load() returns void] Signed-off-by: Ben Hutchings

kvm: Fix irq route entries exceeding KVM_MAX_IRQ_ROUTES

2016-08-22T21:37:14Z

commit caf1ff26e1aa178133df68ac3d40815fed2187d9 upstream. These days, we experienced one guest crash with 8 cores and 3 disks, with qemu error logs as bellow: qemu-system-x86_64: /build/qemu-2.0.0/kvm-all.c:984: kvm_irqchip_commit_routes: Assertion `ret == 0' failed. And then we found one patch(bdf026317d) in qemu tree, which said could fix this bug. Execute the following script will reproduce the BUG quickly: irq_affinity.sh ======================================================================== vda_irq_num=25 vdb_irq_num=27 while [ 1 ] do for irq in {1,2,4,8,10,20,40,80} do echo $irq > /proc/irq/$vda_irq_num/smp_affinity echo $irq > /proc/irq/$vdb_irq_num/smp_affinity dd if=/dev/vda of=/dev/zero bs=4K count=100 iflag=direct dd if=/dev/vdb of=/dev/zero bs=4K count=100 iflag=direct done done ======================================================================== The following qemu log is added in the qemu code and is displayed when this bug reproduced: kvm_irqchip_commit_routes: max gsi: 1008, nr_allocated_irq_routes: 1024, irq_routes->nr: 1024, gsi_count: 1024. That's to say when irq_routes->nr == 1024, there are 1024 routing entries, but in the kernel code when routes->nr >= 1024, will just return -EINVAL; The nr is the number of the routing entries which is in of [1 ~ KVM_MAX_IRQ_ROUTES], not the index in [0 ~ KVM_MAX_IRQ_ROUTES - 1]. This patch fix the BUG above. Signed-off-by: Xiubo Li Signed-off-by: Wei Tang Signed-off-by: Zhang Zhuoyu Signed-off-by: Paolo Bonzini [bwh: Backported to 3.2: adjust filename] Signed-off-by: Ben Hutchings

KVM: async_pf: do not warn on page allocation failures

2016-04-01T00:54:34Z

commit d7444794a02ff655eda87e3cc54e86b940e7736f upstream. In async_pf we try to allocate with NOWAIT to get an element quickly or fail. This code also handle failures gracefully. Lets silence potential page allocation failures under load. qemu-system-s39: page allocation failure: order:0,mode:0x2200000 [...] Call Trace: ([<00000000001146b8>] show_trace+0xf8/0x148) [<000000000011476a>] show_stack+0x62/0xe8 [<00000000004a36b8>] dump_stack+0x70/0x98 [<0000000000272c3a>] warn_alloc_failed+0xd2/0x148 [<000000000027709e>] __alloc_pages_nodemask+0x94e/0xb38 [<00000000002cd36a>] new_slab+0x382/0x400 [<00000000002cf7ac>] ___slab_alloc.constprop.30+0x2dc/0x378 [<00000000002d03d0>] kmem_cache_alloc+0x160/0x1d0 [<0000000000133db4>] kvm_setup_async_pf+0x6c/0x198 [<000000000013dee8>] kvm_arch_vcpu_ioctl_run+0xd48/0xd58 [<000000000012fcaa>] kvm_vcpu_ioctl+0x372/0x690 [<00000000002f66f6>] do_vfs_ioctl+0x3be/0x510 [<00000000002f68ec>] SyS_ioctl+0xa4/0xb8 [<0000000000781c5e>] system_call+0xd6/0x264 [<000003ffa24fa06a>] 0x3ffa24fa06a Signed-off-by: Christian Borntraeger Reviewed-by: Dominik Dingel Signed-off-by: Paolo Bonzini Signed-off-by: Ben Hutchings

kvm: fix excessive pages un-pinning in kvm_iommu_map error path.

2014-12-14T16:23:51Z

commit 3d32e4dbe71374a6780eaf51d719d76f9a9bf22f upstream. The third parameter of kvm_unpin_pages() when called from kvm_iommu_map_pages() is wrong, it should be the number of pages to un-pin and not the page size. This error was facilitated with an inconsistent API: kvm_pin_pages() takes a size, but kvn_unpin_pages() takes a number of pages, so fix the problem by matching the two. This was introduced by commit 350b8bd ("kvm: iommu: fix the third parameter of kvm_iommu_put_pages (CVE-2014-3601)"), which fixes the lack of un-pinning for pages intended to be un-pinned (i.e. memory leak) but unfortunately potentially aggravated the number of pages we un-pin that should have stayed pinned. As far as I understand though, the same practical mitigations apply. This issue was found during review of Red Hat 6.6 patches to prepare Ksplice rebootless updates. Thanks to Vegard for his time on a late Friday evening to help me in understanding this code. Fixes: 350b8bd ("kvm: iommu: fix the third parameter of... (CVE-2014-3601)") Cc: stable@vger.kernel.org Signed-off-by: Quentin Casasnovas Signed-off-by: Vegard Nossum Signed-off-by: Jamie Iles Reviewed-by: Sasha Levin Signed-off-by: Paolo Bonzini [bwh: Backported to 3.2: kvm_pin_pages() also takes a struct kvm *kvm param] Signed-off-by: Ben Hutchings

kvm: don't take vcpu mutex for obviously invalid vcpu ioctls

2014-12-14T16:23:44Z

commit 2ea75be3219571d0ec009ce20d9971e54af96e09 upstream. vcpu ioctls can hang the calling thread if issued while a vcpu is running. However, invalid ioctls can happen when userspace tries to probe the kind of file descriptors (e.g. isatty() calls ioctl(TCGETS)); in that case, we know the ioctl is going to be rejected as invalid anyway and we can fail before trying to take the vcpu mutex. This patch does not change functionality, it just makes invalid ioctls fail faster. Signed-off-by: David Matlack Signed-off-by: Paolo Bonzini [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings

kvm: iommu: fix the third parameter of kvm_iommu_put_pages (CVE-2014-3601)

2014-09-13T22:41:45Z

commit 350b8bdd689cd2ab2c67c8a86a0be86cfa0751a7 upstream. The third parameter of kvm_iommu_put_pages is wrong, It should be 'gfn - slot->base_gfn'. By making gfn very large, malicious guest or userspace can cause kvm to go to this error path, and subsequently to pass a huge value as size. Alternatively if gfn is small, then pages would be pinned but never unpinned, causing host memory leak and local DOS. Passing a reasonable but large value could be the most dangerous case, because it would unpin a page that should have stayed pinned, and thus allow the device to DMA into arbitrary memory. However, this cannot happen because of the condition that can trigger the error: - out of memory (where you can't allocate even a single page) should not be possible for the attacker to trigger - when exceeding the iommu's address space, guest pages after gfn will also exceed the iommu's address space, and inside kvm_iommu_put_pages() the iommu_iova_to_phys() will fail. The page thus would not be unpinned at all. Reported-by: Jack Morgenstein Signed-off-by: Michael S. Tsirkin Signed-off-by: Paolo Bonzini Signed-off-by: Ben Hutchings

KVM: async_pf: mm->mm_users can not pin apf->mm

2014-06-09T12:29:03Z

commit 41c22f626254b9dc0376928cae009e73d1b6a49a upstream. get_user_pages(mm) is simply wrong if mm->mm_users == 0 and exit_mmap/etc was already called (or is in progress), mm->mm_count can only pin mm->pgd and mm_struct itself. Change kvm_setup_async_pf/async_pf_execute to inc/dec mm->mm_users. kvm_create_vm/kvm_destroy_vm play with ->mm_count too but this case looks fine at first glance, it seems that this ->mm is only used to verify that current->mm == kvm->mm. Signed-off-by: Oleg Nesterov Signed-off-by: Paolo Bonzini Signed-off-by: Ben Hutchings

kvm: remove .done from struct kvm_async_pf

2014-06-09T12:29:03Z

commit 98fda169290b3b28c0f2db2b8f02290c13da50ef upstream. '.done' is used to mark the completion of 'async_pf_execute()', but 'cancel_work_sync()' returns true when the work was canceled, so we use it instead. Signed-off-by: Radim Krčmář Reviewed-by: Paolo Bonzini Reviewed-by: Gleb Natapov Signed-off-by: Paolo Bonzini Signed-off-by: Ben Hutchings

kvm: free resources after canceling async_pf

2014-06-09T12:29:03Z

commit 28b441e24088081c1e213139d1303b451a34a4f4 upstream. When we cancel 'async_pf_execute()', we should behave as if the work was never scheduled in 'kvm_setup_async_pf()'. Fixes a bug when we can't unload module because the vm wasn't destroyed. Signed-off-by: Radim Krčmář Reviewed-by: Paolo Bonzini Reviewed-by: Gleb Natapov Signed-off-by: Paolo Bonzini Signed-off-by: Ben Hutchings

KVM: return an error code in kvm_vm_ioctl_register_coalesced_mmio()

2014-04-01T23:58:46Z

commit aac5c4226e7136c331ed384c25d5560204da10a0 upstream. If kvm_io_bus_register_dev() fails then it returns success but it should return an error code. I also did a little cleanup like removing an impossible NULL test. Fixes: 2b3c246a682c ('KVM: Make coalesced mmio use a device per zone') Signed-off-by: Dan Carpenter Signed-off-by: Paolo Bonzini Signed-off-by: Ben Hutchings