user/sven/linux.git/virt/kvm, branch v3.18.16

KVM: arm/arm64: check IRQ number on userland injection

2015-05-17T23:11:50Z

[ Upstream commit fd1d0ddf2ae92fb3df42ed476939861806c5d785 ] When userland injects a SPI via the KVM_IRQ_LINE ioctl we currently only check it against a fixed limit, which historically is set to 127. With the new dynamic IRQ allocation the effective limit may actually be smaller (64). So when now a malicious or buggy userland injects a SPI in that range, we spill over on our VGIC bitmaps and bytemaps memory. I could trigger a host kernel NULL pointer dereference with current mainline by injecting some bogus IRQ number from a hacked kvmtool: ----------------- .... DEBUG: kvm_vgic_inject_irq(kvm, cpu=0, irq=114, level=1) DEBUG: vgic_update_irq_pending(kvm, cpu=0, irq=114, level=1) DEBUG: IRQ #114 still in the game, writing to bytemap now... Unable to handle kernel NULL pointer dereference at virtual address 00000000 pgd = ffffffc07652e000 [00000000] *pgd=00000000f658b003, *pud=00000000f658b003, *pmd=0000000000000000 Internal error: Oops: 96000006 [#1] PREEMPT SMP Modules linked in: CPU: 1 PID: 1053 Comm: lkvm-msi-irqinj Not tainted 4.0.0-rc7+ #3027 Hardware name: FVP Base (DT) task: ffffffc0774e9680 ti: ffffffc0765a8000 task.ti: ffffffc0765a8000 PC is at kvm_vgic_inject_irq+0x234/0x310 LR is at kvm_vgic_inject_irq+0x30c/0x310 pc : [] lr : [] pstate: 80000145 ..... So this patch fixes this by checking the SPI number against the actual limit. Also we remove the former legacy hard limit of 127 in the ioctl code. Signed-off-by: Andre Przywara Reviewed-by: Christoffer Dall CC: # 4.0, 3.19, 3.18 [maz: wrap KVM_ARM_IRQ_GIC_MAX with #ifndef __KERNEL__, as suggested by Christopher Covington] Signed-off-by: Marc Zyngier Signed-off-by: Sasha Levin

KVM: use slowpath for cross page cached accesses

2015-05-17T23:11:49Z

[ Upstream commit ca3f0874723fad81d0c701b63ae3a17a408d5f25 ] kvm_write_guest_cached() does not mark all written pages as dirty and code comments in kvm_gfn_to_hva_cache_init() talk about NULL memslot with cross page accesses. Fix all the easy way. The check is '<= 1' to have the same result for 'len = 0' cache anywhere in the page. (nr_pages_needed is 0 on page boundary.) Fixes: 8f964525a121 ("KVM: Allow cross page reads and writes from cached translations.") Signed-off-by: Radim Krčmář Message-Id: <20150408121648.GA3519@potion.brq.redhat.com> Reviewed-by: Wanpeng Li Signed-off-by: Paolo Bonzini Signed-off-by: Sasha Levin

arm/arm64: KVM: Keep elrsr/aisr in sync with software model

2015-05-11T11:07:37Z

commit ae705930fca6322600690df9dc1c7d0516145a93 upstream. There is an interesting bug in the vgic code, which manifests itself when the KVM run loop has a signal pending or needs a vmid generation rollover after having disabled interrupts but before actually switching to the guest. In this case, we flush the vgic as usual, but we sync back the vgic state and exit to userspace before entering the guest. The consequence is that we will be syncing the list registers back to the software model using the GICH_ELRSR and GICH_EISR from the last execution of the guest, potentially overwriting a list register containing an interrupt. This showed up during migration testing where we would capture a state where the VM has masked the arch timer but there were no interrupts, resulting in a hung test. Cc: Marc Zyngier Reported-by: Alex Bennee Signed-off-by: Christoffer Dall Signed-off-by: Alex Bennée Acked-by: Marc Zyngier Signed-off-by: Christoffer Dall Signed-off-by: Shannon Zhao Signed-off-by: Sasha Levin

KVM: arm/arm64: vgic: vgic_init returns -ENODEV when no online vcpu

2015-05-11T11:07:34Z

commit 66b030e48af68fd4c22d343908bc057207a0a31e upstream. To be more explicit on vgic initialization failure, -ENODEV is returned by vgic_init when no online vcpus can be found at init. Signed-off-by: Eric Auger Signed-off-by: Christoffer Dall Signed-off-by: Shannon Zhao Signed-off-by: Sasha Levin

arm/arm64: KVM: Require in-kernel vgic for the arch timers

2015-05-11T11:07:34Z

commit 05971120fca43e0357789a14b3386bb56eef2201 upstream. It is curently possible to run a VM with architected timers support without creating an in-kernel VGIC, which will result in interrupts from the virtual timer going nowhere. To address this issue, move the architected timers initialization to the time when we run a VCPU for the first time, and then only initialize (and enable) the architected timers if we have a properly created and initialized in-kernel VGIC. When injecting interrupts from the virtual timer to the vgic, the current setup should ensure that this never calls an on-demand init of the VGIC, which is the only call path that could return an error from kvm_vgic_inject_irq(), so capture the return value and raise a warning if there's an error there. We also change the kvm_timer_init() function from returning an int to be a void function, since the function always succeeds. Reviewed-by: Marc Zyngier Signed-off-by: Christoffer Dall Signed-off-by: Shannon Zhao Signed-off-by: Sasha Levin

arm/arm64: KVM: Initialize the vgic on-demand when injecting IRQs

2015-05-11T11:07:34Z

commit ca7d9c829d419c06e450afa5f785d58198c37caa upstream. Userspace assumes that it can wire up IRQ injections after having created all VCPUs and after having created the VGIC, but potentially before starting the first VCPU. This can currently lead to lost IRQs because the state of that IRQ injection is not stored anywhere and we don't return an error to userspace. We haven't seen this problem manifest itself yet, presumably because guests reset the devices on boot, but this could cause issues with migration and other non-standard startup configurations. Reviewed-by: Marc Zyngier Signed-off-by: Christoffer Dall Signed-off-by: Shannon Zhao Signed-off-by: Sasha Levin

arm/arm64: KVM: vgic: kick the specific vcpu instead of iterating through all

2015-05-11T11:07:33Z

commit 016ed39c54b8a3db680e5c6a43419f806133caf2 upstream. When call kvm_vgic_inject_irq to inject interrupt, we can known which vcpu the interrupt for by the irq_num and the cpuid. So we should just kick this vcpu to avoid iterating through all. Reviewed-by: Christoffer Dall Signed-off-by: Shannon Zhao Signed-off-by: Marc Zyngier Signed-off-by: Shannon Zhao Signed-off-by: Sasha Levin

arm/arm64: KVM: vgic: move reset initialization into vgic_init_maps()

2015-05-11T11:07:33Z

commit 6d3cfbe21bef5b66530b50ad16c88fdc71a04c35 upstream. VGIC initialization currently happens in three phases: (1) kvm_vgic_create() (triggered by userspace GIC creation) (2) vgic_init_maps() (triggered by userspace GIC register read/write requests, or from kvm_vgic_init() if not already run) (3) kvm_vgic_init() (triggered by first VM run) We were doing initialization of some state to correspond with the state of a freshly-reset GIC in kvm_vgic_init(); this is too late, since it will overwrite changes made by userspace using the register access APIs before the VM is run. Move this initialization earlier, into the vgic_init_maps() phase. This fixes a bug where QEMU could successfully restore a saved VM state snapshot into a VM that had already been run, but could not restore it "from cold" using the -loadvm command line option (the symptoms being that the restored VM would run but interrupts were ignored). Finally rename vgic_init_maps to vgic_init and renamed kvm_vgic_init to kvm_vgic_map_resources. [ This patch is originally written by Peter Maydell, but I have modified it somewhat heavily, renaming various bits and moving code around. If something is broken, I am to be blamed. - Christoffer ] Acked-by: Marc Zyngier Reviewed-by: Eric Auger Signed-off-by: Peter Maydell Signed-off-by: Christoffer Dall Signed-off-by: Shannon Zhao Signed-off-by: Sasha Levin

kvm: avoid page allocation failure in kvm_set_memory_region()

2015-04-24T21:14:14Z

[ Upstream commit 744961341d472db6272ed9b42319a90f5a2aa7c4 ] KVM guest can fail to startup with following trace on host: qemu-system-x86: page allocation failure: order:4, mode:0x40d0 Call Trace: dump_stack+0x47/0x67 warn_alloc_failed+0xee/0x150 __alloc_pages_direct_compact+0x14a/0x150 __alloc_pages_nodemask+0x776/0xb80 alloc_kmem_pages+0x3a/0x110 kmalloc_order+0x13/0x50 kmemdup+0x1b/0x40 __kvm_set_memory_region+0x24a/0x9f0 [kvm] kvm_set_ioapic+0x130/0x130 [kvm] kvm_set_memory_region+0x21/0x40 [kvm] kvm_vm_ioctl+0x43f/0x750 [kvm] Failure happens when attempting to allocate pages for 'struct kvm_memslots', however it doesn't have to be present in physically contiguous (kmalloc-ed) address space, change allocation to kvm_kvzalloc() so that it will be vmalloc-ed when its size is more then a page. Signed-off-by: Igor Mammedov Signed-off-by: Marcelo Tosatti Signed-off-by: Sasha Levin

kvm: move advertising of KVM_CAP_IRQFD to common code

2015-03-28T14:04:06Z

[ Upstream commit dc9be0fac70a2ad86e31a81372bb0bdfb6945353 ] POWER supports irqfds but forgot to advertise them. Some userspace does not check for the capability, but others check it---thus they work on x86 and s390 but not POWER. To avoid that other architectures in the future make the same mistake, let common code handle KVM_CAP_IRQFD the same way as KVM_CAP_IRQFD_RESAMPLE. Reported-and-tested-by: Greg Kurz Cc: stable@vger.kernel.org Fixes: 297e21053a52f060944e9f0de4c64fad9bcd72fc Signed-off-by: Paolo Bonzini Signed-off-by: Marcelo Tosatti Signed-off-by: Sasha Levin