<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/virt, branch v4.9.243</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.9.243</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.9.243'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2020-10-01T18:40:06Z</updated>
<entry>
<title>KVM: fix overflow of zero page refcount with ksm running</title>
<updated>2020-10-01T18:40:06Z</updated>
<author>
<name>Zhuang Yanying</name>
<email>ann.zhuangyanying@huawei.com</email>
</author>
<published>2019-10-12T03:37:31Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=154e9494bffb1d202e8c5da6d5833dcac76095ac'/>
<id>urn:sha1:154e9494bffb1d202e8c5da6d5833dcac76095ac</id>
<content type='text'>
[ Upstream commit 7df003c85218b5f5b10a7f6418208f31e813f38f ]

We are testing Virtual Machine with KSM on v5.4-rc2 kernel,
and found the zero_page refcount overflow.
The cause of refcount overflow is increased in try_async_pf
(get_user_page) without being decreased in mmu_set_spte()
while handling ept violation.
In kvm_release_pfn_clean(), only unreserved page will call
put_page. However, zero page is reserved.
So, as well as creating and destroy vm, the refcount of
zero page will continue to increase until it overflows.

step1:
echo 10000 &gt; /sys/kernel/pages_to_scan/pages_to_scan
echo 1 &gt; /sys/kernel/pages_to_scan/run
echo 1 &gt; /sys/kernel/pages_to_scan/use_zero_pages

step2:
just create several normal qemu kvm vms.
And destroy it after 10s.
Repeat this action all the time.

After a long period of time, all domains hang because
of the refcount of zero page overflow.

Qemu print error log as follow:
 â€¦
 error: kvm run failed Bad address
 EAX=00006cdc EBX=00000008 ECX=80202001 EDX=078bfbfd
 ESI=ffffffff EDI=00000000 EBP=00000008 ESP=00006cc4
 EIP=000efd75 EFL=00010002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
 ES =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
 CS =0008 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
 SS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
 DS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
 FS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
 GS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
 LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
 TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy
 GDT=     000f7070 00000037
 IDT=     000f70ae 00000000
 CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000
 DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
 DR6=00000000ffff0ff0 DR7=0000000000000400
 EFER=0000000000000000
 Code=00 01 00 00 00 e9 e8 00 00 00 c7 05 4c 55 0f 00 01 00 00 00 &lt;8b&gt; 35 00 00 01 00 8b 3d 04 00 01 00 b8 d8 d3 00 00 c1 e0 08 0c ea a3 00 00 01 00 c7 05 04
 â€¦

Meanwhile, a kernel warning is departed.

 [40914.836375] WARNING: CPU: 3 PID: 82067 at ./include/linux/mm.h:987 try_get_page+0x1f/0x30
 [40914.836412] CPU: 3 PID: 82067 Comm: CPU 0/KVM Kdump: loaded Tainted: G           OE     5.2.0-rc2 #5
 [40914.836415] RIP: 0010:try_get_page+0x1f/0x30
 [40914.836417] Code: 40 00 c3 0f 1f 84 00 00 00 00 00 48 8b 47 08 a8 01 75 11 8b 47 34 85 c0 7e 10 f0 ff 47 34 b8 01 00 00 00 c3 48 8d 78 ff eb e9 &lt;0f&gt; 0b 31 c0 c3 66 90 66 2e 0f 1f 84 00 0
 0 00 00 00 48 8b 47 08 a8
 [40914.836418] RSP: 0018:ffffb4144e523988 EFLAGS: 00010286
 [40914.836419] RAX: 0000000080000000 RBX: 0000000000000326 RCX: 0000000000000000
 [40914.836420] RDX: 0000000000000000 RSI: 00004ffdeba10000 RDI: ffffdf07093f6440
 [40914.836421] RBP: ffffdf07093f6440 R08: 800000424fd91225 R09: 0000000000000000
 [40914.836421] R10: ffff9eb41bfeebb8 R11: 0000000000000000 R12: ffffdf06bbd1e8a8
 [40914.836422] R13: 0000000000000080 R14: 800000424fd91225 R15: ffffdf07093f6440
 [40914.836423] FS:  00007fb60ffff700(0000) GS:ffff9eb4802c0000(0000) knlGS:0000000000000000
 [40914.836425] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 [40914.836426] CR2: 0000000000000000 CR3: 0000002f220e6002 CR4: 00000000003626e0
 [40914.836427] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 [40914.836427] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 [40914.836428] Call Trace:
 [40914.836433]  follow_page_pte+0x302/0x47b
 [40914.836437]  __get_user_pages+0xf1/0x7d0
 [40914.836441]  ? irq_work_queue+0x9/0x70
 [40914.836443]  get_user_pages_unlocked+0x13f/0x1e0
 [40914.836469]  __gfn_to_pfn_memslot+0x10e/0x400 [kvm]
 [40914.836486]  try_async_pf+0x87/0x240 [kvm]
 [40914.836503]  tdp_page_fault+0x139/0x270 [kvm]
 [40914.836523]  kvm_mmu_page_fault+0x76/0x5e0 [kvm]
 [40914.836588]  vcpu_enter_guest+0xb45/0x1570 [kvm]
 [40914.836632]  kvm_arch_vcpu_ioctl_run+0x35d/0x580 [kvm]
 [40914.836645]  kvm_vcpu_ioctl+0x26e/0x5d0 [kvm]
 [40914.836650]  do_vfs_ioctl+0xa9/0x620
 [40914.836653]  ksys_ioctl+0x60/0x90
 [40914.836654]  __x64_sys_ioctl+0x16/0x20
 [40914.836658]  do_syscall_64+0x5b/0x180
 [40914.836664]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
 [40914.836666] RIP: 0033:0x7fb61cb6bfc7

Signed-off-by: LinFeng &lt;linfeng23@huawei.com&gt;
Signed-off-by: Zhuang Yanying &lt;ann.zhuangyanying@huawei.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>KVM: fix memory leak in kvm_io_bus_unregister_dev()</title>
<updated>2020-10-01T18:39:59Z</updated>
<author>
<name>Rustam Kovhaev</name>
<email>rkovhaev@gmail.com</email>
</author>
<published>2020-09-07T18:55:35Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=840e124f89a5127e7eb97ebf377f4b8ca745c070'/>
<id>urn:sha1:840e124f89a5127e7eb97ebf377f4b8ca745c070</id>
<content type='text'>
[ Upstream commit f65886606c2d3b562716de030706dfe1bea4ed5e ]

when kmalloc() fails in kvm_io_bus_unregister_dev(), before removing
the bus, we should iterate over all other devices linked to it and call
kvm_iodevice_destructor() for them

Fixes: 90db10434b16 ("KVM: kvm_io_bus_unregister_dev() should never fail")
Cc: stable@vger.kernel.org
Reported-and-tested-by: syzbot+f196caa45793d6374707@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=f196caa45793d6374707
Signed-off-by: Rustam Kovhaev &lt;rkovhaev@gmail.com&gt;
Reviewed-by: Vitaly Kuznetsov &lt;vkuznets@redhat.com&gt;
Message-Id: &lt;20200907185535.233114-1-rkovhaev@gmail.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>KVM: Check for a bad hva before dropping into the ghc slow path</title>
<updated>2020-03-11T06:53:07Z</updated>
<author>
<name>Sean Christopherson</name>
<email>sean.j.christopherson@intel.com</email>
</author>
<published>2020-01-09T23:56:18Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=24d85373578acb9566852a0e094ff33e71b5a924'/>
<id>urn:sha1:24d85373578acb9566852a0e094ff33e71b5a924</id>
<content type='text'>
commit fcfbc617547fc6d9552cb6c1c563b6a90ee98085 upstream.

When reading/writing using the guest/host cache, check for a bad hva
before checking for a NULL memslot, which triggers the slow path for
handing cross-page accesses.  Because the memslot is nullified on error
by __kvm_gfn_to_hva_cache_init(), if the bad hva is encountered after
crossing into a new page, then the kvm_{read,write}_guest() slow path
could potentially write/access the first chunk prior to detecting the
bad hva.

Arguably, performing a partial access is semantically correct from an
architectural perspective, but that behavior is certainly not intended.
In the original implementation, memslot was not explicitly nullified
and therefore the partial access behavior varied based on whether the
memslot itself was null, or if the hva was simply bad.  The current
behavior was introduced as a seemingly unintentional side effect in
commit f1b9dd5eb86c ("kvm: Disallow wraparound in
kvm_gfn_to_hva_cache_init"), which justified the change with "since some
callers don't check the return code from this function, it sit seems
prudent to clear ghc-&gt;memslot in the event of an error".

Regardless of intent, the partial access is dependent on _not_ checking
the result of the cache initialization, which is arguably a bug in its
own right, at best simply weird.

Fixes: 8f964525a121 ("KVM: Allow cross page reads and writes from cached translations.")
Cc: Jim Mattson &lt;jmattson@google.com&gt;
Cc: Andrew Honig &lt;ahonig@google.com&gt;
Signed-off-by: Sean Christopherson &lt;sean.j.christopherson@intel.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>KVM: MMU: Do not treat ZONE_DEVICE pages as being reserved</title>
<updated>2019-11-28T17:29:02Z</updated>
<author>
<name>Sean Christopherson</name>
<email>sean.j.christopherson@intel.com</email>
</author>
<published>2019-11-11T22:12:27Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e528acd31a13f4115ab5a0bf804fd2356a8d3c32'/>
<id>urn:sha1:e528acd31a13f4115ab5a0bf804fd2356a8d3c32</id>
<content type='text'>
commit a78986aae9b2988f8493f9f65a587ee433e83bc3 upstream.

Explicitly exempt ZONE_DEVICE pages from kvm_is_reserved_pfn() and
instead manually handle ZONE_DEVICE on a case-by-case basis.  For things
like page refcounts, KVM needs to treat ZONE_DEVICE pages like normal
pages, e.g. put pages grabbed via gup().  But for flows such as setting
A/D bits or shifting refcounts for transparent huge pages, KVM needs to
to avoid processing ZONE_DEVICE pages as the flows in question lack the
underlying machinery for proper handling of ZONE_DEVICE pages.

This fixes a hang reported by Adam Borowski[*] in dev_pagemap_cleanup()
when running a KVM guest backed with /dev/dax memory, as KVM straight up
doesn't put any references to ZONE_DEVICE pages acquired by gup().

Note, Dan Williams proposed an alternative solution of doing put_page()
on ZONE_DEVICE pages immediately after gup() in order to simplify the
auditing needed to ensure is_zone_device_page() is called if and only if
the backing device is pinned (via gup()).  But that approach would break
kvm_vcpu_{un}map() as KVM requires the page to be pinned from map() 'til
unmap() when accessing guest memory, unlike KVM's secondary MMU, which
coordinates with mmu_notifier invalidations to avoid creating stale
page references, i.e. doesn't rely on pages being pinned.

[*] http://lkml.kernel.org/r/20190919115547.GA17963@angband.pl

Reported-by: Adam Borowski &lt;kilobyte@angband.pl&gt;
Analyzed-by: David Hildenbrand &lt;david@redhat.com&gt;
Acked-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: stable@vger.kernel.org
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Sean Christopherson &lt;sean.j.christopherson@intel.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
[sean: backport to 4.x; resolve conflict in mmu.c]
Signed-off-by: Sean Christopherson &lt;sean.j.christopherson@intel.com&gt;
</content>
</entry>
<entry>
<title>kvm: x86: mmu: Recovery of shattered NX large pages</title>
<updated>2019-11-16T09:29:57Z</updated>
<author>
<name>Junaid Shahid</name>
<email>junaids@google.com</email>
</author>
<published>2019-11-04T11:22:03Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c6e94acbf6abab3e3c25fcdd3343d0c2a3f160ca'/>
<id>urn:sha1:c6e94acbf6abab3e3c25fcdd3343d0c2a3f160ca</id>
<content type='text'>
commit 1aa9b9572b10529c2e64e2b8f44025d86e124308 upstream.

The page table pages corresponding to broken down large pages are zapped in
FIFO order, so that the large page can potentially be recovered, if it is
not longer being used for execution.  This removes the performance penalty
for walking deeper EPT page tables.

By default, one large page will last about one hour once the guest
reaches a steady state.

Signed-off-by: Junaid Shahid &lt;junaids@google.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
[bwh: Backported to 4.9:
 - Update another error path in kvm_create_vm() to use out_err_no_mmu_notifier
 - Adjust filename, context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>kvm: Add helper function for creating VM worker threads</title>
<updated>2019-11-16T09:29:56Z</updated>
<author>
<name>Junaid Shahid</name>
<email>junaids@google.com</email>
</author>
<published>2019-11-04T11:22:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=61e191b467f1edea6fae9123c37355133273a31a'/>
<id>urn:sha1:61e191b467f1edea6fae9123c37355133273a31a</id>
<content type='text'>
commit c57c80467f90e5504c8df9ad3555d2c78800bf94 upstream.

Add a function to create a kernel thread associated with a given VM. In
particular, it ensures that the worker thread inherits the priority and
cgroups of the calling thread.

Signed-off-by: Junaid Shahid &lt;junaids@google.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
[bwh: Backported to 4.9: adjust context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>kvm: Convert kvm_lock to a mutex</title>
<updated>2019-11-16T09:29:49Z</updated>
<author>
<name>Junaid Shahid</name>
<email>junaids@google.com</email>
</author>
<published>2019-01-04T01:14:28Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c6170b81e7b78942cb4b36fc72cbd75145fd08d5'/>
<id>urn:sha1:c6170b81e7b78942cb4b36fc72cbd75145fd08d5</id>
<content type='text'>
commit 0d9ce162cf46c99628cc5da9510b959c7976735b upstream.

It doesn't seem as if there is any particular need for kvm_lock to be a
spinlock, so convert the lock to a mutex so that sleepable functions (in
particular cond_resched()) can be called while holding it.

Signed-off-by: Junaid Shahid &lt;junaids@google.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
[bwh: Backported to 4.9:
 - Drop changes in kvm_hyperv_tsc_notifier(), vm_stat_clear(),
   vcpu_stat_clear(), kvm_uevent_notify_change()
 - Adjust context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>KVM: coalesced_mmio: add bounds checking</title>
<updated>2019-09-21T05:14:09Z</updated>
<author>
<name>Matt Delco</name>
<email>delco@chromium.org</email>
</author>
<published>2019-09-16T21:16:54Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=52cb3fd60ec0a1c9b61cd28bae4e8c1d377e7b60'/>
<id>urn:sha1:52cb3fd60ec0a1c9b61cd28bae4e8c1d377e7b60</id>
<content type='text'>
commit b60fe990c6b07ef6d4df67bc0530c7c90a62623a upstream.

The first/last indexes are typically shared with a user app.
The app can change the 'last' index that the kernel uses
to store the next result.  This change sanity checks the index
before using it for writing to a potentially arbitrary address.

This fixes CVE-2019-14821.

Cc: stable@vger.kernel.org
Fixes: 5f94c1741bdc ("KVM: Add coalesced MMIO support (common part)")
Signed-off-by: Matt Delco &lt;delco@chromium.org&gt;
Signed-off-by: Jim Mattson &lt;jmattson@google.com&gt;
Reported-by: syzbot+983c866c3dd6efa3662a@syzkaller.appspotmail.com
[Use READ_ONCE. - Paolo]
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>KVM: arm/arm64: vgic-v2: Handle SGI bits in GICD_I{S,C}PENDR0 as WI</title>
<updated>2019-09-06T08:19:53Z</updated>
<author>
<name>Marc Zyngier</name>
<email>maz@kernel.org</email>
</author>
<published>2019-08-28T10:10:16Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a3a71b1e91614538eeb199c0a47ba4d0e7d8b8bf'/>
<id>urn:sha1:a3a71b1e91614538eeb199c0a47ba4d0e7d8b8bf</id>
<content type='text'>
[ Upstream commit 82e40f558de566fdee214bec68096bbd5e64a6a4 ]

A guest is not allowed to inject a SGI (or clear its pending state)
by writing to GICD_ISPENDR0 (resp. GICD_ICPENDR0), as these bits are
defined as WI (as per ARM IHI 0048B 4.3.7 and 4.3.8).

Make sure we correctly emulate the architecture.

Fixes: 96b298000db4 ("KVM: arm/arm64: vgic-new: Add PENDING registers handlers")
Cc: stable@vger.kernel.org # 4.7+
Reported-by: Andre Przywara &lt;andre.przywara@arm.com&gt;
Signed-off-by: Marc Zyngier &lt;maz@kernel.org&gt;
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>KVM: arm/arm64: vgic: Fix potential deadlock when ap_list is long</title>
<updated>2019-09-06T08:19:53Z</updated>
<author>
<name>Heyi Guo</name>
<email>guoheyi@huawei.com</email>
</author>
<published>2019-08-27T11:26:50Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f35eb8a3aa3b492489116965558c3f2f7d641f31'/>
<id>urn:sha1:f35eb8a3aa3b492489116965558c3f2f7d641f31</id>
<content type='text'>
[ Upstream commit d4a8061a7c5f7c27a2dc002ee4cb89b3e6637e44 ]

If the ap_list is longer than 256 entries, merge_final() in list_sort()
will call the comparison callback with the same element twice, causing
a deadlock in vgic_irq_cmp().

Fix it by returning early when irqa == irqb.

Cc: stable@vger.kernel.org # 4.7+
Fixes: 8e4447457965 ("KVM: arm/arm64: vgic-new: Add IRQ sorting")
Signed-off-by: Zenghui Yu &lt;yuzenghui@huawei.com&gt;
Signed-off-by: Heyi Guo &lt;guoheyi@huawei.com&gt;
[maz: massaged commit log and patch, added Fixes and Cc-stable]
Signed-off-by: Marc Zyngier &lt;maz@kernel.org&gt;
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
</feed>
