<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/Documentation/virtual, branch v4.19.261</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.19.261</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.19.261'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2021-08-15T11:05:04Z</updated>
<entry>
<title>KVM: X86: MMU: Use the correct inherited permissions to get shadow page</title>
<updated>2021-08-15T11:05:04Z</updated>
<author>
<name>Lai Jiangshan</name>
<email>laijs@linux.alibaba.com</email>
</author>
<published>2021-06-03T05:24:55Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4c07e70141eebd3db64297515a427deea4822957'/>
<id>urn:sha1:4c07e70141eebd3db64297515a427deea4822957</id>
<content type='text'>
commit b1bd5cba3306691c771d558e94baa73e8b0b96b7 upstream.

When computing the access permissions of a shadow page, use the effective
permissions of the walk up to that point, i.e. the logic AND of its parents'
permissions.  Two guest PxE entries that point at the same table gfn need to
be shadowed with different shadow pages if their parents' permissions are
different.  KVM currently uses the effective permissions of the last
non-leaf entry for all non-leaf entries.  Because all non-leaf SPTEs have
full ("uwx") permissions, and the effective permissions are recorded only
in role.access and merged into the leaves, this can lead to incorrect
reuse of a shadow page and eventually to a missing guest protection page
fault.

For example, here is a shared pagetable:

   pgd[]   pud[]        pmd[]            virtual address pointers
                     /-&gt;pmd1(u--)-&gt;pte1(uw-)-&gt;page1 &lt;- ptr1 (u--)
        /-&gt;pud1(uw-)---&gt;pmd2(uw-)-&gt;pte2(uw-)-&gt;page2 &lt;- ptr2 (uw-)
   pgd-|           (shared pmd[] as above)
        \-&gt;pud2(u--)---&gt;pmd1(u--)-&gt;pte1(uw-)-&gt;page1 &lt;- ptr3 (u--)
                     \-&gt;pmd2(uw-)-&gt;pte2(uw-)-&gt;page2 &lt;- ptr4 (u--)

  pud1 and pud2 point to the same pmd table, so:
  - ptr1 and ptr3 points to the same page.
  - ptr2 and ptr4 points to the same page.

(pud1 and pud2 here are pud entries, while pmd1 and pmd2 here are pmd entries)

- First, the guest reads from ptr1 first and KVM prepares a shadow
  page table with role.access=u--, from ptr1's pud1 and ptr1's pmd1.
  "u--" comes from the effective permissions of pgd, pud1 and
  pmd1, which are stored in pt-&gt;access.  "u--" is used also to get
  the pagetable for pud1, instead of "uw-".

- Then the guest writes to ptr2 and KVM reuses pud1 which is present.
  The hypervisor set up a shadow page for ptr2 with pt-&gt;access is "uw-"
  even though the pud1 pmd (because of the incorrect argument to
  kvm_mmu_get_page in the previous step) has role.access="u--".

- Then the guest reads from ptr3.  The hypervisor reuses pud1's
  shadow pmd for pud2, because both use "u--" for their permissions.
  Thus, the shadow pmd already includes entries for both pmd1 and pmd2.

- At last, the guest writes to ptr4.  This causes no vmexit or pagefault,
  because pud1's shadow page structures included an "uw-" page even though
  its role.access was "u--".

Any kind of shared pagetable might have the similar problem when in
virtual machine without TDP enabled if the permissions are different
from different ancestors.

In order to fix the problem, we change pt-&gt;access to be an array, and
any access in it will not include permissions ANDed from child ptes.

The test code is: https://lore.kernel.org/kvm/20210603050537.19605-1-jiangshanlai@gmail.com/
Remember to test it with TDP disabled.

The problem had existed long before the commit 41074d07c78b ("KVM: MMU:
Fix inherited permissions for emulated guest pte updates"), and it
is hard to find which is the culprit.  So there is no fixes tag here.

Signed-off-by: Lai Jiangshan &lt;laijs@linux.alibaba.com&gt;
Message-Id: &lt;20210603052455.21023-1-jiangshanlai@gmail.com&gt;
Cc: stable@vger.kernel.org
Fixes: cea0f0e7ea54 ("[PATCH] KVM: MMU: Shadow page table caching")
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
[OP: - apply arch/x86/kvm/mmu/* changes to arch/x86/kvm
     - apply documentation changes to Documentation/virtual/kvm/mmu.txt
     - adjusted context in arch/x86/kvm/paging_tmpl.h]
Signed-off-by: Ovidiu Panait &lt;ovidiu.panait@windriver.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>x86/kvm/hyper-v: Explicitly align hcall param for kvm_hyperv_exit</title>
<updated>2020-06-22T07:05:12Z</updated>
<author>
<name>Jon Doron</name>
<email>arilou@gmail.com</email>
</author>
<published>2020-04-24T11:37:40Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ff40b1166a2f7e7761ac9e0fe6d2975ef7f7335c'/>
<id>urn:sha1:ff40b1166a2f7e7761ac9e0fe6d2975ef7f7335c</id>
<content type='text'>
[ Upstream commit f7d31e65368aeef973fab788aa22c4f1d5a6af66 ]

The problem the patch is trying to address is the fact that 'struct
kvm_hyperv_exit' has different layout on when compiling in 32 and 64 bit
modes.

In 64-bit mode the default alignment boundary is 64 bits thus
forcing extra gaps after 'type' and 'msr' but in 32-bit mode the
boundary is at 32 bits thus no extra gaps.

This is an issue as even when the kernel is 64 bit, the userspace using
the interface can be both 32 and 64 bit but the same 32 bit userspace has
to work with 32 bit kernel.

The issue is fixed by forcing the 64 bit layout, this leads to ABI
change for 32 bit builds and while we are obviously breaking '32 bit
userspace with 32 bit kernel' case, we're fixing the '32 bit userspace
with 64 bit kernel' one.

As the interface has no (known) users and 32 bit KVM is rather baroque
nowadays, this seems like a reasonable decision.

Reviewed-by: Vitaly Kuznetsov &lt;vkuznets@redhat.com&gt;
Signed-off-by: Jon Doron &lt;arilou@gmail.com&gt;
Message-Id: &lt;20200424113746.3473563-2-arilou@gmail.com&gt;
Reviewed-by: Roman Kagan &lt;rvkagan@yandex-team.ru&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>kvm: Convert kvm_lock to a mutex</title>
<updated>2019-11-12T18:21:40Z</updated>
<author>
<name>Junaid Shahid</name>
<email>junaids@google.com</email>
</author>
<published>2019-01-04T01:14:28Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=30d8d8d6cd928b5bc8547aca6d8eb30b334d5451'/>
<id>urn:sha1:30d8d8d6cd928b5bc8547aca6d8eb30b334d5451</id>
<content type='text'>
commit 0d9ce162cf46c99628cc5da9510b959c7976735b upstream.

It doesn't seem as if there is any particular need for kvm_lock to be a
spinlock, so convert the lock to a mutex so that sleepable functions (in
particular cond_resched()) can be called while holding it.

Signed-off-by: Junaid Shahid &lt;junaids@google.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>KVM: Reject device ioctls from processes other than the VM's creator</title>
<updated>2019-04-03T04:26:29Z</updated>
<author>
<name>Sean Christopherson</name>
<email>sean.j.christopherson@intel.com</email>
</author>
<published>2019-02-15T20:48:39Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7ceedcefc2d2506536f634f1887521be51d598c2'/>
<id>urn:sha1:7ceedcefc2d2506536f634f1887521be51d598c2</id>
<content type='text'>
commit ddba91801aeb5c160b660caed1800eb3aef403f8 upstream.

KVM's API requires thats ioctls must be issued from the same process
that created the VM.  In other words, userspace can play games with a
VM's file descriptors, e.g. fork(), SCM_RIGHTS, etc..., but only the
creator can do anything useful.  Explicitly reject device ioctls that
are issued by a process other than the VM's creator, and update KVM's
API documentation to extend its requirements to device ioctls.

Fixes: 852b6d57dc7f ("kvm: add device control API")
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Sean Christopherson &lt;sean.j.christopherson@intel.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>KVM: x86: Control guest reads of MSR_PLATFORM_INFO</title>
<updated>2018-09-19T22:51:46Z</updated>
<author>
<name>Drew Schmitt</name>
<email>dasch@google.com</email>
</author>
<published>2018-08-20T17:32:15Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=6fbbde9a1969dfb476467ebf69a475095ef3fd4d'/>
<id>urn:sha1:6fbbde9a1969dfb476467ebf69a475095ef3fd4d</id>
<content type='text'>
Add KVM_CAP_MSR_PLATFORM_INFO so that userspace can disable guest access
to reads of MSR_PLATFORM_INFO.

Disabling access to reads of this MSR gives userspace the control to "expose"
this platform-dependent information to guests in a clear way. As it exists
today, guests that read this MSR would get unpopulated information if userspace
hadn't already set it (and prior to this patch series, only the CPUID faulting
information could have been populated). This existing interface could be
confusing if guests don't handle the potential for incorrect/incomplete
information gracefully (e.g. zero reported for base frequency).

Signed-off-by: Drew Schmitt &lt;dasch@google.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>KVM: s390: Make huge pages unavailable in ucontrol VMs</title>
<updated>2018-09-12T12:46:37Z</updated>
<author>
<name>Janosch Frank</name>
<email>frankja@linux.ibm.com</email>
</author>
<published>2018-08-01T10:48:28Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=40ebdb8e59df36e2cc71810bd021a0808b16c956'/>
<id>urn:sha1:40ebdb8e59df36e2cc71810bd021a0808b16c956</id>
<content type='text'>
We currently do not notify all gmaps when using gmap_pmdp_xchg(), due
to locking constraints. This makes ucontrol VMs, which is the only VM
type that creates multiple gmaps, incompatible with huge pages. Also
we would need to hold the guest_table_lock of all gmaps that have this
vmaddr maped to synchronize access to the pmd.

ucontrol VMs are rather exotic and creating a new locking concept is
no easy task. Hence we return EINVAL when trying to active
KVM_CAP_S390_HPAGE_1M and report it as being not available when
checking for it.

Fixes: a4499382 ("KVM: s390: Add huge page enablement control")
Signed-off-by: Janosch Frank &lt;frankja@linux.ibm.com&gt;
Reviewed-by: David Hildenbrand &lt;david@redhat.com&gt;
Reviewed-by: Claudio Imbrenda &lt;imbrenda@linux.ibm.com&gt;
Message-Id: &lt;20180801112508.138159-1-frankja@linux.ibm.com&gt;
Signed-off-by: Janosch Frank &lt;frankja@linux.ibm.com&gt;
</content>
</entry>
<entry>
<title>KVM: Documentation: rename the capability of KVM_CAP_ARM_SET_SERROR_ESR</title>
<updated>2018-08-22T12:08:05Z</updated>
<author>
<name>Dongjiu Geng</name>
<email>gengdongjiu@huawei.com</email>
</author>
<published>2018-08-20T21:39:25Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=688e0581dbe0fa851188dd4fef887ea0c55aab13'/>
<id>urn:sha1:688e0581dbe0fa851188dd4fef887ea0c55aab13</id>
<content type='text'>
In the documentation description, this capability's name is
KVM_CAP_ARM_SET_SERROR_ESR, but in the header file this
capability's name is KVM_CAP_ARM_INJECT_SERROR_ESR, so change
the documentation description to make it same.

Signed-off-by: Dongjiu Geng &lt;gengdongjiu@huawei.com&gt;
Reported-by: Andrew Jones &lt;drjones@redhat.com&gt;
Reviewed-by: Andrew Jones &lt;drjones@redhat.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>Merge tag 'kvmarm-for-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD</title>
<updated>2018-08-22T12:07:56Z</updated>
<author>
<name>Paolo Bonzini</name>
<email>pbonzini@redhat.com</email>
</author>
<published>2018-08-22T12:07:56Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=631989303b06b8fdb15ec3b88aee2d25e80d4cec'/>
<id>urn:sha1:631989303b06b8fdb15ec3b88aee2d25e80d4cec</id>
<content type='text'>
KVM/arm updates for 4.19

- Support for Group0 interrupts in guests
- Cache management optimizations for ARMv8.4 systems
- Userspace interface for RAS, allowing error retrival and injection
- Fault path optimization
- Emulated physical timer fixes
- Random cleanups
</content>
</entry>
<entry>
<title>KVM: X86: Implement "send IPI" hypercall</title>
<updated>2018-08-06T15:59:20Z</updated>
<author>
<name>Wanpeng Li</name>
<email>wanpengli@tencent.com</email>
</author>
<published>2018-07-23T06:39:54Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4180bf1b655a791a0a6ef93a2ffffc762722c782'/>
<id>urn:sha1:4180bf1b655a791a0a6ef93a2ffffc762722c782</id>
<content type='text'>
Using hypercall to send IPIs by one vmexit instead of one by one for
xAPIC/x2APIC physical mode and one vmexit per-cluster for x2APIC cluster
mode. Intel guest can enter x2apic cluster mode when interrupt remmaping
is enabled in qemu, however, latest AMD EPYC still just supports xapic
mode which can get great improvement by Exit-less IPIs. This patchset
lets a guest send multicast IPIs, with at most 128 destinations per
hypercall in 64-bit mode and 64 vCPUs per hypercall in 32-bit mode.

Hardware: Xeon Skylake 2.5GHz, 2 sockets, 40 cores, 80 threads, the VM
is 80 vCPUs, IPI microbenchmark(https://lkml.org/lkml/2017/12/19/141):

x2apic cluster mode, vanilla

 Dry-run:                         0,            2392199 ns
 Self-IPI:                  6907514,           15027589 ns
 Normal IPI:              223910476,          251301666 ns
 Broadcast IPI:                   0,         9282161150 ns
 Broadcast lock:                  0,         8812934104 ns

x2apic cluster mode, pv-ipi

 Dry-run:                         0,            2449341 ns
 Self-IPI:                  6720360,           15028732 ns
 Normal IPI:              228643307,          255708477 ns
 Broadcast IPI:                   0,         7572293590 ns  =&gt; 22% performance boost
 Broadcast lock:                  0,         8316124651 ns

x2apic physical mode, vanilla

 Dry-run:                         0,            3135933 ns
 Self-IPI:                  8572670,           17901757 ns
 Normal IPI:              226444334,          255421709 ns
 Broadcast IPI:                   0,        19845070887 ns
 Broadcast lock:                  0,        19827383656 ns

x2apic physical mode, pv-ipi

 Dry-run:                         0,            2446381 ns
 Self-IPI:                  6788217,           15021056 ns
 Normal IPI:              219454441,          249583458 ns
 Broadcast IPI:                   0,         7806540019 ns  =&gt; 154% performance boost
 Broadcast lock:                  0,         9143618799 ns

Cc: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Cc: Radim Krčmář &lt;rkrcmar@redhat.com&gt;
Cc: Vitaly Kuznetsov &lt;vkuznets@redhat.com&gt;
Signed-off-by: Wanpeng Li &lt;wanpengli@tencent.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>kvm: nVMX: Introduce KVM_CAP_NESTED_STATE</title>
<updated>2018-08-06T15:58:30Z</updated>
<author>
<name>Jim Mattson</name>
<email>jmattson@google.com</email>
</author>
<published>2018-07-10T09:27:20Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8fcc4b5923af5de58b80b53a069453b135693304'/>
<id>urn:sha1:8fcc4b5923af5de58b80b53a069453b135693304</id>
<content type='text'>
For nested virtualization L0 KVM is managing a bit of state for L2 guests,
this state can not be captured through the currently available IOCTLs. In
fact the state captured through all of these IOCTLs is usually a mix of L1
and L2 state. It is also dependent on whether the L2 guest was running at
the moment when the process was interrupted to save its state.

With this capability, there are two new vcpu ioctls: KVM_GET_NESTED_STATE
and KVM_SET_NESTED_STATE. These can be used for saving and restoring a VM
that is in VMX operation.

Cc: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Cc: Radim Krčmář &lt;rkrcmar@redhat.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Jim Mattson &lt;jmattson@google.com&gt;
[karahmed@ - rename structs and functions and make them ready for AMD and
             address previous comments.
           - handle nested.smm state.
           - rebase &amp; a bit of refactoring.
           - Merge 7/8 and 8/8 into one patch. ]
Signed-off-by: KarimAllah Ahmed &lt;karahmed@amazon.de&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
</feed>
