<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/linux/kvm_host.h, branch v4.7</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.7</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.7'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2016-05-25T14:12:05Z</updated>
<entry>
<title>KVM: Create debugfs dir and stat files for each VM</title>
<updated>2016-05-25T14:12:05Z</updated>
<author>
<name>Janosch Frank</name>
<email>frankja@linux.vnet.ibm.com</email>
</author>
<published>2016-05-18T11:26:23Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=536a6f88c49dd739961ffd53774775afed852c83'/>
<id>urn:sha1:536a6f88c49dd739961ffd53774775afed852c83</id>
<content type='text'>
This patch adds a kvm debugfs subdirectory for each VM, which is named
after its pid and file descriptor. The directories contain the same
kind of files that are already in the kvm debugfs directory, but the
data exported through them is now VM specific.

This makes the debugfs kvm data a convenient alternative to the
tracepoints which already have per VM data. The debugfs data is easy
to read and low overhead.

CC: Dan Carpenter &lt;dan.carpenter@oracle.com&gt; [includes fixes by Dan Carpenter]
Signed-off-by: Janosch Frank &lt;frankja@linux.vnet.ibm.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>KVM: split kvm_vcpu_wake_up from kvm_vcpu_kick</title>
<updated>2016-05-18T16:04:27Z</updated>
<author>
<name>Radim Krčmář</name>
<email>rkrcmar@redhat.com</email>
</author>
<published>2016-05-04T19:09:44Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=dd1a4cc1fbdf516bb38ca31b65c76e720d414d0d'/>
<id>urn:sha1:dd1a4cc1fbdf516bb38ca31b65c76e720d414d0d</id>
<content type='text'>
AVIC has a use for kvm_vcpu_wake_up.

Signed-off-by: Radim Krčmář &lt;rkrcmar@redhat.com&gt;
Tested-by: Suravee Suthikulpanit &lt;suravee.suthikulpanit@amd.com&gt;
Reviewed-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>KVM: halt_polling: provide a way to qualify wakeups during poll</title>
<updated>2016-05-13T15:29:23Z</updated>
<author>
<name>Christian Borntraeger</name>
<email>borntraeger@de.ibm.com</email>
</author>
<published>2016-05-13T10:16:35Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3491caf2755e9f312666712510d80b00c81ff247'/>
<id>urn:sha1:3491caf2755e9f312666712510d80b00c81ff247</id>
<content type='text'>
Some wakeups should not be considered a sucessful poll. For example on
s390 I/O interrupts are usually floating, which means that _ALL_ CPUs
would be considered runnable - letting all vCPUs poll all the time for
transactional like workload, even if one vCPU would be enough.
This can result in huge CPU usage for large guests.
This patch lets architectures provide a way to qualify wakeups if they
should be considered a good/bad wakeups in regard to polls.

For s390 the implementation will fence of halt polling for anything but
known good, single vCPU events. The s390 implementation for floating
interrupts does a wakeup for one vCPU, but the interrupt will be delivered
by whatever CPU checks first for a pending interrupt. We prefer the
woken up CPU by marking the poll of this CPU as "good" poll.
This code will also mark several other wakeup reasons like IPI or
expired timers as "good". This will of course also mark some events as
not sucessful. As  KVM on z runs always as a 2nd level hypervisor,
we prefer to not poll, unless we are really sure, though.

This patch successfully limits the CPU usage for cases like uperf 1byte
transactional ping pong workload or wakeup heavy workload like OLTP
while still providing a proper speedup.

This also introduced a new vcpu stat "halt_poll_no_tuning" that marks
wakeups that are considered not good for polling.

Signed-off-by: Christian Borntraeger &lt;borntraeger@de.ibm.com&gt;
Acked-by: Radim Krčmář &lt;rkrcmar@redhat.com&gt; (for an earlier version)
Cc: David Matlack &lt;dmatlack@google.com&gt;
Cc: Wanpeng Li &lt;kernellwp@gmail.com&gt;
[Rename config symbol. - Paolo]
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>kvm: Conditionally register IRQ bypass consumer</title>
<updated>2016-05-11T20:37:55Z</updated>
<author>
<name>Alex Williamson</name>
<email>alex.williamson@redhat.com</email>
</author>
<published>2016-05-05T17:58:35Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=14717e2031862d9aa2512b24a7df42cf68a977ec'/>
<id>urn:sha1:14717e2031862d9aa2512b24a7df42cf68a977ec</id>
<content type='text'>
If we don't support a mechanism for bypassing IRQs, don't register as
a consumer.  This eliminates meaningless dev_info()s when the connect
fails between producer and consumer, such as on AMD systems where
kvm_x86_ops-&gt;update_pi_irte is not implemented

Signed-off-by: Alex Williamson &lt;alex.williamson@redhat.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>kvm: introduce KVM_MAX_VCPU_ID</title>
<updated>2016-05-11T20:37:54Z</updated>
<author>
<name>Greg Kurz</name>
<email>gkurz@linux.vnet.ibm.com</email>
</author>
<published>2016-05-09T16:13:37Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0b1b1dfd52a67f4f09a18cb82337199bc90ad7fb'/>
<id>urn:sha1:0b1b1dfd52a67f4f09a18cb82337199bc90ad7fb</id>
<content type='text'>
The KVM_MAX_VCPUS define provides the maximum number of vCPUs per guest, and
also the upper limit for vCPU ids. This is okay for all archs except PowerPC
which can have higher ids, depending on the cpu/core/thread topology. In the
worst case (single threaded guest, host with 8 threads per core), it limits
the maximum number of vCPUS to KVM_MAX_VCPUS / 8.

This patch separates the vCPU numbering from the total number of vCPUs, with
the introduction of KVM_MAX_VCPU_ID, as the maximal valid value for vCPU ids
plus one.

The corresponding KVM_CAP_MAX_VCPU_ID allows userspace to validate vCPU ids
before passing them to KVM_CREATE_VCPU.

This patch only implements KVM_MAX_VCPU_ID with a specific value for PowerPC.
Other archs continue to return KVM_MAX_VCPUS instead.

Suggested-by: Radim Krcmar &lt;rkrcmar@redhat.com&gt;
Signed-off-by: Greg Kurz &lt;gkurz@linux.vnet.ibm.com&gt;
Reviewed-by: Cornelia Huck &lt;cornelia.huck@de.ibm.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>KVM: remove NULL return path for vcpu ids &gt;= KVM_MAX_VCPUS</title>
<updated>2016-05-11T20:37:53Z</updated>
<author>
<name>Greg Kurz</name>
<email>gkurz@linux.vnet.ibm.com</email>
</author>
<published>2016-05-09T16:11:54Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9b9e3fc4d5a31f6050508f2404369beac4356867'/>
<id>urn:sha1:9b9e3fc4d5a31f6050508f2404369beac4356867</id>
<content type='text'>
Commit c896939f7cff ("KVM: use heuristic for fast VCPU lookup by id") added
a return path that prevents vcpu ids to exceed KVM_MAX_VCPUS. This is a
problem for powerpc where vcpu ids can grow up to 8*KVM_MAX_VCPUS.

This patch simply reverses the logic so that we only try fast path if the
vcpu id can be tried as an index in kvm-&gt;vcpus[]. The slow path is not
affected by the change.

Reviewed-by: David Hildenbrand &lt;dahi@linux.vnet.ibm.com&gt;
Reviewed-by: Cornelia Huck &lt;cornelia.huck@de.ibm.com&gt;
Signed-off-by: Greg Kurz &lt;gkurz@linux.vnet.ibm.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>KVM: add missing memory barrier in kvm_{make,check}_request</title>
<updated>2016-04-20T13:29:17Z</updated>
<author>
<name>Paolo Bonzini</name>
<email>pbonzini@redhat.com</email>
</author>
<published>2016-03-10T15:30:22Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2e4682ba2ed79d8082b78d292b3b80f54d970b7a'/>
<id>urn:sha1:2e4682ba2ed79d8082b78d292b3b80f54d970b7a</id>
<content type='text'>
kvm_make_request and kvm_check_request imply a producer-consumer
relationship; add implicit memory barriers to them.  There was indeed
already a place that was adding an explicit smp_mb() to order between
kvm_check_request and the processing of the request.  That memory
barrier can be removed (as an added benefit, kvm_check_request can use
smp_mb__after_atomic which is free on x86).

Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>KVM: Use simple waitqueue for vcpu-&gt;wq</title>
<updated>2016-02-25T10:27:16Z</updated>
<author>
<name>Marcelo Tosatti</name>
<email>mtosatti@redhat.com</email>
</author>
<published>2016-02-19T08:46:39Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8577370fb0cbe88266b7583d8d3b9f43ced077a0'/>
<id>urn:sha1:8577370fb0cbe88266b7583d8d3b9f43ced077a0</id>
<content type='text'>
The problem:

On -rt, an emulated LAPIC timer instances has the following path:

1) hard interrupt
2) ksoftirqd is scheduled
3) ksoftirqd wakes up vcpu thread
4) vcpu thread is scheduled

This extra context switch introduces unnecessary latency in the
LAPIC path for a KVM guest.

The solution:

Allow waking up vcpu thread from hardirq context,
thus avoiding the need for ksoftirqd to be scheduled.

Normal waitqueues make use of spinlocks, which on -RT
are sleepable locks. Therefore, waking up a waitqueue
waiter involves locking a sleeping lock, which
is not allowed from hard interrupt context.

cyclictest command line:

This patch reduces the average latency in my tests from 14us to 11us.

Daniel writes:
Paolo asked for numbers from kvm-unit-tests/tscdeadline_latency
benchmark on mainline. The test was run 1000 times on
tip/sched/core 4.4.0-rc8-01134-g0905f04:

  ./x86-run x86/tscdeadline_latency.flat -cpu host

with idle=poll.

The test seems not to deliver really stable numbers though most of
them are smaller. Paolo write:

"Anything above ~10000 cycles means that the host went to C1 or
lower---the number means more or less nothing in that case.

The mean shows an improvement indeed."

Before:

               min             max         mean           std
count  1000.000000     1000.000000  1000.000000   1000.000000
mean   5162.596000  2019270.084000  5824.491541  20681.645558
std      75.431231   622607.723969    89.575700   6492.272062
min    4466.000000    23928.000000  5537.926500    585.864966
25%    5163.000000  1613252.750000  5790.132275  16683.745433
50%    5175.000000  2281919.000000  5834.654000  23151.990026
75%    5190.000000  2382865.750000  5861.412950  24148.206168
max    5228.000000  4175158.000000  6254.827300  46481.048691

After
               min            max         mean           std
count  1000.000000     1000.00000  1000.000000   1000.000000
mean   5143.511000  2076886.10300  5813.312474  21207.357565
std      77.668322   610413.09583    86.541500   6331.915127
min    4427.000000    25103.00000  5529.756600    559.187707
25%    5148.000000  1691272.75000  5784.889825  17473.518244
50%    5160.000000  2308328.50000  5832.025000  23464.837068
75%    5172.000000  2393037.75000  5853.177675  24223.969976
max    5222.000000  3922458.00000  6186.720500  42520.379830

[Patch was originaly based on the swait implementation found in the -rt
 tree. Daniel ported it to mainline's version and gathered the
 benchmark numbers for tscdeadline_latency test.]

Signed-off-by: Daniel Wagner &lt;daniel.wagner@bmw-carit.de&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: linux-rt-users@vger.kernel.org
Cc: Boqun Feng &lt;boqun.feng@gmail.com&gt;
Cc: Marcelo Tosatti &lt;mtosatti@redhat.com&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Paul Gortmaker &lt;paul.gortmaker@windriver.com&gt;
Cc: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Cc: "Paul E. McKenney" &lt;paulmck@linux.vnet.ibm.com&gt;
Link: http://lkml.kernel.org/r/1455871601-27484-4-git-send-email-wagi@monom.org
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
</content>
</entry>
<entry>
<title>kvm: rename pfn_t to kvm_pfn_t</title>
<updated>2016-01-16T01:56:32Z</updated>
<author>
<name>Dan Williams</name>
<email>dan.j.williams@intel.com</email>
</author>
<published>2016-01-16T00:56:11Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ba049e93aef7e8c571567088b1b73f4f5b99272a'/>
<id>urn:sha1:ba049e93aef7e8c571567088b1b73f4f5b99272a</id>
<content type='text'>
To date, we have implemented two I/O usage models for persistent memory,
PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
userspace).  This series adds a third, DAX-GUP, that allows DAX mappings
to be the target of direct-i/o.  It allows userspace to coordinate
DMA/RDMA from/to persistent memory.

The implementation leverages the ZONE_DEVICE mm-zone that went into
4.3-rc1 (also discussed at kernel summit) to flag pages that are owned
and dynamically mapped by a device driver.  The pmem driver, after
mapping a persistent memory range into the system memmap via
devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus
page-backed pmem-pfns via flags in the new pfn_t type.

The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the
resulting pte(s) inserted into the process page tables with a new
_PAGE_DEVMAP flag.  Later, when get_user_pages() is walking ptes it keys
off _PAGE_DEVMAP to pin the device hosting the page range active.
Finally, get_page() and put_page() are modified to take references
against the device driver established page mapping.

Finally, this need for "struct page" for persistent memory requires
memory capacity to store the memmap array.  Given the memmap array for a
large pool of persistent may exhaust available DRAM introduce a
mechanism to allocate the memmap from persistent memory.  The new
"struct vmem_altmap *" parameter to devm_memremap_pages() enables
arch_add_memory() to use reserved pmem capacity rather than the page
allocator.

This patch (of 18):

The core has developed a need for a "pfn_t" type [1].  Move the existing
pfn_t in KVM to kvm_pfn_t [2].

[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html
[2]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002218.html

Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Acked-by: Christoffer Dall &lt;christoffer.dall@linaro.org&gt;
Cc: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>KVM: move architecture-dependent requests to arch/</title>
<updated>2016-01-08T18:04:36Z</updated>
<author>
<name>Paolo Bonzini</name>
<email>pbonzini@redhat.com</email>
</author>
<published>2016-01-07T14:05:10Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2860c4b1678646c99f5f1d77d026cd12ffd8a3a9'/>
<id>urn:sha1:2860c4b1678646c99f5f1d77d026cd12ffd8a3a9</id>
<content type='text'>
Since the numbers now overlap, it makes sense to enumerate
them in asm/kvm_host.h rather than linux/kvm_host.h.  Functions
that refer to architecture-specific requests are also moved
to arch/.

Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
</feed>
