<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/time/timekeeping.c, branch v3.18.78</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.18.78</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.18.78'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2016-02-02T18:53:36Z</updated>
<entry>
<title>time: Avoid signed overflow in timekeeping_get_ns()</title>
<updated>2016-02-02T18:53:36Z</updated>
<author>
<name>David Gibson</name>
<email>david@gibson.dropbear.id.au</email>
</author>
<published>2015-11-30T01:30:30Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9282d8b196725d802a57a757e78a3c74cccb9708'/>
<id>urn:sha1:9282d8b196725d802a57a757e78a3c74cccb9708</id>
<content type='text'>
[ Upstream commit 35a4933a895927990772ae96fdcfd2f806929ee2 ]

1e75fa8 "time: Condense timekeeper.xtime into xtime_sec" replaced a call to
clocksource_cyc2ns() from timekeeping_get_ns() with an open-coded version
of the same logic to avoid keeping a semi-redundant struct timespec
in struct timekeeper.

However, the commit also introduced a subtle semantic change - where
clocksource_cyc2ns() uses purely unsigned math, the new version introduces
a signed temporary, meaning that if (delta * tk-&gt;mult) has a 63-bit
overflow the following shift will still give a negative result.  The
choice of 'maxsec' in __clocksource_updatefreq_scale() means this will
generally happen if there's a ~10 minute pause in examining the
clocksource.

This can be triggered on a powerpc KVM guest by stopping it from qemu for
a bit over 10 minutes.  After resuming time has jumped backwards several
minutes causing numerous problems (jiffies does not advance, msleep()s can
be extended by minutes..).  It doesn't happen on x86 KVM guests, because
the guest TSC is effectively frozen while the guest is stopped, which is
not the case for the powerpc timebase.

Obviously an unsigned (64 bit) overflow will only take twice as long as a
signed, 63-bit overflow.  I don't know the time code well enough to know
if that will still cause incorrect calculations, or if a 64-bit overflow
is avoided elsewhere.

Still, an incorrect forwards clock adjustment will cause less trouble than
time going backwards.  So, this patch removes the potential for
intermediate signed overflow.

Cc: stable@vger.kernel.org  (3.7+)
Suggested-by: Laurent Vivier &lt;lvivier@redhat.com&gt;
Tested-by: Laurent Vivier &lt;lvivier@redhat.com&gt;
Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: John Stultz &lt;john.stultz@linaro.org&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</content>
</entry>
<entry>
<title>time: Fix timekeeping_freqadjust()'s incorrect use of abs() instead of abs64()</title>
<updated>2015-10-28T02:12:51Z</updated>
<author>
<name>John Stultz</name>
<email>john.stultz@linaro.org</email>
</author>
<published>2015-09-09T23:07:30Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1dbcfc2dcd84d2b2781fb8611fb4bcd3792c32ae'/>
<id>urn:sha1:1dbcfc2dcd84d2b2781fb8611fb4bcd3792c32ae</id>
<content type='text'>
[ Upstream commit 2619d7e9c92d524cb155ec89fd72875321512e5b ]

The internal clocksteering done for fine-grained error
correction uses a logarithmic approximation, so any time
adjtimex() adjusts the clock steering, timekeeping_freqadjust()
quickly approximates the correct clock frequency over a series
of ticks.

Unfortunately, the logic in timekeeping_freqadjust(), introduced
in commit:

  dc491596f639 ("timekeeping: Rework frequency adjustments to work better w/ nohz")

used the abs() function with a s64 error value to calculate the
size of the approximated adjustment to be made.

Per include/linux/kernel.h:

  "abs() should not be used for 64-bit types (s64, u64, long long) - use abs64()".

Thus on 32-bit platforms, this resulted in the clocksteering to
take a quite dampended random walk trying to converge on the
proper frequency, which caused the adjustments to be made much
slower then intended (most easily observed when large
adjustments are made).

This patch fixes the issue by using abs64() instead.

Reported-by: Nuno Gonçalves &lt;nunojpg@gmail.com&gt;
Tested-by: Nuno Goncalves &lt;nunojpg@gmail.com&gt;
Signed-off-by: John Stultz &lt;john.stultz@linaro.org&gt;
Cc: &lt;stable@vger.kernel.org&gt; # v3.17+
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Miroslav Lichvar &lt;mlichvar@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Prarit Bhargava &lt;prarit@redhat.com&gt;
Cc: Richard Cochran &lt;richardcochran@gmail.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: http://lkml.kernel.org/r/1441840051-20244-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</content>
</entry>
<entry>
<title>timekeeping: Update timekeeper before updating vsyscall and pvclock</title>
<updated>2014-09-06T10:58:18Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2014-09-06T10:24:49Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9bf2419fa7bffa16ce58a4d5c20399eff8c970c9'/>
<id>urn:sha1:9bf2419fa7bffa16ce58a4d5c20399eff8c970c9</id>
<content type='text'>
The update_walltime() code works on the shadow timekeeper to make the
seqcount protected region as short as possible. But that update to the
shadow timekeeper does not update all timekeeper fields because it's
sufficient to do that once before it becomes life. One of these fields
is tkr.base_mono. That stays stale in the shadow timekeeper unless an
operation happens which copies the real timekeeper to the shadow.

The update function is called after the update calls to vsyscall and
pvclock. While not correct, it did not cause any problems because none
of the invoked update functions used base_mono.

commit cbcf2dd3b3d4 (x86: kvm: Make kvm_get_time_and_clockread()
nanoseconds based) changed that in the kvm pvclock update function, so
the stale mono_base value got used and caused kvm-clock to malfunction.

Put the update where it belongs and fix the issue.

Reported-by: Chris J Arges &lt;chris.j.arges@canonical.com&gt;
Reported-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Cc: Gleb Natapov &lt;gleb@kernel.org&gt;
Cc: John Stultz &lt;john.stultz@linaro.org&gt;
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1409050000570.3333@nanos
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
</content>
</entry>
<entry>
<title>timekeeping: Another fix to the VSYSCALL_OLD update_vsyscall</title>
<updated>2014-08-14T17:04:11Z</updated>
<author>
<name>John Stultz</name>
<email>john.stultz@linaro.org</email>
</author>
<published>2014-08-13T19:47:14Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0680eb1f485ba5aac2ee02c9f0622239c9a4b16c'/>
<id>urn:sha1:0680eb1f485ba5aac2ee02c9f0622239c9a4b16c</id>
<content type='text'>
Benjamin Herrenschmidt pointed out that I further missed modifying
update_vsyscall after the wall_to_mono value was changed to a
timespec64.  This causes issues on powerpc32, which expects a 32bit
timespec.

This patch fixes the problem by properly converting from a timespec64 to
a timespec before passing the value on to the arch-specific vsyscall
logic.

[ Thomas is currently on vacation, but reviewed it and wanted me to send
  this fix on to you directly. ]

Cc: LKML &lt;linux-kernel@vger.kernel.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Reported-by: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Reviewed-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: John Stultz &lt;john.stultz@linaro.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>timekeeping: Use cached ntp_tick_length when accumulating error</title>
<updated>2014-07-23T22:01:57Z</updated>
<author>
<name>John Stultz</name>
<email>john.stultz@linaro.org</email>
</author>
<published>2014-04-24T03:53:29Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=375f45b5b53a91dfa8f0c11328e0e044f82acbed'/>
<id>urn:sha1:375f45b5b53a91dfa8f0c11328e0e044f82acbed</id>
<content type='text'>
By caching the ntp_tick_length() when we correct the frequency error,
and then using that cached value to accumulate error, we avoid large
initial errors when the tick length is changed.

This makes convergence happen much faster in the simulator, since the
initial error doesn't have to be slowly whittled away.

This initially seems like an accounting error, but Miroslav pointed out
that ntp_tick_length() can change mid-tick, so when we apply it in the
error accumulation, we are applying any recent change to the entire tick.

This approach chooses to apply changes in the ntp_tick_length() only to
the next tick, which allows us to calculate the freq correction before
using the new tick length, which avoids accummulating error.

Credit to Miroslav for pointing this out and providing the original patch
this functionality has been pulled out from, along with the rational.

Cc: Miroslav Lichvar &lt;mlichvar@redhat.com&gt;
Cc: Richard Cochran &lt;richardcochran@gmail.com&gt;
Cc: Prarit Bhargava &lt;prarit@redhat.com&gt;
Reported-by: Miroslav Lichvar &lt;mlichvar@redhat.com&gt;
Signed-off-by: John Stultz &lt;john.stultz@linaro.org&gt;
</content>
</entry>
<entry>
<title>timekeeping: Rework frequency adjustments to work better w/ nohz</title>
<updated>2014-07-23T22:01:56Z</updated>
<author>
<name>John Stultz</name>
<email>john.stultz@linaro.org</email>
</author>
<published>2013-12-07T01:25:21Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=dc491596f6394382fbc74ad331156207d619fa0a'/>
<id>urn:sha1:dc491596f6394382fbc74ad331156207d619fa0a</id>
<content type='text'>
The existing timekeeping_adjust logic has always been complicated
to understand. Further, since it was developed prior to NOHZ becoming
common, its not surprising it performs poorly when NOHZ is enabled.

Since Miroslav pointed out the problematic nature of the existing code
in the NOHZ case, I've tried to refactor the code to perform better.

The problem with the previous approach was that it tried to adjust
for the total cumulative error using a scaled dampening factor. This
resulted in large errors to be corrected slowly, while small errors
were corrected quickly. With NOHZ the timekeeping code doesn't know
how far out the next tick will be, so this results in bad
over-correction to small errors, and insufficient correction to large
errors.

Inspired by Miroslav's patch, I've refactored the code to try to
address the correction in two steps.

1) Check the future freq error for the next tick, and if the frequency
error is large, try to make sure we correct it so it doesn't cause
much accumulated error.

2) Then make a small single unit adjustment to correct any cumulative
error that has collected over time.

This method performs fairly well in the simulator Miroslav created.

Major credit to Miroslav for pointing out the issue, providing the
original patch to resolve this, a simulator for testing, as well as
helping debug and resolve issues in my implementation so that it
performed closer to his original implementation.

Cc: Miroslav Lichvar &lt;mlichvar@redhat.com&gt;
Cc: Richard Cochran &lt;richardcochran@gmail.com&gt;
Cc: Prarit Bhargava &lt;prarit@redhat.com&gt;
Reported-by: Miroslav Lichvar &lt;mlichvar@redhat.com&gt;
Signed-off-by: John Stultz &lt;john.stultz@linaro.org&gt;
</content>
</entry>
<entry>
<title>timekeeping: Minor fixup for timespec64-&gt;timespec assignment</title>
<updated>2014-07-23T22:01:56Z</updated>
<author>
<name>John Stultz</name>
<email>john.stultz@linaro.org</email>
</author>
<published>2014-07-23T21:35:39Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e2dff1ec0cc81fcf3e0696604bacc3e1c816538c'/>
<id>urn:sha1:e2dff1ec0cc81fcf3e0696604bacc3e1c816538c</id>
<content type='text'>
In the GENERIC_TIME_VSYSCALL_OLD update_vsyscall implementation,
we take the tk_xtime() value, which returns a timespec64, and
store it in a timespec.

This luckily is ok, since the only architectures that use
GENERIC_TIME_VSYSCALL_OLD are ia64 and ppc64, which are both
64 bit systems where timespec64 is the same as a timespec.

Even so, for cleanliness reasons, use the conversion function
to assign the proper type.

Signed-off-by: John Stultz &lt;john.stultz@linaro.org&gt;
</content>
</entry>
<entry>
<title>timekeeping: Provide fast and NMI safe access to CLOCK_MONOTONIC</title>
<updated>2014-07-23T22:01:55Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2014-07-16T21:05:23Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4396e058c52e167729729cf64ea3dfa229637086'/>
<id>urn:sha1:4396e058c52e167729729cf64ea3dfa229637086</id>
<content type='text'>
Tracers want a correlated time between the kernel instrumentation and
user space. We really do not want to export sched_clock() to user
space, so we need to provide something sensible for this.

Using separate data structures with an non blocking sequence count
based update mechanism allows us to do that. The data structure
required for the readout has a sequence counter and two copies of the
timekeeping data.

On the update side:

  smp_wmb();
  tkf-&gt;seq++;
  smp_wmb();
  update(tkf-&gt;base[0], tk);
  smp_wmb();
  tkf-&gt;seq++;
  smp_wmb();
  update(tkf-&gt;base[1], tk);

On the reader side:

  do {
     seq = tkf-&gt;seq;
     smp_rmb();
     idx = seq &amp; 0x01;
     now = now(tkf-&gt;base[idx]);
     smp_rmb();
  } while (seq != tkf-&gt;seq)

So if a NMI hits the update of base[0] it will use base[1] which is
still consistent, but this timestamp is not guaranteed to be monotonic
across an update.

The timestamp is calculated by:

	now = base_mono + clock_delta * slope

So if the update lowers the slope, readers who are forced to the
not yet updated second array are still using the old steeper slope.

 tmono
 ^
 |    o  n
 |   o n
 |  u
 | o
 |o
 |12345678---&gt; reader order

 o = old slope
 u = update
 n = new slope

So reader 6 will observe time going backwards versus reader 5.

While other CPUs are likely to be able observe that, the only way
for a CPU local observation is when an NMI hits in the middle of
the update. Timestamps taken from that NMI context might be ahead
of the following timestamps. Callers need to be aware of that and
deal with it.

V2: Got rid of clock monotonic raw and reorganized the data
    structures. Folded in the barrier fix from Mathieu.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Signed-off-by: John Stultz &lt;john.stultz@linaro.org&gt;
</content>
</entry>
<entry>
<title>timekeeping: Use tk_read_base as argument for timekeeping_get_ns()</title>
<updated>2014-07-23T22:01:53Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2014-07-16T21:05:18Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0e5ac3a8b100469ea154f87dd57b685fbdd356f6'/>
<id>urn:sha1:0e5ac3a8b100469ea154f87dd57b685fbdd356f6</id>
<content type='text'>
All the function needs is in the tk_read_base struct. No functional
change for the current code, just a preparatory patch for the NMI safe
accessor to clock monotonic which will use struct tk_read_base as well.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Signed-off-by: John Stultz &lt;john.stultz@linaro.org&gt;
</content>
</entry>
<entry>
<title>timekeeping: Create struct tk_read_base and use it in struct timekeeper</title>
<updated>2014-07-23T22:01:53Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2014-07-16T21:05:16Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d28ede83791defee9a81e558540699dc46dbbe13'/>
<id>urn:sha1:d28ede83791defee9a81e558540699dc46dbbe13</id>
<content type='text'>
The members of the new struct are the required ones for the new NMI
safe accessor to clcok monotonic. In order to reuse the existing
timekeeping code and to make the update of the fast NMI safe
timekeepers a simple memcpy use the struct for the timekeeper as well
and convert all users.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Signed-off-by: John Stultz &lt;john.stultz@linaro.org&gt;
</content>
</entry>
</feed>
