<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/time, branch v5.16</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.16</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.16'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2021-12-17T22:06:22Z</updated>
<entry>
<title>timekeeping: Really make sure wall_to_monotonic isn't positive</title>
<updated>2021-12-17T22:06:22Z</updated>
<author>
<name>Yu Liao</name>
<email>liaoyu15@huawei.com</email>
</author>
<published>2021-12-13T13:57:27Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4e8c11b6b3f0b6a283e898344f154641eda94266'/>
<id>urn:sha1:4e8c11b6b3f0b6a283e898344f154641eda94266</id>
<content type='text'>
Even after commit e1d7ba873555 ("time: Always make sure wall_to_monotonic
isn't positive") it is still possible to make wall_to_monotonic positive
by running the following code:

    int main(void)
    {
        struct timespec time;

        clock_gettime(CLOCK_MONOTONIC, &amp;time);
        time.tv_nsec = 0;
        clock_settime(CLOCK_REALTIME, &amp;time);
        return 0;
    }

The reason is that the second parameter of timespec64_compare(), ts_delta,
may be unnormalized because the delta is calculated with an open coded
substraction which causes the comparison of tv_sec to yield the wrong
result:

  wall_to_monotonic = { .tv_sec = -10, .tv_nsec =  900000000 }
  ts_delta 	    = { .tv_sec =  -9, .tv_nsec = -900000000 }

That makes timespec64_compare() claim that wall_to_monotonic &lt; ts_delta,
but actually the result should be wall_to_monotonic &gt; ts_delta.

After normalization, the result of timespec64_compare() is correct because
the tv_sec comparison is not longer misleading:

  wall_to_monotonic = { .tv_sec = -10, .tv_nsec =  900000000 }
  ts_delta 	    = { .tv_sec = -10, .tv_nsec =  100000000 }

Use timespec64_sub() to ensure that ts_delta is normalized, which fixes the
issue.

Fixes: e1d7ba873555 ("time: Always make sure wall_to_monotonic isn't positive")
Signed-off-by: Yu Liao &lt;liaoyu15@huawei.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20211213135727.1656662-1-liaoyu15@huawei.com
</content>
</entry>
<entry>
<title>timers: implement usleep_idle_range()</title>
<updated>2021-12-11T01:10:55Z</updated>
<author>
<name>SeongJae Park</name>
<email>sj@kernel.org</email>
</author>
<published>2021-12-10T22:46:22Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e4779015fd5d2fb8390c258268addff24d6077c7'/>
<id>urn:sha1:e4779015fd5d2fb8390c258268addff24d6077c7</id>
<content type='text'>
Patch series "mm/damon: Fix fake /proc/loadavg reports", v3.

This patchset fixes DAMON's fake load report issue.  The first patch
makes yet another variant of usleep_range() for this fix, and the second
patch fixes the issue of DAMON by making it using the newly introduced
function.

This patch (of 2):

Some kernel threads such as DAMON could need to repeatedly sleep in
micro seconds level.  Because usleep_range() sleeps in uninterruptible
state, however, such threads would make /proc/loadavg reports fake load.

To help such cases, this commit implements a variant of usleep_range()
called usleep_idle_range().  It is same to usleep_range() but sets the
state of the current task as TASK_IDLE while sleeping.

Link: https://lkml.kernel.org/r/20211126145015.15862-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20211126145015.15862-2-sj@kernel.org
Signed-off-by: SeongJae Park &lt;sj@kernel.org&gt;
Suggested-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Tested-by: Oleksandr Natalenko &lt;oleksandr@natalenko.name&gt;
Cc: John Stultz &lt;john.stultz@linaro.org&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>timers/nohz: Last resort update jiffies on nohz_full IRQ entry</title>
<updated>2021-12-02T14:07:22Z</updated>
<author>
<name>Frederic Weisbecker</name>
<email>frederic@kernel.org</email>
</author>
<published>2021-10-26T14:10:54Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=53e87e3cdc155f20c3417b689df8d2ac88d79576'/>
<id>urn:sha1:53e87e3cdc155f20c3417b689df8d2ac88d79576</id>
<content type='text'>
When at least one CPU runs in nohz_full mode, a dedicated timekeeper CPU
is guaranteed to stay online and to never stop its tick.

Meanwhile on some rare case, the dedicated timekeeper may be running
with interrupts disabled for a while, such as in stop_machine.

If jiffies stop being updated, a nohz_full CPU may end up endlessly
programming the next tick in the past, taking the last jiffies update
monotonic timestamp as a stale base, resulting in an tick storm.

Here is a scenario where it matters:

0) CPU 0 is the timekeeper and CPU 1 a nohz_full CPU.

1) A stop machine callback is queued to execute somewhere.

2) CPU 0 reaches MULTI_STOP_DISABLE_IRQ while CPU 1 is still in
   MULTI_STOP_PREPARE. Hence CPU 0 can't do its timekeeping duty. CPU 1
   can still take IRQs.

3) CPU 1 receives an IRQ which queues a timer callback one jiffy forward.

4) On IRQ exit, CPU 1 schedules the tick one jiffy forward, taking
   last_jiffies_update as a base. But last_jiffies_update hasn't been
   updated for 2 jiffies since the timekeeper has interrupts disabled.

5) clockevents_program_event(), which relies on ktime_get(), observes
   that the expiration is in the past and therefore programs the min
   delta event on the clock.

6) The tick fires immediately, goto 3)

7) Tick storm, the nohz_full CPU is drown and takes ages to reach
   MULTI_STOP_DISABLE_IRQ, which is the only way out of this situation.

Solve this with unconditionally updating jiffies if the value is stale
on nohz_full IRQ entry. IRQs and other disturbances are expected to be
rare enough on nohz_full for the unconditional call to ktime_get() to
actually matter.

Reported-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
Signed-off-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Tested-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
Link: https://lore.kernel.org/r/20211026141055.57358-2-frederic@kernel.org

</content>
</entry>
<entry>
<title>posix-cpu-timers: Clear task::posix_cputimers_work in copy_process()</title>
<updated>2021-11-02T11:52:17Z</updated>
<author>
<name>Michael Pratt</name>
<email>mpratt@google.com</email>
</author>
<published>2021-11-01T21:06:15Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ca7752caeaa70bd31d1714af566c9809688544af'/>
<id>urn:sha1:ca7752caeaa70bd31d1714af566c9809688544af</id>
<content type='text'>
copy_process currently copies task_struct.posix_cputimers_work as-is. If a
timer interrupt arrives while handling clone and before dup_task_struct
completes then the child task will have:

1. posix_cputimers_work.scheduled = true
2. posix_cputimers_work.work queued.

copy_process clears task_struct.task_works, so (2) will have no effect and
posix_cpu_timers_work will never run (not to mention it doesn't make sense
for two tasks to share a common linked list).

Since posix_cpu_timers_work never runs, posix_cputimers_work.scheduled is
never cleared. Since scheduled is set, future timer interrupts will skip
scheduling work, with the ultimate result that the task will never receive
timer expirations.

Together, the complete flow is:

1. Task 1 calls clone(), enters kernel.
2. Timer interrupt fires, schedules task work on Task 1.
   2a. task_struct.posix_cputimers_work.scheduled = true
   2b. task_struct.posix_cputimers_work.work added to
       task_struct.task_works.
3. dup_task_struct() copies Task 1 to Task 2.
4. copy_process() clears task_struct.task_works for Task 2.
5. Future timer interrupts on Task 2 see
   task_struct.posix_cputimers_work.scheduled = true and skip scheduling
   work.

Fix this by explicitly clearing contents of task_struct.posix_cputimers_work
in copy_process(). This was never meant to be shared or inherited across
tasks in the first place.

Fixes: 1fb497dd0030 ("posix-cpu-timers: Provide mechanisms to defer timer handling to task_work")
Reported-by: Rhys Hiltner &lt;rhys@justin.tv&gt;
Signed-off-by: Michael Pratt &lt;mpratt@google.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Link: https://lore.kernel.org/r/20211101210615.716522-1-mpratt@google.com

</content>
</entry>
<entry>
<title>posix-cpu-timers: Prevent spuriously armed 0-value itimer</title>
<updated>2021-09-23T09:53:51Z</updated>
<author>
<name>Frederic Weisbecker</name>
<email>frederic@kernel.org</email>
</author>
<published>2021-09-13T14:53:32Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8cd9da85d2bd87ce889043e7b1735723dd10eb89'/>
<id>urn:sha1:8cd9da85d2bd87ce889043e7b1735723dd10eb89</id>
<content type='text'>
Resetting/stopping an itimer eventually leads to it being reprogrammed
with an actual "0" value. As a result the itimer expires on the next
tick, triggering an unexpected signal.

To fix this, make sure that
struct signal_struct::it[CPUCLOCK_PROF/VIRT]::expires is set to 0 when
setitimer() passes a 0 it_value, indicating that the timer must stop.

Fixes: 406dd42bd1ba ("posix-cpu-timers: Force next expiration recalc after itimer reset")
Reported-by: Victor Stinner &lt;vstinner@redhat.com&gt;
Reported-by: Chris Hixon &lt;linux-kernel-bugs@hixontech.com&gt;
Signed-off-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: https://lore.kernel.org/r/20210913145332.232023-1-frederic@kernel.org
</content>
</entry>
<entry>
<title>Merge branch 'akpm' (patches from Andrew)</title>
<updated>2021-09-03T17:08:28Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2021-09-03T17:08:28Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=14726903c835101cd8d0a703b609305094350d61'/>
<id>urn:sha1:14726903c835101cd8d0a703b609305094350d61</id>
<content type='text'>
Merge misc updates from Andrew Morton:
 "173 patches.

  Subsystems affected by this series: ia64, ocfs2, block, and mm (debug,
  pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap,
  bootmem, sparsemem, vmalloc, kasan, pagealloc, memory-failure,
  hugetlb, userfaultfd, vmscan, compaction, mempolicy, memblock,
  oom-kill, migration, ksm, percpu, vmstat, and madvise)"

* emailed patches from Andrew Morton &lt;akpm@linux-foundation.org&gt;: (173 commits)
  mm/madvise: add MADV_WILLNEED to process_madvise()
  mm/vmstat: remove unneeded return value
  mm/vmstat: simplify the array size calculation
  mm/vmstat: correct some wrong comments
  mm/percpu,c: remove obsolete comments of pcpu_chunk_populated()
  selftests: vm: add COW time test for KSM pages
  selftests: vm: add KSM merging time test
  mm: KSM: fix data type
  selftests: vm: add KSM merging across nodes test
  selftests: vm: add KSM zero page merging test
  selftests: vm: add KSM unmerge test
  selftests: vm: add KSM merge test
  mm/migrate: correct kernel-doc notation
  mm: wire up syscall process_mrelease
  mm: introduce process_mrelease system call
  memblock: make memblock_find_in_range method private
  mm/mempolicy.c: use in_task() in mempolicy_slab_node()
  mm/mempolicy: unify the create() func for bind/interleave/prefer-many policies
  mm/mempolicy: advertise new MPOL_PREFERRED_MANY
  mm/hugetlb: add support for mempolicy MPOL_PREFERRED_MANY
  ...
</content>
</entry>
<entry>
<title>memcg: enable accounting for posix_timers_cache slab</title>
<updated>2021-09-03T16:58:13Z</updated>
<author>
<name>Vasily Averin</name>
<email>vvs@virtuozzo.com</email>
</author>
<published>2021-09-02T21:55:39Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c509723ec27e925bb91a20682c448e95d4bc8c9f'/>
<id>urn:sha1:c509723ec27e925bb91a20682c448e95d4bc8c9f</id>
<content type='text'>
A program may create multiple interval timers using timer_create().  For
each timer the kernel preallocates a "queued real-time signal",
Consequently, the number of timers is limited by the RLIMIT_SIGPENDING
resource limit.  The allocated object is quite small, ~250 bytes, but even
the default signal limits allow to consume up to 100 megabytes per user.

It makes sense to account for them to limit the host's memory consumption
from inside the memcg-limited container.

Link: https://lkml.kernel.org/r/57795560-025c-267c-6b1a-dea852d95530@virtuozzo.com
Signed-off-by: Vasily Averin &lt;vvs@virtuozzo.com&gt;
Reviewed-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Shakeel Butt &lt;shakeelb@google.com&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: Andrei Vagin &lt;avagin@gmail.com&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Borislav Petkov &lt;bp@suse.de&gt;
Cc: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
Cc: Dmitry Safonov &lt;0x7f454c46@gmail.com&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: "J. Bruce Fields" &lt;bfields@fieldses.org&gt;
Cc: Jeff Layton &lt;jlayton@kernel.org&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Jiri Slaby &lt;jirislaby@kernel.org&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Kirill Tkhai &lt;ktkhai@virtuozzo.com&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Roman Gushchin &lt;guro@fb.com&gt;
Cc: Serge Hallyn &lt;serge@hallyn.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Vladimir Davydov &lt;vdavydov.dev@gmail.com&gt;
Cc: Yutian Yang &lt;nglaive@gmail.com&gt;
Cc: Zefan Li &lt;lizefan.x@bytedance.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>memcg: enable accounting for new namesapces and struct nsproxy</title>
<updated>2021-09-03T16:58:12Z</updated>
<author>
<name>Vasily Averin</name>
<email>vvs@virtuozzo.com</email>
</author>
<published>2021-09-02T21:55:27Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=30acd0bdfb86548172168a0cc71d455944de0683'/>
<id>urn:sha1:30acd0bdfb86548172168a0cc71d455944de0683</id>
<content type='text'>
Container admin can create new namespaces and force kernel to allocate up
to several pages of memory for the namespaces and its associated
structures.

Net and uts namespaces have enabled accounting for such allocations.  It
makes sense to account for rest ones to restrict the host's memory
consumption from inside the memcg-limited container.

Link: https://lkml.kernel.org/r/5525bcbf-533e-da27-79b7-158686c64e13@virtuozzo.com
Signed-off-by: Vasily Averin &lt;vvs@virtuozzo.com&gt;
Acked-by: Serge Hallyn &lt;serge@hallyn.com&gt;
Acked-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
Acked-by: Kirill Tkhai &lt;ktkhai@virtuozzo.com&gt;
Reviewed-by: Shakeel Butt &lt;shakeelb@google.com&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: Andrei Vagin &lt;avagin@gmail.com&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Borislav Petkov &lt;bp@suse.de&gt;
Cc: Dmitry Safonov &lt;0x7f454c46@gmail.com&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: "J. Bruce Fields" &lt;bfields@fieldses.org&gt;
Cc: Jeff Layton &lt;jlayton@kernel.org&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Jiri Slaby &lt;jirislaby@kernel.org&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Roman Gushchin &lt;guro@fb.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Vladimir Davydov &lt;vdavydov.dev@gmail.com&gt;
Cc: Yutian Yang &lt;nglaive@gmail.com&gt;
Cc: Zefan Li &lt;lizefan.x@bytedance.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>clocksource: Make clocksource watchdog test safe for slow-HZ systems</title>
<updated>2021-08-28T15:01:32Z</updated>
<author>
<name>Paul E. McKenney</name>
<email>paulmck@kernel.org</email>
</author>
<published>2021-08-12T16:31:28Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d25a025201ed98f4b93775e0999a3f2135702106'/>
<id>urn:sha1:d25a025201ed98f4b93775e0999a3f2135702106</id>
<content type='text'>
The clocksource watchdog test sets a local JIFFIES_SHIFT macro and assumes
that HZ is &gt;= 100. For smaller HZ values this shift value is too large and
causes undefined behaviour.

Move the HZ-based definitions of JIFFIES_SHIFT from kernel/time/jiffies.c
to kernel/time/tick-internal.h so the clocksource watchdog test can utilize
them, which makes it work correctly with all HZ values.

[ tglx: Resolved conflicts and massaged changelog ]

Signed-off-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: https://lore.kernel.org/lkml/20210812000133.GA402890@paulmck-ThinkPad-P17-Gen-1/
</content>
</entry>
<entry>
<title>hrtimer: Unbreak hrtimer_force_reprogram()</title>
<updated>2021-08-12T20:34:40Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2021-08-12T20:32:30Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f80e21489590c00f46226d5802d900e6f66e5633'/>
<id>urn:sha1:f80e21489590c00f46226d5802d900e6f66e5633</id>
<content type='text'>
Since the recent consoliation of reprogramming functions,
hrtimer_force_reprogram() is affected by a check whether the new expiry
time is past the current expiry time.

This breaks the NOHZ logic as that relies on the fact that the tick hrtimer
is moved into the future. That means cpu_base-&gt;expires_next becomes stale
and subsequent reprogramming attempts fail as well until the situation is
cleaned up by an hrtimer interrupts.

For some yet unknown reason this leads to a complete stall, so for now
partially revert the offending commit to a known working state. The root
cause for the stall is still investigated and will be fixed in a subsequent
commit.

Fixes: b14bca97c9f5 ("hrtimer: Consolidate reprogramming code")
Reported-by: Mike Galbraith &lt;efault@gmx.de&gt;
Reported-by: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Tested-by: Mike Galbraith &lt;efault@gmx.de&gt;
Link: https://lore.kernel.org/r/8735recskh.ffs@tglx

</content>
</entry>
</feed>
