<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel, branch v4.17.19</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.17.19</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.17.19'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2018-08-24T11:07:14Z</updated>
<entry>
<title>mm: make vm_area_alloc() initialize core fields</title>
<updated>2018-08-24T11:07:14Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2018-07-21T22:24:03Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d9b875a4a1a49414bfed766614901b41aefffe4f'/>
<id>urn:sha1:d9b875a4a1a49414bfed766614901b41aefffe4f</id>
<content type='text'>
[ Upstream commit 490fc053865c9cc40f1085ef8a5504f5341f79d2 ]

Like vm_area_dup(), it initializes the anon_vma_chain head, and the
basic mm pointer.

The rest of the fields end up being different for different users,
although the plan is to also initialize the 'vm_ops' field to a dummy
entry.

Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@microsoft.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mm: make vm_area_dup() actually copy the old vma data</title>
<updated>2018-08-24T11:07:14Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2018-07-21T21:48:45Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ae97be30fa7d9e99359598fcef22db0415e10181'/>
<id>urn:sha1:ae97be30fa7d9e99359598fcef22db0415e10181</id>
<content type='text'>
[ Upstream commit 95faf6992df468f617edb788da8c21c6eed0dfa7 ]

.. and re-initialize th eanon_vma_chain head.

This removes some boiler-plate from the users, and also makes it clear
why it didn't need use the 'zalloc()' version.

Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@microsoft.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mm: use helper functions for allocating and freeing vm_area structs</title>
<updated>2018-08-24T11:07:14Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2018-07-21T20:48:51Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5aac8b553059deec6d5d628f6da9271e247a0aac'/>
<id>urn:sha1:5aac8b553059deec6d5d628f6da9271e247a0aac</id>
<content type='text'>
[ Upstream commit 3928d4f5ee37cdc523894f6e549e6aae521d8980 ]

The vm_area_struct is one of the most fundamental memory management
objects, but the management of it is entirely open-coded evertwhere,
ranging from allocation and freeing (using kmem_cache_[z]alloc and
kmem_cache_free) to initializing all the fields.

We want to unify this in order to end up having some unified
initialization of the vmas, and the first step to this is to at least
have basic allocation functions.

Right now those functions are literally just wrappers around the
kmem_cache_*() calls.  This is a purely mechanical conversion:

    # new vma:
    kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL) -&gt; vm_area_alloc()

    # copy old vma
    kmem_cache_alloc(vm_area_cachep, GFP_KERNEL) -&gt; vm_area_dup(old)

    # free vma
    kmem_cache_free(vm_area_cachep, vma) -&gt; vm_area_free(vma)

to the point where the old vma passed in to the vm_area_dup() function
isn't even used yet (because I've left all the old manual initialization
alone).

Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@microsoft.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>sched/deadline: Fix switched_from_dl() warning</title>
<updated>2018-08-24T11:07:10Z</updated>
<author>
<name>Juri Lelli</name>
<email>juri.lelli@redhat.com</email>
</author>
<published>2018-07-11T07:29:48Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=92ab8ea99f2b962b0458de85dd647d3eeb1f272d'/>
<id>urn:sha1:92ab8ea99f2b962b0458de85dd647d3eeb1f272d</id>
<content type='text'>
[ Upstream commit e117cb52bdb4d376b711bee34af6434c9e314b3b ]

Mark noticed that syzkaller is able to reliably trigger the following warning:

  dl_rq-&gt;running_bw &gt; dl_rq-&gt;this_bw
  WARNING: CPU: 1 PID: 153 at kernel/sched/deadline.c:124 switched_from_dl+0x454/0x608
  Kernel panic - not syncing: panic_on_warn set ...

  CPU: 1 PID: 153 Comm: syz-executor253 Not tainted 4.18.0-rc3+ #29
  Hardware name: linux,dummy-virt (DT)
  Call trace:
   dump_backtrace+0x0/0x458
   show_stack+0x20/0x30
   dump_stack+0x180/0x250
   panic+0x2dc/0x4ec
   __warn_printk+0x0/0x150
   report_bug+0x228/0x2d8
   bug_handler+0xa0/0x1a0
   brk_handler+0x2f0/0x568
   do_debug_exception+0x1bc/0x5d0
   el1_dbg+0x18/0x78
   switched_from_dl+0x454/0x608
   __sched_setscheduler+0x8cc/0x2018
   sys_sched_setattr+0x340/0x758
   el0_svc_naked+0x30/0x34

syzkaller reproducer runs a bunch of threads that constantly switch
between DEADLINE and NORMAL classes while interacting through futexes.

The splat above is caused by the fact that if a DEADLINE task is setattr
back to NORMAL while in non_contending state (blocked on a futex -
inactive timer armed), its contribution to running_bw is not removed
before sub_rq_bw() gets called (!task_on_rq_queued() branch) and the
latter sees running_bw &gt; this_bw.

Fix it by removing a task contribution from running_bw if the task is
not queued and in non_contending state while switched to a different
class.

Reported-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Signed-off-by: Juri Lelli &lt;juri.lelli@redhat.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Daniel Bristot de Oliveira &lt;bristot@redhat.com&gt;
Reviewed-by: Luca Abeni &lt;luca.abeni@santannapisa.it&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: claudio@evidence.eu.com
Cc: rostedt@goodmis.org
Link: http://lkml.kernel.org/r/20180711072948.27061-1-juri.lelli@redhat.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@microsoft.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>tracing: Use __printf markup to silence compiler</title>
<updated>2018-08-24T11:06:58Z</updated>
<author>
<name>Mathieu Malaterre</name>
<email>malat@debian.org</email>
</author>
<published>2018-03-08T20:58:43Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=25b98023d73c4dcc08d57260ddb7334a9ae9fa16'/>
<id>urn:sha1:25b98023d73c4dcc08d57260ddb7334a9ae9fa16</id>
<content type='text'>
[ Upstream commit 26b68dd2f48fe7699a89f0cfbb9f4a650dc1c837 ]

Silence warnings (triggered at W=1) by adding relevant __printf attributes.

  CC      kernel/trace/trace.o
kernel/trace/trace.c: In function ‘__trace_array_vprintk’:
kernel/trace/trace.c:2979:2: warning: function might be possible candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format]
  len = vscnprintf(tbuffer, TRACE_BUF_SIZE, fmt, args);
  ^~~
  AR      kernel/trace/built-in.o

Link: http://lkml.kernel.org/r/20180308205843.27447-1-malat@debian.org

Signed-off-by: Mathieu Malaterre &lt;malat@debian.org&gt;
Signed-off-by: Steven Rostedt (VMware) &lt;rostedt@goodmis.org&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@microsoft.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>bpf: hash map: decrement counter on error</title>
<updated>2018-08-24T11:06:58Z</updated>
<author>
<name>Mauricio Vasquez B</name>
<email>mauricio.vasquez@polito.it</email>
</author>
<published>2018-06-29T12:48:20Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ba80c24ebc85ced0d9efe83b73651558f67bce56'/>
<id>urn:sha1:ba80c24ebc85ced0d9efe83b73651558f67bce56</id>
<content type='text'>
[ Upstream commit ed2b82c03dc187018307c7c6bf9299705f3db383 ]

Decrement the number of elements in the map in case the allocation
of a new node fails.

Fixes: 6c9059817432 ("bpf: pre-allocate hash map elements")
Signed-off-by: Mauricio Vasquez B &lt;mauricio.vasquez@polito.it&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@microsoft.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>kthread, sched/core: Fix kthread_parkme() (again...)</title>
<updated>2018-08-24T11:06:57Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2018-06-07T09:45:49Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2caa5529cb44baf69e766707e1d30482e47391a4'/>
<id>urn:sha1:2caa5529cb44baf69e766707e1d30482e47391a4</id>
<content type='text'>
[ Upstream commit 1cef1150ef40ec52f507436a14230cbc2623299c ]

Gaurav reports that commit:

  85f1abe0019f ("kthread, sched/wait: Fix kthread_parkme() completion issue")

isn't working for him. Because of the following race:

&gt; controller Thread                               CPUHP Thread
&gt; takedown_cpu
&gt; kthread_park
&gt; kthread_parkme
&gt; Set KTHREAD_SHOULD_PARK
&gt;                                                 smpboot_thread_fn
&gt;                                                 set Task interruptible
&gt;
&gt;
&gt; wake_up_process
&gt;  if (!(p-&gt;state &amp; state))
&gt;                 goto out;
&gt;
&gt;                                                 Kthread_parkme
&gt;                                                 SET TASK_PARKED
&gt;                                                 schedule
&gt;                                                 raw_spin_lock(&amp;rq-&gt;lock)
&gt; ttwu_remote
&gt; waiting for __task_rq_lock
&gt;                                                 context_switch
&gt;
&gt;                                                 finish_lock_switch
&gt;
&gt;
&gt;
&gt;                                                 Case TASK_PARKED
&gt;                                                 kthread_park_complete
&gt;
&gt;
&gt; SET Running

Furthermore, Oleg noticed that the whole scheduler TASK_PARKED
handling is buggered because the TASK_DEAD thing is done with
preemption disabled, the current code can still complete early on
preemption :/

So basically revert that earlier fix and go with a variant of the
alternative mentioned in the commit. Promote TASK_PARKED to special
state to avoid the store-store issue on task-&gt;state leading to the
WARN in kthread_unpark() -&gt; __kthread_bind().

But in addition, add wait_task_inactive() to kthread_park() to ensure
the task really is PARKED when we return from kthread_park(). This
avoids the whole kthread still gets migrated nonsense -- although it
would be really good to get this done differently.

Reported-by: Gaurav Kohli &lt;gkohli@codeaurora.org&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Fixes: 85f1abe0019f ("kthread, sched/wait: Fix kthread_parkme() completion issue")
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@microsoft.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>sched/util_est: Fix util_est_dequeue() for throttled cfs_rq</title>
<updated>2018-08-24T11:06:57Z</updated>
<author>
<name>Vincent Guittot</name>
<email>vincent.guittot@linaro.org</email>
</author>
<published>2018-06-14T10:33:00Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d2bc46d5694003a0679a5a16f804d69853ce3736'/>
<id>urn:sha1:d2bc46d5694003a0679a5a16f804d69853ce3736</id>
<content type='text'>
[ Upstream commit 3482d98bbc730758b63a5d1cf41d05ea17481412 ]

When a cfs_rq is throttled, parent cfs_rq-&gt;nr_running is decreased and
everything happens at cfs_rq level. Currently util_est stays unchanged
in such case and it keeps accounting the utilization of throttled tasks.
This can somewhat make sense as we don't dequeue tasks but only throttled
cfs_rq.

If a task of another group is enqueued/dequeued and root cfs_rq becomes
idle during the dequeue, util_est will be cleared whereas it was
accounting util_est of throttled tasks before. So the behavior of util_est
is not always the same regarding throttled tasks and depends of side
activity. Furthermore, util_est will not be updated when the cfs_rq is
unthrottled as everything happens at cfs_rq level. Main results is that
util_est will stay null whereas we now have running tasks. We have to wait
for the next dequeue/enqueue of the previously throttled tasks to get an
up to date util_est.

Remove the assumption that cfs_rq's estimated utilization of a CPU is 0
if there is no running task so the util_est of a task remains until the
latter is dequeued even if its cfs_rq has been throttled.

Signed-off-by: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Patrick Bellasi &lt;patrick.bellasi@arm.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Fixes: 7f65ea42eb00 ("sched/fair: Add util_est on top of PELT")
Link: http://lkml.kernel.org/r/1528972380-16268-1-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@microsoft.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>sched/fair: Fix bandwidth timer clock drift condition</title>
<updated>2018-08-24T11:06:57Z</updated>
<author>
<name>Xunlei Pang</name>
<email>xlpang@linux.alibaba.com</email>
</author>
<published>2018-06-20T10:18:33Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=431ad472e09f9e6c850652eb14c712bf420a8089'/>
<id>urn:sha1:431ad472e09f9e6c850652eb14c712bf420a8089</id>
<content type='text'>
[ Upstream commit 512ac999d2755d2b7109e996a76b6fb8b888631d ]

I noticed that cgroup task groups constantly get throttled even
if they have low CPU usage, this causes some jitters on the response
time to some of our business containers when enabling CPU quotas.

It's very simple to reproduce:

  mkdir /sys/fs/cgroup/cpu/test
  cd /sys/fs/cgroup/cpu/test
  echo 100000 &gt; cpu.cfs_quota_us
  echo $$ &gt; tasks

then repeat:

  cat cpu.stat | grep nr_throttled  # nr_throttled will increase steadily

After some analysis, we found that cfs_rq::runtime_remaining will
be cleared by expire_cfs_rq_runtime() due to two equal but stale
"cfs_{b|q}-&gt;runtime_expires" after period timer is re-armed.

The current condition to judge clock drift in expire_cfs_rq_runtime()
is wrong, the two runtime_expires are actually the same when clock
drift happens, so this condtion can never hit. The orginal design was
correctly done by this commit:

  a9cf55b28610 ("sched: Expire invalid runtime")

... but was changed to be the current implementation due to its locking bug.

This patch introduces another way, it adds a new field in both structures
cfs_rq and cfs_bandwidth to record the expiration update sequence, and
uses them to figure out if clock drift happens (true if they are equal).

Signed-off-by: Xunlei Pang &lt;xlpang@linux.alibaba.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Ben Segall &lt;bsegall@google.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Fixes: 51f2176d74ac ("sched/fair: Fix unlocked reads of some cfs_b-&gt;quota/period")
Link: http://lkml.kernel.org/r/20180620101834.24455-1-xlpang@linux.alibaba.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@microsoft.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>sched/nohz: Skip remote tick on idle task entirely</title>
<updated>2018-08-24T11:06:57Z</updated>
<author>
<name>Frederic Weisbecker</name>
<email>frederic@kernel.org</email>
</author>
<published>2018-06-28T16:29:41Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d9cad981850cbd160f0392167d18ee5cd75df025'/>
<id>urn:sha1:d9cad981850cbd160f0392167d18ee5cd75df025</id>
<content type='text'>
[ Upstream commit d9c0ffcabd6aae7ff1e34e8078354c13bb9f1183 ]

Some people have reported that the warning in sched_tick_remote()
occasionally triggers, especially in favour of some RCU-Torture
pressure:

	WARNING: CPU: 11 PID: 906 at kernel/sched/core.c:3138 sched_tick_remote+0xb6/0xc0
	Modules linked in:
	CPU: 11 PID: 906 Comm: kworker/u32:3 Not tainted 4.18.0-rc2+ #1
	Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
	Workqueue: events_unbound sched_tick_remote
	RIP: 0010:sched_tick_remote+0xb6/0xc0
	Code: e8 0f 06 b8 00 c6 03 00 fb eb 9d 8b 43 04 85 c0 75 8d 48 8b 83 e0 0a 00 00 48 85 c0 75 81 eb 88 48 89 df e8 bc fe ff ff eb aa &lt;0f&gt; 0b eb
	+c5 66 0f 1f 44 00 00 bf 17 00 00 00 e8 b6 2e fe ff 0f b6
	Call Trace:
	 process_one_work+0x1df/0x3b0
	 worker_thread+0x44/0x3d0
	 kthread+0xf3/0x130
	 ? set_worker_desc+0xb0/0xb0
	 ? kthread_create_worker_on_cpu+0x70/0x70
	 ret_from_fork+0x35/0x40

This happens when the remote tick applies on an idle task. Usually the
idle_cpu() check avoids that, but it is performed before we lock the
runqueue and it is therefore racy. It was intended to be that way in
order to prevent from useless runqueue locks since idle task tick
callback is a no-op.

Now if the racy check slips out of our hands and we end up remotely
ticking an idle task, the empty task_tick_idle() is harmless. Still
it won't pass the WARN_ON_ONCE() test that ensures rq_clock_task() is
not too far from curr-&gt;se.exec_start because update_curr_idle() doesn't
update the exec_start value like other scheduler policies. Hence the
reported false positive.

So let's have another check, while the rq is locked, to make sure we
don't remote tick on an idle task. The lockless idle_cpu() still applies
to avoid unecessary rq lock contention.

Reported-by: Jacek Tomaka &lt;jacekt@dug.com&gt;
Reported-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Reported-by: Anna-Maria Gleixner &lt;anna-maria@linutronix.de&gt;
Signed-off-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: http://lkml.kernel.org/r/1530203381-31234-1-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@microsoft.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
</feed>
