<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/sched, branch v4.1.18</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.1.18</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.1.18'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2015-10-27T00:51:58Z</updated>
<entry>
<title>sched/preempt: Fix cond_resched_lock() and cond_resched_softirq()</title>
<updated>2015-10-27T00:51:58Z</updated>
<author>
<name>Konstantin Khlebnikov</name>
<email>khlebnikov@yandex-team.ru</email>
</author>
<published>2015-07-15T09:52:04Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=98197d3de58a62785be3e421864d6145955f197d'/>
<id>urn:sha1:98197d3de58a62785be3e421864d6145955f197d</id>
<content type='text'>
commit fe32d3cd5e8eb0f82e459763374aa80797023403 upstream.

These functions check should_resched() before unlocking spinlock/bh-enable:
preempt_count always non-zero =&gt; should_resched() always returns false.
cond_resched_lock() worked iff spin_needbreak is set.

This patch adds argument "preempt_offset" to should_resched().

preempt_count offset constants for that:

  PREEMPT_DISABLE_OFFSET  - offset after preempt_disable()
  PREEMPT_LOCK_OFFSET     - offset after spin_lock()
  SOFTIRQ_DISABLE_OFFSET  - offset after local_bh_distable()
  SOFTIRQ_LOCK_OFFSET     - offset after spin_lock_bh()

Signed-off-by: Konstantin Khlebnikov &lt;khlebnikov@yandex-team.ru&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Alexander Graf &lt;agraf@suse.de&gt;
Cc: Boris Ostrovsky &lt;boris.ostrovsky@oracle.com&gt;
Cc: David Vrabel &lt;david.vrabel@citrix.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Fixes: bdb438065890 ("sched: Extract the basic add/sub preempt_count modifiers")
Link: http://lkml.kernel.org/r/20150715095204.12246.98268.stgit@buzz
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Mike Galbraith &lt;efault@gmx.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>sched/fair: Prevent throttling in early pick_next_task_fair()</title>
<updated>2015-10-22T21:43:20Z</updated>
<author>
<name>Ben Segall</name>
<email>bsegall@google.com</email>
</author>
<published>2015-04-06T22:28:10Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c04b33c93dc0c566b02ac573528bb82ac6065666'/>
<id>urn:sha1:c04b33c93dc0c566b02ac573528bb82ac6065666</id>
<content type='text'>
commit 54d27365cae88fbcc853b391dcd561e71acb81fa upstream.

The optimized task selection logic optimistically selects a new task
to run without first doing a full put_prev_task(). This is so that we
can avoid a put/set on the common ancestors of the old and new task.

Similarly, we should only call check_cfs_rq_runtime() to throttle
eligible groups if they're part of the common ancestry, otherwise it
is possible to end up with no eligible task in the simple task
selection.

Imagine:
		/root
	/prev		/next
	/A		/B

If our optimistic selection ends up throttling /next, we goto simple
and our put_prev_task() ends up throttling /prev, after which we're
going to bug out in set_next_entity() because there aren't any tasks
left.

Avoid this scenario by only throttling common ancestors.

Reported-by: Mohammed Naser &lt;mnaser@vexxhost.com&gt;
Reported-by: Konstantin Khlebnikov &lt;khlebnikov@yandex-team.ru&gt;
Signed-off-by: Ben Segall &lt;bsegall@google.com&gt;
[ munged Changelog ]
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Roman Gushchin &lt;klamm@yandex-team.ru&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: pjt@google.com
Fixes: 678d5718d8d0 ("sched/fair: Optimize cgroup pick_next_task_fair()")
Link: http://lkml.kernel.org/r/xm26wq1oswoq.fsf@sword-of-the-dawn.mtv.corp.google.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>sched/core: Fix TASK_DEAD race in finish_task_switch()</title>
<updated>2015-10-22T21:43:14Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2015-09-29T12:45:09Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8210b9199f66447513b63c9a5aef988ee60c4319'/>
<id>urn:sha1:8210b9199f66447513b63c9a5aef988ee60c4319</id>
<content type='text'>
commit 95913d97914f44db2b81271c2e2ebd4d2ac2df83 upstream.

So the problem this patch is trying to address is as follows:

        CPU0                            CPU1

        context_switch(A, B)
                                        ttwu(A)
                                          LOCK A-&gt;pi_lock
                                          A-&gt;on_cpu == 0
        finish_task_switch(A)
          prev_state = A-&gt;state  &lt;-.
          WMB                      |
          A-&gt;on_cpu = 0;           |
          UNLOCK rq0-&gt;lock         |
                                   |    context_switch(C, A)
                                   `--  A-&gt;state = TASK_DEAD
          prev_state == TASK_DEAD
            put_task_struct(A)
                                        context_switch(A, C)
                                        finish_task_switch(A)
                                          A-&gt;state == TASK_DEAD
                                            put_task_struct(A)

The argument being that the WMB will allow the load of A-&gt;state on CPU0
to cross over and observe CPU1's store of A-&gt;state, which will then
result in a double-drop and use-after-free.

Now the comment states (and this was true once upon a long time ago)
that we need to observe A-&gt;state while holding rq-&gt;lock because that
will order us against the wakeup; however the wakeup will not in fact
acquire (that) rq-&gt;lock; it takes A-&gt;pi_lock these days.

We can obviously fix this by upgrading the WMB to an MB, but that is
expensive, so we'd rather avoid that.

The alternative this patch takes is: smp_store_release(&amp;A-&gt;on_cpu, 0),
which avoids the MB on some archs, but not important ones like ARM.

Reported-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: linux-kernel@vger.kernel.org
Cc: manfred@colorfullife.com
Cc: will.deacon@arm.com
Fixes: e4a52bcb9a18 ("sched: Remove rq-&gt;lock from the first half of ttwu()")
Link: http://lkml.kernel.org/r/20150929124509.GG3816@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>sched: access local runqueue directly in single_task_running</title>
<updated>2015-10-22T21:43:12Z</updated>
<author>
<name>Dominik Dingel</name>
<email>dingel@linux.vnet.ibm.com</email>
</author>
<published>2015-09-18T09:27:45Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=feada04ec6c916fa3779d736e75c8453ee3ddc58'/>
<id>urn:sha1:feada04ec6c916fa3779d736e75c8453ee3ddc58</id>
<content type='text'>
commit 00cc1633816de8c95f337608a1ea64e228faf771 upstream.

Commit 2ee507c47293 ("sched: Add function single_task_running to let a task
check if it is the only task running on a cpu") referenced the current
runqueue with the smp_processor_id.  When CONFIG_DEBUG_PREEMPT is enabled,
that is only allowed if preemption is disabled or the currrent task is
bound to the local cpu (e.g. kernel worker).

With commit f78195129963 ("kvm: add halt_poll_ns module parameter") KVM
calls single_task_running. If CONFIG_DEBUG_PREEMPT is enabled that
generates a lot of kernel messages.

To avoid adding preemption in that cases, as it would limit the usefulness,
we change single_task_running to access directly the cpu local runqueue.

Cc: Tim Chen &lt;tim.c.chen@linux.intel.com&gt;
Suggested-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Fixes: 2ee507c472939db4b146d545352b8a7c79ef47f8
Signed-off-by: Dominik Dingel &lt;dingel@linux.vnet.ibm.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>sched: Fix cpu_active_mask/cpu_online_mask race</title>
<updated>2015-09-21T17:05:29Z</updated>
<author>
<name>Jan H. Schönherr</name>
<email>jschoenh@amazon.de</email>
</author>
<published>2015-08-12T19:35:56Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=668327e49d2aa4782da771c4b1c6c152bf5084a1'/>
<id>urn:sha1:668327e49d2aa4782da771c4b1c6c152bf5084a1</id>
<content type='text'>
commit dd9d3843755da95f63dd3a376f62b3e45c011210 upstream.

There is a race condition in SMP bootup code, which may result
in

    WARNING: CPU: 0 PID: 1 at kernel/workqueue.c:4418
    workqueue_cpu_up_callback()
or
    kernel BUG at kernel/smpboot.c:135!

It can be triggered with a bit of luck in Linux guests running
on busy hosts.

	CPU0                        CPUn
	====                        ====

	_cpu_up()
	  __cpu_up()
				    start_secondary()
				      set_cpu_online()
					cpumask_set_cpu(cpu,
						   to_cpumask(cpu_online_bits));
	  cpu_notify(CPU_ONLINE)
	    &lt;do stuff, see below&gt;
					cpumask_set_cpu(cpu,
						   to_cpumask(cpu_active_bits));

During the various CPU_ONLINE callbacks CPUn is online but not
active. Several things can go wrong at that point, depending on
the scheduling of tasks on CPU0.

Variant 1:

  cpu_notify(CPU_ONLINE)
    workqueue_cpu_up_callback()
      rebind_workers()
        set_cpus_allowed_ptr()

  This call fails because it requires an active CPU; rebind_workers()
  ends with a warning:

    WARNING: CPU: 0 PID: 1 at kernel/workqueue.c:4418
    workqueue_cpu_up_callback()

Variant 2:

  cpu_notify(CPU_ONLINE)
    smpboot_thread_call()
      smpboot_unpark_threads()
       ..
        __kthread_unpark()
          __kthread_bind()
          wake_up_state()
           ..
            select_task_rq()
              select_fallback_rq()

  The -&gt;wake_cpu of the unparked thread is not allowed, making a call
  to select_fallback_rq() necessary. Then, select_fallback_rq() cannot
  find an allowed, active CPU and promptly resets the allowed CPUs, so
  that the task in question ends up on CPU0.

  When those unparked tasks are eventually executed, they run
  immediately into a BUG:

    kernel BUG at kernel/smpboot.c:135!

Just changing the order in which the online/active bits are set
(and adding some memory barriers), would solve the two issues
above. However, it would change the order of operations back to
the one before commit 6acbfb96976f ("sched: Fix hotplug vs.
set_cpus_allowed_ptr()"), thus, reintroducing that particular
problem.

Going further back into history, we have at least the following
commits touching this topic:
- commit 2baab4e90495 ("sched: Fix select_fallback_rq() vs cpu_active/cpu_online")
- commit 5fbd036b552f ("sched: Cleanup cpu_active madness")

Together, these give us the following non-working solutions:

  - secondary CPU sets active before online, because active is assumed to
    be a subset of online;

  - secondary CPU sets online before active, because the primary CPU
    assumes that an online CPU is also active;

  - secondary CPU sets online and waits for primary CPU to set active,
    because it might deadlock.

Commit 875ebe940d77 ("powerpc/smp: Wait until secondaries are
active &amp; online") introduces an arch-specific solution to this
arch-independent problem.

Now, go for a more general solution without explicit waiting and
simply set active twice: once on the secondary CPU after online
was set and once on the primary CPU after online was seen.

set_cpus_allowed_ptr()")

Signed-off-by: Jan H. Schönherr &lt;jschoenh@amazon.de&gt;
Acked-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Anton Blanchard &lt;anton@samba.org&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Joerg Roedel &lt;jroedel@suse.de&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Matt Wilson &lt;msw@amazon.com&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Fixes: 6acbfb96976f ("sched: Fix hotplug vs. set_cpus_allowed_ptr()")
Link: http://lkml.kernel.org/r/1439408156-18840-1-git-send-email-jschoenh@amazon.de
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>sched, numa: do not hint for NUMA balancing on VM_MIXEDMAP mappings</title>
<updated>2015-06-10T23:43:43Z</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@suse.de</email>
</author>
<published>2015-06-10T18:15:00Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8e76d4eecf7afeec9328e21cd5880e281838d0d6'/>
<id>urn:sha1:8e76d4eecf7afeec9328e21cd5880e281838d0d6</id>
<content type='text'>
Jovi Zhangwei reported the following problem

  Below kernel vm bug can be triggered by tcpdump which mmaped a lot of pages
  with GFP_COMP flag.

  [Mon May 25 05:29:33 2015] page:ffffea0015414000 count:66 mapcount:1 mapping:          (null) index:0x0
  [Mon May 25 05:29:33 2015] flags: 0x20047580004000(head)
  [Mon May 25 05:29:33 2015] page dumped because: VM_BUG_ON_PAGE(compound_order(page) &amp;&amp; !PageTransHuge(page))
  [Mon May 25 05:29:33 2015] ------------[ cut here ]------------
  [Mon May 25 05:29:33 2015] kernel BUG at mm/migrate.c:1661!
  [Mon May 25 05:29:33 2015] invalid opcode: 0000 [#1] SMP

In this case it was triggered by running tcpdump but it's not necessary
reproducible on all systems.

  sudo tcpdump -i bond0.100 'tcp port 4242' -c 100000000000 -w 4242.pcap

Compound pages cannot be migrated and it was not expected that such pages
be marked for NUMA balancing.  This did not take into account that drivers
such as net/packet/af_packet.c may insert compound pages into userspace
with vm_insert_page.  This patch tells the NUMA balancing protection
scanner to skip all VM_MIXEDMAP mappings which avoids the possibility that
compound pages are marked for migration.

Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Reported-by: Jovi Zhangwei &lt;jovi@cloudflare.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge branch 'for-linus' of git://git.kernel.dk/linux-block</title>
<updated>2015-05-22T22:15:30Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2015-05-22T22:15:30Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1c8df7bd48347a707b437cfd0dad6b08a3b89ab6'/>
<id>urn:sha1:1c8df7bd48347a707b437cfd0dad6b08a3b89ab6</id>
<content type='text'>
Pull block fixes from Jens Axboe:
 "Three small fixes that have been picked up the last few weeks.
  Specifically:

   - Fix a memory corruption issue in NVMe with malignant user
     constructed request.  From Christoph.

   - Kill (now) unused blk_queue_bio(), dm was changed to not need this
     anymore.  From Mike Snitzer.

   - Always use blk_schedule_flush_plug() from the io_schedule() path
     when flushing a plug, fixing a !TASK_RUNNING warning with md.  From
     Shaohua"

* 'for-linus' of git://git.kernel.dk/linux-block:
  sched: always use blk_schedule_flush_plug in io_schedule_out
  nvme: fix kernel memory corruption with short INQUIRY buffers
  block: remove export for blk_queue_bio
</content>
</entry>
<entry>
<title>sched: always use blk_schedule_flush_plug in io_schedule_out</title>
<updated>2015-05-18T22:06:41Z</updated>
<author>
<name>Shaohua Li</name>
<email>shli@fb.com</email>
</author>
<published>2015-05-08T17:51:29Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=10d784eae2b41e25d8fc6a88096cd27286093c84'/>
<id>urn:sha1:10d784eae2b41e25d8fc6a88096cd27286093c84</id>
<content type='text'>
block plug callback could sleep, so we introduce a parameter
'from_schedule' and corresponding drivers can use it to destinguish a
schedule plug flush or a plug finish. Unfortunately io_schedule_out
still uses blk_flush_plug(). This causes below output (Note, I added a
might_sleep() in raid1_unplug to make it trigger faster, but the whole
thing doesn't matter if I add might_sleep). In raid1/10, this can cause
deadlock.

This patch makes io_schedule_out always uses blk_schedule_flush_plug.
This should only impact drivers (as far as I know, raid 1/10) which are
sensitive to the 'from_schedule' parameter.

[  370.817949] ------------[ cut here ]------------
[  370.817960] WARNING: CPU: 7 PID: 145 at ../kernel/sched/core.c:7306 __might_sleep+0x7f/0x90()
[  370.817969] do not call blocking ops when !TASK_RUNNING; state=2 set at [&lt;ffffffff81092fcf&gt;] prepare_to_wait+0x2f/0x90
[  370.817971] Modules linked in: raid1
[  370.817976] CPU: 7 PID: 145 Comm: kworker/u16:9 Tainted: G        W       4.0.0+ #361
[  370.817977] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153802- 04/01/2014
[  370.817983] Workqueue: writeback bdi_writeback_workfn (flush-9:1)
[  370.817985]  ffffffff81cd83be ffff8800ba8cb298 ffffffff819dd7af 0000000000000001
[  370.817988]  ffff8800ba8cb2e8 ffff8800ba8cb2d8 ffffffff81051afc ffff8800ba8cb2c8
[  370.817990]  ffffffffa00061a8 000000000000041e 0000000000000000 ffff8800ba8cba28
[  370.817993] Call Trace:
[  370.817999]  [&lt;ffffffff819dd7af&gt;] dump_stack+0x4f/0x7b
[  370.818002]  [&lt;ffffffff81051afc&gt;] warn_slowpath_common+0x8c/0xd0
[  370.818004]  [&lt;ffffffff81051b86&gt;] warn_slowpath_fmt+0x46/0x50
[  370.818006]  [&lt;ffffffff81092fcf&gt;] ? prepare_to_wait+0x2f/0x90
[  370.818008]  [&lt;ffffffff81092fcf&gt;] ? prepare_to_wait+0x2f/0x90
[  370.818010]  [&lt;ffffffff810776ef&gt;] __might_sleep+0x7f/0x90
[  370.818014]  [&lt;ffffffffa0000c03&gt;] raid1_unplug+0xd3/0x170 [raid1]
[  370.818024]  [&lt;ffffffff81421d9a&gt;] blk_flush_plug_list+0x8a/0x1e0
[  370.818028]  [&lt;ffffffff819e3550&gt;] ? bit_wait+0x50/0x50
[  370.818031]  [&lt;ffffffff819e21b0&gt;] io_schedule_timeout+0x130/0x140
[  370.818033]  [&lt;ffffffff819e3586&gt;] bit_wait_io+0x36/0x50
[  370.818034]  [&lt;ffffffff819e31b5&gt;] __wait_on_bit+0x65/0x90
[  370.818041]  [&lt;ffffffff8125b67c&gt;] ? ext4_read_block_bitmap_nowait+0xbc/0x630
[  370.818043]  [&lt;ffffffff819e3550&gt;] ? bit_wait+0x50/0x50
[  370.818045]  [&lt;ffffffff819e3302&gt;] out_of_line_wait_on_bit+0x72/0x80
[  370.818047]  [&lt;ffffffff810935e0&gt;] ? autoremove_wake_function+0x40/0x40
[  370.818050]  [&lt;ffffffff811de744&gt;] __wait_on_buffer+0x44/0x50
[  370.818053]  [&lt;ffffffff8125ae80&gt;] ext4_wait_block_bitmap+0xe0/0xf0
[  370.818058]  [&lt;ffffffff812975d6&gt;] ext4_mb_init_cache+0x206/0x790
[  370.818062]  [&lt;ffffffff8114bc6c&gt;] ? lru_cache_add+0x1c/0x50
[  370.818064]  [&lt;ffffffff81297c7e&gt;] ext4_mb_init_group+0x11e/0x200
[  370.818066]  [&lt;ffffffff81298231&gt;] ext4_mb_load_buddy+0x341/0x360
[  370.818068]  [&lt;ffffffff8129a1a3&gt;] ext4_mb_find_by_goal+0x93/0x2f0
[  370.818070]  [&lt;ffffffff81295b54&gt;] ? ext4_mb_normalize_request+0x1e4/0x5b0
[  370.818072]  [&lt;ffffffff8129ab67&gt;] ext4_mb_regular_allocator+0x67/0x460
[  370.818074]  [&lt;ffffffff81295b54&gt;] ? ext4_mb_normalize_request+0x1e4/0x5b0
[  370.818076]  [&lt;ffffffff8129ca4b&gt;] ext4_mb_new_blocks+0x4cb/0x620
[  370.818079]  [&lt;ffffffff81290956&gt;] ext4_ext_map_blocks+0x4c6/0x14d0
[  370.818081]  [&lt;ffffffff812a4d4e&gt;] ? ext4_es_lookup_extent+0x4e/0x290
[  370.818085]  [&lt;ffffffff8126399d&gt;] ext4_map_blocks+0x14d/0x4f0
[  370.818088]  [&lt;ffffffff81266fbd&gt;] ext4_writepages+0x76d/0xe50
[  370.818094]  [&lt;ffffffff81149691&gt;] do_writepages+0x21/0x50
[  370.818097]  [&lt;ffffffff811d5c00&gt;] __writeback_single_inode+0x60/0x490
[  370.818099]  [&lt;ffffffff811d630a&gt;] writeback_sb_inodes+0x2da/0x590
[  370.818103]  [&lt;ffffffff811abf4b&gt;] ? trylock_super+0x1b/0x50
[  370.818105]  [&lt;ffffffff811abf4b&gt;] ? trylock_super+0x1b/0x50
[  370.818107]  [&lt;ffffffff811d665f&gt;] __writeback_inodes_wb+0x9f/0xd0
[  370.818109]  [&lt;ffffffff811d69db&gt;] wb_writeback+0x34b/0x3c0
[  370.818111]  [&lt;ffffffff811d70df&gt;] bdi_writeback_workfn+0x23f/0x550
[  370.818116]  [&lt;ffffffff8106bbd8&gt;] process_one_work+0x1c8/0x570
[  370.818117]  [&lt;ffffffff8106bb5b&gt;] ? process_one_work+0x14b/0x570
[  370.818119]  [&lt;ffffffff8106c09b&gt;] worker_thread+0x11b/0x470
[  370.818121]  [&lt;ffffffff8106bf80&gt;] ? process_one_work+0x570/0x570
[  370.818124]  [&lt;ffffffff81071868&gt;] kthread+0xf8/0x110
[  370.818126]  [&lt;ffffffff81071770&gt;] ? kthread_create_on_node+0x210/0x210
[  370.818129]  [&lt;ffffffff819e9322&gt;] ret_from_fork+0x42/0x70
[  370.818131]  [&lt;ffffffff81071770&gt;] ? kthread_create_on_node+0x210/0x210
[  370.818132] ---[ end trace 7b4deb71e68b6605 ]---

V2: don't change -&gt;in_iowait

Cc: NeilBrown &lt;neilb@suse.de&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Reviewed-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
<entry>
<title>sched/core: Fix regression in cpuset_cpu_inactive() for suspend</title>
<updated>2015-05-08T09:53:56Z</updated>
<author>
<name>Omar Sandoval</name>
<email>osandov@osandov.com</email>
</author>
<published>2015-05-04T10:09:36Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=533445c6e53368569e50ab3fb712230c03d523f3'/>
<id>urn:sha1:533445c6e53368569e50ab3fb712230c03d523f3</id>
<content type='text'>
Commit 3c18d447b3b3 ("sched/core: Check for available DL bandwidth in
cpuset_cpu_inactive()"), a SCHED_DEADLINE bugfix, had a logic error that
caused a regression in setting a CPU inactive during suspend. I ran into
this when a program was failing pthread_setaffinity_np() with EINVAL after
a suspend+wake up.

A simple reproducer:

	$ ./a.out
	sched_setaffinity: Success
	$ systemctl suspend
	$ ./a.out
	sched_setaffinity: Invalid argument

... where ./a.out is:

	#define _GNU_SOURCE
	#include &lt;errno.h&gt;
	#include &lt;sched.h&gt;
	#include &lt;stdio.h&gt;
	#include &lt;stdlib.h&gt;
	#include &lt;string.h&gt;
	#include &lt;unistd.h&gt;

	int main(void)
	{
		long num_cores;
		cpu_set_t cpu_set;
		int ret;

		num_cores = sysconf(_SC_NPROCESSORS_ONLN);
		CPU_ZERO(&amp;cpu_set);
		CPU_SET(num_cores - 1, &amp;cpu_set);
		errno = 0;
		ret = sched_setaffinity(getpid(), sizeof(cpu_set), &amp;cpu_set);
		perror("sched_setaffinity");
		return ret ? EXIT_FAILURE : EXIT_SUCCESS;
	}

The mistake is that suspend is handled in the action ==
CPU_DOWN_PREPARE_FROZEN case of the switch statement in
cpuset_cpu_inactive().

However, the commit in question masked out CPU_TASKS_FROZEN
from the action, making this case dead.

The fix is straightforward.

Signed-off-by: Omar Sandoval &lt;osandov@osandov.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Juri Lelli &lt;juri.lelli@arm.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Fixes: 3c18d447b3b3 ("sched/core: Check for available DL bandwidth in cpuset_cpu_inactive()")
Link: http://lkml.kernel.org/r/1cb5ecb3d6543c38cce5790387f336f54ec8e2bc.1430733960.git.osandov@osandov.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched: Handle priority boosted tasks proper in setscheduler()</title>
<updated>2015-05-08T09:53:55Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2015-05-05T17:49:49Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0782e63bc6fe7e2d3408d250df11d388b7799c6b'/>
<id>urn:sha1:0782e63bc6fe7e2d3408d250df11d388b7799c6b</id>
<content type='text'>
Ronny reported that the following scenario is not handled correctly:

	T1 (prio = 10)
	   lock(rtmutex);

	T2 (prio = 20)
	   lock(rtmutex)
	      boost T1

	T1 (prio = 20)
	   sys_set_scheduler(prio = 30)
	   T1 prio = 30
	   ....
	   sys_set_scheduler(prio = 10)
	   T1 prio = 30

The last step is wrong as T1 should now be back at prio 20.

Commit c365c292d059 ("sched: Consider pi boosting in setscheduler()")
only handles the case where a boosted tasks tries to lower its
priority.

Fix it by taking the new effective priority into account for the
decision whether a change of the priority is required.

Reported-by: Ronny Meeus &lt;ronny.meeus@gmail.com&gt;
Tested-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Mike Galbraith &lt;umgwanakikbuti@gmail.com&gt;
Fixes: c365c292d059 ("sched: Consider pi boosting in setscheduler()")
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1505051806060.4225@nanos
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
</feed>
