<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/fork.c, branch v5.10.83</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.10.83</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.10.83'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2021-11-18T13:04:29Z</updated>
<entry>
<title>posix-cpu-timers: Clear task::posix_cputimers_work in copy_process()</title>
<updated>2021-11-18T13:04:29Z</updated>
<author>
<name>Michael Pratt</name>
<email>mpratt@google.com</email>
</author>
<published>2021-11-01T21:06:15Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f67f6eb717ae6277ddd6c1685b0e85ef17420096'/>
<id>urn:sha1:f67f6eb717ae6277ddd6c1685b0e85ef17420096</id>
<content type='text'>
commit ca7752caeaa70bd31d1714af566c9809688544af upstream.

copy_process currently copies task_struct.posix_cputimers_work as-is. If a
timer interrupt arrives while handling clone and before dup_task_struct
completes then the child task will have:

1. posix_cputimers_work.scheduled = true
2. posix_cputimers_work.work queued.

copy_process clears task_struct.task_works, so (2) will have no effect and
posix_cpu_timers_work will never run (not to mention it doesn't make sense
for two tasks to share a common linked list).

Since posix_cpu_timers_work never runs, posix_cputimers_work.scheduled is
never cleared. Since scheduled is set, future timer interrupts will skip
scheduling work, with the ultimate result that the task will never receive
timer expirations.

Together, the complete flow is:

1. Task 1 calls clone(), enters kernel.
2. Timer interrupt fires, schedules task work on Task 1.
   2a. task_struct.posix_cputimers_work.scheduled = true
   2b. task_struct.posix_cputimers_work.work added to
       task_struct.task_works.
3. dup_task_struct() copies Task 1 to Task 2.
4. copy_process() clears task_struct.task_works for Task 2.
5. Future timer interrupts on Task 2 see
   task_struct.posix_cputimers_work.scheduled = true and skip scheduling
   work.

Fix this by explicitly clearing contents of task_struct.posix_cputimers_work
in copy_process(). This was never meant to be shared or inherited across
tasks in the first place.

Fixes: 1fb497dd0030 ("posix-cpu-timers: Provide mechanisms to defer timer handling to task_work")
Reported-by: Rhys Hiltner &lt;rhys@justin.tv&gt;
Signed-off-by: Michael Pratt &lt;mpratt@google.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Link: https://lore.kernel.org/r/20211101210615.716522-1-mpratt@google.com
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>kernel/sched: Fix sched_fork() access an invalid sched_task_group</title>
<updated>2021-11-18T13:04:07Z</updated>
<author>
<name>Zhang Qiao</name>
<email>zhangqiao22@huawei.com</email>
</author>
<published>2021-09-15T06:40:30Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c85c6fadbef0a3eab41540ea628fa8fe8928c820'/>
<id>urn:sha1:c85c6fadbef0a3eab41540ea628fa8fe8928c820</id>
<content type='text'>
[ Upstream commit 4ef0c5c6b5ba1f38f0ea1cedad0cad722f00c14a ]

There is a small race between copy_process() and sched_fork()
where child-&gt;sched_task_group point to an already freed pointer.

	parent doing fork()      | someone moving the parent
				 | to another cgroup
  -------------------------------+-------------------------------
  copy_process()
      + dup_task_struct()&lt;1&gt;
				  parent move to another cgroup,
				  and free the old cgroup. &lt;2&gt;
      + sched_fork()
	+ __set_task_cpu()&lt;3&gt;
	+ task_fork_fair()
	  + sched_slice()&lt;4&gt;

In the worst case, this bug can lead to "use-after-free" and
cause panic as shown above:

  (1) parent copy its sched_task_group to child at &lt;1&gt;;

  (2) someone move the parent to another cgroup and free the old
      cgroup at &lt;2&gt;;

  (3) the sched_task_group and cfs_rq that belong to the old cgroup
      will be accessed at &lt;3&gt; and &lt;4&gt;, which cause a panic:

  [] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
  [] PGD 8000001fa0a86067 P4D 8000001fa0a86067 PUD 2029955067 PMD 0
  [] Oops: 0000 [#1] SMP PTI
  [] CPU: 7 PID: 648398 Comm: ebizzy Kdump: loaded Tainted: G           OE    --------- -  - 4.18.0.x86_64+ #1
  [] RIP: 0010:sched_slice+0x84/0xc0

  [] Call Trace:
  []  task_fork_fair+0x81/0x120
  []  sched_fork+0x132/0x240
  []  copy_process.part.5+0x675/0x20e0
  []  ? __handle_mm_fault+0x63f/0x690
  []  _do_fork+0xcd/0x3b0
  []  do_syscall_64+0x5d/0x1d0
  []  entry_SYSCALL_64_after_hwframe+0x65/0xca
  [] RIP: 0033:0x7f04418cd7e1

Between cgroup_can_fork() and cgroup_post_fork(), the cgroup
membership and thus sched_task_group can't change. So update child's
sched_task_group at sched_post_fork() and move task_fork() and
__set_task_cpu() (where accees the sched_task_group) from sched_fork()
to sched_post_fork().

Fixes: 8323f26ce342 ("sched: Fix race in task_group")
Signed-off-by: Zhang Qiao &lt;zhangqiao22@huawei.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Link: https://lkml.kernel.org/r/20210915064030.2231-1-zhangqiao22@huawei.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>mm/hugetlb: initialize hugetlb_usage in mm_init</title>
<updated>2021-09-18T11:40:36Z</updated>
<author>
<name>Liu Zixian</name>
<email>liuzixian4@huawei.com</email>
</author>
<published>2021-09-09T01:10:05Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2d2d8b0eca6473eac0a142edea4c2b038da08cbd'/>
<id>urn:sha1:2d2d8b0eca6473eac0a142edea4c2b038da08cbd</id>
<content type='text'>
commit 13db8c50477d83ad3e3b9b0ae247e5cd833a7ae4 upstream.

After fork, the child process will get incorrect (2x) hugetlb_usage.  If
a process uses 5 2MB hugetlb pages in an anonymous mapping,

	HugetlbPages:	   10240 kB

and then forks, the child will show,

	HugetlbPages:	   20480 kB

The reason for double the amount is because hugetlb_usage will be copied
from the parent and then increased when we copy page tables from parent
to child.  Child will have 2x actual usage.

Fix this by adding hugetlb_count_init in mm_init.

Link: https://lkml.kernel.org/r/20210826071742.877-1-liuzixian4@huawei.com
Fixes: 5d317b2b6536 ("mm: hugetlb: proc: add HugetlbPages field to /proc/PID/status")
Signed-off-by: Liu Zixian &lt;liuzixian4@huawei.com&gt;
Reviewed-by: Naoya Horiguchi &lt;naoya.horiguchi@nec.com&gt;
Reviewed-by: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>Revert "Add a reference to ucounts for each cred"</title>
<updated>2021-09-08T06:49:00Z</updated>
<author>
<name>Greg Kroah-Hartman</name>
<email>gregkh@linuxfoundation.org</email>
</author>
<published>2021-09-03T14:06:50Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ae16b7c668378ea00eb60ab9d29e0d46b0e7aa15'/>
<id>urn:sha1:ae16b7c668378ea00eb60ab9d29e0d46b0e7aa15</id>
<content type='text'>
This reverts commit b2c4d9a33cc2dec7466f97eba2c4dd571ad798a5 which is
commit 905ae01c4ae2ae3df05bb141801b1db4b7d83c61 upstream.

This commit should not have been applied to the 5.10.y stable tree, so
revert it.

Reported-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Link: https://lore.kernel.org/r/87v93k4bl6.fsf@disp2133
Cc: Alexey Gladkov &lt;legion@kernel.org&gt;
Cc: Sasha Levin &lt;sashal@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>sched/core: Initialize the idle task with preemption disabled</title>
<updated>2021-07-14T14:55:50Z</updated>
<author>
<name>Valentin Schneider</name>
<email>valentin.schneider@arm.com</email>
</author>
<published>2021-05-12T09:46:36Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3c51d82d0b7862d7d246016c74b4390fb1fa1f11'/>
<id>urn:sha1:3c51d82d0b7862d7d246016c74b4390fb1fa1f11</id>
<content type='text'>
[ Upstream commit f1a0a376ca0c4ef1fc3d24e3e502acbb5b795674 ]

As pointed out by commit

  de9b8f5dcbd9 ("sched: Fix crash trying to dequeue/enqueue the idle thread")

init_idle() can and will be invoked more than once on the same idle
task. At boot time, it is invoked for the boot CPU thread by
sched_init(). Then smp_init() creates the threads for all the secondary
CPUs and invokes init_idle() on them.

As the hotplug machinery brings the secondaries to life, it will issue
calls to idle_thread_get(), which itself invokes init_idle() yet again.
In this case it's invoked twice more per secondary: at _cpu_up(), and at
bringup_cpu().

Given smp_init() already initializes the idle tasks for all *possible*
CPUs, no further initialization should be required. Now, removing
init_idle() from idle_thread_get() exposes some interesting expectations
with regards to the idle task's preempt_count: the secondary startup always
issues a preempt_disable(), requiring some reset of the preempt count to 0
between hot-unplug and hotplug, which is currently served by
idle_thread_get() -&gt; idle_init().

Given the idle task is supposed to have preemption disabled once and never
see it re-enabled, it seems that what we actually want is to initialize its
preempt_count to PREEMPT_DISABLED and leave it there. Do that, and remove
init_idle() from idle_thread_get().

Secondary startups were patched via coccinelle:

  @begone@
  @@

  -preempt_disable();
  ...
  cpu_startup_entry(CPUHP_AP_ONLINE_IDLE);

Signed-off-by: Valentin Schneider &lt;valentin.schneider@arm.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Acked-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/20210512094636.2958515-1-valentin.schneider@arm.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>Add a reference to ucounts for each cred</title>
<updated>2021-07-14T14:55:48Z</updated>
<author>
<name>Alexey Gladkov</name>
<email>legion@kernel.org</email>
</author>
<published>2021-04-22T12:27:09Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b2c4d9a33cc2dec7466f97eba2c4dd571ad798a5'/>
<id>urn:sha1:b2c4d9a33cc2dec7466f97eba2c4dd571ad798a5</id>
<content type='text'>
[ Upstream commit 905ae01c4ae2ae3df05bb141801b1db4b7d83c61 ]

For RLIMIT_NPROC and some other rlimits the user_struct that holds the
global limit is kept alive for the lifetime of a process by keeping it
in struct cred. Adding a pointer to ucounts in the struct cred will
allow to track RLIMIT_NPROC not only for user in the system, but for
user in the user_namespace.

Updating ucounts may require memory allocation which may fail. So, we
cannot change cred.ucounts in the commit_creds() because this function
cannot fail and it should always return 0. For this reason, we modify
cred.ucounts before calling the commit_creds().

Changelog

v6:
* Fix null-ptr-deref in is_ucounts_overlimit() detected by trinity. This
  error was caused by the fact that cred_alloc_blank() left the ucounts
  pointer empty.

Reported-by: kernel test robot &lt;oliver.sang@intel.com&gt;
Signed-off-by: Alexey Gladkov &lt;legion@kernel.org&gt;
Link: https://lkml.kernel.org/r/b37aaef28d8b9b0d757e07ba6dd27281bbe39259.1619094428.git.legion@kernel.org
Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>mm/fork: clear PASID for new mm</title>
<updated>2021-03-30T12:31:52Z</updated>
<author>
<name>Fenghua Yu</name>
<email>fenghua.yu@intel.com</email>
</author>
<published>2021-03-13T05:07:15Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7077d5e7f07439a45d2b645ba1ed4ca67592a835'/>
<id>urn:sha1:7077d5e7f07439a45d2b645ba1ed4ca67592a835</id>
<content type='text'>
[ Upstream commit 82e69a121be4b1597ce758534816a8ee04c8b761 ]

When a new mm is created, its PASID should be cleared, i.e.  the PASID is
initialized to its init state 0 on both ARM and X86.

This patch was part of the series introducing mm-&gt;pasid, but got lost
along the way [1].  It still makes sense to have it, because each address
space has a different PASID.  And the IOMMU code in
iommu_sva_alloc_pasid() expects the pasid field of a new mm struct to be
cleared.

[1] https://lore.kernel.org/linux-iommu/YDgh53AcQHT+T3L0@otcwcpicx3.sc.intel.com/

Link: https://lkml.kernel.org/r/20210302103837.2562625-1-jean-philippe@linaro.org
Signed-off-by: Fenghua Yu &lt;fenghua.yu@intel.com&gt;
Signed-off-by: Jean-Philippe Brucker &lt;jean-philippe@linaro.org&gt;
Reviewed-by: Tony Luck &lt;tony.luck@intel.com&gt;
Cc: Jacob Pan &lt;jacob.jun.pan@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>exec: Transform exec_update_mutex into a rw_semaphore</title>
<updated>2021-01-09T12:46:24Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2020-12-03T20:12:00Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ab7709b551de24e7bebf44946120e6740b1e28db'/>
<id>urn:sha1:ab7709b551de24e7bebf44946120e6740b1e28db</id>
<content type='text'>
[ Upstream commit f7cfd871ae0c5008d94b6f66834e7845caa93c15 ]

Recently syzbot reported[0] that there is a deadlock amongst the users
of exec_update_mutex.  The problematic lock ordering found by lockdep
was:

   perf_event_open  (exec_update_mutex -&gt; ovl_i_mutex)
   chown            (ovl_i_mutex       -&gt; sb_writes)
   sendfile         (sb_writes         -&gt; p-&gt;lock)
     by reading from a proc file and writing to overlayfs
   proc_pid_syscall (p-&gt;lock           -&gt; exec_update_mutex)

While looking at possible solutions it occured to me that all of the
users and possible users involved only wanted to state of the given
process to remain the same.  They are all readers.  The only writer is
exec.

There is no reason for readers to block on each other.  So fix
this deadlock by transforming exec_update_mutex into a rw_semaphore
named exec_update_lock that only exec takes for writing.

Cc: Jann Horn &lt;jannh@google.com&gt;
Cc: Vasiliy Kulikov &lt;segoon@openwall.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Bernd Edlinger &lt;bernd.edlinger@hotmail.de&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Christopher Yeoh &lt;cyeoh@au1.ibm.com&gt;
Cc: Cyrill Gorcunov &lt;gorcunov@gmail.com&gt;
Cc: Sargun Dhillon &lt;sargun@sargun.me&gt;
Cc: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@kernel.org&gt;
Fixes: eea9673250db ("exec: Add exec_update_mutex to replace cred_guard_mutex")
[0] https://lkml.kernel.org/r/00000000000063640c05ade8e3de@google.com
Reported-by: syzbot+db9cdf3dd1f64252c6ef@syzkaller.appspotmail.com
Link: https://lkml.kernel.org/r/87ft4mbqen.fsf@x220.int.ebiederm.org
Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>mm/gup: prevent gup_fast from racing with COW during fork</title>
<updated>2020-12-30T10:53:54Z</updated>
<author>
<name>Jason Gunthorpe</name>
<email>jgg@nvidia.com</email>
</author>
<published>2020-12-15T03:05:44Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=537946556cc971c9f4405cd76c2434fb78b0e992'/>
<id>urn:sha1:537946556cc971c9f4405cd76c2434fb78b0e992</id>
<content type='text'>
[ Upstream commit 57efa1fe5957694fa541c9062de0a127f0b9acb0 ]

Since commit 70e806e4e645 ("mm: Do early cow for pinned pages during
fork() for ptes") pages under a FOLL_PIN will not be write protected
during COW for fork.  This means that pages returned from
pin_user_pages(FOLL_WRITE) should not become write protected while the pin
is active.

However, there is a small race where get_user_pages_fast(FOLL_PIN) can
establish a FOLL_PIN at the same time copy_present_page() is write
protecting it:

        CPU 0                             CPU 1
   get_user_pages_fast()
    internal_get_user_pages_fast()
                                       copy_page_range()
                                         pte_alloc_map_lock()
                                           copy_present_page()
                                             atomic_read(has_pinned) == 0
					     page_maybe_dma_pinned() == false
     atomic_set(has_pinned, 1);
     gup_pgd_range()
      gup_pte_range()
       pte_t pte = gup_get_pte(ptep)
       pte_access_permitted(pte)
       try_grab_compound_head()
                                             pte = pte_wrprotect(pte)
	                                     set_pte_at();
                                         pte_unmap_unlock()
      // GUP now returns with a write protected page

The first attempt to resolve this by using the write protect caused
problems (and was missing a barrrier), see commit f3c64eda3e50 ("mm: avoid
early COW write protect games during fork()")

Instead wrap copy_p4d_range() with the write side of a seqcount and check
the read side around gup_pgd_range().  If there is a collision then
get_user_pages_fast() fails and falls back to slow GUP.

Slow GUP is safe against this race because copy_page_range() is only
called while holding the exclusive side of the mmap_lock on the src
mm_struct.

[akpm@linux-foundation.org: coding style fixes]
  Link: https://lore.kernel.org/r/CAHk-=wi=iCnYCARbPGjkVJu9eyYeZ13N64tZYLdOB8CP5Q_PLw@mail.gmail.com

Link: https://lkml.kernel.org/r/2-v4-908497cf359a+4782-gup_fork_jgg@nvidia.com
Fixes: f3c64eda3e50 ("mm: avoid early COW write protect games during fork()")
Signed-off-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Suggested-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Reviewed-by: John Hubbard &lt;jhubbard@nvidia.com&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Peter Xu &lt;peterx@redhat.com&gt;
Acked-by: "Ahmed S. Darwish" &lt;a.darwish@linutronix.de&gt;	[seqcount_t parts]
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: "Aneesh Kumar K.V" &lt;aneesh.kumar@linux.ibm.com&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Jann Horn &lt;jannh@google.com&gt;
Cc: Kirill Shutemov &lt;kirill@shutemov.name&gt;
Cc: Kirill Tkhai &lt;ktkhai@virtuozzo.com&gt;
Cc: Leon Romanovsky &lt;leonro@nvidia.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>fork: fix copy_process(CLONE_PARENT) race with the exiting -&gt;real_parent</title>
<updated>2020-11-08T19:18:39Z</updated>
<author>
<name>Eddy Wu</name>
<email>itseddy0402@gmail.com</email>
</author>
<published>2020-11-07T06:47:22Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b4e00444cab4c3f3fec876dc0cccc8cbb0d1a948'/>
<id>urn:sha1:b4e00444cab4c3f3fec876dc0cccc8cbb0d1a948</id>
<content type='text'>
current-&gt;group_leader-&gt;exit_signal may change during copy_process() if
current-&gt;real_parent exits.

Move the assignment inside tasklist_lock to avoid the race.

Signed-off-by: Eddy Wu &lt;eddy_wu@trendmicro.com&gt;
Acked-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
