<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/linux/sched, branch v5.18.6</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.18.6</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.18.6'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2022-06-09T08:30:00Z</updated>
<entry>
<title>signal: Deliver SIGTRAP on perf event asynchronously if blocked</title>
<updated>2022-06-09T08:30:00Z</updated>
<author>
<name>Marco Elver</name>
<email>elver@google.com</email>
</author>
<published>2022-04-04T11:12:04Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3acee9cb8cb8f9072b7f7f67975c864e21f7e50f'/>
<id>urn:sha1:3acee9cb8cb8f9072b7f7f67975c864e21f7e50f</id>
<content type='text'>
[ Upstream commit 78ed93d72ded679e3caf0758357209887bda885f ]

With SIGTRAP on perf events, we have encountered termination of
processes due to user space attempting to block delivery of SIGTRAP.
Consider this case:

    &lt;set up SIGTRAP on a perf event&gt;
    ...
    sigset_t s;
    sigemptyset(&amp;s);
    sigaddset(&amp;s, SIGTRAP | &lt;and others&gt;);
    sigprocmask(SIG_BLOCK, &amp;s, ...);
    ...
    &lt;perf event triggers&gt;

When the perf event triggers, while SIGTRAP is blocked, force_sig_perf()
will force the signal, but revert back to the default handler, thus
terminating the task.

This makes sense for error conditions, but not so much for explicitly
requested monitoring. However, the expectation is still that signals
generated by perf events are synchronous, which will no longer be the
case if the signal is blocked and delivered later.

To give user space the ability to clearly distinguish synchronous from
asynchronous signals, introduce siginfo_t::si_perf_flags and
TRAP_PERF_FLAG_ASYNC (opted for flags in case more binary information is
required in future).

The resolution to the problem is then to (a) no longer force the signal
(avoiding the terminations), but (b) tell user space via si_perf_flags
if the signal was synchronous or not, so that such signals can be
handled differently (e.g. let user space decide to ignore or consider
the data imprecise).

The alternative of making the kernel ignore SIGTRAP on perf events if
the signal is blocked may work for some usecases, but likely causes
issues in others that then have to revert back to interception of
sigprocmask() (which we want to avoid). [ A concrete example: when using
breakpoint perf events to track data-flow, in a region of code where
signals are blocked, data-flow can no longer be tracked accurately.
When a relevant asynchronous signal is received after unblocking the
signal, the data-flow tracking logic needs to know its state is
imprecise. ]

Fixes: 97ba62b27867 ("perf: Add support for SIGTRAP on perf events")
Reported-by: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Signed-off-by: Marco Elver &lt;elver@google.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Geert Uytterhoeven &lt;geert@linux-m68k.org&gt;
Tested-by: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Link: https://lore.kernel.org/r/20220404111204.935357-1-elver@google.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>kthread: Don't allocate kthread_struct for init and umh</title>
<updated>2022-06-09T08:29:30Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2022-04-11T16:40:14Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9538e524714c1f35fd918cf241bd844374188115'/>
<id>urn:sha1:9538e524714c1f35fd918cf241bd844374188115</id>
<content type='text'>
commit 343f4c49f2438d8920f1f76fa823ee59b91f02e4 upstream.

If kthread_is_per_cpu runs concurrently with free_kthread_struct the
kthread_struct that was just freed may be read from.

This bug was introduced by commit 40966e316f86 ("kthread: Ensure
struct kthread is present for all kthreads").  When kthread_struct
started to be allocated for all tasks that have PF_KTHREAD set.  This
in turn required the kthread_struct to be freed in kernel_execve and
violated the assumption that kthread_struct will have the same
lifetime as the task.

Looking a bit deeper this only applies to callers of kernel_execve
which is just the init process and the user mode helper processes.
These processes really don't want to be kernel threads but are for
historical reasons.  Mostly that copy_thread does not know how to take
a kernel mode function to the process with for processes without
PF_KTHREAD or PF_IO_WORKER set.

Solve this by not allocating kthread_struct for the init process and
the user mode helper processes.

This is done by adding a kthread member to struct kernel_clone_args.
Setting kthread in fork_idle and kernel_thread.  Adding
user_mode_thread that works like kernel_thread except it does not set
kthread.  In fork only allocating the kthread_struct if .kthread is set.

I have looked at kernel/kthread.c and since commit 40966e316f86
("kthread: Ensure struct kthread is present for all kthreads") there
have been no assumptions added that to_kthread or __to_kthread will
not return NULL.

There are a few callers of to_kthread or __to_kthread that assume a
non-NULL struct kthread pointer will be returned.  These functions are
kthread_data(), kthread_parmme(), kthread_exit(), kthread(),
kthread_park(), kthread_unpark(), kthread_stop().  All of those functions
can reasonably expected to be called when it is know that a task is a
kthread so that assumption seems reasonable.

Cc: stable@vger.kernel.org
Fixes: 40966e316f86 ("kthread: Ensure struct kthread is present for all kthreads")
Reported-by: Максим Кутявин &lt;maximkabox13@gmail.com&gt;
Link: https://lkml.kernel.org/r/20220506141512.516114-1-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mm, hugetlb: allow for "high" userspace addresses</title>
<updated>2022-04-22T03:01:09Z</updated>
<author>
<name>Christophe Leroy</name>
<email>christophe.leroy@csgroup.eu</email>
</author>
<published>2022-04-21T23:35:46Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5f24d5a579d1eace79d505b148808a850b417d4c'/>
<id>urn:sha1:5f24d5a579d1eace79d505b148808a850b417d4c</id>
<content type='text'>
This is a fix for commit f6795053dac8 ("mm: mmap: Allow for "high"
userspace addresses") for hugetlb.

This patch adds support for "high" userspace addresses that are
optionally supported on the system and have to be requested via a hint
mechanism ("high" addr parameter to mmap).

Architectures such as powerpc and x86 achieve this by making changes to
their architectural versions of hugetlb_get_unmapped_area() function.
However, arm64 uses the generic version of that function.

So take into account arch_get_mmap_base() and arch_get_mmap_end() in
hugetlb_get_unmapped_area().  To allow that, move those two macros out
of mm/mmap.c into include/linux/sched/mm.h

If these macros are not defined in architectural code then they default
to (TASK_SIZE) and (base) so should not introduce any behavioural
changes to architectures that do not define them.

For the time being, only ARM64 is affected by this change.

Catalin (ARM64) said
 "We should have fixed hugetlb_get_unmapped_area() as well when we added
  support for 52-bit VA. The reason for commit f6795053dac8 was to
  prevent normal mmap() from returning addresses above 48-bit by default
  as some user-space had hard assumptions about this.

  It's a slight ABI change if you do this for hugetlb_get_unmapped_area()
  but I doubt anyone would notice. It's more likely that the current
  behaviour would cause issues, so I'd rather have them consistent.

  Basically when arm64 gained support for 52-bit addresses we did not
  want user-space calling mmap() to suddenly get such high addresses,
  otherwise we could have inadvertently broken some programs (similar
  behaviour to x86 here). Hence we added commit f6795053dac8. But we
  missed hugetlbfs which could still get such high mmap() addresses. So
  in theory that's a potential regression that should have bee addressed
  at the same time as commit f6795053dac8 (and before arm64 enabled
  52-bit addresses)"

Link: https://lkml.kernel.org/r/ab847b6edb197bffdfe189e70fb4ac76bfe79e0d.1650033747.git.christophe.leroy@csgroup.eu
Fixes: f6795053dac8 ("mm: mmap: Allow for "high" userspace addresses")
Signed-off-by: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Reviewed-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Steve Capper &lt;steve.capper@arm.com&gt;
Cc: Will Deacon &lt;will.deacon@arm.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[5.0.x]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'ptrace-cleanups-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace</title>
<updated>2022-03-29T00:29:53Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-03-29T00:29:53Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1930a6e739c4b4a654a69164dbe39e554d228915'/>
<id>urn:sha1:1930a6e739c4b4a654a69164dbe39e554d228915</id>
<content type='text'>
Pull ptrace cleanups from Eric Biederman:
 "This set of changes removes tracehook.h, moves modification of all of
  the ptrace fields inside of siglock to remove races, adds a missing
  permission check to ptrace.c

  The removal of tracehook.h is quite significant as it has been a major
  source of confusion in recent years. Much of that confusion was around
  task_work and TIF_NOTIFY_SIGNAL (which I have now decoupled making the
  semantics clearer).

  For people who don't know tracehook.h is a vestiage of an attempt to
  implement uprobes like functionality that was never fully merged, and
  was later superseeded by uprobes when uprobes was merged. For many
  years now we have been removing what tracehook functionaly a little
  bit at a time. To the point where anything left in tracehook.h was
  some weird strange thing that was difficult to understand"

* tag 'ptrace-cleanups-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  ptrace: Remove duplicated include in ptrace.c
  ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE
  ptrace: Return the signal to continue with from ptrace_stop
  ptrace: Move setting/clearing ptrace_message into ptrace_stop
  tracehook: Remove tracehook.h
  resume_user_mode: Move to resume_user_mode.h
  resume_user_mode: Remove #ifdef TIF_NOTIFY_RESUME in set_notify_resume
  signal: Move set_notify_signal and clear_notify_signal into sched/signal.h
  task_work: Decouple TIF_NOTIFY_SIGNAL and task_work
  task_work: Call tracehook_notify_signal from get_signal on all architectures
  task_work: Introduce task_work_pending
  task_work: Remove unnecessary include from posix_timers.h
  ptrace: Remove tracehook_signal_handler
  ptrace: Remove arch_syscall_{enter,exit}_tracehook
  ptrace: Create ptrace_report_syscall_{entry,exit} in ptrace.h
  ptrace/arm: Rename tracehook_report_syscall report_syscall
  ptrace: Move ptrace_report_syscall into ptrace.h
</content>
</entry>
<entry>
<title>Merge tag 'x86_core_for_5.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2022-03-27T17:17:23Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-03-27T17:17:23Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7001052160d172f6de06adeffde24dde9935ece8'/>
<id>urn:sha1:7001052160d172f6de06adeffde24dde9935ece8</id>
<content type='text'>
Pull x86 CET-IBT (Control-Flow-Integrity) support from Peter Zijlstra:
 "Add support for Intel CET-IBT, available since Tigerlake (11th gen),
  which is a coarse grained, hardware based, forward edge
  Control-Flow-Integrity mechanism where any indirect CALL/JMP must
  target an ENDBR instruction or suffer #CP.

  Additionally, since Alderlake (12th gen)/Sapphire-Rapids, speculation
  is limited to 2 instructions (and typically fewer) on branch targets
  not starting with ENDBR. CET-IBT also limits speculation of the next
  sequential instruction after the indirect CALL/JMP [1].

  CET-IBT is fundamentally incompatible with retpolines, but provides,
  as described above, speculation limits itself"

[1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html

* tag 'x86_core_for_5.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits)
  kvm/emulate: Fix SETcc emulation for ENDBR
  x86/Kconfig: Only allow CONFIG_X86_KERNEL_IBT with ld.lld &gt;= 14.0.0
  x86/Kconfig: Only enable CONFIG_CC_HAS_IBT for clang &gt;= 14.0.0
  kbuild: Fixup the IBT kbuild changes
  x86/Kconfig: Do not allow CONFIG_X86_X32_ABI=y with llvm-objcopy
  x86: Remove toolchain check for X32 ABI capability
  x86/alternative: Use .ibt_endbr_seal to seal indirect calls
  objtool: Find unused ENDBR instructions
  objtool: Validate IBT assumptions
  objtool: Add IBT/ENDBR decoding
  objtool: Read the NOENDBR annotation
  x86: Annotate idtentry_df()
  x86,objtool: Move the ASM_REACHABLE annotation to objtool.h
  x86: Annotate call_on_stack()
  objtool: Rework ASM_REACHABLE
  x86: Mark __invalid_creds() __noreturn
  exit: Mark do_group_exit() __noreturn
  x86: Mark stop_this_cpu() __noreturn
  objtool: Ignore extra-symbol code
  objtool: Rename --duplicate to --lto
  ...
</content>
</entry>
<entry>
<title>Merge branch 'akpm' (patches from Andrew)</title>
<updated>2022-03-22T23:11:53Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-03-22T23:11:53Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3bf03b9a0839c9fb06927ae53ebd0f960b19d408'/>
<id>urn:sha1:3bf03b9a0839c9fb06927ae53ebd0f960b19d408</id>
<content type='text'>
Merge updates from Andrew Morton:

 - A few misc subsystems: kthread, scripts, ntfs, ocfs2, block, and vfs

 - Most the MM patches which precede the patches in Willy's tree: kasan,
   pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap,
   sparsemem, vmalloc, pagealloc, memory-failure, mlock, hugetlb,
   userfaultfd, vmscan, compaction, mempolicy, oom-kill, migration, thp,
   cma, autonuma, psi, ksm, page-poison, madvise, memory-hotplug, rmap,
   zswap, uaccess, ioremap, highmem, cleanups, kfence, hmm, and damon.

* emailed patches from Andrew Morton &lt;akpm@linux-foundation.org&gt;: (227 commits)
  mm/damon/sysfs: remove repeat container_of() in damon_sysfs_kdamond_release()
  Docs/ABI/testing: add DAMON sysfs interface ABI document
  Docs/admin-guide/mm/damon/usage: document DAMON sysfs interface
  selftests/damon: add a test for DAMON sysfs interface
  mm/damon/sysfs: support DAMOS stats
  mm/damon/sysfs: support DAMOS watermarks
  mm/damon/sysfs: support schemes prioritization
  mm/damon/sysfs: support DAMOS quotas
  mm/damon/sysfs: support DAMON-based Operation Schemes
  mm/damon/sysfs: support the physical address space monitoring
  mm/damon/sysfs: link DAMON for virtual address spaces monitoring
  mm/damon: implement a minimal stub for sysfs-based DAMON interface
  mm/damon/core: add number of each enum type values
  mm/damon/core: allow non-exclusive DAMON start/stop
  Docs/damon: update outdated term 'regions update interval'
  Docs/vm/damon/design: update DAMON-Idle Page Tracking interference handling
  Docs/vm/damon: call low level monitoring primitives the operations
  mm/damon: remove unnecessary CONFIG_DAMON option
  mm/damon/paddr,vaddr: remove damon_{p,v}a_{target_valid,set_operations}()
  mm/damon/dbgfs-test: fix is_target_id() change
  ...
</content>
</entry>
<entry>
<title>NUMA balancing: optimize page placement for memory tiering system</title>
<updated>2022-03-22T22:57:09Z</updated>
<author>
<name>Huang Ying</name>
<email>ying.huang@intel.com</email>
</author>
<published>2022-03-22T21:46:23Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c574bbe917036c8968b984c82c7b13194fe5ce98'/>
<id>urn:sha1:c574bbe917036c8968b984c82c7b13194fe5ce98</id>
<content type='text'>
With the advent of various new memory types, some machines will have
multiple types of memory, e.g.  DRAM and PMEM (persistent memory).  The
memory subsystem of these machines can be called memory tiering system,
because the performance of the different types of memory are usually
different.

In such system, because of the memory accessing pattern changing etc,
some pages in the slow memory may become hot globally.  So in this
patch, the NUMA balancing mechanism is enhanced to optimize the page
placement among the different memory types according to hot/cold
dynamically.

In a typical memory tiering system, there are CPUs, fast memory and slow
memory in each physical NUMA node.  The CPUs and the fast memory will be
put in one logical node (called fast memory node), while the slow memory
will be put in another (faked) logical node (called slow memory node).
That is, the fast memory is regarded as local while the slow memory is
regarded as remote.  So it's possible for the recently accessed pages in
the slow memory node to be promoted to the fast memory node via the
existing NUMA balancing mechanism.

The original NUMA balancing mechanism will stop to migrate pages if the
free memory of the target node becomes below the high watermark.  This
is a reasonable policy if there's only one memory type.  But this makes
the original NUMA balancing mechanism almost do not work to optimize
page placement among different memory types.  Details are as follows.

It's the common cases that the working-set size of the workload is
larger than the size of the fast memory nodes.  Otherwise, it's
unnecessary to use the slow memory at all.  So, there are almost always
no enough free pages in the fast memory nodes, so that the globally hot
pages in the slow memory node cannot be promoted to the fast memory
node.  To solve the issue, we have 2 choices as follows,

a. Ignore the free pages watermark checking when promoting hot pages
   from the slow memory node to the fast memory node.  This will
   create some memory pressure in the fast memory node, thus trigger
   the memory reclaiming.  So that, the cold pages in the fast memory
   node will be demoted to the slow memory node.

b. Define a new watermark called wmark_promo which is higher than
   wmark_high, and have kswapd reclaiming pages until free pages reach
   such watermark.  The scenario is as follows: when we want to promote
   hot-pages from a slow memory to a fast memory, but fast memory's free
   pages would go lower than high watermark with such promotion, we wake
   up kswapd with wmark_promo watermark in order to demote cold pages and
   free us up some space.  So, next time we want to promote hot-pages we
   might have a chance of doing so.

The choice "a" may create high memory pressure in the fast memory node.
If the memory pressure of the workload is high, the memory pressure
may become so high that the memory allocation latency of the workload
is influenced, e.g.  the direct reclaiming may be triggered.

The choice "b" works much better at this aspect.  If the memory
pressure of the workload is high, the hot pages promotion will stop
earlier because its allocation watermark is higher than that of the
normal memory allocation.  So in this patch, choice "b" is implemented.
A new zone watermark (WMARK_PROMO) is added.  Which is larger than the
high watermark and can be controlled via watermark_scale_factor.

In addition to the original page placement optimization among sockets,
the NUMA balancing mechanism is extended to be used to optimize page
placement according to hot/cold among different memory types.  So the
sysctl user space interface (numa_balancing) is extended in a backward
compatible way as follow, so that the users can enable/disable these
functionality individually.

The sysctl is converted from a Boolean value to a bits field.  The
definition of the flags is,

- 0: NUMA_BALANCING_DISABLED
- 1: NUMA_BALANCING_NORMAL
- 2: NUMA_BALANCING_MEMORY_TIERING

We have tested the patch with the pmbench memory accessing benchmark
with the 80:20 read/write ratio and the Gauss access address
distribution on a 2 socket Intel server with Optane DC Persistent
Memory Model.  The test results shows that the pmbench score can
improve up to 95.9%.

Thanks Andrew Morton to help fix the document format error.

Link: https://lkml.kernel.org/r/20220221084529.1052339-3-ying.huang@intel.com
Signed-off-by: "Huang, Ying" &lt;ying.huang@intel.com&gt;
Tested-by: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Reviewed-by: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Reviewed-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Reviewed-by: Yang Shi &lt;shy828301@gmail.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Rik van Riel &lt;riel@surriel.com&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Zi Yan &lt;ziy@nvidia.com&gt;
Cc: Wei Xu &lt;weixugc@google.com&gt;
Cc: Shakeel Butt &lt;shakeelb@google.com&gt;
Cc: zhongjiang-ali &lt;zhongjiang-ali@linux.alibaba.com&gt;
Cc: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Cc: Feng Tang &lt;feng.tang@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'sched-core-2022-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2022-03-22T21:39:12Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-03-22T21:39:12Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3fe2f7446f1e029b220f7f650df6d138f91651f2'/>
<id>urn:sha1:3fe2f7446f1e029b220f7f650df6d138f91651f2</id>
<content type='text'>
Pull scheduler updates from Ingo Molnar:

 - Cleanups for SCHED_DEADLINE

 - Tracing updates/fixes

 - CPU Accounting fixes

 - First wave of changes to optimize the overhead of the scheduler
   build, from the fast-headers tree - including placeholder *_api.h
   headers for later header split-ups.

 - Preempt-dynamic using static_branch() for ARM64

 - Isolation housekeeping mask rework; preperatory for further changes

 - NUMA-balancing: deal with CPU-less nodes

 - NUMA-balancing: tune systems that have multiple LLC cache domains per
   node (eg. AMD)

 - Updates to RSEQ UAPI in preparation for glibc usage

 - Lots of RSEQ/selftests, for same

 - Add Suren as PSI co-maintainer

* tag 'sched-core-2022-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (81 commits)
  sched/headers: ARM needs asm/paravirt_api_clock.h too
  sched/numa: Fix boot crash on arm64 systems
  headers/prep: Fix header to build standalone: &lt;linux/psi.h&gt;
  sched/headers: Only include &lt;linux/entry-common.h&gt; when CONFIG_GENERIC_ENTRY=y
  cgroup: Fix suspicious rcu_dereference_check() usage warning
  sched/preempt: Tell about PREEMPT_DYNAMIC on kernel headers
  sched/topology: Remove redundant variable and fix incorrect type in build_sched_domains
  sched/deadline,rt: Remove unused parameter from pick_next_[rt|dl]_entity()
  sched/deadline,rt: Remove unused functions for !CONFIG_SMP
  sched/deadline: Use __node_2_[pdl|dle]() and rb_first_cached() consistently
  sched/deadline: Merge dl_task_can_attach() and dl_cpu_busy()
  sched/deadline: Move bandwidth mgmt and reclaim functions into sched class source file
  sched/deadline: Remove unused def_dl_bandwidth
  sched/tracing: Report TASK_RTLOCK_WAIT tasks as TASK_UNINTERRUPTIBLE
  sched/tracing: Don't re-read p-&gt;state when emitting sched_switch event
  sched/rt: Plug rt_mutex_setprio() vs push_rt_task() race
  sched/cpuacct: Remove redundant RCU read lock
  sched/cpuacct: Optimize away RCU read lock
  sched/cpuacct: Fix charge percpu cpuusage
  sched/headers: Reorganize, clean up and optimize kernel/sched/sched.h dependencies
  ...
</content>
</entry>
<entry>
<title>Merge tag 'core-core-2022-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2022-03-21T19:37:33Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-03-21T19:37:33Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=bba90e096468a3185649e9ab75873722ae4d6f96'/>
<id>urn:sha1:bba90e096468a3185649e9ab75873722ae4d6f96</id>
<content type='text'>
Pull core process handling RT latency updates from Thomas Gleixner:

 - Reduce the amount of work to release a task stack in context switch.
   There is no real reason to do cgroup accounting and memory freeing in
   this performance sensitive context.

   Aside of this the invoked functions cannot be called from this
   preemption disabled context on PREEMPT_RT enabled kernels. Solve this
   by moving the accounting into do_exit() and delaying the freeing of
   the stack unless the vmap stack can be cached.

 - Provide a mechanism to delay raising signals from atomic context on
   PREEMPT_RT enabled kernels as sighand::lock cannot be acquired. Store
   the information in the task struct and raise it in the exit path.

* tag 'core-core-2022-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  signal, x86: Delay calling signals in atomic on RT enabled kernels
  fork: Use IS_ENABLED() in account_kernel_stack()
  fork: Only cache the VMAP stack in finish_task_switch()
  fork: Move task stack accounting to do_exit()
  fork: Move memcg_charge_kernel_stack() into CONFIG_VMAP_STACK
  fork: Don't assign the stack pointer in dup_task_struct()
  fork, IA64: Provide alloc_thread_stack_node() for IA64
  fork: Duplicate task_struct before stack allocation
  fork: Redo ifdefs around task stack handling
</content>
</entry>
<entry>
<title>Merge tag 'x86-pasid-2022-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2022-03-21T19:28:13Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-03-21T19:28:13Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3fd33273a4675e819544ccbbf8d769e14672666b'/>
<id>urn:sha1:3fd33273a4675e819544ccbbf8d769e14672666b</id>
<content type='text'>
Pull x86 PASID support from Thomas Gleixner:
 "Reenable ENQCMD/PASID support:

   - Simplify the PASID handling to allocate the PASID once, associate
     it to the mm of a process and free it on mm_exit().

     The previous attempt of refcounted PASIDs and dynamic
     alloc()/free() turned out to be error prone and too complex. The
     PASID space is 20bits, so the case of resource exhaustion is a pure
     academic concern.

   - Populate the PASID MSR on demand via #GP to avoid racy updates via
     IPIs.

   - Reenable ENQCMD and let objtool check for the forbidden usage of
     ENQCMD in the kernel.

   - Update the documentation for Shared Virtual Addressing accordingly"

* tag 'x86-pasid-2022-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Documentation/x86: Update documentation for SVA (Shared Virtual Addressing)
  tools/objtool: Check for use of the ENQCMD instruction in the kernel
  x86/cpufeatures: Re-enable ENQCMD
  x86/traps: Demand-populate PASID MSR via #GP
  sched: Define and initialize a flag to identify valid PASID in the task
  x86/fpu: Clear PASID when copying fpstate
  iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
  kernel/fork: Initialize mm's PASID
  iommu/ioasid: Introduce a helper to check for valid PASIDs
  mm: Change CONFIG option for mm-&gt;pasid field
  iommu/sva: Rename CONFIG_IOMMU_SVA_LIB to CONFIG_IOMMU_SVA
</content>
</entry>
</feed>
