user/sven/linux.git/include/linux/cpumask.h, branch v5.13.19

Merge tag 'x86-mm-2021-04-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

2021-04-29T18:41:43Z

Pull x86 tlb updates from Ingo Molnar:
 "The x86 MM changes in this cycle were:

   - Implement concurrent TLB flushes, which overlaps the local TLB flush with the remote TLB flush. In testing this improved sysbench performance measurably by a couple of percentage points, especially if TLB-heavy security mitigations are active.

   - Further micro-optimizations to improve the performance of TLB flushes"

* tag 'x86-mm-2021-04-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  smp: Micro-optimize smp_call_function_many_cond()
  smp: Inline on_each_cpu_cond() and on_each_cpu()
  x86/mm/tlb: Remove unnecessary uses of the inline keyword
  cpumask: Mark functions as pure
  x86/mm/tlb: Do not make is_lazy dirty for no reason
  x86/mm/tlb: Privatize cpu_tlbstate
  x86/mm/tlb: Flush remote and local TLBs concurrently
  x86/mm/tlb: Open-code on_each_cpu_cond_mask() for tlb_is_not_lazy()
  x86/mm/tlb: Unify flush_tlb_func_local() and flush_tlb_func_remote()
  smp: Run functions concurrently in smp_call_function_many_cond()

cpumask: Introduce DYING mask

2021-04-16T15:06:32Z

Introduce a cpumask that indicates, for each CPU, which direction CPU hotplug is currently going. Notably, it tracks rollbacks: e.g. when an up fails and we roll back down, the mask accurately reflects the direction.

Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Valentin Schneider
Link: https://lkml.kernel.org/r/20210310150109.151441252@infradead.org

cpumask: Make cpu_{online,possible,present,active}() inline

2021-04-16T15:06:32Z

Prepare for the addition of another mask. Primarily a code movement to avoid having to create more #ifdefs, but while there, convert everything that takes an argument to an inline function.

Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Valentin Schneider
Link: https://lkml.kernel.org/r/20210310150109.045447765@infradead.org

cpumask: Mark functions as pure

2021-03-06T11:59:10Z

cpumask_next_and() and cpumask_any_but() are pure, and marking them as such seems to generate different and presumably better code for native_flush_tlb_multi().

Signed-off-by: Nadav Amit
Signed-off-by: Ingo Molnar
Reviewed-by: Dave Hansen
Link: https://lore.kernel.org/r/20210220231712.2475218-8-namit@vmware.com
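As a rough illustration of what marking a function pure means, here is a user-space sketch. It assumes GCC or Clang (for `__attribute__((pure))`) and models a cpumask as a single 64-bit word; `toy_next_and()` and `toy_weight_and()` are hypothetical names, not the kernel's helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NR_CPUS 64u

/*
 * pure: no side effects, and the result depends only on the arguments
 * and on memory the function reads. The compiler may therefore merge
 * repeated calls whose inputs have not changed.
 */
static __attribute__((pure)) unsigned int
toy_next_and(int n, const uint64_t *m1, const uint64_t *m2)
{
    uint64_t both = *m1 & *m2;
    unsigned int cpu = n < 0 ? 0u : (unsigned int)n + 1u;

    /* Return the first CPU after n that is set in both masks. */
    for (; cpu < TOY_NR_CPUS; cpu++)
        if ((both >> cpu) & 1)
            return cpu;
    return TOY_NR_CPUS; /* no further CPU in the intersection */
}

/* Count CPUs in the intersection by walking it with the pure helper. */
static unsigned int toy_weight_and(const uint64_t *m1, const uint64_t *m2)
{
    unsigned int count = 0;
    unsigned int cpu;

    assert(m1 != NULL && m2 != NULL);
    for (cpu = toy_next_and(-1, m1, m2); cpu < TOY_NR_CPUS;
         cpu = toy_next_and((int)cpu, m1, m2))
        count++;
    return count;
}
```

Whether the attribute changes the generated code depends on the compiler and the call site; the commit message itself only says the result is "presumably better".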

sched,rt: Use cpumask_any*_distribute()

2020-11-10T17:39:00Z

Replace a bunch of cpumask_any*() instances with cpumask_any*_distribute(). By injecting this little bit of randomness into CPU selection, we reduce the chance that two competing balance operations working off the same lowest_mask pick the same CPU.

Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Valentin Schneider
Reviewed-by: Daniel Bristot de Oliveira
Link: https://lkml.kernel.org/r/20201023102347.190759694@infradead.org

sched/core: Distribute tasks within affinity masks

2020-03-20T12:06:18Z

Currently, when updating the affinity of tasks via either cpusets.cpus or sched_setaffinity(), tasks not currently running within the newly specified mask will be arbitrarily assigned to the first CPU within the mask.

This (particularly in the case that we are restricting masks) can result in many tasks being assigned to the first CPUs of their new masks.

This:
 1) Can induce scheduling delays until the load-balancer has had a chance to spread them between their new CPUs.
 2) Can antagonize a poor load-balancer behavior where it has a difficult time recognizing that a cross-socket imbalance has been forced by an affinity mask.

This change adds a new cpumask interface to allow iterated calls to distribute within the intersection of the provided masks.

The cases that this mainly affects are:
 - modifying cpuset.cpus
 - when tasks join a cpuset
 - when modifying a task's affinity via sched_setaffinity(2)

Signed-off-by: Paul Turner
Signed-off-by: Josh Don
Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Qais Yousef
Tested-by: Qais Yousef
Link: https://lkml.kernel.org/r/20200311010113.136465-1-joshdon@google.com
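The idea of iterated calls that distribute within the intersection of two masks can be sketched in user space as follows. This is a toy model under stated assumptions, not the kernel implementation: a mask is one 64-bit word, the "previous pick" is a single global rather than per-CPU state, and all `toy_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NR_CPUS 64u

/* Last CPU handed out; a single global stands in for per-CPU state. */
static unsigned int toy_distribute_prev;

/* First set bit strictly after prev (prev == -1 means "from the start"). */
static unsigned int toy_next_cpu(int prev, uint64_t mask)
{
    unsigned int cpu;

    assert(prev >= -1);
    for (cpu = (unsigned int)(prev + 1); cpu < TOY_NR_CPUS; cpu++)
        if ((mask >> cpu) & 1)
            return cpu;
    return TOY_NR_CPUS;
}

/*
 * Pick a CPU from (mask1 & mask2), continuing after the previous pick
 * and wrapping around, so repeated calls rotate through the set instead
 * of always returning its first CPU.
 */
static unsigned int toy_any_and_distribute(uint64_t mask1, uint64_t mask2)
{
    uint64_t both = mask1 & mask2;
    unsigned int next = toy_next_cpu((int)toy_distribute_prev, both);

    if (next >= TOY_NR_CPUS)
        next = toy_next_cpu(-1, both); /* wrap to the first candidate */
    if (next < TOY_NR_CPUS)
        toy_distribute_prev = next;    /* remember only valid picks */
    return next;
}
```

With an intersection of CPUs {1, 2, 4}, successive calls return 1, 2, 4 and then 1 again; an empty intersection yields `TOY_NR_CPUS`.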

include/linux/cpumask.h: don't calculate length of the input string

2020-02-04T03:05:27Z

The new design of the inner bitmap_parse() makes it possible to avoid calculating the size of a null-terminated string.

Link: http://lkml.kernel.org/r/20200102043031.30357-8-yury.norov@gmail.com
Signed-off-by: Yury Norov
Reviewed-by: Andy Shevchenko
Cc: Amritha Nambiar
Cc: Arnaldo Carvalho de Melo
Cc: Chris Wilson
Cc: Kees Cook
Cc: Matthew Wilcox
Cc: Miklos Szeredi
Cc: Rasmus Villemoes
Cc: Steffen Klassert
Cc: "Tobin C . Harding"
Cc: Vineet Gupta
Cc: Will Deacon
Cc: Willem de Bruijn
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

cpumask: nicer for_each_cpumask_and() signature

2019-09-26T00:51:40Z

The mask arguments can be swapped without changing anything. Make the argument names reflect that:

	#define for_each_cpu_and(cpu, mask1, mask2)

Link: http://lkml.kernel.org/r/20190724183350.GA15041@avx2
Signed-off-by: Alexey Dobriyan
Reviewed-by: Andrew Morton
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
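A small user-space sketch shows why the two mask arguments are interchangeable: the loop walks the intersection, and AND is commutative. It assumes a 64-bit word per mask; `toy_for_each_cpu_and()`, `toy_next_and()` and `toy_collect()` are hypothetical stand-ins, not the kernel macro.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NR_CPUS 64u

/* First CPU after n that is set in both masks, or TOY_NR_CPUS. */
static unsigned int toy_next_and(int n, uint64_t mask1, uint64_t mask2)
{
    unsigned int cpu;

    assert(n >= -1);
    for (cpu = (unsigned int)(n + 1); cpu < TOY_NR_CPUS; cpu++)
        if ((mask1 >> cpu) & (mask2 >> cpu) & 1)
            return cpu;
    return TOY_NR_CPUS;
}

/* Iterate over every CPU present in both mask1 and mask2. */
#define toy_for_each_cpu_and(cpu, mask1, mask2)                 \
    for ((cpu) = toy_next_and(-1, (mask1), (mask2));            \
         (cpu) < TOY_NR_CPUS;                                   \
         (cpu) = toy_next_and((int)(cpu), (mask1), (mask2)))

/* Collect the visited CPUs back into a mask, to compare iteration results. */
static uint64_t toy_collect(uint64_t mask1, uint64_t mask2)
{
    unsigned int cpu;
    uint64_t seen = 0;

    toy_for_each_cpu_and(cpu, mask1, mask2)
        seen |= (uint64_t)1 << cpu;
    return seen;
}
```

Calling `toy_collect(a, b)` and `toy_collect(b, a)` visits the same CPUs in the same order, which is the symmetry the neutral names `mask1` and `mask2` express.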

cpu/hotplug: Cache number of online CPUs

2019-07-25T13:48:01Z

Re-evaluating the bitmap weight of the online CPUs bitmap in every invocation of num_online_cpus() over and over is a pretty useless exercise, especially when num_online_cpus() is used in code paths like the IPI delivery of x86 or the membarrier code.

Cache the number of online CPUs in the core and just return the cached variable. The accessor function provides only a snapshot when used without protection against concurrent CPU hotplug.

The storage needs to use an atomic_t because the kexec and reboot code (ab)use set_cpu_online() in their 'shutdown' handlers without any form of serialization, as pointed out by Mathieu. Regular CPU hotplug usage is properly serialized.

Signed-off-by: Thomas Gleixner
Reviewed-by: Mathieu Desnoyers
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1907091622590.1634@nanos.tec.linutronix.de
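The caching scheme described above might look roughly like this in user space. It is a sketch under stated assumptions: C11 `<stdatomic.h>` stands in for the kernel's atomic_t, the mask is a plain 64-bit word updated without atomics (so only the counter is safe against unserialized callers here), and the `toy_*` names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

static uint64_t toy_online_mask;   /* one bit per CPU, up to 64 CPUs */
static atomic_int toy_num_online;  /* cached population count of the mask */

/* Update the mask and adjust the cached count only on a real transition. */
static void toy_set_cpu_online(unsigned int cpu, bool online)
{
    uint64_t bit;

    assert(cpu < 64);
    bit = (uint64_t)1 << cpu;

    if (online) {
        if (!(toy_online_mask & bit)) {
            toy_online_mask |= bit;
            atomic_fetch_add(&toy_num_online, 1);
        }
    } else {
        if (toy_online_mask & bit) {
            toy_online_mask &= ~bit;
            atomic_fetch_sub(&toy_num_online, 1);
        }
    }
}

/* Constant-time read of the cached value; only a snapshot without locking. */
static unsigned int toy_num_online_cpus(void)
{
    return (unsigned int)atomic_load(&toy_num_online);
}
```

The reader no longer scans the bitmap; the cost moves to the comparatively rare online/offline transitions.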

cpumask: Implement cpumask_or_equal()

2019-07-25T13:47:37Z

The IPI code of x86 needs to evaluate whether the target cpumask is equal to the cpu_online_mask or equal except for the calling CPU.

To replace the current implementation, which requires the usage of a temporary cpumask that might involve allocations, add a new function which compares a cpumask to the result of two other cpumasks which are or'ed together before comparison.

This allows the required decision to be made in one go; the calling code can then check whether the calling CPU is set in the target mask with cpumask_test_cpu().

Signed-off-by: Thomas Gleixner
Acked-by: Peter Zijlstra (Intel)
Link: https://lkml.kernel.org/r/20190722105220.585449120@linutronix.de
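The decision described above can be sketched in user space as follows. This is an illustrative model, not the x86 IPI code: a cpumask is one 64-bit word, the calling CPU is assumed to be online, and the `toy_*` names and the `toy_ipi_kind` enum are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t toy_cpumask_t; /* one bit per CPU, up to 64 CPUs */

/* True when dst == (src1 | src2); no temporary mask object is needed. */
static bool toy_cpumask_or_equal(toy_cpumask_t dst, toy_cpumask_t src1,
                                 toy_cpumask_t src2)
{
    return (src1 | src2) == dst;
}

/* Mask containing only the given CPU. */
static toy_cpumask_t toy_cpumask_of(unsigned int cpu)
{
    assert(cpu < 64);
    return (toy_cpumask_t)1 << cpu;
}

static bool toy_cpumask_test_cpu(unsigned int cpu, toy_cpumask_t mask)
{
    assert(cpu < 64);
    return (mask >> cpu) & 1;
}

enum toy_ipi_kind { TOY_IPI_MASK, TOY_IPI_ALL, TOY_IPI_ALL_BUT_SELF };

/*
 * If target plus the calling CPU equals the online mask, the target is
 * either "all online CPUs" or "all but self"; a single bit test then
 * tells the two apart. Anything else needs a per-mask send.
 */
static enum toy_ipi_kind toy_classify(toy_cpumask_t target,
                                      toy_cpumask_t online,
                                      unsigned int self)
{
    if (!toy_cpumask_or_equal(online, target, toy_cpumask_of(self)))
        return TOY_IPI_MASK;
    return toy_cpumask_test_cpu(self, target) ? TOY_IPI_ALL
                                              : TOY_IPI_ALL_BUT_SELF;
}
```

For example, with CPUs 0-3 online and CPU 1 calling, a target of 0xF classifies as "all", 0xD as "all but self", and 0x5 falls back to the mask path.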