user/sven/linux.git/include/linux/find.h, branch v6.1.96

cpumask: Introduce for_each_cpu_andnot()

2022-10-06T12:57:36Z

for_each_cpu_and() is very convenient as it saves having to allocate a temporary cpumask to store the result of cpumask_and(). The same issue applies to cpumask_andnot() which doesn't actually need temporary storage for iteration purposes. Following what has been done for for_each_cpu_and(), introduce for_each_cpu_andnot(). Signed-off-by: Valentin Schneider

lib/find_bit: Introduce find_next_andnot_bit()

2022-10-06T12:57:36Z

In preparation of introducing for_each_cpu_andnot(), add a variant of find_next_bit() that negate the bits in @addr2 when ANDing them with the bits in @addr1. Signed-off-by: Valentin Schneider

lib/find: optimize for_each() macros

2022-10-01T17:22:58Z

Moving an iterator of the macros inside conditional part of for-loop helps to generate a better code. It had been first implemented in commit 7baac8b91f9871ba ("cpumask: make for_each_cpu_mask a bit smaller"). Now that cpumask for-loops are the aliases to bitmap loops, it's worth to optimize them the same way. Bloat-o-meter says: add/remove: 8/12 grow/shrink: 147/592 up/down: 4876/-24416 (-19540) Signed-off-by: Yury Norov

lib/bitmap: introduce for_each_set_bit_wrap() macro

2022-10-01T17:22:57Z

Add for_each_set_bit_wrap() macro and use it in for_each_cpu_wrap(). The new macro is based on __for_each_wrap() iterator, which is simpler and smaller than cpumask_next_wrap(). Signed-off-by: Yury Norov

lib/find_bit: add find_next{,_and}_bit_wrap

2022-10-01T17:22:57Z

The helper is better optimized for the worst case: in case of empty cpumask, current code traverses 2 * size: next = cpumask_next_and(prev, src1p, src2p); if (next >= nr_cpu_ids) next = cpumask_first_and(src1p, src2p); At bitmap level we can stop earlier after checking 'size + offset' bits. Signed-off-by: Yury Norov

cpumask: switch for_each_cpu{,_not} to use for_each_bit()

2022-10-01T17:22:57Z

The difference between for_each_cpu() and for_each_set_bit() is that the latter uses cpumask_next() instead of find_next_bit(), and so calls cpumask_check(). This check is useless because the iterator value is not provided by user. It generates false-positives for the very last iteration of for_each_cpu(). Signed-off-by: Yury Norov

lib: add find_nth{,_and,_andnot}_bit()

2022-09-26T19:19:12Z

Kernel lacks for a function that searches for Nth bit in a bitmap. Usually people do it like this: for_each_set_bit(bit, mask, size) if (n-- == 0) return bit; We can do it more efficiently, if we: 1. find a word containing Nth bit, using hweight(); and 2. find the bit, using a helper fns(), that works similarly to __ffs() and ffz(). fns() is implemented as a simple loop. For x86_64, there's PDEP instruction to do that: ret = clz(pdep(1 << idx, num)). However, for large bitmaps the most of improvement comes from using hweight(), so I kept fns() simple. New find_nth_bit() is ~70 times faster on x86_64/kvm in find_bit benchmark: find_nth_bit: 7154190 ns, 16411 iterations for_each_bit: 505493126 ns, 16315 iterations With all that, a family of 3 new functions is added, and used where appropriate in the following patches. Signed-off-by: Yury Norov

lib/find_bit: optimize find_next_bit() functions

2022-09-21T19:21:32Z

Over the past couple years, the function _find_next_bit() was extended with parameters that modify its behavior to implement and- zero- and le- flavors. The parameters are passed at compile time, but current design prevents a compiler from optimizing out the conditionals. As find_next_bit() API grows, I expect that more parameters will be added. Current design would require more conditional code in _find_next_bit(), which would bloat the helper even more and make it barely readable. This patch replaces _find_next_bit() with a macro FIND_NEXT_BIT, and adds a set of wrappers, so that the compile-time optimizations become possible. The common logic is moved to the new macro, and all flavors may be generated by providing a FETCH macro parameter, like in this example: #define FIND_NEXT_BIT(FETCH, MUNGE, size, start) ... find_next_xornot_and_bit(addr1, addr2, addr3, size, start) { return FIND_NEXT_BIT(addr1[idx] ^ ~addr2[idx] & addr3[idx], /* nop */, size, start); } The FETCH may be of any complexity, as soon as it only refers the bitmap(s) and an iterator idx. MUNGE is here to support _le code generation for BE builds. May be empty. I ran find_bit_benchmark 16 times on top of 6.0-rc2 and 16 times on top of 6.0-rc2 + this series. The results for kvm/x86_64 are: v6.0-rc2 Optimized Difference Z-score Random dense bitmap ns ns ns % find_next_bit: 787735 670546 117189 14.9 3.97 find_next_zero_bit: 777492 664208 113284 14.6 10.51 find_last_bit: 830925 687573 143352 17.3 2.35 find_first_bit: 3874366 3306635 567731 14.7 1.84 find_first_and_bit: 40677125 37739887 2937238 7.2 1.36 find_next_and_bit: 347865 304456 43409 12.5 1.35 Random sparse bitmap find_next_bit: 19816 14021 5795 29.2 6.10 find_next_zero_bit: 1318901 1223794 95107 7.2 1.41 find_last_bit: 14573 13514 1059 7.3 6.92 find_first_bit: 1313321 1249024 64297 4.9 1.53 find_first_and_bit: 8921 8098 823 9.2 4.56 find_next_and_bit: 9796 7176 2620 26.7 5.39 Where the statistics is significant (z-score > 3), the improvement is ~15%. According to the bloat-o-meter, the Image size is 10-11K less: x86_64/defconfig: add/remove: 32/14 grow/shrink: 61/782 up/down: 6344/-16521 (-10177) arm64/defconfig: add/remove: 3/2 grow/shrink: 50/714 up/down: 608/-11556 (-10948) Suggested-by: Linus Torvalds Signed-off-by: Yury Norov

lib/find_bit: create find_first_zero_bit_le()

2022-09-21T19:17:18Z

find_first_zero_bit_le() is an alias to find_next_zero_bit_le(), despite that 'next' is known to be slower than 'first' version. Now that we have common FIND_FIRST_BIT() macro helper, it's trivial to implement find_first_zero_bit_le() as a real function. Reviewed-by: Valentin Schneider Signed-off-by: Yury Norov

include/linux/find: Fix documentation

2022-06-03T13:52:57Z

The order of the arguments in function documentation doesn't fit the implementation. Change the documentation so that it corresponds to the code. This prevent people to get confused when reading the documentation. Signed-off-by: Anna-Maria Behnsen Reviewed-by: Andy Shevchenko Signed-off-by: Yury Norov