<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/linux/bitops.h, branch v4.11.1</title>
<subtitle>Linux Kernel</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.11.1</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.11.1'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2016-10-08T01:46:26Z</updated>
<entry>
<title>mm/vmalloc.c: fix align value calculation error</title>
<updated>2016-10-08T01:46:26Z</updated>
<author>
<name>zijun_hu</name>
<email>zijun_hu@htc.com</email>
</author>
<published>2016-10-07T23:57:26Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=252e5c6e2e5b4557599ef86ea5d02b0395e9056c'/>
<id>urn:sha1:252e5c6e2e5b4557599ef86ea5d02b0395e9056c</id>
<content type='text'>
Deriving the align value with fls_long() imposes a doubled alignment
requirement on __get_vm_area_node() if the size parameter is a power of 2
and VM_IOREMAP is set in the flags parameter.  For example:
size=0x10000 -&gt; fls_long(0x10000)=17 -&gt; align=0x20000

get_count_order_long() is implemented and can be used instead of
fls_long() to fix the bug.  For example: size=0x10000 -&gt;
get_count_order_long(0x10000)=16 -&gt; align=0x10000
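
For reference, the helper amounts to something like the following sketch
(the version added to include/linux/bitops.h is authoritative):

	static inline int get_count_order_long(unsigned long l)
	{
		if (l == 0UL)
			return -1;
		else if (l &amp; (l - 1UL))	/* not a power of 2 */
			return (int)fls_long(l);
		else				/* exact power of 2 */
			return (int)fls_long(l) - 1;
	}

For a power of 2 such as 0x10000 the l &amp; (l - 1UL) test is false, so the
result is fls_long(l) - 1 = 16 and the alignment is no longer doubled.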

[akpm@linux-foundation.org: s/get_order_long()/get_count_order_long()/]
[zijun_hu@zoho.com: fixes]
 Link: http://lkml.kernel.org/r/57AABC8B.1040409@zoho.com
[akpm@linux-foundation.org: locate get_count_order_long() next to get_count_order()]
[akpm@linux-foundation.org: move get_count_order[_long] definitions to pick up fls_long()]
[zijun_hu@htc.com: move out get_count_order[_long]() from __KERNEL__ scope]
 Link: http://lkml.kernel.org/r/57B2C4CE.80303@zoho.com
Link: http://lkml.kernel.org/r/fc045ecf-20fa-0722-b3ac-9a6140488fad@zoho.com
Signed-off-by: zijun_hu &lt;zijun_hu@htc.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: zijun_hu &lt;zijun_hu@htc.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>md: set MD_CHANGE_PENDING in an atomic region</title>
<updated>2016-05-09T16:24:02Z</updated>
<author>
<name>Guoqing Jiang</name>
<email>gqjiang@suse.com</email>
</author>
<published>2016-05-04T02:22:13Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=85ad1d13ee9b3db00615ea24b031c15e5ba14fd1'/>
<id>urn:sha1:85ad1d13ee9b3db00615ea24b031c15e5ba14fd1</id>
<content type='text'>
Some code waits for a metadata update by:

1. flagging that it is needed (MD_CHANGE_DEVS or MD_CHANGE_CLEAN)
2. setting MD_CHANGE_PENDING and waking the management thread
3. waiting for MD_CHANGE_PENDING to be cleared

If the first two are done without locking, the code in md_update_sb()
which checks if it needs to repeat might test if an update is needed
before step 1, then clear MD_CHANGE_PENDING after step 2, resulting
in the wait returning early.

So ensure that all places which set MD_CHANGE_PENDING do so atomically;
bit_clear_unless() (suggested by Neil) is introduced for this purpose.
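
A sketch of the helper's shape (a cmpxchg loop; the commit diff is
authoritative): clear the @clear bits in *@ptr unless any of the @test
bits is set, and report whether the clear happened.

	#define bit_clear_unless(ptr, clear, test)			\
	({								\
		const typeof(*(ptr)) clear__ = (clear), test__ = (test);\
		typeof(*(ptr)) old__, new__;				\
									\
		do {							\
			old__ = READ_ONCE(*(ptr));			\
			new__ = old__ &amp; ~clear__;			\
		} while (!(old__ &amp; test__) &amp;&amp;			\
			 cmpxchg(ptr, old__, new__) != old__);		\
									\
		!(old__ &amp; test__);					\
	})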

Cc: Martin Kepplinger &lt;martink@posteo.de&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Denys Vlasenko &lt;dvlasenk@redhat.com&gt;
Cc: Sasha Levin &lt;sasha.levin@oracle.com&gt;
Cc: &lt;linux-kernel@vger.kernel.org&gt;
Reviewed-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Guoqing Jiang &lt;gqjiang@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
</entry>
<entry>
<title>bitops.h: correctly handle rol32 with 0 byte shift</title>
<updated>2015-12-09T18:35:16Z</updated>
<author>
<name>Sasha Levin</name>
<email>sasha.levin@oracle.com</email>
</author>
<published>2015-12-04T03:04:01Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d7e35dfa2531b53618b9e6edcd8752ce988ac555'/>
<id>urn:sha1:d7e35dfa2531b53618b9e6edcd8752ce988ac555</id>
<content type='text'>
ROL on a 32-bit integer with a shift of 32 or more is undefined, and the
result is arch-dependent.  Avoid this by handling the trivial case of
rotating by 0 correctly.

The trivial solution of checking if shift is 0 breaks gcc's detection
of this code as a ROL instruction, which is unacceptable.

This bug was reported and fixed in GCC
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57157):

	The standard rotate idiom,

	  (x &lt;&lt; n) | (x &gt;&gt; (32 - n))

	is recognized by gcc (for concreteness, I discuss only the case that x
	is a uint32_t here).

	However, this is portable C only for n in the range 0 &lt; n &lt; 32. For n
	== 0, we get x &gt;&gt; 32 which gives undefined behaviour according to the
	C standard (6.5.7, Bitwise shift operators). To portably support n ==
	0, one has to write the rotate as something like

	  (x &lt;&lt; n) | (x &gt;&gt; ((-n) &amp; 31))

	And this is apparently not recognized by gcc.

Note that older GCCs fail to recognize this idiom and will emit a slower ROL there.
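
The resulting rol32() looks roughly like this (a sketch; the commit diff
is authoritative):

	static inline __u32 rol32(__u32 word, unsigned int shift)
	{
		/* (-shift) &amp; 31 maps shift == 0 to 0 instead of 32 */
		return (word &lt;&lt; shift) | (word &gt;&gt; ((-shift) &amp; 31));
	}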

Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>bitops.h: add sign_extend64()</title>
<updated>2015-11-07T01:50:42Z</updated>
<author>
<name>Martin Kepplinger</name>
<email>martink@posteo.de</email>
</author>
<published>2015-11-07T00:31:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=48e203e21b29cd4b2c58403fe8bca68e2e854895'/>
<id>urn:sha1:48e203e21b29cd4b2c58403fe8bca68e2e854895</id>
<content type='text'>
This was discussed months back; see https://lkml.org/lkml/2015/1/18/289
The result was that the 64-bit version is "likely fine", "valuable" and
"correct".  The discussion petered out, but since there are possible
users, let's add it.
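
The new helper mirrors the existing sign_extend32(); a sketch:

	static inline __s64 sign_extend64(__u64 value, int index)
	{
		__u8 shift = 63 - index;	/* index is the sign bit's position */
		return (__s64)(value &lt;&lt; shift) &gt;&gt; shift;
	}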

Signed-off-by: Martin Kepplinger &lt;martin.kepplinger@theobroma-systems.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@kernel.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: George Spelvin &lt;linux@horizon.com&gt;
Cc: Rasmus Villemoes &lt;linux@rasmusvillemoes.dk&gt;
Cc: Maxime Coquelin &lt;maxime.coquelin@st.com&gt;
Cc: Denys Vlasenko &lt;dvlasenk@redhat.com&gt;
Cc: Yury Norov &lt;yury.norov@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>bitops.h: improve sign_extend32()'s documentation</title>
<updated>2015-11-07T01:50:42Z</updated>
<author>
<name>Martin Kepplinger</name>
<email>martink@posteo.de</email>
</author>
<published>2015-11-07T00:30:58Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e2eb53aa96754b97d158eff884dde88abbad925e'/>
<id>urn:sha1:e2eb53aa96754b97d158eff884dde88abbad925e</id>
<content type='text'>
It is often overlooked that sign_extend32(), despite its name, is safe to
use with 16- and 8-bit types as well.  This should help prevent sign
extension from being done manually some other way.
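
For example, for a hypothetical 12-bit ADC sample held in a u16 (note
that the index argument names the sign bit, not the width):

	u16 raw = read_adc_sample();		/* read_adc_sample() is made up */
	s32 val = sign_extend32(raw, 11);	/* bit 11 is the sign bit */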

Signed-off-by: Martin Kepplinger &lt;martin.kepplinger@theobroma-systems.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@kernel.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: George Spelvin &lt;linux@horizon.com&gt;
Cc: Rasmus Villemoes &lt;linux@rasmusvillemoes.dk&gt;
Cc: Maxime Coquelin &lt;maxime.coquelin@st.com&gt;
Cc: Denys Vlasenko &lt;dvlasenk@redhat.com&gt;
Cc: Yury Norov &lt;yury.norov@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>linux/bitmap: Force inlining of bitmap weight functions</title>
<updated>2015-08-05T07:38:08Z</updated>
<author>
<name>Denys Vlasenko</name>
<email>dvlasenk@redhat.com</email>
</author>
<published>2015-08-04T14:15:14Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1a1d48a4a8fde49aedc045d894efe67173d59fe0'/>
<id>urn:sha1:1a1d48a4a8fde49aedc045d894efe67173d59fe0</id>
<content type='text'>
With this config:

  http://busybox.net/~vda/kernel_config_OPTIMIZE_INLINING_and_Os

gcc-4.7.2 generates many copies of these tiny functions:

	bitmap_weight (55 copies):
	55                      push   %rbp
	48 89 e5                mov    %rsp,%rbp
	e8 3f 3a 8b 00          callq  __bitmap_weight
	5d                      pop    %rbp
	c3                      retq

	hweight_long (23 copies):
	55                      push   %rbp
	e8 b5 65 8e 00          callq  __sw_hweight64
	48 89 e5                mov    %rsp,%rbp
	5d                      pop    %rbp
	c3                      retq

See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122

This patch fixes this via s/inline/__always_inline/

While at it, two "__inline__" uses were replaced with the usual "inline"
(the rest of the source file uses the latter).
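
For bitmap_weight() the change amounts to roughly this one-line diff
(a sketch):

	-static inline int bitmap_weight(const unsigned long *src, unsigned int nbits)
	+static __always_inline int bitmap_weight(const unsigned long *src, unsigned int nbits)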

	    text     data      bss       dec  filename
	86971357 17195880 36659200 140826437  vmlinux.before
	86971120 17195912 36659200 140826232  vmlinux

Signed-off-by: Denys Vlasenko &lt;dvlasenk@redhat.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Thomas Graf &lt;tgraf@suug.ch&gt;
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1438697716-28121-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>lib: find_*_bit reimplementation</title>
<updated>2015-04-17T13:03:53Z</updated>
<author>
<name>Yury Norov</name>
<email>yury.norov@gmail.com</email>
</author>
<published>2015-04-16T19:43:13Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2c57a0e233d72f8c2e2404560dcf0188ac3cf5d7'/>
<id>urn:sha1:2c57a0e233d72f8c2e2404560dcf0188ac3cf5d7</id>
<content type='text'>
This patchset reworks the find_bit() function family to achieve better
performance and to decrease the size of the text.  All the rework is done
in patch 1; patches 2 and 3 are about code moving and renaming.

It was boot-tested on x86_64 and MIPS (big-endian) machines.
Performance tests were run in userspace with code like this:

	/* addr[] is filled from /dev/urandom */
	start = clock();
	while (ret &lt; nbits)
		ret = find_next_bit(addr, nbits, ret + 1);

	end = clock();
	printf("%ld\t", (unsigned long) end - start);

On an Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz the measurements are (for
find_next_bit, nbits is 8M; for find_first_bit, 80K):

	find_next_bit:		find_first_bit:
	new	current		new	current
	26932	43151		14777	14925
	26947	43182		14521	15423
	26507	43824		15053	14705
	27329	43759		14473	14777
	26895	43367		14847	15023
	26990	43693		15103	15163
	26775	43299		15067	15232
	27282	42752		14544	15121
	27504	43088		14644	14858
	26761	43856		14699	15193
	26692	43075		14781	14681
	27137	42969		14451	15061
	...			...

The find_next_bit performance gain is 35-40%; for find_first_bit there is
no measurable difference.

On ARM machines there is an arch-specific find_bit implementation.

Thanks a lot to George Spelvin and Rasmus Villemoes for hints and
helpful discussions.

This patch (of 3):

The new implementation takes less space in the source file (see the
diffstat) and in the object: for me the text shrinks from 710 to 453
bytes.  It also shows better performance.

The find_last_bit() description is fixed due to an obvious typo.
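
For flavour, the rework follows the usual word-at-a-time pattern; a
sketch of the shared helper (not the exact patch):

	static unsigned long _find_next_bit(const unsigned long *addr,
			unsigned long nbits, unsigned long start,
			unsigned long invert)
	{
		unsigned long tmp;

		if (!nbits || start &gt;= nbits)
			return nbits;

		tmp = addr[start / BITS_PER_LONG] ^ invert;

		/* Handle the first, possibly partial, word. */
		tmp &amp;= BITMAP_FIRST_WORD_MASK(start);
		start = round_down(start, BITS_PER_LONG);

		while (!tmp) {
			start += BITS_PER_LONG;
			if (start &gt;= nbits)
				return nbits;
			tmp = addr[start / BITS_PER_LONG] ^ invert;
		}

		return min(start + __ffs(tmp), nbits);
	}

Passing invert == 0UL gives find_next_bit(); invert == ~0UL turns the
same body into find_next_zero_bit().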

[akpm@linux-foundation.org: include linux/bitmap.h, per Rasmus]
Signed-off-by: Yury Norov &lt;yury.norov@gmail.com&gt;
Reviewed-by: Rasmus Villemoes &lt;linux@rasmusvillemoes.dk&gt;
Reviewed-by: George Spelvin &lt;linux@horizon.com&gt;
Cc: Alexey Klimov &lt;klimov.linux@gmail.com&gt;
Cc: David S. Miller &lt;davem@davemloft.net&gt;
Cc: Daniel Borkmann &lt;dborkman@redhat.com&gt;
Cc: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Cc: Lai Jiangshan &lt;laijs@cn.fujitsu.com&gt;
Cc: Mark Salter &lt;msalter@redhat.com&gt;
Cc: AKASHI Takahiro &lt;takahiro.akashi@linaro.org&gt;
Cc: Thomas Graf &lt;tgraf@suug.ch&gt;
Cc: Valentin Rothberg &lt;valentinrothberg@gmail.com&gt;
Cc: Chris Wilson &lt;chris@chris-wilson.co.uk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>bitops: Fix shift overflow in GENMASK macros</title>
<updated>2014-11-16T08:55:39Z</updated>
<author>
<name>Maxime COQUELIN</name>
<email>maxime.coquelin@st.com</email>
</author>
<published>2014-11-06T09:54:19Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=00b4d9a14125f1e51874def2b9de6092e007412d'/>
<id>urn:sha1:00b4d9a14125f1e51874def2b9de6092e007412d</id>
<content type='text'>
On some 32-bit architectures, including x86, GENMASK(31, 0) returns 0
instead of the expected ~0UL.

The same happens on some 64-bit architectures with GENMASK_ULL(63, 0).

This is due to an overflow in the shift operand: 1 &lt;&lt; 32 for GENMASK,
1 &lt;&lt; 64 for GENMASK_ULL.
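
The fix builds the mask from both ends so that neither shift count can
reach the word width; roughly:

	#define GENMASK(h, l) \
		(((~0UL) &lt;&lt; (l)) &amp; (~0UL &gt;&gt; (BITS_PER_LONG - 1 - (h))))

	#define GENMASK_ULL(h, l) \
		(((~0ULL) &lt;&lt; (l)) &amp; (~0ULL &gt;&gt; (BITS_PER_LONG_LONG - 1 - (h))))

With h == 31 and l == 0 on a 32-bit architecture, both shift counts are 0
and the result is ~0UL as expected.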

Reported-by: Eric Paire &lt;eric.paire@st.com&gt;
Suggested-by: Rasmus Villemoes &lt;linux@rasmusvillemoes.dk&gt;
Signed-off-by: Maxime Coquelin &lt;maxime.coquelin@st.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: &lt;stable@vger.kernel.org&gt; # v3.13+
Cc: linux@rasmusvillemoes.dk
Cc: gong.chen@linux.intel.com
Cc: John Sullivan &lt;jsrhbz@kanargh.force9.co.uk&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Theodore Ts'o &lt;tytso@mit.edu&gt;
Fixes: 10ef6b0dffe4 ("bitops: Introduce a more generic BITMASK macro")
Link: http://lkml.kernel.org/r/1415267659-10563-1-git-send-email-maxime.coquelin@st.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>locking: Remove deprecated smp_mb__() barriers</title>
<updated>2014-08-13T08:31:57Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2014-08-04T10:07:15Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2e39465abc4b7856a0ea6fcf4f6b4668bb5db877'/>
<id>urn:sha1:2e39465abc4b7856a0ea6fcf4f6b4668bb5db877</id>
<content type='text'>
It's been a while and there are no in-tree users left, so remove the
deprecated barriers.

Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Chen, Gong &lt;gong.chen@linux.intel.com&gt;
Cc: Jacob Pan &lt;jacob.jun.pan@linux.intel.com&gt;
Cc: Joe Perches &lt;joe@perches.com&gt;
Cc: John Sullivan &lt;jsrhbz@kanargh.force9.co.uk&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Srinivas Pandruvada &lt;srinivas.pandruvada@linux.intel.com&gt;
Cc: Theodore Ts'o &lt;tytso@mit.edu&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>arch: Prepare for smp_mb__{before,after}_atomic()</title>
<updated>2014-04-18T09:40:30Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2014-02-06T17:16:07Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=febdbfe8a91ce0d11939d4940b592eb0dba8d663'/>
<id>urn:sha1:febdbfe8a91ce0d11939d4940b592eb0dba8d663</id>
<content type='text'>
Since the smp_mb__{before,after}*() ops are fundamentally dependent on
how an arch implements atomics, it doesn't make sense to have 3 variants
of them: they must all be the same.

Furthermore, the 3 variants suggest they're only valid for those 3
atomic ops, while we have many more where they could be applied.

So move away from
smp_mb__{before,after}_{atomic,clear}_{dec,inc,bit}() and reduce the
interface to just the two: smp_mb__{before,after}_atomic().

This patch prepares the way by introducing default implementations in
asm-generic/barrier.h that fall back to a full barrier, and by providing
__deprecated inlines for the previous 6 barriers where the arch does not
provide them.

This should allow for a mostly painless transition (lots of deprecated
warnings in the interim).
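
The asm-generic/barrier.h defaults amount to (a sketch):

	#ifndef smp_mb__before_atomic
	#define smp_mb__before_atomic()	smp_mb()
	#endif

	#ifndef smp_mb__after_atomic
	#define smp_mb__after_atomic()	smp_mb()
	#endif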

Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Acked-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Link: http://lkml.kernel.org/n/tip-wr59327qdyi9mbzn6x937s4e@git.kernel.org
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: "Chen, Gong" &lt;gong.chen@linux.intel.com&gt;
Cc: John Sullivan &lt;jsrhbz@kanargh.force9.co.uk&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mauro Carvalho Chehab &lt;m.chehab@samsung.com&gt;
Cc: Srinivas Pandruvada &lt;srinivas.pandruvada@linux.intel.com&gt;
Cc: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Cc: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
</feed>
