<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/linux/bitops.h, branch v4.11.1</title>
<subtitle>Linux Kernel</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.11.1</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.11.1'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2016-10-08T01:46:26Z</updated>
<entry>
<title>mm/vmalloc.c: fix align value calculation error</title>
<updated>2016-10-08T01:46:26Z</updated>
<author>
<name>zijun_hu</name>
<email>zijun_hu@htc.com</email>
</author>
<published>2016-10-07T23:57:26Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=252e5c6e2e5b4557599ef86ea5d02b0395e9056c'/>
<id>urn:sha1:252e5c6e2e5b4557599ef86ea5d02b0395e9056c</id>
<content type='text'>
Deriving the align value with fls_long() imposes a doubled alignment
requirement on __get_vm_area_node() if the size parameter is a power of 2
and VM_IOREMAP is set in the flags parameter.  For example:
size=0x10000 -&gt; fls_long(0x10000)=17 -&gt; align=0x20000

get_count_order_long() is implemented and can be used instead of
fls_long() to fix the bug.  For example: size=0x10000 -&gt;
get_count_order_long(0x10000)=16 -&gt; align=0x10000
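
For reference, the helper amounts to something like the following sketch
(the version added to include/linux/bitops.h is authoritative):

	static inline int get_count_order_long(unsigned long l)
	{
		if (l == 0UL)
			return -1;
		else if (l &amp; (l - 1UL))	/* not a power of 2 */
			return (int)fls_long(l);
		else				/* exact power of 2 */
			return (int)fls_long(l) - 1;
	}

For a power of 2 such as 0x10000 the l &amp; (l - 1UL) test is false, so the
result is fls_long(l) - 1 = 16 and the alignment is no longer doubled.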

[akpm@linux-foundation.org: s/get_order_long()/get_count_order_long()/]
[zijun_hu@zoho.com: fixes]
 Link: http://lkml.kernel.org/r/57AABC8B.1040409@zoho.com
[akpm@linux-foundation.org: locate get_count_order_long() next to get_count_order()]
[akpm@linux-foundation.org: move get_count_order[_long] definitions to pick up fls_long()]
[zijun_hu@htc.com: move out get_count_order[_long]() from __KERNEL__ scope]
 Link: http://lkml.kernel.org/r/57B2C4CE.80303@zoho.com
Link: http://lkml.kernel.org/r/fc045ecf-20fa-0722-b3ac-9a6140488fad@zoho.com
Signed-off-by: zijun_hu &lt;zijun_hu@htc.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: zijun_hu &lt;zijun_hu@htc.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>md: set MD_CHANGE_PENDING in an atomic region</title>
<updated>2016-05-09T16:24:02Z</updated>
<author>
<name>Guoqing Jiang</name>
<email>gqjiang@suse.com</email>
</author>
<published>2016-05-04T02:22:13Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=85ad1d13ee9b3db00615ea24b031c15e5ba14fd1'/>
<id>urn:sha1:85ad1d13ee9b3db00615ea24b031c15e5ba14fd1</id>
<content type='text'>
Some code waits for a metadata update by:

1. flagging that it is needed (MD_CHANGE_DEVS or MD_CHANGE_CLEAN)
2. setting MD_CHANGE_PENDING and waking the management thread
3. waiting for MD_CHANGE_PENDING to be cleared

If the first two are done without locking, the code in md_update_sb()
which checks if it needs to repeat might test if an update is needed
before step 1, then clear MD_CHANGE_PENDING after step 2, resulting
in the wait returning early.

So ensure that all places which set MD_CHANGE_PENDING do so atomically;
bit_clear_unless() (suggested by Neil) is introduced for this purpose.
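
A sketch of the helper's shape (a cmpxchg loop; the commit diff is
authoritative): clear the @clear bits in *@ptr unless any of the @test
bits is set, and report whether the clear happened.

	#define bit_clear_unless(ptr, clear, test)			\
	({								\
		const typeof(*(ptr)) clear__ = (clear), test__ = (test);\
		typeof(*(ptr)) old__, new__;				\
									\
		do {							\
			old__ = READ_ONCE(*(ptr));			\
			new__ = old__ &amp; ~clear__;			\
		} while (!(old__ &amp; test__) &amp;&amp;			\
			 cmpxchg(ptr, old__, new__) != old__);		\
									\
		!(old__ &amp; test__);					\
	})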

Cc: Martin Kepplinger &lt;martink@posteo.de&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Denys Vlasenko &lt;dvlasenk@redhat.com&gt;
Cc: Sasha Levin &lt;sasha.levin@oracle.com&gt;
Cc: &lt;linux-kernel@vger.kernel.org&gt;
Reviewed-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Guoqing Jiang &lt;gqjiang@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
</entry>
<entry>
<title>bitops.h: correctly handle rol32 with 0 byte shift</title>
<updated>2015-12-09T18:35:16Z</updated>
<author>
<name>Sasha Levin</name>
<email>sasha.levin@oracle.com</email>
</author>
<published>2015-12-04T03:04:01Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d7e35dfa2531b53618b9e6edcd8752ce988ac555'/>
<id>urn:sha1:d7e35dfa2531b53618b9e6edcd8752ce988ac555</id>
<content type='text'>
ROL on a 32-bit integer with a shift of 32 or more is undefined, and the
result is arch-dependent.  Avoid this by handling the trivial case of
rotating by 0 correctly.

The trivial solution of checking if shift is 0 breaks gcc's detection
of this code as a ROL instruction, which is unacceptable.

This bug was reported and fixed in GCC
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57157):

	The standard rotate idiom,

	  (x &lt;&lt; n) | (x &gt;&gt; (32 - n))

	is recognized by gcc (for concreteness, I discuss only the case that x
	is a uint32_t here).

	However, this is portable C only for n in the range 0 &lt; n &lt; 32. For n
	== 0, we get x &gt;&gt; 32 which gives undefined behaviour according to the
	C standard (6.5.7, Bitwise shift operators). To portably support n ==
	0, one has to write the rotate as something like

	  (x &lt;&lt; n) | (x &gt;&gt; ((-n) &amp; 31))

	And this is apparently not recognized by gcc.

Note that older GCCs fail to recognize this idiom and will emit a slower ROL there.
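
The resulting rol32() looks roughly like this (a sketch; the commit diff
is authoritative):

	static inline __u32 rol32(__u32 word, unsigned int shift)
	{
		/* (-shift) &amp; 31 maps shift == 0 to 0 instead of 32 */
		return (word &lt;&lt; shift) | (word &gt;&gt; ((-shift) &amp; 31));
	}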

Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>bitops.h: add sign_extend64()</title>
<updated>2015-11-07T01:50:42Z</updated>
<author>
<name>Martin Kepplinger</name>
<email>martink@posteo.de</email>
</author>
<published>2015-11-07T00:31:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=48e203e21b29cd4b2c58403fe8bca68e2e854895'/>
<id>urn:sha1:48e203e21b29cd4b2c58403fe8bca68e2e854895</id>
<content type='text'>
This was discussed months back; see https://lkml.org/lkml/2015/1/18/289
The result was that the 64-bit version is "likely fine", "valuable" and
"correct".  The discussion petered out, but since there are possible
users, let's add it.
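
The new helper mirrors the existing sign_extend32(); a sketch:

	static inline __s64 sign_extend64(__u64 value, int index)
	{
		__u8 shift = 63 - index;	/* index is the sign bit's position */
		return (__s64)(value &lt;&lt; shift) &gt;&gt; shift;
	}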

Signed-off-by: Martin Kepplinger &lt;martin.kepplinger@theobroma-systems.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@kernel.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: George Spelvin &lt;linux@horizon.com&gt;
Cc: Rasmus Villemoes &lt;linux@rasmusvillemoes.dk&gt;
Cc: Maxime Coquelin &lt;maxime.coquelin@st.com&gt;
Cc: Denys Vlasenko &lt;dvlasenk@redhat.com&gt;
Cc: Yury Norov &lt;yury.norov@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>bitops.h: improve sign_extend32()'s documentation</title>
<updated>2015-11-07T01:50:42Z</updated>
<author>
<name>Martin Kepplinger</name>
<email>martink@posteo.de</email>
</author>
<published>2015-11-07T00:30:58Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e2eb53aa96754b97d158eff884dde88abbad925e'/>
<id>urn:sha1:e2eb53aa96754b97d158eff884dde88abbad925e</id>
<content type='text'>
It is often overlooked that sign_extend32(), despite its name, is safe to
use with 16- and 8-bit types as well.  This should help prevent sign
extension from being done manually some other way.
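
For example, for a hypothetical 12-bit ADC sample held in a u16 (note
that the index argument names the sign bit, not the width):

	u16 raw = read_adc_sample();		/* read_adc_sample() is made up */
	s32 val = sign_extend32(raw, 11);	/* bit 11 is the sign bit */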

Signed-off-by: Martin Kepplinger &lt;martin.kepplinger@theobroma-systems.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@kernel.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: George Spelvin &lt;linux@horizon.com&gt;
Cc: Rasmus Villemoes &lt;linux@rasmusvillemoes.dk&gt;
Cc: Maxime Coquelin &lt;maxime.coquelin@st.com&gt;
Cc: Denys Vlasenko &lt;dvlasenk@redhat.com&gt;
Cc: Yury Norov &lt;yury.norov@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>linux/bitmap: Force inlining of bitmap weight functions</title>
<updated>2015-08-05T07:38:08Z</updated>
<author>
<name>Denys Vlasenko</name>
<email>dvlasenk@redhat.com</email>
</author>
<published>2015-08-04T14:15:14Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1a1d48a4a8fde49aedc045d894efe67173d59fe0'/>
<id>urn:sha1:1a1d48a4a8fde49aedc045d894efe67173d59fe0</id>
<content type='text'>
With this config:

  http://busybox.net/~vda/kernel_config_OPTIMIZE_INLINING_and_Os

gcc-4.7.2 generates many copies of these tiny functions:

	bitmap_weight (55 copies):
	55                      push   %rbp
	48 89 e5                mov    %rsp,%rbp
	e8 3f 3a 8b 00          callq  __bitmap_weight
	5d                      pop    %rbp
	c3                      retq

	hweight_long (23 copies):
	55                      push   %rbp
	e8 b5 65 8e 00          callq  __sw_hweight64
	48 89 e5                mov    %rsp,%rbp
	5d                      pop    %rbp
	c3                      retq

See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122

This patch fixes this via s/inline/__always_inline/

While at it, two "__inline__" uses were replaced with the usual "inline"
(the rest of the source file uses the latter).
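
For bitmap_weight() the change amounts to roughly this one-line diff
(a sketch):

	-static inline int bitmap_weight(const unsigned long *src, unsigned int nbits)
	+static __always_inline int bitmap_weight(const unsigned long *src, unsigned int nbits)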

	    text     data      bss       dec  filename
	86971357 17195880 36659200 140826437  vmlinux.before
	86971120 17195912 36659200 140826232  vmlinux

Signed-off-by: Denys Vlasenko &lt;dvlasenk@redhat.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Thomas Graf &lt;tgraf@suug.ch&gt;
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1438697716-28121-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>lib: find_*_bit reimplementation</title>
<updated>2015-04-17T13:03:53Z</updated>
<author>
<name>Yury Norov</name>
<email>yury.norov@gmail.com</email>
</author>
<published>2015-04-16T19:43:13Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2c57a0e233d72f8c2e2404560dcf0188ac3cf5d7'/>
<id>urn:sha1:2c57a0e233d72f8c2e2404560dcf0188ac3cf5d7</id>
<content type='text'>
This patchset reworks the find_bit() function family to achieve better
performance and to decrease the size of the text.  All the rework is done
in patch 1; patches 2 and 3 are about code moving and renaming.

It was boot-tested on x86_64 and MIPS (big-endian) machines.
Performance tests were run in userspace with code like this:

	/* addr[] is filled from /dev/urandom */
	start = clock();
	while (ret &lt; nbits)
		ret = find_next_bit(addr, nbits, ret + 1);

	end = clock();
	printf("%ld\t", (unsigned long) end - start);

On an Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz the measurements are (for
find_next_bit, nbits is 8M; for find_first_bit, 80K):

	find_next_bit:		find_first_bit:
	new	current		new	current
	26932	43151		14777	14925
	26947	43182		14521	15423
	26507	43824		15053	14705
	27329	43759		14473	14777
	26895	43367		14847	15023
	26990	43693		15103	15163
	26775	43299		15067	15232
	27282	42752		14544	15121
	27504	43088		14644	14858
	26761	43856		14699	15193
	26692	43075		14781	14681
	27137	42969		14451	15061
	...			...

The find_next_bit performance gain is 35-40%; for find_first_bit there is
no measurable difference.

On ARM machines there is an arch-specific find_bit implementation.

Thanks a lot to George Spelvin and Rasmus Villemoes for hints and
helpful discussions.

This patch (of 3):

The new implementation takes less space in the source file (see the
diffstat) and in the object: for me the text shrinks from 710 to 453
bytes.  It also shows better performance.

The find_last_bit() description is fixed due to an obvious typo.
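
For flavour, the rework follows the usual word-at-a-time pattern; a
sketch of the shared helper (not the exact patch):

	static unsigned long _find_next_bit(const unsigned long *addr,
			unsigned long nbits, unsigned long start,
			unsigned long invert)
	{
		unsigned long tmp;

		if (!nbits || start &gt;= nbits)
			return nbits;

		tmp = addr[start / BITS_PER_LONG] ^ invert;

		/* Handle the first, possibly partial, word. */
		tmp &amp;= BITMAP_FIRST_WORD_MASK(start);
		start = round_down(start, BITS_PER_LONG);

		while (!tmp) {
			start += BITS_PER_LONG;
			if (start &gt;= nbits)
				return nbits;
			tmp = addr[start / BITS_PER_LONG] ^ invert;
		}

		return min(start + __ffs(tmp), nbits);
	}

Passing invert == 0UL gives find_next_bit(); invert == ~0UL turns the
same body into find_next_zero_bit().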

[akpm@linux-foundation.org: include linux/bitmap.h, per Rasmus]
Signed-off-by: Yury Norov &lt;yury.norov@gmail.com&gt;
Reviewed-by: Rasmus Villemoes &lt;linux@rasmusvillemoes.dk&gt;
Reviewed-by: George Spelvin &lt;linux@horizon.com&gt;
Cc: Alexey Klimov &lt;klimov.linux@gmail.com&gt;
Cc: David S. Miller &lt;davem@davemloft.net&gt;
Cc: Daniel Borkmann &lt;dborkman@redhat.com&gt;
Cc: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Cc: Lai Jiangshan &lt;laijs@cn.fujitsu.com&gt;
Cc: Mark Salter &lt;msalter@redhat.com&gt;
Cc: AKASHI Takahiro &lt;takahiro.akashi@linaro.org&gt;
Cc: Thomas Graf &lt;tgraf@suug.ch&gt;
Cc: Valentin Rothberg &lt;valentinrothberg@gmail.com&gt;
Cc: Chris Wilson &lt;chris@chris-wilson.co.uk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>bitops: Fix shift overflow in GENMASK macros</title>
<updated>2014-11-16T08:55:39Z</updated>
<author>
<name>Maxime COQUELIN</name>
<email>maxime.coquelin@st.com</email>
</author>
<published>2014-11-06T09:54:19Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=00b4d9a14125f1e51874def2b9de6092e007412d'/>
<id>urn:sha1:00b4d9a14125f1e51874def2b9de6092e007412d</id>
<content type='text'>
On some 32-bit architectures, including x86, GENMASK(31, 0) returns 0
instead of the expected ~0UL.

The same happens on some 64-bit architectures with GENMASK_ULL(63, 0).

This is due to an overflow in the shift operand: 1 &lt;&lt; 32 for GENMASK,
1 &lt;&lt; 64 for GENMASK_ULL.
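
The fix builds the mask from both ends so that neither shift count can
reach the word width; roughly:

	#define GENMASK(h, l) \
		(((~0UL) &lt;&lt; (l)) &amp; (~0UL &gt;&gt; (BITS_PER_LONG - 1 - (h))))

	#define GENMASK_ULL(h, l) \
		(((~0ULL) &lt;&lt; (l)) &amp; (~0ULL &gt;&gt; (BITS_PER_LONG_LONG - 1 - (h))))

With h == 31 and l == 0 on a 32-bit architecture, both shift counts are 0
and the result is ~0UL as expected.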

Reported-by: Eric Paire &lt;eric.paire@st.com&gt;
Suggested-by: Rasmus Villemoes &lt;linux@rasmusvillemoes.dk&gt;
Signed-off-by: Maxime Coquelin &lt;maxime.coquelin@st.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: &lt;stable@vger.kernel.org&gt; # v3.13+
Cc: linux@rasmusvillemoes.dk
Cc: gong.chen@linux.intel.com
Cc: John Sullivan &lt;jsrhbz@kanargh.force9.co.uk&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Theodore Ts'o &lt;tytso@mit.edu&gt;
Fixes: 10ef6b0dffe4 ("bitops: Introduce a more generic BITMASK macro")
Link: http://lkml.kernel.org/r/1415267659-10563-1-git-send-email-maxime.coquelin@st.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>locking: Remove deprecated smp_mb__() barriers</title>
<updated>2014-08-13T08:31:57Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2014-08-04T10:07:15Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2e39465abc4b7856a0ea6fcf4f6b4668bb5db877'/>
<id>urn:sha1:2e39465abc4b7856a0ea6fcf4f6b4668bb5db877</id>
<content type='text'>
It's been a while and there are no in-tree users left, so remove the
deprecated barriers.

Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Chen, Gong &lt;gong.chen@linux.intel.com&gt;
Cc: Jacob Pan &lt;jacob.jun.pan@linux.intel.com&gt;
Cc: Joe Perches &lt;joe@perches.com&gt;
Cc: John Sullivan &lt;jsrhbz@kanargh.force9.co.uk&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Srinivas Pandruvada &lt;srinivas.pandruvada@linux.intel.com&gt;
Cc: Theodore Ts'o &lt;tytso@mit.edu&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>arch: Prepare for smp_mb__{before,after}_atomic()</title>
<updated>2014-04-18T09:40:30Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2014-02-06T17:16:07Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=febdbfe8a91ce0d11939d4940b592eb0dba8d663'/>
<id>urn:sha1:febdbfe8a91ce0d11939d4940b592eb0dba8d663</id>
<content type='text'>
Since the smp_mb__{before,after}*() ops are fundamentally dependent on
how an arch implements atomics, it doesn't make sense to have 3 variants
of them: they must all be the same.

Furthermore, the 3 variants suggest they're only valid for those 3
atomic ops, while we have many more where they could be applied.

So move away from
smp_mb__{before,after}_{atomic,clear}_{dec,inc,bit}() and reduce the
interface to just the two: smp_mb__{before,after}_atomic().

This patch prepares the way by introducing default implementations in
asm-generic/barrier.h that fall back to a full barrier, and by providing
__deprecated inlines for the previous 6 barriers where the arch does not
provide them.

This should allow for a mostly painless transition (lots of deprecated
warnings in the interim).
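
The asm-generic/barrier.h defaults amount to (a sketch):

	#ifndef smp_mb__before_atomic
	#define smp_mb__before_atomic()	smp_mb()
	#endif

	#ifndef smp_mb__after_atomic
	#define smp_mb__after_atomic()	smp_mb()
	#endif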

Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Acked-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Link: http://lkml.kernel.org/n/tip-wr59327qdyi9mbzn6x937s4e@git.kernel.org
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: "Chen, Gong" &lt;gong.chen@linux.intel.com&gt;
Cc: John Sullivan &lt;jsrhbz@kanargh.force9.co.uk&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mauro Carvalho Chehab &lt;m.chehab@samsung.com&gt;
Cc: Srinivas Pandruvada &lt;srinivas.pandruvada@linux.intel.com&gt;
Cc: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Cc: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
</feed>
