<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/linux/bitmap.h, branch v4.13.7</title>
<subtitle>Linux Kernel</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.13.7</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.13.7'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2017-07-10T23:32:34Z</updated>
<entry>
<title>bitmap: use memcmp optimisation in more situations</title>
<updated>2017-07-10T23:32:34Z</updated>
<author>
<name>Matthew Wilcox</name>
<email>mawilcox@microsoft.com</email>
</author>
<published>2017-07-10T22:51:35Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2c6deb01525ac11cc03c44fe31e3f45ce2cadaf9'/>
<id>urn:sha1:2c6deb01525ac11cc03c44fe31e3f45ce2cadaf9</id>
<content type='text'>
Commit 7dd968163f7c ("bitmap: bitmap_equal memcmp optimization") was
rather more restrictive than necessary; we can use memcmp() to implement
bitmap_equal() as long as the number of bits can be proved to be a
multiple of 8.  And architectures other than s390 may be able to make
good use of this optimisation.
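
With the check relaxed, the inline wrapper might look like this (an
illustrative sketch, not necessarily the verbatim kernel code; the
in-tree version may phrase the divisibility test via alignment
macros):

	static inline int bitmap_equal(const unsigned long *src1,
				       const unsigned long *src2,
				       unsigned int nbits)
	{
		if (small_const_nbits(nbits))
			return !((*src1 ^ *src2) &amp; BITMAP_LAST_WORD_MASK(nbits));
		/* nbits provably a multiple of 8: compare whole bytes */
		if (__builtin_constant_p(nbits &amp; 7) &amp;&amp; IS_ALIGNED(nbits, 8))
			return !memcmp(src1, src2, nbits / 8);
		return __bitmap_equal(src1, src2, nbits);
	}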

[arnd@arndb.de: fix build: add a memcmp() declaration]
  Link: http://lkml.kernel.org/r/20170630153908.3439707-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/20170628153221.11322-5-willy@infradead.org
Signed-off-by: Matthew Wilcox &lt;mawilcox@microsoft.com&gt;
Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Acked-by: Rasmus Villemoes &lt;linux@rasmusvillemoes.dk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>include/linux/bitmap.h: turn bitmap_set and bitmap_clear into memset when possible</title>
<updated>2017-07-10T23:32:34Z</updated>
<author>
<name>Matthew Wilcox</name>
<email>mawilcox@microsoft.com</email>
</author>
<published>2017-07-10T22:51:32Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2a98dc028f911a7c59c87d11d4eed6626be1605b'/>
<id>urn:sha1:2a98dc028f911a7c59c87d11d4eed6626be1605b</id>
<content type='text'>
Several callers have constant 'start' and an 'nbits' that is a multiple
of 8, so we can turn them into calls to memset.  We don't need the
entirety of 'start' and 'nbits' to be constant; we just need to know
whether they're divisible by 8.
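
A sketch of the resulting bitmap_set (illustrative; the in-tree patch
checks an endian-dependent alignment, for which BITMAP_MEM_ALIGNMENT
stands in here):

	#define BITMAP_MEM_ALIGNMENT 8
	#define BITMAP_MEM_MASK (BITMAP_MEM_ALIGNMENT - 1)

	static __always_inline void bitmap_set(unsigned long *map,
					       unsigned int start,
					       unsigned int nbits)
	{
		if (__builtin_constant_p(nbits) &amp;&amp; nbits == 1)
			__set_bit(start, map);
		else if (__builtin_constant_p(start &amp; BITMAP_MEM_MASK) &amp;&amp;
			 IS_ALIGNED(start, BITMAP_MEM_ALIGNMENT) &amp;&amp;
			 __builtin_constant_p(nbits &amp; BITMAP_MEM_MASK) &amp;&amp;
			 IS_ALIGNED(nbits, BITMAP_MEM_ALIGNMENT))
			/* byte-aligned, whole-byte range: one memset call */
			memset((char *)map + start / 8, 0xff, nbits / 8);
		else
			__bitmap_set(map, start, nbits);
	}

bitmap_clear is the same shape with 0x00 and __bitmap_clear.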

Link: http://lkml.kernel.org/r/20170628153221.11322-4-willy@infradead.org
Signed-off-by: Matthew Wilcox &lt;mawilcox@microsoft.com&gt;
Acked-by: Rasmus Villemoes &lt;linux@rasmusvillemoes.dk&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>bitmap: optimise bitmap_set and bitmap_clear of a single bit</title>
<updated>2017-07-10T23:32:34Z</updated>
<author>
<name>Matthew Wilcox</name>
<email>mawilcox@microsoft.com</email>
</author>
<published>2017-07-10T22:51:29Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e5af323c9badd5dc09af7ccf9d45616ebffc623c'/>
<id>urn:sha1:e5af323c9badd5dc09af7ccf9d45616ebffc623c</id>
<content type='text'>
We have eight users calling bitmap_clear for a single bit and seventeen
calling bitmap_set for a single bit.  Rather than fix all of them to
call __clear_bit or __set_bit, turn bitmap_clear and bitmap_set into
inline functions and make this special case efficient.
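
A sketch of the resulting inline wrapper (illustrative):

	static __always_inline void bitmap_clear(unsigned long *map,
						 unsigned int start,
						 unsigned int nbits)
	{
		if (__builtin_constant_p(nbits) &amp;&amp; nbits == 1)
			__clear_bit(start, map);	/* the common single-bit case */
		else
			__bitmap_clear(map, start, nbits);
	}

bitmap_set mirrors this with __set_bit and __bitmap_set.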

Link: http://lkml.kernel.org/r/20170628153221.11322-3-willy@infradead.org
Signed-off-by: Matthew Wilcox &lt;mawilcox@microsoft.com&gt;
Acked-by: Rasmus Villemoes &lt;linux@rasmusvillemoes.dk&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>bitmap.h, perf/core: Fix the mask in perf_output_sample_regs()</title>
<updated>2016-08-18T08:44:20Z</updated>
<author>
<name>Madhavan Srinivasan</name>
<email>maddy@linux.vnet.ibm.com</email>
</author>
<published>2016-08-17T09:36:08Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=29dd3288705f26cc27663e79061209dabce2d5b9'/>
<id>urn:sha1:29dd3288705f26cc27663e79061209dabce2d5b9</id>
<content type='text'>
When decoding the perf_regs mask in perf_output_sample_regs(),
we loop through the mask using find_first_bit and find_next_bit functions.

While the existing code works fine in most cases, the logic
is broken for big-endian 32-bit kernels.

When reading a u64 mask using (u32 *)(&amp;val)[0], find_*_bit() assumes
that it gets the lower 32 bits of the u64, but on a big-endian 32-bit
kernel it instead gets the upper 32 bits - which is wrong.

The fix is to swap the words of the u64 to handle this case.
This is _not_ a regular endianness swap.
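
A sketch of the idea (the helper name and exact shape are assumptions,
not quoted from the patch): split the u64 into longs explicitly, low
word first, so the layout matches what find_*_bit() expects on every
configuration:

	static inline void bitmap_from_u64(unsigned long *dst, u64 mask)
	{
		dst[0] = mask &amp; ULONG_MAX;	/* low 32 (or all 64) bits */

		if (sizeof(mask) &gt; sizeof(unsigned long))
			dst[1] = mask &gt;&gt; 32;	/* high word, 32-bit kernels only */
	}

On 64-bit kernels the second store is compiled away; on 32-bit
big-endian kernels the words end up low-first, which is the order the
bit-search functions assume.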

Suggested-by: Yury Norov &lt;ynorov@caviumnetworks.com&gt;
Signed-off-by: Madhavan Srinivasan &lt;maddy@linux.vnet.ibm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Yury Norov &lt;ynorov@caviumnetworks.com&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@kernel.org&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Jiri Olsa &lt;jolsa@redhat.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Stephane Eranian &lt;eranian@google.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Vince Weaver &lt;vincent.weaver@maine.edu&gt;
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/1471426568-31051-2-git-send-email-maddy@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>include/linux/bitmap.h: cleanup</title>
<updated>2016-08-04T12:50:07Z</updated>
<author>
<name>Andrew Morton</name>
<email>akpm@linux-foundation.org</email>
</author>
<published>2016-08-03T20:45:54Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4b9d314ce9168b37a2824e6d7820aa6e66f52642'/>
<id>urn:sha1:4b9d314ce9168b37a2824e6d7820aa6e66f52642</id>
<content type='text'>
Remove two unneeded `else's.
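
For illustration (a hypothetical before/after), the pattern being
removed is an else after a return:

	if (small_const_nbits(nbits))
		return !(*src &amp; BITMAP_LAST_WORD_MASK(nbits));
	else
		return find_first_bit(src, nbits) == nbits;

becomes

	if (small_const_nbits(nbits))
		return !(*src &amp; BITMAP_LAST_WORD_MASK(nbits));

	return find_first_bit(src, nbits) == nbits;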

Cc: David Hildenbrand &lt;dahi@linux.vnet.ibm.com&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>bitmap: bitmap_equal memcmp optimization</title>
<updated>2016-06-13T13:58:21Z</updated>
<author>
<name>Martin Schwidefsky</name>
<email>schwidefsky@de.ibm.com</email>
</author>
<published>2016-05-25T07:32:20Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7dd968163f7c12bcb2132792bf873133b397a2d2'/>
<id>urn:sha1:7dd968163f7c12bcb2132792bf873133b397a2d2</id>
<content type='text'>
The bitmap_equal function has optimized code for small bitmaps with
fewer than BITS_PER_LONG bits.  For larger bitmaps the out-of-line
function __bitmap_equal is called.

For a constant number of bits divisible by BITS_PER_LONG the memcmp
function can be used instead.  On s390, gcc knows how to optimize such
calls: memcmp with up to 256 bytes / 2048 bits is translated into a
single instruction.
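
A sketch of the added fast path inside bitmap_equal() (illustrative;
the guard follows the description above):

	#ifdef CONFIG_S390
		/* constant nbits, whole longs: gcc can emit a single compare */
		if (__builtin_constant_p(nbits) &amp;&amp; (nbits % BITS_PER_LONG) == 0)
			return !memcmp(src1, src2, nbits / 8);
	#endif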

Reviewed-by: David Hildenbrand &lt;dahi@linux.vnet.ibm.com&gt;
Signed-off-by: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
</content>
</entry>
<entry>
<title>lib/bitmap.c: conversion routines to/from u32 array</title>
<updated>2016-02-20T03:54:09Z</updated>
<author>
<name>David Decotigny</name>
<email>decot@googlers.com</email>
</author>
<published>2016-02-19T14:23:59Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e52bc7c28ac9f54db6f86b19ed65c599def18c98'/>
<id>urn:sha1:e52bc7c28ac9f54db6f86b19ed65c599def18c98</id>
<content type='text'>
Aimed at transferring bitmaps to/from user-space in a 32/64-bit agnostic
way.

Tested:
  unit tests (next patch) on qemu i386, x86_64, ppc, ppc64 BE and LE,
  ARM.
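
A hypothetical usage sketch (assuming the new helpers take the
destination and its size first, then the source and its size):

	DECLARE_BITMAP(features, 96);	/* 2 longs on 64-bit, 3 on 32-bit */
	u32 buf[3];			/* fixed u32-based ABI format */

	/* kernel bitmap to u32 array, and back */
	bitmap_to_u32array(buf, 3, features, 96);
	bitmap_from_u32array(features, 96, buf, 3);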

Signed-off-by: David Decotigny &lt;decot@googlers.com&gt;
Reviewed-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>linux/bitmap: Force inlining of bitmap weight functions</title>
<updated>2015-08-05T07:38:08Z</updated>
<author>
<name>Denys Vlasenko</name>
<email>dvlasenk@redhat.com</email>
</author>
<published>2015-08-04T14:15:14Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1a1d48a4a8fde49aedc045d894efe67173d59fe0'/>
<id>urn:sha1:1a1d48a4a8fde49aedc045d894efe67173d59fe0</id>
<content type='text'>
With this config:

  http://busybox.net/~vda/kernel_config_OPTIMIZE_INLINING_and_Os

gcc-4.7.2 generates many copies of these tiny functions:

	bitmap_weight (55 copies):
	55                      push   %rbp
	48 89 e5                mov    %rsp,%rbp
	e8 3f 3a 8b 00          callq  __bitmap_weight
	5d                      pop    %rbp
	c3                      retq

	hweight_long (23 copies):
	55                      push   %rbp
	e8 b5 65 8e 00          callq  __sw_hweight64
	48 89 e5                mov    %rsp,%rbp
	5d                      pop    %rbp
	c3                      retq

See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122

This patch fixes this via s/inline/__always_inline/.

While at it, two "__inline__" uses were replaced with the usual
"inline" (the rest of the source file uses the latter).
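
That is, for example (an illustrative sketch of one affected wrapper):

	static __always_inline int bitmap_weight(const unsigned long *src,
						 unsigned int nbits)
	{
		if (small_const_nbits(nbits))
			return hweight_long(*src &amp; BITMAP_LAST_WORD_MASK(nbits));
		return __bitmap_weight(src, nbits);
	}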

	    text     data      bss       dec  filename
	86971357 17195880 36659200 140826437  vmlinux.before
	86971120 17195912 36659200 140826232  vmlinux

Signed-off-by: Denys Vlasenko &lt;dvlasenk@redhat.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Thomas Graf &lt;tgraf@suug.ch&gt;
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1438697716-28121-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>lib/bitmap.c: bitmap_[empty,full]: remove code duplication</title>
<updated>2015-04-17T13:03:56Z</updated>
<author>
<name>Yury Norov</name>
<email>yury.norov@gmail.com</email>
</author>
<published>2015-04-16T19:44:00Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2afe27c718b669b551895595873611ac39cc31e3'/>
<id>urn:sha1:2afe27c718b669b551895595873611ac39cc31e3</id>
<content type='text'>
bitmap_empty() has its own implementation.  But it's clearly as simple as:

	find_first_bit(src, nbits) == nbits

The same is true for 'bitmap_full'.
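
After the change, both are thin wrappers (an illustrative sketch):

	static inline int bitmap_empty(const unsigned long *src,
				       unsigned int nbits)
	{
		if (small_const_nbits(nbits))
			return !(*src &amp; BITMAP_LAST_WORD_MASK(nbits));
		else
			return find_first_bit(src, nbits) == nbits;
	}

	static inline int bitmap_full(const unsigned long *src,
				      unsigned int nbits)
	{
		if (small_const_nbits(nbits))
			return !(~(*src) &amp; BITMAP_LAST_WORD_MASK(nbits));
		else
			return find_first_zero_bit(src, nbits) == nbits;
	}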

Signed-off-by: Yury Norov &lt;yury.norov@gmail.com&gt;
Cc: George Spelvin &lt;linux@horizon.com&gt;
Cc: Alexey Klimov &lt;klimov.linux@gmail.com&gt;
Cc: Rasmus Villemoes &lt;linux@rasmusvillemoes.dk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>linux/bitmap.h: improve BITMAP_{LAST,FIRST}_WORD_MASK</title>
<updated>2015-04-15T23:35:24Z</updated>
<author>
<name>Rasmus Villemoes</name>
<email>linux@rasmusvillemoes.dk</email>
</author>
<published>2015-04-15T23:17:42Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=89c1e79eb302349fcaf0697bc9116a4ff16bfeb0'/>
<id>urn:sha1:89c1e79eb302349fcaf0697bc9116a4ff16bfeb0</id>
<content type='text'>
The macro BITMAP_LAST_WORD_MASK can be implemented without a conditional,
which will generally lead to slightly better generated code (221 bytes
saved for allmodconfig-GCOV_KERNEL, ~2k with GCOV_KERNEL).  As a small
bonus, this also ensures that the nbits parameter is expanded exactly
once.

In BITMAP_FIRST_WORD_MASK, if start is signed, gcc is technically
allowed to assume it is positive (or divisible by BITS_PER_LONG) and
hence just do the simple mask.  gcc doesn't seem to use this, and even
on an architecture like x86, where the shift only depends on the lower
5 or 6 bits and those bits are not affected by the signedness of the
expression, gcc still generates code to compute the C99-mandated value
of start % BITS_PER_LONG.  So just use a mask explicitly, also for
consistency with BITMAP_LAST_WORD_MASK.
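
The branchless forms might look like this (a sketch consistent with
the description above):

	#define BITMAP_FIRST_WORD_MASK(start) \
		(~0UL &lt;&lt; ((start) &amp; (BITS_PER_LONG - 1)))
	#define BITMAP_LAST_WORD_MASK(nbits) \
		(~0UL &gt;&gt; (-(nbits) &amp; (BITS_PER_LONG - 1)))

For example, with BITS_PER_LONG == 64 and nbits == 100, -(nbits) &amp; 63
is 28, so the last-word mask is ~0UL &gt;&gt; 28, keeping the low 36 bits -
exactly 100 % 64.  When nbits is a multiple of 64 the shift count is 0
and the mask is all ones, matching the old conditional's ~0UL branch.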

Signed-off-by: Rasmus Villemoes &lt;linux@rasmusvillemoes.dk&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Reviewed-by: George Spelvin &lt;linux@horizon.com&gt;
Cc: Yury Norov &lt;yury.norov@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
