user/sven/linux.git/include/linux/compiler.h, branch v3.12.70

compiler: Allow 1- and 2-byte smp_load_acquire() and smp_store_release()

2016-10-28T18:49:19Z

commit 536fa402221f09633e7c5801b327055ab716a363 upstream. CPUs without single-byte and double-byte loads and stores place some "interesting" requirements on concurrent code. For example (adapted from Peter Hurley's test code), suppose we have the following structure: struct foo { spinlock_t lock1; spinlock_t lock2; char a; /* Protected by lock1. */ char b; /* Protected by lock2. */ }; struct foo *foop; Of course, it is common (and good) practice to place data protected by different locks in separate cache lines. However, if the locks are rarely acquired (for example, only in rare error cases), and there are a great many instances of the data structure, then memory footprint can trump false-sharing concerns, so that it can be better to place them in the same cache cache line as above. But if the CPU does not support single-byte loads and stores, a store to foop->a will do a non-atomic read-modify-write operation on foop->b, which will come as a nasty surprise to someone holding foop->lock2. So we now require CPUs to support single-byte and double-byte loads and stores. Therefore, this commit adjusts the definition of __native_word() to allow these sizes to be used by smp_load_acquire() and smp_store_release(). Signed-off-by: Paul E. McKenney Cc: Peter Zijlstra Signed-off-by: Jiri Slaby

locking: Remove atomicy checks from {READ,WRITE}_ONCE

2016-04-11T14:43:45Z

commit 7bd3e239d6c6d1cad276e8f130b386df4234dcd7 upstream. The fact that volatile allows for atomic load/stores is a special case not a requirement for {READ,WRITE}_ONCE(). Their primary purpose is to force the compiler to emit load/stores _once_. Signed-off-by: Peter Zijlstra (Intel) Acked-by: Christian Borntraeger Acked-by: Will Deacon Acked-by: Linus Torvalds Cc: Davidlohr Bueso Cc: H. Peter Anvin Cc: Paul McKenney Cc: Stephen Rothwell Cc: Thomas Gleixner Signed-off-by: Ingo Molnar Signed-off-by: Jiri Slaby

kernel: make READ_ONCE() valid on const arguments

2016-04-11T14:43:44Z

commit dd36929720f40f17685e841ae0d4c581c165ea60 upstream. The use of READ_ONCE() causes lots of warnings witht he pending paravirt spinlock fixes, because those ends up having passing a member to a 'const' structure to READ_ONCE(). There should certainly be nothing wrong with using READ_ONCE() with a const source, but the helper function __read_once_size() would cause warnings because it would drop the 'const' qualifier, but also because the destination would be marked 'const' too due to the use of 'typeof'. Use a union of types in READ_ONCE() to avoid this issue. Also make sure to use parenthesis around the macro arguments to avoid possible operator precedence issues. Tested-by: Ingo Molnar Cc: Christian Borntraeger Signed-off-by: Linus Torvalds Signed-off-by: Jiri Slaby

kernel: Change ASSIGN_ONCE(val, x) to WRITE_ONCE(x, val)

2016-04-11T14:43:44Z

commit 43239cbe79fc369f5d2160bd7f69e28b5c50a58c upstream. Feedback has shown that WRITE_ONCE(x, val) is easier to use than ASSIGN_ONCE(val,x). There are no in-tree users yet, so lets change it for 3.19. Signed-off-by: Christian Borntraeger Acked-by: Peter Zijlstra Acked-by: Davidlohr Bueso Acked-by: Paul E. McKenney Signed-off-by: Jiri Slaby

kernel: Provide READ_ONCE and ASSIGN_ONCE

2016-03-30T14:14:14Z

commit 230fa253df6352af12ad0a16128760b5cb3f92df upstream. ACCESS_ONCE does not work reliably on non-scalar types. For example gcc 4.6 and 4.7 might remove the volatile tag for such accesses during the SRA (scalar replacement of aggregates) step https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145) Let's provide READ_ONCE/ASSIGN_ONCE that will do all accesses via scalar types as suggested by Linus Torvalds. Accesses larger than the machines word size cannot be guaranteed to be atomic. These macros will use memcpy and emit a build warning. Signed-off-by: Christian Borntraeger Signed-off-by: Jiri Slaby

tracing: Fix freak link error caused by branch tracer

2016-02-24T09:23:23Z

commit b33c8ff4431a343561e2319f17c14286f2aa52e2 upstream. In my randconfig tests, I came across a bug that involves several components: * gcc-4.9 through at least 5.3 * CONFIG_GCOV_PROFILE_ALL enabling -fprofile-arcs for all files * CONFIG_PROFILE_ALL_BRANCHES overriding every if() * The optimized implementation of do_div() that tries to replace a library call with an division by multiplication * code in drivers/media/dvb-frontends/zl10353.c doing u32 adc_clock = 450560; /* 45.056 MHz */ if (state->config.adc_clock) adc_clock = state->config.adc_clock; do_div(value, adc_clock); In this case, gcc fails to determine whether the divisor in do_div() is __builtin_constant_p(). In particular, it concludes that __builtin_constant_p(adc_clock) is false, while __builtin_constant_p(!!adc_clock) is true. That in turn throws off the logic in do_div() that also uses __builtin_constant_p(), and instead of picking either the constant- optimized division, and the code in ilog2() that uses __builtin_constant_p() to figure out whether it knows the answer at compile time. The result is a link error from failing to find multiple symbols that should never have been called based on the __builtin_constant_p(): dvb-frontends/zl10353.c:138: undefined reference to `____ilog2_NaN' dvb-frontends/zl10353.c:138: undefined reference to `__aeabi_uldivmod' ERROR: "____ilog2_NaN" [drivers/media/dvb-frontends/zl10353.ko] undefined! ERROR: "__aeabi_uldivmod" [drivers/media/dvb-frontends/zl10353.ko] undefined! This patch avoids the problem by changing __trace_if() to check whether the condition is known at compile-time to be nonzero, rather than checking whether it is actually a constant. I see this one link error in roughly one out of 1600 randconfig builds on ARM, and the patch fixes all known instances. Link: http://lkml.kernel.org/r/1455312410-1058841-1-git-send-email-arnd@arndb.de Acked-by: Nicolas Pitre Fixes: ab3c9c686e22 ("branch tracer, intel-iommu: fix build with CONFIG_BRANCH_TRACER=y") Signed-off-by: Arnd Bergmann Signed-off-by: Steven Rostedt Signed-off-by: Jiri Slaby

rcu: Move lockless_dereference() out of rcupdate.h

2015-08-25T14:57:00Z

commit 0a04b0166929405cd833c1cc40f99e862b965ddc upstream. I want to use lockless_dereference() from seqlock.h, which would mean including rcupdate.h from it, however rcupdate.h already includes seqlock.h. Avoid this by moving lockless_dereference() into compiler.h. This is somewhat tricky since it uses smp_read_barrier_depends() which isn't available there, but its a CPP macro so we can get away with it. The alternative would be moving it into asm/barrier.h, but that would be updating each arch (I can do if people feel that is more appropriate). Cc: Paul McKenney Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Rusty Russell Signed-off-by: Jiri Slaby

arch: Introduce smp_load_acquire(), smp_store_release()

2015-08-25T14:56:59Z

commit 47933ad41a86a4a9b50bed7c9b9bd2ba242aac63 upstream. A number of situations currently require the heavyweight smp_mb(), even though there is no need to order prior stores against later loads. Many architectures have much cheaper ways to handle these situations, but the Linux kernel currently has no portable way to make use of them. This commit therefore supplies smp_load_acquire() and smp_store_release() to remedy this situation. The new smp_load_acquire() primitive orders the specified load against any subsequent reads or writes, while the new smp_store_release() primitive orders the specifed store against any prior reads or writes. These primitives allow array-based circular FIFOs to be implemented without an smp_mb(), and also allow a theoretical hole in rcu_assign_pointer() to be closed at no additional expense on most architectures. In addition, the RCU experience transitioning from explicit smp_read_barrier_depends() and smp_wmb() to rcu_dereference() and rcu_assign_pointer(), respectively resulted in substantial improvements in readability. It therefore seems likely that replacing other explicit barriers with smp_load_acquire() and smp_store_release() will provide similar benefits. It appears that roughly half of the explicit barriers in core kernel code might be so replaced. [Changelog by PaulMck] Reviewed-by: "Paul E. McKenney" Signed-off-by: Peter Zijlstra Acked-by: Will Deacon Cc: Benjamin Herrenschmidt Cc: Frederic Weisbecker Cc: Mathieu Desnoyers Cc: Michael Ellerman Cc: Michael Neuling Cc: Russell King Cc: Geert Uytterhoeven Cc: Heiko Carstens Cc: Linus Torvalds Cc: Martin Schwidefsky Cc: Victor Kaplansky Cc: Tony Luck Cc: Oleg Nesterov Link: http://lkml.kernel.org/r/20131213150640.908486364@infradead.org Signed-off-by: Ingo Molnar Signed-off-by: Jiri Slaby

crypto: more robust crypto_memneq

2014-11-13T18:02:05Z

commit fe8c8a126806fea4465c43d62a1f9d273a572bf5 upstream. [Only use the compiler.h portion of this patch, to get the OPTIMIZER_HIDE_VAR() macro, which we need for other -stable patches - gregkh] Disabling compiler optimizations can be fragile, since a new optimization could be added to -O0 or -Os that breaks the assumptions the code is making. Instead of disabling compiler optimizations, use a dummy inline assembly (based on RELOC_HIDE) to block the problematic kinds of optimization, while still allowing other optimizations to be applied to the code. The dummy inline assembly is added after every OR, and has the accumulator variable as its input and output. The compiler is forced to assume that the dummy inline assembly could both depend on the accumulator variable and change the accumulator variable, so it is forced to compute the value correctly before the inline assembly, and cannot assume anything about its value after the inline assembly. This change should be enough to make crypto_memneq work correctly (with data-independent timing) even if it is inlined at its call sites. That can be done later in a followup patch. Compile-tested on x86_64. Signed-off-by: Cesar Eduardo Barros Acked-by: Daniel Borkmann Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman Signed-off-by: Jiri Slaby

kprobes: Move __kprobes definition into compiler.h

2013-04-08T15:28:34Z

Currently, __kprobes is defined in linux/kprobes.h which is too big to be included in small or basic headers that want to make use of this simple attribute. So move __kprobes definition into linux/compiler.h in which other compiler attributes are defined. Signed-off-by: Masami Hiramatsu Cc: Timo Juhani Lindfors Cc: Ananth N Mavinakayanahalli Cc: Pavel Emelyanov Cc: Jiri Kosina Cc: Nadia Yvette Chambers Cc: yrl.pp-manager.tt@hitachi.com Cc: David S. Miller Cc: Linus Torvalds Link: http://lkml.kernel.org/r/20130404104049.21071.20908.stgit@mhiramat-M0-7522 [ Improved the attribute explanation a bit. ] Signed-off-by: Ingo Molnar