user/sven/linux.git/lib/vsprintf.c, branch leds/master

lib/vsprintf.c: improve put_dec_trunc8 slightly

2015-04-17T13:03:55Z

I hadn't had enough coffee when I wrote this. Currently, the final increment of buf depends on the value loaded from the table, and causes gcc to emit a cmov immediately before the return. It is smarter to let it depend on r, since the increment can then be computed in parallel with the final load/store pair. It also shaves 16 bytes of .text. Signed-off-by: Rasmus Villemoes Cc: Tejun Heo Cc: Peter Zijlstra Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

lib/vsprintf.c: even faster binary to decimal conversion

2015-04-17T13:03:54Z

The most expensive part of decimal conversion is the divisions by 10 (albeit done using reciprocal multiplication with appropriately chosen constants). I decided to see if one could eliminate around half of these multiplications by emitting two digits at a time, at the cost of a 200 byte lookup table, and it does indeed seem like there is something to be gained, especially on 64 bits. Microbenchmarking shows improvements ranging from -50% (for numbers uniformly distributed in [0, 2^64-1]) to -25% (for numbers heavily biased toward the smaller end, a more realistic distribution). On a larger scale, perf shows that top, one of the big consumers of /proc data, uses 0.5-1.0% fewer cpu cycles. I had to jump through some hoops to get the 32 bit code to compile and run on my 64 bit machine, so I'm not sure how relevant these numbers are, but just for comparison the microbenchmark showed improvements between -30% and -10%. The bloat-o-meter costs are around 150 bytes (the generated code is a little smaller, so it's not the full 200 bytes) on both 32 and 64 bit. I'm aware that extra cache misses won't show up in a microbenchmark as used above, but on the other hand decimal conversions often happen in bulk (for example in the case of top). I have of course tested that the new code generates the same output as the old, for both the first and last 1e10 numbers in [0,2^64-1] and 4e9 'random' numbers in-between. Test and verification code on github: https://github.com/Villemoes/dec. Signed-off-by: Rasmus Villemoes Tested-by: Jeff Epler Cc: "Peter Zijlstra (Intel)" Cc: Tejun Heo Cc: Joe Perches Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

lib/string_helpers.c: change semantics of string_escape_mem

2015-04-15T23:35:24Z

The current semantics of string_escape_mem are inadequate for one of its current users, vsnprintf(). If that is to honour its contract, it must know how much space would be needed for the entire escaped buffer, and string_escape_mem provides no way of obtaining that (short of allocating a large enough buffer (~4 times input string) to let it play with, and that's definitely a big no-no inside vsnprintf). So change the semantics for string_escape_mem to be more snprintf-like: Return the size of the output that would be generated if the destination buffer was big enough, but of course still only write to the part of dst it is allowed to, and (contrary to snprintf) don't do '\0'-termination. It is then up to the caller to detect whether output was truncated and to append a '\0' if desired. Also, we must output partial escape sequences, otherwise a call such as snprintf(buf, 3, "%1pE", "\123") would cause printf to write a \0 to buf[2] but leaving buf[0] and buf[1] with whatever they previously contained. This also fixes a bug in the escaped_string() helper function, which used to unconditionally pass a length of "end-buf" to string_escape_mem(); since the latter doesn't check osz for being insanely large, it would happily write to dst. For example, kasprintf(GFP_KERNEL, "something and then %pE", ...); is an easy way to trigger an oops. In test-string_helpers.c, the -ENOMEM test is replaced with testing for getting the expected return value even if the buffer is too small. We also ensure that nothing is written (by relying on a NULL pointer deref) if the output size is 0 by passing NULL - this has to work for kasprintf("%pE") to work. In net/sunrpc/cache.c, I think qword_add still has the same semantics. Someone should definitely double-check this. In fs/proc/array.c, I made the minimum possible change, but longer-term it should stop poking around in seq_file internals. [andriy.shevchenko@linux.intel.com: simplify qword_add] [andriy.shevchenko@linux.intel.com: add missed curly braces] Signed-off-by: Rasmus Villemoes Acked-by: Andy Shevchenko Signed-off-by: Andy Shevchenko Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

lib/vsprintf.c: fix potential NULL deref in hex_string

2015-04-15T23:35:23Z

The helper hex_string() is broken in two ways. First, it doesn't increment buf regardless of whether there is room to print, so callers such as kasprintf() that try to probe the correct storage to allocate will get a too small return value. But even worse, kasprintf() (and likely anyone else trying to find the size of the result) pass NULL for buf and 0 for size, so we also have end == NULL. But this means that the end-1 in hex_string() is (char*)-1, so buf < end-1 is true and we get a NULL pointer deref. I double-checked this with a trivial kernel module that just did a kasprintf(GFP_KERNEL, "%14ph", "CrashBoomBang"). Nobody seems to be using %ph with kasprintf, but we might as well fix it before it hits someone. Signed-off-by: Rasmus Villemoes Acked-by: Andy Shevchenko Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

lib/vsprintf: add %pC{,n,r} format specifiers for clocks

2015-04-15T23:35:23Z

Add format specifiers for printing struct clk: - '%pC' or '%pCn': name (Common Clock Framework) or address (legacy clock framework) of the clock, - '%pCr': rate of the clock. [akpm@linux-foundation.org: omit code if !CONFIG_HAVE_CLK] Signed-off-by: Geert Uytterhoeven Cc: Jonathan Corbet Cc: Mike Turquette Cc: Stephen Boyd Cc: Tetsuo Handa Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

lib/vsprintf.c: another small hack

2015-04-15T23:35:23Z

Making ZEROPAD == '0'-' ', we can eliminate a few more instructions. Signed-off-by: Rasmus Villemoes Cc: Peter Zijlstra Cc: Tejun Heo Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

lib/vsprintf.c: eliminate duplicate hex string array

2015-04-15T23:35:23Z

gcc doesn't merge or overlap const char[] objects with identical contents (probably language lawyers would also insist that these things have different addresses), but there's no reason to have the string "0123456789ABCDEF" occur in multiple places. hex_asc_upper is declared in kernel.h and defined in lib/hexdump.c, which is unconditionally compiled in. Signed-off-by: Rasmus Villemoes Cc: Peter Zijlstra Cc: Tejun Heo Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

lib/vsprintf.c: reduce stack use in number()

2015-04-15T23:35:23Z

At least since the initial git commit, when base was passed as a separate parameter, number() has only been called with bases 8, 10 and 16. I'm guessing that 66 was to accommodate 64 0/1, a sign and a '\0', but the buffer is only used for the actual digits. Octal digits carry 3 bits of information, so 24 is enough. Spell that 3*sizeof(num) so one less place needs to be changed should long long ever be 128 bits. Also remove the commented-out code that would handle an arbitrary base. Signed-off-by: Rasmus Villemoes Cc: Peter Zijlstra Cc: Tejun Heo Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

lib/vsprintf.c: eliminate some branches

2015-04-15T23:35:23Z

Since FORMAT_TYPE_INT is simply 1 more than FORMAT_TYPE_UINT, and similarly for BYTE/UBYTE, SHORT/USHORT, LONG/ULONG, we can eliminate a few instructions by making SIGN have the value 1 instead of 2, and then use arithmetic instead of branches for computing the right spec->type. It's a little hacky, but certainly in the same spirit as SMALL needing to have the value 0x20. For example for the spec->qualifier == 'l' case, gcc now generates 75e: 0f b6 53 01 movzbl 0x1(%rbx),%edx 762: 83 e2 01 and $0x1,%edx 765: 83 c2 09 add $0x9,%edx 768: 88 13 mov %dl,(%rbx) instead of 763: 0f b6 53 01 movzbl 0x1(%rbx),%edx 767: 83 e2 02 and $0x2,%edx 76a: 80 fa 01 cmp $0x1,%dl 76d: 19 d2 sbb %edx,%edx 76f: 83 c2 0a add $0xa,%edx 772: 88 13 mov %dl,(%rbx) Signed-off-by: Rasmus Villemoes Cc: Peter Zijlstra Cc: Tejun Heo Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

lib/vsprintf: implement bitmap printing through '%*pb[l]'

2015-02-14T05:21:36Z

bitmap and its derivatives such as cpumask and nodemask currently only provide formatting functions which put the output string into the provided buffer; however, how long this buffer should be isn't defined anywhere and given that some of these bitmaps can be too large to be formatted into an on-stack buffer it users sometimes are unnecessarily forced to come up with creative solutions and compromises for the buffer just to printk these bitmaps. There have been a couple different attempts at making this easier. 1. Way back, PeterZ tried printk '%pb' extension with the precision for bit width - '%.*pb'. This was intuitive and made sense but unfortunately triggered a compile warning about using precision for a pointer. http://lkml.kernel.org/g/1336577562.2527.58.camel@twins 2. I implemented bitmap_pr_cont[_list]() and its wrappers for cpumask and nodemask. This works but PeterZ pointed out that pr_cont's tendency to produce broken lines when multiple CPUs are printing is bothering considering the usages. http://lkml.kernel.org/g/1418226774-30215-3-git-send-email-tj@kernel.org So, this patch is another attempt at teaching printk and friends how to print bitmaps. It's almost identical to what PeterZ tried with precision but it uses the field width for the number of bits instead of precision. The format used is '%*pb[l]', with the optional trailing 'l' specifying list format instead of hex masks. This is a valid format string and doesn't trigger compiler warnings; however, it does make it impossible to specify output field width when printing bitmaps. I think this is an acceptable trade-off given how much easier it makes printing bitmaps and that we don't have any in-kernel user which is using the field width specification. If any future user wants to use field width with a bitmap, it'd have to format the bitmap into a string buffer and then print that buffer with width spec, which isn't different from how it should be done now. This patch implements bitmap[_list]_string() which are called from the vsprintf pointer() formatting function. The implementation is mostly identical to bitmap_scn[list]printf() except that the output is performed in the vsprintf way. These functions handle formatting into too small buffers and sprintf() family of functions report the correct overrun output length. bitmap_scn[list]printf() are now thin wrappers around scnprintf(). Signed-off-by: Tejun Heo Acked-by: Peter Zijlstra (Intel) Cc: "David S. Miller" Cc: "James E.J. Bottomley" Cc: "John W. Linville" Cc: "Paul E. McKenney" Cc: Benjamin Herrenschmidt Cc: Chris Metcalf Cc: Chris Zankel Cc: Christoph Lameter Cc: Dmitry Torokhov Cc: Fenghua Yu Cc: Greg Kroah-Hartman Cc: Ingo Molnar Cc: Li Zefan Cc: Max Filippov Cc: Mike Travis Cc: Pekka Enberg Cc: Russell King Cc: Rusty Russell Cc: Steffen Klassert Cc: Steven Rostedt Cc: Thomas Gleixner Cc: Tony Luck Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds