user/sven/linux.git, branch v3.10.31

Linux 3.10.31

2014-02-20T19:06:19Z

mm: fix process accidentally killed by mce because of huge page migration

2014-02-20T19:06:12Z

Based on c8721bbbdd36382de51cd6b7a56322e0acca2414 upstream, but only the bugfix portion pulled out. Hi Naoya or Greg, We found a bug in 3.10.x. The problem is that we accidentally have a hwpoisoned hugepage in free hugepage list. It could happend in the the following scenario: process A process B migrate_huge_page put_page (old hugepage) linked to free hugepage list hugetlb_fault hugetlb_no_page alloc_huge_page dequeue_huge_page_vma dequeue_huge_page_node (steal hwpoisoned hugepage) set_page_hwpoison_huge_page dequeue_hwpoisoned_huge_page (fail to dequeue) I tested this bug, one process keeps allocating huge page, and I use sysfs interface to soft offline a huge page, then received: "MCE: Killing UCP:2717 due to hardware memory corruption fault at 8200034" Upstream kernel is free from this bug because of these two commits: f15bdfa802bfa5eb6b4b5a241b97ec9fa1204a35 mm/memory-failure.c: fix memory leak in successful soft offlining c8721bbbdd36382de51cd6b7a56322e0acca2414 mm: memory-hotplug: enable memory hotplug to handle hugepage The first one, although the problem is about memory leak, this patch moves unset_migratetype_isolate(), which is important to avoid the race. The latter is not a bug fix and it's too big, so I rewrite a small one. The following patch can fix this bug.(please apply f15bdfa802bf first) Signed-off-by: Xishi Qiu Reviewed-by: Naoya Horiguchi Signed-off-by: Greg Kroah-Hartman

IB/qib: Convert qib_user_sdma_pin_pages() to use get_user_pages_fast()

2014-02-20T19:06:12Z

commit 603e7729920e42b3c2f4dbfab9eef4878cb6e8fa upstream. qib_user_sdma_queue_pkts() gets called with mmap_sem held for writing. Except for get_user_pages() deep down in qib_user_sdma_pin_pages() we don't seem to need mmap_sem at all. Even more interestingly the function qib_user_sdma_queue_pkts() (and also qib_user_sdma_coalesce() called somewhat later) call copy_from_user() which can hit a page fault and we deadlock on trying to get mmap_sem when handling that fault. So just make qib_user_sdma_pin_pages() use get_user_pages_fast() and leave mmap_sem locking for mm. This deadlock has actually been observed in the wild when the node is under memory pressure. Reviewed-by: Mike Marciniszyn Signed-off-by: Jan Kara Signed-off-by: Roland Dreier [Backported to 3.10: (Thanks to Ben Huthings) - Adjust context - Adjust indentation and nr_pages argument in qib_user_sdma_pin_pages()] Signed-off-by: Ben Hutchings Signed-off-by: Mike Marciniszyn Signed-off-by: Greg Kroah-Hartman

mm/memory-failure.c: fix memory leak in successful soft offlining

2014-02-20T19:06:12Z

commit f15bdfa802bfa5eb6b4b5a241b97ec9fa1204a35 upstream. After a successful page migration by soft offlining, the source page is not properly freed and it's never reusable even if we unpoison it afterward. This is caused by the race between freeing page and setting PG_hwpoison. In successful soft offlining, the source page is put (and the refcount becomes 0) by putback_lru_page() in unmap_and_move(), where it's linked to pagevec and actual freeing back to buddy is delayed. So if PG_hwpoison is set for the page before freeing, the freeing does not functions as expected (in such case freeing aborts in free_pages_prepare() check.) This patch tries to make sure to free the source page before setting PG_hwpoison on it. To avoid reallocating, the page keeps MIGRATE_ISOLATE until after setting PG_hwpoison. This patch also removes obsolete comments about "keeping elevated refcount" because what they say is not true. Unlike memory_failure(), soft_offline_page() uses no special page isolation code, and the soft-offlined pages have no elevated. Signed-off-by: Naoya Horiguchi Cc: Andi Kleen Cc: Mel Gorman Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Cc: Xishi Qiu Signed-off-by: Greg Kroah-Hartman

pinctrl: protect pinctrl_list add

2014-02-20T19:06:11Z

commit 7b320cb1ed2dbd2c5f2a778197baf76fd6bf545a upstream. We have few fedora bug reports about list corruption on pinctrl, for example: https://bugzilla.redhat.com/show_bug.cgi?id=1051918 Most likely corruption happen due lack of protection of pinctrl_list when adding new nodes to it. Patch corrects that. Fixes: 42fed7ba44e ("pinctrl: move subsystem mutex to pinctrl_dev struct") Signed-off-by: Stanislaw Gruszka Acked-by: Stephen Warren Signed-off-by: Linus Walleij Signed-off-by: Greg Kroah-Hartman

pinctrl: vt8500: Change devicetree data parsing

2014-02-20T19:06:11Z

commit f17248ed868767567298e1cdf06faf8159a81f7c upstream. Due to an assumption in the VT8500 pinctrl driver, the value passed from devicetree for 'wm,pull' was not explicitly translated before being passed to pinconf. Since v3.10, changes to 'enum pin_config_param', PIN_CONFIG_BIAS_PULL_(UP/DOWN) no longer map 1-to-1 with the expected values in devicetree. This patch adds a small translation between the devicetree values (0..2) and the enum pin_config_param equivalent values. Signed-off-by: Tony Prisk Signed-off-by: Linus Walleij Signed-off-by: Greg Kroah-Hartman

x86, hweight: Fix BUG when booting with CONFIG_GCOV_PROFILE_ALL=y

2014-02-20T19:06:11Z

commit 6583327c4dd55acbbf2a6f25e775b28b3abf9a42 upstream. Commit d61931d89b, "x86: Add optimized popcnt variants" introduced compile flag -fcall-saved-rdi for lib/hweight.c. When combined with options -fprofile-arcs and -O2, this flag causes gcc to generate broken constructor code. As a result, a 64 bit x86 kernel compiled with CONFIG_GCOV_PROFILE_ALL=y prints message "gcov: could not create file" and runs into sproadic BUGs during boot. The gcc people indicate that these kinds of problems are endemic when using ad hoc calling conventions. It is therefore best to treat any file compiled with ad hoc calling conventions as an isolated environment and avoid things like profiling or coverage analysis, since those subsystems assume a "normal" calling conventions. This patch avoids the bug by excluding lib/hweight.o from coverage profiling. Reported-by: Meelis Roos Cc: Andrew Morton Signed-off-by: Peter Oberparleiter Link: http://lkml.kernel.org/r/52F3A30C.7050205@linux.vnet.ibm.com Signed-off-by: H. Peter Anvin Signed-off-by: Greg Kroah-Hartman

mxl111sf: Fix compile when CONFIG_DVB_USB_MXL111SF is unset

2014-02-20T19:06:11Z

commit 13e1b87c986100169b0695aeb26970943665eda9 upstream. Fix the following build error: drivers/media/usb/dvb-usb-v2/ mxl111sf-tuner.h:72:9: error: expected ‘;’, ‘,’ or ‘)’ before ‘struct’ struct mxl111sf_tuner_config *cfg) Signed-off-by: Dave Jones Signed-off-by: Michael Krufky Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Greg Kroah-Hartman

af9035: add ID [2040:f900] Hauppauge WinTV-MiniStick 2

2014-02-20T19:06:11Z

commit f2e4c5e004691dfe37d0e4b363296f28abdb9bc7 upstream. Add USB ID [2040:f900] for Hauppauge WinTV-MiniStick 2. Device is build upon IT9135 chipset. Tested-by: Stefan Becker Signed-off-by: Antti Palosaari Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Greg Kroah-Hartman

x86: mm: change tlb_flushall_shift for IvyBridge

2014-02-20T19:06:11Z

commit f98b7a772ab51b52ca4d2a14362fc0e0c8a2e0f3 upstream. There was a large performance regression that was bisected to commit 611ae8e3 ("x86/tlb: enable tlb flush range support for x86"). This patch simply changes the default balance point between a local and global flush for IvyBridge. In the interest of allowing the tests to be reproduced, this patch was tested using mmtests 0.15 with the following configurations configs/config-global-dhp__tlbflush-performance configs/config-global-dhp__scheduler-performance configs/config-global-dhp__network-performance Results are from two machines Ivybridge 4 threads: Intel(R) Core(TM) i3-3240 CPU @ 3.40GHz Ivybridge 8 threads: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz Page fault microbenchmark showed nothing interesting. Ebizzy was configured to run multiple iterations and threads. Thread counts ranged from 1 to NR_CPUS*2. For each thread count, it ran 100 iterations and each iteration lasted 10 seconds. Ivybridge 4 threads 3.13.0-rc7 3.13.0-rc7 vanilla altshift-v3 Mean 1 6395.44 ( 0.00%) 6789.09 ( 6.16%) Mean 2 7012.85 ( 0.00%) 8052.16 ( 14.82%) Mean 3 6403.04 ( 0.00%) 6973.74 ( 8.91%) Mean 4 6135.32 ( 0.00%) 6582.33 ( 7.29%) Mean 5 6095.69 ( 0.00%) 6526.68 ( 7.07%) Mean 6 6114.33 ( 0.00%) 6416.64 ( 4.94%) Mean 7 6085.10 ( 0.00%) 6448.51 ( 5.97%) Mean 8 6120.62 ( 0.00%) 6462.97 ( 5.59%) Ivybridge 8 threads 3.13.0-rc7 3.13.0-rc7 vanilla altshift-v3 Mean 1 7336.65 ( 0.00%) 7787.02 ( 6.14%) Mean 2 8218.41 ( 0.00%) 9484.13 ( 15.40%) Mean 3 7973.62 ( 0.00%) 8922.01 ( 11.89%) Mean 4 7798.33 ( 0.00%) 8567.03 ( 9.86%) Mean 5 7158.72 ( 0.00%) 8214.23 ( 14.74%) Mean 6 6852.27 ( 0.00%) 7952.45 ( 16.06%) Mean 7 6774.65 ( 0.00%) 7536.35 ( 11.24%) Mean 8 6510.50 ( 0.00%) 6894.05 ( 5.89%) Mean 12 6182.90 ( 0.00%) 6661.29 ( 7.74%) Mean 16 6100.09 ( 0.00%) 6608.69 ( 8.34%) Ebizzy hits the worst case scenario for TLB range flushing every time and it shows for these Ivybridge CPUs at least that the default choice is a poor on. The patch addresses the problem. Next was a tlbflush microbenchmark written by Alex Shi at http://marc.info/?l=linux-kernel&m=133727348217113 . It measures access costs while the TLB is being flushed. The expectation is that if there are always full TLB flushes that the benchmark would suffer and it benefits from range flushing There are 320 iterations of the test per thread count. The number of entries is randomly selected with a min of 1 and max of 512. To ensure a reasonably even spread of entries, the full range is broken up into 8 sections and a random number selected within that section. iteration 1, random number between 0-64 iteration 2, random number between 64-128 etc This is still a very weak methodology. When you do not know what are typical ranges, random is a reasonable choice but it can be easily argued that the opimisation was for smaller ranges and an even spread is not representative of any workload that matters. To improve this, we'd need to know the probability distribution of TLB flush range sizes for a set of workloads that are considered "common", build a synthetic trace and feed that into this benchmark. Even that is not perfect because it would not account for the time between flushes but there are limits of what can be reasonably done and still be doing something useful. If a representative synthetic trace is provided then this benchmark could be revisited and the shift values retuned. Ivybridge 4 threads 3.13.0-rc7 3.13.0-rc7 vanilla altshift-v3 Mean 1 10.50 ( 0.00%) 10.50 ( 0.03%) Mean 2 17.59 ( 0.00%) 17.18 ( 2.34%) Mean 3 22.98 ( 0.00%) 21.74 ( 5.41%) Mean 5 47.13 ( 0.00%) 46.23 ( 1.92%) Mean 8 43.30 ( 0.00%) 42.56 ( 1.72%) Ivybridge 8 threads 3.13.0-rc7 3.13.0-rc7 vanilla altshift-v3 Mean 1 9.45 ( 0.00%) 9.36 ( 0.93%) Mean 2 9.37 ( 0.00%) 9.70 ( -3.54%) Mean 3 9.36 ( 0.00%) 9.29 ( 0.70%) Mean 5 14.49 ( 0.00%) 15.04 ( -3.75%) Mean 8 41.08 ( 0.00%) 38.73 ( 5.71%) Mean 13 32.04 ( 0.00%) 31.24 ( 2.49%) Mean 16 40.05 ( 0.00%) 39.04 ( 2.51%) For both CPUs, average access time is reduced which is good as this is the benchmark that was used to tune the shift values in the first place albeit it is now known *how* the benchmark was used. The scheduler benchmarks were somewhat inconclusive. They showed gains and losses and makes me reconsider how stable those benchmarks really are or if something else might be interfering with the test results recently. Network benchmarks were inconclusive. Almost all results were flat except for netperf-udp tests on the 4 thread machine. These results were unstable and showed large variations between reboots. It is unknown if this is a recent problems but I've noticed before that netperf-udp results tend to vary. Based on these results, changing the default for Ivybridge seems like a logical choice. Signed-off-by: Mel Gorman Tested-by: Davidlohr Bueso Reviewed-by: Alex Shi Reviewed-by: Rik van Riel Signed-off-by: Andrew Morton Cc: Linus Torvalds Cc: Peter Zijlstra Link: http://lkml.kernel.org/n/tip-cqnadffh1tiqrshthRj3Esge@git.kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman