user/sven/linux.git - Linux Kernel

Age	Commit message (Collapse)	Author
2011-02-17	xHCI: synchronize irq in xhci_suspend()	Andiry Xu
	commit 0029227f1bc30b6c809ae751f9e7af6cef900997 upstream. Synchronize the interrupts instead of free them in xhci_suspend(). This will prevent a double free when the host is suspended and then the card removed. Set the flag hcd->msix_enabled when using MSI-X, and check the flag in suspend_common(). MSI-X synchronization will be handled by xhci_suspend(), and MSI/INTx will be synchronized in suspend_common(). This patch should be queued for the 2.6.37 stable tree. Reported-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Andiry Xu <andiry.xu@amd.com> Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-02-17	ARM: oprofile: Fix backtraces in timer mode	Ari Kauppi
	commit d14dd7e20d5e526557f5d3cfef4046a642f80924 upstream. Always allow backtraces when using oprofile on ARM, even if a PMU isn't present. Restores functionality originally introduced in commit 1b7b56982fdcd9d85effd76f3928cf5d6eb26155 ("oprofile: Always allow backtraces on ARM") by Richard Purdie. It is not that obvious, but there is now only one oprofile_arch_init() function. So the .backtrace callback is available also in timer mode. Implemented by removing code and using stubs for oprofile_perf_{init, exit} provided by <linux/oprofile.h>. This allows cleaning of other architecture specific implementations too. Signed-off-by: Ari Kauppi <kauppi@papupata.org> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-02-17	ieee80211: correct IEEE80211_ADDBA_PARAM_BUF_SIZE_MASK macro	Amitkumar Karwar
	commit 8d661f1e462d50bd83de87ee628aaf820ce3c66c upstream. It is defined in include/linux/ieee80211.h. As per IEEE spec. bit6 to bit15 in block ack parameter represents buffer size. So the bitmask should be 0xFFC0. Signed-off-by: Amitkumar Karwar <akarwar@marvell.com> Signed-off-by: Bing Zhao <bzhao@marvell.com> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-02-17	oprofile: Fix usage of CONFIG_HW_PERF_EVENTS for oprofile_perf_init and friends	Ari Kauppi
	commit 1ea1bdf7faa4d0b5293e605f2e1ef1c2c59f6b53 upstream. The implementations are flagged in Makefile with CONFIG_HW_PERF_EVENTS instead of CONFIG_PERF_EVENTS. Signed-off-by: Ari Kauppi <kauppi@papupata.org> Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-02-17	Input: sysrq - ensure sysrq_enabled and __sysrq_enabled are consistent	Andy Whitcroft
	commit 8c6a98b22b750c9eb52653ba643faa17db8d3881 upstream. Currently sysrq_enabled and __sysrq_enabled are initialised separately and inconsistently, leading to sysrq being actually enabled by reported as not enabled in sysfs. The first change to the sysfs configurable synchronises these two: static int __read_mostly sysrq_enabled = 1; static int __sysrq_enabled; Add a common define to carry the default for these preventing them becoming out of sync again. Default this to 1 to mirror previous behaviour. Signed-off-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Dmitry Torokhov <dtor@mail.ru> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-02-17	memory hotplug: one more lock on memory hotplug	KAMEZAWA Hiroyuki
	commit 925268a06dc2b1ff7bfcc37419a6827a0e739639 upstream. Now, memory_hotplug_(un)lock() is used for add/remove/offline pages for avoiding races with hibernation. But this should be held in online_pages(), too. It seems asymmetric. There are cases where one has to avoid a race with memory hotplug notifier and his own local code, and hotplug v.s. hotplug. This will add a generic solution for avoiding races. In other view, having lock here has no big impacts. online pages is tend to be done by udev script at el against each memory section one by one. Then, it's better to have lock here, too. Reviewed-by: Christoph Lameter <cl@linux.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-02-17	workqueue: relax lockdep annotation on flush_work()	Tejun Heo
	commit e159489baa717dbae70f9903770a6a4990865887 upstream. Currently, the lockdep annotation in flush_work() requires exclusive access on the workqueue the target work is queued on and triggers warning if a work is trying to flush another work on the same workqueue; however, this is no longer true as workqueues can now execute multiple works concurrently. This patch adds lock_map_acquire_read() and make process_one_work() hold read access to the workqueue while executing a work and start_flush_work() check for write access if concurrnecy level is one or the workqueue has a rescuer (as only one execution resource - the rescuer - is guaranteed to be available under memory pressure), and read access if higher. This better represents what's going on and removes spurious lockdep warnings which are triggered by fake dependency chain created through flush_work(). * Peter pointed out that flushing another work from a WQ_MEM_RECLAIM wq breaks forward progress guarantee under memory pressure. Condition check accordingly updated. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: "Rafael J. Wysocki" <rjw@sisk.pl> Tested-by: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-02-17	sched: Change wait_for_completion_*_timeout() to return a signed long	NeilBrown
	commit 6bf4123760a5aece6e4829ce90b70b6ffd751d65 upstream. wait_for_completion_*_timeout() can return: 0: if the wait timed out -ve: if the wait was interrupted +ve: if the completion was completed. As they currently return an 'unsigned long', the last two cases are not easily distinguished which can easily result in buggy code, as is the case for the recently added wait_for_completion_interruptible_timeout() call in net/sunrpc/cache.c So change them both to return 'long'. As MAX_SCHEDULE_TIMEOUT is LONG_MAX, a large +ve return value should never overflow. Signed-off-by: NeilBrown <neilb@suse.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: J. Bruce Fields <bfields@fieldses.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <20110105125016.64ccab0e@notabene.brown> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-02-17	klist: Fix object alignment on 64-bit.	David Miller
	commit 795abaf1e4e188c4171e3cd3dbb11a9fcacaf505 upstream. Commit c0e69a5bbc6f ("klist.c: bit 0 in pointer can't be used as flag") intended to make sure that all klist objects were at least pointer size aligned, but used the constant "4" which only works on 32-bit. Use "sizeof(void *)" which is correct in all cases. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-02-17	mm: page allocator: adjust the per-cpu counter threshold when memory is low	Mel Gorman
	commit 88f5acf88ae6a9778f6d25d0d5d7ec2d57764a97 upstream. Commit aa45484 ("calculate a better estimate of NR_FREE_PAGES when memory is low") noted that watermarks were based on the vmstat NR_FREE_PAGES. To avoid synchronization overhead, these counters are maintained on a per-cpu basis and drained both periodically and when a threshold is above a threshold. On large CPU systems, the difference between the estimate and real value of NR_FREE_PAGES can be very high. The system can get into a case where pages are allocated far below the min watermark potentially causing livelock issues. The commit solved the problem by taking a better reading of NR_FREE_PAGES when memory was low. Unfortately, as reported by Shaohua Li this accurate reading can consume a large amount of CPU time on systems with many sockets due to cache line bouncing. This patch takes a different approach. For large machines where counter drift might be unsafe and while kswapd is awake, the per-cpu thresholds for the target pgdat are reduced to limit the level of drift to what should be a safe level. This incurs a performance penalty in heavy memory pressure by a factor that depends on the workload and the machine but the machine should function correctly without accidentally exhausting all memory on a node. There is an additional cost when kswapd wakes and sleeps but the event is not expected to be frequent - in Shaohua's test case, there was one recorded sleep and wake event at least. To ensure that kswapd wakes up, a safe version of zone_watermark_ok() is introduced that takes a more accurate reading of NR_FREE_PAGES when called from wakeup_kswapd, when deciding whether it is really safe to go back to sleep in sleeping_prematurely() and when deciding if a zone is really balanced or not in balance_pgdat(). We are still using an expensive function but limiting how often it is called. When the test case is reproduced, the time spent in the watermark functions is reduced. The following report is on the percentage of time spent cumulatively spent in the functions zone_nr_free_pages(), zone_watermark_ok(), __zone_watermark_ok(), zone_watermark_ok_safe(), zone_page_state_snapshot(), zone_page_state(). vanilla 11.6615% disable-threshold 0.2584% David said: : We had to pull aa454840 "mm: page allocator: calculate a better estimate : of NR_FREE_PAGES when memory is low and kswapd is awake" from 2.6.36 : internally because tests showed that it would cause the machine to stall : as the result of heavy kswapd activity. I merged it back with this fix as : it is pending in the -mm tree and it solves the issue we were seeing, so I : definitely think this should be pushed to -stable (and I would seriously : consider it for 2.6.37 inclusion even at this late date). Signed-off-by: Mel Gorman <mel@csn.ul.ie> Reported-by: Shaohua Li <shaohua.li@intel.com> Reviewed-by: Christoph Lameter <cl@linux.com> Tested-by: Nicolas Bareil <nico@chdir.org> Cc: David Rientjes <rientjes@google.com> Cc: Kyle McMartin <kyle@mcmartin.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-02-17	af_unix: Avoid socket->sk NULL OOPS in stream connect security hooks.	David S. Miller
	[ Upstream commit 3610cda53f247e176bcbb7a7cca64bc53b12acdb ] unix_release() can asynchornously set socket->sk to NULL, and it does so without holding the unix_state_lock() on "other" during stream connects. However, the reverse mapping, sk->sk_socket, is only transitioned to NULL under the unix_state_lock(). Therefore make the security hooks follow the reverse mapping instead of the forward mapping. Reported-by: Jeremy Fitzhardinge <jeremy@goop.org> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-02-17	mm: migration: use rcu_dereference_protected when dereferencing the radix ↵	Mel Gorman
	tree slot during file page migration commit 29c1f677d424e8c5683a837fc4f03fc9f19201d7 upstream. migrate_pages() -> unmap_and_move() only calls rcu_read_lock() for anonymous pages, as introduced by git commit 989f89c57e6361e7d16fbd9572b5da7d313b073d ("fix rcu_read_lock() in page migraton"). The point of the RCU protection there is part of getting a stable reference to anon_vma and is only held for anon pages as file pages are locked which is sufficient protection against freeing. However, while a file page's mapping is being migrated, the radix tree is double checked to ensure it is the expected page. This uses radix_tree_deref_slot() -> rcu_dereference() without the RCU lock held triggering the following warning. [ 173.674290] =================================================== [ 173.676016] [ INFO: suspicious rcu_dereference_check() usage. ] [ 173.676016] --------------------------------------------------- [ 173.676016] include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection! [ 173.676016] [ 173.676016] other info that might help us debug this: [ 173.676016] [ 173.676016] [ 173.676016] rcu_scheduler_active = 1, debug_locks = 0 [ 173.676016] 1 lock held by hugeadm/2899: [ 173.676016] #0: (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<c10e3d2b>] migrate_page_move_mapping+0x40/0x1ab [ 173.676016] [ 173.676016] stack backtrace: [ 173.676016] Pid: 2899, comm: hugeadm Not tainted 2.6.37-rc5-autobuild [ 173.676016] Call Trace: [ 173.676016] [<c128cc01>] ? printk+0x14/0x1b [ 173.676016] [<c1063502>] lockdep_rcu_dereference+0x7d/0x86 [ 173.676016] [<c10e3db5>] migrate_page_move_mapping+0xca/0x1ab [ 173.676016] [<c10e41ad>] migrate_page+0x23/0x39 [ 173.676016] [<c10e491b>] buffer_migrate_page+0x22/0x107 [ 173.676016] [<c10e48f9>] ? buffer_migrate_page+0x0/0x107 [ 173.676016] [<c10e425d>] move_to_new_page+0x9a/0x1ae [ 173.676016] [<c10e47e6>] migrate_pages+0x1e7/0x2fa This patch introduces radix_tree_deref_slot_protected() which calls rcu_dereference_protected(). Users of it must pass in the mapping->tree_lock that is protecting this dereference. Holding the tree lock protects against parallel updaters of the radix tree meaning that rcu_dereference_protected is allowable. [akpm@linux-foundation.org: remove unneeded casts] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Milton Miller <miltonm@bga.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-02-17	block: fix accounting bug on cross partition merges	Jerome Marchand
	commit 09e099d4bafea3b15be003d548bdf94b4b6e0e17 upstream. /proc/diskstats would display a strange output as follows. $ cat /proc/diskstats \|grep sda 8 0 sda 90524 7579 102154 20464 0 0 0 0 0 14096 20089 8 1 sda1 19085 1352 21841 4209 0 0 0 0 4294967064 15689 4293424691 ~~~~~~~~~~ 8 2 sda2 71252 3624 74891 15950 0 0 0 0 232 23995 1562390 8 3 sda3 54 487 2188 92 0 0 0 0 0 88 92 8 4 sda4 4 0 8 0 0 0 0 0 0 0 0 8 5 sda5 81 2027 2130 138 0 0 0 0 0 87 137 Its reason is the wrong way of accounting hd_struct->in_flight. When a bio is merged into a request belongs to different partition by ELEVATOR_FRONT_MERGE. The detailed root cause is as follows. Assuming that there are two partition, sda1 and sda2. 1. A request for sda2 is in request_queue. Hence sda1's hd_struct->in_flight is 0 and sda2's one is 1. \| hd_struct->in_flight --------------------------- sda1 \| 0 sda2 \| 1 --------------------------- 2. A bio belongs to sda1 is issued and is merged into the request mentioned on step1 by ELEVATOR_BACK_MERGE. The first sector of the request is changed from sda2 region to sda1 region. However the two partition's hd_struct->in_flight are not changed. \| hd_struct->in_flight --------------------------- sda1 \| 0 sda2 \| 1 --------------------------- 3. The request is finished and blk_account_io_done() is called. In this case, sda2's hd_struct->in_flight, not a sda1's one, is decremented. \| hd_struct->in_flight --------------------------- sda1 \| -1 sda2 \| 1 --------------------------- The patch fixes the problem by caching the partition lookup inside the request structure, hence making sure that the increment and decrement will always happen on the same partition struct. This also speeds up IO with accounting enabled, since it cuts down on the number of lookups we have to do. Also add a refcount to struct hd_struct to keep the partition in memory as long as users exist. We use kref_test_and_get() to ensure we don't add a reference to a partition which is going away. Signed-off-by: Jerome Marchand <jmarchan@redhat.com> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-02-17	kref: add kref_test_and_get	Jerome Marchand
	commit e4a683c899cd5a49f8d684a054c95bd115a0c005 upstream. Add kref_test_and_get() function, which atomically add a reference only if refcount is not zero. This prevent to add a reference to an object that is already being removed. Signed-off-by: Jerome Marchand <jmarchan@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-02-17	dynamic debug: Fix build issue with older gcc	Jason Baron
	commit 2d75af2f2a7a6103a6d539a492fe81deacabde44 upstream. On older gcc (3.3) dynamic debug fails to compile: include/net/inet_connection_sock.h: In function `inet_csk_reset_xmit_timer': include/net/inet_connection_sock.h:236: error: duplicate label declaration `do_printk' include/net/inet_connection_sock.h:219: error: this is a previous declaration include/net/inet_connection_sock.h:236: error: duplicate label declaration `out' include/net/inet_connection_sock.h:219: error: this is a previous declaration include/net/inet_connection_sock.h:236: error: duplicate label `do_printk' include/net/inet_connection_sock.h:236: error: duplicate label `out' Fix, by reverting the usage of JUMP_LABEL() in dynamic debug for now. Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Jason Baron <jbaron@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-02-17	NFS: Don't use vm_map_ram() in readdir	Trond Myklebust
	commit 6650239a4b01077e80d5a4468562756d77afaa59 upstream. vm_map_ram() is not available on NOMMU platforms, and causes trouble on incoherrent architectures such as ARM when we access the page data through both the direct and the virtual mapping. The alternative is to use the direct mapping to access page data for the case when we are not crossing a page boundary, but to copy the data into a linear scratch buffer when we are accessing data that spans page boundaries. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Tested-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-02-17	cfg80211: fix disabling channels based on hints	Luis R. Rodriguez
	commit ca4ffe8f2848169a8ded0ea8a60b2d81925564c9 upstream. After a module loads you will have loaded the world roaming regulatory domain or a custom regulatory domain. Further regulatory hints are welcomed and should be respected unless the regulatory hint is coming from a country IE as the IEEE spec allows for a country IE to be a subset of what is allowed by the local regulatory agencies. So disable all channels that do not fit a regulatory domain sent from a unless the hint is from a country IE and the country IE had no information about the band we are currently processing. This fixes a few regulatory issues, for example for drivers that depend on CRDA and had no 5 GHz freqencies allowed were not properly disabling 5 GHz at all, furthermore it also allows users to restrict devices further as was intended. If you recieve a country IE upon association we will also disable the channels that are not allowed if the country IE had at least one channel on the respective band we are procesing. This was the original intention behind this design but it was completely overlooked... Cc: David Quan <david.quan@atheros.com> Cc: Jouni Malinen <jouni.malinen@atheros.com> cc: Easwar Krishnan <easwar.krishnan@atheros.com> Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-02-17	USB: serial: handle Data Carrier Detect changes	Libor Pechacek
	commit d14fc1a74e846d7851f24fc9519fe87dc12a1231 upstream. Alan's commit 335f8514f200e63d689113d29cb7253a5c282967 introduced .carrier_raised function in several drivers. That also means tty_port_block_til_ready can now suspend the process trying to open the serial port when Carrier Detect is low and put it into tty_port.open_wait queue. We need to wake up the process when Carrier Detect goes high and trigger TTY hangup when CD goes low. Some of the devices do not report modem status line changes, or at least we don't understand the status message, so for those we remove .carrier_raised again. Signed-off-by: Libor Pechacek <lpechacek@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-01-03	Merge branch 'fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx: dmaengine: provide dummy functions for DMA_ENGINE=n mv_xor: fix race in tasklet function
2011-01-03	dmaengine: provide dummy functions for DMA_ENGINE=n	Guennadi Liakhovetski
	This lets drivers, optionally using the dmaengine, build with DMA_ENGINE unselected. Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2010-12-26	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6	Linus Torvalds
	* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (42 commits) ipv4: dont create routes on down devices epic100: hamachi: yellowfin: Fix skb allocation size sundance: Fix oopses with corrupted skb_shared_info Revert "ipv4: Allow configuring subnets as local addresses" USB: mcs7830: return negative if auto negotiate fails irda: prevent integer underflow in IRLMP_ENUMDEVICES tcp: fix listening_get_next() atl1c: Do not use legacy PCI power management mac80211: fix mesh forwarding MAINTAINERS: email address change net: Fix range checks in tcf_valid_offset(). net_sched: sch_sfq: fix allot handling hostap: remove netif_stop_queue from init mac80211/rt2x00: add ieee80211_tx_status_ni() typhoon: memory corruption in typhoon_get_drvinfo() net: Add USB PID for new MOSCHIP USB ethernet controller MCS7832 variant net_sched: always clone skbs ipv6: Fragment locally generated tunnel-mode IPSec6 packets as needed. netlink: fix gcc -Wconversion compilation warning asix: add USB ID for Logitec LAN-GTJ U2A ...
2010-12-24	Merge branch 'for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: percpu: print out alloc information with KERN_DEBUG instead of KERN_INFO kthread_work: make lockdep happy
2010-12-22	taskstats: pad taskstats netlink response for aligment issues on ia64	Jeff Mahoney
	The taskstats structure is internally aligned on 8 byte boundaries but the layout of the aggregrate reply, with two NLA headers and the pid (each 4 bytes), actually force the entire structure to be unaligned. This causes the kernel to issue unaligned access warnings on some architectures like ia64. Unfortunately, some software out there doesn't properly unroll the NLA packet and assumes that the start of the taskstats structure will always be 20 bytes from the start of the netlink payload. Aligning the start of the taskstats structure breaks this software, which we don't want. So, for now the alignment only happens on architectures that require it and those users will have to update to fixed versions of those packages. Space is reserved in the packet only when needed. This ifdef should be removed in several years e.g. 2012 once we can be confident that fixed versions are installed on most systems. We add the padding before the aggregate since the aggregate is already a defined type. Commit 85893120 ("delayacct: align to 8 byte boundary on 64-bit systems") previously addressed the alignment issues by padding out the pid field. This was supposed to be a compatible change but the circumstances described above mean that it wasn't. This patch backs out that change, since it was a hack, and introduces a new NULL attribute type to provide the padding. Padding the response with 4 bytes avoids allocating an aligned taskstats structure and copying it back. Since the structure weighs in at 328 bytes, it's too big to do it on the stack. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reported-by: Brian Rogers <brian@xyzw.org> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Guillaume Chazarain <guichaz@gmail.com> Cc: Balbir Singh <balbir@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-12-22	include/linux/unaligned: pack the whole struct rather than just the field	Will Newton
	The current packed struct implementation of unaligned access adds the packed attribute only to the field within the unaligned struct rather than to the struct as a whole. This is not sufficient to enforce proper behaviour on architectures with a default struct alignment of more than one byte. For example, the current implementation of __get_unaligned_cpu16 when compiled for arm with gcc -O1 -mstructure-size-boundary=32 assumes the struct is on a 4 byte boundary so performs the load of the 16bit packed field as if it were on a 4 byte boundary: __get_unaligned_cpu16: ldrh r0, [r0, #0] bx lr Moving the packed attribute to the struct rather than the field causes the proper unaligned access code to be generated: __get_unaligned_cpu16: ldrb r3, [r0, #0] @ zero_extendqisi2 ldrb r0, [r0, #1] @ zero_extendqisi2 orr r0, r3, r0, asl #8 bx lr Signed-off-by: Will Newton <will.newton@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-12-22	kthread_work: make lockdep happy	Yong Zhang
	spinlock in kthread_worker and wait_queue_head in kthread_work both should be lockdep sensible, so change the interface to make it suiltable for CONFIG_LOCKDEP. tj: comment update Reported-by: Nicolas <nicolas.mailhot@laposte.net> Signed-off-by: Yong Zhang <yong.zhang0@gmail.com> Signed-off-by: Andy Walls <awalls@md.metrocast.net> Tested-by: Andy Walls <awalls@md.metrocast.net> Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Tejun Heo <tj@kernel.org>
2010-12-20	Merge branch 'for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: handle partial result from get_user_pages ceph: mark user pages dirty on direct-io reads ceph: fix null pointer dereference in ceph_init_dentry for nfs reexport ceph: fix direct-io on non-page-aligned buffers ceph: fix msgr_init error path
2010-12-20	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block	Linus Torvalds
	* 'for-linus' of git://git.kernel.dk/linux-2.6-block: cciss: fix cciss_revalidate panic block: max hardware sectors limit wrapper block: Deprecate QUEUE_FLAG_CLUSTER and use queue_limits instead blk-throttle: Correct the placement of smp_rmb() blk-throttle: Trim/adjust slice_end once a bio has been dispatched block: check for proper length of iov entries earlier in blk_rq_map_user_iov() drbd: fix for spin_lock_irqsave in endio callback drbd: don't recvmsg with zero length
2010-12-20	clarify a usage constraint for cnt32_to_63()	Nicolas Pitre
	The cnt32_to_63 algorithm relies on proper counter data evaluation ordering to work properly. This was missing from the provided documentation. Let's augment the documentation with the missing usage constraint and fix the only instance that got it wrong. Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-12-19	Merge branches 'x86-fixes-for-linus' and 'perf-fixes-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86-32: Make sure we can map all of lowmem if we need to x86, vt-d: Handle previous faults after enabling fault handling x86: Enable the intr-remap fault handling after local APIC setup x86, vt-d: Fix the vt-d fault handling irq migration in the x2apic mode x86, vt-d: Quirk for masking vtd spec errors to platform error handling logic x86, xsave: Use alloc_bootmem_align() instead of alloc_bootmem() bootmem: Add alloc_bootmem_align() x86, gcc-4.6: Use gcc -m options when building vdso x86: HPET: Chose a paranoid safe value for the ETIME check x86: io_apic: Avoid unused variable warning when CONFIG_GENERIC_PENDING_IRQ=n * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf: Fix off by one in perf_swevent_init() perf: Fix duplicate events with multiple-pmu vs software events ftrace: Have recordmcount honor endianness in fn_ELF_R_INFO scripts/tags.sh: Add magic for trace-events tracing: Fix panic when lseek() called on "trace" opened for writing
2010-12-19	Merge branch 'sched-fixes-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: Fix the irqtime code for 32bit sched: Fix the irqtime code to deal with u64 wraps nohz: Fix get_next_timer_interrupt() vs cpu hotplug Sched: fix skip_clock_update optimization sched: Cure more NO_HZ load average woes
2010-12-18	Merge branch 'for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: x86: avoid high BIOS area when allocating address space x86: avoid E820 regions when allocating address space x86: avoid low BIOS area when allocating address space resources: add arch hook for preventing allocation in reserved areas Revert "resources: support allocating space within a region from the top down" Revert "PCI: allocate bus resources from the top down" Revert "x86/PCI: allocate space from the end of a region, not the beginning" Revert "x86: allocate space within a region top-down" Revert "PCI: fix pci_bus_alloc_resource() hang, prefer positive decode" PCI: Update MCP55 quirk to not affect non HyperTransport variants
2010-12-17	netlink: fix gcc -Wconversion compilation warning	Dmitry V. Levin
	$ cat << EOF \| gcc -Wconversion -xc -S -o/dev/null - unsigned f(void) {return NLMSG_HDRLEN;} EOF <stdin>: In function 'f': <stdin>:3:26: warning: negative integer implicitly converted to unsigned type Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-17	resources: add arch hook for preventing allocation in reserved areas	Bjorn Helgaas
	This adds arch_remove_reservations(), which an arch can implement if it needs to protect part of the address space from allocation. Sometimes that can be done by just putting a region in the resource tree, but there are cases where that doesn't work well. For example, x86 BIOS E820 reservations are not related to devices, so they may overlap part of, all of, or more than a device resource, so they may not end up at the correct spot in the resource tree. Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-12-17	Revert "resources: support allocating space within a region from the top down"	Bjorn Helgaas
	This reverts commit e7f8567db9a7f6b3151b0b275e245c1cef0d9c70. Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-12-17	ceph: mark user pages dirty on direct-io reads	Henry C Chang
	For read operation, we have to set the argument _write_ of get_user_pages to 1 since we will write data to pages. Also, we need to SetPageDirty before releasing these pages. Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com> Signed-off-by: Sage Weil <sage@newdream.net>
2010-12-17	Merge branch 'pm-fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6 * 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6: PM / Runtime: Fix pm_runtime_suspended() PM / Hibernate: Restore old swap signature to avoid user space breakage PM / Hibernate: Fix PM_POST_* notification with user-space suspend
2010-12-17	block: max hardware sectors limit wrapper	Mike Snitzer
	Implement blk_limits_max_hw_sectors() and make blk_queue_max_hw_sectors() a wrapper around it. DM needs this to avoid setting queue_limits' max_hw_sectors and max_sectors directly. dm_set_device_limits() now leverages blk_limits_max_hw_sectors() logic to establish the appropriate max_hw_sectors minimum (PAGE_SIZE). Fixes issue where DM was incorrectly setting max_sectors rather than max_hw_sectors (which caused dm_merge_bvec()'s max_hw_sectors check to be ineffective). Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@kernel.org Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-12-17	block: Deprecate QUEUE_FLAG_CLUSTER and use queue_limits instead	Martin K. Petersen
	When stacking devices, a request_queue is not always available. This forced us to have a no_cluster flag in the queue_limits that could be used as a carrier until the request_queue had been set up for a metadevice. There were several problems with that approach. First of all it was up to the stacking device to remember to set queue flag after stacking had completed. Also, the queue flag and the queue limits had to be kept in sync at all times. We got that wrong, which could lead to us issuing commands that went beyond the max scatterlist limit set by the driver. The proper fix is to avoid having two flags for tracking the same thing. We deprecate QUEUE_FLAG_CLUSTER and use the queue limit directly in the block layer merging functions. The queue_limit 'no_cluster' is turned into 'cluster' to avoid double negatives and to ease stacking. Clustering defaults to being enabled as before. The queue flag logic is removed from the stacking function, and explicitly setting the cluster flag is no longer necessary in DM and MD. Reported-by: Ed Lin <ed.lin@promise.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@kernel.org Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-12-16	Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notify	Linus Torvalds
	* 'for-linus' of git://git.infradead.org/users/eparis/notify: fanotify: fill in the metadata_len field on struct fanotify_event_metadata fanotify: split version into version and metadata_len fanotify: Dont try to open a file descriptor for the overflow event fanotify: Introduce FAN_NOFD fanotify: do not leak user reference on allocation failure inotify: stop kernel memory leak on file creation failure fanotify: on group destroy allow all waiters to bypass permission check fanotify: Dont allow a mask of 0 if setting or removing a mark fanotify: correct broken ref counting in case adding a mark failed fanotify: if set by user unset FMODE_NONOTIFY before fsnotify_perm() is called fanotify: remove packed from access response message fanotify: deny permissions when no event was sent
2010-12-16	Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/upstream-linus	Linus Torvalds
	* 'upstream' of git://git.linux-mips.org/pub/scm/upstream-linus: (28 commits) MIPS: Add a CONFIG_FORCE_MAX_ZONEORDER Kconfig option. MIPS: LD/SD o32 macro GAS fix update MIPS: Alchemy: fix build with SERIAL_8250=n MIPS: Rename mips_dma_cache_sync back to dma_cache_sync MIPS: MT: Fix typo in comment. SSB: Fix nvram_get on BCM47xx platform MIPS: BCM47xx: Swap serial console if ttyS1 was specified. MIPS: BCM47xx: Use sscanf for parsing mac address MIPS: BCM47xx: Fill values for b43 into SSB sprom MIPS: BCM47xx: Do not read config from CFE MIPS: FDT size is a be32 MIPS: Fix CP0 COUNTER clockevent race MIPS: Fix regression on BCM4710 processor detection MIPS: JZ4740: Fix pcm device name MIPS: Separate two consecutive loads in memset.S MIPS: Send proper signal and siginfo on FP emulator faults. MIPS: AR7: Fix loops per jiffies on TNETD7200 devices MIPS: AR7: Fix double ar7_gpio_init declaration MIPS: Rework GENERIC_HARDIRQS Kconfig. MIPS: Alchemy: Add return value check for strict_strtoul() ...
2010-12-16	SSB: Fix nvram_get on BCM47xx platform	Hauke Mehrtens
	The nvram_get function was never in the mainline kernel, it only existed in an external OpenWrt patch. Use nvram_getenv function, which is in mainline and use an include instead of an extra function declaration. et0macaddr contains the mac address in text from like 00:11:22:33:44:55. We have to parse it before adding it into macaddr. nvram_parse_macaddr will be merged into asm/mach-bcm47xx/nvram.h through the MIPS git tree and will be available soon. It will not build now without nvram_parse_macaddr, but it hasn't before either. Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de> To: linux-mips@linux-mips.org Cc: mb@bu3sch.de Cc: netdev@vger.kernel.org Cc: Hauke Mehrtens <hauke@hauke-m.de> Acked-by: Michael Buesch <mb@bu3sch.de> Patchwork: https://patchwork.linux-mips.org/patch/1849/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2010-12-16	Merge branch 'for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: define separate EVIOCGKEYCODE_V2/EVIOCSKEYCODE_V2 Input: wacom - add another Bamboo Pen ID (0xd4)
2010-12-16	PM / Runtime: Fix pm_runtime_suspended()	Rafael J. Wysocki
	There are some situations (e.g. in __pm_generic_call()), where pm_runtime_suspended() is used to decide whether or not to execute a device's (system) ->suspend() callback. The callback is not executed if pm_runtime_suspended() returns true, but it does so for devices that don't even support runtime PM, because the power.disable_depth device field is ignored by it. This leads to problems (i.e. devices are not suspened when they should), so rework pm_runtime_suspended() so that it returns false if the device's power.disable_depth field is different from zero. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: stable@kernel.org
2010-12-15	fanotify: split version into version and metadata_len	Alexey Zaytsev
	To implement per event type optional headers we are interested in knowing how long the metadata structure is. This patch slits the __u32 version field into a __u8 version and a __u16 metadata_len field (with __u8 left over). This should allow for backwards compat ABI. Signed-off-by: Alexey Zaytsev <alexey.zaytsev@gmail.com> [rewrote descrtion and changed object sizes and ordering - eparis] Signed-off-by: Eric Paris <eparis@redhat.com>
2010-12-14	Input: define separate EVIOCGKEYCODE_V2/EVIOCSKEYCODE_V2	Dmitry Torokhov
	The desire to keep old names for the EVIOCGKEYCODE/EVIOCSKEYCODE while extending them to support large scancodes was a mistake. While we tried to keep ABI intact (and we succeeded in doing that, programs compiled on older kernels will work on newer ones) there is still a problem with recompiling existing software with newer kernel headers. New kernel headers will supply updated ioctl numbers and kernel will expect that userspace will use struct input_keymap_entry to set and retrieve keymap data. But since the names of ioctls are still the same userspace will happily compile even if not adjusted to make use of the new structure and will start miraculously fail in the field. To avoid this issue let's revert EVIOCGKEYCODE/EVIOCSKEYCODE definitions and add EVIOCGKEYCODE_V2/EVIOCSKEYCODE_V2 so that userspace can explicitly select the style of ioctls it wants to employ. Reviewed-by: Henrik Rydberg <rydberg@euromail.se> Acked-by: Jarod Wilson <jarod@redhat.com> Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
2010-12-14	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6	Linus Torvalds
	* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (75 commits) pppoe.c: Fix kernel panic caused by __pppoe_xmit WAN: Fix a TX IRQ causing BUG() in PC300 and PCI200SYN drivers. bnx2x: Advance a version number to 1.60.01-0 bnx2x: Fixed a compilation warning bnx2x: LSO code was broken on BE platforms qlge: Fix deadlock when cancelling worker. net: fix skb_defer_rx_timestamp() cxgb4vf: Ingress Queue Entry Size needs to be 64 bytes phy: add the IC+ IP1001 driver atm: correct sysfs 'device' link creation and parent relationships MAINTAINERS: remove me from tulip SCTP: Fix SCTP_SET_PEER_PRIMARY_ADDR to accpet v4mapped address enic: Bug Fix: Pass napi reference to the isr that services receive queue ipv6: fix nl group when advertising a new link connector: add module alias net: Document the kernel_recvmsg() function r8169: Fix runtime power management hso: IP checksuming doesn't work on GE0301 option cards xfrm: Fix xfrm_state_migrate leak net: Convert netpoll blocking api in bonding driver to be a counter ...
2010-12-14	Merge branch 'release' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6 * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: ACPI / PM: Do not save/restore NVS on Sony Vaio VGN-NW130D ACPI/HEST: adjust section selection ACPI: eliminate unused variable warning for !ACPI_SLEEP ACPI/PNP: avoid section mismatch warning ACPI thermal: remove two unused functions ACPI: fix a section mismatch ACPI, APEI, use raw spinlock in ERST ACPI: video: fix build for CONFIG_ACPI=n ACPI: video: fix build for VIDEO_OUTPUT_CONTROL=n ACPI: fix allowing to add/remove multiple _OSI strings acpi: fix _OSI string setup regression ACPI: EC: Add another dmi match entry for MSI hardware ACPI battery: update status upon sysfs query ACPI ac: update AC status upon sysfs query ACPI / PM: Do not refcount power resources that can't be turned on ACPI / PM: Check device state before refcounting power resources
2010-12-14	Merge branch 'for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6: ALSA: HDA: Quirk for Dell Vostro 320 to make microphone work ALSA: hda - Reset sample sizes and max bitrates when reading ELD ALSA: hda - Always allow basic audio irrespective of ELD info ALSA: hda - Do not wrongly restrict min_channels based on ELD ASoC: Correct WM8962 interrupt mask register read ASoC: WM8580: Debug BCLK and sample size ASoC: Fix resource leak if soc_register_ac97_dai_link failed ASoC: Hold client_mutex while calling snd_soc_instantiate_cards() ASoC: Fix swap of left and right channels for WM8993/4 speaker boost gain ASoC: Fix off by one error in WM8994 EQ register bank size ALSA: hda: Use position_fix=1 for Acer Aspire 5538 to enable capture on internal mic ALSA: hda - Enable jack sense for Thinkpad Edge 13 ALSA: hda - Fix ThinkPad T410[s] docking station line-out ALSA: hda: Use model=lg quirk for LG P1 Express to enable playback and capture
2010-12-14	Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6	Linus Torvalds
	* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: NFS: Fix panic after nfs_umount() nfs: remove extraneous and problematic calls to nfs_clear_request nfs: kernel should return EPROTONOSUPPORT when not support NFSv4 NFS: Fix fcntl F_GETLK not reporting some conflicts nfs: Discard ACL cache on mode update NFS: Readdir cleanups NFS: nfs_readdir_search_for_cookie() don't mark as eof if cookie not found NFS: Fix a memory leak in nfs_readdir Call the filesystem back whenever a page is removed from the page cache NFS: Ensure we use the correct cookie in nfs_readdir_xdr_filler
2010-12-13	bootmem: Add alloc_bootmem_align()	Suresh Siddha
	Add an alloc_bootmem_align() interface to allocate bootmem with specified alignment. This is necessary to be able to allocate the xsave area in a subsequent patch. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <20101116212441.977574826@sbsiddha-MOBL3.sc.intel.com> Acked-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: <stable@kernel.org>