user/sven/linux.git/ipc, branch v3.2.65

ipc/msg: fix race around refcount

2014-04-09T01:20:45Z

In older kernels (before v3.10) ipc_rcu_hdr->refcount was non-atomic int. There was possuble double-free bug: do_msgsnd() calls ipc_rcu_putref() under msq->q_perm->lock and RCU, while freequeue() calls it while it holds only 'rw_mutex', so there is no sinchronization between them. Two function decrements '2' non-atomically, they both can get '0' as result. do_msgsnd() freequeue() msq = msg_lock_check(ns, msqid); ... ipc_rcu_getref(msq); msg_unlock(msq); schedule(); (caller locks spinlock) expunge_all(msq, -EIDRM); ss_wakeup(&msq->q_senders, 1); msg_rmid(ns, msq); msg_unlock(msq); ipc_lock_by_ptr(&msq->q_perm); ipc_rcu_putref(msq); ipc_rcu_putref(msq); < both may get get --(...)->refcount == 0 > This patch locks ipc_lock and RCU around ipc_rcu_putref in freequeue. ( RCU protects memory for spin_unlock() ) Similar bugs might be in other users of ipc_rcu_putref(). In the mainline this has been fixed in v3.10 indirectly in commmit 6062a8dc0517bce23e3c2f7d2fea5e22411269a3 ("ipc,sem: fine grained locking for semtimedop") by Rik van Riel. That commit optimized locking and converted refcount into atomic. I'm not sure that anybody should care about this bug: it's very-very unlikely and no longer exists in actual mainline. I've found this just by looking into the code, probably this never happens in real life. Signed-off-by: Konstantin Khlebnikov Signed-off-by: Ben Hutchings

ipc, msg: fix message length check for negative values

2014-01-03T04:33:21Z

commit 4e9b45a19241354daec281d7a785739829b52359 upstream. On 64 bit systems the test for negative message sizes is bogus as the size, which may be positive when evaluated as a long, will get truncated to an int when passed to load_msg(). So a long might very well contain a positive value but when truncated to an int it would become negative. That in combination with a small negative value of msg_ctlmax (which will be promoted to an unsigned type for the comparison against msgsz, making it a big positive value and therefore make it pass the check) will lead to two problems: 1/ The kmalloc() call in alloc_msg() will allocate a too small buffer as the addition of alen is effectively a subtraction. 2/ The copy_from_user() call in load_msg() will first overflow the buffer with userland data and then, when the userland access generates an access violation, the fixup handler copy_user_handle_tail() will try to fill the remainder with zeros -- roughly 4GB. That almost instantly results in a system crash or reset. ,-[ Reproducer (needs to be run as root) ]-- | #include | #include | #include | #include | | int main(void) { | long msg = 1; | int fd; | | fd = open("/proc/sys/kernel/msgmax", O_WRONLY); | write(fd, "-1", 2); | close(fd); | | msgsnd(0, &msg, 0xfffffff0, IPC_NOWAIT); | | return 0; | } '--- Fix the issue by preventing msgsz from getting truncated by consistently using size_t for the message length. This way the size checks in do_msgsnd() could still be passed with a negative value for msg_ctlmax but we would fail on the buffer allocation in that case and error out. Also change the type of m_ts from int to size_t to avoid similar nastiness in other code paths -- it is used in similar constructs, i.e. signed vs. unsigned checks. It should never become negative under normal circumstances, though. Setting msg_ctlmax to a negative value is an odd configuration and should be prevented. As that might break existing userland, it will be handled in a separate commit so it could easily be reverted and reworked without reintroducing the above described bug. Hardening mechanisms for user copy operations would have catched that bug early -- e.g. checking slab object sizes on user copy operations as the usercopy feature of the PaX patch does. Or, for that matter, detect the long vs. int sign change due to truncation, as the size overflow plugin of the very same patch does. [akpm@linux-foundation.org: fix i386 min() warnings] Signed-off-by: Mathias Krause Cc: Pax Team Cc: Davidlohr Bueso Cc: Brad Spengler Cc: Manfred Spraul Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds [bwh: Backported to 3.2: - Adjust context - Drop changes to alloc_msg() and copy_msg(), which don't exist] Signed-off-by: Ben Hutchings

ipc: sysv shared memory limited to 8TiB

2013-05-13T14:02:30Z

commit d69f3bad4675ac519d41ca2b11e1c00ca115cecd upstream. Trying to run an application which was trying to put data into half of memory using shmget(), we found that having a shmall value below 8EiB-8TiB would prevent us from using anything more than 8TiB. By setting kernel.shmall greater than 8EiB-8TiB would make the job work. In the newseg() function, ns->shm_tot which, at 8TiB is INT_MAX. ipc/shm.c: 458 static int newseg(struct ipc_namespace *ns, struct ipc_params *params) 459 { ... 465 int numpages = (size + PAGE_SIZE -1) >> PAGE_SHIFT; ... 474 if (ns->shm_tot + numpages > ns->shm_ctlall) 475 return -ENOSPC; [akpm@linux-foundation.org: make ipc/shm.c:newseg()'s numpages size_t, not int] Signed-off-by: Robin Holt Reported-by: Alex Thorlton Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Ben Hutchings

SHM_UNLOCK: fix Unevictable pages stranded after swap

2012-01-26T00:13:59Z

commit 245132643e1cfcd145bbc86a716c1818371fcb93 upstream. Commit cc39c6a9bbde ("mm: account skipped entries to avoid looping in find_get_pages") correctly fixed an infinite loop; but left a problem that find_get_pages() on shmem would return 0 (appearing to callers to mean end of tree) when it meets a run of nr_pages swap entries. The only uses of find_get_pages() on shmem are via pagevec_lookup(), called from invalidate_mapping_pages(), and from shmctl SHM_UNLOCK's scan_mapping_unevictable_pages(). The first is already commented, and not worth worrying about; but the second can leave pages on the Unevictable list after an unusual sequence of swapping and locking. Fix that by using shmem_find_get_pages_and_swap() (then ignoring the swap) instead of pagevec_lookup(). But I don't want to contaminate vmscan.c with shmem internals, nor shmem.c with LRU locking. So move scan_mapping_unevictable_pages() into shmem.c, renaming it shmem_unlock_mapping(); and rename check_move_unevictable_page() to check_move_unevictable_pages(), looping down an array of pages, oftentimes under the same lock. Leave out the "rotate unevictable list" block: that's a leftover from when this was used for /proc/sys/vm/scan_unevictable_pages, whose flawed handling involved looking at pages at tail of LRU. Was there significance to the sequence first ClearPageUnevictable, then test page_evictable, then SetPageUnevictable here? I think not, we're under LRU lock, and have no barriers between those. Signed-off-by: Hugh Dickins Reviewed-by: KOSAKI Motohiro Cc: Minchan Kim Cc: Rik van Riel Cc: Shaohua Li Cc: Eric Dumazet Cc: Johannes Weiner Cc: Michel Lespinasse Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman

SHM_UNLOCK: fix long unpreemptible section

2012-01-26T00:13:59Z

commit 85046579bde15e532983438f86b36856e358f417 upstream. scan_mapping_unevictable_pages() is used to make SysV SHM_LOCKed pages evictable again once the shared memory is unlocked. It does this with pagevec_lookup()s across the whole object (which might occupy most of memory), and takes 300ms to unlock 7GB here. A cond_resched() every PAGEVEC_SIZE pages would be good. However, KOSAKI-san points out that this is called under shmem.c's info->lock, and it's also under shm.c's shm_lock(), both spinlocks. There is no strong reason for that: we need to take these pages off the unevictable list soonish, but those locks are not required for it. So move the call to scan_mapping_unevictable_pages() from shmem.c's unlock handling up to shm.c's unlock handling. Remove the recently added barrier, not needed now we have spin_unlock() before the scan. Use get_file(), with subsequent fput(), to make sure we have a reference to mapping throughout scan_mapping_unevictable_pages(): that's something that was previously guaranteed by the shm_lock(). Remove shmctl's lru_add_drain_all(): we don't fault in pages at SHM_LOCK time, and we lazily discover them to be Unevictable later, so it serves no purpose for SHM_LOCK; and serves no purpose for SHM_UNLOCK, since pages still on pagevec are not marked Unevictable. The original code avoided redundant rescans by checking VM_LOCKED flag at its level: now avoid them by checking shp's SHM_LOCKED. The original code called scan_mapping_unevictable_pages() on a locked area at shm_destroy() time: perhaps we once had accounting cross-checks which required that, but not now, so skip the overhead and just let inode eviction deal with them. Put check_move_unevictable_page() and scan_mapping_unevictable_pages() under CONFIG_SHMEM (with stub for the TINY case when ramfs is used), more as comment than to save space; comment them used for SHM_UNLOCK. Signed-off-by: Hugh Dickins Reviewed-by: KOSAKI Motohiro Cc: Minchan Kim Cc: Rik van Riel Cc: Shaohua Li Cc: Eric Dumazet Cc: Johannes Weiner Cc: Michel Lespinasse Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman

... and the same kind of leak for mqueue

2011-12-09T05:40:21Z

Signed-off-by: Al Viro

ipc/sem.c: remove private structures from public header file

2011-11-02T23:07:01Z

include/linux/sem.h contains several structures that are only used within ipc/sem.c. The patch moves them into ipc/sem.c - there is no need to expose the structures to the whole kernel. No functional changes, only whitespace cleanups and 80-char per line fixes. Signed-off-by: Manfred Spraul Acked-by: Peter Zijlstra Cc: Thomas Gleixner Cc: Mike Galbraith Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

ipc/sem.c: handle spurious wakeups

2011-11-02T23:07:01Z

semtimedop() does not handle spurious wakeups, it returns -EINTR to user space. Most other schedule() users would just loop and not return to user space. The patch adds such a loop to semtimedop() Signed-off-by: Manfred Spraul Reported-by: Peter Zijlstra Acked-by: Peter Zijlstra Cc: Thomas Gleixner Cc: Mike Galbraith Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

ipc/sem.c: fix return code race with semop vs. semop +semctl(IPC_RMID)

2011-11-02T23:07:01Z

sys_semtimedop() may return -EIDRM although the semaphore operation completed successfully: thread 1: thread 2: semtimedop(), sleeps semop(): * acquires sem_lock() semtimedop() woken up due to timeout sem_lock() loops * notices that thread 2 could be completed. * performs the operations that thread 2 is sleeping on. * marks the semaphore operation as IN_WAKEUP * drops sem_lock(), does wakeup, sets return code to 0 * thread delayed due to interrupt, whatever * returns to user space * thread still delayed semctl(IPC_RMID) * acquires sem_lock() * ipc_rmid(), ipcp->deleted=1 * drops sem_lock() * thread finally continues - but seem_lock() now fails due to ipcp->deleted == 1 * returns -EIDRM instead of 0 The fix is trivial: Always use the return code in queue.status. In real world, the race probably doesn't matter: If the semaphore array is destroyed, the app is probably not interested if the last operation succeeded or was already cancelled. Signed-off-by: Manfred Spraul Cc: Thomas Gleixner Cc: Mike Galbraith Acked-by: Peter Zijlstra Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

ipc/mqueue.c: fix wrong use of schedule_hrtimeout_range_clock()

2011-11-01T00:30:44Z

Fix the wrong use of schedule_hrtimeout_range_clock() in wq_sleep(), although it is harmless for the syscall mq_timed* now. It was introduced by 9ca7d8e ("mqueue: Convert message queue timeout to use hrtimers"). Signed-off-by: Wanlong Gao Cc: Carsten Emde Cc: Thomas Gleixner Cc: Manfred Spraul Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds