user/sven/linux.git - Linux Kernel

Age	Commit message (Collapse)	Author
2006-03-31	[PATCH] pidhash: Refactor the pid hash table	Eric W. Biederman
	Simplifies the code, reduces the need for 4 pid hash tables, and makes the code more capable. In the discussions I had with Oleg it was felt that to a large extent the cleanup itself justified the work. With struct pid being dynamically allocated meant we could create the hash table entry when the pid was allocated and free the hash table entry when the pid was freed. Instead of playing with the hash lists when ever a process would attach or detach to a process. For myself the fact that it gave what my previous task_ref patch gave for free with simpler code was a big win. The problem is that if you hold a reference to struct task_struct you lock in 10K of low memory. If you do that in a user controllable way like /proc does, with an unprivileged but hostile user space application with typical resource limits of 1000 fds and 100 processes I can trigger the OOM killer by consuming all of low memory with task structs, on a machine wight 1GB of low memory. If I instead hold a reference to struct pid which holds a pointer to my task_struct, I don't suffer from that problem because struct pid is 2 orders of magnitude smaller. In fact struct pid is small enough that most other kernel data structures dwarf it, so simply limiting the number of referring data structures is enough to prevent exhaustion of low memory. This splits the current struct pid into two structures, struct pid and struct pid_link, and reduces our number of hash tables from PIDTYPE_MAX to just one. struct pid_link is the per process linkage into the hash tables and lives in struct task_struct. struct pid is given an indepedent lifetime, and holds pointers to each of the pid types. The independent life of struct pid simplifies attach_pid, and detach_pid, because we are always manipulating the list of pids and not the hash table. In addition in giving struct pid an indpendent life it makes the concept much more powerful. Kernel data structures can now embed a struct pid * instead of a pid_t and not suffer from pid wrap around problems or from keeping unnecessarily large amounts of memory allocated. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28	[PATCH] pidhash: don't count idle threads	Oleg Nesterov
	fork_idle() does unhash_process() just after copy_process(). Contrary, boot_cpu's idle thread explicitely registers itself for each pid_type with nr = 0. copy_process() already checks p->pid != 0 before process_counts++, I think we can just skip attach_pid() calls and job control inits for idle threads and kill unhash_process(). We don't need to cleanup ->proc_dentry in fork_idle() because with this patch idle threads are never hashed in kernel/pid.c:pid_hash[]. We don't need to hash pid == 0 in pidmap_init(). free_pidmap() is never called with pid == 0 arg, so it will never be reused. So it is still possible to use pid == 0 in any PIDTYPE_xxx namespace from kernel/pid.c's POV. However with this patch we don't hash pid == 0 for PIDTYPE_PID case. We still have have PIDTYPE_PGID/PIDTYPE_SID entries with pid == 0: /sbin/init and kernel threads which don't call daemonize(). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28	[PATCH] pidhash: kill switch_exec_pids	Eric W. Biederman
	switch_exec_pids is only called from de_thread by way of exec, and it is only called when we are exec'ing from a non thread group leader. Currently switch_exec_pids gives the leader the pid of the thread and unhashes and rehashes all of the process groups. The leader is already in the EXIT_DEAD state so no one cares about it's pids. The only concern for the leader is that __unhash_process called from release_task will function correctly. If we don't touch the leader at all we know that __unhash_process will work fine so there is no need to touch the leader. For the task becomming the thread group leader, we just need to give it the pid of the old thread group leader, add it to the task list, and attach it to the session and the process group of the thread group. Currently de_thread is also adding the task to the task list which is just silly. Currently the only leader of __detach_pid besides detach_pid is switch_exec_pids because of the ugly extra work that was being performed. So this patch removes switch_exec_pids because it is doing too much, it is creating an unnecessary special case in pid.c, duing work duplicated in de_thread, and generally obscuring what it is going on. The necessary work is added to de_thread, and it seems to be a little clearer there what is going on. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Kirill Korotaev <dev@sw.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08	[PATCH] RCU signal handling	Ingo Molnar
	RCU tasklist_lock and RCU signal handling: send signals RCU-read-locked instead of tasklist_lock read-locked. This is a scalability improvement on SMP and a preemption-latency improvement under PREEMPT_RCU. Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: William Irwin <wli@holomorphy.com> Cc: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-01-07	[PATCH] Lock initializer cleanup (Core)	Thomas Gleixner
	Kernel core files converted to use the new lock initializers. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-10-19	[PATCH] detach_pid(): eliminate one find_pid() call	Oleg Nesterov
	Now there is no point in calling costly find_pid(type) if __detach_pid(type) returned non zero value. Acked-By: Kirill Korotaev <dev@sw.ru> Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-10-19	[PATCH] detach_pid(): restore optimization	Oleg Nesterov
	Kirill's kernel/pid.c rework broke optimization logic in detach_pid(). Non zero return from __detach_pid() was used to indicate, that this pid can probably be freed. Current version always (modulo idle threads) return non zero value, thus resulting in unneccesary pid_hash scanning. Also, uninlining __detach_pid() reduces pid.o text size from 2492 to 1600 bytes. Acked-By: Kirill Korotaev <dev@sw.ru> Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-10-18	[PATCH] pidhashing: enforce PID_MAX_LIMIT in sysctls	William Lee Irwin III
	The pid_max sysctl doesn't enforce PID_MAX_LIMIT or sane lower bounds. RESERVED_PIDS + 1 is the minimum pid_max that won't break alloc_pidmap(), and PID_MAX_LIMIT may not be aligned to 8*PAGE_SIZE boundaries for unusual values of PAGE_SIZE, so this also rounds up PID_MAX_LIMIT to it. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-10-18	[PATCH] pidhashing: retain older vendor copyright	William Lee Irwin III
	I was informed that the vendor component of the copyright can't be clobbered without more care, so this patch retains the older vendor, updating it only to reflect the appropriate time period. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-10-18	[PATCH] pidhashing: rewrite alloc_pidmap()	William Lee Irwin III
	Rewrite alloc_pidmap() to clarify control flow by eliminating all usage of goto, honor pid_max and first available pid after last_pid semantics, make only a single pass over the used portion of the pid bitmap, and update copyrights to reflect ongoing maintenance by Ingo and myself. Signed-off-by: William Irwin <wli@holomorphy.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-09-13	[PATCH] Allocate correct amount of memory for pid hash	Anton Blanchard
	We are now allocating twice as much memory as required. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-09-02	[PATCH] fixed pidhashing patch	Kirill Korotaev
	This patch fixes strange and obscure pid implementation in current kernels: - it removes calling of put_task_struct() from detach_pid() under tasklist_lock. This allows to use blocking calls in security_task_free() hooks (in __put_task_struct()). - it saves some space = 5*5 ints = 100 bytes in task_struct - it's smaller and tidy, more straigthforward and doesn't use any knowledge about pids using and assignment. - it removes pid_links and pid_struct doesn't hold reference counters on task_struct. instead, new pid_structs and linked altogether and only one of them is inserted in hash_list. Signed-off-by: Kirill Korotaev (kksx@mail.ru) Signed-off-by: William Irwin <wli@holomorphy.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-08-30	[PATCH] use hlist for pid hash	Nick Piggin
	Use hlists for the PID hashes. This halves the memory footprint of these hashes. No benchmarks, but I think this is a worthy improvement because the hashes are something that would be likely to have significant portions loaded into the cache of every CPU on some workloads. This comes at the "expense" of 1. reintroducing the memory prefetch into the hash traversal loop; 2. adding new pids to the head of the list instead of the tail. I suspect that if this was a big problem then the hash isn't sized well or could benefit from moving hot entries to the head. Also, account for all the pid hashes when reporting hash memory usage. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-08-30	[PATCH] fix PID hash sizing	Nick Piggin
	A 4GB, 4-way Opteron would create the smallest size table (16 entries) because pidhash_init is called before mem_init which is where x86-64 sets up max_pfn. nr_kernel_pages is setup by paging_init, called from setup_arch, which is also where i386 sets up max_pfn. So export nr_kernel_pages, nr_all_pages. Use nr_kernel_pages when sizing the PID hash. This fixes the problem. This also makes the pid hash dependant on the size of ZONE_NORMAL instead of total size of memory. Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-04-11	[PATCH] move job control fields from task_struct to signal_struct	Andrew Morton
	From: Roland McGrath <roland@redhat.com> This patch moves all the fields relating to job control from task_struct to signal_struct, so that all this info is properly per-process rather than being per-thread.
2004-03-06	[PATCH] fastcall / regparm fixes	Andrew Morton
	From: Gerd Knorr <kraxel@suse.de> Current gcc's error out if a function's declaration and definition disagree about the register passing convention. The patch adds a new `fastcall' declatation primitive, and uses that in all the FASTCALL functions which we could find. A number of inconsistencies were fixed up along the way.
2003-10-09	Revert the process group accessor functions. They are buggy, and	Linus Torvalds
	cause NULL pointer references in /proc. Moreover, it's questionable whether the whole thing makes sense at all. Per-thread state is good. Cset exclude: davem@nuts.ninka.net\|ChangeSet\|20031005193942\|01097 Cset exclude: akpm@osdl.org[torvalds]\|ChangeSet\|20031005180420\|42200 Cset exclude: akpm@osdl.org[torvalds]\|ChangeSet\|20031005180411\|42211
2003-10-07	o kernel/ksyms.c: move remaining EXPORT_SYMBOLs, remove this file from the tree	Arnaldo Carvalho de Melo

2003-10-04	[PATCH] move job control fields from task_struct to	Andrew Morton
	From: Roland McGrath <roland@redhat.com> This patch completes what was started with the `process_group' accessor function, moving all the job control-related fields from task_struct into signal_struct and using process_foo accessor functions to read them. All these things are per-process in POSIX, none per-thread. Off hand it's hard to come up with the hairy MT scenarios in which the existing code would do insane things, but trust me, they're there. At any rate, all the uses being done via inline accessor functions now has got to be all good. I did a "make allyesconfig" build and caught the few random drivers and whatnot that referred to these fields. I was surprised to find how few references to ->tty there really were to fix up. I'm sure there will be a few more fixups needed in non-x86 code. The only actual testing of a running kernel with these patches I've done is on my normal minimal x86 config. Everything works fine as it did before as far as I can tell. One issue that may be of concern is the lack of any locking on multiple threads diddling these fields. I don't think it really matters, though there might be some obscure races that could produce inconsistent job control results. Nothing shattering, I'm sure; probably only something like a multi-threaded program calling setsid while its other threads do tty i/o, which never happens in reality. This is the same situation we get by using ->group_leader->foo without other synchronization, which seemed to be the trend and noone was worried about it.
2003-09-21	[PATCH] Fix setpgid and threads	Andrew Morton
	From: Jeremy Fitzhardinge <jeremy@goop.org> I'm resending my patch to fix this problem. To recap: every task_struct has its own copy of the thread group's pgrp. Only the thread group leader is allowed to change the tgrp's pgrp, but it only updates its own copy of pgrp, while all the other threads in the tgrp use the old value they inherited on creation. This patch simply updates all the other thread's pgrp when the tgrp leader changes pgrp. Ulrich has already expressed reservations about this patch since it is (1) incomplete (it doesn't cover the case of other ids which have similar problems), (2) racy (it doesn't synchronize with other threads looking at the task pgrp, so they could see an inconsistent view) and (3) slow (it takes linear time with respect to the number of threads in the tgrp). My reaction is that (1) it fixes the actual bug I'm encountering in a real program. (2) doesn't really matter for pgrp, since it is mostly an issue with respect to the terminal job-control code (which is even more broken without this patch. Regarding (3), I think there are very few programs which have a large number of threads which change process group id on a regular basis (a heavily multi-threaded job-control shell?). Ulrich also said he has a (proposed?) much better fix, which I've been looking forward to. I'm submitting this patch as a stop-gap fix for a real bug, and perhaps to prompt the improved patch. An alternative fix, at least for pgrp, is to change all references to ->pgrp to group_leader->pgrp. This may be sufficient on its own, but it would be a reasonably intrusive patch (I count 95 instances in 32 files in the 2.6.0-test3-mm3 tree).
2003-05-12	[PATCH] de_thread memory corruption fix	Andrew Morton
	From: Manfred Spraul <manfred@colorfullife.com> de_thread calls list_del(&current->tasks), but current->tasks was never added to the task list. The structure contains stale values from the parent. switch_exec_pid() transforms a normal thread to a thread group leader. Thread group leaders are included in the init_task.tasks linked list, non-leaders are not in that list. The patch adds the new thread group leader to the linked list, otherwise de_thread corrupts the task list.
2003-01-05	kernel/pid.c: Use proper size_t printf format string.	David S. Miller

2003-01-05	[PATCH] Dynamically size the pidhash hash table.	Andrew Morton
	Patch from Bill Irwin. Prodding from me. The hashtables in kernel/pid.c are 128 kbytes, which is far too large for very small machines. So we dynamically size them and allocate them from bootmem. From 16 buckets on the very smallest machine up to 4096 buckets (effectively half the current size) with one gigabyte of memory or more. The patch also switches the hashing from a custom hash over to the more powerful hash_long().
2002-11-08	Bit find operations return past the end on failure.	Linus Torvalds

2002-09-24	[PATCH] pidhash-2.5.38-A0	Ingo Molnar
	This removes the cmpxchg from the PID allocator and replaces it with a spinlock. This spinlock is hit only a couple of times per bootup, so it's not a performance issue.
2002-09-22	[PATCH] pidhash cleanups, tgid-2.5.38-F3	Ingo Molnar
	This does the following things: - removes the ->thread_group list and uses a new PIDTYPE_TGID pid class to handle thread groups. This cleans up lots of code in signal.c and elsewhere. - fixes sys_execve() if a non-leader thread calls it. (2.5.38 crashed in this case.) - renames list_for_each_noprefetch to __list_for_each. - cleans up delayed-leader parent notification. - introduces link_pid() to optimize PIDTYPE_TGID installation in the thread-group case. I've tested the patch with a number of threaded and non-threaded workloads, and it works just fine. Compiles & boots on UP and SMP x86. The session/pgrp bugs reported to lkml are probably still open, they are the next on my todo - now that we have a clean pidhash architecture they should be easier to fix.
2002-09-19	[PATCH] pidhash-fix-2.5.36-A0	Ingo Molnar
	the attached patch (against BK-curr) fixes a bug in the new PID allocator, which bug can cause incorrect hashing of the PID structure which causes infinite loops in find_pid(). [and potentially other problems.]
2002-09-19	kernel/pid.c:next_free_map Pass 3rd arg to cmpxchg as pointer.	David S. Miller

2002-09-19	[PATCH] generic-pidhash-2.5.36-J2, BK-curr	Ingo Molnar
	This is the latest version of the generic pidhash patch. The biggest change is the removal of separately allocated pid structures: they are now part of the task structure and the first task that uses a PID will provide the pid structure. Task refcounting is used to avoid the freeing of the task structure before every member of a process group or session has exited. This approach has a number of advantages besides the performance gains. Besides simplifying the whole hashing code significantly, attach_pid() is now fundamentally atomic and can be called during create_process() without worrying about task-list side-effects. It does not have to re-search the pidhash to find out about raced PID-adding either, and attach_pid() cannot fail due to OOM. detach_pid() can do a simple put_task_struct() instead of the kmem_cache_free(). The only minimal downside is the potential pending task structures after session leaders or group leaders have exited - but the number of orphan sessions and process groups is usually very low - and even if it's higher, this can be regarded as a slow execution of the final deallocation of the session leader, not some additional burden.