| Age | Commit message | Author |
|
RCU tasklist_lock and RCU signal handling: send signals RCU-read-locked
instead of tasklist_lock read-locked. This is a scalability improvement on
SMP and a preemption-latency improvement under PREEMPT_RCU.
Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: William Irwin <wli@holomorphy.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Kernel core files converted to use the new lock initializers.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Now there is no point in calling the costly find_pid(type) if
__detach_pid(type) returned a nonzero value.
Acked-By: Kirill Korotaev <dev@sw.ru>
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Kirill's kernel/pid.c rework broke the optimization logic in detach_pid(). A
nonzero return from __detach_pid() was used to indicate that this pid can
probably be freed. The current version always (modulo idle threads) returns a
nonzero value, resulting in unnecessary pid_hash scanning.
Also, uninlining __detach_pid() reduces pid.o text size from 2492 to 1600
bytes.
Acked-By: Kirill Korotaev <dev@sw.ru>
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
The pid_max sysctl doesn't enforce PID_MAX_LIMIT or sane lower bounds.
RESERVED_PIDS + 1 is the minimum pid_max that won't break alloc_pidmap(), and
PID_MAX_LIMIT may not be aligned to 8*PAGE_SIZE boundaries for unusual values
of PAGE_SIZE, so this also rounds up PID_MAX_LIMIT to it.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
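The bounds described in this entry can be sketched in user-space C. This is a minimal illustration, not the kernel code: the constant values and the helper names `round_up_to_bitmap_page()` and `clamp_pid_max()` are hypothetical, and clamping (rather than rejecting) an out-of-range value is an assumption made for the example.

```c
/* Illustrative values only; the kernel derives these per architecture. */
#define PAGE_SIZE_BYTES   4096UL
#define BITS_PER_PAGE     (8UL * PAGE_SIZE_BYTES)
#define RESERVED_PIDS     300UL
#define PID_MAX_LIMIT_RAW 40000UL  /* deliberately not a multiple of BITS_PER_PAGE */

/* Round n up to the next multiple of BITS_PER_PAGE (one bitmap page). */
static unsigned long round_up_to_bitmap_page(unsigned long n)
{
    return (n + BITS_PER_PAGE - 1) / BITS_PER_PAGE * BITS_PER_PAGE;
}

/* Clamp a requested pid_max into [RESERVED_PIDS + 1, rounded limit]. */
static unsigned long clamp_pid_max(unsigned long requested)
{
    unsigned long lo = RESERVED_PIDS + 1;
    unsigned long hi = round_up_to_bitmap_page(PID_MAX_LIMIT_RAW);

    if (requested < lo)
        return lo;
    if (requested > hi)
        return hi;
    return requested;
}
```

With these example constants, a request below the lower bound is raised to RESERVED_PIDS + 1, and the upper bound is the raw limit rounded up to a whole bitmap page.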
|
|
I was informed that the vendor component of the copyright can't be clobbered
without more care, so this patch retains the older vendor, updating it only to
reflect the appropriate time period.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Rewrite alloc_pidmap() to clarify control flow by eliminating all usage of
goto, honor pid_max and first available pid after last_pid semantics, make
only a single pass over the used portion of the pid bitmap, and update
copyrights to reflect ongoing maintenance by Ingo and myself.
Signed-off-by: William Irwin <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
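The "first available pid after last_pid" behavior with a bounded scan can be sketched as below. This is a simplified user-space illustration, not the rewritten alloc_pidmap(): `DEMO_PID_MAX`, `DEMO_RESERVED`, and the function names are hypothetical, there is no locking, and the bitmap is a single small static array.

```c
#include <limits.h>
#include <stddef.h>

#define DEMO_PID_MAX   64   /* hypothetical small pid_max */
#define DEMO_RESERVED  4    /* hypothetical low pids skipped after wrapping */
#define WORD_BITS      (sizeof(unsigned long) * CHAR_BIT)

static unsigned long pid_bitmap[(DEMO_PID_MAX + WORD_BITS - 1) / WORD_BITS];

/* Set the bit for pid and report whether it was already set. */
static int demo_test_and_set(int pid)
{
    unsigned long mask = 1UL << ((size_t)pid % WORD_BITS);
    unsigned long *word = &pid_bitmap[(size_t)pid / WORD_BITS];
    int was_set = (*word & mask) != 0;

    *word |= mask;
    return was_set;
}

/*
 * Return the first free pid after *last_pid, wrapping to DEMO_RESERVED
 * when DEMO_PID_MAX is reached. The loop probes at most DEMO_PID_MAX
 * candidates, so it terminates without goto. Returns -1 when full.
 */
static int demo_alloc_pid(int *last_pid)
{
    int pid = *last_pid + 1;

    for (int tried = 0; tried < DEMO_PID_MAX; tried++, pid++) {
        if (pid >= DEMO_PID_MAX)
            pid = DEMO_RESERVED;
        if (!demo_test_and_set(pid)) {
            *last_pid = pid;
            return pid;
        }
    }
    return -1;
}
```

A caller keeps `last_pid` across calls, so successive allocations move forward through the bitmap and only wrap once the upper bound is reached.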
|
|
We are now allocating twice as much memory as required.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch fixes the strange and obscure pid implementation in current kernels:
- it removes the call to put_task_struct() from detach_pid()
under tasklist_lock. This allows blocking calls to be used
in security_task_free() hooks (in __put_task_struct()).
- it saves some space = 5*5 ints = 100 bytes in task_struct
- it's smaller and tidier, more straightforward, and doesn't rely on
any knowledge about how pids are used and assigned.
- it removes pid_links, and pid_struct doesn't hold reference counts
on task_struct. Instead, new pid_structs are linked together and
only one of them is inserted in hash_list.
Signed-off-by: Kirill Korotaev (kksx@mail.ru)
Signed-off-by: William Irwin <wli@holomorphy.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Use hlists for the PID hashes. This halves the memory footprint of these
hashes. No benchmarks, but I think this is a worthy improvement because
the hashes are something that would be likely to have significant portions
loaded into the cache of every CPU on some workloads.
This comes at the "expense" of
1. reintroducing the memory prefetch into the hash traversal loop;
2. adding new pids to the head of the list instead of the tail. I
suspect that if this was a big problem then the hash isn't sized
well or could benefit from moving hot entries to the head.
Also, account for all the pid hashes when reporting hash memory usage.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
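The memory saving comes from the bucket head shrinking from two pointers to one. The sketch below re-declares the list shapes in user-space C for illustration; the `demo_` helpers are hypothetical stand-ins, not the kernel's own functions.

```c
#include <stddef.h>

/* Doubly linked list head: two pointers per hash bucket. */
struct list_head  { struct list_head *next, *prev; };

/* hlist: one pointer per bucket; nodes keep a pointer-to-pointer back link. */
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

/* Insert n at the head of bucket h (hlists have no cheap tail insert). */
static void demo_hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
    struct hlist_node *first = h->first;

    n->next = first;
    if (first)
        first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

/* Unlink n without needing to know which bucket it is in. */
static void demo_hlist_del(struct hlist_node *n)
{
    *n->pprev = n->next;
    if (n->next)
        n->next->pprev = n->pprev;
}
```

Because a bucket head holds only `first`, there is no tail pointer to append through, which is why new entries go to the head of the list.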
|
|
A 4GB, 4-way Opteron would create the smallest size table (16 entries) because
pidhash_init is called before mem_init which is where x86-64 sets up max_pfn.
nr_kernel_pages is set up by paging_init, called from setup_arch, which is also
where i386 sets up max_pfn.
So export nr_kernel_pages, nr_all_pages. Use nr_kernel_pages when sizing the
PID hash. This fixes the problem.
This also makes the pid hash dependent on the size of ZONE_NORMAL instead of
total size of memory.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
From: Roland McGrath <roland@redhat.com>
This patch moves all the fields relating to job control from task_struct to
signal_struct, so that all this info is properly per-process rather than
being per-thread.
|
|
From: Gerd Knorr <kraxel@suse.de>
Current versions of gcc error out if a function's declaration and definition
disagree about the register passing convention.
The patch adds a new `fastcall' declaration primitive, and uses that in all
the FASTCALL functions which we could find. A number of inconsistencies were
fixed up along the way.
|
|
cause NULL pointer references in /proc.
Moreover, it's questionable whether the whole thing makes sense at all.
Per-thread state is good.
Cset exclude: davem@nuts.ninka.net|ChangeSet|20031005193942|01097
Cset exclude: akpm@osdl.org[torvalds]|ChangeSet|20031005180420|42200
Cset exclude: akpm@osdl.org[torvalds]|ChangeSet|20031005180411|42211
|
|
|
|
From: Roland McGrath <roland@redhat.com>
This patch completes what was started with the `process_group' accessor
function, moving all the job control-related fields from task_struct into
signal_struct and using process_foo accessor functions to read them. All
these things are per-process in POSIX, none per-thread. Off hand it's hard
to come up with the hairy MT scenarios in which the existing code would do
insane things, but trust me, they're there. At any rate, all the uses
being done via inline accessor functions now has got to be all good.
I did a "make allyesconfig" build and caught the few random drivers and
whatnot that referred to these fields. I was surprised to find how few
references to ->tty there really were to fix up. I'm sure there will be a
few more fixups needed in non-x86 code. The only actual testing of a
running kernel with these patches I've done is on my normal minimal x86
config. Everything works fine as it did before as far as I can tell.
One issue that may be of concern is the lack of any locking on multiple
threads diddling these fields. I don't think it really matters, though
there might be some obscure races that could produce inconsistent job
control results. Nothing shattering, I'm sure; probably only something
like a multi-threaded program calling setsid while its other threads do tty
i/o, which never happens in reality. This is the same situation we get by
using ->group_leader->foo without other synchronization, which seemed to be
the trend and no one was worried about it.
|
|
From: Jeremy Fitzhardinge <jeremy@goop.org>
I'm resending my patch to fix this problem. To recap: every task_struct
has its own copy of the thread group's pgrp. Only the thread group
leader is allowed to change the tgrp's pgrp, but it only updates its own
copy of pgrp, while all the other threads in the tgrp use the old value
they inherited on creation.
This patch simply updates all the other threads' pgrp when the tgrp
leader changes pgrp. Ulrich has already expressed reservations about
this patch since it is (1) incomplete (it doesn't cover the case of
other ids which have similar problems), (2) racy (it doesn't synchronize
with other threads looking at the task pgrp, so they could see an
inconsistent view) and (3) slow (it takes linear time with respect to
the number of threads in the tgrp).
My reaction is that (1) it fixes the actual bug I'm encountering in a
real program. (2) doesn't really matter for pgrp, since it is mostly an
issue with respect to the terminal job-control code (which is even more
broken without this patch). Regarding (3), I think there are very few
programs which have a large number of threads which change process group
id on a regular basis (a heavily multi-threaded job-control shell?).
Ulrich also said he has a (proposed?) much better fix, which I've been
looking forward to. I'm submitting this patch as a stop-gap fix for a
real bug, and perhaps to prompt the improved patch.
An alternative fix, at least for pgrp, is to change all references to
->pgrp to group_leader->pgrp. This may be sufficient on its own, but it
would be a reasonably intrusive patch (I count 95 instances in 32 files
in the 2.6.0-test3-mm3 tree).
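The stop-gap approach described here, where the leader's change is copied to every thread, can be sketched as follows. This is a user-space illustration under stated assumptions: `struct demo_task`, its circular `next_thread` link, and `demo_set_group_pgrp()` are hypothetical, and the sketch does no locking, which mirrors the race noted in point (2).

```c
#include <stddef.h>

/* Hypothetical per-thread record; each thread keeps its own pgrp copy. */
struct demo_task {
    int pgrp;
    struct demo_task *next_thread;  /* circular list of the thread group */
};

/*
 * Write the new pgrp into every thread of the group, starting at the
 * leader. Returns the number of threads visited, which shows the cost
 * is linear in the size of the thread group.
 */
static int demo_set_group_pgrp(struct demo_task *leader, int pgrp)
{
    struct demo_task *t = leader;
    int visited = 0;

    do {
        t->pgrp = pgrp;
        visited++;
        t = t->next_thread;
    } while (t != leader);

    return visited;
}
```

The alternative mentioned in the message, reading `group_leader->pgrp` everywhere, would avoid the loop entirely at the cost of touching every reader.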
|
|
From: Manfred Spraul <manfred@colorfullife.com>
de_thread calls list_del(&current->tasks), but current->tasks was never
added to the task list. The structure contains stale values from the parent.
switch_exec_pid() transforms a normal thread to a thread group leader.
Thread group leaders are included in the init_task.tasks linked list,
non-leaders are not in that list. The patch adds the new thread group
leader to the linked list, otherwise de_thread corrupts the task list.
|
|
|
|
Patch from Bill Irwin. Prodding from me.
The hashtables in kernel/pid.c are 128 kbytes, which is far too large for
very small machines.
So we dynamically size them and allocate them from bootmem. From 16 buckets
on the very smallest machine up to 4096 buckets (effectively half the current
size) with one gigabyte of memory or more.
The patch also switches the hashing from a custom hash over to the more
powerful hash_long().
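The sizing rule, from 16 buckets up to 4096 depending on memory, can be sketched as a clamped power-of-two shift. This is an illustration only: `demo_pidhash_shift()` is a hypothetical name, and the scaling factor of 4 buckets per megabyte is an assumption chosen so that one gigabyte reaches the 4096-bucket cap.

```c
/*
 * Pick a hash shift from the amount of memory, clamped to [4, 12],
 * i.e. between 16 and 4096 buckets. The "megabytes * 4" scaling is an
 * assumption made for this sketch.
 */
static unsigned int demo_pidhash_shift(unsigned long megabytes)
{
    unsigned long scaled = megabytes * 4;
    unsigned int shift = 0;

    /* Position of the highest set bit, counting from 1 (0 for zero). */
    while (scaled) {
        shift++;
        scaled >>= 1;
    }

    if (shift < 4)
        shift = 4;
    if (shift > 12)
        shift = 12;
    return shift;
}
```

The table then has `1 << demo_pidhash_shift(megabytes)` buckets, so the smallest machines get 16 and anything at or above the cap gets 4096.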
|
|
|
|
This removes the cmpxchg from the PID allocator and replaces it with a
spinlock. This spinlock is hit only a couple of times per bootup, so
it's not a performance issue.
|
|
This does the following things:
- removes the ->thread_group list and uses a new PIDTYPE_TGID pid class
to handle thread groups. This cleans up lots of code in signal.c and
elsewhere.
- fixes sys_execve() if a non-leader thread calls it. (2.5.38 crashed in
this case.)
- renames list_for_each_noprefetch to __list_for_each.
- cleans up delayed-leader parent notification.
- introduces link_pid() to optimize PIDTYPE_TGID installation in the
thread-group case.
I've tested the patch with a number of threaded and non-threaded
workloads, and it works just fine. Compiles & boots on UP and SMP x86.
The session/pgrp bugs reported to lkml are probably still open, they are
the next on my todo - now that we have a clean pidhash architecture they
should be easier to fix.
|
|
The attached patch (against BK-curr) fixes a bug in the new PID allocator
that can cause incorrect hashing of the PID structure, which in turn causes
infinite loops in find_pid() [and potentially other problems].
|
|
|
|
This is the latest version of the generic pidhash patch. The biggest
change is the removal of separately allocated pid structures: they are
now part of the task structure and the first task that uses a PID will
provide the pid structure. Task refcounting is used to avoid the
freeing of the task structure before every member of a process group or
session has exited.
This approach has a number of advantages besides the performance gains.
Besides simplifying the whole hashing code significantly, attach_pid()
is now fundamentally atomic and can be called during create_process()
without worrying about task-list side-effects. It does not have to
re-search the pidhash to find out about raced PID-adding either, and
attach_pid() cannot fail due to OOM. detach_pid() can do a simple
put_task_struct() instead of the kmem_cache_free().
The only minimal downside is the potential pending task structures after
session leaders or group leaders have exited - but the number of orphan
sessions and process groups is usually very low - and even if it's
higher, this can be regarded as a slow execution of the final
deallocation of the session leader, not some additional burden.
|