[PATCH] sys_exit_group(), threading, 2.5.34

This is another step to have better threading support under Linux, it implements the sys_exit_group() system call. It's a straightforward extension of the generic 'thread group' concept, which extension also comes handy to solve a number of problems when implementing POSIX threads. POSIX exit() [the C library function] has the following semantics: all thread have to exit and the waiting parent has to get the exit code that was specified for the exit() function. It also has to be ensured that every thread has truly finished its work by the time the parent gets the notification. The exit code has to be propagated properly to the parent thread even if not the thread group leader calls the exit() function. Normal single-thread exit is done via the pthread_exit() function, which calls sys_exit(). Previous incarnations of Linux POSIX threads implementations chose the following solution: send a 'thread management' signal to the thread group leader via tkill(), which thread goes around and kills every thread in the group (except itself), then calls sys_exit() with the proper exit code. Both old libpthreads and NGPT use this solution. This works to a certain degree, unless a userspace threading library uses the initial thread for normal thread work [like the new libpthreads], which 'work' can cause the initial thread to exit prematurely. At this point the threading library has to catch the group leader in pthread_exit() and has to keep the management thread 'hanging around' artificially, waiting for the management signal. Besides being slightly confusing to users ('why is this thread still around?') even this variant is unrobust: if the initial thread is killed by the kernel (SIGSEGV or any other thread-specific event that triggers do_exit()) then the thread goes away without the thread library having a chance to intervene. the sys_exit_group() syscall implements the mechanism within the kernel, which, besides robustness, is also *much* faster. Instead of the threading library having to tkill() every thread available, the kernel can use the already existing 'broadcast signal' capability. (the threading library cannot use broadcast signals because that would kill the initial thread as well.) as a side-effect of the completion mechanism used by sys_exit_group() it was also possible to make the initial thread hang around as a zombie until every other thread in the group has exited. A 'Z' state thread is much easier to understand by users - it's around because it has to wait for all other threads to exit first. and as a side-effect of the initial thread hanging around in a guaranteed way, there are three advantages: - signals sent to the thread group via sys_kill() work again. Previously if the initial thread exited then all subsequent sys_kill() calls to the group PID failed with a -ESRCH. - the get_pid() function got faster: it does not have to check for tgid collision anymore. - procps has an easier job displaying threaded applications - since the thread group leader is always around, no thread group can 'hide' from procps just because the thread group leader has exited. [ - NOTE: the same mechanism can/will also be used by the upcoming threaded-coredumps patch. ] there's also another (small) advantage for threading libraries: eg. the new libpthreads does not even have any notion of 'group of threads' anymore - it does not maintain any global list of threads. Via this syscall it can purely rely on the kernel to manage thread groups. the patch itself does some internal changes to the way a thread exits: now the unhashing of the PID and the signal-freeing is done atomically. This is needed to make sure the thread group leader unhashes itself precisely when the last thread group member has exited. (the sys_exit_group() syscall has been used by glibc's new libpthreads code for the past couple of weeks and the concept is working just fine.)
author: Ingo Molnar <mingo@elte.hu> 2002-09-10 18:09:33 -0700
committer: Jens Axboe <axboe@hera.kernel.org> 2002-09-10 18:09:33 -0700
commit: b62bf7327c1eef1c47688760ec51f2a48aa3c132 (patch)
tree: 7b5b6dd65c5b0c42948a80cf62fbf8244fddb79b /include/linux
parent: 4c21fddc2f83fabbbaae47891c461024a452235b (diff)
1 files changed, 9 insertions, 2 deletions
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 600035a0b715..bdce46f40af2 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -27,6 +27,7 @@ extern unsigned long event;
 #include <linux/securebits.h>
 #include <linux/fs_struct.h>
 #include <linux/compiler.h>
+#include <linux/completion.h>
 
 struct exec_domain;
 
@@ -128,8 +129,6 @@ struct sched_param {
 	int sched_priority;
 };
 
-struct completion;
-
 #ifdef __KERNEL__
 
 #include <linux/spinlock.h>
@@ -216,6 +215,12 @@ struct signal_struct {
         task_t                  *curr_target;
 
 	struct sigpending	shared_pending;
+
+	/* thread group exit support */
+	int			group_exit;
+	int			group_exit_code;
+
+	struct completion	group_exit_done;
 };
 
 /*
@@ -555,6 +560,7 @@ extern void notify_parent(struct task_struct *, int);
 extern void do_notify_parent(struct task_struct *, int);
 extern void force_sig(int, struct task_struct *);
 extern int send_sig(int, struct task_struct *, int);
+extern int __broadcast_thread_group(struct task_struct *p, int sig);
 extern int kill_pg(pid_t, int, int);
 extern int kill_sl(pid_t, int, int);
 extern int kill_proc(pid_t, int, int);
@@ -661,6 +667,7 @@ extern void exit_thread(void);
 extern void exit_mm(struct task_struct *);
 extern void exit_files(struct task_struct *);
 extern void exit_sighand(struct task_struct *);
+extern void __exit_sighand(struct task_struct *);
 extern void remove_thread_group(struct task_struct *tsk, struct signal_struct *sig);
 
 extern void reparent_to_init(void);
author	Ingo Molnar <mingo@elte.hu>	2002-09-10 18:09:33 -0700
committer	Jens Axboe <axboe@hera.kernel.org>	2002-09-10 18:09:33 -0700
commit	b62bf7327c1eef1c47688760ec51f2a48aa3c132 (patch)
tree	7b5b6dd65c5b0c42948a80cf62fbf8244fddb79b /include/linux
parent	4c21fddc2f83fabbbaae47891c461024a452235b (diff)