[PATCH] signal-fixes-2.5.59-A4

This is the current threading patchset, which accumulated during the
past two weeks. Its biggest part is a set of changes from Roland, to
make threaded signals work. There were still tons of testcases and
boundary conditions (mostly in the signal/exit/ptrace area) that we did
not handle correctly.

Roland's thread-signal semantics/behavior/ptrace fixes:

 - fix signal delivery race with do_exit() => signals are re-queued to the
   'process' if do_exit() finds pending unhandled ones. This prevents
   signals getting lost upon thread-sys_exit().

 - a non-main thread has died on one processor and gone to TASK_ZOMBIE,
   but before it's gotten to release_task a sys_wait4 on the other
   processor reaps it. It's only because it's ptraced that this gets
   through eligible_child. Somewhere in there the main thread is also
   dying so it reparents the child thread to hit that case. This means
   that there is a race where P might be totally invalid.

 - forget_original_parent is not doing the right thing when the group
   leader dies, i.e. reparenting threads to init when there is a zombie
   group leader. Perhaps it doesn't matter for any practical purpose
   without ptrace, though it makes for ppid=1 for each thread in core
   dumps, which looks funny. Incidentally, SIGCHLD here really should be
   p->exit_signal.

 - one of the gdb tests makes a questionable assumption about what kill
   will do when it has some threads stopped by ptrace and others running.

exit races:

1. Processor A is in sys_wait4 case TASK_STOPPED considering task P.
   Processor B is about to resume P and then switch to it.

   While A is inside that case block, B starts running P and it clears
   P->exit_code, or takes a pending fatal signal and sets it to a new
   value. Depending on the interleaving, the possible failure modes are:

	a. A gets to its put_user after B has cleared P->exit_code
	   => returns with WIFSTOPPED, WSTOPSIG==0
	b. A gets to its put_user after B has set P->exit_code anew
	   => returns with e.g. WIFSTOPPED, WSTOPSIG==SIGKILL

   A can spend an arbitrarily long time in that case block, because
   there's getrusage and put_user that can take page faults, and
   write_lock'ing of the tasklist_lock that can block. But even if it's
   short the race is there in principle.

2. This is new with NPTL, i.e. CLONE_THREAD.

   Two processors A and B are both in sys_wait4 case TASK_STOPPED
   considering task P.

   Both get through their tests and fetches of P->exit_code before either
   gets to P->exit_code = 0. => two threads return the same pid from
   waitpid.

   In other interleavings where one processor gets to its put_user after
   the other has cleared P->exit_code, it's like case 1(a).

3. SMP races with stop/cont signals

   First, take:

	kill(pid, SIGSTOP);
	kill(pid, SIGCONT);

   or:

	kill(pid, SIGSTOP);
	kill(pid, SIGKILL);

   It's possible for this to leave the process stopped with a pending
   SIGCONT/SIGKILL. That's a state that should never be possible.
   Moreover, kill(pid, SIGKILL) without any repetition should always be
   enough to kill a process. (Likewise SIGCONT, when you know it's
   sequenced after the last stop signal, must be sufficient to resume a
   process.)

4. take:

	kill(pid, SIGKILL); // or any fatal signal
	kill(pid, SIGCONT); // or SIGKILL

   It's possible for this to cause pid to be reaped with status 0 instead
   of its true termination status. The equivalent scenario happens when
   the process being killed is in an _exit call or a trap-induced fatal
   signal before the kills.

Plus I've done stability fixes for bugs that popped up during
beta-testing, and minor tidying of Roland's changes:

 - a rare tasklist corruption during exec, causing some very spurious and
   colorful crashes.

 - a copy_process()-related dereference of an already freed thread
   structure if hit with a SIGKILL at the wrong moment.

 - SMP spinlock deadlocks in the signal code

This patchset has been tested quite well in the 2.4 backport of the
threading changes. I've also done some stresstesting on 2.5.59 SMP, and
an x86 UP testcompile + testboot.
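The signal sequences from exit races 3 and 4 can be exercised from userspace. The following is only a minimal sketch and not part of the patch: the helper names `send_pair()` and `expect_sigkill()` are hypothetical, and the child is assumed to be a plain forked process sitting in pause() rather than a CLONE_THREAD group. A single run also says nothing about a race that needs a particular SMP interleaving; it only states the behavior the message describes as required.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): fork a child that blocks in
 * pause(), send it `first` and then `second` back to back, and return
 * the raw wait status once the child has terminated, or -1 on error.
 * Only meant for pairs that end in the child's death.
 */
int send_pair(int first, int second)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		for (;;)
			pause();	/* child: wait for signals forever */
	}

	if (kill(pid, first) < 0 || kill(pid, second) < 0) {
		/* Clean up the child before reporting the failure. */
		kill(pid, SIGKILL);
		waitpid(pid, &status, 0);
		return -1;
	}

	/* Without WUNTRACED, waitpid() returns only on termination. */
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return status;
}

/* Abort unless `status` means "terminated by SIGKILL". */
void expect_sigkill(int status)
{
	assert(status != -1);
	assert(WIFSIGNALED(status));
	assert(WTERMSIG(status) == SIGKILL);
}
```

Calling `expect_sigkill(send_pair(SIGSTOP, SIGKILL))` checks that one SIGKILL is enough to kill a just-stopped process (race 3), and `expect_sigkill(send_pair(SIGKILL, SIGCONT))` checks that the true termination status is reported rather than 0 (race 4). To have any chance of hitting the races themselves, such calls would have to be repeated many times on an SMP machine.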
author: Ingo Molnar <mingo@elte.hu> 2003-02-05 20:49:30 -0800
committer: Linus Torvalds <torvalds@home.transmeta.com> 2003-02-05 20:49:30 -0800
commit: ebf5ebe31d2cd1e0f13e5b65deb0b4af7afd9dc1 (patch)
tree: b6af9aa99995048ac5d1731dea69610086c3b8d3 /fs/exec.c
parent: 44a5a59c0b5d34ff01c685be87894f24132a8328 (diff)
1 files changed, 4 insertions, 2 deletions
diff --git a/fs/exec.c b/fs/exec.c
index 028fbda85a71..0b41239937b7 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -587,7 +587,7 @@ static inline int de_thread(struct signal_struct *oldsig)
 		return -EAGAIN;
 	}
 	oldsig->group_exit = 1;
-	__broadcast_thread_group(current, SIGKILL);
+	zap_other_threads(current);
 
 	/*
 	 * Account for the thread group leader hanging around:
@@ -659,7 +659,8 @@ static inline int de_thread(struct signal_struct *oldsig)
 			current->ptrace = ptrace;
 			__ptrace_link(current, parent);
 		}
-		
+
+		list_del(&current->tasks);
 		list_add_tail(&current->tasks, &init_task.tasks);
 		current->exit_signal = SIGCHLD;
 		state = leader->state;
@@ -680,6 +681,7 @@ out:
 	newsig->group_exit = 0;
 	newsig->group_exit_code = 0;
 	newsig->group_exit_task = NULL;
+	newsig->group_stop_count = 0;
 	memcpy(newsig->action, current->sig->action, sizeof(newsig->action));
 	init_sigpending(&newsig->shared_pending);
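The second hunk adds list_del(&current->tasks) ahead of the existing list_add_tail(). Presumably this is related to the tasklist corruption mentioned in the message, although the commit does not say so explicitly. The sketch below is a userspace imitation of a circular doubly linked list, not the kernel's list.h; `struct list_node`, `list_count_checked()` and `move_to_tail()` are hypothetical names. It shows in general terms why re-adding a node that is still linked leaves its old neighbours with stale pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace imitation of a circular doubly linked list node. */
struct list_node {
	struct list_node *next, *prev;
};

void list_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

/* Link `node` just before `head`, i.e. at the tail of the list. */
void list_add_tail(struct list_node *node, struct list_node *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Unlink `node` by joining its two neighbours. */
void list_del(struct list_node *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
}

/*
 * Walk forward from `head` and count the entries. Returns -1 if any
 * next/prev pair disagrees or the walk exceeds `limit` entries.
 */
int list_count_checked(const struct list_node *head, int limit)
{
	const struct list_node *pos = head;
	int n = 0;

	do {
		if (pos->next->prev != pos)
			return -1;
		pos = pos->next;
		if (pos != head && ++n > limit)
			return -1;
	} while (pos != head);
	return n;
}

/*
 * Build the list a, b, c and move b to the tail. With `unlink_first`
 * set, b is removed before being re-added; without it, b is re-added
 * while still linked between a and c.
 * Returns the checked entry count, or -1 if the list is inconsistent.
 */
int move_to_tail(int unlink_first)
{
	struct list_node head, a, b, c;

	list_init(&head);
	list_add_tail(&a, &head);
	list_add_tail(&b, &head);
	list_add_tail(&c, &head);
	assert(list_count_checked(&head, 8) == 3);

	if (unlink_first)
		list_del(&b);
	list_add_tail(&b, &head);

	return list_count_checked(&head, 8);
}
```

In this sketch `move_to_tail(1)` returns 3, while `move_to_tail(0)` returns -1 because `a.next` still points at `b` although `b.prev` now points at `c`.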