|
POSIX requires that setitimer, getitimer, and alarm work on a per-process
basis. Currently, Linux implements these for individual threads. This patch
fixes these semantics for the ITIMER_REAL timer (which generates SIGALRM),
making it shared by all threads in a process (thread group).
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch introduces the concept of (virtual) cputime. Each architecture
can define its method to measure cputime. The main idea is to define a
cputime_t type and a set of operations on it (see asm-generic/cputime.h).
Then use the type for utime, stime, cutime, cstime, it_virt_value,
it_virt_incr, it_prof_value and it_prof_incr and use the cputime operations
for each access to these variables. The default implementation is jiffies
based, and the effect of this patch for architectures which use the default
implementation should be negligible.
There is a second type, cputime64_t, which is necessary for the kernel_stat
cpu statistics. The default cputime_t is 32 bit and based on HZ, so it will
overflow after 49.7 days. This is not enough for kernel_stat (imho not
enough for a process either), so it is necessary to have a 64 bit type.
The third thing that gets introduced by this patch is an additional field
for the /proc/stat interface: cpu steal time. An architecture can account
cpu steal time by calls to the account_stealtime function. The cpu which
backs a virtual processor doesn't spend all of its time on the virtual
cpu. To get meaningful cpu usage numbers, this involuntary wait time needs
to be accounted and exported to user space.
From: Hugh Dickins <hugh@veritas.com>
The p->signal check in account_system_time is insufficient. If the timer
interrupt hits near the end of exit_notify, after EXIT_ZOMBIE has been set,
another cpu may release_task (NULLifying p->signal) in between
account_system_time's check and check_rlimit's dereference. Nor should
account_it_prof risk send_sig. But surely account_user_time is safe?
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
The attached patch splits some memory-related procfs files into MMU and !MMU
versions and places them in separate conditionally-compiled files. A header
file local to the fs/proc/ directory is used to declare functions and the like.
Additionally, a !MMU-only proc file (/proc/maps) is provided so that the
master VMA list in a uClinux kernel is viewable.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Use pid_alive() rather than testing for a zero value of ->pid. This is the
right thing to do, and it addresses an oops dereferencing real_parent which
one person reported.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
proc_pid_status dereferences pointers in the task structure even if the
task is already dead. This is probably the reason for the oops described
in
http://bugme.osdl.org/show_bug.cgi?id=3812
The attached patch removes the pointer dereferences by using pid_alive()
to test that the task structure contents are still valid before
dereferencing them. The task structure itself is guaranteed to be valid -
we hold a reference count.
Signed-Off-By: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
One more place in fs/proc/array.c where ppid is wrong, which I missed in my
previous mail to lkml.
Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
/proc shows the wrong PID as parent in the following case:
Process A creates Threads 1 & 2 (using pthread_create). Thread 2 then forks
and execs process B. getppid() for Process B shows Process A (rightly) as
parent; however, /proc/B/status shows Thread 3 as PPid (incorrect).
Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch fixes all the preempt-after-task->state-is-TASK_DEAD problems we
had. Right now, the moment procfs does a down() that sleeps in
proc_pid_flush() [it could], our TASK_DEAD state is zapped and we might be
back to TASK_RUNNING, and we trigger this assert:
schedule();
BUG();
/* Avoid "noreturn function does return". */
for (;;) ;
I have split out TASK_ZOMBIE and TASK_DEAD into a separate p->exit_state
field, to allow the detaching of exit-signal/parent/wait-handling from
descheduling a dead task. Dead-task freeing is done via PF_DEAD.
Tested the patch on x86 SMP and UP, but all architectures should work
fine.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Add up resource usage counters for live and dead threads to show aggregate
per-process usage in /proc/<pid>/stat. This mirrors the new getrusage()
semantics. /proc/<pid>/task/<tid>/stat still has the per-thread usage.
After moving the counter aggregation loop inside a task->sighand lock to
avoid nasty race conditions, it has survived stress-testing with '(while
true; do sleep 1 & done) & top -d 0.1'
Signed-off-by: Lev Makhlis <mlev@despammed.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch adjusts /proc/*/stat to have distinct per-process and per-thread
CPU usage, faults, and wchan.
Signed-off-by: Albert Cahalan <albert@users.sf.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
POSIX specifies that the limit settings provided by getrlimit/setrlimit are
shared by the whole process, not specific to individual threads. This
patch changes the behavior of those calls to comply with POSIX.
I've moved the struct rlimit array from task_struct to signal_struct, as it
has the correct sharing properties. (This reduces kernel memory usage per
thread in multithreaded processes by around 100/200 bytes for 32/64
machines respectively.) I took a fairly minimal approach to the locking
issues with the newly shared struct rlimit array. It turns out that all
the code that is checking limits really just needs to look at one word at a
time (one rlim_cur field, usually). It's only the few places like
getrlimit itself (and fork), that require atomicity in accessing a whole
struct rlimit, so I just used a spin lock for them and no locking for most
of the checks. If it turns out that readers of struct rlimit need more
atomicity where they are now cheap, or less overhead where they are now
atomic (e.g. fork), then seqcount is certainly the right thing to use for
them instead of readers using the spin lock. Though it's in signal_struct,
I didn't use siglock since the access to rlimits never needs to disable
irqs and doesn't overlap with other siglock uses. Instead of adding
something new, I overloaded task_lock(task->group_leader) for this; it is
used for other things that are not likely to happen simultaneously with
limit tweaking. To me that seems preferable to adding a word, but it would
be trivial (and arguably cleaner) to add a separate lock for these users
(or e.g. just use seqlock, which adds two words but is optimal for readers).
Most of the changes here are just the trivial s/->rlim/->signal->rlim/.
I stumbled across what must be a long-standing bug, in reparent_to_init.
It does:
memcpy(current->rlim, init_task.rlim, sizeof(*(current->rlim)));
when surely it was intended to be:
memcpy(current->rlim, init_task.rlim, sizeof(current->rlim));
As rlim is an array, the * in the sizeof expression gets the size of the
first element, so this just changes the first limit (RLIMIT_CPU). This is
for kernel threads, where it's clear that resetting all the rlimits is what
you want. With that fixed, the setting of RLIMIT_FSIZE in nfsd is
superfluous since it will now already have been reset to RLIM_INFINITY.
The other subtlety is removing:
tsk->rlim[RLIMIT_CPU].rlim_cur = RLIM_INFINITY;
in exit_notify, which was to avoid a race signalling during self-reaping
exit. As the limit is now shared, a dying thread should not change it for
others. Instead, I avoid that race by checking current->state before the
RLIMIT_CPU check. (Adding one new conditional in that path is now required
one way or another, since if not for this check there would also be a new
race with self-reaping exit later on clearing current->signal that would
have to be checked for.)
The one loose end left by this patch is with process accounting.
do_acct_process temporarily resets the RLIMIT_FSIZE limit while writing the
accounting record. I left this as it was, but it is now changing a limit
that might be shared by other threads still running. I left this in a
dubious state because it seems to me that process accounting may already
be in a generally dubious state when it comes to NPTL threads. I would
think you would want one record per process, with aggregate data about all
threads that ever lived in it, not a separate record for each thread.
I don't use process accounting myself, but if anyone is interested in
testing it out I could provide a patch to change it this way.
One final note: this is not 100% POSIX compliance in regard to rlimits.
POSIX specifies that RLIMIT_CPU refers to a whole process in aggregate, not
to each individual thread. I will provide patches later on to achieve that
change, assuming this patch goes in first.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Derive process start times from the posix_clock_monotonic notion of uptime
instead of "jiffies", consistent with the earlier change to /proc/uptime
itself.
(http://linus.bkbits.net:8080/linux-2.5/cset@3ef4851dGg0fxX58R9Zv8SIq9fzNmQ?nav=index.html|src/.|src/fs|src/fs/proc|related/fs/proc/proc_misc.c)
Process start times are reported to userspace in units of 1/USER_HZ since
boot, so applications such as procps need the value of "uptime" to convert
them into absolute time.
Currently "uptime" is derived from an ntp-corrected time base, but process
start time is derived from the free-running "jiffies" counter. This
results in inaccurate, drifting process start times as seen by the user,
even if the exported number stays constant, because the user's notion of
"jiffies" changes in time.
It's John Stultz's patch anyways, which I only messed up a bit, but since
people started trading signed-off lines on lkml:
Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This adds a new state TASK_TRACED that is used in place of TASK_STOPPED
when a thread stops because it is ptraced. Now ptrace operations are only
permitted when the target is in TASK_TRACED state, not in TASK_STOPPED.
This means that if a process is stopped normally by a job control signal
and then you PTRACE_ATTACH to it, you will have to send it a SIGCONT before
you can do any ptrace operations on it. (The SIGCONT will be reported to
ptrace and then you can discard it instead of passing it through when you
call PTRACE_CONT et al.)
If a traced child gets orphaned while in TASK_TRACED state, it morphs into
TASK_STOPPED state. This makes it again possible to resume or destroy the
process with SIGCONT or SIGKILL.
All non-signal tracing stops should now be done via ptrace_notify. I've
updated the syscall tracing code in several architectures to do this
instead of replicating the work by hand. I also fixed several that were
unnecessarily repeating some of the checks in ptrace_check_attach. Calling
ptrace_check_attach alone is sufficient, and the old checks repeated before
are now incorrect, not just superfluous.
I've closed a race in ptrace_check_attach. With this, we should have a
robust guarantee that when ptrace starts operating, the task will be in
TASK_TRACED state and won't come out of it. This is because the only way
to resume from TASK_TRACED is via ptrace operations, and only the one
parent thread attached as the tracer can do those.
This patch also cleans up the do_notify_parent and do_notify_parent_cldstop
code so that the dead and stopped cases are completely disjoint. The
notify_parent function is gone.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch changes the rusage bookkeeping and the semantics of the
getrusage and times calls in a couple of ways.
The first change is in the c* fields counting dead child processes. POSIX
requires that children that have died be counted in these fields when they
are reaped by a wait* call, and that if they are never reaped (e.g.
because of ignoring SIGCHLD, or exiting yourself first) then they are
never counted. These were counted in release_task for all threads. I've
changed it so they are counted in wait_task_zombie, i.e. exactly when
being reaped.
POSIX also specifies for RUSAGE_CHILDREN that the report include the reaped
child processes of the calling process, i.e. the whole thread group in Linux,
not just ones forked by the calling thread. POSIX specifies tms_c[us]time
fields in the times call the same way. I've moved the c* fields that
contain this information into signal_struct, where the single set of
counters accumulates data from any thread in the group that calls wait*.
Finally, POSIX specifies getrusage and times as returning cumulative totals
for the whole process (aka thread group), not just the calling thread.
I've added fields in signal_struct to accumulate the stats of detached
threads as they die. The process stats are the sums of these records plus
the stats of each remaining live/zombie thread. The times and getrusage
calls, and the internal uses for filling in wait4 results and siginfo_t,
now iterate over the threads in the thread group and sum up their stats
along with the stats recorded for threads already dead and gone.
I added a new value RUSAGE_GROUP (-3) for the getrusage system call rather
than changing the behavior of the old RUSAGE_SELF (0). POSIX specifies
RUSAGE_SELF to mean all threads, so the glibc getrusage call will just
translate it to RUSAGE_GROUP for new kernels. I did this thinking that
someone somewhere might want the old behavior with an old glibc and a new
kernel (it is only different if they are using CLONE_THREAD anyway).
However, I've changed the times system call to conform to POSIX as well and
did not provide any backward compatibility there. In that case there is
nothing easy like a parameter value to use, it would have to be a new
system call number. That seems pretty pointless. Given that, I wonder if
it is worth bothering to preserve the compatible RUSAGE_SELF behavior by
introducing RUSAGE_GROUP instead of just changing RUSAGE_SELF's meaning.
Comments?
I've done some basic testing on x86 and x86-64, and all the numbers come
out right after these fixes. (I have a test program that shows a few
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Merely removing down_read(&mm->mmap_sem) from task_vsize() is too
half-assed to let stand. The following patch removes the vma iteration
as well as the down_read(&mm->mmap_sem) from both task_mem() and
task_statm() and callers for the CONFIG_MMU=y case in favor of
accounting the various stats reported at the times of vma creation,
destruction, and modification. Unlike the 2.4.x patches of the same
name, this has no per-pte-modification overhead whatsoever.
This patch quashes end user complaints of top(1) being slow as well as
kernel hacker complaints of per-pte accounting overhead simultaneously.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
task_vsize() doesn't need mm->mmap_sem for the CONFIG_MMU case; the
semaphore doesn't prevent mm->total_vm from going stale or getting
inconsistent with other numbers regardless. Also, KSTK_EIP() and
KSTK_ESP() don't want or need protection from mm->mmap_sem either. So this
pushes taking mm->mmap_sem into the CONFIG_MMU=n task_vsize().
Also, hoist the prototype of task_vsize() into proc_fs.h
The net result of this is a small speedup of procps for CONFIG_MMU.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Clarify mmgrab by collapsing it into get_task_mm (in fork.c not inline),
and commenting on the special case it is guarding against: when use_mm in
an AIO daemon temporarily adopts the mm while it's on its way out.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Races have been observed between exec-time overwriting of task->comm and
/proc accesses to the same data. This causes environment string
information to appear in /proc.
Fix that up by taking task_lock() around updates to and accesses to
task->comm.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
I just stumbled across this patch that's been sitting in my tree for ages.
I thought I'd sent this in before. It's a trivial fix for the printing
of task state in /proc and sysrq dumps and such, so that TASK_DEAD shows
up correctly. This state is pretty much only ever there to be seen when
there are exit/reaping bugs, but it's not like that hasn't come up.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
From: Olaf Kirch <okir@suse.de>
I have been chasing a corruption of current->group_info on PPC during NFS
stress tests. The problem seems to be that nfsd is messing with its
group_info quite a bit, while some monitoring processes look at
/proc/<pid>/status and do a get_group_info/put_group_info without any locking.
This problem can be reproduced on ppc platforms within a few seconds if you
generate some NFS load and do a "cat /proc/XXX/status" of an nfsd thread in a
tight loop.
I therefore think changes to current->group_info, and querying it from a
different process, needs to be protected using the task_lock.
(akpm: task->group_info here is safe against exit() because the task holds a
ref on group_info which is released in __put_task_struct, and the /proc file
has a ref on the task_struct).
|
|
From: Alan Stern <stern@rowland.harvard.edu>
This patch is needed to work around gcc-2.96's limited ability to cope with
long long intermediate expression types. I don't know why the code
compiled okay earlier and failed now.
|
|
From: Ingo Molnar <mingo@elte.hu>
Dave reported that /proc/*/status sometimes shows 101% as LoadAVG, which
makes no sense.
The reason for the bug is slightly incorrect scaling of the load_avg value.
The patch below fixes this.
|
|
From: Matt Mackall <mpm@selenic.com>
The nswap and cnswap counters have never been incremented, as Linux
doesn't do task swapping.
|
|
From: Roland McGrath <roland@redhat.com>
This patch moves all the fields relating to job control from task_struct to
signal_struct, so that all this info is properly per-process rather than
being per-thread.
|
|
From: Tim Hockin <thockin@sun.com>,
Neil Brown <neilb@cse.unsw.edu.au>,
me
New groups infrastructure. task->groups and task->ngroups are replaced by
task->group_info. group_info is a refcounted, dynamic struct with an array
of pages. This allows for large numbers of groups. The current limit of
32 groups has been raised to 64k groups. It can be raised further by changing
the NGROUPS_MAX constant in limits.h.
|
|
Having the number-of-threads value easily available turns out to be very
important for procps performance.
The /proc/*/stat field being reused has been zero since the 2.2.xx
days, and was the seldom-used timeout value before that.
|
|
cause NULL pointer references in /proc.
Moreover, it's questionable whether the whole thing makes sense at all.
Per-thread state is good.
Cset exclude: davem@nuts.ninka.net|ChangeSet|20031005193942|01097
Cset exclude: akpm@osdl.org[torvalds]|ChangeSet|20031005180420|42200
Cset exclude: akpm@osdl.org[torvalds]|ChangeSet|20031005180411|42211
|
|
From: Roland McGrath <roland@redhat.com>
This patch completes what was started with the `process_group' accessor
function, moving all the job control-related fields from task_struct into
signal_struct and using process_foo accessor functions to read them. All
these things are per-process in POSIX, none per-thread. Off hand it's hard
to come up with the hairy MT scenarios in which the existing code would do
insane things, but trust me, they're there. At any rate, all the uses
being done via inline accessor functions now has got to be all good.
I did a "make allyesconfig" build and caught the few random drivers and
whatnot that referred to these fields. I was surprised to find how few
references to ->tty there really were to fix up. I'm sure there will be a
few more fixups needed in non-x86 code. The only actual testing of a
running kernel with these patches I've done is on my normal minimal x86
config. Everything works fine as it did before as far as I can tell.
One issue that may be of concern is the lack of any locking on multiple
threads diddling these fields. I don't think it really matters, though
there might be some obscure races that could produce inconsistent job
control results. Nothing shattering, I'm sure; probably only something
like a multi-threaded program calling setsid while its other threads do tty
i/o, which never happens in reality. This is the same situation we get by
using ->group_leader->foo without other synchronization, which seemed to be
the trend, and no one was worried about it.
|
|
Argh. A couple of places where we needed ..._encode_dev() had
been lost in reordering the patchset - the most notable being ctty number in
/proc/<pid>/stat. Fix follows:
|
|
tty->device had been used only in a couple of places and can be
calculated by tty->index and tty->driver. Field removed, its users switched
to static inline dev_t tty_devnum(tty).
|
|
From: Ingo Molnar <mingo@elte.hu>
the attached scheduler patch (against test2-mm2) adds the scheduling
infrastructure items discussed on lkml. I got good feedback - and while I
don't expect it to solve all problems, it does solve a number of bad ones:
- test_starve.c code from David Mosberger
- thud.c making the system unusable due to unfairness
- fair/accurate sleep average based on a finegrained clock
- audio skipping way too easily
other changes in sched-test2-mm2-A3:
- ia64 sched_clock() code, from David Mosberger.
- migration thread startup without relying on implicit scheduling
behavior. While the current 2.6 code is correct (due to the cpu-up code
adding CPUs one by one), it's also fragile - and this code cannot
be carried over into the 2.4 backports. So adding this method would
clean up the startup and would make it easier to have 2.4 backports.
and here's the original changelog for the scheduler changes:
- cycle accuracy (nanosec resolution) timekeeping within the scheduler.
This fixes a number of audio artifacts (skipping) I've reproduced. I
don't think we can get away without going cycle accuracy - reading the
cycle counter adds some overhead, but it's acceptable. The first
nanosec-accuracy patch was done by Mike Galbraith - this patch is
different but similar in nature. I went further in also changing the
sleep_avg to be of nanosec resolution.
- more finegrained timeslices: there's now a timeslice 'sub unit' of 50
usecs (TIMESLICE_GRANULARITY) - CPU hogs on the same priority level
will roundrobin with this unit. This change is intended to make gaming
latencies shorter.
- include scheduling latency in sleep bonus calculation. This change
extends the sleep-average calculation to the period of time a task
spends on the runqueue but doesn't get scheduled yet, right after
wakeup. Note that tasks that were preempted (ie. not woken up) and are
still on the runqueue do not get this benefit. This change closes one
of the last holes in the dynamic priority estimation; it should result
in interactive tasks getting more priority under heavy load. This
change also fixes the test-starve.c testcase from David Mosberger.
The TSC-based scheduler clock is disabled on ia32 NUMA platforms (ie.
platforms that have unsynched TSCs for sure). Those platforms should provide
the proper code to rely on the TSC in a global way. (No such infrastructure
exists at the moment - the monotonic TSC-based clock doesn't deal with TSC
offsets either, as far as I can tell.)
|
|
From: Jeremy Fitzhardinge <jeremy@goop.org>
I'm resending my patch to fix this problem. To recap: every task_struct
has its own copy of the thread group's pgrp. Only the thread group
leader is allowed to change the tgrp's pgrp, but it only updates its own
copy of pgrp, while all the other threads in the tgrp use the old value
they inherited on creation.
This patch simply updates all the other thread's pgrp when the tgrp
leader changes pgrp. Ulrich has already expressed reservations about
this patch since it is (1) incomplete (it doesn't cover the case of
other ids which have similar problems), (2) racy (it doesn't synchronize
with other threads looking at the task pgrp, so they could see an
inconsistent view) and (3) slow (it takes linear time with respect to
the number of threads in the tgrp).
My reaction is that (1) it fixes the actual bug I'm encountering in a
real program. (2) doesn't really matter for pgrp, since it is mostly an
issue with respect to the terminal job-control code (which is even more
broken without this patch). Regarding (3), I think there are very few
programs which have a large number of threads and change process group
id on a regular basis (a heavily multi-threaded job-control shell?).
Ulrich also said he has a (proposed?) much better fix, which I've been
looking forward to. I'm submitting this patch as a stop-gap fix for a
real bug, and perhaps to prompt the improved patch.
An alternative fix, at least for pgrp, is to change all references to
->pgrp to group_leader->pgrp. This may be sufficient on its own, but it
would be a reasonably intrusive patch (I count 95 instances in 32 files
in the 2.6.0-test3-mm3 tree).
|
|
From: Suparna Bhattacharya <suparna@in.ibm.com>
The /proc code's bare atomic_inc(&mm->mm_users) is racy against __exit_mm()'s
mmput() on another CPU: it calls mmput() outside task_lock(tsk), and
task_lock() isn't appropriate locking anyway.
So what happens is:
CPU0                                      CPU1
mmput()
  ->atomic_dec_and_lock(mm->mm_users)
                                          atomic_inc(mm->mm_users)
  ->list_del(mm->mmlist)
                                          mmput()
                                            ->atomic_dec_and_lock(mm->mm_users)
                                            ->list_del(mm->mmlist)
And the double list_del() of course goes splat.
So we use mmlist_lock to synchronise these steps.
The patch implements a new mmgrab() routine which increments mm_users only if
the mm isn't already going away. Changes get_task_mm() and proc_pid_stat()
to call mmgrab() instead of a direct atomic_inc(&mm->mm_users).
Hugh, there's some cruft in swapoff which looks like it should be using
mmgrab()...
|
|
tty->device switched to dev_t
There are very few uses of tty->device left by now; most of
them actually want dev_t (process accounting, proc/<pid>/stat, several
ioctls, slip.c logics, etc.) and the rest will go away shortly.
|
|
From Tim Schmielau <tim@physik3.uni-rostock.de>
Force jiffies to start out at five-minutes-before-wrap. To find
jiffy-wrapping bugs.
|
|
close succession. However, for this once we'll just call it "inspired".
But let's pair the lock with an unlock anyway, even if it is
boring and "square".
|
|
Patch from Roland McGrath.
|
|
Fix wrong order of process status. It's
#define TASK_RUNNING 0
#define TASK_INTERRUPTIBLE 1
#define TASK_UNINTERRUPTIBLE 2
#define TASK_STOPPED 4
#define TASK_ZOMBIE 8
#define TASK_DEAD 16
but SysRQ printout routines switch stopped and zombie around.
So, for one more time, here's another mailing of the same patch to fix
this brokenness. In addition, fix the wrong comment in fs/proc/array.c
|
|
This is required to make the old LinuxThread semantics work
together with the fixed-for-POSIX full signal sharing. A traditional
CLONE_SIGHAND thread (LinuxThread) will not see any other shared
signal state, while a new-style CLONE_THREAD thread will share all
of it.
This way the two methods don't confuse each other.
|
|
This prevents reporting processes as having started in the future, after
32 bit jiffies wrap.
|
|
New version with all ifdef CONFIG_MMU gone from procfs.
Instead, the conditional code is in either task_mmu.c or task_nommu.c, and
the Makefile will select the proper file for inclusion depending on
CONFIG_MMU.
|
|
Can't remember where this came from, but it's been around
for quite a while. Prints the parent (tracer) pid if
the task is being traced.
|
|
So stub it out, similar to /proc/<pid>/wchan for !CONFIG_KALLSYMS
|
|
Patch from Bill Irwin. It has the potential to break userspace
monitoring tools a little bit, and I'm rather uncertain about
how useful the per-process per-cpu accounting is.
Bill sent this out as an RFC on July 29:
"These statistics severely bloat the task_struct and nothing in
userspace can rely on them as they're conditional on CONFIG_SMP. If
anyone is using them (or just wants them around), please speak up."
And nobody spoke up.
If we apply this, the contents of /proc/783/cpu will go from
cpu 1 1
cpu0 0 0
cpu1 0 0
cpu2 1 1
cpu3 0 0
to
cpu 1 1
And we shall save 256 bytes from the ia32 task_struct.
On my SMP build with NR_CPUS=32:
Without this patch, sizeof(task_struct) is 1824, slab uses a 1-order
allocation and we are getting 2 task_structs per page.
With this patch, sizeof(task_struct) is 1568, slab uses a 2-order
allocation and we are getting 2.5 task_structs per page.
So it seems worthwhile.
(Maybe this highlights a shortcoming in slab. For the 1824-byte case
it could have used a 0-order allocation)
|
|
The i_dev field is deleted and the few uses are replaced by i_sb->s_dev.
There is a single side effect: a stat on a socket now sees a nonzero
st_dev. There is nothing against that - FreeBSD has a nonzero value as
well - but there is at least one utility (fuser) that will need an
update.
|
|
From Bill Irwin
Move hugetlb and hugetlbfs declarations into a dedicated header file.
Hugetlb's big #ifdeffed block in mm.h got a lot bigger with hugetlbfs.
This patch basically attempts to remove the noise from mm.h by simply
rearranging it into a new header, and fixing all users of hugetlb.
|
|
As per CodingStyle
|