diff options
| author | Roland McGrath <roland@redhat.com> | 2004-10-18 08:53:09 -0700 |
|---|---|---|
| committer | Linus Torvalds <torvalds@ppc970.osdl.org> | 2004-10-18 08:53:09 -0700 |
| commit | 31180071ee5e6cc6ff4d036d655c556f582f74e4 (patch) | |
| tree | e74fe9fae8692870f8023796a08d9951b5e8ce1d /include/linux | |
| parent | cc588ba911118a3461d06969fb9b195ccac5dd32 (diff) | |
[PATCH] make rlimit settings per-process instead of per-thread
POSIX specifies that the limit settings provided by getrlimit/setrlimit are
shared by the whole process, not specific to individual threads. This
patch changes the behavior of those calls to comply with POSIX.
I've moved the struct rlimit array from task_struct to signal_struct, as it
has the correct sharing properties. (This reduces kernel memory usage per
thread in multithreaded processes by around 100/200 bytes for 32/64
machines respectively.) I took a fairly minimal approach to the locking
issues with the newly shared struct rlimit array. It turns out that all
the code that is checking limits really just needs to look at one word at a
time (one rlim_cur field, usually). It's only the few places like
getrlimit itself (and fork), that require atomicity in accessing a whole
struct rlimit, so I just used a spin lock for them and no locking for most
of the checks. If it turns out that readers of struct rlimit need more
atomicity where they are now cheap, or less overhead where they are now
atomic (e.g. fork), then seqcount is certainly the right thing to use for
them instead of readers using the spin lock. Though it's in signal_struct,
I didn't use siglock since the access to rlimits never needs to disable
irqs and doesn't overlap with other siglock uses. Instead of adding
something new, I overloaded task_lock(task->group_leader) for this; it is
used for other things that are not likely to happen simultaneously with
limit tweaking. To me that seems preferable to adding a word, but it would
be trivial (and arguably cleaner) to add a separate lock for these users
(or e.g. just use seqlock, which adds two words but is optimal for readers).
Most of the changes here are just the trivial s/->rlim/->signal->rlim/.
I stumbled across what must be a long-standing bug, in reparent_to_init.
It does:
memcpy(current->rlim, init_task.rlim, sizeof(*(current->rlim)));
when surely it was intended to be:
memcpy(current->rlim, init_task.rlim, sizeof(current->rlim));
As rlim is an array, the * in the sizeof expression gets the size of the
first element, so this just changes the first limit (RLIMIT_CPU). This is
for kernel threads, where it's clear that resetting all the rlimits is what
you want. With that fixed, the setting of RLIMIT_FSIZE in nfsd is
superfluous since it will now already have been reset to RLIM_INFINITY.
The other subtlety is removing:
tsk->rlim[RLIMIT_CPU].rlim_cur = RLIM_INFINITY;
in exit_notify, which was to avoid a race signalling during self-reaping
exit. As the limit is now shared, a dying thread should not change it for
others. Instead, I avoid that race by checking current->state before the
RLIMIT_CPU check. (Adding one new conditional in that path is now required
one way or another, since if not for this check there would also be a new
race with self-reaping exit later on clearing current->signal that would
have to be checked for.)
The one loose end left by this patch is with process accounting.
do_acct_process temporarily resets the RLIMIT_FSIZE limit while writing the
accounting record. I left this as it was, but it is now changing a limit
that might be shared by other threads still running. I left this in a
dubious state because it seems to me that processing accounting may already
be more generally a dubious state when it comes to NPTL threads. I would
think you would want one record per process, with aggregate data about all
threads that ever lived in it, not a separate record for each thread.
I don't use process accounting myself, but if anyone is interested in
testing it out I could provide a patch to change it this way.
One final note, this is not 100% to POSIX compliance in regards to rlimits.
POSIX specifies that RLIMIT_CPU refers to a whole process in aggregate, not
to each individual thread. I will provide patches later on to achieve that
change, assuming this patch goes in first.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Diffstat (limited to 'include/linux')
| -rw-r--r-- | include/linux/init_task.h | 2 | ||||
| -rw-r--r-- | include/linux/mm.h | 2 | ||||
| -rw-r--r-- | include/linux/sched.h | 13 | ||||
| -rw-r--r-- | include/linux/security.h | 2 |
4 files changed, 14 insertions, 5 deletions
diff --git a/include/linux/init_task.h b/include/linux/init_task.h index 9937c8df8d7c..803d8efb1c4a 100644 --- a/include/linux/init_task.h +++ b/include/linux/init_task.h @@ -50,6 +50,7 @@ .list = LIST_HEAD_INIT(sig.shared_pending.list), \ .signal = {{0}}}, \ .posix_timers = LIST_HEAD_INIT(sig.posix_timers), \ + .rlim = INIT_RLIMITS, \ } #define INIT_SIGHAND(sighand) { \ @@ -96,7 +97,6 @@ extern struct group_info init_groups; .cap_inheritable = CAP_INIT_INH_SET, \ .cap_permitted = CAP_FULL_SET, \ .keep_capabilities = 0, \ - .rlim = INIT_RLIMITS, \ .user = INIT_USER, \ .comm = "swapper", \ .thread = INIT_THREAD, \ diff --git a/include/linux/mm.h b/include/linux/mm.h index 65ff5b5e896a..158ee1c501f0 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -540,7 +540,7 @@ static inline int can_do_mlock(void) { if (capable(CAP_IPC_LOCK)) return 1; - if (current->rlim[RLIMIT_MEMLOCK].rlim_cur != 0) + if (current->signal->rlim[RLIMIT_MEMLOCK].rlim_cur != 0) return 1; return 0; } diff --git a/include/linux/sched.h b/include/linux/sched.h index 8810b551082a..389a70ebb189 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -312,6 +312,17 @@ struct signal_struct { unsigned long utime, stime, cutime, cstime; unsigned long nvcsw, nivcsw, cnvcsw, cnivcsw; unsigned long min_flt, maj_flt, cmin_flt, cmaj_flt; + + /* + * We don't bother to synchronize most readers of this at all, + * because there is no reader checking a limit that actually needs + * to get both rlim_cur and rlim_max atomically, and either one + * alone is a single word that can safely be read normally. + * getrlimit/setrlimit use task_lock(current->group_leader) to + * protect this instead of the siglock, because they really + * have no need to disable irqs. + */ + struct rlimit rlim[RLIM_NLIMITS]; }; /* @@ -518,8 +529,6 @@ struct task_struct { kernel_cap_t cap_effective, cap_inheritable, cap_permitted; unsigned keep_capabilities:1; struct user_struct *user; -/* limits */ - struct rlimit rlim[RLIM_NLIMITS]; unsigned short used_math; char comm[16]; /* file system info */ diff --git a/include/linux/security.h b/include/linux/security.h index 983d7c2265bc..a1dee9a60587 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -582,7 +582,7 @@ struct swap_info_struct; * @task_setrlimit: * Check permission before setting the resource limits of the current * process for @resource to @new_rlim. The old resource limit values can - * be examined by dereferencing (current->rlim + resource). + * be examined by dereferencing (current->signal->rlim + resource). * @resource contains the resource whose limit is being set. * @new_rlim contains the new limits for @resource. * Return 0 if permission is granted. |
