| Age | Commit message | Author |
|
This is what a lot of the callers really wanted.
|
|
Single spin_unlock path cuts this down a little..
|
|
From: Manfred Spraul <manfred@colorfullife.com>
de_thread calls list_del(&current->tasks), but current->tasks was never
added to the task list. The structure contains stale values from the parent.
switch_exec_pid() transforms a normal thread to a thread group leader.
Thread group leaders are included in the init_task.tasks linked list,
non-leaders are not in that list. The patch adds the new thread group
leader to the linked list, otherwise de_thread corrupts the task list.
|
|
Rather than assuming that all the things which copy_process() calls want to
return -ENOMEM, correctly propagate the return values.
This turns out to be a no-op at present.
|
|
From: Rusty Russell <rusty@rustcorp.com.au>
__module_get is theoretically allowed on a module inside its init, since we
already hold an implicit reference. Currently this BUG()s: make the
reference count explicit, which also simplifies the delete path. Also cleans
up the unload path, so that it only drops the semaphore when it's actually
sleeping for rmmod --wait.
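For illustration, a minimal hedged sketch of the pattern this is meant to allow
(not code from the patch):
    #include <linux/init.h>
    #include <linux/module.h>

    static int __init my_init(void)
    {
            /* init already holds an implicit reference on the module;
             * taking another explicit one here used to hit the BUG(). */
            __module_get(THIS_MODULE);
            return 0;
    }
    module_init(my_init);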
|
|
From: Zwane Mwaikambo <zwane@linuxpower.ca>
The proc interface has no way of telling whether there is an active cpufreq
driver or not. This means that if you don't have a cpufreq-supported
processor, this will oops in various possible places.
|
|
From: Rusty Russell <rusty@rustcorp.com.au>, David Mosberger
The patch below updates the other platforms with module_arch_cleanup().
Also, I added more debug output to kernel/module.c since I found it useful
to be able to see the final section layout.
|
|
From: george anzinger <george@mvista.com>
- Fix the sig_notify filtering code for the timer_create system call to
properly check that the signal number is small enough, but only if
SIG_NONE is not specified (see the sketch below).
- Eliminate a useless test of sig_notify.
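Roughly, the corrected check looks like this hedged sketch (the sigevent field
and constant names are the standard ones; this is not the literal patch):
    /* only range-check the signal if one is actually requested */
    if (event->sigev_notify != SIGEV_NONE &&
        (event->sigev_signo < 1 || event->sigev_signo > SIGRTMAX))
            return -EINVAL;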
|
|
Don't depend on undefined preprocessor symbols evaluating to zero.
|
|
into nuts.ninka.net:/home/davem/src/BK/net-2.5
|
|
handling.
This pointed out a bug in x86 sys_rt_sigreturn(), btw.
|
|
|
|
|
|
depending on undefined preprocessor symbols evaluating to zero.
Make panic.c use proper function prototypes.
|
|
into kroah.com:/home/greg/linux/BK/class-2.5
|
|
send_sig_info() has been broken since 2.5.60.
The function can be invoked from the timer interrupt (timer_interrupt ->
do_timer -> update_process_times -> update_one_process ->
(do_process_times, do_it_prof, do_it_virt) -> send_sig ->
send_sig_info) but it uses spin_unlock_irq instead of the correct
spin_unlock_irqrestore.
This enables interrupts, and later scheduler_tick() locks runqueue
(without disabling interrupts). And if we are unlucky, a new interrupt
comes at this point. And if this interrupt tries to do wake_up() (like
RTC interrupt does), we will deadlock on runqueue lock :-(
The bug was introduced by signal-fixes-2.5.59-A4, which split the
original send_sig_info into two functions, and in one branch it started
using these unsafe spinlock variants (while the "group" variant uses
irqsave/restore correctly).
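The difference, as a minimal sketch (the lock name is illustrative, not the
exact code in signal.c):
    #include <linux/spinlock.h>

    spinlock_t siglock = SPIN_LOCK_UNLOCKED;	/* illustrative */
    unsigned long flags;

    /* Wrong from interrupt context: spin_unlock_irq() unconditionally
     * re-enables interrupts, even though the caller had them disabled. */
    spin_lock_irq(&siglock);
    /* ... queue the signal ... */
    spin_unlock_irq(&siglock);

    /* Safe from any context: save and restore the caller's IRQ state. */
    spin_lock_irqsave(&siglock, flags);
    /* ... queue the signal ... */
    spin_unlock_irqrestore(&siglock, flags);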
|
|
|
|
From: Christoph Hellwig <hch@lst.de>
partition_name() is a variant of __bdevname() that caches results and
returns a pointer to kmalloc()ed data instead of printing into a buffer.
Due to its caching it gets utterly confused when the name for a dev_t
changes (which can happen easily now with device mapper, and probably in the
future with dynamic dev_t users).
It's only used by the raid code, and most calls go through a wrapper,
bdev_partition_name(), which takes a struct block_device * that may be
NULL.
The patch below changes bdev_partition_name() to call bdevname() where
possible, and converts the other calls, where we really have nothing more
than a dev_t, to __bdevname().
Btw, it would be nice if someone who knows the md code a bit better than me
could remove bdev_partition_name() in favour of direct calls to bdevname()
where possible - that would also get rid of the "returns a pointer to a
string on the stack" issue that this patch can't fix yet.
|
|
From: Christoph Hellwig <hch@lst.de>
Change sysrq sync/remount from a magic bdflush hook to proper pdflush
operations. The sync operation now reuses most of the regular sys_sync path
instead of implementing its own superblock walking and (broken) local disk
detection; the remount implementation has been moved to super.c, cleaned up
and updated for the last two years' locking changes. It also shares some code
with the regular remount path now.
|
|
|
|
From: Manfred Spraul <manfred@colorfullife.com>
Update some no-longer-true comments around nr_threads locking.
|
|
|
|
Introduces __module_get for places where we know we already hold
a reference, and where it is simpler to ignore the fact that the module is
being "rmmod --wait"ed.
|
|
Restores .modinfo section, and uses it to store license and vermagic.
|
|
|
|
New helper - bdget_disk(gendisk, partition)
invalidate_device() replaced with invalidate_partition(disk, part)
|
|
New helper - open_by_devnum(). Opens block_device by device number;
for use in situations when we really have nothing better than a dev_t (i.e.
we received it from some stupid userland API).
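A hedged usage sketch (the argument list is from memory and may not match the
helper exactly):
    /* all we have is a dev_t from userland; get a struct block_device for it */
    struct block_device *bdev = open_by_devnum(dev, FMODE_READ);
    if (IS_ERR(bdev))
            return PTR_ERR(bdev);
    /* ... use bdev, then release it with the matching blkdev_put() ... */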
|
|
A couple of helpers - simple_pin_fs() and simple_release_fs().
My fault - that code should've been put into libfs.c from the very
beginning. As it is, it got copied all over the place (binfmt_misc,
capifs, usbfs, usbdevfs, rpc_pipefs).
Taken to libfs.c and cleaned up.
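A hedged usage sketch in the binfmt_misc style (prototypes from memory):
    static struct vfsmount *bm_mnt;	/* the pinned mount */
    static int bm_count;		/* pin count */
    int err;

    /* keep the filesystem mounted while we hold files open on it */
    err = simple_pin_fs("binfmt_misc", &bm_mnt, &bm_count);
    if (err)
            return err;
    /* ... and drop the pin when the last user goes away ... */
    simple_release_fs(&bm_mnt, &bm_count);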
|
|
New libfs.c helper - simple_fill_super(). Abstracted from
nfsd/nfsctl.c; a couple of filesystems converted to it (nfsctl, binfmt_misc).
The function takes an array of triples (name, file_operations, mode), a
superblock and a value for its ->s_magic. It acts as fill_super() - it
populates the superblock or fails. We get a ramfs-style flat tree - a root
directory and a bunch of files in it.
That animal allows one to put together a simple filesystem without
touching any directory-related stuff - now it's as easy as implementing
file_operations for the files you want and telling it what to call them.
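For illustration, a hedged sketch of a conversion (modelled loosely on nfsctl;
all names here are made up):
    static struct tree_descr my_files[] = {
            /* zero-filled slots are skipped; the "" entry terminates the array */
            [2] = { "control", &my_control_fops, S_IWUSR },
            [3] = { "status",  &my_status_fops,  S_IRUGO },
            { "" }
    };

    static int my_fill_super(struct super_block *sb, void *data, int silent)
    {
            return simple_fill_super(sb, MY_FS_MAGIC, my_files);
    }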
|
|
Every 64-bit architecture changes the end of iomem_resource. Some more
gracefully than others. This patch does away with all that by making
it end at ~0UL by default.
|
|
* bogus calls of invalidate_buffers() gone from floppy_open()
* invalidate_buffers() killed.
* new helper - __invalidate_device(bdev, do_sync). invalidate_device()
is calling it.
* fixed races between floppy_open()/floppy_open() and
floppy_open()/set_geometry():
a) floppy_open()/floppy_release() is done under a semaphore. That
closes the races between simultaneous open() on /dev/fd0foo and /dev/fd0bar.
b) pointer to struct block_device is kept as long as floppy is
opened (per-drive, non-NULL when number of openers is non-zero, does not
contribute to block_device refcount).
c) set_geometry() grabs the same semaphore and invalidates the
devices directly (see the sketch below) instead of messing with setting a
fake "it had changed" flag and calling __check_disk_change().
* __check_disk_change() killed - no remaining callers
* full_check_disk_change() killed - ditto.
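A hedged sketch of what (c) amounts to (opened_bdev stands for the per-drive
pointer from (b); details simplified):
    /* invalidate the drive's buffers directly - no fake "media changed" dance */
    struct block_device *bdev = opened_bdev[drive];
    if (bdev)
            __invalidate_device(bdev, 0);	/* 0: don't sync first */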
|
|
tty->device switched to dev_t
There are very few uses of tty->device left by now; most of
them actually want dev_t (process accounting, proc/<pid>/stat, several
ioctls, slip.c logic, etc.) and the rest will go away shortly.
|
|
Instead of copying tty_driver into tty_struct we put a reference
in there. tty->driver turned into a pointer, users updated. Large, but
trivial
|
|
Here is a trivial fix for task_prio() in the case MAX_RT_PRIO !=
MAX_USER_RT_PRIO. In this case, all priorities are skewed by
(MAX_RT_PRIO - MAX_USER_RT_PRIO).
The fix is to subtract the full MAX_RT_PRIO value from p->prio, not just
MAX_USER_RT_PRIO. This makes sense, as the full priority range is
unrelated to the maximum user value. Only the real maximum RT value
matters.
This has been in Andrew's tree for a while, with no issues. Also, Ingo
acked it.
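The resulting function, roughly (task_t is the 2.5 typedef for struct
task_struct):
    int task_prio(task_t *p)
    {
            /* skew by the full RT range, not just the user-visible part */
            return p->prio - MAX_RT_PRIO;
    }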
|
|
From: "Martin J. Bligh" <mbligh@aracnet.com>
I'd forgotten that I'd set this to only fire every 20s in the past, because
it would rebalance too aggressively. That seems to be fixed now, so we should
turn it back on.
|
|
From: george anzinger <george@mvista.com>
In the current system (2.5.67) timespec to jiffies, timeval to
jiffies and the converse (jiffies to timeval and jiffies to
timespec) all use 1/HZ as the measure of a jiffie. Because of the
inability of the PIT to actually generate an accurate 1/HZ interrupt,
the wall clock is updated with a more accurate value (999848
nanoseconds per jiffie for HZ = 1000). This causes a 1/HZ
interpretation of jiffies based timing to run faster than the wall
clock, thus causing sleeps and timers to expire short of the requested
time. Try, for example:
time sleep 60
This patch changes the conversion routines to use the same value as
the wall clock update code to do the conversions.
The actual math is almost all done at compile time. The run time
conversions require little if any more execution time.
This patch must be applied after the patch I posted earlier today
which fixed the CLOCK_MONOTONIC resolution issue.
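The idea, as a hedged sketch (the real patch does the scaling with
compile-time constants, and TICK_NSEC here stands for the true tick length the
wall clock uses - 999848 ns at HZ=1000 - whatever that constant is actually
called):
    #include <linux/time.h>
    #include <asm/div64.h>

    static inline unsigned long timespec_to_jiffies_sketch(const struct timespec *ts)
    {
            u64 nsec = (u64)ts->tv_sec * NSEC_PER_SEC + ts->tv_nsec;

            nsec += TICK_NSEC - 1;		/* round up: never expire early */
            do_div(nsec, TICK_NSEC);	/* divide by the real tick length, not 10^9/HZ */
            return (unsigned long)nsec;
    }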
|
|
The POSIX CLOCK_MONOTONIC currently has only 1/HZ resolution. Further, it is
tied to jiffies (i.e. is a restatement of jiffies) rather than "xtime" or the
gettimeofday() clock.
This patch changes CLOCK_MONOTONIC to be a restatement of gettimeofday() plus
an offset to remove any clock setting activity from CLOCK_MONOTONIC. An
offset is kept that represents the difference between CLOCK_MONOTONIC and
gettimeofday(). This offset is updated whenever the gettimeofday() clock is
set, to back the clock-setting change out of CLOCK_MONOTONIC (which, by the
standard, cannot be set).
With this change CLOCK_REALTIME (a direct restatement of gettimeofday()),
CLOCK_MONOTONIC and gettimeofday() will all tick at the same time and with
the same rate. And all will be affected by NTP adjustments (save those which
actually set the time).
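Schematically, a hedged sketch (get_realtime() and wall_to_monotonic are
illustrative names, and locking is omitted):
    struct timespec now, mono;

    get_realtime(&now);			/* same source as gettimeofday() */
    mono.tv_sec  = now.tv_sec  + wall_to_monotonic.tv_sec;
    mono.tv_nsec = now.tv_nsec + wall_to_monotonic.tv_nsec;
    if (mono.tv_nsec >= NSEC_PER_SEC) {	/* normalize the sum */
            mono.tv_nsec -= NSEC_PER_SEC;
            mono.tv_sec++;
    }
    /* settimeofday() adjusts wall_to_monotonic by the same amount it moves
     * the clock, so CLOCK_MONOTONIC never sees the jump. */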
|
|
If copy_namespace() returns -EPERM, copy_process() will
return a confusing -ENOMEM. Fix it thus.
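I.e., in copy_process(), roughly (the cleanup label is illustrative):
    retval = copy_namespace(clone_flags, p);
    if (retval)			/* propagate -EPERM etc., not a blanket -ENOMEM */
            goto bad_fork_cleanup;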
|
|
This BUG_ON is triggering via ppp's line discipline flushing, due to
brokenness in tty_io.c.
We need to fix tty. Meanwhile, let's not gratuitously nuke people's boxes.
|
|
|
|
Fix for trivial typo. Without it, you can't insert anything on top of
agpgart.ko because the agp_register_driver() will erroneously pick up
the symbol version from agp_backend_acquire().
|
|
Clean up "pendcount" locking (or rather - lack there-of) by making it a
per-timer thing and thus automatically protected by the timer lock.
Fix whitespace damage.
|
|
This one gets rid of sys32_{get,set}affinity in favor of a unified
compat implementation.
|
|
From: george anzinger <george@mvista.com>
The MAJOR problem was a hang in the kernel if a user tried to delete a
repeating timer that had a signal delivery pending. I was putting the
task in a loop waiting for that same task to pick up the signal. OUCH!
A minor issue relates to the glibc folks' need to specify a
particular thread to get the signal. I had this code in all along,
but somewhere in 2.5 the signal code was made POSIX compliant, i.e.
deliver to the first thread that doesn't have it masked out.
This now uses the code from the above mentioned clean up. Most
signals go to the group delivery signal code, however, those
specifying THREAD_ID (an extension to the POSIX standard) are sent to
the specified thread. That thread MUST be in the same thread group as
the thread that creates the timer.
|
|
The workqueue code currently has a notion of a per-cpu queue being "busy".
flush_scheduled_work()'s responsibility is to wait for a queue to be not busy.
Problem is, flush_scheduled_work() can easily hang up.
- The workqueue is deemed "busy" when there are pending delayed
(timer-based) works. But if someone repeatedly schedules new delayed work
in the callback, the queue will never fall idle, and flush_scheduled_work()
will not terminate.
- If someone reschedules work (not delayed work) in the work function, that
too will cause the queue to never go idle, and flush_scheduled_work() will
not terminate.
So what this patch does is:
- Create a new "cancel_delayed_work()" which will try to kill off any
timer-based delayed works.
- Change flush_scheduled_work() so that it is immune to people re-adding
work in the work callout handler.
We can do this by recognising that the caller does *not* want to wait
until the workqueue is "empty". The caller merely wants to wait until all
works which were pending at the time flush_scheduled_work() was called have
completed.
The patch uses a couple of sequence numbers for that.
So now, if someone wants to reliably remove delayed work they should do:
/*
* Make sure that my work-callback will no longer schedule new work
*/
my_driver_is_shutting_down = 1;
/*
* Kill off any pending delayed work
*/
cancel_delayed_work(&my_work);
/*
* OK, there will be no new works scheduled. But there may be one
* currently queued or in progress. So wait for that to complete.
*/
flush_scheduled_work();
The patch also changes the flush_workqueue() sleep to be uninterruptible.
We cannot legally bale out if a signal is delivered anyway.
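For reference, the new helper amounts to something like this (a sketch; in 2.5
the delayed-work timer is embedded in struct work_struct):
    static inline int cancel_delayed_work(struct work_struct *work)
    {
            /* non-zero means we killed the timer before it queued the work */
            return del_timer_sync(&work->timer);
    }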
|
|
s390 fixes:
- Initialize timing related variables first and then enable the timer interrupt.
- Normalize nanoseconds to microseconds in do_gettimeofday.
- Add types for __kernel_timer_t and __kernel_clockid_t.
- Fix ugly bug in switch_to: set prev to the return value of resume, otherwise
prev still contains the previous process at the time resume was called and
not the previous process at the time resume returned. They differ...
- Add missing include to get the kernel compiled.
- Get a closer match with the i386 termios.h file.
- Cope with INITIAL_JIFFIES.
- Define cpu_relax to do a cpu yield on VM and LPAR.
- Don't reenable interrupts in program check handler.
- Add pte_file definitions.
- Fix PT_IEEE_IP special case in ptrace.
- Use compare and swap to release the lock in _raw_spin_unlock.
- Introduce invoke_softirq to switch to async. interrupt stack.
|
|
From: Rusty Russell <rusty@rustcorp.com.au>
Introduce _sinittext and _einittext (cf. _stext and _etext), so kallsyms
includes __init functions.
TODO: Use Huffman name compression and 16-bit offsets (see the IDE
oopser patch)
|
|
Several places in ext2 and ext3 are using filesystem-wide counters which use
global locking. Mainly for the Orlov allocator's heuristics.
To solve the contention which this causes we can trade off accuracy against
speed.
This patch introduces a "percpu_counter" library type in which the counts are
per-cpu and are periodically spilled into a global counter. Readers only
read the global counter.
These objects are *large*. On a 32 CPU P4, they are 4 kbytes. On a 4 way
p3, 128 bytes.
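A hedged sketch of the interface this describes (field and function names from
memory; sbi stands for the ext2/ext3 superblock info):
    struct percpu_counter {
            spinlock_t lock;
            long count;		/* the approximate global value */
            long *counters;		/* per-cpu deltas, spilled into count when large */
    };

    /* writer: cheap, usually touches only this CPU's delta */
    percpu_counter_mod(&sbi->s_freeinodes_counter, -1);

    /* reader: just the (approximate) global value, no cross-CPU traffic */
    free = percpu_counter_read(&sbi->s_freeinodes_counter);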
|
|
Time to write a 2M file, one byte at a time:
Before:
1.09s user 4.92s system 99% cpu 6.014 total
0.74s user 5.28s system 99% cpu 6.023 total
1.03s user 4.97s system 100% cpu 5.991 total
After:
0.79s user 5.17s system 99% cpu 5.993 total
0.79s user 5.17s system 100% cpu 5.957 total
0.84s user 5.11s system 100% cpu 5.942 total
|
|
From: Hugh Dickins <hugh@veritas.com>
This patch removes the long deprecated flush_page_to_ram. We have
two different schemes for doing this cache flushing stuff, the old
flush_page_to_ram way and the not so old flush_dcache_page etc. way:
see DaveM's Documentation/cachetlb.txt. Keeping flush_page_to_ram
around is confusing, and makes it harder to get this done right.
All architectures are updated, but the only ones where it amounts
to more than deleting a line or two are m68k, mips, mips64 and v850.
I followed a prescription from DaveM (though not to the letter), that
those arches with non-nop flush_page_to_ram need to do what it did
in their clear_user_page and copy_user_page and flush_dcache_page.
Dave is concerned that, in the v850 nb85e case, this patch leaves its
flush_dcache_page as was, and uses it in clear_user_page and copy_user_page,
instead of making them all flush icache as well. That may be wrong:
I'm just hesitant to add cruft blindly, changing a flush_dcache macro
to flush icache too; and naively hope that the necessary flush_icache
calls are already in place. Miles, please let us know which way is
right for v850 nb85e - thanks.
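The prescription amounts to something like this hedged sketch for an affected
arch (argument names abbreviated; not any particular architecture's real
header):
    /* fold the old flush_page_to_ram() work into the user-page helpers */
    #define clear_user_page(addr, vaddr, page)		\
            do {					\
                    clear_page(addr);			\
                    flush_dcache_page(page);		\
            } while (0)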
|