| Age | Commit message (Collapse) | Author |
|
Signed-off-by: Jochen Friedrich <jochen@scram.de>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
|
|
NET/ROM is lacking a connection reset like TCP's RST flag which at times
may result in a connecting having to slowly timing out instead of just being
reset. An earlier attempt to reset the connection by sending a
NR_CONNACK | NR_CHOKE_FLAG transport was inacceptable as it did result in
crashes of BPQ systems. An alternative approach of introducing a new
transport type 7 (NR_RESET) has be implemented several years ago in
Paula Jayne Dowie G8PZT's Xrouter.
Implement NR_RESET for Linux's NET/ROM but like any messing with the state
engine consider this experimental for now and thus control it by a sysctl
(net.netrom.reset) which for the time being defaults to off.
Signed-off-by: Ralf Baechle DL5RB <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The IPMI power control function proc_write_chassctrl was badly written, it
directly used userspace pointers, it assumed that strings were NULL
terminated, and it used the evil sscanf function. This converts over to
using the sysctl interface for this data and changes the semantics to be a
little more logical.
Signed-off-by: Corey Minyard <minyard@acm.org>
Cc: <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Split spin lock and r/w lock implementation into a single try which is done
inline and an out of line function that repeatedly tries to get the lock
before doing the cpu_relax(). Add a system control to set the number of
retries before a cpu is yielded.
The reason for the spin lock retry is that the diagnose 0x44 that is used to
give up the virtual cpu is quite expensive. For spin locks that are held only
for a short period of time the costs of the diagnoses outweights the savings
for spin locks that are held for a longer timer. The default retry count is
1000.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This moves the inotify sysctl knobs to "/proc/sys/fs/inotify" from
"/proc/sys/fs". Also some related cleanup.
Signed-off-by: Robert Love <rml@novell.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
inotify is intended to correct the deficiencies of dnotify, particularly
its inability to scale and its terrible user interface:
* dnotify requires the opening of one fd per each directory
that you intend to watch. This quickly results in too many
open files and pins removable media, preventing unmount.
* dnotify is directory-based. You only learn about changes to
directories. Sure, a change to a file in a directory affects
the directory, but you are then forced to keep a cache of
stat structures.
* dnotify's interface to user-space is awful. Signals?
inotify provides a more usable, simple, powerful solution to file change
notification:
* inotify's interface is a system call that returns a fd, not SIGIO.
You get a single fd, which is select()-able.
* inotify has an event that says "the filesystem that the item
you were watching is on was unmounted."
* inotify can watch directories or files.
Inotify is currently used by Beagle (a desktop search infrastructure),
Gamin (a FAM replacement), and other projects.
See Documentation/filesystems/inotify.txt.
Signed-off-by: Robert Love <rml@novell.com>
Cc: John McCutchan <ttb@tentacle.dhs.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Separate out the two uses of netdev_max_backlog. One controls the
upper bound on packets processed per softirq, the new name for this is
netdev_budget; the other controls the limit on packets queued via
netif_rx.
Increase the max_backlog default to account for faster processors.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Allow TCP to have multiple pluggable congestion control algorithms.
Algorithms are defined by a set of operations and can be built in
or modules. The legacy "new RENO" algorithm is used as a starting
point and fallback.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a new `suid_dumpable' sysctl:
This value can be used to query and set the core dump mode for setuid
or otherwise protected/tainted binaries. The modes are
0 - (default) - traditional behaviour. Any process which has changed
privilege levels or is execute only will not be dumped
1 - (debug) - all processes dump core when possible. The core dump is
owned by the current user and no security is applied. This is intended
for system debugging situations only. Ptrace is unchecked.
2 - (suidsafe) - any binary which normally would not be dumped is dumped
readable by root only. This allows the end user to remove such a dump but
not access it directly. For security reasons core dumps in this mode will
not overwrite one another or other files. This mode is appropriate when
adminstrators are attempting to debug problems in a normal environment.
(akpm:
> > +EXPORT_SYMBOL(suid_dumpable);
>
> EXPORT_SYMBOL_GPL?
No problem to me.
> > if (current->euid == current->uid && current->egid == current->gid)
> > current->mm->dumpable = 1;
>
> Should this be SUID_DUMP_USER?
Actually the feedback I had from last time was that the SUID_ defines
should go because its clearer to follow the numbers. They can go
everywhere (and there are lots of places where dumpable is tested/used
as a bool in untouched code)
> Maybe this should be renamed to `dump_policy' or something. Doing that
> would help us catch any code which isn't using the #defines, too.
Fair comment. The patch was designed to be easy to maintain for Red Hat
rather than for merging. Changing that field would create a gigantic
diff because it is used all over the place.
)
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch alows you to change the source address of icmp error
messages. It applies cleanly to 2.6.11.11 and retains the default
behaviour.
In the old (default) behaviour icmp error messages are sent with the ip
of the exiting interface.
The new behaviour (when the sysctl variable is toggled on), it will send
the message with the ip of the interface that received the packet that
caused the icmp error. This is the behaviour network administrators will
expect from a router. It makes debugging complicated network layouts
much easier. Also, all 'vendor routers' I know of have the later
behaviour.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add an option to make secondary IP addresses get promoted
when primary IP addresses are removed from the device.
It defaults to off to preserve existing behavior.
Signed-off-by: Harald Welte <laforge@gnumonks.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
- Include chunk and skb sizes in sendbuffer accounting.
- 2 policies are supported. 0: per socket accouting, 1: per association
accounting
DaveM: I've made the default per-socket.
Signed-off-by: Neil Horman <nhorman@redhat.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
This first patch of the series introduces a sysctl (default off) that
enables/disables the randomisation feature globally. Since randomisation may
make it harder to debug really tricky situations (reproducability goes down),
the sysadmin needs a way to disable it globally.
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Signed-off-by: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
|
|
Signed-off-by: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
|
|
Signed-off-by: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
|
|
into nuts.davemloft.net:/disk1/BK/net-2.6
|
|
The BIC TCP cwnd problem as identified by Yee-Ting Li and Doug Leith
is that the computation is recalc_ssthresh is incorrect and
BICTCP_1_OVER_BETA/2 should be BICTCP_1_OVER_BETA*2.
My fix is to implement the code from BIC TCP 1.1 which uses a sysctl
to set the beta. There are a few variable name changes from the 1.1
code, and made the scaling factor a #define instead of hardcoded.
I validated this using netem and kprobes, for more details see
http://developer.osdl.org/shemminger/bic-beta-patch.pdf
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
|
|
into kernel.bkbits.net:/home/davem/net-2.6
|
|
The existing seconds based gc_min_interval is barely
usable.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
- Rename various fields related to the lower-zone protection code to sync
up with 2.4.
- Remove the automatic determination of the values of the per-zone
protection levels from a single tunable. Replace this with a simple
per-zone sysctl.
Signed-off-by: Andrea Arcangeli <andrea@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch exports to userspace the boot loader ID which has been exported
by (b)zImage boot loaders since boot protocol version 2.
It is needed so that update tools that update kernels from vendors know which
bootloader file they need to update; eg right now those tools do all kinds of
hairy heuristics to find out if it's grub or lilo or .. that installed the
kernel. Those heuristics are fragile in the presence of more than one
bootloader (which isn't that uncommon in OS upgrade situations).
Tested on i386 and x86-64; as far as I know those are the only
architectures which use zImage/bzImage format.
Signed-Off-By: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Allow disabling of quota messages to console (they can disturb other
output).
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch adds "swap_token_timeout" parameter in /proc/sys/vm. The
parameter means expired time of token. Unit of the value is HZ, and the
default value is the same as current SWAP_TOKEN_TIMEOUT (i.e. HZ * 300).
Signed-off-by: Hideo Aoki <aoki@sdl.hitachi.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This allows control over what percentage of
the congestion window can be consumed by a
single TSO frame.
The setting of this parameter is a choice
between burstiness and building larger TSO
frames.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Apparently a lot of scripts use a construct like
cat /proc/net/ip_conntrack | wc -l
which has a negative impact on system performance due to all the locking
required.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Create /proc/sys/vm/legacy_va_layout. If this is non-zero, the kernel
will use the old mmap layout for all tasks. it presently defaults to zero
(the new layout).
From: William Lee Irwin III <wli@holomorphy.com>
hugetlb CONFIG_SYSCTL=n fix
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
into ppc970.osdl.org:/home/torvalds/v2.6/linux
|
|
I made a patch for debugging with the help of NMI trigger switch.
When kernel hangs severely, keyboard operation(e.g.Ctrl-Alt-Del)
doesn't work properly. This patch enables debugging information
to be displayed on console in this case.
I think this feature is necessary as standard functionality.
Please feel free to use this patch and let me know if you have
any comments.
Background:
When a trouble occurs in kernel, we usually begin to investigate
with following information:
- panic >> panic message.
- oops >> CPU registers and stack trace.
- hang >> **NONE** no standard method established.
How it works:
Most IA32 servers have a NMI switch that fires NMI interrupt up.
The NMI interrupt can interrupt even if kernel is serious state,
for example deadlock under the interrupt disabled.
When the NMI switch is pressed after this feature is activated,
CPU registers and stack trace are displayed on console and then
panic occurs.
This feature is activated or deactivated with sysctl.
On IA32 architecture, only the following are defined as reason
of NMI interrupt:
- memory parity error
- I/O check error
The reason code of NMI switch is not defined, so this patch assumes
that all undefined NMI interrupts are fired by MNI switch.
However, oprofile and NMI watchdog also use undefined NMI interrupt.
Therefore this feature cannot be used at the same time with oprofile
and NMI watchdog. This feature hands NMI interrupt over to oprofile
and NMI watchdog. So, when they have been activated, this feature
doesn't work even if it is activated.
Supported architecture:
IA32
Setup:
Set up the system control parameter as follows:
# sysctl -w kernel.unknown_nmi_panic=1
kernel.unknown_nmi_panic = 1
If the NMI switch is pressed, CPU registers and stack trace will
be displayed on console and then panic occurs.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
into nuts.davemloft.net:/disk1/BK/net-2.6
|
|
Nobody ever fixed the big FIXME in sysctl - but we really need
to pass around the proper "loff_t *" to all the sysctl functions
if we want them to be well-behaved wrt the file pointer position.
This is all preparation for making direct f_pos accesses go
away.
|
|
Incremental to all other patches so far, there is also the new SCTP
conntrack helper by Kiran Kumar. Please apply for 2.6.9 ++, thanks.
Signed-off-by: Kiran Kumar Immidi <immidi_kiran@yahoo.com>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@redhat.com>
|
|
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@redhat.com>
|
|
Some people want the dentry and inode caches shrink harder, others want them
shrunk more reluctantly.
The patch adds /proc/sys/vm/vfs_cache_pressure, which tunes the vfs cache
versus pagecache scanning pressure.
- at vfs_cache_pressure=0 we don't shrink dcache and icache at all.
- at vfs_cache_pressure=100 there is no change in behaviour.
- at vfs_cache_pressure > 100 we reclaim dentries and inodes harder.
The number of megabytes of slab left after a slocate.cron on my 256MB test
box:
vfs_cache_pressure=100000 33480
vfs_cache_pressure=10000 61996
vfs_cache_pressure=1000 104056
vfs_cache_pressure=200 166340
vfs_cache_pressure=100 190200
vfs_cache_pressure=50 206168
Of course, this just left more directory and inode pagecache behind instead of
vfs cache. Interestingly, on this machine the entire slocate run fits into
pagecache, but not into VFS caches.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
1) Add sysctl to control rcvbuf moderation, off for now.
2) Set default winscale to zero.
|
|
|
|
From: "Chen, Kenneth W" <kenneth.w.chen@intel.com>,
"Seth, Rohit" <rohit.seth@intel.com>
This patch addresses the longstanding problem wherein Oracle needs
CAP_IPC_LOCK to allocate SHM_HUGETLB shm memory, but people don't want to run
Oracle as root, and capabilties are busted.
Various ideas with rlimits didn't work out, mainly because these objects live
beyond the lifetime of the user processes which establish them.
What we do is to create root-writeable /proc/sys/vm/hugetlb_shm_group which
specifies a single group ID. Users who belong to that group may allocate
hugepages for SHM_HUGETLB shm segments.
So the sysadmin will greate a new group, say `hugepageusers', will add the
oracle user to that group and will write that group's ID into
/proc/sys/vm/hugetlb_shm_group.
|
|
This is a version of Binary Increase Control (BIC) TCP
developed by NCSU. It is yet another TCP congestion control
algorithm for handling big fat pipes. For normal size congestion
windows it behaves the same as existing TCP Reno, but when window
is large it uses additive increase to ensure fairness and when
window is small it uses binary search increase.
For more details see the BIC TCP web page
http://www.csc.ncsu.edu/faculty/rhee/export/bitcp/
The original code was for web100 (2.4); this version is pretty
much the same but targeted for 2.6 with less sysctl parameters
and more constants.
I don't have a real high speed long haul network to test, but
when running over 1G links with delays, the performance is more stable
(ie tests are repeatable) and as fast as existing Reno.
|
|
From: Martin Schwidefsky <schwidefsky@de.ibm.com>
This patch add a system control that allows to switch off the jiffies timer
interrupts while a cpu sleeps in idle. This is useful for a system running
with virtual cpus under z/VM.
|
|
A forward port of an old 2.3.x kernel hack done
years ago. I (DaveM) did the first rough port,
Stephen Hemminger actually cleaned it up and
made it usable.
|
|
|
|
into nuts.davemloft.net:/disk1/BK/net-2.6
|
|
|
|
into nuts.davemloft.net:/disk1/BK/sparc-2.6
|
|
From: Bart Samwel <bart@samwel.tk>
Adds /proc/sys/vm/laptop-mode: a special knob which says "this is a laptop".
In this mode the kernel will attempt to avoid spinning disks up.
Algorithm: the idea is to hold dirty data in memory for a long time, but to
flush everything which has been accumulated if the disk happens to spin up
for other reasons.
- Whenever a disk request completes (read or write), schedule a timer a few
seconds hence. If the timer was already pending, reset it to a few seconds
hence.
- When the timer expires, write back the whole world. We use
sync_filesystems() for this because it will force ext3 journal commits as
well.
- In balance_dirty_pages(), kick off background writeback when we hit the
high threshold (dirty_ratio), not when we hit the low threshold. This has
the effect of causing "lumpy" writeback which is something I spent a year
fixing, but in laptop mode, it is desirable.
- In try_to_free_pages(), only kick pdflush if the VM is getting into
distress: we want to keep scanning for clean pages, deferring writeback.
- In page reclaim, avoid writing back the odd random dirty page off the
LRU: only start I/O if the scanning is working harder.
The effect is to perform a sync() a few seconds after all I/O has ceased.
The value which was written into /proc/sys/vm/laptop-mode determines, in
seconds, the delay between the final I/O and the flush.
Additionally, the patch adds tools which help answer the question "why the
heck does my disk spin up all the time?". The user may set
/proc/sys/vm/block_dump to a non-zero value and the kernel will print out
information which will identify the process which is performing disk reads or
which is dirtying pagecache.
The user should probably disable syslogd before setting block-dump.
|
|
into nuts.davemloft.net:/disk1/BK/sparc-2.6
|
|
|