| Age | Commit message | Author |
|
This patch allows you to change the source address of icmp error
messages. It applies cleanly to 2.6.11.11 and retains the default
behaviour.
In the old (default) behaviour icmp error messages are sent with the ip
of the exiting interface.
In the new behaviour (when the sysctl variable is toggled on), the
message is sent with the ip of the interface that received the packet that
caused the icmp error. This is the behaviour network administrators will
expect from a router. It makes debugging complicated network layouts
much easier. Also, all 'vendor routers' I know of have the latter
behaviour.
Signed-off-by: David S. Miller <davem@davemloft.net>
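The two behaviours described in this entry can be sketched as a small
model. This is only an illustration, not the kernel code: the function
name and the use_receiving_if flag (standing in for the sysctl toggle)
are hypothetical.

```python
def icmp_error_source(receiving_if_addr, exiting_if_addr, use_receiving_if=False):
    """Pick the source address for an ICMP error message.

    use_receiving_if is a hypothetical stand-in for the sysctl toggle:
    off (default) keeps the old behaviour, on selects the new one.
    """
    if use_receiving_if:
        # New behaviour: address of the interface that received the
        # packet which caused the error.
        return receiving_if_addr
    # Old (default) behaviour: address of the exiting interface.
    return exiting_if_addr
```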
|
|
Add an option to make secondary IP addresses get promoted
when primary IP addresses are removed from the device.
It defaults to off to preserve existing behavior.
Signed-off-by: Harald Welte <laforge@gnumonks.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
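A minimal model of the option, assuming that the existing behaviour is
for secondaries to be removed together with their primary. The function
name and the list representation (index 0 is the primary) are
hypothetical and only serve the illustration.

```python
def delete_primary(addresses, promote_secondaries=False):
    """Remove the primary address (index 0) from an address list.

    Assumption: without promotion, the secondaries go away with the
    primary; with promotion, the first secondary becomes the new primary.
    """
    if not addresses:
        return []
    if promote_secondaries:
        # The remaining addresses shift up; the old first secondary
        # is now at index 0 and acts as primary.
        return list(addresses[1:])
    return []
```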
|
|
- Include chunk and skb sizes in sendbuffer accounting.
- 2 policies are supported. 0: per socket accounting, 1: per association
accounting
DaveM: I've made the default per-socket.
Signed-off-by: Neil Horman <nhorman@redhat.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
This first patch of the series introduces a sysctl (default off) that
enables/disables the randomisation feature globally. Since randomisation may
make it harder to debug really tricky situations (reproducibility goes down),
the sysadmin needs a way to disable it globally.
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Signed-off-by: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
|
|
Signed-off-by: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
|
|
Signed-off-by: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
|
|
into nuts.davemloft.net:/disk1/BK/net-2.6
|
|
The BIC TCP cwnd problem as identified by Yee-Ting Li and Doug Leith
is that the computation in recalc_ssthresh is incorrect and
BICTCP_1_OVER_BETA/2 should be BICTCP_1_OVER_BETA*2.
My fix is to implement the code from BIC TCP 1.1 which uses a sysctl
to set the beta. There are a few variable name changes from the 1.1
code, and I made the scaling factor a #define instead of hardcoded.
I validated this using netem and kprobes, for more details see
http://developer.osdl.org/shemminger/bic-beta-patch.pdf
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
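A rough sketch of a multiplicative decrease with a tunable beta and a
named scaling factor, as the entry describes. It is not the BIC TCP 1.1
code: the values 819 and 1024, the floor of 2 segments and the names
below are assumptions chosen for illustration.

```python
BETA_SCALE = 1024  # assumed fixed-point scaling factor (the "#define")


def recalc_ssthresh(cwnd, beta=819):
    """Return the new slow start threshold after a loss.

    beta stands in for the sysctl-tunable value; beta / BETA_SCALE is
    the fraction of the congestion window that is kept (about 0.8 here).
    """
    return max((cwnd * beta) // BETA_SCALE, 2)
```

With these assumed values a window of 100 segments is reduced to 79.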
|
|
Signed-off-by: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
|
|
into kernel.bkbits.net:/home/davem/net-2.6
|
|
The existing seconds based gc_min_interval is barely
usable.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
- Rename various fields related to the lower-zone protection code to sync
up with 2.4.
- Remove the automatic determination of the values of the per-zone
protection levels from a single tunable. Replace this with a simple
per-zone sysctl.
Signed-off-by: Andrea Arcangeli <andrea@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch exports to userspace the boot loader ID which has been exported
by (b)zImage boot loaders since boot protocol version 2.
It is needed so that update tools that update kernels from vendors know which
bootloader file they need to update; eg right now those tools do all kinds of
hairy heuristics to find out if it's grub or lilo or .. that installed the
kernel. Those heuristics are fragile in the presence of more than one
bootloader (which isn't that uncommon in OS upgrade situations).
Tested on i386 and x86-64; as far as I know those are the only
architectures which use zImage/bzImage format.
Signed-Off-By: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Allow disabling of quota messages to console (they can disturb other
output).
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch adds "swap_token_timeout" parameter in /proc/sys/vm. The
parameter sets the expiry time of the token. The value is in HZ units (timer
ticks), and the default is the same as current SWAP_TOKEN_TIMEOUT (i.e. HZ * 300).
Signed-off-by: Hideo Aoki <aoki@sdl.hitachi.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
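Because the value is expressed in timer ticks, the number to write
depends on the kernel's HZ. A small conversion sketch follows; the helper
name is hypothetical and the HZ values used below are examples only.

```python
def seconds_to_ticks(seconds, hz):
    """Convert a timeout in seconds to timer ticks for a given HZ."""
    return seconds * hz


# The default of HZ * 300 corresponds to 300 seconds at any HZ.
DEFAULT_SECONDS = 300
```

For instance, seconds_to_ticks(DEFAULT_SECONDS, 1000) gives 300000 and
seconds_to_ticks(DEFAULT_SECONDS, 100) gives 30000.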
|
|
This allows control over what percentage of
the congestion window can be consumed by a
single TSO frame.
The setting of this parameter is a choice
between burstiness and building larger TSO
frames.
Signed-off-by: David S. Miller <davem@davemloft.net>
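The trade-off can be illustrated with a simple divisor model. This is a
sketch under assumptions, not the kernel's logic: the function name, the
treatment of a zero divisor as "no limit" and the floor of one segment
are all hypothetical.

```python
def tso_segment_limit(cwnd, win_divisor):
    """Upper bound on segments in a single TSO frame.

    cwnd is the congestion window in segments. A larger divisor gives
    smaller, less bursty frames; a smaller divisor allows larger frames.
    """
    if win_divisor <= 0:
        # Assumed: a zero divisor disables the limit.
        return cwnd
    return max(cwnd // win_divisor, 1)
```

With a window of 30 segments, a divisor of 3 would cap a frame at 10
segments, roughly a third of the window.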
|
|
Apparently a lot of scripts use a construct like
cat /proc/net/ip_conntrack | wc -l
which has a negative impact on system performance due to all the locking
required.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Create /proc/sys/vm/legacy_va_layout. If this is non-zero, the kernel
will use the old mmap layout for all tasks. It presently defaults to zero
(the new layout).
From: William Lee Irwin III <wli@holomorphy.com>
hugetlb CONFIG_SYSCTL=n fix
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
into ppc970.osdl.org:/home/torvalds/v2.6/linux
|
|
I made a patch for debugging with the help of NMI trigger switch.
When the kernel hangs severely, keyboard operations (e.g. Ctrl-Alt-Del)
don't work properly. This patch enables debugging information
to be displayed on console in this case.
I think this feature is necessary as standard functionality.
Please feel free to use this patch and let me know if you have
any comments.
Background:
When trouble occurs in the kernel, we usually begin to investigate
with the following information:
- panic >> panic message.
- oops >> CPU registers and stack trace.
- hang >> **NONE** no standard method established.
How it works:
Most IA32 servers have an NMI switch that raises an NMI interrupt.
The NMI interrupt can be delivered even when the kernel is in a serious
state, for example deadlocked with interrupts disabled.
When the NMI switch is pressed after this feature is activated,
CPU registers and stack trace are displayed on console and then
panic occurs.
This feature is activated or deactivated with sysctl.
On IA32 architecture, only the following are defined as reason
of NMI interrupt:
- memory parity error
- I/O check error
The reason code of NMI switch is not defined, so this patch assumes
that all undefined NMI interrupts are fired by the NMI switch.
However, oprofile and NMI watchdog also use undefined NMI interrupt.
Therefore this feature cannot be used at the same time with oprofile
and NMI watchdog. This feature hands NMI interrupt over to oprofile
and NMI watchdog. So, when they have been activated, this feature
doesn't work even if it is activated.
Supported architecture:
IA32
Setup:
Set up the system control parameter as follows:
# sysctl -w kernel.unknown_nmi_panic=1
kernel.unknown_nmi_panic = 1
If the NMI switch is pressed, CPU registers and stack trace will
be displayed on console and then panic occurs.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
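The decision order described in this entry can be summarised as a small
model. It is only a sketch: the function name and the returned labels
are hypothetical, and the final fallback (what happens to an unknown NMI
when the feature is off) is an assumption.

```python
def classify_nmi(reason_defined, other_nmi_user_active, unknown_nmi_panic):
    """Decide what happens to an incoming NMI.

    reason_defined: memory parity error or I/O check error.
    other_nmi_user_active: oprofile or the NMI watchdog is in use.
    unknown_nmi_panic: value of the kernel.unknown_nmi_panic sysctl.
    """
    if reason_defined:
        return "handle defined reason"
    if other_nmi_user_active:
        # The feature hands the NMI over, so it has no effect here.
        return "hand over to oprofile / NMI watchdog"
    if unknown_nmi_panic:
        return "dump registers and stack trace, then panic"
    return "report unknown NMI"  # assumed behaviour with the feature off
```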
|
|
into nuts.davemloft.net:/disk1/BK/net-2.6
|
|
Nobody ever fixed the big FIXME in sysctl - but we really need
to pass around the proper "loff_t *" to all the sysctl functions
if we want them to be well-behaved wrt the file pointer position.
This is all preparation for making direct f_pos accesses go
away.
|
|
Incremental to all other patches so far, there is also the new SCTP
conntrack helper by Kiran Kumar. Please apply for 2.6.9 ++, thanks.
Signed-off-by: Kiran Kumar Immidi <immidi_kiran@yahoo.com>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@redhat.com>
|
|
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@redhat.com>
|
|
Some people want the dentry and inode caches shrunk harder, others want them
shrunk more reluctantly.
The patch adds /proc/sys/vm/vfs_cache_pressure, which tunes the vfs cache
versus pagecache scanning pressure.
- at vfs_cache_pressure=0 we don't shrink dcache and icache at all.
- at vfs_cache_pressure=100 there is no change in behaviour.
- at vfs_cache_pressure > 100 we reclaim dentries and inodes harder.
The number of kilobytes of slab left after a slocate.cron on my 256MB test
box:
  vfs_cache_pressure=100000    33480
  vfs_cache_pressure=10000     61996
  vfs_cache_pressure=1000     104056
  vfs_cache_pressure=200      166340
  vfs_cache_pressure=100      190200
  vfs_cache_pressure=50       206168
Of course, this just left more directory and inode pagecache behind instead of
vfs cache. Interestingly, on this machine the entire slocate run fits into
pagecache, but not into VFS caches.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
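One way to read the three cases listed in this entry is as a linear
scaling of how many unused cache objects are offered for reclaim. The
sketch below assumes exactly that; the function name is hypothetical and
the real shrinker accounting may differ.

```python
def reclaimable_objects(unused, vfs_cache_pressure):
    """Scale the count of unused dentries/inodes by the tunable.

    0 offers nothing for reclaim, 100 leaves the count unchanged,
    and values above 100 inflate it so the caches are shrunk harder.
    """
    return (unused // 100) * vfs_cache_pressure
```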
|
|
1) Add sysctl to control rcvbuf moderation, off for now.
2) Set default winscale to zero.
|
|
|
|
From: "Chen, Kenneth W" <kenneth.w.chen@intel.com>,
"Seth, Rohit" <rohit.seth@intel.com>
This patch addresses the longstanding problem wherein Oracle needs
CAP_IPC_LOCK to allocate SHM_HUGETLB shm memory, but people don't want to run
Oracle as root, and capabilities are busted.
Various ideas with rlimits didn't work out, mainly because these objects live
beyond the lifetime of the user processes which establish them.
What we do is to create root-writeable /proc/sys/vm/hugetlb_shm_group which
specifies a single group ID. Users who belong to that group may allocate
hugepages for SHM_HUGETLB shm segments.
So the sysadmin will create a new group, say `hugepageusers', will add the
oracle user to that group and will write that group's ID into
/proc/sys/vm/hugetlb_shm_group.
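The resulting permission rule can be sketched as follows. This is a
model of the rule as described here, not the kernel's check; the
function and parameter names are hypothetical.

```python
def may_allocate_hugetlb_shm(has_cap_ipc_lock, user_gids, hugetlb_shm_group):
    """Return True if a user may create a SHM_HUGETLB segment.

    has_cap_ipc_lock: the task holds CAP_IPC_LOCK (the old requirement).
    user_gids: group IDs the user belongs to.
    hugetlb_shm_group: the single group ID written to the sysctl.
    """
    return has_cap_ipc_lock or hugetlb_shm_group in user_gids
```

So an oracle user in group 1001 would be allowed once 1001 is written to
the sysctl, without needing CAP_IPC_LOCK.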
|
|
This is a version of Binary Increase Control (BIC) TCP
developed by NCSU. It is yet another TCP congestion control
algorithm for handling big fat pipes. For normal size congestion
windows it behaves the same as existing TCP Reno, but when window
is large it uses additive increase to ensure fairness and when
window is small it uses binary search increase.
For more details see the BIC TCP web page
http://www.csc.ncsu.edu/faculty/rhee/export/bitcp/
The original code was for web100 (2.4); this version is pretty
much the same but targeted for 2.6 with fewer sysctl parameters
and more constants.
I don't have a real high speed long haul network to test, but
when running over 1G links with delays, the performance is more stable
(ie tests are repeatable) and as fast as existing Reno.
|
|
From: Martin Schwidefsky <schwidefsky@de.ibm.com>
This patch adds a system control that allows switching off the jiffies timer
interrupts while a cpu sleeps in idle. This is useful for a system running
with virtual cpus under z/VM.
|
|
A forward port of an old 2.3.x kernel hack done
years ago. I (DaveM) did the first rough port,
Stephen Hemminger actually cleaned it up and
made it usable.
|
|
|
|
into nuts.davemloft.net:/disk1/BK/net-2.6
|
|
|
|
into nuts.davemloft.net:/disk1/BK/sparc-2.6
|
|
From: Bart Samwel <bart@samwel.tk>
Adds /proc/sys/vm/laptop-mode: a special knob which says "this is a laptop".
In this mode the kernel will attempt to avoid spinning disks up.
Algorithm: the idea is to hold dirty data in memory for a long time, but to
flush everything which has been accumulated if the disk happens to spin up
for other reasons.
- Whenever a disk request completes (read or write), schedule a timer a few
seconds hence. If the timer was already pending, reset it to a few seconds
hence.
- When the timer expires, write back the whole world. We use
sync_filesystems() for this because it will force ext3 journal commits as
well.
- In balance_dirty_pages(), kick off background writeback when we hit the
high threshold (dirty_ratio), not when we hit the low threshold. This has
the effect of causing "lumpy" writeback which is something I spent a year
fixing, but in laptop mode, it is desirable.
- In try_to_free_pages(), only kick pdflush if the VM is getting into
distress: we want to keep scanning for clean pages, deferring writeback.
- In page reclaim, avoid writing back the odd random dirty page off the
LRU: only start I/O if the scanning is working harder.
The effect is to perform a sync() a few seconds after all I/O has ceased.
The value which was written into /proc/sys/vm/laptop-mode determines, in
seconds, the delay between the final I/O and the flush.
Additionally, the patch adds tools which help answer the question "why the
heck does my disk spin up all the time?". The user may set
/proc/sys/vm/block_dump to a non-zero value and the kernel will print out
information which will identify the process which is performing disk reads or
which is dirtying pagecache.
The user should probably disable syslogd before setting block_dump.
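The timer rule of the laptop mode algorithm (re-arm on every completed
request, flush when the timer expires) can be simulated in a few lines.
This is a simplified sketch: the function name is hypothetical, times
are plain seconds, and the I/O caused by the flush itself is ignored.

```python
def flush_times(io_completions, delay):
    """Return the times at which the writeback timer would expire.

    io_completions: times at which disk requests complete.
    delay: seconds between the final I/O and the flush.
    """
    flushes = []
    deadline = None
    for t in sorted(io_completions):
        if deadline is not None and t >= deadline:
            # The timer expired before this request completed.
            flushes.append(deadline)
        # Every completion (re)sets the timer to a few seconds hence.
        deadline = t + delay
    if deadline is not None:
        flushes.append(deadline)
    return flushes
```

A burst of I/O at 0, 1 and 2 seconds with a delay of 5 produces a single
flush at 7 seconds, which is the "sync() a few seconds after all I/O has
ceased" effect.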
|
|
into nuts.davemloft.net:/disk1/BK/sparc-2.6
|
|
|
|
From: David Mosberger <davidm@napali.hpl.hp.com>
Below is a warmed up version of a patch originally done by Werner Almesberger
(see http://tinyurl.com/25zra) to replace the MAX_MAP_COUNT limit with a
sysctl variable. I thought this had gone into the tree a long time ago but
alas it has not and as luck would have it, the hard limit bit someone today
once again with a large app on a large machine.
Here is a small test app:
|
|
|
|
|
|
From: "Randy.Dunlap" <rddunlap@osdl.org>
Add syscalls.h, which contains prototypes for the kernel's system calls.
Replace open-coded declarations all over the place. This patch found a
couple of prior bugs. It appears to be more important with -mregparm=3 as we
discover more asmlinkage mismatches.
Some syscalls have arch-dependent arguments, so their prototypes are in the
arch-specific unistd.h. Maybe it should have been asm/syscalls.h, but there
were already arch-specific syscall prototypes in asm/unistd.h...
Tested on x86, ia64, x86_64, ppc64, s390 and sparc64. May cause
trivial-to-fix build breakage on other architectures.
|
|
From: Tim Hockin <thockin@sun.com>
Attached is a simple patch to expose NGROUPS_MAX via sysctl. Nothing
fancy, just a read-only variable. glibc can use this to sysconf() the
value properly, so apps will stop relying on NGROUPS_MAX as a real
constant.
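A userspace reader might look like the sketch below. The path
/proc/sys/kernel/ngroups_max is an assumption (the entry does not name
the file), and the fallback to sysconf() is only one possible choice.

```python
import os


def ngroups_max(path="/proc/sys/kernel/ngroups_max"):
    """Read the supplementary group limit from the sysctl file.

    The default path is an assumption. If the file is missing or
    unreadable, fall back to the value reported by sysconf().
    """
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return os.sysconf("SC_NGROUPS_MAX")
```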
|
|
From: "H. Peter Anvin" <hpa@transmeta.com>
Remove the limit of 2048 pty's - allocate them on demand up to the 12:20
dev_t limit: a million.
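The "million" follows from the 20 minor bits of a 12:20 dev_t. The
arithmetic can be checked with a short sketch; the helper names are
hypothetical, and placing the major number in the high bits is an
assumption about the layout.

```python
MAJOR_BITS = 12
MINOR_BITS = 20

# Number of distinct minor numbers available under one major.
MAX_MINORS = 1 << MINOR_BITS  # 1048576, "a million"


def mkdev(major, minor):
    """Pack a major/minor pair into a 12:20 device number."""
    return (major << MINOR_BITS) | minor


def dev_major(dev):
    return dev >> MINOR_BITS


def dev_minor(dev):
    return dev & (MAX_MINORS - 1)
```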
|
|
- Update listhelp.h to benefit from prefetching
- More efficient selective_cleanup() impl. in conntrack
- Export number of conntrack buckets via r/o sysctl.
|
|
From: Janet Morgan <janetmor@us.ibm.com>
It looks like aio_nr and aio_max_nr were intended to be sysctl parameters.
|
|
|
|
|