| Age | Commit message (Collapse) | Author |
|
A security issue is emerging. Disallow Routing Header Type 0 by default
as we have been doing for IPv4.
This version already includes a fix for the original patch.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
Allow sysadmin to disable all warnings about userland apps
making unaligned accesses by using:
# echo 1 > /proc/sys/kernel/ignore-unaligned-usertrap
Rather than having to use prctl on a process by process basis.
Default behaivour leaves the warnings enabled.
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|
Currently, acpi video options can only be set on kernel command line. That's
little inflexible; I'd like userland s2ram application that just works, and
modifying kernel command line according to whitelist is not fun. It is better
to just allow s2ram application to set video options just before suspend
(according to the whitelist).
This implements sysctl to allow setting suspend video options without reboot.
(akpm: Documentation updates for this new sysctl are pending..)
Signed-off-by: Pavel Machek <pavel@suse.cz>
Cc: "Brown, Len" <len.brown@intel.com>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Currently the zone_reclaim code has a fixed window of 30 seconds of off node
allocations should a local zone have no unused pagecache pages left. Reclaim
will be attempted again after this timeout period to avoid repeated useless
scans for memory. This is also useful to established sufficiently large off
node allocation chunks to relieve the local node.
It may be beneficial to adjust that time period for some special situations.
For example if memory use was exceeding node capacity one may want to give up
for longer periods of time. If memory spikes intermittendly then one may want
to shorten the time period to reduce the number of off node allocations.
This patch allows just that....
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
proc support for zone reclaim
This patch creates a proc entry /proc/sys/vm/zone_reclaim_mode that may be
used to override the automatic determination of the zone reclaim made on
bootup.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
As recently there has been lot of traffic on the right values for batch and
high water marks for per_cpu_pagelists. This patch makes these two
variables configurable through /proc interface.
A new tunable /proc/sys/vm/percpu_pagelist_fraction is added. This entry
controls the fraction of pages at most in each zone that are allocated for
each per cpu page list. The min value for this is 8. It means that we
don't allow more than 1/8th of pages in each zone to be allocated in any
single per_cpu_pagelist.
The batch value of each per cpu pagelist is also updated as a result. It
is set to pcp->high/4. The upper limit of batch is (PAGE_SHIFT * 8)
Signed-off-by: Rohit Seth <rohit.seth@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Add /proc/sys/vm/drop_caches. When written to, this will cause the kernel to
discard as much pagecache and/or reclaimable slab objects as it can. THis
operation requires root permissions.
It won't drop dirty data, so the user should run `sync' first.
Caveats:
a) Holds inode_lock for exorbitant amounts of time.
b) Needs to be taught about NUMA nodes: propagate these all the way through
so the discarding can be controlled on a per-node basis.
This is a debugging feature: useful for getting consistent results between
filesystem benchmarks. We could possibly put it under a config option, but
it's less than 300 bytes.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Trivial manual merge fixup for usb_find_interface clashes.
|
|
Leave the overloaded "hotplug" word to susbsystems which are handling
real devices. The driver core does not "plug" anything, it just exports
the state to userspace and generates events.
Signed-off-by: Kay Sievers <kay.sievers@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Another spin of Herbert Xu's "safer ip reassembly" patch
for 2.6.16.
(The original patch is here:
http://marc.theaimsgroup.com/?l=linux-netdev&m=112281936522415&w=2
and my only contribution is to have tested it.)
This patch (optionally) does additional checks before accepting IP
fragments, which can greatly reduce the possibility of reassembling
fragments which originated from different IP datagrams.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Arthur Kepner <akepner@sgi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The patch (originally from Steve) simply adds memory buffer settings to
DECnet similar to those in TCP.
Signed-off-by: Patrick Caulfield <patrick@tykepenguin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Make sysctl.h (again) useable from userspace
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Also introduces a sysctl option to configure the receive buffer
accounting policy to be either at socket or association level.
Default is all the associations on the same socket share the
receive buffer.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This is an updated version of the RFC3465 ABC patch originally
for Linux 2.6.11-rc4 by Yee-Ting Li. ABC is a way of counting
bytes ack'd rather than packets when updating congestion control.
The orignal ABC described in the RFC applied to a Reno style
algorithm. For advanced congestion control there is little
change after leaving slow start.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The existing connection tracking subsystem in netfilter can only
handle ipv4. There were basically two choices present to add
connection tracking support for ipv6. We could either duplicate all
of the ipv4 connection tracking code into an ipv6 counterpart, or (the
choice taken by these patches) we could design a generic layer that
could handle both ipv4 and ipv6 and thus requiring only one sub-protocol
(TCP, UDP, etc.) connection tracking helper module to be written.
In fact nf_conntrack is capable of working with any layer 3
protocol.
The existing ipv4 specific conntrack code could also not deal
with the pecularities of doing connection tracking on ipv6,
which is also cured here. For example, these issues include:
1) ICMPv6 handling, which is used for neighbour discovery in
ipv6 thus some messages such as these should not participate
in connection tracking since effectively they are like ARP
messages
2) fragmentation must be handled differently in ipv6, because
the simplistic "defrag, connection track and NAT, refrag"
(which the existing ipv4 connection tracking does) approach simply
isn't feasible in ipv6
3) ipv6 extension header parsing must occur at the correct spots
before and after connection tracking decisions, and there were
no provisions for this in the existing connection tracking
design
4) ipv6 has no need for stateful NAT
The ipv4 specific conntrack layer is kept around, until all of
the ipv4 specific conntrack helpers are ported over to nf_conntrack
and it is feature complete. Once that occurs, the old conntrack
stuff will get placed into the feature-removal-schedule and we will
fully kill it off 6 months later.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
|
|
You could open the /proc/sys/net/ipv4/conf/<if>/<whatever> file, then
wait for interface to go away, try to grab as much memory as possible in
hope to hit the (kfreed) ctl_table. Then fill it with pointers to your
function. Then do read from file you've opened and if you are lucky,
you'll get it called as ->proc_handler() in kernel mode.
So this is at least an Oops and possibly more. It does depend on an
interface going away though, so less of a security risk than it would
otherwise be.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Signed-off-by: Jochen Friedrich <jochen@scram.de>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
|
|
NET/ROM is lacking a connection reset like TCP's RST flag which at times
may result in a connecting having to slowly timing out instead of just being
reset. An earlier attempt to reset the connection by sending a
NR_CONNACK | NR_CHOKE_FLAG transport was inacceptable as it did result in
crashes of BPQ systems. An alternative approach of introducing a new
transport type 7 (NR_RESET) has be implemented several years ago in
Paula Jayne Dowie G8PZT's Xrouter.
Implement NR_RESET for Linux's NET/ROM but like any messing with the state
engine consider this experimental for now and thus control it by a sysctl
(net.netrom.reset) which for the time being defaults to off.
Signed-off-by: Ralf Baechle DL5RB <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The IPMI power control function proc_write_chassctrl was badly written, it
directly used userspace pointers, it assumed that strings were NULL
terminated, and it used the evil sscanf function. This converts over to
using the sysctl interface for this data and changes the semantics to be a
little more logical.
Signed-off-by: Corey Minyard <minyard@acm.org>
Cc: <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Split spin lock and r/w lock implementation into a single try which is done
inline and an out of line function that repeatedly tries to get the lock
before doing the cpu_relax(). Add a system control to set the number of
retries before a cpu is yielded.
The reason for the spin lock retry is that the diagnose 0x44 that is used to
give up the virtual cpu is quite expensive. For spin locks that are held only
for a short period of time the costs of the diagnoses outweights the savings
for spin locks that are held for a longer timer. The default retry count is
1000.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This moves the inotify sysctl knobs to "/proc/sys/fs/inotify" from
"/proc/sys/fs". Also some related cleanup.
Signed-off-by: Robert Love <rml@novell.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
inotify is intended to correct the deficiencies of dnotify, particularly
its inability to scale and its terrible user interface:
* dnotify requires the opening of one fd per each directory
that you intend to watch. This quickly results in too many
open files and pins removable media, preventing unmount.
* dnotify is directory-based. You only learn about changes to
directories. Sure, a change to a file in a directory affects
the directory, but you are then forced to keep a cache of
stat structures.
* dnotify's interface to user-space is awful. Signals?
inotify provides a more usable, simple, powerful solution to file change
notification:
* inotify's interface is a system call that returns a fd, not SIGIO.
You get a single fd, which is select()-able.
* inotify has an event that says "the filesystem that the item
you were watching is on was unmounted."
* inotify can watch directories or files.
Inotify is currently used by Beagle (a desktop search infrastructure),
Gamin (a FAM replacement), and other projects.
See Documentation/filesystems/inotify.txt.
Signed-off-by: Robert Love <rml@novell.com>
Cc: John McCutchan <ttb@tentacle.dhs.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Separate out the two uses of netdev_max_backlog. One controls the
upper bound on packets processed per softirq, the new name for this is
netdev_budget; the other controls the limit on packets queued via
netif_rx.
Increase the max_backlog default to account for faster processors.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Allow TCP to have multiple pluggable congestion control algorithms.
Algorithms are defined by a set of operations and can be built in
or modules. The legacy "new RENO" algorithm is used as a starting
point and fallback.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a new `suid_dumpable' sysctl:
This value can be used to query and set the core dump mode for setuid
or otherwise protected/tainted binaries. The modes are
0 - (default) - traditional behaviour. Any process which has changed
privilege levels or is execute only will not be dumped
1 - (debug) - all processes dump core when possible. The core dump is
owned by the current user and no security is applied. This is intended
for system debugging situations only. Ptrace is unchecked.
2 - (suidsafe) - any binary which normally would not be dumped is dumped
readable by root only. This allows the end user to remove such a dump but
not access it directly. For security reasons core dumps in this mode will
not overwrite one another or other files. This mode is appropriate when
adminstrators are attempting to debug problems in a normal environment.
(akpm:
> > +EXPORT_SYMBOL(suid_dumpable);
>
> EXPORT_SYMBOL_GPL?
No problem to me.
> > if (current->euid == current->uid && current->egid == current->gid)
> > current->mm->dumpable = 1;
>
> Should this be SUID_DUMP_USER?
Actually the feedback I had from last time was that the SUID_ defines
should go because its clearer to follow the numbers. They can go
everywhere (and there are lots of places where dumpable is tested/used
as a bool in untouched code)
> Maybe this should be renamed to `dump_policy' or something. Doing that
> would help us catch any code which isn't using the #defines, too.
Fair comment. The patch was designed to be easy to maintain for Red Hat
rather than for merging. Changing that field would create a gigantic
diff because it is used all over the place.
)
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch alows you to change the source address of icmp error
messages. It applies cleanly to 2.6.11.11 and retains the default
behaviour.
In the old (default) behaviour icmp error messages are sent with the ip
of the exiting interface.
The new behaviour (when the sysctl variable is toggled on), it will send
the message with the ip of the interface that received the packet that
caused the icmp error. This is the behaviour network administrators will
expect from a router. It makes debugging complicated network layouts
much easier. Also, all 'vendor routers' I know of have the later
behaviour.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add an option to make secondary IP addresses get promoted
when primary IP addresses are removed from the device.
It defaults to off to preserve existing behavior.
Signed-off-by: Harald Welte <laforge@gnumonks.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
- Include chunk and skb sizes in sendbuffer accounting.
- 2 policies are supported. 0: per socket accouting, 1: per association
accounting
DaveM: I've made the default per-socket.
Signed-off-by: Neil Horman <nhorman@redhat.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
This first patch of the series introduces a sysctl (default off) that
enables/disables the randomisation feature globally. Since randomisation may
make it harder to debug really tricky situations (reproducability goes down),
the sysadmin needs a way to disable it globally.
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Signed-off-by: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
|
|
Signed-off-by: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
|
|
Signed-off-by: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
|
|
into nuts.davemloft.net:/disk1/BK/net-2.6
|
|
The BIC TCP cwnd problem as identified by Yee-Ting Li and Doug Leith
is that the computation is recalc_ssthresh is incorrect and
BICTCP_1_OVER_BETA/2 should be BICTCP_1_OVER_BETA*2.
My fix is to implement the code from BIC TCP 1.1 which uses a sysctl
to set the beta. There are a few variable name changes from the 1.1
code, and made the scaling factor a #define instead of hardcoded.
I validated this using netem and kprobes, for more details see
http://developer.osdl.org/shemminger/bic-beta-patch.pdf
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
|
|
into kernel.bkbits.net:/home/davem/net-2.6
|
|
The existing seconds based gc_min_interval is barely
usable.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
- Rename various fields related to the lower-zone protection code to sync
up with 2.4.
- Remove the automatic determination of the values of the per-zone
protection levels from a single tunable. Replace this with a simple
per-zone sysctl.
Signed-off-by: Andrea Arcangeli <andrea@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch exports to userspace the boot loader ID which has been exported
by (b)zImage boot loaders since boot protocol version 2.
It is needed so that update tools that update kernels from vendors know which
bootloader file they need to update; eg right now those tools do all kinds of
hairy heuristics to find out if it's grub or lilo or .. that installed the
kernel. Those heuristics are fragile in the presence of more than one
bootloader (which isn't that uncommon in OS upgrade situations).
Tested on i386 and x86-64; as far as I know those are the only
architectures which use zImage/bzImage format.
Signed-Off-By: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Allow disabling of quota messages to console (they can disturb other
output).
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch adds "swap_token_timeout" parameter in /proc/sys/vm. The
parameter means expired time of token. Unit of the value is HZ, and the
default value is the same as current SWAP_TOKEN_TIMEOUT (i.e. HZ * 300).
Signed-off-by: Hideo Aoki <aoki@sdl.hitachi.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This allows control over what percentage of
the congestion window can be consumed by a
single TSO frame.
The setting of this parameter is a choice
between burstiness and building larger TSO
frames.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Apparently a lot of scripts use a construct like
cat /proc/net/ip_conntrack | wc -l
which has a negative impact on system performance due to all the locking
required.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Create /proc/sys/vm/legacy_va_layout. If this is non-zero, the kernel
will use the old mmap layout for all tasks. it presently defaults to zero
(the new layout).
From: William Lee Irwin III <wli@holomorphy.com>
hugetlb CONFIG_SYSCTL=n fix
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
into ppc970.osdl.org:/home/torvalds/v2.6/linux
|
|
I made a patch for debugging with the help of NMI trigger switch.
When kernel hangs severely, keyboard operation(e.g.Ctrl-Alt-Del)
doesn't work properly. This patch enables debugging information
to be displayed on console in this case.
I think this feature is necessary as standard functionality.
Please feel free to use this patch and let me know if you have
any comments.
Background:
When a trouble occurs in kernel, we usually begin to investigate
with following information:
- panic >> panic message.
- oops >> CPU registers and stack trace.
- hang >> **NONE** no standard method established.
How it works:
Most IA32 servers have a NMI switch that fires NMI interrupt up.
The NMI interrupt can interrupt even if kernel is serious state,
for example deadlock under the interrupt disabled.
When the NMI switch is pressed after this feature is activated,
CPU registers and stack trace are displayed on console and then
panic occurs.
This feature is activated or deactivated with sysctl.
On IA32 architecture, only the following are defined as reason
of NMI interrupt:
- memory parity error
- I/O check error
The reason code of NMI switch is not defined, so this patch assumes
that all undefined NMI interrupts are fired by MNI switch.
However, oprofile and NMI watchdog also use undefined NMI interrupt.
Therefore this feature cannot be used at the same time with oprofile
and NMI watchdog. This feature hands NMI interrupt over to oprofile
and NMI watchdog. So, when they have been activated, this feature
doesn't work even if it is activated.
Supported architecture:
IA32
Setup:
Set up the system control parameter as follows:
# sysctl -w kernel.unknown_nmi_panic=1
kernel.unknown_nmi_panic = 1
If the NMI switch is pressed, CPU registers and stack trace will
be displayed on console and then panic occurs.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
into nuts.davemloft.net:/disk1/BK/net-2.6
|
|
Nobody ever fixed the big FIXME in sysctl - but we really need
to pass around the proper "loff_t *" to all the sysctl functions
if we want them to be well-behaved wrt the file pointer position.
This is all preparation for making direct f_pos accesses go
away.
|