<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux-bitkeeper.git/ipc/shm.c, branch master</title>
<subtitle>Linux Kernel BitKeeper History</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux-bitkeeper.git/atom?h=master</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux-bitkeeper.git/atom?h=master'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux-bitkeeper.git/'/>
<updated>2005-03-01T09:15:37Z</updated>
<entry>
<title>Audit IPC object owner/permission changes.</title>
<updated>2005-03-01T09:15:37Z</updated>
<author>
<name>David Woodhouse</name>
<email>dwmw2@shinybook.infradead.org</email>
</author>
<published>2005-03-01T09:15:37Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux-bitkeeper.git/commit/?id=e20ffd76fc5bdccf79223667a615dd4c820947ab'/>
<id>urn:sha1:e20ffd76fc5bdccf79223667a615dd4c820947ab</id>
<content type='text'>
Add linked list of auxiliary data to audit_context
Add callbacks in IPC_SET functions to record requested changes.

Signed-off-by: David Woodhouse &lt;dwmw2@infradead.org&gt;
</content>
</entry>
<entry>
<title>[PATCH] shmctl SHM_LOCK perms</title>
<updated>2004-12-13T00:30:17Z</updated>
<author>
<name>Hugh Dickins</name>
<email>hugh@veritas.com</email>
</author>
<published>2004-12-13T00:30:17Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux-bitkeeper.git/commit/?id=2637792e3d9ae50079238615fd16384a0d393b30'/>
<id>urn:sha1:2637792e3d9ae50079238615fd16384a0d393b30</id>
<content type='text'>
Michael Kerrisk has observed that at present any process can SHM_LOCK any
shm segment whose size is within the process's RLIMIT_MEMLOCK, despite having
no permissions on the segment: surprising, though not obviously evil.  And any
process can SHM_UNLOCK any shm segment, despite having no permissions on it:
that is surely wrong.

Unless the process has CAP_IPC_LOCK, restrict both SHM_LOCK and SHM_UNLOCK to
the case where the process euid matches the shm owner or creator: that seems
the least surprising behaviour, and it could be relaxed if a need appears later.

Signed-off-by: Hugh Dickins &lt;hugh@veritas.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
</entry>
<entry>
<title>[PATCH] add missing linux/syscalls.h includes</title>
<updated>2004-10-18T15:54:02Z</updated>
<author>
<name>Arnd Bergmann</name>
<email>arnd@arndb.de</email>
</author>
<published>2004-10-18T15:54:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux-bitkeeper.git/commit/?id=09b9135c6e9950c0f12e3e6993ae52ab1baf0476'/>
<id>urn:sha1:09b9135c6e9950c0f12e3e6993ae52ab1baf0476</id>
<content type='text'>
I found that the prototypes for sys_waitid and sys_fcntl in
&lt;linux/syscalls.h&gt; don't match the implementation.  To keep all
prototypes in sync in the future, the header is now included from each file
that implements a syscall.

Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
</entry>
<entry>
<title>[PATCH] hugetlb: permit executable mappings</title>
<updated>2004-08-24T04:28:18Z</updated>
<author>
<name>William Lee Irwin III</name>
<email>wli@holomorphy.com</email>
</author>
<published>2004-08-24T04:28:18Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux-bitkeeper.git/commit/?id=b60e5e711ad490216cc39f0cdfac91a789d85694'/>
<id>urn:sha1:b60e5e711ad490216cc39f0cdfac91a789d85694</id>
<content type='text'>
During the kernel summit, there was some discussion about the support
requirements for a userspace program loader that loads executables into
hugetlb on behalf of a major application (Oracle).  To support this
robustly, hugetlb cleanup must survive disorderly termination of the
programs (e.g.  kill -9).  Hence,
the cleanup semantics are those of System V shared memory, but Linux'
System V shared memory needs one critical extension for this use:
executability.

The following microscopic patch enables this major application to provide
robust hugetlb cleanup.

Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
</entry>
<entry>
<title>[PATCH] remove magic +1 from shm segment count</title>
<updated>2004-08-24T04:27:43Z</updated>
<author>
<name>Manfred Spraul</name>
<email>manfred@colorfullife.com</email>
</author>
<published>2004-08-24T04:27:43Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux-bitkeeper.git/commit/?id=fefd81e1513d6871ffe209a53003c06be6e760da'/>
<id>urn:sha1:fefd81e1513d6871ffe209a53003c06be6e760da</id>
<content type='text'>
Michael Kerrisk found a bug in the shm accounting code: sysv shm allows
creating SHMMNI+1 shared memory segments, instead of SHMMNI segments.  The +1
probably dates from the first shared anonymous mapping implementation, which
used the sysv code to implement shared anon mappings.

The implementation got replaced and it's now the other way around (sysv uses
the shared anon code), but the +1 remained.

Signed-off-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
</entry>
<entry>
<title>[PATCH] rlimit-based mlocks for unprivileged users</title>
<updated>2004-08-23T06:06:46Z</updated>
<author>
<name>Rik van Riel</name>
<email>riel@redhat.com</email>
</author>
<published>2004-08-23T06:06:46Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux-bitkeeper.git/commit/?id=16698c49bbb42567c0bbc528d3820d18885e4642'/>
<id>urn:sha1:16698c49bbb42567c0bbc528d3820d18885e4642</id>
<content type='text'>
Here is the last agreed-on patch that lets normal users mlock pages up to
their rlimit.  This patch addresses all the issues brought up by Chris and
Andrea.

From: Chris Wright &lt;chrisw@osdl.org&gt;

Couple more nits.

The default lockable amount is now one page (in the first patch it was 0).  Why
don't we keep it as 0, with the CAP_IPC_LOCK overrides in place?  That way
nothing is changed from the user's perspective, and the rest of the policy can
be done by userspace, as it should be.

This patch breaks in one scenario: when ulimit == 0, the process has
CAP_IPC_LOCK, and it does SHM_LOCK.  The subsequent unlock or destroy will
corrupt the locked_shm count.

It's also inconsistent in handling the user_can_mlock/CAP_IPC_LOCK interaction
between shm_lock and shm_hugetlb.

SHM_HUGETLB can now only be done by the shm_group or with CAP_IPC_LOCK,
not by any can_do_mlock() user.

The double check of can_do_mlock() isn't needed in the SHM_LOCK path.

Interface names user_can_mlock and user_substract_mlock could be better.

Incremental update below.  Ran some simple sanity tests on this plus my
patch below and didn't find any problems.

* Make the default RLIMIT_MEMLOCK limit 0.
* Move the CAP_IPC_LOCK check into user_can_mlock to be consistent
  and fix the bug with ulimit == 0 &amp;&amp; CAP_IPC_LOCK with SHM_LOCK.
* Allow any can_do_mlock() user to try SHM_HUGETLB setup.
* Remove the unnecessary extra can_do_mlock() test in shmem_lock().
* Rename user_can_mlock to user_shm_lock and user_subtract_mlock
  to user_shm_unlock.
* Use user instead of current-&gt;user to fit in 80 cols on SHM_LOCK.

Signed-off-by: Rik van Riel &lt;riel@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
</entry>
<entry>
<title>[PATCH] ipc: Add refcount to ipc_rcu_alloc</title>
<updated>2004-08-23T05:40:37Z</updated>
<author>
<name>Manfred Spraul</name>
<email>manfred@colorfullife.com</email>
</author>
<published>2004-08-23T05:40:37Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux-bitkeeper.git/commit/?id=3a4262a016085f178811ec2459b93f63127e6280'/>
<id>urn:sha1:3a4262a016085f178811ec2459b93f63127e6280</id>
<content type='text'>
The lifetime of the ipc objects (sem array, msg queue, shm mapping) is
controlled by kern_ipc_perms-&gt;lock, a spinlock.  There is no simple way to
reacquire this spinlock after it has been dropped in order to call
schedule()/kmalloc()/copy_{to,from}_user()/whatever.

The attached patch adds a reference count as a preparation to get rid of
sem_revalidate().

Signed-off-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
</entry>
<entry>
<title>[PATCH] sparse: NULL vs 0 - the rest of it</title>
<updated>2004-06-30T08:52:08Z</updated>
<author>
<name>Mika Kukkonen</name>
<email>mika@osdl.org</email>
</author>
<published>2004-06-30T08:52:08Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux-bitkeeper.git/commit/?id=6079e24889b7c8bdf1cca284cb9fe721e0a70ca3'/>
<id>urn:sha1:6079e24889b7c8bdf1cca284cb9fe721e0a70ca3</id>
<content type='text'>
</content>
</entry>
<entry>
<title>[PATCH] numa api: Add shared memory support</title>
<updated>2004-05-22T15:04:40Z</updated>
<author>
<name>Andrew Morton</name>
<email>akpm@osdl.org</email>
</author>
<published>2004-05-22T15:04:40Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux-bitkeeper.git/commit/?id=d31d7a1874c710a5c1b589807d53f32d8e7df397'/>
<id>urn:sha1:d31d7a1874c710a5c1b589807d53f32d8e7df397</id>
<content type='text'>
From: Andi Kleen &lt;ak@suse.de&gt;

Add NUMA API support to tmpfs and hugetlbfs.  Shared memory is a
bit of a special case for NUMA policy.  Normally policy is associated with VMAs
or with processes, but for a shared memory segment you really want to share the
policy.  The core NUMA API has code for that; this patch adds the necessary
changes to tmpfs and hugetlbfs.

First it changes the custom swapping code in tmpfs to follow the policy set
via VMAs.

It is also useful to have a "backing store" of policy that preserves the policy
even when nobody has the shared memory segment mapped.  This allows command
line tools to pre-configure policy, which is then used later by programs.

Note that hugetlbfs needs more changes - it is also required to switch it to
lazy allocation, otherwise the prefault prevents mbind() from working.
</content>
</entry>
<entry>
<title>[PATCH] make the pagecache lock irq-safe.</title>
<updated>2004-04-12T06:10:41Z</updated>
<author>
<name>Andrew Morton</name>
<email>akpm@osdl.org</email>
</author>
<published>2004-04-12T06:10:41Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux-bitkeeper.git/commit/?id=89261aab0c7064ca9766bc79e7867b6104274f56'/>
<id>urn:sha1:89261aab0c7064ca9766bc79e7867b6104274f56</id>
<content type='text'>
Intro to these patches:

- Major surgery against the pagecache, radix-tree and writeback code.  This
  work is to address the O_DIRECT-vs-buffered data exposure horrors which
  we've been struggling with for months.

  As a side-effect, 32 bytes are saved from struct inode and eight bytes
  are removed from struct page, at a cost of approximately 2.5 bits per
  page in the radix tree nodes on 4k pagesize, assuming the pagecache is
  densely populated.  Not all pages are pagecache; other pages gain the
  full 8-byte saving.

  This change will break any arch code which is using page-&gt;list and will
  also break any arch code which is using page-&gt;lru of memory which was
  obtained from slab.

  The basic problem which we (mainly Daniel McNeil) have been struggling
  with is in getting a really reliable fsync() across the page lists while
  other processes are performing writeback against the same file.  It's like
  juggling four bars of wet soap with your eyes shut while someone is
  whacking you with a baseball bat.  Daniel pretty much has the problem
  plugged but I suspect that's just because we don't have testcases to
  trigger the remaining problems.  The complexity and additional locking
  which those patches add is worrisome.

  So the approach taken here is to remove the page lists altogether and
  replace the list-based writeback and wait operations with in-order
  radix-tree walks.

  The radix-tree code has been enhanced to support "tagging" of pages, for
  later searches for pages which have a particular tag set.  This means that
  we can ask the radix tree code "find me the next 16 dirty pages starting at
  pagecache index N" and it will do that in O(log64(N)) time.

  This affects I/O scheduling potentially quite significantly.  It is no
  longer the case that the kernel will submit pages for I/O in the order in
  which the application dirtied them.  We instead submit them in file-offset
  order all the time.

  This is likely to be advantageous when applications are seeking all over
  a large file randomly writing small amounts of data.  I haven't performed
  much benchmarking, but tiobench random write throughput seems to be
  increased by 30%.  Other tests appear to be unaltered.  dbench may have got
  10-20% quicker, but it's variable.

  There is one large file which everyone seeks all over randomly writing
  small amounts of data: the blockdev mapping which caches filesystem
  metadata.  The kernel's IO submission patterns for this are now ideal.


  Because writeback and wait-for-writeback use a tree walk instead of a
  list walk they are no longer livelockable.  This probably means that we no
  longer need to hold i_sem across O_SYNC writes and perhaps fsync() and
  fdatasync().  This may be beneficial for databases: multiple processes
  writing and syncing different parts of the same file at the same time can
  now all submit and wait upon writes to just their own little bit of the
  file, so we can get a lot more data into the queues.

  It is trivial to implement a part-file-fdatasync() as well, so
  applications can say "sync the file from byte N to byte M", and multiple
  applications can do this concurrently.  This is easy for ext2 filesystems,
  but probably needs lots of work for data-journalled filesystems and XFS and
  it probably doesn't offer much benefit over an i_semless O_SYNC write.


  These patches can end up making ext3 (even) slower:

	for i in 1 2 3 4
	do
		dd if=/dev/zero of=$i bs=1M count=2000 &amp;
	done          

  runs awfully slowly on SMP.  This is, yet again, because all the file
  blocks are jumbled up and the per-file linear writeout causes tons of
  seeking.  The above test runs sweetly on UP because on UP we don't
  allocate blocks to different files in parallel.

  Mingming and Badari are working on getting block reservation working for
  ext3 (preallocation on steroids).  That should fix ext3 up.


This patch:

- Later, we'll need to access the radix trees from inside disk I/O
  completion handlers.  So make mapping-&gt;page_lock irq-safe.  And rename it
  to tree_lock to reliably break any missed conversions.
</content>
</entry>
</feed>
