<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/arch/x86/kernel/syscall_table_32.S, branch v3.2.70</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.2.70</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.2.70'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2011-11-01T00:30:44Z</updated>
<entry>
<title>Cross Memory Attach</title>
<updated>2011-11-01T00:30:44Z</updated>
<author>
<name>Christopher Yeoh</name>
<email>cyeoh@au1.ibm.com</email>
</author>
<published>2011-11-01T00:06:39Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=fcf634098c00dd9cd247447368495f0b79be12d1'/>
<id>urn:sha1:fcf634098c00dd9cd247447368495f0b79be12d1</id>
<content type='text'>
The basic idea behind cross memory attach is to allow MPI programs doing
intra-node communication to do a single copy of the message rather than a
double copy of the message via shared memory.

The following patch attempts to achieve this by allowing a destination
process, given an address and size from a source process, to copy memory
directly from the source process into its own address space via a system
call.  There is also a symmetrical ability to copy from the current
process's address space into a destination process's address space.

- Use of /proc/pid/mem has been considered, but there are issues with
  using it:
  - Does not allow for specifying iovecs for both src and dest, assuming
    preadv or pwritev was implemented either the area read from or
  written to would need to be contiguous.
  - Currently mem_read allows only processes who are currently
  ptrace'ing the target and are still able to ptrace the target to read
  from the target. This check could possibly be moved to the open call,
  but its not clear exactly what race this restriction is stopping
  (reason  appears to have been lost)
  - Having to send the fd of /proc/self/mem via SCM_RIGHTS on unix
  domain socket is a bit ugly from a userspace point of view,
  especially when you may have hundreds if not (eventually) thousands
  of processes  that all need to do this with each other
  - Doesn't allow for some future use of the interface we would like to
  consider adding in the future (see below)
  - Interestingly reading from /proc/pid/mem currently actually
  involves two copies! (But this could be fixed pretty easily)

As mentioned previously use of vmsplice instead was considered, but has
problems.  Since you need the reader and writer working co-operatively if
the pipe is not drained then you block.  Which requires some wrapping to
do non blocking on the send side or polling on the receive.  In all to all
communication it requires ordering otherwise you can deadlock.  And in the
example of many MPI tasks writing to one MPI task vmsplice serialises the
copying.

There are some cases of MPI collectives where even a single copy interface
does not get us the performance gain we could.  For example in an
MPI_Reduce rather than copy the data from the source we would like to
instead use it directly in a mathops (say the reduce is doing a sum) as
this would save us doing a copy.  We don't need to keep a copy of the data
from the source.  I haven't implemented this, but I think this interface
could in the future do all this through the use of the flags - eg could
specify the math operation and type and the kernel rather than just
copying the data would apply the specified operation between the source
and destination and store it in the destination.

Although we don't have a "second user" of the interface (though I've had
some nibbles from people who may be interested in using it for intra
process messaging which is not MPI).  This interface is something which
hardware vendors are already doing for their custom drivers to implement
fast local communication.  And so in addition to this being useful for
OpenMPI it would mean the driver maintainers don't have to fix things up
when the mm changes.

There was some discussion about how much faster a true zero copy would
go. Here's a link back to the email with some testing I did on that:

http://marc.info/?l=linux-mm&amp;m=130105930902915&amp;w=2

There is a basic man page for the proposed interface here:

http://ozlabs.org/~cyeoh/cma/process_vm_readv.txt

This has been implemented for x86 and powerpc, other architecture should
mainly (I think) just need to add syscall numbers for the process_vm_readv
and process_vm_writev. There are 32 bit compatibility versions for
64-bit kernels.

For arch maintainers there are some simple tests to be able to quickly
verify that the syscalls are working correctly here:

http://ozlabs.org/~cyeoh/cma/cma-test-20110718.tgz

Signed-off-by: Chris Yeoh &lt;yeohc@au1.ibm.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: David Howells &lt;dhowells@redhat.com&gt;
Cc: James Morris &lt;jmorris@namei.org&gt;
Cc: &lt;linux-man@vger.kernel.org&gt;
Cc: &lt;linux-arch@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>All Arch: remove linkage for sys_nfsservctl system call</title>
<updated>2011-08-26T22:09:58Z</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2011-08-26T22:03:11Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f5b940997397229975ea073679b03967932a541b'/>
<id>urn:sha1:f5b940997397229975ea073679b03967932a541b</id>
<content type='text'>
The nfsservctl system call is now gone, so we should remove all
linkage for it.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
Signed-off-by: J. Bruce Fields &lt;bfields@redhat.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>ns: Wire up the setns system call</title>
<updated>2011-05-28T17:48:39Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2011-05-28T02:28:27Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7b21fddd087678a70ad64afc0f632e0f1071b092'/>
<id>urn:sha1:7b21fddd087678a70ad64afc0f632e0f1071b092</id>
<content type='text'>
32bit and 64bit on x86 are tested and working.  The rest I have looked
at closely and I can't find any problems.

setns is an easy system call to wire up.  It just takes two ints so I
don't expect any weird architecture porting problems.

While doing this I have noticed that we have some architectures that are
very slow to get new system calls.  cris seems to be the slowest where
the last system calls wired up were preadv and pwritev.  avr32 is weird
in that recvmmsg was wired up but never declared in unistd.h.  frv is
behind with perf_event_open being the last syscall wired up.  On h8300
the last system call wired up was epoll_wait.  On m32r the last system
call wired up was fallocate.  mn10300 has recvmmsg as the last system
call wired up.  The rest seem to at least have syncfs wired up which was
new in the 2.6.39.

v2: Most of the architecture support added by Daniel Lezcano &lt;dlezcano@fr.ibm.com&gt;
v3: ported to v2.6.36-rc4 by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
v4: Moved wiring up of the system call to another patch
v5: ported to v2.6.39-rc6
v6: rebased onto parisc-next and net-next to avoid syscall  conflicts.
v7: ported to Linus's latest post 2.6.39 tree.

&gt;  arch/blackfin/include/asm/unistd.h     |    3 ++-
&gt;  arch/blackfin/mach-common/entry.S      |    1 +
Acked-by: Mike Frysinger &lt;vapier@gentoo.org&gt;

Oh - ia64 wiring looks good.
Acked-by: Tony Luck &lt;tony.luck@intel.com&gt;

Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>net: Add sendmmsg socket system call</title>
<updated>2011-05-05T18:10:14Z</updated>
<author>
<name>Anton Blanchard</name>
<email>anton@samba.org</email>
</author>
<published>2011-05-02T20:21:35Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=228e548e602061b08ee8e8966f567c12aa079682'/>
<id>urn:sha1:228e548e602061b08ee8e8966f567c12aa079682</id>
<content type='text'>
This patch adds a multiple message send syscall and is the send
version of the existing recvmmsg syscall. This is heavily
based on the patch by Arnaldo that added recvmmsg.

I wrote a microbenchmark to test the performance gains of using
this new syscall:

http://ozlabs.org/~anton/junkcode/sendmmsg_test.c

The test was run on a ppc64 box with a 10 Gbit network card. The
benchmark can send both UDP and RAW ethernet packets.

64B UDP

batch   pkts/sec
1       804570
2       872800 (+ 8 %)
4       916556 (+14 %)
8       939712 (+17 %)
16      952688 (+18 %)
32      956448 (+19 %)
64      964800 (+20 %)

64B raw socket

batch   pkts/sec
1       1201449
2       1350028 (+12 %)
4       1461416 (+22 %)
8       1513080 (+26 %)
16      1541216 (+28 %)
32      1553440 (+29 %)
64      1557888 (+30 %)

We see a 20% improvement in throughput on UDP send and 30%
on raw socket send.

[ Add sparc syscall entries. -DaveM ]

Signed-off-by: Anton Blanchard &lt;anton@samba.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>introduce sys_syncfs to sync a single file system</title>
<updated>2011-03-21T04:40:29Z</updated>
<author>
<name>Sage Weil</name>
<email>sage@newdream.net</email>
</author>
<published>2011-03-10T19:31:30Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b7ed78f56575074f29ec99d8984f347f6c99c914'/>
<id>urn:sha1:b7ed78f56575074f29ec99d8984f347f6c99c914</id>
<content type='text'>
It is frequently useful to sync a single file system, instead of all
mounted file systems via sync(2):

 - On machines with many mounts, it is not at all uncommon for some of
   them to hang (e.g. unresponsive NFS server).  sync(2) will get stuck on
   those and may never get to the one you do care about (e.g., /).
 - Some applications write lots of data to the file system and then
   want to make sure it is flushed to disk.  Calling fsync(2) on each
   file introduces unnecessary ordering constraints that result in a large
   amount of sub-optimal writeback/flush/commit behavior by the file
   system.

There are currently two ways (that I know of) to sync a single super_block:

 - BLKFLSBUF ioctl on the block device: That also invalidates the bdev
   mapping, which isn't usually desirable, and doesn't work for non-block
   file systems.
 - 'mount -o remount,rw' will call sync_filesystem as an artifact of the
   current implemention.  Relying on this little-known side effect for
   something like data safety sounds foolish.

Both of these approaches require root privileges, which some applications
do not have (nor should they need?) given that sync(2) is an unprivileged
operation.

This patch introduces a new system call syncfs(2) that takes an fd and
syncs only the file system it references.  Maybe someday we can

 $ sync /some/path

and not get

 sync: ignoring all arguments

The syscall is motivated by comments by Al and Christoph at the last LSF.
syncfs(2) seems like an appropriate name given statfs(2).

A similar ioctl was also proposed a while back, see
	http://marc.info/?l=linux-fsdevel&amp;m=127970513829285&amp;w=2

Signed-off-by: Sage Weil &lt;sage@newdream.net&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip</title>
<updated>2011-03-16T01:53:35Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2011-03-16T01:53:35Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=420c1c572d4ceaa2f37b6311b7017ac6cf049fe2'/>
<id>urn:sha1:420c1c572d4ceaa2f37b6311b7017ac6cf049fe2</id>
<content type='text'>
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (62 commits)
  posix-clocks: Check write permissions in posix syscalls
  hrtimer: Remove empty hrtimer_init_hres_timer()
  hrtimer: Update hrtimer-&gt;state documentation
  hrtimer: Update base[CLOCK_BOOTTIME].offset correctly
  timers: Export CLOCK_BOOTTIME via the posix timers interface
  timers: Add CLOCK_BOOTTIME hrtimer base
  time: Extend get_xtime_and_monotonic_offset() to also return sleep
  time: Introduce get_monotonic_boottime and ktime_get_boottime
  hrtimers: extend hrtimer base code to handle more then 2 clockids
  ntp: Remove redundant and incorrect parameter check
  mn10300: Switch do_timer() to xtimer_update()
  posix clocks: Introduce dynamic clocks
  posix-timers: Cleanup namespace
  posix-timers: Add support for fd based clocks
  x86: Add clock_adjtime for x86
  posix-timers: Introduce a syscall for clock tuning.
  time: Splitout compat timex accessors
  ntp: Add ADJ_SETOFFSET mode bit
  time: Introduce timekeeping_inject_offset
  posix-timer: Update comment
  ...

Fix up new system-call-related conflicts in
	arch/x86/ia32/ia32entry.S
	arch/x86/include/asm/unistd_32.h
	arch/x86/include/asm/unistd_64.h
	arch/x86/kernel/syscall_table_32.S
(name_to_handle_at()/open_by_handle_at() vs clock_adjtime()), and some
due to movement of get_jiffies_64() in:
	kernel/time.c
</content>
</entry>
<entry>
<title>x86: Add new syscalls for x86_32</title>
<updated>2011-03-15T06:21:44Z</updated>
<author>
<name>Aneesh Kumar K.V</name>
<email>aneesh.kumar@linux.vnet.ibm.com</email>
</author>
<published>2011-01-29T13:13:35Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7dadb755b082c259f7dd4a95a3a6eb21646a28d5'/>
<id>urn:sha1:7dadb755b082c259f7dd4a95a3a6eb21646a28d5</id>
<content type='text'>
This patch adds new syscalls to x86_32

Signed-off-by: Aneesh Kumar K.V &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>x86: Add clock_adjtime for x86</title>
<updated>2011-02-02T14:28:19Z</updated>
<author>
<name>Richard Cochran</name>
<email>richard.cochran@omicron.at</email>
</author>
<published>2011-02-01T13:52:30Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ce26efdefa5e8f22d933df72d7f7482725091d6d'/>
<id>urn:sha1:ce26efdefa5e8f22d933df72d7f7482725091d6d</id>
<content type='text'>
This patch adds the clock_adjtime system call to the x86 architecture.

Signed-off-by: Richard Cochran &lt;richard.cochran@omicron.at&gt;
Acked-by: John Stultz &lt;johnstul@us.ibm.com&gt;
LKML-Reference: &lt;20110201134419.968905083@linutronix.de&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
</content>
</entry>
<entry>
<title>x86: fix up system call numbering nit</title>
<updated>2010-08-10T22:35:10Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2010-08-10T22:35:10Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8cbd84f2dd4e52a8771b191030c374ba3e56d291'/>
<id>urn:sha1:8cbd84f2dd4e52a8771b191030c374ba3e56d291</id>
<content type='text'>
As pointed out by Jiri Slaby: when I resolved the the 32-bit x85 system
call entry tables for prlimit (due to the conflict with fanotify), I
forgot to add the numbering in comments that we do for every fifth entry.

Reported-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge branch 'writable_limits' of git://decibel.fi.muni.cz/~xslaby/linux</title>
<updated>2010-08-10T19:07:51Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2010-08-10T19:07:51Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b34d8915c413acb51d837a45fb8747b61f65c020'/>
<id>urn:sha1:b34d8915c413acb51d837a45fb8747b61f65c020</id>
<content type='text'>
* 'writable_limits' of git://decibel.fi.muni.cz/~xslaby/linux:
  unistd: add __NR_prlimit64 syscall numbers
  rlimits: implement prlimit64 syscall
  rlimits: switch more rlimit syscalls to do_prlimit
  rlimits: redo do_setrlimit to more generic do_prlimit
  rlimits: add rlimit64 structure
  rlimits: do security check under task_lock
  rlimits: allow setrlimit to non-current tasks
  rlimits: split sys_setrlimit
  rlimits: selinux, do rlimits changes under task_lock
  rlimits: make sure -&gt;rlim_max never grows in sys_setrlimit
  rlimits: add task_struct to update_rlimit_cpu
  rlimits: security, add task_struct to setrlimit

Fix up various system call number conflicts.  We not only added fanotify
system calls in the meantime, but asm-generic/unistd.h added a wait4
along with a range of reserved per-architecture system calls.
</content>
</entry>
</feed>
