<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/linux/sysctl.h, branch v4.12.4</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.12.4</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.12.4'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2017-07-15T11:09:04Z</updated>
<entry>
<title>proc: Fix proc_sys_prune_dcache to hold a sb reference</title>
<updated>2017-07-15T11:09:04Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2017-07-06T13:41:06Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=dd236b3c9aa79d4ef8ce5700db2aa1727c98f8cf'/>
<id>urn:sha1:dd236b3c9aa79d4ef8ce5700db2aa1727c98f8cf</id>
<content type='text'>
commit 2fd1d2c4ceb2248a727696962cf3370dc9f5a0a4 upstream.

Andrei Vagin writes:
FYI: This bug has been reproduced on 4.11.7
&gt; BUG: Dentry ffff895a3dd01240{i=4e7c09a,n=lo}  still in use (1) [unmount of proc proc]
&gt; ------------[ cut here ]------------
&gt; WARNING: CPU: 1 PID: 13588 at fs/dcache.c:1445 umount_check+0x6e/0x80
&gt; CPU: 1 PID: 13588 Comm: kworker/1:1 Not tainted 4.11.7-200.fc25.x86_64 #1
&gt; Hardware name: CompuLab sbc-flt1/fitlet, BIOS SBCFLT_0.08.04 06/27/2015
&gt; Workqueue: events proc_cleanup_work
&gt; Call Trace:
&gt;  dump_stack+0x63/0x86
&gt;  __warn+0xcb/0xf0
&gt;  warn_slowpath_null+0x1d/0x20
&gt;  umount_check+0x6e/0x80
&gt;  d_walk+0xc6/0x270
&gt;  ? dentry_free+0x80/0x80
&gt;  do_one_tree+0x26/0x40
&gt;  shrink_dcache_for_umount+0x2d/0x90
&gt;  generic_shutdown_super+0x1f/0xf0
&gt;  kill_anon_super+0x12/0x20
&gt;  proc_kill_sb+0x40/0x50
&gt;  deactivate_locked_super+0x43/0x70
&gt;  deactivate_super+0x5a/0x60
&gt;  cleanup_mnt+0x3f/0x90
&gt;  mntput_no_expire+0x13b/0x190
&gt;  kern_unmount+0x3e/0x50
&gt;  pid_ns_release_proc+0x15/0x20
&gt;  proc_cleanup_work+0x15/0x20
&gt;  process_one_work+0x197/0x450
&gt;  worker_thread+0x4e/0x4a0
&gt;  kthread+0x109/0x140
&gt;  ? process_one_work+0x450/0x450
&gt;  ? kthread_park+0x90/0x90
&gt;  ret_from_fork+0x2c/0x40
&gt; ---[ end trace e1c109611e5d0b41 ]---
&gt; VFS: Busy inodes after unmount of proc. Self-destruct in 5 seconds.  Have a nice day...
&gt; BUG: unable to handle kernel NULL pointer dereference at           (null)
&gt; IP: _raw_spin_lock+0xc/0x30
&gt; PGD 0

Fix this by taking a reference to the super block in proc_sys_prune_dcache.

The superblock reference is the core of the fix however the sysctl_inodes
list is converted to a hlist so that hlist_del_init_rcu may be used.  This
allows proc_sys_prune_dache to remove inodes the sysctl_inodes list, while
not causing problems for proc_sys_evict_inode when if it later choses to
remove the inode from the sysctl_inodes list.  Removing inodes from the
sysctl_inodes list allows proc_sys_prune_dcache to have a progress
guarantee, while still being able to drop all locks.  The fact that
head-&gt;unregistering is set in start_unregistering ensures that no more
inodes will be added to the the sysctl_inodes list.

Previously the code did a dance where it delayed calling iput until the
next entry in the list was being considered to ensure the inode remained on
the sysctl_inodes list until the next entry was walked to.  The structure
of the loop in this patch does not need that so is much easier to
understand and maintain.

Reported-by: Andrei Vagin &lt;avagin@gmail.com&gt;
Tested-by: Andrei Vagin &lt;avagin@openvz.org&gt;
Fixes: ace0c791e6c3 ("proc/sysctl: Don't grab i_lock under sysctl_lock.")
Fixes: d6cffbbe9a7e ("proc/sysctl: prune stale dentries during unregistering")
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>sysctl: Remove dead register_sysctl_root</title>
<updated>2017-04-17T04:42:49Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2017-03-24T13:38:20Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b54807fa52ae21bdf6bad72b0f00fd400af412eb'/>
<id>urn:sha1:b54807fa52ae21bdf6bad72b0f00fd400af412eb</id>
<content type='text'>
The function no longer does anything.  The is only a single caller of
register_sysctl_root when semantically there should be two.  Remove
this function so that if someone decides this functionality is needed
again it will be obvious all of the callers of setup_sysctl_set need
to be audited and modified appropriately.

Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>proc/sysctl: prune stale dentries during unregistering</title>
<updated>2017-02-13T04:00:06Z</updated>
<author>
<name>Konstantin Khlebnikov</name>
<email>khlebnikov@yandex-team.ru</email>
</author>
<published>2017-02-10T07:35:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d6cffbbe9a7e51eb705182965a189457c17ba8a3'/>
<id>urn:sha1:d6cffbbe9a7e51eb705182965a189457c17ba8a3</id>
<content type='text'>
Currently unregistering sysctl table does not prune its dentries.
Stale dentries could slowdown sysctl operations significantly.

For example, command:

 # for i in {1..100000} ; do unshare -n -- sysctl -a &amp;&gt; /dev/null ; done
 creates a millions of stale denties around sysctls of loopback interface:

 # sysctl fs.dentry-state
 fs.dentry-state = 25812579  24724135        45      0       0       0

 All of them have matching names thus lookup have to scan though whole
 hash chain and call d_compare (proc_sys_compare) which checks them
 under system-wide spinlock (sysctl_lock).

 # time sysctl -a &gt; /dev/null
 real    1m12.806s
 user    0m0.016s
 sys     1m12.400s

Currently only memory reclaimer could remove this garbage.
But without significant memory pressure this never happens.

This patch collects sysctl inodes into list on sysctl table header and
prunes all their dentries once that table unregisters.

Konstantin Khlebnikov &lt;khlebnikov@yandex-team.ru&gt; writes:
&gt; On 10.02.2017 10:47, Al Viro wrote:
&gt;&gt; how about &gt;&gt; the matching stats *after* that patch?
&gt;
&gt; dcache size doesn't grow endlessly, so stats are fine
&gt;
&gt; # sysctl fs.dentry-state
&gt; fs.dentry-state = 92712	58376	45	0	0	0
&gt;
&gt; # time sysctl -a &amp;&gt;/dev/null
&gt;
&gt; real	0m0.013s
&gt; user	0m0.004s
&gt; sys	0m0.008s

Signed-off-by: Konstantin Khlebnikov &lt;khlebnikov@yandex-team.ru&gt;
Suggested-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace</title>
<updated>2016-10-06T16:52:23Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2016-10-06T16:52:23Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=14986a34e1289424811443a524cdd9e1688c7913'/>
<id>urn:sha1:14986a34e1289424811443a524cdd9e1688c7913</id>
<content type='text'>
Pull namespace updates from Eric Biederman:
 "This set of changes is a number of smaller things that have been
  overlooked in other development cycles focused on more fundamental
  change. The devpts changes are small things that were a distraction
  until we managed to kill off DEVPTS_MULTPLE_INSTANCES. There is an
  trivial regression fix to autofs for the unprivileged mount changes
  that went in last cycle. A pair of ioctls has been added by Andrey
  Vagin making it is possible to discover the relationships between
  namespaces when referring to them through file descriptors.

  The big user visible change is starting to add simple resource limits
  to catch programs that misbehave. With namespaces in general and user
  namespaces in particular allowing users to use more kinds of
  resources, it has become important to have something to limit errant
  programs. Because the purpose of these limits is to catch errant
  programs the code needs to be inexpensive to use as it always on, and
  the default limits need to be high enough that well behaved programs
  on well behaved systems don't encounter them.

  To this end, after some review I have implemented per user per user
  namespace limits, and use them to limit the number of namespaces. The
  limits being per user mean that one user can not exhause the limits of
  another user. The limits being per user namespace allow contexts where
  the limit is 0 and security conscious folks can remove from their
  threat anlysis the code used to manage namespaces (as they have
  historically done as it root only). At the same time the limits being
  per user namespace allow other parts of the system to use namespaces.

  Namespaces are increasingly being used in application sand boxing
  scenarios so an all or nothing disable for the entire system for the
  security conscious folks makes increasing use of these sandboxes
  impossible.

  There is also added a limit on the maximum number of mounts present in
  a single mount namespace. It is nontrivial to guess what a reasonable
  system wide limit on the number of mount structure in the kernel would
  be, especially as it various based on how a system is using
  containers. A limit on the number of mounts in a mount namespace
  however is much easier to understand and set. In most cases in
  practice only about 1000 mounts are used. Given that some autofs
  scenarious have the potential to be 30,000 to 50,000 mounts I have set
  the default limit for the number of mounts at 100,000 which is well
  above every known set of users but low enough that the mount hash
  tables don't degrade unreaonsably.

  These limits are a start. I expect this estabilishes a pattern that
  other limits for resources that namespaces use will follow. There has
  been interest in making inotify event limits per user per user
  namespace as well as interest expressed in making details about what
  is going on in the kernel more visible"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (28 commits)
  autofs:  Fix automounts by using current_real_cred()-&gt;uid
  mnt: Add a per mount namespace limit on the number of mounts
  netns: move {inc,dec}_net_namespaces into #ifdef
  nsfs: Simplify __ns_get_path
  tools/testing: add a test to check nsfs ioctl-s
  nsfs: add ioctl to get a parent namespace
  nsfs: add ioctl to get an owning user namespace for ns file descriptor
  kernel: add a helper to get an owning user namespace for a namespace
  devpts: Change the owner of /dev/pts/ptmx to the mounter of /dev/pts
  devpts: Remove sync_filesystems
  devpts: Make devpts_kill_sb safe if fsi is NULL
  devpts: Simplify devpts_mount by using mount_nodev
  devpts: Move the creation of /dev/pts/ptmx into fill_super
  devpts: Move parse_mount_options into fill_super
  userns: When the per user per user namespace limit is reached return ENOSPC
  userns; Document per user per user namespace limits.
  mntns: Add a limit on the number of mount namespaces.
  netns: Add a limit on the number of net namespaces
  cgroupns: Add a limit on the number of cgroup namespaces
  ipcns: Add a  limit on the number of ipc namespaces
  ...
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net</title>
<updated>2016-08-30T04:54:02Z</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2016-08-30T04:54:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=6abdd5f5935fff978f950561f3c5175eb34dad73'/>
<id>urn:sha1:6abdd5f5935fff978f950561f3c5175eb34dad73</id>
<content type='text'>
All three conflicts were cases of simple overlapping
changes.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>sysctl: handle error writing UINT_MAX to u32 fields</title>
<updated>2016-08-27T00:39:35Z</updated>
<author>
<name>Subash Abhinov Kasiviswanathan</name>
<email>subashab@codeaurora.org</email>
</author>
<published>2016-08-25T22:16:51Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e7d316a02f683864a12389f8808570e37fb90aa3'/>
<id>urn:sha1:e7d316a02f683864a12389f8808570e37fb90aa3</id>
<content type='text'>
We have scripts which write to certain fields on 3.18 kernels but this
seems to be failing on 4.4 kernels.  An entry which we write to here is
xfrm_aevent_rseqth which is u32.

  echo 4294967295  &gt; /proc/sys/net/core/xfrm_aevent_rseqth

Commit 230633d109e3 ("kernel/sysctl.c: detect overflows when converting
to int") prevented writing to sysctl entries when integer overflow
occurs.  However, this does not apply to unsigned integers.

Heinrich suggested that we introduce a new option to handle 64 bit
limits and set min as 0 and max as UINT_MAX.  This might not work as it
leads to issues similar to __do_proc_doulongvec_minmax.  Alternatively,
we would need to change the datatype of the entry to 64 bit.

  static int __do_proc_doulongvec_minmax(void *data, struct ctl_table
  {
      i = (unsigned long *) data;   //This cast is causing to read beyond the size of data (u32)
      vleft = table-&gt;maxlen / sizeof(unsigned long); //vleft is 0 because maxlen is sizeof(u32) which is lesser than sizeof(unsigned long) on x86_64.

Introduce a new proc handler proc_douintvec.  Individual proc entries
will need to be updated to use the new handler.

[akpm@linux-foundation.org: coding-style fixes]
Fixes: 230633d109e3 ("kernel/sysctl.c:detect overflows when converting to int")
Link: http://lkml.kernel.org/r/1471479806-5252-1-git-send-email-subashab@codeaurora.org
Signed-off-by: Subash Abhinov Kasiviswanathan &lt;subashab@codeaurora.org&gt;
Cc: Heinrich Schuchardt &lt;xypron.glpk@gmx.de&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>net: make net namespace sysctls belong to container's owner</title>
<updated>2016-08-15T04:08:58Z</updated>
<author>
<name>Dmitry Torokhov</name>
<email>dmitry.torokhov@gmail.com</email>
</author>
<published>2016-08-10T21:36:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e79c6a4fc923eed2bdd3b716e0f01414847db90a'/>
<id>urn:sha1:e79c6a4fc923eed2bdd3b716e0f01414847db90a</id>
<content type='text'>
If net namespace is attached to a user namespace let's make container's
root owner of sysctls affecting said network namespace instead of global
root.

This also allows us to clean up net_ctl_permissions() because we do not
need to fudge permissions anymore for the container's owner since it now
owns the objects in question.

Acked-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Dmitry Torokhov &lt;dmitry.torokhov@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>sysctl: Stop implicitly passing current into sysctl_table_root.lookup</title>
<updated>2016-08-08T14:17:16Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2016-07-16T20:22:55Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=13bcc6a2853435bb5dad368bcbaa9d2a5b9c0ac4'/>
<id>urn:sha1:13bcc6a2853435bb5dad368bcbaa9d2a5b9c0ac4</id>
<content type='text'>
Passing nsproxy into sysctl_table_root.lookup was a premature
optimization in attempt to avoid depending on current.  The
directory /proc/self/sys has not appeared and if and when
it does this code will need to be reviewed closely and reworked
anyway.  So remove the premature optimization.

Acked-by: Kees Cook &lt;keescook@chromium.org&gt;
Acked-by: Serge Hallyn &lt;serge@hallyn.com&gt;
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>uapi: move forward declarations of internal structures</title>
<updated>2016-08-02T21:31:41Z</updated>
<author>
<name>Alexey Dobriyan</name>
<email>adobriyan@gmail.com</email>
</author>
<published>2016-08-02T21:03:36Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=db3f60012482756f46cc4d7d9ad7d793ae30360c'/>
<id>urn:sha1:db3f60012482756f46cc4d7d9ad7d793ae30360c</id>
<content type='text'>
Don't user forward declarations of internal kernel structures in headers
exported to userspace.

Move "struct completion;".
Move "struct task_struct;".

Link: http://lkml.kernel.org/r/20160713215808.GA22486@p183.telecom.by
Signed-off-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>sysctl: Allow creating permanently empty directories that serve as mountpoints.</title>
<updated>2015-07-01T15:36:39Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2015-05-10T03:09:14Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f9bd6733d3f11e24f3949becf277507d422ee1eb'/>
<id>urn:sha1:f9bd6733d3f11e24f3949becf277507d422ee1eb</id>
<content type='text'>
Add a magic sysctl table sysctl_mount_point that when used to
create a directory forces that directory to be permanently empty.

Update the code to use make_empty_dir_inode when accessing permanently
empty directories.

Update the code to not allow adding to permanently empty directories.

Update /proc/sys/fs/binfmt_misc to be a permanently empty directory.

Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
</content>
</entry>
</feed>
