<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/sys.c, branch v5.8.10</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.8.10</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.8.10'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2020-06-14T18:39:31Z</updated>
<entry>
<title>Merge tag 'LSM-add-setgid-hook-5.8-author-fix' of git://github.com/micah-morton/linux</title>
<updated>2020-06-14T18:39:31Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2020-06-14T18:39:31Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4a87b197c1da6b16608d5110709e0b3308e25dcd'/>
<id>urn:sha1:4a87b197c1da6b16608d5110709e0b3308e25dcd</id>
<content type='text'>
Pull SafeSetID update from Micah Morton:
 "Add additional LSM hooks for SafeSetID

  SafeSetID is capable of making allow/deny decisions for set*uid calls
  on a system, and we want to add similar functionality for set*gid
  calls.

  The work to do that is not yet complete, so probably won't make it in
  for v5.8, but we are looking to get this simple patch in for v5.8
  since we have it ready.

  We are planning on the rest of the work for extending the SafeSetID
  LSM being merged during the v5.9 merge window"

* tag 'LSM-add-setgid-hook-5.8-author-fix' of git://github.com/micah-morton/linux:
  security: Add LSM hooks to set*gid syscalls
</content>
</entry>
<entry>
<title>security: Add LSM hooks to set*gid syscalls</title>
<updated>2020-06-14T17:52:02Z</updated>
<author>
<name>Thomas Cedeno</name>
<email>thomascedeno@google.com</email>
</author>
<published>2020-06-09T17:22:13Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=39030e1351aa1aa7443bb2da24426573077c83da'/>
<id>urn:sha1:39030e1351aa1aa7443bb2da24426573077c83da</id>
<content type='text'>
The SafeSetID LSM uses the security_task_fix_setuid hook to filter
set*uid() syscalls according to its configured security policy. In
preparation for adding analagous support in the LSM for set*gid()
syscalls, we add the requisite hook here. Tested by putting print
statements in the security_task_fix_setgid hook and seeing them get hit
during kernel boot.

Signed-off-by: Thomas Cedeno &lt;thomascedeno@google.com&gt;
Signed-off-by: Micah Morton &lt;mortonm@chromium.org&gt;
</content>
</entry>
<entry>
<title>mmap locking API: convert mmap_sem comments</title>
<updated>2020-06-09T16:39:14Z</updated>
<author>
<name>Michel Lespinasse</name>
<email>walken@google.com</email>
</author>
<published>2020-06-09T04:33:54Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c1e8d7c6a7a682e1405e3e242d32fc377fd196ff'/>
<id>urn:sha1:c1e8d7c6a7a682e1405e3e242d32fc377fd196ff</id>
<content type='text'>
Convert comments that reference mmap_sem to reference mmap_lock instead.

[akpm@linux-foundation.org: fix up linux-next leftovers]
[akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil]
[akpm@linux-foundation.org: more linux-next fixups, per Michel]

Signed-off-by: Michel Lespinasse &lt;walken@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Reviewed-by: Daniel Jordan &lt;daniel.m.jordan@oracle.com&gt;
Cc: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Jason Gunthorpe &lt;jgg@ziepe.ca&gt;
Cc: Jerome Glisse &lt;jglisse@redhat.com&gt;
Cc: John Hubbard &lt;jhubbard@nvidia.com&gt;
Cc: Laurent Dufour &lt;ldufour@linux.ibm.com&gt;
Cc: Liam Howlett &lt;Liam.Howlett@oracle.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ying Han &lt;yinghan@google.com&gt;
Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mmap locking API: use coccinelle to convert mmap_sem rwsem call sites</title>
<updated>2020-06-09T16:39:14Z</updated>
<author>
<name>Michel Lespinasse</name>
<email>walken@google.com</email>
</author>
<published>2020-06-09T04:33:25Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d8ed45c5dcd455fc5848d47f86883a1b872ac0d0'/>
<id>urn:sha1:d8ed45c5dcd455fc5848d47f86883a1b872ac0d0</id>
<content type='text'>
This change converts the existing mmap_sem rwsem calls to use the new mmap
locking API instead.

The change is generated using coccinelle with the following rule:

// spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir .

@@
expression mm;
@@
(
-init_rwsem
+mmap_init_lock
|
-down_write
+mmap_write_lock
|
-down_write_killable
+mmap_write_lock_killable
|
-down_write_trylock
+mmap_write_trylock
|
-up_write
+mmap_write_unlock
|
-downgrade_write
+mmap_write_downgrade
|
-down_read
+mmap_read_lock
|
-down_read_killable
+mmap_read_lock_killable
|
-down_read_trylock
+mmap_read_trylock
|
-up_read
+mmap_read_unlock
)
-(&amp;mm-&gt;mmap_sem)
+(mm)

Signed-off-by: Michel Lespinasse &lt;walken@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Daniel Jordan &lt;daniel.m.jordan@oracle.com&gt;
Reviewed-by: Laurent Dufour &lt;ldufour@linux.ibm.com&gt;
Reviewed-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Jason Gunthorpe &lt;jgg@ziepe.ca&gt;
Cc: Jerome Glisse &lt;jglisse@redhat.com&gt;
Cc: John Hubbard &lt;jhubbard@nvidia.com&gt;
Cc: Liam Howlett &lt;Liam.Howlett@oracle.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ying Han &lt;yinghan@google.com&gt;
Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge branch 'akpm' (patches from Andrew)</title>
<updated>2020-06-02T19:21:36Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2020-06-02T19:21:36Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=94709049fb8442fb2f7b91fbec3c2897a75e18df'/>
<id>urn:sha1:94709049fb8442fb2f7b91fbec3c2897a75e18df</id>
<content type='text'>
Merge updates from Andrew Morton:
 "A few little subsystems and a start of a lot of MM patches.

  Subsystems affected by this patch series: squashfs, ocfs2, parisc,
  vfs. With mm subsystems: slab-generic, slub, debug, pagecache, gup,
  swap, memcg, pagemap, memory-failure, vmalloc, kasan"

* emailed patches from Andrew Morton &lt;akpm@linux-foundation.org&gt;: (128 commits)
  kasan: move kasan_report() into report.c
  mm/mm_init.c: report kasan-tag information stored in page-&gt;flags
  ubsan: entirely disable alignment checks under UBSAN_TRAP
  kasan: fix clang compilation warning due to stack protector
  x86/mm: remove vmalloc faulting
  mm: remove vmalloc_sync_(un)mappings()
  x86/mm/32: implement arch_sync_kernel_mappings()
  x86/mm/64: implement arch_sync_kernel_mappings()
  mm/ioremap: track which page-table levels were modified
  mm/vmalloc: track which page-table levels were modified
  mm: add functions to track page directory modifications
  s390: use __vmalloc_node in stack_alloc
  powerpc: use __vmalloc_node in alloc_vm_stack
  arm64: use __vmalloc_node in arch_alloc_vmap_stack
  mm: remove vmalloc_user_node_flags
  mm: switch the test_vmalloc module to use __vmalloc_node
  mm: remove __vmalloc_node_flags_caller
  mm: remove both instances of __vmalloc_node_flags
  mm: remove the prot argument to __vmalloc_node
  mm: remove the pgprot argument to __vmalloc
  ...
</content>
</entry>
<entry>
<title>mm/writeback: replace PF_LESS_THROTTLE with PF_LOCAL_THROTTLE</title>
<updated>2020-06-02T17:59:08Z</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2020-06-02T04:48:18Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a37b0715ddf3007734c4e2424c14bc7efcdd1190'/>
<id>urn:sha1:a37b0715ddf3007734c4e2424c14bc7efcdd1190</id>
<content type='text'>
PF_LESS_THROTTLE exists for loop-back nfsd (and a similar need in the
loop block driver and callers of prctl(PR_SET_IO_FLUSHER)), where a
daemon needs to write to one bdi (the final bdi) in order to free up
writes queued to another bdi (the client bdi).

The daemon sets PF_LESS_THROTTLE and gets a larger allowance of dirty
pages, so that it can still dirty pages after other processses have been
throttled.  The purpose of this is to avoid deadlock that happen when
the PF_LESS_THROTTLE process must write for any dirty pages to be freed,
but it is being thottled and cannot write.

This approach was designed when all threads were blocked equally,
independently on which device they were writing to, or how fast it was.
Since that time the writeback algorithm has changed substantially with
different threads getting different allowances based on non-trivial
heuristics.  This means the simple "add 25%" heuristic is no longer
reliable.

The important issue is not that the daemon needs a *larger* dirty page
allowance, but that it needs a *private* dirty page allowance, so that
dirty pages for the "client" bdi that it is helping to clear (the bdi
for an NFS filesystem or loop block device etc) do not affect the
throttling of the daemon writing to the "final" bdi.

This patch changes the heuristic so that the task is not throttled when
the bdi it is writing to has a dirty page count below below (or equal
to) the free-run threshold for that bdi.  This ensures it will always be
able to have some pages in flight, and so will not deadlock.

In a steady-state, it is expected that PF_LOCAL_THROTTLE tasks might
still be throttled by global threshold, but that is acceptable as it is
only the deadlock state that is interesting for this flag.

This approach of "only throttle when target bdi is busy" is consistent
with the other use of PF_LESS_THROTTLE in current_may_throttle(), were
it causes attention to be focussed only on the target bdi.

So this patch
 - renames PF_LESS_THROTTLE to PF_LOCAL_THROTTLE,
 - removes the 25% bonus that that flag gives, and
 - If PF_LOCAL_THROTTLE is set, don't delay at all unless the
   global and the local free-run thresholds are exceeded.

Note that previously realtime threads were treated the same as
PF_LESS_THROTTLE threads.  This patch does *not* change the behvaiour
for real-time threads, so it is now different from the behaviour of nfsd
and loop tasks.  I don't know what is wanted for realtime.

[akpm@linux-foundation.org: coding style fixes]
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Acked-by: Chuck Lever &lt;chuck.lever@oracle.com&gt;	[nfsd]
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Trond Myklebust &lt;trond.myklebust@hammerspace.com&gt;
Link: http://lkml.kernel.org/r/87ftbf7gs3.fsf@notabene.neil.brown.name
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>compat sysinfo(2): don't bother with field-by-field copyout</title>
<updated>2020-04-25T22:06:05Z</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2020-02-19T03:43:15Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ce5155c4f8226f4816f9f7cb88149ec275c7b0a8'/>
<id>urn:sha1:ce5155c4f8226f4816f9f7cb88149ec275c7b0a8</id>
<content type='text'>
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>sys/sysinfo: Respect boottime inside time namespace</title>
<updated>2020-03-03T18:34:32Z</updated>
<author>
<name>Cyril Hrubis</name>
<email>chrubis@suse.cz</email>
</author>
<published>2020-03-03T15:06:38Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ecc421e05bab97cf3ff4fe456ade47ef84dba8c2'/>
<id>urn:sha1:ecc421e05bab97cf3ff4fe456ade47ef84dba8c2</id>
<content type='text'>
The sysinfo() syscall includes uptime in seconds but has no correction for
time namespaces which makes it inconsistent with the /proc/uptime inside of
a time namespace.

Add the missing time namespace adjustment call.

Signed-off-by: Cyril Hrubis &lt;chrubis@suse.cz&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Dmitry Safonov &lt;dima@arista.com&gt;
Link: https://lkml.kernel.org/r/20200303150638.7329-1-chrubis@suse.cz

</content>
</entry>
<entry>
<title>prctl: PR_{G,S}ET_IO_FLUSHER to support controlling memory reclaim</title>
<updated>2020-01-28T09:09:51Z</updated>
<author>
<name>Mike Christie</name>
<email>mchristi@redhat.com</email>
</author>
<published>2019-11-12T00:19:00Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8d19f1c8e1937baf74e1962aae9f90fa3aeab463'/>
<id>urn:sha1:8d19f1c8e1937baf74e1962aae9f90fa3aeab463</id>
<content type='text'>
There are several storage drivers like dm-multipath, iscsi, tcmu-runner,
amd nbd that have userspace components that can run in the IO path. For
example, iscsi and nbd's userspace deamons may need to recreate a socket
and/or send IO on it, and dm-multipath's daemon multipathd may need to
send SG IO or read/write IO to figure out the state of paths and re-set
them up.

In the kernel these drivers have access to GFP_NOIO/GFP_NOFS and the
memalloc_*_save/restore functions to control the allocation behavior,
but for userspace we would end up hitting an allocation that ended up
writing data back to the same device we are trying to allocate for.
The device is then in a state of deadlock, because to execute IO the
device needs to allocate memory, but to allocate memory the memory
layers want execute IO to the device.

Here is an example with nbd using a local userspace daemon that performs
network IO to a remote server. We are using XFS on top of the nbd device,
but it can happen with any FS or other modules layered on top of the nbd
device that can write out data to free memory.  Here a nbd daemon helper
thread, msgr-worker-1, is performing a write/sendmsg on a socket to execute
a request. This kicks off a reclaim operation which results in a WRITE to
the nbd device and the nbd thread calling back into the mm layer.

[ 1626.609191] msgr-worker-1   D    0  1026      1 0x00004000
[ 1626.609193] Call Trace:
[ 1626.609195]  ? __schedule+0x29b/0x630
[ 1626.609197]  ? wait_for_completion+0xe0/0x170
[ 1626.609198]  schedule+0x30/0xb0
[ 1626.609200]  schedule_timeout+0x1f6/0x2f0
[ 1626.609202]  ? blk_finish_plug+0x21/0x2e
[ 1626.609204]  ? _xfs_buf_ioapply+0x2e6/0x410
[ 1626.609206]  ? wait_for_completion+0xe0/0x170
[ 1626.609208]  wait_for_completion+0x108/0x170
[ 1626.609210]  ? wake_up_q+0x70/0x70
[ 1626.609212]  ? __xfs_buf_submit+0x12e/0x250
[ 1626.609214]  ? xfs_bwrite+0x25/0x60
[ 1626.609215]  xfs_buf_iowait+0x22/0xf0
[ 1626.609218]  __xfs_buf_submit+0x12e/0x250
[ 1626.609220]  xfs_bwrite+0x25/0x60
[ 1626.609222]  xfs_reclaim_inode+0x2e8/0x310
[ 1626.609224]  xfs_reclaim_inodes_ag+0x1b6/0x300
[ 1626.609227]  xfs_reclaim_inodes_nr+0x31/0x40
[ 1626.609228]  super_cache_scan+0x152/0x1a0
[ 1626.609231]  do_shrink_slab+0x12c/0x2d0
[ 1626.609233]  shrink_slab+0x9c/0x2a0
[ 1626.609235]  shrink_node+0xd7/0x470
[ 1626.609237]  do_try_to_free_pages+0xbf/0x380
[ 1626.609240]  try_to_free_pages+0xd9/0x1f0
[ 1626.609245]  __alloc_pages_slowpath+0x3a4/0xd30
[ 1626.609251]  ? ___slab_alloc+0x238/0x560
[ 1626.609254]  __alloc_pages_nodemask+0x30c/0x350
[ 1626.609259]  skb_page_frag_refill+0x97/0xd0
[ 1626.609274]  sk_page_frag_refill+0x1d/0x80
[ 1626.609279]  tcp_sendmsg_locked+0x2bb/0xdd0
[ 1626.609304]  tcp_sendmsg+0x27/0x40
[ 1626.609307]  sock_sendmsg+0x54/0x60
[ 1626.609308]  ___sys_sendmsg+0x29f/0x320
[ 1626.609313]  ? sock_poll+0x66/0xb0
[ 1626.609318]  ? ep_item_poll.isra.15+0x40/0xc0
[ 1626.609320]  ? ep_send_events_proc+0xe6/0x230
[ 1626.609322]  ? hrtimer_try_to_cancel+0x54/0xf0
[ 1626.609324]  ? ep_read_events_proc+0xc0/0xc0
[ 1626.609326]  ? _raw_write_unlock_irq+0xa/0x20
[ 1626.609327]  ? ep_scan_ready_list.constprop.19+0x218/0x230
[ 1626.609329]  ? __hrtimer_init+0xb0/0xb0
[ 1626.609331]  ? _raw_spin_unlock_irq+0xa/0x20
[ 1626.609334]  ? ep_poll+0x26c/0x4a0
[ 1626.609337]  ? tcp_tsq_write.part.54+0xa0/0xa0
[ 1626.609339]  ? release_sock+0x43/0x90
[ 1626.609341]  ? _raw_spin_unlock_bh+0xa/0x20
[ 1626.609342]  __sys_sendmsg+0x47/0x80
[ 1626.609347]  do_syscall_64+0x5f/0x1c0
[ 1626.609349]  ? prepare_exit_to_usermode+0x75/0xa0
[ 1626.609351]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

This patch adds a new prctl command that daemons can use after they have
done their initial setup, and before they start to do allocations that
are in the IO path. It sets the PF_MEMALLOC_NOIO and PF_LESS_THROTTLE
flags so both userspace block and FS threads can use it to avoid the
allocation recursion and try to prevent from being throttled while
writing out data to free up memory.

Signed-off-by: Mike Christie &lt;mchristi@redhat.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Tested-by: Masato Suzuki &lt;masato.suzuki@wdc.com&gt;
Reviewed-by: Damien Le Moal &lt;damien.lemoal@wdc.com&gt;
Reviewed-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Reviewed-by: Dave Chinner &lt;dchinner@redhat.com&gt;
Reviewed-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Link: https://lore.kernel.org/r/20191112001900.9206-1-mchristi@redhat.com
Signed-off-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
</content>
</entry>
<entry>
<title>kernel/sys.c: avoid copying possible padding bytes in copy_to_user</title>
<updated>2019-12-05T03:44:12Z</updated>
<author>
<name>Joe Perches</name>
<email>joe@perches.com</email>
</author>
<published>2019-12-05T00:50:53Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5e1aada08cd19ea652b2d32a250501d09b02ff2e'/>
<id>urn:sha1:5e1aada08cd19ea652b2d32a250501d09b02ff2e</id>
<content type='text'>
Initialization is not guaranteed to zero padding bytes so use an
explicit memset instead to avoid leaking any kernel content in any
possible padding bytes.

Link: http://lkml.kernel.org/r/dfa331c00881d61c8ee51577a082d8bebd61805c.camel@perches.com
Signed-off-by: Joe Perches &lt;joe@perches.com&gt;
Cc: Dan Carpenter &lt;error27@gmail.com&gt;
Cc: Julia Lawall &lt;julia.lawall@lip6.fr&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
