<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/linux/backing-dev-defs.h, branch v5.8.5</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.8.5</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.8.5'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2020-05-09T22:15:13Z</updated>
<entry>
<title>bdi: remove the name field in struct backing_dev_info</title>
<updated>2020-05-09T22:15:13Z</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2020-05-04T12:48:01Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1cd925d583857ee3ead6cfbf1e4b1cd067d28591'/>
<id>urn:sha1:1cd925d583857ee3ead6cfbf1e4b1cd067d28591</id>
<content type='text'>
The name is only printed for a not registered bdi in writeback.  Use the
device name there as is more useful anyway for the unlike case that the
warning triggers.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Reviewed-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>bdi: add a -&gt;dev_name field to struct backing_dev_info</title>
<updated>2020-05-09T22:07:57Z</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2020-05-04T12:47:56Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=6bd87eec23cbc9ed222bed0f5b5b02bf300e9a8d'/>
<id>urn:sha1:6bd87eec23cbc9ed222bed0f5b5b02bf300e9a8d</id>
<content type='text'>
Cache a copy of the name for the life time of the backing_dev_info
structure so that we can reference it even after unregistering.

Fixes: 68f23b89067f ("memcg: fix a crash in wb_workfn when a device disappears")
Reported-by: Yufen Yu &lt;yuyufen@huawei.com&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>buffer: remove useless comment and WB_REASON_FREE_MORE_MEM, reason.</title>
<updated>2020-04-18T03:38:11Z</updated>
<author>
<name>Zhiqiang Liu</name>
<email>liuzhiqiang26@huawei.com</email>
</author>
<published>2020-04-13T05:12:10Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c4b4c2a78a9fc0c532c58504e8cb5441224ff1d9'/>
<id>urn:sha1:c4b4c2a78a9fc0c532c58504e8cb5441224ff1d9</id>
<content type='text'>
free_more_memory func has been completely removed in commit bc48f001de12
("buffer: eliminate the need to call free_more_memory() in __getblk_slow()")

So comment and `WB_REASON_FREE_MORE_MEM` reason about free_more_memory
are no longer needed.

Fixes: bc48f001de12 ("buffer: eliminate the need to call free_more_memory() in __getblk_slow()")
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Zhiqiang Liu &lt;liuzhiqiang26@huawei.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>writeback, memcg: Implement foreign dirty flushing</title>
<updated>2019-08-27T15:22:38Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2019-08-26T16:06:56Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=97b27821b4854ca744946dae32a3f2fd55bcd5bc'/>
<id>urn:sha1:97b27821b4854ca744946dae32a3f2fd55bcd5bc</id>
<content type='text'>
There's an inherent mismatch between memcg and writeback.  The former
trackes ownership per-page while the latter per-inode.  This was a
deliberate design decision because honoring per-page ownership in the
writeback path is complicated, may lead to higher CPU and IO overheads
and deemed unnecessary given that write-sharing an inode across
different cgroups isn't a common use-case.

Combined with inode majority-writer ownership switching, this works
well enough in most cases but there are some pathological cases.  For
example, let's say there are two cgroups A and B which keep writing to
different but confined parts of the same inode.  B owns the inode and
A's memory is limited far below B's.  A's dirty ratio can rise enough
to trigger balance_dirty_pages() sleeps but B's can be low enough to
avoid triggering background writeback.  A will be slowed down without
a way to make writeback of the dirty pages happen.

This patch implements foreign dirty recording and foreign mechanism so
that when a memcg encounters a condition as above it can trigger
flushes on bdi_writebacks which can clean its pages.  Please see the
comment on top of mem_cgroup_track_foreign_dirty_slowpath() for
details.

A reproducer follows.

write-range.c::

  #include &lt;stdio.h&gt;
  #include &lt;stdlib.h&gt;
  #include &lt;unistd.h&gt;
  #include &lt;fcntl.h&gt;
  #include &lt;sys/types.h&gt;

  static const char *usage = "write-range FILE START SIZE\n";

  int main(int argc, char **argv)
  {
	  int fd;
	  unsigned long start, size, end, pos;
	  char *endp;
	  char buf[4096];

	  if (argc &lt; 4) {
		  fprintf(stderr, usage);
		  return 1;
	  }

	  fd = open(argv[1], O_WRONLY);
	  if (fd &lt; 0) {
		  perror("open");
		  return 1;
	  }

	  start = strtoul(argv[2], &amp;endp, 0);
	  if (*endp != '\0') {
		  fprintf(stderr, usage);
		  return 1;
	  }

	  size = strtoul(argv[3], &amp;endp, 0);
	  if (*endp != '\0') {
		  fprintf(stderr, usage);
		  return 1;
	  }

	  end = start + size;

	  while (1) {
		  for (pos = start; pos &lt; end; ) {
			  long bread, bwritten = 0;

			  if (lseek(fd, pos, SEEK_SET) &lt; 0) {
				  perror("lseek");
				  return 1;
			  }

			  bread = read(0, buf, sizeof(buf) &lt; end - pos ?
					       sizeof(buf) : end - pos);
			  if (bread &lt; 0) {
				  perror("read");
				  return 1;
			  }
			  if (bread == 0)
				  return 0;

			  while (bwritten &lt; bread) {
				  long this;

				  this = write(fd, buf + bwritten,
					       bread - bwritten);
				  if (this &lt; 0) {
					  perror("write");
					  return 1;
				  }

				  bwritten += this;
				  pos += bwritten;
			  }
		  }
	  }
  }

repro.sh::

  #!/bin/bash

  set -e
  set -x

  sysctl -w vm.dirty_expire_centisecs=300000
  sysctl -w vm.dirty_writeback_centisecs=300000
  sysctl -w vm.dirtytime_expire_seconds=300000
  echo 3 &gt; /proc/sys/vm/drop_caches

  TEST=/sys/fs/cgroup/test
  A=$TEST/A
  B=$TEST/B

  mkdir -p $A $B
  echo "+memory +io" &gt; $TEST/cgroup.subtree_control
  echo $((1&lt;&lt;30)) &gt; $A/memory.high
  echo $((32&lt;&lt;30)) &gt; $B/memory.high

  rm -f testfile
  touch testfile
  fallocate -l 4G testfile

  echo "Starting B"

  (echo $BASHPID &gt; $B/cgroup.procs
   pv -q --rate-limit 70M &lt; /dev/urandom | ./write-range testfile $((2&lt;&lt;30)) $((2&lt;&lt;30))) &amp;

  echo "Waiting 10s to ensure B claims the testfile inode"
  sleep 5
  sync
  sleep 5
  sync
  echo "Starting A"

  (echo $BASHPID &gt; $A/cgroup.procs
   pv &lt; /dev/urandom | ./write-range testfile 0 $((2&lt;&lt;30)))

v2: Added comments explaining why the specific intervals are being used.

v3: Use 0 @nr when calling cgroup_writeback_by_id() to use best-effort
    flushing while avoding possible livelocks.

v4: Use get_jiffies_64() and time_before/after64() instead of raw
    jiffies_64 and arthimetic comparisons as suggested by Jan.

Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>bdi: Add bdi-&gt;id</title>
<updated>2019-08-27T15:22:38Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2019-08-26T16:06:53Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=34f8fe501f0624de115d087680c84000b5d9abc9'/>
<id>urn:sha1:34f8fe501f0624de115d087680c84000b5d9abc9</id>
<content type='text'>
There currently is no way to universally identify and lookup a bdi
without holding a reference and pointer to it.  This patch adds an
non-recycling bdi-&gt;id and implements bdi_get_by_id() which looks up
bdis by their ids.  This will be used by memcg foreign inode flushing.

I left bdi_list alone for simplicity and because while rb_tree does
support rcu assignment it doesn't seem to guarantee lossless walk when
walk is racing aginst tree rebalance operations.

Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>writeback: Generalize and expose wb_completion</title>
<updated>2019-08-27T15:22:38Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2019-08-26T16:06:52Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5b9cce4c7eb0696558dfd4946074ae1fb9d8f05d'/>
<id>urn:sha1:5b9cce4c7eb0696558dfd4946074ae1fb9d8f05d</id>
<content type='text'>
wb_completion is used to track writeback completions.  We want to use
it from memcg side for foreign inode flushes.  This patch updates it
to remember the target waitq instead of assuming bdi-&gt;wb_waitq and
expose it outside of fs-writeback.c.

Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>backing-dev: no need to check return value of debugfs_create functions</title>
<updated>2019-06-03T13:49:07Z</updated>
<author>
<name>Greg Kroah-Hartman</name>
<email>gregkh@linuxfoundation.org</email>
</author>
<published>2019-01-22T15:21:07Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2d146b924ec3c0873f06308d149684dc1105d9a3'/>
<id>urn:sha1:2d146b924ec3c0873f06308d149684dc1105d9a3</id>
<content type='text'>
When calling debugfs functions, there is no need to ever check the
return value.  The function can work or not, but the code logic should
never do something different based on this.

And as the return value does not matter at all, no need to save the
dentry in struct backing_dev_info, so delete it.

Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Anders Roxell &lt;anders.roxell@linaro.org&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: linux-mm@kvack.org
Reviewed-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>writeback: synchronize sync(2) against cgroup writeback membership switches</title>
<updated>2019-01-22T21:39:38Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2017-12-12T16:38:30Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7fc5854f8c6efae9e7624970ab49a1eac2faefb1'/>
<id>urn:sha1:7fc5854f8c6efae9e7624970ab49a1eac2faefb1</id>
<content type='text'>
sync_inodes_sb() can race against cgwb (cgroup writeback) membership
switches and fail to writeback some inodes.  For example, if an inode
switches to another wb while sync_inodes_sb() is in progress, the new
wb might not be visible to bdi_split_work_to_wbs() at all or the inode
might jump from a wb which hasn't issued writebacks yet to one which
already has.

This patch adds backing_dev_info-&gt;wb_switch_rwsem to synchronize cgwb
switch path against sync_inodes_sb() so that sync_inodes_sb() is
guaranteed to see all the target wbs and inodes can't jump wbs to
escape syncing.

v2: Fixed misplaced rwsem init.  Spotted by Jiufei.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: Jiufei Xue &lt;xuejiufei@gmail.com&gt;
Link: http://lkml.kernel.org/r/dc694ae2-f07f-61e1-7097-7c8411cee12d@gmail.com
Acked-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>writeback: don't decrement wb-&gt;refcnt if !wb-&gt;bdi</title>
<updated>2018-12-28T20:11:46Z</updated>
<author>
<name>Anders Roxell</name>
<email>anders.roxell@linaro.org</email>
</author>
<published>2018-12-28T08:33:31Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=368686a95e55fd66b88542b5b23d802a4886b1aa'/>
<id>urn:sha1:368686a95e55fd66b88542b5b23d802a4886b1aa</id>
<content type='text'>
This happened while running in qemu-system-aarch64, the AMBA PL011 UART
driver when enabling CONFIG_DEBUG_TEST_DRIVER_REMOVE.
arch_initcall(pl011_init) came before subsys_initcall(default_bdi_init),
devtmpfs' handle_remove() crashes because the reference count is a NULL
pointer only because wb-&gt;bdi hasn't been initialized yet.

Rework so that wb_put have an extra check if wb-&gt;bdi before decrement
wb-&gt;refcnt and also add a WARN_ON_ONCE to get a warning if it happens
again in other drivers.

Link: http://lkml.kernel.org/r/20181030113545.30999-2-anders.roxell@linaro.org
Fixes: 52ebea749aae ("writeback: make backing_dev_info host cgroup-specific bdi_writebacks")
Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Signed-off-by: Anders Roxell &lt;anders.roxell@linaro.org&gt;
Co-developed-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>bdi: use refcount_t for reference counting instead atomic_t</title>
<updated>2018-08-22T17:52:46Z</updated>
<author>
<name>Sebastian Andrzej Siewior</name>
<email>bigeasy@linutronix.de</email>
</author>
<published>2018-08-22T04:55:31Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e58dd0de5eadf145895b13451a1fef8ef03946eb'/>
<id>urn:sha1:e58dd0de5eadf145895b13451a1fef8ef03946eb</id>
<content type='text'>
refcount_t type and corresponding API should be used instead of atomic_t
when the variable is used as a reference counter.  This permits avoiding
accidental refcounter overflows that might lead to use-after-free
situations.

Link: http://lkml.kernel.org/r/20180703200141.28415-4-bigeasy@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Reviewed-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Suggested-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
