<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/linux/writeback.h, branch v3.4.40</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.4.40</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.4.40'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2012-03-07T15:08:46Z</updated>
<entry>
<title>writeback: fix typo in the writeback_control comment</title>
<updated>2012-03-07T15:08:46Z</updated>
<author>
<name>Andrew Morton</name>
<email>akpm@linux-foundation.org</email>
</author>
<published>2012-03-05T23:06:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=95468fd41a8930743180ea3560f4c601d4174275'/>
<id>urn:sha1:95468fd41a8930743180ea3560f4c601d4174275</id>
<content type='text'>
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Fengguang Wu &lt;fengguang.wu@intel.com&gt;
Signed-off-by: Jiri Kosina &lt;jkosina@suse.cz&gt;
</content>
</entry>
<entry>
<title>Merge branch 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux</title>
<updated>2012-01-11T00:59:59Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2012-01-11T00:59:59Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=001a541ea9163ace5e8243ee0e907ad80a4c0ec2'/>
<id>urn:sha1:001a541ea9163ace5e8243ee0e907ad80a4c0ec2</id>
<content type='text'>
* 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
  writeback: move MIN_WRITEBACK_PAGES to fs-writeback.c
  writeback: balanced_rate cannot exceed write bandwidth
  writeback: do strict bdi dirty_exceeded
  writeback: avoid tiny dirty poll intervals
  writeback: max, min and target dirty pause time
  writeback: dirty ratelimit - think time compensation
  btrfs: fix dirtied pages accounting on sub-page writes
  writeback: fix dirtied pages accounting on redirty
  writeback: fix dirtied pages accounting on sub-page writes
  writeback: charge leaked page dirties to active tasks
  writeback: Include all dirty inodes in background writeback
</content>
</entry>
<entry>
<title>mm: try to distribute dirty pages fairly across zones</title>
<updated>2012-01-11T00:30:43Z</updated>
<author>
<name>Johannes Weiner</name>
<email>jweiner@redhat.com</email>
</author>
<published>2012-01-10T23:07:49Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a756cf5908530e8b40bdf569eb48b40139e8d7fd'/>
<id>urn:sha1:a756cf5908530e8b40bdf569eb48b40139e8d7fd</id>
<content type='text'>
The maximum number of dirty pages that exist in the system at any time is
determined by a number of pages considered dirtyable and a user-configured
percentage of those, or an absolute number in bytes.

This number of dirtyable pages is the sum of memory provided by all the
zones in the system minus their lowmem reserves and high watermarks, so
that the system can retain a healthy number of free pages without having
to reclaim dirty pages.

But there is a flaw in that we have a zoned page allocator which does not
care about the global state but rather the state of individual memory
zones.  And right now there is nothing that prevents one zone from filling
up with dirty pages while other zones are spared, which frequently leads
to situations where kswapd, in order to restore the watermark of free
pages, does indeed have to write pages from that zone's LRU list.  This
can interfere so badly with IO from the flusher threads that major
filesystems (btrfs, xfs, ext4) mostly ignore write requests from reclaim
already, taking away the VM's only possibility to keep such a zone
balanced, aside from hoping the flushers will soon clean pages from that
zone.

Enter per-zone dirty limits.  They are to a zone's dirtyable memory what
the global limit is to the global amount of dirtyable memory, and try to
make sure that no single zone receives more than its fair share of the
globally allowed dirty pages in the first place.  As the number of pages
considered dirtyable excludes the zones' lowmem reserves and high
watermarks, the maximum number of dirty pages in a zone is such that the
zone can always be balanced without requiring page cleaning.

As this is a placement decision in the page allocator and pages are
dirtied only after the allocation, this patch allows allocators to pass
__GFP_WRITE when they know in advance that the page will be written to and
become dirty soon.  The page allocator will then attempt to allocate from
the first zone of the zonelist - which on NUMA is determined by the task's
NUMA memory policy - that has not exceeded its dirty limit.

At first glance, it would appear that the diversion to lower zones can
increase pressure on them, but this is not the case.  With a full high
zone, allocations will be diverted to lower zones eventually, so it is
more of a shift in timing of the lower zone allocations.  Workloads that
previously could fit their dirty pages completely in the higher zone may
be forced to allocate from lower zones, but the amount of pages that
"spill over" are limited themselves by the lower zones' dirty constraints,
and thus unlikely to become a problem.

For now, the problem of unfair dirty page distribution remains for NUMA
configurations where the zones allowed for allocation are in sum not big
enough to trigger the global dirty limits, wake up the flusher threads and
remedy the situation.  Because of this, an allocation that could not
succeed on any of the considered zones is allowed to ignore the dirty
limits before going into direct reclaim or even failing the allocation,
until a future patch changes the global dirty throttling and flusher
thread activation so that they take individual zone states into account.

			Test results

15M DMA + 3246M DMA32 + 504 Normal = 3765M memory
40% dirty ratio
16G USB thumb drive
10 runs of dd if=/dev/zero of=disk/zeroes bs=32k count=$((10 &lt;&lt; 15))

		seconds			nr_vmscan_write
		        (stddev)	       min|     median|        max
xfs
vanilla:	 549.747( 3.492)	     0.000|      0.000|      0.000
patched:	 550.996( 3.802)	     0.000|      0.000|      0.000

fuse-ntfs
vanilla:	1183.094(53.178)	 54349.000|  59341.000|  65163.000
patched:	 558.049(17.914)	     0.000|      0.000|     43.000

btrfs
vanilla:	 573.679(14.015)	156657.000| 460178.000| 606926.000
patched:	 563.365(11.368)	     0.000|      0.000|   1362.000

ext4
vanilla:	 561.197(15.782)	     0.000|2725438.000|4143837.000
patched:	 568.806(17.496)	     0.000|      0.000|      0.000

Signed-off-by: Johannes Weiner &lt;jweiner@redhat.com&gt;
Reviewed-by: Minchan Kim &lt;minchan.kim@gmail.com&gt;
Acked-by: Mel Gorman &lt;mgorman@suse.de&gt;
Reviewed-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Tested-by: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Shaohua Li &lt;shaohua.li@intel.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Chris Mason &lt;chris.mason@oracle.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/page-writeback.c: make determine_dirtyable_memory static again</title>
<updated>2012-01-11T00:30:41Z</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2012-01-10T23:06:57Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1edf223485c42c99655dcd001db1e46ad5e5d2d7'/>
<id>urn:sha1:1edf223485c42c99655dcd001db1e46ad5e5d2d7</id>
<content type='text'>
The tracing ring-buffer used this function briefly, but not anymore.
Make it local to the writeback code again.

Also, move the function so that no forward declaration needs to be
reintroduced.

Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Mel Gorman &lt;mgorman@suse.de&gt;
Reviewed-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>writeback: move MIN_WRITEBACK_PAGES to fs-writeback.c</title>
<updated>2012-01-08T02:35:19Z</updated>
<author>
<name>Wu Fengguang</name>
<email>fengguang.wu@intel.com</email>
</author>
<published>2012-01-08T02:41:55Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=bc31b86a5923fad5f3fbb6192f767f410241ba27'/>
<id>urn:sha1:bc31b86a5923fad5f3fbb6192f767f410241ba27</id>
<content type='text'>
Fix compile error

 fs/fs-writeback.c:515:33: error: ‘PAGE_CACHE_SHIFT’ undeclared (first use in this function)

Reported-by: Randy Dunlap &lt;rdunlap@xenotime.net&gt;
Acked-by: Randy Dunlap &lt;rdunlap@xenotime.net&gt;
Signed-off-by: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
</content>
</entry>
<entry>
<title>writeback: fix dirtied pages accounting on redirty</title>
<updated>2011-12-18T06:20:23Z</updated>
<author>
<name>Wu Fengguang</name>
<email>fengguang.wu@intel.com</email>
</author>
<published>2011-08-08T21:22:00Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2f800fbd777b792de54187088df19a7df0251254'/>
<id>urn:sha1:2f800fbd777b792de54187088df19a7df0251254</id>
<content type='text'>
De-account the accumulative dirty counters on page redirty.

Page redirties (very common in ext4) will introduce mismatch between
counters (a) and (b)

a) NR_DIRTIED, BDI_DIRTIED, tsk-&gt;nr_dirtied
b) NR_WRITTEN, BDI_WRITTEN

This will introduce systematic errors in balanced_rate and result in
dirty page position errors (ie. the dirty pages are no longer balanced
around the global/bdi setpoints).

Acked-by: Jan Kara &lt;jack@suse.cz&gt;
Acked-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Signed-off-by: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
</content>
</entry>
<entry>
<title>writeback: charge leaked page dirties to active tasks</title>
<updated>2011-12-18T06:20:20Z</updated>
<author>
<name>Wu Fengguang</name>
<email>fengguang.wu@intel.com</email>
</author>
<published>2011-04-05T19:21:19Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=54848d73f9f254631303d6eab9b976855988b266'/>
<id>urn:sha1:54848d73f9f254631303d6eab9b976855988b266</id>
<content type='text'>
It's a years long problem that a large number of short-lived dirtiers
(eg. gcc instances in a fast kernel build) may starve long-run dirtiers
(eg. dd) as well as pushing the dirty pages to the global hard limit.

The solution is to charge the pages dirtied by the exited gcc to the
other random dirtying tasks. It sounds not perfect, however should
behave good enough in practice, seeing as that throttled tasks aren't
actually running so those that are running are more likely to pick it up
and get throttled, therefore promoting an equal spread.

Randy: fix compile error: 'dirty_throttle_leaks' undeclared in exit.c

Acked-by: Jan Kara &lt;jack@suse.cz&gt;
Acked-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Signed-off-by: Randy Dunlap &lt;rdunlap@xenotime.net&gt;
Signed-off-by: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
</content>
</entry>
<entry>
<title>writeback: Add a 'reason' to wb_writeback_work</title>
<updated>2011-10-30T16:33:36Z</updated>
<author>
<name>Curt Wohlgemuth</name>
<email>curtw@google.com</email>
</author>
<published>2011-10-08T03:54:10Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0e175a1835ffc979e55787774e58ec79e41957d7'/>
<id>urn:sha1:0e175a1835ffc979e55787774e58ec79e41957d7</id>
<content type='text'>
This creates a new 'reason' field in a wb_writeback_work
structure, which unambiguously identifies who initiates
writeback activity.  A 'wb_reason' enumeration has been
added to writeback.h, to enumerate the possible reasons.

The 'writeback_work_class' and tracepoint event class and
'writeback_queue_io' tracepoints are updated to include the
symbolic 'reason' in all trace events.

And the 'writeback_inodes_sbXXX' family of routines has had
a wb_stats parameter added to them, so callers can specify
why writeback is being started.

Acked-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Curt Wohlgemuth &lt;curtw@google.com&gt;
Signed-off-by: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
</content>
</entry>
<entry>
<title>writeback: add bg_threshold parameter to __bdi_update_bandwidth()</title>
<updated>2011-10-03T13:08:56Z</updated>
<author>
<name>Wu Fengguang</name>
<email>fengguang.wu@intel.com</email>
</author>
<published>2011-10-04T02:46:17Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=af6a311384bce6c88e15c80ab22ab051a918b4eb'/>
<id>urn:sha1:af6a311384bce6c88e15c80ab22ab051a918b4eb</id>
<content type='text'>
No behavior change.

Signed-off-by: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
</content>
</entry>
<entry>
<title>squeeze max-pause area and drop pass-good area</title>
<updated>2011-08-19T14:42:07Z</updated>
<author>
<name>Wu Fengguang</name>
<email>fengguang.wu@intel.com</email>
</author>
<published>2011-08-16T19:37:14Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=bb0822954aab7d23a3f902c2a103ee0242f6046e'/>
<id>urn:sha1:bb0822954aab7d23a3f902c2a103ee0242f6046e</id>
<content type='text'>
Revert the pass-good area introduced in ffd1f609ab10 ("writeback:
introduce max-pause and pass-good dirty limits") and make the max-pause
area smaller and safe.

This fixes ~30% performance regression in the ext3 data=writeback
fio_mmap_randwrite_64k/fio_mmap_randrw_64k test cases, where there are
12 JBOD disks, on each disk runs 8 concurrent tasks doing reads+writes.

Using deadline scheduler also has a regression, but not that big as CFQ,
so this suggests we have some write starvation.

The test logs show that

- the disks are sometimes under utilized

- global dirty pages sometimes rush high to the pass-good area for
  several hundred seconds, while in the mean time some bdi dirty pages
  drop to very low value (bdi_dirty &lt;&lt; bdi_thresh).  Then suddenly the
  global dirty pages dropped under global dirty threshold and bdi_dirty
  rush very high (for example, 2 times higher than bdi_thresh). During
  which time balance_dirty_pages() is not called at all.

So the problems are

1) The random writes progress so slow that they break the assumption of
   the max-pause logic that "8 pages per 200ms is typically more than
   enough to curb heavy dirtiers".

2) The max-pause logic ignored task_bdi_thresh and thus opens the possibility
   for some bdi's to over dirty pages, leading to (bdi_dirty &gt;&gt; bdi_thresh)
   and then (bdi_thresh &gt;&gt; bdi_dirty) for others.

3) The higher max-pause/pass-good thresholds somehow leads to the bad
   swing of dirty pages.

The fix is to allow the task to slightly dirty over task_bdi_thresh, but
no way to exceed bdi_dirty and/or global dirty_thresh.

Tests show that it fixed the JBOD regression completely (both behavior
and performance), while still being able to cut down large pause times
in balance_dirty_pages() for single-disk cases.

Reported-by: Li Shaohua &lt;shaohua.li@intel.com&gt;
Tested-by: Li Shaohua &lt;shaohua.li@intel.com&gt;
Acked-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
</content>
</entry>
</feed>
