user/sven/linux.git/mm/backing-dev.c, branch v3.6.5

vfs: kill write_super and sync_supers

2012-08-03T21:24:44Z

Finally we can kill the 'sync_supers' kernel thread along with the '->write_super()' superblock operation because all the users are gone. Now every file-system is supposed to self-manage own superblock and its dirty state. The nice thing about killing this thread is that it improves power management. Indeed, 'sync_supers' is a source of monotonic system wake-ups - it woke up every 5 seconds no matter what - even if there were no dirty superblocks and even if there were no file-systems using this service (e.g., btrfs and journalled ext4 do not need it). So it was wasting power most of the time. And because the thread was in the core of the kernel, all systems had to have it. So I am quite happy to make it go away. Interestingly, this thread is a left-over from the pdflush kernel thread which was a self-forking kernel thread responsible for all the write-back in old Linux kernels. It was turned into per-block device BDI threads, and 'sync_supers' was a left-over. Thus, R.I.P, pdflush as well. Signed-off-by: Artem Bityutskiy Signed-off-by: Al Viro

mm: prepare for removal of obsolete /proc/sys/vm/nr_pdflush_threads

2012-08-01T01:42:40Z

Since per-BDI flusher threads were introduced in 2.6, the pdflush mechanism is not used any more. But the old interface exported through /proc/sys/vm/nr_pdflush_threads still exists and is obviously useless. For back-compatibility, printk warning information and return 2 to notify the users that the interface is removed. Signed-off-by: Wanpeng Li Cc: Wu Fengguang Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

block: Convert BDI proportion calculations to flexible proportions

2012-06-08T23:37:56Z

Convert calculations of proportion of writeback each bdi does to new flexible proportion code. That allows us to use aging period of fixed wallclock time which gives better proportion estimates given the hugely varying throughput of different devices. Acked-by: Peter Zijlstra Signed-off-by: Jan Kara Signed-off-by: Fengguang Wu

backing-dev: fix wakeup timer races with bdi_unregister()

2012-02-01T08:52:49Z

While 7a401a972df8e18 ("backing-dev: ensure wakeup_timer is deleted") addressed the problem of the bdi being freed with a queued wakeup timer, there are other races that could happen if the wakeup timer expires after/during bdi_unregister(), before bdi_destroy() is called. wakeup_timer_fn() could attempt to wakeup a task which has already has been freed, or could access a NULL bdi->dev via the wake_forker_thread tracepoint. Cc: Cc: Jens Axboe Reported-by: Chanho Min Reviewed-by: Namjae Jeon Signed-off-by: Rabin Vincent Signed-off-by: Wu Fengguang

freezer: implement and use kthread_freezable_should_stop()

2011-11-21T20:32:23Z

Writeback and thinkpad_acpi have been using thaw_process() to prevent deadlock between the freezer and kthread_stop(); unfortunately, this is inherently racy - nothing prevents freezing from happening between thaw_process() and kthread_stop(). This patch implements kthread_freezable_should_stop() which enters refrigerator if necessary but is guaranteed to return if kthread_stop() is invoked. Both thaw_process() users are converted to use the new function. Note that this deadlock condition exists for many of freezable kthreads. They need to be converted to use the new should_stop or freezable workqueue. Tested with synthetic test case. Signed-off-by: Tejun Heo Acked-by: Henrique de Moraes Holschuh Cc: Jens Axboe Cc: Oleg Nesterov

backing-dev: ensure wakeup_timer is deleted

2011-11-11T12:29:04Z

bdi_prune_sb() in bdi_unregister() attempts to removes the bdi links from all super_blocks and then del_timer_sync() the writeback timer. However, this can race with __mark_inode_dirty(), leading to bdi_wakeup_thread_delayed() rearming the writeback timer on the bdi we're unregistering, after we've called del_timer_sync(). This can end up with the bdi being freed with an active timer inside it, as in the case of the following dump after the removal of an SD card. Fix this by redoing the del_timer_sync() in bdi_destory(). ------------[ cut here ]------------ WARNING: at /home/rabin/kernel/arm/lib/debugobjects.c:262 debug_print_object+0x9c/0xc8() ODEBUG: free active (active state 0) object type: timer_list hint: wakeup_timer_fn+0x0/0x180 Modules linked in: Backtrace: [] (dump_backtrace+0x0/0x110) from [] (dump_stack+0x18/0x1c) r6:c02bc638 r5:00000106 r4:c79f5d18 r3:00000000 [] (dump_stack+0x0/0x1c) from [] (warn_slowpath_common+0x54/0x6c) [] (warn_slowpath_common+0x0/0x6c) from [] (warn_slowpath_fmt+0x38/0x40) r8:20000013 r7:c780c6f0 r6:c031613c r5:c780c6f0 r4:c02b1b29 r3:00000009 [] (warn_slowpath_fmt+0x0/0x40) from [] (debug_print_object+0x9c/0xc8) r3:c02b1b29 r2:c02bc662 [] (debug_print_object+0x0/0xc8) from [] (debug_check_no_obj_freed+0xac/0x1dc) r6:c7964000 r5:00000001 r4:c7964000 [] (debug_check_no_obj_freed+0x0/0x1dc) from [] (kmem_cache_free+0x88/0x1f8) [] (kmem_cache_free+0x0/0x1f8) from [] (blk_release_queue+0x70/0x78) [] (blk_release_queue+0x0/0x78) from [] (kobject_release+0x70/0x84) r5:c79641f0 r4:c796420c [] (kobject_release+0x0/0x84) from [] (kref_put+0x68/0x80) r7:00000083 r6:c74083d0 r5:c015289c r4:c796420c [] (kref_put+0x0/0x80) from [] (kobject_put+0x48/0x5c) r5:c79643b4 r4:c79641f0 [] (kobject_put+0x0/0x5c) from [] (blk_cleanup_queue+0x68/0x74) r4:c7964000 [] (blk_cleanup_queue+0x0/0x74) from [] (mmc_blk_put+0x78/0xe8) r5:00000000 r4:c794c400 [] (mmc_blk_put+0x0/0xe8) from [] (mmc_blk_release+0x24/0x38) r5:c794c400 r4:c0322824 [] (mmc_blk_release+0x0/0x38) from [] (__blkdev_put+0xe8/0x170) r5:c78d5e00 r4:c74083c0 [] (__blkdev_put+0x0/0x170) from [] (blkdev_put+0x11c/0x12c) r8:c79f5f70 r7:00000001 r6:c74083d0 r5:00000083 r4:c74083c0 r3:00000000 [] (blkdev_put+0x0/0x12c) from [] (kill_block_super+0x60/0x6c) r7:c7942300 r6:c79f4000 r5:00000083 r4:c74083c0 [] (kill_block_super+0x0/0x6c) from [] (deactivate_locked_super+0x44/0x70) r6:c79f4000 r5:c031af64 r4:c794dc00 r3:c00b06c4 [] (deactivate_locked_super+0x0/0x70) from [] (deactivate_super+0x6c/0x70) r5:c794dc00 r4:c794dc00 [] (deactivate_super+0x0/0x70) from [] (mntput_no_expire+0x188/0x194) r5:c794dc00 r4:c7942300 [] (mntput_no_expire+0x0/0x194) from [] (sys_umount+0x2e4/0x310) r6:c7942300 r5:00000000 r4:00000000 r3:00000000 [] (sys_umount+0x0/0x310) from [] (ret_fast_syscall+0x0/0x30) ---[ end trace e5c83c92ada51c76 ]--- Cc: stable@kernel.org Signed-off-by: Rabin Vincent Signed-off-by: Linus Walleij Signed-off-by: Jens Axboe

Merge branch 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux

2011-11-07T03:02:23Z

* 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux: writeback: Add a 'reason' to wb_writeback_work writeback: send work item to queue_io, move_expired_inodes writeback: trace event balance_dirty_pages writeback: trace event bdi_dirty_ratelimit writeback: fix ppc compile warnings on do_div(long long, unsigned long) writeback: per-bdi background threshold writeback: dirty position control - bdi reserve area writeback: control dirty pause time writeback: limit max dirty pause time writeback: IO-less balance_dirty_pages() writeback: per task dirty rate limit writeback: stabilize bdi->dirty_ratelimit writeback: dirty rate control writeback: add bg_threshold parameter to __bdi_update_bandwidth() writeback: dirty position control writeback: account per-bdi accumulated dirtied pages

mm-add-comment-explaining-task-state-setting-in-bdi_forker_thread-fix

2011-11-01T00:30:49Z

fiddle wording Cc: Jan Kara Cc: Wu Fengguang Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

writeback: Add a 'reason' to wb_writeback_work

2011-10-30T16:33:36Z

This creates a new 'reason' field in a wb_writeback_work structure, which unambiguously identifies who initiates writeback activity. A 'wb_reason' enumeration has been added to writeback.h, to enumerate the possible reasons. The 'writeback_work_class' and tracepoint event class and 'writeback_queue_io' tracepoints are updated to include the symbolic 'reason' in all trace events. And the 'writeback_inodes_sbXXX' family of routines has had a wb_stats parameter added to them, so callers can specify why writeback is being started. Acked-by: Jan Kara Signed-off-by: Curt Wohlgemuth Signed-off-by: Wu Fengguang

writeback: stabilize bdi->dirty_ratelimit

2011-10-03T13:08:57Z

There are some imperfections in balanced_dirty_ratelimit. 1) large fluctuations The dirty_rate used for computing balanced_dirty_ratelimit is merely averaged in the past 200ms (very small comparing to the 3s estimation period for write_bw), which makes rather dispersed distribution of balanced_dirty_ratelimit. It's pretty hard to average out the singular points by increasing the estimation period. Considering that the averaging technique will introduce very undesirable time lags, I give it up totally. (btw, the 3s write_bw averaging time lag is much more acceptable because its impact is one-way and therefore won't lead to oscillations.) The more practical way is filtering -- most singular balanced_dirty_ratelimit points can be filtered out by remembering some prev_balanced_rate and prev_prev_balanced_rate. However the more reliable way is to guard balanced_dirty_ratelimit with task_ratelimit. 2) due to truncates and fs redirties, the (write_bw <=> dirty_rate) match could become unbalanced, which may lead to large systematical errors in balanced_dirty_ratelimit. The truncates, due to its possibly bumpy nature, can hardly be compensated smoothly. So let's face it. When some over-estimated balanced_dirty_ratelimit brings dirty_ratelimit high, dirty pages will go higher than the setpoint. task_ratelimit will in turn become lower than dirty_ratelimit. So if we consider both balanced_dirty_ratelimit and task_ratelimit and update dirty_ratelimit only when they are on the same side of dirty_ratelimit, the systematical errors in balanced_dirty_ratelimit won't be able to bring dirty_ratelimit far away. The balanced_dirty_ratelimit estimation may also be inaccurate near @limit or @freerun, however is less an issue. 3) since we ultimately want to - keep the fluctuations of task ratelimit as small as possible - keep the dirty pages around the setpoint as long time as possible the update policy used for (2) also serves the above goals nicely: if for some reason the dirty pages are high (task_ratelimit < dirty_ratelimit), and dirty_ratelimit is low (dirty_ratelimit < balanced_dirty_ratelimit), there is no point to bring up dirty_ratelimit in a hurry only to hurt both the above two goals. So, we make use of task_ratelimit to limit the update of dirty_ratelimit in two ways: 1) avoid changing dirty rate when it's against the position control target (the adjusted rate will slow down the progress of dirty pages going back to setpoint). 2) limit the step size. task_ratelimit is changing values step by step, leaving a consistent trace comparing to the randomly jumping balanced_dirty_ratelimit. task_ratelimit also has the nice smaller errors in stable state and typically larger errors when there are big errors in rate. So it's a pretty good limiting factor for the step size of dirty_ratelimit. Note that bdi->dirty_ratelimit is always tracking balanced_dirty_ratelimit. task_ratelimit is merely used as a limiting factor. Signed-off-by: Wu Fengguang