<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/fs/buffer.c, branch v4.13.14</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.13.14</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.13.14'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2017-07-10T23:32:30Z</updated>
<entry>
<title>fs/buffer.c: make bh_lru_install() more efficient</title>
<updated>2017-07-10T23:32:30Z</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@google.com</email>
</author>
<published>2017-07-10T22:47:29Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=241f01fbeda2521f802eeef4de0261387e6e9c1d'/>
<id>urn:sha1:241f01fbeda2521f802eeef4de0261387e6e9c1d</id>
<content type='text'>
To install a buffer_head into the cpu's LRU queue, bh_lru_install()
would construct a new copy of the queue and then memcpy it over the real
queue.  But it's easily possible to do the update in-place, which is
faster and simpler.  Some work can also be skipped if the buffer_head
was already in the queue.

As a microbenchmark I timed how long it takes to run sb_getblk()
10,000,000 times alternating between BH_LRU_SIZE + 1 blocks.
Effectively, this benchmarks looking up buffer_heads that are in the
page cache but not in the LRU:

	Before this patch: 1.758s
	After this patch: 1.653s

This patch also removes about 350 bytes of compiled code (on x86_64),
partly due to removal of the memcpy() which was being inlined+unrolled.

Link: http://lkml.kernel.org/r/20161229193445.1913-1-ebiggers3@gmail.com
Signed-off-by: Eric Biggers &lt;ebiggers@google.com&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'xfs-4.13-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux</title>
<updated>2017-07-10T17:51:53Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2017-07-10T17:51:53Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=642338ba33c5331f2b94ca3944845741fbbf8b89'/>
<id>urn:sha1:642338ba33c5331f2b94ca3944845741fbbf8b89</id>
<content type='text'>
Pull XFS updates from Darrick Wong:
 "Here are some changes for you for 4.13. For the most part it's fixes
  for bugs and deadlock problems, and preparation for online fsck in
  some future merge window.

   - Avoid quotacheck deadlocks

   - Fix transaction overflows when bunmapping fragmented files

   - Refactor directory readahead

   - Allow admin to configure if ASSERT is fatal

   - Improve transaction usage detail logging during overflows

   - Minor cleanups

   - Don't leak log items when the log shuts down

   - Remove double-underscore typedefs

   - Various preparation for online scrubbing

   - Introduce new error injection configuration sysfs knobs

   - Refactor dq_get_next to use extent map directly

   - Fix problems with iterating the page cache for unwritten data

   - Implement SEEK_{HOLE,DATA} via iomap

   - Refactor XFS to use iomap SEEK_HOLE and SEEK_DATA

   - Don't use MAXPATHLEN to check on-disk symlink target lengths"

* tag 'xfs-4.13-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (48 commits)
  xfs: don't crash on unexpected holes in dir/attr btrees
  xfs: rename MAXPATHLEN to XFS_SYMLINK_MAXLEN
  xfs: fix contiguous dquot chunk iteration livelock
  xfs: Switch to iomap for SEEK_HOLE / SEEK_DATA
  vfs: Add iomap_seek_hole and iomap_seek_data helpers
  vfs: Add page_cache_seek_hole_data helper
  xfs: remove a whitespace-only line from xfs_fs_get_nextdqblk
  xfs: rewrite xfs_dq_get_next_id using xfs_iext_lookup_extent
  xfs: Check for m_errortag initialization in xfs_errortag_test
  xfs: grab dquots without taking the ilock
  xfs: fix semicolon.cocci warnings
  xfs: Don't clear SGID when inheriting ACLs
  xfs: free cowblocks and retry on buffered write ENOSPC
  xfs: replace log_badcrc_factor knob with error injection tag
  xfs: convert drop_writes to use the errortag mechanism
  xfs: remove unneeded parameter from XFS_TEST_ERROR
  xfs: expose errortag knobs via sysfs
  xfs: make errortag a per-mountpoint structure
  xfs: free uncommitted transactions during log recovery
  xfs: don't allow bmap on rt files
  ...
</content>
</entry>
<entry>
<title>Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4</title>
<updated>2017-07-09T16:31:22Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2017-07-09T16:31:22Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=bc2c6421cbb420677c4bb56adaf434414770ce8a'/>
<id>urn:sha1:bc2c6421cbb420677c4bb56adaf434414770ce8a</id>
<content type='text'>
Pull ext4 updates from Ted Ts'o:
 "The first major feature for ext4 this merge window is the largedir
  feature, which allows ext4 directories to support over 2 billion
  directory entries (assuming ~64 byte file names; in practice, users
  will run into practical performance limits first.) This feature was
  originally written by the Lustre team, and credit goes to Artem
  Blagodarenko from Seagate for getting this feature upstream.

  The second major major feature allows ext4 to support extended
  attribute values up to 64k. This feature was also originally from
  Lustre, and has been enhanced by Tahsin Erdogan from Google with a
  deduplication feature so that if multiple files have the same xattr
  value (for example, Windows ACL's stored by Samba), only one copy will
  be stored on disk for encoding and caching efficiency.

  We also have the usual set of bug fixes, cleanups, and optimizations"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (47 commits)
  ext4: fix spelling mistake: "prellocated" -&gt; "preallocated"
  ext4: fix __ext4_new_inode() journal credits calculation
  ext4: skip ext4_init_security() and encryption on ea_inodes
  fs: generic_block_bmap(): initialize all of the fields in the temp bh
  ext4: change fast symlink test to not rely on i_blocks
  ext4: require key for truncate(2) of encrypted file
  ext4: don't bother checking for encryption key in -&gt;mmap()
  ext4: check return value of kstrtoull correctly in reserved_clusters_store
  ext4: fix off-by-one fsmap error on 1k block filesystems
  ext4: return EFSBADCRC if a bad checksum error is found in ext4_find_entry()
  ext4: return EIO on read error in ext4_find_entry
  ext4: forbid encrypting root directory
  ext4: send parallel discards on commit completions
  ext4: avoid unnecessary stalls in ext4_evict_inode()
  ext4: add nombcache mount option
  ext4: strong binding of xattr inode references
  ext4: eliminate xattr entry e_hash recalculation for removes
  ext4: reserve space for xattr entries/names
  quota: add get_inode_usage callback to transfer multi-inode charges
  ext4: xattr inode deduplication
  ...
</content>
</entry>
<entry>
<title>Merge tag 'for-linus-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux</title>
<updated>2017-07-08T02:38:17Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2017-07-08T02:38:17Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=088737f44bbf6378745f5b57b035e57ee3dc4750'/>
<id>urn:sha1:088737f44bbf6378745f5b57b035e57ee3dc4750</id>
<content type='text'>
Pull Writeback error handling updates from Jeff Layton:
 "This pile represents the bulk of the writeback error handling fixes
  that I have for this cycle. Some of the earlier patches in this pile
  may look trivial but they are prerequisites for later patches in the
  series.

  The aim of this set is to improve how we track and report writeback
  errors to userland. Most applications that care about data integrity
  will periodically call fsync/fdatasync/msync to ensure that their
  writes have made it to the backing store.

  For a very long time, we have tracked writeback errors using two flags
  in the address_space: AS_EIO and AS_ENOSPC. Those flags are set when a
  writeback error occurs (via mapping_set_error) and are cleared as a
  side-effect of filemap_check_errors (as you noted yesterday). This
  model really sucks for userland.

  Only the first task to call fsync (or msync or fdatasync) will see the
  error. Any subsequent task calling fsync on a file will get back 0
  (unless another writeback error occurs in the interim). If I have
  several tasks writing to a file and calling fsync to ensure that their
  writes got stored, then I need to have them coordinate with one
  another. That's difficult enough, but in a world of containerized
  setups that coordination may even not be possible.

  But wait...it gets worse!

  The calls to filemap_check_errors can be buried pretty far down in the
  call stack, and there are internal callers of filemap_write_and_wait
  and the like that also end up clearing those errors. Many of those
  callers ignore the error return from that function or return it to
  userland at nonsensical times (e.g. truncate() or stat()). If I get
  back -EIO on a truncate, there is no reason to think that it was
  because some previous writeback failed, and a subsequent fsync() will
  (incorrectly) return 0.

  This pile aims to do three things:

   1) ensure that when a writeback error occurs that that error will be
      reported to userland on a subsequent fsync/fdatasync/msync call,
      regardless of what internal callers are doing

   2) report writeback errors on all file descriptions that were open at
      the time that the error occurred. This is a user-visible change,
      but I think most applications are written to assume this behavior
      anyway. Those that aren't are unlikely to be hurt by it.

   3) document what filesystems should do when there is a writeback
      error. Today, there is very little consistency between them, and a
      lot of cargo-cult copying. We need to make it very clear what
      filesystems should do in this situation.

  To achieve this, the set adds a new data type (errseq_t) and then
  builds new writeback error tracking infrastructure around that. Once
  all of that is in place, we change the filesystems to use the new
  infrastructure for reporting wb errors to userland.

  Note that this is just the initial foray into cleaning up this mess.
  There is a lot of work remaining here:

   1) convert the rest of the filesystems in a similar fashion. Once the
      initial set is in, then I think most other fs' will be fairly
      simple to convert. Hopefully most of those can in via individual
      filesystem trees.

   2) convert internal waiters on writeback to use errseq_t for
      detecting errors instead of relying on the AS_* flags. I have some
      draft patches for this for ext4, but they are not quite ready for
      prime time yet.

  This was a discussion topic this year at LSF/MM too. If you're
  interested in the gory details, LWN has some good articles about this:

      https://lwn.net/Articles/718734/
      https://lwn.net/Articles/724307/"

* tag 'for-linus-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
  btrfs: minimal conversion to errseq_t writeback error reporting on fsync
  xfs: minimal conversion to errseq_t writeback error reporting
  ext4: use errseq_t based error handling for reporting data writeback errors
  fs: convert __generic_file_fsync to use errseq_t based reporting
  block: convert to errseq_t based writeback error tracking
  dax: set errors in mapping when writeback fails
  Documentation: flesh out the section in vfs.txt on storing and reporting writeback errors
  mm: set both AS_EIO/AS_ENOSPC and errseq_t in mapping_set_error
  fs: new infrastructure for writeback error handling and reporting
  lib: add errseq_t type and infrastructure for handling it
  mm: don't TestClearPageError in __filemap_fdatawait_range
  mm: clear AS_EIO/AS_ENOSPC when writeback initiation fails
  jbd2: don't clear and reset errors after waiting on writeback
  buffer: set errors in mapping at the time that the error occurs
  fs: check for writeback errors after syncing out buffers in generic_file_fsync
  buffer: use mapping_set_error instead of setting the flag
  mm: fix mapping_set_error call in me_pagecache_dirty
</content>
</entry>
<entry>
<title>buffer: set errors in mapping at the time that the error occurs</title>
<updated>2017-07-06T11:02:21Z</updated>
<author>
<name>Jeff Layton</name>
<email>jlayton@redhat.com</email>
</author>
<published>2017-07-06T11:02:21Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=87354e5de04fe727227ff619af164202adcfa4d4'/>
<id>urn:sha1:87354e5de04fe727227ff619af164202adcfa4d4</id>
<content type='text'>
I noticed on xfs that I could still sometimes get back an error on fsync
on a fd that was opened after the error condition had been cleared.

The problem is that the buffer code sets the write_io_error flag and
then later checks that flag to set the error in the mapping. That flag
perisists for quite a while however. If the file is later opened with
O_TRUNC, the buffers will then be invalidated and the mapping's error
set such that a subsequent fsync will return error. I think this is
incorrect, as there was no writeback between the open and fsync.

Add a new mark_buffer_write_io_error operation that sets the flag and
the error in the mapping at the same time. Replace all calls to
set_buffer_write_io_error with mark_buffer_write_io_error, and remove
the places that check this flag in order to set the error in the
mapping.

This sets the error in the mapping earlier, at the time that it's first
detected.

Signed-off-by: Jeff Layton &lt;jlayton@redhat.com&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Carlos Maiolino &lt;cmaiolino@redhat.com&gt;
</content>
</entry>
<entry>
<title>buffer: use mapping_set_error instead of setting the flag</title>
<updated>2017-07-06T11:02:20Z</updated>
<author>
<name>Jeff Layton</name>
<email>jlayton@redhat.com</email>
</author>
<published>2017-07-06T11:02:20Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d945b59db8449ab8323995391c6a63525b3666f6'/>
<id>urn:sha1:d945b59db8449ab8323995391c6a63525b3666f6</id>
<content type='text'>
Signed-off-by: Jeff Layton &lt;jlayton@redhat.com&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Matthew Wilcox &lt;mawilcox@microsoft.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
<entry>
<title>fs: generic_block_bmap(): initialize all of the fields in the temp bh</title>
<updated>2017-07-05T04:56:21Z</updated>
<author>
<name>Alexander Potapenko</name>
<email>glider@google.com</email>
</author>
<published>2017-07-05T04:56:21Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2a527d6858c246db8afc3d576dbcbff0902f933b'/>
<id>urn:sha1:2a527d6858c246db8afc3d576dbcbff0902f933b</id>
<content type='text'>
KMSAN (KernelMemorySanitizer, a new error detection tool) reports the
use of uninitialized memory in ext4_update_bh_state():

==================================================================
BUG: KMSAN: use of unitialized memory
CPU: 3 PID: 1 Comm: swapper/0 Tainted: G    B           4.8.0-rc6+ #597
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
01/01/2011
 0000000000000282 ffff88003cc96f68 ffffffff81f30856 0000003000000008
 ffff88003cc96f78 0000000000000096 ffffffff8169742a ffff88003cc96ff8
 ffffffff812fc1fc 0000000000000008 ffff88003a1980e8 0000000100000000
Call Trace:
 [&lt;     inline     &gt;] __dump_stack lib/dump_stack.c:15
 [&lt;ffffffff81f30856&gt;] dump_stack+0xa6/0xc0 lib/dump_stack.c:51
 [&lt;ffffffff812fc1fc&gt;] kmsan_report+0x1ec/0x300 mm/kmsan/kmsan.c:?
 [&lt;ffffffff812fc33b&gt;] __msan_warning+0x2b/0x40 ??:?
 [&lt;     inline     &gt;] ext4_update_bh_state fs/ext4/inode.c:727
 [&lt;ffffffff8169742a&gt;] _ext4_get_block+0x6ca/0x8a0 fs/ext4/inode.c:759
 [&lt;ffffffff81696d4c&gt;] ext4_get_block+0x8c/0xa0 fs/ext4/inode.c:769
 [&lt;ffffffff814a2d36&gt;] generic_block_bmap+0x246/0x2b0 fs/buffer.c:2991
 [&lt;ffffffff816ca30e&gt;] ext4_bmap+0x5ee/0x660 fs/ext4/inode.c:3177
...
origin description: ----tmp@generic_block_bmap
==================================================================

(the line numbers are relative to 4.8-rc6, but the bug persists
upstream)

The local |tmp| is created in generic_block_bmap() and then passed into
ext4_bmap() =&gt; ext4_get_block() =&gt; _ext4_get_block() =&gt;
ext4_update_bh_state(). Along the way tmp.b_page is never initialized
before ext4_update_bh_state() checks its value.

[ Use the approach suggested by Kees Cook of initializing the whole bh
  structure.]

Signed-off-by: Alexander Potapenko &lt;glider@google.com&gt;
Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
</content>
</entry>
<entry>
<title>vfs: Add page_cache_seek_hole_data helper</title>
<updated>2017-07-03T05:46:13Z</updated>
<author>
<name>Andreas Gruenbacher</name>
<email>agruenba@redhat.com</email>
</author>
<published>2017-06-29T18:43:20Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=334fd34d76f237c0ee58dfc400d2c4e34d660544'/>
<id>urn:sha1:334fd34d76f237c0ee58dfc400d2c4e34d660544</id>
<content type='text'>
Both ext4 and xfs implement seeking for the next hole or piece of data
in unwritten extents by scanning the page cache, and both versions share
the same bug when iterating the buffers of a page: the start offset into
the page isn't taken into account, so when a page fits more than two
filesystem blocks, things will go wrong.  For example, on a filesystem
with a block size of 1k, the following command will fail:

  xfs_io -f -c "falloc 0 4k" \
            -c "pwrite 1k 1k" \
            -c "pwrite 3k 1k" \
            -c "seek -a -r 0" foo

In this example, neither lseek(fd, 1024, SEEK_HOLE) nor lseek(fd, 2048,
SEEK_DATA) will return the correct result.

Introduce a generic vfs helper for seeking in the page cache that gets
this right.  The next commits will replace the filesystem specific
implementations.

Signed-off-by: Andreas Gruenbacher &lt;agruenba@redhat.com&gt;
[hch: dropped the export]
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Signed-off-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
</content>
</entry>
<entry>
<title>fs: add support for buffered writeback to pass down write hints</title>
<updated>2017-06-27T18:05:39Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2017-06-27T15:30:05Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8e8f9298818c4c2754182d544158cb182581a9ab'/>
<id>urn:sha1:8e8f9298818c4c2754182d544158cb182581a9ab</id>
<content type='text'>
Reviewed-by: Andreas Dilger &lt;adilger@dilger.ca&gt;
Reviewed-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: switch bios to blk_status_t</title>
<updated>2017-06-09T15:27:32Z</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2017-06-03T07:38:06Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4e4cbee93d56137ebff722be022cae5f70ef84fb'/>
<id>urn:sha1:4e4cbee93d56137ebff722be022cae5f70ef84fb</id>
<content type='text'>
Replace bi_error with a new bi_status to allow for a clear conversion.
Note that device mapper overloaded bi_error with a private value, which
we'll have to keep arround at least for now and thus propagate to a
proper blk_status_t value.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
</feed>
