| author | Linus Torvalds <torvalds@linux-foundation.org> | 2022-10-06 17:36:48 -0700 |
|---|---|---|
| committer | Linus Torvalds <torvalds@linux-foundation.org> | 2022-10-06 17:36:48 -0700 |
| commit | 76e45035348c247a70ed50eb29a9906657e4444f (patch) | |
| tree | e4101b34b1a3ddfea00be656586c22f704b33a2d /fs/btrfs/extent-tree.c | |
| parent | 4c0ed7d8d6e3dc013c4599a837de84794baa5b62 (diff) | |
| parent | cbddcc4fa3443fe8cfb2ff8e210deb1f6a0eea38 (diff) | |
Merge tag 'for-6.1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"There's a bunch of performance improvements, most notably the FIEMAP
speedup, the new block group tree to speed up mount on large
filesystems, more io_uring integration, some sysfs exports and the
usual fixes and core updates.
Summary:
Performance:
- outstanding FIEMAP speed improvement
- an algorithmic change in how extents are enumerated leads to a speed
boost of orders of magnitude (uncached and cached)
- extent sharing check speedup (2.2x uncached, 3x cached)
- add more cancellation points, allowing seeking in files with a large
number of extents to be interrupted
- more efficient hole and data seeking (4x uncached, 1.3x cached)
- sample results:
256M, 32K extents: 4s -> 29ms (~150x)
512M, 64K extents: 30s -> 59ms (~550x)
1G, 128K extents: 225s -> 120ms (~1800x)
- improved inode logging, especially for directories (on dbench
workload throughput +25%, max latency -21%)
- improved buffered IO, remove redundant extent state tracking,
lowering memory consumption and avoiding rb tree traversal
- add sysfs tunable to let qgroup temporarily skip exact accounting
when deleting snapshot, leading to a speedup but requiring a rescan
after that, will be used by snapper
- support io_uring and buffered writes, until now it was just for
direct IO, with the no-wait semantics implemented in the buffered
write path it now works and leads to speed improvement in IOPS
(2x), throughput (2.2x), latency (depends, 2x to 150x)
- small performance improvements when dropping and searching for
extent maps as well as when flushing delalloc in COW mode
(throughput +5MB/s)
User visible changes:
- new incompatible feature block-group-tree adding a dedicated tree
for tracking block groups, this allows a much faster load during
mount and avoids seeking unlike when it's scattered in the extent
tree items
- this reduces mount time for many-terabyte sized filesystems
- conversion tool will be provided so existing filesystem can also
be updated in place
- to reduce the test matrix and feature combinations, this requires
no-holes and free-space-tree (mkfs defaults since 5.15)
- improved reporting of super block corruption detected by scrub
- scrub also tries to repair super block and does not wait until next
commit
- discard stats and tunables are exported in sysfs
(/sys/fs/btrfs/FSID/discard)
- qgroup status is exported in sysfs
(/sys/fs/btrfs/FSID/qgroups/)
- verify that super block was not modified when thawing filesystem
Fixes:
- FIEMAP fixes
- fix extent sharing status, does not depend on the cached status
where merged
- flush delalloc so compressed extents are reported correctly
- fix alignment of VMA for memory mapped files on THP
- send: fix failures when processing inodes with no links (orphan
files and directories)
- fix race between quota enable and quota rescan ioctl
- handle more corner cases for read-only compat feature verification
- fix missed extent on fsync after dropping extent maps
Core:
- lockdep annotations to validate various transaction states and
state transitions
- preliminary support for fs-verity in send
- more effective memory use in scrub for subpage where sector is
smaller than page
- block group caching progress logic has been removed, load is now
synchronous
- simplify end IO callbacks and bio handling, use chained bios
instead of own tracking
- add no-wait semantics to several functions (tree search, nocow,
flushing, buffered write)
- cleanups and refactoring
MM changes:
- export balance_dirty_pages_ratelimited_flags"
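The hole and data seeking mentioned in the summary is what userspace reaches through lseek() with SEEK_DATA and SEEK_HOLE. A minimal sketch of walking a file that way follows. The helper name count_data_segments() is hypothetical and not part of btrfs or this merge; the sketch only shows the caller side, not the kernel change.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Count the data segments of an open file by alternating
 * lseek(SEEK_DATA) and lseek(SEEK_HOLE).
 * Hypothetical helper for illustration; returns -1 on error.
 */
static long count_data_segments(int fd)
{
	long segments = 0;
	off_t pos = 0;

	assert(fd >= 0);
	for (;;) {
		/* Find the start of the next data region at or after pos. */
		off_t data = lseek(fd, pos, SEEK_DATA);
		if (data < 0)
			/* ENXIO means no more data before end of file. */
			return errno == ENXIO ? segments : -1;

		/* Find where that data region ends. */
		off_t hole = lseek(fd, data, SEEK_HOLE);
		if (hole < 0)
			return -1;

		segments++;
		pos = hole;
	}
}
```

Each iteration costs two seeks, so a file with many extents issues many such calls, which is the kind of workload the seeking speedups and cancellation points above are aimed at.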
* tag 'for-6.1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (177 commits)
btrfs: set generation before calling btrfs_clean_tree_block in btrfs_init_new_buffer
btrfs: drop extent map range more efficiently
btrfs: avoid pointless extent map tree search when flushing delalloc
btrfs: remove unnecessary next extent map search
btrfs: remove unnecessary NULL pointer checks when searching extent maps
btrfs: assert tree is locked when clearing extent map from logging
btrfs: remove unnecessary extent map initializations
btrfs: remove the refcount warning/check at free_extent_map()
btrfs: add helper to replace extent map range with a new extent map
btrfs: move open coded extent map tree deletion out of inode eviction
btrfs: use cond_resched_rwlock_write() during inode eviction
btrfs: use extent_map_end() at btrfs_drop_extent_map_range()
btrfs: move btrfs_drop_extent_cache() to extent_map.c
btrfs: fix missed extent on fsync after dropping extent maps
btrfs: remove stale prototype of btrfs_write_inode
btrfs: enable nowait async buffered writes
btrfs: assert nowait mode is not used for some btree search functions
btrfs: make btrfs_buffered_write nowait compatible
btrfs: plumb NOWAIT through the write path
btrfs: make lock_and_cleanup_extent_if_need nowait compatible
...
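Several of the commits listed above make code paths nowait compatible. The check_delayed_ref() hunk in this merge shows the pattern: try the lock, and if it is contended while the caller asked for nowait, return -EAGAIN instead of sleeping. The following is a userspace sketch of that pattern using pthreads; struct ref_head and lock_ref_head() are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a lock-protected object. */
struct ref_head {
	pthread_mutex_t mutex;
};

/*
 * Take head->mutex. In nowait mode a contended lock is reported as
 * -EAGAIN so the caller can retry from a context that may block.
 * Returns 0 with the mutex held, or a negative errno without it.
 */
static int lock_ref_head(struct ref_head *head, bool nowait)
{
	int err;

	assert(head != NULL);
	err = pthread_mutex_trylock(&head->mutex);
	if (err == 0)
		return 0;
	if (err != EBUSY)
		return -err;
	if (nowait)
		return -EAGAIN;

	/* Blocking is allowed: fall back to a normal lock. */
	err = pthread_mutex_lock(&head->mutex);
	return -err;
}
```

The caller decides what to do with -EAGAIN; a common choice is to hand the operation to a worker that is allowed to block.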
Diffstat (limited to 'fs/btrfs/extent-tree.c')
| -rw-r--r-- | fs/btrfs/extent-tree.c | 33 |
1 files changed, 23 insertions, 10 deletions
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 6914cd8024ba..cd2d36580f1a 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2220,6 +2220,12 @@ static noinline int check_delayed_ref(struct btrfs_root *root,
 	}
 
 	if (!mutex_trylock(&head->mutex)) {
+		if (path->nowait) {
+			spin_unlock(&delayed_refs->lock);
+			btrfs_put_transaction(cur_trans);
+			return -EAGAIN;
+		}
+
 		refcount_inc(&head->refs);
 		spin_unlock(&delayed_refs->lock);
 
@@ -2686,13 +2692,8 @@ static int unpin_extent_range(struct btrfs_fs_info *fs_info,
 		len = cache->start + cache->length - start;
 		len = min(len, end + 1 - start);
 
-		down_read(&fs_info->commit_root_sem);
-		if (start < cache->last_byte_to_unpin && return_free_space) {
-			u64 add_len = min(len, cache->last_byte_to_unpin - start);
-
-			btrfs_add_free_space(cache, start, add_len);
-		}
-		up_read(&fs_info->commit_root_sem);
+		if (return_free_space)
+			btrfs_add_free_space(cache, start, len);
 
 		start += len;
 		total_unpinned += len;
@@ -3804,7 +3805,8 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group,
 	       block_group->start == fs_info->data_reloc_bg ||
 	       fs_info->data_reloc_bg == 0);
 
-	if (block_group->ro || block_group->zoned_data_reloc_ongoing) {
+	if (block_group->ro ||
+	    test_bit(BLOCK_GROUP_FLAG_ZONED_DATA_RELOC, &block_group->runtime_flags)) {
 		ret = 1;
 		goto out;
 	}
@@ -3881,7 +3883,7 @@ out:
 		 * regular extents) at the same time to the same zone, which
 		 * easily break the write pointer.
 		 */
-		block_group->zoned_data_reloc_ongoing = 1;
+		set_bit(BLOCK_GROUP_FLAG_ZONED_DATA_RELOC, &block_group->runtime_flags);
 		fs_info->data_reloc_bg = 0;
 	}
 	spin_unlock(&fs_info->relocation_bg_lock);
@@ -4888,6 +4890,9 @@ btrfs_init_new_buffer(struct btrfs_trans_handle *trans, struct btrfs_root *root,
 	    !test_bit(BTRFS_ROOT_RESET_LOCKDEP_CLASS, &root->state))
 		lockdep_owner = BTRFS_FS_TREE_OBJECTID;
 
+	/* btrfs_clean_tree_block() accesses generation field. */
+	btrfs_set_header_generation(buf, trans->transid);
+
 	/*
 	 * This needs to stay, because we could allocate a freed block from an
 	 * old tree into a new tree, so we need to make sure this new block is
@@ -5639,6 +5644,8 @@ static noinline int walk_up_tree(struct btrfs_trans_handle *trans,
  */
 int btrfs_drop_snapshot(struct btrfs_root *root, int update_ref, int for_reloc)
 {
+	const bool is_reloc_root = (root->root_key.objectid ==
+				    BTRFS_TREE_RELOC_OBJECTID);
 	struct btrfs_fs_info *fs_info = root->fs_info;
 	struct btrfs_path *path;
 	struct btrfs_trans_handle *trans;
@@ -5798,6 +5805,9 @@ int btrfs_drop_snapshot(struct btrfs_root *root, int update_ref, int for_reloc)
 				goto out_end_trans;
 			}
 
+			if (!is_reloc_root)
+				btrfs_set_last_root_drop_gen(fs_info, trans->transid);
+
 			btrfs_end_transaction_throttle(trans);
 			if (!for_reloc && btrfs_need_cleaner_sleep(fs_info)) {
 				btrfs_debug(fs_info,
@@ -5832,7 +5842,7 @@ int btrfs_drop_snapshot(struct btrfs_root *root, int update_ref, int for_reloc)
 		goto out_end_trans;
 	}
 
-	if (root->root_key.objectid != BTRFS_TREE_RELOC_OBJECTID) {
+	if (!is_reloc_root) {
 		ret = btrfs_find_root(tree_root, &root->root_key, path,
 				      NULL, NULL);
 		if (ret < 0) {
@@ -5864,6 +5874,9 @@ int btrfs_drop_snapshot(struct btrfs_root *root, int update_ref, int for_reloc)
 	btrfs_put_root(root);
 	root_dropped = true;
 out_end_trans:
+	if (!is_reloc_root)
+		btrfs_set_last_root_drop_gen(fs_info, trans->transid);
+
 	btrfs_end_transaction_throttle(trans);
 out_free:
 	kfree(wc);
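Two hunks in the diff replace the zoned_data_reloc_ongoing member with a bit in block_group->runtime_flags, read with test_bit() and written with set_bit(). The sketch below imitates that flag-word pattern in userspace with C11 atomics. The type, enum values, and helper names are hypothetical and do not match the kernel's definitions.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag numbers; the kernel's enum has different members. */
enum bg_flag {
	BG_FLAG_ZONED_DATA_RELOC = 0,
	BG_FLAG_EXAMPLE_OTHER = 1,
};

/* Hypothetical stand-in for a structure carrying a runtime flag word. */
struct block_group {
	atomic_ulong runtime_flags;
};

/* Atomically set one flag bit, leaving the other bits untouched. */
static void bg_set_flag(struct block_group *bg, enum bg_flag nr)
{
	assert((unsigned int)nr < sizeof(unsigned long) * CHAR_BIT);
	atomic_fetch_or(&bg->runtime_flags, 1UL << nr);
}

/* Read one flag bit from the shared word. */
static bool bg_test_flag(struct block_group *bg, enum bg_flag nr)
{
	assert((unsigned int)nr < sizeof(unsigned long) * CHAR_BIT);
	return (atomic_load(&bg->runtime_flags) >> nr) & 1UL;
}
```

Keeping the flags in one atomically updated word means a single flag can be set without a read-modify-write race against its neighbours, which is one plausible motivation for moving away from separate members.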
