diff options
| author | Abhishek Bapat <abhishekbapat@google.com> | 2026-01-15 21:31:03 +0000 |
|---|---|---|
| committer | Jan Kara <jack@suse.cz> | 2026-01-16 13:30:35 +0100 |
| commit | 77449e453dfc006ad738dec55374c4cbc056fd39 (patch) | |
| tree | df4360321216e4ac56b4ebcfbd83787e4f72d142 /include | |
| parent | aa62e130149f9cba53374e4ab51c9a9581dc9764 (diff) | |
quota: fix livelock between quotactl and freeze_super
When a filesystem is frozen, quotactl_block() enters a retry loop
waiting for the filesystem to thaw. It acquires s_umount, checks the
freeze state, drops s_umount and uses sb_start_write() - sb_end_write()
pair to wait for the unfreeze.
However, this retry loop can trigger a livelock issue, specifically on
kernels with preemption disabled.
The mechanism is as follows:
1. freeze_super() sets SB_FREEZE_WRITE and calls sb_wait_write().
2. sb_wait_write() calls percpu_down_write(), which initiates
synchronize_rcu().
3. Simultaneously, quotactl_block() spins in its retry loop, immediately
executing the sb_start_write() - sb_end_write() pair.
4. Because the kernel is non-preemptible and the loop contains no
scheduling points, quotactl_block() never yields the CPU. This
prevents that CPU from reaching an RCU quiescent state.
5. synchronize_rcu() in the freezer thread waits indefinitely for the
quotactl_block() CPU to report a quiescent state.
6. quotactl_block() spins indefinitely waiting for the freezer to
advance, which it cannot do as it is blocked on the RCU sync.
This results in a hang of the freezer process and 100% CPU usage by the
quota process.
While this can occur intermittently on multi-core systems, it is
reliably reproducing on a node with the following script, running both
the freezer and the quota toggle on the same CPU:
# mkfs.ext4 -O quota /dev/sda 2g && mkdir a_mount
# mount /dev/sda -o quota,usrquota,grpquota a_mount
# taskset -c 3 bash -c "while true; do xfs_freeze -f a_mount; \
xfs_freeze -u a_mount; done" &
# taskset -c 3 bash -c "while true; do quotaon a_mount; \
quotaoff a_mount; done" &
Adding cond_resched() to the retry loop fixes the issue. It acts as an
RCU quiescent state, allowing synchronize_rcu() in percpu_down_write()
to complete.
Fixes: 576215cffdef ("fs: Drop wait_unfrozen wait queue")
Signed-off-by: Abhishek Bapat <abhishekbapat@google.com>
Link: https://patch.msgid.link/20260115213103.1089129-1-abhishekbapat@google.com
Signed-off-by: Jan Kara <jack@suse.cz>
Diffstat (limited to 'include')
0 files changed, 0 insertions, 0 deletions
