user/sven/linux.git/block, branch v4.19.226

block, bfq: fix use after free in bfq_bfqq_expire

2021-12-29T11:20:43Z

commit eed47d19d9362bdd958e4ab56af480b9dbf6b2b6 upstream. The function bfq_bfqq_expire() invokes the function __bfq_bfqq_expire(), and the latter may free the in-service bfq-queue. If this happens, then no other instruction of bfq_bfqq_expire() must be executed, or a use-after-free will occur. Basing on the assumption that __bfq_bfqq_expire() invokes bfq_put_queue() on the in-service bfq-queue exactly once, the queue is assumed to be freed if its refcounter is equal to one right before invoking __bfq_bfqq_expire(). But, since commit 9dee8b3b057e ("block, bfq: fix queue removal from weights tree") this assumption is false. __bfq_bfqq_expire() may also invoke bfq_weights_tree_remove() and, since commit 9dee8b3b057e ("block, bfq: fix queue removal from weights tree"), also the latter function may invoke bfq_put_queue(). So __bfq_bfqq_expire() may invoke bfq_put_queue() twice, and this is the actual case where the in-service queue may happen to be freed. To address this issue, this commit moves the check on the refcounter of the queue right around the last bfq_put_queue() that may be invoked on the queue. Fixes: 9dee8b3b057e ("block, bfq: fix queue removal from weights tree") Reported-by: Dmitrii Tcvetkov Reported-by: Douglas Anderson Tested-by: Dmitrii Tcvetkov Tested-by: Douglas Anderson Signed-off-by: Paolo Valente Signed-off-by: Jens Axboe Signed-off-by: Yu Kuai Signed-off-by: Greg Kroah-Hartman

block, bfq: fix queue removal from weights tree

2021-12-29T11:20:42Z

commit 9dee8b3b057e1da26f85f1842f2aaf3bb200fb94 upstream. bfq maintains an ordered list, through a red-black tree, of unique weights of active bfq_queues. This list is used to detect whether there are active queues with differentiated weights. The weight of a queue is removed from the list when both the following two conditions become true: (1) the bfq_queue is flagged as inactive (2) the has no in-flight request any longer; Unfortunately, in the rare cases where condition (2) becomes true before condition (1), the removal fails, because the function to remove the weight of the queue (bfq_weights_tree_remove) is rightly invoked in the path that deactivates the bfq_queue, but mistakenly invoked *before* the function that actually performs the deactivation (bfq_deactivate_bfqq). This commits moves the invocation of bfq_weights_tree_remove for condition (1) to after bfq_deactivate_bfqq. As a consequence of this move, it is necessary to add a further reference to the queue when the weight of a queue is added, because the queue might otherwise be freed before bfq_weights_tree_remove is invoked. This commit adds this reference and makes all related modifications. Signed-off-by: Paolo Valente Signed-off-by: Jens Axboe Signed-off-by: Yu Kuai Signed-off-by: Greg Kroah-Hartman

block, bfq: fix decrement of num_active_groups

2021-12-29T11:20:42Z

commit ba7aeae5539c7a7cccc4cf07a2bc61281a93c50e upstream. Since commit '2d29c9f89fcd ("block, bfq: improve asymmetric scenarios detection")', if there are process groups with I/O requests waiting for completion, then BFQ tags the scenario as 'asymmetric'. This detection is needed for preserving service guarantees (for details, see comments on the computation * of the variable asymmetric_scenario in the function bfq_better_to_idle). Unfortunately, commit '2d29c9f89fcd ("block, bfq: improve asymmetric scenarios detection")' contains an error exactly in the updating of the number of groups with I/O requests waiting for completion: if a group has more than one descendant process, then the above number of groups, which is renamed from num_active_groups to a more appropriate num_groups_with_pending_reqs by this commit, may happen to be wrongly decremented multiple times, namely every time one of the descendant processes gets all its pending I/O requests completed. A correct, complete solution should work as follows. Consider a group that is inactive, i.e., that has no descendant process with pending I/O inside BFQ queues. Then suppose that num_groups_with_pending_reqs is still accounting for this group, because the group still has some descendant process with some I/O request still in flight. num_groups_with_pending_reqs should be decremented when the in-flight request of the last descendant process is finally completed (assuming that nothing else has changed for the group in the meantime, in terms of composition of the group and active/inactive state of child groups and processes). To accomplish this, an additional pending-request counter must be added to entities, and must be updated correctly. To avoid this additional field and operations, this commit resorts to the following tradeoff between simplicity and accuracy: for an inactive group that is still counted in num_groups_with_pending_reqs, this commit decrements num_groups_with_pending_reqs when the first descendant process of the group remains with no request waiting for completion. This simplified scheme provides a fix to the unbalanced decrements introduced by 2d29c9f89fcd. Since this error was also caused by lack of comments on this non-trivial issue, this commit also adds related comments. Fixes: 2d29c9f89fcd ("block, bfq: improve asymmetric scenarios detection") Reported-by: Steven Barrett Tested-by: Steven Barrett Tested-by: Lucjan Lucjanov Reviewed-by: Federico Motta Signed-off-by: Paolo Valente Signed-off-by: Jens Axboe Signed-off-by: Yu Kuai Signed-off-by: Greg Kroah-Hartman

block, bfq: fix asymmetric scenarios detection

2021-12-29T11:20:42Z

commit 98fa7a3e001b21fb47c08af4304f40a3b0535cbd upstream. Since commit 2d29c9f89fcd ("block, bfq: improve asymmetric scenarios detection"), a scenario is defined asymmetric when one of the following conditions holds: - active bfq_queues have different weights - one or more group of entities (bfq_queue or other groups of entities) are active bfq grants fairness and low latency also in such asymmetric scenarios, by plugging the dispatching of I/O if the bfq_queue in service happens to be temporarily idle. This plugging may lower throughput, so it is important to do it only when strictly needed. By mistake, in commit '2d29c9f89fcd' ("block, bfq: improve asymmetric scenarios detection") the num_active_groups counter was firstly incremented and subsequently decremented at any entity (group or bfq_queue) weight change. This is useless, because only transitions from active to inactive and vice versa matter for that counter. Unfortunately this is also incorrect in the following case: the entity at issue is a bfq_queue and it is under weight raising. In fact in this case there is a spurious increment of the num_active_groups counter. This spurious increment may cause scenarios to be wrongly detected as asymmetric, thus causing useless plugging and loss of throughput. This commit fixes this issue by simply removing the above useless and wrong increments and decrements. Fixes: 2d29c9f89fcd ("block, bfq: improve asymmetric scenarios detection") Tested-by: Oleksandr Natalenko Signed-off-by: Federico Motta Signed-off-by: Paolo Valente Signed-off-by: Jens Axboe Signed-off-by: Yu Kuai Signed-off-by: Greg Kroah-Hartman

block, bfq: improve asymmetric scenarios detection

2021-12-29T11:20:42Z

commit 2d29c9f89fcd9bf408fcdaaf515c90a169f22ecd upstream. bfq defines as asymmetric a scenario where an active entity, say E (representing either a single bfq_queue or a group of other entities), has a higher weight than some other entities. If the entity E does sync I/O in such a scenario, then bfq plugs the dispatch of the I/O of the other entities in the following situation: E is in service but temporarily has no pending I/O request. In fact, without this plugging, all the times that E stops being temporarily idle, it may find the internal queues of the storage device already filled with an out-of-control number of extra requests, from other entities. So E may have to wait for the service of these extra requests, before finally having its own requests served. This may easily break service guarantees, with E getting less than its fair share of the device throughput. Usually, the end result is that E gets the same fraction of the throughput as the other entities, instead of getting more, according to its higher weight. Yet there are two other more subtle cases where E, even if its weight is actually equal to or even lower than the weight of any other active entities, may get less than its fair share of the throughput in case the above I/O plugging is not performed: 1. other entities issue larger requests than E; 2. other entities contain more active child entities than E (or in general tend to have more backlog than E). In the first case, other entities may get more service than E because they get larger requests, than those of E, served during the temporary idle periods of E. In the second case, other entities get more service because, by having many child entities, they have many requests ready for dispatching while E is temporarily idle. This commit addresses this issue by extending the definition of asymmetric scenario: a scenario is asymmetric when - active entities representing bfq_queues have differentiated weights, as in the original definition or (inclusive) - one or more entities representing groups of entities are active. This broader definition makes sure that I/O plugging will be performed in all the above cases, provided that there is at least one active group. Of course, this definition is very coarse, so it will trigger I/O plugging also in cases where it is not needed, such as, e.g., multiple active entities with just one child each, and all with the same I/O-request size. The reason for this coarse definition is just that a finer-grained definition would be rather heavy to compute. On the opposite end, even this new definition does not trigger I/O plugging in all cases where there is no active group, and all bfq_queues have the same weight. So, in these cases some unfairness may occur if there are asymmetries in I/O-request sizes. We made this choice because I/O plugging may lower throughput, and probably a user that has not created any group cares more about throughput than about perfect fairness. At any rate, as for possible applications that may care about service guarantees, bfq already guarantees a high responsiveness and a low latency to soft real-time applications automatically. Signed-off-by: Federico Motta Signed-off-by: Paolo Valente Signed-off-by: Jens Axboe Signed-off-by: Yu Kuai Signed-off-by: Greg Kroah-Hartman

block: fix ioprio_get(IOPRIO_WHO_PGRP) vs setuid(2)

2021-12-14T09:18:07Z

commit e6a59aac8a8713f335a37d762db0dbe80e7f6d38 upstream. do_each_pid_thread(PIDTYPE_PGID) can race with a concurrent change_pid(PIDTYPE_PGID) that can move the task from one hlist to another while iterating. Serialize ioprio_get to take the tasklist_lock in this case, just like it's set counterpart. Fixes: d69b78ba1de (ioprio: grab rcu_read_lock in sys_ioprio_{set,get}()) Acked-by: Oleg Nesterov Signed-off-by: Davidlohr Bueso Link: https://lore.kernel.org/r/20211210182058.43417-1-dave@stgolabs.net Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman

Revert "block, bfq: honor already-setup queue merges"

2021-10-06T13:31:23Z

[ Upstream commit ebc69e897e17373fbe1daaff1debaa77583a5284 ] This reverts commit 2d52c58b9c9bdae0ca3df6a1eab5745ab3f7d80b. We have had several folks complain that this causes hangs for them, which is especially problematic as the commit has also hit stable already. As no resolution seems to be forthcoming right now, revert the patch. Link: https://bugzilla.kernel.org/show_bug.cgi?id=214503 Fixes: 2d52c58b9c9b ("block, bfq: honor already-setup queue merges") Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin

blk-throttle: fix UAF by deleteing timer in blk_throtl_exit()

2021-09-26T11:39:49Z

[ Upstream commit 884f0e84f1e3195b801319c8ec3d5774e9bf2710 ] The pending timer has been set up in blk_throtl_init(). However, the timer is not deleted in blk_throtl_exit(). This means that the timer handler may still be running after freeing the timer, which would result in a use-after-free. Fix by calling del_timer_sync() to delete the timer in blk_throtl_exit(). Signed-off-by: Li Jinlin Link: https://lore.kernel.org/r/20210907121242.2885564-1-lijinlin3@huawei.com Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin

block, bfq: honor already-setup queue merges

2021-09-22T09:48:13Z

[ Upstream commit 2d52c58b9c9bdae0ca3df6a1eab5745ab3f7d80b ] The function bfq_setup_merge prepares the merging between two bfq_queues, say bfqq and new_bfqq. To this goal, it assigns bfqq->new_bfqq = new_bfqq. Then, each time some I/O for bfqq arrives, the process that generated that I/O is disassociated from bfqq and associated with new_bfqq (merging is actually a redirection). In this respect, bfq_setup_merge increases new_bfqq->ref in advance, adding the number of processes that are expected to be associated with new_bfqq. Unfortunately, the stable-merging mechanism interferes with this setup. After bfqq->new_bfqq has been set by bfq_setup_merge, and before all the expected processes have been associated with bfqq->new_bfqq, bfqq may happen to be stably merged with a different queue than the current bfqq->new_bfqq. In this case, bfqq->new_bfqq gets changed. So, some of the processes that have been already accounted for in the ref counter of the previous new_bfqq will not be associated with that queue. This creates an unbalance, because those references will never be decremented. This commit fixes this issue by reestablishing the previous, natural behaviour: once bfqq->new_bfqq has been set, it will not be changed until all expected redirections have occurred. Signed-off-by: Davide Zini Signed-off-by: Paolo Valente Link: https://lore.kernel.org/r/20210802141352.74353-2-paolo.valente@linaro.org Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin

block: bfq: fix bfq_set_next_ioprio_data()

2021-09-22T09:47:59Z

commit a680dd72ec336b81511e3bff48efac6dbfa563e7 upstream. For a request that has a priority level equal to or larger than IOPRIO_BE_NR, bfq_set_next_ioprio_data() prints a critical warning but defaults to setting the request new_ioprio field to IOPRIO_BE_NR. This is not consistent with the warning and the allowed values for priority levels. Fix this by setting the request new_ioprio field to IOPRIO_BE_NR - 1, the lowest priority level allowed. Cc: Fixes: aee69d78dec0 ("block, bfq: introduce the BFQ-v0 I/O scheduler as an extra scheduler") Signed-off-by: Damien Le Moal Reviewed-by: Hannes Reinecke Link: https://lore.kernel.org/r/20210811033702.368488-2-damien.lemoal@wdc.com Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman