user/sven/linux.git/mm/memcontrol.c, branch v5.7.4

mm, memcg: fix error return value of mem_cgroup_css_alloc()

2020-05-08T02:27:20Z

When I run my memcg testcase which creates lots of memcgs, I found there're unexpected out of memory logs while there're still enough available free memory. The error log is mkdir: cannot create directory 'foo.65533': Cannot allocate memory The reason is when we try to create more than MEM_CGROUP_ID_MAX memcgs, an -ENOMEM errno will be set by mem_cgroup_css_alloc(), but the right errno should be -ENOSPC "No space left on device", which is an appropriate errno for userspace's failed mkdir. As the errno really misled me, we should make it right. After this patch, the error log will be mkdir: cannot create directory 'foo.65533': No space left on device [akpm@linux-foundation.org: s/EBUSY/ENOSPC/, per Michal] [akpm@linux-foundation.org: s/EBUSY/ENOSPC/, per Michal] Fixes: 73f576c04b94 ("mm: memcontrol: fix cgroup creation failure after many small jobs") Suggested-by: Matthew Wilcox Signed-off-by: Yafang Shao Signed-off-by: Andrew Morton Acked-by: Michal Hocko Acked-by: Johannes Weiner Cc: Vladimir Davydov Link: http://lkml.kernel.org/r/20200407063621.GA18914@dhcp22.suse.cz Link: http://lkml.kernel.org/r/1586192163-20099-1-git-send-email-laoar.shao@gmail.com Signed-off-by: Linus Torvalds

mm, memcg: do not high throttle allocators based on wraparound

2020-04-10T22:36:20Z

If a cgroup violates its memory.high constraints, we may end up unduly penalising it. For example, for the following hierarchy: A: max high, 20 usage A/B: 9 high, 10 usage A/C: max high, 10 usage We would end up doing the following calculation below when calculating high delay for A/B: A/B: 10 - 9 = 1... A: 20 - PAGE_COUNTER_MAX = 21, so set max_overage to 21. This gets worse with higher disparities in usage in the parent. I have no idea how this disappeared from the final version of the patch, but it is certainly Not Good(tm). This wasn't obvious in testing because, for a simple cgroup hierarchy with only one child, the result is usually roughly the same. It's only in more complex hierarchies that things go really awry (although still, the effects are limited to a maximum of 2 seconds in schedule_timeout_killable at a maximum). [chris@chrisdown.name: changelog] Fixes: e26733e0d0ec ("mm, memcg: throttle allocators based on ancestral memory.high") Signed-off-by: Jakub Kicinski Signed-off-by: Chris Down Signed-off-by: Andrew Morton Acked-by: Michal Hocko Cc: Johannes Weiner Cc: [5.4.x] Link: http://lkml.kernel.org/r/20200331152424.GA1019937@chrisdown.name Signed-off-by: Linus Torvalds

mm: use fallthrough;

2020-04-07T17:43:41Z

Convert the various /* fallthrough */ comments to the pseudo-keyword fallthrough; Done via script: https://lore.kernel.org/lkml/b56602fcf79f849e733e7b521bb0e17895d390fa.1582230379.git.joe@perches.com/ Signed-off-by: Joe Perches Signed-off-by: Andrew Morton Reviewed-by: Gustavo A. R. Silva Link: http://lkml.kernel.org/r/f62fea5d10eb0ccfc05d87c242a620c261219b66.camel@perches.com Signed-off-by: Linus Torvalds

mm, memcg: bypass high reclaim iteration for cgroup hierarchy root

2020-04-07T17:43:37Z

The root of the hierarchy cannot have high set, so we will never reclaim based on it. This makes that clearer and avoids another entry. Signed-off-by: Chris Down Signed-off-by: Andrew Morton Acked-by: Johannes Weiner Cc: Tejun Heo Cc: Roman Gushchin Cc: Michal Hocko Link: http://lkml.kernel.org/r/20200312164137.GA1753625@chrisdown.name Signed-off-by: Linus Torvalds

mm: memcg: make memory.oom.group tolerable to task migration

2020-04-02T16:35:29Z

If a task is getting moved out of the OOMing cgroup, it might result in unexpected OOM killings if memory.oom.group is used anywhere in the cgroup tree. Imagine the following example: A (oom.group = 1) / \ (OOM) B C Let's say B's memory.max is exceeded and it's OOMing. The OOM killer selects a task in B as a victim, but someone asynchronously moves the task into C. mem_cgroup_get_oom_group() will iterate over all ancestors of C up to the root cgroup. In theory it had to stop at the oom_domain level - the memory cgroup which is OOMing. But because B is not an ancestor of C, it's not happening. Instead it chooses A (because it's oom.group is set), and kills all tasks in A. This behavior is wrong because the OOM happened in B, so there is no reason to kill anything outside. Fix this by checking it the memory cgroup to which the task belongs is a descendant of the oom_domain. If not, memory.oom.group should be ignored, and the OOM killer should kill only the victim task. Reported-by: Dan Schatzberg Signed-off-by: Roman Gushchin Signed-off-by: Andrew Morton Acked-by: Michal Hocko Acked-by: Johannes Weiner Link: http://lkml.kernel.org/r/20200316223510.3176148-1-guro@fb.com Signed-off-by: Linus Torvalds

mm, memcg: prevent mem_cgroup_protected store tearing

2020-04-02T16:35:29Z

The read side of this is all protected, but we can still tear if multiple iterations of mem_cgroup_protected are going at the same time. There's some intentional racing in mem_cgroup_protected which is ok, but load/store tearing should be avoided. Signed-off-by: Chris Down Signed-off-by: Andrew Morton Acked-by: Michal Hocko Cc: Johannes Weiner Cc: Roman Gushchin Cc: Tejun Heo Link: http://lkml.kernel.org/r/d1e9fbc0379fe8db475d82c8b6fbe048876e12ae.1584034301.git.chris@chrisdown.name Signed-off-by: Linus Torvalds

mm, memcg: prevent memory.swap.max load tearing

2020-04-02T16:35:29Z

The write side of this is xchg()/smp_mb(), so that's all good. Just a few sites missing a READ_ONCE. Signed-off-by: Chris Down Signed-off-by: Andrew Morton Acked-by: Michal Hocko Cc: Johannes Weiner Cc: Roman Gushchin Cc: Tejun Heo Link: http://lkml.kernel.org/r/bbec2c3d822217334855c8877a9d28b2a6d395fb.1584034301.git.chris@chrisdown.name Signed-off-by: Linus Torvalds

mm, memcg: prevent memory.min load/store tearing

2020-04-02T16:35:29Z

This can be set concurrently with reads, which may cause the wrong value to be propagated. Signed-off-by: Chris Down Signed-off-by: Andrew Morton Acked-by: Michal Hocko Cc: Johannes Weiner Cc: Roman Gushchin Cc: Tejun Heo Link: http://lkml.kernel.org/r/e809b4e6b0c1626dac6945970de06409a180ee65.1584034301.git.chris@chrisdown.name Signed-off-by: Linus Torvalds

mm, memcg: prevent memory.max load tearing

2020-04-02T16:35:28Z

This one is a bit more nuanced because we have memcg_max_mutex, which is mostly just used for enforcing invariants, but we still need to READ_ONCE since (despite its name) it doesn't really protect memory.max access. On write (page_counter_set_max() and memory_max_write()) we use xchg(), which uses smp_mb(), so that's already fine. Signed-off-by: Chris Down Signed-off-by: Andrew Morton Acked-by: Michal Hocko Cc: Johannes Weiner Cc: Roman Gushchin Cc: Tejun Heo Link: http://lkml.kernel.org/r/50a31e5f39f8ae6c8fb73966ba1455f0924e8f44.1584034301.git.chris@chrisdown.name Signed-off-by: Linus Torvalds

mm, memcg: prevent memory.high load/store tearing

2020-04-02T16:35:28Z

A mem_cgroup's high attribute can be concurrently set at the same time as we are trying to read it -- for example, if we are in memory_high_write at the same time as we are trying to do high reclaim. Signed-off-by: Chris Down Signed-off-by: Andrew Morton Acked-by: Michal Hocko Cc: Johannes Weiner Cc: Roman Gushchin Cc: Tejun Heo Link: http://lkml.kernel.org/r/2f66f7038ed1d4688e59de72b627ae0ea52efa83.1584034301.git.chris@chrisdown.name Signed-off-by: Linus Torvalds