<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/linux/memcontrol.h, branch v6.17.10</title>
<subtitle>Linux Kernel</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.17.10</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.17.10'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2025-10-19T14:37:00Z</updated>
<entry>
<title>memcg: skip cgroup_file_notify if spinning is not allowed</title>
<updated>2025-10-19T14:37:00Z</updated>
<author>
<name>Shakeel Butt</name>
<email>shakeel.butt@linux.dev</email>
</author>
<published>2025-09-22T22:02:03Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9d1a250a7303ad274901aab0e289f6555c549666'/>
<id>urn:sha1:9d1a250a7303ad274901aab0e289f6555c549666</id>
<content type='text'>
commit fcc0669c5aa681994c507b50f1c706c969d99730 upstream.

Generally, memcg charging is allowed from all contexts, including NMI,
where even spinning on a spinlock can cause locking issues.  However,
one call chain was missed when support for memcg charging from any
context was added: try_charge_memcg() -&gt; memcg_memory_event() -&gt;
cgroup_file_notify().

The possible function call tree under cgroup_file_notify() can acquire
many different spin locks in spinning mode, among them
cgroup_file_kn_lock, kernfs_notify_lock, and pool_workqueue's lock.  So,
let's just skip cgroup_file_notify() from memcg charging if the context
does not allow spinning.

An alternative approach was also explored where, instead of skipping
cgroup_file_notify(), the memcg event processing is deferred to irq_work
[1].  However, it adds complexity, and it was decided to keep things
simple until more memcg events need to support the !allow_spinning
requirement.
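
A minimal sketch of the shape of the fix (hedged: the helper name and
fields follow include/linux/memcontrol.h, but the real function carries
more logic, e.g. local and hierarchical events):

  static inline void __memcg_memory_event(struct mem_cgroup *memcg,
                                          enum memcg_memory_event event,
                                          bool allow_spinning)
  {
          atomic_long_inc(&amp;memcg-&gt;memory_events[event]);

          /* cgroup_file_notify() may spin on cgroup_file_kn_lock,
           * kernfs_notify_lock or a pool_workqueue lock, so skip it
           * when the calling context (e.g. NMI) cannot spin. */
          if (allow_spinning)
                  cgroup_file_notify(&amp;memcg-&gt;events_file);
  }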

Link: https://lore.kernel.org/all/5qi2llyzf7gklncflo6gxoozljbm4h3tpnuv4u4ej4ztysvi6f@x44v7nz2wdzd/ [1]
Link: https://lkml.kernel.org/r/20250922220203.261714-1-shakeel.butt@linux.dev
Fixes: 3ac4638a734a ("memcg: make memcg_rstat_updated nmi safe")
Signed-off-by: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Closes: https://lore.kernel.org/all/20250905061919.439648-1-yepeilin@google.com/
Cc: Alexei Starovoitov &lt;ast@kernel.org&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Kumar Kartikeya Dwivedi &lt;memxor@gmail.com&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Peilin Ye &lt;yepeilin@google.com&gt;
Cc: Roman Gushchin &lt;roman.gushchin@linux.dev&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mptcp: Fix up subflow's memcg when CONFIG_SOCK_CGROUP_DATA=n.</title>
<updated>2025-10-15T10:03:44Z</updated>
<author>
<name>Kuniyuki Iwashima</name>
<email>kuniyu@google.com</email>
</author>
<published>2025-08-15T20:16:09Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8102fd165c220c5b0b3bfd948c225e1cc00b93c9'/>
<id>urn:sha1:8102fd165c220c5b0b3bfd948c225e1cc00b93c9</id>
<content type='text'>
[ Upstream commit 68889dfd547bd8eabc5a98b58475d7b901cf5129 ]

When sk_alloc() allocates a socket, mem_cgroup_sk_alloc() sets
sk-&gt;sk_memcg based on the current task.

MPTCP subflow socket creation is triggered from userspace or
an in-kernel worker.

In the latter case, sk-&gt;sk_memcg is not what we want.  So, we fix
it up from the parent socket's sk-&gt;sk_memcg in mptcp_attach_cgroup().

Although the code is placed under #ifdef CONFIG_MEMCG, it is buried
under #ifdef CONFIG_SOCK_CGROUP_DATA.

The two configs are orthogonal.  If CONFIG_MEMCG is enabled without
CONFIG_SOCK_CGROUP_DATA, the subflow's memory usage is not charged
correctly.

Let's move the code out of the wrong ifdef guard.

Note that sk-&gt;sk_memcg is freed in sk_prot_free(), and the parent
sk holds a refcnt on memcg-&gt;css here, so we don't need to use
css_tryget().
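
A hedged sketch of the fix-up in mptcp_attach_cgroup() (simplified;
variable names are assumed from the description above):

  #ifdef CONFIG_MEMCG
          /* The subflow was created by the in-kernel worker and thus
           * charged to the worker's memcg; redirect it to the parent
           * socket's memcg.  The parent sk already holds a css
           * refcnt, hence plain css_get() instead of css_tryget(). */
          if (sk-&gt;sk_memcg != parent-&gt;sk_memcg) {
                  mem_cgroup_sk_free(sk);
                  if (parent-&gt;sk_memcg)
                          css_get(&amp;parent-&gt;sk_memcg-&gt;css);
                  sk-&gt;sk_memcg = parent-&gt;sk_memcg;
          }
  #endif /* CONFIG_MEMCG */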

Fixes: 3764b0c5651e3 ("mptcp: attach subflow socket to parent cgroup")
Signed-off-by: Kuniyuki Iwashima &lt;kuniyu@google.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Acked-by: Matthieu Baerts (NGI0) &lt;matttbe@kernel.org&gt;
Acked-by: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Link: https://patch.msgid.link/20250815201712.1745332-2-kuniyu@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>memcg: convert memcg-&gt;socket_pressure to u64</title>
<updated>2025-07-25T02:12:32Z</updated>
<author>
<name>Kuniyuki Iwashima</name>
<email>kuniyu@google.com</email>
</author>
<published>2025-07-17T19:46:43Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=378bdb97405a00bf03d8d993435c83add1688e36'/>
<id>urn:sha1:378bdb97405a00bf03d8d993435c83add1688e36</id>
<content type='text'>
memcg-&gt;socket_pressure is initialised with jiffies when the memcg is
created.

Once vmpressure detects that the cgroup is under memory pressure, the
field is updated with jiffies + HZ to signal the fact to the socket layer
and suppress memory allocation for one second.

Otherwise, the field is not updated.

mem_cgroup_under_socket_pressure() uses time_before() to check if jiffies
is less than memcg-&gt;socket_pressure, and this has a bug on 32-bit kernel.

  if (time_before(jiffies, memcg-&gt;socket_pressure))
          return true;

As time_before() casts the final result to long, the acceptable delta
between two timestamps is 2 ^ (BITS_PER_LONG - 1).

On 32-bit kernel with CONFIG_HZ=1000, this is about 24 days.

  &gt;&gt;&gt; (2 ** 31) / 1000 / 60 / 60 / 24
  24.855134814814818

Once 24 days have passed since the last update of socket_pressure,
mem_cgroup_under_socket_pressure() starts to lie until the next 24 days
pass.

We don't need to worry about this on 64-bit machines unless they serve for
300 million years.

  &gt;&gt;&gt; (2 ** 63) / 1000 / 60 / 60 / 24 / 365
  292471208.6775361

Let's convert memcg-&gt;socket_pressure to u64.
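
With the field widened, the check can compare full 64-bit values; a
sketch of the resulting test (assuming get_jiffies_64() on the read
side):

  /* u64 jiffies take ~300 million years to wrap at HZ=1000, so a
   * plain comparison is safe on both 32-bit and 64-bit kernels. */
  if (get_jiffies_64() &lt; READ_ONCE(memcg-&gt;socket_pressure))
          return true;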

Performance testing:

I don't have a real 32-bit machine, so this is a result on QEMU, but
with and without the u64 jiffies patch, the time spent in
mem_cgroup_under_socket_pressure() was 1~5us and I didn't see any
measurable delta.

no patch applied:
iperf3   273 [000]   137.296248:
probe:mem_cgroup_under_socket_pressure: (c13660d0)
                c13660d1 mem_cgroup_under_socket_pressure+0x1
([kernel.kallsyms])
iperf3   273 [000]   137.296249:
probe:mem_cgroup_under_socket_pressure__return: (c13660d0 &lt;- c1d8fd7f)
iperf3   273 [000]   137.296251:
probe:mem_cgroup_under_socket_pressure: (c13660d0)
                c13660d1 mem_cgroup_under_socket_pressure+0x1
([kernel.kallsyms])
iperf3   273 [000]   137.296253:
probe:mem_cgroup_under_socket_pressure__return: (c13660d0 &lt;- c1d8fd7f)


u64 jiffies patch applied:
iperf3   308 [001]   330.669370:
probe:mem_cgroup_under_socket_pressure: (c12ddba0)
                c12ddba1 mem_cgroup_under_socket_pressure+0x1
([kernel.kallsyms])
iperf3   308 [001]   330.669371:
probe:mem_cgroup_under_socket_pressure__return: (c12ddba0 &lt;- c1ce98bf)
iperf3   308 [001]   330.669382:
probe:mem_cgroup_under_socket_pressure: (c12ddba0)
                c12ddba1 mem_cgroup_under_socket_pressure+0x1
([kernel.kallsyms])
iperf3   308 [001]   330.669384:
probe:mem_cgroup_under_socket_pressure__return: (c12ddba0 &lt;- c1ce98bf)

So the u64 approach is good enough.

Link: https://lkml.kernel.org/r/20250717194645.1096500-1-kuniyu@google.com
Fixes: 8e8ae645249b ("mm: memcontrol: hook up vmpressure to socket pressure")
Signed-off-by: Kuniyuki Iwashima &lt;kuniyu@google.com&gt;
Reported-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Suggested-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Acked-by: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: David S. Miller &lt;davem@davemloft.net&gt;
Cc: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Vladimir Davydov &lt;vdavydov.dev@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>memcg: nmi safe memcg stats for specific archs</title>
<updated>2025-06-01T05:46:08Z</updated>
<author>
<name>Shakeel Butt</name>
<email>shakeel.butt@linux.dev</email>
</author>
<published>2025-05-19T06:31:39Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=940b01fc8dc1aead398819215650727cb9e7335e'/>
<id>urn:sha1:940b01fc8dc1aead398819215650727cb9e7335e</id>
<content type='text'>
There are archs which have NMI but do not support this_cpu_* ops safely
in NMI context, while they do support safe atomic ops in NMI context.
For such archs, let's add infra to use atomic ops for the memcg stats
which can be updated in NMI context.

At the moment, the memcg stats which get updated in the objcg charging
path are MEMCG_KMEM, NR_SLAB_RECLAIMABLE_B &amp; NR_SLAB_UNRECLAIMABLE_B.
Rather than adding support for all memcg stats to be NMI safe, let's
just add the infra to make these three stats NMI safe, which is what
this patch does.
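
A hedged sketch of the idea (field names here are illustrative, and the
real infra is gated on an arch capability config rather than a plain
in_nmi() check):

  /* Stats updated from NMI go through atomics on archs whose
   * this_cpu_* ops are not NMI safe. */
  if (likely(!in_nmi()))
          this_cpu_add(memcg-&gt;vmstats_percpu-&gt;state[i], val);
  else
          atomic64_add(val, &amp;memcg-&gt;stat_nmi[i]);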

Link: https://lkml.kernel.org/r/20250519063142.111219-3-shakeel.butt@linux.dev
Signed-off-by: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Alexei Starovoitov &lt;ast@kernel.org&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Roman Gushchin &lt;roman.gushchin@linux.dev&gt;
Cc: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>memcg: make count_memcg_events re-entrant safe against irqs</title>
<updated>2025-05-22T21:55:38Z</updated>
<author>
<name>Shakeel Butt</name>
<email>shakeel.butt@linux.dev</email>
</author>
<published>2025-05-14T18:41:55Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e52401e7247bb36cabba389ae32fb75a12a6e94b'/>
<id>urn:sha1:e52401e7247bb36cabba389ae32fb75a12a6e94b</id>
<content type='text'>
Let's make count_memcg_events re-entrant safe against irqs.  The only
thing needed is to convert the usage of __this_cpu_add() to
this_cpu_add().  In addition, with re-entrant safety, there is no need
to disable irqs.  Also add a warning for in_nmi(), as the function is
not safe against NMI context.
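
The core of the conversion is small; a hedged sketch of the pattern
(the per-cpu field name is approximate):

  /* irq re-entrant, so no need to disable irqs around the update;
   * still not NMI safe, hence the warning. */
  WARN_ON_ONCE(in_nmi());
  this_cpu_add(memcg-&gt;vmstats_percpu-&gt;events[i], count);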

Link: https://lkml.kernel.org/r/20250514184158.3471331-5-shakeel.butt@linux.dev
Signed-off-by: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Alexei Starovoitov &lt;ast@kernel.org&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Roman Gushchin &lt;roman.gushchin@linux.dev&gt;
Cc: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>memcg: make mod_memcg_state re-entrant safe against irqs</title>
<updated>2025-05-22T21:55:38Z</updated>
<author>
<name>Shakeel Butt</name>
<email>shakeel.butt@linux.dev</email>
</author>
<published>2025-05-14T18:41:54Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8814e3b8692b31e0150a894dd70c14ca0b7b746a'/>
<id>urn:sha1:8814e3b8692b31e0150a894dd70c14ca0b7b746a</id>
<content type='text'>
Let's make mod_memcg_state re-entrant safe against irqs.  The only thing
needed is to convert the usage of __this_cpu_add() to this_cpu_add().
In addition, with re-entrant safety, there is no need to disable irqs.

mod_memcg_state() is not safe against NMI, so let's add a warning if
someone tries to call it in NMI context.
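
A hedged sketch of what re-entrancy buys callers that previously had to
run the update with irqs disabled:

  /* before: the __-prefixed variant required irqs to be disabled */
  local_irq_save(flags);
  __mod_memcg_state(memcg, idx, val);
  local_irq_restore(flags);

  /* after: re-entrant against irqs; NMI context still warns */
  mod_memcg_state(memcg, idx, val);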

Link: https://lkml.kernel.org/r/20250514184158.3471331-4-shakeel.butt@linux.dev
Signed-off-by: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Alexei Starovoitov &lt;ast@kernel.org&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Roman Gushchin &lt;roman.gushchin@linux.dev&gt;
Cc: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/memcg: move mem_cgroup_init() ahead of cgroup_init()</title>
<updated>2025-05-13T06:50:34Z</updated>
<author>
<name>Huan Yang</name>
<email>link@vivo.com</email>
</author>
<published>2025-04-25T03:19:23Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=bc9817bb7a21f64fbca2c4b83811d943036ec870'/>
<id>urn:sha1:bc9817bb7a21f64fbca2c4b83811d943036ec870</id>
<content type='text'>
Patch series "Use kmem_cache for memcg alloc", v3.

(willy tldr: "you've gone from allocating 8 objects per 32KiB to
allocating 13 objects per 32KiB, a 62% improvement in memory consumption"
[1])


The mem_cgroup_alloc function creates the mem_cgroup struct and its
associated structures, including mem_cgroup_per_node.  Through detailed
analysis on our test machine (Arm64, 16GB RAM, 6.6 kernel, 1 NUMA node,
memcgv2 with nokmem,nosocket,cgroup_disable=pressure), we can observe the
memory allocation for these structures using the following shell commands:

  # Enable tracing
  echo 1 &gt; /sys/kernel/tracing/events/kmem/kmalloc/enable
  echo 1 &gt; /sys/kernel/tracing/tracing_on
  cat /sys/kernel/tracing/trace_pipe | grep kmalloc | grep mem_cgroup

  # Trigger allocation if the cgroup subtree does not have memcg enabled
  echo +memory &gt; /sys/fs/cgroup/cgroup.subtree_control

Ftrace Output:

  # mem_cgroup struct allocation
  sh-6312    [000] ..... 58015.698365: kmalloc:
    call_site=mem_cgroup_css_alloc+0xd8/0x5b4
    ptr=000000003e4c3799 bytes_req=2312 bytes_alloc=4096
    gfp_flags=GFP_KERNEL|__GFP_ZERO node=-1 accounted=false

  # mem_cgroup_per_node allocation
  sh-6312    [000] ..... 58015.698389: kmalloc:
    call_site=mem_cgroup_css_alloc+0x1d8/0x5b4
    ptr=00000000d798700c bytes_req=2896 bytes_alloc=4096
    gfp_flags=GFP_KERNEL|__GFP_ZERO node=0 accounted=false

Key Observations:

  1. Both structures use kmalloc with requested sizes between 2KB-4KB
  2. Allocation alignment forces 4KB slab usage due to pre-defined sizes
     (64B, 128B,..., 2KB, 4KB, 8KB)
  3. Memory waste per memcg instance:
      Base struct: 4096 - 2312 = 1784 bytes
      Per-node struct: 4096 - 2896 = 1200 bytes
      Total waste: 2984 bytes (1-node system)
      NUMA scaling: (1200 + 8) * nr_node_ids bytes

So, some memory is wasted.

This patchset introduces dedicated kmem_cache:
  Patch2 - mem_cgroup kmem_cache - memcg_cachep
  Patch3 - mem_cgroup_per_node kmem_cache - memcg_pn_cachep

The benefits of this change can be observed with the following tracing
commands:

  # Enable tracing
  echo 1 &gt; /sys/kernel/tracing/events/kmem/kmem_cache_alloc/enable
  echo 1 &gt; /sys/kernel/tracing/tracing_on
  cat /sys/kernel/tracing/trace_pipe | grep kmem_cache_alloc | grep mem_cgroup
  # In another terminal:
  echo +memory &gt; /sys/fs/cgroup/cgroup.subtree_control


The output might now look like this:

  # mem_cgroup struct allocation
  sh-9827     [000] .....   289.513598: kmem_cache_alloc:
    call_site=mem_cgroup_css_alloc+0xbc/0x5d4 ptr=00000000695c1806
    bytes_req=2312 bytes_alloc=2368 gfp_flags=GFP_KERNEL|__GFP_ZERO node=-1
    accounted=false
  # mem_cgroup_per_node allocation
  sh-9827     [000] .....   289.513602: kmem_cache_alloc:
    call_site=mem_cgroup_css_alloc+0x1b8/0x5d4 ptr=000000002989e63a
    bytes_req=2896 bytes_alloc=2944 gfp_flags=GFP_KERNEL|__GFP_ZERO node=0
    accounted=false

This indicates that the `mem_cgroup` struct now requests 2312 bytes and is
allocated 2368 bytes, while `mem_cgroup_per_node` requests 2896 bytes and
is allocated 2944 bytes.  The slight increase in allocated size is due to
`SLAB_HWCACHE_ALIGN` in the `kmem_cache`.

Without `SLAB_HWCACHE_ALIGN`, the allocation might appear as:

  # mem_cgroup struct allocation
  sh-9269     [003] .....    80.396366: kmem_cache_alloc:
    call_site=mem_cgroup_css_alloc+0xbc/0x5d4 ptr=000000005b12b475
    bytes_req=2312 bytes_alloc=2312 gfp_flags=GFP_KERNEL|__GFP_ZERO node=-1
    accounted=false

  # mem_cgroup_per_node allocation
  sh-9269     [003] .....    80.396411: kmem_cache_alloc:
    call_site=mem_cgroup_css_alloc+0x1b8/0x5d4 ptr=00000000f347adc6
    bytes_req=2896 bytes_alloc=2896 gfp_flags=GFP_KERNEL|__GFP_ZERO node=0
    accounted=false

While the `bytes_alloc` now matches the `bytes_req`, this patchset
defaults to using `SLAB_HWCACHE_ALIGN` as it is generally considered more
beneficial for performance.  Please let me know if there are any issues or
if I've misunderstood anything.
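
A hedged sketch of the dedicated caches (cache names are from this
series; sizing is simplified, since mem_cgroup actually embeds a
flexible per-node array and the real code sizes the cache accordingly):

  memcg_cachep = kmem_cache_create("mem_cgroup",
                                   sizeof(struct mem_cgroup), 0,
                                   SLAB_HWCACHE_ALIGN, NULL);
  memcg_pn_cachep = kmem_cache_create("mem_cgroup_per_node",
                                      sizeof(struct mem_cgroup_per_node),
                                      0, SLAB_HWCACHE_ALIGN, NULL);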

This patchset also moves mem_cgroup_init() ahead of cgroup_init(),
because cgroup_init() allocates root_mem_cgroup while initcalls only run
after cgroup_init(); if the kmem_caches were not prepared by then, we
would need to test for NULL before using them.


This patch (of 3):

When cgroup_init() creates root_mem_cgroup through css_alloc callback,
some critical resources might not be fully initialized, forcing later
operations to perform conditional checks for resource availability.

This patch moves mem_cgroup_init() to address the init order: it is
invoked before cgroup_init(), so, compared to a subsys_initcall, it can
be used to prepare key resources before root_mem_cgroup is allocated.
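
A sketch of the intended ordering (hedged; the exact call site in
start_kernel() may differ):

  /* init/main.c, simplified: the memcg caches must exist before
   * cgroup_init() allocates root_mem_cgroup via css_alloc. */
  mem_cgroup_init();
  cgroup_init();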

Link: https://lkml.kernel.org/r/aAsRCj-niMMTtmK8@casper.infradead.org [1]
Link: https://lkml.kernel.org/r/20250425031935.76411-1-link@vivo.com
Link: https://lkml.kernel.org/r/20250425031935.76411-2-link@vivo.com
Signed-off-by: Huan Yang &lt;link@vivo.com&gt;
Suggested-by: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Acked-by: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Francesco Valla &lt;francesco@valla.it&gt;
Cc: guoweikang &lt;guoweikang.kernel@gmail.com&gt;
Cc: Huang Shijie &lt;shijie@os.amperecomputing.com&gt;
Cc: KP Singh &lt;kpsingh@kernel.org&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: "Paul E . McKenney" &lt;paulmck@kernel.org&gt;
Cc: Petr Mladek &lt;pmladek@suse.com&gt;
Cc: Rasmus Villemoes &lt;linux@rasmusvillemoes.dk&gt;
Cc: Raul E Rangel &lt;rrangel@chromium.org&gt;
Cc: Roman Gushchin &lt;roman.gushchin@linux.dev&gt;
Cc: "Uladzislau Rezki (Sony)" &lt;urezki@gmail.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>vmscan,cgroup: apply mems_effective to reclaim</title>
<updated>2025-05-13T06:50:33Z</updated>
<author>
<name>Gregory Price</name>
<email>gourry@gourry.net</email>
</author>
<published>2025-04-24T20:28:06Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7d709f49babc28907b0ac60228f522d2e6216add'/>
<id>urn:sha1:7d709f49babc28907b0ac60228f522d2e6216add</id>
<content type='text'>
It is possible for a reclaimer to cause demotions of an lruvec belonging
to a cgroup with cpuset.mems set to exclude some nodes.  Attempt to apply
this limitation based on the lruvec's memcg and prevent demotion.

Notably, this may still allow demotion of shared libraries or any memory
first instantiated in another cgroup.  This means cpusets still cannot
guarantee complete isolation when demotion is enabled, and the docs have
been updated to reflect this.

This is useful for isolating workloads on a multi-tenant system from
certain classes of memory more consistently - with the noted exceptions.

Note on locking:

The cgroup_get_e_css reference protects the css-&gt;effective_mems, and calls
of this interface would be subject to the same race conditions associated
with a non-atomic access to cs-&gt;effective_mems.

So while this interface cannot make strong guarantees of correctness, it
can avoid taking a global lock or rcu_read_lock, for performance.
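
A hedged sketch of how the reclaim path might consult the limitation
(the helper name and signature here are assumptions, not the exact
interface):

  /* Skip demotion target nodes that the lruvec's cgroup excludes
   * via its effective cpuset.mems. */
  if (!cpuset_node_allowed(memcg-&gt;css.cgroup, target_nid))
          continue;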

Link: https://lkml.kernel.org/r/20250424202806.52632-3-gourry@gourry.net
Signed-off-by: Gregory Price &lt;gourry@gourry.net&gt;
Suggested-by: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Suggested-by: Waiman Long &lt;longman@redhat.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Reviewed-by: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Reviewed-by: Waiman Long &lt;longman@redhat.com&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Michal Koutný &lt;mkoutny@suse.com&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Roman Gushchin &lt;roman.gushchin@linux.dev&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>fs/proc/page: refactor to reduce code duplication</title>
<updated>2025-05-12T00:48:20Z</updated>
<author>
<name>Liu Ye</name>
<email>liuye@kylinos.cn</email>
</author>
<published>2025-03-18T06:32:26Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a40b3fa844b4c29276a228ce878bd427a3149d37'/>
<id>urn:sha1:a40b3fa844b4c29276a228ce878bd427a3149d37</id>
<content type='text'>
kpageflags_read() and kpagecgroup_read() are quite similar to
kpagecount_read().  Refactor common code into a helper function to reduce
code duplication.
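
A hedged sketch of the resulting shape (the helper and enum names are
illustrative):

  enum kpage_operation { KPAGE_FLAGS, KPAGE_COUNT, KPAGE_CGROUP };

  static ssize_t kpageflags_read(struct file *file, char __user *buf,
                                 size_t count, loff_t *ppos)
  {
          /* the shared helper walks the pfn range and emits one
           * u64 per page, selected by the requested operation */
          return kpage_read(file, buf, count, ppos, KPAGE_FLAGS);
  }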

Link: https://lkml.kernel.org/r/20250318063226.223284-1-liuyerd@163.com
Signed-off-by: Liu Ye &lt;liuye@kylinos.cn&gt;
Acked-by: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Ran Xiaokai &lt;ran.xiaokai@zte.com.cn&gt;
Cc: Roman Gushchin &lt;roman.gushchin@linux.dev&gt;
Cc: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Cc: Svetly Todorov &lt;svetly.todorov@memverge.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: simplify folio_memcg_charged()</title>
<updated>2025-03-22T05:03:12Z</updated>
<author>
<name>Matthew Wilcox (Oracle)</name>
<email>willy@infradead.org</email>
</author>
<published>2025-03-14T13:36:14Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8492936abb49eda26e00eab01e58d7e69f2355ba'/>
<id>urn:sha1:8492936abb49eda26e00eab01e58d7e69f2355ba</id>
<content type='text'>
There's no need to check which kind of pointer is in the memcg_data field,
all we actually care about is whether it's zero or not.  Saves 70 bytes in
workingset_activation() with the Debian config.
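
A sketch of the simplified check, consistent with the description above:

  static inline bool folio_memcg_charged(struct folio *folio)
  {
          /* any non-zero memcg_data (a memcg or objcg pointer,
           * possibly with flag bits) means the folio is charged */
          return folio-&gt;memcg_data != 0;
  }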

Link: https://lkml.kernel.org/r/20250314133617.138071-5-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Acked-by: Roman Gushchin &lt;roman.gushchin@linux.dev&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Zi Yan &lt;ziy@nvidia.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
</feed>
