<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/linux/mmzone.h, branch v6.6.5</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.6.5</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.6.5'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2023-08-18T17:12:12Z</updated>
<entry>
<title>mm: remove obsolete comment above struct per_cpu_pages</title>
<updated>2023-08-18T17:12:12Z</updated>
<author>
<name>Miaohe Lin</name>
<email>linmiaohe@huawei.com</email>
</author>
<published>2023-07-06T09:24:41Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8f21912a4bf854e51b3ba69298f559f976d63685'/>
<id>urn:sha1:8f21912a4bf854e51b3ba69298f559f976d63685</id>
<content type='text'>
Since commit 01b44456a7aa ("mm/page_alloc: replace local_lock with normal
spinlock"), per_cpu_pages is protected by normal spinlock.  Remove the
obsolete comment as it's not that helpful.

Link: https://lkml.kernel.org/r/20230706092441.1574950-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin &lt;linmiaohe@huawei.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'mm-stable-2023-06-24-19-15' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm</title>
<updated>2023-06-28T17:28:11Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-06-28T17:28:11Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=6e17c6de3ddf3073741d9c91a796ee696914d8a0'/>
<id>urn:sha1:6e17c6de3ddf3073741d9c91a796ee696914d8a0</id>
<content type='text'>
Pull mm updates from Andrew Morton:

 - Yosry Ahmed brought back some cgroup v1 stats in OOM logs

 - Yosry has also eliminated cgroup's atomic rstat flushing

 - Nhat Pham adds the new cachestat() syscall. It provides userspace
   with the ability to query pagecache status - a similar concept to
   mincore() but more powerful and with improved usability

 - Mel Gorman provides more optimizations for compaction, reducing the
   prevalence of page rescanning

 - Lorenzo Stoakes has done some maintenance work on the
   get_user_pages() interface

 - Liam Howlett continues with cleanups and maintenance work to the
   maple tree code. Peng Zhang also does some work on maple tree

 - Johannes Weiner has done some cleanup work on the compaction code

 - David Hildenbrand has contributed additional selftests for
   get_user_pages()

 - Thomas Gleixner has contributed some maintenance and optimization
   work for the vmalloc code

 - Baolin Wang has provided some compaction cleanups

 - SeongJae Park continues maintenance work on the DAMON code

 - Huang Ying has done some maintenance on the swap code's usage of
   device refcounting

 - Christoph Hellwig has some cleanups for the filemap/directio code

 - Ryan Roberts provides two patch series which yield some
   rationalization of the kernel's access to pte entries - use the
   provided APIs rather than open-coding accesses

 - Lorenzo Stoakes has some fixes to the interaction between pagecache
   and directio access to file mappings

 - John Hubbard has a series of fixes to the MM selftesting code

 - ZhangPeng continues the folio conversion campaign

 - Hugh Dickins has been working on the pagetable handling code, mainly
   with a view to reducing the load on the mmap_lock

 - Catalin Marinas has reduced the arm64 kmalloc() minimum alignment
   from 128 to 8

 - Domenico Cerasuolo has improved the zswap reclaim mechanism by
   reorganizing the LRU management

 - Matthew Wilcox provides some fixups to make gfs2 work better with the
   buffer_head code

 - Vishal Moola also has done some folio conversion work

 - Matthew Wilcox has removed the remnants of the pagevec code - their
   functionality is migrated over to struct folio_batch

* tag 'mm-stable-2023-06-24-19-15' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (380 commits)
  mm/hugetlb: remove hugetlb_set_page_subpool()
  mm: nommu: correct the range of mmap_sem_read_lock in task_mem()
  hugetlb: revert use of page_cache_next_miss()
  Revert "page cache: fix page_cache_next/prev_miss off by one"
  mm/vmscan: fix root proactive reclaim unthrottling unbalanced node
  mm: memcg: rename and document global_reclaim()
  mm: kill [add|del]_page_to_lru_list()
  mm: compaction: convert to use a folio in isolate_migratepages_block()
  mm: zswap: fix double invalidate with exclusive loads
  mm: remove unnecessary pagevec includes
  mm: remove references to pagevec
  mm: rename invalidate_mapping_pagevec to mapping_try_invalidate
  mm: remove struct pagevec
  net: convert sunrpc from pagevec to folio_batch
  i915: convert i915_gpu_error to use a folio_batch
  pagevec: rename fbatch_count()
  mm: remove check_move_unevictable_pages()
  drm: convert drm_gem_put_pages() to use a folio_batch
  i915: convert shmem_sg_free_table() to use a folio_batch
  scatterlist: add sg_set_folio()
  ...
</content>
</entry>
<entry>
<title>mm/vmscan: fix root proactive reclaim unthrottling unbalanced node</title>
<updated>2023-06-23T23:59:32Z</updated>
<author>
<name>Yosry Ahmed</name>
<email>yosryahmed@google.com</email>
</author>
<published>2023-06-21T02:31:01Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1bc545bff45ce9eefc176ccf663074462a209cb6'/>
<id>urn:sha1:1bc545bff45ce9eefc176ccf663074462a209cb6</id>
<content type='text'>
When memory.reclaim was introduced, it became the first case where
cgroup_reclaim() is true for the root cgroup.  Johannes concluded [1] that
for most cases this is okay, except for one case.  Historically, kswapd
would throttle reclaim on a node if a lot of pages marked for reclaim are
under writeback (aka the node is congested).  This occurred by setting
LRUVEC_CONGESTED bit in lruvec-&gt;flags.  The bit would be cleared when the
node is balanced.

Similarly, cgroup reclaim would set the same bit when an lruvec is
congested, and clear it on the way out of reclaim (to throttle local
reclaimers).

Before the introduction of memory.reclaim, the root memcg was the only
target of kswapd reclaim, and non-root memcgs were the only targets of
cgroup reclaim, so they would never interfere.  Using the same bit for
both was fine.  After memory.reclaim, it is possible for cgroup reclaim on
the root cgroup to clear the bit set by kswapd.  This would result in
reclaim on the node being unthrottled before the node is balanced.

Fix this by introducing separate bits for cgroup-level and node-level
congestion.  kswapd can unthrottle an lruvec that is marked as congested
by cgroup reclaim (as the entire node should no longer be congested), but
not vice versa (to prevent premature unthrottling before the entire node
is balanced).
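
A minimal sketch of the rule above, in Python rather than kernel C.  It is
an illustration under stated assumptions, not the patch itself: the flag
names and helper functions are hypothetical, and a set of strings stands in
for the bits in the lruvec flags.

```python
# Illustrative model only: a set of strings stands in for the flag bits.
# All names below are assumptions made for this sketch.
CGROUP_CONGESTED = "cgroup_congested"
NODE_CONGESTED = "node_congested"


def mark_congested(flags, by_kswapd):
    """Set the bit that belongs to the reclaimer that saw congestion."""
    flags.add(NODE_CONGESTED if by_kswapd else CGROUP_CONGESTED)


def finish_cgroup_reclaim(flags):
    """Cgroup reclaim clears only its own bit on the way out."""
    flags.discard(CGROUP_CONGESTED)


def node_balanced(flags):
    """kswapd clears both bits once the whole node is balanced."""
    flags.discard(NODE_CONGESTED)
    flags.discard(CGROUP_CONGESTED)


flags = set()
mark_congested(flags, by_kswapd=True)    # kswapd sees a congested node
mark_congested(flags, by_kswapd=False)   # root cgroup reclaim sees it too
finish_cgroup_reclaim(flags)             # cgroup reclaim exits
node_bit_survived = NODE_CONGESTED in flags
node_balanced(flags)                     # kswapd finishes balancing
```

With a single shared bit, the finish_cgroup_reclaim() step would have
cleared the congestion state that kswapd still relied on; with two bits the
node-level state survives until node_balanced() runs.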

[1] https://lore.kernel.org/lkml/20230405200150.GA35884@cmpxchg.org/

Link: https://lkml.kernel.org/r/20230621023101.432780-1-yosryahmed@google.com
Signed-off-by: Yosry Ahmed &lt;yosryahmed@google.com&gt;
Reported-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Closes: https://lore.kernel.org/lkml/20230405200150.GA35884@cmpxchg.org/
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Roman Gushchin &lt;roman.gushchin@linux.dev&gt;
Cc: Shakeel Butt &lt;shakeelb@google.com&gt;
Cc: Muchun Song &lt;songmuchun@bytedance.com&gt;
Cc: Yu Zhao &lt;yuzhao@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mmzone: introduce folio_migratetype()</title>
<updated>2023-06-19T23:19:34Z</updated>
<author>
<name>Vishal Moola (Oracle)</name>
<email>vishal.moola@gmail.com</email>
</author>
<published>2023-06-14T02:13:09Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=28fb54f6a2fd6cc471165cce1650a57dfbf49746'/>
<id>urn:sha1:28fb54f6a2fd6cc471165cce1650a57dfbf49746</id>
<content type='text'>
Introduce folio_migratetype() as a folio equivalent for
get_pageblock_migratetype().  This function is intended to return the
migratetype the folio is located in, hence the name choice.

Link: https://lkml.kernel.org/r/20230614021312.34085-3-vishal.moola@gmail.com
Signed-off-by: Vishal Moola (Oracle) &lt;vishal.moola@gmail.com&gt;
Reviewed-by: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mmzone: introduce folio_is_zone_movable()</title>
<updated>2023-06-19T23:19:34Z</updated>
<author>
<name>Vishal Moola (Oracle)</name>
<email>vishal.moola@gmail.com</email>
</author>
<published>2023-06-14T02:13:08Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=708ff4914dfb410761227a219c17c3e9dbd68c05'/>
<id>urn:sha1:708ff4914dfb410761227a219c17c3e9dbd68c05</id>
<content type='text'>
Patch series "Replace is_longterm_pinnable_page()", v2.

This patchset introduces some more helper functions for the folio
conversions, and converts all callers of is_longterm_pinnable_page() to
use folios.


This patch (of 5):

Introduce folio_is_zone_movable() to act as a folio equivalent for
is_zone_movable_page().  This is to assist in later folio conversions.

Link: https://lkml.kernel.org/r/20230614021312.34085-1-vishal.moola@gmail.com
Link: https://lkml.kernel.org/r/20230614021312.34085-2-vishal.moola@gmail.com
Signed-off-by: Vishal Moola (Oracle) &lt;vishal.moola@gmail.com&gt;
Reviewed-by: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: multi-gen LRU: cleanup lru_gen_soft_reclaim()</title>
<updated>2023-06-09T23:25:39Z</updated>
<author>
<name>T.J. Alumbaugh</name>
<email>talumbau@google.com</email>
</author>
<published>2023-05-22T11:20:56Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5c7e7a0d79072eb02780a2c0dee730b23cde711d'/>
<id>urn:sha1:5c7e7a0d79072eb02780a2c0dee730b23cde711d</id>
<content type='text'>
lru_gen_soft_reclaim() gets the lruvec from the memcg and node ID to keep a
cleaner interface on the caller side.

Link: https://lkml.kernel.org/r/20230522112058.2965866-2-talumbau@google.com
Signed-off-by: T.J. Alumbaugh &lt;talumbau@google.com&gt;
Reviewed-by: Yuanchu Xie &lt;yuanchu@google.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Yu Zhao &lt;yuzhao@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: page_alloc: move sysctls into its own file</title>
<updated>2023-06-09T23:25:24Z</updated>
<author>
<name>Kefeng Wang</name>
<email>wangkefeng.wang@huawei.com</email>
</author>
<published>2023-05-16T06:38:20Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e95d372c4cd46b6ec4eeacc07adcb7260ab4cfa0'/>
<id>urn:sha1:e95d372c4cd46b6ec4eeacc07adcb7260ab4cfa0</id>
<content type='text'>
Move all page-alloc-related sysctls to their own file, as part of the
kernel/sysctl.c spring cleaning.  Also move some function declarations
from mm.h into internal.h.

Link: https://lkml.kernel.org/r/20230516063821.121844-13-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang &lt;wangkefeng.wang@huawei.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: "Huang, Ying" &lt;ying.huang@intel.com&gt;
Cc: Iurii Zaikin &lt;yzaikin@google.com&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Len Brown &lt;len.brown@intel.com&gt;
Cc: Luis Chamberlain &lt;mcgrof@kernel.org&gt;
Cc: Mike Rapoport (IBM) &lt;rppt@kernel.org&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Pavel Machek &lt;pavel@ucw.cz&gt;
Cc: Rafael J. Wysocki &lt;rafael@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: Add support for unaccepted memory</title>
<updated>2023-06-06T14:38:22Z</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2023-06-06T14:26:29Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=dcdfdd40fa82b6704d2841938e5c8ec3051eb0d6'/>
<id>urn:sha1:dcdfdd40fa82b6704d2841938e5c8ec3051eb0d6</id>
<content type='text'>
UEFI Specification version 2.9 introduces the concept of memory
acceptance. Some Virtual Machine platforms, such as Intel TDX or AMD
SEV-SNP, require memory to be accepted before it can be used by the
guest. Accepting happens via a protocol specific to the Virtual Machine
platform.

There are several ways the kernel can deal with unaccepted memory:

 1. Accept all the memory during boot. It is easy to implement and it
    doesn't have runtime cost once the system is booted. The downside is
    very long boot time.

    Acceptance can be parallelized across multiple CPUs to keep it
    manageable (i.e. via DEFERRED_STRUCT_PAGE_INIT), but it tends to
    saturate memory bandwidth and does not scale beyond a certain point.

 2. Accept a block of memory on the first use. It requires more
    infrastructure and changes in page allocator to make it work, but
    it provides good boot time.

    On-demand memory acceptance means latency spikes every time the kernel
    steps onto a new memory block. The spikes will go away once the
    workload's data set size stabilizes or all memory has been accepted.

 3. Accept all memory in background. Introduce a thread (or multiple)
    that gets memory accepted proactively. It will minimize the time the
    system experiences latency spikes on memory allocation while keeping
    boot time low.

    This approach cannot function on its own. It is an extension of #2:
    background memory acceptance requires a functional scheduler, but the
    page allocator may need to tap into unaccepted memory before that.

    The downside of the approach is that these threads also steal CPU
    cycles and memory bandwidth from the user's workload and may hurt
    user experience.

Implement #1 and #2 for now. #2 is the default. Some workloads may want
to use #1 with accept_memory=eager on the kernel command line. #3 can be
implemented later based on user demand.

Support of unaccepted memory requires a few changes in core-mm code:

  - memblock accepts memory on allocation. It serves early boot memory
    allocations and doesn't limit them to the pre-accepted pool of memory.

  - page allocator accepts memory on the first allocation of the page.
    When the kernel runs out of accepted memory, it accepts memory until
    the high watermark is reached. It helps to minimize fragmentation.
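
A minimal sketch of the on-demand behaviour described above, in Python
rather than kernel C.  It is a toy model under stated assumptions: the Zone
class, its fields and its methods are hypothetical names invented for this
illustration, and it only counts pages instead of tracking real memory.

```python
class Zone:
    """Toy zone that only counts pages; all names are hypothetical."""

    def __init__(self, unaccepted_blocks, block_pages, high_watermark):
        self.unaccepted_blocks = unaccepted_blocks  # blocks not yet accepted
        self.block_pages = block_pages              # pages per block
        self.high_watermark = high_watermark        # target free pages
        self.free_pages = 0                         # accepted and free

    def accept_until_high_watermark(self):
        """Accept whole blocks until free pages reach the watermark."""
        deficit = max(0, self.high_watermark - self.free_pages)
        needed = -(-deficit // self.block_pages)    # ceiling division
        taken = min(needed, self.unaccepted_blocks)
        self.unaccepted_blocks -= taken
        self.free_pages += taken * self.block_pages
        return taken

    def alloc_page(self):
        """Allocate one page, accepting memory only when none is free."""
        if self.free_pages == 0:
            self.accept_until_high_watermark()
        if self.free_pages == 0:
            return False                            # nothing left to accept
        self.free_pages -= 1
        return True


zone = Zone(unaccepted_blocks=8, block_pages=4, high_watermark=10)
first_alloc_ok = zone.alloc_page()
```

In this model the first allocation pays the acceptance cost for three
blocks at once, and later allocations are served from the accepted pool
until it runs dry again.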

EFI code will provide two helpers if the platform supports unaccepted
memory:

 - accept_memory() makes a range of physical addresses accepted.

 - range_contains_unaccepted_memory() checks whether anything within the
   range of physical addresses requires acceptance.

Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Reviewed-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Acked-by: Mike Rapoport &lt;rppt@linux.ibm.com&gt;	# memblock
Link: https://lore.kernel.org/r/20230606142637.5171-2-kirill.shutemov@linux.intel.com
</content>
</entry>
<entry>
<title>mm: Multi-gen LRU: remove wait_event_killable()</title>
<updated>2023-04-18T23:30:11Z</updated>
<author>
<name>Kalesh Singh</name>
<email>kaleshsingh@google.com</email>
</author>
<published>2023-04-13T21:43:26Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7f63cf2d9b9bbe7b90f808927558a66ff737d399'/>
<id>urn:sha1:7f63cf2d9b9bbe7b90f808927558a66ff737d399</id>
<content type='text'>
Android 14 and later default to MGLRU [1] and field telemetry showed
occasional long tail latency (&gt;100ms) in the reclaim path.

Tracing revealed priority inversion in the reclaim path.  In
try_to_inc_max_seq(), when high priority tasks were blocked on
wait_event_killable(), the preemption of the low priority task to call
wake_up_all() caused those high priority tasks to wait longer than
necessary.  In general, this problem is not different from others of its
kind, e.g., one caused by mutex_lock().  However, it is specific to MGLRU
because it introduced the new wait queue lruvec-&gt;mm_state.wait.

The purpose of this new wait queue is to avoid the thundering herd
problem.  If many direct reclaimers rush into try_to_inc_max_seq(), only
one can succeed, i.e., the one to wake up the rest, and the rest who
failed might cause premature OOM kills if they do not wait.  So far there
is no evidence supporting this scenario, based on how often the wait has
been hit.  This raises the question of how useful the wait queue is in
practice.

Based on Minchan's recommendation, which is in line with his commit
6d4675e60135 ("mm: don't be stuck to rmap lock on reclaim path") and the
rest of the MGLRU code which also uses trylock when possible, remove the
wait queue.

[1] https://android-review.googlesource.com/q/I7ed7fbfd6ef9ce10053347528125dd98c39e50bf

Link: https://lkml.kernel.org/r/20230413214326.2147568-1-kaleshsingh@google.com
Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks")
Signed-off-by: Kalesh Singh &lt;kaleshsingh@google.com&gt;
Suggested-by: Minchan Kim &lt;minchan@kernel.org&gt;
Reported-by: Wei Wang &lt;wvw@google.com&gt;
Acked-by: Yu Zhao &lt;yuzhao@google.com&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Jan Alexander Steffens (heftig) &lt;heftig@archlinux.org&gt;
Cc: Oleksandr Natalenko &lt;oleksandr@natalenko.name&gt;
Cc: Suleiman Souhlal &lt;suleiman@google.com&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: move free_area_empty() to mm/internal.h</title>
<updated>2023-04-18T23:29:47Z</updated>
<author>
<name>Mike Rapoport (IBM)</name>
<email>rppt@kernel.org</email>
</author>
<published>2023-03-26T16:02:15Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=62f31bd4dcedffe3c919deb76ed65bf62c3cf80b'/>
<id>urn:sha1:62f31bd4dcedffe3c919deb76ed65bf62c3cf80b</id>
<content type='text'>
The free_area_empty() helper is only used inside mm/, so move it there to
reduce noise in include/linux/mmzone.h.

Link: https://lkml.kernel.org/r/20230326160215.2674531-1-rppt@kernel.org
Signed-off-by: Mike Rapoport (IBM) &lt;rppt@kernel.org&gt;
Suggested-by: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
</feed>
