<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include, branch v6.12.4</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.12.4</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.12.4'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2024-12-09T09:41:12Z</updated>
<entry>
<title>drm/panic: Fix uninitialized spinlock acquisition with CONFIG_DRM_PANIC=n</title>
<updated>2024-12-09T09:41:12Z</updated>
<author>
<name>Lyude Paul</name>
<email>lyude@redhat.com</email>
</author>
<published>2024-09-16T23:00:08Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=73292e8e61874d1e56df6684da99ab6c9696198b'/>
<id>urn:sha1:73292e8e61874d1e56df6684da99ab6c9696198b</id>
<content type='text'>
commit 319e53f155907cf2c6dabc16ec9dce0179bc04d1 upstream.

It turns out that if you happen to have a kernel config where
CONFIG_DRM_PANIC is disabled and spinlock debugging is enabled, along with
KMS being enabled - we'll end up trying to acquire an uninitialized
spin_lock with drm_panic_lock() when we try to do a commit:

  rvkms rvkms.0: [drm:drm_atomic_commit] committing 0000000068d2ade1
  INFO: trying to register non-static key.
  The code is fine but needs lockdep annotation, or maybe
  you didn't initialize this object before use?
  turning off the locking correctness validator.
  CPU: 4 PID: 1347 Comm: modprobe Not tainted 6.10.0-rc1Lyude-Test+ #272
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20240524-3.fc40 05/24/2024
  Call Trace:
   &lt;TASK&gt;
   dump_stack_lvl+0x77/0xa0
   assign_lock_key+0x114/0x120
   register_lock_class+0xa8/0x2c0
   __lock_acquire+0x7d/0x2bd0
   ? __vmap_pages_range_noflush+0x3a8/0x550
   ? drm_atomic_helper_swap_state+0x2ad/0x3a0
   lock_acquire+0xec/0x290
   ? drm_atomic_helper_swap_state+0x2ad/0x3a0
   ? lock_release+0xee/0x310
   _raw_spin_lock_irqsave+0x4e/0x70
   ? drm_atomic_helper_swap_state+0x2ad/0x3a0
   drm_atomic_helper_swap_state+0x2ad/0x3a0
   drm_atomic_helper_commit+0xb1/0x270
   drm_atomic_commit+0xaf/0xe0
   ? __pfx___drm_printfn_info+0x10/0x10
   drm_client_modeset_commit_atomic+0x1a1/0x250
   drm_client_modeset_commit_locked+0x4b/0x180
   drm_client_modeset_commit+0x27/0x50
   __drm_fb_helper_restore_fbdev_mode_unlocked+0x76/0x90
   drm_fb_helper_set_par+0x38/0x40
   fbcon_init+0x3c4/0x690
   visual_init+0xc0/0x120
   do_bind_con_driver+0x409/0x4c0
   do_take_over_console+0x233/0x280
   do_fb_registered+0x11f/0x210
   fbcon_fb_registered+0x2c/0x60
   register_framebuffer+0x248/0x2a0
   __drm_fb_helper_initial_config_and_unlock+0x58a/0x720
   drm_fbdev_generic_client_hotplug+0x6e/0xb0
   drm_client_register+0x76/0xc0
   _RNvXs_CsHeezP08sTT_5rvkmsNtB4_5RvkmsNtNtCs1cdwasc6FUb_6kernel8platform6Driver5probe+0xed2/0x1060 [rvkms]
   ? _RNvMs_NtCs1cdwasc6FUb_6kernel8platformINtB4_7AdapterNtCsHeezP08sTT_5rvkms5RvkmsE14probe_callbackBQ_+0x2b/0x70 [rvkms]
   ? acpi_dev_pm_attach+0x25/0x110
   ? platform_probe+0x6a/0xa0
   ? really_probe+0x10b/0x400
   ? __driver_probe_device+0x7c/0x140
   ? driver_probe_device+0x22/0x1b0
   ? __device_attach_driver+0x13a/0x1c0
   ? __pfx___device_attach_driver+0x10/0x10
   ? bus_for_each_drv+0x114/0x170
   ? __device_attach+0xd6/0x1b0
   ? bus_probe_device+0x9e/0x120
   ? device_add+0x288/0x4b0
   ? platform_device_add+0x75/0x230
   ? platform_device_register_full+0x141/0x180
   ? rust_helper_platform_device_register_simple+0x85/0xb0
   ? _RNvMs2_NtCs1cdwasc6FUb_6kernel8platformNtB5_6Device13create_simple+0x1d/0x60
   ? _RNvXs0_CsHeezP08sTT_5rvkmsNtB5_5RvkmsNtCs1cdwasc6FUb_6kernel6Module4init+0x11e/0x160 [rvkms]
   ? 0xffffffffc083f000
   ? init_module+0x20/0x1000 [rvkms]
   ? kernfs_xattr_get+0x3e/0x80
   ? do_one_initcall+0x148/0x3f0
   ? __lock_acquire+0x5ef/0x2bd0
   ? __lock_acquire+0x5ef/0x2bd0
   ? __lock_acquire+0x5ef/0x2bd0
   ? put_cpu_partial+0x51/0x1d0
   ? lock_acquire+0xec/0x290
   ? put_cpu_partial+0x51/0x1d0
   ? lock_release+0xee/0x310
   ? put_cpu_partial+0x51/0x1d0
   ? fs_reclaim_acquire+0x69/0xf0
   ? lock_acquire+0xec/0x290
   ? fs_reclaim_acquire+0x69/0xf0
   ? kfree+0x22f/0x340
   ? lock_release+0xee/0x310
   ? kmalloc_trace_noprof+0x48/0x340
   ? do_init_module+0x22/0x240
   ? kmalloc_trace_noprof+0x155/0x340
   ? do_init_module+0x60/0x240
   ? __se_sys_finit_module+0x2e0/0x3f0
   ? do_syscall_64+0xa4/0x180
   ? syscall_exit_to_user_mode+0x108/0x140
   ? do_syscall_64+0xb0/0x180
   ? vma_end_read+0xd0/0xe0
   ? do_user_addr_fault+0x309/0x640
   ? clear_bhb_loop+0x45/0xa0
   ? clear_bhb_loop+0x45/0xa0
   ? clear_bhb_loop+0x45/0xa0
   ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
   &lt;/TASK&gt;

Fix this by stubbing these macros out when this config option isn't
enabled, along with fixing the unused variable warning that introduces.

Signed-off-by: Lyude Paul &lt;lyude@redhat.com&gt;
Reviewed-by: Daniel Vetter &lt;daniel.vetter@ffwll.ch&gt;
Fixes: e2a1cda3e0c7 ("drm/panic: Add drm panic locking")
Cc: &lt;stable@vger.kernel.org&gt; # v6.10+
Link: https://patchwork.freedesktop.org/patch/msgid/20240916230103.611490-1-lyude@redhat.com
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mm/vmalloc: combine all TLB flush operations of KASAN shadow virtual address into one operation</title>
<updated>2024-12-09T09:41:08Z</updated>
<author>
<name>Adrian Huang</name>
<email>ahuang12@lenovo.com</email>
</author>
<published>2024-07-26T16:52:46Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f9a18889aad9b4c19c6c4550c67ad4f9ed2a354f'/>
<id>urn:sha1:f9a18889aad9b4c19c6c4550c67ad4f9ed2a354f</id>
<content type='text'>
commit 9e9e085effe9b7e342138fde3cf8577d22509932 upstream.

When compiling kernel source 'make -j $(nproc)' with the up-and-running
KASAN-enabled kernel on a 256-core machine, the following soft lockup is
shown:

watchdog: BUG: soft lockup - CPU#28 stuck for 22s! [kworker/28:1:1760]
CPU: 28 PID: 1760 Comm: kworker/28:1 Kdump: loaded Not tainted 6.10.0-rc5 #95
Workqueue: events drain_vmap_area_work
RIP: 0010:smp_call_function_many_cond+0x1d8/0xbb0
Code: 38 c8 7c 08 84 c9 0f 85 49 08 00 00 8b 45 08 a8 01 74 2e 48 89 f1 49 89 f7 48 c1 e9 03 41 83 e7 07 4c 01 e9 41 83 c7 03 f3 90 &lt;0f&gt; b6 01 41 38 c7 7c 08 84 c0 0f 85 d4 06 00 00 8b 45 08 a8 01 75
RSP: 0018:ffffc9000cb3fb60 EFLAGS: 00000202
RAX: 0000000000000011 RBX: ffff8883bc4469c0 RCX: ffffed10776e9949
RDX: 0000000000000002 RSI: ffff8883bb74ca48 RDI: ffffffff8434dc50
RBP: ffff8883bb74ca40 R08: ffff888103585dc0 R09: ffff8884533a1800
R10: 0000000000000004 R11: ffffffffffffffff R12: ffffed1077888d39
R13: dffffc0000000000 R14: ffffed1077888d38 R15: 0000000000000003
FS:  0000000000000000(0000) GS:ffff8883bc400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005577b5c8d158 CR3: 0000000004850000 CR4: 0000000000350ef0
Call Trace:
 &lt;IRQ&gt;
 ? watchdog_timer_fn+0x2cd/0x390
 ? __pfx_watchdog_timer_fn+0x10/0x10
 ? __hrtimer_run_queues+0x300/0x6d0
 ? sched_clock_cpu+0x69/0x4e0
 ? __pfx___hrtimer_run_queues+0x10/0x10
 ? srso_return_thunk+0x5/0x5f
 ? ktime_get_update_offsets_now+0x7f/0x2a0
 ? srso_return_thunk+0x5/0x5f
 ? srso_return_thunk+0x5/0x5f
 ? hrtimer_interrupt+0x2ca/0x760
 ? __sysvec_apic_timer_interrupt+0x8c/0x2b0
 ? sysvec_apic_timer_interrupt+0x6a/0x90
 &lt;/IRQ&gt;
 &lt;TASK&gt;
 ? asm_sysvec_apic_timer_interrupt+0x16/0x20
 ? smp_call_function_many_cond+0x1d8/0xbb0
 ? __pfx_do_kernel_range_flush+0x10/0x10
 on_each_cpu_cond_mask+0x20/0x40
 flush_tlb_kernel_range+0x19b/0x250
 ? srso_return_thunk+0x5/0x5f
 ? kasan_release_vmalloc+0xa7/0xc0
 purge_vmap_node+0x357/0x820
 ? __pfx_purge_vmap_node+0x10/0x10
 __purge_vmap_area_lazy+0x5b8/0xa10
 drain_vmap_area_work+0x21/0x30
 process_one_work+0x661/0x10b0
 worker_thread+0x844/0x10e0
 ? srso_return_thunk+0x5/0x5f
 ? __kthread_parkme+0x82/0x140
 ? __pfx_worker_thread+0x10/0x10
 kthread+0x2a5/0x370
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x30/0x70
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1a/0x30
 &lt;/TASK&gt;

Debugging Analysis:

  1. The following ftrace log shows that the lockup CPU spends too much
     time iterating vmap_nodes and flushing TLB when purging vm_area
     structures. (Some info is trimmed).

     kworker: funcgraph_entry:              |  drain_vmap_area_work() {
     kworker: funcgraph_entry:              |   mutex_lock() {
     kworker: funcgraph_entry:  1.092 us    |     __cond_resched();
     kworker: funcgraph_exit:   3.306 us    |   }
     ...                                        ...
     kworker: funcgraph_entry:              |    flush_tlb_kernel_range() {
     ...                                          ...
     kworker: funcgraph_exit: # 7533.649 us |    }
     ...                                         ...
     kworker: funcgraph_entry:  2.344 us    |   mutex_unlock();
     kworker: funcgraph_exit: $ 23871554 us | }

     The drain_vmap_area_work() spends over 23 seconds.

     There are 2805 flush_tlb_kernel_range() calls in the ftrace log.
       * One is called in __purge_vmap_area_lazy().
       * Others are called by purge_vmap_node-&gt;kasan_release_vmalloc.
         purge_vmap_node() iteratively releases kasan vmalloc
         allocations and flushes TLB for each vmap_area.
           - [Rough calculation] Each flush_tlb_kernel_range() runs
             about 7.5ms.
               -- 2804 * 7.5ms = 21.03 seconds.
               -- That's why a soft lock is triggered.

  2. Extending the soft lockup time can work around the issue (For example,
     # echo 60 &gt; /proc/sys/kernel/watchdog_thresh). This confirms the
     above-mentioned speculation: drain_vmap_area_work() spends too much
     time.

If we combine all TLB flush operations of the KASAN shadow virtual
address into one operation in the call path
'purge_vmap_node()-&gt;kasan_release_vmalloc()', the running time of
drain_vmap_area_work() can be saved greatly. The idea is from the
flush_tlb_kernel_range() call in __purge_vmap_area_lazy(). And, the
soft lockup won't be triggered.

Here is the test result based on 6.10:

[6.10 wo/ the patch]
  1. ftrace latency profiling (record a trace if the latency &gt; 20s).
     echo 20000000 &gt; /sys/kernel/debug/tracing/tracing_thresh
     echo drain_vmap_area_work &gt; /sys/kernel/debug/tracing/set_graph_function
     echo function_graph &gt; /sys/kernel/debug/tracing/current_tracer
     echo 1 &gt; /sys/kernel/debug/tracing/tracing_on

  2. Run `make -j $(nproc)` to compile the kernel source

  3. Once the soft lockup is reproduced, check the ftrace log:
     cat /sys/kernel/debug/tracing/trace
        # tracer: function_graph
        #
        # CPU  DURATION                  FUNCTION CALLS
        # |     |   |                     |   |   |   |
          76) $ 50412985 us |    } /* __purge_vmap_area_lazy */
          76) $ 50412997 us |  } /* drain_vmap_area_work */
          76) $ 29165911 us |    } /* __purge_vmap_area_lazy */
          76) $ 29165926 us |  } /* drain_vmap_area_work */
          91) $ 53629423 us |    } /* __purge_vmap_area_lazy */
          91) $ 53629434 us |  } /* drain_vmap_area_work */
          91) $ 28121014 us |    } /* __purge_vmap_area_lazy */
          91) $ 28121026 us |  } /* drain_vmap_area_work */

[6.10 w/ the patch]
  1. Repeat step 1-2 in "[6.10 wo/ the patch]"

  2. The soft lockup is not triggered and ftrace log is empty.
     cat /sys/kernel/debug/tracing/trace
     # tracer: function_graph
     #
     # CPU  DURATION                  FUNCTION CALLS
     # |     |   |                     |   |   |   |

  3. Setting 'tracing_thresh' to 10/5 seconds does not get any ftrace
     log.

  4. Setting 'tracing_thresh' to 1 second gets ftrace log.
     cat /sys/kernel/debug/tracing/trace
     # tracer: function_graph
     #
     # CPU  DURATION                  FUNCTION CALLS
     # |     |   |                     |   |   |   |
       23) $ 1074942 us  |    } /* __purge_vmap_area_lazy */
       23) $ 1074950 us  |  } /* drain_vmap_area_work */

  The worst execution time of drain_vmap_area_work() is about 1 second.

Link: https://lore.kernel.org/lkml/ZqFlawuVnOMY2k3E@pc638.lan/
Link: https://lkml.kernel.org/r/20240726165246.31326-1-ahuang12@lenovo.com
Fixes: 282631cb2447 ("mm: vmalloc: remove global purge_vmap_area_root rb-tree")
Signed-off-by: Adrian Huang &lt;ahuang12@lenovo.com&gt;
Co-developed-by: Uladzislau Rezki (Sony) &lt;urezki@gmail.com&gt;
Signed-off-by: Uladzislau Rezki (Sony) &lt;urezki@gmail.com&gt;
Tested-by: Jiwei Sun &lt;sunjw10@lenovo.com&gt;
Reviewed-by: Baoquan He &lt;bhe@redhat.com&gt;
Cc: Alexander Potapenko &lt;glider@google.com&gt;
Cc: Andrey Konovalov &lt;andreyknvl@gmail.com&gt;
Cc: Andrey Ryabinin &lt;ryabinin.a.a@gmail.com&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Cc: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Cc: Vincenzo Frascino &lt;vincenzo.frascino@arm.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>util_macros.h: fix/rework find_closest() macros</title>
<updated>2024-12-09T09:41:06Z</updated>
<author>
<name>Alexandru Ardelean</name>
<email>aardelean@baylibre.com</email>
</author>
<published>2024-11-05T14:54:05Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3e5fbcbc388aa222ecde1e317713b74b516237d7'/>
<id>urn:sha1:3e5fbcbc388aa222ecde1e317713b74b516237d7</id>
<content type='text'>
commit bc73b4186736341ab5cd2c199da82db6e1134e13 upstream.

A bug was found in the find_closest() (find_closest_descending() is also
affected after some testing), where for certain values with small
progressions, the rounding (done by averaging 2 values) causes an
incorrect index to be returned.  The rounding issues occur for
progressions of 1, 2 and 3.  It goes away when the progression/interval
between two values is 4 or larger.

It's particularly bad for progressions of 1.  For example if there's an
array of 'a = { 1, 2, 3 }', using 'find_closest(2, a ...)' would return 0
(the index of '1'), rather than returning 1 (the index of '2').  This
means that for exact values (with a progression of 1), find_closest() will
misbehave and return the index of the value smaller than the one we're
searching for.

For progressions of 2 and 3, the exact values are obtained correctly; but
values aren't approximated correctly (as one would expect).  Starting with
progressions of 4, all seems to be good (one gets what one would expect).

While one could argue that 'find_closest()' should not be used for arrays
with progressions of 1 (i.e. '{1, 2, 3, ...}', the macro should still
behave correctly.

The bug was found while testing the 'drivers/iio/adc/ad7606.c',
specifically the oversampling feature.
For reference, the oversampling values are listed as:
   static const unsigned int ad7606_oversampling_avail[7] = {
          1, 2, 4, 8, 16, 32, 64,
   };

When doing:
  1. $ echo 1 &gt; /sys/bus/iio/devices/iio\:device0/oversampling_ratio
     $ cat /sys/bus/iio/devices/iio\:device0/oversampling_ratio
     1  # this is fine
  2. $ echo 2 &gt; /sys/bus/iio/devices/iio\:device0/oversampling_ratio
     $ cat /sys/bus/iio/devices/iio\:device0/oversampling_ratio
     1  # this is wrong; 2 should be returned here
  3. $ echo 3 &gt; /sys/bus/iio/devices/iio\:device0/oversampling_ratio
     $ cat /sys/bus/iio/devices/iio\:device0/oversampling_ratio
     2  # this is fine
  4. $ echo 4 &gt; /sys/bus/iio/devices/iio\:device0/oversampling_ratio
     $ cat /sys/bus/iio/devices/iio\:device0/oversampling_ratio
     4  # this is fine
And from here-on, the values are as correct (one gets what one would
expect.)

While writing a kunit test for this bug, a peculiar issue was found for the
array in the 'drivers/hwmon/ina2xx.c' &amp; 'drivers/iio/adc/ina2xx-adc.c'
drivers. While running the kunit test (for 'ina226_avg_tab' from these
drivers):
  * idx = find_closest([-1 to 2], ina226_avg_tab, ARRAY_SIZE(ina226_avg_tab));
    This returns idx == 0, so value.
  * idx = find_closest(3, ina226_avg_tab, ARRAY_SIZE(ina226_avg_tab));
    This returns idx == 0, value 1; and now one could argue whether 3 is
    closer to 4 or to 1. This quirk only appears for value '3' in this
    array, but it seems to be a another rounding issue.
  * And from 4 onwards the 'find_closest'() works fine (one gets what one
    would expect).

This change reworks the find_closest() macros to also check the difference
between the left and right elements when 'x'. If the distance to the right
is smaller (than the distance to the left), the index is incremented by 1.
This also makes redundant the need for using the DIV_ROUND_CLOSEST() macro.

In order to accommodate for any mix of negative + positive values, the
internal variables '__fc_x', '__fc_mid_x', '__fc_left' &amp; '__fc_right' are
forced to 'long' type. This also addresses any potential bugs/issues with
'x' being of an unsigned type. In those situations any comparison between
signed &amp; unsigned would be promoted to a comparison between 2 unsigned
numbers; this is especially annoying when '__fc_left' &amp; '__fc_right'
underflow.

The find_closest_descending() macro was also reworked and duplicated from
the find_closest(), and it is being iterated in reverse. The main reason
for this is to get the same indices as 'find_closest()' (but in reverse).
The comparison for '__fc_right &lt; __fc_left' favors going the array in
ascending order.
For example for array '{ 1024, 512, 256, 128, 64, 16, 4, 1 }' and x = 3, we
get:
    __fc_mid_x = 2
    __fc_left = -1
    __fc_right = -2
    Then '__fc_right &lt; __fc_left' evaluates to true and '__fc_i++' becomes 7
    which is not quite incorrect, but 3 is closer to 4 than to 1.

This change has been validated with the kunit from the next patch.

Link: https://lkml.kernel.org/r/20241105145406.554365-1-aardelean@baylibre.com
Fixes: 95d119528b0b ("util_macros.h: add find_closest() macro")
Signed-off-by: Alexandru Ardelean &lt;aardelean@baylibre.com&gt;
Cc: Bartosz Golaszewski &lt;bartosz.golaszewski@linaro.org&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>netkit: Add option for scrubbing skb meta data</title>
<updated>2024-12-09T09:40:56Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2024-10-04T10:13:31Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=596f7faa60b25a44314865b78ffb4fb75c941b7a'/>
<id>urn:sha1:596f7faa60b25a44314865b78ffb4fb75c941b7a</id>
<content type='text'>
commit 83134ef4609388f6b9ca31a384f531155196c2a7 upstream.

Jordan reported that when running Cilium with netkit in per-endpoint-routes
mode, network policy misclassifies traffic. In this direct routing mode
of Cilium which is used in case of GKE/EKS/AKS, the Pod's BPF program to
enforce policy sits on the netkit primary device's egress side.

The issue here is that in case of netkit's netkit_prep_forward(), it will
clear meta data such as skb-&gt;mark and skb-&gt;priority before executing the
BPF program. Thus, identity data stored in there from earlier BPF programs
(e.g. from tcx ingress on the physical device) gets cleared instead of
being made available for the primary's program to process. While for traffic
egressing the Pod via the peer device this might be desired, this is
different for the primary one where compared to tcx egress on the host
veth this information would be available.

To address this, add a new parameter for the device orchestration to
allow control of skb-&gt;mark and skb-&gt;priority scrubbing, to make the two
accessible from BPF (and eventually leave it up to the program to scrub).
By default, the current behavior is retained. For netkit peer this also
enables the use case where applications could cooperate/signal intent to
the BPF program.

Note that struct netkit has a 4 byte hole between policy and bundle which
is used here, in other words, struct netkit's first cacheline content used
in fast-path does not get moved around.

Fixes: 35dfaad7188c ("netkit, bpf: Add bpf programmable net device")
Reported-by: Jordan Rife &lt;jrife@google.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: Nikolay Aleksandrov &lt;razor@blackwall.org&gt;
Link: https://github.com/cilium/cilium/issues/34042
Acked-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Acked-by: Nikolay Aleksandrov &lt;razor@blackwall.org&gt;
Link: https://lore.kernel.org/r/20241004101335.117711-1-daniel@iogearbox.net
Signed-off-by: Martin KaFai Lau &lt;martin.lau@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>block: always verify unfreeze lock on the owner task</title>
<updated>2024-12-05T13:03:10Z</updated>
<author>
<name>Ming Lei</name>
<email>ming.lei@redhat.com</email>
</author>
<published>2024-10-31T13:37:19Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b12cfcae8a83140d949dfaad7b19c37275069f81'/>
<id>urn:sha1:b12cfcae8a83140d949dfaad7b19c37275069f81</id>
<content type='text'>
commit 6a78699838a0ddeed3620ddf50c1521f1fe1e811 upstream.

commit f1be1788a32e ("block: model freeze &amp; enter queue as lock for
supporting lockdep") tries to apply lockdep for verifying freeze &amp;
unfreeze. However, the verification is only done the outmost freeze and
unfreeze. This way is actually not correct because q-&gt;mq_freeze_depth
still may drop to zero on other task instead of the freeze owner task.

Fix this issue by always verifying the last unfreeze lock on the owner
task context, and make sure both the outmost freeze &amp; unfreeze are
verified in the current task.

Fixes: f1be1788a32e ("block: model freeze &amp; enter queue as lock for supporting lockdep")
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Link: https://lore.kernel.org/r/20241031133723.303835-4-ming.lei@redhat.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>Rename .data.once to .data..once to fix resetting WARN*_ONCE</title>
<updated>2024-12-05T13:03:08Z</updated>
<author>
<name>Masahiro Yamada</name>
<email>masahiroy@kernel.org</email>
</author>
<published>2024-11-06T16:14:41Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=48dd6c71976bc8c1b665731749522a7e42df3bbe'/>
<id>urn:sha1:48dd6c71976bc8c1b665731749522a7e42df3bbe</id>
<content type='text'>
[ Upstream commit dbefa1f31a91670c9e7dac9b559625336206466f ]

Commit b1fca27d384e ("kernel debug: support resetting WARN*_ONCE")
added support for clearing the state of once warnings. However,
it is not functional when CONFIG_LD_DEAD_CODE_DATA_ELIMINATION or
CONFIG_LTO_CLANG is enabled, because .data.once matches the
.data.[0-9a-zA-Z_]* pattern in the DATA_MAIN macro.

Commit cb87481ee89d ("kbuild: linker script do not match C names unless
LD_DEAD_CODE_DATA_ELIMINATION is configured") was introduced to suppress
the issue for the default CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=n case,
providing a minimal fix for stable backporting. We were aware this did
not address the issue for CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y. The
plan was to apply correct fixes and then revert cb87481ee89d. [1]

Seven years have passed since then, yet the #ifdef workaround remains in
place. Meanwhile, commit b1fca27d384e introduced the .data.once section,
and commit dc5723b02e52 ("kbuild: add support for Clang LTO") extended
the #ifdef.

Using a ".." separator in the section name fixes the issue for
CONFIG_LD_DEAD_CODE_DATA_ELIMINATION and CONFIG_LTO_CLANG.

[1]: https://lore.kernel.org/linux-kbuild/CAK7LNASck6BfdLnESxXUeECYL26yUDm0cwRZuM4gmaWUkxjL5g@mail.gmail.com/

Fixes: b1fca27d384e ("kernel debug: support resetting WARN*_ONCE")
Fixes: dc5723b02e52 ("kbuild: add support for Clang LTO")
Signed-off-by: Masahiro Yamada &lt;masahiroy@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>Rename .data.unlikely to .data..unlikely</title>
<updated>2024-12-05T13:03:07Z</updated>
<author>
<name>Masahiro Yamada</name>
<email>masahiroy@kernel.org</email>
</author>
<published>2024-11-06T16:14:40Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=cc7d10339d95bf6bdff1a56666fce6a556b0d500'/>
<id>urn:sha1:cc7d10339d95bf6bdff1a56666fce6a556b0d500</id>
<content type='text'>
[ Upstream commit bb43a59944f45e89aa158740b8a16ba8f0b0fa2b ]

Commit 7ccaba5314ca ("consolidate WARN_...ONCE() static variables")
was intended to collect all .data.unlikely sections into one chunk.
However, this has not worked when CONFIG_LD_DEAD_CODE_DATA_ELIMINATION
or CONFIG_LTO_CLANG is enabled, because .data.unlikely matches the
.data.[0-9a-zA-Z_]* pattern in the DATA_MAIN macro.

Commit cb87481ee89d ("kbuild: linker script do not match C names unless
LD_DEAD_CODE_DATA_ELIMINATION is configured") was introduced to suppress
the issue for the default CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=n case,
providing a minimal fix for stable backporting. We were aware this did
not address the issue for CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y. The
plan was to apply correct fixes and then revert cb87481ee89d. [1]

Seven years have passed since then, yet the #ifdef workaround remains in
place.

Using a ".." separator in the section name fixes the issue for
CONFIG_LD_DEAD_CODE_DATA_ELIMINATION and CONFIG_LTO_CLANG.

[1]: https://lore.kernel.org/linux-kbuild/CAK7LNASck6BfdLnESxXUeECYL26yUDm0cwRZuM4gmaWUkxjL5g@mail.gmail.com/

Fixes: cb87481ee89d ("kbuild: linker script do not match C names unless LD_DEAD_CODE_DATA_ELIMINATION is configured")
Signed-off-by: Masahiro Yamada &lt;masahiroy@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>block: return unsigned int from bdev_io_min</title>
<updated>2024-12-05T13:03:06Z</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2024-11-19T07:26:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5e15cc7a1d9335cc2e8b99ecffd975e22666c407'/>
<id>urn:sha1:5e15cc7a1d9335cc2e8b99ecffd975e22666c407</id>
<content type='text'>
[ Upstream commit 46fd48ab3ea3eb3bb215684bd66ea3d260b091a9 ]

The underlying limit is defined as an unsigned int, so return that from
bdev_io_min as well.

Fixes: ac481c20ef8f ("block: Topology ioctls")
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Reviewed-by: John Garry &lt;john.g.garry@oracle.com&gt;
Link: https://lore.kernel.org/r/20241119072602.1059488-1-hch@lst.de
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>block: model freeze &amp; enter queue as lock for supporting lockdep</title>
<updated>2024-12-05T13:03:05Z</updated>
<author>
<name>Ming Lei</name>
<email>ming.lei@redhat.com</email>
</author>
<published>2024-10-25T00:37:20Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a6fc2ba1c7e57cc4ef5a36f8fe56d810fd7b1a65'/>
<id>urn:sha1:a6fc2ba1c7e57cc4ef5a36f8fe56d810fd7b1a65</id>
<content type='text'>
[ Upstream commit f1be1788a32e8fa63416ad4518bbd1a85a825c9d ]

Recently we got several deadlock report[1][2][3] caused by
blk_mq_freeze_queue and blk_enter_queue().

Turns out the two are just like acquiring read/write lock, so model them
as read/write lock for supporting lockdep:

1) model q-&gt;q_usage_counter as two locks(io and queue lock)

- queue lock covers sync with blk_enter_queue()

- io lock covers sync with bio_enter_queue()

2) make the lockdep class/key as per-queue:

- different subsystem has very different lock use pattern, shared lock
 class causes false positive easily

- freeze_queue degrades to no lock in case that disk state becomes DEAD
  because bio_enter_queue() won't be blocked any more

- freeze_queue degrades to no lock in case that request queue becomes dying
  because blk_enter_queue() won't be blocked any more

3) model blk_mq_freeze_queue() as acquire_exclusive &amp; try_lock
- it is exclusive lock, so dependency with blk_enter_queue() is covered

- it is trylock because blk_mq_freeze_queue() are allowed to run
  concurrently

4) model blk_enter_queue() &amp; bio_enter_queue() as acquire_read()
- nested blk_enter_queue() are allowed

- dependency with blk_mq_freeze_queue() is covered

- blk_queue_exit() is often called from other contexts(such as irq), and
it can't be annotated as lock_release(), so simply do it in
blk_enter_queue(), this way still covered cases as many as possible

With lockdep support, such kind of reports may be reported asap and
needn't wait until the real deadlock is triggered.

For example, lockdep report can be triggered in the report[3] with this
patch applied.

[1] occasional block layer hang when setting 'echo noop &gt; /sys/block/sda/queue/scheduler'
https://bugzilla.kernel.org/show_bug.cgi?id=219166

[2] del_gendisk() vs blk_queue_enter() race condition
https://lore.kernel.org/linux-block/20241003085610.GK11458@google.com/

[3] queue_freeze &amp; queue_enter deadlock in scsi
https://lore.kernel.org/linux-block/ZxG38G9BuFdBpBHZ@fedora/T/#u

Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Link: https://lore.kernel.org/r/20241025003722.3630252-4-ming.lei@redhat.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Stable-dep-of: 3802f73bd807 ("block: fix uaf for flush rq while iterating tags")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>blk-mq: add non_owner variant of start_freeze/unfreeze queue APIs</title>
<updated>2024-12-05T13:03:05Z</updated>
<author>
<name>Ming Lei</name>
<email>ming.lei@redhat.com</email>
</author>
<published>2024-10-25T00:37:18Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9c3d53f113426d2f0f49ca13afabca13f5b1a5af'/>
<id>urn:sha1:9c3d53f113426d2f0f49ca13afabca13f5b1a5af</id>
<content type='text'>
[ Upstream commit 8acdd0e7bfadda6b5103f2960d293581954454ed ]

Add non_owner variant of start_freeze/unfreeze queue APIs, so that the
caller knows that what they are doing, and we can skip lockdep support
for non_owner variant in per-call level.

Prepare for supporting lockdep for freezing/unfreezing queue.

Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Suggested-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Link: https://lore.kernel.org/r/20241025003722.3630252-2-ming.lei@redhat.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Stable-dep-of: 3802f73bd807 ("block: fix uaf for flush rq while iterating tags")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
</feed>
