summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2024-12-14misc: eeprom: eeprom_93cx6: Add quirk for extra read clock cycleParker Newman
[ Upstream commit 7738a7ab9d12c5371ed97114ee2132d4512e9fd5 ] Add a quirk similar to eeprom_93xx46 to add an extra clock cycle before reading data from the EEPROM. The 93Cx6 family of EEPROMs output a "dummy 0 bit" between the writing of the op-code/address from the host to the EEPROM and the reading of the actual data from the EEPROM. More info can be found on page 6 of the AT93C46 datasheet (linked below). Similar notes are found in other 93xx6 datasheets. In summary the read operation for a 93Cx6 EEPROM is: Write to EEPROM: 110[A5-A0] (9 bits) Read from EEPROM: 0[D15-D0] (17 bits) Where: 110 is the start bit and READ OpCode [A5-A0] is the address to read from 0 is a "dummy bit" preceding the actual data [D15-D0] is the actual data. Looking at the READ timing diagrams in the 93Cx6 datasheets the dummy bit should be clocked out on the last address bit clock cycle meaning it should be discarded naturally. However, depending on the hardware configuration sometimes this dummy bit is not discarded. This is the case with Exar PCI UARTs which require an extra clock cycle between sending the address and reading the data. Datasheet: https://ww1.microchip.com/downloads/en/DeviceDoc/Atmel-5193-SEEPROM-AT93C46D-Datasheet.pdf Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Parker Newman <pnewman@connecttech.com> Link: https://lore.kernel.org/r/0f23973efefccd2544705a0480b4ad4c2353e407.1727880931.git.pnewman@connecttech.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-14leds: class: Protect brightness_show() with led_cdev->led_access mutexMukesh Ojha
[ Upstream commit 4ca7cd938725a4050dcd62ae9472e931d603118d ] There is NULL pointer issue observed if from Process A where hid device being added which results in adding a led_cdev addition and later a another call to access of led_cdev attribute from Process B can result in NULL pointer issue. Use mutex led_cdev->led_access to protect access to led->cdev and its attribute inside brightness_show() and max_brightness_show() and also update the comment for mutex that it should be used to protect the led class device fields. Process A Process B kthread+0x114 worker_thread+0x244 process_scheduled_works+0x248 uhid_device_add_worker+0x24 hid_add_device+0x120 device_add+0x268 bus_probe_device+0x94 device_initial_probe+0x14 __device_attach+0xfc bus_for_each_drv+0x10c __device_attach_driver+0x14c driver_probe_device+0x3c __driver_probe_device+0xa0 really_probe+0x190 hid_device_probe+0x130 ps_probe+0x990 ps_led_register+0x94 devm_led_classdev_register_ext+0x58 led_classdev_register_ext+0x1f8 device_create_with_groups+0x48 device_create_groups_vargs+0xc8 device_add+0x244 kobject_uevent+0x14 kobject_uevent_env[jt]+0x224 mutex_unlock[jt]+0xc4 __mutex_unlock_slowpath+0xd4 wake_up_q+0x70 try_to_wake_up[jt]+0x48c preempt_schedule_common+0x28 __schedule+0x628 __switch_to+0x174 el0t_64_sync+0x1a8/0x1ac el0t_64_sync_handler+0x68/0xbc el0_svc+0x38/0x68 do_el0_svc+0x1c/0x28 el0_svc_common+0x80/0xe0 invoke_syscall+0x58/0x114 __arm64_sys_read+0x1c/0x2c ksys_read+0x78/0xe8 vfs_read+0x1e0/0x2c8 kernfs_fop_read_iter+0x68/0x1b4 seq_read_iter+0x158/0x4ec kernfs_seq_show+0x44/0x54 sysfs_kf_seq_show+0xb4/0x130 dev_attr_show+0x38/0x74 brightness_show+0x20/0x4c dualshock4_led_get_brightness+0xc/0x74 [ 3313.874295][ T4013] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000060 [ 3313.874301][ T4013] Mem abort info: [ 3313.874303][ T4013] ESR = 0x0000000096000006 [ 3313.874305][ T4013] EC = 0x25: DABT (current EL), IL = 32 bits [ 3313.874307][ T4013] SET = 0, FnV = 0 [ 3313.874309][ T4013] EA = 0, S1PTW = 0 [ 3313.874311][ T4013] FSC = 0x06: level 2 translation fault [ 3313.874313][ T4013] Data abort info: [ 3313.874314][ T4013] ISV = 0, ISS = 0x00000006, ISS2 = 0x00000000 [ 3313.874316][ T4013] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [ 3313.874318][ T4013] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 3313.874320][ T4013] user pgtable: 4k pages, 39-bit VAs, pgdp=00000008f2b0a000 .. [ 3313.874332][ T4013] Dumping ftrace buffer: [ 3313.874334][ T4013] (ftrace buffer empty) .. .. [ dd3313.874639][ T4013] CPU: 6 PID: 4013 Comm: InputReader [ 3313.874648][ T4013] pc : dualshock4_led_get_brightness+0xc/0x74 [ 3313.874653][ T4013] lr : led_update_brightness+0x38/0x60 [ 3313.874656][ T4013] sp : ffffffc0b910bbd0 .. .. [ 3313.874685][ T4013] Call trace: [ 3313.874687][ T4013] dualshock4_led_get_brightness+0xc/0x74 [ 3313.874690][ T4013] brightness_show+0x20/0x4c [ 3313.874692][ T4013] dev_attr_show+0x38/0x74 [ 3313.874696][ T4013] sysfs_kf_seq_show+0xb4/0x130 [ 3313.874700][ T4013] kernfs_seq_show+0x44/0x54 [ 3313.874703][ T4013] seq_read_iter+0x158/0x4ec [ 3313.874705][ T4013] kernfs_fop_read_iter+0x68/0x1b4 [ 3313.874708][ T4013] vfs_read+0x1e0/0x2c8 [ 3313.874711][ T4013] ksys_read+0x78/0xe8 [ 3313.874714][ T4013] __arm64_sys_read+0x1c/0x2c [ 3313.874718][ T4013] invoke_syscall+0x58/0x114 [ 3313.874721][ T4013] el0_svc_common+0x80/0xe0 [ 3313.874724][ T4013] do_el0_svc+0x1c/0x28 [ 3313.874727][ T4013] el0_svc+0x38/0x68 [ 3313.874730][ T4013] el0t_64_sync_handler+0x68/0xbc [ 3313.874732][ T4013] el0t_64_sync+0x1a8/0x1ac Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com> Reviewed-by: Anish Kumar <yesanishhere@gmail.com> Link: https://lore.kernel.org/r/20241103160527.82487-1-quic_mojha@quicinc.com Signed-off-by: Lee Jones <lee@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-14xen/xenbus: fix lockingJuergen Gross
[ Upstream commit 2f69a110e7bba3ec6bc089a2f736ca0941d887ed ] Commit 060eabe8fbe726 ("xenbus/backend: Protect xenbus callback with lock") introduced a bug by holding a lock while calling a function which might schedule. Fix that by using a semaphore instead. Fixes: 060eabe8fbe726 ("xenbus/backend: Protect xenbus callback with lock") Signed-off-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20200305100323.16736-1-jgross@suse.com Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Stable-dep-of: afc545da381b ("xen: Fix the issue of resource not being properly released in xenbus_dev_probe()") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-14xenbus/backend: Protect xenbus callback with lockSeongJae Park
[ Upstream commit 060eabe8fbe726aca341b518366da4d79e338100 ] A driver's 'reclaim_memory' callback can race with 'probe' or 'remove' because it will be called whenever memory pressure is detected. To avoid such race, this commit embeds a spinlock in each 'xenbus_device' and make 'xenbus' to hold the lock while the corresponded callbacks are running. Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: SeongJae Park <sjpark@amazon.de> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Stable-dep-of: afc545da381b ("xen: Fix the issue of resource not being properly released in xenbus_dev_probe()") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-14xenbus/backend: Add memory pressure handler callbackSeongJae Park
[ Upstream commit 8a105678fb3ec4763352db84745968bf2cb4aa65 ] Granting pages consumes backend system memory. In systems configured with insufficient spare memory for those pages, it can cause a memory pressure situation. However, finding the optimal amount of the spare memory is challenging for large systems having dynamic resource utilization patterns. Also, such a static configuration might lack flexibility. To mitigate such problems, this commit adds a memory reclaim callback to 'xenbus_driver'. If a memory pressure is detected, 'xenbus' requests every backend driver to volunarily release its memory. Note that it would be able to improve the callback facility for more sophisticated handlings of general pressures. For example, it would be possible to monitor the memory consumption of each device and issue the release requests to only devices which causing the pressure. Also, the callback could be extended to handle not only memory, but general resources. Nevertheless, this version of the implementation defers such sophisticated goals as a future work. Reviewed-by: Juergen Gross <jgross@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: SeongJae Park <sjpark@amazon.de> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Stable-dep-of: afc545da381b ("xen: Fix the issue of resource not being properly released in xenbus_dev_probe()") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-14tipc: add new AEAD key structure for user APITuong Lien
[ Upstream commit 134bdac397661a5841d9f27f508190c68b26232b ] The new structure 'tipc_aead_key' is added to the 'tipc.h' for user to be able to transfer a key to TIPC in kernel. Netlink will be used for this purpose in the later commits. Acked-by: Ying Xue <ying.xue@windreiver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net> Stable-dep-of: 6a2fa13312e5 ("tipc: Fix use-after-free of kernel socket in cleanup_bearer().") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-14util_macros.h: fix/rework find_closest() macrosAlexandru Ardelean
commit bc73b4186736341ab5cd2c199da82db6e1134e13 upstream. A bug was found in the find_closest() (find_closest_descending() is also affected after some testing), where for certain values with small progressions, the rounding (done by averaging 2 values) causes an incorrect index to be returned. The rounding issues occur for progressions of 1, 2 and 3. It goes away when the progression/interval between two values is 4 or larger. It's particularly bad for progressions of 1. For example if there's an array of 'a = { 1, 2, 3 }', using 'find_closest(2, a ...)' would return 0 (the index of '1'), rather than returning 1 (the index of '2'). This means that for exact values (with a progression of 1), find_closest() will misbehave and return the index of the value smaller than the one we're searching for. For progressions of 2 and 3, the exact values are obtained correctly; but values aren't approximated correctly (as one would expect). Starting with progressions of 4, all seems to be good (one gets what one would expect). While one could argue that 'find_closest()' should not be used for arrays with progressions of 1 (i.e. '{1, 2, 3, ...}', the macro should still behave correctly. The bug was found while testing the 'drivers/iio/adc/ad7606.c', specifically the oversampling feature. For reference, the oversampling values are listed as: static const unsigned int ad7606_oversampling_avail[7] = { 1, 2, 4, 8, 16, 32, 64, }; When doing: 1. $ echo 1 > /sys/bus/iio/devices/iio\:device0/oversampling_ratio $ cat /sys/bus/iio/devices/iio\:device0/oversampling_ratio 1 # this is fine 2. $ echo 2 > /sys/bus/iio/devices/iio\:device0/oversampling_ratio $ cat /sys/bus/iio/devices/iio\:device0/oversampling_ratio 1 # this is wrong; 2 should be returned here 3. $ echo 3 > /sys/bus/iio/devices/iio\:device0/oversampling_ratio $ cat /sys/bus/iio/devices/iio\:device0/oversampling_ratio 2 # this is fine 4. $ echo 4 > /sys/bus/iio/devices/iio\:device0/oversampling_ratio $ cat /sys/bus/iio/devices/iio\:device0/oversampling_ratio 4 # this is fine And from here-on, the values are as correct (one gets what one would expect.) While writing a kunit test for this bug, a peculiar issue was found for the array in the 'drivers/hwmon/ina2xx.c' & 'drivers/iio/adc/ina2xx-adc.c' drivers. While running the kunit test (for 'ina226_avg_tab' from these drivers): * idx = find_closest([-1 to 2], ina226_avg_tab, ARRAY_SIZE(ina226_avg_tab)); This returns idx == 0, so value. * idx = find_closest(3, ina226_avg_tab, ARRAY_SIZE(ina226_avg_tab)); This returns idx == 0, value 1; and now one could argue whether 3 is closer to 4 or to 1. This quirk only appears for value '3' in this array, but it seems to be a another rounding issue. * And from 4 onwards the 'find_closest'() works fine (one gets what one would expect). This change reworks the find_closest() macros to also check the difference between the left and right elements when 'x'. If the distance to the right is smaller (than the distance to the left), the index is incremented by 1. This also makes redundant the need for using the DIV_ROUND_CLOSEST() macro. In order to accommodate for any mix of negative + positive values, the internal variables '__fc_x', '__fc_mid_x', '__fc_left' & '__fc_right' are forced to 'long' type. This also addresses any potential bugs/issues with 'x' being of an unsigned type. In those situations any comparison between signed & unsigned would be promoted to a comparison between 2 unsigned numbers; this is especially annoying when '__fc_left' & '__fc_right' underflow. The find_closest_descending() macro was also reworked and duplicated from the find_closest(), and it is being iterated in reverse. The main reason for this is to get the same indices as 'find_closest()' (but in reverse). The comparison for '__fc_right < __fc_left' favors going the array in ascending order. For example for array '{ 1024, 512, 256, 128, 64, 16, 4, 1 }' and x = 3, we get: __fc_mid_x = 2 __fc_left = -1 __fc_right = -2 Then '__fc_right < __fc_left' evaluates to true and '__fc_i++' becomes 7 which is not quite incorrect, but 3 is closer to 4 than to 1. This change has been validated with the kunit from the next patch. Link: https://lkml.kernel.org/r/20241105145406.554365-1-aardelean@baylibre.com Fixes: 95d119528b0b ("util_macros.h: add find_closest() macro") Signed-off-by: Alexandru Ardelean <aardelean@baylibre.com> Cc: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-14SUNRPC: Replace internal use of SOCKWQ_ASYNC_NOSPACETrond Myklebust
[ Upstream commit 2790a624d43084de590884934969e19c7a82316a ] The socket's SOCKWQ_ASYNC_NOSPACE can be cleared by various actors in the socket layer, so replace it with our own flag in the transport sock_state field. Reported-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Stable-dep-of: 4db9ad82a6c8 ("sunrpc: clear XPRT_SOCK_UPD_TIMEOUT when reset transport") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-14block: return unsigned int from bdev_io_minChristoph Hellwig
[ Upstream commit 46fd48ab3ea3eb3bb215684bd66ea3d260b091a9 ] The underlying limit is defined as an unsigned int, so return that from bdev_io_min as well. Fixes: ac481c20ef8f ("block: Topology ioctls") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241119072602.1059488-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-14cgroup: Move rcu_head up near the top of cgroup_rootWaiman Long
commit a7fb0423c201ba12815877a0b5a68a6a1710b23a upstream. Commit d23b5c577715 ("cgroup: Make operations on the cgroup root_list RCU safe") adds a new rcu_head to the cgroup_root structure and kvfree_rcu() for freeing the cgroup_root. The current implementation of kvfree_rcu(), however, has the limitation that the offset of the rcu_head structure within the larger data structure must be less than 4096 or the compilation will fail. See the macro definition of __is_kvfree_rcu_offset() in include/linux/rcupdate.h for more information. By putting rcu_head below the large cgroup structure, any change to the cgroup structure that makes it larger run the risk of causing build failure under certain configurations. Commit 77070eeb8821 ("cgroup: Avoid false cacheline sharing of read mostly rstat_cpu") happens to be the last straw that breaks it. Fix this problem by moving the rcu_head structure up before the cgroup structure. Fixes: d23b5c577715 ("cgroup: Make operations on the cgroup root_list RCU safe") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Closes: https://lore.kernel.org/lkml/20231207143806.114e0a74@canb.auug.org.au/ Signed-off-by: Waiman Long <longman@redhat.com> Acked-by: Yafang Shao <laoar.shao@gmail.com> Reviewed-by: Yosry Ahmed <yosryahmed@google.com> Reviewed-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> [Shivani: Modified to apply on v5.4.y] Signed-off-by: Shivani Agarwal <shivani.agarwal@broadcom.com> Reviewed-by: Siddh Raman Pant <siddh.raman.pant@oracle.com> Signed-off-by: Siddh Raman Pant <siddh.raman.pant@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-14cgroup: Make operations on the cgroup root_list RCU safeYafang Shao
commit d23b5c577715892c87533b13923306acc6243f93 upstream. At present, when we perform operations on the cgroup root_list, we must hold the cgroup_mutex, which is a relatively heavyweight lock. In reality, we can make operations on this list RCU-safe, eliminating the need to hold the cgroup_mutex during traversal. Modifications to the list only occur in the cgroup root setup and destroy paths, which should be infrequent in a production environment. In contrast, traversal may occur frequently. Therefore, making it RCU-safe would be beneficial. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org> [fp: adapt to 5.10 mainly because of changes made by e210a89f5b07 ("cgroup.c: add helper __cset_cgroup_from_root to cleanup duplicated codes")] Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> [Shivani: Modified to apply on v5.4.y] Signed-off-by: Shivani Agarwal <shivani.agarwal@broadcom.com> Reviewed-by: Siddh Raman Pant <siddh.raman.pant@oracle.com> Signed-off-by: Siddh Raman Pant <siddh.raman.pant@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-14netpoll: Use rcu_access_pointer() in netpoll_poll_lockBreno Leitao
[ Upstream commit a57d5a72f8dec7db8a79d0016fb0a3bdecc82b56 ] The ndev->npinfo pointer in netpoll_poll_lock() is RCU-protected but is being accessed directly for a NULL check. While no RCU read lock is held in this context, we should still use proper RCU primitives for consistency and correctness. Replace the direct NULL check with rcu_access_pointer(), which is the appropriate primitive when only checking for NULL without dereferencing the pointer. This function provides the necessary ordering guarantees without requiring RCU read-side protection. Fixes: bea3348eef27 ("[NET]: Make NAPI polling independent of struct net_device objects.") Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Link: https://patch.msgid.link/20241118-netpoll_rcu-v1-2-a1888dcb4a02@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-14time: Fix references to _msecs_to_jiffies() handling of valuesMiguel Ojeda
[ Upstream commit 92b043fd995a63a57aae29ff85a39b6f30cd440c ] The details about the handling of the "normal" values were moved to the _msecs_to_jiffies() helpers in commit ca42aaf0c861 ("time: Refactor msecs_to_jiffies"). However, the same commit still mentioned __msecs_to_jiffies() in the added documentation. Thus point to _msecs_to_jiffies() instead. Fixes: ca42aaf0c861 ("time: Refactor msecs_to_jiffies") Signed-off-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20241025110141.157205-2-ojeda@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-17mm: add remap_pfn_range_notrackChristoph Hellwig
commit 74ffa5a3e68504dd289135b1cf0422c19ffb3f2e upstream. Patch series "add remap_pfn_range_notrack instead of reinventing it in i915", v2. i915 has some reason to want to avoid the track_pfn_remap overhead in remap_pfn_range. Add a function to the core VM to do just that rather than reinventing the functionality poorly in the driver. Note that the remap_io_sg path does get exercises when using Xorg on my Thinkpad X1, so this should be considered lightly tested, I've not managed to hit the remap_io_mapping path at all. This patch (of 4): Add a version of remap_pfn_range that does not call track_pfn_range. This will be used to fix horrible abuses of VM internals in the i915 driver. Link: https://lkml.kernel.org/r/20210326055505.1424432-1-hch@lst.de Link: https://lkml.kernel.org/r/20210326055505.1424432-2-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 69d4e1ce9087c8767f2fe9b9426fa2755c8e9072) Signed-off-by: Harshvardhan Jha <harshvardhan.j.jha@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-17mm: fix ambiguous comments for better code readabilitychenqiwu
commit 552657b7b3343851916fde7e4fd6bfb6516d2bcb upstream. The parameter of remap_pfn_range() @pfn passed from the caller is actually a page-frame number converted by corresponding physical address of kernel memory, the original comment is ambiguous that may mislead the users. Meanwhile, there is an ambiguous typo "VMM" in the comment of vm_area_struct. So fixing them will make the code more readable. Signed-off-by: chenqiwu <chenqiwu@xiaomi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1583026921-15279-1-git-send-email-qiwuchen55@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Harshvardhan Jha <harshvardhan.j.jha@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-17spi: Fix deadlock when adding SPI controllers on SPI busesMark Brown
commit 6098475d4cb48d821bdf453c61118c56e26294f0 upstream. Currently we have a global spi_add_lock which we take when adding new devices so that we can check that we're not trying to reuse a chip select that's already controlled. This means that if the SPI device is itself a SPI controller and triggers the instantiation of further SPI devices we trigger a deadlock as we try to register and instantiate those devices while in the process of doing so for the parent controller and hence already holding the global spi_add_lock. Since we only care about concurrency within a single SPI bus move the lock to be per controller, avoiding the deadlock. This can be easily triggered in the case of spi-mux. Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Hardik Gohil <hgohil@mvista.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-17mtd: rawnand: protect access to rawnand devices while in suspendSean Nyekjaer
commit 8cba323437a49a45756d661f500b324fc2d486fe upstream. Prevent rawnand access while in a suspended state. Commit 013e6292aaf5 ("mtd: rawnand: Simplify the locking") allows the rawnand layer to return errors rather than waiting in a blocking wait. Tested on a iMX6ULL. Fixes: 013e6292aaf5 ("mtd: rawnand: Simplify the locking") Signed-off-by: Sean Nyekjaer <sean@geanix.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Cc: stable@vger.kernel.org Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/20220208085213.1838273-1-sean@geanix.com [florian: Adjust rawnand.h members documentation and position] Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-08mac80211: Add support to trigger sta disconnect on hardware restartYoughandhar Chintala
[ Upstream commit 7d352ccf1e9935b5222ca84e8baeb07a0c8f94b9 ] Currently in case of target hardware restart, we just reconfig and re-enable the security keys and enable the network queues to start data traffic back from where it was interrupted. Many ath10k wifi chipsets have sequence numbers for the data packets assigned by firmware and the mac sequence number will restart from zero after target hardware restart leading to mismatch in the sequence number expected by the remote peer vs the sequence number of the frame sent by the target firmware. This mismatch in sequence number will cause out-of-order packets on the remote peer and all the frames sent by the device are dropped until we reach the sequence number which was sent before we restarted the target hardware In order to fix this, we trigger a sta disconnect, in case of target hw restart. After this there will be a fresh connection and thereby avoiding the dropping of frames by remote peer. The right fix would be to pull the entire data path into the host which is not feasible or would need lots of complex changes and will still be inefficient. Tested on ath10k using WCN3990, QCA6174 Signed-off-by: Youghandhar Chintala <youghand@codeaurora.org> Link: https://lore.kernel.org/r/20220308115325.5246-2-youghand@codeaurora.org Signed-off-by: Johannes Berg <johannes.berg@intel.com> Stable-dep-of: 07a6e3b78a65 ("wifi: iwlwifi: mvm: Fix response handling in iwl_mvm_send_recovery_cmd()") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-08genetlink: hold RCU in genlmsg_mcast()Eric Dumazet
[ Upstream commit 56440d7ec28d60f8da3bfa09062b3368ff9b16db ] While running net selftests with CONFIG_PROVE_RCU_LIST=y I saw one lockdep splat [1]. genlmsg_mcast() uses for_each_net_rcu(), and must therefore hold RCU. Instead of letting all callers guard genlmsg_multicast_allns() with a rcu_read_lock()/rcu_read_unlock() pair, do it in genlmsg_mcast(). This also means the @flags parameter is useless, we need to always use GFP_ATOMIC. [1] [10882.424136] ============================= [10882.424166] WARNING: suspicious RCU usage [10882.424309] 6.12.0-rc2-virtme #1156 Not tainted [10882.424400] ----------------------------- [10882.424423] net/netlink/genetlink.c:1940 RCU-list traversed in non-reader section!! [10882.424469] other info that might help us debug this: [10882.424500] rcu_scheduler_active = 2, debug_locks = 1 [10882.424744] 2 locks held by ip/15677: [10882.424791] #0: ffffffffb6b491b0 (cb_lock){++++}-{3:3}, at: genl_rcv (net/netlink/genetlink.c:1219) [10882.426334] #1: ffffffffb6b49248 (genl_mutex){+.+.}-{3:3}, at: genl_rcv_msg (net/netlink/genetlink.c:61 net/netlink/genetlink.c:57 net/netlink/genetlink.c:1209) [10882.426465] stack backtrace: [10882.426805] CPU: 14 UID: 0 PID: 15677 Comm: ip Not tainted 6.12.0-rc2-virtme #1156 [10882.426919] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [10882.427046] Call Trace: [10882.427131] <TASK> [10882.427244] dump_stack_lvl (lib/dump_stack.c:123) [10882.427335] lockdep_rcu_suspicious (kernel/locking/lockdep.c:6822) [10882.427387] genlmsg_multicast_allns (net/netlink/genetlink.c:1940 (discriminator 7) net/netlink/genetlink.c:1977 (discriminator 7)) [10882.427436] l2tp_tunnel_notify.constprop.0 (net/l2tp/l2tp_netlink.c:119) l2tp_netlink [10882.427683] l2tp_nl_cmd_tunnel_create (net/l2tp/l2tp_netlink.c:253) l2tp_netlink [10882.427748] genl_family_rcv_msg_doit (net/netlink/genetlink.c:1115) [10882.427834] genl_rcv_msg (net/netlink/genetlink.c:1195 net/netlink/genetlink.c:1210) [10882.427877] ? __pfx_l2tp_nl_cmd_tunnel_create (net/l2tp/l2tp_netlink.c:186) l2tp_netlink [10882.427927] ? __pfx_genl_rcv_msg (net/netlink/genetlink.c:1201) [10882.427959] netlink_rcv_skb (net/netlink/af_netlink.c:2551) [10882.428069] genl_rcv (net/netlink/genetlink.c:1220) [10882.428095] netlink_unicast (net/netlink/af_netlink.c:1332 net/netlink/af_netlink.c:1357) [10882.428140] netlink_sendmsg (net/netlink/af_netlink.c:1901) [10882.428210] ____sys_sendmsg (net/socket.c:729 (discriminator 1) net/socket.c:744 (discriminator 1) net/socket.c:2607 (discriminator 1)) Fixes: 33f72e6f0c67 ("l2tp : multicast notification to the registered listeners") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: James Chapman <jchapman@katalix.com> Cc: Tom Parkin <tparkin@katalix.com> Cc: Johannes Berg <johannes.berg@intel.com> Link: https://patch.msgid.link/20241011171217.3166614-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-08mac80211: Fix NULL ptr deref for injected rate infoMathy Vanhoef
commit bddc0c411a45d3718ac535a070f349be8eca8d48 upstream. The commit cb17ed29a7a5 ("mac80211: parse radiotap header when selecting Tx queue") moved the code to validate the radiotap header from ieee80211_monitor_start_xmit to ieee80211_parse_tx_radiotap. This made is possible to share more code with the new Tx queue selection code for injected frames. But at the same time, it now required the call of ieee80211_parse_tx_radiotap at the beginning of functions which wanted to handle the radiotap header. And this broke the rate parser for radiotap header parser. The radiotap parser for rates is operating most of the time only on the data in the actual radiotap header. But for the 802.11a/b/g rates, it must also know the selected band from the chandef information. But this information is only written to the ieee80211_tx_info at the end of the ieee80211_monitor_start_xmit - long after ieee80211_parse_tx_radiotap was already called. The info->band information was therefore always 0 (NL80211_BAND_2GHZ) when the parser code tried to access it. For a 5GHz only device, injecting a frame with 802.11a rates would cause a NULL pointer dereference because local->hw.wiphy->bands[NL80211_BAND_2GHZ] would most likely have been NULL when the radiotap parser searched for the correct rate index of the driver. Cc: stable@vger.kernel.org Reported-by: Ben Greear <greearb@candelatech.com> Fixes: cb17ed29a7a5 ("mac80211: parse radiotap header when selecting Tx queue") Signed-off-by: Mathy Vanhoef <Mathy.Vanhoef@kuleuven.be> [sven@narfation.org: added commit message] Signed-off-by: Sven Eckelmann <sven@narfation.org> Link: https://lore.kernel.org/r/20210530133226.40587-1-sven@narfation.org Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-08PCI: Add function 0 DMA alias quirk for Glenfly Arise chipWangYuli
commit 9246b487ab3c3b5993aae7552b7a4c541cc14a49 upstream. Add DMA support for audio function of Glenfly Arise chip, which uses Requester ID of function 0. Link: https://lore.kernel.org/r/CA2BBD087345B6D1+20240823095708.3237375-1-wangyuli@uniontech.com Signed-off-by: SiyuLi <siyuli@glenfly.com> Signed-off-by: WangYuli <wangyuli@uniontech.com> [bhelgaas: lower-case hex to match local code, drop unused Device IDs] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-08net: Fix an unsafe loop on the listAnastasia Kovaleva
commit 1dae9f1187189bc09ff6d25ca97ead711f7e26f9 upstream. The kernel may crash when deleting a genetlink family if there are still listeners for that family: Oops: Kernel access of bad area, sig: 11 [#1] ... NIP [c000000000c080bc] netlink_update_socket_mc+0x3c/0xc0 LR [c000000000c0f764] __netlink_clear_multicast_users+0x74/0xc0 Call Trace: __netlink_clear_multicast_users+0x74/0xc0 genl_unregister_family+0xd4/0x2d0 Change the unsafe loop on the list to a safe one, because inside the loop there is an element removal from this list. Fixes: b8273570f802 ("genetlink: fix netns vs. netlink table locking (2)") Cc: stable@vger.kernel.org Signed-off-by: Anastasia Kovaleva <a.kovaleva@yadro.com> Reviewed-by: Dmitry Bogdanov <d.bogdanov@yadro.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20241003104431.12391-1-a.kovaleva@yadro.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-08net: annotate lockless accesses to sk->sk_max_ack_backlogEric Dumazet
[ Upstream commit 099ecf59f05b5f30f42ebac0ab8cb94f9b18c90c ] sk->sk_max_ack_backlog can be read without any lock being held at least in TCP/DCCP cases. We need to use READ_ONCE()/WRITE_ONCE() to avoid load/store tearing and/or potential KCSAN warnings. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Stable-dep-of: 4d5c70e6155d ("sctp: ensure sk_state is set to CLOSED if hashing fails in sctp_listen_start") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-08net: annotate lockless accesses to sk->sk_ack_backlogEric Dumazet
[ Upstream commit 288efe8606b62d0753ba6722b36ef241877251fd ] sk->sk_ack_backlog can be read without any lock being held. We need to use READ_ONCE()/WRITE_ONCE() to avoid load/store tearing and/or potential KCSAN warnings. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Stable-dep-of: 4d5c70e6155d ("sctp: ensure sk_state is set to CLOSED if hashing fails in sctp_listen_start") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-08net/sched: accept TCA_STAB only for root qdiscEric Dumazet
[ Upstream commit 3cb7cf1540ddff5473d6baeb530228d19bc97b8a ] Most qdiscs maintain their backlog using qdisc_pkt_len(skb) on the assumption it is invariant between the enqueue() and dequeue() handlers. Unfortunately syzbot can crash a host rather easily using a TBF + SFQ combination, with an STAB on SFQ [1] We can't support TCA_STAB on arbitrary level, this would require to maintain per-qdisc storage. [1] [ 88.796496] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 88.798611] #PF: supervisor read access in kernel mode [ 88.799014] #PF: error_code(0x0000) - not-present page [ 88.799506] PGD 0 P4D 0 [ 88.799829] Oops: Oops: 0000 [#1] SMP NOPTI [ 88.800569] CPU: 14 UID: 0 PID: 2053 Comm: b371744477 Not tainted 6.12.0-rc1-virtme #1117 [ 88.801107] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 88.801779] RIP: 0010:sfq_dequeue (net/sched/sch_sfq.c:272 net/sched/sch_sfq.c:499) sch_sfq [ 88.802544] Code: 0f b7 50 12 48 8d 04 d5 00 00 00 00 48 89 d6 48 29 d0 48 8b 91 c0 01 00 00 48 c1 e0 03 48 01 c2 66 83 7a 1a 00 7e c0 48 8b 3a <4c> 8b 07 4c 89 02 49 89 50 08 48 c7 47 08 00 00 00 00 48 c7 07 00 All code ======== 0: 0f b7 50 12 movzwl 0x12(%rax),%edx 4: 48 8d 04 d5 00 00 00 lea 0x0(,%rdx,8),%rax b: 00 c: 48 89 d6 mov %rdx,%rsi f: 48 29 d0 sub %rdx,%rax 12: 48 8b 91 c0 01 00 00 mov 0x1c0(%rcx),%rdx 19: 48 c1 e0 03 shl $0x3,%rax 1d: 48 01 c2 add %rax,%rdx 20: 66 83 7a 1a 00 cmpw $0x0,0x1a(%rdx) 25: 7e c0 jle 0xffffffffffffffe7 27: 48 8b 3a mov (%rdx),%rdi 2a:* 4c 8b 07 mov (%rdi),%r8 <-- trapping instruction 2d: 4c 89 02 mov %r8,(%rdx) 30: 49 89 50 08 mov %rdx,0x8(%r8) 34: 48 c7 47 08 00 00 00 movq $0x0,0x8(%rdi) 3b: 00 3c: 48 rex.W 3d: c7 .byte 0xc7 3e: 07 (bad) ... Code starting with the faulting instruction =========================================== 0: 4c 8b 07 mov (%rdi),%r8 3: 4c 89 02 mov %r8,(%rdx) 6: 49 89 50 08 mov %rdx,0x8(%r8) a: 48 c7 47 08 00 00 00 movq $0x0,0x8(%rdi) 11: 00 12: 48 rex.W 13: c7 .byte 0xc7 14: 07 (bad) ... [ 88.803721] RSP: 0018:ffff9a1f892b7d58 EFLAGS: 00000206 [ 88.804032] RAX: 0000000000000000 RBX: ffff9a1f8420c800 RCX: ffff9a1f8420c800 [ 88.804560] RDX: ffff9a1f81bc1440 RSI: 0000000000000000 RDI: 0000000000000000 [ 88.805056] RBP: ffffffffc04bb0e0 R08: 0000000000000001 R09: 00000000ff7f9a1f [ 88.805473] R10: 000000000001001b R11: 0000000000009a1f R12: 0000000000000140 [ 88.806194] R13: 0000000000000001 R14: ffff9a1f886df400 R15: ffff9a1f886df4ac [ 88.806734] FS: 00007f445601a740(0000) GS:ffff9a2e7fd80000(0000) knlGS:0000000000000000 [ 88.807225] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 88.807672] CR2: 0000000000000000 CR3: 000000050cc46000 CR4: 00000000000006f0 [ 88.808165] Call Trace: [ 88.808459] <TASK> [ 88.808710] ? __die (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:434) [ 88.809261] ? page_fault_oops (arch/x86/mm/fault.c:715) [ 88.809561] ? exc_page_fault (./arch/x86/include/asm/irqflags.h:26 ./arch/x86/include/asm/irqflags.h:87 ./arch/x86/include/asm/irqflags.h:147 arch/x86/mm/fault.c:1489 arch/x86/mm/fault.c:1539) [ 88.809806] ? asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:623) [ 88.810074] ? sfq_dequeue (net/sched/sch_sfq.c:272 net/sched/sch_sfq.c:499) sch_sfq [ 88.810411] sfq_reset (net/sched/sch_sfq.c:525) sch_sfq [ 88.810671] qdisc_reset (./include/linux/skbuff.h:2135 ./include/linux/skbuff.h:2441 ./include/linux/skbuff.h:3304 ./include/linux/skbuff.h:3310 net/sched/sch_generic.c:1036) [ 88.810950] tbf_reset (./include/linux/timekeeping.h:169 net/sched/sch_tbf.c:334) sch_tbf [ 88.811208] qdisc_reset (./include/linux/skbuff.h:2135 ./include/linux/skbuff.h:2441 ./include/linux/skbuff.h:3304 ./include/linux/skbuff.h:3310 net/sched/sch_generic.c:1036) [ 88.811484] netif_set_real_num_tx_queues (./include/linux/spinlock.h:396 ./include/net/sch_generic.h:768 net/core/dev.c:2958) [ 88.811870] __tun_detach (drivers/net/tun.c:590 drivers/net/tun.c:673) [ 88.812271] tun_chr_close (drivers/net/tun.c:702 drivers/net/tun.c:3517) [ 88.812505] __fput (fs/file_table.c:432 (discriminator 1)) [ 88.812735] task_work_run (kernel/task_work.c:230) [ 88.813016] do_exit (kernel/exit.c:940) [ 88.813372] ? trace_hardirqs_on (kernel/trace/trace_preemptirq.c:58 (discriminator 4)) [ 88.813639] ? handle_mm_fault (./arch/x86/include/asm/irqflags.h:42 ./arch/x86/include/asm/irqflags.h:97 ./arch/x86/include/asm/irqflags.h:155 ./include/linux/memcontrol.h:1022 ./include/linux/memcontrol.h:1045 ./include/linux/memcontrol.h:1052 mm/memory.c:5928 mm/memory.c:6088) [ 88.813867] do_group_exit (kernel/exit.c:1070) [ 88.814138] __x64_sys_exit_group (kernel/exit.c:1099) [ 88.814490] x64_sys_call (??:?) [ 88.814791] do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1) arch/x86/entry/common.c:83 (discriminator 1)) [ 88.815012] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) [ 88.815495] RIP: 0033:0x7f44560f1975 Fixes: 175f9c1bba9b ("net_sched: Add size table for qdiscs") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Link: https://patch.msgid.link/20241007184130.3960565-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-08media: uapi/linux/cec.h: cec_msg_set_reply_to: zero flagsHans Verkuil
commit 599f6899051cb70c4e0aa9fd591b9ee220cb6f14 upstream. The cec_msg_set_reply_to() helper function never zeroed the struct cec_msg flags field, this can cause unexpected behavior if flags was uninitialized to begin with. Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Fixes: 0dbacebede1e ("[media] cec: move the CEC framework out of staging and to media") Cc: <stable@vger.kernel.org> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-08drm/printer: Allow NULL data in devcoredump printerMatthew Brost
[ Upstream commit 53369581dc0c68a5700ed51e1660f44c4b2bb524 ] We want to determine the size of the devcoredump before writing it out. To that end, we will run the devcoredump printer with NULL data to get the size, alloc data based on the generated offset, then run the devcorecump again with a valid data pointer to print. This necessitates not writing data to the data pointer on the initial pass, when it is NULL. v5: - Better commit message (Jonathan) - Add kerenl doc with examples (Jani) Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240801154118.2547543-3-matthew.brost@intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-08netfilter: uapi: NFTA_FLOWTABLE_HOOK is NLA_NESTEDPhil Sutter
[ Upstream commit 76f1ed087b562a469f2153076f179854b749c09a ] Fix the comment which incorrectly defines it as NLA_U32. Fixes: 3b49e2e94e6e ("netfilter: nf_tables: add flow table netlink frontend") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-08tcp: check skb is non-NULL in tcp_rto_delta_us()Josh Hunt
[ Upstream commit c8770db2d54437a5f49417ae7b46f7de23d14db6 ] We have some machines running stock Ubuntu 20.04.6 which is their 5.4.0-174-generic kernel that are running ceph and recently hit a null ptr dereference in tcp_rearm_rto(). Initially hitting it from the TLP path, but then later we also saw it getting hit from the RACK case as well. Here are examples of the oops messages we saw in each of those cases: Jul 26 15:05:02 rx [11061395.780353] BUG: kernel NULL pointer dereference, address: 0000000000000020 Jul 26 15:05:02 rx [11061395.787572] #PF: supervisor read access in kernel mode Jul 26 15:05:02 rx [11061395.792971] #PF: error_code(0x0000) - not-present page Jul 26 15:05:02 rx [11061395.798362] PGD 0 P4D 0 Jul 26 15:05:02 rx [11061395.801164] Oops: 0000 [#1] SMP NOPTI Jul 26 15:05:02 rx [11061395.805091] CPU: 0 PID: 9180 Comm: msgr-worker-1 Tainted: G W 5.4.0-174-generic #193-Ubuntu Jul 26 15:05:02 rx [11061395.814996] Hardware name: Supermicro SMC 2x26 os-gen8 64C NVME-Y 256G/H12SSW-NTR, BIOS 2.5.V1.2U.NVMe.UEFI 05/09/2023 Jul 26 15:05:02 rx [11061395.825952] RIP: 0010:tcp_rearm_rto+0xe4/0x160 Jul 26 15:05:02 rx [11061395.830656] Code: 87 ca 04 00 00 00 5b 41 5c 41 5d 5d c3 c3 49 8b bc 24 40 06 00 00 eb 8d 48 bb cf f7 53 e3 a5 9b c4 20 4c 89 ef e8 0c fe 0e 00 <48> 8b 78 20 48 c1 ef 03 48 89 f8 41 8b bc 24 80 04 00 00 48 f7 e3 Jul 26 15:05:02 rx [11061395.849665] RSP: 0018:ffffb75d40003e08 EFLAGS: 00010246 Jul 26 15:05:02 rx [11061395.855149] RAX: 0000000000000000 RBX: 20c49ba5e353f7cf RCX: 0000000000000000 Jul 26 15:05:02 rx [11061395.862542] RDX: 0000000062177c30 RSI: 000000000000231c RDI: ffff9874ad283a60 Jul 26 15:05:02 rx [11061395.869933] RBP: ffffb75d40003e20 R08: 0000000000000000 R09: ffff987605e20aa8 Jul 26 15:05:02 rx [11061395.877318] R10: ffffb75d40003f00 R11: ffffb75d4460f740 R12: ffff9874ad283900 Jul 26 15:05:02 rx [11061395.884710] R13: ffff9874ad283a60 R14: ffff9874ad283980 R15: ffff9874ad283d30 Jul 26 15:05:02 rx [11061395.892095] FS: 00007f1ef4a2e700(0000) GS:ffff987605e00000(0000) knlGS:0000000000000000 Jul 26 15:05:02 rx [11061395.900438] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jul 26 15:05:02 rx [11061395.906435] CR2: 0000000000000020 CR3: 0000003e450ba003 CR4: 0000000000760ef0 Jul 26 15:05:02 rx [11061395.913822] PKRU: 55555554 Jul 26 15:05:02 rx [11061395.916786] Call Trace: Jul 26 15:05:02 rx [11061395.919488] Jul 26 15:05:02 rx [11061395.921765] ? show_regs.cold+0x1a/0x1f Jul 26 15:05:02 rx [11061395.925859] ? __die+0x90/0xd9 Jul 26 15:05:02 rx [11061395.929169] ? no_context+0x196/0x380 Jul 26 15:05:02 rx [11061395.933088] ? ip6_protocol_deliver_rcu+0x4e0/0x4e0 Jul 26 15:05:02 rx [11061395.938216] ? ip6_sublist_rcv_finish+0x3d/0x50 Jul 26 15:05:02 rx [11061395.943000] ? __bad_area_nosemaphore+0x50/0x1a0 Jul 26 15:05:02 rx [11061395.947873] ? bad_area_nosemaphore+0x16/0x20 Jul 26 15:05:02 rx [11061395.952486] ? do_user_addr_fault+0x267/0x450 Jul 26 15:05:02 rx [11061395.957104] ? ipv6_list_rcv+0x112/0x140 Jul 26 15:05:02 rx [11061395.961279] ? __do_page_fault+0x58/0x90 Jul 26 15:05:02 rx [11061395.965458] ? do_page_fault+0x2c/0xe0 Jul 26 15:05:02 rx [11061395.969465] ? page_fault+0x34/0x40 Jul 26 15:05:02 rx [11061395.973217] ? tcp_rearm_rto+0xe4/0x160 Jul 26 15:05:02 rx [11061395.977313] ? tcp_rearm_rto+0xe4/0x160 Jul 26 15:05:02 rx [11061395.981408] tcp_send_loss_probe+0x10b/0x220 Jul 26 15:05:02 rx [11061395.985937] tcp_write_timer_handler+0x1b4/0x240 Jul 26 15:05:02 rx [11061395.990809] tcp_write_timer+0x9e/0xe0 Jul 26 15:05:02 rx [11061395.994814] ? tcp_write_timer_handler+0x240/0x240 Jul 26 15:05:02 rx [11061395.999866] call_timer_fn+0x32/0x130 Jul 26 15:05:02 rx [11061396.003782] __run_timers.part.0+0x180/0x280 Jul 26 15:05:02 rx [11061396.008309] ? recalibrate_cpu_khz+0x10/0x10 Jul 26 15:05:02 rx [11061396.012841] ? native_x2apic_icr_write+0x30/0x30 Jul 26 15:05:02 rx [11061396.017718] ? lapic_next_event+0x21/0x30 Jul 26 15:05:02 rx [11061396.021984] ? clockevents_program_event+0x8f/0xe0 Jul 26 15:05:02 rx [11061396.027035] run_timer_softirq+0x2a/0x50 Jul 26 15:05:02 rx [11061396.031212] __do_softirq+0xd1/0x2c1 Jul 26 15:05:02 rx [11061396.035044] do_softirq_own_stack+0x2a/0x40 Jul 26 15:05:02 rx [11061396.039480] Jul 26 15:05:02 rx [11061396.041840] do_softirq.part.0+0x46/0x50 Jul 26 15:05:02 rx [11061396.046022] __local_bh_enable_ip+0x50/0x60 Jul 26 15:05:02 rx [11061396.050460] _raw_spin_unlock_bh+0x1e/0x20 Jul 26 15:05:02 rx [11061396.054817] nf_conntrack_tcp_packet+0x29e/0xbe0 [nf_conntrack] Jul 26 15:05:02 rx [11061396.060994] ? get_l4proto+0xe7/0x190 [nf_conntrack] Jul 26 15:05:02 rx [11061396.066220] nf_conntrack_in+0xe9/0x670 [nf_conntrack] Jul 26 15:05:02 rx [11061396.071618] ipv6_conntrack_local+0x14/0x20 [nf_conntrack] Jul 26 15:05:02 rx [11061396.077356] nf_hook_slow+0x45/0xb0 Jul 26 15:05:02 rx [11061396.081098] ip6_xmit+0x3f0/0x5d0 Jul 26 15:05:02 rx [11061396.084670] ? ipv6_anycast_cleanup+0x50/0x50 Jul 26 15:05:02 rx [11061396.089282] ? __sk_dst_check+0x38/0x70 Jul 26 15:05:02 rx [11061396.093381] ? inet6_csk_route_socket+0x13b/0x200 Jul 26 15:05:02 rx [11061396.098346] inet6_csk_xmit+0xa7/0xf0 Jul 26 15:05:02 rx [11061396.102263] __tcp_transmit_skb+0x550/0xb30 Jul 26 15:05:02 rx [11061396.106701] tcp_write_xmit+0x3c6/0xc20 Jul 26 15:05:02 rx [11061396.110792] ? __alloc_skb+0x98/0x1d0 Jul 26 15:05:02 rx [11061396.114708] __tcp_push_pending_frames+0x37/0x100 Jul 26 15:05:02 rx [11061396.119667] tcp_push+0xfd/0x100 Jul 26 15:05:02 rx [11061396.123150] tcp_sendmsg_locked+0xc70/0xdd0 Jul 26 15:05:02 rx [11061396.127588] tcp_sendmsg+0x2d/0x50 Jul 26 15:05:02 rx [11061396.131245] inet6_sendmsg+0x43/0x70 Jul 26 15:05:02 rx [11061396.135075] __sock_sendmsg+0x48/0x70 Jul 26 15:05:02 rx [11061396.138994] ____sys_sendmsg+0x212/0x280 Jul 26 15:05:02 rx [11061396.143172] ___sys_sendmsg+0x88/0xd0 Jul 26 15:05:02 rx [11061396.147098] ? __seccomp_filter+0x7e/0x6b0 Jul 26 15:05:02 rx [11061396.151446] ? __switch_to+0x39c/0x460 Jul 26 15:05:02 rx [11061396.155453] ? __switch_to_asm+0x42/0x80 Jul 26 15:05:02 rx [11061396.159636] ? __switch_to_asm+0x5a/0x80 Jul 26 15:05:02 rx [11061396.163816] __sys_sendmsg+0x5c/0xa0 Jul 26 15:05:02 rx [11061396.167647] __x64_sys_sendmsg+0x1f/0x30 Jul 26 15:05:02 rx [11061396.171832] do_syscall_64+0x57/0x190 Jul 26 15:05:02 rx [11061396.175748] entry_SYSCALL_64_after_hwframe+0x5c/0xc1 Jul 26 15:05:02 rx [11061396.181055] RIP: 0033:0x7f1ef692618d Jul 26 15:05:02 rx [11061396.184893] Code: 28 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 ca ee ff ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2f 44 89 c7 48 89 44 24 08 e8 fe ee ff ff 48 Jul 26 15:05:02 rx [11061396.203889] RSP: 002b:00007f1ef4a26aa0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e Jul 26 15:05:02 rx [11061396.211708] RAX: ffffffffffffffda RBX: 000000000000084b RCX: 00007f1ef692618d Jul 26 15:05:02 rx [11061396.219091] RDX: 0000000000004000 RSI: 00007f1ef4a26b10 RDI: 0000000000000275 Jul 26 15:05:02 rx [11061396.226475] RBP: 0000000000004000 R08: 0000000000000000 R09: 0000000000000020 Jul 26 15:05:02 rx [11061396.233859] R10: 0000000000000000 R11: 0000000000000293 R12: 000000000000084b Jul 26 15:05:02 rx [11061396.241243] R13: 00007f1ef4a26b10 R14: 0000000000000275 R15: 000055592030f1e8 Jul 26 15:05:02 rx [11061396.248628] Modules linked in: vrf bridge stp llc vxlan ip6_udp_tunnel udp_tunnel nls_iso8859_1 amd64_edac_mod edac_mce_amd kvm_amd kvm crct10dif_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper wmi_bmof ipmi_ssif input_leds joydev rndis_host cdc_ether usbnet mii ast drm_vram_helper ttm drm_kms_helper i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt ccp mac_hid ipmi_si ipmi_devintf ipmi_msghandler nft_ct sch_fq_codel nf_tables_set nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink ramoops reed_solomon efi_pstore drm ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear mlx5_ib ib_uverbs ib_core raid1 mlx5_core hid_generic pci_hyperv_intf crc32_pclmul tls usbhid ahci mlxfw bnxt_en libahci hid nvme i2c_piix4 nvme_core wmi Jul 26 15:05:02 rx [11061396.324334] CR2: 0000000000000020 Jul 26 15:05:02 rx [11061396.327944] ---[ end trace 68a2b679d1cfb4f1 ]--- Jul 26 15:05:02 rx [11061396.433435] RIP: 0010:tcp_rearm_rto+0xe4/0x160 Jul 26 15:05:02 rx [11061396.438137] Code: 87 ca 04 00 00 00 5b 41 5c 41 5d 5d c3 c3 49 8b bc 24 40 06 00 00 eb 8d 48 bb cf f7 53 e3 a5 9b c4 20 4c 89 ef e8 0c fe 0e 00 <48> 8b 78 20 48 c1 ef 03 48 89 f8 41 8b bc 24 80 04 00 00 48 f7 e3 Jul 26 15:05:02 rx [11061396.457144] RSP: 0018:ffffb75d40003e08 EFLAGS: 00010246 Jul 26 15:05:02 rx [11061396.462629] RAX: 0000000000000000 RBX: 20c49ba5e353f7cf RCX: 0000000000000000 Jul 26 15:05:02 rx [11061396.470012] RDX: 0000000062177c30 RSI: 000000000000231c RDI: ffff9874ad283a60 Jul 26 15:05:02 rx [11061396.477396] RBP: ffffb75d40003e20 R08: 0000000000000000 R09: ffff987605e20aa8 Jul 26 15:05:02 rx [11061396.484779] R10: ffffb75d40003f00 R11: ffffb75d4460f740 R12: ffff9874ad283900 Jul 26 15:05:02 rx [11061396.492164] R13: ffff9874ad283a60 R14: ffff9874ad283980 R15: ffff9874ad283d30 Jul 26 15:05:02 rx [11061396.499547] FS: 00007f1ef4a2e700(0000) GS:ffff987605e00000(0000) knlGS:0000000000000000 Jul 26 15:05:02 rx [11061396.507886] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jul 26 15:05:02 rx [11061396.513884] CR2: 0000000000000020 CR3: 0000003e450ba003 CR4: 0000000000760ef0 Jul 26 15:05:02 rx [11061396.521267] PKRU: 55555554 Jul 26 15:05:02 rx [11061396.524230] Kernel panic - not syncing: Fatal exception in interrupt Jul 26 15:05:02 rx [11061396.530885] Kernel Offset: 0x1b200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) Jul 26 15:05:03 rx [11061396.660181] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- After we hit this we disabled TLP by setting tcp_early_retrans to 0 and then hit the crash in the RACK case: Aug 7 07:26:16 rx [1006006.265582] BUG: kernel NULL pointer dereference, address: 0000000000000020 Aug 7 07:26:16 rx [1006006.272719] #PF: supervisor read access in kernel mode Aug 7 07:26:16 rx [1006006.278030] #PF: error_code(0x0000) - not-present page Aug 7 07:26:16 rx [1006006.283343] PGD 0 P4D 0 Aug 7 07:26:16 rx [1006006.286057] Oops: 0000 [#1] SMP NOPTI Aug 7 07:26:16 rx [1006006.289896] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G W 5.4.0-174-generic #193-Ubuntu Aug 7 07:26:16 rx [1006006.299107] Hardware name: Supermicro SMC 2x26 os-gen8 64C NVME-Y 256G/H12SSW-NTR, BIOS 2.5.V1.2U.NVMe.UEFI 05/09/2023 Aug 7 07:26:16 rx [1006006.309970] RIP: 0010:tcp_rearm_rto+0xe4/0x160 Aug 7 07:26:16 rx [1006006.314584] Code: 87 ca 04 00 00 00 5b 41 5c 41 5d 5d c3 c3 49 8b bc 24 40 06 00 00 eb 8d 48 bb cf f7 53 e3 a5 9b c4 20 4c 89 ef e8 0c fe 0e 00 <48> 8b 78 20 48 c1 ef 03 48 89 f8 41 8b bc 24 80 04 00 00 48 f7 e3 Aug 7 07:26:16 rx [1006006.333499] RSP: 0018:ffffb42600a50960 EFLAGS: 00010246 Aug 7 07:26:16 rx [1006006.338895] RAX: 0000000000000000 RBX: 20c49ba5e353f7cf RCX: 0000000000000000 Aug 7 07:26:16 rx [1006006.346193] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff92d687ed8160 Aug 7 07:26:16 rx [1006006.353489] RBP: ffffb42600a50978 R08: 0000000000000000 R09: 00000000cd896dcc Aug 7 07:26:16 rx [1006006.360786] R10: ffff92dc3404f400 R11: 0000000000000001 R12: ffff92d687ed8000 Aug 7 07:26:16 rx [1006006.368084] R13: ffff92d687ed8160 R14: 00000000cd896dcc R15: 00000000cd8fca81 Aug 7 07:26:16 rx [1006006.375381] FS: 0000000000000000(0000) GS:ffff93158ad40000(0000) knlGS:0000000000000000 Aug 7 07:26:16 rx [1006006.383632] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Aug 7 07:26:16 rx [1006006.389544] CR2: 0000000000000020 CR3: 0000003e775ce006 CR4: 0000000000760ee0 Aug 7 07:26:16 rx [1006006.396839] PKRU: 55555554 Aug 7 07:26:16 rx [1006006.399717] Call Trace: Aug 7 07:26:16 rx [1006006.402335] Aug 7 07:26:16 rx [1006006.404525] ? show_regs.cold+0x1a/0x1f Aug 7 07:26:16 rx [1006006.408532] ? __die+0x90/0xd9 Aug 7 07:26:16 rx [1006006.411760] ? no_context+0x196/0x380 Aug 7 07:26:16 rx [1006006.415599] ? __bad_area_nosemaphore+0x50/0x1a0 Aug 7 07:26:16 rx [1006006.420392] ? _raw_spin_lock+0x1e/0x30 Aug 7 07:26:16 rx [1006006.424401] ? bad_area_nosemaphore+0x16/0x20 Aug 7 07:26:16 rx [1006006.428927] ? do_user_addr_fault+0x267/0x450 Aug 7 07:26:16 rx [1006006.433450] ? __do_page_fault+0x58/0x90 Aug 7 07:26:16 rx [1006006.437542] ? do_page_fault+0x2c/0xe0 Aug 7 07:26:16 rx [1006006.441470] ? page_fault+0x34/0x40 Aug 7 07:26:16 rx [1006006.445134] ? tcp_rearm_rto+0xe4/0x160 Aug 7 07:26:16 rx [1006006.449145] tcp_ack+0xa32/0xb30 Aug 7 07:26:16 rx [1006006.452542] tcp_rcv_established+0x13c/0x670 Aug 7 07:26:16 rx [1006006.456981] ? sk_filter_trim_cap+0x48/0x220 Aug 7 07:26:16 rx [1006006.461419] tcp_v6_do_rcv+0xdb/0x450 Aug 7 07:26:16 rx [1006006.465257] tcp_v6_rcv+0xc2b/0xd10 Aug 7 07:26:16 rx [1006006.468918] ip6_protocol_deliver_rcu+0xd3/0x4e0 Aug 7 07:26:16 rx [1006006.473706] ip6_input_finish+0x15/0x20 Aug 7 07:26:16 rx [1006006.477710] ip6_input+0xa2/0xb0 Aug 7 07:26:16 rx [1006006.481109] ? ip6_protocol_deliver_rcu+0x4e0/0x4e0 Aug 7 07:26:16 rx [1006006.486151] ip6_sublist_rcv_finish+0x3d/0x50 Aug 7 07:26:16 rx [1006006.490679] ip6_sublist_rcv+0x1aa/0x250 Aug 7 07:26:16 rx [1006006.494779] ? ip6_rcv_finish_core.isra.0+0xa0/0xa0 Aug 7 07:26:16 rx [1006006.499828] ipv6_list_rcv+0x112/0x140 Aug 7 07:26:16 rx [1006006.503748] __netif_receive_skb_list_core+0x1a4/0x250 Aug 7 07:26:16 rx [1006006.509057] netif_receive_skb_list_internal+0x1a1/0x2b0 Aug 7 07:26:16 rx [1006006.514538] gro_normal_list.part.0+0x1e/0x40 Aug 7 07:26:16 rx [1006006.519068] napi_complete_done+0x91/0x130 Aug 7 07:26:16 rx [1006006.523352] mlx5e_napi_poll+0x18e/0x610 [mlx5_core] Aug 7 07:26:16 rx [1006006.528481] net_rx_action+0x142/0x390 Aug 7 07:26:16 rx [1006006.532398] __do_softirq+0xd1/0x2c1 Aug 7 07:26:16 rx [1006006.536142] irq_exit+0xae/0xb0 Aug 7 07:26:16 rx [1006006.539452] do_IRQ+0x5a/0xf0 Aug 7 07:26:16 rx [1006006.542590] common_interrupt+0xf/0xf Aug 7 07:26:16 rx [1006006.546421] Aug 7 07:26:16 rx [1006006.548695] RIP: 0010:native_safe_halt+0xe/0x10 Aug 7 07:26:16 rx [1006006.553399] Code: 7b ff ff ff eb bd 90 90 90 90 90 90 e9 07 00 00 00 0f 00 2d 36 2c 50 00 f4 c3 66 90 e9 07 00 00 00 0f 00 2d 26 2c 50 00 fb f4 90 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 e8 dd 5e 61 ff 65 Aug 7 07:26:16 rx [1006006.572309] RSP: 0018:ffffb42600177e70 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffc2 Aug 7 07:26:16 rx [1006006.580040] RAX: ffffffff8ed08b20 RBX: 0000000000000005 RCX: 0000000000000001 Aug 7 07:26:16 rx [1006006.587337] RDX: 00000000f48eeca2 RSI: 0000000000000082 RDI: 0000000000000082 Aug 7 07:26:16 rx [1006006.594635] RBP: ffffb42600177e90 R08: 0000000000000000 R09: 000000000000020f Aug 7 07:26:16 rx [1006006.601931] R10: 0000000000100000 R11: 0000000000000000 R12: 0000000000000005 Aug 7 07:26:16 rx [1006006.609229] R13: ffff93157deb5f00 R14: 0000000000000000 R15: 0000000000000000 Aug 7 07:26:16 rx [1006006.616530] ? __cpuidle_text_start+0x8/0x8 Aug 7 07:26:16 rx [1006006.620886] ? default_idle+0x20/0x140 Aug 7 07:26:16 rx [1006006.624804] arch_cpu_idle+0x15/0x20 Aug 7 07:26:16 rx [1006006.628545] default_idle_call+0x23/0x30 Aug 7 07:26:16 rx [1006006.632640] do_idle+0x1fb/0x270 Aug 7 07:26:16 rx [1006006.636035] cpu_startup_entry+0x20/0x30 Aug 7 07:26:16 rx [1006006.640126] start_secondary+0x178/0x1d0 Aug 7 07:26:16 rx [1006006.644218] secondary_startup_64+0xa4/0xb0 Aug 7 07:26:17 rx [1006006.648568] Modules linked in: vrf bridge stp llc vxlan ip6_udp_tunnel udp_tunnel nls_iso8859_1 nft_ct amd64_edac_mod edac_mce_amd kvm_amd kvm crct10dif_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper wmi_bmof ipmi_ssif input_leds joydev rndis_host cdc_ether usbnet ast mii drm_vram_helper ttm drm_kms_helper i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt ccp mac_hid ipmi_si ipmi_devintf ipmi_msghandler sch_fq_codel nf_tables_set nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink ramoops reed_solomon efi_pstore drm ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear mlx5_ib ib_uverbs ib_core raid1 hid_generic mlx5_core pci_hyperv_intf crc32_pclmul usbhid ahci tls mlxfw bnxt_en hid libahci nvme i2c_piix4 nvme_core wmi [last unloaded: cpuid] Aug 7 07:26:17 rx [1006006.726180] CR2: 0000000000000020 Aug 7 07:26:17 rx [1006006.729718] ---[ end trace e0e2e37e4e612984 ]--- Prior to seeing the first crash and on other machines we also see the warning in tcp_send_loss_probe() where packets_out is non-zero, but both transmit and retrans queues are empty so we know the box is seeing some accounting issue in this area: Jul 26 09:15:27 kernel: ------------[ cut here ]------------ Jul 26 09:15:27 kernel: invalid inflight: 2 state 1 cwnd 68 mss 8988 Jul 26 09:15:27 kernel: WARNING: CPU: 16 PID: 0 at net/ipv4/tcp_output.c:2605 tcp_send_loss_probe+0x214/0x220 Jul 26 09:15:27 kernel: Modules linked in: vrf bridge stp llc vxlan ip6_udp_tunnel udp_tunnel nls_iso8859_1 nft_ct amd64_edac_mod edac_mce_amd kvm_amd kvm crct10dif_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper wmi_bmof ipmi_ssif joydev input_leds rndis_host cdc_ether usbnet mii ast drm_vram_helper ttm drm_kms_he> Jul 26 09:15:27 kernel: CPU: 16 PID: 0 Comm: swapper/16 Not tainted 5.4.0-174-generic #193-Ubuntu Jul 26 09:15:27 kernel: Hardware name: Supermicro SMC 2x26 os-gen8 64C NVME-Y 256G/H12SSW-NTR, BIOS 2.5.V1.2U.NVMe.UEFI 05/09/2023 Jul 26 09:15:27 kernel: RIP: 0010:tcp_send_loss_probe+0x214/0x220 Jul 26 09:15:27 kernel: Code: 08 26 01 00 75 e2 41 0f b6 54 24 12 41 8b 8c 24 c0 06 00 00 45 89 f0 48 c7 c7 e0 b4 20 a7 c6 05 8d 08 26 01 01 e8 4a c0 0f 00 <0f> 0b eb ba 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 Jul 26 09:15:27 kernel: RSP: 0018:ffffb7838088ce00 EFLAGS: 00010286 Jul 26 09:15:27 kernel: RAX: 0000000000000000 RBX: ffff9b84b5630430 RCX: 0000000000000006 Jul 26 09:15:27 kernel: RDX: 0000000000000007 RSI: 0000000000000096 RDI: ffff9b8e4621c8c0 Jul 26 09:15:27 kernel: RBP: ffffb7838088ce18 R08: 0000000000000927 R09: 0000000000000004 Jul 26 09:15:27 kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffff9b84b5630000 Jul 26 09:15:27 kernel: R13: 0000000000000000 R14: 000000000000231c R15: ffff9b84b5630430 Jul 26 09:15:27 kernel: FS: 0000000000000000(0000) GS:ffff9b8e46200000(0000) knlGS:0000000000000000 Jul 26 09:15:27 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jul 26 09:15:27 kernel: CR2: 000056238cec2380 CR3: 0000003e49ede005 CR4: 0000000000760ee0 Jul 26 09:15:27 kernel: PKRU: 55555554 Jul 26 09:15:27 kernel: Call Trace: Jul 26 09:15:27 kernel: <IRQ> Jul 26 09:15:27 kernel: ? show_regs.cold+0x1a/0x1f Jul 26 09:15:27 kernel: ? __warn+0x98/0xe0 Jul 26 09:15:27 kernel: ? tcp_send_loss_probe+0x214/0x220 Jul 26 09:15:27 kernel: ? report_bug+0xd1/0x100 Jul 26 09:15:27 kernel: ? do_error_trap+0x9b/0xc0 Jul 26 09:15:27 kernel: ? do_invalid_op+0x3c/0x50 Jul 26 09:15:27 kernel: ? tcp_send_loss_probe+0x214/0x220 Jul 26 09:15:27 kernel: ? invalid_op+0x1e/0x30 Jul 26 09:15:27 kernel: ? tcp_send_loss_probe+0x214/0x220 Jul 26 09:15:27 kernel: tcp_write_timer_handler+0x1b4/0x240 Jul 26 09:15:27 kernel: tcp_write_timer+0x9e/0xe0 Jul 26 09:15:27 kernel: ? tcp_write_timer_handler+0x240/0x240 Jul 26 09:15:27 kernel: call_timer_fn+0x32/0x130 Jul 26 09:15:27 kernel: __run_timers.part.0+0x180/0x280 Jul 26 09:15:27 kernel: ? timerqueue_add+0x9b/0xb0 Jul 26 09:15:27 kernel: ? enqueue_hrtimer+0x3d/0x90 Jul 26 09:15:27 kernel: ? do_error_trap+0x9b/0xc0 Jul 26 09:15:27 kernel: ? do_invalid_op+0x3c/0x50 Jul 26 09:15:27 kernel: ? tcp_send_loss_probe+0x214/0x220 Jul 26 09:15:27 kernel: ? invalid_op+0x1e/0x30 Jul 26 09:15:27 kernel: ? tcp_send_loss_probe+0x214/0x220 Jul 26 09:15:27 kernel: tcp_write_timer_handler+0x1b4/0x240 Jul 26 09:15:27 kernel: tcp_write_timer+0x9e/0xe0 Jul 26 09:15:27 kernel: ? tcp_write_timer_handler+0x240/0x240 Jul 26 09:15:27 kernel: call_timer_fn+0x32/0x130 Jul 26 09:15:27 kernel: __run_timers.part.0+0x180/0x280 Jul 26 09:15:27 kernel: ? timerqueue_add+0x9b/0xb0 Jul 26 09:15:27 kernel: ? enqueue_hrtimer+0x3d/0x90 Jul 26 09:15:27 kernel: ? recalibrate_cpu_khz+0x10/0x10 Jul 26 09:15:27 kernel: ? ktime_get+0x3e/0xa0 Jul 26 09:15:27 kernel: ? native_x2apic_icr_write+0x30/0x30 Jul 26 09:15:27 kernel: run_timer_softirq+0x2a/0x50 Jul 26 09:15:27 kernel: __do_softirq+0xd1/0x2c1 Jul 26 09:15:27 kernel: irq_exit+0xae/0xb0 Jul 26 09:15:27 kernel: smp_apic_timer_interrupt+0x7b/0x140 Jul 26 09:15:27 kernel: apic_timer_interrupt+0xf/0x20 Jul 26 09:15:27 kernel: </IRQ> Jul 26 09:15:27 kernel: RIP: 0010:native_safe_halt+0xe/0x10 Jul 26 09:15:27 kernel: Code: 7b ff ff ff eb bd 90 90 90 90 90 90 e9 07 00 00 00 0f 00 2d 36 2c 50 00 f4 c3 66 90 e9 07 00 00 00 0f 00 2d 26 2c 50 00 fb f4 <c3> 90 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 e8 dd 5e 61 ff 65 Jul 26 09:15:27 kernel: RSP: 0018:ffffb783801cfe70 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13 Jul 26 09:15:27 kernel: RAX: ffffffffa6908b20 RBX: 0000000000000010 RCX: 0000000000000001 Jul 26 09:15:27 kernel: RDX: 000000006fc0c97e RSI: 0000000000000082 RDI: 0000000000000082 Jul 26 09:15:27 kernel: RBP: ffffb783801cfe90 R08: 0000000000000000 R09: 0000000000000225 Jul 26 09:15:27 kernel: R10: 0000000000100000 R11: 0000000000000000 R12: 0000000000000010 Jul 26 09:15:27 kernel: R13: ffff9b8e390b0000 R14: 0000000000000000 R15: 0000000000000000 Jul 26 09:15:27 kernel: ? __cpuidle_text_start+0x8/0x8 Jul 26 09:15:27 kernel: ? default_idle+0x20/0x140 Jul 26 09:15:27 kernel: arch_cpu_idle+0x15/0x20 Jul 26 09:15:27 kernel: default_idle_call+0x23/0x30 Jul 26 09:15:27 kernel: do_idle+0x1fb/0x270 Jul 26 09:15:27 kernel: cpu_startup_entry+0x20/0x30 Jul 26 09:15:27 kernel: start_secondary+0x178/0x1d0 Jul 26 09:15:27 kernel: secondary_startup_64+0xa4/0xb0 Jul 26 09:15:27 kernel: ---[ end trace e7ac822987e33be1 ]--- The NULL ptr deref is coming from tcp_rto_delta_us() attempting to pull an skb off the head of the retransmit queue and then dereferencing that skb to get the skb_mstamp_ns value via tcp_skb_timestamp_us(skb). The crash is the same one that was reported a # of years ago here: https://lore.kernel.org/netdev/86c0f836-9a7c-438b-d81a-839be45f1f58@gmail.com/T/#t and the kernel we're running has the fix which was added to resolve this issue. Unfortunately we've been unsuccessful so far in reproducing this problem in the lab and do not have the luxury of pushing out a new kernel to try and test if newer kernels resolve this issue at the moment. I realize this is a report against both an Ubuntu kernel and also an older 5.4 kernel. I have reported this issue to Ubuntu here: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2077657 however I feel like since this issue has possibly cropped up again it makes sense to build in some protection in this path (even on the latest kernel versions) since the code in question just blindly assumes there's a valid skb without testing if it's NULL b/f it looks at the timestamp. Given we have seen crashes in this path before and now this case it seems like we should protect ourselves for when packets_out accounting is incorrect. While we should fix that root cause we should also just make sure the skb is not NULL before dereferencing it. Also add a warn once here to capture some information if/when the problem case is hit again. Fixes: e1a10ef7fa87 ("tcp: introduce tcp_rto_delta_us() helper for xmit timer fix") Signed-off-by: Josh Hunt <johunt@akamai.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-08f2fs: reduce expensive checkpoint trigger frequencyChao Yu
[ Upstream commit aaf8c0b9ae042494cb4585883b15c1332de77840 ] We may trigger high frequent checkpoint for below case: 1. mkdir /mnt/dir1; set dir1 encrypted 2. touch /mnt/file1; fsync /mnt/file1 3. mkdir /mnt/dir2; set dir2 encrypted 4. touch /mnt/file2; fsync /mnt/file2 ... Although, newly created dir and file are not related, due to commit bbf156f7afa7 ("f2fs: fix lost xattrs of directories"), we will trigger checkpoint whenever fsync() comes after a new encrypted dir created. In order to avoid such performance regression issue, let's record an entry including directory's ino in global cache whenever we update directory's xattr data, and then triggerring checkpoint() only if xattr metadata of target file's parent was updated. This patch updates to cover below no encryption case as well: 1) parent is checkpointed 2) set_xattr(dir) w/ new xnid 3) create(file) 4) fsync(file) Fixes: bbf156f7afa7 ("f2fs: fix lost xattrs of directories") Reported-by: wangzijie <wangzijie1@honor.com> Reported-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Tested-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Reported-by: Yunlei He <heyunlei@hihonor.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-08jbd2: introduce/export functions jbd2_journal_submit|finish_inode_data_buffers()Mauricio Faria de Oliveira
[ Upstream commit aa3c0c61f62d682259e3e66cdc01846290f9cd6c ] Export functions that implement the current behavior done for an inode in journal_submit|finish_inode_data_buffers(). No functional change. Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20201006004841.600488-2-mfo@canonical.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Stable-dep-of: 20cee68f5b44 ("ext4: clear EXT4_GROUP_INFO_WAS_TRIMMED_BIT even mount with discard") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-08kthread: add kthread_work tracepointsRob Clark
[ Upstream commit f630c7c6f10546ebff15c3a856e7949feb7a2372 ] While migrating some code from wq to kthread_worker, I found that I missed the execute_start/end tracepoints. So add similar tracepoints for kthread_work. And for completeness, queue_work tracepoint (although this one differs slightly from the matching workqueue tracepoint). Link: https://lkml.kernel.org/r/20201010180323.126634-1-robdclark@gmail.com Signed-off-by: Rob Clark <robdclark@chromium.org> Cc: Rob Clark <robdclark@chromium.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Phil Auld <pauld@redhat.com> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: Thara Gopinath <thara.gopinath@linaro.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Vincent Donnefort <vincent.donnefort@arm.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Jens Axboe <axboe@kernel.dk> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Ilias Stamatis <stamatis.iliass@gmail.com> Cc: Liang Chen <cl@rock-chips.com> Cc: Ben Dooks <ben.dooks@codethink.co.uk> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: "J. Bruce Fields" <bfields@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Stable-dep-of: e16c7b07784f ("kthread: fix task state in kthread worker if being frozen") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-08mac80211: parse radiotap header when selecting Tx queueMathy Vanhoef
[ Upstream commit cb17ed29a7a5fea8c9bf70e8a05757d71650e025 ] Already parse the radiotap header in ieee80211_monitor_select_queue. In a subsequent commit this will allow us to add a radiotap flag that influences the queue on which injected packets will be sent. This also fixes the incomplete validation of the injected frame in ieee80211_monitor_select_queue: currently an out of bounds memory access may occur in in the called function ieee80211_select_queue_80211 if the 802.11 header is too small. Note that in ieee80211_monitor_start_xmit the radiotap header is parsed again, which is necessairy because ieee80211_monitor_select_queue is not always called beforehand. Signed-off-by: Mathy Vanhoef <Mathy.Vanhoef@kuleuven.be> Link: https://lore.kernel.org/r/20200723100153.31631-6-Mathy.Vanhoef@kuleuven.be Signed-off-by: Johannes Berg <johannes.berg@intel.com> Stable-dep-of: 9d301de12da6 ("wifi: mac80211: use two-phase skb reclamation in ieee80211_do_stop()") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-08mount: warn only once about timestamp range expirationAnthony Iliopoulos
[ Upstream commit a128b054ce029554a4a52fc3abb8c1df8bafcaef ] Commit f8b92ba67c5d ("mount: Add mount warning for impending timestamp expiry") introduced a mount warning regarding filesystem timestamp limits, that is printed upon each writable mount or remount. This can result in a lot of unnecessary messages in the kernel log in setups where filesystems are being frequently remounted (or mounted multiple times). Avoid this by setting a superblock flag which indicates that the warning has been emitted at least once for any particular mount, as suggested in [1]. Link: https://lore.kernel.org/CAHk-=wim6VGnxQmjfK_tDg6fbHYKL4EFkmnTjVr9QnRqjDBAeA@mail.gmail.com/ [1] Link: https://lkml.kernel.org/r/20220119202934.26495-1-ailiop@suse.com Signed-off-by: Anthony Iliopoulos <ailiop@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Stable-dep-of: 4bcda1eaf184 ("mount: handle OOM on mnt_warn_timestamp_expiry") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-08fs: explicitly unregister per-superblock BDIsChristoph Hellwig
[ Upstream commit 0b3ea0926afb8dde70cfab00316ae0a70b93a7cc ] Add a new SB_I_ flag to mark superblocks that have an ephemeral bdi associated with them, and unregister it when the superblock is shut down. Link: https://lkml.kernel.org/r/20211021124441.668816-4-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Miquel Raynal <miquel.raynal@bootlin.com> Cc: Richard Weinberger <richard@nod.at> Cc: Vignesh Raghavendra <vigneshr@ti.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Stable-dep-of: 4bcda1eaf184 ("mount: handle OOM on mnt_warn_timestamp_expiry") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-08inet: inet_defrag: prevent sk release while still in useFlorian Westphal
commit 18685451fc4e546fc0e718580d32df3c0e5c8272 upstream. ip_local_out() and other functions can pass skb->sk as function argument. If the skb is a fragment and reassembly happens before such function call returns, the sk must not be released. This affects skb fragments reassembled via netfilter or similar modules, e.g. openvswitch or ct_act.c, when run as part of tx pipeline. Eric Dumazet made an initial analysis of this bug. Quoting Eric: Calling ip_defrag() in output path is also implying skb_orphan(), which is buggy because output path relies on sk not disappearing. A relevant old patch about the issue was : 8282f27449bf ("inet: frag: Always orphan skbs inside ip_defrag()") [..] net/ipv4/ip_output.c depends on skb->sk being set, and probably to an inet socket, not an arbitrary one. If we orphan the packet in ipvlan, then downstream things like FQ packet scheduler will not work properly. We need to change ip_defrag() to only use skb_orphan() when really needed, ie whenever frag_list is going to be used. Eric suggested to stash sk in fragment queue and made an initial patch. However there is a problem with this: If skb is refragmented again right after, ip_do_fragment() will copy head->sk to the new fragments, and sets up destructor to sock_wfree. IOW, we have no choice but to fix up sk_wmem accouting to reflect the fully reassembled skb, else wmem will underflow. This change moves the orphan down into the core, to last possible moment. As ip_defrag_offset is aliased with sk_buff->sk member, we must move the offset into the FRAG_CB, else skb->sk gets clobbered. This allows to delay the orphaning long enough to learn if the skb has to be queued or if the skb is completing the reasm queue. In the former case, things work as before, skb is orphaned. This is safe because skb gets queued/stolen and won't continue past reasm engine. In the latter case, we will steal the skb->sk reference, reattach it to the head skb, and fix up wmem accouting when inet_frag inflates truesize. Fixes: 7026b1ddb6b8 ("netfilter: Pass socket pointer down through okfn().") Diagnosed-by: Eric Dumazet <edumazet@google.com> Reported-by: xingwei lee <xrivendell7@gmail.com> Reported-by: yue sun <samsun1006219@gmail.com> Reported-by: syzbot+e5167d7144a62715044c@syzkaller.appspotmail.com Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240326101845.30836-1-fw@strlen.de Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-09-12ring-buffer: Rename ring_buffer_read() to read_buffer_iter_advance()Steven Rostedt (VMware)
[ Upstream commit bc1a72afdc4a91844928831cac85731566e03bc6 ] When the ring buffer was first created, the iterator followed the normal producer/consumer operations where it had both a peek() operation, that just returned the event at the current location, and a read(), that would return the event at the current location and also increment the iterator such that the next peek() or read() will return the next event. The only use of the ring_buffer_read() is currently to move the iterator to the next location and nothing now actually reads the event it returns. Rename this function to its actual use case to ring_buffer_iter_advance(), which also adds the "iter" part to the name, which is more meaningful. As the timestamp returned by ring_buffer_read() was never used, there's no reason that this new version should bother having returning it. It will also become a void function. Link: http://lkml.kernel.org/r/20200317213416.018928618@goodmis.org Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Stable-dep-of: 49aa8a1f4d68 ("tracing: Avoid possible softlockup in tracing_iter_reset()") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-09-12i2c: Use IS_REACHABLE() for substituting empty ACPI functionsRichard Fitzgerald
commit 71833e79a42178d8a50b5081c98c78ace9325628 upstream. Replace IS_ENABLED() with IS_REACHABLE() to substitute empty stubs for: i2c_acpi_get_i2c_resource() i2c_acpi_client_count() i2c_acpi_find_bus_speed() i2c_acpi_new_device_by_fwnode() i2c_adapter *i2c_acpi_find_adapter_by_handle() i2c_acpi_waive_d0_probe() commit f17c06c6608a ("i2c: Fix conditional for substituting empty ACPI functions") partially fixed this conditional to depend on CONFIG_I2C, but used IS_ENABLED(), which is wrong since CONFIG_I2C is tristate. CONFIG_ACPI is boolean but let's also change it to use IS_REACHABLE() to future-proof it against becoming tristate. Somehow despite testing various combinations of CONFIG_I2C and CONFIG_ACPI we missed the combination CONFIG_I2C=m, CONFIG_ACPI=y. Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com> Fixes: f17c06c6608a ("i2c: Fix conditional for substituting empty ACPI functions") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202408141333.gYnaitcV-lkp@intel.com/ Reviewed-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-09-12i2c: Fix conditional for substituting empty ACPI functionsRichard Fitzgerald
[ Upstream commit f17c06c6608ad4ecd2ccf321753fb511812d821b ] Add IS_ENABLED(CONFIG_I2C) to the conditional around a bunch of ACPI functions. The conditional around these functions depended only on CONFIG_ACPI. But the functions are implemented in I2C core, so are only present if CONFIG_I2C is enabled. Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-09-04net: busy-poll: use ktime_get_ns() instead of local_clock()Eric Dumazet
[ Upstream commit 0870b0d8b393dde53106678a1e2cec9dfa52f9b7 ] Typically, busy-polling durations are below 100 usec. When/if the busy-poller thread migrates to another cpu, local_clock() can be off by +/-2msec or more for small values of HZ, depending on the platform. Use ktimer_get_ns() to ensure deterministic behavior, which is the whole point of busy-polling. Fixes: 060212928670 ("net: add low latency socket poll") Fixes: 9a3c71aa8024 ("net: convert low latency sockets to sched_clock()") Fixes: 37089834528b ("sched, net: Fixup busy_loop_us_clock()") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Mina Almasry <almasrymina@google.com> Cc: Willem de Bruijn <willemb@google.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20240827114916.223377-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-09-04kcm: Serialise kcm_sendmsg() for the same socket.Kuniyuki Iwashima
[ Upstream commit 807067bf014d4a3ae2cc55bd3de16f22a01eb580 ] syzkaller reported UAF in kcm_release(). [0] The scenario is 1. Thread A builds a skb with MSG_MORE and sets kcm->seq_skb. 2. Thread A resumes building skb from kcm->seq_skb but is blocked by sk_stream_wait_memory() 3. Thread B calls sendmsg() concurrently, finishes building kcm->seq_skb and puts the skb to the write queue 4. Thread A faces an error and finally frees skb that is already in the write queue 5. kcm_release() does double-free the skb in the write queue When a thread is building a MSG_MORE skb, another thread must not touch it. Let's add a per-sk mutex and serialise kcm_sendmsg(). [0]: BUG: KASAN: slab-use-after-free in __skb_unlink include/linux/skbuff.h:2366 [inline] BUG: KASAN: slab-use-after-free in __skb_dequeue include/linux/skbuff.h:2385 [inline] BUG: KASAN: slab-use-after-free in __skb_queue_purge_reason include/linux/skbuff.h:3175 [inline] BUG: KASAN: slab-use-after-free in __skb_queue_purge include/linux/skbuff.h:3181 [inline] BUG: KASAN: slab-use-after-free in kcm_release+0x170/0x4c8 net/kcm/kcmsock.c:1691 Read of size 8 at addr ffff0000ced0fc80 by task syz-executor329/6167 CPU: 1 PID: 6167 Comm: syz-executor329 Tainted: G B 6.8.0-rc5-syzkaller-g9abbc24128bc #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024 Call trace: dump_backtrace+0x1b8/0x1e4 arch/arm64/kernel/stacktrace.c:291 show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:298 __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xd0/0x124 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:377 [inline] print_report+0x178/0x518 mm/kasan/report.c:488 kasan_report+0xd8/0x138 mm/kasan/report.c:601 __asan_report_load8_noabort+0x20/0x2c mm/kasan/report_generic.c:381 __skb_unlink include/linux/skbuff.h:2366 [inline] __skb_dequeue include/linux/skbuff.h:2385 [inline] __skb_queue_purge_reason include/linux/skbuff.h:3175 [inline] __skb_queue_purge include/linux/skbuff.h:3181 [inline] kcm_release+0x170/0x4c8 net/kcm/kcmsock.c:1691 __sock_release net/socket.c:659 [inline] sock_close+0xa4/0x1e8 net/socket.c:1421 __fput+0x30c/0x738 fs/file_table.c:376 ____fput+0x20/0x30 fs/file_table.c:404 task_work_run+0x230/0x2e0 kernel/task_work.c:180 exit_task_work include/linux/task_work.h:38 [inline] do_exit+0x618/0x1f64 kernel/exit.c:871 do_group_exit+0x194/0x22c kernel/exit.c:1020 get_signal+0x1500/0x15ec kernel/signal.c:2893 do_signal+0x23c/0x3b44 arch/arm64/kernel/signal.c:1249 do_notify_resume+0x74/0x1f4 arch/arm64/kernel/entry-common.c:148 exit_to_user_mode_prepare arch/arm64/kernel/entry-common.c:169 [inline] exit_to_user_mode arch/arm64/kernel/entry-common.c:178 [inline] el0_svc+0xac/0x168 arch/arm64/kernel/entry-common.c:713 el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598 Allocated by task 6166: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x40/0x78 mm/kasan/common.c:68 kasan_save_alloc_info+0x70/0x84 mm/kasan/generic.c:626 unpoison_slab_object mm/kasan/common.c:314 [inline] __kasan_slab_alloc+0x74/0x8c mm/kasan/common.c:340 kasan_slab_alloc include/linux/kasan.h:201 [inline] slab_post_alloc_hook mm/slub.c:3813 [inline] slab_alloc_node mm/slub.c:3860 [inline] kmem_cache_alloc_node+0x204/0x4c0 mm/slub.c:3903 __alloc_skb+0x19c/0x3d8 net/core/skbuff.c:641 alloc_skb include/linux/skbuff.h:1296 [inline] kcm_sendmsg+0x1d3c/0x2124 net/kcm/kcmsock.c:783 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg net/socket.c:745 [inline] sock_sendmsg+0x220/0x2c0 net/socket.c:768 splice_to_socket+0x7cc/0xd58 fs/splice.c:889 do_splice_from fs/splice.c:941 [inline] direct_splice_actor+0xec/0x1d8 fs/splice.c:1164 splice_direct_to_actor+0x438/0xa0c fs/splice.c:1108 do_splice_direct_actor fs/splice.c:1207 [inline] do_splice_direct+0x1e4/0x304 fs/splice.c:1233 do_sendfile+0x460/0xb3c fs/read_write.c:1295 __do_sys_sendfile64 fs/read_write.c:1362 [inline] __se_sys_sendfile64 fs/read_write.c:1348 [inline] __arm64_sys_sendfile64+0x160/0x3b4 fs/read_write.c:1348 __invoke_syscall arch/arm64/kernel/syscall.c:37 [inline] invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:51 el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:136 do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:155 el0_svc+0x54/0x168 arch/arm64/kernel/entry-common.c:712 el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598 Freed by task 6167: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x40/0x78 mm/kasan/common.c:68 kasan_save_free_info+0x5c/0x74 mm/kasan/generic.c:640 poison_slab_object+0x124/0x18c mm/kasan/common.c:241 __kasan_slab_free+0x3c/0x78 mm/kasan/common.c:257 kasan_slab_free include/linux/kasan.h:184 [inline] slab_free_hook mm/slub.c:2121 [inline] slab_free mm/slub.c:4299 [inline] kmem_cache_free+0x15c/0x3d4 mm/slub.c:4363 kfree_skbmem+0x10c/0x19c __kfree_skb net/core/skbuff.c:1109 [inline] kfree_skb_reason+0x240/0x6f4 net/core/skbuff.c:1144 kfree_skb include/linux/skbuff.h:1244 [inline] kcm_release+0x104/0x4c8 net/kcm/kcmsock.c:1685 __sock_release net/socket.c:659 [inline] sock_close+0xa4/0x1e8 net/socket.c:1421 __fput+0x30c/0x738 fs/file_table.c:376 ____fput+0x20/0x30 fs/file_table.c:404 task_work_run+0x230/0x2e0 kernel/task_work.c:180 exit_task_work include/linux/task_work.h:38 [inline] do_exit+0x618/0x1f64 kernel/exit.c:871 do_group_exit+0x194/0x22c kernel/exit.c:1020 get_signal+0x1500/0x15ec kernel/signal.c:2893 do_signal+0x23c/0x3b44 arch/arm64/kernel/signal.c:1249 do_notify_resume+0x74/0x1f4 arch/arm64/kernel/entry-common.c:148 exit_to_user_mode_prepare arch/arm64/kernel/entry-common.c:169 [inline] exit_to_user_mode arch/arm64/kernel/entry-common.c:178 [inline] el0_svc+0xac/0x168 arch/arm64/kernel/entry-common.c:713 el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598 The buggy address belongs to the object at ffff0000ced0fc80 which belongs to the cache skbuff_head_cache of size 240 The buggy address is located 0 bytes inside of freed 240-byte region [ffff0000ced0fc80, ffff0000ced0fd70) The buggy address belongs to the physical page: page:00000000d35f4ae4 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x10ed0f flags: 0x5ffc00000000800(slab|node=0|zone=2|lastcpupid=0x7ff) page_type: 0xffffffff() raw: 05ffc00000000800 ffff0000c1cbf640 fffffdffc3423100 dead000000000004 raw: 0000000000000000 00000000000c000c 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff0000ced0fb80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff0000ced0fc00: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc >ffff0000ced0fc80: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff0000ced0fd00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fc fc ffff0000ced0fd80: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module") Reported-by: syzbot+b72d86aa5df17ce74c60@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=b72d86aa5df17ce74c60 Tested-by: syzbot+b72d86aa5df17ce74c60@syzkaller.appspotmail.com Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20240815220437.69511-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-09-04dm mpath: pass IO start time to path selectorGabriel Krisman Bertazi
[ Upstream commit 087615bf3acdafd0ba7c7c9ed5286e7b7c80fe1b ] The HST path selector needs this information to perform path prediction. For request-based mpath, struct request's io_start_time_ns is used, while for bio-based, use the start_time stored in dm_io. Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Stable-dep-of: 1e1fd567d32f ("dm suspend: return -ERESTARTSYS instead of -EINTR") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-09-04block: use "unsigned long" for blk_validate_block_size().Tetsuo Handa
commit 37ae5a0f5287a52cf51242e76ccf198d02ffe495 upstream. Since lo_simple_ioctl(LOOP_SET_BLOCK_SIZE) and ioctl(NBD_SET_BLKSIZE) pass user-controlled "unsigned long arg" to blk_validate_block_size(), "unsigned long" should be used for validation. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/9ecbf057-4375-c2db-ab53-e4cc0dff953d@i-love.sakura.ne.jp Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: David Hunter <david.hunter.linux@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-09-04overflow: Implement size_t saturating arithmetic helpersKees Cook
commit e1be43d9b5d0d1310dbd90185a8e5c7145dde40f upstream. In order to perform more open-coded replacements of common allocation size arithmetic, the kernel needs saturating (SIZE_MAX) helpers for multiplication, addition, and subtraction. For example, it is common in allocators, especially on realloc, to add to an existing size: p = krealloc(map->patch, sizeof(struct reg_sequence) * (map->patch_regs + num_regs), GFP_KERNEL); There is no existing saturating replacement for this calculation, and just leaving the addition open coded inside array_size() could potentially overflow as well. For example, an overflow in an expression for a size_t argument might wrap to zero: array_size(anything, something_at_size_max + 1) == 0 Introduce size_mul(), size_add(), and size_sub() helpers that implicitly promote arguments to size_t and saturated calculations for use in allocations. With these helpers it is also possible to redefine array_size(), array3_size(), flex_array_size(), and struct_size() in terms of the new helpers. As with the check_*_overflow() helpers, the new helpers use __must_check, though what is really desired is a way to make sure that assignment is only to a size_t lvalue. Without this, it's still possible to introduce overflow/underflow via type conversion (i.e. from size_t to int). Enforcing this will currently need to be left to static analysis or future use of -Wconversion. Additionally update the overflow unit tests to force runtime evaluation for the pathological cases. Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: Keith Busch <kbusch@kernel.org> Cc: Len Baker <len.baker@gmx.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-09-04overflow.h: Add flex_array_size() helperGustavo A. R. Silva
commit b19d57d0f3cc6f1022edf94daf1d70506a09e3c2 upstream. Add flex_array_size() helper for the calculation of the size, in bytes, of a flexible array member contained within an enclosing structure. Example of usage: struct something { size_t count; struct foo items[]; }; struct something *instance; instance = kmalloc(struct_size(instance, items, count), GFP_KERNEL); instance->count = count; memcpy(instance->items, src, flex_array_size(instance, items, instance->count)); The helper returns SIZE_MAX on overflow instead of wrapping around. Additionally replaces parameter "n" with "count" in struct_size() helper for greater clarity and unification. Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/20200609012233.GA3371@embeddedor Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-09-04fix bitmap corruption on close_range() with CLOSE_RANGE_UNSHAREAl Viro
commit 9a2fa1472083580b6c66bdaf291f591e1170123a upstream. copy_fd_bitmaps(new, old, count) is expected to copy the first count/BITS_PER_LONG bits from old->full_fds_bits[] and fill the rest with zeroes. What it does is copying enough words (BITS_TO_LONGS(count/BITS_PER_LONG)), then memsets the rest. That works fine, *if* all bits past the cutoff point are clear. Otherwise we are risking garbage from the last word we'd copied. For most of the callers that is true - expand_fdtable() has count equal to old->max_fds, so there's no open descriptors past count, let alone fully occupied words in ->open_fds[], which is what bits in ->full_fds_bits[] correspond to. The other caller (dup_fd()) passes sane_fdtable_size(old_fdt, max_fds), which is the smallest multiple of BITS_PER_LONG that covers all opened descriptors below max_fds. In the common case (copying on fork()) max_fds is ~0U, so all opened descriptors will be below it and we are fine, by the same reasons why the call in expand_fdtable() is safe. Unfortunately, there is a case where max_fds is less than that and where we might, indeed, end up with junk in ->full_fds_bits[] - close_range(from, to, CLOSE_RANGE_UNSHARE) with * descriptor table being currently shared * 'to' being above the current capacity of descriptor table * 'from' being just under some chunk of opened descriptors. In that case we end up with observably wrong behaviour - e.g. spawn a child with CLONE_FILES, get all descriptors in range 0..127 open, then close_range(64, ~0U, CLOSE_RANGE_UNSHARE) and watch dup(0) ending up with descriptor #128, despite #64 being observably not open. The minimally invasive fix would be to deal with that in dup_fd(). If this proves to add measurable overhead, we can go that way, but let's try to fix copy_fd_bitmaps() first. * new helper: bitmap_copy_and_expand(to, from, bits_to_copy, size). * make copy_fd_bitmaps() take the bitmap size in words, rather than bits; it's 'count' argument is always a multiple of BITS_PER_LONG, so we are not losing any information, and that way we can use the same helper for all three bitmaps - compiler will see that count is a multiple of BITS_PER_LONG for the large ones, so it'll generate plain memcpy()+memset(). Reproducer added to tools/testing/selftests/core/close_range_test.c Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-09-04bitmap: introduce generic optimized bitmap_size()Alexander Lobakin
commit a37fbe666c016fd89e4460d0ebfcea05baba46dc upstream. The number of times yet another open coded `BITS_TO_LONGS(nbits) * sizeof(long)` can be spotted is huge. Some generic helper is long overdue. Add one, bitmap_size(), but with one detail. BITS_TO_LONGS() uses DIV_ROUND_UP(). The latter works well when both divident and divisor are compile-time constants or when the divisor is not a pow-of-2. When it is however, the compilers sometimes tend to generate suboptimal code (GCC 13): 48 83 c0 3f add $0x3f,%rax 48 c1 e8 06 shr $0x6,%rax 48 8d 14 c5 00 00 00 00 lea 0x0(,%rax,8),%rdx %BITS_PER_LONG is always a pow-2 (either 32 or 64), but GCC still does full division of `nbits + 63` by it and then multiplication by 8. Instead of BITS_TO_LONGS(), use ALIGN() and then divide by 8. GCC: 8d 50 3f lea 0x3f(%rax),%edx c1 ea 03 shr $0x3,%edx 81 e2 f8 ff ff 1f and $0x1ffffff8,%edx Now it shifts `nbits + 63` by 3 positions (IOW performs fast division by 8) and then masks bits[2:0]. bloat-o-meter: add/remove: 0/0 grow/shrink: 20/133 up/down: 156/-773 (-617) Clang does it better and generates the same code before/after starting from -O1, except that with the ALIGN() approach it uses %edx and thus still saves some bytes: add/remove: 0/0 grow/shrink: 9/133 up/down: 18/-538 (-520) Note that we can't expand DIV_ROUND_UP() by adding a check and using this approach there, as it's used in array declarations where expressions are not allowed. Add this helper to tools/ as well. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Acked-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-09-04vfs: Don't evict inode under the inode lru traversing contextZhihao Cheng
commit 2a0629834cd82f05d424bbc193374f9a43d1f87d upstream. The inode reclaiming process(See function prune_icache_sb) collects all reclaimable inodes and mark them with I_FREEING flag at first, at that time, other processes will be stuck if they try getting these inodes (See function find_inode_fast), then the reclaiming process destroy the inodes by function dispose_list(). Some filesystems(eg. ext4 with ea_inode feature, ubifs with xattr) may do inode lookup in the inode evicting callback function, if the inode lookup is operated under the inode lru traversing context, deadlock problems may happen. Case 1: In function ext4_evict_inode(), the ea inode lookup could happen if ea_inode feature is enabled, the lookup process will be stuck under the evicting context like this: 1. File A has inode i_reg and an ea inode i_ea 2. getfattr(A, xattr_buf) // i_ea is added into lru // lru->i_ea 3. Then, following three processes running like this: PA PB echo 2 > /proc/sys/vm/drop_caches shrink_slab prune_dcache_sb // i_reg is added into lru, lru->i_ea->i_reg prune_icache_sb list_lru_walk_one inode_lru_isolate i_ea->i_state |= I_FREEING // set inode state inode_lru_isolate __iget(i_reg) spin_unlock(&i_reg->i_lock) spin_unlock(lru_lock) rm file A i_reg->nlink = 0 iput(i_reg) // i_reg->nlink is 0, do evict ext4_evict_inode ext4_xattr_delete_inode ext4_xattr_inode_dec_ref_all ext4_xattr_inode_iget ext4_iget(i_ea->i_ino) iget_locked find_inode_fast __wait_on_freeing_inode(i_ea) ----→ AA deadlock dispose_list // cannot be executed by prune_icache_sb wake_up_bit(&i_ea->i_state) Case 2: In deleted inode writing function ubifs_jnl_write_inode(), file deleting process holds BASEHD's wbuf->io_mutex while getting the xattr inode, which could race with inode reclaiming process(The reclaiming process could try locking BASEHD's wbuf->io_mutex in inode evicting function), then an ABBA deadlock problem would happen as following: 1. File A has inode ia and a xattr(with inode ixa), regular file B has inode ib and a xattr. 2. getfattr(A, xattr_buf) // ixa is added into lru // lru->ixa 3. Then, following three processes running like this: PA PB PC echo 2 > /proc/sys/vm/drop_caches shrink_slab prune_dcache_sb // ib and ia are added into lru, lru->ixa->ib->ia prune_icache_sb list_lru_walk_one inode_lru_isolate ixa->i_state |= I_FREEING // set inode state inode_lru_isolate __iget(ib) spin_unlock(&ib->i_lock) spin_unlock(lru_lock) rm file B ib->nlink = 0 rm file A iput(ia) ubifs_evict_inode(ia) ubifs_jnl_delete_inode(ia) ubifs_jnl_write_inode(ia) make_reservation(BASEHD) // Lock wbuf->io_mutex ubifs_iget(ixa->i_ino) iget_locked find_inode_fast __wait_on_freeing_inode(ixa) | iput(ib) // ib->nlink is 0, do evict | ubifs_evict_inode | ubifs_jnl_delete_inode(ib) ↓ ubifs_jnl_write_inode ABBA deadlock ←-----make_reservation(BASEHD) dispose_list // cannot be executed by prune_icache_sb wake_up_bit(&ixa->i_state) Fix the possible deadlock by using new inode state flag I_LRU_ISOLATING to pin the inode in memory while inode_lru_isolate() reclaims its pages instead of using ordinary inode reference. This way inode deletion cannot be triggered from inode_lru_isolate() thus avoiding the deadlock. evict() is made to wait for I_LRU_ISOLATING to be cleared before proceeding with inode cleanup. Link: https://lore.kernel.org/all/37c29c42-7685-d1f0-067d-63582ffac405@huaweicloud.com/ Link: https://bugzilla.kernel.org/show_bug.cgi?id=219022 Fixes: e50e5129f384 ("ext4: xattr-in-inode support") Fixes: 7959cf3a7506 ("ubifs: journal: Handle xattrs like files") Cc: stable@vger.kernel.org Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Link: https://lore.kernel.org/r/20240809031628.1069873-1-chengzhihao@huaweicloud.com Reviewed-by: Jan Kara <jack@suse.cz> Suggested-by: Jan Kara <jack@suse.cz> Suggested-by: Mateusz Guzik <mjguzik@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-19netfilter: nf_tables: use timestamp to check for set element timeoutPablo Neira Ayuso
commit 7395dfacfff65e9938ac0889dafa1ab01e987d15 upstream Add a timestamp field at the beginning of the transaction, store it in the nftables per-netns area. Update set backend .insert, .deactivate and sync gc path to use the timestamp, this avoids that an element expires while control plane transaction is still unfinished. .lookup and .update, which are used from packet path, still use the current time to check if the element has expired. And .get path and dump also since this runs lockless under rcu read size lock. Then, there is async gc which also needs to check the current time since it runs asynchronously from a workqueue. [ NB: rbtree GC updates has been excluded because GC is asynchronous. ] Fixes: c3e1b005ed1c ("netfilter: nf_tables: add set element timeout support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-19Fix gcc 4.9 build issue in 5.4.yJari Ruusu
Some older systems still compile kernels with old gcc version. These warnings and errors show up when compiling with gcc 4.9.2 error: "__GCC4_has_attribute___uninitialized__" is not defined [-Werror=undef] Upstream won't need this because newer kernels are not compilable with gcc 4.9. Subject: gcc-4.9 warning/error fix for 5.10.223-rc1 Fixes: fd7eea27a3ae ("Compiler Attributes: Add __uninitialized macro") Signed-off-by: Jari Ruusu <jariruusu@protonmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>