summaryrefslogtreecommitdiff
path: root/drivers/infiniband/ulp
AgeCommit message (Collapse)Author
8 daysMerge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma updates from Jason Gunthorpe: "Usual smallish cycle. The NFS biovec work to push it down into RDMA instead of indirecting through a scatterlist is pretty nice to see, been talked about for a long time now. - Various code improvements in irdma, rtrs, qedr, ocrdma, irdma, rxe - Small driver improvements and minor bug fixes to hns, mlx5, rxe, mana, mlx5, irdma - Robusness improvements in completion processing for EFA - New query_port_speed() verb to move past limited IBA defined speed steps - Support for SG_GAPS in rts and many other small improvements - Rare list corruption fix in iwcm - Better support different page sizes in rxe - Device memory support for mana - Direct bio vec to kernel MR for use by NFS-RDMA - QP rate limiting for bnxt_re - Remote triggerable NULL pointer crash in siw - DMA-buf exporter support for RDMA mmaps like doorbells" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (66 commits) RDMA/mlx5: Implement DMABUF export ops RDMA/uverbs: Add DMABUF object type and operations RDMA/uverbs: Support external FD uobjects RDMA/siw: Fix potential NULL pointer dereference in header processing RDMA/umad: Reject negative data_len in ib_umad_write IB/core: Extend rate limit support for RC QPs RDMA/mlx5: Support rate limit only for Raw Packet QP RDMA/bnxt_re: Report QP rate limit in debugfs RDMA/bnxt_re: Report packet pacing capabilities when querying device RDMA/bnxt_re: Add support for QP rate limiting MAINTAINERS: Drop RDMA files from Hyper-V section RDMA/uverbs: Add __GFP_NOWARN to ib_uverbs_unmarshall_recv() kmalloc svcrdma: use bvec-based RDMA read/write API RDMA/core: add rdma_rw_max_sge() helper for SQ sizing RDMA/core: add MR support for bvec-based RDMA operations RDMA/core: use IOVA-based DMA mapping for bvec RDMA operations RDMA/core: add bio_vec based RDMA read/write API RDMA/irdma: Use kvzalloc for paged memory DMA address array RDMA/rxe: Fix race condition in QP timer handlers RDMA/mana_ib: Add device‑memory support ...
8 daysMerge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds
Pull SCSI updates from James Bottomley: "Usual driver updates (qla2xxx, mpi3mr, mpt3sas, ufs) plus assorted cleanups and fixes. The biggest core change is the massive code motion in the sd driver to remove forward declarations and the most significant change is to enumify the queuecommand return" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (78 commits) scsi: csiostor: Fix dereference of null pointer rn scsi: buslogic: Reduce stack usage scsi: ufs: host: mediatek: Require CONFIG_PM scsi: ufs: mediatek: Fix page faults in ufs_mtk_clk_scale() trace event scsi: smartpqi: Fix memory leak in pqi_report_phys_luns() scsi: mpi3mr: Make driver probing asynchronous scsi: ufs: core: Flush exception handling work when RPM level is zero scsi: efct: Use IRQF_ONESHOT and default primary handler scsi: ufs: core: Use a host-wide tagset in SDB mode scsi: qla2xxx: target: Add WQ_PERCPU to alloc_workqueue() users scsi: qla2xxx: Add WQ_PERCPU to alloc_workqueue() users scsi: qla4xxx: Add WQ_PERCPU to alloc_workqueue() users scsi: mpi3mr: Driver version update to 8.17.0.3.50 scsi: mpi3mr: Fixed the W=1 compilation warning scsi: mpi3mr: Record and report controller firmware faults scsi: mpi3mr: Update MPI Headers to revision 39 scsi: mpi3mr: Use negotiated link rate from DevicePage0 scsi: mpi3mr: Avoid redundant diag-fault resets scsi: mpi3mr: Rename log data save helper to reflect threaded/BH context scsi: mpi3mr: Add module parameter to control threaded IRQ polling ...
8 daysMerge tag 'mm-nonmm-stable-2026-02-12-10-48' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - "ocfs2: give ocfs2 the ability to reclaim suballocator free bg" saves disk space by teaching ocfs2 to reclaim suballocator block group space (Heming Zhao) - "Add ARRAY_END(), and use it to fix off-by-one bugs" adds the ARRAY_END() macro and uses it in various places (Alejandro Colomar) - "vmcoreinfo: support VMCOREINFO_BYTES larger than PAGE_SIZE" makes the vmcore code future-safe, if VMCOREINFO_BYTES ever exceeds the page size (Pnina Feder) - "kallsyms: Prevent invalid access when showing module buildid" cleans up kallsyms code related to module buildid and fixes an invalid access crash when printing backtraces (Petr Mladek) - "Address page fault in ima_restore_measurement_list()" fixes a kexec-related crash that can occur when booting the second-stage kernel on x86 (Harshit Mogalapalli) - "kho: ABI headers and Documentation updates" updates the kexec handover ABI documentation (Mike Rapoport) - "Align atomic storage" adds the __aligned attribute to atomic_t and atomic64_t definitions to get natural alignment of both types on csky, m68k, microblaze, nios2, openrisc and sh (Finn Thain) - "kho: clean up page initialization logic" simplifies the page initialization logic in kho_restore_page() (Pratyush Yadav) - "Unload linux/kernel.h" moves several things out of kernel.h and into more appropriate places (Yury Norov) - "don't abuse task_struct.group_leader" removes the usage of ->group_leader when it is "obviously unnecessary" (Oleg Nesterov) - "list private v2 & luo flb" adds some infrastructure improvements to the live update orchestrator (Pasha Tatashin) * tag 'mm-nonmm-stable-2026-02-12-10-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (107 commits) watchdog/hardlockup: simplify perf event probe and remove per-cpu dependency procfs: fix missing RCU protection when reading real_parent in do_task_stat() watchdog/softlockup: fix sample ring index wrap in need_counting_irqs() kcsan, compiler_types: avoid duplicate type issues in BPF Type Format kho: fix doc for kho_restore_pages() tests/liveupdate: add in-kernel liveupdate test liveupdate: luo_flb: introduce File-Lifecycle-Bound global state liveupdate: luo_file: Use private list list: add kunit test for private list primitives list: add primitives for private list manipulations delayacct: fix uapi timespec64 definition panic: add panic_force_cpu= parameter to redirect panic to a specific CPU netclassid: use thread_group_leader(p) in update_classid_task() RDMA/umem: don't abuse current->group_leader drm/pan*: don't abuse current->group_leader drm/amd: kill the outdated "Only the pthreads threading model is supported" checks drm/amdgpu: don't abuse current->group_leader android/binder: use same_thread_group(proc->tsk, current) in binder_mmap() android/binder: don't abuse current->group_leader kho: skip memoryless NUMA nodes when reserving scratch areas ...
2026-01-23scsi: Change the return type of the .queuecommand() callbackBart Van Assche
In clang version 21.1 and later the -Wimplicit-enum-enum-cast warning option has been introduced. This warning is enabled by default and can be used to catch .queuecommand() implementations that return another value than 0 or one of the SCSI_MLQUEUE_* constants. Hence this patch that changes the return type of the .queuecommand() implementations from 'int' into 'enum scsi_qc_status'. No functionality has been changed. Cc: Damien Le Moal <dlemoal@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://patch.msgid.link/20260115210357.2501991-6-bvanassche@acm.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2026-01-20kernel.h: drop hex.h and update all hex.h usersRandy Dunlap
Remove <linux/hex.h> from <linux/kernel.h> and update all users/callers of hex.h interfaces to directly #include <linux/hex.h> as part of the process of putting kernel.h on a diet. Removing hex.h from kernel.h means that 36K C source files don't have to pay the price of parsing hex.h for the roughly 120 C source files that need it. This change has been build-tested with allmodconfig on most ARCHes. Also, all users/callers of <linux/hex.h> in the entire source tree have been updated if needed (if not already #included). Link: https://lkml.kernel.org/r/20251215005206.2362276-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Yury Norov (NVIDIA) <yury.norov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-20net: remove legacy way to get/set HW timestamp configVadim Fedorenko
With all drivers converted to use ndo_hwstamp callbacks the legacy way can be removed, marking ioctl interface as deprecated. Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Kory Maincent <kory.maincent@bootlin.com> Link: https://patch.msgid.link/20260116062121.1230184-1-vadim.fedorenko@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-13RDMA/rtrs-srv: Fix error print in process_info_req()Grzegorz Prajsner
rtrs_srv_change_state() returns bool (true on success) therefore there is no reason to print error when it fails as it always will be 0. Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Link: https://patch.msgid.link/20260107161517.56357-11-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-01-13RDMA/rtrs-clt: For conn rejection use actual err numberMd Haris Iqbal
When the connection establishment request is rejected from the server side, then the actual error number sent back should be used. Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Link: https://patch.msgid.link/20260107161517.56357-10-haris.iqbal@ionos.com Reviewed-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Reviewed-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-01-13RDMA/rtrs: Extend log message when a port failsKim Zhu
Add HCA name and port of this HCA. This would help with analysing and debugging the logs. The logs would looks something like this, rtrs_server L2516: Handling event: port error (10). HCA name: mlx4_0, port num: 2 rtrs_client L3326: Handling event: port error (10). HCA name: mlx4_0, port num: 1 Signed-off-by: Kim Zhu <zhu.yanjun@ionos.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://patch.msgid.link/20260107161517.56357-9-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-01-13RDMA/rtrs-srv: Rate-limit I/O path error loggingKim Zhu
Excessive error logging is making it difficult to identify the root cause of issues. Implement rate limiting to improve log clarity. Signed-off-by: Kim Zhu <zhu.yanjun@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://patch.msgid.link/20260107161517.56357-8-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-01-13RDMA/rtrs-srv: Add check and closure for possible zombie pathsMd Haris Iqbal
During several network incidents, a number of RTRS paths for a session went through disconnect and reconnect phase. However, some of those did not auto-reconnect successfully. Instead they failed with the following logs, On client, kernel: rtrs_client L1991: <sess-name>: Connect rejected: status 28 (consumer defined), rtrs errno -104 kernel: rtrs_client L2698: <sess-name>: init_conns() failed: err=-104 path=gid:<gid1>@gid:<gid2> [mlx4_0:1] On server, (log a) kernel: ibtrs_server L1868: <>: Connection already exists: 0 When the misbehaving path was removed, and add_path was called to re-add the path, the log on client side changed to, (log b) kernel: rtrs_client L1991: <sess-name>: Connect rejected: status 28 (consumer defined), rtrs errno -17 There was no log on the server side for this, which is expected since there is no logging in that path, if (unlikely(__is_path_w_addr_exists(srv, &cm_id->route.addr))) { err = -EEXIST; goto err; Because of the following check on server side, if (unlikely(sess->state != IBTRS_SRV_CONNECTING)) { ibtrs_err(s, "Session in wrong state: %s\n", .. we know that the path in (log a) was in CONNECTING state. The above state of the path persists for as long as we leave the session be. This means that the path is in some zombie state, probably waiting for the info_req packet to arrive, which never does. The changes in this commits does 2 things. 1) Add logs at places where we see the errors happening. The logs would shed more light at the state and lifetime of such zombie paths. 2) Close such zombie sessions, only if they are in CONNECTING state, and after an inactivity period of 30 seconds. i) The state check prevents closure of paths which are CONNECTED. Also, from the above logs and code, we already know that the path could only be on CONNECTING state, so we play safe and narrow our impact surface area by closing only CONNECTING paths. ii) The inactivity period is to allow requests for other cid to finish processing, or for any stray packets to arrive/fail. Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://patch.msgid.link/20260107161517.56357-7-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-01-13RDMA/rtrs-clt: Remove unused members in rtrs_clt_io_reqJack Wang
Remove unused members from rtrs_clt_io_req. Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://patch.msgid.link/20260107161517.56357-6-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-01-13RDMA/rtrs: Improve error logging for RDMA cm eventsKim Zhu
The member variable status in the struct rdma_cm_event is used for both linux errors and the errors definded in rdma stack. Signed-off-by: Kim Zhu <zhu.yanjun@ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://patch.msgid.link/20260107161517.56357-5-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-01-13RDMA/rtrs: Add optional support for IB_MR_TYPE_SG_GAPSMd Haris Iqbal
Support IB_MR_TYPE_SG_GAPS, which has less limitations than standard IB_MR_TYPE_MEM_REG, a few ULP support this. Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Kim Zhu <zhu.yanjun@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://patch.msgid.link/20260107161517.56357-4-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-01-13RDMA/rtrs: Add error description to the logsKim Zhu
Print error description instead of the error number. Signed-off-by: Kim Zhu <zhu.yanjun@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://patch.msgid.link/20260107161517.56357-3-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-01-13RDMA/rtrs-srv: fix SG mappingRoman Penyaev
This fixes the following error on the server side: RTRS server session allocation failed: -EINVAL caused by the caller of the `ib_dma_map_sg()`, which does not expect less mapped entries, than requested, which is in the order of things and can be easily reproduced on the machine with enabled IOMMU. The fix is to treat any positive number of mapped sg entries as a successful mapping and cache DMA addresses by traversing modified SG table. Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Signed-off-by: Roman Penyaev <r.peniaev@gmail.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://patch.msgid.link/20260107161517.56357-2-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-12-30RDMA/rtrs: Fix clt_path::max_pages_per_mr calculationHonggang LI
If device max_mr_size bits in the range [mr_page_shift+31:mr_page_shift] are zero, the `min3` function will set clt_path::max_pages_per_mr to zero. `alloc_path_reqs` will pass zero, which is invalid, as the third parameter to `ib_alloc_mr`. Fixes: 6a98d71daea1 ("RDMA/rtrs: client: main functionality") Signed-off-by: Honggang LI <honggangli@163.com> Link: https://patch.msgid.link/20251229025617.13241-1-honggangli@163.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-12-24RDMA/rtrs: server: remove dead codeHonggang LI
As rkey had been initialized to zero, the WARN_ON_ONCE should never been triggered. Remove it. Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Signed-off-by: Honggang LI <honggangli@163.com> Link: https://patch.msgid.link/20251224023819.138846-1-honggangli@163.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-12-17RTRS/rtrs: clean up rtrs headers kernel-docRandy Dunlap
Fix all (30+) kernel-doc warnings in rtrs.h and rtrs-pri.h. The changes are: - add ending ':' to enum member names - change enum description separators from '-' to ':' - add "struct" keyword to kernel-doc for structs where missing - fix enum names in enum rtrs_clt_con_type - add a '-' separator and drop the "()" in enum rtrs_clt_con_type - convert struct rtrs_addr to kernel-doc format - add missing struct member descriptions for struct rtrs_attrs Link: https://patch.msgid.link/r/20251129022146.1498273-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-12-04Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma updates from Jason Gunthorpe: "This has another new RDMA driver 'bng_en' for latest generation Broadcom NICs. There might be one more new driver still to come. Otherwise it is a fairly quite cycle. Summary: - Minor driver bug fixes and updates to cxgb4, rxe, rdmavt, bnxt_re, mlx5 - Many bug fix patches for irdma - WQ_PERCPU annotations and system_dfl_wq changes - Improved mlx5 support for "other eswitches" and multiple PFs - 1600Gbps link speed reporting support. Four Digits Now! - New driver bng_en for latest generation Broadcom NICs - Bonding support for hns - Adjust mlx5's hmm based ODP to work with the very large address space created by the new 5 level paging default on x86 - Lockdep fixups in rxe and siw" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (65 commits) RDMA/rxe: reclassify sockets in order to avoid false positives from lockdep RDMA/siw: reclassify sockets in order to avoid false positives from lockdep RDMA/bng_re: Remove prefetch instruction RDMA/core: Reduce cond_resched() frequency in __ib_umem_release RDMA/irdma: Fix SRQ shadow area address initialization RDMA/irdma: Remove doorbell elision logic RDMA/irdma: Do not set IBK_LOCAL_DMA_LKEY for GEN3+ RDMA/irdma: Do not directly rely on IB_PD_UNSAFE_GLOBAL_RKEY RDMA/irdma: Add missing mutex destroy RDMA/irdma: Fix SIGBUS in AEQ destroy RDMA/irdma: Add a missing kfree of struct irdma_pci_f for GEN2 RDMA/irdma: Fix data race in irdma_free_pble RDMA/irdma: Fix data race in irdma_sc_ccq_arm RDMA/mlx5: Add support for 1600_8x lane speed RDMA/core: Add new IB rate for XDR (8x) support IB/mlx5: Reduce IMR KSM size when 5-level paging is enabled RDMA/bnxt_re: Pass correct flag for dma mr creation RDMA/bnxt_re: Fix the inline size for GenP7 devices RDMA/hns: Support reset recovery for bond RDMA/hns: Support link state reporting for bond ...
2025-11-10RDMA/rtrs: server: Fix error handling in get_or_create_srvMa Ke
After device_initialize() is called, use put_device() to release the device according to kernel device management rules. While direct kfree() work in this case, using put_device() is more correct. Found by code review. Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Signed-off-by: Ma Ke <make24@iscas.ac.cn> Link: https://patch.msgid.link/20251110005158.13394-1-make24@iscas.ac.cn Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-11-09IB/isert: add WQ_PERCPU to alloc_workqueue usersMarco Crivellari
Currently if a user enqueues a work item using schedule_delayed_work() the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to schedule_work() that is using system_wq and queue_work(), that makes use again of WORK_CPU_UNBOUND. This lack of consistency cannot be addressed without refactoring the API. alloc_workqueue() treats all queues as per-CPU by default, while unbound workqueues must opt-in via WQ_UNBOUND. This default is suboptimal: most workloads benefit from unbound queues, allowing the scheduler to place worker threads where they’re needed and reducing noise when CPUs are isolated. This continues the effort to refactor workqueue APIs, which began with the introduction of new workqueues and a new alloc_workqueue flag in: commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq") commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag") This change adds a new WQ_PERCPU flag to explicitly request alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified. With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND), any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND must now use WQ_PERCPU. Once migration is complete, WQ_UNBOUND can be removed and unbound will become the implicit default. Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> Link: https://patch.msgid.link/20251107133626.190952-1-marco.crivellari@suse.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-11-09IB/iser: add WQ_PERCPU to alloc_workqueue usersMarco Crivellari
Currently if a user enqueues a work item using schedule_delayed_work() the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to schedule_work() that is using system_wq and queue_work(), that makes use again of WORK_CPU_UNBOUND. This lack of consistency cannot be addressed without refactoring the API. alloc_workqueue() treats all queues as per-CPU by default, while unbound workqueues must opt-in via WQ_UNBOUND. This default is suboptimal: most workloads benefit from unbound queues, allowing the scheduler to place worker threads where they’re needed and reducing noise when CPUs are isolated. This continues the effort to refactor workqueue APIs, which began with the introduction of new workqueues and a new alloc_workqueue flag in: commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq") commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag") This change adds a new WQ_PERCPU flag to explicitly request alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified. With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND), any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND must now use WQ_PERCPU. Once migration is complete, WQ_UNBOUND can be removed and unbound will become the implicit default. Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> Link: https://patch.msgid.link/20251107133306.187939-1-marco.crivellari@suse.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-10-31IB/IPoIB: Add support for hwtstamp get/set ndosCarolina Jubran
Add support for the ndo_hwtstamp_get and ndo_hwtstamp_set operations in IPoIB. This allows lower devices to handle hardware timestamp configuration through the new ndos instead of the legacy ioctls. Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1761819910-1011051-6-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-21RDMA: Use %pe format specifier for error pointersLeon Romanovsky
Convert error logging throughout the RDMA subsystem to use the %pe format specifier instead of PTR_ERR() with integer format specifiers. Link: https://patch.msgid.link/e81ec02df1e474be20417fb62e779776e3f47a50.1758217936.git.leon@kernel.org Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2025-09-18IB/ipoib: Ignore L3 master deviceVlad Dumitrescu
Currently, all master upper netdevices (e.g., bond, VRF) are treated equally. When a VRF netdevice is used over an IPoIB netdevice, the expected netdev resolution is on the lower IPoIB device which has the IP address assigned to it and not the VRF device. The rdma_cm module (CMA) tries to match incoming requests to a particular netdevice. When successful, it also validates that the return path points to the same device by performing a routing table lookup. Currently, the former would resolve to the VRF netdevice, while the latter to the correct lower IPoIB netdevice, leading to failure in rdma_cm. Improve this by ignoring the VRF master netdevice, if it exists, and instead return the lower IPoIB device. Signed-off-by: Vlad Dumitrescu <vdumitrescu@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Link: https://patch.msgid.link/20250916111103.84069-5-edwards@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-07-31Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma updates from Jason Gunthorpe: - Various minor code cleanups and fixes for hns, iser, cxgb4, hfi1, rxe, erdma, mana_ib - Prefetch supprot for rxe ODP - Remove memory window support from hns as new device FW is no longer support it - Remove qib, it is very old and obsolete now, Cornelis wishes to restructure the hfi1/qib shared layer - Fix a race in destroying CQs where we can still end up with work running because the work is cancled before the driver stops triggering it - Improve interaction with namespaces: * Follow the devlink namespace for newly spawned RDMA devices * Create iopoib net devces in the parent IB device's namespace * Allow CAP_NET_RAW checks to pass in user namespaces - A new flow control scheme for IB MADs to try and avoid queue overflows in the network - Fix 2G message sizes in bnxt_re - Optimize mkey layout for mlx5 DMABUF - New "DMA Handle" concept to allow controlling PCI TPH and steering tags * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (71 commits) RDMA/siw: Change maintainer email address RDMA/mana_ib: add support of multiple ports RDMA/mlx5: Refactor optional counters steering code RDMA/mlx5: Add DMAH support for reg_user_mr/reg_user_dmabuf_mr IB: Extend UVERBS_METHOD_REG_MR to get DMAH RDMA/mlx5: Add DMAH object support RDMA/core: Introduce a DMAH object and its alloc/free APIs IB/core: Add UVERBS_METHOD_REG_MR on the MR object net/mlx5: Add support for device steering tag net/mlx5: Expose IFC bits for TPH PCI/TPH: Expose pcie_tph_get_st_table_size() RDMA/mlx5: Fix incorrect MKEY masking RDMA/mlx5: Fix returned type from _mlx5r_umr_zap_mkey() RDMA/mlx5: remove redundant check on err on return expression RDMA/mana_ib: add additional port counters RDMA/mana_ib: Fix DSCP value in modify QP RDMA/efa: Add CQ with external memory support RDMA/core: Add umem "is_contiguous" and "start_dma_addr" helpers RDMA/uverbs: Add a common way to create CQ with umem RDMA/mlx5: Optimize DMABUF mkey page size ...
2025-06-26RDMA/ipoib: Use parent rdma device net namespaceMark Bloch
Use the net namespace of the underlying rdma device. After honoring the rdma device's namespace, the ipoib netdev now also runs in the same net namespace of the rdma device. Add an API to read the net namespace of the rdma device so that ULP such as IPoIB can use it to initialize its netdev. Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2025-06-24scsi: RDMA/srp: Don't set a max_segment_size when virt_boundary_mask is setChristoph Hellwig
virt_boundary_mask implies an unlimited max_segment_size. Setting both can lead to data corruption because __blk_rq_map_sg() can split requests so that the virt_boundary_mask is not respected if max_segment_size is not UINT_MAX. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250624125233.219635-2-hch@lst.de Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: John Garry <john.g.garry@oracle.com> Acked-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-06-12IB/iser: Remove unnecessary local variableLi Jun
The 'error' variable is no longer needed,as iscsi_iser_mtask_xmit can return the result of iser_send_control(conn,task) directly. Signed-off-by: Li Jun <lijun01@kylinos.cn> Link: https://patch.msgid.link/20250604102049.130039-1-lijun01@kylinos.cn Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Acked-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-05-22IB/IPoIB: Allow using netdevs that require the instance lockCosmin Ratiu
After the last patch removing vlan_rwsem, it is an incremental step to allow ipoib to work with netdevs that require the instance lock. In several places, netdev_lock() is changed to netdev_lock_ops_to_full() which takes care of not acquiring the lock again when the netdev is already locked. In ipoib_ib_tx_timeout_work() and __ipoib_ib_dev_flush() for HEAVY flushes, the netdev lock is acquired/released. This is needed because these functions end up calling .ndo_stop()/.ndo_open() on subinterfaces, and the device may expect the netdev instance lock to be held. ipoib_set_mode() now explicitly acquires ops lock while manipulating the features, mtu and tx queues. Finally, ipoib_napi_enable()/ipoib_napi_disable() now use the *_locked variants of the napi_enable()/napi_disable() calls and optionally acquire the netdev lock themselves depending on the dev they operate on. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1747829342-1018757-4-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-22IB/IPoIB: Replace vlan_rwsem with the netdev instance lockCosmin Ratiu
vlan_rwsem was added more than a decade ago to work around a deadlock involving the original mutex being acquired twice, once from the wq. Subsequent changes then tweaked it to partially protect access to ipoib_dev_priv->child_intfs together with the RTNL. Flushing the wq synchronously was also since then refactored to happen separately. This semaphore unfortunately prevents updating ipoib to work with devices that require the netdev lock, because of lock ordering issues between RTNL, vlan_rwsem and the netdev instance locks of parent and child devices. To uncomplicate things, this commit replaces vlan_rwsem with the netdev instance lock of the parent device. Both parent child_intfs list and the children's list membership in it require holding the parent netdev instance lock. All call paths were carefully reviewed and no-longer-needed ASSERT_RTNL calls were dropped. Some non-trivial changes: - ipoib_match_gid_pkey_addr() now only acquires the instance lock and iterates through child_intfs for the first level of recursion (the parent), as it's not possible to have multiple levels of nested subinterfaces. - ipoib_open() and ipoib_stop() schedule tasks on the global workqueue to open/stop child interfaces to avoid potentially acquiring nested netdev instance locks. To avoid the device going away between the task scheduling and execution, netdev_hold/netdev_put are used. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1747829342-1018757-3-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-22IB/IPoIB: Enqueue separate work_structs for each flushed interfaceCosmin Ratiu
Previously, flushing a netdevice involved first flushing all child devices from the flush task itself. That requires holding the lock that protects the list for the entire duration of the flush. This poses a problem when converting from vlan_rwsem to the netdev instance lock (next patch), because holding the parent lock while trying to acquire a child lock makes lockdep unhappy, rightfully. Fix this by splitting a big flush task into individual flush tasks (all are already created in their respective ipoib_dev_priv structs) and defining a helper function to enqueue all of them while holding the list lock. In ipoib_set_mac, the function is not used and the task is enqueued directly, because in the subsequent patches locking is changed and this function may be called with the netdev instance lock held. This is effectively a noop, the wq is single-threaded and ordered and will execute the same flush operations in the same order as before. Furthermore, there should be no new races because ipoib_parent_unregister_pre() calls flush_workqueue() after stopping new work generation to wait for pending work to complete. flush_workqueue() waits for all currently enqueued work to finish before returning. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1747829342-1018757-2-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-29Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma updates from Jason Gunthorpe: - Usual minor updates and fixes for bnxt_re, hfi1, rxe, mana, iser, mlx5, vmw_pvrdma, hns - Make rxe work on tun devices - mana gains more standard verbs as it moves toward supporting in-kernel verbs - DMABUF support for mana - Fix page size calculations when memory registration exceeds 4G - On Demand Paging support for rxe - mlx5 support for RDMA TRANSPORT flow tables and a new ucap mechanism to access control use of them - Optional RDMA_TX/RX counters per QP in mlx5 * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (73 commits) IB/mad: Check available slots before posting receive WRs RDMA/mana_ib: Fix integer overflow during queue creation RDMA/mlx5: Fix calculation of total invalidated pages RDMA/mlx5: Fix mlx5_poll_one() cur_qp update flow RDMA/mlx5: Fix page_size variable overflow RDMA/mlx5: Drop access_flags from _mlx5_mr_cache_alloc() RDMA/mlx5: Fix cache entry update on dereg error RDMA/mlx5: Fix MR cache initialization error flow RDMA/mlx5: Support optional-counters binding for QPs RDMA/mlx5: Compile fs.c regardless of INFINIBAND_USER_ACCESS config RDMA/core: Pass port to counter bind/unbind operations RDMA/core: Add support to optional-counters binding configuration RDMA/core: Create and destroy rdma_counter using rdma_zalloc_drv_obj() RDMA/mlx5: Add optional counters for RDMA_TX/RX_packets/bytes RDMA/core: Fix use-after-free when rename device name RDMA/bnxt_re: Support perf management counters RDMA/rxe: Fix incorrect return value of rxe_odp_atomic_op() RDMA/uverbs: Propagate errors from rdma_lookup_get_uobject() RDMA/mana_ib: Handle net event for pointing to the current netdev net: mana: Change the function signature of mana_get_primary_netdev_rcu ...
2025-02-21net: Use link/peer netns in newlink() of rtnl_link_opsXiao Liang
Add two helper functions - rtnl_newlink_link_net() and rtnl_newlink_peer_net() for netns fallback logic. Peer netns falls back to link netns, and link netns falls back to source netns. Convert the use of params->net in netdevice drivers to one of the helper functions for clarity. Signed-off-by: Xiao Liang <shaw.leon@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250219125039.18024-4-shaw.leon@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21rtnetlink: Pack newlink() params into structXiao Liang
There are 4 net namespaces involved when creating links: - source netns - where the netlink socket resides, - target netns - where to put the device being created, - link netns - netns associated with the device (backend), - peer netns - netns of peer device. Currently, two nets are passed to newlink() callback - "src_net" parameter and "dev_net" (implicitly in net_device). They are set as follows, depending on netlink attributes in the request. +------------+-------------------+---------+---------+ | peer netns | IFLA_LINK_NETNSID | src_net | dev_net | +------------+-------------------+---------+---------+ | | absent | source | target | | absent +-------------------+---------+---------+ | | present | link | link | +------------+-------------------+---------+---------+ | | absent | peer | target | | present +-------------------+---------+---------+ | | present | peer | link | +------------+-------------------+---------+---------+ When IFLA_LINK_NETNSID is present, the device is created in link netns first and then moved to target netns. This has some side effects, including extra ifindex allocation, ifname validation and link events. These could be avoided if we create it in target netns from the beginning. On the other hand, the meaning of src_net parameter is ambiguous. It varies depending on how parameters are passed. It is the effective link (or peer netns) by design, but some drivers ignore it and use dev_net instead. To provide more netns context for drivers, this patch packs existing newlink() parameters, along with the source netns, link netns and peer netns, into a struct. The old "src_net" is renamed to "net" to avoid confusion with real source netns, and will be deprecated later. The use of src_net are converted to params->net trivially. Signed-off-by: Xiao Liang <shaw.leon@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250219125039.18024-3-shaw.leon@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18IB/iser: fix typos in iscsi_iser.c commentsImanol
Fixes multiple occurrences of the misspelled word "occured" in the comments of `iscsi_iser.c`, replacing them with the correct spelling "occurred". This improves readability without affecting functionality. Signed-off-by: Imanol <imvalient@protonmail.com> Link: https://patch.msgid.link/20250217183048.9394-1-imvalient@protonmail.com Acked-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-01-26Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds
Pull SCSI updates from James Bottomley: "Updates to the usual drivers (ufs, lpfc, fnic, qla2xx, mpi3mr). The major core change is the renaming of the slave_ methods plus a bit of constification. The rest are minor updates and fixes" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (103 commits) scsi: fnic: Propagate SCSI error code from fnic_scsi_drv_init() scsi: fnic: Test for memory allocation failure and return error code scsi: fnic: Return appropriate error code from failure of scsi drv init scsi: fnic: Return appropriate error code for mem alloc failure scsi: fnic: Remove always-true IS_FNIC_FCP_INITIATOR macro scsi: fnic: Fix use of uninitialized value in debug message scsi: fnic: Delete incorrect debugfs error handling scsi: fnic: Remove unnecessary else to fix warning in FDLS FIP scsi: fnic: Remove extern definition from .c files scsi: fnic: Remove unnecessary else and unnecessary break in FDLS scsi: mpi3mr: Fix possible crash when setting up bsg fails scsi: ufs: bsg: Set bsg_queue to NULL after removal scsi: ufs: bsg: Delete bsg_dev when setting up bsg fails scsi: st: Don't set pos_unknown just after device recognition scsi: aic7xxx: Fix build 'aicasm' warning scsi: Revert "scsi: ufs: core: Probe for EXT_IID support" scsi: storvsc: Ratelimit warning logs to prevent VM denial of service scsi: scsi_debug: Constify sdebug_driver_template scsi: documentation: Corrections for struct updates scsi: driver-api: documentation: Change what is added to docbook ...
2025-01-24Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma updates from Jason Gunthorpe: "Lighter that normal, but the now usual collection of driver fixes and small improvements: - Small fixes and minor improvements to cxgb4, bnxt_re, rxe, srp, efa, cxgb4 - Update mlx4 to use the new umem APIs, avoiding direct use of scatterlist - Support ROCEv2 in erdma - Remove various uncalled functions, constify bin_attribute - Provide core infrastructure to catch netdev events and route them to drivers, consolidating duplicated driver code - Fix rare race condition crashes in mlx5 ODP flows" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (63 commits) RDMA/mlx5: Fix implicit ODP use after free RDMA/mlx5: Fix a race for an ODP MR which leads to CQE with error RDMA/qib: Constify 'struct bin_attribute' RDMA/hfi1: Constify 'struct bin_attribute' RDMA/rxe: Fix the warning "__rxe_cleanup+0x12c/0x170 [rdma_rxe]" RDMA/cxgb4: Notify rdma stack for IB_EVENT_QP_LAST_WQE_REACHED event RDMA/bnxt_re: Allocate dev_attr information dynamically RDMA/bnxt_re: Pass the context for ulp_irq_stop RDMA/bnxt_re: Add support to handle DCB_CONFIG_CHANGE event RDMA/bnxt_re: Query firmware defaults of CC params during probe RDMA/bnxt_re: Add Async event handling support bnxt_en: Add ULP call to notify async events RDMA/mlx5: Fix indirect mkey ODP page count MAINTAINERS: Update the bnxt_re maintainers RDMA/hns: Clean up the legacy CONFIG_INFINIBAND_HNS RDMA/rtrs: Add missing deinit() call RDMA/efa: Align interrupt related fields to same type RDMA/bnxt_re: Fix to drop reference to the mmap entry in case of error RDMA/mlx5: Fix link status down event for MPV RDMA/erdma: Support create_ah/destroy_ah in non-sleepable contexts ...
2025-01-06RDMA/rtrs: Add missing deinit() callLi Zhijian
A warning is triggered when repeatedly connecting and disconnecting the rnbd: list_add corruption. prev->next should be next (ffff88800b13e480), but was ffff88801ecd1338. (prev=ffff88801ecd1340). WARNING: CPU: 1 PID: 36562 at lib/list_debug.c:32 __list_add_valid_or_report+0x7f/0xa0 Workqueue: ib_cm cm_work_handler [ib_cm] RIP: 0010:__list_add_valid_or_report+0x7f/0xa0 ? __list_add_valid_or_report+0x7f/0xa0 ib_register_event_handler+0x65/0x93 [ib_core] rtrs_srv_ib_dev_init+0x29/0x30 [rtrs_server] rtrs_ib_dev_find_or_add+0x124/0x1d0 [rtrs_core] __alloc_path+0x46c/0x680 [rtrs_server] ? rtrs_rdma_connect+0xa6/0x2d0 [rtrs_server] ? rcu_is_watching+0xd/0x40 ? __mutex_lock+0x312/0xcf0 ? get_or_create_srv+0xad/0x310 [rtrs_server] ? rtrs_rdma_connect+0xa6/0x2d0 [rtrs_server] rtrs_rdma_connect+0x23c/0x2d0 [rtrs_server] ? __lock_release+0x1b1/0x2d0 cma_cm_event_handler+0x4a/0x1a0 [rdma_cm] cma_ib_req_handler+0x3a0/0x7e0 [rdma_cm] cm_process_work+0x28/0x1a0 [ib_cm] ? _raw_spin_unlock_irq+0x2f/0x50 cm_req_handler+0x618/0xa60 [ib_cm] cm_work_handler+0x71/0x520 [ib_cm] Commit 667db86bcbe8 ("RDMA/rtrs: Register ib event handler") introduced a new element .deinit but never used it at all. Fix it by invoking the `deinit()` to appropriately unregister the IB event handler. Cc: Jinpu Wang <jinpu.wang@ionos.com> Fixes: 667db86bcbe8 ("RDMA/rtrs: Register ib event handler") Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Link: https://patch.msgid.link/20250106004516.16611-1-lizhijian@fujitsu.com Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-12-31RDMA/rtrs: Ensure 'ib_sge list' is accessibleLi Zhijian
Move the declaration of the 'ib_sge list' variable outside the 'always_invalidate' block to ensure it remains accessible for use throughout the function. Previously, 'ib_sge list' was declared within the 'always_invalidate' block, limiting its accessibility, then caused a 'BUG: kernel NULL pointer dereference'[1]. ? __die_body.cold+0x19/0x27 ? page_fault_oops+0x15a/0x2d0 ? search_module_extables+0x19/0x60 ? search_bpf_extables+0x5f/0x80 ? exc_page_fault+0x7e/0x180 ? asm_exc_page_fault+0x26/0x30 ? memcpy_orig+0xd5/0x140 rxe_mr_copy+0x1c3/0x200 [rdma_rxe] ? rxe_pool_get_index+0x4b/0x80 [rdma_rxe] copy_data+0xa5/0x230 [rdma_rxe] rxe_requester+0xd9b/0xf70 [rdma_rxe] ? finish_task_switch.isra.0+0x99/0x2e0 rxe_sender+0x13/0x40 [rdma_rxe] do_task+0x68/0x1e0 [rdma_rxe] process_one_work+0x177/0x330 worker_thread+0x252/0x390 ? __pfx_worker_thread+0x10/0x10 This change ensures the variable is available for subsequent operations that require it. [1] https://lore.kernel.org/linux-rdma/6a1f3e8f-deb0-49f9-bc69-a9b03ecfcda7@fujitsu.com/ Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Link: https://patch.msgid.link/20241231013416.1290920-1-lizhijian@fujitsu.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-12-19RDMA/srp: Fix error handling in srp_add_portMa Ke
As comment of device_add() says, if device_add() succeeds, you should call device_del() when you want to get rid of it. If device_add() has not succeeded, use only put_device() to drop the reference count. Add a put_device() call before returning from the function to decrement reference count for cleanup. Found by code review. Fixes: c8e4c2397655 ("RDMA/srp: Rework the srp_add_port() error path") Signed-off-by: Ma Ke <make_ruc2021@163.com> Link: https://patch.msgid.link/20241217075538.2909996-1-make_ruc2021@163.com Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-12-04scsi: Convert SCSI drivers to .sdev_configure()Bart Van Assche
The only difference between the .sdev_configure() and .slave_configure() methods is that the former accepts an additional 'limits' argument. Convert all SCSI drivers that define a .slave_configure() method to .sdev_configure(). This patch prepares for removing the .slave_configure() method. No functionality has been changed. Acked-by: Geoff Levand <geoff@infradead.org> # for ps3rom Acked-by: Khalid Aziz <khalid@gonehiking.org> # for the BusLogic driver Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20241022180839.2712439-4-bvanassche@acm.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-11-22Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma updates from Jason Gunthorpe: "Seveal fixes scattered across the drivers and a few new features: - Minor updates and bug fixes to hfi1, efa, iopob, bnxt, hns - Force disassociate the userspace FD when hns does an async reset - bnxt new features for optimized modify QP to skip certain stayes, CQ coalescing, better debug dumping - mlx5 new data placement ordering feature - Faster destruction of mlx5 devx HW objects - Improvements to RDMA CM mad handling" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (51 commits) RDMA/bnxt_re: Correct the sequence of device suspend RDMA/bnxt_re: Use the default mode of congestion control RDMA/bnxt_re: Support different traffic class IB/cm: Rework sending DREQ when destroying a cm_id IB/cm: Do not hold reference on cm_id unless needed IB/cm: Explicitly mark if a response MAD is a retransmission RDMA/mlx5: Move events notifier registration to be after device registration RDMA/bnxt_re: Cache MSIx info to a local structure RDMA/bnxt_re: Refurbish CQ to NQ hash calculation RDMA/bnxt_re: Refactor NQ allocation RDMA/bnxt_re: Fail probe early when not enough MSI-x vectors are reserved RDMA/hns: Fix different dgids mapping to the same dip_idx RDMA/bnxt_re: Add set_func_resources support for P5/P7 adapters RDMA/bnxt_re: Enhance RoCE SRIOV resource configuration design bnxt_en: Add support for RoCE sriov configuration RDMA/hns: Fix NULL pointer derefernce in hns_roce_map_mr_sg() RDMA/hns: Fix out-of-order issue of requester when setting FENCE RDMA/nldev: Add IB device and net device rename events RDMA/mlx5: Add implementation for ufile_hw_cleanup device operation RDMA/core: Move ib_uverbs_file struct to uverbs_types.h ...
2024-10-30RDMA: Use ethtool string helpersRosen Penev
Avoids having to manually increment the pointer. Signed-off-by: Rosen Penev <rosenp@gmail.com> Link: https://patch.msgid.link/20241021011543.5922-1-rosenp@gmail.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-10-28RDMA/ipoib: Use the networking stack default for txqueuelenGal Pressman
There is no need for a special txqueuelen value for IPoIB. This value represents the qdisc size which is not related to the SQ size, and the default value provided by the stack (DEFAULT_TX_QUEUE_LEN) is sufficient for typical use cases. Signed-off-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/cc97764b5a8def4ea879b371549a5867fe75c756.1728555243.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-10-11RDMA/srpt: Make slab cache names uniqueBart Van Assche
Since commit 4c39529663b9 ("slab: Warn on duplicate cache names when DEBUG_VM=y"), slab complains about duplicate cache names. Hence this patch. The approach is as follows: - Maintain an xarray with the slab size as index and a reference count and a kmem_cache pointer as contents. Use srpt-${slab_size} as kmem cache name. - Use 512-byte alignment for all slabs instead of only for some of the slabs. - Increment the reference count instead of calling kmem_cache_create(). - Decrement the reference count instead of calling kmem_cache_destroy(). Fixes: 5dabcd0456d7 ("RDMA/srpt: Add support for immediate data") Link: https://patch.msgid.link/r/20241009210048.4122518-1-bvanassche@acm.org Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> Closes: https://lore.kernel.org/linux-block/xpe6bea7rakpyoyfvspvin2dsozjmjtjktpph7rep3h25tv7fb@ooz4cu5z6bq6/ Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-09-10IB/iser: Remove unused declaration in header fileZhang Zekun
The definition of iser_finalize_rdma_unaligned_sg() has been removed since commit dd0107a08996 ("IB/iser: set block queue_virt_boundary"). Let's remove the unused declaration in header file. Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com> Link: https://patch.msgid.link/20240909121408.80079-2-zhangzekun11@huawei.com Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Acked-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-08-28RDMA/rtrs-clt: Remove an extra spaceJack Wang
No functional changes. Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Alexei Pastuchov <alexei.pastuchov@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Link: https://patch.msgid.link/20240821112217.41827-12-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-08-28RDMA/rtrs-clt: Do local invalidate after write io completionJack Wang
Switch local invalidate after write io completion avoid the chain usage of WR, this fixed the local protection error on LOCAL INVALIDATE WR. Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://patch.msgid.link/20240821112217.41827-11-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>