author     Linus Torvalds <torvalds@linux-foundation.org>    2026-02-12 16:33:05 -0800
committer  Linus Torvalds <torvalds@linux-foundation.org>    2026-02-12 16:33:05 -0800
commit     e812928be2ee1c2744adf20ed04e0ce1e2fc5c13 (patch)
tree       d2685be8adaca1d097adf407b333d913d74c2582
parent     cebcffe666cc82e68842e27852a019ca54072cb7 (diff)
parent     63fbf275fa9f18f7020fb8acf54fa107e51d0f23 (diff)
Merge tag 'cxl-for-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull CXL updates from Dave Jiang:
- Introduce cxl_memdev_attach and pave the way for soft reserved handling,
type2 accelerator enabling, and LSA 2.0 enabling. All these series
require the endpoint driver to settle before continuing the memdev
driver probe.
- Address CXL port error protocol handling and reporting.
The large patch series was split into three parts. The first two
parts are included here with the final part coming later.
The first part is a series of refactorings of the PCI AER sub-system
code that touches CXL, and of the CXL RAS code, to prepare for port
error handling.
The second part refactors the CXL code to move management of
component registers to cxl_port objects to allow all CXL AER errors
to be handled through the cxl_port hierarchy.
- Provide AMD Zen5 platform address translation for CXL using ACPI
PRMT. This includes a conventions document to explain why this is
needed and how it's implemented.
- Misc CXL fixes, cleanups, and updates, including CXL address
translation for unaligned MOD3 regions.
[ TLA service: CXL is "Compute Express Link" ]
* tag 'cxl-for-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (59 commits)
cxl: Disable HPA/SPA translation handlers for Normalized Addressing
cxl/region: Factor out code into cxl_region_setup_poison()
cxl/atl: Lock decoders that need address translation
cxl: Enable AMD Zen5 address translation using ACPI PRMT
cxl/acpi: Prepare use of EFI runtime services
cxl: Introduce callback for HPA address ranges translation
cxl/region: Use region data to get the root decoder
cxl/region: Add @hpa_range argument to function cxl_calc_interleave_pos()
cxl/region: Separate region parameter setup and region construction
cxl: Simplify cxl_root_ops allocation and handling
cxl/region: Store HPA range in struct cxl_region
cxl/region: Store root decoder in struct cxl_region
cxl/region: Rename misleading variable name @hpa to @hpa_range
Documentation/driver-api/cxl: ACPI PRM Address Translation Support and AMD Zen5 enablement
cxl, doc: Moving conventions in separate files
cxl, doc: Remove isonum.txt inclusion
cxl/port: Unify endpoint and switch port lookup
cxl/port: Move endpoint component register management to cxl_port
cxl/port: Map Port RAS registers
cxl/port: Move dport RAS setup to dport add time
...
50 files changed, 2522 insertions, 1240 deletions
diff --git a/Documentation/driver-api/cxl/conventions.rst b/Documentation/driver-api/cxl/conventions.rst index e37336d7b116..0d2e07279ad9 100644 --- a/Documentation/driver-api/cxl/conventions.rst +++ b/Documentation/driver-api/cxl/conventions.rst @@ -1,9 +1,7 @@ .. SPDX-License-Identifier: GPL-2.0 -.. include:: <isonum.txt> -======================================= Compute Express Link: Linux Conventions -======================================= +####################################### There exists shipping platforms that bend or break CXL specification expectations. Record the details and the rationale for those deviations. @@ -11,172 +9,10 @@ Borrow the ACPI Code First template format to capture the assumptions and tradeoffs such that multiple platform implementations can follow the same convention. -<(template) Title> -================== +.. toctree:: + :maxdepth: 1 + :caption: Contents -Document --------- -CXL Revision <rev>, Version <ver> - -License -------- -SPDX-License Identifier: CC-BY-4.0 - -Creator/Contributors --------------------- - -Summary of the Change ---------------------- - -<Detail the conflict with the specification and where available the -assumptions and tradeoffs taken by the hardware platform.> - - -Benefits of the Change ----------------------- - -<Detail what happens if platforms and Linux do not adopt this -convention.> - -References ----------- - -Detailed Description of the Change ----------------------------------- - -<Propose spec language that corrects the conflict.> - - -Resolve conflict between CFMWS, Platform Memory Holes, and Endpoint Decoders -============================================================================ - -Document --------- - -CXL Revision 3.2, Version 1.0 - -License -------- - -SPDX-License Identifier: CC-BY-4.0 - -Creator/Contributors --------------------- - -- Fabio M. De Francesco, Intel -- Dan J. Williams, Intel -- Mahesh Natu, Intel - -Summary of the Change ---------------------- - -According to the current Compute Express Link (CXL) Specifications (Revision -3.2, Version 1.0), the CXL Fixed Memory Window Structure (CFMWS) describes zero -or more Host Physical Address (HPA) windows associated with each CXL Host -Bridge. Each window represents a contiguous HPA range that may be interleaved -across one or more targets, including CXL Host Bridges. Each window has a set -of restrictions that govern its usage. It is the Operating System-directed -configuration and Power Management (OSPM) responsibility to utilize each window -for the specified use. - -Table 9-22 of the current CXL Specifications states that the Window Size field -contains the total number of consecutive bytes of HPA this window describes. -This value must be a multiple of the Number of Interleave Ways (NIW) * 256 MB. - -Platform Firmware (BIOS) might reserve physical addresses below 4 GB where a -memory gap such as the Low Memory Hole for PCIe MMIO may exist. In such cases, -the CFMWS Range Size may not adhere to the NIW * 256 MB rule. - -The HPA represents the actual physical memory address space that the CXL devices -can decode and respond to, while the System Physical Address (SPA), a related -but distinct concept, represents the system-visible address space that users can -direct transaction to and so it excludes reserved regions. - -BIOS publishes CFMWS to communicate the active SPA ranges that, on platforms -with LMH's, map to a strict subset of the HPA. 
The SPA range trims out the hole, -resulting in lost capacity in the Endpoints with no SPA to map to that part of -the HPA range that intersects the hole. - -E.g, an x86 platform with two CFMWS and an LMH starting at 2 GB: - - +--------+------------+-------------------+------------------+-------------------+------+ - | Window | CFMWS Base | CFMWS Size | HDM Decoder Base | HDM Decoder Size | Ways | - +========+============+===================+==================+===================+======+ - | 0 | 0 GB | 2 GB | 0 GB | 3 GB | 12 | - +--------+------------+-------------------+------------------+-------------------+------+ - | 1 | 4 GB | NIW*256MB Aligned | 4 GB | NIW*256MB Aligned | 12 | - +--------+------------+-------------------+------------------+-------------------+------+ - -HDM decoder base and HDM decoder size represent all the 12 Endpoint Decoders of -a 12 ways region and all the intermediate Switch Decoders. They are configured -by the BIOS according to the NIW * 256MB rule, resulting in a HPA range size of -3GB. Instead, the CFMWS Base and CFMWS Size are used to configure the Root -Decoder HPA range that results smaller (2GB) than that of the Switch and -Endpoint Decoders in the hierarchy (3GB). - -This creates 2 issues which lead to a failure to construct a region: - -1) A mismatch in region size between root and any HDM decoder. The root decoders - will always be smaller due to the trim. - -2) The trim causes the root decoder to violate the (NIW * 256MB) rule. - -This change allows a region with a base address of 0GB to bypass these checks to -allow for region creation with the trimmed root decoder address range. - -This change does not allow for any other arbitrary region to violate these -checks - it is intended exclusively to enable x86 platforms which map CXL memory -under 4GB. - -Despite the HDM decoders covering the PCIE hole HPA region, it is expected that -the platform will never route address accesses to the CXL complex because the -root decoder only covers the trimmed region (which excludes this). This is -outside the ability of Linux to enforce. - -On the example platform, only the first 2GB will be potentially usable, but -Linux, aiming to adhere to the current specifications, fails to construct -Regions and attach Endpoint and intermediate Switch Decoders to them. - -There are several points of failure that due to the expectation that the Root -Decoder HPA size, that is equal to the CFMWS from which it is configured, has -to be greater or equal to the matching Switch and Endpoint HDM Decoders. - -In order to succeed with construction and attachment, Linux must construct a -Region with Root Decoder HPA range size, and then attach to that all the -intermediate Switch Decoders and Endpoint Decoders that belong to the hierarchy -regardless of their range sizes. - -Benefits of the Change ----------------------- - -Without the change, the OSPM wouldn't match intermediate Switch and Endpoint -Decoders with Root Decoders configured with CFMWS HPA sizes that don't align -with the NIW * 256MB constraint, and so it leads to lost memdev capacity. - -This change allows the OSPM to construct Regions and attach intermediate Switch -and Endpoint Decoders to them, so that the addressable part of the memory -devices total capacity is made available to the users. 
- -References ----------- - -Compute Express Link Specification Revision 3.2, Version 1.0 -<https://www.computeexpresslink.org/> - -Detailed Description of the Change ----------------------------------- - -The description of the Window Size field in table 9-22 needs to account for -platforms with Low Memory Holes, where SPA ranges might be subsets of the -endpoints HPA. Therefore, it has to be changed to the following: - -"The total number of consecutive bytes of HPA this window represents. This value -shall be a multiple of NIW * 256 MB. - -On platforms that reserve physical addresses below 4 GB, such as the Low Memory -Hole for PCIe MMIO on x86, an instance of CFMWS whose Base HPA range is 0 might -have a size that doesn't align with the NIW * 256 MB constraint. - -Note that the matching intermediate Switch Decoders and the Endpoint Decoders -HPA range sizes must still align to the above-mentioned rule, but the memory -capacity that exceeds the CFMWS window size won't be accessible.". + conventions/cxl-lmh.rst + conventions/cxl-atl.rst + conventions/template.rst diff --git a/Documentation/driver-api/cxl/conventions/cxl-atl.rst b/Documentation/driver-api/cxl/conventions/cxl-atl.rst new file mode 100644 index 000000000000..3a36a84743d0 --- /dev/null +++ b/Documentation/driver-api/cxl/conventions/cxl-atl.rst @@ -0,0 +1,304 @@ +.. SPDX-License-Identifier: GPL-2.0 + +ACPI PRM CXL Address Translation +================================ + +Document +-------- + +CXL Revision 3.2, Version 1.0 + +License +------- + +SPDX-License Identifier: CC-BY-4.0 + +Creator/Contributors +-------------------- + +- Robert Richter, AMD et al. + +Summary of the Change +--------------------- + +The CXL Fixed Memory Window Structures (CFMWS) describe zero or more Host +Physical Address (HPA) windows associated with one or more CXL Host Bridges. +Each HPA range of a CXL Host Bridge is represented by a CFMWS entry. An HPA +range may include addresses currently assigned to CXL.mem devices, or an OS may +assign ranges from an address window to a device. + +Host-managed Device Memory is Device-attached memory that is mapped to system +coherent address space and accessible to the Host using standard write-back +semantics. The managed address range is configured in the CXL HDM Decoder +registers of the device. An HDM Decoder in a device is responsible for +converting HPA into DPA by stripping off specific address bits. + +CXL devices and CXL bridges use the same HPA space. It is common across all +components that belong to the same host domain. The view of the address region +must be consistent on the CXL.mem path between the Host and the Device. + +This is described in the *CXL 3.2 specification* (Table 1-1, 3.3.1, +8.2.4.20, 9.13.1, 9.18.1.3). [#cxl-spec-3.2]_ + +Depending on the interconnect architecture of the platform, components attached +to a host may not share the same host physical address space. Those platforms +need address translation to convert an HPA between the host and the attached +component, such as a CXL device. The translation mechanism is host-specific and +implementation dependent. + +For example, x86 AMD platforms use a Data Fabric that manages access to physical +memory. Devices have their own memory space and can be configured to use +'Normalized addresses' different from System Physical Addresses (SPA). Address +translation is then needed. For details, see +:doc:`x86 AMD Address Translation </admin-guide/RAS/address-translation>`. 
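
As a rough illustration of what such a translation looks like from the OS
side, the sketch below wraps the PRM handler described in the following
sections (GUID and parameter buffer layout per Table 9-32). It mirrors the
helper this series adds in drivers/cxl/core/atl.c; the function and struct
names here are illustrative only, not kernel API:

.. code-block:: c

    /*
     * Illustrative sketch: translate a CXL DPA to an SPA through the
     * platform's PRM handler (see Section 9.18.4 below). Mirrors the
     * helper added in drivers/cxl/core/atl.c; names are not kernel API.
     */
    #include <linux/acpi.h>
    #include <linux/pci.h>
    #include <linux/prmt.h>

    static const guid_t dpa_to_spa_guid =
        GUID_INIT(0xee41b397, 0x25d4, 0x452c, 0xad, 0x54, 0x48, 0xc6,
                  0xe3, 0x48, 0x0b, 0x94);

    struct dpa_to_spa_param {          /* layout per Table 9-32 */
        u64 dpa;                       /* input: CXL Device Physical Address */
        u8 reserved;                   /* MBZ */
        u8 devfn;                      /* endpoint device/function number */
        u8 bus;                        /* endpoint bus number */
        u8 segment;                    /* PCIe segment */
        u64 *spa;                      /* output buffer (Table 9-33) */
    } __packed;

    static u64 dpa_to_spa(struct pci_dev *pdev, u64 dpa)
    {
        u64 spa;
        struct dpa_to_spa_param param = {
            .dpa = dpa,
            .devfn = pdev->devfn,
            .bus = pdev->bus->number,
            .segment = pci_domain_nr(pdev->bus),
            .spa = &spa,
        };

        /* Returns 0 on success; the handler fills *param.spa */
        if (acpi_call_prm_handler(dpa_to_spa_guid, &param))
            return ULLONG_MAX;

        return spa;
    }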
+ +Those AMD platforms provide PRM [#prm-spec]_ handlers in firmware to perform +various types of address translation, including for CXL endpoints. AMD Zen5 +systems implement the ACPI PRM CXL Address Translation firmware call. The ACPI +PRM handler has a specific GUID to uniquely identify platforms with support for +Normalized addressing. This is documented in the *ACPI v6.5 Porting Guide* +(Address Translation - CXL DPA to System Physical Address). [#amd-ppr-58088]_ + +When in Normalized address mode, HDM decoder address ranges must be configured +and handled differently. Hardware addresses used in the HDM decoder +configurations of an endpoint are not SPA and need to be translated from the +address range of the endpoint to that of the CXL host bridge. This is especially +important for finding an endpoint's associated CXL Host Bridge and HPA window +described in the CFMWS. Additionally, the interleave decoding is done by the +Data Fabric and the endpoint does not perform decoding when converting HPA to +DPA. Instead, interleaving is switched off for the endpoint (1-way). Finally, +address translation might also be needed to inspect the endpoint's hardware +addresses, such as during profiling, tracing, or error handling. + +For example, with Normalized addressing the HDM decoders could look as follows:: + + ------------------------------- + | Root Decoder (CFMWS) | + | SPA Range: 0x850000000 | + | Size: 0x8000000000 (512 GB) | + | Interleave Ways: 1 | + ------------------------------- + | + v + ------------------------------- + | Host Bridge Decoder (HDM) | + | SPA Range: 0x850000000 | + | Size: 0x8000000000 (512 GB) | + | Interleave Ways: 4 | + | Targets: endpoint5,8,11,13 | + | Granularity: 256 | + ------------------------------- + | + -----------------------------+------------------------------ + | | | | + v v v v + ------------------- ------------------- ------------------- ------------------- + | endpoint5 | | endpoint8 | | endpoint11 | | endpoint13 | + | decoder5.0 | | decoder8.0 | | decoder11.0 | | decoder13.0 | + | PCIe: | | PCIe: | | PCIe: | | PCIe: | + | 0000:e2:00.0 | | 0000:e3:00.0 | | 0000:e4:00.0 | | 0000:e1:00.0 | + | DPA: | | DPA: | | DPA: | | DPA: | + | Start: 0x0 | | Start: 0x0 | | Start: 0x0 | | Start: 0x0 | + | Size: | | Size: | | Size: | | Size: | + | 0x2000000000 | | 0x2000000000 | | 0x2000000000 | | 0x2000000000 | + | (128 GB) | | (128 GB) | | (128 GB) | | (128 GB) | + | Interleaving: | | Interleaving: | | Interleaving: | | Interleaving: | + | Ways: 1 | | Ways: 1 | | Ways: 1 | | Ways: 1 | + | Gran: 256 | | Gran: 256 | | Gran: 256 | | Gran: 256 | + ------------------- ------------------- ------------------- ------------------- + | | | | + v v v v + DPA DPA DPA DPA + +This shows the representation in sysfs: + +.. 
code-block:: none + + /sys/bus/cxl/devices/endpoint5/decoder5.0/interleave_granularity:256 + /sys/bus/cxl/devices/endpoint5/decoder5.0/interleave_ways:1 + /sys/bus/cxl/devices/endpoint5/decoder5.0/size:0x2000000000 + /sys/bus/cxl/devices/endpoint5/decoder5.0/start:0x0 + /sys/bus/cxl/devices/endpoint8/decoder8.0/interleave_granularity:256 + /sys/bus/cxl/devices/endpoint8/decoder8.0/interleave_ways:1 + /sys/bus/cxl/devices/endpoint8/decoder8.0/size:0x2000000000 + /sys/bus/cxl/devices/endpoint8/decoder8.0/start:0x0 + /sys/bus/cxl/devices/endpoint11/decoder11.0/interleave_granularity:256 + /sys/bus/cxl/devices/endpoint11/decoder11.0/interleave_ways:1 + /sys/bus/cxl/devices/endpoint11/decoder11.0/size:0x2000000000 + /sys/bus/cxl/devices/endpoint11/decoder11.0/start:0x0 + /sys/bus/cxl/devices/endpoint13/decoder13.0/interleave_granularity:256 + /sys/bus/cxl/devices/endpoint13/decoder13.0/interleave_ways:1 + /sys/bus/cxl/devices/endpoint13/decoder13.0/size:0x2000000000 + /sys/bus/cxl/devices/endpoint13/decoder13.0/start:0x0 + +Note that the endpoint interleaving configurations use direct mapping (1-way). + +With PRM calls, the kernel can determine the following mappings: + +.. code-block:: none + + cxl decoder5.0: address mapping found for 0000:e2:00.0 (hpa -> spa): + 0x0+0x2000000000 -> 0x850000000+0x8000000000 ways:4 granularity:256 + cxl decoder8.0: address mapping found for 0000:e3:00.0 (hpa -> spa): + 0x0+0x2000000000 -> 0x850000000+0x8000000000 ways:4 granularity:256 + cxl decoder11.0: address mapping found for 0000:e4:00.0 (hpa -> spa): + 0x0+0x2000000000 -> 0x850000000+0x8000000000 ways:4 granularity:256 + cxl decoder13.0: address mapping found for 0000:e1:00.0 (hpa -> spa): + 0x0+0x2000000000 -> 0x850000000+0x8000000000 ways:4 granularity:256 + +The corresponding CXL host bridge (HDM) decoders and root decoder (CFMWS) match +the calculated endpoint mappings shown: + +.. code-block:: none + + /sys/bus/cxl/devices/port1/decoder1.0/interleave_granularity:256 + /sys/bus/cxl/devices/port1/decoder1.0/interleave_ways:4 + /sys/bus/cxl/devices/port1/decoder1.0/size:0x8000000000 + /sys/bus/cxl/devices/port1/decoder1.0/start:0x850000000 + /sys/bus/cxl/devices/port1/decoder1.0/target_list:0,1,2,3 + /sys/bus/cxl/devices/port1/decoder1.0/target_type:expander + /sys/bus/cxl/devices/root0/decoder0.0/interleave_granularity:256 + /sys/bus/cxl/devices/root0/decoder0.0/interleave_ways:1 + /sys/bus/cxl/devices/root0/decoder0.0/size:0x8000000000 + /sys/bus/cxl/devices/root0/decoder0.0/start:0x850000000 + /sys/bus/cxl/devices/root0/decoder0.0/target_list:7 + +The following changes to the specification are needed: + +* Allow a CXL device to be in an HPA space other than the host's address space. + +* Allow the platform to use implementation-specific address translation when + crossing memory domains on the CXL.mem path between the host and the device. + +* Define a PRM handler method for converting device addresses to SPAs. + +* Specify that the platform shall provide the PRM handler method to the + Operating System to detect Normalized addressing and for determining Endpoint + SPA ranges and interleaving configurations. 
+ +* Add reference to: + + | Platform Runtime Mechanism Specification, Version 1.1 – November 2020 + | https://uefi.org/sites/default/files/resources/PRM_Platform_Runtime_Mechanism_1_1_release_candidate.pdf + +Benefits of the Change +---------------------- + +Without the change, the Operating System may be unable to determine the memory +region and Root Decoder for an Endpoint and its corresponding HDM decoder. +Region creation would fail. Platforms with a different interconnect architecture +would fail to set up and use CXL. + +References +---------- + +.. [#cxl-spec-3.2] Compute Express Link Specification, Revision 3.2, Version 1.0, + https://www.computeexpresslink.org/ + +.. [#amd-ppr-58088] AMD Family 1Ah Models 00h–0Fh and Models 10h–1Fh, + ACPI v6.5 Porting Guide, Publication # 58088, + https://www.amd.com/en/search/documentation/hub.html + +.. [#prm-spec] Platform Runtime Mechanism, Version: 1.1, + https://uefi.org/sites/default/files/resources/PRM_Platform_Runtime_Mechanism_1_1_release_candidate.pdf + +Detailed Description of the Change +---------------------------------- + +The following describes the necessary changes to the *CXL 3.2 specification* +[#cxl-spec-3.2]_: + +Add the following reference to the table: + +Table 1-2. Reference Documents + ++----------------------------+-------------------+---------------------------+ +| Document | Chapter Reference | Document No./Location | ++============================+===================+===========================+ +| Platform Runtime Mechanism | Chapter 8, 9 | https://www.uefi.org/acpi | +| Version: 1.1 | | | ++----------------------------+-------------------+---------------------------+ + +Add the following paragraphs to the end of the section: + +**8.2.4.20 CXL HDM Decoder Capability Structure** + +"A device may use an HPA space that is not common to other components of the +host domain. The platform is responsible for address translation when crossing +HPA spaces. The Operating System must determine the interleaving configuration +and perform address translation to the HPA ranges of the HDM decoders as needed. +The translation mechanism is host-specific and implementation dependent. + +The platform indicates support of independent HPA spaces and the need for +address translation by providing a Platform Runtime Mechanism (PRM) handler. The +OS shall use that handler to perform the necessary translations from the DPA +space to the HPA space. The handler is defined in Section 9.18.4 *PRM Handler +for CXL DPA to System Physical Address Translation*." + +Add the following section and sub-section including tables: + +**9.18.4 PRM Handler for CXL DPA to System Physical Address Translation** + +"A platform may be configured to use 'Normalized addresses'. Host physical +address (HPA) spaces are component-specific and differ from system physical +addresses (SPAs). The endpoint has its own physical address space. All requests +presented to the device already use Device Physical Addresses (DPAs). The CXL +endpoint decoders have interleaving disabled (1-way interleaving) and the device +does not perform HPA decoding to determine a DPA. + +The platform provides a PRM handler for CXL DPA to System Physical Address +Translation. The PRM handler translates a Device Physical Address (DPA) to a +System Physical Address (SPA) for a specified CXL endpoint. 
In the address space +of the host, SPA and HPA are equivalent, and the OS shall use this handler to +determine the HPA that corresponds to a device address, for example when +configuring HDM decoders on platforms with Normalized addressing. The GUID and +the parameter buffer format of the handler are specified in section 9.18.4.1. If +the OS identifies the PRM handler, the platform supports Normalized addressing +and the OS must perform DPA address translation as needed." + +**9.18.4.1 PRM Handler Invocation** + +"The OS calls the PRM handler for CXL DPA to System Physical Address Translation +using the direct invocation mechanism. Details of calling a PRM handler are +described in the Platform Runtime Mechanism (PRM) specification. + +The PRM handler is identified by the following GUID: + + EE41B397-25D4-452C-AD54-48C6E3480B94 + +The caller allocates and prepares a Parameter Buffer, then passes the PRM +handler GUID and a pointer to the Parameter Buffer to invoke the handler. The +Parameter Buffer is described in Table 9-32." + +**Table 9-32. PRM Parameter Buffer used for CXL DPA to System Physical Address Translation** + ++-------------+-----------+------------------------------------------------------------------------+ +| Byte Offset | Length in | Description | +| | Bytes | | ++=============+===========+========================================================================+ +| 00h | 8 | **CXL Device Physical Address (DPA)**: CXL DPA (e.g., from | +| | | CXL Component Event Log) | ++-------------+-----------+------------------------------------------------------------------------+ +| 08h | 4 | **CXL Endpoint SBDF**: | +| | | | +| | | - Byte 3 - PCIe Segment | +| | | - Byte 2 - Bus Number | +| | | - Byte 1: | +| | | - Device Number Bits[7:3] | +| | | - Function Number Bits[2:0] | +| | | - Byte 0 - RESERVED (MBZ) | +| | | | ++-------------+-----------+------------------------------------------------------------------------+ +| 0Ch | 8 | **Output Buffer**: Virtual Address Pointer to the buffer, | +| | | as defined in Table 9-33. | ++-------------+-----------+------------------------------------------------------------------------+ + +**Table 9-33. PRM Output Buffer used for CXL DPA to System Physical Address Translation** + ++-------------+-----------+------------------------------------------------------------------------+ +| Byte Offset | Length in | Description | +| | Bytes | | ++=============+===========+========================================================================+ +| 00h | 8 | **System Physical Address (SPA)**: The SPA converted | +| | | from the CXL DPA. | ++-------------+-----------+------------------------------------------------------------------------+ diff --git a/Documentation/driver-api/cxl/conventions/cxl-lmh.rst b/Documentation/driver-api/cxl/conventions/cxl-lmh.rst new file mode 100644 index 000000000000..baece5c35345 --- /dev/null +++ b/Documentation/driver-api/cxl/conventions/cxl-lmh.rst @@ -0,0 +1,135 @@ +.. SPDX-License-Identifier: GPL-2.0 + +Resolve conflict between CFMWS, Platform Memory Holes, and Endpoint Decoders +============================================================================ + +Document +-------- + +CXL Revision 3.2, Version 1.0 + +License +------- + +SPDX-License Identifier: CC-BY-4.0 + +Creator/Contributors +-------------------- + +- Fabio M. De Francesco, Intel +- Dan J. 
Williams, Intel +- Mahesh Natu, Intel + +Summary of the Change +--------------------- + +According to the current Compute Express Link (CXL) Specifications (Revision +3.2, Version 1.0), the CXL Fixed Memory Window Structure (CFMWS) describes zero +or more Host Physical Address (HPA) windows associated with each CXL Host +Bridge. Each window represents a contiguous HPA range that may be interleaved +across one or more targets, including CXL Host Bridges. Each window has a set +of restrictions that govern its usage. It is the Operating System-directed +configuration and Power Management (OSPM) responsibility to utilize each window +for the specified use. + +Table 9-22 of the current CXL Specifications states that the Window Size field +contains the total number of consecutive bytes of HPA this window describes. +This value must be a multiple of the Number of Interleave Ways (NIW) * 256 MB. + +Platform Firmware (BIOS) might reserve physical addresses below 4 GB where a +memory gap such as the Low Memory Hole for PCIe MMIO may exist. In such cases, +the CFMWS Range Size may not adhere to the NIW * 256 MB rule. + +The HPA represents the actual physical memory address space that the CXL devices +can decode and respond to, while the System Physical Address (SPA), a related +but distinct concept, represents the system-visible address space that users can +direct transaction to and so it excludes reserved regions. + +BIOS publishes CFMWS to communicate the active SPA ranges that, on platforms +with LMH's, map to a strict subset of the HPA. The SPA range trims out the hole, +resulting in lost capacity in the Endpoints with no SPA to map to that part of +the HPA range that intersects the hole. + +E.g, an x86 platform with two CFMWS and an LMH starting at 2 GB: + + +--------+------------+-------------------+------------------+-------------------+------+ + | Window | CFMWS Base | CFMWS Size | HDM Decoder Base | HDM Decoder Size | Ways | + +========+============+===================+==================+===================+======+ + | 0 | 0 GB | 2 GB | 0 GB | 3 GB | 12 | + +--------+------------+-------------------+------------------+-------------------+------+ + | 1 | 4 GB | NIW*256MB Aligned | 4 GB | NIW*256MB Aligned | 12 | + +--------+------------+-------------------+------------------+-------------------+------+ + +HDM decoder base and HDM decoder size represent all the 12 Endpoint Decoders of +a 12 ways region and all the intermediate Switch Decoders. They are configured +by the BIOS according to the NIW * 256MB rule, resulting in a HPA range size of +3GB. Instead, the CFMWS Base and CFMWS Size are used to configure the Root +Decoder HPA range that results smaller (2GB) than that of the Switch and +Endpoint Decoders in the hierarchy (3GB). + +This creates 2 issues which lead to a failure to construct a region: + +1) A mismatch in region size between root and any HDM decoder. The root decoders + will always be smaller due to the trim. + +2) The trim causes the root decoder to violate the (NIW * 256MB) rule. + +This change allows a region with a base address of 0GB to bypass these checks to +allow for region creation with the trimmed root decoder address range. + +This change does not allow for any other arbitrary region to violate these +checks - it is intended exclusively to enable x86 platforms which map CXL memory +under 4GB. 
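
Expressed as a check, the relaxed rule exempts only the zero-based window;
the sketch below illustrates the convention with a hypothetical helper and
is not the driver's actual validation code:

.. code-block:: c

    #include <linux/sizes.h>
    #include <linux/types.h>

    /*
     * Illustrative only: a CFMWS window size must normally be a multiple
     * of NIW * 256 MB. The convention exempts solely the window with
     * Base HPA 0 that was trimmed by a Low Memory Hole. For the example
     * platform above: 12 ways * 256 MB = 3 GB, but Window 0 is 2 GB.
     */
    static bool cfmws_window_size_ok(u64 base_hpa, u64 window_size,
                                     unsigned int interleave_ways)
    {
        u64 align = (u64)interleave_ways * SZ_256M;

        if (window_size % align == 0)
            return true;

        /* Low Memory Hole exception: only the zero-based window */
        return base_hpa == 0;
    }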
+ +Despite the HDM decoders covering the PCIE hole HPA region, it is expected that +the platform will never route address accesses to the CXL complex because the +root decoder only covers the trimmed region (which excludes this). This is +outside the ability of Linux to enforce. + +On the example platform, only the first 2GB will be potentially usable, but +Linux, aiming to adhere to the current specifications, fails to construct +Regions and attach Endpoint and intermediate Switch Decoders to them. + +There are several points of failure that due to the expectation that the Root +Decoder HPA size, that is equal to the CFMWS from which it is configured, has +to be greater or equal to the matching Switch and Endpoint HDM Decoders. + +In order to succeed with construction and attachment, Linux must construct a +Region with Root Decoder HPA range size, and then attach to that all the +intermediate Switch Decoders and Endpoint Decoders that belong to the hierarchy +regardless of their range sizes. + +Benefits of the Change +---------------------- + +Without the change, the OSPM wouldn't match intermediate Switch and Endpoint +Decoders with Root Decoders configured with CFMWS HPA sizes that don't align +with the NIW * 256MB constraint, and so it leads to lost memdev capacity. + +This change allows the OSPM to construct Regions and attach intermediate Switch +and Endpoint Decoders to them, so that the addressable part of the memory +devices total capacity is made available to the users. + +References +---------- + +Compute Express Link Specification Revision 3.2, Version 1.0 +<https://www.computeexpresslink.org/> + +Detailed Description of the Change +---------------------------------- + +The description of the Window Size field in table 9-22 needs to account for +platforms with Low Memory Holes, where SPA ranges might be subsets of the +endpoints HPA. Therefore, it has to be changed to the following: + +"The total number of consecutive bytes of HPA this window represents. This value +shall be a multiple of NIW * 256 MB. + +On platforms that reserve physical addresses below 4 GB, such as the Low Memory +Hole for PCIe MMIO on x86, an instance of CFMWS whose Base HPA range is 0 might +have a size that doesn't align with the NIW * 256 MB constraint. + +Note that the matching intermediate Switch Decoders and the Endpoint Decoders +HPA range sizes must still align to the above-mentioned rule, but the memory +capacity that exceeds the CFMWS window size won't be accessible.". diff --git a/Documentation/driver-api/cxl/conventions/template.rst b/Documentation/driver-api/cxl/conventions/template.rst new file mode 100644 index 000000000000..ff2fcf1b5e24 --- /dev/null +++ b/Documentation/driver-api/cxl/conventions/template.rst @@ -0,0 +1,37 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. 
:: Template Title here: + +Template File +============= + +Document +-------- +CXL Revision <rev>, Version <ver> + +License +------- +SPDX-License Identifier: CC-BY-4.0 + +Creator/Contributors +-------------------- + +Summary of the Change +--------------------- + +<Detail the conflict with the specification and where available the +assumptions and tradeoffs taken by the hardware platform.> + +Benefits of the Change +---------------------- + +<Detail what happens if platforms and Linux do not adopt this +convention.> + +References +---------- + +Detailed Description of the Change +---------------------------------- + +<Propose spec language that corrects the conflict.> diff --git a/Documentation/driver-api/cxl/index.rst b/Documentation/driver-api/cxl/index.rst index ec8aae9ec0d4..3dfae1d310ca 100644 --- a/Documentation/driver-api/cxl/index.rst +++ b/Documentation/driver-api/cxl/index.rst @@ -30,6 +30,7 @@ that have impacts on each other. The docs here break up configurations steps. platform/acpi platform/cdat platform/example-configs + platform/device-hotplug .. toctree:: :maxdepth: 2 diff --git a/Documentation/driver-api/cxl/platform/bios-and-efi.rst b/Documentation/driver-api/cxl/platform/bios-and-efi.rst index a9aa0ccd92af..a4b44c018f09 100644 --- a/Documentation/driver-api/cxl/platform/bios-and-efi.rst +++ b/Documentation/driver-api/cxl/platform/bios-and-efi.rst @@ -29,6 +29,29 @@ at :doc:`ACPI Tables <acpi>`. on physical memory region size and alignment, memory holes, HDM interleave, and what linux expects of HDM decoders trying to work with these features. + +Linux Expectations of BIOS/EFI Software +======================================= +Linux expects BIOS/EFI software to construct sufficient ACPI tables (such as +CEDT, SRAT, HMAT, etc) and platform-specific configurations (such as HPA spaces +and host-bridge interleave configurations) to allow the Linux driver to +subsequently configure the devices in the CXL fabric at runtime. + +Programming of HDM decoders and switch ports is not required, and may be +deferred to the CXL driver based on admin policy (e.g. udev rules). + +Some platforms may require pre-programming HDM decoders and locking them +due to quirks (see: Zen5 address translation), but this is not the normal, +"expected" configuration path. This should be avoided if possible. + +Some platforms may wish to pre-configure these resources to bring memory +up without requiring CXL driver support. These platform vendors should +test their configurations with the existing CXL driver and provide driver +support for their auto-configurations if features like RAS are required. + +Platforms requiring boot-time programming and/or locking of CXL fabric +components may prevent features, such as device hot-plug, from working. + UEFI Settings ============= If your platform supports it, the :code:`uefisettings` command can be used to diff --git a/Documentation/driver-api/cxl/platform/device-hotplug.rst b/Documentation/driver-api/cxl/platform/device-hotplug.rst new file mode 100644 index 000000000000..e4a065fdd3ec --- /dev/null +++ b/Documentation/driver-api/cxl/platform/device-hotplug.rst @@ -0,0 +1,130 @@ +.. SPDX-License-Identifier: GPL-2.0 + +================== +CXL Device Hotplug +================== + +Device hotplug refers to *physical* hotplug of a device (addition or removal +of a physical device from the machine). 
+ +BIOS/EFI software is expected to configure sufficient resources **at boot +time** to allow hotplugged devices to be configured by software (such as +proximity domains, HPA regions, and host-bridge configurations). + +BIOS/EFI is not expected (**nor suggested**) to configure hotplugged +devices at hotplug time (i.e. HDM decoders should be left unprogrammed). + +This document covers some examples of those resources, but should not +be considered exhaustive. + +Hot-Remove +========== +Hot removal of a device typically requires careful removal of software +constructs (memory regions, associated drivers) which manage these devices. + +Hard-removing a CXL.mem device without carefully tearing down driver stacks +is likely to cause the system to machine-check (or at least SIGBUS if memory +access is limited to user space). + +Memory Device Hot-Add +===================== +A device present at boot may be associated with a CXL Fixed Memory Window +reported in :doc:`CEDT<acpi/cedt>`. That CFMWS may match the size of the +device, but the construction of the CEDT CFMWS is platform-defined. + +Hot-adding a memory device requires this pre-defined, **static** CFMWS to +have sufficient HPA space to describe that device. + +There are a few common scenarios to consider. + +Single-Endpoint Memory Device Present at Boot +--------------------------------------------- +A device present at boot likely had its capacity reported in the +:doc:`CEDT<acpi/cedt>`. If a device is removed and a new device hotplugged, +the capacity of the new device will be limited to the original CFMWS capacity. + +Adding capacity larger than the original device will cause memory region +creation to fail if the region size is greater than the CFMWS size. + +The CFMWS is **static** and cannot be adjusted. Platforms which may expect +different sized devices to be hotplugged must allocate sufficient CFMWS space +**at boot time** to cover all future expected devices. + +Multi-Endpoint Memory Device Present at Boot +-------------------------------------------- +Non-switch-based Multi-Endpoint devices are outside the scope of what the +CXL specification describes, but they are technically possible. We describe +them here for instructive reasons only - this does not imply Linux support. + +A hot-plug capable CXL memory device, such as one which presents multiple +expanders as a single large-capacity device, should report the **maximum +possible capacity** for the device at boot. :: + + HB0 + RP0 + | + [Multi-Endpoint Memory Device] + _____|_____ + | | + [Endpoint0] [Empty] + + +Limiting the size to the capacity preset at boot will limit hot-add support +to replacing capacity that was present at boot. + +No CXL Device Present at Boot +----------------------------- +When no CXL memory device is present on boot, some platforms omit the CFMWS +in the :doc:`CEDT<acpi/cedt>`. When this occurs, hot-add is not possible. + +This describes the base case for any given device not being present at boot. +If a future possible device is not described in the CEDT at boot, hot-add +of that device is either limited or not possible. + +For a platform to support hot-add of a full memory device, it must allocate +a CEDT CFMWS region with sufficient memory capacity to cover all future +potentially added capacity (along with any relevant CEDT CHBS entry). 
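
The knobs involved live in the static CEDT CFMWS entry (struct
acpi_cedt_cfmws in the kernel's ACPICA headers); a minimal sketch of the
resulting constraint, with a hypothetical helper name and parameters:

.. code-block:: c

    #include <linux/acpi.h>
    #include <linux/types.h>

    /*
     * Illustrative only: the CFMWS window reserved at boot bounds what
     * can be hot-added later. @boot_capacity is what the window already
     * maps; the window size, base HPA, and interleave configuration are
     * fixed at boot and cannot grow at hotplug time.
     */
    static bool cfmws_can_host_hot_add(const struct acpi_cedt_cfmws *cfmws,
                                       u64 boot_capacity, u64 hot_add_capacity)
    {
        if (boot_capacity > cfmws->window_size)
            return false;

        return cfmws->window_size - boot_capacity >= hot_add_capacity;
    }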
+ +To support memory hotplug directly on the host bridge/root port, or on a switch +downstream of the host bridge, a platform must construct a CEDT CFMWS at boot +with sufficient resources to support the max possible (or expected) hotplug +memory capacity. :: + + HB0 HB1 + RP0 RP1 RP2 + | | | + Empty Empty USP + ________|________ + | | | | + DSP DSP DSP DSP + | | | | + All Empty + +For example, a BIOS/EFI may expose an option to configure a CEDT CFMWS with +a pre-configured amount of memory capacity (per host bridge, or host bridge +interleave set), even if no device is attached to Root Ports or Downstream +Ports at boot (as depicted in the figure above). + + +Interleave Sets +=============== + +Host Bridge Interleave +---------------------- +Host-bridge interleaved memory regions are defined **statically** in the +:doc:`CEDT<acpi/cedt>`. To apply cross-host-bridge interleave, a CFMWS entry +describing that interleave must have been provided **at boot**. Hotplugged +devices cannot add host-bridge interleave capabilities at hotplug time. + +See the :doc:`Flexible CEDT Configuration<example-configurations/flexible>` +example to see how a platform can provide this kind of flexibility regarding +hotplugged memory devices. BIOS/EFI software should consider options to +present flexible CEDT configurations with hotplug support. + +HDM Interleave +-------------- +Decoder-applied interleave can flexibly handle hotplugged devices, as decoders +can be re-programmed after hotplug. + +To add or remove a device to/from an existing HDM-applied interleaved region, +that region must be torn down an re-created. diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c index 97b54bd0f482..2a9992758933 100644 --- a/arch/x86/kernel/e820.c +++ b/arch/x86/kernel/e820.c @@ -1173,18 +1173,24 @@ __init static unsigned long ram_alignment(resource_size_t pos) __init void e820__reserve_resources_late(void) { - u32 idx; - struct resource *res; - /* * Register device address regions listed in the E820 map, * these can be claimed by device drivers later on: */ - res = e820_res; - for (idx = 0; idx < e820_table->nr_entries; idx++) { - if (!res->parent && res->end) + for (u32 idx = 0; idx < e820_table->nr_entries; idx++) { + struct resource *res = e820_res + idx; + + /* skip added or uninitialized resources */ + if (res->parent || !res->end) + continue; + + /* set aside soft-reserved resources for driver consideration */ + if (res->desc == IORES_DESC_SOFT_RESERVED) { + insert_resource_expand_to_fit(&soft_reserve_resource, res); + } else { + /* publish the rest immediately */ insert_resource_expand_to_fit(&iomem_resource, res); - res++; + } } /* @@ -1199,7 +1205,7 @@ __init void e820__reserve_resources_late(void) * doesn't properly list 'stolen RAM' as a system region * in the E820 map. */ - for (idx = 0; idx < e820_table->nr_entries; idx++) { + for (u32 idx = 0; idx < e820_table->nr_entries; idx++) { struct e820_entry *entry = &e820_table->entries[idx]; u64 start, end; diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig index 48b7314afdb8..4589bf11d3fe 100644 --- a/drivers/cxl/Kconfig +++ b/drivers/cxl/Kconfig @@ -22,6 +22,7 @@ if CXL_BUS config CXL_PCI tristate "PCI manageability" default CXL_BUS + select CXL_MEM help The CXL specification defines a "CXL memory device" sub-class in the PCI "memory controller" base class of devices. 
Device's identified by @@ -89,7 +90,6 @@ config CXL_PMEM config CXL_MEM tristate "CXL: Memory Expansion" - depends on CXL_PCI default CXL_BUS help The CXL.mem protocol allows a device to act as a provider of "System @@ -233,4 +233,13 @@ config CXL_MCE def_bool y depends on X86_MCE && MEMORY_FAILURE +config CXL_RAS + def_bool y + depends on ACPI_APEI_GHES && PCIEAER && CXL_BUS + +config CXL_ATL + def_bool y + depends on CXL_REGION + depends on ACPI_PRMT && AMD_NB + endif diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c index 49bba2b9a3c4..d78f005bd994 100644 --- a/drivers/cxl/acpi.c +++ b/drivers/cxl/acpi.c @@ -325,10 +325,6 @@ static int cxl_acpi_qos_class(struct cxl_root *cxl_root, return cxl_acpi_evaluate_qtg_dsm(handle, coord, entries, qos_class); } -static const struct cxl_root_ops acpi_root_ops = { - .qos_class = cxl_acpi_qos_class, -}; - static void del_cxl_resource(struct resource *res) { if (!res) @@ -364,7 +360,7 @@ static int add_or_reset_cxl_resource(struct resource *parent, struct resource *r return rc; } -static int cxl_acpi_set_cache_size(struct cxl_root_decoder *cxlrd) +static void cxl_setup_extended_linear_cache(struct cxl_root_decoder *cxlrd) { struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld; struct range *hpa = &cxld->hpa_range; @@ -374,12 +370,14 @@ static int cxl_acpi_set_cache_size(struct cxl_root_decoder *cxlrd) struct resource res; int nid, rc; + /* Explicitly initialize cache size to 0 at the beginning */ + cxlrd->cache_size = 0; res = DEFINE_RES_MEM(start, size); nid = phys_to_target_node(start); rc = hmat_get_extended_linear_cache_size(&res, nid, &cache_size); if (rc) - return 0; + return; /* * The cache range is expected to be within the CFMWS. @@ -391,31 +389,10 @@ static int cxl_acpi_set_cache_size(struct cxl_root_decoder *cxlrd) dev_warn(&cxld->dev, "Extended Linear Cache size %pa != CXL size %pa. No Support!", &cache_size, &size); - return -ENXIO; + return; } cxlrd->cache_size = cache_size; - - return 0; -} - -static void cxl_setup_extended_linear_cache(struct cxl_root_decoder *cxlrd) -{ - int rc; - - rc = cxl_acpi_set_cache_size(cxlrd); - if (rc) { - /* - * Failing to retrieve extended linear cache region resize does not - * prevent the region from functioning. Only causes cxl list showing - * incorrect region size. - */ - dev_warn(cxlrd->cxlsd.cxld.dev.parent, - "Extended linear cache retrieval failed rc:%d\n", rc); - - /* Ignoring return code */ - cxlrd->cache_size = 0; - } } DEFINE_FREE(put_cxlrd, struct cxl_root_decoder *, @@ -930,11 +907,14 @@ static int cxl_acpi_probe(struct platform_device *pdev) cxl_res->end = -1; cxl_res->flags = IORESOURCE_MEM; - cxl_root = devm_cxl_add_root(host, &acpi_root_ops); + cxl_root = devm_cxl_add_root(host); if (IS_ERR(cxl_root)) return PTR_ERR(cxl_root); + cxl_root->ops.qos_class = cxl_acpi_qos_class; root_port = &cxl_root->port; + cxl_setup_prm_address_translation(cxl_root); + rc = bus_for_each_dev(adev->dev.bus, NULL, root_port, add_host_bridge_dport); if (rc < 0) @@ -1015,8 +995,12 @@ static void __exit cxl_acpi_exit(void) cxl_bus_drain(); } -/* load before dax_hmem sees 'Soft Reserved' CXL ranges */ -subsys_initcall(cxl_acpi_init); +/* + * Load before dax_hmem sees 'Soft Reserved' CXL ranges. Use + * subsys_initcall_sync() since there is an order dependency with + * subsys_initcall(efisubsys_init), which must run first. 
+ */ +subsys_initcall_sync(cxl_acpi_init); /* * Arrange for host-bridge ports to be active synchronous with diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile index 5ad8fef210b5..a639a9499972 100644 --- a/drivers/cxl/core/Makefile +++ b/drivers/cxl/core/Makefile @@ -14,9 +14,11 @@ cxl_core-y += pci.o cxl_core-y += hdm.o cxl_core-y += pmu.o cxl_core-y += cdat.o -cxl_core-y += ras.o cxl_core-$(CONFIG_TRACING) += trace.o cxl_core-$(CONFIG_CXL_REGION) += region.o cxl_core-$(CONFIG_CXL_MCE) += mce.o cxl_core-$(CONFIG_CXL_FEATURES) += features.o cxl_core-$(CONFIG_CXL_EDAC_MEM_FEATURES) += edac.o +cxl_core-$(CONFIG_CXL_RAS) += ras.o +cxl_core-$(CONFIG_CXL_RAS) += ras_rch.o +cxl_core-$(CONFIG_CXL_ATL) += atl.o diff --git a/drivers/cxl/core/atl.c b/drivers/cxl/core/atl.c new file mode 100644 index 000000000000..310668786189 --- /dev/null +++ b/drivers/cxl/core/atl.c @@ -0,0 +1,211 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (C) 2025 Advanced Micro Devices, Inc. + */ + +#include <linux/prmt.h> +#include <linux/pci.h> +#include <linux/acpi.h> + +#include <cxlmem.h> +#include "core.h" + +/* + * PRM Address Translation - CXL DPA to System Physical Address + * + * Reference: + * + * AMD Family 1Ah Models 00h–0Fh and Models 10h–1Fh + * ACPI v6.5 Porting Guide, Publication # 58088 + */ + +static const guid_t prm_cxl_dpa_spa_guid = + GUID_INIT(0xee41b397, 0x25d4, 0x452c, 0xad, 0x54, 0x48, 0xc6, 0xe3, + 0x48, 0x0b, 0x94); + +struct prm_cxl_dpa_spa_data { + u64 dpa; + u8 reserved; + u8 devfn; + u8 bus; + u8 segment; + u64 *spa; +} __packed; + +static u64 prm_cxl_dpa_spa(struct pci_dev *pci_dev, u64 dpa) +{ + struct prm_cxl_dpa_spa_data data; + u64 spa; + int rc; + + data = (struct prm_cxl_dpa_spa_data) { + .dpa = dpa, + .devfn = pci_dev->devfn, + .bus = pci_dev->bus->number, + .segment = pci_domain_nr(pci_dev->bus), + .spa = &spa, + }; + + rc = acpi_call_prm_handler(prm_cxl_dpa_spa_guid, &data); + if (rc) { + pci_dbg(pci_dev, "failed to get SPA for %#llx: %d\n", dpa, rc); + return ULLONG_MAX; + } + + pci_dbg(pci_dev, "PRM address translation: DPA -> SPA: %#llx -> %#llx\n", dpa, spa); + + return spa; +} + +static int cxl_prm_setup_root(struct cxl_root *cxl_root, void *data) +{ + struct cxl_region_context *ctx = data; + struct cxl_endpoint_decoder *cxled = ctx->cxled; + struct cxl_decoder *cxld = &cxled->cxld; + struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); + struct range hpa_range = ctx->hpa_range; + struct pci_dev *pci_dev; + u64 spa_len, len; + u64 addr, base_spa, base; + int ways, gran; + + /* + * When Normalized Addressing is enabled, the endpoint maintains a 1:1 + * mapping between HPA and DPA. If disabled, skip address translation + * and perform only a range check. + */ + if (hpa_range.start != cxled->dpa_res->start) + return 0; + + /* + * Endpoints are programmed passthrough in Normalized Addressing mode. + */ + if (ctx->interleave_ways != 1) { + dev_dbg(&cxld->dev, "unexpected interleaving config: ways: %d granularity: %d\n", + ctx->interleave_ways, ctx->interleave_granularity); + return -ENXIO; + } + + if (!cxlmd || !dev_is_pci(cxlmd->dev.parent)) { + dev_dbg(&cxld->dev, "No endpoint found: %s, range %#llx-%#llx\n", + dev_name(cxld->dev.parent), hpa_range.start, + hpa_range.end); + return -ENXIO; + } + + pci_dev = to_pci_dev(cxlmd->dev.parent); + + /* Translate HPA range to SPA. 
*/ + base = hpa_range.start; + hpa_range.start = prm_cxl_dpa_spa(pci_dev, hpa_range.start); + hpa_range.end = prm_cxl_dpa_spa(pci_dev, hpa_range.end); + base_spa = hpa_range.start; + + if (hpa_range.start == ULLONG_MAX || hpa_range.end == ULLONG_MAX) { + dev_dbg(cxld->dev.parent, + "CXL address translation: Failed to translate HPA range: %#llx-%#llx:%#llx-%#llx(%s)\n", + hpa_range.start, hpa_range.end, ctx->hpa_range.start, + ctx->hpa_range.end, dev_name(&cxld->dev)); + return -ENXIO; + } + + /* + * Since translated addresses include the interleaving offsets, align + * the range to 256 MB. + */ + hpa_range.start = ALIGN_DOWN(hpa_range.start, SZ_256M); + hpa_range.end = ALIGN(hpa_range.end, SZ_256M) - 1; + + len = range_len(&ctx->hpa_range); + spa_len = range_len(&hpa_range); + if (!len || !spa_len || spa_len % len) { + dev_dbg(cxld->dev.parent, + "CXL address translation: HPA range not contiguous: %#llx-%#llx:%#llx-%#llx(%s)\n", + hpa_range.start, hpa_range.end, ctx->hpa_range.start, + ctx->hpa_range.end, dev_name(&cxld->dev)); + return -ENXIO; + } + + ways = spa_len / len; + gran = SZ_256; + + /* + * Determine interleave granularity + * + * Note: The position of the chunk from one interleaving block to the + * next may vary and thus cannot be considered constant. Address offsets + * larger than the interleaving block size cannot be used to calculate + * the granularity. + */ + if (ways > 1) { + while (gran <= SZ_16M) { + addr = prm_cxl_dpa_spa(pci_dev, base + gran); + if (addr != base_spa + gran) + break; + gran <<= 1; + } + } + + if (gran > SZ_16M) { + dev_dbg(cxld->dev.parent, + "CXL address translation: Cannot determine granularity: %#llx-%#llx:%#llx-%#llx(%s)\n", + hpa_range.start, hpa_range.end, ctx->hpa_range.start, + ctx->hpa_range.end, dev_name(&cxld->dev)); + return -ENXIO; + } + + /* + * The current kernel implementation does not support endpoint + * setup with Normalized Addressing. It only translates an + * endpoint's DPA to the SPA range of the host bridge. + * Therefore, the endpoint address range cannot be determined, + * making a non-auto setup impossible. If a decoder requires + * address translation, reprogramming should be disabled and + * the decoder locked. + * + * The BIOS, however, provides all the necessary address + * translation data, which the kernel can use to reconfigure + * endpoint decoders with normalized addresses. Locking the + * decoders in the BIOS would prevent a capable kernel (or + * other operating systems) from shutting down auto-generated + * regions and managing resources dynamically. + * + * Indicate that Normalized Addressing is enabled. + */ + cxld->flags |= CXL_DECODER_F_LOCK; + cxld->flags |= CXL_DECODER_F_NORMALIZED_ADDRESSING; + + ctx->hpa_range = hpa_range; + ctx->interleave_ways = ways; + ctx->interleave_granularity = gran; + + dev_dbg(&cxld->dev, + "address mapping found for %s (hpa -> spa): %#llx+%#llx -> %#llx+%#llx ways:%d granularity:%d\n", + dev_name(cxlmd->dev.parent), base, len, hpa_range.start, + spa_len, ways, gran); + + return 0; +} + +void cxl_setup_prm_address_translation(struct cxl_root *cxl_root) +{ + struct device *host = cxl_root->port.uport_dev; + u64 spa; + struct prm_cxl_dpa_spa_data data = { .spa = &spa }; + int rc; + + /* + * Applies only to PCIe Host Bridges which are children of the CXL Root + * Device (HID=“ACPI0017”). Check this and drop cxl_test instances. 
+ */ + if (!acpi_match_device(host->driver->acpi_match_table, host)) + return; + + /* Check kernel (-EOPNOTSUPP) and firmware support (-ENODEV) */ + rc = acpi_call_prm_handler(prm_cxl_dpa_spa_guid, &data); + if (rc == -EOPNOTSUPP || rc == -ENODEV) + return; + + cxl_root->ops.translation_setup_root = cxl_prm_setup_root; +} +EXPORT_SYMBOL_NS_GPL(cxl_setup_prm_address_translation, "CXL"); diff --git a/drivers/cxl/core/cdat.c b/drivers/cxl/core/cdat.c index 7120b5f2e31f..18f0f2a25113 100644 --- a/drivers/cxl/core/cdat.c +++ b/drivers/cxl/core/cdat.c @@ -213,7 +213,7 @@ static int cxl_port_perf_data_calculate(struct cxl_port *port, if (!cxl_root) return -ENODEV; - if (!cxl_root->ops || !cxl_root->ops->qos_class) + if (!cxl_root->ops.qos_class) return -EOPNOTSUPP; xa_for_each(dsmas_xa, index, dent) { @@ -221,9 +221,9 @@ static int cxl_port_perf_data_calculate(struct cxl_port *port, cxl_coordinates_combine(dent->coord, dent->cdat_coord, ep_c); dent->entries = 1; - rc = cxl_root->ops->qos_class(cxl_root, - &dent->coord[ACCESS_COORDINATE_CPU], - 1, &qos_class); + rc = cxl_root->ops.qos_class(cxl_root, + &dent->coord[ACCESS_COORDINATE_CPU], + 1, &qos_class); if (rc != 1) continue; diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h index 1fb66132b777..007b8aff0238 100644 --- a/drivers/cxl/core/core.h +++ b/drivers/cxl/core/core.h @@ -19,6 +19,14 @@ enum cxl_detach_mode { }; #ifdef CONFIG_CXL_REGION + +struct cxl_region_context { + struct cxl_endpoint_decoder *cxled; + struct range hpa_range; + int interleave_ways; + int interleave_granularity; +}; + extern struct device_attribute dev_attr_create_pmem_region; extern struct device_attribute dev_attr_create_ram_region; extern struct device_attribute dev_attr_delete_region; @@ -144,8 +152,40 @@ int cxl_pci_get_bandwidth(struct pci_dev *pdev, struct access_coordinate *c); int cxl_port_get_switch_dport_bandwidth(struct cxl_port *port, struct access_coordinate *c); +static inline struct device *dport_to_host(struct cxl_dport *dport) +{ + struct cxl_port *port = dport->port; + + if (is_cxl_root(port)) + return port->uport_dev; + return &port->dev; +} +#ifdef CONFIG_CXL_RAS int cxl_ras_init(void); void cxl_ras_exit(void); +bool cxl_handle_ras(struct device *dev, void __iomem *ras_base); +void cxl_handle_cor_ras(struct device *dev, void __iomem *ras_base); +void cxl_dport_map_rch_aer(struct cxl_dport *dport); +void cxl_disable_rch_root_ints(struct cxl_dport *dport); +void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds); +void devm_cxl_dport_ras_setup(struct cxl_dport *dport); +#else +static inline int cxl_ras_init(void) +{ + return 0; +} +static inline void cxl_ras_exit(void) { } +static inline bool cxl_handle_ras(struct device *dev, void __iomem *ras_base) +{ + return false; +} +static inline void cxl_handle_cor_ras(struct device *dev, void __iomem *ras_base) { } +static inline void cxl_dport_map_rch_aer(struct cxl_dport *dport) { } +static inline void cxl_disable_rch_root_ints(struct cxl_dport *dport) { } +static inline void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds) { } +static inline void devm_cxl_dport_ras_setup(struct cxl_dport *dport) { } +#endif /* CONFIG_CXL_RAS */ + int cxl_gpf_port_setup(struct cxl_dport *dport); struct cxl_hdm; diff --git a/drivers/cxl/core/edac.c b/drivers/cxl/core/edac.c index 79994ca9bc9f..81160260e26b 100644 --- a/drivers/cxl/core/edac.c +++ b/drivers/cxl/core/edac.c @@ -1988,6 +1988,40 @@ static int cxl_memdev_soft_ppr_init(struct cxl_memdev *cxlmd, return 0; } +static void err_rec_free(void 
*_cxlmd) +{ + struct cxl_memdev *cxlmd = _cxlmd; + struct cxl_mem_err_rec *array_rec = cxlmd->err_rec_array; + struct cxl_event_gen_media *rec_gen_media; + struct cxl_event_dram *rec_dram; + unsigned long index; + + cxlmd->err_rec_array = NULL; + xa_for_each(&array_rec->rec_dram, index, rec_dram) + kfree(rec_dram); + xa_destroy(&array_rec->rec_dram); + + xa_for_each(&array_rec->rec_gen_media, index, rec_gen_media) + kfree(rec_gen_media); + xa_destroy(&array_rec->rec_gen_media); + kfree(array_rec); +} + +static int devm_cxl_memdev_setup_err_rec(struct cxl_memdev *cxlmd) +{ + struct cxl_mem_err_rec *array_rec = + kzalloc(sizeof(*array_rec), GFP_KERNEL); + + if (!array_rec) + return -ENOMEM; + + xa_init(&array_rec->rec_gen_media); + xa_init(&array_rec->rec_dram); + cxlmd->err_rec_array = array_rec; + + return devm_add_action_or_reset(&cxlmd->dev, err_rec_free, cxlmd); +} + int devm_cxl_memdev_edac_register(struct cxl_memdev *cxlmd) { struct edac_dev_feature ras_features[CXL_NR_EDAC_DEV_FEATURES]; @@ -2038,15 +2072,9 @@ int devm_cxl_memdev_edac_register(struct cxl_memdev *cxlmd) } if (repair_inst) { - struct cxl_mem_err_rec *array_rec = - devm_kzalloc(&cxlmd->dev, sizeof(*array_rec), - GFP_KERNEL); - if (!array_rec) - return -ENOMEM; - - xa_init(&array_rec->rec_gen_media); - xa_init(&array_rec->rec_dram); - cxlmd->err_rec_array = array_rec; + rc = devm_cxl_memdev_setup_err_rec(cxlmd); + if (rc) + return rc; } } @@ -2088,22 +2116,4 @@ int devm_cxl_region_edac_register(struct cxl_region *cxlr) } EXPORT_SYMBOL_NS_GPL(devm_cxl_region_edac_register, "CXL"); -void devm_cxl_memdev_edac_release(struct cxl_memdev *cxlmd) -{ - struct cxl_mem_err_rec *array_rec = cxlmd->err_rec_array; - struct cxl_event_gen_media *rec_gen_media; - struct cxl_event_dram *rec_dram; - unsigned long index; - - if (!IS_ENABLED(CONFIG_CXL_EDAC_MEM_REPAIR) || !array_rec) - return; - - xa_for_each(&array_rec->rec_dram, index, rec_dram) - kfree(rec_dram); - xa_destroy(&array_rec->rec_dram); - xa_for_each(&array_rec->rec_gen_media, index, rec_gen_media) - kfree(rec_gen_media); - xa_destroy(&array_rec->rec_gen_media); -} -EXPORT_SYMBOL_NS_GPL(devm_cxl_memdev_edac_release, "CXL"); diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c index eb5a3a7640c6..e3f0c39e6812 100644 --- a/drivers/cxl/core/hdm.c +++ b/drivers/cxl/core/hdm.c @@ -844,14 +844,13 @@ static int cxl_decoder_commit(struct cxl_decoder *cxld) scoped_guard(rwsem_read, &cxl_rwsem.dpa) setup_hw_decoder(cxld, hdm); - port->commit_end++; rc = cxld_await_commit(hdm, cxld->id); if (rc) { dev_dbg(&port->dev, "%s: error %d committing decoder\n", dev_name(&cxld->dev), rc); - cxld->reset(cxld); return rc; } + port->commit_end++; cxld->flags |= CXL_DECODER_F_ENABLE; return 0; @@ -966,7 +965,7 @@ static int cxl_setup_hdm_decoder_from_dvsec( rc = devm_cxl_dpa_reserve(cxled, *dpa_base, len, 0); if (rc) { dev_err(&port->dev, - "decoder%d.%d: Failed to reserve DPA range %#llx - %#llx\n (%d)", + "decoder%d.%d: Failed to reserve DPA range %#llx - %#llx: %d\n", port->id, cxld->id, *dpa_base, *dpa_base + len - 1, rc); return rc; } @@ -1117,7 +1116,7 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld, rc = devm_cxl_dpa_reserve(cxled, *dpa_base + skip, dpa_size, skip); if (rc) { dev_err(&port->dev, - "decoder%d.%d: Failed to reserve DPA range %#llx - %#llx\n (%d)", + "decoder%d.%d: Failed to reserve DPA range %#llx - %#llx: %d\n", port->id, cxld->id, *dpa_base, *dpa_base + dpa_size + skip - 1, rc); return rc; @@ -1219,12 +1218,12 @@ static int 
devm_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm, } /** - * __devm_cxl_switch_port_decoders_setup - allocate and setup switch decoders + * devm_cxl_switch_port_decoders_setup - allocate and setup switch decoders * @port: CXL port context * * Return 0 or -errno on error */ -int __devm_cxl_switch_port_decoders_setup(struct cxl_port *port) +int devm_cxl_switch_port_decoders_setup(struct cxl_port *port) { struct cxl_hdm *cxlhdm; @@ -1248,7 +1247,7 @@ int __devm_cxl_switch_port_decoders_setup(struct cxl_port *port) dev_err(&port->dev, "HDM decoder capability not found\n"); return -ENXIO; } -EXPORT_SYMBOL_NS_GPL(__devm_cxl_switch_port_decoders_setup, "CXL"); +EXPORT_SYMBOL_NS_GPL(devm_cxl_switch_port_decoders_setup, "CXL"); /** * devm_cxl_endpoint_decoders_setup - allocate and setup endpoint decoders diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c index e370d733e440..af3d0cc65138 100644 --- a/drivers/cxl/core/memdev.c +++ b/drivers/cxl/core/memdev.c @@ -27,7 +27,6 @@ static void cxl_memdev_release(struct device *dev) struct cxl_memdev *cxlmd = to_cxl_memdev(dev); ida_free(&cxl_memdev_ida, cxlmd->id); - devm_cxl_memdev_edac_release(cxlmd); kfree(cxlmd); } @@ -642,14 +641,24 @@ static void detach_memdev(struct work_struct *work) struct cxl_memdev *cxlmd; cxlmd = container_of(work, typeof(*cxlmd), detach_work); - device_release_driver(&cxlmd->dev); + + /* + * When the creator of @cxlmd sets ->attach it indicates CXL operation + * is required. In that case, @cxlmd detach escalates to parent device + * detach. + */ + if (cxlmd->attach) + device_release_driver(cxlmd->dev.parent); + else + device_release_driver(&cxlmd->dev); put_device(&cxlmd->dev); } static struct lock_class_key cxl_memdev_key; static struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds, - const struct file_operations *fops) + const struct file_operations *fops, + const struct cxl_memdev_attach *attach) { struct cxl_memdev *cxlmd; struct device *dev; @@ -665,6 +674,8 @@ static struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds, goto err; cxlmd->id = rc; cxlmd->depth = -1; + cxlmd->attach = attach; + cxlmd->endpoint = ERR_PTR(-ENXIO); dev = &cxlmd->dev; device_initialize(dev); @@ -1051,50 +1062,84 @@ static const struct file_operations cxl_memdev_fops = { .llseek = noop_llseek, }; -struct cxl_memdev *devm_cxl_add_memdev(struct device *host, - struct cxl_dev_state *cxlds) +/* + * Activate ioctl operations, no cxl_memdev_rwsem manipulation needed as this is + * ordered with cdev_add() publishing the device. + */ +static int cxlmd_add(struct cxl_memdev *cxlmd, struct cxl_dev_state *cxlds) +{ + int rc; + + cxlmd->cxlds = cxlds; + cxlds->cxlmd = cxlmd; + + rc = cdev_device_add(&cxlmd->cdev, &cxlmd->dev); + if (rc) { + /* + * The cdev was briefly live, shutdown any ioctl operations that + * saw that state. + */ + cxl_memdev_shutdown(&cxlmd->dev); + return rc; + } + + return 0; +} + +DEFINE_FREE(put_cxlmd, struct cxl_memdev *, + if (!IS_ERR_OR_NULL(_T)) put_device(&_T->dev)) + +static struct cxl_memdev *cxl_memdev_autoremove(struct cxl_memdev *cxlmd) +{ + int rc; + + /* + * If @attach is provided fail if the driver is not attached upon + * return. Note that failure here could be the result of a race to + * teardown the CXL port topology. I.e. cxl_mem_probe() could have + * succeeded and then cxl_mem unbound before the lock is acquired. 
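
The memdev allocation rework in the next hunk switches to the kernel's scope-based cleanup helpers (DEFINE_FREE(), __free(), no_free_ptr() from <linux/cleanup.h>). Those helpers build on the compiler's cleanup attribute; the standalone sketch below illustrates only that underlying pattern, and every name in it (demo_memdev, put_demo_memdev, take_ownership) is an illustrative stand-in rather than kernel API.

```c
/* Userspace sketch of the cleanup-attribute pattern behind DEFINE_FREE()/
 * __free()/no_free_ptr(); names below are illustrative, not kernel API. */
#include <stdio.h>
#include <stdlib.h>

struct demo_memdev {
	int id;
};

static void put_demo_memdev(struct demo_memdev **p)
{
	if (*p) {
		printf("auto-releasing mem%d\n", (*p)->id);
		free(*p);
	}
}

/* Equivalent in spirit to "__free(put_demo_memdev)" on a local variable */
#define cleanup_memdev __attribute__((cleanup(put_demo_memdev)))

/* "no_free_ptr()": hand ownership to the caller and disarm the cleanup */
static struct demo_memdev *take_ownership(struct demo_memdev **p)
{
	struct demo_memdev *ret = *p;

	*p = NULL;
	return ret;
}

static struct demo_memdev *alloc_memdev(int id, int fail_setup)
{
	cleanup_memdev struct demo_memdev *md = calloc(1, sizeof(*md));

	if (!md)
		return NULL;
	md->id = id;

	if (fail_setup)
		return NULL;	/* error path: cleanup frees md automatically */

	return take_ownership(&md);	/* success path: caller now owns md */
}

int main(void)
{
	struct demo_memdev *md = alloc_memdev(0, 0);

	alloc_memdev(1, 1);	/* exercises the automatic error-path free */
	free(md);
	return 0;
}
```
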
+ */ + guard(device)(&cxlmd->dev); + if (cxlmd->attach && !cxlmd->dev.driver) { + cxl_memdev_unregister(cxlmd); + return ERR_PTR(-ENXIO); + } + + rc = devm_add_action_or_reset(cxlmd->cxlds->dev, cxl_memdev_unregister, + cxlmd); + if (rc) + return ERR_PTR(rc); + + return cxlmd; +} + +/* + * Core helper for devm_cxl_add_memdev() that wants to both create a device and + * assert to the caller that upon return cxl_mem::probe() has been invoked. + */ +struct cxl_memdev *__devm_cxl_add_memdev(struct cxl_dev_state *cxlds, + const struct cxl_memdev_attach *attach) { - struct cxl_memdev *cxlmd; struct device *dev; - struct cdev *cdev; int rc; - cxlmd = cxl_memdev_alloc(cxlds, &cxl_memdev_fops); + struct cxl_memdev *cxlmd __free(put_cxlmd) = + cxl_memdev_alloc(cxlds, &cxl_memdev_fops, attach); if (IS_ERR(cxlmd)) return cxlmd; dev = &cxlmd->dev; rc = dev_set_name(dev, "mem%d", cxlmd->id); if (rc) - goto err; - - /* - * Activate ioctl operations, no cxl_memdev_rwsem manipulation - * needed as this is ordered with cdev_add() publishing the device. - */ - cxlmd->cxlds = cxlds; - cxlds->cxlmd = cxlmd; - - cdev = &cxlmd->cdev; - rc = cdev_device_add(cdev, dev); - if (rc) - goto err; + return ERR_PTR(rc); - rc = devm_add_action_or_reset(host, cxl_memdev_unregister, cxlmd); + rc = cxlmd_add(cxlmd, cxlds); if (rc) return ERR_PTR(rc); - return cxlmd; -err: - /* - * The cdev was briefly live, shutdown any ioctl operations that - * saw that state. - */ - cxl_memdev_shutdown(dev); - put_device(dev); - return ERR_PTR(rc); + return cxl_memdev_autoremove(no_free_ptr(cxlmd)); } -EXPORT_SYMBOL_NS_GPL(devm_cxl_add_memdev, "CXL"); +EXPORT_SYMBOL_FOR_MODULES(__devm_cxl_add_memdev, "cxl_mem"); static void sanitize_teardown_notifier(void *data) { diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c index 5b023a0178a4..f96ce884a213 100644 --- a/drivers/cxl/core/pci.c +++ b/drivers/cxl/core/pci.c @@ -41,14 +41,14 @@ static int pci_get_port_num(struct pci_dev *pdev) } /** - * __devm_cxl_add_dport_by_dev - allocate a dport by dport device + * devm_cxl_add_dport_by_dev - allocate a dport by dport device * @port: cxl_port that hosts the dport * @dport_dev: 'struct device' of the dport * * Returns the allocated dport on success or ERR_PTR() of -errno on error */ -struct cxl_dport *__devm_cxl_add_dport_by_dev(struct cxl_port *port, - struct device *dport_dev) +struct cxl_dport *devm_cxl_add_dport_by_dev(struct cxl_port *port, + struct device *dport_dev) { struct cxl_register_map map; struct pci_dev *pdev; @@ -69,7 +69,7 @@ struct cxl_dport *__devm_cxl_add_dport_by_dev(struct cxl_port *port, device_lock_assert(&port->dev); return devm_cxl_add_dport(port, dport_dev, port_num, map.resource); } -EXPORT_SYMBOL_NS_GPL(__devm_cxl_add_dport_by_dev, "CXL"); +EXPORT_SYMBOL_NS_GPL(devm_cxl_add_dport_by_dev, "CXL"); static int cxl_dvsec_mem_range_valid(struct cxl_dev_state *cxlds, int id) { @@ -86,12 +86,12 @@ static int cxl_dvsec_mem_range_valid(struct cxl_dev_state *cxlds, int id) i = 1; do { rc = pci_read_config_dword(pdev, - d + CXL_DVSEC_RANGE_SIZE_LOW(id), + d + PCI_DVSEC_CXL_RANGE_SIZE_LOW(id), &temp); if (rc) return rc; - valid = FIELD_GET(CXL_DVSEC_MEM_INFO_VALID, temp); + valid = FIELD_GET(PCI_DVSEC_CXL_MEM_INFO_VALID, temp); if (valid) break; msleep(1000); @@ -121,11 +121,11 @@ static int cxl_dvsec_mem_range_active(struct cxl_dev_state *cxlds, int id) /* Check MEM ACTIVE bit, up to 60s timeout by default */ for (i = media_ready_timeout; i; i--) { rc = pci_read_config_dword( - pdev, d + CXL_DVSEC_RANGE_SIZE_LOW(id), &temp); 
+ pdev, d + PCI_DVSEC_CXL_RANGE_SIZE_LOW(id), &temp); if (rc) return rc; - active = FIELD_GET(CXL_DVSEC_MEM_ACTIVE, temp); + active = FIELD_GET(PCI_DVSEC_CXL_MEM_ACTIVE, temp); if (active) break; msleep(1000); @@ -154,11 +154,11 @@ int cxl_await_media_ready(struct cxl_dev_state *cxlds) u16 cap; rc = pci_read_config_word(pdev, - d + CXL_DVSEC_CAP_OFFSET, &cap); + d + PCI_DVSEC_CXL_CAP, &cap); if (rc) return rc; - hdm_count = FIELD_GET(CXL_DVSEC_HDM_COUNT_MASK, cap); + hdm_count = FIELD_GET(PCI_DVSEC_CXL_HDM_COUNT, cap); for (i = 0; i < hdm_count; i++) { rc = cxl_dvsec_mem_range_valid(cxlds, i); if (rc) @@ -186,16 +186,16 @@ static int cxl_set_mem_enable(struct cxl_dev_state *cxlds, u16 val) u16 ctrl; int rc; - rc = pci_read_config_word(pdev, d + CXL_DVSEC_CTRL_OFFSET, &ctrl); + rc = pci_read_config_word(pdev, d + PCI_DVSEC_CXL_CTRL, &ctrl); if (rc < 0) return rc; - if ((ctrl & CXL_DVSEC_MEM_ENABLE) == val) + if ((ctrl & PCI_DVSEC_CXL_MEM_ENABLE) == val) return 1; - ctrl &= ~CXL_DVSEC_MEM_ENABLE; + ctrl &= ~PCI_DVSEC_CXL_MEM_ENABLE; ctrl |= val; - rc = pci_write_config_word(pdev, d + CXL_DVSEC_CTRL_OFFSET, ctrl); + rc = pci_write_config_word(pdev, d + PCI_DVSEC_CXL_CTRL, ctrl); if (rc < 0) return rc; @@ -211,7 +211,7 @@ static int devm_cxl_enable_mem(struct device *host, struct cxl_dev_state *cxlds) { int rc; - rc = cxl_set_mem_enable(cxlds, CXL_DVSEC_MEM_ENABLE); + rc = cxl_set_mem_enable(cxlds, PCI_DVSEC_CXL_MEM_ENABLE); if (rc < 0) return rc; if (rc > 0) @@ -273,11 +273,11 @@ int cxl_dvsec_rr_decode(struct cxl_dev_state *cxlds, return -ENXIO; } - rc = pci_read_config_word(pdev, d + CXL_DVSEC_CAP_OFFSET, &cap); + rc = pci_read_config_word(pdev, d + PCI_DVSEC_CXL_CAP, &cap); if (rc) return rc; - if (!(cap & CXL_DVSEC_MEM_CAPABLE)) { + if (!(cap & PCI_DVSEC_CXL_MEM_CAPABLE)) { dev_dbg(dev, "Not MEM Capable\n"); return -ENXIO; } @@ -288,7 +288,7 @@ int cxl_dvsec_rr_decode(struct cxl_dev_state *cxlds, * driver is for a spec defined class code which must be CXL.mem * capable, there is no point in continuing to enable CXL.mem. */ - hdm_count = FIELD_GET(CXL_DVSEC_HDM_COUNT_MASK, cap); + hdm_count = FIELD_GET(PCI_DVSEC_CXL_HDM_COUNT, cap); if (!hdm_count || hdm_count > 2) return -EINVAL; @@ -297,11 +297,11 @@ int cxl_dvsec_rr_decode(struct cxl_dev_state *cxlds, * disabled, and they will remain moot after the HDM Decoder * capability is enabled. 
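
The register rename above keeps the same FIELD_GET() style of decoding the DVSEC capability words. For readers unfamiliar with <linux/bitfield.h>, here is a minimal userspace approximation of that mask-based extraction; the mask and capability values are fabricated for the demo and are not the real PCI_DVSEC_CXL_* definitions.

```c
/* Minimal stand-in for FIELD_GET(): extract a field described by a
 * contiguous bitmask. The example mask is illustrative only. */
#include <stdint.h>
#include <stdio.h>

#define DEMO_HDM_COUNT_MASK 0x30u	/* hypothetical bits 5:4 */

static uint32_t demo_field_get(uint32_t mask, uint32_t val)
{
	/* shift the masked bits down by the mask's lowest set bit */
	return (val & mask) >> __builtin_ctz(mask);
}

int main(void)
{
	uint32_t cap = 0x21u;	/* fabricated capability word: field value 2 */

	printf("hdm_count = %u\n", demo_field_get(DEMO_HDM_COUNT_MASK, cap));
	return 0;
}
```
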
*/ - rc = pci_read_config_word(pdev, d + CXL_DVSEC_CTRL_OFFSET, &ctrl); + rc = pci_read_config_word(pdev, d + PCI_DVSEC_CXL_CTRL, &ctrl); if (rc) return rc; - info->mem_enabled = FIELD_GET(CXL_DVSEC_MEM_ENABLE, ctrl); + info->mem_enabled = FIELD_GET(PCI_DVSEC_CXL_MEM_ENABLE, ctrl); if (!info->mem_enabled) return 0; @@ -314,35 +314,35 @@ int cxl_dvsec_rr_decode(struct cxl_dev_state *cxlds, return rc; rc = pci_read_config_dword( - pdev, d + CXL_DVSEC_RANGE_SIZE_HIGH(i), &temp); + pdev, d + PCI_DVSEC_CXL_RANGE_SIZE_HIGH(i), &temp); if (rc) return rc; size = (u64)temp << 32; rc = pci_read_config_dword( - pdev, d + CXL_DVSEC_RANGE_SIZE_LOW(i), &temp); + pdev, d + PCI_DVSEC_CXL_RANGE_SIZE_LOW(i), &temp); if (rc) return rc; - size |= temp & CXL_DVSEC_MEM_SIZE_LOW_MASK; + size |= temp & PCI_DVSEC_CXL_MEM_SIZE_LOW; if (!size) { continue; } rc = pci_read_config_dword( - pdev, d + CXL_DVSEC_RANGE_BASE_HIGH(i), &temp); + pdev, d + PCI_DVSEC_CXL_RANGE_BASE_HIGH(i), &temp); if (rc) return rc; base = (u64)temp << 32; rc = pci_read_config_dword( - pdev, d + CXL_DVSEC_RANGE_BASE_LOW(i), &temp); + pdev, d + PCI_DVSEC_CXL_RANGE_BASE_LOW(i), &temp); if (rc) return rc; - base |= temp & CXL_DVSEC_MEM_BASE_LOW_MASK; + base |= temp & PCI_DVSEC_CXL_MEM_BASE_LOW; info->dvsec_range[ranges++] = (struct range) { .start = base, @@ -632,324 +632,6 @@ err: } EXPORT_SYMBOL_NS_GPL(read_cdat_data, "CXL"); -static void __cxl_handle_cor_ras(struct cxl_dev_state *cxlds, - void __iomem *ras_base) -{ - void __iomem *addr; - u32 status; - - if (!ras_base) - return; - - addr = ras_base + CXL_RAS_CORRECTABLE_STATUS_OFFSET; - status = readl(addr); - if (status & CXL_RAS_CORRECTABLE_STATUS_MASK) { - writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr); - trace_cxl_aer_correctable_error(cxlds->cxlmd, status); - } -} - -static void cxl_handle_endpoint_cor_ras(struct cxl_dev_state *cxlds) -{ - return __cxl_handle_cor_ras(cxlds, cxlds->regs.ras); -} - -/* CXL spec rev3.0 8.2.4.16.1 */ -static void header_log_copy(void __iomem *ras_base, u32 *log) -{ - void __iomem *addr; - u32 *log_addr; - int i, log_u32_size = CXL_HEADERLOG_SIZE / sizeof(u32); - - addr = ras_base + CXL_RAS_HEADER_LOG_OFFSET; - log_addr = log; - - for (i = 0; i < log_u32_size; i++) { - *log_addr = readl(addr); - log_addr++; - addr += sizeof(u32); - } -} - -/* - * Log the state of the RAS status registers and prepare them to log the - * next error status. Return 1 if reset needed. 
- */ -static bool __cxl_handle_ras(struct cxl_dev_state *cxlds, - void __iomem *ras_base) -{ - u32 hl[CXL_HEADERLOG_SIZE_U32]; - void __iomem *addr; - u32 status; - u32 fe; - - if (!ras_base) - return false; - - addr = ras_base + CXL_RAS_UNCORRECTABLE_STATUS_OFFSET; - status = readl(addr); - if (!(status & CXL_RAS_UNCORRECTABLE_STATUS_MASK)) - return false; - - /* If multiple errors, log header points to first error from ctrl reg */ - if (hweight32(status) > 1) { - void __iomem *rcc_addr = - ras_base + CXL_RAS_CAP_CONTROL_OFFSET; - - fe = BIT(FIELD_GET(CXL_RAS_CAP_CONTROL_FE_MASK, - readl(rcc_addr))); - } else { - fe = status; - } - - header_log_copy(ras_base, hl); - trace_cxl_aer_uncorrectable_error(cxlds->cxlmd, status, fe, hl); - writel(status & CXL_RAS_UNCORRECTABLE_STATUS_MASK, addr); - - return true; -} - -static bool cxl_handle_endpoint_ras(struct cxl_dev_state *cxlds) -{ - return __cxl_handle_ras(cxlds, cxlds->regs.ras); -} - -#ifdef CONFIG_PCIEAER_CXL - -static void cxl_dport_map_rch_aer(struct cxl_dport *dport) -{ - resource_size_t aer_phys; - struct device *host; - u16 aer_cap; - - aer_cap = cxl_rcrb_to_aer(dport->dport_dev, dport->rcrb.base); - if (aer_cap) { - host = dport->reg_map.host; - aer_phys = aer_cap + dport->rcrb.base; - dport->regs.dport_aer = devm_cxl_iomap_block(host, aer_phys, - sizeof(struct aer_capability_regs)); - } -} - -static void cxl_dport_map_ras(struct cxl_dport *dport) -{ - struct cxl_register_map *map = &dport->reg_map; - struct device *dev = dport->dport_dev; - - if (!map->component_map.ras.valid) - dev_dbg(dev, "RAS registers not found\n"); - else if (cxl_map_component_regs(map, &dport->regs.component, - BIT(CXL_CM_CAP_CAP_ID_RAS))) - dev_dbg(dev, "Failed to map RAS capability.\n"); -} - -static void cxl_disable_rch_root_ints(struct cxl_dport *dport) -{ - void __iomem *aer_base = dport->regs.dport_aer; - u32 aer_cmd_mask, aer_cmd; - - if (!aer_base) - return; - - /* - * Disable RCH root port command interrupts. - * CXL 3.0 12.2.1.1 - RCH Downstream Port-detected Errors - * - * This sequence may not be necessary. CXL spec states disabling - * the root cmd register's interrupts is required. But, PCI spec - * shows these are disabled by default on reset. - */ - aer_cmd_mask = (PCI_ERR_ROOT_CMD_COR_EN | - PCI_ERR_ROOT_CMD_NONFATAL_EN | - PCI_ERR_ROOT_CMD_FATAL_EN); - aer_cmd = readl(aer_base + PCI_ERR_ROOT_COMMAND); - aer_cmd &= ~aer_cmd_mask; - writel(aer_cmd, aer_base + PCI_ERR_ROOT_COMMAND); -} - -/** - * cxl_dport_init_ras_reporting - Setup CXL RAS report on this dport - * @dport: the cxl_dport that needs to be initialized - * @host: host device for devm operations - */ -void cxl_dport_init_ras_reporting(struct cxl_dport *dport, struct device *host) -{ - dport->reg_map.host = host; - cxl_dport_map_ras(dport); - - if (dport->rch) { - struct pci_host_bridge *host_bridge = to_pci_host_bridge(dport->dport_dev); - - if (!host_bridge->native_aer) - return; - - cxl_dport_map_rch_aer(dport); - cxl_disable_rch_root_ints(dport); - } -} -EXPORT_SYMBOL_NS_GPL(cxl_dport_init_ras_reporting, "CXL"); - -static void cxl_handle_rdport_cor_ras(struct cxl_dev_state *cxlds, - struct cxl_dport *dport) -{ - return __cxl_handle_cor_ras(cxlds, dport->regs.ras); -} - -static bool cxl_handle_rdport_ras(struct cxl_dev_state *cxlds, - struct cxl_dport *dport) -{ - return __cxl_handle_ras(cxlds, dport->regs.ras); -} - -/* - * Copy the AER capability registers using 32 bit read accesses. - * This is necessary because RCRB AER capability is MMIO mapped. 
Clear the - * status after copying. - * - * @aer_base: base address of AER capability block in RCRB - * @aer_regs: destination for copying AER capability - */ -static bool cxl_rch_get_aer_info(void __iomem *aer_base, - struct aer_capability_regs *aer_regs) -{ - int read_cnt = sizeof(struct aer_capability_regs) / sizeof(u32); - u32 *aer_regs_buf = (u32 *)aer_regs; - int n; - - if (!aer_base) - return false; - - /* Use readl() to guarantee 32-bit accesses */ - for (n = 0; n < read_cnt; n++) - aer_regs_buf[n] = readl(aer_base + n * sizeof(u32)); - - writel(aer_regs->uncor_status, aer_base + PCI_ERR_UNCOR_STATUS); - writel(aer_regs->cor_status, aer_base + PCI_ERR_COR_STATUS); - - return true; -} - -/* Get AER severity. Return false if there is no error. */ -static bool cxl_rch_get_aer_severity(struct aer_capability_regs *aer_regs, - int *severity) -{ - if (aer_regs->uncor_status & ~aer_regs->uncor_mask) { - if (aer_regs->uncor_status & PCI_ERR_ROOT_FATAL_RCV) - *severity = AER_FATAL; - else - *severity = AER_NONFATAL; - return true; - } - - if (aer_regs->cor_status & ~aer_regs->cor_mask) { - *severity = AER_CORRECTABLE; - return true; - } - - return false; -} - -static void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds) -{ - struct pci_dev *pdev = to_pci_dev(cxlds->dev); - struct aer_capability_regs aer_regs; - struct cxl_dport *dport; - int severity; - - struct cxl_port *port __free(put_cxl_port) = - cxl_pci_find_port(pdev, &dport); - if (!port) - return; - - if (!cxl_rch_get_aer_info(dport->regs.dport_aer, &aer_regs)) - return; - - if (!cxl_rch_get_aer_severity(&aer_regs, &severity)) - return; - - pci_print_aer(pdev, severity, &aer_regs); - - if (severity == AER_CORRECTABLE) - cxl_handle_rdport_cor_ras(cxlds, dport); - else - cxl_handle_rdport_ras(cxlds, dport); -} - -#else -static void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds) { } -#endif - -void cxl_cor_error_detected(struct pci_dev *pdev) -{ - struct cxl_dev_state *cxlds = pci_get_drvdata(pdev); - struct device *dev = &cxlds->cxlmd->dev; - - scoped_guard(device, dev) { - if (!dev->driver) { - dev_warn(&pdev->dev, - "%s: memdev disabled, abort error handling\n", - dev_name(dev)); - return; - } - - if (cxlds->rcd) - cxl_handle_rdport_errors(cxlds); - - cxl_handle_endpoint_cor_ras(cxlds); - } -} -EXPORT_SYMBOL_NS_GPL(cxl_cor_error_detected, "CXL"); - -pci_ers_result_t cxl_error_detected(struct pci_dev *pdev, - pci_channel_state_t state) -{ - struct cxl_dev_state *cxlds = pci_get_drvdata(pdev); - struct cxl_memdev *cxlmd = cxlds->cxlmd; - struct device *dev = &cxlmd->dev; - bool ue; - - scoped_guard(device, dev) { - if (!dev->driver) { - dev_warn(&pdev->dev, - "%s: memdev disabled, abort error handling\n", - dev_name(dev)); - return PCI_ERS_RESULT_DISCONNECT; - } - - if (cxlds->rcd) - cxl_handle_rdport_errors(cxlds); - /* - * A frozen channel indicates an impending reset which is fatal to - * CXL.mem operation, and will likely crash the system. On the off - * chance the situation is recoverable dump the status of the RAS - * capability registers and bounce the active state of the memdev. 
- */ - ue = cxl_handle_endpoint_ras(cxlds); - } - - - switch (state) { - case pci_channel_io_normal: - if (ue) { - device_release_driver(dev); - return PCI_ERS_RESULT_NEED_RESET; - } - return PCI_ERS_RESULT_CAN_RECOVER; - case pci_channel_io_frozen: - dev_warn(&pdev->dev, - "%s: frozen state error detected, disable CXL.mem\n", - dev_name(dev)); - device_release_driver(dev); - return PCI_ERS_RESULT_NEED_RESET; - case pci_channel_io_perm_failure: - dev_warn(&pdev->dev, - "failure state error detected, request disconnect\n"); - return PCI_ERS_RESULT_DISCONNECT; - } - return PCI_ERS_RESULT_NEED_RESET; -} -EXPORT_SYMBOL_NS_GPL(cxl_error_detected, "CXL"); - static int cxl_flit_size(struct pci_dev *pdev) { if (cxl_pci_flit_256(pdev)) @@ -1068,7 +750,7 @@ u16 cxl_gpf_get_dvsec(struct device *dev) is_port = false; dvsec = pci_find_dvsec_capability(pdev, PCI_VENDOR_ID_CXL, - is_port ? CXL_DVSEC_PORT_GPF : CXL_DVSEC_DEVICE_GPF); + is_port ? PCI_DVSEC_CXL_PORT_GPF : PCI_DVSEC_CXL_DEVICE_GPF); if (!dvsec) dev_warn(dev, "%s GPF DVSEC not present\n", is_port ? "Port" : "Device"); @@ -1084,14 +766,14 @@ static int update_gpf_port_dvsec(struct pci_dev *pdev, int dvsec, int phase) switch (phase) { case 1: - offset = CXL_DVSEC_PORT_GPF_PHASE_1_CONTROL_OFFSET; - base = CXL_DVSEC_PORT_GPF_PHASE_1_TMO_BASE_MASK; - scale = CXL_DVSEC_PORT_GPF_PHASE_1_TMO_SCALE_MASK; + offset = PCI_DVSEC_CXL_PORT_GPF_PHASE_1_CONTROL; + base = PCI_DVSEC_CXL_PORT_GPF_PHASE_1_TMO_BASE; + scale = PCI_DVSEC_CXL_PORT_GPF_PHASE_1_TMO_SCALE; break; case 2: - offset = CXL_DVSEC_PORT_GPF_PHASE_2_CONTROL_OFFSET; - base = CXL_DVSEC_PORT_GPF_PHASE_2_TMO_BASE_MASK; - scale = CXL_DVSEC_PORT_GPF_PHASE_2_TMO_SCALE_MASK; + offset = PCI_DVSEC_CXL_PORT_GPF_PHASE_2_CONTROL; + base = PCI_DVSEC_CXL_PORT_GPF_PHASE_2_TMO_BASE; + scale = PCI_DVSEC_CXL_PORT_GPF_PHASE_2_TMO_SCALE; break; default: return -EINVAL; diff --git a/drivers/cxl/core/pmem.c b/drivers/cxl/core/pmem.c index 8853415c106a..e7b1e6fa0ea0 100644 --- a/drivers/cxl/core/pmem.c +++ b/drivers/cxl/core/pmem.c @@ -237,12 +237,13 @@ static void cxlmd_release_nvdimm(void *_cxlmd) /** * devm_cxl_add_nvdimm() - add a bridge between a cxl_memdev and an nvdimm - * @parent_port: parent port for the (to be added) @cxlmd endpoint port - * @cxlmd: cxl_memdev instance that will perform LIBNVDIMM operations + * @host: host device for devm operations + * @port: any port in the CXL topology to find the nvdimm-bridge device + * @cxlmd: parent of the to be created cxl_nvdimm device * * Return: 0 on success negative error code on failure. 
*/ -int devm_cxl_add_nvdimm(struct cxl_port *parent_port, +int devm_cxl_add_nvdimm(struct device *host, struct cxl_port *port, struct cxl_memdev *cxlmd) { struct cxl_nvdimm_bridge *cxl_nvb; @@ -250,7 +251,7 @@ int devm_cxl_add_nvdimm(struct cxl_port *parent_port, struct device *dev; int rc; - cxl_nvb = cxl_find_nvdimm_bridge(parent_port); + cxl_nvb = cxl_find_nvdimm_bridge(port); if (!cxl_nvb) return -ENODEV; @@ -270,10 +271,10 @@ int devm_cxl_add_nvdimm(struct cxl_port *parent_port, if (rc) goto err; - dev_dbg(&cxlmd->dev, "register %s\n", dev_name(dev)); + dev_dbg(host, "register %s\n", dev_name(dev)); /* @cxlmd carries a reference on @cxl_nvb until cxlmd_release_nvdimm */ - return devm_add_action_or_reset(&cxlmd->dev, cxlmd_release_nvdimm, cxlmd); + return devm_add_action_or_reset(host, cxlmd_release_nvdimm, cxlmd); err: put_device(dev); diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index 3310dbfae9d6..fea8d5f5f331 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -778,7 +778,7 @@ static int cxl_setup_comp_regs(struct device *host, struct cxl_register_map *map return cxl_setup_regs(map); } -static int cxl_port_setup_regs(struct cxl_port *port, +int cxl_port_setup_regs(struct cxl_port *port, resource_size_t component_reg_phys) { if (dev_is_platform(port->uport_dev)) @@ -786,6 +786,7 @@ static int cxl_port_setup_regs(struct cxl_port *port, return cxl_setup_comp_regs(&port->dev, &port->reg_map, component_reg_phys); } +EXPORT_SYMBOL_NS_GPL(cxl_port_setup_regs, "CXL"); static int cxl_dport_setup_regs(struct device *host, struct cxl_dport *dport, resource_size_t component_reg_phys) @@ -822,16 +823,18 @@ DEFINE_DEBUGFS_ATTRIBUTE(cxl_einj_inject_fops, NULL, cxl_einj_inject, static void cxl_debugfs_create_dport_dir(struct cxl_dport *dport) { + struct cxl_port *parent = parent_port_of(dport->port); struct dentry *dir; if (!einj_cxl_is_initialized()) return; /* - * dport_dev needs to be a PCIe port for CXL 2.0+ ports because - * EINJ expects a dport SBDF to be specified for 2.0 error injection. 
+ * Protocol error injection is only available for CXL 2.0+ root ports + * and CXL 1.1 downstream ports */ - if (!dport->rch && !dev_is_pci(dport->dport_dev)) + if (!dport->rch && + !(dev_is_pci(dport->dport_dev) && parent && is_cxl_root(parent))) return; dir = cxl_debugfs_create_dir(dev_name(dport->dport_dev)); @@ -954,19 +957,15 @@ struct cxl_port *devm_cxl_add_port(struct device *host, } EXPORT_SYMBOL_NS_GPL(devm_cxl_add_port, "CXL"); -struct cxl_root *devm_cxl_add_root(struct device *host, - const struct cxl_root_ops *ops) +struct cxl_root *devm_cxl_add_root(struct device *host) { - struct cxl_root *cxl_root; struct cxl_port *port; port = devm_cxl_add_port(host, host, CXL_RESOURCE_NONE, NULL); if (IS_ERR(port)) return ERR_CAST(port); - cxl_root = to_cxl_root(port); - cxl_root->ops = ops; - return cxl_root; + return to_cxl_root(port); } EXPORT_SYMBOL_NS_GPL(devm_cxl_add_root, "CXL"); @@ -1066,11 +1065,15 @@ static int add_dport(struct cxl_port *port, struct cxl_dport *dport) return -EBUSY; } + /* Arrange for dport_dev to be valid through remove_dport() */ + struct device *dev __free(put_device) = get_device(dport->dport_dev); + rc = xa_insert(&port->dports, (unsigned long)dport->dport_dev, dport, GFP_KERNEL); if (rc) return rc; + retain_and_null_ptr(dev); port->nr_dports++; return 0; } @@ -1099,6 +1102,7 @@ static void cxl_dport_remove(void *data) struct cxl_dport *dport = data; struct cxl_port *port = dport->port; + port->nr_dports--; xa_erase(&port->dports, (unsigned long) dport->dport_dev); put_device(dport->dport_dev); } @@ -1113,6 +1117,48 @@ static void cxl_dport_unlink(void *data) sysfs_remove_link(&port->dev.kobj, link_name); } +static void free_dport(void *dport) +{ + kfree(dport); +} + +/* + * Upon return either a group is established with one action (free_dport()), or + * no group established and @dport is freed. 
+ */ +static void *cxl_dport_open_dr_group_or_free(struct cxl_dport *dport) +{ + int rc; + struct device *host = dport_to_host(dport); + void *group = devres_open_group(host, dport, GFP_KERNEL); + + if (!group) { + kfree(dport); + return NULL; + } + + rc = devm_add_action_or_reset(host, free_dport, dport); + if (rc) { + devres_release_group(host, group); + return NULL; + } + + return group; +} + +static void cxl_dport_close_dr_group(struct cxl_dport *dport, void *group) +{ + devres_close_group(dport_to_host(dport), group); +} + +static void del_dport(struct cxl_dport *dport) +{ + devres_release_group(dport_to_host(dport), dport); +} + +/* The dport group id is the dport */ +DEFINE_FREE(cxl_dport_release_dr_group, void *, if (_T) del_dport(_T)) + static struct cxl_dport * __devm_cxl_add_dport(struct cxl_port *port, struct device *dport_dev, int port_id, resource_size_t component_reg_phys, @@ -1138,14 +1184,20 @@ __devm_cxl_add_dport(struct cxl_port *port, struct device *dport_dev, CXL_TARGET_STRLEN) return ERR_PTR(-EINVAL); - dport = devm_kzalloc(host, sizeof(*dport), GFP_KERNEL); + dport = kzalloc(sizeof(*dport), GFP_KERNEL); if (!dport) return ERR_PTR(-ENOMEM); + /* Just enough init to manage the devres group */ dport->dport_dev = dport_dev; dport->port_id = port_id; dport->port = port; + void *dport_dr_group __free(cxl_dport_release_dr_group) = + cxl_dport_open_dr_group_or_free(dport); + if (!dport_dr_group) + return ERR_PTR(-ENOMEM); + if (rcrb == CXL_RESOURCE_NONE) { rc = cxl_dport_setup_regs(&port->dev, dport, component_reg_phys); @@ -1181,21 +1233,6 @@ __devm_cxl_add_dport(struct cxl_port *port, struct device *dport_dev, if (rc) return ERR_PTR(rc); - /* - * Setup port register if this is the first dport showed up. Having - * a dport also means that there is at least 1 active link. 
- */ - if (port->nr_dports == 1 && - port->component_reg_phys != CXL_RESOURCE_NONE) { - rc = cxl_port_setup_regs(port, port->component_reg_phys); - if (rc) { - xa_erase(&port->dports, (unsigned long)dport->dport_dev); - return ERR_PTR(rc); - } - port->component_reg_phys = CXL_RESOURCE_NONE; - } - - get_device(dport_dev); rc = devm_add_action_or_reset(host, cxl_dport_remove, dport); if (rc) return ERR_PTR(rc); @@ -1213,6 +1250,12 @@ __devm_cxl_add_dport(struct cxl_port *port, struct device *dport_dev, cxl_debugfs_create_dport_dir(dport); + if (!dport->rch) + devm_cxl_dport_ras_setup(dport); + + /* keep the group, and mark the end of devm actions */ + cxl_dport_close_dr_group(dport, no_free_ptr(dport_dr_group)); + return dport; } @@ -1439,15 +1482,6 @@ static void delete_switch_port(struct cxl_port *port) devm_release_action(port->dev.parent, unregister_port, port); } -static void del_dport(struct cxl_dport *dport) -{ - struct cxl_port *port = dport->port; - - devm_release_action(&port->dev, cxl_dport_unlink, dport); - devm_release_action(&port->dev, cxl_dport_remove, dport); - devm_kfree(&port->dev, dport); -} - static void del_dports(struct cxl_port *port) { struct cxl_dport *dport; @@ -1554,10 +1588,20 @@ static int match_port_by_uport(struct device *dev, const void *data) return 0; port = to_cxl_port(dev); + /* Endpoint ports are hosted by memdevs */ + if (is_cxl_memdev(port->uport_dev)) + return uport_dev == port->uport_dev->parent; return uport_dev == port->uport_dev; } -/* +/** + * find_cxl_port_by_uport - Find a CXL port device companion + * @uport_dev: Device that acts as a switch or endpoint in the CXL hierarchy + * + * In the case of endpoint ports recall that port->uport_dev points to a 'struct + * cxl_memdev' device. So, the @uport_dev argument is the parent device of the + * 'struct cxl_memdev' in that case. + * * Function takes a device reference on the port device. Caller should do a * put_device() when done. 
*/ @@ -1597,47 +1641,44 @@ static int update_decoder_targets(struct device *dev, void *data) return 0; } -DEFINE_FREE(del_cxl_dport, struct cxl_dport *, if (!IS_ERR_OR_NULL(_T)) del_dport(_T)) -static struct cxl_dport *cxl_port_add_dport(struct cxl_port *port, - struct device *dport_dev) +void cxl_port_update_decoder_targets(struct cxl_port *port, + struct cxl_dport *dport) { - struct cxl_dport *dport; - int rc; + device_for_each_child(&port->dev, dport, update_decoder_targets); +} +EXPORT_SYMBOL_NS_GPL(cxl_port_update_decoder_targets, "CXL"); - device_lock_assert(&port->dev); - if (!port->dev.driver) - return ERR_PTR(-ENXIO); +static bool dport_exists(struct cxl_port *port, struct device *dport_dev) +{ + struct cxl_dport *dport = cxl_find_dport_by_dev(port, dport_dev); - dport = cxl_find_dport_by_dev(port, dport_dev); if (dport) { dev_dbg(&port->dev, "dport%d:%s already exists\n", dport->port_id, dev_name(dport_dev)); - return ERR_PTR(-EBUSY); + return true; } - struct cxl_dport *new_dport __free(del_cxl_dport) = - devm_cxl_add_dport_by_dev(port, dport_dev); - if (IS_ERR(new_dport)) - return new_dport; + return false; +} - cxl_switch_parse_cdat(new_dport); +static struct cxl_dport *probe_dport(struct cxl_port *port, + struct device *dport_dev) +{ + struct cxl_driver *drv; - if (ida_is_empty(&port->decoder_ida)) { - rc = devm_cxl_switch_port_decoders_setup(port); - if (rc) - return ERR_PTR(rc); - dev_dbg(&port->dev, "first dport%d:%s added with decoders\n", - new_dport->port_id, dev_name(dport_dev)); - return no_free_ptr(new_dport); - } + device_lock_assert(&port->dev); + if (!port->dev.driver) + return ERR_PTR(-ENXIO); - /* New dport added, update the decoder targets */ - device_for_each_child(&port->dev, new_dport, update_decoder_targets); + if (dport_exists(port, dport_dev)) + return ERR_PTR(-EBUSY); - dev_dbg(&port->dev, "dport%d:%s added\n", new_dport->port_id, - dev_name(dport_dev)); + drv = container_of(port->dev.driver, struct cxl_driver, drv); + if (!drv->add_dport) + return ERR_PTR(-ENXIO); - return no_free_ptr(new_dport); + /* see cxl_port_add_dport() */ + return drv->add_dport(port, dport_dev); } static struct cxl_dport *devm_cxl_create_port(struct device *ep_dev, @@ -1684,7 +1725,7 @@ static struct cxl_dport *devm_cxl_create_port(struct device *ep_dev, } guard(device)(&port->dev); - return cxl_port_add_dport(port, dport_dev); + return probe_dport(port, dport_dev); } static int add_port_attach_ep(struct cxl_memdev *cxlmd, @@ -1716,7 +1757,7 @@ static int add_port_attach_ep(struct cxl_memdev *cxlmd, scoped_guard(device, &parent_port->dev) { parent_dport = cxl_find_dport_by_dev(parent_port, dparent); if (!parent_dport) { - parent_dport = cxl_port_add_dport(parent_port, dparent); + parent_dport = probe_dport(parent_port, dparent); if (IS_ERR(parent_dport)) return PTR_ERR(parent_dport); } @@ -1752,7 +1793,7 @@ static struct cxl_dport *find_or_add_dport(struct cxl_port *port, device_lock_assert(&port->dev); dport = cxl_find_dport_by_dev(port, dport_dev); if (!dport) { - dport = cxl_port_add_dport(port, dport_dev); + dport = probe_dport(port, dport_dev); if (IS_ERR(dport)) return dport; diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c index a90480d07c87..006c6ffc2f56 100644 --- a/drivers/cxl/core/ras.c +++ b/drivers/cxl/core/ras.c @@ -5,6 +5,7 @@ #include <linux/aer.h> #include <cxl/event.h> #include <cxlmem.h> +#include <cxlpci.h> #include "trace.h" static void cxl_cper_trace_corr_port_prot_err(struct pci_dev *pdev, @@ -125,3 +126,200 @@ void cxl_ras_exit(void) 
cxl_cper_unregister_prot_err_work(&cxl_cper_prot_err_work); cancel_work_sync(&cxl_cper_prot_err_work); } + +static void cxl_dport_map_ras(struct cxl_dport *dport) +{ + struct cxl_register_map *map = &dport->reg_map; + struct device *dev = dport->dport_dev; + + if (!map->component_map.ras.valid) + dev_dbg(dev, "RAS registers not found\n"); + else if (cxl_map_component_regs(map, &dport->regs.component, + BIT(CXL_CM_CAP_CAP_ID_RAS))) + dev_dbg(dev, "Failed to map RAS capability.\n"); +} + +/** + * devm_cxl_dport_ras_setup - Setup CXL RAS report on this dport + * @dport: the cxl_dport that needs to be initialized + */ +void devm_cxl_dport_ras_setup(struct cxl_dport *dport) +{ + dport->reg_map.host = dport_to_host(dport); + cxl_dport_map_ras(dport); +} + +void devm_cxl_dport_rch_ras_setup(struct cxl_dport *dport) +{ + struct pci_host_bridge *host_bridge; + + if (!dev_is_pci(dport->dport_dev)) + return; + + devm_cxl_dport_ras_setup(dport); + + host_bridge = to_pci_host_bridge(dport->dport_dev); + if (!host_bridge->native_aer) + return; + + cxl_dport_map_rch_aer(dport); + cxl_disable_rch_root_ints(dport); +} +EXPORT_SYMBOL_NS_GPL(devm_cxl_dport_rch_ras_setup, "CXL"); + +void devm_cxl_port_ras_setup(struct cxl_port *port) +{ + struct cxl_register_map *map = &port->reg_map; + + if (!map->component_map.ras.valid) { + dev_dbg(&port->dev, "RAS registers not found\n"); + return; + } + + map->host = &port->dev; + if (cxl_map_component_regs(map, &port->regs, + BIT(CXL_CM_CAP_CAP_ID_RAS))) + dev_dbg(&port->dev, "Failed to map RAS capability\n"); +} +EXPORT_SYMBOL_NS_GPL(devm_cxl_port_ras_setup, "CXL"); + +void cxl_handle_cor_ras(struct device *dev, void __iomem *ras_base) +{ + void __iomem *addr; + u32 status; + + if (!ras_base) + return; + + addr = ras_base + CXL_RAS_CORRECTABLE_STATUS_OFFSET; + status = readl(addr); + if (status & CXL_RAS_CORRECTABLE_STATUS_MASK) { + writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr); + trace_cxl_aer_correctable_error(to_cxl_memdev(dev), status); + } +} + +/* CXL spec rev3.0 8.2.4.16.1 */ +static void header_log_copy(void __iomem *ras_base, u32 *log) +{ + void __iomem *addr; + u32 *log_addr; + int i, log_u32_size = CXL_HEADERLOG_SIZE / sizeof(u32); + + addr = ras_base + CXL_RAS_HEADER_LOG_OFFSET; + log_addr = log; + + for (i = 0; i < log_u32_size; i++) { + *log_addr = readl(addr); + log_addr++; + addr += sizeof(u32); + } +} + +/* + * Log the state of the RAS status registers and prepare them to log the + * next error status. Return 1 if reset needed. 
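
The relocated cxl_handle_ras() below keeps the existing policy: when more than one uncorrectable status bit is latched, the Capability and Control register's First Error Pointer selects which error the header log is attributed to. The sketch below walks that selection with fabricated register values; the field mask is illustrative rather than the real CXL_RAS_CAP_CONTROL_FE_MASK.

```c
/* Sketch of the first-error selection in the cxl_handle_ras() hunk below:
 * with multiple uncorrectable status bits set, the Capability & Control
 * register's First Error Pointer names the bit to attribute. Register
 * layouts here are fabricated for illustration. */
#include <stdint.h>
#include <stdio.h>

#define DEMO_FE_MASK	0x3fu		/* illustrative first-error pointer field */
#define BIT(n)		(1u << (n))

static unsigned int hweight32(uint32_t v)
{
	return __builtin_popcount(v);
}

static uint32_t field_get(uint32_t mask, uint32_t val)
{
	return (val & mask) >> __builtin_ctz(mask);
}

int main(void)
{
	uint32_t status = BIT(2) | BIT(9);	/* two errors latched */
	uint32_t cap_control = 9;		/* first error pointer -> bit 9 */
	uint32_t fe;

	if (hweight32(status) > 1)
		fe = BIT(field_get(DEMO_FE_MASK, cap_control));
	else
		fe = status;

	printf("status %#x -> first error %#x\n", status, fe);
	return 0;
}
```
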
+ */ +bool cxl_handle_ras(struct device *dev, void __iomem *ras_base) +{ + u32 hl[CXL_HEADERLOG_SIZE_U32]; + void __iomem *addr; + u32 status; + u32 fe; + + if (!ras_base) + return false; + + addr = ras_base + CXL_RAS_UNCORRECTABLE_STATUS_OFFSET; + status = readl(addr); + if (!(status & CXL_RAS_UNCORRECTABLE_STATUS_MASK)) + return false; + + /* If multiple errors, log header points to first error from ctrl reg */ + if (hweight32(status) > 1) { + void __iomem *rcc_addr = + ras_base + CXL_RAS_CAP_CONTROL_OFFSET; + + fe = BIT(FIELD_GET(CXL_RAS_CAP_CONTROL_FE_MASK, + readl(rcc_addr))); + } else { + fe = status; + } + + header_log_copy(ras_base, hl); + trace_cxl_aer_uncorrectable_error(to_cxl_memdev(dev), status, fe, hl); + writel(status & CXL_RAS_UNCORRECTABLE_STATUS_MASK, addr); + + return true; +} + +void cxl_cor_error_detected(struct pci_dev *pdev) +{ + struct cxl_dev_state *cxlds = pci_get_drvdata(pdev); + struct cxl_memdev *cxlmd = cxlds->cxlmd; + struct device *dev = &cxlds->cxlmd->dev; + + scoped_guard(device, dev) { + if (!dev->driver) { + dev_warn(&pdev->dev, + "%s: memdev disabled, abort error handling\n", + dev_name(dev)); + return; + } + + if (cxlds->rcd) + cxl_handle_rdport_errors(cxlds); + + cxl_handle_cor_ras(&cxlds->cxlmd->dev, cxlmd->endpoint->regs.ras); + } +} +EXPORT_SYMBOL_NS_GPL(cxl_cor_error_detected, "CXL"); + +pci_ers_result_t cxl_error_detected(struct pci_dev *pdev, + pci_channel_state_t state) +{ + struct cxl_dev_state *cxlds = pci_get_drvdata(pdev); + struct cxl_memdev *cxlmd = cxlds->cxlmd; + struct device *dev = &cxlmd->dev; + bool ue; + + scoped_guard(device, dev) { + if (!dev->driver) { + dev_warn(&pdev->dev, + "%s: memdev disabled, abort error handling\n", + dev_name(dev)); + return PCI_ERS_RESULT_DISCONNECT; + } + + if (cxlds->rcd) + cxl_handle_rdport_errors(cxlds); + /* + * A frozen channel indicates an impending reset which is fatal to + * CXL.mem operation, and will likely crash the system. On the off + * chance the situation is recoverable dump the status of the RAS + * capability registers and bounce the active state of the memdev. + */ + ue = cxl_handle_ras(&cxlds->cxlmd->dev, cxlmd->endpoint->regs.ras); + } + + switch (state) { + case pci_channel_io_normal: + if (ue) { + device_release_driver(dev); + return PCI_ERS_RESULT_NEED_RESET; + } + return PCI_ERS_RESULT_CAN_RECOVER; + case pci_channel_io_frozen: + dev_warn(&pdev->dev, + "%s: frozen state error detected, disable CXL.mem\n", + dev_name(dev)); + device_release_driver(dev); + return PCI_ERS_RESULT_NEED_RESET; + case pci_channel_io_perm_failure: + dev_warn(&pdev->dev, + "failure state error detected, request disconnect\n"); + return PCI_ERS_RESULT_DISCONNECT; + } + return PCI_ERS_RESULT_NEED_RESET; +} +EXPORT_SYMBOL_NS_GPL(cxl_error_detected, "CXL"); diff --git a/drivers/cxl/core/ras_rch.c b/drivers/cxl/core/ras_rch.c new file mode 100644 index 000000000000..0a8b3b9b6388 --- /dev/null +++ b/drivers/cxl/core/ras_rch.c @@ -0,0 +1,121 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Copyright(c) 2025 AMD Corporation. All rights reserved. 
*/ + +#include <linux/types.h> +#include <linux/aer.h> +#include "cxl.h" +#include "core.h" +#include "cxlmem.h" + +void cxl_dport_map_rch_aer(struct cxl_dport *dport) +{ + resource_size_t aer_phys; + struct device *host; + u16 aer_cap; + + aer_cap = cxl_rcrb_to_aer(dport->dport_dev, dport->rcrb.base); + if (aer_cap) { + host = dport->reg_map.host; + aer_phys = aer_cap + dport->rcrb.base; + dport->regs.dport_aer = + devm_cxl_iomap_block(host, aer_phys, + sizeof(struct aer_capability_regs)); + } +} + +void cxl_disable_rch_root_ints(struct cxl_dport *dport) +{ + void __iomem *aer_base = dport->regs.dport_aer; + u32 aer_cmd_mask, aer_cmd; + + if (!aer_base) + return; + + /* + * Disable RCH root port command interrupts. + * CXL 3.0 12.2.1.1 - RCH Downstream Port-detected Errors + * + * This sequence may not be necessary. CXL spec states disabling + * the root cmd register's interrupts is required. But, PCI spec + * shows these are disabled by default on reset. + */ + aer_cmd_mask = (PCI_ERR_ROOT_CMD_COR_EN | + PCI_ERR_ROOT_CMD_NONFATAL_EN | + PCI_ERR_ROOT_CMD_FATAL_EN); + aer_cmd = readl(aer_base + PCI_ERR_ROOT_COMMAND); + aer_cmd &= ~aer_cmd_mask; + writel(aer_cmd, aer_base + PCI_ERR_ROOT_COMMAND); +} + +/* + * Copy the AER capability registers using 32 bit read accesses. + * This is necessary because RCRB AER capability is MMIO mapped. Clear the + * status after copying. + * + * @aer_base: base address of AER capability block in RCRB + * @aer_regs: destination for copying AER capability + */ +static bool cxl_rch_get_aer_info(void __iomem *aer_base, + struct aer_capability_regs *aer_regs) +{ + int read_cnt = sizeof(struct aer_capability_regs) / sizeof(u32); + u32 *aer_regs_buf = (u32 *)aer_regs; + int n; + + if (!aer_base) + return false; + + /* Use readl() to guarantee 32-bit accesses */ + for (n = 0; n < read_cnt; n++) + aer_regs_buf[n] = readl(aer_base + n * sizeof(u32)); + + writel(aer_regs->uncor_status, aer_base + PCI_ERR_UNCOR_STATUS); + writel(aer_regs->cor_status, aer_base + PCI_ERR_COR_STATUS); + + return true; +} + +/* Get AER severity. Return false if there is no error. 
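
The cxl_rch_get_aer_severity() helper that follows classifies a copied RCRB AER snapshot by giving unmasked uncorrectable bits precedence over unmasked correctable ones. The sketch below mirrors that precedence with made-up status and mask values; it intentionally omits the fatal versus non-fatal split that the real helper performs.

```c
/* Sketch of the severity precedence in cxl_rch_get_aer_severity():
 * unmasked uncorrectable errors win over unmasked correctable ones.
 * The fatal/non-fatal distinction is omitted. Values are fabricated. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

enum demo_severity { DEMO_NONE, DEMO_CORRECTABLE, DEMO_UNCORRECTABLE };

struct demo_aer {
	uint32_t uncor_status, uncor_mask;
	uint32_t cor_status, cor_mask;
};

static bool demo_get_severity(const struct demo_aer *a, enum demo_severity *s)
{
	if (a->uncor_status & ~a->uncor_mask) {
		*s = DEMO_UNCORRECTABLE;
		return true;
	}
	if (a->cor_status & ~a->cor_mask) {
		*s = DEMO_CORRECTABLE;
		return true;
	}
	return false;	/* nothing unmasked is pending */
}

int main(void)
{
	/* uncorrectable bit 4 latched but masked; correctable bit 0 unmasked */
	struct demo_aer a = { 0x10, 0x10, 0x01, 0x00 };
	enum demo_severity s;

	if (demo_get_severity(&a, &s))
		printf("severity: %s\n",
		       s == DEMO_UNCORRECTABLE ? "uncorrectable" : "correctable");
	return 0;
}
```
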
*/ +static bool cxl_rch_get_aer_severity(struct aer_capability_regs *aer_regs, + int *severity) +{ + if (aer_regs->uncor_status & ~aer_regs->uncor_mask) { + if (aer_regs->uncor_status & PCI_ERR_ROOT_FATAL_RCV) + *severity = AER_FATAL; + else + *severity = AER_NONFATAL; + return true; + } + + if (aer_regs->cor_status & ~aer_regs->cor_mask) { + *severity = AER_CORRECTABLE; + return true; + } + + return false; +} + +void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds) +{ + struct pci_dev *pdev = to_pci_dev(cxlds->dev); + struct aer_capability_regs aer_regs; + struct cxl_dport *dport; + int severity; + + struct cxl_port *port __free(put_cxl_port) = + cxl_pci_find_port(pdev, &dport); + if (!port) + return; + + if (!cxl_rch_get_aer_info(dport->regs.dport_aer, &aer_regs)) + return; + + if (!cxl_rch_get_aer_severity(&aer_regs, &severity)) + return; + + pci_print_aer(pdev, severity, &aer_regs); + if (severity == AER_CORRECTABLE) + cxl_handle_cor_ras(&cxlds->cxlmd->dev, dport->regs.ras); + else + cxl_handle_ras(&cxlds->cxlmd->dev, dport->regs.ras); +} diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index 5bd1213737fa..08fa3deef70a 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -489,9 +489,9 @@ static ssize_t interleave_ways_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t len) { - struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev->parent); - struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld; struct cxl_region *cxlr = to_cxl_region(dev); + struct cxl_root_decoder *cxlrd = cxlr->cxlrd; + struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld; struct cxl_region_params *p = &cxlr->params; unsigned int val, save; int rc; @@ -552,9 +552,9 @@ static ssize_t interleave_granularity_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t len) { - struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev->parent); - struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld; struct cxl_region *cxlr = to_cxl_region(dev); + struct cxl_root_decoder *cxlrd = cxlr->cxlrd; + struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld; struct cxl_region_params *p = &cxlr->params; int rc, val; u16 ig; @@ -628,7 +628,7 @@ static DEVICE_ATTR_RO(mode); static int alloc_hpa(struct cxl_region *cxlr, resource_size_t size) { - struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent); + struct cxl_root_decoder *cxlrd = cxlr->cxlrd; struct cxl_region_params *p = &cxlr->params; struct resource *res; u64 remainder = 0; @@ -664,6 +664,8 @@ static int alloc_hpa(struct cxl_region *cxlr, resource_size_t size) return PTR_ERR(res); } + cxlr->hpa_range = DEFINE_RANGE(res->start, res->end); + p->res = res; p->state = CXL_CONFIG_INTERLEAVE_ACTIVE; @@ -700,6 +702,8 @@ static int free_hpa(struct cxl_region *cxlr) if (p->state >= CXL_CONFIG_ACTIVE) return -EBUSY; + cxlr->hpa_range = DEFINE_RANGE(0, -1); + cxl_region_iomem_release(cxlr); p->state = CXL_CONFIG_IDLE; return 0; @@ -1093,14 +1097,16 @@ static int cxl_rr_assign_decoder(struct cxl_port *port, struct cxl_region *cxlr, return 0; } -static void cxl_region_set_lock(struct cxl_region *cxlr, - struct cxl_decoder *cxld) +static void cxl_region_setup_flags(struct cxl_region *cxlr, + struct cxl_decoder *cxld) { - if (!test_bit(CXL_DECODER_F_LOCK, &cxld->flags)) - return; + if (test_bit(CXL_DECODER_F_LOCK, &cxld->flags)) { + set_bit(CXL_REGION_F_LOCK, &cxlr->flags); + clear_bit(CXL_REGION_F_NEEDS_RESET, &cxlr->flags); + } - set_bit(CXL_REGION_F_LOCK, &cxlr->flags); - clear_bit(CXL_REGION_F_NEEDS_RESET, 
&cxlr->flags); + if (test_bit(CXL_DECODER_F_NORMALIZED_ADDRESSING, &cxld->flags)) + set_bit(CXL_REGION_F_NORMALIZED_ADDRESSING, &cxlr->flags); } /** @@ -1214,7 +1220,7 @@ static int cxl_port_attach_region(struct cxl_port *port, } } - cxl_region_set_lock(cxlr, cxld); + cxl_region_setup_flags(cxlr, cxld); rc = cxl_rr_ep_add(cxl_rr, cxled); if (rc) { @@ -1373,7 +1379,7 @@ static int cxl_port_setup_targets(struct cxl_port *port, struct cxl_region *cxlr, struct cxl_endpoint_decoder *cxled) { - struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent); + struct cxl_root_decoder *cxlrd = cxlr->cxlrd; int parent_iw, parent_ig, ig, iw, rc, pos = cxled->pos; struct cxl_port *parent_port = to_cxl_port(port->dev.parent); struct cxl_region_ref *cxl_rr = cxl_rr_load(port, cxlr); @@ -1731,10 +1737,10 @@ static int cxl_region_validate_position(struct cxl_region *cxlr, } static int cxl_region_attach_position(struct cxl_region *cxlr, - struct cxl_root_decoder *cxlrd, struct cxl_endpoint_decoder *cxled, const struct cxl_dport *dport, int pos) { + struct cxl_root_decoder *cxlrd = cxlr->cxlrd; struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); struct cxl_switch_decoder *cxlsd = &cxlrd->cxlsd; struct cxl_decoder *cxld = &cxlsd->cxld; @@ -1874,6 +1880,7 @@ static int find_pos_and_ways(struct cxl_port *port, struct range *range, /** * cxl_calc_interleave_pos() - calculate an endpoint position in a region * @cxled: endpoint decoder member of given region + * @hpa_range: translated HPA range of the endpoint * * The endpoint position is calculated by traversing the topology from * the endpoint to the root decoder and iteratively applying this @@ -1886,11 +1893,11 @@ static int find_pos_and_ways(struct cxl_port *port, struct range *range, * Return: position >= 0 on success * -ENXIO on failure */ -static int cxl_calc_interleave_pos(struct cxl_endpoint_decoder *cxled) +static int cxl_calc_interleave_pos(struct cxl_endpoint_decoder *cxled, + struct range *hpa_range) { struct cxl_port *iter, *port = cxled_to_port(cxled); struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); - struct range *range = &cxled->cxld.hpa_range; int parent_ways = 0, parent_pos = 0, pos = 0; int rc; @@ -1928,7 +1935,8 @@ static int cxl_calc_interleave_pos(struct cxl_endpoint_decoder *cxled) if (is_cxl_root(iter)) break; - rc = find_pos_and_ways(iter, range, &parent_pos, &parent_ways); + rc = find_pos_and_ways(iter, hpa_range, &parent_pos, + &parent_ways); if (rc) return rc; @@ -1938,7 +1946,7 @@ static int cxl_calc_interleave_pos(struct cxl_endpoint_decoder *cxled) dev_dbg(&cxlmd->dev, "decoder:%s parent:%s port:%s range:%#llx-%#llx pos:%d\n", dev_name(&cxled->cxld.dev), dev_name(cxlmd->dev.parent), - dev_name(&port->dev), range->start, range->end, pos); + dev_name(&port->dev), hpa_range->start, hpa_range->end, pos); return pos; } @@ -1951,7 +1959,7 @@ static int cxl_region_sort_targets(struct cxl_region *cxlr) for (i = 0; i < p->nr_targets; i++) { struct cxl_endpoint_decoder *cxled = p->targets[i]; - cxled->pos = cxl_calc_interleave_pos(cxled); + cxled->pos = cxl_calc_interleave_pos(cxled, &cxlr->hpa_range); /* * Record that sorting failed, but still continue to calc * cxled->pos so that follow-on code paths can reliably @@ -1971,7 +1979,7 @@ static int cxl_region_sort_targets(struct cxl_region *cxlr) static int cxl_region_attach(struct cxl_region *cxlr, struct cxl_endpoint_decoder *cxled, int pos) { - struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent); + struct cxl_root_decoder *cxlrd = cxlr->cxlrd; struct 
cxl_memdev *cxlmd = cxled_to_memdev(cxled); struct cxl_dev_state *cxlds = cxlmd->cxlds; struct cxl_region_params *p = &cxlr->params; @@ -2076,8 +2084,7 @@ static int cxl_region_attach(struct cxl_region *cxlr, ep_port = cxled_to_port(cxled); dport = cxl_find_dport_by_dev(root_port, ep_port->host_bridge); - rc = cxl_region_attach_position(cxlr, cxlrd, cxled, - dport, i); + rc = cxl_region_attach_position(cxlr, cxled, dport, i); if (rc) return rc; } @@ -2100,7 +2107,7 @@ static int cxl_region_attach(struct cxl_region *cxlr, if (rc) return rc; - rc = cxl_region_attach_position(cxlr, cxlrd, cxled, dport, pos); + rc = cxl_region_attach_position(cxlr, cxled, dport, pos); if (rc) return rc; @@ -2136,7 +2143,7 @@ static int cxl_region_attach(struct cxl_region *cxlr, struct cxl_endpoint_decoder *cxled = p->targets[i]; int test_pos; - test_pos = cxl_calc_interleave_pos(cxled); + test_pos = cxl_calc_interleave_pos(cxled, &cxlr->hpa_range); dev_dbg(&cxled->cxld.dev, "Test cxl_calc_interleave_pos(): %s test_pos:%d cxled->pos:%d\n", (test_pos == cxled->pos) ? "success" : "fail", @@ -2396,8 +2403,8 @@ static const struct attribute_group *region_groups[] = { static void cxl_region_release(struct device *dev) { - struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev->parent); struct cxl_region *cxlr = to_cxl_region(dev); + struct cxl_root_decoder *cxlrd = cxlr->cxlrd; int id = atomic_read(&cxlrd->region_id); /* @@ -2454,6 +2461,8 @@ static void unregister_region(void *_cxlr) for (i = 0; i < p->interleave_ways; i++) detach_target(cxlr, i); + cxlr->hpa_range = DEFINE_RANGE(0, -1); + cxl_region_iomem_release(cxlr); put_device(&cxlr->dev); } @@ -2480,11 +2489,13 @@ static struct cxl_region *cxl_region_alloc(struct cxl_root_decoder *cxlrd, int i * region id allocations */ get_device(dev->parent); + cxlr->cxlrd = cxlrd; + cxlr->id = id; + device_set_pm_not_required(dev); dev->bus = &cxl_bus_type; dev->type = &cxl_region_type; - cxlr->id = id; - cxl_region_set_lock(cxlr, &cxlrd->cxlsd.cxld); + cxl_region_setup_flags(cxlr, &cxlrd->cxlsd.cxld); return cxlr; } @@ -3112,17 +3123,157 @@ u64 cxl_calculate_hpa_offset(u64 dpa_offset, int pos, u8 eiw, u16 eig) } EXPORT_SYMBOL_FOR_MODULES(cxl_calculate_hpa_offset, "cxl_translate"); +static int decode_pos(int region_ways, int hb_ways, int pos, int *pos_port, + int *pos_hb) +{ + int devices_per_hb; + + /* + * Decode for 3-6-12 way interleaves as defined in the CXL + * Spec 4.0 9.13.1.1 Legal Interleaving Configurations. + * Region creation should prevent invalid combinations but + * sanity check here to avoid a silent bad decode. + */ + switch (hb_ways) { + case 3: + if (region_ways != 3 && region_ways != 6 && region_ways != 12) + return -EINVAL; + break; + case 6: + if (region_ways != 6 && region_ways != 12) + return -EINVAL; + break; + case 12: + if (region_ways != 12) + return -EINVAL; + break; + default: + return -EINVAL; + } + /* + * Each host bridge contributes an equal number of endpoints + * that are laid out contiguously per host bridge. Modulo + * selects the port within a host bridge and division selects + * the host bridge position. + */ + devices_per_hb = region_ways / hb_ways; + *pos_port = pos % devices_per_hb; + *pos_hb = pos / devices_per_hb; + + return 0; +} + +/* + * restore_parent() reconstruct the address in parent + * + * This math, specifically the bitmask creation 'mask = gran - 1' relies + * on the CXL Spec requirement that interleave granularity is always a + * power of two. 
+ * + * [mask] isolate the offset with the granularity + * [addr & ~mask] remove the offset leaving the aligned portion + * [* ways] distribute across all interleave ways + * [+ (pos * gran)] add the positional offset + * [+ (addr & mask)] restore the masked offset + */ +static u64 restore_parent(u64 addr, u64 pos, u64 gran, u64 ways) +{ + u64 mask = gran - 1; + + return ((addr & ~mask) * ways) + (pos * gran) + (addr & mask); +} + +/* + * unaligned_dpa_to_hpa() translates a DPA to HPA when the region resource + * start address is not aligned at Host Bridge Interleave Ways * 256MB. + * + * Unaligned start addresses only occur with MOD3 interleaves. All power- + * of-two interleaves are guaranteed aligned. + */ +static u64 unaligned_dpa_to_hpa(struct cxl_decoder *cxld, + struct cxl_region_params *p, int pos, u64 dpa) +{ + int ways_port = p->interleave_ways / cxld->interleave_ways; + int gran_port = p->interleave_granularity; + int gran_hb = cxld->interleave_granularity; + int ways_hb = cxld->interleave_ways; + int pos_port, pos_hb, gran_shift; + u64 hpa_port = 0; + + /* Decode an endpoint 'pos' into port and host-bridge components */ + if (decode_pos(p->interleave_ways, ways_hb, pos, &pos_port, &pos_hb)) { + dev_dbg(&cxld->dev, "not supported for region ways:%d\n", + p->interleave_ways); + return ULLONG_MAX; + } + + /* Restore the port parent address if needed */ + if (gran_hb != gran_port) + hpa_port = restore_parent(dpa, pos_port, gran_port, ways_port); + else + hpa_port = dpa; + + /* + * Complete the HPA reconstruction by restoring the address as if + * each HB position is a candidate. Test against expected pos_hb + * to confirm match. + */ + gran_shift = ilog2(gran_hb); + for (int position = 0; position < ways_hb; position++) { + u64 shifted, hpa; + + hpa = restore_parent(hpa_port, position, gran_hb, ways_hb); + hpa += p->res->start; + + shifted = hpa >> gran_shift; + if (do_div(shifted, ways_hb) == pos_hb) + return hpa; + } + + dev_dbg(&cxld->dev, "fail dpa:%#llx region:%pr pos:%d\n", dpa, p->res, + pos); + dev_dbg(&cxld->dev, " port-w/g/p:%d/%d/%d hb-w/g/p:%d/%d/%d\n", + ways_port, gran_port, pos_port, ways_hb, gran_hb, pos_hb); + + return ULLONG_MAX; +} + +static bool region_is_unaligned_mod3(struct cxl_region *cxlr) +{ + struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent); + struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld; + struct cxl_region_params *p = &cxlr->params; + int hbiw = cxld->interleave_ways; + u64 rem; + + if (is_power_of_2(hbiw)) + return false; + + div64_u64_rem(p->res->start, (u64)hbiw * SZ_256M, &rem); + + return (rem != 0); +} + u64 cxl_dpa_to_hpa(struct cxl_region *cxlr, const struct cxl_memdev *cxlmd, u64 dpa) { - struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent); + struct cxl_root_decoder *cxlrd = cxlr->cxlrd; + struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld; struct cxl_region_params *p = &cxlr->params; struct cxl_endpoint_decoder *cxled = NULL; u64 base, dpa_offset, hpa_offset, hpa; + bool unaligned = false; u16 eig = 0; u8 eiw = 0; int pos; + /* + * Conversion between SPA and DPA is not supported in + * Normalized Address mode. 
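
The decode_pos() and restore_parent() helpers added above reduce the unaligned MOD3 translation to plain interleave arithmetic: reconstruct the parent address from the child address, the target position, the granularity, and the number of ways. The standalone round trip below uses small made-up parameters (4 ways, 256 byte granularity) and demonstrates only the arithmetic, not the CXL decoder plumbing.

```c
/* Worked example of the restore_parent() arithmetic: split a parent
 * address across 'ways' interleave targets at 'gran' granularity, then
 * reconstruct it from the child address and position. Parameters
 * (4 ways, 256 B granularity) are made up for the demo. */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

static uint64_t to_child(uint64_t parent, uint64_t gran, uint64_t ways,
			 uint64_t *pos)
{
	*pos = (parent / gran) % ways;	/* which target the line lands on */
	return (parent / (gran * ways)) * gran + (parent % gran);
}

/* Same math as restore_parent(): gran must be a power of two */
static uint64_t restore_parent(uint64_t addr, uint64_t pos, uint64_t gran,
			       uint64_t ways)
{
	uint64_t mask = gran - 1;

	return ((addr & ~mask) * ways) + (pos * gran) + (addr & mask);
}

int main(void)
{
	uint64_t gran = 256, ways = 4, pos;
	uint64_t parent = 0x12345;	/* arbitrary parent-level offset */
	uint64_t child = to_child(parent, gran, ways, &pos);

	assert(restore_parent(child, pos, gran, ways) == parent);
	printf("parent %#llx -> child %#llx at pos %llu and back\n",
	       (unsigned long long)parent, (unsigned long long)child,
	       (unsigned long long)pos);
	return 0;
}
```
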
+ */ + if (test_bit(CXL_REGION_F_NORMALIZED_ADDRESSING, &cxlr->flags)) + return ULLONG_MAX; + for (int i = 0; i < p->nr_targets; i++) { if (cxlmd == cxled_to_memdev(p->targets[i])) { cxled = p->targets[i]; @@ -3132,21 +3283,38 @@ u64 cxl_dpa_to_hpa(struct cxl_region *cxlr, const struct cxl_memdev *cxlmd, if (!cxled) return ULLONG_MAX; - pos = cxled->pos; - ways_to_eiw(p->interleave_ways, &eiw); - granularity_to_eig(p->interleave_granularity, &eig); - base = cxl_dpa_resource_start(cxled); if (base == RESOURCE_SIZE_MAX) return ULLONG_MAX; dpa_offset = dpa - base; + + /* Unaligned calc for MOD3 interleaves not hbiw * 256MB aligned */ + unaligned = region_is_unaligned_mod3(cxlr); + if (unaligned) { + hpa = unaligned_dpa_to_hpa(cxld, p, cxled->pos, dpa_offset); + if (hpa == ULLONG_MAX) + return ULLONG_MAX; + + goto skip_aligned; + } + /* + * Aligned calc for all power-of-2 interleaves and for MOD3 + * interleaves that are aligned at hbiw * 256MB + */ + pos = cxled->pos; + ways_to_eiw(p->interleave_ways, &eiw); + granularity_to_eig(p->interleave_granularity, &eig); + hpa_offset = cxl_calculate_hpa_offset(dpa_offset, pos, eiw, eig); if (hpa_offset == ULLONG_MAX) return ULLONG_MAX; /* Apply the hpa_offset to the region base address */ - hpa = hpa_offset + p->res->start + p->cache_size; + hpa = hpa_offset + p->res->start; + +skip_aligned: + hpa += p->cache_size; /* Root decoder translation overrides typical modulo decode */ if (cxlrd->ops.hpa_to_spa) @@ -3160,9 +3328,9 @@ u64 cxl_dpa_to_hpa(struct cxl_region *cxlr, const struct cxl_memdev *cxlmd, "Addr trans fail: hpa 0x%llx not in region\n", hpa); return ULLONG_MAX; } - - /* Simple chunk check, by pos & gran, only applies to modulo decodes */ - if (!cxlrd->ops.hpa_to_spa && !cxl_is_hpa_in_chunk(hpa, cxlr, pos)) + /* Chunk check applies to aligned modulo decodes only */ + if (!unaligned && !cxlrd->ops.hpa_to_spa && + !cxl_is_hpa_in_chunk(hpa, cxlr, pos)) return ULLONG_MAX; return hpa; @@ -3173,11 +3341,54 @@ struct dpa_result { u64 dpa; }; +static int unaligned_region_offset_to_dpa_result(struct cxl_region *cxlr, + u64 offset, + struct dpa_result *result) +{ + struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent); + struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld; + struct cxl_region_params *p = &cxlr->params; + u64 interleave_width, interleave_index; + u64 gran, gran_offset, dpa_offset; + u64 hpa = p->res->start + offset; + u64 tmp = offset; + + /* + * Unaligned addresses are not algebraically invertible. Calculate + * a dpa_offset independent of the target device and then enumerate + * and test that dpa_offset against each candidate endpoint decoder. 
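
The aligned path just above encodes region parameters with ways_to_eiw() and granularity_to_eig() before calling cxl_calculate_hpa_offset(). The sketch below shows one plausible reading of those encodings, namely granularity as 256 << eig and ways as eiw 0-4 for powers of two plus 8-10 for 3/6/12; treat the tables as an assumption and consult the CXL specification for the authoritative values.

```c
/* Assumed interleave encodings for illustration only: gran = 256 << eig,
 * eiw 0-4 for 1/2/4/8/16 ways, eiw 8-10 for the MOD3 ways 3/6/12. */
#include <stdint.h>
#include <stdio.h>

static int demo_ways_to_eiw(unsigned int ways, uint8_t *eiw)
{
	static const struct { unsigned int ways; uint8_t eiw; } map[] = {
		{ 1, 0 }, { 2, 1 }, { 4, 2 }, { 8, 3 }, { 16, 4 },
		{ 3, 8 }, { 6, 9 }, { 12, 10 },
	};

	for (unsigned int i = 0; i < sizeof(map) / sizeof(map[0]); i++)
		if (map[i].ways == ways) {
			*eiw = map[i].eiw;
			return 0;
		}
	return -1;
}

static int demo_granularity_to_eig(unsigned int gran, uint16_t *eig)
{
	unsigned int g = 256;

	for (uint16_t e = 0; e <= 6; e++, g <<= 1)
		if (g == gran) {
			*eig = e;
			return 0;
		}
	return -1;	/* not a supported power-of-two granularity */
}

int main(void)
{
	uint8_t eiw;
	uint16_t eig;

	if (!demo_ways_to_eiw(6, &eiw) && !demo_granularity_to_eig(1024, &eig))
		printf("6 ways -> eiw %u, 1024 B -> eig %u\n", eiw, eig);
	return 0;
}
```
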
+ */ + gran = cxld->interleave_granularity; + interleave_width = gran * cxld->interleave_ways; + interleave_index = div64_u64(offset, interleave_width); + gran_offset = do_div(tmp, gran); + + dpa_offset = interleave_index * gran + gran_offset; + + for (int i = 0; i < p->nr_targets; i++) { + struct cxl_endpoint_decoder *cxled = p->targets[i]; + int pos = cxled->pos; + u64 test_hpa; + + test_hpa = unaligned_dpa_to_hpa(cxld, p, pos, dpa_offset); + if (test_hpa == hpa) { + result->cxlmd = cxled_to_memdev(cxled); + result->dpa = + cxl_dpa_resource_start(cxled) + dpa_offset; + return 0; + } + } + dev_err(&cxlr->dev, + "failed to resolve HPA %#llx in unaligned MOD3 region\n", hpa); + + return -ENXIO; +} + static int region_offset_to_dpa_result(struct cxl_region *cxlr, u64 offset, struct dpa_result *result) { struct cxl_region_params *p = &cxlr->params; - struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent); + struct cxl_root_decoder *cxlrd = cxlr->cxlrd; struct cxl_endpoint_decoder *cxled; u64 hpa_offset = offset; u64 dpa, dpa_offset; @@ -3206,6 +3417,10 @@ static int region_offset_to_dpa_result(struct cxl_region *cxlr, u64 offset, hpa_offset -= p->res->start; } + if (region_is_unaligned_mod3(cxlr)) + return unaligned_region_offset_to_dpa_result(cxlr, offset, + result); + pos = cxl_calculate_position(hpa_offset, eiw, eig); if (pos < 0 || pos >= p->nr_targets) { dev_dbg(&cxlr->dev, "Invalid position %d for %d targets\n", @@ -3478,47 +3693,68 @@ err: return rc; } -static int match_decoder_by_range(struct device *dev, const void *data) +static int match_root_decoder(struct device *dev, const void *data) { const struct range *r1, *r2 = data; - struct cxl_decoder *cxld; + struct cxl_root_decoder *cxlrd; - if (!is_switch_decoder(dev)) + if (!is_root_decoder(dev)) return 0; - cxld = to_cxl_decoder(dev); - r1 = &cxld->hpa_range; + cxlrd = to_cxl_root_decoder(dev); + r1 = &cxlrd->cxlsd.cxld.hpa_range; + return range_contains(r1, r2); } -static struct cxl_decoder * -cxl_port_find_switch_decoder(struct cxl_port *port, struct range *hpa) +static int cxl_root_setup_translation(struct cxl_root *cxl_root, + struct cxl_region_context *ctx) { - struct device *cxld_dev = device_find_child(&port->dev, hpa, - match_decoder_by_range); + if (!cxl_root->ops.translation_setup_root) + return 0; - return cxld_dev ? to_cxl_decoder(cxld_dev) : NULL; + return cxl_root->ops.translation_setup_root(cxl_root, ctx); } +/* + * Note, when finished with the device, drop the reference with + * put_device() or use the put_cxl_root_decoder helper. + */ static struct cxl_root_decoder * -cxl_find_root_decoder(struct cxl_endpoint_decoder *cxled) +get_cxl_root_decoder(struct cxl_endpoint_decoder *cxled, + struct cxl_region_context *ctx) { struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); struct cxl_port *port = cxled_to_port(cxled); struct cxl_root *cxl_root __free(put_cxl_root) = find_cxl_root(port); - struct cxl_decoder *root, *cxld = &cxled->cxld; - struct range *hpa = &cxld->hpa_range; + struct device *cxlrd_dev; + int rc; + + /* + * Adjust the endpoint's HPA range and interleaving + * configuration to the root decoder’s memory space before + * setting up the root decoder. 
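
unaligned_region_offset_to_dpa_result() above inverts the unaligned translation by first deriving a position-independent device offset (interleave index times granularity plus the offset within a granule) and then probing each candidate position with the forward translation until the original offset reappears. The self-contained sketch below reproduces that enumerate-and-test round trip with made-up parameters (3 ways, 512 byte granularity).

```c
/* Sketch of the enumerate-and-test inversion: derive a position-
 * independent device offset from the region offset, then probe each
 * candidate position with the forward translation until it reproduces
 * the original offset. Parameters are made up (3 ways, 512 B gran). */
#include <stdint.h>
#include <stdio.h>

static uint64_t forward(uint64_t dev_off, uint64_t pos, uint64_t gran,
			uint64_t ways)
{
	uint64_t mask = gran - 1;	/* gran is a power of two */

	return ((dev_off & ~mask) * ways) + (pos * gran) + (dev_off & mask);
}

int main(void)
{
	uint64_t gran = 512, ways = 3;
	uint64_t offset = 0x2abb0;	/* arbitrary region offset */
	uint64_t width = gran * ways;
	uint64_t dev_off = (offset / width) * gran + (offset % gran);

	for (uint64_t pos = 0; pos < ways; pos++) {
		if (forward(dev_off, pos, gran, ways) == offset) {
			printf("offset %#llx -> device offset %#llx at pos %llu\n",
			       (unsigned long long)offset,
			       (unsigned long long)dev_off,
			       (unsigned long long)pos);
			break;
		}
	}
	return 0;
}
```
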
+ */ + rc = cxl_root_setup_translation(cxl_root, ctx); + if (rc) { + dev_err(cxlmd->dev.parent, + "%s:%s Failed to setup translation for address range %#llx:%#llx\n", + dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev), + ctx->hpa_range.start, ctx->hpa_range.end); + return ERR_PTR(rc); + } - root = cxl_port_find_switch_decoder(&cxl_root->port, hpa); - if (!root) { + cxlrd_dev = device_find_child(&cxl_root->port.dev, &ctx->hpa_range, + match_root_decoder); + if (!cxlrd_dev) { dev_err(cxlmd->dev.parent, "%s:%s no CXL window for range %#llx:%#llx\n", - dev_name(&cxlmd->dev), dev_name(&cxld->dev), - cxld->hpa_range.start, cxld->hpa_range.end); - return NULL; + dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev), + ctx->hpa_range.start, ctx->hpa_range.end); + return ERR_PTR(-ENXIO); } - return to_cxl_root_decoder(&root->dev); + return to_cxl_root_decoder(cxlrd_dev); } static int match_region_by_range(struct device *dev, const void *data) @@ -3540,7 +3776,7 @@ static int match_region_by_range(struct device *dev, const void *data) static int cxl_extended_linear_cache_resize(struct cxl_region *cxlr, struct resource *res) { - struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent); + struct cxl_root_decoder *cxlrd = cxlr->cxlrd; struct cxl_region_params *p = &cxlr->params; resource_size_t size = resource_size(res); resource_size_t cache_size, start; @@ -3576,11 +3812,12 @@ static int cxl_extended_linear_cache_resize(struct cxl_region *cxlr, } static int __construct_region(struct cxl_region *cxlr, - struct cxl_root_decoder *cxlrd, - struct cxl_endpoint_decoder *cxled) + struct cxl_region_context *ctx) { + struct cxl_endpoint_decoder *cxled = ctx->cxled; + struct cxl_root_decoder *cxlrd = cxlr->cxlrd; struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); - struct range *hpa = &cxled->cxld.hpa_range; + struct range *hpa_range = &ctx->hpa_range; struct cxl_region_params *p; struct resource *res; int rc; @@ -3596,12 +3833,13 @@ static int __construct_region(struct cxl_region *cxlr, } set_bit(CXL_REGION_F_AUTO, &cxlr->flags); + cxlr->hpa_range = *hpa_range; res = kmalloc(sizeof(*res), GFP_KERNEL); if (!res) return -ENOMEM; - *res = DEFINE_RES_MEM_NAMED(hpa->start, range_len(hpa), + *res = DEFINE_RES_MEM_NAMED(hpa_range->start, range_len(hpa_range), dev_name(&cxlr->dev)); rc = cxl_extended_linear_cache_resize(cxlr, res); @@ -3632,8 +3870,8 @@ static int __construct_region(struct cxl_region *cxlr, } p->res = res; - p->interleave_ways = cxled->cxld.interleave_ways; - p->interleave_granularity = cxled->cxld.interleave_granularity; + p->interleave_ways = ctx->interleave_ways; + p->interleave_granularity = ctx->interleave_granularity; p->state = CXL_CONFIG_INTERLEAVE_ACTIVE; rc = sysfs_update_group(&cxlr->dev.kobj, get_cxl_region_target_group()); @@ -3653,8 +3891,9 @@ static int __construct_region(struct cxl_region *cxlr, /* Establish an empty region covering the given HPA range */ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd, - struct cxl_endpoint_decoder *cxled) + struct cxl_region_context *ctx) { + struct cxl_endpoint_decoder *cxled = ctx->cxled; struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); struct cxl_port *port = cxlrd_to_port(cxlrd); struct cxl_dev_state *cxlds = cxlmd->cxlds; @@ -3674,7 +3913,7 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd, return cxlr; } - rc = __construct_region(cxlr, cxlrd, cxled); + rc = __construct_region(cxlr, ctx); if (rc) { devm_release_action(port->uport_dev, unregister_region, cxlr); return ERR_PTR(rc); 
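For reference, the aligned modulo decode that cxl_calculate_hpa_offset() performs above can be exercised with a short userspace sketch. This is a minimal illustration of the power-of-2 case only, assuming the usual CXL encodings (granularity of 256 << eig bytes, 1 << eiw interleave ways); the MOD3 and unaligned paths handled earlier in this hunk are omitted and the helper name is hypothetical.

/*
 * Illustrative sketch, not kernel code: map a device-relative DPA offset
 * to an HPA offset for an aligned, power-of-2 interleave. @pos is the
 * endpoint's position in the interleave set.
 */
#include <stdint.h>
#include <stdio.h>

static uint64_t aligned_dpa_to_hpa_offset(uint64_t dpa_offset, int pos,
					  int eiw, int eig)
{
	uint64_t gran = 256ULL << eig;		/* bytes per interleave chunk */
	uint64_t chunk = dpa_offset / gran;	/* chunk index on this device */
	uint64_t rem = dpa_offset % gran;	/* offset within that chunk */

	/* chunk * ways + pos selects the chunk in host physical space */
	return ((chunk << eiw) + pos) * gran + rem;
}

int main(void)
{
	/* 2-way (eiw = 1), 256B granularity (eig = 0), device at position 1 */
	uint64_t hpa_offset = aligned_dpa_to_hpa_offset(0x100, 1, 1, 0);

	printf("hpa_offset = %#llx\n", (unsigned long long)hpa_offset); /* 0x300 */
	return 0;
}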
@@ -3684,11 +3923,12 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd, } static struct cxl_region * -cxl_find_region_by_range(struct cxl_root_decoder *cxlrd, struct range *hpa) +cxl_find_region_by_range(struct cxl_root_decoder *cxlrd, + struct range *hpa_range) { struct device *region_dev; - region_dev = device_find_child(&cxlrd->cxlsd.cxld.dev, hpa, + region_dev = device_find_child(&cxlrd->cxlsd.cxld.dev, hpa_range, match_region_by_range); if (!region_dev) return NULL; @@ -3698,25 +3938,34 @@ cxl_find_region_by_range(struct cxl_root_decoder *cxlrd, struct range *hpa) int cxl_add_to_region(struct cxl_endpoint_decoder *cxled) { - struct range *hpa = &cxled->cxld.hpa_range; + struct cxl_region_context ctx; struct cxl_region_params *p; bool attach = false; int rc; + ctx = (struct cxl_region_context) { + .cxled = cxled, + .hpa_range = cxled->cxld.hpa_range, + .interleave_ways = cxled->cxld.interleave_ways, + .interleave_granularity = cxled->cxld.interleave_granularity, + }; + struct cxl_root_decoder *cxlrd __free(put_cxl_root_decoder) = - cxl_find_root_decoder(cxled); - if (!cxlrd) - return -ENXIO; + get_cxl_root_decoder(cxled, &ctx); + + if (IS_ERR(cxlrd)) + return PTR_ERR(cxlrd); /* - * Ensure that if multiple threads race to construct_region() for @hpa - * one does the construction and the others add to that. + * Ensure that, if multiple threads race to construct_region() + * for the HPA range, one does the construction and the others + * add to that. */ mutex_lock(&cxlrd->range_lock); struct cxl_region *cxlr __free(put_cxl_region) = - cxl_find_region_by_range(cxlrd, hpa); + cxl_find_region_by_range(cxlrd, &ctx.hpa_range); if (!cxlr) - cxlr = construct_region(cxlrd, cxled); + cxlr = construct_region(cxlrd, &ctx); mutex_unlock(&cxlrd->range_lock); rc = PTR_ERR_OR_ZERO(cxlr); @@ -3891,6 +4140,39 @@ static int cxl_region_debugfs_poison_clear(void *data, u64 offset) DEFINE_DEBUGFS_ATTRIBUTE(cxl_poison_clear_fops, NULL, cxl_region_debugfs_poison_clear, "%llx\n"); +static int cxl_region_setup_poison(struct cxl_region *cxlr) +{ + struct device *dev = &cxlr->dev; + struct cxl_region_params *p = &cxlr->params; + struct dentry *dentry; + + /* + * Do not enable poison injection in Normalized Address mode. + * Conversion between SPA and DPA is required for this, but it is + * not supported in this mode. 
+ */ + if (test_bit(CXL_REGION_F_NORMALIZED_ADDRESSING, &cxlr->flags)) + return 0; + + /* Create poison attributes if all memdevs support the capabilities */ + for (int i = 0; i < p->nr_targets; i++) { + struct cxl_endpoint_decoder *cxled = p->targets[i]; + struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); + + if (!cxl_memdev_has_poison_cmd(cxlmd, CXL_POISON_ENABLED_INJECT) || + !cxl_memdev_has_poison_cmd(cxlmd, CXL_POISON_ENABLED_CLEAR)) + return 0; + } + + dentry = cxl_debugfs_create_dir(dev_name(dev)); + debugfs_create_file("inject_poison", 0200, dentry, cxlr, + &cxl_poison_inject_fops); + debugfs_create_file("clear_poison", 0200, dentry, cxlr, + &cxl_poison_clear_fops); + + return devm_add_action_or_reset(dev, remove_debugfs, dentry); +} + static int cxl_region_can_probe(struct cxl_region *cxlr) { struct cxl_region_params *p = &cxlr->params; @@ -3920,7 +4202,6 @@ static int cxl_region_probe(struct device *dev) { struct cxl_region *cxlr = to_cxl_region(dev); struct cxl_region_params *p = &cxlr->params; - bool poison_supported = true; int rc; rc = cxl_region_can_probe(cxlr); @@ -3944,30 +4225,9 @@ static int cxl_region_probe(struct device *dev) if (rc) return rc; - /* Create poison attributes if all memdevs support the capabilities */ - for (int i = 0; i < p->nr_targets; i++) { - struct cxl_endpoint_decoder *cxled = p->targets[i]; - struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); - - if (!cxl_memdev_has_poison_cmd(cxlmd, CXL_POISON_ENABLED_INJECT) || - !cxl_memdev_has_poison_cmd(cxlmd, CXL_POISON_ENABLED_CLEAR)) { - poison_supported = false; - break; - } - } - - if (poison_supported) { - struct dentry *dentry; - - dentry = cxl_debugfs_create_dir(dev_name(dev)); - debugfs_create_file("inject_poison", 0200, dentry, cxlr, - &cxl_poison_inject_fops); - debugfs_create_file("clear_poison", 0200, dentry, cxlr, - &cxl_poison_clear_fops); - rc = devm_add_action_or_reset(dev, remove_debugfs, dentry); - if (rc) - return rc; - } + rc = cxl_region_setup_poison(cxlr); + if (rc) + return rc; switch (cxlr->mode) { case CXL_PARTMODE_PMEM: diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c index 5ca7b0eed568..a010b3214342 100644 --- a/drivers/cxl/core/regs.c +++ b/drivers/cxl/core/regs.c @@ -271,10 +271,10 @@ EXPORT_SYMBOL_NS_GPL(cxl_map_device_regs, "CXL"); static bool cxl_decode_regblock(struct pci_dev *pdev, u32 reg_lo, u32 reg_hi, struct cxl_register_map *map) { - u8 reg_type = FIELD_GET(CXL_DVSEC_REG_LOCATOR_BLOCK_ID_MASK, reg_lo); - int bar = FIELD_GET(CXL_DVSEC_REG_LOCATOR_BIR_MASK, reg_lo); + u8 reg_type = FIELD_GET(PCI_DVSEC_CXL_REG_LOCATOR_BLOCK_ID, reg_lo); + int bar = FIELD_GET(PCI_DVSEC_CXL_REG_LOCATOR_BIR, reg_lo); u64 offset = ((u64)reg_hi << 32) | - (reg_lo & CXL_DVSEC_REG_LOCATOR_BLOCK_OFF_LOW_MASK); + (reg_lo & PCI_DVSEC_CXL_REG_LOCATOR_BLOCK_OFF_LOW); if (offset > pci_resource_len(pdev, bar)) { dev_warn(&pdev->dev, @@ -311,15 +311,15 @@ static int __cxl_find_regblock_instance(struct pci_dev *pdev, enum cxl_regloc_ty }; regloc = pci_find_dvsec_capability(pdev, PCI_VENDOR_ID_CXL, - CXL_DVSEC_REG_LOCATOR); + PCI_DVSEC_CXL_REG_LOCATOR); if (!regloc) return -ENXIO; pci_read_config_dword(pdev, regloc + PCI_DVSEC_HEADER1, ®loc_size); - regloc_size = FIELD_GET(PCI_DVSEC_HEADER1_LENGTH_MASK, regloc_size); + regloc_size = PCI_DVSEC_HEADER1_LEN(regloc_size); - regloc += CXL_DVSEC_REG_LOCATOR_BLOCK1_OFFSET; - regblocks = (regloc_size - CXL_DVSEC_REG_LOCATOR_BLOCK1_OFFSET) / 8; + regloc += PCI_DVSEC_CXL_REG_LOCATOR_BLOCK1; + regblocks = (regloc_size - 
PCI_DVSEC_CXL_REG_LOCATOR_BLOCK1) / 8; for (i = 0; i < regblocks; i++, regloc += 8) { u32 reg_lo, reg_hi; diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index ba17fa86d249..04c673e7cdb0 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -332,7 +332,7 @@ int cxl_dport_map_rcd_linkcap(struct pci_dev *pdev, struct cxl_dport *dport); #define CXL_DECODER_F_TYPE3 BIT(3) #define CXL_DECODER_F_LOCK BIT(4) #define CXL_DECODER_F_ENABLE BIT(5) -#define CXL_DECODER_F_MASK GENMASK(5, 0) +#define CXL_DECODER_F_NORMALIZED_ADDRESSING BIT(6) enum cxl_decoder_type { CXL_DECODER_DEVMEM = 2, @@ -525,10 +525,19 @@ enum cxl_partition_mode { */ #define CXL_REGION_F_LOCK 2 +/* + * Indicate Normalized Addressing. Use it to disable SPA conversion if + * HPA != SPA and an address translation callback handler does not + * exist. Flag is needed by AMD Zen5 platforms. + */ +#define CXL_REGION_F_NORMALIZED_ADDRESSING 3 + /** * struct cxl_region - CXL region * @dev: This region's device * @id: This region's id. Id is globally unique across all regions + * @cxlrd: Region's root decoder + * @hpa_range: Address range occupied by the region * @mode: Operational mode of the mapped capacity * @type: Endpoint decoder target type * @cxl_nvb: nvdimm bridge for coordinating @cxlr_pmem setup / shutdown @@ -542,6 +551,8 @@ enum cxl_partition_mode { struct cxl_region { struct device dev; int id; + struct cxl_root_decoder *cxlrd; + struct range hpa_range; enum cxl_partition_mode mode; enum cxl_decoder_type type; struct cxl_nvdimm_bridge *cxl_nvb; @@ -607,6 +618,7 @@ struct cxl_dax_region { * @parent_dport: dport that points to this port in the parent * @decoder_ida: allocator for decoder ids * @reg_map: component and ras register mapping parameters + * @regs: mapped component registers * @nr_dports: number of entries in @dports * @hdm_end: track last allocated HDM decoder instance for allocation ordering * @commit_end: cursor to track highest committed decoder for commit ordering @@ -628,6 +640,7 @@ struct cxl_port { struct cxl_dport *parent_dport; struct ida decoder_ida; struct cxl_register_map reg_map; + struct cxl_component_regs regs; int nr_dports; int hdm_end; int commit_end; @@ -642,6 +655,15 @@ struct cxl_port { resource_size_t component_reg_phys; }; +struct cxl_root; + +struct cxl_root_ops { + int (*qos_class)(struct cxl_root *cxl_root, + struct access_coordinate *coord, int entries, + int *qos_class); + int (*translation_setup_root)(struct cxl_root *cxl_root, void *data); +}; + /** * struct cxl_root - logical collection of root cxl_port items * @@ -650,7 +672,7 @@ struct cxl_port { */ struct cxl_root { struct cxl_port port; - const struct cxl_root_ops *ops; + struct cxl_root_ops ops; }; static inline struct cxl_root * @@ -659,12 +681,6 @@ to_cxl_root(const struct cxl_port *port) return container_of(port, struct cxl_root, port); } -struct cxl_root_ops { - int (*qos_class)(struct cxl_root *cxl_root, - struct access_coordinate *coord, int entries, - int *qos_class); -}; - static inline struct cxl_dport * cxl_find_dport_by_dev(struct cxl_port *port, const struct device *dport_dev) { @@ -778,8 +794,9 @@ struct cxl_port *devm_cxl_add_port(struct device *host, struct device *uport_dev, resource_size_t component_reg_phys, struct cxl_dport *parent_dport); -struct cxl_root *devm_cxl_add_root(struct device *host, - const struct cxl_root_ops *ops); +struct cxl_root *devm_cxl_add_root(struct device *host); +int devm_cxl_add_endpoint(struct device *host, struct cxl_memdev *cxlmd, + struct cxl_dport *parent_dport); struct 
cxl_root *find_cxl_root(struct cxl_port *port); DEFINE_FREE(put_cxl_root, struct cxl_root *, if (_T) put_device(&_T->port.dev)) @@ -803,12 +820,11 @@ struct cxl_dport *devm_cxl_add_rch_dport(struct cxl_port *port, struct device *dport_dev, int port_id, resource_size_t rcrb); -#ifdef CONFIG_PCIEAER_CXL -void cxl_setup_parent_dport(struct device *host, struct cxl_dport *dport); -void cxl_dport_init_ras_reporting(struct cxl_dport *dport, struct device *host); +#ifdef CONFIG_CXL_ATL +void cxl_setup_prm_address_translation(struct cxl_root *cxl_root); #else -static inline void cxl_dport_init_ras_reporting(struct cxl_dport *dport, - struct device *host) { } +static inline +void cxl_setup_prm_address_translation(struct cxl_root *cxl_root) {} #endif struct cxl_decoder *to_cxl_decoder(struct device *dev); @@ -848,8 +864,11 @@ struct cxl_endpoint_dvsec_info { }; int devm_cxl_switch_port_decoders_setup(struct cxl_port *port); -int __devm_cxl_switch_port_decoders_setup(struct cxl_port *port); int devm_cxl_endpoint_decoders_setup(struct cxl_port *port); +void cxl_port_update_decoder_targets(struct cxl_port *port, + struct cxl_dport *dport); +int cxl_port_setup_regs(struct cxl_port *port, + resource_size_t component_reg_phys); struct cxl_dev_state; int cxl_dvsec_rr_decode(struct cxl_dev_state *cxlds, @@ -859,10 +878,18 @@ bool is_cxl_region(struct device *dev); extern const struct bus_type cxl_bus_type; +/* + * Note, add_dport() is expressly for the cxl_port driver. TODO: investigate a + * type-safe driver model where probe()/remove() take the type of object implied + * by @id and the add_dport() op only defined for the CXL_DEVICE_PORT driver + * template. + */ struct cxl_driver { const char *name; int (*probe)(struct device *dev); void (*remove)(struct device *dev); + struct cxl_dport *(*add_dport)(struct cxl_port *port, + struct device *dport_dev); struct device_driver drv; int id; }; @@ -895,7 +922,8 @@ struct cxl_nvdimm_bridge *devm_cxl_add_nvdimm_bridge(struct device *host, struct cxl_port *port); struct cxl_nvdimm *to_cxl_nvdimm(struct device *dev); bool is_cxl_nvdimm(struct device *dev); -int devm_cxl_add_nvdimm(struct cxl_port *parent_port, struct cxl_memdev *cxlmd); +int devm_cxl_add_nvdimm(struct device *host, struct cxl_port *port, + struct cxl_memdev *cxlmd); struct cxl_nvdimm_bridge *cxl_find_nvdimm_bridge(struct cxl_port *port); #ifdef CONFIG_CXL_REGION @@ -946,8 +974,6 @@ void cxl_coordinates_combine(struct access_coordinate *out, bool cxl_endpoint_decoder_reset_detected(struct cxl_port *port); struct cxl_dport *devm_cxl_add_dport_by_dev(struct cxl_port *port, struct device *dport_dev); -struct cxl_dport *__devm_cxl_add_dport_by_dev(struct cxl_port *port, - struct device *dport_dev); /* * Unit test builds overrides this to __weak, find the 'strong' version @@ -959,20 +985,4 @@ struct cxl_dport *__devm_cxl_add_dport_by_dev(struct cxl_port *port, u16 cxl_gpf_get_dvsec(struct device *dev); -/* - * Declaration for functions that are mocked by cxl_test that are called by - * cxl_core. The respective functions are defined as __foo() and called by - * cxl_core as foo(). The macros below ensures that those functions would - * exist as foo(). See tools/testing/cxl/cxl_core_exports.c and - * tools/testing/cxl/exports.h for setting up the mock functions. The dance - * is done to avoid a circular dependency where cxl_core calls a function that - * ends up being a mock function and goes to * cxl_test where it calls a - * cxl_core function. 
- */ -#ifndef CXL_TEST_ENABLE -#define DECLARE_TESTABLE(x) __##x -#define devm_cxl_add_dport_by_dev DECLARE_TESTABLE(devm_cxl_add_dport_by_dev) -#define devm_cxl_switch_port_decoders_setup DECLARE_TESTABLE(devm_cxl_switch_port_decoders_setup) -#endif - #endif /* __CXL_H__ */ diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h index 434031a0c1f7..e21d744d639b 100644 --- a/drivers/cxl/cxlmem.h +++ b/drivers/cxl/cxlmem.h @@ -34,6 +34,10 @@ (FIELD_GET(CXLMDEV_RESET_NEEDED_MASK, status) != \ CXLMDEV_RESET_NEEDED_NOT) +struct cxl_memdev_attach { + int (*probe)(struct cxl_memdev *cxlmd); +}; + /** * struct cxl_memdev - CXL bus object representing a Type-3 Memory Device * @dev: driver core device object @@ -43,6 +47,7 @@ * @cxl_nvb: coordinate removal of @cxl_nvd if present * @cxl_nvd: optional bridge to an nvdimm if the device supports pmem * @endpoint: connection to the CXL port topology for this memory device + * @attach: creator of this memdev depends on CXL link attach to operate * @id: id number of this memdev instance. * @depth: endpoint port depth * @scrub_cycle: current scrub cycle set for this device @@ -59,11 +64,12 @@ struct cxl_memdev { struct cxl_nvdimm_bridge *cxl_nvb; struct cxl_nvdimm *cxl_nvd; struct cxl_port *endpoint; + const struct cxl_memdev_attach *attach; int id; int depth; u8 scrub_cycle; int scrub_region_id; - void *err_rec_array; + struct cxl_mem_err_rec *err_rec_array; }; static inline struct cxl_memdev *to_cxl_memdev(struct device *dev) @@ -95,8 +101,10 @@ static inline bool is_cxl_endpoint(struct cxl_port *port) return is_cxl_memdev(port->uport_dev); } -struct cxl_memdev *devm_cxl_add_memdev(struct device *host, - struct cxl_dev_state *cxlds); +struct cxl_memdev *__devm_cxl_add_memdev(struct cxl_dev_state *cxlds, + const struct cxl_memdev_attach *attach); +struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds, + const struct cxl_memdev_attach *attach); int devm_cxl_sanitize_setup_notifier(struct device *host, struct cxl_memdev *cxlmd); struct cxl_memdev_state; @@ -415,7 +423,7 @@ struct cxl_dpa_partition { * @dev: The device associated with this CXL state * @cxlmd: The device representing the CXL.mem capabilities of @dev * @reg_map: component and ras register mapping parameters - * @regs: Parsed register blocks + * @regs: Class device "Device" registers * @cxl_dvsec: Offset to the PCIe device DVSEC * @rcd: operating in RCD mode (CXL 3.0 9.11.8 CXL Devices Attached to an RCH) * @media_ready: Indicate whether the device media is usable @@ -431,7 +439,7 @@ struct cxl_dev_state { struct device *dev; struct cxl_memdev *cxlmd; struct cxl_register_map reg_map; - struct cxl_regs regs; + struct cxl_device_regs regs; int cxl_dvsec; bool rcd; bool media_ready; @@ -877,7 +885,6 @@ int devm_cxl_memdev_edac_register(struct cxl_memdev *cxlmd); int devm_cxl_region_edac_register(struct cxl_region *cxlr); int cxl_store_rec_gen_media(struct cxl_memdev *cxlmd, union cxl_event *evt); int cxl_store_rec_dram(struct cxl_memdev *cxlmd, union cxl_event *evt); -void devm_cxl_memdev_edac_release(struct cxl_memdev *cxlmd); #else static inline int devm_cxl_memdev_edac_register(struct cxl_memdev *cxlmd) { return 0; } @@ -889,8 +896,6 @@ static inline int cxl_store_rec_gen_media(struct cxl_memdev *cxlmd, static inline int cxl_store_rec_dram(struct cxl_memdev *cxlmd, union cxl_event *evt) { return 0; } -static inline void devm_cxl_memdev_edac_release(struct cxl_memdev *cxlmd) -{ return; } #endif #ifdef CONFIG_CXL_SUSPEND diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h index 
1d526bea8431..0cf64218aa16 100644 --- a/drivers/cxl/cxlpci.h +++ b/drivers/cxl/cxlpci.h @@ -8,59 +8,6 @@ #define CXL_MEMORY_PROGIF 0x10 /* - * See section 8.1 Configuration Space Registers in the CXL 2.0 - * Specification. Names are taken straight from the specification with "CXL" and - * "DVSEC" redundancies removed. When obvious, abbreviations may be used. - */ -#define PCI_DVSEC_HEADER1_LENGTH_MASK GENMASK(31, 20) - -/* CXL 2.0 8.1.3: PCIe DVSEC for CXL Device */ -#define CXL_DVSEC_PCIE_DEVICE 0 -#define CXL_DVSEC_CAP_OFFSET 0xA -#define CXL_DVSEC_MEM_CAPABLE BIT(2) -#define CXL_DVSEC_HDM_COUNT_MASK GENMASK(5, 4) -#define CXL_DVSEC_CTRL_OFFSET 0xC -#define CXL_DVSEC_MEM_ENABLE BIT(2) -#define CXL_DVSEC_RANGE_SIZE_HIGH(i) (0x18 + (i * 0x10)) -#define CXL_DVSEC_RANGE_SIZE_LOW(i) (0x1C + (i * 0x10)) -#define CXL_DVSEC_MEM_INFO_VALID BIT(0) -#define CXL_DVSEC_MEM_ACTIVE BIT(1) -#define CXL_DVSEC_MEM_SIZE_LOW_MASK GENMASK(31, 28) -#define CXL_DVSEC_RANGE_BASE_HIGH(i) (0x20 + (i * 0x10)) -#define CXL_DVSEC_RANGE_BASE_LOW(i) (0x24 + (i * 0x10)) -#define CXL_DVSEC_MEM_BASE_LOW_MASK GENMASK(31, 28) - -#define CXL_DVSEC_RANGE_MAX 2 - -/* CXL 2.0 8.1.4: Non-CXL Function Map DVSEC */ -#define CXL_DVSEC_FUNCTION_MAP 2 - -/* CXL 2.0 8.1.5: CXL 2.0 Extensions DVSEC for Ports */ -#define CXL_DVSEC_PORT_EXTENSIONS 3 - -/* CXL 2.0 8.1.6: GPF DVSEC for CXL Port */ -#define CXL_DVSEC_PORT_GPF 4 -#define CXL_DVSEC_PORT_GPF_PHASE_1_CONTROL_OFFSET 0x0C -#define CXL_DVSEC_PORT_GPF_PHASE_1_TMO_BASE_MASK GENMASK(3, 0) -#define CXL_DVSEC_PORT_GPF_PHASE_1_TMO_SCALE_MASK GENMASK(11, 8) -#define CXL_DVSEC_PORT_GPF_PHASE_2_CONTROL_OFFSET 0xE -#define CXL_DVSEC_PORT_GPF_PHASE_2_TMO_BASE_MASK GENMASK(3, 0) -#define CXL_DVSEC_PORT_GPF_PHASE_2_TMO_SCALE_MASK GENMASK(11, 8) - -/* CXL 2.0 8.1.7: GPF DVSEC for CXL Device */ -#define CXL_DVSEC_DEVICE_GPF 5 - -/* CXL 2.0 8.1.8: PCIe DVSEC for Flex Bus Port */ -#define CXL_DVSEC_PCIE_FLEXBUS_PORT 7 - -/* CXL 2.0 8.1.9: Register Locator DVSEC */ -#define CXL_DVSEC_REG_LOCATOR 8 -#define CXL_DVSEC_REG_LOCATOR_BLOCK1_OFFSET 0xC -#define CXL_DVSEC_REG_LOCATOR_BIR_MASK GENMASK(2, 0) -#define CXL_DVSEC_REG_LOCATOR_BLOCK_ID_MASK GENMASK(15, 8) -#define CXL_DVSEC_REG_LOCATOR_BLOCK_OFF_LOW_MASK GENMASK(31, 16) - -/* * NOTE: Currently all the functions which are enabled for CXL require their * vectors to be in the first 16. Use this as the default max. 
*/ @@ -129,7 +76,29 @@ static inline bool cxl_pci_flit_256(struct pci_dev *pdev) struct cxl_dev_state; void read_cdat_data(struct cxl_port *port); + +#ifdef CONFIG_CXL_RAS void cxl_cor_error_detected(struct pci_dev *pdev); pci_ers_result_t cxl_error_detected(struct pci_dev *pdev, pci_channel_state_t state); +void devm_cxl_dport_rch_ras_setup(struct cxl_dport *dport); +void devm_cxl_port_ras_setup(struct cxl_port *port); +#else +static inline void cxl_cor_error_detected(struct pci_dev *pdev) { } + +static inline pci_ers_result_t cxl_error_detected(struct pci_dev *pdev, + pci_channel_state_t state) +{ + return PCI_ERS_RESULT_NONE; +} + +static inline void devm_cxl_dport_rch_ras_setup(struct cxl_dport *dport) +{ +} + +static inline void devm_cxl_port_ras_setup(struct cxl_port *port) +{ +} +#endif + #endif /* __CXL_PCI_H__ */ diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c index 6e6777b7bafb..fcffe24dcb42 100644 --- a/drivers/cxl/mem.c +++ b/drivers/cxl/mem.c @@ -45,44 +45,6 @@ static int cxl_mem_dpa_show(struct seq_file *file, void *data) return 0; } -static int devm_cxl_add_endpoint(struct device *host, struct cxl_memdev *cxlmd, - struct cxl_dport *parent_dport) -{ - struct cxl_port *parent_port = parent_dport->port; - struct cxl_port *endpoint, *iter, *down; - int rc; - - /* - * Now that the path to the root is established record all the - * intervening ports in the chain. - */ - for (iter = parent_port, down = NULL; !is_cxl_root(iter); - down = iter, iter = to_cxl_port(iter->dev.parent)) { - struct cxl_ep *ep; - - ep = cxl_ep_load(iter, cxlmd); - ep->next = down; - } - - /* Note: endpoint port component registers are derived from @cxlds */ - endpoint = devm_cxl_add_port(host, &cxlmd->dev, CXL_RESOURCE_NONE, - parent_dport); - if (IS_ERR(endpoint)) - return PTR_ERR(endpoint); - - rc = cxl_endpoint_autoremove(cxlmd, endpoint); - if (rc) - return rc; - - if (!endpoint->dev.driver) { - dev_err(&cxlmd->dev, "%s failed probe\n", - dev_name(&endpoint->dev)); - return -ENXIO; - } - - return 0; -} - static int cxl_debugfs_poison_inject(void *data, u64 dpa) { struct cxl_memdev *cxlmd = data; @@ -153,7 +115,7 @@ static int cxl_mem_probe(struct device *dev) } if (cxl_pmem_size(cxlds) && IS_ENABLED(CONFIG_CXL_PMEM)) { - rc = devm_cxl_add_nvdimm(parent_port, cxlmd); + rc = devm_cxl_add_nvdimm(dev, parent_port, cxlmd); if (rc) { if (rc == -ENODEV) dev_info(dev, "PMEM disabled by platform\n"); @@ -166,8 +128,6 @@ static int cxl_mem_probe(struct device *dev) else endpoint_parent = &parent_port->dev; - cxl_dport_init_ras_reporting(dport, dev); - scoped_guard(device, endpoint_parent) { if (!endpoint_parent->driver) { dev_err(dev, "CXL port topology %s not enabled\n", @@ -180,6 +140,12 @@ static int cxl_mem_probe(struct device *dev) return rc; } + if (cxlmd->attach) { + rc = cxlmd->attach->probe(cxlmd); + if (rc) + return rc; + } + rc = devm_cxl_memdev_edac_register(cxlmd); if (rc) dev_dbg(dev, "CXL memdev EDAC registration failed rc=%d\n", rc); @@ -201,6 +167,29 @@ static int cxl_mem_probe(struct device *dev) return devm_add_action_or_reset(dev, enable_suspend, NULL); } +/** + * devm_cxl_add_memdev - Add a CXL memory device + * @cxlds: CXL device state to associate with the memdev + * @attach: Caller depends on CXL topology attachment + * + * Upon return the device will have had a chance to attach to the + * cxl_mem driver, but may fail to attach if the CXL topology is not ready + * (hardware CXL link down, or software platform CXL root not attached). 
+ * + * When @attach is NULL it indicates the caller wants the memdev to remain + * registered even if it does not immediately attach to the CXL hierarchy. When + * @attach is provided a cxl_mem_probe() failure leads to failure of this routine. + * + * The parent of the resulting device and the devm context for allocations is + * @cxlds->dev. + */ +struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds, + const struct cxl_memdev_attach *attach) +{ + return __devm_cxl_add_memdev(cxlds, attach); +} +EXPORT_SYMBOL_NS_GPL(devm_cxl_add_memdev, "CXL"); + static ssize_t trigger_poison_list_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t len) @@ -248,6 +237,7 @@ static struct cxl_driver cxl_mem_driver = { .probe = cxl_mem_probe, .id = CXL_DEVICE_MEMORY_EXPANDER, .drv = { + .probe_type = PROBE_FORCE_SYNCHRONOUS, .dev_groups = cxl_mem_groups, }, }; @@ -258,8 +248,3 @@ MODULE_DESCRIPTION("CXL: Memory Expansion"); MODULE_LICENSE("GPL v2"); MODULE_IMPORT_NS("CXL"); MODULE_ALIAS_CXL(CXL_DEVICE_MEMORY_EXPANDER); -/* - * create_endpoint() wants to validate port driver attach immediately after - * endpoint registration. - */ -MODULE_SOFTDEP("pre: cxl_port"); diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c index 0be4e508affe..fbb300a01830 100644 --- a/drivers/cxl/pci.c +++ b/drivers/cxl/pci.c @@ -535,52 +535,6 @@ static int cxl_pci_setup_regs(struct pci_dev *pdev, enum cxl_regloc_type type, return cxl_setup_regs(map); } -static int cxl_pci_ras_unmask(struct pci_dev *pdev) -{ - struct cxl_dev_state *cxlds = pci_get_drvdata(pdev); - void __iomem *addr; - u32 orig_val, val, mask; - u16 cap; - int rc; - - if (!cxlds->regs.ras) { - dev_dbg(&pdev->dev, "No RAS registers.\n"); - return 0; - } - - /* BIOS has PCIe AER error control */ - if (!pcie_aer_is_native(pdev)) - return 0; - - rc = pcie_capability_read_word(pdev, PCI_EXP_DEVCTL, &cap); - if (rc) - return rc; - - if (cap & PCI_EXP_DEVCTL_URRE) { - addr = cxlds->regs.ras + CXL_RAS_UNCORRECTABLE_MASK_OFFSET; - orig_val = readl(addr); - - mask = CXL_RAS_UNCORRECTABLE_MASK_MASK | - CXL_RAS_UNCORRECTABLE_MASK_F256B_MASK; - val = orig_val & ~mask; - writel(val, addr); - dev_dbg(&pdev->dev, - "Uncorrectable RAS Errors Mask: %#x -> %#x\n", - orig_val, val); - } - - if (cap & PCI_EXP_DEVCTL_CERE) { - addr = cxlds->regs.ras + CXL_RAS_CORRECTABLE_MASK_OFFSET; - orig_val = readl(addr); - val = orig_val & ~CXL_RAS_CORRECTABLE_MASK_MASK; - writel(val, addr); - dev_dbg(&pdev->dev, "Correctable RAS Errors Mask: %#x -> %#x\n", - orig_val, val); - } - - return 0; -} - static void free_event_buf(void *buf) { kvfree(buf); @@ -912,13 +866,6 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) unsigned int i; bool irq_avail; - /* - * Double check the anonymous union trickery in struct cxl_regs - * FIXME switch to struct_group() - */ - BUILD_BUG_ON(offsetof(struct cxl_regs, memdev) != - offsetof(struct cxl_regs, device_regs.memdev)); - rc = pcim_enable_device(pdev); if (rc) return rc; @@ -933,7 +880,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) cxlds->rcd = is_cxl_restricted(pdev); cxlds->serial = pci_get_dsn(pdev); cxlds->cxl_dvsec = pci_find_dvsec_capability( - pdev, PCI_VENDOR_ID_CXL, CXL_DVSEC_PCIE_DEVICE); + pdev, PCI_VENDOR_ID_CXL, PCI_DVSEC_CXL_DEVICE); if (!cxlds->cxl_dvsec) dev_warn(&pdev->dev, "Device DVSEC not present, skip CXL.mem init\n"); @@ -942,7 +889,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) if (rc) return rc; - 
rc = cxl_map_device_regs(&map, &cxlds->regs.device_regs); + rc = cxl_map_device_regs(&map, &cxlds->regs); if (rc) return rc; @@ -957,11 +904,6 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) else if (!cxlds->reg_map.component_map.ras.valid) dev_dbg(&pdev->dev, "RAS registers not found\n"); - rc = cxl_map_component_regs(&cxlds->reg_map, &cxlds->regs.component, - BIT(CXL_CM_CAP_CAP_ID_RAS)); - if (rc) - dev_dbg(&pdev->dev, "Failed to map RAS capability.\n"); - rc = cxl_pci_type3_init_mailbox(cxlds); if (rc) return rc; @@ -1006,7 +948,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) if (rc) dev_dbg(&pdev->dev, "No CXL Features discovered\n"); - cxlmd = devm_cxl_add_memdev(&pdev->dev, cxlds); + cxlmd = devm_cxl_add_memdev(cxlds, NULL); if (IS_ERR(cxlmd)) return PTR_ERR(cxlmd); @@ -1052,9 +994,6 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) if (rc) return rc; - if (cxl_pci_ras_unmask(pdev)) - dev_dbg(&pdev->dev, "No RAS reporting unmasked\n"); - pci_save_state(pdev); return rc; diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c index 51c8f2f84717..ada51948d52f 100644 --- a/drivers/cxl/port.c +++ b/drivers/cxl/port.c @@ -1,5 +1,6 @@ // SPDX-License-Identifier: GPL-2.0-only /* Copyright(c) 2022 Intel Corporation. All rights reserved. */ +#include <linux/aer.h> #include <linux/device.h> #include <linux/module.h> #include <linux/slab.h> @@ -68,9 +69,59 @@ static int cxl_switch_port_probe(struct cxl_port *port) return 0; } +static int cxl_ras_unmask(struct cxl_port *port) +{ + struct pci_dev *pdev; + void __iomem *addr; + u32 orig_val, val, mask; + u16 cap; + int rc; + + if (!dev_is_pci(port->uport_dev)) + return 0; + pdev = to_pci_dev(port->uport_dev); + + if (!port->regs.ras) { + pci_dbg(pdev, "No RAS registers.\n"); + return 0; + } + + /* BIOS has PCIe AER error control */ + if (!pcie_aer_is_native(pdev)) + return 0; + + rc = pcie_capability_read_word(pdev, PCI_EXP_DEVCTL, &cap); + if (rc) + return rc; + + if (cap & PCI_EXP_DEVCTL_URRE) { + addr = port->regs.ras + CXL_RAS_UNCORRECTABLE_MASK_OFFSET; + orig_val = readl(addr); + + mask = CXL_RAS_UNCORRECTABLE_MASK_MASK | + CXL_RAS_UNCORRECTABLE_MASK_F256B_MASK; + val = orig_val & ~mask; + writel(val, addr); + pci_dbg(pdev, "Uncorrectable RAS Errors Mask: %#x -> %#x\n", + orig_val, val); + } + + if (cap & PCI_EXP_DEVCTL_CERE) { + addr = port->regs.ras + CXL_RAS_CORRECTABLE_MASK_OFFSET; + orig_val = readl(addr); + val = orig_val & ~CXL_RAS_CORRECTABLE_MASK_MASK; + writel(val, addr); + pci_dbg(pdev, "Correctable RAS Errors Mask: %#x -> %#x\n", + orig_val, val); + } + + return 0; +} + static int cxl_endpoint_port_probe(struct cxl_port *port) { struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport_dev); + struct cxl_dport *dport = port->parent_dport; int rc; /* Cache the data early to ensure is_visible() works */ @@ -87,6 +138,21 @@ static int cxl_endpoint_port_probe(struct cxl_port *port) return rc; /* + * With VH (CXL Virtual Host) topology the cxl_port::add_dport() method + * handles RAS setup for downstream ports. With RCH (CXL Restricted CXL + * Host) topologies the downstream port is enumerated early by platform + * firmware, but the RCRB (root complex register block) is not mapped + * until after the cxl_pci driver attaches to the RCIeP (root complex + * integrated endpoint). 
+ */ + if (dport->rch) + devm_cxl_dport_rch_ras_setup(dport); + + devm_cxl_port_ras_setup(port); + if (cxl_ras_unmask(port)) + dev_dbg(&port->dev, "failed to unmask RAS interrupts\n"); + + /* * Now that all endpoint decoders are successfully enumerated, try to * assemble regions from committed decoders */ @@ -151,15 +217,111 @@ static const struct attribute_group *cxl_port_attribute_groups[] = { NULL, }; +/* note this implicitly casts the group back to its @port */ +DEFINE_FREE(cxl_port_release_dr_group, struct cxl_port *, + if (_T) devres_release_group(&_T->dev, _T)) + +static struct cxl_dport *cxl_port_add_dport(struct cxl_port *port, + struct device *dport_dev) +{ + struct cxl_dport *dport; + int rc; + + /* Temp group for all "first dport" and "per dport" setup actions */ + void *port_dr_group __free(cxl_port_release_dr_group) = + devres_open_group(&port->dev, port, GFP_KERNEL); + if (!port_dr_group) + return ERR_PTR(-ENOMEM); + + if (port->nr_dports == 0) { + /* + * Some host bridges are known to not have component regsisters + * available until a root port has trained CXL. Perform that + * setup now. + */ + rc = cxl_port_setup_regs(port, port->component_reg_phys); + if (rc) + return ERR_PTR(rc); + + rc = devm_cxl_switch_port_decoders_setup(port); + if (rc) + return ERR_PTR(rc); + + /* + * RAS setup is optional, either driver operation can continue + * on failure, or the device does not implement RAS registers. + */ + devm_cxl_port_ras_setup(port); + } + + dport = devm_cxl_add_dport_by_dev(port, dport_dev); + if (IS_ERR(dport)) + return dport; + + /* This group was only needed for early exit above */ + devres_remove_group(&port->dev, no_free_ptr(port_dr_group)); + + cxl_switch_parse_cdat(dport); + + /* New dport added, update the decoder targets */ + cxl_port_update_decoder_targets(port, dport); + + dev_dbg(&port->dev, "dport%d:%s added\n", dport->port_id, + dev_name(dport_dev)); + + return dport; +} + static struct cxl_driver cxl_port_driver = { .name = "cxl_port", .probe = cxl_port_probe, + .add_dport = cxl_port_add_dport, .id = CXL_DEVICE_PORT, .drv = { + .probe_type = PROBE_FORCE_SYNCHRONOUS, .dev_groups = cxl_port_attribute_groups, }, }; +int devm_cxl_add_endpoint(struct device *host, struct cxl_memdev *cxlmd, + struct cxl_dport *parent_dport) +{ + struct cxl_port *parent_port = parent_dport->port; + struct cxl_port *endpoint, *iter, *down; + int rc; + + /* + * Now that the path to the root is established record all the + * intervening ports in the chain. 
+ */ + for (iter = parent_port, down = NULL; !is_cxl_root(iter); + down = iter, iter = to_cxl_port(iter->dev.parent)) { + struct cxl_ep *ep; + + ep = cxl_ep_load(iter, cxlmd); + ep->next = down; + } + + /* Note: endpoint port component registers are derived from @cxlds */ + endpoint = devm_cxl_add_port(host, &cxlmd->dev, CXL_RESOURCE_NONE, + parent_dport); + if (IS_ERR(endpoint)) + return PTR_ERR(endpoint); + + rc = cxl_endpoint_autoremove(cxlmd, endpoint); + if (rc) + return rc; + + if (!endpoint->dev.driver) { + dev_err(&cxlmd->dev, "%s failed probe\n", + dev_name(&endpoint->dev)); + return -ENXIO; + } + + return 0; +} +EXPORT_SYMBOL_FOR_MODULES(devm_cxl_add_endpoint, "cxl_mem"); + static int __init cxl_port_init(void) { return cxl_driver_register(&cxl_port_driver); diff --git a/drivers/dax/hmem/device.c b/drivers/dax/hmem/device.c index f9e1a76a04a9..56e3cbd181b5 100644 --- a/drivers/dax/hmem/device.c +++ b/drivers/dax/hmem/device.c @@ -83,8 +83,7 @@ static __init int hmem_register_one(struct resource *res, void *data) static __init int hmem_init(void) { - walk_iomem_res_desc(IORES_DESC_SOFT_RESERVED, - IORESOURCE_MEM, 0, -1, NULL, hmem_register_one); + walk_soft_reserve_res(0, -1, NULL, hmem_register_one); return 0; } diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c index c18451a37e4f..1cf7c2a0ee1c 100644 --- a/drivers/dax/hmem/hmem.c +++ b/drivers/dax/hmem/hmem.c @@ -73,11 +73,12 @@ static int hmem_register_device(struct device *host, int target_nid, return 0; } - rc = region_intersects(res->start, resource_size(res), IORESOURCE_MEM, - IORES_DESC_SOFT_RESERVED); + rc = region_intersects_soft_reserve(res->start, resource_size(res)); if (rc != REGION_INTERSECTS) return 0; + /* TODO: Add Soft-Reserved memory back to iomem */ + id = memregion_alloc(GFP_KERNEL); if (id < 0) { dev_err(host, "memregion allocation failure for %pr\n", res); diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index c8a0522e2e1f..13d998fbacce 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -816,31 +816,56 @@ static inline bool pci_dev_binding_disallowed(struct pci_dev *dev) #define AER_MAX_MULTI_ERR_DEVICES 5 /* Not likely to have more */ +/** + * struct aer_err_info - AER Error Information + * @dev: Devices reporting error + * @ratelimit_print: Flag to log or not log the devices' error. 0=NotLog/1=Log + * @__pad1: Padding for alignment + * @error_dev_num: Number of devices reporting an error + * @level: printk level to use in logging + * @id: Value from register PCI_ERR_ROOT_ERR_SRC + * @severity: AER severity, 0-UNCOR Non-fatal, 1-UNCOR fatal, 2-COR + * @root_ratelimit_print: Flag to log or not log the root's error. 
0=NotLog/1=Log + * @multi_error_valid: If multiple errors are reported + * @first_error: First reported error + * @__pad2: Padding for alignment + * @is_cxl: Bus type error: 0-PCI Bus error, 1-CXL Bus error + * @tlp_header_valid: Indicates if TLP field contains error information + * @status: COR/UNCOR error status + * @mask: COR/UNCOR mask + * @tlp: Transaction packet information + */ struct aer_err_info { struct pci_dev *dev[AER_MAX_MULTI_ERR_DEVICES]; int ratelimit_print[AER_MAX_MULTI_ERR_DEVICES]; int error_dev_num; - const char *level; /* printk level */ + const char *level; unsigned int id:16; - unsigned int severity:2; /* 0:NONFATAL | 1:FATAL | 2:COR */ - unsigned int root_ratelimit_print:1; /* 0=skip, 1=print */ + unsigned int severity:2; + unsigned int root_ratelimit_print:1; unsigned int __pad1:4; unsigned int multi_error_valid:1; unsigned int first_error:5; - unsigned int __pad2:2; + unsigned int __pad2:1; + unsigned int is_cxl:1; unsigned int tlp_header_valid:1; - unsigned int status; /* COR/UNCOR Error Status */ - unsigned int mask; /* COR/UNCOR Error Mask */ - struct pcie_tlp_log tlp; /* TLP Header */ + unsigned int status; + unsigned int mask; + struct pcie_tlp_log tlp; }; int aer_get_device_error_info(struct aer_err_info *info, int i); void aer_print_error(struct aer_err_info *info, int i); +static inline const char *aer_err_bus(struct aer_err_info *info) +{ + return info->is_cxl ? "CXL" : "PCIe"; +} + int pcie_read_tlp_log(struct pci_dev *dev, int where, int where2, unsigned int tlp_len, bool flit, struct pcie_tlp_log *log); diff --git a/drivers/pci/pcie/Kconfig b/drivers/pci/pcie/Kconfig index 17919b99fa66..207c2deae35f 100644 --- a/drivers/pci/pcie/Kconfig +++ b/drivers/pci/pcie/Kconfig @@ -49,15 +49,6 @@ config PCIEAER_INJECT gotten from: https://github.com/intel/aer-inject.git -config PCIEAER_CXL - bool "PCI Express CXL RAS support" - default y - depends on PCIEAER && CXL_PCI - help - Enables CXL error handling. - - If unsure, say Y. 
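As a small aside on the reporting change above (aer_err_info now carries an is_cxl bit and aer_err_bus() maps it to the bus string used by aer_print_error() and trace_aer_event()), the selection can be modeled in a few lines of standalone C; the struct and helper names below are stand-ins, not the kernel definitions.

#include <stdio.h>

/* Toy model of the is_cxl bit driving the reported bus type */
struct err_info {
	unsigned int severity:2;	/* 0 non-fatal, 1 fatal, 2 corrected */
	unsigned int is_cxl:1;		/* set from pcie_is_cxl(dev) when collected */
};

static const char *err_bus(const struct err_info *info)
{
	return info->is_cxl ? "CXL" : "PCIe";
}

int main(void)
{
	struct err_info info = { .severity = 2, .is_cxl = 1 };

	/* Mirrors the "%s Bus Error: severity=%s, ..." format shown above */
	printf("%s Bus Error: severity=%s\n", err_bus(&info),
	       info.severity == 2 ? "Corrected" : "Uncorrected");
	return 0;
}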
- # # PCI Express ECRC # diff --git a/drivers/pci/pcie/Makefile b/drivers/pci/pcie/Makefile index 173829aa02e6..b0b43a18c304 100644 --- a/drivers/pci/pcie/Makefile +++ b/drivers/pci/pcie/Makefile @@ -8,6 +8,7 @@ obj-$(CONFIG_PCIEPORTBUS) += pcieportdrv.o bwctrl.o obj-y += aspm.o obj-$(CONFIG_PCIEAER) += aer.o err.o tlp.o +obj-$(CONFIG_CXL_RAS) += aer_cxl_rch.o obj-$(CONFIG_PCIEAER_INJECT) += aer_inject.o obj-$(CONFIG_PCIE_PME) += pme.o obj-$(CONFIG_PCIE_DPC) += dpc.o diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index 2b0ac6bfdd76..8dfbb0fe6cf6 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -867,6 +867,7 @@ void aer_print_error(struct aer_err_info *info, int i) struct pci_dev *dev; int layer, agent, id; const char *level = info->level; + const char *bus_type = aer_err_bus(info); if (WARN_ON_ONCE(i >= AER_MAX_MULTI_ERR_DEVICES)) return; @@ -876,22 +877,22 @@ void aer_print_error(struct aer_err_info *info, int i) pci_dev_aer_stats_incr(dev, info); trace_aer_event(pci_name(dev), (info->status & ~info->mask), - info->severity, info->tlp_header_valid, &info->tlp); + info->severity, info->tlp_header_valid, &info->tlp, bus_type); if (!info->ratelimit_print[i]) return; if (!info->status) { - pci_err(dev, "PCIe Bus Error: severity=%s, type=Inaccessible, (Unregistered Agent ID)\n", - aer_error_severity_string[info->severity]); + pci_err(dev, "%s Bus Error: severity=%s, type=Inaccessible, (Unregistered Agent ID)\n", + bus_type, aer_error_severity_string[info->severity]); goto out; } layer = AER_GET_LAYER_ERROR(info->severity, info->status); agent = AER_GET_AGENT(info->severity, info->status); - aer_printk(level, dev, "PCIe Bus Error: severity=%s, type=%s, (%s)\n", - aer_error_severity_string[info->severity], + aer_printk(level, dev, "%s Bus Error: severity=%s, type=%s, (%s)\n", + bus_type, aer_error_severity_string[info->severity], aer_error_layer[layer], aer_agent_string[agent]); aer_printk(level, dev, " device [%04x:%04x] error status/mask=%08x/%08x\n", @@ -925,6 +926,7 @@ EXPORT_SYMBOL_GPL(cper_severity_to_aer); void pci_print_aer(struct pci_dev *dev, int aer_severity, struct aer_capability_regs *aer) { + const char *bus_type; int layer, agent, tlp_header_valid = 0; u32 status, mask; struct aer_err_info info = { @@ -945,10 +947,13 @@ void pci_print_aer(struct pci_dev *dev, int aer_severity, info.status = status; info.mask = mask; + info.is_cxl = pcie_is_cxl(dev); + + bus_type = aer_err_bus(&info); pci_dev_aer_stats_incr(dev, &info); - trace_aer_event(pci_name(dev), (status & ~mask), - aer_severity, tlp_header_valid, &aer->header_log); + trace_aer_event(pci_name(dev), (status & ~mask), aer_severity, + tlp_header_valid, &aer->header_log, bus_type); if (!aer_ratelimit(dev, info.severity)) return; @@ -1117,8 +1122,6 @@ static bool find_source_device(struct pci_dev *parent, return true; } -#ifdef CONFIG_PCIEAER_CXL - /** * pci_aer_unmask_internal_errors - unmask internal errors * @dev: pointer to the pci_dev data structure @@ -1129,7 +1132,7 @@ static bool find_source_device(struct pci_dev *parent, * Note: AER must be enabled and supported by the device which must be * checked in advance, e.g. with pcie_aer_is_native(). 
*/ -static void pci_aer_unmask_internal_errors(struct pci_dev *dev) +void pci_aer_unmask_internal_errors(struct pci_dev *dev) { int aer = dev->aer_cap; u32 mask; @@ -1143,117 +1146,20 @@ static void pci_aer_unmask_internal_errors(struct pci_dev *dev) pci_write_config_dword(dev, aer + PCI_ERR_COR_MASK, mask); } -static bool is_cxl_mem_dev(struct pci_dev *dev) -{ - /* - * The capability, status, and control fields in Device 0, - * Function 0 DVSEC control the CXL functionality of the - * entire device (CXL 3.0, 8.1.3). - */ - if (dev->devfn != PCI_DEVFN(0, 0)) - return false; - - /* - * CXL Memory Devices must have the 502h class code set (CXL - * 3.0, 8.1.12.1). - */ - if ((dev->class >> 8) != PCI_CLASS_MEMORY_CXL) - return false; - - return true; -} - -static bool cxl_error_is_native(struct pci_dev *dev) -{ - struct pci_host_bridge *host = pci_find_host_bridge(dev->bus); - - return (pcie_ports_native || host->native_aer); -} +/* + * Internal errors are too device-specific to enable generally, however for CXL + * their behavior is standardized for conveying CXL protocol errors. + */ +EXPORT_SYMBOL_FOR_MODULES(pci_aer_unmask_internal_errors, "cxl_core"); -static bool is_internal_error(struct aer_err_info *info) +#ifdef CONFIG_CXL_RAS +bool is_aer_internal_error(struct aer_err_info *info) { if (info->severity == AER_CORRECTABLE) return info->status & PCI_ERR_COR_INTERNAL; return info->status & PCI_ERR_UNC_INTN; } - -static int cxl_rch_handle_error_iter(struct pci_dev *dev, void *data) -{ - struct aer_err_info *info = (struct aer_err_info *)data; - const struct pci_error_handlers *err_handler; - - if (!is_cxl_mem_dev(dev) || !cxl_error_is_native(dev)) - return 0; - - /* Protect dev->driver */ - device_lock(&dev->dev); - - err_handler = dev->driver ? dev->driver->err_handler : NULL; - if (!err_handler) - goto out; - - if (info->severity == AER_CORRECTABLE) { - if (err_handler->cor_error_detected) - err_handler->cor_error_detected(dev); - } else if (err_handler->error_detected) { - if (info->severity == AER_NONFATAL) - err_handler->error_detected(dev, pci_channel_io_normal); - else if (info->severity == AER_FATAL) - err_handler->error_detected(dev, pci_channel_io_frozen); - } -out: - device_unlock(&dev->dev); - return 0; -} - -static void cxl_rch_handle_error(struct pci_dev *dev, struct aer_err_info *info) -{ - /* - * Internal errors of an RCEC indicate an AER error in an - * RCH's downstream port. Check and handle them in the CXL.mem - * device driver. 
- */ - if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC && - is_internal_error(info)) - pcie_walk_rcec(dev, cxl_rch_handle_error_iter, info); -} - -static int handles_cxl_error_iter(struct pci_dev *dev, void *data) -{ - bool *handles_cxl = data; - - if (!*handles_cxl) - *handles_cxl = is_cxl_mem_dev(dev) && cxl_error_is_native(dev); - - /* Non-zero terminates iteration */ - return *handles_cxl; -} - -static bool handles_cxl_errors(struct pci_dev *rcec) -{ - bool handles_cxl = false; - - if (pci_pcie_type(rcec) == PCI_EXP_TYPE_RC_EC && - pcie_aer_is_native(rcec)) - pcie_walk_rcec(rcec, handles_cxl_error_iter, &handles_cxl); - - return handles_cxl; -} - -static void cxl_rch_enable_rcec(struct pci_dev *rcec) -{ - if (!handles_cxl_errors(rcec)) - return; - - pci_aer_unmask_internal_errors(rcec); - pci_info(rcec, "CXL: Internal errors unmasked"); -} - -#else -static inline void cxl_rch_enable_rcec(struct pci_dev *dev) { } -static inline void cxl_rch_handle_error(struct pci_dev *dev, - struct aer_err_info *info) { } #endif /** @@ -1402,6 +1308,7 @@ int aer_get_device_error_info(struct aer_err_info *info, int i) /* Must reset in this function */ info->status = 0; info->tlp_header_valid = 0; + info->is_cxl = pcie_is_cxl(dev); /* The device might not support AER */ if (!aer) diff --git a/drivers/pci/pcie/aer_cxl_rch.c b/drivers/pci/pcie/aer_cxl_rch.c new file mode 100644 index 000000000000..e471eefec9c4 --- /dev/null +++ b/drivers/pci/pcie/aer_cxl_rch.c @@ -0,0 +1,104 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Copyright(c) 2023 AMD Corporation. All rights reserved. */ + +#include <linux/pci.h> +#include <linux/aer.h> +#include <linux/bitfield.h> +#include "../pci.h" +#include "portdrv.h" + +static bool is_cxl_mem_dev(struct pci_dev *dev) +{ + /* + * The capability, status, and control fields in Device 0, + * Function 0 DVSEC control the CXL functionality of the + * entire device (CXL 3.0, 8.1.3). + */ + if (dev->devfn != PCI_DEVFN(0, 0)) + return false; + + /* + * CXL Memory Devices must have the 502h class code set (CXL + * 3.0, 8.1.12.1). + */ + if ((dev->class >> 8) != PCI_CLASS_MEMORY_CXL) + return false; + + return true; +} + +static bool cxl_error_is_native(struct pci_dev *dev) +{ + struct pci_host_bridge *host = pci_find_host_bridge(dev->bus); + + return (pcie_ports_native || host->native_aer); +} + +static int cxl_rch_handle_error_iter(struct pci_dev *dev, void *data) +{ + struct aer_err_info *info = (struct aer_err_info *)data; + const struct pci_error_handlers *err_handler; + + if (!is_cxl_mem_dev(dev) || !cxl_error_is_native(dev)) + return 0; + + guard(device)(&dev->dev); + + err_handler = dev->driver ? dev->driver->err_handler : NULL; + if (!err_handler) + return 0; + + if (info->severity == AER_CORRECTABLE) { + if (err_handler->cor_error_detected) + err_handler->cor_error_detected(dev); + } else if (err_handler->error_detected) { + if (info->severity == AER_NONFATAL) + err_handler->error_detected(dev, pci_channel_io_normal); + else if (info->severity == AER_FATAL) + err_handler->error_detected(dev, pci_channel_io_frozen); + } + return 0; +} + +void cxl_rch_handle_error(struct pci_dev *dev, struct aer_err_info *info) +{ + /* + * Internal errors of an RCEC indicate an AER error in an + * RCH's downstream port. Check and handle them in the CXL.mem + * device driver. 
+ */ + if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC && + is_aer_internal_error(info)) + pcie_walk_rcec(dev, cxl_rch_handle_error_iter, info); +} + +static int handles_cxl_error_iter(struct pci_dev *dev, void *data) +{ + bool *handles_cxl = data; + + if (!*handles_cxl) + *handles_cxl = is_cxl_mem_dev(dev) && cxl_error_is_native(dev); + + /* Non-zero terminates iteration */ + return *handles_cxl; +} + +static bool handles_cxl_errors(struct pci_dev *rcec) +{ + bool handles_cxl = false; + + if (pci_pcie_type(rcec) == PCI_EXP_TYPE_RC_EC && + pcie_aer_is_native(rcec)) + pcie_walk_rcec(rcec, handles_cxl_error_iter, &handles_cxl); + + return handles_cxl; +} + +void cxl_rch_enable_rcec(struct pci_dev *rcec) +{ + if (!handles_cxl_errors(rcec)) + return; + + pci_aer_unmask_internal_errors(rcec); + pci_info(rcec, "CXL: Internal errors unmasked"); +} diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h index bd29d1cc7b8b..cc58bf2f2c84 100644 --- a/drivers/pci/pcie/portdrv.h +++ b/drivers/pci/pcie/portdrv.h @@ -123,4 +123,16 @@ static inline void pcie_pme_interrupt_enable(struct pci_dev *dev, bool en) {} #endif /* !CONFIG_PCIE_PME */ struct device *pcie_port_find_device(struct pci_dev *dev, u32 service); + +struct aer_err_info; + +#ifdef CONFIG_CXL_RAS +bool is_aer_internal_error(struct aer_err_info *info); +void cxl_rch_handle_error(struct pci_dev *dev, struct aer_err_info *info); +void cxl_rch_enable_rcec(struct pci_dev *rcec); +#else +static inline bool is_aer_internal_error(struct aer_err_info *info) { return false; } +static inline void cxl_rch_handle_error(struct pci_dev *dev, struct aer_err_info *info) { } +static inline void cxl_rch_enable_rcec(struct pci_dev *rcec) { } +#endif /* CONFIG_CXL_RAS */ #endif /* _PORTDRV_H_ */ diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index 7711f579fa1d..2975974f35e8 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -1704,6 +1704,35 @@ static void set_pcie_thunderbolt(struct pci_dev *dev) dev->is_thunderbolt = 1; } +static void set_pcie_cxl(struct pci_dev *dev) +{ + struct pci_dev *bridge; + u16 dvsec, cap; + + if (!pci_is_pcie(dev)) + return; + + /* + * Update parent's CXL state because alternate protocol training + * may have changed + */ + bridge = pci_upstream_bridge(dev); + if (bridge) + set_pcie_cxl(bridge); + + dvsec = pci_find_dvsec_capability(dev, PCI_VENDOR_ID_CXL, + PCI_DVSEC_CXL_FLEXBUS_PORT); + if (!dvsec) + return; + + pci_read_config_word(dev, dvsec + PCI_DVSEC_CXL_FLEXBUS_PORT_STATUS, + &cap); + + dev->is_cxl = FIELD_GET(PCI_DVSEC_CXL_FLEXBUS_PORT_STATUS_CACHE, cap) || + FIELD_GET(PCI_DVSEC_CXL_FLEXBUS_PORT_STATUS_MEM, cap); + +} + static void set_pcie_untrusted(struct pci_dev *dev) { struct pci_dev *parent = pci_upstream_bridge(dev); @@ -2041,6 +2070,8 @@ int pci_setup_device(struct pci_dev *dev) /* Need to have dev->cfg_size ready */ set_pcie_thunderbolt(dev); + set_pcie_cxl(dev); + set_pcie_untrusted(dev); if (pci_is_pcie(dev)) diff --git a/include/linux/aer.h b/include/linux/aer.h index 02940be66324..df0f5c382286 100644 --- a/include/linux/aer.h +++ b/include/linux/aer.h @@ -56,12 +56,14 @@ struct aer_capability_regs { #if defined(CONFIG_PCIEAER) int pci_aer_clear_nonfatal_status(struct pci_dev *dev); int pcie_aer_is_native(struct pci_dev *dev); +void pci_aer_unmask_internal_errors(struct pci_dev *dev); #else static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev) { return -EINVAL; } static inline int pcie_aer_is_native(struct pci_dev *dev) { return 0; } +static inline void 
pci_aer_unmask_internal_errors(struct pci_dev *dev) { } #endif void pci_print_aer(struct pci_dev *dev, int aer_severity, diff --git a/include/linux/ioport.h b/include/linux/ioport.h index 3e0f4c990297..5533a5debf3f 100644 --- a/include/linux/ioport.h +++ b/include/linux/ioport.h @@ -237,6 +237,7 @@ struct resource_constraint { /* PC/ISA/whatever - the normal PC address spaces: IO and memory */ extern struct resource ioport_resource; extern struct resource iomem_resource; +extern struct resource soft_reserve_resource; extern struct resource *request_resource_conflict(struct resource *root, struct resource *new); extern int request_resource(struct resource *root, struct resource *new); @@ -423,6 +424,10 @@ walk_system_ram_res_rev(u64 start, u64 end, void *arg, extern int walk_iomem_res_desc(unsigned long desc, unsigned long flags, u64 start, u64 end, void *arg, int (*func)(struct resource *, void *)); +extern int walk_soft_reserve_res(u64 start, u64 end, void *arg, + int (*func)(struct resource *, void *)); +extern int +region_intersects_soft_reserve(resource_size_t start, size_t size); struct resource *devm_request_free_mem_region(struct device *dev, struct resource *base, unsigned long size); diff --git a/include/linux/pci.h b/include/linux/pci.h index edf792a79193..1c270f1d5123 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -475,6 +475,7 @@ struct pci_dev { unsigned int is_pciehp:1; unsigned int shpc_managed:1; /* SHPC owned by shpchp */ unsigned int is_thunderbolt:1; /* Thunderbolt controller */ + unsigned int is_cxl:1; /* Compute Express Link (CXL) */ /* * Devices marked being untrusted are the ones that can potentially * execute DMA attacks and similar. They are typically connected @@ -804,6 +805,11 @@ static inline bool pci_is_display(struct pci_dev *pdev) return (pdev->class >> 16) == PCI_BASE_CLASS_DISPLAY; } +static inline bool pcie_is_cxl(struct pci_dev *pci_dev) +{ + return pci_dev->is_cxl; +} + #define for_each_pci_bridge(dev, bus) \ list_for_each_entry(dev, &bus->devices, bus_list) \ if (!pci_is_bridge(dev)) {} else diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h index eaecc3c5f772..fdb785fa4613 100644 --- a/include/ras/ras_event.h +++ b/include/ras/ras_event.h @@ -339,9 +339,11 @@ TRACE_EVENT(aer_event, const u32 status, const u8 severity, const u8 tlp_header_valid, - struct pcie_tlp_log *tlp), + struct pcie_tlp_log *tlp, + const char *bus_type), - TP_ARGS(dev_name, status, severity, tlp_header_valid, tlp), + + TP_ARGS(dev_name, status, severity, tlp_header_valid, tlp, bus_type), TP_STRUCT__entry( __string( dev_name, dev_name ) @@ -349,10 +351,12 @@ TRACE_EVENT(aer_event, __field( u8, severity ) __field( u8, tlp_header_valid) __array( u32, tlp_header, PCIE_STD_MAX_TLP_HEADERLOG) + __string( bus_type, bus_type ) ), TP_fast_assign( __assign_str(dev_name); + __assign_str(bus_type); __entry->status = status; __entry->severity = severity; __entry->tlp_header_valid = tlp_header_valid; @@ -364,8 +368,8 @@ TRACE_EVENT(aer_event, } ), - TP_printk("%s PCIe Bus Error: severity=%s, %s, TLP Header=%s\n", - __get_str(dev_name), + TP_printk("%s %s Bus Error: severity=%s, %s, TLP Header=%s\n", + __get_str(dev_name), __get_str(bus_type), __entry->severity == AER_CORRECTABLE ? "Corrected" : __entry->severity == AER_FATAL ? 
"Fatal" : "Uncorrected, non-fatal", diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h index 8be55ece2a21..ec1c54b5a310 100644 --- a/include/uapi/linux/pci_regs.h +++ b/include/uapi/linux/pci_regs.h @@ -1258,11 +1258,6 @@ #define PCI_DEV3_STA 0x0c /* Device 3 Status Register */ #define PCI_DEV3_STA_SEGMENT 0x8 /* Segment Captured (end-to-end flit-mode detected) */ -/* Compute Express Link (CXL r3.1, sec 8.1.5) */ -#define PCI_DVSEC_CXL_PORT 3 -#define PCI_DVSEC_CXL_PORT_CTL 0x0c -#define PCI_DVSEC_CXL_PORT_CTL_UNMASK_SBR 0x00000001 - /* Integrity and Data Encryption Extended Capability */ #define PCI_IDE_CAP 0x04 #define PCI_IDE_CAP_LINK 0x1 /* Link IDE Stream Supported */ @@ -1343,4 +1338,63 @@ #define PCI_IDE_SEL_ADDR_3(x) (28 + (x) * PCI_IDE_SEL_ADDR_BLOCK_SIZE) #define PCI_IDE_SEL_BLOCK_SIZE(nr_assoc) (20 + PCI_IDE_SEL_ADDR_BLOCK_SIZE * (nr_assoc)) +/* + * Compute Express Link (CXL r4.0, sec 8.1) + * + * Note that CXL DVSEC id 3 and 7 to be ignored when the CXL link state + * is "disconnected" (CXL r4.0, sec 9.12.3). Re-enumerate these + * registers on downstream link-up events. + */ + +/* CXL r4.0, 8.1.3: PCIe DVSEC for CXL Device */ +#define PCI_DVSEC_CXL_DEVICE 0 +#define PCI_DVSEC_CXL_CAP 0xA +#define PCI_DVSEC_CXL_MEM_CAPABLE _BITUL(2) +#define PCI_DVSEC_CXL_HDM_COUNT __GENMASK(5, 4) +#define PCI_DVSEC_CXL_CTRL 0xC +#define PCI_DVSEC_CXL_MEM_ENABLE _BITUL(2) +#define PCI_DVSEC_CXL_RANGE_SIZE_HIGH(i) (0x18 + (i * 0x10)) +#define PCI_DVSEC_CXL_RANGE_SIZE_LOW(i) (0x1C + (i * 0x10)) +#define PCI_DVSEC_CXL_MEM_INFO_VALID _BITUL(0) +#define PCI_DVSEC_CXL_MEM_ACTIVE _BITUL(1) +#define PCI_DVSEC_CXL_MEM_SIZE_LOW __GENMASK(31, 28) +#define PCI_DVSEC_CXL_RANGE_BASE_HIGH(i) (0x20 + (i * 0x10)) +#define PCI_DVSEC_CXL_RANGE_BASE_LOW(i) (0x24 + (i * 0x10)) +#define PCI_DVSEC_CXL_MEM_BASE_LOW __GENMASK(31, 28) + +#define CXL_DVSEC_RANGE_MAX 2 + +/* CXL r4.0, 8.1.4: Non-CXL Function Map DVSEC */ +#define PCI_DVSEC_CXL_FUNCTION_MAP 2 + +/* CXL r4.0, 8.1.5: Extensions DVSEC for Ports */ +#define PCI_DVSEC_CXL_PORT 3 +#define PCI_DVSEC_CXL_PORT_CTL 0x0c +#define PCI_DVSEC_CXL_PORT_CTL_UNMASK_SBR 0x00000001 + +/* CXL r4.0, 8.1.6: GPF DVSEC for CXL Port */ +#define PCI_DVSEC_CXL_PORT_GPF 4 +#define PCI_DVSEC_CXL_PORT_GPF_PHASE_1_CONTROL 0x0C +#define PCI_DVSEC_CXL_PORT_GPF_PHASE_1_TMO_BASE __GENMASK(3, 0) +#define PCI_DVSEC_CXL_PORT_GPF_PHASE_1_TMO_SCALE __GENMASK(11, 8) +#define PCI_DVSEC_CXL_PORT_GPF_PHASE_2_CONTROL 0xE +#define PCI_DVSEC_CXL_PORT_GPF_PHASE_2_TMO_BASE __GENMASK(3, 0) +#define PCI_DVSEC_CXL_PORT_GPF_PHASE_2_TMO_SCALE __GENMASK(11, 8) + +/* CXL r4.0, 8.1.7: GPF DVSEC for CXL Device */ +#define PCI_DVSEC_CXL_DEVICE_GPF 5 + +/* CXL r4.0, 8.1.8: Flex Bus DVSEC */ +#define PCI_DVSEC_CXL_FLEXBUS_PORT 7 +#define PCI_DVSEC_CXL_FLEXBUS_PORT_STATUS 0xE +#define PCI_DVSEC_CXL_FLEXBUS_PORT_STATUS_CACHE _BITUL(0) +#define PCI_DVSEC_CXL_FLEXBUS_PORT_STATUS_MEM _BITUL(2) + +/* CXL r4.0, 8.1.9: Register Locator DVSEC */ +#define PCI_DVSEC_CXL_REG_LOCATOR 8 +#define PCI_DVSEC_CXL_REG_LOCATOR_BLOCK1 0xC +#define PCI_DVSEC_CXL_REG_LOCATOR_BIR __GENMASK(2, 0) +#define PCI_DVSEC_CXL_REG_LOCATOR_BLOCK_ID __GENMASK(15, 8) +#define PCI_DVSEC_CXL_REG_LOCATOR_BLOCK_OFF_LOW __GENMASK(31, 16) + #endif /* LINUX_PCI_REGS_H */ diff --git a/kernel/resource.c b/kernel/resource.c index c5f03ac78e44..31341bdd7707 100644 --- a/kernel/resource.c +++ b/kernel/resource.c @@ -48,6 +48,14 @@ struct resource iomem_resource = { }; EXPORT_SYMBOL(iomem_resource); +struct resource soft_reserve_resource = 
{ + .name = "Soft Reserved", + .start = 0, + .end = -1, + .desc = IORES_DESC_SOFT_RESERVED, + .flags = IORESOURCE_MEM, +}; + static DEFINE_RWLOCK(resource_lock); /* @@ -321,13 +329,14 @@ static bool is_type_match(struct resource *p, unsigned long flags, unsigned long } /** - * find_next_iomem_res - Finds the lowest iomem resource that covers part of - * [@start..@end]. + * find_next_res - Finds the lowest resource that covers part of + * [@start..@end]. * * If a resource is found, returns 0 and @*res is overwritten with the part * of the resource that's within [@start..@end]; if none is found, returns * -ENODEV. Returns -EINVAL for invalid parameters. * + * @parent: resource tree root to search * @start: start address of the resource searched for * @end: end address of same resource * @flags: flags which the resource must have @@ -337,9 +346,9 @@ static bool is_type_match(struct resource *p, unsigned long flags, unsigned long * The caller must specify @start, @end, @flags, and @desc * (which may be IORES_DESC_NONE). */ -static int find_next_iomem_res(resource_size_t start, resource_size_t end, - unsigned long flags, unsigned long desc, - struct resource *res) +static int find_next_res(struct resource *parent, resource_size_t start, + resource_size_t end, unsigned long flags, + unsigned long desc, struct resource *res) { /* Skip children until we find a top level range that matches */ bool skip_children = true; @@ -353,7 +362,7 @@ static int find_next_iomem_res(resource_size_t start, resource_size_t end, read_lock(&resource_lock); - for_each_resource(&iomem_resource, p, skip_children) { + for_each_resource(parent, p, skip_children) { /* If we passed the resource we are looking for, stop */ if (p->start > end) { p = NULL; @@ -390,16 +399,23 @@ static int find_next_iomem_res(resource_size_t start, resource_size_t end, return p ? 0 : -ENODEV; } -static int __walk_iomem_res_desc(resource_size_t start, resource_size_t end, - unsigned long flags, unsigned long desc, - void *arg, - int (*func)(struct resource *, void *)) +static int find_next_iomem_res(resource_size_t start, resource_size_t end, + unsigned long flags, unsigned long desc, + struct resource *res) +{ + return find_next_res(&iomem_resource, start, end, flags, desc, res); +} + +static int walk_res_desc(struct resource *parent, resource_size_t start, + resource_size_t end, unsigned long flags, + unsigned long desc, void *arg, + int (*func)(struct resource *, void *)) { struct resource res; int ret = -EINVAL; while (start < end && - !find_next_iomem_res(start, end, flags, desc, &res)) { + !find_next_res(parent, start, end, flags, desc, &res)) { ret = (*func)(&res, arg); if (ret) break; @@ -410,6 +426,15 @@ static int __walk_iomem_res_desc(resource_size_t start, resource_size_t end, return ret; } +static int __walk_iomem_res_desc(resource_size_t start, resource_size_t end, + unsigned long flags, unsigned long desc, + void *arg, + int (*func)(struct resource *, void *)) +{ + return walk_res_desc(&iomem_resource, start, end, flags, desc, arg, func); +} + + /** * walk_iomem_res_desc - Walks through iomem resources and calls func() * with matching resource ranges. @@ -435,6 +460,18 @@ int walk_iomem_res_desc(unsigned long desc, unsigned long flags, u64 start, EXPORT_SYMBOL_GPL(walk_iomem_res_desc); /* + * In support of device drivers claiming Soft Reserved resources, walk the Soft + * Reserved resource deferral tree. 
+ */ +int walk_soft_reserve_res(u64 start, u64 end, void *arg, + int (*func)(struct resource *, void *)) +{ + return walk_res_desc(&soft_reserve_resource, start, end, IORESOURCE_MEM, + IORES_DESC_SOFT_RESERVED, arg, func); +} +EXPORT_SYMBOL_GPL(walk_soft_reserve_res); + +/* * This function calls the @func callback against all memory ranges of type * System RAM which are marked as IORESOURCE_SYSTEM_RAM and IORESOUCE_BUSY. * Now, this function is only for System RAM, it deals with full ranges and @@ -656,6 +693,18 @@ int region_intersects(resource_size_t start, size_t size, unsigned long flags, } EXPORT_SYMBOL_GPL(region_intersects); +/* + * Check if the provided range is registered in the Soft Reserved resource + * deferral tree for driver consideration. + */ +int region_intersects_soft_reserve(resource_size_t start, size_t size) +{ + guard(read_lock)(&resource_lock); + return __region_intersects(&soft_reserve_resource, start, size, + IORESOURCE_MEM, IORES_DESC_SOFT_RESERVED); +} +EXPORT_SYMBOL_GPL(region_intersects_soft_reserve); + void __weak arch_remove_reservations(struct resource *avail) { } diff --git a/tools/testing/cxl/Kbuild b/tools/testing/cxl/Kbuild index 0e151d0572d1..53d84a6874b7 100644 --- a/tools/testing/cxl/Kbuild +++ b/tools/testing/cxl/Kbuild @@ -7,9 +7,10 @@ ldflags-y += --wrap=nvdimm_bus_register ldflags-y += --wrap=cxl_await_media_ready ldflags-y += --wrap=devm_cxl_add_rch_dport ldflags-y += --wrap=cxl_endpoint_parse_cdat -ldflags-y += --wrap=cxl_dport_init_ras_reporting ldflags-y += --wrap=devm_cxl_endpoint_decoders_setup ldflags-y += --wrap=hmat_get_extended_linear_cache_size +ldflags-y += --wrap=devm_cxl_add_dport_by_dev +ldflags-y += --wrap=devm_cxl_switch_port_decoders_setup DRIVERS := ../../../drivers CXL_SRC := $(DRIVERS)/cxl @@ -57,12 +58,14 @@ cxl_core-y += $(CXL_CORE_SRC)/pci.o cxl_core-y += $(CXL_CORE_SRC)/hdm.o cxl_core-y += $(CXL_CORE_SRC)/pmu.o cxl_core-y += $(CXL_CORE_SRC)/cdat.o -cxl_core-y += $(CXL_CORE_SRC)/ras.o cxl_core-$(CONFIG_TRACING) += $(CXL_CORE_SRC)/trace.o cxl_core-$(CONFIG_CXL_REGION) += $(CXL_CORE_SRC)/region.o cxl_core-$(CONFIG_CXL_MCE) += $(CXL_CORE_SRC)/mce.o cxl_core-$(CONFIG_CXL_FEATURES) += $(CXL_CORE_SRC)/features.o cxl_core-$(CONFIG_CXL_EDAC_MEM_FEATURES) += $(CXL_CORE_SRC)/edac.o +cxl_core-$(CONFIG_CXL_RAS) += $(CXL_CORE_SRC)/ras.o +cxl_core-$(CONFIG_CXL_RAS) += $(CXL_CORE_SRC)/ras_rch.o +cxl_core-$(CONFIG_CXL_ATL) += $(CXL_CORE_SRC)/atl.o cxl_core-y += config_check.o cxl_core-y += cxl_core_test.o cxl_core-y += cxl_core_exports.o diff --git a/tools/testing/cxl/cxl_core_exports.c b/tools/testing/cxl/cxl_core_exports.c index 6754de35598d..f088792a8925 100644 --- a/tools/testing/cxl/cxl_core_exports.c +++ b/tools/testing/cxl/cxl_core_exports.c @@ -2,28 +2,6 @@ /* Copyright(c) 2022 Intel Corporation. All rights reserved. 
*/ #include "cxl.h" -#include "exports.h" /* Exporting of cxl_core symbols that are only used by cxl_test */ EXPORT_SYMBOL_NS_GPL(cxl_num_decoders_committed, "CXL"); - -cxl_add_dport_by_dev_fn _devm_cxl_add_dport_by_dev = - __devm_cxl_add_dport_by_dev; -EXPORT_SYMBOL_NS_GPL(_devm_cxl_add_dport_by_dev, "CXL"); - -struct cxl_dport *devm_cxl_add_dport_by_dev(struct cxl_port *port, - struct device *dport_dev) -{ - return _devm_cxl_add_dport_by_dev(port, dport_dev); -} -EXPORT_SYMBOL_NS_GPL(devm_cxl_add_dport_by_dev, "CXL"); - -cxl_switch_decoders_setup_fn _devm_cxl_switch_port_decoders_setup = - __devm_cxl_switch_port_decoders_setup; -EXPORT_SYMBOL_NS_GPL(_devm_cxl_switch_port_decoders_setup, "CXL"); - -int devm_cxl_switch_port_decoders_setup(struct cxl_port *port) -{ - return _devm_cxl_switch_port_decoders_setup(port); -} -EXPORT_SYMBOL_NS_GPL(devm_cxl_switch_port_decoders_setup, "CXL"); diff --git a/tools/testing/cxl/exports.h b/tools/testing/cxl/exports.h deleted file mode 100644 index 7ebee7c0bd67..000000000000 --- a/tools/testing/cxl/exports.h +++ /dev/null @@ -1,13 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -/* Copyright(c) 2025 Intel Corporation */ -#ifndef __MOCK_CXL_EXPORTS_H_ -#define __MOCK_CXL_EXPORTS_H_ - -typedef struct cxl_dport *(*cxl_add_dport_by_dev_fn)(struct cxl_port *port, - struct device *dport_dev); -extern cxl_add_dport_by_dev_fn _devm_cxl_add_dport_by_dev; - -typedef int(*cxl_switch_decoders_setup_fn)(struct cxl_port *port); -extern cxl_switch_decoders_setup_fn _devm_cxl_switch_port_decoders_setup; - -#endif diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c index 176dcde570cd..cb87e8c0e63c 100644 --- a/tools/testing/cxl/test/mem.c +++ b/tools/testing/cxl/test/mem.c @@ -1767,7 +1767,7 @@ static int cxl_mock_mem_probe(struct platform_device *pdev) cxl_mock_add_event_logs(&mdata->mes); - cxlmd = devm_cxl_add_memdev(&pdev->dev, cxlds); + cxlmd = devm_cxl_add_memdev(cxlds, NULL); if (IS_ERR(cxlmd)) return PTR_ERR(cxlmd); diff --git a/tools/testing/cxl/test/mock.c b/tools/testing/cxl/test/mock.c index 44bce80ef3ff..b8fcb50c1027 100644 --- a/tools/testing/cxl/test/mock.c +++ b/tools/testing/cxl/test/mock.c @@ -10,21 +10,12 @@ #include <cxlmem.h> #include <cxlpci.h> #include "mock.h" -#include "../exports.h" static LIST_HEAD(mock); -static struct cxl_dport * -redirect_devm_cxl_add_dport_by_dev(struct cxl_port *port, - struct device *dport_dev); -static int redirect_devm_cxl_switch_port_decoders_setup(struct cxl_port *port); - void register_cxl_mock_ops(struct cxl_mock_ops *ops) { list_add_rcu(&ops->list, &mock); - _devm_cxl_add_dport_by_dev = redirect_devm_cxl_add_dport_by_dev; - _devm_cxl_switch_port_decoders_setup = - redirect_devm_cxl_switch_port_decoders_setup; } EXPORT_SYMBOL_GPL(register_cxl_mock_ops); @@ -32,9 +23,6 @@ DEFINE_STATIC_SRCU(cxl_mock_srcu); void unregister_cxl_mock_ops(struct cxl_mock_ops *ops) { - _devm_cxl_switch_port_decoders_setup = - __devm_cxl_switch_port_decoders_setup; - _devm_cxl_add_dport_by_dev = __devm_cxl_add_dport_by_dev; list_del_rcu(&ops->list); synchronize_srcu(&cxl_mock_srcu); } @@ -163,7 +151,7 @@ __wrap_nvdimm_bus_register(struct device *dev, } EXPORT_SYMBOL_GPL(__wrap_nvdimm_bus_register); -int redirect_devm_cxl_switch_port_decoders_setup(struct cxl_port *port) +int __wrap_devm_cxl_switch_port_decoders_setup(struct cxl_port *port) { int rc, index; struct cxl_mock_ops *ops = get_cxl_mock_ops(&index); @@ -171,11 +159,12 @@ int redirect_devm_cxl_switch_port_decoders_setup(struct cxl_port *port) if (ops && 
ops->is_mock_port(port->uport_dev)) rc = ops->devm_cxl_switch_port_decoders_setup(port); else - rc = __devm_cxl_switch_port_decoders_setup(port); + rc = devm_cxl_switch_port_decoders_setup(port); put_cxl_mock_ops(index); return rc; } +EXPORT_SYMBOL_NS_GPL(__wrap_devm_cxl_switch_port_decoders_setup, "CXL"); int __wrap_devm_cxl_endpoint_decoders_setup(struct cxl_port *port) { @@ -245,20 +234,8 @@ void __wrap_cxl_endpoint_parse_cdat(struct cxl_port *port) } EXPORT_SYMBOL_NS_GPL(__wrap_cxl_endpoint_parse_cdat, "CXL"); -void __wrap_cxl_dport_init_ras_reporting(struct cxl_dport *dport, struct device *host) -{ - int index; - struct cxl_mock_ops *ops = get_cxl_mock_ops(&index); - - if (!ops || !ops->is_mock_port(dport->dport_dev)) - cxl_dport_init_ras_reporting(dport, host); - - put_cxl_mock_ops(index); -} -EXPORT_SYMBOL_NS_GPL(__wrap_cxl_dport_init_ras_reporting, "CXL"); - -struct cxl_dport *redirect_devm_cxl_add_dport_by_dev(struct cxl_port *port, - struct device *dport_dev) +struct cxl_dport *__wrap_devm_cxl_add_dport_by_dev(struct cxl_port *port, + struct device *dport_dev) { int index; struct cxl_mock_ops *ops = get_cxl_mock_ops(&index); @@ -267,11 +244,12 @@ struct cxl_dport *redirect_devm_cxl_add_dport_by_dev(struct cxl_port *port, if (ops && ops->is_mock_port(port->uport_dev)) dport = ops->devm_cxl_add_dport_by_dev(port, dport_dev); else - dport = __devm_cxl_add_dport_by_dev(port, dport_dev); + dport = devm_cxl_add_dport_by_dev(port, dport_dev); put_cxl_mock_ops(index); return dport; } +EXPORT_SYMBOL_NS_GPL(__wrap_devm_cxl_add_dport_by_dev, "CXL"); MODULE_LICENSE("GPL v2"); MODULE_DESCRIPTION("cxl_test: emulation module"); |
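Editor's note: a minimal, hypothetical sketch of how a driver might consume the relocated CXL DVSEC definitions from the pci_regs.h hunk above. pci_find_dvsec_capability() and pci_read_config_word() are existing PCI core APIs; the function name and the local 0x1e98 vendor-ID stand-in are illustrative only and not part of this series.

/*
 * Illustration only: check whether a PCI function advertises CXL.mem
 * capability via the "PCIe DVSEC for CXL Device" (DVSEC id 0).
 * EXAMPLE_CXL_DVSEC_VENDOR_ID is a local stand-in for the CXL DVSEC
 * vendor ID (0x1e98); the in-tree CXL drivers carry their own constant.
 */
#include <linux/pci.h>
#include <uapi/linux/pci_regs.h>

#define EXAMPLE_CXL_DVSEC_VENDOR_ID	0x1e98

static bool example_cxl_mem_capable(struct pci_dev *pdev)
{
	u16 dvsec, cap;

	/* Locate the CXL Device DVSEC extended capability */
	dvsec = pci_find_dvsec_capability(pdev, EXAMPLE_CXL_DVSEC_VENDOR_ID,
					  PCI_DVSEC_CXL_DEVICE);
	if (!dvsec)
		return false;

	/* DVSEC CXL Capability register lives at offset 0xA (PCI_DVSEC_CXL_CAP) */
	if (pci_read_config_word(pdev, dvsec + PCI_DVSEC_CXL_CAP, &cap))
		return false;

	return cap & PCI_DVSEC_CXL_MEM_CAPABLE;
}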
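Editor's note: a sketch of how the new Soft Reserved deferral-tree helpers added to kernel/resource.c might be consumed by a memory driver. The callback and the scan helper below are hypothetical, and the inclusive-end calling convention is assumed to follow the existing walk_iomem_res_desc() style.

/*
 * Illustration only: scan a window for deferred Soft Reserved ranges.
 * Assumes walk_soft_reserve_res() and region_intersects_soft_reserve()
 * as introduced in the resource.c hunk above.
 */
#include <linux/ioport.h>
#include <linux/printk.h>

static int soft_reserve_claim_one(struct resource *res, void *arg)
{
	/* @res is clipped to the window handed to the walker */
	pr_info("soft reserved candidate: %pR\n", res);
	return 0;	/* non-zero would stop the walk */
}

static void soft_reserve_scan(resource_size_t start, resource_size_t size)
{
	/* Fast check: does any part of the window sit in the deferral tree? */
	if (region_intersects_soft_reserve(start, size) == REGION_DISJOINT)
		return;

	/* Visit each deferred Soft Reserved chunk within the window */
	walk_soft_reserve_res(start, start + size - 1, NULL,
			      soft_reserve_claim_one);
}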
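Editor's note: the Kbuild and mock.c hunks replace the exports.h function-pointer redirection with the linker's --wrap mechanism: undefined references to a wrapped symbol resolve to __wrap_<symbol>, while __real_<symbol> still reaches the original definition. A standalone two-file illustration, with names invented for the example (nothing CXL-specific):

/* answer.c */
int get_answer(void)
{
	return 42;
}

/* main.c - build with: cc -c answer.c && cc main.c answer.o -Wl,--wrap=get_answer */
#include <stdio.h>

int get_answer(void);
int __real_get_answer(void);

/* Calls to get_answer() from other objects land here instead */
int __wrap_get_answer(void)
{
	/* intercept, then fall through to the real symbol */
	return __real_get_answer() + 1;
}

int main(void)
{
	printf("%d\n", get_answer());	/* prints 43 */
	return 0;
}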
