summaryrefslogtreecommitdiff
path: root/Documentation/driver-api
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/driver-api')
-rw-r--r--Documentation/driver-api/hw-recoverable-errors.rst60
-rw-r--r--Documentation/driver-api/index.rst1
-rw-r--r--Documentation/driver-api/nvdimm/btt.rst2
-rw-r--r--Documentation/driver-api/pci/index.rst1
-rw-r--r--Documentation/driver-api/pci/tsm.rst21
5 files changed, 84 insertions, 1 deletions
diff --git a/Documentation/driver-api/hw-recoverable-errors.rst b/Documentation/driver-api/hw-recoverable-errors.rst
new file mode 100644
index 000000000000..fc526c3454bd
--- /dev/null
+++ b/Documentation/driver-api/hw-recoverable-errors.rst
@@ -0,0 +1,60 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=================================================
+Recoverable Hardware Error Tracking in vmcoreinfo
+=================================================
+
+Overview
+--------
+
+This feature provides a generic infrastructure within the Linux kernel to track
+and log recoverable hardware errors. These are hardware recoverable errors
+visible that might not cause immediate panics but may influence health, mainly
+because new code path will be executed in the kernel.
+
+By recording counts and timestamps of recoverable errors into the vmcoreinfo
+crash dump notes, this infrastructure aids post-mortem crash analysis tools in
+correlating hardware events with kernel failures. This enables faster triage
+and better understanding of root causes, especially in large-scale cloud
+environments where hardware issues are common.
+
+Benefits
+--------
+
+- Facilitates correlation of hardware recoverable errors with kernel panics or
+ unusual code paths that lead to system crashes.
+- Provides operators and cloud providers quick insights, improving reliability
+ and reducing troubleshooting time.
+- Complements existing full hardware diagnostics without replacing them.
+
+Data Exposure and Consumption
+-----------------------------
+
+- The tracked error data consists of per-error-type counts and timestamps of
+ last occurrence.
+- This data is stored in the `hwerror_data` array, categorized by error source
+ types like CPU, memory, PCI, CXL, and others.
+- It is exposed via vmcoreinfo crash dump notes and can be read using tools
+ like `crash`, `drgn`, or other kernel crash analysis utilities.
+- There is no other way to read these data other than from crash dumps.
+- These errors are divided by area, which includes CPU, Memory, PCI, CXL and
+ others.
+
+Typical usage example (in drgn REPL):
+
+.. code-block:: python
+
+ >>> prog['hwerror_data']
+ (struct hwerror_info[HWERR_RECOV_MAX]){
+ {
+ .count = (int)844,
+ .timestamp = (time64_t)1752852018,
+ },
+ ...
+ }
+
+Enabling
+--------
+
+- This feature is enabled when CONFIG_VMCORE_INFO is set.
+
diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst
index baff96b5cf0b..1833e6a0687e 100644
--- a/Documentation/driver-api/index.rst
+++ b/Documentation/driver-api/index.rst
@@ -97,6 +97,7 @@ Subsystem-specific APIs
gpio/index
hsi
hte/index
+ hw-recoverable-errors
i2c
iio/index
infiniband
diff --git a/Documentation/driver-api/nvdimm/btt.rst b/Documentation/driver-api/nvdimm/btt.rst
index 107395c042ae..2d8269f834bd 100644
--- a/Documentation/driver-api/nvdimm/btt.rst
+++ b/Documentation/driver-api/nvdimm/btt.rst
@@ -83,7 +83,7 @@ flags, and the remaining form the internal block number.
======== =============================================================
Bit Description
======== =============================================================
-31 - 30 Error and Zero flags - Used in the following way::
+31 - 30 Error and Zero flags - Used in the following way:
== == ====================================================
31 30 Description
diff --git a/Documentation/driver-api/pci/index.rst b/Documentation/driver-api/pci/index.rst
index a38e475cdbe3..9e1b801d0f74 100644
--- a/Documentation/driver-api/pci/index.rst
+++ b/Documentation/driver-api/pci/index.rst
@@ -10,6 +10,7 @@ The Linux PCI driver implementer's API guide
pci
p2pdma
+ tsm
.. only:: subproject and html
diff --git a/Documentation/driver-api/pci/tsm.rst b/Documentation/driver-api/pci/tsm.rst
new file mode 100644
index 000000000000..232b92bec93f
--- /dev/null
+++ b/Documentation/driver-api/pci/tsm.rst
@@ -0,0 +1,21 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: <isonum.txt>
+
+========================================================
+PCI Trusted Execution Environment Security Manager (TSM)
+========================================================
+
+Subsystem Interfaces
+====================
+
+.. kernel-doc:: include/linux/pci-ide.h
+ :internal:
+
+.. kernel-doc:: drivers/pci/ide.c
+ :export:
+
+.. kernel-doc:: include/linux/pci-tsm.h
+ :internal:
+
+.. kernel-doc:: drivers/pci/tsm.c
+ :export: