<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/drivers/edac, branch v4.14.242</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.14.242</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.14.242'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2020-12-29T12:47:06Z</updated>
<entry>
<title>EDAC/amd64: Fix PCI component registration</title>
<updated>2020-12-29T12:47:06Z</updated>
<author>
<name>Borislav Petkov</name>
<email>bp@suse.de</email>
</author>
<published>2020-11-22T14:57:21Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=68a850b64d72f20f81c5be713cdc81fbcd4c27a3'/>
<id>urn:sha1:68a850b64d72f20f81c5be713cdc81fbcd4c27a3</id>
<content type='text'>
commit 706657b1febf446a9ba37dc51b89f46604f57ee9 upstream.

In order to setup its PCI component, the driver needs any node private
instance in order to get a reference to the PCI device and hand that
into edac_pci_create_generic_ctl(). For convenience, it uses the 0th
memory controller descriptor under the assumption that if any, the 0th
will be always present.

However, this assumption goes wrong when the 0th node doesn't have
memory and the driver doesn't initialize an instance for it:

  EDAC amd64: F17h detected (node 0).
  ...
  EDAC amd64: Node 0: No DIMMs detected.

But looking up node instances is not really needed - all one needs is
the pointer to the proper device which gets discovered during instance
init.

So stash that pointer into a variable and use it when setting up the
EDAC PCI component.

Clear that variable when the driver needs to unwind due to some
instances failing init to avoid any registration imbalance.

Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Link: https://lkml.kernel.org/r/20201122150815.13808-1-bp@alien8.de
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>EDAC/i5100: Fix error handling order in i5100_init_one()</title>
<updated>2020-10-29T08:07:00Z</updated>
<author>
<name>Dinghao Liu</name>
<email>dinghao.liu@zju.edu.cn</email>
</author>
<published>2020-08-26T12:14:37Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4024c42c6bfc3cc9ab7b18060ae16d69d4a842f9'/>
<id>urn:sha1:4024c42c6bfc3cc9ab7b18060ae16d69d4a842f9</id>
<content type='text'>
[ Upstream commit 857a3139bd8be4f702c030c8ca06f3fd69c1741a ]

When pci_get_device_func() fails, the driver doesn't need to execute
pci_dev_put(). mci should still be freed, though, to prevent a memory
leak. When pci_enable_device() fails, the error injection PCI device
"einj" doesn't need to be disabled either.

 [ bp: Massage commit message, rename label to "bail_mc_free". ]

Fixes: 52608ba205461 ("i5100_edac: probe for device 19 function 0")
Signed-off-by: Dinghao Liu &lt;dinghao.liu@zju.edu.cn&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Link: https://lkml.kernel.org/r/20200826121437.31606-1-dinghao.liu@zju.edu.cn
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>EDAC/ie31200: Fallback if host bridge device is already initialized</title>
<updated>2020-09-03T09:22:27Z</updated>
<author>
<name>Jason Baron</name>
<email>jbaron@akamai.com</email>
</author>
<published>2020-07-16T18:25:11Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=332f8c222590ef5a8fa057ff77ed436bfd4a7fe9'/>
<id>urn:sha1:332f8c222590ef5a8fa057ff77ed436bfd4a7fe9</id>
<content type='text'>
[ Upstream commit 709ed1bcef12398ac1a35c149f3e582db04456c2 ]

The Intel uncore driver may claim some of the pci ids from ie31200 which
means that the ie31200 edac driver will not initialize them as part of
pci_register_driver().

Let's add a fallback for this case to 'pci_get_device()' to get a
reference on the device such that it can still be configured. This is
similar in approach to other edac drivers.

Signed-off-by: Jason Baron &lt;jbaron@akamai.com&gt;
Cc: Borislav Petkov &lt;bp@suse.de&gt;
Cc: Mauro Carvalho Chehab &lt;mchehab@kernel.org&gt;
Cc: linux-edac &lt;linux-edac@vger.kernel.org&gt;
Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Link: https://lore.kernel.org/r/1594923911-10885-1-git-send-email-jbaron@akamai.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>EDAC: Fix reference count leaks</title>
<updated>2020-08-21T07:48:03Z</updated>
<author>
<name>Qiushi Wu</name>
<email>wu000273@umn.edu</email>
</author>
<published>2020-05-28T20:22:37Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=46f7ec5fe967f8cb8e5ce479fa23e5476bbee2c1'/>
<id>urn:sha1:46f7ec5fe967f8cb8e5ce479fa23e5476bbee2c1</id>
<content type='text'>
[ Upstream commit 17ed808ad243192fb923e4e653c1338d3ba06207 ]

When kobject_init_and_add() returns an error, it should be handled
because kobject_init_and_add() takes a reference even when it fails. If
this function returns an error, kobject_put() must be called to properly
clean up the memory associated with the object.

Therefore, replace calling kfree() and call kobject_put() and add a
missing kobject_put() in the edac_device_register_sysfs_main_kobj()
error path.

 [ bp: Massage and merge into a single patch. ]

Fixes: b2ed215a3338 ("Kobject: change drivers/edac to use kobject_init_and_add")
Signed-off-by: Qiushi Wu &lt;wu000273@umn.edu&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Link: https://lkml.kernel.org/r/20200528202238.18078-1-wu000273@umn.edu
Link: https://lkml.kernel.org/r/20200528203526.20908-1-wu000273@umn.edu
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>EDAC/amd64: Read back the scrub rate PCI register on F15h</title>
<updated>2020-07-09T07:36:30Z</updated>
<author>
<name>Borislav Petkov</name>
<email>bp@suse.de</email>
</author>
<published>2020-06-18T18:25:25Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3ecfc0fdaea1ec318583add149ca34b173ad1c6f'/>
<id>urn:sha1:3ecfc0fdaea1ec318583add149ca34b173ad1c6f</id>
<content type='text'>
[ Upstream commit ee470bb25d0dcdf126f586ec0ae6dca66cb340a4 ]

Commit:

  da92110dfdfa ("EDAC, amd64_edac: Extend scrub rate support to F15hM60h")

added support for F15h, model 0x60 CPUs but in doing so, missed to read
back SCRCTRL PCI config register on F15h CPUs which are *not* model
0x60. Add that read so that doing

  $ cat /sys/devices/system/edac/mc/mc0/sdram_scrub_rate

can show the previously set DRAM scrub rate.

Fixes: da92110dfdfa ("EDAC, amd64_edac: Extend scrub rate support to F15hM60h")
Reported-by: Anders Andersson &lt;pipatron@gmail.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Cc: &lt;stable@vger.kernel.org&gt; #v4.4..
Link: https://lkml.kernel.org/r/CAKkunMbNWppx_i6xSdDHLseA2QQmGJqj_crY=NF-GZML5np4Vw@mail.gmail.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>EDAC/amd64: Set grain per DIMM</title>
<updated>2020-03-11T17:02:56Z</updated>
<author>
<name>Yazen Ghannam</name>
<email>yazen.ghannam@amd.com</email>
</author>
<published>2020-03-05T16:30:07Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5b5295b5c60d6048db2112f4bb691c9cf97631f0'/>
<id>urn:sha1:5b5295b5c60d6048db2112f4bb691c9cf97631f0</id>
<content type='text'>
[ Upstream commit 466503d6b1b33be46ab87c6090f0ade6c6011cbc ]

The following commit introduced a warning on error reports without a
non-zero grain value.

  3724ace582d9 ("EDAC/mc: Fix grain_bits calculation")

The amd64_edac_mod module does not provide a value, so the warning will
be given on the first reported memory error.

Set the grain per DIMM to cacheline size (64 bytes). This is the current
recommendation.

Fixes: 3724ace582d9 ("EDAC/mc: Fix grain_bits calculation")
Signed-off-by: Yazen Ghannam &lt;yazen.ghannam@amd.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Cc: "linux-edac@vger.kernel.org" &lt;linux-edac@vger.kernel.org&gt;
Cc: James Morse &lt;james.morse@arm.com&gt;
Cc: Mauro Carvalho Chehab &lt;mchehab@kernel.org&gt;
Cc: Robert Richter &lt;rrichter@marvell.com&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Link: https://lkml.kernel.org/r/20191022203448.13962-7-Yazen.Ghannam@amd.com
[jwang: backport to 4.14 for fix warning during memory error. ]
Signed-off-by: Jack Wang &lt;jinpu.wang@cloud.ionos.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>EDAC/mc: Fix edac_mc_find() in case no device is found</title>
<updated>2020-01-27T13:46:34Z</updated>
<author>
<name>Robert Richter</name>
<email>rrichter@marvell.com</email>
</author>
<published>2019-05-14T10:49:09Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=857247a21385e45dbffbc87b95f3761faa04fa81'/>
<id>urn:sha1:857247a21385e45dbffbc87b95f3761faa04fa81</id>
<content type='text'>
[ Upstream commit 29a0c843973bc385918158c6976e4dbe891df969 ]

The function should return NULL in case no device is found, but it
always returns the last checked mc device from the list even if the
index did not match. Fix that.

I did some analysis why this did not raise any issues for about 3 years
and the reason is that edac_mc_find() is mostly used to search for
existing devices. Thus, the bug is not triggered.

 [ bp: Drop the if (mci-&gt;mc_idx &gt; idx) test in favor of readability. ]

Fixes: c73e8833bec5 ("EDAC, mc: Fix locking around mc_devices list")
Signed-off-by: Robert Richter &lt;rrichter@marvell.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Cc: "linux-edac@vger.kernel.org" &lt;linux-edac@vger.kernel.org&gt;
Cc: James Morse &lt;james.morse@arm.com&gt;
Cc: Mauro Carvalho Chehab &lt;mchehab@kernel.org&gt;
Link: https://lkml.kernel.org/r/20190514104838.15065-1-rrichter@marvell.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>EDAC/ghes: Fix grain calculation</title>
<updated>2019-12-31T11:37:36Z</updated>
<author>
<name>Robert Richter</name>
<email>rrichter@marvell.com</email>
</author>
<published>2019-11-06T09:33:23Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b5c1ddeae6945b19cd7728f4dfcefa7fae3e4a40'/>
<id>urn:sha1:b5c1ddeae6945b19cd7728f4dfcefa7fae3e4a40</id>
<content type='text'>
[ Upstream commit 7088e29e0423d3195e09079b4f849ec4837e5a75 ]

The current code to convert a physical address mask to a grain
(defined as granularity in bytes) is:

	e-&gt;grain = ~(mem_err-&gt;physical_addr_mask &amp; ~PAGE_MASK);

This is broken in several ways:

1) It calculates to wrong grain values. E.g., a physical address mask
of ~0xfff should give a grain of 0x1000. Without considering
PAGE_MASK, there is an off-by-one. Things are worse when also
filtering it with ~PAGE_MASK. This will calculate to a grain with the
upper bits set. In the example it even calculates to ~0.

2) The grain does not depend on and is unrelated to the kernel's
page-size. The page-size only matters when unmapping memory in
memory_failure(). Smaller grains are wrongly rounded up to the
page-size, on architectures with a configurable page-size (e.g. arm64)
this could round up to the even bigger page-size of the hypervisor.

Fix this with:

	e-&gt;grain = ~mem_err-&gt;physical_addr_mask + 1;

The grain_bits are defined as:

	grain = 1 &lt;&lt; grain_bits;

Change also the grain_bits calculation accordingly, it is the same
formula as in edac_mc.c now and the code can be unified.

The value in -&gt;physical_addr_mask coming from firmware is assumed to
be contiguous, but this is not sanity-checked. However, in case the
mask is non-contiguous, a conversion to grain_bits effectively
converts the grain bit mask to a power of 2 by rounding it up.

Suggested-by: James Morse &lt;james.morse@arm.com&gt;
Signed-off-by: Robert Richter &lt;rrichter@marvell.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Reviewed-by: Mauro Carvalho Chehab &lt;mchehab+samsung@kernel.org&gt;
Cc: "linux-edac@vger.kernel.org" &lt;linux-edac@vger.kernel.org&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Link: https://lkml.kernel.org/r/20191106093239.25517-11-rrichter@marvell.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>EDAC, thunderx: Fix memory leak in thunderx_l2c_threaded_isr()</title>
<updated>2019-12-01T08:13:19Z</updated>
<author>
<name>Dan Carpenter</name>
<email>dan.carpenter@oracle.com</email>
</author>
<published>2018-10-13T10:28:43Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=99096f289bde739098cd3117c22a44cc143647fb'/>
<id>urn:sha1:99096f289bde739098cd3117c22a44cc143647fb</id>
<content type='text'>
[ Upstream commit d8c27ba86a2fd806d3957e5a9b30e66dfca2a61d ]

Fix memory leak in L2c threaded interrupt handler.

 [ bp: Rewrite commit message. ]

Fixes: 41003396f932 ("EDAC, thunderx: Add Cavium ThunderX EDAC driver")
Signed-off-by: Dan Carpenter &lt;dan.carpenter@oracle.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
CC: David Daney &lt;david.daney@cavium.com&gt;
CC: Jan Glauber &lt;jglauber@cavium.com&gt;
CC: Mauro Carvalho Chehab &lt;mchehab@kernel.org&gt;
CC: Sergey Temerkhanov &lt;s.temerkhanov@gmail.com&gt;
CC: linux-edac &lt;linux-edac@vger.kernel.org&gt;
Link: http://lkml.kernel.org/r/20181013102843.GG16086@mwanda
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>EDAC, sb_edac: Return early on ADDRV bit and address type test</title>
<updated>2019-11-20T16:59:51Z</updated>
<author>
<name>Qiuxu Zhuo</name>
<email>qiuxu.zhuo@intel.com</email>
</author>
<published>2018-09-07T23:08:27Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0d533e3681c79ffa1dcbc4c97535b61fda878d37'/>
<id>urn:sha1:0d533e3681c79ffa1dcbc4c97535b61fda878d37</id>
<content type='text'>
[ Upstream commit dcc960b225ceb2bd66c45e0845d03e577f7010f9 ]

Users of the mce_register_decode_chain() are called for every logged
error. EDAC drivers should check:

1) Is this a memory error? [bit 7 in status register]
2) Is there a valid address? [bit 58 in status register]
3) Is the address a system address? [bitfield 8:6 in misc register]

The sb_edac driver performed test "1" twice. Waited far too long to
perform check "2". Didn't do check "3" at all.

Fix it by moving the test for valid address from
sbridge_mce_output_error() into sbridge_mce_check_error() and add a test
for the type immediately after. Delete the redundant check for the type
of the error from sbridge_mce_output_error().

Signed-off-by: Qiuxu Zhuo &lt;qiuxu.zhuo@intel.com&gt;
Cc: Aristeu Rozanski &lt;aris@redhat.com&gt;
Cc: Mauro Carvalho Chehab &lt;mchehab@kernel.org&gt;
Cc: Qiuxu Zhuo &lt;qiuxu.zhuo@intel.com&gt;
Cc: linux-edac &lt;linux-edac@vger.kernel.org&gt;
Link: http://lkml.kernel.org/r/20180907230828.13901-2-tony.luck@intel.com
[ Re-word commit message. ]
Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
</feed>
