<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/drivers/edac, branch v3.18.84</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.18.84</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.18.84'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2016-05-17T18:52:13Z</updated>
<entry>
<title>EDAC: i7core, sb_edac: Don't return NOTIFY_BAD from mce_decoder callback</title>
<updated>2016-05-17T18:52:13Z</updated>
<author>
<name>Tony Luck</name>
<email>tony.luck@intel.com</email>
</author>
<published>2016-04-29T13:42:25Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=65c2cfa5e503d8fdca05cd6b75b78e61b4415ae4'/>
<id>urn:sha1:65c2cfa5e503d8fdca05cd6b75b78e61b4415ae4</id>
<content type='text'>
[ Upstream commit c4fc1956fa31003bfbe4f597e359d751568e2954 ]

Both of these drivers can return NOTIFY_BAD, but this terminates
processing other callbacks that were registered later on the chain.
Since the driver did nothing to log the error it seems wrong to prevent
other interested parties from seeing it. E.g. neither of them had even
bothered to check the type of the error to see if it was a memory error
before the return NOTIFY_BAD.

Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Acked-by: Aristeu Rozanski &lt;aris@redhat.com&gt;
Acked-by: Mauro Carvalho Chehab &lt;mchehab@osg.samsung.com&gt;
Cc: linux-edac &lt;linux-edac@vger.kernel.org&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Link: http://lkml.kernel.org/r/72937355dd92318d2630979666063f8a2853495b.1461864507.git.tony.luck@intel.com
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</content>
</entry>
<entry>
<title>x86 EDAC, sb_edac.c: Repair damage introduced when "fixing" channel address</title>
<updated>2016-05-10T16:20:19Z</updated>
<author>
<name>Tony Luck</name>
<email>tony.luck@intel.com</email>
</author>
<published>2016-04-14T17:21:52Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b7c4a9d51100cf69d87b758fb8aac3c2eb7b8499'/>
<id>urn:sha1:b7c4a9d51100cf69d87b758fb8aac3c2eb7b8499</id>
<content type='text'>
[ Upstream commit ff15e95c82768d589957dbb17d7eb7dba7904659 ]

In commit:

  eb1af3b71f9d ("Fix computation of channel address")

I switched the "sck_way" variable from holding the log2 value read
from the h/w to instead be the actual number. Unfortunately it
is needed in log2 form when used to shift the address.

Tested-by: Patrick Geary &lt;patrickg@supermicro.com&gt;
Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Acked-by: Mauro Carvalho Chehab &lt;mchehab@osg.samsung.com&gt;
Cc: Aristeu Rozanski &lt;arozansk@redhat.com&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: linux-edac@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: eb1af3b71f9d ("Fix computation of channel address")
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</content>
</entry>
<entry>
<title>EDAC/sb_edac: Fix computation of channel address</title>
<updated>2016-04-18T12:49:27Z</updated>
<author>
<name>Luck, Tony</name>
<email>tony.luck@intel.com</email>
</author>
<published>2016-03-10T00:40:48Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7d974c26f1dc39573f5f6c7d65fe21892e8246d2'/>
<id>urn:sha1:7d974c26f1dc39573f5f6c7d65fe21892e8246d2</id>
<content type='text'>
[ Upstream commit eb1af3b71f9d83e45f2fd2fd649356e98e1c582c ]

Large memory Haswell-EX systems with multiple DIMMs per channel were
sometimes reporting the wrong DIMM.

Found three problems:

 1) Debug printouts for socket and channel interleave were not interpreting
    the register fields correctly. The socket interleave field is a 2^X
    value (0=1, 1=2, 2=4, 3=8). The channel interleave is X+1 (0=1, 1=2,
    2=3. 3=4).

 2) Actual use of the socket interleave value didn't interpret as 2^X

 3) Conversion of address to channel address was complicated, and wrong.

Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Acked-by: Aristeu Rozanski &lt;arozansk@redhat.com&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mauro Carvalho Chehab &lt;mchehab@osg.samsung.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: linux-edac@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</content>
</entry>
<entry>
<title>EDAC, amd64_edac: Shift wrapping issue in f1x_get_norm_dct_addr()</title>
<updated>2016-04-12T13:09:50Z</updated>
<author>
<name>Dan Carpenter</name>
<email>dan.carpenter@oracle.com</email>
</author>
<published>2016-01-20T09:54:51Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=14b538491eadcade73b2de1fcbd843012f5e75a8'/>
<id>urn:sha1:14b538491eadcade73b2de1fcbd843012f5e75a8</id>
<content type='text'>
[ Upstream commit 6f3508f61c814ee852c199988a62bd954c50dfc1 ]

dct_sel_base_off is declared as a u64 but we're only using the lower 32
bits because of a shift wrapping bug. This can possibly truncate the
upper 16 bits of DctSelBaseOffset[47:26], causing us to misdecode the CS
row.

Fixes: c8e518d5673d ('amd64_edac: Sanitize f10_get_base_addr_offset')
Signed-off-by: Dan Carpenter &lt;dan.carpenter@oracle.com&gt;
Cc: Aravind Gopalakrishnan &lt;Aravind.Gopalakrishnan@amd.com&gt;
Cc: linux-edac &lt;linux-edac@vger.kernel.org&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Link: http://lkml.kernel.org/r/20160120095451.GB19898@mwanda
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</content>
</entry>
<entry>
<title>EDAC: Robustify workqueues destruction</title>
<updated>2016-02-02T18:56:51Z</updated>
<author>
<name>Borislav Petkov</name>
<email>bp@suse.de</email>
</author>
<published>2015-11-27T09:38:38Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a0493b5238daa6fc5dbd23a9b46e9cab041a5c11'/>
<id>urn:sha1:a0493b5238daa6fc5dbd23a9b46e9cab041a5c11</id>
<content type='text'>
[ Upstream commit fcd5c4dd8201595d4c598c9cca5e54760277d687 ]

EDAC workqueue destruction is really fragile. We cancel delayed work
but if it is still running and requeues itself, we still go ahead and
destroy the workqueue and the queued work explodes when workqueue core
attempts to run it.

Make the destruction more robust by switching op_state to offline so
that requeuing stops. Cancel any pending work *synchronously* too.

  EDAC i7core: Driver loaded.
  general protection fault: 0000 [#1] SMP
  CPU 12
  Modules linked in:
  Supported: Yes
  Pid: 0, comm: kworker/0:1 Tainted: G          IE   3.0.101-0-default #1 HP ProLiant DL380 G7
  RIP: 0010:[&lt;ffffffff8107dcd7&gt;]  [&lt;ffffffff8107dcd7&gt;] __queue_work+0x17/0x3f0
  &lt; ... regs ...&gt;
  Process kworker/0:1 (pid: 0, threadinfo ffff88019def6000, task ffff88019def4600)
  Stack:
   ...
  Call Trace:
   call_timer_fn
   run_timer_softirq
   __do_softirq
   call_softirq
   do_softirq
   irq_exit
   smp_apic_timer_interrupt
   apic_timer_interrupt
   intel_idle
   cpuidle_idle_call
   cpu_idle
  Code: ...
  RIP  __queue_work
   RSP &lt;...&gt;

Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</content>
</entry>
<entry>
<title>EDAC, ppc4xx: Access mci-&gt;csrows array elements properly</title>
<updated>2015-09-16T14:00:40Z</updated>
<author>
<name>Michael Walle</name>
<email>michael@walle.cc</email>
</author>
<published>2015-07-21T09:00:53Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4db77254bcd9da65990488883d26110ca9bdfde4'/>
<id>urn:sha1:4db77254bcd9da65990488883d26110ca9bdfde4</id>
<content type='text'>
[ Upstream commit 5c16179b550b9fd8114637a56b153c9768ea06a5 ]

The commit

  de3910eb79ac ("edac: change the mem allocation scheme to
		 make Documentation/kobject.txt happy")

changed the memory allocation for the csrows member. But ppc4xx_edac was
forgotten in the patch. Fix it.

Signed-off-by: Michael Walle &lt;michael@walle.cc&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Cc: linux-edac &lt;linux-edac@vger.kernel.org&gt;
Cc: Mauro Carvalho Chehab &lt;mchehab@osg.samsung.com&gt;
Link: http://lkml.kernel.org/r/1437469253-8611-1-git-send-email-michael@walle.cc
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</content>
</entry>
<entry>
<title>sb_edac: Fix erroneous bytes-&gt;gigabytes conversion</title>
<updated>2015-07-13T12:50:04Z</updated>
<author>
<name>Jim Snow</name>
<email>jim.m.snow@intel.com</email>
</author>
<published>2014-11-18T13:51:09Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f1576d2ef7c36cae51803a33a46437dd38a873b6'/>
<id>urn:sha1:f1576d2ef7c36cae51803a33a46437dd38a873b6</id>
<content type='text'>
[ Upstream commit ece9210859abb2cc0126f8adbfbbdee668d4906a ]

commit 8c009100295597f23978c224aec5751a365bc965 upstream.

Signed-off-by: Jim Snow &lt;jim.snow@intel.com&gt;
Signed-off-by: Lukasz Anaczkowski &lt;lukasz.anaczkowski@intel.com&gt;
Signed-off-by: Mauro Carvalho Chehab &lt;mchehab@osg.samsung.com&gt;
[ vlee: Backported to 3.14. Adjusted context. ]
Signed-off-by: Vinson Lee &lt;vlee@twitter.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</content>
</entry>
<entry>
<title>x86, mce, severity: Extend the the mce_severity mechanism to handle UCNA/DEFERRED error</title>
<updated>2015-06-09T17:43:45Z</updated>
<author>
<name>Chen Yucong</name>
<email>slaoub@gmail.com</email>
</author>
<published>2014-11-18T02:09:19Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c41c609c77b54451d1404d4fd2bfbf5c7efcd7b9'/>
<id>urn:sha1:c41c609c77b54451d1404d4fd2bfbf5c7efcd7b9</id>
<content type='text'>
[ Upstream commit e3480271f59253cb60d030aa5e615bf00b731fea ]

Until now, the mce_severity mechanism can only identify the severity
of UCNA error as MCE_KEEP_SEVERITY. Meanwhile, it is not able to filter
out DEFERRED error for AMD platform.

This patch extends the mce_severity mechanism for handling
UCNA/DEFERRED error. In order to do this, the patch introduces a new
severity level - MCE_UCNA/DEFERRED_SEVERITY.

In addition, mce_severity is specific to machine check exception,
and it will check MCIP/EIPV/RIPV bits. In order to use mce_severity
mechanism in non-exception context, the patch also introduces a new
argument (is_excp) for mce_severity. `is_excp' is used to explicitly
specify the calling context of mce_severity.

Reviewed-by: Aravind Gopalakrishnan &lt;Aravind.Gopalakrishnan@amd.com&gt;
Signed-off-by: Chen Yucong &lt;slaoub@gmail.com&gt;
Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</content>
</entry>
<entry>
<title>sb_edac: Fix typo computing number of banks</title>
<updated>2015-04-17T00:13:15Z</updated>
<author>
<name>Tony Luck</name>
<email>tony.luck@intel.com</email>
</author>
<published>2014-12-02T17:41:58Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=32ac769b9d4183291b3afaa1e3222afea5614486'/>
<id>urn:sha1:32ac769b9d4183291b3afaa1e3222afea5614486</id>
<content type='text'>
[ Upstream commit fec53af531dd040e41fe358abe00b33747af2688 ]

Code will always think there are 16 banks because of a typo

Reported-by: Misha
Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Acked-by: Aristeu Rozanski &lt;aris@redhat.com&gt;
Signed-off-by: Mauro Carvalho Chehab &lt;mchehab@osg.samsung.com&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</content>
</entry>
<entry>
<title>sb_edac: Fix discovery of top-of-low-memory for Haswell</title>
<updated>2015-04-17T00:13:15Z</updated>
<author>
<name>Tony Luck</name>
<email>tony.luck@intel.com</email>
</author>
<published>2014-10-29T17:36:50Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e3d11d50ecb18de7393177bffae18b419c3f15de'/>
<id>urn:sha1:e3d11d50ecb18de7393177bffae18b419c3f15de</id>
<content type='text'>
[ Upstream commit f7cf2a22a2896d3b3595b71d7936b6d7a3316b00 ]

Haswell moved the TOLM/TOHM registers to a different device and offset.
The sb_edac driver accounted for the change of device, but not for the
new offset.  There was also a typo in the constant to fill in the low
26 bits (was 0x1ffffff, should be 0x3ffffff).

This resulted in a bogus value for the top of low memory:

  EDAC DEBUG: get_memory_layout: TOLM: 0.032 GB (0x0000000001ffffff)

which would result in EDAC refusing to translate addresses for
errors above the bogus value and below 4GB:

   sbridge MC3: HANDLING MCE MEMORY ERROR
   sbridge MC3: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010090
   sbridge MC3: TSC 0
   sbridge MC3: ADDR 2000000
   sbridge MC3: MISC 523eac86
   sbridge MC3: PROCESSOR 0:306f3 TIME 1414600951 SOCKET 0 APIC 0
   MC3: 1 CE Error at TOLM area, on addr 0x02000000 on any memory ( page:0x0 offset:0x0 grain:32 syndrome:0x0)

With the fix we see the correct TOLM value:

   DEBUG: get_memory_layout: TOLM: 2.048 GB (0x000000007fffffff)

and we decode address 2000000 correctly:

   sbridge MC3: HANDLING MCE MEMORY ERROR
   sbridge MC3: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010090
   sbridge MC3: TSC 0
   sbridge MC3: ADDR 2000000
   sbridge MC3: MISC 523e1086
   sbridge MC3: PROCESSOR 0:306f3 TIME 1414601319 SOCKET 0 APIC 0
   DEBUG: get_memory_error_data: SAD interleave package: 0 = CPU socket 0, HA 0, shiftup: 0
   DEBUG: get_memory_error_data: TAD#0: address 0x0000000002000000 &lt; 0x000000007fffffff, socket interleave 1, channel interleave 4 (offset 0x00000000), index 0, base ch: 0, ch mask: 0x01
   DEBUG: get_memory_error_data: RIR#0, limit: 4.095 GB (0x00000000ffffffff), way: 1
   DEBUG: get_memory_error_data: RIR#0: channel address 0x00200000 &lt; 0xffffffff, RIR interleave 0, index 0
   DEBUG: sbridge_mce_output_error:  area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0
   MC3: 1 CE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x2000 offset:0x0 grain:32 syndrome:0x0 -  area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0)

Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Acked-by: Aristeu Rozanski &lt;aris@redhat.com&gt;
Signed-off-by: Mauro Carvalho Chehab &lt;mchehab@osg.samsung.com&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</content>
</entry>
</feed>
