<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/lib/raid6/algos.c, branch v4.19.187</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.19.187</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.19.187'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2018-04-07T19:08:19Z</updated>
<entry>
<title>Merge tag 'powerpc-4.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux</title>
<updated>2018-04-07T19:08:19Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2018-04-07T19:08:19Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=49a695ba723224875df50e327bd7b0b65dd9a56b'/>
<id>urn:sha1:49a695ba723224875df50e327bd7b0b65dd9a56b</id>
<content type='text'>
Pull powerpc updates from Michael Ellerman:
 "Notable changes:

   - Support for 4PB user address space on 64-bit, opt-in via mmap().

   - Removal of POWER4 support, which was accidentally broken in 2016
     and no one noticed, and blocked use of some modern instructions.

   - Workarounds so that the hypervisor can enable Transactional Memory
     on Power9.

   - A series to disable the DAWR (Data Address Watchpoint Register) on
     Power9.

   - More information displayed in the meltdown/spectre_v1/v2 sysfs
     files.

   - A vpermxor (Power8 Altivec) implementation for the raid6 Q
     Syndrome.

   - A big series to make the allocation of our pacas (per cpu area),
     kernel page tables, and per-cpu stacks NUMA aware when using the
     Radix MMU on Power9.

  And as usual many fixes, reworks and cleanups.

  Thanks to: Aaro Koskinen, Alexandre Belloni, Alexey Kardashevskiy,
  Alistair Popple, Andy Shevchenko, Aneesh Kumar K.V, Anshuman Khandual,
  Balbir Singh, Benjamin Herrenschmidt, Christophe Leroy, Christophe
  Lombard, Cyril Bur, Daniel Axtens, Dave Young, Finn Thain, Frederic
  Barrat, Gustavo Romero, Horia Geantă, Jonathan Neuschäfer, Kees Cook,
  Larry Finger, Laurent Dufour, Laurent Vivier, Logan Gunthorpe,
  Madhavan Srinivasan, Mark Greer, Mark Hairgrove, Markus Elfring,
  Mathieu Malaterre, Matt Brown, Matt Evans, Mauricio Faria de Oliveira,
  Michael Neuling, Naveen N. Rao, Nicholas Piggin, Paul Mackerras,
  Philippe Bergheaud, Ram Pai, Rob Herring, Sam Bobroff, Segher
  Boessenkool, Simon Guo, Simon Horman, Stewart Smith, Sukadev
  Bhattiprolu, Suraj Jitindar Singh, Thiago Jung Bauermann, Vaibhav
  Jain, Vaidyanathan Srinivasan, Vasant Hegde, Wei Yongjun"

* tag 'powerpc-4.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (207 commits)
  powerpc/64s/idle: Fix restore of AMOR on POWER9 after deep sleep
  powerpc/64s: Fix POWER9 DD2.2 and above in cputable features
  powerpc/64s: Fix pkey support in dt_cpu_ftrs, add CPU_FTR_PKEY bit
  powerpc/64s: Fix dt_cpu_ftrs to have restore_cpu clear unwanted LPCR bits
  Revert "powerpc/64s/idle: POWER9 ESL=0 stop avoid save/restore overhead"
  powerpc: iomap.c: introduce io{read|write}64_{lo_hi|hi_lo}
  powerpc: io.h: move iomap.h include so that it can use readq/writeq defs
  cxl: Fix possible deadlock when processing page faults from cxllib
  powerpc/hw_breakpoint: Only disable hw breakpoint if cpu supports it
  powerpc/mm/radix: Update command line parsing for disable_radix
  powerpc/mm/radix: Parse disable_radix commandline correctly.
  powerpc/mm/hugetlb: initialize the pagetable cache correctly for hugetlb
  powerpc/mm/radix: Update pte fragment count from 16 to 256 on radix
  powerpc/mm/keys: Update documentation and remove unnecessary check
  powerpc/64s/idle: POWER9 ESL=0 stop avoid save/restore overhead
  powerpc/64s/idle: Consolidate power9_offline_stop()/power9_idle_stop()
  powerpc/powernv: Always stop secondaries before reboot/shutdown
  powerpc: hard disable irqs in smp_send_stop loop
  powerpc: use NMI IPI for smp_send_stop
  powerpc/powernv: Fix SMT4 forcing idle code
  ...
</content>
</entry>
<entry>
<title>raid: remove tile specific raid6 implementation</title>
<updated>2018-03-26T13:56:28Z</updated>
<author>
<name>Arnd Bergmann</name>
<email>arnd@arndb.de</email>
</author>
<published>2018-03-09T15:05:23Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=889ce12b1650b3c388634451872638a08faf6d6b'/>
<id>urn:sha1:889ce12b1650b3c388634451872638a08faf6d6b</id>
<content type='text'>
The Tile architecture is getting removed, so we no longer need this either.

Acked-by: Ard Biesheuvel &lt;ard.biesheuvel@linaro.org&gt;
Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
</content>
</entry>
<entry>
<title>lib/raid6/altivec: Add vpermxor implementation for raid6 Q syndrome</title>
<updated>2018-03-20T05:47:25Z</updated>
<author>
<name>Matt Brown</name>
<email>matthew.brown.dev@gmail.com</email>
</author>
<published>2017-08-04T03:42:32Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=751ba79cc552c146595cd439b21c4ff8998c3b69'/>
<id>urn:sha1:751ba79cc552c146595cd439b21c4ff8998c3b69</id>
<content type='text'>
This patch uses the vpermxor instruction to optimise the raid6 Q
syndrome. This instruction was made available with POWER8, ISA version
2.07. It allows for both vperm and vxor instructions to be done in a
single instruction. This has been tested for correctness on a ppc64le
vm with a basic RAID6 setup containing 5 drives.

The performance benchmarks are from the raid6test in the
/lib/raid6/test directory. These results are from an IBM Firestone
machine with ppc64le architecture. The benchmark results show a 35%
speed increase over the best existing algorithm for powerpc (altivec).
The raid6test has also been run on a big-endian ppc64 vm to ensure it
also works for big-endian architectures.

Performance benchmarks:
  raid6: altivecx4 gen() 18773 MB/s
  raid6: altivecx8 gen() 19438 MB/s

  raid6: vpermxor4 gen() 25112 MB/s
  raid6: vpermxor8 gen() 26279 MB/s

Signed-off-by: Matt Brown &lt;matthew.brown.dev@gmail.com&gt;
Reviewed-by: Daniel Axtens &lt;dja@axtens.net&gt;
[mpe: Add VPERMXOR macro so we can build with old binutils]
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
</content>
</entry>
<entry>
<title>md/raid6: implement recovery using ARM NEON intrinsics</title>
<updated>2017-08-09T17:52:07Z</updated>
<author>
<name>Ard Biesheuvel</name>
<email>ard.biesheuvel@linaro.org</email>
</author>
<published>2017-07-13T17:16:01Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=6ec4e2514decd6fb4782a9364fa71d6244d05af4'/>
<id>urn:sha1:6ec4e2514decd6fb4782a9364fa71d6244d05af4</id>
<content type='text'>
Provide a NEON accelerated implementation of the recovery algorithm,
which supersedes the default byte-by-byte one.

Signed-off-by: Ard Biesheuvel &lt;ard.biesheuvel@linaro.org&gt;
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
</content>
</entry>
<entry>
<title>Merge tag 'md/4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md</title>
<updated>2016-10-07T16:45:43Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2016-10-07T16:45:43Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c23112e0395a89c8a52cd955442240de7fba46aa'/>
<id>urn:sha1:c23112e0395a89c8a52cd955442240de7fba46aa</id>
<content type='text'>
Pull MD updates from Shaohua Li:
 "This update includes:

   - new AVX512 instruction based raid6 gen/recovery algorithm

   - a couple of md-cluster related bug fixes

   - fix a potential deadlock

   - set nonrotational bit for raid array with SSD

   - set correct max_hw_sectors for raid5/6, which hopefuly can improve
     performance a little bit

   - other minor fixes"

* tag 'md/4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
  md: set rotational bit
  raid6/test/test.c: bug fix: Specify aligned(alignment) attributes to the char arrays
  raid5: handle register_shrinker failure
  raid5: fix to detect failure of register_shrinker
  md: fix a potential deadlock
  md/bitmap: fix wrong cleanup
  raid5: allow arbitrary max_hw_sectors
  lib/raid6: Add AVX512 optimized xor_syndrome functions
  lib/raid6/test/Makefile: Add avx512 gen_syndrome and recovery functions
  lib/raid6: Add AVX512 optimized recovery functions
  lib/raid6: Add AVX512 optimized gen_syndrome functions
  md-cluster: make resync lock also could be interruptted
  md-cluster: introduce dlm_lock_sync_interruptible to fix tasks hang
  md-cluster: convert the completion to wait queue
  md-cluster: protect md_find_rdev_nr_rcu with rcu lock
  md-cluster: clean related infos of cluster
  md: changes for MD_STILL_CLOSED flag
  md-cluster: remove some unnecessary dlm_unlock_sync
  md-cluster: use FORCEUNLOCK in lockres_free
  md-cluster: call md_kick_rdev_from_array once ack failed
</content>
</entry>
<entry>
<title>lib/raid6: Add AVX512 optimized recovery functions</title>
<updated>2016-09-21T16:09:44Z</updated>
<author>
<name>Gayatri Kammela</name>
<email>gayatri.kammela@intel.com</email>
</author>
<published>2016-08-13T01:03:20Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=13c520b2993c9faae6770264d33ff1e1ea4c2ceb'/>
<id>urn:sha1:13c520b2993c9faae6770264d33ff1e1ea4c2ceb</id>
<content type='text'>
Optimize RAID6 recovery functions to take advantage of
the 512-bit ZMM integer instructions introduced in AVX512.

AVX512 optimized recovery functions, which is simply based
on recov_avx2.c written by Jim Kukunas

This patch was tested and benchmarked before submission on
a hardware that has AVX512 flags to support such instructions

Cc: Jim Kukunas &lt;james.t.kukunas@linux.intel.com&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Fenghua Yu &lt;fenghua.yu@intel.com&gt;
Signed-off-by: Megha Dey &lt;megha.dey@linux.intel.com&gt;
Signed-off-by: Gayatri Kammela &lt;gayatri.kammela@intel.com&gt;
Reviewed-by: Fenghua Yu &lt;fenghua.yu@intel.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
</entry>
<entry>
<title>lib/raid6: Add AVX512 optimized gen_syndrome functions</title>
<updated>2016-09-21T16:09:44Z</updated>
<author>
<name>Gayatri Kammela</name>
<email>gayatri.kammela@intel.com</email>
</author>
<published>2016-08-13T01:03:19Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e0a491c1296874a1aca51cc68452f12a4d950029'/>
<id>urn:sha1:e0a491c1296874a1aca51cc68452f12a4d950029</id>
<content type='text'>
Optimize RAID6 gen_syndrom functions to take advantage of
the 512-bit ZMM integer instructions introduced in AVX512.

AVX512 optimized gen_syndrom functions, which is simply based
on avx2.c written by Yuanhan Liu and sse2.c written by hpa.

The patch was tested and benchmarked before submission on
a hardware that has AVX512 flags to support such instructions

Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Jim Kukunas &lt;james.t.kukunas@linux.intel.com&gt;
Cc: Fenghua Yu &lt;fenghua.yu@intel.com&gt;
Signed-off-by: Megha Dey &lt;megha.dey@linux.intel.com&gt;
Signed-off-by: Gayatri Kammela &lt;gayatri.kammela@intel.com&gt;
Reviewed-by: Fenghua Yu &lt;fenghua.yu@intel.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
</entry>
<entry>
<title>RAID/s390: provide raid6 recovery optimization</title>
<updated>2016-09-01T14:13:25Z</updated>
<author>
<name>Martin Schwidefsky</name>
<email>schwidefsky@de.ibm.com</email>
</author>
<published>2016-08-31T07:27:35Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f5b55fa1f81d518925d68b50d2316850c525d1ad'/>
<id>urn:sha1:f5b55fa1f81d518925d68b50d2316850c525d1ad</id>
<content type='text'>
The XC instruction can be used to improve the speed of the raid6
recovery. The loops now operate on blocks of 256 bytes.

Signed-off-by: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
</content>
</entry>
<entry>
<title>RAID/s390: add SIMD implementation for raid6 gen/xor</title>
<updated>2016-08-29T09:05:04Z</updated>
<author>
<name>Martin Schwidefsky</name>
<email>schwidefsky@de.ibm.com</email>
</author>
<published>2016-08-23T11:30:24Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=474fd6e80fe529e9adeeb7ea9d4e5d6c4da0b7fe'/>
<id>urn:sha1:474fd6e80fe529e9adeeb7ea9d4e5d6c4da0b7fe</id>
<content type='text'>
Using vector registers is slightly faster:

raid6: vx128x8  gen() 19705 MB/s
raid6: vx128x8  xor() 11886 MB/s
raid6: using algorithm vx128x8 gen() 19705 MB/s
raid6: .... xor() 11886 MB/s, rmw enabled

vs the software algorithms:

raid6: int64x1  gen()  3018 MB/s
raid6: int64x1  xor()  1429 MB/s
raid6: int64x2  gen()  4661 MB/s
raid6: int64x2  xor()  3143 MB/s
raid6: int64x4  gen()  5392 MB/s
raid6: int64x4  xor()  3509 MB/s
raid6: int64x8  gen()  4441 MB/s
raid6: int64x8  xor()  3207 MB/s
raid6: using algorithm int64x4 gen() 5392 MB/s
raid6: .... xor() 3509 MB/s, rmw enabled

Signed-off-by: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
</content>
</entry>
<entry>
<title>md/raid6 algorithms: delta syndrome functions</title>
<updated>2015-04-21T22:00:41Z</updated>
<author>
<name>Markus Stockhausen</name>
<email>stockhausen@collogia.de</email>
</author>
<published>2014-12-15T01:57:04Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=fe5cbc6e06c7d8b3a86f6f5491d74766bb5c2827'/>
<id>urn:sha1:fe5cbc6e06c7d8b3a86f6f5491d74766bb5c2827</id>
<content type='text'>
v3: s-o-b comment, explanation of performance and descision for
the start/stop implementation

Implementing rmw functionality for RAID6 requires optimized syndrome
calculation. Up to now we can only generate a complete syndrome. The
target P/Q pages are always overwritten. With this patch we provide
a framework for inplace P/Q modification. In the first place simply
fill those functions with NULL values.

xor_syndrome() has two additional parameters: start &amp; stop. These
will indicate the first and last page that are changing during a
rmw run. That makes it possible to avoid several unneccessary loops
and speed up calculation. The caller needs to implement the following
logic to make the functions work.

1) xor_syndrome(disks, start, stop, ...): "Remove" all data of source
blocks inside P/Q between (and including) start and end.

2) modify any block with start &lt;= block &lt;= stop

3) xor_syndrome(disks, start, stop, ...): "Reinsert" all data of
source blocks into P/Q between (and including) start and end.

Pages between start and stop that won't be changed should be filled
with a pointer to the kernel zero page. The reasons for not taking NULL
pages are:

1) Algorithms cross the whole source data line by line. Thus avoid
additional branches.

2) Having a NULL page avoids calculating the XOR P parity but still
need calulation steps for the Q parity. Depending on the algorithm
unrolling that might be only a difference of 2 instructions per loop.

The benchmark numbers of the gen_syndrome() functions are displayed in
the kernel log. Do the same for the xor_syndrome() functions. This
will help to analyze performance problems and give an rough estimate
how well the algorithm works. The choice of the fastest algorithm will
still depend on the gen_syndrome() performance.

With the start/stop page implementation the speed can vary a lot in real
life. E.g. a change of page 0 &amp; page 15 on a stripe will be harder to
compute than the case where page 0 &amp; page 1 are XOR candidates. To be not
to enthusiatic about the expected speeds we will run a worse case test
that simulates a change on the upper half of the stripe. So we do:

1) calculation of P/Q for the upper pages

2) continuation of Q for the lower (empty) pages

Signed-off-by: Markus Stockhausen &lt;stockhausen@collogia.de&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
</entry>
</feed>
