| Age | Commit message (Collapse) | Author |
|
Add the following test cases to cover gaps in the SHAKE testing:
- test_shake_all_lens_up_to_4096()
- test_shake_multiple_squeezes()
- test_shake_with_guarded_bufs()
Remove test_shake256_tiling() and test_shake256_tiling2() since they are
superseded by test_shake_multiple_squeezes(). It provides better test
coverage by using randomized testing. E.g., it's able to generate a
zero-length squeeze followed by a nonzero-length squeeze, which the
first 7 versions of the SHA-3 patchset handled incorrectly.
Tested-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251026055032.1413733-7-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Add a SHA3 kunit test suite, providing the following:
(*) A simple test of each of SHA3-224, SHA3-256, SHA3-384, SHA3-512,
SHAKE128 and SHAKE256.
(*) NIST 0- and 1600-bit test vectors for SHAKE128 and SHAKE256.
(*) Output tiling (multiple squeezing) tests for SHAKE256.
(*) Standard hash template test for SHA3-256. To make this possible,
gen-hash-testvecs.py is modified to support sha3-256.
(*) Standard benchmark test for SHA3-256.
[EB: dropped some unnecessary changes to gen-hash-testvecs.py, moved
addition of Testing section in doc file into this commit, and
other small cleanups]
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Harald Freudenberger <freude@linux.ibm.com>
Link: https://lore.kernel.org/r/20251026055032.1413733-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Add a KUnit test suite for the BLAKE2b library API, mirroring the
BLAKE2s test suite very closely.
As with the BLAKE2s test suite, a benchmark is included.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251018043106.375964-9-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Migrate the x86_64 implementation of POLYVAL into lib/crypto/, wiring it
up to the POLYVAL library interface. This makes the POLYVAL library be
properly optimized on x86_64.
This drops the x86_64 optimizations of polyval in the crypto_shash API.
That's fine, since polyval will be removed from crypto_shash entirely
since it is unneeded there. But even if it comes back, the crypto_shash
API could just be implemented on top of the library API, as usual.
Adjust the names and prototypes of the assembly functions to align more
closely with the rest of the library code.
Also replace a movaps instruction with movups to remove the assumption
that the key struct is 16-byte aligned. Users can still align the key
if they want (and at least in this case, movups is just as fast as
movaps), but it's inconvenient to require it.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251109234726.638437-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Migrate the arm64 implementation of POLYVAL into lib/crypto/, wiring it
up to the POLYVAL library interface. This makes the POLYVAL library be
properly optimized on arm64.
This drops the arm64 optimizations of polyval in the crypto_shash API.
That's fine, since polyval will be removed from crypto_shash entirely
since it is unneeded there. But even if it comes back, the crypto_shash
API could just be implemented on top of the library API, as usual.
Adjust the names and prototypes of the assembly functions to align more
closely with the rest of the library code.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251109234726.638437-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Add support for POLYVAL to lib/crypto/.
This will replace the polyval crypto_shash algorithm and its use in the
hctr2 template, simplifying the code and reducing overhead.
Specifically, this commit introduces the POLYVAL library API and a
generic implementation of it. Later commits will migrate the existing
architecture-optimized implementations of POLYVAL into lib/crypto/ and
add a KUnit test suite.
I've also rewritten the generic implementation completely, using a more
modern approach instead of the traditional table-based approach. It's
now constant-time, requires no precomputation or dynamic memory
allocations, decreases the per-key memory usage from 4096 bytes to 16
bytes, and is faster than the old polyval-generic even on bulk data
reusing the same key (at least on x86_64, where I measured 15% faster).
We should do this for GHASH too, but for now just do it for POLYVAL.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251109234726.638437-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
AVX-512 supports 3-input XORs via the vpternlogd (or vpternlogq)
instruction with immediate 0x96. This approach, vs. the alternative of
two vpxor instructions, is already used in the CRC, AES-GCM, and AES-XTS
code, since it reduces the instruction count and is faster on some CPUs.
Make blake2s_compress_avx512() take advantage of it too.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251102234209.62133-7-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Just before returning, blake2s_compress_ssse3() and
blake2s_compress_avx512() store updated values to the 'h', 't', and 'f'
fields of struct blake2s_ctx. But 'f' is always unchanged (which is
correct; only the C code changes it). So, there's no need to write to
'f'. Use 64-bit stores (movq and vmovq) instead of 128-bit stores
(movdqu and vmovdqu) so that only 't' is written.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251102234209.62133-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Various cleanups for readability. No change to the generated code:
- Add some comments
- Add #defines for arguments
- Rename some labels
- Use decimal constants instead of hex where it makes sense.
(The pshufd immediates intentionally remain as hex.)
- Add blank lines when there's a logical break
The round loop still could use some work, but this is at least a start.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251102234209.62133-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Following the usual practice, prefix the names of the data labels with
".L" so that the assembler treats them as truly local. This more
clearly expresses the intent and is less error-prone.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251102234209.62133-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Since blake2s_compress() is always passed nblocks != 0, remove the
unnecessary check for nblocks == 0 from blake2s_compress_ssse3().
Note that this makes it consistent with blake2s_compress_avx512() in the
same file as well as the arm32 blake2s_compress().
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251102234209.62133-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
In the C code, the 'inc' argument to the assembly functions
blake2s_compress_ssse3() and blake2s_compress_avx512() is declared with
type u32, matching blake2s_compress(). The assembly code then reads it
from the 64-bit %rcx. However, the ABI doesn't guarantee zero-extension
to 64 bits, nor do gcc or clang guarantee it. Therefore, fix these
functions to read this argument from the 32-bit %ecx.
In theory, this bug could have caused the wrong 'inc' value to be used,
causing incorrect BLAKE2s hashes. In practice, probably not: I've fixed
essentially this same bug in many other assembly files too, but there's
never been a real report of it having caused a problem. In x86_64, all
writes to 32-bit registers are zero-extended to 64 bits. That results
in zero-extension in nearly all situations. I've only been able to
demonstrate a lack of zero-extension with a somewhat contrived example
involving truncation, e.g. when the C code has a u64 variable holding
0x1234567800000040 and passes it as a u32 expecting it to be truncated
to 0x40 (64). But that's not what the real code does, of course.
Fixes: ed0356eda153 ("crypto: blake2s - x86_64 SIMD implementation")
Cc: stable@vger.kernel.org
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251102234209.62133-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Remove self-references to filenames from assembly files in
lib/crypto/arm/ and lib/crypto/arm64/. This follows the recommended
practice and eliminates an outdated reference to sha2-ce-core.S.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251102014809.170713-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Fix the indices in some comments in blake2s-core.S.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251102021553.176587-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Some z/Architecture processors can compute a SHA-3 digest in a single
instruction. arch/s390/crypto/ already uses this capability to optimize
the SHA-3 crypto_shash algorithms.
Use this capability to implement the sha3_224(), sha3_256(), sha3_384(),
and sha3_512() library functions too.
SHA3-256 benchmark results provided by Harald Freudenberger
(https://lore.kernel.org/r/4188d18bfcc8a64941c5ebd8de10ede2@linux.ibm.com/)
on a z/Architecture machine with "facility 86" (MSA level 12):
Length (bytes) Before (MB/s) After (MB/s)
============== ============= ============
16 212 225
64 820 915
256 1850 3350
1024 5400 8300
4096 11200 11300
Note: the original data from Harald was given in the form of a graph for
each length, showing the distribution of throughputs from 500 runs. I
guesstimated the peak of each one.
Harald also reported that the generic SHA-3 code was at most 259 MB/s
(https://lore.kernel.org/r/c39f6b6c110def0095e5da5becc12085@linux.ibm.com/).
So as expected, the earlier commit that optimized sha3_absorb_blocks()
and sha3_keccakf() is the more important one; it optimized the Keccak
permutation which is the most performance-critical part of SHA-3.
Still, this additional commit does notably improve performance further
on some lengths.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Harald Freudenberger <freude@linux.ibm.com>
Link: https://lore.kernel.org/r/20251026055032.1413733-13-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Add support for architecture-specific overrides of sha3_224(),
sha3_256(), sha3_384(), and sha3_512(). This will be used to implement
these functions more efficiently on s390 than is possible via the usual
init + update + final flow.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Harald Freudenberger <freude@linux.ibm.com>
Link: https://lore.kernel.org/r/20251026055032.1413733-12-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Implement sha3_absorb_blocks() and sha3_keccakf() using the hardware-
accelerated SHA-3 support in Message-Security-Assist Extension 6.
This accelerates the SHA3-224, SHA3-256, SHA3-384, SHA3-512, and
SHAKE256 library functions.
Note that arch/s390/crypto/ already has SHA-3 code that uses this
extension, but it is exposed only via crypto_shash. This commit brings
the same acceleration to the SHA-3 library. The arch/s390/crypto/
version will become redundant and be removed in later changes.
Tested-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251026055032.1413733-11-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Instead of exposing the arm64-optimized SHA-3 code via arm64-specific
crypto_shash algorithms, instead just implement the sha3_absorb_blocks()
and sha3_keccakf() library functions. This is much simpler, it makes
the SHA-3 library functions be arm64-optimized, and it fixes the
longstanding issue where the arm64-optimized SHA-3 code was disabled by
default. SHA-3 still remains available through crypto_shash, but
individual architectures no longer need to handle it.
Note: to see the diff from arch/arm64/crypto/sha3-ce-glue.c to
lib/crypto/arm64/sha3.h, view this commit with 'git show -M10'.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251026055032.1413733-10-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Since the SHA-3 algorithms are FIPS-approved, add the boot-time
self-test which is apparently required. This closely follows the
corresponding SHA-1, SHA-256, and SHA-512 tests.
Tested-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251026055032.1413733-8-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
In crypto/sha3_generic.c, the keccakf() function calls keccakf_round()
to do four of Keccak-f's five step mappings. However, it does not do
the Iota step mapping - presumably because that is dependent on round
number, whereas Theta, Rho, Pi and Chi are not.
Note that the keccakf_round() function needs to be explicitly
non-inlined on certain architectures as gcc's produced output will (or
used to) use over 1KiB of stack space if inlined.
Now, this code was copied more or less verbatim into lib/crypto/sha3.c,
so that has the same aesthetic issue. Fix this there by passing the
round number into sha3_keccakf_one_round_generic() and doing the Iota
step mapping there.
crypto/sha3_generic.c is left untouched as that will be converted to use
lib/crypto/sha3.c at some point.
Suggested-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251026055032.1413733-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Add SHA-3 support to lib/crypto/. All six algorithms in the SHA-3
family are supported: four digests (SHA3-224, SHA3-256, SHA3-384, and
SHA3-512) and two extendable-output functions (SHAKE128 and SHAKE256).
The SHAKE algorithms will be required for ML-DSA.
[EB: simplified the API to use fewer types and functions, fixed bug that
sometimes caused incorrect SHAKE output, cleaned up the
documentation, dropped an ad-hoc test that was inconsistent with
the rest of lib/crypto/, and many other cleanups]
Signed-off-by: David Howells <dhowells@redhat.com>
Co-developed-by: Eric Biggers <ebiggers@kernel.org>
Tested-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251026055032.1413733-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
On big endian arm kernels, the arm optimized Curve25519 code produces
incorrect outputs and fails the Curve25519 test. This has been true
ever since this code was added.
It seems that hardly anyone (or even no one?) actually uses big endian
arm kernels. But as long as they're ostensibly supported, we should
disable this code on them so that it's not accidentally used.
Note: for future-proofing, use !CPU_BIG_ENDIAN instead of
CPU_LITTLE_ENDIAN. Both of these are arch-specific options that could
get removed in the future if big endian support gets dropped.
Fixes: d8f1308a025f ("crypto: arm/curve25519 - wire up NEON implementation")
Cc: stable@vger.kernel.org
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251104054906.716914-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Commit 2f13daee2a72 ("lib/crypto/curve25519-hacl64: Disable KASAN with
clang-17 and older") inadvertently disabled KASAN in curve25519-hacl64.o
for GCC unconditionally because clang-min-version will always evaluate
to nothing for GCC. Add a check for CONFIG_CC_IS_CLANG to avoid applying
the workaround for GCC, which is only needed for clang-17 and older.
Cc: stable@vger.kernel.org
Fixes: 2f13daee2a72 ("lib/crypto/curve25519-hacl64: Disable KASAN with clang-17 and older")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251103-curve25519-hacl64-fix-kasan-workaround-v2-1-ab581cbd8035@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Migrate the arm-optimized BLAKE2b code from arch/arm/crypto/ to
lib/crypto/arm/. This makes the BLAKE2b library able to use it, and it
also simplifies the code because it's easier to integrate with the
library than crypto_shash.
This temporarily makes the arm-optimized BLAKE2b code unavailable via
crypto_shash. A later commit reimplements the blake2b-* crypto_shash
algorithms on top of the BLAKE2b library API, making it available again.
Note that as per the lib/crypto/ convention, the optimized code is now
enabled by default. So, this also fixes the longstanding issue where
the optimized BLAKE2b code was not enabled by default.
To see the diff from arch/arm/crypto/blake2b-neon-glue.c to
lib/crypto/arm/blake2b.h, view this commit with 'git show -M10'.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251018043106.375964-8-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Add a library API for BLAKE2b, closely modeled after the BLAKE2s API.
This will allow in-kernel users such as btrfs to use BLAKE2b without
going through the generic crypto layer. In addition, as usual the
BLAKE2b crypto_shash algorithms will be reimplemented on top of this.
Note: to create lib/crypto/blake2b.c I made a copy of
lib/crypto/blake2s.c and made the updates from BLAKE2s => BLAKE2b. This
way, the BLAKE2s and BLAKE2b code is kept consistent. Therefore, it
borrows the SPDX-License-Identifier and Copyright from
lib/crypto/blake2s.c rather than crypto/blake2b_generic.c.
The library API uses 'struct blake2b_ctx', consistent with other
lib/crypto/ APIs. The existing 'struct blake2b_state' will be removed
once the blake2b crypto_shash algorithms are updated to stop using it.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251018043106.375964-7-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
A couple more small cleanups to the BLAKE2s code before these things get
propagated into the BLAKE2b code:
- Drop 'const' from some non-pointer function parameters. It was a bit
excessive and not conventional.
- Rename 'block' argument of blake2s_compress*() to 'data'. This is for
consistency with the SHA-* code, and also to avoid the implication
that it points to a singular "block".
No functional changes.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251018043106.375964-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
For consistency with the SHA-1, SHA-2, SHA-3 (in development), and MD5
library APIs, rename blake2s_state to blake2s_ctx.
As a refresher, the ctx name:
- Is a bit shorter.
- Avoids confusion with the compression function state, which is also
often called the state (but is just part of the full context).
- Is consistent with OpenSSL.
Not a big deal, of course. But consistency is nice. With a BLAKE2b
library API about to be added, this is a convenient time to update this.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251018043106.375964-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Reorder the parameters of blake2s() from (out, in, key, outlen, inlen,
keylen) to (key, keylen, in, inlen, out, outlen).
This aligns BLAKE2s with the common conventions of pairing buffers and
their lengths, and having outputs follow inputs. This is widely used
elsewhere in lib/crypto/ and crypto/, and even elsewhere in the BLAKE2s
code itself such as blake2s_init_key() and blake2s_final(). So
blake2s() was a bit of an exception.
Notably, this results in the same order as hmac_*_usingrawkey().
Note that since the type signature changed, it's not possible for a
blake2s() call site to be silently missed.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251018043106.375964-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Add FIPS cryptographic algorithm self-tests for all SHA-1 and SHA-2
algorithms. Following the "Implementation Guidance for FIPS 140-3"
document, to achieve this it's sufficient to just test a single test
vector for each of HMAC-SHA1, HMAC-SHA256, and HMAC-SHA512.
Just run these tests in the initcalls, following the example of e.g.
crypto/kdf_sp800108.c. Note that this should meet the FIPS self-test
requirement even in the built-in case, given that the initcalls run
before userspace, storage, network, etc. are accessible.
This does not fix a regression, seeing as lib/ has had SHA-1 support
since 2005 and SHA-256 support since 2018. Neither ever had FIPS
self-tests. Moreover, fips=1 support has always been an unfinished
feature upstream. However, with lib/ now being used more widely, it's
now seeing more scrutiny and people seem to want these now [1][2].
[1] https://lore.kernel.org/r/3226361.1758126043@warthog.procyon.org.uk/
[2] https://lore.kernel.org/r/f31dbb22-0add-481c-aee0-e337a7731f8e@oracle.com/
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251011001047.51886-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Restore the dependency of the architecture-optimized Poly1305 code on
!KMSAN. It was dropped by commit b646b782e522 ("lib/crypto: poly1305:
Consolidate into single module").
Unlike the other hash algorithms in lib/crypto/ (e.g., SHA-512), the way
the architecture-optimized Poly1305 code is integrated results in
assembly code initializing memory, for several different architectures.
Thus, it generates false positive KMSAN warnings. These could be
suppressed with kmsan_unpoison_memory(), but it would be needed in quite
a few places. For now let's just restore the dependency on !KMSAN.
Note: this should have been caught by running poly1305_kunit with
CONFIG_KMSAN=y, which I did. However, due to an unrelated KMSAN bug
(https://lore.kernel.org/r/20251022030213.GA35717@sol/), KMSAN currently
isn't working reliably. Thus, the warning wasn't noticed until later.
Fixes: b646b782e522 ("lib/crypto: poly1305: Consolidate into single module")
Reported-by: syzbot+01fcd39a0d90cdb0e3df@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/r/68f6a48f.050a0220.91a22.0452.GAE@google.com/
Reported-by: Pei Xiao <xiaopei01@kylinos.cn>
Closes: https://lore.kernel.org/r/751b3d80293a6f599bb07770afcef24f623c7da0.1761026343.git.xiaopei01@kylinos.cn/
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251022033405.64761-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Pull interleaved SHA-256 hashing support from Eric Biggers:
"Optimize fsverity with 2-way interleaved hashing
Add support for 2-way interleaved SHA-256 hashing to lib/crypto/, and
make fsverity use it for faster file data verification. This improves
fsverity performance on many x86_64 and arm64 processors.
Later, I plan to make dm-verity use this too"
* tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux:
fsverity: Use 2-way interleaved SHA-256 hashing when supported
fsverity: Remove inode parameter from fsverity_hash_block()
lib/crypto: tests: Add tests and benchmark for sha256_finup_2x()
lib/crypto: x86/sha256: Add support for 2-way interleaved hashing
lib/crypto: arm64/sha256: Add support for 2-way interleaved hashing
lib/crypto: sha256: Add support for 2-way interleaved hashing
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull crypto library updates from Eric Biggers:
- Add a RISC-V optimized implementation of Poly1305. This code was
written by Andy Polyakov and contributed by Zhihang Shao.
- Migrate the MD5 code into lib/crypto/, and add KUnit tests for MD5.
Yes, it's still the 90s, and several kernel subsystems are still
using MD5 for legacy use cases. As long as that remains the case,
it's helpful to clean it up in the same way as I've been doing for
other algorithms.
Later, I plan to convert most of these users of MD5 to use the new
MD5 library API instead of the generic crypto API.
- Simplify the organization of the ChaCha, Poly1305, BLAKE2s, and
Curve25519 code.
Consolidate these into one module per algorithm, and centralize the
configuration and build process. This is the same reorganization that
has already been successful for SHA-1 and SHA-2.
- Remove the unused crypto_kpp API for Curve25519.
- Migrate the BLAKE2s and Curve25519 self-tests to KUnit.
- Always enable the architecture-optimized BLAKE2s code.
* tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (38 commits)
crypto: md5 - Implement export_core() and import_core()
wireguard: kconfig: simplify crypto kconfig selections
lib/crypto: tests: Enable Curve25519 test when CRYPTO_SELFTESTS
lib/crypto: curve25519: Consolidate into single module
lib/crypto: curve25519: Move a couple functions out-of-line
lib/crypto: tests: Add Curve25519 benchmark
lib/crypto: tests: Migrate Curve25519 self-test to KUnit
crypto: curve25519 - Remove unused kpp support
crypto: testmgr - Remove curve25519 kpp tests
crypto: x86/curve25519 - Remove unused kpp support
crypto: powerpc/curve25519 - Remove unused kpp support
crypto: arm/curve25519 - Remove unused kpp support
crypto: hisilicon/hpre - Remove unused curve25519 kpp support
lib/crypto: tests: Add KUnit tests for BLAKE2s
lib/crypto: blake2s: Consolidate into single C translation unit
lib/crypto: blake2s: Move generic code into blake2s.c
lib/crypto: blake2s: Always enable arch-optimized BLAKE2s code
lib/crypto: blake2s: Remove obsolete self-test
lib/crypto: x86/blake2s: Reduce size of BLAKE2S_SIGMA2
lib/crypto: chacha: Consolidate into single module
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull CRC updates from Eric Biggers:
"Update crc_kunit to test the CRC functions in softirq and hardirq
contexts, similar to what the lib/crypto/ KUnit tests do. Move the
helper function needed to do this into a common header.
This is useful mainly to test fallback code paths for when
FPU/SIMD/vector registers are unusable"
* tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
Documentation/staging: Fix typo and incorrect citation in crc32.rst
lib/crc: Drop inline from all *_mod_init_arch() functions
lib/crc: Use underlying functions instead of crypto_simd_usable()
lib/crc: crc_kunit: Test CRC computation in interrupt contexts
kunit, lib/crypto: Move run_irq_test() to common header
|
|
Update sha256_kunit to include test cases and a benchmark for the new
sha256_finup_2x() function.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250915160819.140019-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Add an implementation of sha256_finup_2x_arch() for x86_64. It
interleaves the computation of two SHA-256 hashes using the x86 SHA-NI
instructions. dm-verity and fs-verity will take advantage of this for
greatly improved performance on capable CPUs.
This increases the throughput of SHA-256 hashing 4096-byte messages by
the following amounts on the following CPUs:
Intel Ice Lake (server): 4%
Intel Sapphire Rapids: 38%
Intel Emerald Rapids: 38%
AMD Zen 1 (Threadripper 1950X): 84%
AMD Zen 4 (EPYC 9B14): 98%
AMD Zen 5 (Ryzen 9 9950X): 64%
For now, this seems to benefit AMD more than Intel. This seems to be
because current AMD CPUs support concurrent execution of the SHA-NI
instructions, but unfortunately current Intel CPUs don't, except for the
sha256msg2 instruction. Hopefully future Intel CPUs will support SHA-NI
on more execution ports. Zen 1 supports 2 concurrent sha256rnds2, and
Zen 4 supports 4 concurrent sha256rnds2, which suggests that even better
performance may be achievable on Zen 4 by interleaving more than two
hashes. However, doing so poses a number of trade-offs, and furthermore
Zen 5 goes back to supporting "only" 2 concurrent sha256rnds2.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250915160819.140019-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Add an implementation of sha256_finup_2x_arch() for arm64. It
interleaves the computation of two SHA-256 hashes using the ARMv8
SHA-256 instructions. dm-verity and fs-verity will take advantage of
this for greatly improved performance on capable CPUs.
This increases the throughput of SHA-256 hashing 4096-byte messages by
the following amounts on the following CPUs:
ARM Cortex-X1: 70%
ARM Cortex-X3: 68%
ARM Cortex-A76: 65%
ARM Cortex-A715: 43%
ARM Cortex-A510: 25%
ARM Cortex-A55: 8%
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250915160819.140019-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Many arm64 and x86_64 CPUs can compute two SHA-256 hashes in nearly the
same speed as one, if the instructions are interleaved. This is because
SHA-256 is serialized block-by-block, and two interleaved hashes take
much better advantage of the CPU's instruction-level parallelism.
Meanwhile, a very common use case for SHA-256 hashing in the Linux
kernel is dm-verity and fs-verity. Both use a Merkle tree that has a
fixed block size, usually 4096 bytes with an empty or 32-byte salt
prepended. Usually, many blocks need to be hashed at a time. This is
an ideal scenario for 2-way interleaved hashing.
To enable this optimization, add a new function sha256_finup_2x() to the
SHA-256 library API. It computes the hash of two equal-length messages,
starting from a common initial context.
For now it always falls back to sequential processing. Later patches
will wire up arm64 and x86_64 optimized implementations.
Note that the interleaving factor could in principle be higher than 2x.
However, that runs into many practical difficulties and CPU throughput
limitations. Thus, both the implementations I'm adding are 2x. In the
interest of using the simplest solution, the API matches that.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250915160819.140019-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Now that the Curve25519 library has been disentangled from CRYPTO,
adding CRYPTO_SELFTESTS as a default value of
CRYPTO_LIB_CURVE25519_KUNIT_TEST no longer causes a recursive kconfig
dependency. Do this, which makes this option consistent with the other
crypto KUnit test options in the same file.
Link: https://lore.kernel.org/r/20250906213523.84915-12-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Reorganize the Curve25519 library code:
- Build a single libcurve25519 module, instead of up to three modules:
libcurve25519, libcurve25519-generic, and an arch-specific module.
- Move the arch-specific Curve25519 code from arch/$(SRCARCH)/crypto/ to
lib/crypto/$(SRCARCH)/. Centralize the build rules into
lib/crypto/Makefile and lib/crypto/Kconfig.
- Include the arch-specific code directly in lib/crypto/curve25519.c via
a header, rather than using a separate .c file.
- Eliminate the entanglement with CRYPTO. CRYPTO_LIB_CURVE25519 no
longer selects CRYPTO, and the arch-specific Curve25519 code no longer
depends on CRYPTO.
This brings Curve25519 in line with the latest conventions for
lib/crypto/, used by other algorithms. The exception is that I kept the
generic code in separate translation units for now. (Some of the
function names collide between the x86 and generic Curve25519 code. And
the Curve25519 functions are very long anyway, so inlining doesn't
matter as much for Curve25519 as it does for some other algorithms.)
Link: https://lore.kernel.org/r/20250906213523.84915-11-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Move curve25519() and curve25519_generate_public() from curve25519.h to
curve25519.c. There's no good reason for them to be inline.
Link: https://lore.kernel.org/r/20250906213523.84915-10-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Add a benchmark to curve25519_kunit. This brings it in line with the
other crypto KUnit tests and provides an easy way to measure
performance.
Link: https://lore.kernel.org/r/20250906213523.84915-9-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Move the Curve25519 test from an ad-hoc self-test to a KUnit test.
Generally keep the same test logic for now, just translated to KUnit.
There's one exception, which is that I dropped the incomplete test of
curve25519_generic(). The approach I'm taking to cover the different
implementations with the KUnit tests is to just rely on booting kernels
in QEMU with different '-cpu' options, rather than try to make the tests
(incompletely) test multiple implementations on one CPU. This way, both
the test and the library API are simpler.
This commit makes the file lib/crypto/curve25519.c no longer needed, as
its only purpose was to call the self-test. However, keep it for now,
since a later commit will add code to it again.
Temporarily omit the default value of CRYPTO_SELFTESTS that the other
lib/crypto/ KUnit tests have. It would cause a recursive kconfig
dependency, since the Curve25519 code is still entangled with CRYPTO. A
later commit will fix that.
Link: https://lore.kernel.org/r/20250906213523.84915-8-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Add a KUnit test suite for BLAKE2s. Most of the core test logic is in
the previously-added hash-test-template.h. This commit just adds the
actual KUnit suite, commits the generated test vectors to the tree so
that gen-hash-testvecs.py won't have to be run at build time, and adds a
few BLAKE2s-specific test cases.
This is the replacement for blake2s-selftest, which an earlier commit
removed. Improvements over blake2s-selftest include integration with
KUnit, more comprehensive test cases, and support for benchmarking.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250827151131.27733-13-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
As was done with the other algorithms, reorganize the BLAKE2s code so
that the generic implementation and the arch-specific "glue" code is
consolidated into a single translation unit, so that the compiler will
inline the functions and automatically decide whether to include the
generic code in the resulting binary or not.
Similarly, also consolidate the build rules into
lib/crypto/{Makefile,Kconfig}. This removes the last uses of
lib/crypto/{arm,x86}/{Makefile,Kconfig}, so remove those too.
Don't keep the !KMSAN dependency. It was needed only for other
algorithms such as ChaCha that initialize memory from assembly code.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250827151131.27733-12-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Move blake2s_compress_generic() from blake2s-generic.c to blake2s.c.
For now it's still guarded by CONFIG_CRYPTO_LIB_BLAKE2S_GENERIC, but
this prepares for changing it to a 'static __maybe_unused' function and
just using the compiler to automatically decide its inclusion.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250827151131.27733-11-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
When support for a crypto algorithm is enabled, the arch-optimized
implementation of that algorithm should be enabled too. We've learned
this the hard way many times over the years: people regularly forget to
enable the arch-optimized implementations of the crypto algorithms,
resulting in significant performance being left on the table.
Currently, BLAKE2s support is always enabled ('obj-y'), since random.c
uses it. Therefore, the arch-optimized BLAKE2s code, which exists for
ARM and x86_64, should be always enabled too. Let's do that.
Note that the effect on kernel image size is very small and should not
be a concern. On ARM, enabling CRYPTO_BLAKE2S_ARM actually *shrinks*
the kernel size by about 1200 bytes, since the ARM-optimized
blake2s_compress() completely replaces the generic blake2s_compress().
On x86_64, enabling CRYPTO_BLAKE2S_X86 increases the kernel size by
about 1400 bytes, as the generic blake2s_compress() is still included as
a fallback; however, for context, that is only about a quarter the size
of the generic blake2s_compress(). The x86_64 optimized BLAKE2s code
uses much less icache at runtime than the generic code.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250827151131.27733-10-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Remove the original BLAKE2s self-test, since it will be superseded by
blake2s_kunit.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250827151131.27733-9-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Save 480 bytes of .rodata by replacing the .long constants with .bytes,
and using the vpmovzxbd instruction to expand them.
Also update the code to do the loads before incrementing %rax rather
than after. This avoids the need for the first load to use an offset.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250827151131.27733-8-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Consolidate the ChaCha code into a single module (excluding
chacha-block-generic.c which remains always built-in for random.c),
similar to various other algorithms:
- Each arch now provides a header file lib/crypto/$(SRCARCH)/chacha.h,
replacing lib/crypto/$(SRCARCH)/chacha*.c. The header defines
chacha_crypt_arch() and hchacha_block_arch(). It is included by
lib/crypto/chacha.c, and thus the code gets built into the single
libchacha module, with improved inlining in some cases.
- Whether arch-optimized ChaCha is buildable is now controlled centrally
by lib/crypto/Kconfig instead of by lib/crypto/$(SRCARCH)/Kconfig.
The conditions for enabling it remain the same as before, and it
remains enabled by default.
- Any additional arch-specific translation units for the optimized
ChaCha code, such as assembly files, are now compiled by
lib/crypto/Makefile instead of lib/crypto/$(SRCARCH)/Makefile.
This removes the last use for the Makefile and Kconfig files in the
arm64, mips, powerpc, riscv, and s390 subdirectories of lib/crypto/. So
also remove those files and the references to them.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250827151131.27733-7-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Rename libchacha.c to chacha.c to make the naming consistent with other
algorithms and allow additional source files to be added to the
libchacha module. This file currently contains chacha_crypt_generic(),
but it will soon be updated to contain chacha_crypt().
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250827151131.27733-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|