| author | Eric Biggers <ebiggers@kernel.org> | 2025-10-01 19:31:17 -0700 |
|---|---|---|
| committer | Eric Biggers <ebiggers@kernel.org> | 2025-10-26 20:37:41 -0700 |
| commit | 05794985b190e0592131b323d37d7cf506711f1f | |
| tree | 2e945b52fd11e3e93aba1e4a5e299609d983544d | |
| parent | 5ab1ff2e0f03ab64cc1832999146c0dcbf9db966 | |
crypto: x86/aes-gcm - optimize long AAD processing with AVX512
Improve the performance of aes_gcm_aad_update_vaes_avx512() on large AAD
(additional authenticated data) lengths by 4-8 times by making it use up
to 512-bit vectors and a 4-vector-wide loop. Previously, it used only
256-bit vectors and a 1-vector-wide loop.
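The 4-vector-wide loop is possible because GHASH over the AAD can be restructured so that several blocks are combined per iteration using precomputed powers of the hash key H. The kernel code itself is x86-64 assembly using 512-bit vectors and VPCLMULQDQ; the following is only a minimal, portable C sketch of that algebraic restructuring, where be128, gf128_mul, ghash_1wide, and ghash_4wide are illustrative names rather than the kernel's actual interfaces.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One 16-byte GHASH block, in the big-endian byte order used by GCM. */
typedef struct { uint8_t b[16]; } be128;

static void be128_xor(be128 *r, const be128 *a)
{
        for (int i = 0; i < 16; i++)
                r->b[i] ^= a->b[i];
}

/* Bitwise GF(2^128) multiplication per the GCM spec (slow reference code;
 * the kernel uses VPCLMULQDQ carryless-multiply instructions instead). */
static void gf128_mul(be128 *r, const be128 *x, const be128 *y)
{
        be128 z = { 0 };
        be128 v = *y;

        for (int i = 0; i < 128; i++) {
                /* Bit i of x, counting from the MSB of x->b[0]. */
                if ((x->b[i / 8] >> (7 - (i % 8))) & 1)
                        be128_xor(&z, &v);
                /* Shift v right one bit, reducing by the GHASH polynomial. */
                int lsb = v.b[15] & 1;
                for (int j = 15; j > 0; j--)
                        v.b[j] = (v.b[j] >> 1) | (uint8_t)(v.b[j - 1] << 7);
                v.b[0] >>= 1;
                if (lsb)
                        v.b[0] ^= 0xe1;
        }
        *r = z;
}

/* 1-block-at-a-time GHASH update: y = (y ^ block) * H for each block. */
static void ghash_1wide(be128 *y, const be128 *h,
                        const uint8_t *aad, size_t nblocks)
{
        for (size_t i = 0; i < nblocks; i++) {
                be128 blk;

                memcpy(blk.b, &aad[16 * i], 16);
                be128_xor(y, &blk);
                gf128_mul(y, y, h);
        }
}

/*
 * 4-block-wide GHASH update, using the identity
 *   ((((y^A0)*H ^ A1)*H ^ A2)*H ^ A3)*H
 *     = (y^A0)*H^4 ^ A1*H^3 ^ A2*H^2 ^ A3*H
 * so the four per-block products are independent of each other and can be
 * computed in parallel, which is what lets wide vectors be used.
 */
static void ghash_4wide(be128 *y, const be128 *h,
                        const uint8_t *aad, size_t nblocks)
{
        be128 hpow[4];  /* hpow[k] = H^(k+1), precomputed once */
        size_t i = 0;

        hpow[0] = *h;
        for (int k = 1; k < 4; k++)
                gf128_mul(&hpow[k], &hpow[k - 1], h);

        for (; i + 4 <= nblocks; i += 4) {
                be128 acc = { 0 };

                for (int k = 0; k < 4; k++) {
                        be128 blk, prod;

                        memcpy(blk.b, &aad[16 * (i + k)], 16);
                        if (k == 0)
                                be128_xor(&blk, y); /* fold in running digest */
                        gf128_mul(&prod, &blk, &hpow[3 - k]); /* times H^(4-k) */
                        be128_xor(&acc, &prod);
                }
                *y = acc;
        }
        /* Any leftover blocks fall back to the 1-wide loop. */
        ghash_1wide(y, h, &aad[16 * i], nblocks - i);
}
```

Since a 512-bit vector holds four 16-byte GHASH blocks, one iteration of the assembly's 4-vector-wide loop corresponds to 16 blocks (256 bytes) of AAD rather than the 4 blocks shown here.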
Originally, I assumed that the case of large AADLEN was unimportant.
Later, when reviewing the users of BoringSSL's AES-GCM code, I found
that some callers use BoringSSL's AES-GCM API to just compute GMAC,
authenticating lots of data but not en/decrypting any. Thus, I included
a similar optimization in the BoringSSL port of this code. I believe
it's wise to include this optimization in the kernel port too for
similar reasons, and to align it more closely with the BoringSSL port.
Another reason this function originally used 256-bit vectors was so that
separate *_avx10_256 and *_avx10_512 versions of it wouldn't be needed.
However, that's no longer applicable.
To avoid a slight performance regression in the common case of AADLEN <=
16, also add a fast path for that case which uses 128-bit vectors. In
fact, this case actually gets slightly faster too, since it saves a
couple instructions over the original 256-bit code.
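To illustrate the shape of that dispatch (building on the hypothetical be128, be128_xor, gf128_mul, and ghash_4wide helpers from the sketch above; the real code selects between 128-bit and wider vector paths in assembly), the AADLEN <= 16 case needs only one zero-padded block, one XOR, and one GHASH multiply, with none of the wide-loop or power-of-H setup:

```c
/*
 * Sketch of the length-based dispatch: AADLEN <= 16 is handled as a single
 * zero-padded block, so no wide-loop setup cost is paid for the common case.
 * Reuses the illustrative helpers from the sketch above; this is not the
 * kernel's actual API.
 */
static void ghash_aad_update(be128 *y, const be128 *h,
                             const uint8_t *aad, size_t aadlen)
{
        if (aadlen == 0)
                return;
        if (aadlen <= 16) {
                be128 blk = { 0 };      /* zero-pad a partial final block */

                memcpy(blk.b, aad, aadlen);
                be128_xor(y, &blk);
                gf128_mul(y, y, h);
                return;
        }
        /* Longer AAD: take the multi-block path (assumes aadlen is a
         * multiple of 16 here, just to keep the sketch short). */
        ghash_4wide(y, h, aad, aadlen / 16);
}
```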
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251002023117.37504-9-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
