| author | Eric Biggers <ebiggers@kernel.org> | 2025-10-01 19:31:17 -0700 |
|---|---|---|
| committer | Eric Biggers <ebiggers@kernel.org> | 2025-10-26 20:37:41 -0700 |
| commit | 05794985b190e0592131b323d37d7cf506711f1f | |
| tree | 2e945b52fd11e3e93aba1e4a5e299609d983544d | |
| parent | 5ab1ff2e0f03ab64cc1832999146c0dcbf9db966 | |
crypto: x86/aes-gcm - optimize long AAD processing with AVX512
Improve the performance of aes_gcm_aad_update_vaes_avx512() on large AAD
(additional authenticated data) lengths by 4-8 times by making it use up
to 512-bit vectors and a 4-vector-wide loop. Previously, it used only
256-bit vectors and a 1-vector-wide loop.
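The 4-vector-wide loop is possible because GHASH over the AAD can be restructured so that several blocks are combined per iteration using precomputed powers of the hash key H. The kernel code itself is x86-64 assembly using 512-bit vectors and VPCLMULQDQ; the following is only a minimal, portable C sketch of that algebraic restructuring, where be128, gf128_mul, ghash_1wide, and ghash_4wide are illustrative names rather than the kernel's actual interfaces.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One 16-byte GHASH block, in the big-endian byte order used by GCM. */
typedef struct { uint8_t b[16]; } be128;

static void be128_xor(be128 *r, const be128 *a)
{
        for (int i = 0; i < 16; i++)
                r->b[i] ^= a->b[i];
}

/* Bitwise GF(2^128) multiplication per the GCM spec (slow reference code;
 * the kernel uses VPCLMULQDQ carryless-multiply instructions instead). */
static void gf128_mul(be128 *r, const be128 *x, const be128 *y)
{
        be128 z = { 0 };
        be128 v = *y;

        for (int i = 0; i < 128; i++) {
                /* Bit i of x, counting from the MSB of x->b[0]. */
                if ((x->b[i / 8] >> (7 - (i % 8))) & 1)
                        be128_xor(&z, &v);
                /* Shift v right one bit, reducing by the GHASH polynomial. */
                int lsb = v.b[15] & 1;
                for (int j = 15; j > 0; j--)
                        v.b[j] = (v.b[j] >> 1) | (uint8_t)(v.b[j - 1] << 7);
                v.b[0] >>= 1;
                if (lsb)
                        v.b[0] ^= 0xe1;
        }
        *r = z;
}

/* 1-block-at-a-time GHASH update: y = (y ^ block) * H for each block. */
static void ghash_1wide(be128 *y, const be128 *h,
                        const uint8_t *aad, size_t nblocks)
{
        for (size_t i = 0; i < nblocks; i++) {
                be128 blk;

                memcpy(blk.b, &aad[16 * i], 16);
                be128_xor(y, &blk);
                gf128_mul(y, y, h);
        }
}

/*
 * 4-block-wide GHASH update, using the identity
 *   ((((y^A0)*H ^ A1)*H ^ A2)*H ^ A3)*H
 *     = (y^A0)*H^4 ^ A1*H^3 ^ A2*H^2 ^ A3*H
 * so the four per-block products are independent of each other and can be
 * computed in parallel, which is what lets wide vectors be used.
 */
static void ghash_4wide(be128 *y, const be128 *h,
                        const uint8_t *aad, size_t nblocks)
{
        be128 hpow[4];  /* hpow[k] = H^(k+1), precomputed once */
        size_t i = 0;

        hpow[0] = *h;
        for (int k = 1; k < 4; k++)
                gf128_mul(&hpow[k], &hpow[k - 1], h);

        for (; i + 4 <= nblocks; i += 4) {
                be128 acc = { 0 };

                for (int k = 0; k < 4; k++) {
                        be128 blk, prod;

                        memcpy(blk.b, &aad[16 * (i + k)], 16);
                        if (k == 0)
                                be128_xor(&blk, y); /* fold in running digest */
                        gf128_mul(&prod, &blk, &hpow[3 - k]); /* times H^(4-k) */
                        be128_xor(&acc, &prod);
                }
                *y = acc;
        }
        /* Any leftover blocks fall back to the 1-wide loop. */
        ghash_1wide(y, h, &aad[16 * i], nblocks - i);
}
```

Since a 512-bit vector holds four 16-byte GHASH blocks, one iteration of the assembly's 4-vector-wide loop corresponds to 16 blocks (256 bytes) of AAD rather than the 4 blocks shown here.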
Originally, I assumed that the case of large AADLEN was unimportant.
Later, when reviewing the users of BoringSSL's AES-GCM code, I found
that some callers use BoringSSL's AES-GCM API to just compute GMAC,
authenticating lots of data but not en/decrypting any. Thus, I included
a similar optimization in the BoringSSL port of this code. I believe
it's wise to include this optimization in the kernel port too for
similar reasons, and to align it more closely with the BoringSSL port.
Another reason this function originally used 256-bit vectors was so that
separate *_avx10_256 and *_avx10_512 versions of it wouldn't be needed.
However, that's no longer applicable.
To avoid a slight performance regression in the common case of AADLEN <=
16, also add a fast path for that case which uses 128-bit vectors. In
fact, this case actually gets slightly faster too, since it saves a
couple instructions over the original 256-bit code.
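To illustrate the shape of that dispatch (building on the hypothetical be128, be128_xor, gf128_mul, and ghash_4wide helpers from the sketch above; the real code selects between 128-bit and wider vector paths in assembly), the AADLEN <= 16 case needs only one zero-padded block, one XOR, and one GHASH multiply, with none of the wide-loop or power-of-H setup:

```c
/*
 * Sketch of the length-based dispatch: AADLEN <= 16 is handled as a single
 * zero-padded block, so no wide-loop setup cost is paid for the common case.
 * Reuses the illustrative helpers from the sketch above; this is not the
 * kernel's actual API.
 */
static void ghash_aad_update(be128 *y, const be128 *h,
                             const uint8_t *aad, size_t aadlen)
{
        if (aadlen == 0)
                return;
        if (aadlen <= 16) {
                be128 blk = { 0 };      /* zero-pad a partial final block */

                memcpy(blk.b, aad, aadlen);
                be128_xor(y, &blk);
                gf128_mul(y, y, h);
                return;
        }
        /* Longer AAD: take the multi-block path (assumes aadlen is a
         * multiple of 16 here, just to keep the sketch short). */
        ghash_4wide(y, h, aad, aadlen / 16);
}
```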
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251002023117.37504-9-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
