crypto: x86/aes-gcm - add VAES+AVX2 optimized code - user/sven/linux.git


author	Eric Biggers <ebiggers@kernel.org>	2025-10-01 19:31:10 -0700
committer	Eric Biggers <ebiggers@kernel.org>	2025-10-26 20:37:40 -0700
commit	fae3b96ba6015c35a973da09bf313d90e4e4bb94 (patch)
tree	603361d615b5abe6c17a2adc2bb5f9b187c5035e /drivers/usb/cdns3/cdns3-pci-wrap.c
parent	dcb6fa37fd7bc9c3d2b066329b0d27dedf8becaa (diff)

crypto: x86/aes-gcm - add VAES+AVX2 optimized code

Add an implementation of AES-GCM that uses 256-bit vectors and the following CPU features: Vector AES (VAES), Vector Carryless Multiplication (VPCLMULQDQ), and AVX2.

It doesn't require AVX512. So unlike the existing VAES+AVX512 code, it works on CPUs that support VAES but not AVX512, specifically:

- AMD Zen 3, both client and server
- Intel Alder Lake, Raptor Lake, Meteor Lake, Arrow Lake, and Lunar Lake. (These are client CPUs.)
- Intel Sierra Forest. (This is a server CPU.)

On these CPUs, this VAES+AVX2 code is much faster than the existing AES-NI code. The AES-NI code uses only 128-bit vectors.

These CPUs are widely deployed, making VAES+AVX2 code worthwhile even though hopefully future x86_64 CPUs will uniformly support AVX512.

This implementation will also serve as the fallback 256-bit implementation for older Intel CPUs (Ice Lake and Tiger Lake) that support AVX512 but downclock too eagerly when 512-bit vectors are used. Currently, the VAES+AVX10/256 implementation serves that purpose. A later commit will remove that and just use the VAES+AVX2 one. (Note that AES-XTS and AES-CTR already successfully use this approach.)

I originally wrote this AES-GCM implementation for BoringSSL. It's been in BoringSSL for a while now, including in Chromium. This is a port of it to the Linux kernel. The main changes in the Linux version include:

- Port from "perlasm" to a standard .S file.
- Align all assembly functions with what aesni-intel_glue.c expects, including adding support for lengths not a multiple of 16 bytes.
- Rework the en/decryption of the final 1 to 127 bytes.

This commit increases AES-256-GCM throughput on AMD Milan (Zen 3) by up to 74%, as shown by the following tables:

Table 1: AES-256-GCM encryption throughput change,
         CPU vs. message length in bytes:

                      | 16384 |  4096 |  4095 |  1420 |   512 |   500 |
----------------------+-------+-------+-------+-------+-------+-------+
AMD Milan (Zen 3)     |   67% |   59% |   61% |   39% |   23% |   27% |

                      |   300 |   200 |    64 |    63 |    16 |
----------------------+-------+-------+-------+-------+-------+
AMD Milan (Zen 3)     |   14% |   12% |    7% |    7% |    0% |

Table 2: AES-256-GCM decryption throughput change,
         CPU vs. message length in bytes:

                      | 16384 |  4096 |  4095 |  1420 |   512 |   500 |
----------------------+-------+-------+-------+-------+-------+-------+
AMD Milan (Zen 3)     |   74% |   65% |   65% |   44% |   23% |   26% |

                      |   300 |   200 |    64 |    63 |    16 |
----------------------+-------+-------+-------+-------+-------+
AMD Milan (Zen 3)     |   12% |   11% |    3% |    2% |   -3% |

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251002023117.37504-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
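The commit message implies a selection order among the x86 AES-GCM implementations once the later commit it mentions has removed the VAES+AVX10/256 code. VAES+AVX512 would be used where 512-bit vectors run well. VAES+AVX2 would be used on CPUs that have VAES and VPCLMULQDQ but either lack AVX512 or downclock with it. AES-NI would be used otherwise. The C sketch below is a minimal model of that order. It assumes a hypothetical `struct gcm_cpu_features` that is filled in elsewhere, plus a hypothetical generic fallback. It is not the feature-check code in aesni-intel_glue.c.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical summary of the CPU features discussed in the commit message.
 * The struct and field names are illustrative, not the kernel's.
 */
struct gcm_cpu_features {
	bool aesni;         /* AES-NI + PCLMULQDQ (128-bit vectors) */
	bool vaes;          /* Vector AES */
	bool vpclmulqdq;    /* Vector carryless multiplication */
	bool avx2;
	bool avx512;        /* usable AVX512 support */
	bool prefer_no_zmm; /* downclocks too eagerly with 512-bit vectors */
};

enum gcm_impl {
	GCM_IMPL_GENERIC,     /* hypothetical: no x86 acceleration */
	GCM_IMPL_AESNI,       /* existing 128-bit AES-NI code */
	GCM_IMPL_VAES_AVX2,   /* the new 256-bit code */
	GCM_IMPL_VAES_AVX512, /* existing 512-bit code */
};

/* Pick the widest implementation that the CPU supports and runs well. */
enum gcm_impl select_gcm_impl(const struct gcm_cpu_features *f)
{
	bool vector_aes = f->vaes && f->vpclmulqdq;

	if (vector_aes && f->avx512 && !f->prefer_no_zmm)
		return GCM_IMPL_VAES_AVX512;
	/* Zen 3, Alder Lake etc., and AVX512 CPUs that should avoid zmm. */
	if (vector_aes && f->avx2)
		return GCM_IMPL_VAES_AVX2;
	if (f->aesni)
		return GCM_IMPL_AESNI;
	return GCM_IMPL_GENERIC;
}

/* Checks mirroring the CPU classes named in the commit message. */
void select_gcm_impl_selftest(void)
{
	/* e.g. Zen 3 or Alder Lake: VAES + VPCLMULQDQ + AVX2, no AVX512. */
	struct gcm_cpu_features client = {
		.aesni = true, .vaes = true, .vpclmulqdq = true, .avx2 = true,
	};
	/* e.g. Ice Lake or Tiger Lake: AVX512, but 512-bit vectors downclock. */
	struct gcm_cpu_features icelake = {
		.aesni = true, .vaes = true, .vpclmulqdq = true, .avx2 = true,
		.avx512 = true, .prefer_no_zmm = true,
	};
	/* A CPU with only the 128-bit AES-NI instructions. */
	struct gcm_cpu_features older = { .aesni = true };

	assert(select_gcm_impl(&client) == GCM_IMPL_VAES_AVX2);
	assert(select_gcm_impl(&icelake) == GCM_IMPL_VAES_AVX2);
	assert(select_gcm_impl(&older) == GCM_IMPL_AESNI);
}
```

Under these assumptions, an Ice Lake or Tiger Lake CPU has AVX512 but is flagged as downclocking with 512-bit vectors, so it falls through to the VAES+AVX2 code. That matches the fallback role the commit message assigns to this implementation.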
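VPCLMULQDQ is one of the features the new code requires. It is the vector form of PCLMULQDQ: on a 256-bit register it performs a 64x64-bit carryless multiplication independently in each 128-bit lane. GCM combines such products, followed by a reduction modulo x^128 + x^7 + x^2 + x + 1, to compute its GHASH authenticator. The portable C model below shows what a single lane computes. The name `clmul64` is hypothetical, and this is a bit-at-a-time reference, not how the assembly is written.

```c
#include <assert.h>
#include <stdint.h>

/* 128-bit result of a 64x64-bit carryless multiplication. */
struct u128 {
	uint64_t lo;
	uint64_t hi;
};

/*
 * Polynomial multiplication over GF(2): partial products are combined
 * with XOR instead of addition, so no carries propagate between bits.
 */
struct u128 clmul64(uint64_t a, uint64_t b)
{
	struct u128 r = { 0, 0 };

	for (int i = 0; i < 64; i++) {
		if ((b >> i) & 1) {
			r.lo ^= a << i;
			/* Bits of a shifted past bit 63 land in the high half. */
			if (i != 0)
				r.hi ^= a >> (64 - i);
		}
	}
	return r;
}

void clmul64_selftest(void)
{
	/* (x + 1)^2 = x^2 + 1 over GF(2): 3 "times" 3 is 5, not 9. */
	struct u128 sq = clmul64(3, 3);
	assert(sq.lo == 5 && sq.hi == 0);

	/* x^63 * x = x^64 crosses into the high 64 bits. */
	struct u128 carry = clmul64(UINT64_C(1) << 63, 2);
	assert(carry.lo == 0 && carry.hi == 1);
}
```

Because each 128-bit lane holds one such product, a 256-bit VPCLMULQDQ performs two of these multiplications per instruction. The AES-NI code's 128-bit PCLMULQDQ performs one.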
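The commit message says the Linux port reworks the en/decryption of the final 1 to 127 bytes. That range suggests the main loop consumes 128 bytes per iteration: eight 16-byte AES blocks, or four 256-bit vectors. Under that assumption, the sketch below splits a message length into three parts: the bytes handled by the main loop, the leftover full blocks, and a final partial block. The names `gcm_split_len` and `GCM_BULK_STRIDE` are hypothetical. The commit message does not describe how the assembly actually processes the tail.

```c
#include <assert.h>
#include <stddef.h>

#define AES_BLOCK_SIZE 16
/* Assumed bulk stride: 4 x 256-bit vectors = 8 AES blocks = 128 bytes. */
#define GCM_BULK_STRIDE (8 * AES_BLOCK_SIZE)

/* Hypothetical breakdown of a message length. */
struct gcm_len_split {
	size_t bulk_bytes;    /* multiple of 128, handled by the main loop */
	size_t tail_blocks;   /* full 16-byte blocks in the 0..127-byte tail */
	size_t partial_bytes; /* 1..15 bytes of a final partial block, or 0 */
};

struct gcm_len_split gcm_split_len(size_t len)
{
	struct gcm_len_split s;
	size_t tail = len % GCM_BULK_STRIDE; /* 0..127 */

	s.bulk_bytes = len - tail;
	s.tail_blocks = tail / AES_BLOCK_SIZE;
	s.partial_bytes = tail % AES_BLOCK_SIZE;

	/* The three pieces always add back up to the original length. */
	assert(s.bulk_bytes + s.tail_blocks * AES_BLOCK_SIZE +
	       s.partial_bytes == len);
	return s;
}
```

For the 4095-byte case in the tables, this gives 3968 bulk bytes, 7 full blocks and 15 trailing bytes, which is the largest possible tail (127 bytes). Lengths that are multiples of 128, such as 16384 and 512, leave no tail at all. A partial block like the 12 trailing bytes of a 1420-byte message is the "length not a multiple of 16 bytes" case that aesni-intel_glue.c expects the assembly to support.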

Diffstat (limited to 'drivers/usb/cdns3/cdns3-pci-wrap.c')

0 files changed, 0 insertions, 0 deletions

