diff options
| author | Eric Biggers <ebiggers@kernel.org> | 2025-10-01 19:31:10 -0700 |
|---|---|---|
| committer | Eric Biggers <ebiggers@kernel.org> | 2025-10-26 20:37:40 -0700 |
| commit | fae3b96ba6015c35a973da09bf313d90e4e4bb94 (patch) | |
| tree | 603361d615b5abe6c17a2adc2bb5f9b187c5035e /drivers/usb/cdns3/cdns3-pci-wrap.c | |
| parent | dcb6fa37fd7bc9c3d2b066329b0d27dedf8becaa (diff) | |
crypto: x86/aes-gcm - add VAES+AVX2 optimized code
Add an implementation of AES-GCM that uses 256-bit vectors and the
following CPU features: Vector AES (VAES), Vector Carryless
Multiplication (VPCLMULQDQ), and AVX2.
It doesn't require AVX512. So unlike the existing VAES+AVX512 code, it
works on CPUs that support VAES but not AVX512, specifically:
- AMD Zen 3, both client and server
- Intel Alder Lake, Raptor Lake, Meteor Lake, Arrow Lake, and Lunar
Lake. (These are client CPUs.)
- Intel Sierra Forest. (This is a server CPU.)
On these CPUs, this VAES+AVX2 code is much faster than the existing
AES-NI code. The AES-NI code uses only 128-bit vectors.
These CPUs are widely deployed, making VAES+AVX2 code worthwhile even
though hopefully future x86_64 CPUs will uniformly support AVX512.
This implementation will also serve as the fallback 256-bit
implementation for older Intel CPUs (Ice Lake and Tiger Lake) that
support AVX512 but downclock too eagerly when 512-bit vectors are used.
Currently, the VAES+AVX10/256 implementation serves that purpose. A
later commit will remove that and just use the VAES+AVX2 one. (Note
that AES-XTS and AES-CTR already successfully use this approach.)
I originally wrote this AES-GCM implementation for BoringSSL. It's been
in BoringSSL for a while now, including in Chromium. This is a port of
it to the Linux kernel. The main changes in the Linux version include:
- Port from "perlasm" to a standard .S file.
- Align all assembly functions with what aesni-intel_glue.c expects,
including adding support for lengths not a multiple of 16 bytes.
- Rework the en/decryption of the final 1 to 127 bytes.
This commit increases AES-256-GCM throughput on AMD Milan (Zen 3) by up
to 74%, as shown by the following tables:
Table 1: AES-256-GCM encryption throughput change,
CPU vs. message length in bytes:
| 16384 | 4096 | 4095 | 1420 | 512 | 500 |
----------------------+-------+-------+-------+-------+-------+-------+
AMD Milan (Zen 3) | 67% | 59% | 61% | 39% | 23% | 27% |
| 300 | 200 | 64 | 63 | 16 |
----------------------+-------+-------+-------+-------+-------+
AMD Milan (Zen 3) | 14% | 12% | 7% | 7% | 0% |
Table 2: AES-256-GCM decryption throughput change,
CPU vs. message length in bytes:
| 16384 | 4096 | 4095 | 1420 | 512 | 500 |
----------------------+-------+-------+-------+-------+-------+-------+
AMD Milan (Zen 3) | 74% | 65% | 65% | 44% | 23% | 26% |
| 300 | 200 | 64 | 63 | 16 |
----------------------+-------+-------+-------+-------+-------+
AMD Milan (Zen 3) | 12% | 11% | 3% | 2% | -3% |
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251002023117.37504-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Diffstat (limited to 'drivers/usb/cdns3/cdns3-pci-wrap.c')
0 files changed, 0 insertions, 0 deletions
