Remove the now unused eMAG MIDR check and unused entries from cpu_list[].
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
As a cleanup remove the eMAG ifunc for memset.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
As a cleanup remove the eMAG ifunc for memchr.
Reviewed-by: JiangNing OS<jiangning@amperemail.onmicrosoft.com>
|
|
An interrupted RPC call can return EINTR whilst the RPC is still in
progress on the server. Some RPC calls have permanent consequences
(e.g. a write() to append data to a file) but a caller seeing EINTR
should expect that no state has changed. The signal thread now stores
the server's reply (which it already waited for) as the interrupted
thread's reply.
Message-ID: <20260401194948.90428-3-mike@weatherwax.co.uk>
|
|
MSG_EXAMINE has been broadened to allow the signal thread (for
example) to access additional arguments that are passed to
interruptible RPCs in other threads. All architecture-specific
variants of intr-msg.h now comply with the revised interface, and the
single user of MSG_EXAMINE (report-wait.c) has been adjusted accordingly.
Message-ID: <20260401194948.90428-2-mike@weatherwax.co.uk>
|
|
Update the hugetlb tunable default in elf/dl-tunables.c so it is shown as 1
with /lib/ld-linux-aarch64.so.1 --list-tunables.
Move the initialization of thp_mode/thp_pagesize to do_set_hugetlb() and
avoid accessing /sys/kernel/mm if DEFAULT_THP_PAGESIZE > 0. Switch off THP if
glibc.malloc.hugetlb=0 is used - this behaves as if DEFAULT_THP_PAGESIZE==0.
Fix the --list-tunables testcase.
Reviewed-by: DJ Delorie <dj@redhat.com>
|
|
The current implementation of ftw relies on recursion to traverse
directories (ftw_dir calls process_entry, which calls ftw_dir). In deep
directory trees, this could lead to a stack overflow (as demonstrated by
the new tst-nftw-bz33882.c test).
This patch refactors ftw to use an explicit, heap-allocated stack to
manage directory traversal:
* The 'struct ftw_frame' encapsulates the state of a single directory
level (directory stream, stat buffer, previous base offset, and
current state).
* The ftw_dir is rewritten to use an iterative loop instead of
recursion, enabling immediate state transitions without
function-call overhead.
The patch also cleans up some unused definitions and assumptions (e.g.,
free-clobbering errno) and fixes a UB when handling the ftw callback.
Checked on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
|
|
The CORE-MATH e756933f improved the error bound in the fast path for
x_0 <= x < 1/4, along with a formal proof [1].
Checked on x86_64-linux-gnu, i686-linux-gnu, aarch64-linux-gnu,
and arm-linux-gnueabihf.
[1] https://core-math.gitlabpages.inria.fr/sinh.pdf
|
|
not defined
I didn't realize it could be undefined at all instead of simply
unsupported :(.
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
Signed-off-by: Siddhesh Poyarekar <siddhesh@gotplt.org>
|
|
Introduced a synthetic architecture preference flag (Prefer_EVEX512)
and enabled it for AMD Zen5 (CPUID Family 0x1A) when AVX-512 is supported.
This flag modifies IFUNC dispatch to prefer 512-bit EVEX variants over
256-bit EVEX variants for string and memory functions on Zen5 processors,
leveraging their native 512-bit execution units for improved throughput.
When Prefer_EVEX512 is set, the dispatcher selects evex512 implementations;
otherwise, it falls back to evex (256-bit) variants.
The implementation updates the IFUNC selection logic in ifunc-avx2.h and
ifunc-evex.h to check for the Prefer_EVEX512 flag before dispatching to
EVEX512 implementations. This change affects six string/memory functions:
- strchr
- strlen
- strnlen
- strrchr
- strchrnul
- memchr
Benchmarks conducted on AMD Zen5 hardware demonstrate significant
performance improvements across all affected functions:
Function Baseline Patched Avg Avg Avg Max
Variant Variant Baseline Patched Change Improve
(ns) (ns) % %
------------+----------+----------+-----------+----------+--------+--------
STRCHR evex evex512 16.408 12.293 25.08% 37.69%
STRLEN evex evex512 16.862 11.436 32.18% 56.74%
STRNLEN evex evex512 18.493 11.762 36.40% 64.40%
STRRCHR evex evex512 15.154 10.874 28.24% 44.38%
STRCHRNUL evex evex512 16.464 12.605 23.44% 45.56%
MEMCHR evex evex512 9.984 8.268 17.19% 39.99%
Additionally, a tunable option (glibc.cpu.x86_cpu_features.preferred)
is provided to allow runtime control of the Prefer_EVEX512 flag for testing
and compatibility.
Reviewed-by: Ganesh Gopalasubramanian <Ganesh.Gopalasubramanian@amd.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
The new test from 19781c2221 triggers a failure on i686:
testing float (without inline functions)
Failure: lgamma (0x3.12be38p+120): errno set to 0, expected 34 (ERANGE)
Failure: lgamma_upward (0x3.12be38p+120): errno set to 0, expected 34 (ERANGE)
Use math_narrow_eval on the multiplication to force the expected
precision.
Checked on i686-linux-gnu.
|
|
This is similar to the original CORE-MATH code and is why the function
exists.
Checked on x86_64-linux-gnu, i686-linux-gnu, aarch64-linux-gnu,
and arm-linux-gnueabihf.
|
|
Add the FSF's disclaimer to bi_VU, C, gbm_IN, hif_FJ, sah_RU,
sm_WS, and to_TO which were created under copyright assignment
(not DCO).
This change ensures that all 352 localedata files have either
the FSF disclaimer or the related DCO text we are using
e.g. ab_GE.
Link: https://inbox.sourceware.org/libc-alpha/80426eb7-70cd-4178-8fda-51d590aa38d4@redhat.com/
Link: https://inbox.sourceware.org/libc-alpha/20130220215701.B263F2C0A7@topped-with-meat.com/
Link: https://inbox.sourceware.org/libc-alpha/87pmtq54hs.fsf@oldenburg.str.redhat.com/
Reviewed-by: Collin Funk <collin.funk1@gmail.com>
|
|
Update advisories with Fix-Commit information for 2.43.9000 and 2.44.
Update NEWS with advisory entries.
|
|
The processed hostname in getanswer_ptr should be correctly checked to
prevent invalid characters, including shell metacharacters, from being
allowed. It is a security issue to fail to check the returned
hostname for validity.
A regression test is added for invalid metacharacters and other cases
of invalid or valid characters.
No regressions on x86_64-linux-gnu.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
Some distributions ban the /usr/bin/python path in their build
systems due to the ambiguity of whether it refers to Python 2 or
Python 3. Python 2 has been out of support for many years, and
glibc has required Python 3 at build time for a while. So it seems
safe to switch the remaining scripts over to /usr/bin/python3.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
Add implies, abilist, c++-types and syscall files.
|
|
|
|
|
|
Move the loongarch64 implementation to sysdeps/loongarch/lp64/fpu.
|
|
|
|
The libc_feupdateenv_test macro is supposed to trap when the trap for a
previously held exception is enabled. But
libc_feupdateenv_test_loongarch wasn't doing it properly: the comment
claims "setting of the cause bits" would cause "the hardware to generate
the exception" but that's simply not true for the LoongArch movgr2fcsr
instruction.
To fix the issue, we need to call __feraiseexcept in case a held exception
is enabled to trap.
Reviewed-by: caiyinyu <caiyinyu@loongson.cn>
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
|
|
tst-cancel31 sometimes fails on la32 qemu-system with a single-core
system. If the test and an infinite loop run on the same x86_64 core,
the test also fails sometimes:
taskset -c 0 make test t=nptl/tst-cancel31
taskset -c 0 ./a.out (a.out is an infinite loop)
After the writeopener thread opens the file, execution may switch to
the main thread, which then finds redundant files. Fix this by calling
pthread_cancel and pthread_join on the writeopener thread
before support_descriptors_check.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
|
|
The answer section boundary was previously ignored, and the code in
getanswer_ptr would iterate past the last resource record, but not
beyond the end of the returned data. This could lead to subsequent data
being interpreted as answer records, thus violating the DNS
specification. Such resource records could be maliciously crafted and
hidden from other tooling, but processed by the glibc stub resolver and
acted upon by the application. While we trust the data returned by the
configured recursive resolvers, we should not trust its format and
should validate it as required. It is a security issue to incorrectly
process the DNS protocol.
A regression test is added for response section crossing.
No regressions on x86_64-linux-gnu.
Reviewed-by: Collin Funk <collin.funk1@gmail.com>
|
|
Explain the security issue and set the context for the vulnerability to
help downstreams get a better understanding of the issue.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
Explain the security issue and set the context for the vulnerability to
help downstreams get a better understanding of the issue.
Reviewed-by: Collin Funk <collin.funk1@gmail.com>
|
|
Note that MPC 1.4.0 has moved from .tar.gz to .tar.xz distribution.
Tested with build-many-glibcs.py (host-libraries, compilers and glibcs
builds).
|
|
The comment explaining the reason to clear CAUSE does not make any
sense: it says the next "CTC" instruction would raise the FP exception
of which both the CAUSE and ENABLE bits are set, but LoongArch does not
have the CTC instruction. LoongArch has the movgr2fcsr instruction but
movgr2fcsr never raises any FP exception, different from the MIPS CTC
instruction.
So we don't really need to care about CAUSE at all.
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
|
|
This patch from Adhemerval sets up the ifunc redirections so that we
resolve memcpy to memcpy_generic in early startup. This avoids infinite
recursion for memcpy calls before the loader is fully initialized.
Tested-by: Jeff Law <jeffrey.law@oss.qualcomm.com>
|
|
Detect clang explicitly and apply compiler-specific version checks for
RVV support.
Signed-off-by: Zihong Yao <zihong.plct@isrc.iscas.ac.cn>
Reviewed-by: Peter Bergner <bergner@tenstorrent.com>
|
|
It syncs with CORE-MATH 9a75500ba1831 and 20d51f2ee.
Checked on aarch64-linux-gnu.
|
|
|
|
It removes some unnecessary corner-case checks and uses a slightly
different algorithm for the hard-case database binary search.
Checked on aarch64-linux-gnu, arm-linux-gnueabihf,
powerpc64le-linux-gnu, i686-linux-gnu, and x86_64-linux-gnu.
|
|
It adds a minor optimization on fast path.
Checked on aarch64-linux-gnu, arm-linux-gnueabihf,
powerpc64le-linux-gnu, i686-linux-gnu, and x86_64-linux-gnu.
|
|
Enable adding known failures to allowed-failures.txt and ignore
failures that are in the list. If allowed-failures.txt does not
exist, all failures lead to a failed status as before.
When the file is present, failures of listed tests are ignored and reported
on stdout. If tests not in the allowed list fail, summarize-tests exits with
status 1 and reports the failing tests.
The expected format of allowed-failures.txt file is:
<test_name> # <comment>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
|
|
The libgcc implementations of __builtin_clzl/__builtin_ctzl may require
access to additional data that is not marked as hidden, which could
introduce additional GOT indirection and necessitate RELATIVE relocs.
And the RELATIVE reloc is an issue if the code is used during static-pie
startup before self-relocation (for instance, during an assert).
For this case, the ABI can add a string-bitops.h header that defines
HAVE_BITOPTS_WORKING to 0. A configure check for this issue is tricky
because it requires linking against the standard libraries, which
create many RELATIVE relocations and complicate filtering those that
might be created by the builtins.
The fallback is disabled by default, so no target is affected.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
|
|
Remove the prefer_sve_ifuncs CPU feature since it was intended for older
kernels. Current distros all use modern Linux kernels with improved support
for SVE save/restore, making this check redundant.
Reviewed-by: Yury Khrustalev <yury.khrustalev@arm.com>
|
|
First off, apologies for my misunderstanding of how madvise(MADV_HUGEPAGE)
works. I had the misconception that madvise(p, 1, MADV_HUGEPAGE) would set
VM_HUGEPAGE on the entire VMA - it does not; it aligns the size to
PAGE_SIZE (4k) and then *splits* the VMA. Only the first page-length of the
virtual space is VM_HUGEPAGE'd; the rest of it stays the same.
These are the semantics for all madvise() calls - which makes sense from a
UABI perspective. madvise() should apply the requested behavior only to the
(page-aligned) length it was given; doing any more than that is not
something the user is expecting.
Commit 6e8f32d39a57 tries to optimize around the madvise() call by determining
whether the VMA got madvise'd before. This will work for most cases except
the following: if check_may_shrink_heap() is true, shrink_heap() re-maps the
shrunk portion, giving us a new VMA altogether. That VMA won't have the
VM_HUGEPAGE flag.
Reverting this commit, we will again mark the new VMA with VM_HUGEPAGE, and
the kernel will merge the two into a single VMA marked with VM_HUGEPAGE.
This may be the only case where we lose VM_HUGEPAGE, and we could micro-optimize
by extending the current if-condition with !check_may_shrink_heap. But let us
not do this - this is very difficult to reason about, and I am soon going
to propose mmap(MAP_HUGEPAGE) in Linux to do away with all these workarounds.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Yury Khrustalev <yury.khrustalev@arm.com>
|
|
Including the generic tanh implementation without undefining
libm_alias_double (to provide the __tanh_sse2 implementation) makes
the exported tanh symbol point to the SSE2 variant.
Reviewed-by: DJ Delorie <dj@redhat.com>
|
|
cosh shows an improvement of about 35% when building for
x86_64-v3.
Reviewed-by: DJ Delorie <dj@redhat.com>
|
|
Common data definitions are moved to e_coshsinh_data, cosh only
data is moved to e_cosh_data, sinh to e_sinh_data, and tanh to
e_tanh_data.
Reviewed-by: DJ Delorie <dj@redhat.com>
|
|
The current implementation shows the following accuracy on three
ranges ([-DBL_MAX,-10], [-10,10], [10,DBL_MAX]) with 10e9 uniformly
distributed random numbers for each range (first column is the
accuracy in ULP, with '0' being correctly rounded, second is the
number of samples with the corresponding precision):
* Range [-DBL_MAX, -10]
* FE_TONEAREST
0: 10000000000 100.00%
* FE_UPWARD
0: 10000000000 100.00%
* FE_DOWNWARD
0: 10000000000 100.00%
* FE_TOWARDZERO
0: 10000000000 100.00%
* Range [-10, 10]
* FE_TONEAREST
0: 4059325526 94.51%
1: 231023238 5.38%
2: 4618531 0.11%
* FE_UPWARD
0: 2106654900 49.05%
1: 2145413180 49.95%
2: 40847554 0.95%
3: 2051661 0.05%
* FE_DOWNWARD
0: 2106618401 49.05%
1: 2145409958 49.95%
2: 40880992 0.95%
3: 2057944 0.05%
* FE_TOWARDZERO
0: 4061659952 94.57%
1: 221006985 5.15%
2: 12285512 0.29%
3: 14846 0.00%
* Range [10, DBL_MAX]
* FE_TONEAREST
0: 10000000000 100.00%
* FE_UPWARD
0: 10000000000 100.00%
* FE_DOWNWARD
0: 10000000000 100.00%
* FE_TOWARDZERO
0: 10000000000 100.00%
The CORE-MATH implementation is correctly rounded for any rounding mode.
The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).
Performance-wise, it shows:
latency master patched improvement
x86_64 109.7420 184.5950 -68.21%
x86_64v2 109.1230 187.1890 -71.54%
x86_64v3 99.4471 49.1104 50.62%
aarch64 43.0474 32.2933 24.98%
armhf-vpfv4 41.0954 35.8473 12.77%
powerpc64le 27.3282 22.7134 16.89%
reciprocal-throughput master patched improvement
x86_64 42.5562 158.1820 -271.70%
x86_64v2 42.5734 159.2560 -274.07%
x86_64v3 35.9899 24.2877 32.52%
aarch64 24.7660 22.8466 7.75%
armhf-vpfv4 27.0251 25.8150 4.48%
powerpc64le 11.7350 11.2504 4.13%
* x86_64: gcc version 15.2.1 20260112, Ryzen 9 5900X, --disable-multi-arch
* aarch64: gcc version 15.2.1 20251105, Neoverse-N1
* armv7a-vpfv4: gcc version 15.2.1 20251105, Neoverse-N1
* powerpc64le: gcc version 15.2.1 20260128, POWER10
Checked on x86_64-linux-gnu, aarch64-linux-gnu, and
powerpc64le-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
|
|
It improves throughput by 8% to 18% and latency by 1% to 10%,
depending on the ABI.
Reviewed-by: DJ Delorie <dj@redhat.com>
|
|
The current implementation shows the following accuracy on three
ranges ([-DBL_MAX,-10], [-10,10], [10,DBL_MAX]) with 10e9 uniformly
distributed random numbers for each range (first column is the
accuracy in ULP, with '0' being correctly rounded, second is the
number of samples with the corresponding precision):
* Range [-DBL_MAX, -10]
* FE_TONEAREST
0: 10000000000 100.00%
* FE_UPWARD
0: 10000000000 100.00%
* FE_DOWNWARD
0: 10000000000 100.00%
* FE_TOWARDZERO
0: 10000000000 100.00%
* Range [-10, 10]
* FE_TONEAREST
0: 3169388892 73.79%
1: 1125270674 26.20%
2: 307729 0.01%
* FE_UPWARD
0: 1450068660 33.76%
1: 2146926394 49.99%
2: 697404986 16.24%
3: 567255 0.01%
* FE_DOWNWARD
0: 1449727976 33.75%
1: 2146957381 49.99%
2: 697719649 16.25%
3: 562289 0.01%
* FE_TOWARDZERO
0: 2519351889 58.66%
1: 1773434502 41.29%
2: 2180904 0.05%
* Range [10, DBL_MAX]
* FE_TONEAREST
0: 10000000000 100.00%
* FE_UPWARD
0: 10000000000 100.00%
* FE_DOWNWARD
0: 10000000000 100.00%
* FE_TOWARDZERO
0: 10000000000 100.00%
The CORE-MATH implementation is correctly rounded for any rounding mode.
The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).
Performance-wise, it shows:
latency master patched improvement
x86_64 101.0710 129.4710 -28.10%
x86_64v2 101.1810 127.6370 -26.15%
x86_64v3 96.0685 48.5911 49.42%
aarch64 41.4229 22.3971 45.93%
armhf-vpfv4 42.8620 25.6011 40.27%
powerpc64le 29.2630 13.1450 55.08%
reciprocal-throughput master patched improvement
x86_64 42.6895 105.7150 -147.64%
x86_64v2 42.7255 104.7480 -145.17%
x86_64v3 39.6949 25.9087 34.73%
aarch64 26.0104 19.2236 26.09%
armhf-vpfv4 29.4362 23.6350 19.71%
powerpc64le 12.9170 8.34582 35.39%
* x86_64: gcc version 15.2.1 20260112, Ryzen 9 5900X, --disable-multi-arch
* aarch64: gcc version 15.2.1 20251105, Neoverse-N1
* armv7a-vpfv4: gcc version 15.2.1 20251105, Neoverse-N1
* powerpc64le: gcc version 15.2.1 20260128, POWER10
Checked on x86_64-linux-gnu, aarch64-linux-gnu, and
powerpc64le-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
|
|
It improves throughput by 3.5% to 9%.
Reviewed-by: DJ Delorie <dj@redhat.com>
|
|
The current implementation shows the following accuracy on three
ranges ([-DBL_MAX,-10], [-10,10], [10,DBL_MAX]) with 10e9 uniformly
distributed random numbers for each range (first column is the
accuracy in ULP, with '0' being correctly rounded, second is the
number of samples with the corresponding precision):
* Range [-DBL_MAX, -10]
* FE_TONEAREST
0: 10000000000 100.00%
* FE_UPWARD
0: 10000000000 100.00%
* FE_DOWNWARD
0: 10000000000 100.00%
* FE_TOWARDZERO
0: 10000000000 100.00%
* Range [-10, 10]
* FE_TONEAREST
0: 3291614060 76.64%
1: 1003353235 23.36%
* FE_UPWARD
0: 2295272497 53.44%
1: 1999675198 46.56%
2: 19600 0.00%
* FE_DOWNWARD
0: 2294966533 53.43%
1: 1999981461 46.57%
2: 19301 0.00%
* FE_TOWARDZERO
0: 2306015780 53.69%
1: 1988942093 46.31%
2: 9422 0.00%
* Range [10, DBL_MAX]
* FE_TONEAREST
0: 10000000000 100.00%
* FE_UPWARD
0: 10000000000 100.00%
* FE_DOWNWARD
0: 10000000000 100.00%
* FE_TOWARDZERO
0: 10000000000 100.00%
The CORE-MATH implementation is correctly rounded for any rounding mode.
The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).
Performance-wise, it shows:
latency master patched improvement
x86_64 52.1066 126.4120 -142.60%
x86_64v2 49.5781 119.8520 -141.74%
x86_64v3 45.0811 50.5758 -12.19%
aarch64 19.9977 21.7814 -8.92%
armhf-vpfv4 20.5969 27.0479 -31.32%
powerpc64le 12.6405 13.6768 -8.20%
reciprocal-throughput master patched improvement
x86_64 18.4833 102.9120 -456.78%
x86_64v2 17.5409 99.5179 -467.35%
x86_64v3 18.9187 25.3662 -34.08%
aarch64 10.9045 18.8217 -72.60%
armhf-vpfv4 15.7430 24.0822 -52.97%
powerpc64le 5.4275 8.1269 -49.73%
* x86_64: gcc version 15.2.1 20260112, Ryzen 9 5900X, --disable-multi-arch
* aarch64: gcc version 15.2.1 20251105, Neoverse-N1
* armv7a-vpfv4: gcc version 15.2.1 20251105, Neoverse-N1
* powerpc64le: gcc version 15.2.1 20260128, POWER10
Checked on x86_64-linux-gnu, aarch64-linux-gnu, and
powerpc64le-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
|
|
|
|
The last uses of PTHREAD_IN_LIBC are in places where
__PTHREAD_NPTL/HTL should have been used. The latter was not
conveniently available everywhere; defining it from config.h makes
things simpler.
|