summaryrefslogtreecommitdiff
path: root/src/include/port
AgeCommit message (Collapse)Author
14 daysOptimize hex_encode() and hex_decode() using SIMD.Nathan Bossart
The hex_encode() and hex_decode() functions serve as the workhorses for hexadecimal data for bytea's text format conversion functions, and some workloads are sensitive to their performance. This commit adds new implementations that use routines from port/simd.h, which testing indicates are much faster for larger inputs. For small or invalid inputs, we fall back on the existing scalar versions. Since we are using port/simd.h, these optimizations apply to both x86-64 and AArch64. Author: Nathan Bossart <nathandbossart@gmail.com> Co-authored-by: Chiranmoy Bhattacharya <chiranmoy.bhattacharya@fujitsu.com> Co-authored-by: Susmitha Devanga <devanga.susmitha@fujitsu.com> Reviewed-by: John Naylor <johncnaylorls@gmail.com> Discussion: https://postgr.es/m/aLhVWTRy0QPbW2tl%40nathan
2025-10-03Optimize vector8_has_le() on AArch64.Nathan Bossart
Presently, the SIMD implementation of this function uses unsigned saturating subtraction to find bytes less than or equal to the given value, which is a workaround for the lack of unsigned comparison instructions on some architectures. However, Neon offers vminvq_u8(), which returns the minimum (unsigned) value in the vector. This commit adds a Neon-specific implementation that uses vminvq_u8() to optimize vector8_has_le() on AArch64. In passing, adjust the SSE2 implementation to use vector8_min() and vector8_eq() to find values less than or equal to the given value. This was the only use of vector8_ssub(), so it has been removed. Reviewed-by: John Naylor <johncnaylorls@gmail.com> Discussion: https://postgr.es/m/aNHDNDSHleq0ogC_%40nathan
2025-09-12Remove traces of support for Sun Studio compilerPeter Eisentraut
Per discussion, this compiler suite is no longer maintained, and it has not been able to compile PostgreSQL since at least PostgreSQL 17. This removes all the remaining support code for this compiler. Note that the Solaris operating system continues to be supported, but using GCC as the compiler. Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/a0f817ee-fb86-483a-8a14-b6f7f5991b6e%40eisentraut.org
2025-09-11Remove checks for no longer supported GCC versionsPeter Eisentraut
Since commit f5e0186f865 (Raise C requirement to C11), we effectively require at least GCC version 4.7, so checks for older versions can be removed. Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/a0f817ee-fb86-483a-8a14-b6f7f5991b6e%40eisentraut.org
2025-08-13Grab the low-hanging fruit from forcing sizeof(Datum) to 8.Tom Lane
Remove conditionally-compiled code for smaller Datum widths, and simplify comments that describe cases no longer of interest. I also fixed up a few more places that were not using DatumGetIntXX where they should, and made some cosmetic adjustments such as using sizeof(int64) not sizeof(Datum) in places where that fit better with the surrounding code. One thing I remembered while preparing this part is that SP-GiST stores pass-by-value prefix keys as Datums, so that the on-disk representation depends on sizeof(Datum). That's even more unfortunate than the existing commentary makes it out to be, because now there is a hazard that the change of sizeof(Datum) will break SP-GiST indexes on 32-bit machines. It appears that there are no existing SP-GiST opclasses that are actually affected; and if there are some that I didn't find, the number of installations that are using them on 32-bit machines is doubtless tiny. So I'm proceeding on the assumption that we can get away with this, but it's something to worry about. (gininsert.c looks like it has a similar problem, but it's okay because the "tuples" it's constructing are just transient data within the tuplesort step. That's pretty poorly documented though, so I added some comments.) Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/1749799.1752797397@sss.pgh.pa.us
2025-07-23Fix build breakage on Solaris-alikes with late-model GCC.Tom Lane
Solaris has never bothered to add "const" to the second argument of PAM conversation procs, as all other Unixen did decades ago. This resulted in an "incompatible pointer" compiler warning when building --with-pam, but had no more serious effect than that, so we never did anything about it. However, as of GCC 14 the case is an error not warning by default. To complicate matters, recent OpenIndiana (and maybe illumos in general?) *does* supply the "const" by default, so we can't just assume that platforms using our solaris template need help. What we can do, short of building a configure-time probe, is to make solaris.h #define _PAM_LEGACY_NONCONST, which causes OpenIndiana's pam_appl.h to revert to the traditional definition, and hopefully will have no effect anywhere else. Then we can use that same symbol to control whether we include "const" in the declaration of pam_passwd_conv_proc(). Bug: #18995 Reported-by: Andrew Watkins <awatkins1966@gmail.com> Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/18995-82058da9ab4337a7@postgresql.org Backpatch-through: 13
2025-07-02Remove implicit cast from 'void *'John Naylor
Commit e2809e3a101 added code to a header which assigns a pointer to void to a pointer to unsigned char. This causes build errors for extensions written in C++. Fix by adding an explicit cast. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CANWCAZaCq9AHBuhs%3DMx7Gg_0Af9oRU7iAqr0itJCtfmsWwVmnQ%40mail.gmail.com Backpatch-through: 18
2025-07-01Make sure IOV_MAX is defined.Tom Lane
We stopped defining IOV_MAX on non-Windows systems in 75357ab94, on the assumption that every non-Windows system defines it in <limits.h> as required by X/Open. GNU Hurd, however, doesn't follow that standard either. Put back the old logic to assume 16 if it's not defined. Author: Michael Banck <mbanck@gmx.net> Co-authored-by: Christoph Berg <myon@debian.org> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/6862e8d1.050a0220.194b8d.76fa@mx.google.com Discussion: https://postgr.es/m/6846e0c3.df0a0220.39ef9b.c60e@mx.google.com Backpatch-through: 16
2025-07-01Fix indentation in pg_numa codeTomas Vondra
Broken by commits 7fe2f67c7c9f, 81f287dc923f and bf1119d74a79. Backpatch to 18, same as the offending commits. Backpatch-through: 18
2025-07-01Silence valgrind about pg_numa_touch_mem_if_requiredTomas Vondra
When querying NUMA status of pages in shared memory, we need to touch the memory first to get valid results. This may trigger valgrind reports, because some of the memory (e.g. unpinned buffers) may be marked as noaccess. Solved by adding a valgrind suppresion. An alternative would be to adjust the access/noaccess status before touching the memory, but that seems far too invasive. It would require all those places to have detailed knowledge of what the shared memory stores. The pg_numa_touch_mem_if_required() macro is replaced with a function. Macros are invisible to suppressions, so it'd have to suppress reports for the caller - e.g. pg_get_shmem_allocations_numa(). So we'd suppress reports for the whole function, and that seems to heavy-handed. It might easily hide other valid issues. Reviewed-by: Christoph Berg <myon@debian.org> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/aEtDozLmtZddARdB@msg.df7cb.de Backpatch-through: 18
2025-04-27Match parameter in new function to earlier equivalentsJohn Naylor
Oversight in commit 3c6e8c123.
2025-04-12Harmonize function parameter names for Postgres 18.Peter Geoghegan
Make sure that function declarations use names that exactly match the corresponding names from function definitions in a few places. These inconsistencies were all introduced during Postgres 18 development. This commit was written with help from clang-tidy, by mechanically applying the same rules as similar clean-up commits (the earliest such commit was commit 035ce1fe).
2025-04-09Cleanup of pg_numa.cTomas Vondra
This moves/renames some of the functions defined in pg_numa.c: * pg_numa_get_pagesize() is renamed to pg_get_shmem_pagesize(), and moved to src/backend/storage/ipc/shmem.c. The new name better reflects that the page size is not related to NUMA, and it's specifically about the page size used for the main shared memory segment. * move pg_numa_available() to src/backend/storage/ipc/shmem.c, i.e. into the backend (which more appropriate for functions callable from SQL). While at it, improve the comment to explain what page size it returns. * remove unnecessary includes from src/port/pg_numa.c, adding unnecessary dependencies (src/port should be suitable for frontent). These were either leftovers or unnecessary thanks to the other changes in this commit. This eliminates unnecessary dependencies on backend symbols, which we don't want in src/port. Reported-by: Kirill Reshke <reshkekirill@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> https://postgr.es/m/CALdSSPi5fj0a7UG7Fmw2cUD1uWuckU_e8dJ+6x-bJEokcSXzqA@mail.gmail.com
2025-04-07Add support for basic NUMA awarenessTomas Vondra
Add basic NUMA awareness routines, using a minimal src/port/pg_numa.c portability wrapper and an optional build dependency, enabled by --with-libnuma configure option. For now this is Linux-only, other platforms may be supported later. A built-in SQL function pg_numa_available() allows checking NUMA support, i.e. that the server was built/linked with the NUMA library. The main function introduced is pg_numa_query_pages(), which allows determining the NUMA node for individual memory pages. Internally the function uses move_pages(2) syscall, as it allows batching, and is more efficient than get_mempolicy(2). Author: Jakub Wartak <jakub.wartak@enterprisedb.com> Co-authored-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Tomas Vondra <tomas@vondra.me> Discussion: https://postgr.es/m/CAKZiRmxh6KWo0aqRqvmcoaX2jUxZYb4kGp3N%3Dq1w%2BDiH-696Xw%40mail.gmail.com
2025-04-06Compute CRC32C using AVX-512 instructions where availableJohn Naylor
The previous implementation of CRC32C on x86 relied on the native CRC32 instruction from the SSE 4.2 extension, which operates on up to 8 bytes at a time. We can get a substantial speedup by using carryless multiplication on SIMD registers, processing 64 bytes per loop iteration. Shorter inputs fall back to ordinary CRC instructions. On Intel Tiger Lake hardware (2020), CRC is now 50% faster for inputs between 64 and 112 bytes, and 3x faster for 256 bytes. The VPCLMULQDQ instruction on 512-bit registers has been available on Intel hardware since 2019 and AMD since 2022. There is an older variant for 128-bit registers, but at least on Zen 2 it performs worse than normal CRC instructions for short inputs. We must now do a runtime check, even for builds that target SSE 4.2. This doesn't matter in practice for WAL (arguably the most critical case), because since commit e2809e3a1 the final computation with the 20-byte WAL header is inlined and unrolled when targeting that extension. Compared with two direct function calls, testing showed equal or slightly faster performance in performing an indirect function call on several dozen bytes followed by inlined instructions on constant input of 20 bytes. The MIT-licensed implementation was generated with the "generate" program from https://github.com/corsix/fast-crc32/ Based on: "Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction" V. Gopal, E. Ozturk, et al., 2009 Co-authored-by: Raghuveer Devulapalli <raghuveer.devulapalli@intel.com> Co-authored-by: Paul Amonson <paul.d.amonson@intel.com> Reviewed-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> (earlier version) Reviewed-by: Matthew Sterrett <matthewsterrett2@gmail.com> (earlier version) Tested-by: Raghuveer Devulapalli <raghuveer.devulapalli@intel.com> Tested-by: David Rowley <<dgrowleyml@gmail.com>> (earlier version) Discussion: https://postgr.es/m/BL1PR11MB530401FA7E9B1CA432CF9DC3DC192@BL1PR11MB5304.namprd11.prod.outlook.com Discussion: https://postgr.es/m/PH8PR11MB82869FF741DFA4E9A029FF13FBF72@PH8PR11MB8286.namprd11.prod.outlook.com
2025-04-01Use function attributes for SSE 4.2 even when targeting that extensionJohn Naylor
On Red Hat 9 systems (or similar), the packaged gcc targets x86-64-v2, but clang does not. This has caused build failures in the wake of commit e2809e3a1 when building --with-llvm. The most expedient fix is to use the same function attributes for the inlined function as we do for the global function. Reported-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> (plus members skimmer and bumblebee) Diagnosed-by: Tom Lane <tgl@sss.pgh.pa.us> Tested-by: Todd Cook <cookt@blackduck.com> Discussion: https://postgr.es/m/CANWCAZZSxs3a1YRKehkgk2OHKbrVn+xZ+AWW8Co2R_f70NqqmA@mail.gmail.com
2025-03-31Inline CRC computation for small fixed-length input on x86John Naylor
pg_crc32c.h now has a simplified copy of the loop in pg_crc32c_sse42.c suitable for inlining where possible. This may slightly reduce contention for the WAL insertion lock, but that hasn't been tested. The motivation for this change is avoid regressing for a future commit that will use a function pointer for non-constant input in all x86 builds. While it's technically possible to make a similar change for Arm and LoongArch, there are some questions about how inlining should work since those platforms prefer stricter alignment. There are also no immediate plans to add additional implementations for them. Reviewed-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Raghuveer Devulapalli <raghuveer.devulapalli@intel.com> Discussion: https://postgr.es/m/CANWCAZZEiTzhZcuwTiJ2=opiNpAUn1vuDRu1N02z61AthwRZLA@mail.gmail.com Discussion: https://postgr.es/m/CANWCAZYRhLHArpyfV4uRK-Rw9N5oV5HMkkKtBehcuTjNOMwCZg@mail.gmail.com
2025-03-28Optimize popcount functions with ARM SVE intrinsics.Nathan Bossart
This commit introduces SVE implementations of pg_popcount{32,64}. Unlike the Neon versions, we need an additional configure-time check to determine if the compiler supports SVE intrinsics, and we need a runtime check to determine if the current CPU supports SVE instructions. Our testing showed that the SVE implementations are much faster for larger inputs and are comparable to the status quo for smaller inputs. Author: "Devanga.Susmitha@fujitsu.com" <Devanga.Susmitha@fujitsu.com> Co-authored-by: "Chiranmoy.Bhattacharya@fujitsu.com" <Chiranmoy.Bhattacharya@fujitsu.com> Co-authored-by: "Malladi, Rama" <ramamalladi@hotmail.com> Reviewed-by: John Naylor <johncnaylorls@gmail.com> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Discussion: https://postgr.es/m/010101936e4aaa70-b474ab9e-b9ce-474d-a3ba-a3dc223d295c-000000%40us-west-2.amazonses.com Discussion: https://postgr.es/m/OSZPR01MB84990A9A02A3515C6E85A65B8B2A2%40OSZPR01MB8499.jpnprd01.prod.outlook.com
2025-03-28Revert "Tidy up locale thread safety in ECPG library."Peter Eisentraut
This reverts commit 8e993bff5326b00ced137c837fce7cd1e0ecae14. It causes various build failures on the buildfarm, to be investigated. Discussion: https://postgr.es/m/CWZBBRR6YA8D.8EHMDRGLCKCD%40neon.tech
2025-03-28Optimize popcount functions with ARM Neon intrinsics.Nathan Bossart
This commit introduces Neon implementations of pg_popcount{32,64}, pg_popcount(), and pg_popcount_masked(). As in simd.h, we assume that all available AArch64 hardware supports Neon, so we don't need any new configure-time or runtime checks. Some compilers already emit Neon instructions for these functions, but our hand-rolled implementations for pg_popcount() and pg_popcount_masked() performed better in testing, likely due to better instruction-level parallelism. Author: "Chiranmoy.Bhattacharya@fujitsu.com" <Chiranmoy.Bhattacharya@fujitsu.com> Reviewed-by: John Naylor <johncnaylorls@gmail.com> Discussion: https://postgr.es/m/010101936e4aaa70-b474ab9e-b9ce-474d-a3ba-a3dc223d295c-000000%40us-west-2.amazonses.com
2025-03-28Rename TRY_POPCNT_FAST to TRY_POPCNT_X86_64.Nathan Bossart
This macro protects x86_64-specific code, and a subsequent commit will introduce AArch64-specific versions of that code. To prevent confusion, let's rename it to clearly indicate that it's for x86_64. We should likely move this code to its own file (perhaps merging it with the AVX-512 popcount code), but that is left as a future exercise. Reviewed-by: "Chiranmoy.Bhattacharya@fujitsu.com" <Chiranmoy.Bhattacharya@fujitsu.com> Reviewed-by: John Naylor <johncnaylorls@gmail.com> Discussion: https://postgr.es/m/010101936e4aaa70-b474ab9e-b9ce-474d-a3ba-a3dc223d295c-000000%40us-west-2.amazonses.com
2025-03-28Tidy up locale thread safety in ECPG library.Peter Eisentraut
Remove setlocale() and _configthreadlocal() as fallback strategy on systems that don't have uselocale(), where ECPG tries to control LC_NUMERIC formatting on input and output of floating point numbers. It was probably broken on some systems (NetBSD), and the code was also quite messy and complicated, with obsolete configure tests (Windows). It was also arguably broken, or at least had unstated environmental requirements, if pgtypeslib code was called directly. Instead, introduce PG_C_LOCALE to refer to the "C" locale as a locale_t value. It maps to the special constant LC_C_LOCALE when defined by libc (macOS, NetBSD), or otherwise uses a process-lifetime locale_t that is allocated on first use, just as ECPG previously did itself. The new replacement might be more widely useful. Then change the float parsing and printing code to pass that to _l() functions where appropriate. Unfortunately the portability of those functions is a bit complicated. First, many obvious and useful _l() functions are missing from POSIX, though most standard libraries define some of them anyway. Second, although the thread-safe save/restore technique can be used to replace the missing ones, Windows and NetBSD refused to implement standard uselocale(). They might have a point: "wide scope" uselocale() is hard to combine with other code and error-prone, especially in library code. Luckily they have the _l() functions we want so far anyway. So we have to be prepared for both ways of doing things: 1. In ECPG, use strtod_l() for parsing, and supply a port.h replacement using uselocale() over a limited scope if missing. 2. Inside our own snprintf.c, use three different approaches to format floats. For frontend code, call libc's snprintf_l(), or wrap libc's snprintf() in uselocale() if it's missing. For backend code, snprintf.c can keep assuming that the global locale's LC_NUMERIC is "C" and call libc's snprintf() without change, for now. (It might eventually be possible to call our in-tree Ryū routines to display floats in snprintf.c, given the C-locale-always remit of our in-tree snprintf(), but this patch doesn't risk changing anything that complicated.) Author: Thomas Munro <thomas.munro@gmail.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: Tristan Partin <tristan@partin.io> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/CWZBBRR6YA8D.8EHMDRGLCKCD%40neon.tech
2025-03-19Increase io_combine_limit range to 1MB.Thomas Munro
The default of 128kB is unchanged, but the upper limit is changed from 32 blocks to 128 blocks, unless the operating system's IOV_MAX is too low. Some other RDBMSes seem to cap their multi-block buffer pool I/O around this number, and it seems useful to allow experimentation. The concrete change is to our definition of PG_IOV_MAX, which provides the maximum for io_combine_limit and io_max_combine_limit. It also affects a couple of other places that work with arrays of struct iovec or smaller objects on the stack, so we still don't want to use the system IOV_MAX directly without a clamp: it is not under our control and likely to be 1024. 128 seems acceptable for our current usage. For Windows, we can't use real scatter/gather yet, so we continue to define our own IOV_MAX value of 16 and emulate preadv()/pwritev() with loops. Someone would need to research the trade-offs of raising that number. NB if trying to see this working: you might temporarily need to hack BAS_BULKREAD to be bigger, since otherwise the obvious way of "a very big SELECT" is limited by that for now. Suggested-by: Tomas Vondra <tomas@vondra.me> Discussion: https://postgr.es/m/CA%2BhUKG%2B2T9p-%2BzM6Eeou-RAJjTML6eit1qn26f9twznX59qtCA%40mail.gmail.com
2025-01-22Fix comment about AVX-512 popcount support.Nathan Bossart
Since commit f78667bd91, we've used __attribute__((target(...))) instead of extra compiler flags for AVX-512 support, but this comment still says that we put the code in a separate file because it might require extra compiler flags. Let's just remove that part of the comment.
2025-01-16Remove redefinitions of SIG_* macros in win32_port.h.Nathan Bossart
It is not clear why these were originally added. One hypothesis is that an ancient version of MinGW didn't define them. In any case, they appear to now be superfluous, so let's remove them. If nothing else, the buildfarm might offer us clues to their origins. Reviewed-by: Thomas Munro Discussion: https://postgr.es/m/Z4chOKfnthRH71mw%40nathan
2025-01-15IWYU widely useful pragmasPeter Eisentraut
Add various widely useful "IWYU pragma" annotations, such as - Common header files such as c.h, postgres.h should be "always_keep". - System headers included in c.h, postgres.h etc. should be considered "export". - Some portability headers such as getopt_long.h should be "always_keep", so they are not considered superfluous on some platforms. - Certain system headers included from portability headers should be considered "export" because the purpose of the portability header is to wrap them. - Superfluous includes marked as "for backward compatibility" get a formal IWYU annotation. - Generated header included in utils/syscache.h is marked exported. This is a very commonly used include and this avoids lots of complaints. Discussion: https://www.postgresql.org/message-id/flat/9395d484-eff4-47c2-b276-8e228526c8ae@eisentraut.org
2025-01-09Provide 64-bit ftruncate() and lseek() on Windows.Thomas Munro
Change our ftruncate() macro to use the 64-bit variant of chsize(), and add a new macro to redirect lseek() to _lseeki64(). Back-patch to all supported releases, in preparation for a bug fix. Tested-by: Davinder Singh <davinder.singh@enterprisedb.com> Discussion: https://postgr.es/m/CAKZiRmyM4YnokK6Oenw5JKwAQ3rhP0YTz2T-tiw5dAQjGRXE3Q%40mail.gmail.com
2025-01-01Update copyright for 2025Bruce Momjian
Backpatch-through: 13
2024-12-04Use <stdint.h> and <inttypes.h> for c.h integers.Thomas Munro
Redefine our exact width types with standard C99 types and macros, including int64_t, INT64_MAX, INT64_C(), PRId64 etc. We were already using <stdint.h> types in a few places. One complication is that Windows' <inttypes.h> uses format strings like "%I64d", "%I32", "%I" for PRI*64, PRI*32, PTR*PTR, instead of mapping to other standardized format strings like "%lld" etc as seen on other known systems. Teach our snprintf.c to understand them. This removes a lot of configure clutter, and should also allow 64-bit numbers and other standard types to be used in localized messages without casting. Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/ME3P282MB3166F9D1F71F787929C0C7E7B6312%40ME3P282MB3166.AUSP282.PROD.OUTLOOK.COM
2024-11-08Fix sign-compare warnings in pg_iovec.h.Nathan Bossart
The code in question (pg_preadv() and pg_pwritev()) has been around for a while, but commit 15c9ac3629 moved it to a header file. If third-party code that includes this header file is built with -Wsign-compare on a system without preadv() or pwritev(), warnings ensue. This commit fixes said warnings by casting the result of pg_pread()/pg_pwrite() to size_t, which should be safe because we will have already checked for a negative value. Author: Wolfgang Walther Discussion: https://postgr.es/m/16989737-1aa8-48fd-8dfe-b7ada06509ab%40technowledgy.de Backpatch-through: 17
2024-08-28Add prefetching support on macOSPeter Eisentraut
macOS doesn't have posix_fadvise(), but fcntl() with the F_RDADVISE command does the same thing. Some related documentation has been generalized to not mention posix_advise() specifically anymore. Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/0827edec-1317-4917-a186-035eb1e3241d%40eisentraut.org
2024-08-23thread-safety: gmtime_r(), localtime_r()Peter Eisentraut
Use gmtime_r() and localtime_r() instead of gmtime() and localtime(), for thread-safety. There are a few affected calls in libpq and ecpg's libpgtypes, which are probably effectively bugs, because those libraries already claim to be thread-safe. There is one affected call in the backend. Most of the backend otherwise uses the custom functions pg_gmtime() and pg_localtime(), which are implemented differently. While we're here, change the call in the backend to gmtime*() instead of localtime*(), since for that use time zone behavior is irrelevant, and this side-steps any questions about when time zones are initialized by localtime_r() vs localtime(). Portability: gmtime_r() and localtime_r() are in POSIX but are not available on Windows. Windows has functions gmtime_s() and localtime_s() that can fulfill the same purpose, so we add some small wrappers around them. (Note that these *_s() functions are also different from the *_s() functions in the bounds-checking extension of C11. We are not using those here.) On MinGW, you can get the POSIX-style *_r() functions by defining _POSIX_C_SOURCE appropriately before including <time.h>. This leads to a conflict at least in plpython because apparently _POSIX_C_SOURCE gets defined in some header there, and then our replacement definitions conflict with the system definitions. To avoid that sort of thing, we now always define _POSIX_C_SOURCE on MinGW and use the POSIX-style functions here. Reviewed-by: Stepan Neretin <sncfmgg@gmail.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/eba1dc75-298e-4c46-8869-48ba8aad7d70@eisentraut.org
2024-07-30Require memory barrier support.Thomas Munro
Previously we had a fallback implementation that made a harmless system call, based on the assumption that system calls must contain a memory barrier. That shouldn't be reached on any current system, and it seems highly likely that we can easily find out how to request explicit memory barriers, if we've already had to find out how to do atomics on a hypothetical new system. Removed comments and a function name referred to a spinlock used for fallback memory barriers, but that changed in 1b468a13, which left some misleading words behind in a few places. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Suggested-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/721bf39a-ed8a-44b0-8b8e-be3bd81db748%40technowledgy.de Discussion: https://postgr.es/m/3351991.1697728588%40sss.pgh.pa.us
2024-07-30Require compiler barrier support.Thomas Munro
Previously we had a fallback implementation of pg_compiler_barrier() that called an empty function across a translation unit boundary so the compiler couldn't see what it did. That shouldn't be needed on any current systems, and might not even work with a link time optimizer. Since we now require compiler-specific knowledge of how to implement atomics, we should also know how to implement compiler barriers on a hypothetical new system. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Suggested-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/721bf39a-ed8a-44b0-8b8e-be3bd81db748%40technowledgy.de Discussion: https://postgr.es/m/3351991.1697728588%40sss.pgh.pa.us
2024-07-30Remove --disable-atomics, require 32 bit atomics.Thomas Munro
Modern versions of all relevant architectures and tool chains have atomics support. Since edadeb07, there is no remaining reason to carry code that simulates atomic flags and uint32 imperfectly with spinlocks. 64 bit atomics are still emulated with spinlocks, if needed, for now. Any modern compiler capable of implementing C11 <stdatomic.h> must have the underlying operations we need, though we don't require C11 yet. We detect certain compilers and architectures, so hypothetical new systems might need adjustments here. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (concept, not the patch) Reviewed-by: Andres Freund <andres@anarazel.de> (concept, not the patch) Discussion: https://postgr.es/m/3351991.1697728588%40sss.pgh.pa.us
2024-07-30Remove --disable-spinlocks.Thomas Munro
A later change will require atomic support, so it wouldn't make sense for a hypothetical new system not to be able to implement spinlocks. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (concept, not the patch) Reviewed-by: Andres Freund <andres@anarazel.de> (concept, not the patch) Discussion: https://postgr.es/m/3351991.1697728588%40sss.pgh.pa.us
2024-07-23Windows replacement for strtok_r()Peter Eisentraut
They spell it "strtok_s" there. There are currently no uses, but some will be added soon. Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: David Steele <david@pgmasters.net> Discussion: https://www.postgresql.org/message-id/flat/79692bf9-17d3-41e6-b9c9-fc8c3944222a@eisentraut.org
2024-07-04Remove bogus assertion in pg_atomic_monotonic_advance_u64Alvaro Herrera
This code wanted to ensure that the 'exchange' variable passed to pg_atomic_compare_exchange_u64 has correct alignment, but apparently platforms don't actually require anything that doesn't come naturally. While messing with pg_atomic_monotonic_advance_u64: instead of using Max() to determine the value to return, just use pg_atomic_compare_exchange_u64()'s return value to decide; also, use pg_atomic_compare_exchange_u64 instead of the _impl version; also remove the unnecessary underscore at the end of variable name "target". Backpatch to 17, where this code was introduced by commit bf3ff7bf83bc. Reported-by: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/36796438-a718-cf9b-2071-b2c1b947c1b5@gmail.com
2024-07-01Remove support for HPPA (a/k/a PA-RISC) architecture.Tom Lane
This old CPU architecture hasn't been produced in decades, and whatever instances might still survive are surely too underpowered for anyone to consider running Postgres on in production. We'd nonetheless continued to carry code support for it (largely at my insistence), because its unique implementation of spinlocks seemed like a good edge case for our spinlock infrastructure. However, our last buildfarm animal of this type was retired last year, and it seems quite unlikely that another will emerge. Without the ability to run tests, the argument that this is useful test code fails to hold water. Furthermore, carrying code support for an untestable architecture has costs not to be ignored. So, remove HPPA-specific code, in the same vein as commits 718aa43a4 and 92d70b77e. Discussion: https://postgr.es/m/3351991.1697728588@sss.pgh.pa.us
2024-04-07Add XLogCtl->logInsertResultAlvaro Herrera
This tracks the position of WAL that's been fully copied into WAL buffers by all processes emitting WAL. (For some reason we call that "WAL insertion"). This is updated using atomic monotonic advance during WaitXLogInsertionsToFinish, which is not when the insertions actually occur, but it's the only place where we know where have all the insertions have completed. This value is useful in WALReadFromBuffers, which can verify that callers don't try to read past what has been inserted. (However, more infrastructure is needed in order to actually use WAL after the flush point, since it could be lost.) The value is also useful in WaitXLogInsertionsToFinish() itself, since we can now exit quickly when all WAL has been already inserted, without even having to take any locks.
2024-04-06Optimize visibilitymap_count() with AVX-512 instructions.Nathan Bossart
Commit 792752af4e added infrastructure for using AVX-512 intrinsic functions, and this commit uses that infrastructure to optimize visibilitymap_count(). Specificially, a new pg_popcount_masked() function is introduced that applies a bitmask to every byte in the buffer prior to calculating the population count, which is used to filter out the all-visible or all-frozen bits as needed. Platforms without AVX-512 support should also see a nice speedup due to the reduced number of calls to a function pointer. Co-authored-by: Ants Aasma Discussion: https://postgr.es/m/BL1PR11MB5304097DF7EA81D04C33F3D1DCA6A%40BL1PR11MB5304.namprd11.prod.outlook.com
2024-04-06Optimize pg_popcount() with AVX-512 instructions.Nathan Bossart
Presently, pg_popcount() processes data in 32-bit or 64-bit chunks when possible. Newer hardware that supports AVX-512 instructions can use 512-bit chunks, which provides a nice speedup, especially for larger buffers. This commit introduces the infrastructure required to detect compiler and CPU support for the required AVX-512 intrinsic functions, and it adds a new pg_popcount() implementation that uses these functions. If CPU support for this optimized implementation is detected at runtime, a function pointer is updated so that it is used by subsequent calls to pg_popcount(). Most of the existing in-tree calls to pg_popcount() should benefit from these instructions, and calls with smaller buffers should at least not regress compared to v16. The new infrastructure introduced by this commit can also be used to optimize visibilitymap_count(), but that is left for a follow-up commit. Co-authored-by: Paul Amonson, Ants Aasma Reviewed-by: Matthias van de Meent, Tom Lane, Noah Misch, Akash Shankaran, Alvaro Herrera, Andres Freund, David Rowley Discussion: https://postgr.es/m/BL1PR11MB5304097DF7EA81D04C33F3D1DCA6A%40BL1PR11MB5304.namprd11.prod.outlook.com
2024-04-03Inline pg_popcount() for small buffers.Nathan Bossart
If there aren't many bytes to process, the function call overhead of the optimized implementation isn't worth taking, so instead we inline a loop that consults pg_number_of_ones in that case. If there are many bytes to process, we accept the function call overhead because the optimized versions are likely to be faster. The threshold at which we use the optimized implementation is set to the smallest amount of data required to use special popcount instructions. Reviewed-by: Alvaro Herrera, Tom Lane Discussion: https://postgr.es/m/20240402155301.GA2750455%40nathanxps13
2024-03-27Improve style of pg_lfind32().Nathan Bossart
This commit simplifies pg_lfind32() a bit by moving the standard one-by-one linear search code to an inline helper function. Reviewed-by: Tom Lane Discussion: https://postgr.es/m/20240327013616.GA3940109%40nathanxps13
2024-03-26Fix compiler warning for pg_lfind32().Nathan Bossart
The newly-introduced "one_by_one" label produces -Wunused-label warnings when building without SIMD support. To fix, move the label into the SIMD section of this function. Oversight in commit 7644a7340c. Reported-by: Tom Lane Discussion: https://postgr.es/m/3189995.1711495704%40sss.pgh.pa.us
2024-03-26Micro-optimize pg_lfind32().Nathan Bossart
This commit improves the performance of pg_lfind32() in many cases by modifying it to process the remaining "tail" of elements with SIMD instructions instead of processing them one-by-one. Since the SIMD code processes a large block of elements, this means that we will process a subset of elements more than once, but that won't affect the correctness of the result, and testing has shown that this helps more cases than it regresses. With this change, the standard one-by-one linear search code is only used for small arrays and for platforms without SIMD support. Suggested-by: John Naylor Reviewed-by: John Naylor Discussion: https://postgr.es/m/20231129171526.GA857928%40nathanxps13
2024-03-19Inline pg_popcount{32,64} into pg_popcount().Nathan Bossart
On some systems, calls to pg_popcount{32,64} are indirected through a function pointer. This commit converts pg_popcount() to a function pointer on those systems so that we can inline the appropriate pg_popcount{32,64} implementations into each of the pg_popcount() implementations. Since pg_popcount() may call pg_popcount{32,64} several times, this can significantly improve its performance. Suggested-by: David Rowley Reviewed-by: David Rowley Discussion: https://postgr.es/m/CAApHDvrb7MJRB6JuKLDEY2x_LKdFHwVbogpjZBCX547i5%2BrXOQ%40mail.gmail.com
2024-03-08Fix link error for test_radixtree module on WindowsJohn Naylor
Add PGDLLIMPORT to pg_popcount32/64. In passing, fix a typo. Diagnosis by Masahiko Sawada, patch by David Rowley Per buildfarm members drongo and fairywren Discussion: https://postgr.es/m/CAD21AoAMm1mQd%3Dw4PrfrKK%3DOMP8j8%3D7ntJRPF8%2B%3D10iUuvwiCA%40mail.gmail.com Discussion: https://postgr.es/m/CAApHDvov7724UrD1Ug0D1eV%2B9Pd_x5VEQmw-6HVG9w1WdCxXPA%40mail.gmail.com
2024-03-06Fix signedness error in 9f225e992 for gccJohn Naylor
The first argument of vshrq_n_s8 needs to be a signed vector type, but it was passed unsigned. Clang is more lax with conversion, but gcc needs a cast. Fix by me, tested by Masahiko Sawada Per buildfarm members splitfin, batta, widowbird, snakefly, parula, massasauga Discussion: https://postgr.es/m/20240306074106.mg6w4koohdlworbs%40alap3.anarazel.de
2024-03-06Introduce helper SIMD functions for small byte arraysJohn Naylor
vector8_min - helper for emulating ">=" semantics vector8_highbit_mask - used to turn the result of a vector comparison into a bitmask Masahiko Sawada Reviewed by Nathan Bossart, with additional adjustments by me Discussion: https://postgr.es/m/CAFBsxsHbBm_M22gLBO%2BAZT4mfMq3L_oX3wdKZxjeNnT7fHsYMQ%40mail.gmail.com