summaryrefslogtreecommitdiff
path: root/src/backend/tsearch
AgeCommit message (Collapse)Author
5 daysUpdate copyright for 2026Bruce Momjian
Backpatch-through: 14
10 daysChange some Datum to void * for opaque pass-through pointerPeter Eisentraut
Here, Datum was used to pass around an opaque pointer between a group of functions. But one might as well use void * for that; the use of Datum doesn't achieve anything here and is just distracting. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/1c5d23cb-288b-4154-b1cd-191fe2301707%40eisentraut.org
2025-12-10Use palloc_object() and palloc_array() in backend codeMichael Paquier
The idea is to encourage more the use of these new routines across the tree, as these offer stronger type safety guarantees than palloc(). This batch of changes includes most of the trivial changes suggested by the author for src/backend/. A total of 334 files are updated here. Among these files, 48 of them have their build change slightly; these are caused by line number changes as the new allocation formulas are simpler, shaving around 100 lines of code in total. Similar work has been done in 0c3c5c3b06a3 and 31d3847a37be. Author: David Geier <geidav.pg@gmail.com> Discussion: https://postgr.es/m/ad0748d4-3080-436e-b0bc-ac8f86a3466a@gmail.com
2025-12-05Fix some cases of indirectly casting away const.Tom Lane
Newest versions of gcc are able to detect cases where code implicitly casts away const by assigning the result of strchr() or a similar function applied to a "const char *" value to a target variable that's just "char *". This of course creates a hazard of not getting a compiler warning about scribbling on a string one was not supposed to, so fixing up such cases is good. This patch fixes a dozen or so places where we were doing that. Most are trivial additions of "const" to the target variable, since no actually-hazardous change was occurring. There is one place in ecpg.trailer where we were indeed violating the intention of not modifying a string passed in as "const char *". I believe that's harmless not a live bug, but let's fix it by copying the string before modifying it. There is a remaining trouble spot in ecpg/preproc/variable.c, which requires more complex surgery. I've left that out of this commit because I want to study that code a bit more first. We probably will want to back-patch this once compilers that detect this pattern get into wider circulation, but for now I'm just going to apply it to master to see what the buildfarm says. Thanks to Bertrand Drouvot for finding a couple more spots than I had. Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/1324889.1764886170@sss.pgh.pa.us
2025-11-21Remove useless casts to (void *)Peter Eisentraut
Their presence causes (small) risks of hiding actual type mismatches or silently discarding qualifiers. Some have been missed in 7f798aca1d5 and some are new ones along the same lines. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/aR8Yv%2BuATLKbJCgI%40ip-10-97-1-34.eu-west-3.compute.internal
2025-10-31Mark function arguments of type "Datum *" as "const Datum *" where possiblePeter Eisentraut
Several functions in the codebase accept "Datum *" parameters but do not modify the pointed-to data. These have been updated to take "const Datum *" instead, improving type safety and making the interfaces clearer about their intent. This change helps the compiler catch accidental modifications and better documents immutability of arguments. Most of "Datum *" parameters have a pairing "bool *isnull" parameter, they are constified as well. No functional behavior is changed by this patch. Author: Chao Li <lic@highgo.com> Discussion: https://www.postgresql.org/message-id/flat/CAEoWx2msfT0knvzUa72ZBwu9LR_RLY4on85w2a9YpE-o2By5HQ@mail.gmail.com
2025-10-21tsearch: use database default collation for parsing.Jeff Davis
Previously, tsearch used the database's CTYPE setting, which only matches the database default collation if the locale provider is libc. Note that tsearch types (tsvector and tsquery) are not collatable types. The locale affects parsing the original text, which is a lossy process, so a COLLATE clause on the already-parsed value would not make sense. Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/0151ad01239e2cc7b3139644358cf8f7b9622ff7.camel@j-davis.com
2025-09-20Track the maximum possible frequency of non-MCE array elements.Tom Lane
The lossy-counting algorithm that ANALYZE uses to identify most-common array elements has a notion of cutoff frequency: elements with frequency greater than that are guaranteed to be collected, elements with smaller frequencies are not. In cases where we find fewer MCEs than the stats target would permit us to store, the cutoff frequency provides valuable additional information, to wit that there are no non-MCEs with frequency greater than that. What the selectivity estimation functions actually use the "minfreq" entry for is as a ceiling on the possible frequency of non-MCEs, so using the cutoff rather than the lowest stored MCE frequency provides a tighter bound and more accurate estimates. Therefore, instead of redundantly storing the minimum observed MCE frequency, store the cutoff frequency when there are fewer tracked values than we want. (When there are more, then of course we cannot assert that no non-stored elements are above the cutoff frequency, since we're throwing away some that are; so we still use the minimum stored frequency in that case.) Notably, this works even when none of the values are common enough to be called MCEs. In such cases we previously stored nothing in the STATISTIC_KIND_MCELEM pg_statistic slot, which resulted in the selectivity functions falling back to default estimates. So in that case we want to construct a STATISTIC_KIND_MCELEM entry that contains no "values" but does have "numbers", to wit the three extra numbers that the MCELEM entry type defines. A small obstacle is that update_attstats() has traditionally stored a null, not an empty array, when passed zero "values" for a slot. That gives rise to an MCELEM entry that get_attstatsslot() will spit up on. The least risky solution seems to be to adjust update_attstats() so that it will emit a non-null (but possibly empty) array when the passed stavalues array pointer isn't NULL, rather than conditioning that on numvalues > 0. In other existing cases I don't believe that that changes anything. For consistency, handle the stanumbers array the same way. In passing, improve the comments in routines that use STATISTIC_KIND_MCELEM data. Particularly, explain why we use minfreq / 2 not minfreq as the estimate for non-MCE values. Thanks to Matt Long for the suggestion that we could apply this idea even when there are more than zero MCEs. Reported-by: Mark Frost <FROSTMAR@uk.ibm.com> Reported-by: Matt Long <matt@mattlong.org> Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/PH3PPF1C905D6E6F24A5C1A1A1D8345B593E16FA@PH3PPF1C905D6E6.namprd15.prod.outlook.com
2025-08-08Remove useless/superfluous Datum conversionsPeter Eisentraut
Remove useless DatumGetFoo() and FooGetDatum() calls. These are places where no conversion from or to Datum was actually happening. We think these extra calls covered here were harmless. Some actual bugs that were discovered during this process have been committed separately (80c758a2e1d, 2242b26ce47). Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/8246d7ff-f4b7-4363-913e-827dadfeb145%40eisentraut.org
2025-08-05Fix varatt versus Datum type confusionsPeter Eisentraut
Macros like VARDATA() and VARSIZE() should be thought of as taking values of type pointer to struct varlena or some other related struct. The way they are implemented, you can pass anything to it and it will cast it right. But this is in principle incorrect. To fix, add the required DatumGetPointer() calls. Or in a couple of cases, remove superfluous PointerGetDatum() calls. It is planned in a subsequent patch to change macros like VARDATA() and VARSIZE() to inline functions, which will enforce stricter typing. This is in preparation for that. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/928ea48f-77c6-417b-897c-621ef16685a6%40eisentraut.org
2025-08-02Suppress complaints about leaks in TS dictionary loading.Tom Lane
Like the situation with function cache loading, text search dictionary loading functions tend to leak some cruft into the dictionary's long-lived cache context. To judge by the examples in the core regression tests, not very many bytes are at stake. Moreover, I don't see a way to prevent such leaks without changing the API for TS template initialization functions: right now they do not have to worry about making sure that their results are long-lived. Hence, I think we should install a suppression rule rather than trying to fix this completely. However, I did grab some low-hanging fruit: several places were leaking the result of get_tsearch_config_filename. This seems worth doing mostly because they are inconsistent with other dictionaries that were freeing it already. Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us
2025-07-09Change wchar2char() and char2wchar() to accept a locale_t.Jeff Davis
These are libc-specific functions, so should require a locale_t rather than a pg_locale_t (which could use another provider). Discussion: https://postgr.es/m/a8666c391dfcabe79868d95f7160eac533ace718.camel%40j-davis.com
2025-03-08Clear errno before calling strtol() in spell.c.Tom Lane
Per POSIX, a caller of strtol() that wishes to check for errors must set errno to 0 beforehand. Several places in spell.c neglected that, so that they risked delivering a false overflow error in case errno had been ERANGE already. Given the lack of field reports, this case may be unreachable at present --- but it's surely trouble waiting to happen, so fix it. Author: Jacob Brazeal <jacob.brazeal@gmail.com> Discussion: https://postgr.es/m/CA+COZaBhsq6EromFm+knMJfzK6nTpG23zJ+K2=nfUQQXcj_xcQ@mail.gmail.com Backpatch-through: 13
2025-02-11Add is_analyze parameter to vacuum_delay_point().Nathan Bossart
This function is used in both vacuum and analyze code paths, and a follow-up commit will require distinguishing between the two. This commit forces callers to specify whether they are in a vacuum or analyze path, but it does not use that information for anything yet. Author: Nathan Bossart <nathandbossart@gmail.com> Co-authored-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/ZmaXmWDL829fzAVX%40ip-10-97-1-34.eu-west-3.compute.internal
2025-01-21Reword recent error messages: "should" -> "must"Álvaro Herrera
Most were introduced in the 17 timeframe. The ones in wparser_def.c are very old. I also changed "JSON path expression for column \"%s\" should return single item without wrapper" to "JSON path expression for column \"%s\" must return single item when no wrapper is requested" to avoid ambiguity. Backpatch to 17. Crickets: https://postgr.es/m/202501131819.26ors7oouafu@alvherre.pgsql
2025-01-01Update copyright for 2025Bruce Momjian
Backpatch-through: 13
2024-12-17Remove ts_locale.c's lowerstr()Peter Eisentraut
lowerstr() and lowerstr_with_len() in ts_locale.c do the same thing as str_tolower() that the rest of the system uses, except that the former don't use the common locale provider framework but instead use the global libc locale settings. This patch replaces uses of lowerstr*() with str_tolower(..., DEFAULT_COLLATION_OID). For instances that use a libc locale globally, this will result in exactly the same behavior. For instances that use other locale providers, you now get consistent behavior and are no longer dependent on the libc locale settings (for this case; there are others). Most uses of these functions are for processing dictionary and configuration files. In those cases, using the default collation seems appropriate. At least we don't have a more specific collation available. But the code in contrib/pg_trgm should really depend on the collation of the columns being processed. This is not done here, this can be done in a separate patch. (You can probably construct some edge cases where this change would create some locale-related upgrade incompatibility, for example if before you used a combination of ICU and a differently-behaving libc locale. We can document this in the release notes, but I don't think there is anything more we can do about this.) Reviewed-by: Jeff Davis <pgsql@j-davis.com> Discussion: https://www.postgresql.org/message-id/flat/653f3b84-fc87-45a7-9a0c-bfb4fcab3e7d%40eisentraut.org
2024-12-17Remove ts_locale.c's t_isdigit(), t_isspace(), t_isprint()Peter Eisentraut
These do the same thing as the standard isdigit(), isspace(), and isprint() but with multibyte and encoding support. But all the callers are only interested in analyzing single-byte ASCII characters. So this extra layer is overkill and we can replace the uses with the standard functions. All the t_is*() functions in ts_locale.c are under scrutiny because they don't use the common locale provider framework but instead use the global libc locale settings. For the functions being touched by this patch, we don't need all that anyway, as mentioned above, so the simplest solution is to just remove them. The few remaining t_is*() functions will need a different treatment in a separate patch. pg_trgm has some compile-time options with macros such as KEEPONLYALNUM. These are not documented, and the non-default variant is not supported by any test cases. As part of this undertaking, I'm removing the non-default variant, as it is in the way of cleanup. So in this case, the not-KEEPONLYALNUM code path is gone. Reviewed-by: Jeff Davis <pgsql@j-davis.com> Discussion: https://www.postgresql.org/message-id/flat/653f3b84-fc87-45a7-9a0c-bfb4fcab3e7d%40eisentraut.org
2024-11-28Remove useless casts to (void *)Peter Eisentraut
Many of them just seem to have been copied around for no real reason. Their presence causes (small) risks of hiding actual type mismatches or silently discarding qualifiers Discussion: https://www.postgresql.org/message-id/flat/461ea37c-8b58-43b4-9736-52884e862820@eisentraut.org
2024-08-06Constify fields and parameters in spell.cHeikki Linnakangas
I started by marking VoidString as const, and fixing the fallout by marking more fields and function arguments as const. It proliferated quite a lot, but all within spell.c and spell.h. A more narrow patch to get rid of the static VoidString buffer would be to replace it with '#define VoidString ""', as C99 allows assigning "" to a non-const pointer, even though you're not allowed to modify it. But it seems like good hygiene to mark all these as const. In the structs, the pointers can point to the constant VoidString, or a buffer allocated with palloc(), or with compact_palloc(), so you should not modify them. Reviewed-by: Andres Freund Discussion: https://www.postgresql.org/message-id/54c29fb0-edf2-48ea-9814-44e918bbd6e8@iki.fi
2024-03-04Remove unused #include's from backend .c filesPeter Eisentraut
as determined by include-what-you-use (IWYU) While IWYU also suggests to *add* a bunch of #include's (which is its main purpose), this patch does not do that. In some cases, a more specific #include replaces another less specific one. Some manual adjustments of the automatic result: - IWYU currently doesn't know about includes that provide global variable declarations (like -Wmissing-variable-declarations), so those includes are being kept manually. - All includes for port(ability) headers are being kept for now, to play it safe. - No changes of catalog/pg_foo.h to catalog/pg_foo_d.h, to keep the patch from exploding in size. Note that this patch touches just *.c files, so nothing declared in header files changes in hidden ways. As a small example, in src/backend/access/transam/rmgr.c, some IWYU pragma annotations are added to handle a special case there. Discussion: https://www.postgresql.org/message-id/flat/af837490-6b2f-46df-ba05-37ea6a6653fc%40eisentraut.org
2024-01-03Update copyright for 2024Bruce Momjian
Reported-by: Michael Paquier Discussion: https://postgr.es/m/ZZKTDPxBBMt3C0J9@paquier.xyz Backpatch-through: 12
2023-09-25Limit to_tsvector_byid's initial array allocation to something sane.Tom Lane
The initial estimate of the number of distinct ParsedWords is just that: an estimate. Don't let it exceed what palloc is willing to allocate. If in fact we need more entries, we'll eventually fail trying to enlarge the array. But if we don't, this allows success on inputs that currently draw "invalid memory alloc request size". Per bug #18080 from Uwe Binder. Back-patch to all supported branches. Discussion: https://postgr.es/m/18080-d5c5e58fef8c99b7@postgresql.org
2023-07-03Take pg_attribute out of VacAttrStatsPeter Eisentraut
The VacAttrStats structure contained the whole Form_pg_attribute for a column, but it actually only needs attstattarget from there. So remove the Form_pg_attribute field and make a separate field for attstattarget. This simplifies some code for extended statistics that doesn't deal with a column but an expression, which had to fake up pg_attribute rows to satisfy internal APIs. Also, we can remove some comments that essentially said "don't look at pg_attribute directly". Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/d6069765-5971-04d3-c10d-e4f7b2e9c459%40eisentraut.org
2023-06-24Check for interrupts and stack overflow in TParserGet().Tom Lane
TParserGet() recurses for some token types, meaning it's possible to drive it to stack overflow. Since this is a minority behavior, I chose to add the check_stack_depth() call to the two places that recurse rather than doing it during every single call. While at it, add CHECK_FOR_INTERRUPTS(), because this can run unpleasantly long for long inputs. Per bug #17995 from Zuming Jiang. This is old, so back-patch to all supported branches. Discussion: https://postgr.es/m/17995-9f20ff3e6389db4c@postgresql.org
2023-05-19Pre-beta mechanical code beautification.Tom Lane
Run pgindent, pgperltidy, and reformat-dat-files. This set of diffs is a bit larger than typical. We've updated to pg_bsd_indent 2.1.2, which properly indents variable declarations that have multi-line initialization expressions (the continuation lines are now indented one tab stop). We've also updated to perltidy version 20230309 and changed some of its settings, which reduces its desire to add whitespace to lines to make assignments etc. line up. Going forward, that should make for fewer random-seeming changes to existing code. Discussion: https://postgr.es/m/20230428092545.qfb3y5wcu4cm75ur@alvherre.pgsql
2023-05-19Remove stray mid-sentence tabs in commentsPeter Eisentraut
2023-04-08Update tsearch regex memory management.Thomas Munro
Now that our regex engine uses palloc(), it's not necessary to set up a special memory context callback to free compiled regexes. The regex has no resources other than the memory that is already going to be freed in bulk. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CA%2BhUKGK3PGKwcKqzoosamn36YW-fsuTdOPPF1i_rtEO%3DnEYKSg%40mail.gmail.com
2023-04-06Fix ts_headline() edge cases for empty query and empty search text.Tom Lane
tsquery's GETQUERY() macro is only safe to apply to a tsquery that is known non-empty; otherwise it gives a pointer to garbage. Before commit 5a617d75d, ts_headline() avoided this pitfall, but only in a very indirect, nonobvious way. (hlCover could not reach its TS_execute call, because if the query contains no lexemes then hlFirstIndex would surely return -1.) After that commit, it fell into the trap, resulting in weird errors such as "unrecognized operator" and/or valgrind complaints. In HEAD, fix this by not calling TS_execute_locations() at all for an empty query. In the back branches, add a defensive check to hlCover() --- that's not fixing any live bug, but I judge the code a bit too fragile as-is. Also, both mark_hl_fragments() and mark_hl_words() were careless about the possibility of empty search text: in the cases where no match has been found, they'd end up telling mark_fragment() to mark from word indexes 0 to 0 inclusive, even when there is no word 0. This is harmless since we over-allocated the prs->words array, but it does annoy valgrind. Fix so that the end index is -1 and thus mark_fragment() will do nothing in such cases. Bottom line is that this fixes a live bug in HEAD, but in the back branches it's only getting rid of a valgrind nitpick. Back-patch anyway. Per report from Alexander Lakhin. Discussion: https://postgr.es/m/c27f642d-020b-01ff-ae61-086af287c4fd@gmail.com
2023-03-17Fix t_isspace(), etc., when datlocprovider=i and datctype=C.Jeff Davis
Check whether the datctype is C to determine whether t_isspace() and related functions use isspace() or iswspace(). Previously, t_isspace() checked whether the database default collation was C; which is incorrect when the default collation uses the ICU provider. Discussion: https://postgr.es/m/79e4354d9eccfdb00483146a6b9f6295202e7890.camel@j-davis.com Reviewed-by: Peter Eisentraut Backpatch-through: 15
2023-02-07Remove useless casts to (void *) in arguments of some system functionsPeter Eisentraut
The affected functions are: bsearch, memcmp, memcpy, memset, memmove, qsort, repalloc Reviewed-by: Corey Huinker <corey.huinker@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/fd9adf5d-b1aa-e82f-e4c7-263c30145807%40enterprisedb.com
2023-02-06Remove useless casts to (void *) in hash_search() callsPeter Eisentraut
Some of these appear to be leftovers from when hash_search() took a char * argument (changed in 5999e78fc45dcb91784b64b6e9ae43f4e4f68ca2). Since after this there is some more horizontal space available, do some light reformatting where suitable. Reviewed-by: Corey Huinker <corey.huinker@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/fd9adf5d-b1aa-e82f-e4c7-263c30145807%40enterprisedb.com
2023-01-19Fix ts_headline() to handle ORs and phrase queries more honestly.Tom Lane
This patch largely reverts what I did in commits c9b0c678d and 78e73e875. The maximum cover length limit that I added in 78e73e875 (to band-aid over c9b0c678d's performance issues) creates too many user-visible behavior discrepancies, as complained of for example in bug #17691. The real problem with hlCover() is not what I thought at the time, but more that it seems to have been designed with only AND tsquery semantics in mind. It doesn't work quite right for OR, and even less so for NOT or phrase queries. However, we can improve that situation by building a variant of TS_execute() that returns a list of match locations. We already get an ExecPhraseData struct representing match locations for the primitive case of a simple match, as well as one for a phrase match; we just need to add some logic to combine these for AND and OR operators. The result is a list of ExecPhraseDatas, which hlCover can regard as having simple AND semantics, so that its old algorithm works correctly. There's still a lot not to like about ts_headline's behavior, but I think the remaining issues have to do with the heuristics used in mark_hl_words and mark_hl_fragments (which, likewise, were not revisited when phrase search was added). Improving those is a task for another day. Patch by me; thanks to Alvaro Herrera for review. Discussion: https://postgr.es/m/840.1669405935@sss.pgh.pa.us
2023-01-10New header varatt.h split off from postgres.hPeter Eisentraut
This new header contains all the variable-length data types support (TOAST support) from postgres.h, which isn't needed by large parts of the backend code. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/ddcce239-0f29-6e62-4b47-1f8ca742addf%40enterprisedb.com
2023-01-02Update copyright for 2023Bruce Momjian
Backpatch-through: 11
2022-12-27Convert the reg* input functions to report (most) errors softly.Tom Lane
This is not really complete, but it catches most cases of practical interest. The main omissions are: * regtype, regprocedure, and regoperator parse type names by calling the main grammar, so any grammar-detected syntax error will still be a hard error. Also, if one includes a type modifier in such a type specification, errors detected by the typmodin function will be hard errors. * Lookup errors are handled just by passing missing_ok = true to the relevant catalog lookup function. Because we've used quite a restrictive definition of "missing_ok", this means that edge cases such as "the named schema exists, but you lack USAGE permission on it" are still hard errors. It would make sense to me to replace most/all missing_ok parameters with an escontext parameter and then allow these additional lookup failure cases to be trapped too. But that's a job for some other day. Discussion: https://postgr.es/m/3342239.1671988406@sss.pgh.pa.us
2022-12-27Convert tsqueryin and tsvectorin to report errors softly.Tom Lane
This is slightly tedious because the adjustments cascade through a couple of levels of subroutines, but it's not very hard. I chose to avoid changing function signatures more than absolutely necessary, by passing the escontext pointer in existing structs where possible. tsquery's nuisance NOTICEs about empty queries are suppressed in soft-error mode, since they're not errors and we surely don't want them to be shown to the user anyway. Maybe that whole behavior should be reconsidered. Discussion: https://postgr.es/m/3824377.1672076822@sss.pgh.pa.us
2022-12-21Switch some system functions to use get_call_result_type()Michael Paquier
This shaves some code by replacing the combinations of CreateTemplateTupleDesc()/TupleDescInitEntry() hardcoding a mapping of the attributes listed in pg_proc.dat by get_call_result_type() to build the TupleDesc needed for the rows generated. get_call_result_type() is more expensive than the former style, but this removes some duplication with the lists of OUT parameters (pg_proc.dat and the attributes hardcoded in these code paths). This is applied to functions that are not considered as critical (aka that could be called repeatedly for monitoring purposes). Author: Bharath Rupireddy Reviewed-by: Robert Haas, Álvaro Herrera, Tom Lane, Michael Paquier Discussion: https://postgr.es/m/CALj2ACV23HW5HP5hFjd89FNS-z5X8r2jNXdMXcpN2BgTtKd87w@mail.gmail.com
2022-12-20Add copyright notices to meson filesAndrew Dunstan
Discussion: https://postgr.es/m/222b43a5-2fb3-2c1b-9cd0-375d376c8246@dunslane.net
2022-11-21Add comments and a missing CHECK_FOR_INTERRUPTS in ts_headline.Tom Lane
I just spent an annoying amount of time reverse-engineering the 100%-undocumented API between ts_headline and the text search parser's prsheadline function. Add some commentary about that while it's fresh in mind. Also remove some unused macros in wparser_def.c. While at it, I noticed that when commit 78e73e875 added a CHECK_FOR_INTERRUPTS call in TS_execute_recurse, it missed doing so in the parallel function TS_phrase_execute, which surely needs one just as much. Back-patch because of the missing CHECK_FOR_INTERRUPTS. Might as well back-patch the rest of this too.
2022-10-06Introduce t_isalnum() to replace t_isalpha() || t_isdigit() tests.Tom Lane
ts_locale.c omitted support for "isalnum" tests, perhaps on the grounds that there were initially no use-cases for that. However, both ltree and pg_trgm need such tests, and we do also have one use-case now in the core backend. The workaround of testing isalpha and isdigit separately seems quite inefficient, especially when dealing with multibyte characters; so let's fill in the missing support. Discussion: https://postgr.es/m/2548310.1664999615@sss.pgh.pa.us
2022-10-05Rename shadowed local variablesDavid Rowley
In a similar effort to f01592f91, here we mostly rename shadowed local variables to remove the warnings produced when compiling with -Wshadow=compatible-local. This fixes 63 warnings and leaves just 5. Author: Justin Pryzby, David Rowley Reviewed-by: Justin Pryzby Discussion https://postgr.es/m/20220817145434.GC26426%40telsasoft.com
2022-09-27Convert *GetDatum() and DatumGet*() macros to inline functionsPeter Eisentraut
The previous macro implementations just cast the argument to a target type but did not check whether the input type was appropriate. The function implementation can do better type checking of the input type. For the *GetDatumFast() macros, converting to an inline function doesn't work in the !USE_FLOAT8_BYVAL case, but we can use AssertVariableIsOfTypeMacro() to get a similar level of type checking. Reviewed-by: Aleksander Alekseev <aleksander@timescale.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/8528fb7e-0aa2-6b54-85fb-0c0886dbd6ed%40enterprisedb.com
2022-09-21meson: Add initial version of meson based build systemAndres Freund
Autoconf is showing its age, fewer and fewer contributors know how to wrangle it. Recursive make has a lot of hard to resolve dependency issues and slow incremental rebuilds. Our home-grown MSVC build system is hard to maintain for developers not using Windows and runs tests serially. While these and other issues could individually be addressed with incremental improvements, together they seem best addressed by moving to a more modern build system. After evaluating different build system choices, we chose to use meson, to a good degree based on the adoption by other open source projects. We decided that it's more realistic to commit a relatively early version of the new build system and mature it in tree. This commit adds an initial version of a meson based build system. It supports building postgres on at least AIX, FreeBSD, Linux, macOS, NetBSD, OpenBSD, Solaris and Windows (however only gcc is supported on aix, solaris). For Windows/MSVC postgres can now be built with ninja (faster, particularly for incremental builds) and msbuild (supporting the visual studio GUI, but building slower). Several aspects (e.g. Windows rc file generation, PGXS compatibility, LLVM bitcode generation, documentation adjustments) are done in subsequent commits requiring further review. Other aspects (e.g. not installing test-only extensions) are not yet addressed. When building on Windows with msbuild, builds are slower when using a visual studio version older than 2019, because those versions do not support MultiToolTask, required by meson for intra-target parallelism. The plan is to remove the MSVC specific build system in src/tools/msvc soon after reaching feature parity. However, we're not planning to remove the autoconf/make build system in the near future. Likely we're going to keep at least the parts required for PGXS to keep working around until all supported versions build with meson. Some initial help for postgres developers is at https://wiki.postgresql.org/wiki/Meson With contributions from Thomas Munro, John Naylor, Stone Tickle and others. Author: Andres Freund <andres@anarazel.de> Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Author: Peter Eisentraut <peter@eisentraut.org> Reviewed-By: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Discussion: https://postgr.es/m/20211012083721.hvixq4pnh2pixr3j@alap3.anarazel.de
2022-09-13Split up guc.c for better build speed and ease of maintenance.Tom Lane
guc.c has grown to be one of our largest .c files, making it a bottleneck for compilation. It's also acquired a bunch of knowledge that'd be better kept elsewhere, because of our not very good habit of putting variable-specific check hooks here. Hence, split it up along these lines: * guc.c itself retains just the core GUC housekeeping mechanisms. * New file guc_funcs.c contains the SET/SHOW interfaces and some SQL-accessible functions for GUC manipulation. * New file guc_tables.c contains the data arrays that define the built-in GUC variables, along with some already-exported constant tables. * GUC check/assign/show hook functions are moved to the variable's home module, whenever that's clearly identifiable. A few hard- to-classify hooks ended up in commands/variable.c, which was already a home for miscellaneous GUC hook functions. To avoid cluttering a lot more header files with #include "guc.h", I also invented a new header file utils/guc_hooks.h and put all the GUC hook functions' declarations there, regardless of their originating module. That allowed removal of #include "guc.h" from some existing headers. The fallout from that (hopefully all caught here) demonstrates clearly why such inclusions are best minimized: there are a lot of files that, for example, were getting array.h at two or more levels of remove, despite not having any connection at all to GUCs in themselves. There is some very minor code beautification here, such as renaming a couple of inconsistently-named hook functions and improving some comments. But mostly this just moves code from point A to point B and deals with the ensuing needs for #include adjustments and exporting a few functions that previously weren't exported. Patch by me, per a suggestion from Andres Freund; thanks also to Michael Paquier for the idea to invent guc_funcs.c. Discussion: https://postgr.es/m/587607.1662836699@sss.pgh.pa.us
2022-09-12Revert "Convert *GetDatum() and DatumGet*() macros to inline functions"Peter Eisentraut
This reverts commit 595836e99bf1ee6d43405b885fb69bb8c6d3ee23. It has problems when USE_FLOAT8_BYVAL is off.
2022-09-12Convert *GetDatum() and DatumGet*() macros to inline functionsPeter Eisentraut
The previous macro implementations just cast the argument to a target type but did not check whether the input type was appropriate. The function implementation can do better type checking of the input type. Reviewed-by: Aleksander Alekseev <aleksander@timescale.com> Discussion: https://www.postgresql.org/message-id/flat/8528fb7e-0aa2-6b54-85fb-0c0886dbd6ed%40enterprisedb.com
2022-08-24Defend against stack overrun in a few more places.Tom Lane
SplitToVariants() in the ispell code, lseg_inside_poly() in geo_ops.c, and regex_selectivity_sub() in selectivity estimation could recurse until stack overflow; fix by adding check_stack_depth() calls. So could next() in the regex compiler, but that case is better fixed by converting its tail recursion to a loop. (We probably get better code that way too, since next() can now be inlined into its sole caller.) There remains a reachable stack overrun in the Turkish stemmer, but we'll need some advice from the Snowball people about how to fix that. Per report from Egor Chindyaskin and Alexander Lakhin. These mistakes are old, so back-patch to all supported branches. Richard Guo and Tom Lane Discussion: https://postgr.es/m/1661334672.728714027@f473.i.mail.ru
2022-08-24Further -Wshadow=compatible-local warning fixesDavid Rowley
These should have been included in 421892a19 as these shadowed variable warnings can also be fixed by adjusting the scope of the shadowed variable to put the declaration for it in an inner scope. This is part of the same effort as f01592f91. By my count, this takes the warning count from 114 down to 106. Author: David Rowley and Justin Pryzby Discussion: https://postgr.es/m/CAApHDvrwLGBP%2BYw9vriayyf%3DXR4uPWP5jr6cQhP9au_kaDUhbA%40mail.gmail.com
2022-07-12Invent qsort_interruptible().Tom Lane
Justin Pryzby reported that some scenarios could cause gathering of extended statistics to spend many seconds in an un-cancelable qsort() operation. To fix, invent qsort_interruptible(), which is just like qsort_arg() except that it will also do CHECK_FOR_INTERRUPTS every so often. This bloats the backend by a couple of kB, which seems like a good investment. (We considered just enabling CHECK_FOR_INTERRUPTS in the existing qsort and qsort_arg functions, but there are some callers for which that'd demonstrably be unsafe. Opt-in seems like a better way.) For now, just apply qsort_interruptible() in statistics collection. There's probably more places where it could be useful, but we can always change other call sites as we find problems. Back-patch to v14. Before that we didn't have extended stats on expressions, so that the problem was less severe. Also, this patch depends on the sort_template infrastructure introduced in v14. Tom Lane and Justin Pryzby Discussion: https://postgr.es/m/20220509000108.GQ28830@telsasoft.com