user/sven/postgresql.git

Age	Commit message	Author
2022-08-31	In the Snowball dictionary, don't try to stem excessively-long words.	Tom Lane
	If the input word exceeds 1000 bytes, don't pass it to the stemmer; just return it as-is after case folding. Such an input is surely not a word in any human language, so whatever the stemmer might do to it would be pretty dubious in the first place. Adding this restriction protects us against a known recursion-to-stack-overflow problem in the Turkish stemmer, and it seems like good insurance against any other safety or performance issues that may exist in the Snowball stemmers. (I note, for example, that they contain no CHECK_FOR_INTERRUPTS calls, so we really don't want them running for a long time.) The threshold of 1000 bytes is arbitrary. An alternative definition could have been to treat such words as stopwords, but that seems like a bigger break from the old behavior. Per report from Egor Chindyaskin and Alexander Lakhin. Thanks to Olly Betts for the recommendation to fix it this way. Discussion: https://postgr.es/m/1661334672.728714027@f473.i.mail.ru
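The length guard described in this commit can be sketched roughly as follows. This is an illustration in Python, not the actual C code in the Snowball dictionary: `lexize`, `toy_stem`, and `MAX_STEM_BYTES` are hypothetical names, and the stemmer is a stand-in callable.

```python
MAX_STEM_BYTES = 1000  # threshold from the commit message, described there as arbitrary


def toy_stem(word):
    # Hypothetical stand-in for a Snowball stemmer: strips one trailing "s".
    return word[:-1] if word.endswith("s") else word


def lexize(word, stem=toy_stem, encoding="utf-8"):
    """Case-fold `word`, then stem it unless the input is excessively long."""
    folded = word.lower()
    if len(word.encode(encoding)) > MAX_STEM_BYTES:
        # Surely not a word in any human language: skip the stemmer and
        # return the input as-is after case folding.
        return folded
    return stem(folded)
```

Note that the over-long word is still returned as a lexeme rather than dropped, which matches the commit's choice not to treat such inputs as stopwords.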
2019-01-02	Update copyright for 2019	Bruce Momjian
	Backpatch-through: certain files through 9.4
2018-09-24	Sync our Snowball stemmer dictionaries with current upstream.	Tom Lane
	We haven't touched these since text search functionality landed in core in 2007 :-(. While the upstream project isn't a beehive of activity, they do make additions and bug fixes from time to time. Update our copies of these files. Also update our documentation about how to keep things in sync, since they're not making distribution tarballs these days. Fortunately, their source code turns out to be a breeze to build. Notable changes: * The non-UTF8 version of the hungarian stemmer now works in LATIN2 not LATIN1. * New stemmers have appeared for arabic, indonesian, irish, lithuanian, nepali, and tamil. These all work in UTF8, and the indonesian and irish ones also work in LATIN1. (There are some new stemmers that I did not incorporate, mainly because their names don't match the underlying languages, suggesting that they're not to be considered mainstream.) Worth noting: the upstream Nepali dictionary was contributed by Arthur Zakirov. initdb forced because the contents of snowball_create.sql have changed. Still TODO: see about updating the stopword lists. Arthur Zakirov, minor mods and doc work by me Discussion: https://postgr.es/m/20180626122025.GA12647@zakirov.localdomain Discussion: https://postgr.es/m/20180219140849.GA9050@zakirov.localdomain
2018-01-26	Avoid unnecessary use of pg_strcasecmp for already-downcased identifiers.	Tom Lane
	We have a lot of code in which option names, which from the user's viewpoint are logically keywords, are passed through the grammar as plain identifiers, and then matched to string literals during command execution. This approach avoids making words into lexer keywords unnecessarily. Some places matched these strings using plain strcmp, some using pg_strcasecmp. But the latter should be unnecessary since identifiers would have been downcased on their way through the parser. Aside from any efficiency concerns (probably not a big factor), the lack of consistency in this area creates a hazard of subtle bugs due to different places coming to different conclusions about whether two option names are the same or different. Hence, standardize on using strcmp() to match any option names that are expected to have been fed through the parser. This does create a user-visible behavioral change, which is that while formerly all of these would work: alter table foo set (fillfactor = 50); alter table foo set (FillFactor = 50); alter table foo set ("fillfactor" = 50); alter table foo set ("FillFactor" = 50); now the last case will fail because that double-quoted identifier is different from the others. However, none of our documentation says that you can use a quoted identifier in such contexts at all, and we should discourage doing so since it would break if we ever decide to parse such constructs as true lexer keywords rather than poor man's substitutes. So this shouldn't create a significant compatibility issue for users. Daniel Gustafsson, reviewed by Michael Paquier, small changes by me Discussion: https://postgr.es/m/29405B24-564E-476B-98C0-677A29805B84@yesql.se
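The behavioral change for the four `fillfactor` spellings can be modeled with a small sketch. It assumes a simplified view of identifier handling (unquoted identifiers are downcased, double-quoted ones keep their case); `parse_identifier` and `is_option` are hypothetical helpers, not PostgreSQL functions.

```python
def parse_identifier(token):
    """Simplified model of identifier handling in the parser.

    Unquoted identifiers are downcased; double-quoted identifiers lose
    their quotes but keep their case.
    """
    if len(token) >= 2 and token.startswith('"') and token.endswith('"'):
        return token[1:-1]
    return token.lower()


def is_option(token, name):
    # Exact comparison, analogous to strcmp() on the already-parsed identifier.
    return parse_identifier(token) == name


def is_option_old(token, name):
    # Case-insensitive comparison, analogous to the former pg_strcasecmp() use.
    return parse_identifier(token).lower() == name.lower()
```

Under this model only the quoted mixed-case spelling changes from accepted to rejected, which is the single user-visible difference the commit message describes.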
2018-01-02	Update copyright for 2018	Bruce Momjian
	Backpatch-through: certain files through 9.3
2017-11-10	Add some const decorations to prototypes	Peter Eisentraut
	Reviewed-by: Fabien COELHO <coelho@cri.ensmp.fr>
2017-06-21	Initial pgindent run with pg_bsd_indent version 2.0.	Tom Lane
	The new indent version includes numerous fixes thanks to Piotr Stefaniak. The main changes visible in this commit are: * Nicer formatting of function-pointer declarations. * No longer unexpectedly removes spaces in expressions using casts, sizeof, or offsetof. * No longer wants to add a space in "struct structname varname", as well as some similar cases for const- or volatile-qualified pointers. Declarations using PG_USED_FOR_ASSERTS_ONLY are formatted more nicely. * Fixes bug where comments following declarations were sometimes placed with no space separating them from the code. * Fixes some odd decisions for comments following case labels. * Fixes some cases where comments following code were indented to less than the expected column 33. On the less good side, it now tends to put more whitespace around typedef names that are not listed in typedefs.list. This might encourage us to put more effort into typedef name collection; it's not really a bug in indent itself. There are more changes coming after this round, having to do with comment indentation and alignment of lines appearing within parentheses. I wanted to limit the size of the diffs to something that could be reviewed without one's eyes completely glazing over, so it seemed better to split up the changes as much as practical. Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-12	Add ICU_CFLAGS to global CPPFLAGS	Peter Eisentraut
	The original code only added ICU_CFLAGS to the backend build. But it is also needed for building external modules that include pg_locale.h. So add it to the global CPPFLAGS. (This is only relevant if ICU is not in a compiler default path, so it apparently hasn't bitten many.)
2017-03-23	Add ICU_FLAGS to one more place	Peter Eisentraut
	Reported-by: Thomas Munro <thomas.munro@enterprisedb.com>
2017-01-03	Update copyright via script for 2017	Bruce Momjian

2016-01-02	Update copyright for 2016	Bruce Momjian
	Backpatch certain files through 9.1
2015-12-17	Adjust behavior of single-user -j mode for better initdb error reporting.	Tom Lane
	Previously, -j caused the entire input file to be read in and executed as a single command string. That's undesirable, not least because any error causes the entire file to be regurgitated as the "failing query". Some experimentation suggests a better rule: end the command string when we see a semicolon immediately followed by two newlines, ie, an empty line after a query. This serves nicely to break up the existing examples such as information_schema.sql and system_views.sql. A limitation is that it's no longer possible to write such a sequence within a string literal or multiline comment in a file meant to be read with -j; but there are no instances of such a problem within the data currently used by initdb. (If someone does make such a mistake in future, it'll be obvious because they'll get an unterminated-literal or unterminated-comment syntax error.) Other than that, there shouldn't be any negative consequences; you're not forced to end statements that way, it's just a better idea in most cases. In passing, remove src/include/tcop/tcopdebug.h, which is dead code because it's not included anywhere, and hasn't been for more than ten years. One of the debug-support symbols it purported to describe has been unreferenced for at least the same amount of time, and the other is removed by this commit on the grounds that it was useless: forcing -j mode all the time would have broken initdb. The lack of complaints about that, or about the missing inclusion, shows that no one has tried to use TCOP_DONTUSENEWLINE in many years.
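The new splitting rule (end a command at a semicolon immediately followed by two newlines) can be sketched like this. `split_commands` is a hypothetical helper for illustration; the real logic lives in the backend's C input-reading code and differs in detail.

```python
def split_commands(script):
    """Split `script` into command strings at ';' followed by an empty line."""
    commands = []
    start = 0
    while True:
        end = script.find(";\n\n", start)
        if end == -1:
            break
        # Keep the terminating semicolon as part of the command.
        commands.append(script[start:end + 1].strip())
        start = end + 3  # skip past ";\n\n"
    tail = script[start:].strip()
    if tail:
        commands.append(tail)
    return commands
```

A semicolon followed by a single newline does not end the command string, so several statements can still be grouped. The sketch also reproduces the stated limitation: the sequence splits the input even inside a string literal, leaving an unterminated literal behind.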
2015-05-24	Remove no-longer-required function declarations.	Tom Lane
	Remove a bunch of "extern Datum foo(PG_FUNCTION_ARGS);" declarations that are no longer needed now that PG_FUNCTION_INFO_V1(foo) provides that. Some of these were evidently missed in commit e7128e8dbb305059, but others were cargo-culted in in code added since then. Possibly that can be blamed in part on the fact that we'd not fixed relevant documentation examples, which I've now done.
2015-01-06	Update copyright for 2015	Bruce Momjian
	Backpatch certain files through 9.0
2014-08-18	Finish adding file version information to installed Windows binaries.	Noah Misch
	In support of this, have the MSVC build follow GNU make in preferring GNUmakefile over Makefile when a directory contains both. Michael Paquier, reviewed by MauMau.
2014-07-10	Adjust blank lines around PG_MODULE_MAGIC defines, for consistency	Bruce Momjian
	Report by Robert Haas
2014-06-10	Fix ancient encoding error in hungarian.stop.	Tom Lane
	When we grabbed this file off the Snowball project's website, we mistakenly supposed that it was in LATIN1 encoding, but evidently it was actually in LATIN2. This resulted in ő (o-double-acute, U+0151, which is code 0xF5 in LATIN2) being misconverted into õ (o-tilde, U+00F5), as complained of in bug #10589 from Zoltán Sörös. We'd have messed up u-double-acute too, but there aren't any of those in the file. Other characters used in the file have the same codes in LATIN1 and LATIN2, which no doubt helped hide the problem for so long. The error is not only ours: the Snowball project also was confused about which encoding is required for Hungarian. But dealing with that will require source-code changes that I'm not at all sure we'll wish to back-patch. Fixing the stopword file seems reasonably safe to back-patch however.
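The misconversion is easy to reproduce with Python's standard codecs. The sketch below only illustrates the byte-level mix-up; the helper name `decode_byte` is made up for this example.

```python
def decode_byte(raw, encoding):
    """Decode a single-byte string under the given ISO 8859 encoding."""
    return raw.decode(encoding)


raw = b"\xf5"
as_latin2 = decode_byte(raw, "iso8859_2")  # o-double-acute, U+0151
as_latin1 = decode_byte(raw, "iso8859_1")  # o-tilde, U+00F5

# Accented letters that have the same code in both encodings, which is
# why most of the file looked correct under the wrong assumption.
shared = "áéíóöúü"
same_bytes = shared.encode("iso8859_1") == shared.encode("iso8859_2")
```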
2014-02-23	Prefer pg_any_to_server/pg_server_to_any over pg_do_encoding_conversion.	Tom Lane
	A large majority of the callers of pg_do_encoding_conversion were specifying the database encoding as either source or target of the conversion, meaning that we can use the less general functions pg_any_to_server/pg_server_to_any instead. The main advantage of using the latter functions is that they can make use of a cached conversion-function lookup in the common case that the other encoding is the current client_encoding. It's notationally cleaner too in most cases, not least because of the historical artifact that the latter functions use "char *" rather than "unsigned char *" in their APIs. Note that pg_any_to_server will apply an encoding verification step in some cases where pg_do_encoding_conversion would have just done nothing. This seems to me to be a good idea at most of these call sites, though it partially negates the performance benefit. Per discussion of bug #9210.
2014-01-07	Update copyright for 2014	Bruce Momjian
	Update all files in head, and files COPYRIGHT and legal.sgml in all back branches.
2013-01-01	Update copyrights for 2013	Bruce Momjian
	Fully update git head, and update back branches in ./COPYRIGHT and legal.sgml files.
2012-08-30	Remove configure flag --disable-shared, as it is no longer used by any	Bruce Momjian
	port. The last use was QNX, per Peter Eisentraut.
2012-01-01	Update copyright notices for year 2012.	Bruce Momjian

2011-09-01	Remove unnecessary #include references, per pgrminclude script.	Bruce Momjian

2011-01-01	Stamp copyrights for year 2011.	Bruce Momjian

2010-11-23	Remove useless whitespace at end of lines	Peter Eisentraut

2010-09-22	Convert cvsignore to gitignore, and add .gitignore for build targets.	Magnus Hagander

2010-09-20	Remove cvs keywords from all files.	Magnus Hagander

2010-08-19	Remove extra newlines at end and beginning of files, add missing newlines	Peter Eisentraut
	at end of files.
2010-01-02	Update copyright for the year 2010.	Bruce Momjian

2009-08-28	Derived files that are shipped in the distribution used to be built in the	Peter Eisentraut
	source directory even for out-of-tree builds. They are now also built in the build tree. This should be more convenient for certain developers' workflows, and shouldn't really break anything else.
2009-08-26	Update of install-sh, mkinstalldirs, and associated configury	Peter Eisentraut
	Update install-sh to that from Autoconf 2.63, plus our Darwin-specific changes (which I simplified a bit). install-sh is now able to install multiple files in one run, so we could simplify our makefiles sometime. install-sh also now has a -d option to create directories, so we don't need mkinstalldirs anymore. Use AC_PROG_MKDIR_P in configure.in, so we can use mkdir -p when available instead of install-sh -d. For consistency with the rest of the world, the corresponding make variable has been renamed from $(mkinstalldirs) to $(MKDIR_P).
2009-01-01	Update copyright for 2009.	Bruce Momjian

2008-11-10	pg_do_encoding_conversion cannot return NULL (at least not unless the input	Tom Lane
	is NULL), so remove some useless tests for the case.
2008-04-07	Implement a few changes to how shared libraries and dynamically loadable	Peter Eisentraut
	modules are built. Foremost, it creates a solid distinction between these two types of targets based on what had already been implemented and duplicated in ad hoc ways before. Specifically, - Dynamically loadable modules no longer get a soname. The numbers previously set in the makefiles were dummy numbers anyway, and the presence of a soname upset a few packaging tools, so it is nicer not to have one. - The cumbersome detour taken on installation (build a libfoo.so.0.0.0 and then override the rule to install foo.so instead) is removed. - Lots of duplicated code simplified.
2008-03-21	More README src cleanups.	Bruce Momjian

2008-03-20	Make source code READMEs more consistent. Add CVS tags to all README files.	Bruce Momjian

2008-03-18	Catch all errors in for and while loops in makefiles. Don't ignore any	Peter Eisentraut
	errors in any commands, including in various clean targets that have so far been handled inconsistently. make -i is available to ignore all errors in a consistent and official way.
2008-01-01	Update copyrights in source tree to 2008.	Bruce Momjian

2007-11-15	Re-run pgindent with updated list of typedefs. (Updated README should	Bruce Momjian
	avoid this problem in the future.)
2007-11-15	pgindent run for 8.3.	Bruce Momjian

2007-10-27	Rename default text search parser's "uri" token type to "url_path",	Tom Lane
	per recommendation from Alvaro. This doesn't force initdb since the numeric token type in the catalogs doesn't change; but note that the expected regression test output changed.
2007-10-23	Rename and slightly redefine the default text search parser's "word"	Tom Lane
	categories, as per discussion. asciiword (formerly lword) is still ASCII-letters-only, and numword (formerly word) is still the most general mixed-alpha-and-digits case. But word (formerly nlword) is now any-group-of-letters-with-at-least-one-non-ASCII, rather than all-non-ASCII as before. This is no worse than before for parsing mixed Russian/English text, which seems to have been the design center for the original coding; and it should simplify matters for parsing most European languages. In particular it will not be necessary for any language to accept strings containing digits as being regular "words". The hyphenated-word categories are adjusted similarly.
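The redefined categories can be approximated with a short classifier. This is a simplified sketch that assumes the token is already isolated and ignores hyphenated words and all other token types; `classify` is a hypothetical function, not the parser's actual state machine.

```python
def classify(token):
    """Return 'asciiword', 'word', 'numword', or None for other tokens."""
    if not token or not token.isalnum():
        return None
    if token.isalpha():
        # Letters only: ASCII-only is asciiword, while at least one
        # non-ASCII letter makes it word.
        return "asciiword" if token.isascii() else "word"
    if any(ch.isalpha() for ch in token):
        # Mixed letters and digits: the most general category.
        return "numword"
    return None  # digits only: handled by other token types, out of scope here
```

Under the new definition a token mixing ASCII and non-ASCII letters is a plain word, so a language never has to accept strings containing digits as regular words.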
2007-09-07	Add turkish stopword list. Thanks to Devrim GUNDUZ <devrim@CommandPrompt.com>	Teodor Sigaev

2007-09-03	Improve stylistic consistency of descriptions of built-in objects by avoiding	Tom Lane
	initcap style --- the vast majority of the existing descriptions do not use an initial cap. I didn't change places where the first word was all-cap. initdb not forced because this doesn't change any regression test results.
2007-08-27	Fix generation of snowball_create.sql on msvc builds.	Magnus Hagander

2007-08-25	Rename built-in Snowball stemmer dictionaries to be english_stem,	Tom Lane
	russian_stem, etc. Per discussion.
2007-08-25	Cleanup for some problems in tsearch patch:	Tom Lane
	- ispell initialization crashed on empty dictionary file - ispell initialization crashed on affix file with prefixes but no suffixes - stop words file was run through pg_verify_mbstr, with database encoding, but it's supposed to be UTF-8; similar bug for synonym files - bunch of comments added, typos fixed, and other cleanup Introduced consistent encoding checking/conversion of data read from tsearch configuration files, by doing this in a single t_readline() subroutine (replacing direct usages of fgets). Cleaned up API for readstopwords too. Heikki Linnakangas
2007-08-22	Simplify the syntax of CREATE/ALTER TEXT SEARCH DICTIONARY by treating the	Tom Lane
	init options of the template as top-level options in the syntax. This also makes ALTER a bit easier to use, since options can be replaced individually. I also made these statements verify that the tmplinit method will accept the new settings before they get stored; in the original coding you didn't find out about mistakes until the dictionary got invoked. Under the hood, init methods now get options as a List of DefElem instead of a raw text string --- that lets tsearch use existing options-pushing code instead of duplicating functionality.
2007-08-21	Tsearch2 functionality migrates to core. The bulk of this work is by	Tom Lane
	Oleg Bartunov and Teodor Sigaev, but I did a lot of editorializing, so anything that's broken is probably my fault. Documentation is nonexistent as yet, but let's land the patch so we can get some portability testing done.