summaryrefslogtreecommitdiff
path: root/src/backend/access/transam/parallel.c
AgeCommit message (Collapse)Author
2018-11-23Add WL_EXIT_ON_PM_DEATH pseudo-event.Thomas Munro
Users of the WaitEventSet and WaitLatch() APIs can now choose between asking for WL_POSTMASTER_DEATH and then handling it explicitly, or asking for WL_EXIT_ON_PM_DEATH to trigger immediate exit on postmaster death. This reduces code duplication, since almost all callers want the latter. Repair all code that was previously ignoring postmaster death completely, or requesting the event but ignoring it, or requesting the event but then doing an unconditional PostmasterIsAlive() call every time through its event loop (which is an expensive syscall on platforms for which we don't have USE_POSTMASTER_DEATH_SIGNAL support). Assert that callers of WaitLatchXXX() under the postmaster remember to ask for either WL_POSTMASTER_DEATH or WL_EXIT_ON_PM_DEATH, to prevent future bugs. The only process that doesn't handle postmaster death is syslogger. It waits until all backends holding the write end of the syslog pipe (including the postmaster) have closed it by exiting, to be sure to capture any parting messages. By using the WaitEventSet API directly it avoids the new assertion, and as a by-product it may be slightly more efficient on platforms that have epoll(). Author: Thomas Munro Reviewed-by: Kyotaro Horiguchi, Heikki Linnakangas, Tom Lane Discussion: https://postgr.es/m/CAEepm%3D1TCviRykkUb69ppWLr_V697rzd1j3eZsRMmbXvETfqbQ%40mail.gmail.com, https://postgr.es/m/CAEepm=2LqHzizbe7muD7-2yHUbTOoF7Q+qkSD5Q41kuhttRTwA@mail.gmail.com
2018-10-09Relax transactional restrictions on ALTER TYPE ... ADD VALUE (redux).Thomas Munro
Originally committed as 15bc038f (plus some follow-ups), this was reverted in 28e07270 due to a problem discovered in parallel workers. This new version corrects that problem by sending the list of uncommitted enum values to parallel workers. Here follows the original commit message describing the change: To prevent possibly breaking indexes on enum columns, we must keep uncommitted enum values from getting stored in tables, unless we can be sure that any such column is new in the current transaction. Formerly, we enforced this by disallowing ALTER TYPE ... ADD VALUE from being executed at all in a transaction block, unless the target enum type had been created in the current transaction. This patch removes that restriction, and instead insists that an uncommitted enum value can't be referenced unless it belongs to an enum type created in the same transaction as the value. Per discussion, this should be a bit less onerous. It does require each function that could possibly return a new enum value to SQL operations to check this restriction, but there aren't so many of those that this seems unmaintainable. Author: Andrew Dunstan and Tom Lane, with parallel query fix by Thomas Munro Reviewed-by: Tom Lane Discussion: https://postgr.es/m/CAEepm%3D0Ei7g6PaNTbcmAh9tCRahQrk%3Dr5ZWLD-jr7hXweYX3yg%40mail.gmail.com Discussion: https://postgr.es/m/4075.1459088427%40sss.pgh.pa.us
2018-10-06Propagate xactStartTimestamp and stmtStartTimestamp to parallel workers.Tom Lane
Previously, a worker process would establish values for these based on its own start time. In v10 and up, this can trivially be shown to cause misbehavior of transaction_timestamp(), timestamp_in(), and related functions which are (perhaps unwisely?) marked parallel-safe. It seems likely that other behaviors might diverge from what happens in the parent as well. It's not as trivial to demonstrate problems in 9.6 or 9.5, but I'm sure it's still possible, so back-patch to all branches containing parallel worker infrastructure. In HEAD only, mark now() and statement_timestamp() as parallel-safe (other affected functions already were). While in theory we could still squeeze that change into v11, it doesn't seem important enough to force a last-minute catversion bump. Konstantin Knizhnik, whacked around a bit by me Discussion: https://postgr.es/m/6406dbd2-5d37-4cb6-6eb2-9c44172c7e7c@postgrespro.ru
2018-09-20Defer restoration of libraries in parallel workers.Thomas Munro
Several users of extensions complained of crashes in parallel workers that turned out to be due to syscache access from their _PG_init() functions. Reorder the initialization of parallel workers so that libraries are restored after the caches are initialized, and inside a transaction. This was reported in bug #15350 and elsewhere. We don't consider it to be a bug: extensions shouldn't do that, because then they can't be used in shared_preload_libraries. However, it's a fairly obscure hazard and these extensions worked in practice before parallel query came along. So let's make it work. Later commits might add a warning message and eventually an error. Back-patch to 9.6, where parallel query landed. Author: Thomas Munro Reviewed-by: Amit Kapila Reported-by: Kieran McCusker, Jimmy Discussion: https://postgr.es/m/153512195228.1489.8545997741965926448%40wrigleys.postgresql.org
2018-08-10Handle parallel index builds on mapped relations.Peter Geoghegan
Commit 9da0cc35284, which introduced parallel CREATE INDEX, failed to propagate relmapper.c backend local cache state to parallel worker processes. This could result in parallel index builds against mapped catalog relations where the leader process (participating as a worker) scans the new, pristine relfilenode, while worker processes scan the obsolescent relfilenode. When this happened, the final index structure was typically not consistent with the owning table's structure. The final index structure could contain entries formed from both heap relfilenodes. Only rebuilds on mapped catalog relations that occur as part of a VACUUM FULL or CLUSTER could become corrupt in practice, since their mapped relation relfilenode swap is what allows the inconsistency to arise. On master, fix the problem by propagating the required relmapper.c backend state as part of standard parallel initialization (Cf. commit 29d58fd3). On v11, simply disallow builds against mapped catalog relations by deeming them parallel unsafe. Author: Peter Geoghegan Reported-By: "death lock" Reviewed-By: Tom Lane, Amit Kapila Bug: #15309 Discussion: https://postgr.es/m/153329671686.1405.18298309097348420351@wrigleys.postgresql.org Backpatch: 11-, where parallel CREATE INDEX was introduced.
2018-07-18Use a ResourceOwner to track buffer pins in all cases.Tom Lane
Historically, we've allowed auxiliary processes to take buffer pins without tracking them in a ResourceOwner. However, that creates problems for error recovery. In particular, we've seen multiple reports of assertion crashes in the startup process when it gets an error while holding a buffer pin, as for example if it gets ENOSPC during a write. In a non-assert build, the process would simply exit without releasing the pin at all. We've gotten away with that so far just because a failure exit of the startup process translates to a database crash anyhow; but any similar behavior in other aux processes could result in stuck pins and subsequent problems in vacuum. To improve this, institute a policy that we must *always* have a resowner backing any attempt to pin a buffer, which we can enforce just by removing the previous special-case code in resowner.c. Add infrastructure to make it easy to create a process-lifespan AuxProcessResourceOwner and clear out its contents at appropriate times. Replace existing ad-hoc resowner management in bgwriter.c and other aux processes with that. (Thus, while the startup process gains a resowner where it had none at all before, some other aux process types are replacing an ad-hoc resowner with this code.) Also use the AuxProcessResourceOwner to manage buffer pins taken during StartupXLOG and ShutdownXLOG, even when those are being run in a bootstrap process or a standalone backend rather than a true auxiliary process. In passing, remove some other ad-hoc resource owner creations that had gotten cargo-culted into various other places. As far as I can tell that was all unnecessary, and if it had been necessary it was incomplete, due to lacking any provision for clearing those resowners later. (Also worth noting in this connection is that a process that hasn't called InitBufferPoolBackend has no business accessing buffers; so there's more to do than just add the resowner if we want to touch buffers in processes not covered by this patch.) Although this fixes a very old bug, no back-patch, because there's no evidence of any significant problem in non-assert builds. Patch by me, pursuant to a report from Justin Pryzby. Thanks to Robert Haas and Kyotaro Horiguchi for reviews. Discussion: https://postgr.es/m/20180627233939.GA10276@telsasoft.com
2018-07-10Remove dynamic_shared_memory_type=nonePeter Eisentraut
PostgreSQL nowadays offers some kind of dynamic shared memory feature on all supported platforms. Having the choice of "none" prevents us from relying on DSM in core features. So this patch removes the choice of "none". Author: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
2018-04-05Allow background workers to bypass datallowconnMagnus Hagander
THis adds a "flags" field to the BackgroundWorkerInitializeConnection() and BackgroundWorkerInitializeConnectionByOid(). For now only one flag, BGWORKER_BYPASS_ALLOWCONN, is defined, which allows the worker to ignore datallowconn.
2018-02-02Be more wary about shm_toc_lookup failure.Tom Lane
Commit 445dbd82a basically missed the point of commit d46633506, which was that we shouldn't allow shm_toc_lookup() failure to lead to a core dump or assertion crash, because the odds of such a failure should never be considered negligible. It's correct that we can't expect the PARALLEL_KEY_ERROR_QUEUE TOC entry to be there if we have no workers. But if we have no workers, we're not going to do anything in this function with the lookup result anyway, so let's just skip it. That lets the code use the easy-to-prove-safe noError=false case, rather than anything requiring effort to review. Back-patch to v10, like the previous commit. Discussion: https://postgr.es/m/3647.1517601675@sss.pgh.pa.us
2018-02-02Support parallel btree index builds.Robert Haas
To make this work, tuplesort.c and logtape.c must also support parallelism, so this patch adds that infrastructure and then applies it to the particular case of parallel btree index builds. Testing to date shows that this can often be 2-3x faster than a serial index build. The model for deciding how many workers to use is fairly primitive at present, but it's better than not having the feature. We can refine it as we get more experience. Peter Geoghegan with some help from Rushabh Lathia. While Heikki Linnakangas is not an author of this patch, he wrote other patches without which this feature would not have been possible, and therefore the release notes should possibly credit him as an author of this feature. Reviewed by Claudio Freire, Heikki Linnakangas, Thomas Munro, Tels, Amit Kapila, me. Discussion: http://postgr.es/m/CAM3SWZQKM=Pzc=CAHzRixKjp2eO5Q0Jg1SoFQqeXFQ647JiwqQ@mail.gmail.com Discussion: http://postgr.es/m/CAH2-Wz=AxWqDoVvGU7dq856S4r6sJAj6DBn7VMtigkB33N5eyg@mail.gmail.com
2018-02-02Add new function WaitForParallelWorkersToAttach.Robert Haas
Once this function has been called, we know that all workers have started and attached to their error queues -- so if any of them subsequently exit uncleanly, we'll be sure to throw an ERROR promptly. Otherwise, users of the ParallelContext machinery must be careful not to wait forever for a worker that has failed to start. Parallel query manages to work without needing this for reasons explained in new comments added by this patch, but it's a useful primitive for other parallel operations, such as the pending patch to make creating a btree index run in parallel. Amit Kapila, revised by me. Additional review by Peter Geoghegan. Discussion: http://postgr.es/m/CAA4eK1+e2MzyouF5bg=OtyhDSX+=Ao=3htN=T-r_6s3gCtKFiw@mail.gmail.com
2018-01-23Report an ERROR if a parallel worker fails to start properly.Robert Haas
Commit 28724fd90d2f85a0573a8107b48abad062a86d83 fixed things so that if a background worker fails to start due to fork() failure or because it is terminated before startup succeeds, BGWH_STOPPED will be reported. However, that only helps if the code that uses the background worker machinery notices the change in status, and the code in parallel.c did not. To fix that, do two things. First, make sure that when a worker exits, it triggers the leader to read from error queues. That way, if a worker which has attached to an error queue exits uncleanly, the leader is sure to throw some error, either the contents of the ErrorResponse sent by the worker, or "lost connection to parallel worker" if it exited without sending one. To cover the case where the worker never starts up in the first place or exits before attaching to the error queue, the ParallelContext now keeps track of which workers have sent at least one message via the error queue. A worker which sends no messages by the time the parallel operation finishes will be checked to see whether it exited before attaching to the error queue; if so, a new error message, "parallel worker failed to initialize", will be reported. If not, we'll continue to wait until it either starts up and exits cleanly, starts up and exits uncleanly, or fails to start, and then take the appropriate action. Patch by me, reviewed by Amit Kapila. Discussion: http://postgr.es/m/CA+TgmoYnBgXgdTu6wk5YPdWhmgabYc9nY_pFLq=tB=FSLYkD8Q@mail.gmail.com
2018-01-19Transfer state pertaining to pending REINDEX operations to workers.Robert Haas
This will allow the pending patch for parallel CREATE INDEX to work on system catalogs, and to provide the same level of protection against use of user indexes while they are being rebuilt that we have for non-parallel CREATE INDEX. Patch by me, reviewed by Peter Geoghegan. Discussion: http://postgr.es/m/CA+TgmoYN-YQU9JsGQcqFLovZ-C+Xgp1_xhJQad=cunGG-_p5gg@mail.gmail.com Discussion: http://postgr.es/m/CAH2-Wzkv4UNkXYhqQRqk-u9rS7h5c-4cCW+EqQ8K_WSeS43aZg@mail.gmail.com
2018-01-02Update copyright for 2018Bruce Momjian
Backpatch-through: certain files through 9.3
2017-11-28Fix ReinitializeParallelDSM to tolerate finding no error queues.Robert Haas
Commit d4663350646ca0c069a36d906155a0f7e3372eb7 changed things so that shm_toc_lookup would fail with an error rather than silently returning NULL in the hope that such failures would be reported in a useful way rather than via a system crash. However, it overlooked the fact that the lookup of PARALLEL_KEY_ERROR_QUEUE in ReinitializeParallelDSM is expected to fail when no DSM segment was created in the first place; in that case, we end up with a backend-private memory segment that still contains an entry for PARALLEL_KEY_FIXED but no others. Consequently a benign failure to initialize parallelism can escalate into an elog(ERROR); repair. Discussion: http://postgr.es/m/CA+Tgmob8LFw55DzH1QEREpBEA9RJ_W_amhBFCVZ6WMwUhVpOqg@mail.gmail.com
2017-10-29Fix problems with the "role" GUC and parallel query.Robert Haas
Without this fix, dropping a role can sometimes result in parallel query failures in sessions that have used "SET ROLE" to assume the dropped role, even if that setting isn't active any more. Report by Pavan Deolasee. Patch by Amit Kapila, reviewed by me. Discussion: http://postgr.es/m/CABOikdOomRcZsLsLK+Z+qENM1zxyaWnAvFh3MJZzZnnKiF+REg@mail.gmail.com
2017-10-11Replace remaining uses of pq_sendint with pq_sendint{8,16,32}.Andres Freund
pq_sendint() remains, so extension code doesn't unnecessarily break. Author: Andres Freund Discussion: https://postgr.es/m/20170914063418.sckdzgjfrsbekae4@alap3.anarazel.de
2017-09-29Add background worker typePeter Eisentraut
Add bgw_type field to background worker structure. It is intended to be set to the same value for all workers of the same type, so they can be grouped in pg_stat_activity, for example. The backend_type column in pg_stat_activity now shows bgw_type for a background worker. The ps listing also no longer calls out that a process is a background worker but just show the bgw_type. That way, being a background worker is more of an implementation detail now that is not shown to the user. However, most log messages still refer to 'background worker "%s"'; otherwise constructing sensible and translatable log messages would become tricky. Reviewed-by: Michael Paquier <michael.paquier@gmail.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
2017-09-14Add support for coordinating record typmods among parallel workers.Andres Freund
Tuples can have type RECORDOID and a typmod number that identifies a blessed TupleDesc in a backend-private cache. To support the sharing of such tuples through shared memory and temporary files, provide a typmod registry in shared memory. To achieve that, introduce per-session DSM segments, created on demand when a backend first runs a parallel query. The per-session DSM segment has a table-of-contents just like the per-query DSM segment, and initially the contents are a shared record typmod registry and a DSA area to provide the space it needs to grow. State relating to the current session is accessed via a Session object reached through global variable CurrentSession that may require significant redesign further down the road as we figure out what else needs to be shared or remodelled. Author: Thomas Munro Reviewed-By: Andres Freund Discussion: https://postgr.es/m/CAEepm=0ZtQ-SpsgCyzzYpsXS6e=kZWqk3g5Ygn3MDV7A8dabUA@mail.gmail.com
2017-08-31Clean up shm_mq cleanup.Tom Lane
The logic around shm_mq_detach was a few bricks shy of a load, because (contrary to the comments for shm_mq_attach) all it did was update the shared shm_mq state. That left us leaking a bit of process-local memory, but much worse, the on_dsm_detach callback for shm_mq_detach was still armed. That means that whenever we ultimately detach from the DSM segment, we'd run shm_mq_detach again for already-detached, possibly long-dead queues. This accidentally fails to fail today, because we only ever re-use a shm_mq's memory for another shm_mq, and multiple detach attempts on the last such shm_mq are fairly harmless. But it's gonna bite us someday, so let's clean it up. To do that, change shm_mq_detach's API so it takes a shm_mq_handle not the underlying shm_mq. This makes the callers simpler in most cases anyway. Also fix a few places in parallel.c that were just pfree'ing the handle structs rather than doing proper cleanup. Back-patch to v10 because of the risk that the revenant shm_mq_detach callbacks would cause a live bug sometime. Since this is an API change, it's too late to do it in 9.6. (We could make a variant patch that preserves API, but I'm not excited enough to do that.) Discussion: https://postgr.es/m/8670.1504192177@sss.pgh.pa.us
2017-06-21Phase 3 of pgindent updates.Tom Lane
Don't move parenthesized lines to the left, even if that means they flow past the right margin. By default, BSD indent lines up statement continuation lines that are within parentheses so that they start just to the right of the preceding left parenthesis. However, traditionally, if that resulted in the continuation line extending to the right of the desired right margin, then indent would push it left just far enough to not overrun the margin, if it could do so without making the continuation line start to the left of the current statement indent. That makes for a weird mix of indentations unless one has been completely rigid about never violating the 80-column limit. This behavior has been pretty universally panned by Postgres developers. Hence, disable it with indent's new -lpl switch, so that parenthesized lines are always lined up with the preceding left paren. This patch is much less interesting than the first round of indent changes, but also bulkier, so I thought it best to separate the effects. Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21Initial pgindent run with pg_bsd_indent version 2.0.Tom Lane
The new indent version includes numerous fixes thanks to Piotr Stefaniak. The main changes visible in this commit are: * Nicer formatting of function-pointer declarations. * No longer unexpectedly removes spaces in expressions using casts, sizeof, or offsetof. * No longer wants to add a space in "struct structname *varname", as well as some similar cases for const- or volatile-qualified pointers. * Declarations using PG_USED_FOR_ASSERTS_ONLY are formatted more nicely. * Fixes bug where comments following declarations were sometimes placed with no space separating them from the code. * Fixes some odd decisions for comments following case labels. * Fixes some cases where comments following code were indented to less than the expected column 33. On the less good side, it now tends to put more whitespace around typedef names that are not listed in typedefs.list. This might encourage us to put more effort into typedef name collection; it's not really a bug in indent itself. There are more changes coming after this round, having to do with comment indentation and alignment of lines appearing within parentheses. I wanted to limit the size of the diffs to something that could be reviewed without one's eyes completely glazing over, so it seemed better to split up the changes as much as practical. Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-06Clean up latch related code.Andres Freund
The larger part of this patch replaces usages of MyProc->procLatch with MyLatch. The latter works even early during backend startup, where MyProc->procLatch doesn't yet. While the affected code shouldn't run in cases where it's not initialized, it might get copied into places where it might. Using MyLatch is simpler and a bit faster to boot, so there's little point to stick with the previous coding. While doing so I noticed some weaknesses around newly introduced uses of latches that could lead to missed events, and an omitted CHECK_FOR_INTERRUPTS() call in worker_spi. As all the actual bugs are in v10 code, there doesn't seem to be sufficient reason to backpatch this. Author: Andres Freund Discussion: https://postgr.es/m/20170606195321.sjmenrfgl2nu6j63@alap3.anarazel.de https://postgr.es/m/20170606210405.sim3yl6vpudhmufo@alap3.anarazel.de Backpatch: -
2017-06-05Don't be so trusting that shm_toc_lookup() will always succeed.Tom Lane
Given the possibility of race conditions and so on, it seems entirely unsafe to just assume that shm_toc_lookup() always finds the key it's looking for --- but that was exactly what all but one call site were doing. To fix, add a "bool noError" argument, similarly to what we have in many other functions, and throw an error on an unexpected lookup failure. Remove now-redundant Asserts that a rather random subset of call sites had. I doubt this will throw any light on buildfarm member lorikeet's recent failures, because if an unnoticed lookup failure were involved, you'd kind of expect a null-pointer-dereference crash rather than the observed symptom. But you never know ... and this is better coding practice even if it never catches anything. Discussion: https://postgr.es/m/9697.1496675981@sss.pgh.pa.us
2017-04-16Ensure BackgroundWorker struct contents are well-defined.Tom Lane
Coverity complained because bgw.bgw_extra wasn't being filled in by ApplyLauncherRegister(). The most future-proof fix is to memset the whole BackgroundWorker struct to zeroes. While at it, let's apply the same coding rule to other places that set up BackgroundWorker structs; four out of five had the same or related issues.
2017-04-14Avoid passing function pointers across process boundaries.Tom Lane
We'd already recognized that we can't pass function pointers across process boundaries for functions in loadable modules, since a shared library could get loaded at different addresses in different processes. But actually the practice doesn't work for functions in the core backend either, if we're using EXEC_BACKEND. This is the cause of recent failures on buildfarm member culicidae. Switch to passing a string function name in all cases. Something like this needs to be back-patched into 9.6, but let's see if the buildfarm likes it first. Petr Jelinek, with a bunch of basically-cosmetic adjustments by me Discussion: https://postgr.es/m/548f9c1d-eafa-e3fa-9da8-f0cc2f654e60@2ndquadrant.com
2017-03-31Don't use bgw_main even to specify in-core bgworker entrypoints.Robert Haas
On EXEC_BACKEND builds, this can fail if ASLR is in use. Backpatch to 9.5. On master, completely remove the bgw_main field completely, since there is no situation in which it is safe for an EXEC_BACKEND build. On 9.6 and 9.5, leave the field intact to avoid breaking things for third-party code that doesn't care about working under EXEC_BACKEND. Prior to 9.5, there are no in-core bgworker entrypoints. Petr Jelinek, reviewed by me. Discussion: http://postgr.es/m/09d8ad33-4287-a09b-a77f-77f8761adb5e@2ndquadrant.com
2017-01-03Update copyright via script for 2017Bruce Momjian
2016-12-02Add max_parallel_workers GUC.Robert Haas
Increase the default value of the existing max_worker_processes GUC from 8 to 16, and add a new max_parallel_workers GUC with a maximum of 8. This way, even if the maximum amount of parallel query is happening, there is still room for background workers that do other things, as originally envisioned when max_worker_processes was added. Julien Rouhaud, reviewed by Amit Kapila and by revised by me.
2016-10-04Extend framework from commit 53be0b1ad to report latch waits.Robert Haas
WaitLatch, WaitLatchOrSocket, and WaitEventSetWait now taken an additional wait_event_info parameter; legal values are defined in pgstat.h. This makes it possible to uniquely identify every point in the core code where we are waiting for a latch; extensions can pass WAIT_EXTENSION. Because latches were the major wait primitive not previously covered by this patch, it is now possible to see information in pg_stat_activity on a large number of important wait events not previously addressed, such as ClientRead, ClientWrite, and SyncRep. Unfortunately, many of the wait events added by this patch will fail to appear in pg_stat_activity because they're only used in background processes which don't currently appear in pg_stat_activity. We should fix this either by creating a separate view for such information, or else by deciding to include them in pg_stat_activity after all. Michael Paquier and Robert Haas, reviewed by Alexander Korotkov and Thomas Munro.
2016-08-27Add macros to make AllocSetContextCreate() calls simpler and safer.Tom Lane
I found that half a dozen (nearly 5%) of our AllocSetContextCreate calls had typos in the context-sizing parameters. While none of these led to especially significant problems, they did create minor inefficiencies, and it's now clear that expecting people to copy-and-paste those calls accurately is not a great idea. Let's reduce the risk of future errors by introducing single macros that encapsulate the common use-cases. Three such macros are enough to cover all but two special-purpose contexts; those two calls can be left as-is, I think. While this patch doesn't in itself improve matters for third-party extensions, it doesn't break anything for them either, and they can gradually adopt the simplified notation over time. In passing, change TopMemoryContext to use the default allocation parameters. Formerly it could only be extended 8K at a time. That was probably reasonable when this code was written; but nowadays we create many more contexts than we did then, so that it's not unusual to have a couple hundred K in TopMemoryContext, even without considering various dubious code that sticks other things there. There seems no good reason not to let it use growing blocks like most other contexts. Back-patch to 9.6, mostly because that's still close enough to HEAD that it's easy to do so, and keeping the branches in sync can be expected to avoid some future back-patching pain. The bugs fixed by these changes don't seem to be significant enough to justify fixing them further back. Discussion: <21072.1472321324@sss.pgh.pa.us>
2016-08-26Fix potential memory leakage from HandleParallelMessages().Tom Lane
HandleParallelMessages leaked memory into the caller's context. Since it's called from ProcessInterrupts, there is basically zero certainty as to what CurrentMemoryContext is, which means we could be leaking into long-lived contexts. Over the processing of many worker messages that would grow to be a problem. Things could be even worse than just a leak, if we happened to service the interrupt while ErrorContext is current: elog.c thinks it can reset that on its own whim, possibly yanking storage out from under HandleParallelMessages. Give HandleParallelMessages its own dedicated context instead, which we can reset during each call to ensure there's no accumulation of wasted memory. Discussion: <16610.1472222135@sss.pgh.pa.us>
2016-08-26Fix logic for adding "parallel worker" context line to worker errors.Tom Lane
The previous coding here was capable of adding a "parallel worker" context line to errors that were not, in fact, returned from a parallel worker. Instead of using an errcontext callback to add that annotation, just paste it onto the message by hand; this looks uglier but is more reliable. Discussion: <19757.1472151987@sss.pgh.pa.us>
2016-08-02Block interrupts during HandleParallelMessages().Tom Lane
As noted by Alvaro, there are CHECK_FOR_INTERRUPTS() calls in the shm_mq.c functions called by HandleParallelMessages(). I believe they're all unreachable since we always pass nowait = true, but it doesn't seem like a great idea to assume that no such call will ever be reachable from HandleParallelMessages(). If that did happen, there would be a risk of a recursive call to HandleParallelMessages(), which it does not appear to be designed for --- for example, there's nothing that would prevent out-of-order processing of received messages. And certainly such cases cannot easily be tested. So let's prevent it by holding off interrupts for the duration of the function. Back-patch to 9.5 which contains identical code. Discussion: <14869.1470083848@sss.pgh.pa.us>
2016-08-01Minor cleanup for access/transam/parallel.c.Tom Lane
ParallelMessagePending *must* be marked volatile, because it's set by a signal handler. On the other hand, it's pointless for HandleParallelMessageInterrupt to save/restore errno; that must be, and is, done at the outer level of the SIGUSR1 signal handler. Calling CHECK_FOR_INTERRUPTS() inside HandleParallelMessages, which itself is called from CHECK_FOR_INTERRUPTS(), seems both useless and hazardous. The comment claiming that this is needed to handle the error queue going away is certainly misguided, in any case. Improve a couple of error message texts, and use ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE to report loss of parallel worker connection, since that's what's used in e.g. tqueue.c. (Maybe it would be worth inventing a dedicated ERRCODE for this type of failure? But I do not think ERRCODE_INTERNAL_ERROR is appropriate.) Minor stylistic cleanups.
2016-06-30Fix several mistakes around parallel workers and client_encoding.Robert Haas
Previously, workers sent data to the leader using the client encoding. That mostly worked, but the leader the converted the data back to the server encoding. Since not all encoding conversions are reversible, that could provoke failures. Fix by using the database encoding for all communication between worker and leader. Also, while temporary changes to GUC settings, as from the SET clause of a function, are in general OK for parallel query, changing client_encoding this way inside of a parallel worker is not OK. Previously, that would have confused the leader; with these changes, it would not confuse the leader, but it wouldn't do anything either. So refuse such changes in parallel workers. Also, the previous code naively assumed that when it received a NotifyResonse from the worker, it could pass that directly back to the user. But now that worker-to-leader communication always uses the database encoding, that's clearly no longer correct - though, actually, the old way was always broken for V2 clients. So disassemble and reconstitute the message instead. Issues reported by Peter Eisentraut. Patch by me, reviewed by Peter Eisentraut.
2016-06-17Remove PID from 'parallel worker' context message.Robert Haas
Discussion: <bfd204ab-ab1a-792a-b345-0274a09a4b5f@2ndquadrant.com>
2016-06-16Fix fuzzy thinking in ReinitializeParallelDSM().Tom Lane
The fact that no workers were successfully launched in the previous iteration does not excuse us from setting up properly to try again. This appears to explain crashes I saw in parallel regression testing due to error_mqh being NULL when it shouldn't be. Minor other cosmetic fixes too.
2016-06-09Improve the situation for parallel query versus temp relations.Tom Lane
Transmit the leader's temp-namespace state to workers. This is important because without it, the workers do not really have the same search path as the leader. For example, there is no good reason (and no extant code either) to prevent a worker from executing a temp function that the leader created previously; but as things stood it would fail to find the temp function, and then either fail or execute the wrong function entirely. We still prohibit a worker from creating a temp namespace on its own. In effect, a worker can only see the session's temp namespace if the leader had created it before starting the worker, which seems like the right semantics. Also, transmit the leader's BackendId to workers, and arrange for workers to use that when determining the physical file path of a temp relation belonging to their session. While the original intent was to prevent such accesses entirely, there were a number of holes in that, notably in places like dbsize.c which assume they can safely access temp rels of other sessions anyway. We might as well get this right, as a small down payment on someday allowing workers to access the leader's temp tables. (With this change, directly using "MyBackendId" as a relation or buffer backend ID is deprecated; you should use BackendIdForTempRelations() instead. I left a couple of such uses alone though, as they're not going to be reachable in parallel workers until we do something about localbuf.c.) Move the thou-shalt-not-access-thy-leader's-temp-tables prohibition down into localbuf.c, which is where it actually matters, instead of having it in relation_open(). This amounts to recognizing that access to temp tables' catalog entries is perfectly safe in a worker, it's only the data in local buffers that is problematic. Having done all that, we can get rid of the test in has_parallel_hazard() that says that use of a temp table's rowtype is unsafe in parallel workers. That test was unduly expensive, and if we really did need such a prohibition, that was not even close to being a bulletproof guard for it. (For example, any user-defined function executed in a parallel worker might have attempted such access.)
2016-06-09pgindent run for 9.6Robert Haas
2016-05-06Use mul_size when multiplying by the number of parallel workers.Robert Haas
That way, if the result overflows size_t, you'll get an error instead of undefined behavior, which seems like a plus. This also has the effect of casting the number of workers from int to Size, which is better because it's harder to overflow int than size_t. Dilip Kumar reported this issue and provided a patch upon which this patch is based, but his version did use mul_size.
2016-03-04Minor optimizations based on ParallelContext having nworkers_launched.Robert Haas
Originally, we didn't have nworkers_launched, so code that used parallel contexts had to be preprared for the possibility that not all of the workers requested actually got launched. But now we can count on knowing the number of workers that were successfully launched, which can shave off a few cycles and simplify some code slightly. Amit Kapila, reviewed by Haribabu Kommi, per a suggestion from Peter Geoghegan.
2016-02-07Introduce a new GUC force_parallel_mode for testing purposes.Robert Haas
When force_parallel_mode = true, we enable the parallel mode restrictions for all queries for which this is believed to be safe. For the subset of those queries believed to be safe to run entirely within a worker, we spin up a worker and run the query there instead of running it in the original process. When force_parallel_mode = regress, make additional changes to allow the regression tests to run cleanly even though parallel workers have been injected under the hood. Taken together, this facilitates both better user testing and better regression testing of the parallelism code. Robert Haas, with help from Amit Kapila and Rushabh Lathia.
2016-02-07Introduce group locking to prevent parallel processes from deadlocking.Robert Haas
For locking purposes, we now regard heavyweight locks as mutually non-conflicting between cooperating parallel processes. There are some possible pitfalls to this approach that are not to be taken lightly, but it works OK for now and can be changed later if we find a better approach. Without this, it's very easy for parallel queries to silently self-deadlock if the user backend holds strong relation locks. Robert Haas, with help from Amit Kapila. Thanks to Noah Misch and Andres Freund for extensive discussion of possible issues with this approach.
2016-01-02Update copyright for 2016Bruce Momjian
Backpatch certain files through 9.1
2015-11-16Message improvementsPeter Eisentraut
2015-11-05Pass extra data to bgworkers, and use this to fix parallel contexts.Robert Haas
Up until now, the total amount of data that could be passed to a background worker at startup was one datum, which can be a small as 4 bytes on some systems. That's enough to pass a dsm_handle or an array index, but not much else. Add a bgw_extra flag to the BackgroundWorker struct, allowing up to 128 bytes to be passed to a new worker on any platform. Use this to fix a problem I recently discovered with the parallel context machinery added in 9.5: the master assigns each worker an array index, and each worker subsequently assigns itself an array index, and there's nothing to guarantee that the two sets of indexes match, leading to chaos. Normally, I would not back-patch the change to add bgw_extra, since it is basically a feature addition. However, since 9.5 is still in beta and there seems to be no other sensible way to repair the broken parallel context machinery, back-patch to 9.5. Existing background worker code can ignore the bgw_extra field without a problem, but might need to be recompiled since the structure size has changed. Report and patch by me. Review by Amit Kapila.
2015-10-30Update parallel executor support to reuse the same DSM.Robert Haas
Commit b0b0d84b3d663a148022e900ebfc164284a95f55 purported to make it possible to relaunch workers using the same parallel context, but it had an unpleasant race condition: we might reinitialize after the workers have sent their last control message but before they have dettached the DSM, leaving to crashes. Repair by introducing a new ParallelContext operation, ReinitializeParallelDSM. Adjust execParallel.c to use this new support, so that we can rescan a Gather node by relaunching workers but without needing to recreate the DSM. Amit Kapila, with some adjustments by me. Extracted from latest parallel sequential scan patch.
2015-10-22Fix typos in comments.Robert Haas
CharSyam
2015-10-16Allow a parallel context to relaunch workers.Robert Haas
This may allow some callers to avoid the overhead involved in tearing down a parallel context and then setting up a new one, which means releasing the DSM and then allocating and populating a new one. I suspect we'll want to revise the Gather node to make use of this new capability, but even if not it may be useful elsewhere and requires very little additional code.