summaryrefslogtreecommitdiff
path: root/src/backend/utils/misc/timeout.c
AgeCommit message (Collapse)Author
2016-05-27Be more predictable about reporting "lock timeout" vs "statement timeout".Tom Lane
If both timeout indicators are set when we arrive at ProcessInterrupts, we've historically just reported "lock timeout". However, some buildfarm members have been observed to fail isolationtester's timeouts test by reporting "lock timeout" when the statement timeout was expected to fire first. The cause seems to be that the process is allowed to sleep longer than expected (probably due to heavy machine load) so that the lock timeout happens before we reach the point of reporting the error, and then this arbitrary tiebreak rule does the wrong thing. We can improve matters by comparing the scheduled timeout times to decide which error to report. I had originally proposed greatly reducing the 1-second window between the two timeouts in the test cases. On reflection that is a bad idea, at least for the case where the lock timeout is expected to fire first, because that would assume that it takes negligible time to get from statement start to the beginning of the lock wait. Thus, this patch doesn't completely remove the risk of test failures on slow machines. Empirically, however, the case this handles is the one we are seeing in the buildfarm. The explanation may be that the other case requires the scheduler to take the CPU away from a busy process, whereas the case fixed here only requires the scheduler to not give the CPU back right away to a process that has been woken from a multi-second sleep (and, perhaps, has been swapped out meanwhile). Back-patch to 9.3 where the isolationtester timeouts test was added. Discussion: <8693.1464314819@sss.pgh.pa.us>
2016-01-02Update copyright for 2016Bruce Momjian
Backpatch certain files through 9.1
2015-02-03Remove remnants of ImmediateInterruptOK handling.Andres Freund
Now that nothing sets ImmediateInterruptOK to true anymore, we can remove all the supporting code. Reviewed-By: Heikki Linnakangas
2015-01-14Add a default local latch for use in signal handlers.Andres Freund
To do so, move InitializeLatchSupport() into the new common process initialization functions, and add a new global variable MyLatch. MyLatch is usable as soon InitPostmasterChild() has been called (i.e. very early during startup). Initially it points to a process local latch that exists in all processes. InitProcess/InitAuxiliaryProcess then replaces that local latch with PGPROC->procLatch. During shutdown the reverse happens. This is primarily advantageous for two reasons: For one it simplifies dealing with the shared process latch, especially in signal handlers, because instead of having to check for MyProc, MyLatch can be used unconditionally. For another, a later patch that makes FEs/BE communication use latches, now can rely on the existence of a latch, even before having gone through InitProcess. Discussion: 20140927191243.GD5423@alap3.anarazel.de
2015-01-06Update copyright for 2015Bruce Momjian
Backpatch certain files through 9.0
2014-05-06pgindent run for 9.4Bruce Momjian
This includes removing tabs after periods in C comments, which was applied to back branches, so this change should not effect backpatching.
2014-01-07Update copyright for 2014Bruce Momjian
Update all files in head, and files COPYRIGHT and legal.sgml in all back branches.
2013-12-13Don't let timeout interrupts happen unless ImmediateInterruptOK is set.Tom Lane
Serious oversight in commit 16e1b7a1b7f7ffd8a18713e83c8cd72c9ce48e07: we should not allow an interrupt to take control away from mainline code except when ImmediateInterruptOK is set. Just to be safe, let's adopt the same save-clear-restore dance that's been used for many years in HandleCatchupInterrupt and HandleNotifyInterrupt, so that nothing bad happens if a timeout handler invokes code that tests or even manipulates ImmediateInterruptOK. Per report of "stuck spinlock" failures from Christophe Pettus, though many other symptoms are possible. Diagnosis by Andres Freund.
2013-11-29Fix assorted race conditions in the new timeout infrastructure.Tom Lane
Prevent handle_sig_alarm from losing control partway through due to a query cancel (either an asynchronous SIGINT, or a cancel triggered by one of the timeout handler functions). That would at least result in failure to schedule any required future interrupt, and might result in actual corruption of timeout.c's data structures, if the interrupt happened while we were updating those. We could still lose control if an asynchronous SIGINT arrives just as the function is entered. This wouldn't break any data structures, but it would have the same effect as if the SIGALRM interrupt had been silently lost: we'd not fire any currently-due handlers, nor schedule any new interrupt. To forestall that scenario, forcibly reschedule any pending timer interrupt during AbortTransaction and AbortSubTransaction. We can avoid any extra kernel call in most cases by not doing that until we've allowed LockErrorCleanup to kill the DEADLOCK_TIMEOUT and LOCK_TIMEOUT events. Another hazard is that some platforms (at least Linux and *BSD) block a signal before calling its handler and then unblock it on return. When we longjmp out of the handler, the unblock doesn't happen, and the signal is left blocked indefinitely. Again, we can fix that by forcibly unblocking signals during AbortTransaction and AbortSubTransaction. These latter two problems do not manifest when the longjmp reaches postgres.c, because the error recovery code there kills all pending timeout events anyway, and it uses sigsetjmp(..., 1) so that the appropriate signal mask is restored. So errors thrown outside any transaction should be OK already, and cleaning up in AbortTransaction and AbortSubTransaction should be enough to fix these issues. (We're assuming that any code that catches a query cancel error and doesn't re-throw it will do at least a subtransaction abort to clean up; but that was pretty much required already by other subsystems.) Lastly, ProcSleep should not clear the LOCK_TIMEOUT indicator flag when disabling that event: if a lock timeout interrupt happened after the lock was granted, the ensuing query cancel is still going to happen at the next CHECK_FOR_INTERRUPTS, and we want to report it as a lock timeout not a user cancel. Per reports from Dan Wood. Back-patch to 9.3 where the new timeout handling infrastructure was introduced. We may at some point decide to back-patch the signal unblocking changes further, but I'll desist from that until we hear actual field complaints about it.
2013-03-17Improve signal-handler lockout mechanism in timeout.c.Tom Lane
Rather than doing a fairly-expensive setitimer() call to prevent interrupts from happening, let's just invent a simple boolean flag that the signal handler is required to check. This is not only faster but considerably more robust than before, since the previous code effectively assumed that only ITIMER_REAL events would ever fire the SIGALRM handler, which is obviously something that can be broken easily by third-party code. Zoltán Böszörményi and Tom Lane
2013-03-17Move pqsignal() to libpgport.Tom Lane
We had two copies of this function in the backend and libpq, which was already pretty bogus, but it turns out that we need it in some other programs that don't use libpq (such as pg_test_fsync). So put it where it probably should have been all along. The signal-mask-initialization support in src/backend/libpq/pqsignal.c stays where it is, though, since we only need that in the backend.
2013-03-16Add lock_timeout configuration parameter.Tom Lane
This GUC allows limiting the time spent waiting to acquire any one heavyweight lock. In support of this, improve the recently-added timeout infrastructure to permit efficiently enabling or disabling multiple timeouts at once. That reduces the performance hit from turning on lock_timeout, though it's still not zero. Zoltán Böszörményi, reviewed by Tom Lane, Stephen Frost, and Hari Babu
2013-01-01Update copyrights for 2013Bruce Momjian
Fully update git head, and update back branches in ./COPYRIGHT and legal.sgml files.
2012-07-16Introduce timeout handling frameworkAlvaro Herrera
Management of timeouts was getting a little cumbersome; what we originally had was more than enough back when we were only concerned about deadlocks and query cancel; however, when we added timeouts for standby processes, the code got considerably messier. Since there are plans to add more complex timeouts, this seems a good time to introduce a central timeout handling module. External modules register their timeout handlers during process initialization, and later enable and disable them as they see fit using a simple API; timeout.c is in charge of keeping track of which timeouts are in effect at any time, installing a common SIGALRM signal handler, and calling setitimer() as appropriate to ensure timely firing of external handlers. timeout.c additionally supports pluggable modules to add their own timeouts, though this capability isn't exercised anywhere yet. Additionally, as of this commit, walsender processes are aware of timeouts; we had a preexisting bug there that made those ignore SIGALRM, thus being subject to unhandled deadlocks, particularly during the authentication phase. This has already been fixed in back branches in commit 0bf8eb2a, which see for more details. Main author: Zoltán Böszörményi Some review and cleanup by Álvaro Herrera Extensive reworking by Tom Lane