TimescaleDB Background Worker Scheduler crash with "malloc.c:2394: sysmalloc: Assertion ... failed" #3469
Comments
@jankatins can you get a backtrace of this? It appears this could be a bug in the timescaledb connection code, or it could also be a postgresql race condition.
This happens rarely; I cannot reproduce it at will :-(. In this case, the above is the only output I have (it happened on a Jenkins run).
Some background, as I just saw it again in the same test:

```python
    try:
        cursor.execute("CREATE TABLE foohyper (time TIMESTAMPTZ NOT NULL, loc TEXT NOT NULL, temp FLOAT, humid FLOAT)")
        cursor.execute("SELECT create_hypertable('foohyper', 'time')")
        cursor.execute("INSERT INTO foohyper(time, loc, temp, humid) VALUES (NOW(), 'restaurant', 65.0, 66.0)")
        cursor.execute("SELECT * FROM foohyper ORDER BY time DESC LIMIT 100")
        assert cursor.fetchall()[0]["loc"] == "restaurant"
        cursor.execute("SELECT * FROM _timescaledb_internal.get_os_info()")
        assert cursor.fetchone()["sysname"] == "Linux"
        cursor.execute("SELECT * FROM timescaledb_pre_restore()")
        assert cursor.fetchone()["timescaledb_pre_restore"] is True
        cursor.execute("SELECT * FROM timescaledb_post_restore()")
        assert cursor.fetchone()["timescaledb_post_restore"] is True
    finally:
>       always_reconnecting_cursor.execute("DROP EXTENSION timescaledb CASCADE")
```

It crashes in the last step. `always_reconnecting_cursor` basically opens a new connection and cursor and then executes the statement. I also think (no hard data, though :-() that I only started seeing this recently, so only with tsdb 2.4. At least twice in two days makes it at least more frequent in the 2.4 release.
I wonder if it's related to #3434.
We now saw it again in a simplified test case:
And we have a stack trace:
This looks kind of relevant: https://www.postgresql.org/message-id/20161227.081023.1859085287098959176.t-ishii%40sraoss.co.jp
The steps triggering this bug are:

1. The background worker is in the middle of a `malloc()` call when the termination signal arrives.
2. The signal handler reports the termination using `ereport()`, which can itself call `malloc()`.
3. `malloc()` therefore runs nested inside `malloc()`, which trips the assertion in `malloc()` and takes down the server.
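To make the mechanism concrete, here is a minimal sketch of the unsafe pattern these steps describe, modeled on a generic PostgreSQL background worker rather than on TimescaleDB's actual scheduler source (the handler name and log message are invented):

```c
/* Hypothetical worker code illustrating the unsafe pattern; not TimescaleDB's source. */
#include "postgres.h"

#include <signal.h>

#include "miscadmin.h"
#include "storage/latch.h"

static volatile sig_atomic_t got_sigterm = false;

static void
unsafe_sigterm_handler(SIGNAL_ARGS)
{
	int			save_errno = errno;

	got_sigterm = true;

	/*
	 * ereport()/elog() may allocate memory while building the message.  If
	 * the signal interrupted the worker in the middle of a malloc() call,
	 * malloc() runs nested inside malloc() and the glibc assertion from
	 * this issue can fire.
	 */
	ereport(LOG, (errmsg("terminating background worker due to SIGTERM")));

	SetLatch(MyLatch);
	errno = save_errno;
}
```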
Reporting errors using `ereport` can call `malloc()`, which is not signal-safe. Using `ereport()` in a signal handler can therefore cause `malloc()` to run nested inside `malloc()` if the termination handler is called in the middle of a `malloc()` call, which will trigger an assertion in `malloc()` that will take down the server. This commit fixes this by using the signal-safe `write_stderr()` inside the signal handlers for the background workers. Fixes timescale#3469
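For contrast, a minimal sketch of the signal-safe variant the commit message describes, again with invented names and an assumed shape rather than the literal patch:

```c
/* Hypothetical signal-safe variant; the real fix lives in the TimescaleDB source. */
#include "postgres.h"

#include <signal.h>

#include "miscadmin.h"
#include "storage/latch.h"
#include "utils/elog.h"		/* declares write_stderr() */

static volatile sig_atomic_t got_sigterm = false;

static void
safe_sigterm_handler(SIGNAL_ARGS)
{
	int			save_errno = errno;

	got_sigterm = true;

	/*
	 * write_stderr() avoids the ereport() machinery, which is why the fix
	 * uses it inside the handler; full shutdown logging can happen later
	 * in the worker's main loop once it notices got_sigterm.
	 */
	write_stderr("terminating background worker scheduler due to administrator command\n");

	SetLatch(MyLatch);
	errno = save_errno;
}
```

In a real worker, such a handler would be installed with `pqsignal(SIGTERM, safe_sigterm_handler)` before `BackgroundWorkerUnblockSignals()` is called in the worker's main function.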
This release contains bug fixes since the 2.4.1 release. We deem it high priority to upgrade.

**Bugfixes**
* timescale#3437 Rename on all continuous aggregate objects
* timescale#3469 Use signal-safe functions in signal handler
* timescale#3520 Modify compression job processing logic
* timescale#3527 Fix time_bucket_ng behaviour with origin argument
* timescale#3532 Fix bootstrap with regresschecks disabled
* timescale#3574 Fix failure on job execution by background worker
* timescale#3590 Call cleanup functions on backend exit

**Thanks**
* @jankatins for reporting a crash with background workers
* @LutzWeischerFujitsu for reporting an issue with bootstrap
Relevant system information:

* PostgreSQL version (output of `postgres --version`): 12.7
* TimescaleDB version (output of `\dx` in `psql`): 2.4

**Describe the bug**

During an automated test, we saw the following error in the log, which led to crashing tests. This error does not happen often (we have just seen it for the second time). The failing test was running `DROP EXTENSION timescaledb CASCADE` in a new connection at that time (unsure what else was running in the background). Later connections showed that the DB was in recovery mode.