Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[YSQL][SQLsmith] Stuck query in ReplicationSlotAllocationLock, can't be cancelled #11220

Open
def- opened this issue Jan 26, 2022 · 2 comments
Open
Assignees
Labels
area/ysql Yugabyte SQL (YSQL) kind/bug This issue is a bug kind/failing-test Tests and testing infra priority/medium Medium priority issue qa_automation Bugs identified via itest-system, LST, Stress automation or causing automation failures

Comments

@def-
Copy link
Contributor

def- commented Jan 26, 2022

Jira Link: DB-858

Description

After a few hours SQLsmith gets stuck in ReplicationSlotAllocationLock, I can reproduce it with the last run query:

select
  subq_0.c0 as c0,
  subq_0.c2 as c1,
  subq_0.c2 as c2,
  subq_0.c0 as c3
from
  (select
        82 as c0,
        ref_0.privilege_type as c1,
          pg_catalog.bit_or(
            cast(ref_2.attlen as int2)) over (partition by ref_0.object_schema order by ref_2.attinhcount) as c2,
        pg_catalog.pg_create_physical_replication_slot(
          cast(ref_2.attname as name),
          cast(ref_2.atthasmissing as bool),
          cast(pg_catalog.yb_is_database_colocated() as bool)) as c3
      from
        information_schema.role_usage_grants as ref_0
            inner join information_schema.sql_parts as ref_1
            on ((false)
                and ((cast(null as int8) <= cast(null as int2))
                  and (89 <= 34)))
          right join pg_catalog.pg_attribute as ref_2
          on (ref_2.attnum = ref_2.atttypmod)
      where true
      limit 95) as subq_0
where true
limit 20;

The query seems to stay around:

\x
select *
  from pg_stat_activity
  where datname = 'yugabyte';
-[ RECORD 1 ]----+------------------------------------------------------------------------------------------------------------------
datid            | 13281
datname          | yugabyte
pid              | 46014
usesysid         | 13280
usename          | yugabyte
application_name | ysqlsh
client_addr      | 127.0.0.1
client_hostname  |
client_port      | 64372
backend_start    | 2022-01-26 14:21:25.051697+01
xact_start       | 2022-01-26 14:24:47.538128+01
query_start      | 2022-01-26 14:24:47.538128+01
state_change     | 2022-01-26 14:24:47.538181+01
wait_event_type  | LWLock
wait_event       | ReplicationSlotAllocationLock
state            | active
backend_xid      |
backend_xmin     | 799
query            | select
[...]
backend_type     | client backend

Cancelling with Ctrl-C has no effect.

@def- def- added the area/ysql Yugabyte SQL (YSQL) label Jan 26, 2022
@sushantrmishra
Copy link

sushantrmishra commented Jan 26, 2022

Creating the slot using pg_create_physical_replication_slot is required for Streaming/replicating changes from a physical slot in PostgreSQL. In YugabyteDB, replication does not use streaming/replication protocol used in PostgreSQL, hence this function is not used and also not supported.
Though we should return an more relevant error message in YugabyteDB when pg_create_physical_replication_slot is used.

@def- def- added the kind/bug This issue is a bug label Jan 26, 2022
@sushantrmishra sushantrmishra self-assigned this Jan 27, 2022
@def- def- changed the title [YSQL] Stuck query in ReplicationSlotAllocationLock, can't be cancelled [YSQL][SQLsmith] Stuck query in ReplicationSlotAllocationLock, can't be cancelled Feb 3, 2022
@yugabyte-ci yugabyte-ci added the priority/medium Medium priority issue label Jun 8, 2022
@kripasreenivasan kripasreenivasan added the qa_automation Bugs identified via itest-system, LST, Stress automation or causing automation failures label Sep 13, 2022
@yugabyte-ci yugabyte-ci added the kind/failing-test Tests and testing infra label Oct 12, 2022
@def-
Copy link
Contributor Author

def- commented Dec 28, 2022

Similar situation with pg_catalog.pg_replication_origin_xact_reset(), might also be expected:

yugabyte=# \x
Expanded display is on.
yugabyte=# SELECT pid, age(clock_timestamp(), query_start), usename, query
FROM pg_stat_activity
WHERE query != '<IDLE>' AND query NOT ILIKE '%pg_stat_activity%'
ORDER BY query_start desc;
-[ RECORD 1 ]-------------------------------------------------------------------------------------------------------------
pid     | 628719
age     |
usename |
query   |
-[ RECORD 2 ]-------------------------------------------------------------------------------------------------------------
pid     | 663159
age     | 02:53:51.284042
usename | yugabyte
query   | select                                                                                                          +
        |   34 as c0,                                                                                                     +
        |   subq_0.c0 as c1,                                                                                              +
        |   subq_0.c4 as c2,                                                                                              +
        |   subq_0.c4 as c3,                                                                                              +
        |   pg_catalog.pg_replication_origin_xact_reset() as c4,                                                          +
        |                                                                                                                 +
        |     pg_catalog.var_samp(                                                                                        +
        |       cast((select feature_col from public.feature_tab_real limit 1 offset 1)                                   +
        |          as float4)) over (partition by subq_0.c2,subq_0.c5 order by subq_0.c6) as c5,                          +
        |   (select a from public.pp_enumpart limit 1 offset 6)                                                           +
        |      as c6,                                                                                                     +
        |   subq_0.c6 as c7,                                                                                              +
        |   subq_0.c5 as c8,                                                                                              +
        |   subq_0.c1 as c9,                                                                                              +
        |   dbms_pipe.create_pipe(                                                                                        +
        |     cast(cast(nullif(                                                                                           +
        |         pg_catalog.max(                                                                                         +
        |           cast(utl_file.tmpdir() as text)) over (partition by subq_0.c6,subq_0.c6 order by subq_0.c1,subq_0.c0),+
        |       oracle.sessiontimezone()) as text) as text)) as c10,                                                      +
        |   (select pg_catalog.max(feature_col) from public.feature_tab_time)                                             +
        |      as c11                                                                                                     +
        | from                                                                                                            +
        |   (select                                                                                                       +
        |         ref_0.b as c0,                                                                                          +
        |         ref_0.b as c1,                                                                                          +
        |         ref_0.b as c2,                                                                                          +
        |         ref_0.a as c3,                                                                                          +
        |         cast(coalesce(ref_0.b,                                                                                  +
        |           ref_0.b) as int4) as c4,                                                                              +
        |         ref_0.b as c5,                                                                                          +
        |         ref_0.a as c6                                                                                           +
        |       from                                                                                                      +
        |

yugabyte=# select *
  from pg_stat_activity
  where datname = 'yugabyte';
-[ RECORD 1 ]----+-----------------------------------------------------------------------------------------------------------------
datid            | 13288
datname          | yugabyte
pid              | 663159
usesysid         | 13287
usename          | yugabyte
application_name | sqlsmith::dut
client_addr      | 127.0.0.1
client_hostname  |
client_port      | 42266
backend_start    | 2022-12-28 20:26:41.171935+00
xact_start       | 2022-12-28 20:45:28.677106+00
query_start      | 2022-12-28 20:45:28.677177+00
state_change     | 2022-12-28 20:45:28.67742+00
wait_event_type  | LWLock
wait_event       | extension
state            | active
backend_xid      |
backend_xmin     | 28
query            | select                                                                                                          +
                 |   34 as c0,                                                                                                     +
                 |   subq_0.c0 as c1,                                                                                              +
                 |   subq_0.c4 as c2,                                                                                              +
                 |   subq_0.c4 as c3,                                                                                              +
                 |   pg_catalog.pg_replication_origin_xact_reset() as c4,                                                          +
                 |                                                                                                                 +
                 |     pg_catalog.var_samp(                                                                                        +
                 |       cast((select feature_col from public.feature_tab_real limit 1 offset 1)                                   +
                 |          as float4)) over (partition by subq_0.c2,subq_0.c5 order by subq_0.c6) as c5,                          +
                 |   (select a from public.pp_enumpart limit 1 offset 6)                                                           +
                 |      as c6,                                                                                                     +
                 |   subq_0.c6 as c7,                                                                                              +
                 |   subq_0.c5 as c8,                                                                                              +
                 |   subq_0.c1 as c9,                                                                                              +
                 |   dbms_pipe.create_pipe(                                                                                        +
                 |     cast(cast(nullif(                                                                                           +
                 |         pg_catalog.max(                                                                                         +
                 |           cast(utl_file.tmpdir() as text)) over (partition by subq_0.c6,subq_0.c6 order by subq_0.c1,subq_0.c0),+
                 |       oracle.sessiontimezone()) as text) as text)) as c10,                                                      +
                 |   (select pg_catalog.max(feature_col) from public.feature_tab_time)                                             +
                 |      as c11                                                                                                     +
                 | from                                                                                                            +
                 |   (select                                                                                                       +
                 |         ref_0.b as c0,                                                                                          +
                 |         ref_0.b as c1,                                                                                          +
                 |         ref_0.b as c2,                                                                                          +
                 |         ref_0.a as c3,                                                                                          +
                 |         cast(coalesce(ref_0.b,                                                                                  +
                 |           ref_0.b) as int4) as c4,                                                                              +
                 |         ref_0.b as c5,                                                                                          +
                 |         ref_0.a as c6                                                                                           +
                 |       from                                                                                                      +
                 |
backend_type     | client backend
-[ RECORD 2 ]----+-----------------------------------------------------------------------------------------------------------------
datid            | 13288
datname          | yugabyte
pid              | 673357
usesysid         | 13287
usename          | yugabyte
application_name | ysqlsh
client_addr      | 127.0.0.1
client_hostname  |
client_port      | 42320
backend_start    | 2022-12-28 23:37:47.377639+00
xact_start       | 2022-12-28 23:39:22.896183+00
query_start      | 2022-12-28 23:39:22.896183+00
state_change     | 2022-12-28 23:39:22.896257+00
wait_event_type  |
wait_event       |
state            | active
backend_xid      |
backend_xmin     | 28
query            | select *                                                                                                        +
                 |   from pg_stat_activity                                                                                         +
                 |   where datname = 'yugabyte';
backend_type     | client backend

The processes stay around on a yb-ctl stop:

[deen@devp yugabyte-db]$ ps aux|grep postgres
deen      628709  0.1  0.2 152892 47284 ?        Ss   Dec28   1:20 postgres: logger
deen      663159  0.4  0.7 448152 116352 ?       Ssl  Dec28   3:01 postgres: yugabyte yugabyte 127.0.0.1(42266) SELECT
deen      673500  0.0  0.4 366156 79008 ?        Ssl  Dec28   0:06 postgres: yugabyte yugabyte 127.0.0.1(42324) SELECT
deen      673710  0.0  0.4 371276 73728 ?        Ssl  Dec28   0:05 postgres: yugabyte yugabyte 127.0.0.1(42332) INSERT

This even hung yb-ctl destroy:

Destroying cluster.
2022-12-28 23:45:08,922 INFO: Stopping Server master-1 PID=628089
2022-12-28 23:45:08,922 INFO: Waiting for Server master-1 PID=628089 to stop...
2022-12-28 23:45:09,437 INFO: Stopping Server master-2 PID=628092
2022-12-28 23:45:09,437 INFO: Waiting for Server master-2 PID=628092 to stop...
2022-12-28 23:45:09,953 INFO: Stopping Server master-3 PID=628095
2022-12-28 23:45:09,953 INFO: Waiting for Server master-3 PID=628095 to stop...
2022-12-28 23:45:10,468 INFO: Stopping Server tserver-1 PID=628098
2022-12-28 23:45:10,469 INFO: Waiting for Server tserver-1 PID=628098 to stop...
2022-12-28 23:45:10,969 INFO: Stopping Postmaster for tserver-1 PID=628638
2022-12-28 23:45:10,969 INFO: Waiting for Postmaster for tserver-1 PID=628638 to stop...
2022-12-28 23:45:11,970 INFO: Waiting for Postmaster for tserver-1 PID=628638 to stop...
2022-12-28 23:45:12,971 INFO: Waiting for Postmaster for tserver-1 PID=628638 to stop...

It only reacts to a kill -9

[deen@devp ~]$ ps aux|grep 628638
deen      628638  0.0  0.4 278488 67084 ?        S    Dec28   0:01 /home/deen/code/yugabyte-db/build/release-clang15-linuxbrew-dynamic/postgres/bin/postgres -D /home/deen/yugabyte-data/node-1/disk-1/pg_data -p 5433 -h 127.0.0.1 -k /tmp/.yb.127.0.0.1:5433 -c unix_socket_permissions=0700 -c logging_collector=on -c log_directory=/home/deen/yugabyte-data/node-1/disk-1/yb-data/tserver/logs -c yb_pg_metrics.node_name=devp:9000 -c yb_pg_metrics.port=13000 -c config_file=/home/deen/yugabyte-data/node-1/disk-1/pg_data/ysql_pg.conf -c hba_file=/home/deen/yugabyte-data/node-1/disk-1/pg_data/ysql_hba.conf

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ysql Yugabyte SQL (YSQL) kind/bug This issue is a bug kind/failing-test Tests and testing infra priority/medium Medium priority issue qa_automation Bugs identified via itest-system, LST, Stress automation or causing automation failures
Projects
None yet
Development

No branches or pull requests

5 participants