bug: meta doesn't unpin epoch when creating MV fails #7657

Closed
Tracked by #6640
zwang28 opened this issue Feb 2, 2023 · 1 comment · Fixed by #7707
Assignees
Labels
component/meta Meta related issue. found-by-longevity-test type/bug Something isn't working

Comments

@zwang28
Contributor

zwang28 commented Feb 2, 2023

Describe the bug

The meta node pins an epoch for MV creation, but it doesn't unpin that epoch if MV creation fails.
The epoch then remains the min_pin_epoch and prevents storage from garbage-collecting stale KVs.
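
To illustrate why a single leaked pin is enough to stall GC, here is a minimal sketch. It is not RisingWave's actual code: `PinnedEpochs`, `gc_watermark`, and the pin-count map are hypothetical names, and it assumes the simplified model that GC may only reclaim versions older than the smallest pinned epoch.

```rust
use std::collections::BTreeMap;

/// Hypothetical tracker of pinned epochs: epoch -> number of active pins.
#[derive(Default)]
struct PinnedEpochs {
    pins: BTreeMap<u64, usize>,
}

impl PinnedEpochs {
    /// Register one more pin on `epoch`.
    fn pin(&mut self, epoch: u64) {
        *self.pins.entry(epoch).or_insert(0) += 1;
    }

    /// Release one pin on `epoch`; the entry disappears when no pins remain.
    fn unpin(&mut self, epoch: u64) {
        if let Some(count) = self.pins.get_mut(&epoch) {
            *count -= 1;
            if *count == 0 {
                self.pins.remove(&epoch);
            }
        }
    }

    /// Smallest pinned epoch, or `None` if nothing is pinned.
    fn min_pinned_epoch(&self) -> Option<u64> {
        self.pins.keys().next().copied()
    }

    /// Assumed GC rule: only versions older than this epoch may be reclaimed.
    fn gc_watermark(&self, current_epoch: u64) -> u64 {
        self.min_pinned_epoch()
            .map_or(current_epoch, |e| e.min(current_epoch))
    }
}
```

In this model, a pin taken at epoch 100 and never released keeps `gc_watermark` at 100 no matter how far the current epoch advances, which matches the symptom described above.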

Found in longevity-20230201-170952
[Screenshot: Screen Shot 2023-02-02 at 11 13 42 AM]

To Reproduce

No response

Expected behavior

No response

Additional context

No response

@zwang28 zwang28 added type/bug Something isn't working found-by-longevity-test component/meta Meta related issue. labels Feb 2, 2023
@github-actions github-actions bot added this to the release-0.1.17 milestone Feb 2, 2023
@hzxa21
Collaborator

hzxa21 commented Feb 6, 2023

related to #5606?

@zwang28 zwang28 self-assigned this Feb 6, 2023
@mergify mergify bot closed this as completed in #7707 Feb 6, 2023
mergify bot pushed a commit that referenced this issue Feb 6, 2023
Fix #7657.

MV creation failure either triggers a recovery or, if enable_recovery is false, a meta panic.
- In the former case, this PR unpins all snapshots owned by the meta node before recovery starts.
- In the latter case, the hummock manager already unpins all snapshots owned by the meta node during initialization.
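
The recovery case can be sketched roughly as follows. This is an illustrative model only, not the code from #7707: `SnapshotPins`, `META_NODE`, `create_mv`, and the closure-based `build` step are all hypothetical, and the actual recovery step is represented by a comment.

```rust
use std::collections::HashMap;

type ContextId = u32;
type Epoch = u64;

/// Hypothetical context id standing in for the meta node.
const META_NODE: ContextId = 0;

/// Hypothetical registry of pinned snapshots, grouped by owner.
#[derive(Default)]
struct SnapshotPins {
    pinned: HashMap<ContextId, Vec<Epoch>>,
}

impl SnapshotPins {
    fn pin(&mut self, owner: ContextId, epoch: Epoch) {
        self.pinned.entry(owner).or_default().push(epoch);
    }

    /// Release a single pin held by `owner`, if present.
    fn unpin(&mut self, owner: ContextId, epoch: Epoch) {
        if let Some(epochs) = self.pinned.get_mut(&owner) {
            if let Some(pos) = epochs.iter().position(|e| *e == epoch) {
                epochs.swap_remove(pos);
            }
        }
    }

    /// Release every pin held by `owner`; returns how many were released.
    fn unpin_all(&mut self, owner: ContextId) -> usize {
        self.pinned.remove(&owner).map_or(0, |epochs| epochs.len())
    }

    /// Smallest epoch pinned by any owner.
    fn min_pinned(&self) -> Option<Epoch> {
        self.pinned.values().flatten().copied().min()
    }
}

/// Pin an epoch for MV creation and make sure it is released on both paths.
fn create_mv(
    pins: &mut SnapshotPins,
    epoch: Epoch,
    build: impl FnOnce() -> Result<(), String>,
) -> Result<(), String> {
    pins.pin(META_NODE, epoch);
    let result = build();
    match &result {
        // Success path: release only the pin taken for this MV.
        Ok(()) => pins.unpin(META_NODE, epoch),
        // Failure path: drop everything the meta node holds, then recover.
        Err(_) => {
            pins.unpin_all(META_NODE);
            // Recovery would start here (not modelled in this sketch).
        }
    }
    result
}
```

Pins held by other owners are left untouched in both paths, so a failed MV creation no longer holds back the minimum pinned epoch.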

Approved-By: BugenZhao
Approved-By: yezizp2012
Little-Wallace added a commit to Little-Wallace/risingwave that referenced this issue Feb 8, 2023
commit 61afe87
Author: Little-Wallace <[email protected]>
Date:   Tue Feb 7 13:06:38 2023 +0800

    fix conflict

    Signed-off-by: Little-Wallace <[email protected]>

commit 5397385
Merge: 0f7bfa2 1193d53
Author: Wallace <[email protected]>
Date:   Tue Feb 7 12:23:46 2023 +0800

    Merge branch 'main' into refactor-bloom-filter

commit 1193d53
Author: August <[email protected]>
Date:   Mon Feb 6 22:30:07 2023 +0800

    fix: grant default CONNECT action for new created user (risingwavelabs#7716)

    To fix risingwavelabs#7596: since we don't have the concept of a PUBLIC group yet, this PR simply grants newly created users the CONNECT action on the session's current database. Note that, as in PostgreSQL, other actions must still be granted to a new user before they can do more than connect.
    ```sql
    ~ psql -h localhost -p 4566 -d dev -U root
    psql (14.5 (Homebrew), server 9.5.0)
    SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
    Type "help" for help.

    dev=> create table a (v1 integer);
    CREATE_TABLE
    dev=> create user xxx with password 'abc';
    CREATE_USER
    dev=> \q
    ~ psql -h localhost -p 4566 -d dev -U xxx
    Password for user xxx:
    psql (14.5 (Homebrew), server 9.5.0)
    SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
    Type "help" for help.

    dev=> select * from a;
    ERROR:  QueryError: Permission denied: Do not have the privilege
    dev=> \q
    ~ psql -h localhost -p 4566 -d dev -U root
    psql (14.5 (Homebrew), server 9.5.0)
    SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
    Type "help" for help.

    dev=> create materialized view mv1 as select * from a;
    CREATE_MATERIALIZED_VIEW
    dev=> grant all on materialized view mv1 to xxx;
    GRANT_PRIVILEGE
    dev=> insert into a values (1),(2),(3);
    INSERT 0 3
    dev=> \q
    ~ psql -h localhost -p 4566 -d dev -U xxx
    Password for user xxx:
    psql (14.5 (Homebrew), server 9.5.0)
    SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
    Type "help" for help.

    dev=> select * from mv1;
    v1
    ----
    3
    2
    1
    (3 rows)
    ```

    Approved-By: chenzl25
    Approved-By: xxchan

commit 4f54a70
Author: Noel Kwan <[email protected]>
Date:   Mon Feb 6 21:25:11 2023 +0800

    chore(sqlsmith): reduce query complexity (risingwavelabs#7709)

    - Avoid generating so many `GROUP BY` and cross joins.
    - Reduce probability of generating `WITH` CTE.

    Approved-By: lmatz

    Co-Authored-By: Noel Kwan <[email protected]>
    Co-Authored-By: Noel Kwan <[email protected]>

commit b205308
Author: Eric Fu <[email protected]>
Date:   Mon Feb 6 20:11:18 2023 +0800

    fix: revert 7690 and redesign for CI pipeline (risingwavelabs#7721)

    risingwavelabs#7690 doesn't work for CI pipelines because BuildKite only checks out the sources without `.git`.

    Approved-By: huangjw806
    Approved-By: wangrunji0408

commit 261c914
Author: zwang28 <[email protected]>
Date:   Mon Feb 6 18:44:13 2023 +0800

    fix(meta): unpin snapshot on MV creation failure (risingwavelabs#7707)

    Fix risingwavelabs#7657.

    MV creation failure either triggers a recovery or, if enable_recovery is false, a meta panic.
    - In the former case, this PR unpins all snapshots owned by the meta node before recovery starts.
    - In the latter case, the hummock manager already unpins all snapshots owned by the meta node during initialization.

    Approved-By: BugenZhao
    Approved-By: yezizp2012

commit 3b38db2
Author: ioperations <[email protected]>
Date:   Mon Feb 6 18:18:30 2023 +0800

     feat(binder): support current_timestamp (risingwavelabs#7695)

    Approved-By: fuyufjh
    Approved-By: xxchan

    Co-Authored-By: aodong.qin <[email protected]>
    Co-Authored-By: ioperations <[email protected]>

commit b7a3448
Author: Dylan <[email protected]>
Date:   Mon Feb 6 17:36:33 2023 +0800

    fix(optimizer): fix distributed batch lookup join shuffle columns (risingwavelabs#7713)

    - As title.

    Approved-By: BugenZhao

commit b7507e5
Author: Shanicky Chen <[email protected]>
Date:   Mon Feb 6 17:08:13 2023 +0800

    chore: use short connection in etcd election client (risingwavelabs#7711)

    This PR changes the connection in the etcd election client to a short connection, to resolve a stuck etcd master migration when auth is enabled.

    Approved-By: yezizp2012

commit a9af7f9
Author: Yuanxin Cao <[email protected]>
Date:   Mon Feb 6 16:06:45 2023 +0800

    fix(ci): fix main-cron by increasing total compute node memory in compaction test (risingwavelabs#7703)

    As title.

    Approved-By: kwannoel

commit 77b4607
Author: August <[email protected]>
Date:   Mon Feb 6 15:09:44 2023 +0800

    feat(frontend): introduce system catalog table pg_enum (risingwavelabs#7706)

    Introduce system catalog table `pg_enum`.

    Approved-By: neverchanje

commit 7e565a7
Author: Yuanxin Cao <[email protected]>
Date:   Mon Feb 6 14:44:30 2023 +0800

    feat(optimizer): watermark derivation for various plan nodes (risingwavelabs#7655)

    Add watermark derivation for various plan nodes. Watermark derivation for `TableScan` hasn't been implemented because it may need to modify table catalog and will be done in future PR.

    Approved-By: st1page

commit c3bb027
Author: ZENOTME <[email protected]>
Date:   Mon Feb 6 14:16:39 2023 +0800

    feat(frontend): seperate plan_fragmenter into two phase (risingwavelabs#7581)

    To solve risingwavelabs#7439, we need to do async operations in plan_fragmenter. To do this, I separate plan_fragmenter into two phases **so that we can do async operations in phase 2**:

    phase 1 : BatchPlanFragmenter.split(batch_node) -> PreStageGraph
    phase 2 : PreStageGraph.complete() -> StageGraph

    The difference between PreStageGraph and StageGraph is that StageGraph contains the exchange_info and parallelism. This information is filled in during phase 2.

    Approved-By: liurenjie1024

commit 2678067
Author: Eric Fu <[email protected]>
Date:   Mon Feb 6 13:06:04 2023 +0800

    feat: display git version in `version()` and server logs (risingwavelabs#7690)

    Show Git version in `version()` and server logs. Would be helpful when looking into problems, especially for nightly versions.

    ```
    dev=> select version();
    version
    --------------------------------------------------
    PostgreSQL 13.9-RisingWave-0.2.0-alpha (b720f19)
    (1 row)
    ```

    ```
    2023-02-03T10:35:25.446834Z  INFO risingwave_compute::server: Starting compute node
    2023-02-03T10:35:25.446858Z  INFO risingwave_compute::server: > config: RwConfig { server: ServerConfig { heartbeat_interval_ms: 1000, max_heartbeat_interval_secs: 600, connection_pool_size: 16, metrics_level: 0 }, meta: MetaConfig { min_sst_retention_time_sec: 604800, collect_gc_watermark_spin_interval_sec: 5, periodic_compaction_interval_sec: 60, vacuum_interval_sec: 30, max_heartbeat_interval_secs: 600, disable_recovery: true, meta_leader_lease_secs: 10, dangerous_max_idle_secs: Some(1800), enable_compaction_deterministic: false, enable_committed_sst_sanity_check: false, node_num_monitor_interval_sec: 10, backend: Mem }, batch: BatchConfig { worker_threads_num: None, developer: DeveloperConfig { batch_output_channel_size: 64, batch_chunk_size: 1024, stream_enable_executor_row_count: false, stream_connector_message_buffer_size: 16, unsafe_stream_extreme_cache_size: 1024, stream_chunk_size: 1024, stream_exchange_initial_permits: 8192, stream_exchange_batched_permits: 1024 } }, streaming: StreamingConfig { barrier_interval_ms: 1000, in_flight_barrier_nums: 10000, checkpoint_frequency: 10, actor_runtime_worker_threads_num: None, enable_jaeger_tracing: false, async_stack_trace: On, developer: DeveloperConfig { batch_output_channel_size: 64, batch_chunk_size: 1024, stream_enable_executor_row_count: false, stream_connector_message_buffer_size: 16, unsafe_stream_extreme_cache_size: 1024, stream_chunk_size: 1024, stream_exchange_initial_permits: 8192, stream_exchange_batched_permits: 1024 } }, storage: StorageConfig { sstable_size_mb: 256, block_size_kb: 64, bloom_false_positive: 0.001, share_buffers_sync_parallelism: 1, share_buffer_compaction_worker_threads_number: 4, shared_buffer_capacity_mb: 1024, state_store: "hummock+memory", data_directory: "hummock_001", write_conflict_detection_enabled: true, block_cache_capacity_mb: 512, meta_cache_capacity_mb: 128, disable_remote_compactor: false, enable_local_spill: true, local_object_store: "tempdisk", 
share_buffer_upload_concurrency: 8, compactor_memory_limit_mb: 512, sstable_id_remote_fetch_number: 10, file_cache: FileCacheConfig { dir: "", capacity_mb: 1024, total_buffer_capacity_mb: 128, cache_file_fallocate_unit_mb: 512, cache_meta_fallocate_unit_mb: 16, cache_file_max_write_size_mb: 4 }, min_sst_size_for_streaming_upload: 33554432, max_sub_compaction: 4, object_store_use_batch_delete: true, max_concurrent_compaction_task_number: 16, enable_state_store_v1: false }, backup: BackupConfig { storage_url: "memory", storage_directory: "backup" } }
    2023-02-03T10:35:25.447014Z  INFO risingwave_compute::server: > debug assertions: on
    2023-02-03T10:35:25.447043Z  INFO risingwave_compute::server: > version: 0.2.0-alpha (75de7ee)
    ```

    Approved-By: liurenjie1024

commit 2dfa704
Author: lmatz <[email protected]>
Date:   Mon Feb 6 12:23:39 2023 +0800

    fix(meta): temporarily does not require advertise_addr when using etcd

commit 167afb3
Author: xiangjinwu <[email protected]>
Date:   Mon Feb 6 12:11:40 2023 +0800

    refactor(DataType): cleanup outdated helpers (risingwavelabs#7685)

    The following 2 helpers are no longer used in any place: `is_type_encodable`, `mem_cmp_eq_value_enc`.

    Note: right now all DataTypes are encodable in memcomparable format. We will reintroduce the difference and disallow certain types from being used as memcomparable later.

    Approved-By: liurenjie1024

commit b996ba1
Author: ZENOTME <[email protected]>
Date:   Mon Feb 6 11:13:19 2023 +0800

    chore(test):add e2e test for producing timestamp in kafka source (risingwavelabs#7699)

    Approved-By: liurenjie1024

commit 50fc449
Author: Eric Fu <[email protected]>
Date:   Mon Feb 6 10:44:18 2023 +0800

    fix: alias of argument `advertise_addr` (risingwavelabs#7702)

    The alias of `--advertise-addr` should be `--client-address` to keep compatible with previous versions.

    Approved-By: lmatz
    Approved-By: huangjw806

commit 4e9756d
Author: xxchan <[email protected]>
Date:   Sat Feb 4 08:17:40 2023 +0100

    ci: add github_token for buf-setup-action (risingwavelabs#7697)

    Approved-By: tabVersion

commit 05eb37f
Author: xxchan <[email protected]>
Date:   Fri Feb 3 23:55:18 2023 +0100

    feat: implement append-only group TopN (risingwavelabs#7522)

    close risingwavelabs#7376

    Approved-By: BugenZhao

commit dffc2f1
Author: Yuanxin Cao <[email protected]>
Date:   Sat Feb 4 00:37:50 2023 +0800

    feat: ensure reserved memory for computing tasks on compute node starting (risingwavelabs#7670)

    The total memory of a CN consists of:

    1. computing memory (both stream & batch)
    2. storage memory (block cache, meta cache, etc.)
    3. memory for system usage

    That is to say, **_CN total memory_ = _computing memory_ + _storage memory_ + _system memory_**, where both _CN total memory_ and _storage memory_ are currently configured by the user. This PR ensures that _computing memory_ and _system memory_ are correctly reserved, i.e. **_computing memory_ + _system memory_ = _CN total memory_ - _storage memory_ > a given amount of memory**. This amount is set to 1G for now (512M for computing and 512M for system). The check is performed when the CN starts.
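
The startup check described above can be sketched as follows. This is only an illustration under stated assumptions, not the code from the PR: `check_reserved_memory` and the two constants are hypothetical names, sizes are in bytes, and the strict `>` comparison follows the wording of the message.

```rust
/// Assumed minimum reservation for computing tasks (512M).
const MIN_COMPUTE_MEMORY_BYTES: u64 = 512 << 20;
/// Assumed minimum reservation for system usage (512M).
const MIN_SYSTEM_MEMORY_BYTES: u64 = 512 << 20;

/// Returns the memory left for computing + system usage, or an error if the
/// configured storage memory leaves too little of the node's total memory.
fn check_reserved_memory(total_bytes: u64, storage_bytes: u64) -> Result<u64, String> {
    // total - storage = computing + system; storage may not exceed the total.
    let reserved = total_bytes
        .checked_sub(storage_bytes)
        .ok_or_else(|| "storage memory exceeds total memory".to_string())?;

    let required = MIN_COMPUTE_MEMORY_BYTES + MIN_SYSTEM_MEMORY_BYTES;
    if reserved > required {
        Ok(reserved)
    } else {
        Err(format!(
            "only {reserved} bytes left for computing and system usage, need more than {required}"
        ))
    }
}
```

For example, a node with 4G in total and 2G of storage memory passes the check with 2G reserved, while 2G in total with 1G of storage memory fails because exactly 1G is not strictly greater than the threshold.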

    Approved-By: fuyufjh
    Approved-By: hzxa21

commit 20bdb72
Author: Bowen <[email protected]>
Date:   Fri Feb 3 19:20:31 2023 +0800

    fix: report local execution mode error (risingwavelabs#7454)

    1. Enable local mode error propagation. When a local mode task (in CN) hits an error, the error is now reported to the user.
    2. Store the sender in TaskExecution to avoid dropping it early (otherwise the task execution error may surface as a hash shuffle error).

    This PR reverts some previous workarounds: TODO in sqlsmith, store the sender in task execution

    Approved-By: liurenjie1024

    Co-Authored-By: BowenXiao1999 <[email protected]>
    Co-Authored-By: Bowen <[email protected]>

commit a3306ea
Author: xxchan <[email protected]>
Date:   Fri Feb 3 11:40:52 2023 +0100

    feat: don't report error when cancelling `risedev configure` (risingwavelabs#7673)

    Also tweak the prompts

    Approved-By: BugenZhao
    Approved-By: TennyZhuang

commit 930b185
Author: Shanicky Chen <[email protected]>
Date:   Fri Feb 3 17:34:29 2023 +0800

    feat: refine meta election logic (risingwavelabs#7669)

    This PR refines a series of meta election related code:

    1. meta's addr supports multiple addresses
    2. meta client adds a meta address mode parameter to distinguish the behavior of election members found in loadbalance (kubernetes environment) and list (normal environment)

    Approved-By: yezizp2012

commit 496f7a9
Author: Noel Kwan <[email protected]>
Date:   Fri Feb 3 15:23:56 2023 +0800

    fix(sqlsmith): generate typed null (risingwavelabs#7679)

    Approved-By: fuyufjh

commit 50bd487
Author: Eric Fu <[email protected]>
Date:   Fri Feb 3 14:59:58 2023 +0800

    fix: bug of MetaNodeOpts (risingwavelabs#7681)

    It seems to be a mistake introduced in risingwavelabs#7658

    Approved-By: Gun9niR

commit f1c4558
Author: zwang28 <[email protected]>
Date:   Fri Feb 3 14:35:40 2023 +0800

    chore(log): suppress unnecessary warning (risingwavelabs#7676)

    Change this warning message into a debug message, because it's the expected behavior during backfill. Otherwise it can overwhelm the log stream during a large MV creation.

    Approved-By: wenym1

commit a97e94f
Author: Yi Zhang <[email protected]>
Date:   Fri Feb 3 14:12:16 2023 +0800

    feat(ci): add e2e test for iceberg sink (risingwavelabs#7631)

    This PR adds e2e tests for iceberg sink in PR & main branch workflows, which includes:
    - setting up the environment (creating a new bucket in hummock-minio with `mcli`, creating an iceberg table with `spark-sql`)
    - running `e2e_test/sink/iceberg_sink.slt`
    - checking the test results (reading from the iceberg table with `spark-sql` and matching the output)

    Approved-By: wenym1
    Approved-By: StrikeW
    Approved-By: tabVersion

commit b720f19
Author: Noel Kwan <[email protected]>
Date:   Fri Feb 3 12:03:46 2023 +0800

    feat(sqlsmith): gen implicit cast (risingwavelabs#7629)

    - [x] Gen for fixed func
    - [x] Gen for concat (note that this is implicit cast but in explicit context...)

    Approved-By: lmatz

    Co-Authored-By: Noel Kwan <[email protected]>
    Co-Authored-By: Noel Kwan <[email protected]>

commit 1f6b063
Author: ioperations <[email protected]>
Date:   Fri Feb 3 11:35:37 2023 +0800

    fix(sqlparser): fix operator precedence between '>=' and 'IN' (risingwavelabs#7665)

    Approved-By: kwannoel

    Co-Authored-By: Noel Kwan <[email protected]>
    Co-Authored-By: aodong.qin <[email protected]>
    Co-Authored-By: ioperations <[email protected]>

commit 704cc4f
Author: jon-chuang <[email protected]>
Date:   Fri Feb 3 11:12:45 2023 +0800

    feat(service-params): Rename `host -> listen_addr`, `client_addr -> advertise_addr` (risingwavelabs#7530)

    Rename `host -> listen_addr`, `client_addr -> advertise_addr` for clearer meaning of cmdline params.

    Approved-By: CAJan93

    Co-Authored-By: jon-chuang <[email protected]>
    Co-Authored-By: jon-chuang <[email protected]>

commit bb673d4
Author: jon-chuang <[email protected]>
Date:   Fri Feb 3 10:15:13 2023 +0800

    feat(frontend): define `ConstantEvalRewriter`, impl `ExprRewritable` for batch (risingwavelabs#7541)

    Next steps:
    - impl for stream nodes

    Depends on: risingwavelabs#7542

    Let's remember to enable `now.slt.part` after we enable `ConstantEvalRewriter`.

    Approved-By: chenzl25

commit 99ffb71
Author: Noel Kwan <[email protected]>
Date:   Thu Feb 2 23:59:02 2023 +0800

    fix(sqlsmith): generation of ambiguous `IN` `list` expression (risingwavelabs#7672)

    Generate `InList` expression with parenthesis in the prefix argument to avoid ambiguity.

    Approved-By: lmatz

commit 00b0e36
Author: August <[email protected]>
Date:   Thu Feb 2 22:18:53 2023 +0800

    chore(test): minor code refactoring in simulation test (risingwavelabs#7666)

    Approved-By: BugenZhao
    Approved-By: wangrunji0408

commit 9fd7721
Author: ZENOTME <[email protected]>
Date:   Thu Feb 2 20:46:16 2023 +0800

    feat(pgwire):support mix format in extended query mode (risingwavelabs#7622)

    as title. Solve the issue risingwavelabs#7605 and risingwavelabs#7599

    Approved-By: BowenXiao1999
    Approved-By: xiangjinwu

commit 477e30d
Author: Dylan <[email protected]>
Date:   Thu Feb 2 17:44:31 2023 +0800

    feat(streaming): support delta join on primary table (risingwavelabs#7662)

    - Support delta join on the primary table, because the primary table is also an index.

    Approved-By: st1page

commit e987106
Author: Zhidong Guo <[email protected]>
Date:   Thu Feb 2 16:42:58 2023 +0800

    feat(cli): fallback to env if CLI arg is absent (risingwavelabs#7658)

    As title.

    All envs will have the prefix of `RW`. Clap does not support prefixing env at the moment, so we have to specify the name manually 🥵. Serfig provides this functionality, but it still has the limitation of being unable to override with default value.

    As for backward compatibility, although the env names have changed, the CLI args remain the same.

    Approved-By: fuyufjh

commit 83e7f00
Author: August <[email protected]>
Date:   Thu Feb 2 16:06:43 2023 +0800

    fix(meta): fix meta_endpoint format when host not provided in playground mode (risingwavelabs#7661)

    Fix the `meta_endpoint` format when `host` is not provided in playground mode. Previously the endpoint was wrongly generated as `127.0.0.1:5690:5690`.

    Approved-By: lmatz
    Approved-By: tabVersion
    Approved-By: BugenZhao

commit e7fe72b
Author: Wallace <[email protected]>
Date:   Thu Feb 2 13:04:58 2023 +0800

    feat(storage): use finer granularity to monitor request latency (risingwavelabs#7586)

    We used to think the iterator latency of the state store could always exceed several hundred microseconds, but that was a mistake caused by the metrics. In fact, in most cases it is only several microseconds.

    Approved-By: Li0k

commit df85930
Author: CAJan93 <[email protected]>
Date:   Thu Feb 2 04:54:27 2023 +0100

    feat(docs): Add documentation on how to run RW with debugger (risingwavelabs#7652)

    Simple docs on how to run RW locally using a debugger. Hope you find it useful. For me it sometimes helps to step through the code line by line

    Approved-By: lmatz

commit 9b9e092
Author: waruto <[email protected]>
Date:   Thu Feb 2 09:49:38 2023 +0800

    refactor(source): refine some code of split reader (risingwavelabs#7644)

    - move the implementation of `SplitReaderV2` to the readers themselves.

    Approved-By: xx01cyx
    Approved-By: tabVersion

commit 0f7bfa2
Merge: 95085ee 3d9077b
Author: Wallace <[email protected]>
Date:   Wed Feb 1 22:17:14 2023 +0800

    Merge branch 'main' into refactor-bloom-filter

commit 95085ee
Author: Little-Wallace <[email protected]>
Date:   Fri Jan 20 11:53:21 2023 +0800

    rename trait

    Signed-off-by: Little-Wallace <[email protected]>

commit 660ad11
Author: Little-Wallace <[email protected]>
Date:   Fri Jan 20 11:43:42 2023 +0800

    fix test

    Signed-off-by: Little-Wallace <[email protected]>

commit c6d9ac3
Author: Little-Wallace <[email protected]>
Date:   Thu Jan 19 17:33:24 2023 +0800

    fix name

    Signed-off-by: Little-Wallace <[email protected]>

commit 523b5a4
Author: Little-Wallace <[email protected]>
Date:   Thu Jan 19 17:07:52 2023 +0800

    refactor builder

    Signed-off-by: Little-Wallace <[email protected]>

Signed-off-by: Little-Wallace <[email protected]>