Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DocDB] tserver crashed while inserting data into the cluster #10621

Closed
sonalsagarwal opened this issue Nov 17, 2021 · 1 comment
Closed

[DocDB] tserver crashed while inserting data into the cluster #10621

sonalsagarwal opened this issue Nov 17, 2021 · 1 comment
Assignees
Labels
area/docdb YugabyteDB core features kind/bug This issue is a bug priority/high High Priority

Comments

@sonalsagarwal
Copy link

sonalsagarwal commented Nov 17, 2021

Jira Link: DB-1006

Description

Observed this issue in the TPC-C 100k run, where one of the client threads failed while loading the data.
Its a 63-node cluster with 3 master nodes and 60 tserver nodes.
40 clients were loading 100k WH TPC-C data(each client loading 2500WH)
DB version for cluster : 2.9.2.0_b104

Flags :
memstore_size_mb 256
global_memstore_size_mb_max 4096
tserver_tcmalloc_max_total_thread_cache_bytes 1073741824
rocksdb_max_write_buffer_number 10

The reason for failure was : Connection closed error. This happened around 12:36:50.
Error in client :

Exception in thread “main" java.lang.RuntimeException: Failed to execute threads: org.postgresql.util.PSQLException: This connection has been closed.
	at com.oltpbenchmark.util.ThreadUtil.runNewPool(ThreadUtil.java:81)
	at com.oltpbenchmark.api.BenchmarkModule.loadDatabase(BenchmarkModule.java:221)
	at com.oltpbenchmark.DBWorkload.runLoader(DBWorkload.java:458)
	at com.oltpbenchmark.DBWorkload.main(DBWorkload.java:358)
Caused by: java.lang.RuntimeException: org.postgresql.util.PSQLException: This connection has been closed.
	at com.oltpbenchmark.api.Loader$LoaderThread.run(Loader.java:70)
	at com.oltpbenchmark.util.ThreadUtil$LatchRunnable.run(ThreadUtil.java:117)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.postgresql.util.PSQLException: This connection has been closed.
	at org.postgresql.jdbc.PgConnection.checkClosed(PgConnection.java:793)
	at org.postgresql.jdbc.PgConnection.commit(PgConnection.java:779)
	at com.oltpbenchmark.api.Loader$LoaderThread.run(Loader.java:63)
	... 4 more

Checked the logs of one of the tservers (172.151.58.196), it had below message in warning log file:

W1117 12:36:49.463980 35966 tablet_rpc.cc:450] Timed out (yb/rpc/outbound_call.cc:501): Failed UpdateTransaction: tablet_id: "ab44776dbdf14673a80f802529e32fc8" state { transaction_id: "\t_\257qz\213M\222\226|ic\321\304\001?" status: APPLYING tablets: "f29f0b0fba4d4320b028c7bfb875c42a" commit_hybrid_time: 6705777047196700672 sealed: false aborted { } } propagated_hybrid_time: 6705777067885903872, retrier: { task_id: -1 state: kIdle deadline: 7776.420s } to tablet ab44776dbdf14673a80f802529e32fc8 on tablet server { uuid: 6be590b5174b4a31967588d433a9bf19 private: [host: "172.151.48.222" port: 9100] cloud_info: placement_cloud: "aws" placement_region: "us-west-2" placement_zone: "us-west-2c" after 1 attempt(s): UpdateTransaction RPC (request call id 101668698) to 172.151.48.222:9100 timed out after 4.999s

On the server 172.151.48.222, there are no Errors. But this warning logs has the following message, which indicates some crash :

W1117 12:37:01.903275 47301 log_util.cc:221] Could not read footer for segment: /mnt/d1/yb-data/tserver/wals/table-fde6b822741e4ab890209eb89a2c9543/tablet-e7090f7b29da4a608f40b238835dbd3a/wal-000000022: Not found (yb/consensus/log_util.cc:463): Footer not found. Footer magic doesn't match
W1117 12:37:01.903398 47301 log_reader.cc:197] T e7090f7b29da4a608f40b238835dbd3a P 6be590b5174b4a31967588d433a9bf19: Log segment /mnt/d1/yb-data/tserver/wals/table-fde6b822741e4ab890209eb89a2c9543/tablet-e7090f7b29da4a608f40b238835dbd3a/wal-000000022 was likely left in-progress after a previous crash. Will try to rebuild footer by scanning data.
W1117 12:37:01.903419 47297 log_util.cc:221] Could not read footer for segment: /mnt/d1/yb-data/tserver/wals/table-fde6b822741e4ab890209eb89a2c9543/tablet-609afb43a2374b388ec87d931f35b6d9/wal-000000021: Not found (yb/consensus/log_util.cc:463): Footer not found. Footer magic doesn't match
W1117 12:37:01.903276 47303 log_util.cc:221] Could not read footer for segment: /mnt/d1/yb-data/tserver/wals/table-fde6b822741e4ab890209eb89a2c9543/tablet-ce7e3ec6df0942b59513366e0580ca1e/wal-000000024: Not found (yb/consensus/log_util.cc:463): Footer not found. Footer magic doesn't match

This tserver 172.151.48.222 had a crash and was restarted. The core dump is as below.

#0  rocksdb::Block::NumRestarts (this=this@entry=0x6ed06960) at ../../src/yb/rocksdb/table/block.cc:491
491	../../src/yb/rocksdb/table/block.cc: No such file or directory.
(gdb) bt
#0  rocksdb::Block::NumRestarts (this=this@entry=0x6ed06960) at ../../src/yb/rocksdb/table/block.cc:491
#1  0x00007f78d36f2b82 in rocksdb::Block::NewIterator (this=0x6ed06960, cmp=0x18346e50, key_value_encoding_format=rocksdb::kKeyDeltaEncodingSharedPrefix, iter=iter@entry=0x0, total_order_seek=total_order_seek@entry=true)
    at ../../src/yb/rocksdb/table/block.cc:522
#2  0x00007f78d36ea92f in rocksdb::BlockBasedTable::NewDataBlockIterator (this=0x192cf6cb0, ro=..., index_value=..., block_type=rocksdb::kIndex, input_iter=0x0) at ../../src/yb/rocksdb/table/block_based_table_reader.cc:1256
#3  0x00007f78d3704dde in InitSubIterator (parent_iter=0x31477e4c8, this=0x31477e480) at ../../src/yb/rocksdb/table/index_reader.cc:271
#4  DoSeek<std::_Bind<std::_Mem_fn<void (rocksdb::IteratorWrapper::*)()>(std::_Placeholder<1>)> > (seek_function=..., this=0x31477e480) at ../../src/yb/rocksdb/table/index_reader.cc:231
#5  rocksdb::MultiLevelIterator::SeekToFirst (this=0x31477e480) at ../../src/yb/rocksdb/table/index_reader.cc:165
#6  0x00007f78d372b816 in SeekToFirst (this=0x8028f110) at ../../src/yb/rocksdb/table/iterator_wrapper.h:127
#7  rocksdb::(anonymous namespace)::TwoLevelIterator::SeekToFirst (this=0x8028f0e0) at ../../src/yb/rocksdb/table/two_level_iterator.cc:130
#8  0x00007f78d3709595 in SeekToFirst (this=<optimized out>) at ../../src/yb/rocksdb/table/iterator_wrapper.h:127
#9  rocksdb::MergingIterator::SeekToFirst (this=0x5151f200) at ../../src/yb/rocksdb/table/merger.cc:97
#10 0x00007f78d35f3c9d in rocksdb::CompactionJob::ProcessKeyValueCompaction (this=this@entry=0x7f7589192d70, holder=holder@entry=0x7f7589192930, sub_compact=0xaae634a0) at ../../src/yb/rocksdb/db/compaction_job.cc:631
#11 0x00007f78d35f509a in rocksdb::CompactionJob::Run (this=this@entry=0x7f7589192d70) at ../../src/yb/rocksdb/db/compaction_job.cc:461
#12 0x00007f78d3631a32 in rocksdb::DBImpl::BackgroundCompaction (this=this@entry=0x17ff1200, made_progress=made_progress@entry=0x7f75891931e7, job_context=job_context@entry=0x7f7589193250, log_buffer=log_buffer@entry=0x7f7589193420,
    manual_compaction=manual_compaction@entry=0x0, compaction=...) at ../../src/yb/rocksdb/db/db_impl.cc:3626
#13 0x00007f78d36335f7 in rocksdb::DBImpl::BackgroundCallCompaction (this=0x17ff1200, m=0x0, compaction=..., compaction_task=0xd1dd7e80) at ../../src/yb/rocksdb/db/db_impl.cc:3351
#14 0x00007f78d36470c8 in rocksdb::DBImpl::CompactionTask::DoRun (this=<optimized out>, suspender=<optimized out>) at ../../src/yb/rocksdb/db/db_impl.cc:265
#15 0x00007f78d3639a79 in rocksdb::DBImpl::ThreadPoolTask::Run (this=0xd1dd7e80, status=..., suspender=<optimized out>) at ../../src/yb/rocksdb/db/db_impl.cc:219
#16 0x00007f78d024c868 in yb::(anonymous namespace)::PriorityThreadPoolWorker::Run (this=0x2417bb98) at ../../src/yb/util/priority_thread_pool.cc:213
#17 0x00007f78d027ff2f in operator() (this=0x3f041eb8) at /home/yugabyte/yb-software/yugabyte-2.9.2.0-b104-centos-x86_64/linuxbrew-xxxxxxxxxxxx/Cellar/gcc/5.5.0_4/include/c++/5.5.0/functional:2267
#18 yb::Thread::SuperviseThread (arg=0x3f041e60) at ../../src/yb/util/thread.cc:771
#19 0x00007f78ca51c694 in start_thread (arg=0x7f758919c700) at pthread_create.c:333
#20 0x00007f78c9c5941d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
@sonalsagarwal sonalsagarwal added the area/docdb YugabyteDB core features label Nov 17, 2021
@sonalsagarwal sonalsagarwal changed the title [DocDB] tserver crashed while insert data into the cluster [DocDB] tserver crashed while inserting data into the cluster Nov 17, 2021
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Jun 8, 2022
@bmatican bmatican assigned ttyusupov and rthallamko3 and unassigned bmatican Jul 19, 2022
@rthallamko3 rthallamko3 removed their assignment May 9, 2023
@yugabyte-ci yugabyte-ci added priority/high High Priority and removed priority/medium Medium priority issue labels Sep 22, 2023
@rthallamko3
Copy link
Contributor

DUP of #20852

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/docdb YugabyteDB core features kind/bug This issue is a bug priority/high High Priority
Projects
None yet
Development

No branches or pull requests

5 participants