Flaky test FilterProjectReplayerTest.projectOnly in velox_tool_trace_test crashes #12071

Closed
kevinwilfong opened this issue Jan 13, 2025 · 3 comments

Comments

@kevinwilfong
Contributor

Bug description

I frequently see FilterProjectReplayerTest.projectOnly in velox_tool_trace_test in the "Build with GCC / Linux release with adapters" CI job crash with a SIGSEGV on PRs.

*** Signal 11 (SIGSEGV) (0x0) received by PID 83215 (pthread TID 0x7f475a704b80) (linux TID 83215) (code: address not mapped to object), stack trace: ***
(error retrieving stack trace)
Fatal signal handler. ThreadDebugInfo object not found.

Examples (multiple PRs from multiple authors):
https://github.com/facebookincubator/velox/actions/runs/12745207168/job/35518716720
https://github.com/facebookincubator/velox/actions/runs/12741414722/job/35507977698
https://github.com/facebookincubator/velox/actions/runs/12719384670/job/35459508390

System information

"Build with GCC / Linux release with adapters" CI job

Relevant logs

No response

@kevinwilfong added the bug and triage labels on Jan 13, 2025
@kevinwilfong
Contributor Author

I found this stack trace in one of the failing runs:

*** Aborted at 1736561111 (Unix time, try 'date -d @1736561111') ***
*** Signal 11 (SIGSEGV) (0x0) received by PID 85813 (pthread TID 0x7f3da04f7340) (linux TID 85813) (code: address not mapped to object), stack trace: ***
@ 000000000cf0cd48 _ZN5folly10symbolizer12_GLOBAL__N_118innerSignalHandlerEiP9siginfo_tPv
/home/runner/work/velox/velox/velox/_build/debug/_deps/folly-src/folly/experimental/symbolizer/SignalHandler.cpp:453
@ 000000000cf0ce2e _ZN5folly10symbolizer12_GLOBAL__N_113signalHandlerEiP9siginfo_tPv
/home/runner/work/velox/velox/velox/_build/debug/_deps/folly-src/folly/experimental/symbolizer/SignalHandler.cpp:474
@ 000000000004251f (unknown)
@ 000000000c34292a _ZNKSt12__shared_ptrIN8facebook5velox9RowVectorELN9__gnu_cxx12_Lock_policyE2EE3getEv
/usr/include/c++/11/bits/shared_ptr_base.h:1296
-> /home/runner/work/velox/velox/velox/velox/tool/trace/tests/AggregationReplayerTest.cpp
@ 000000000c33ecb5 _ZNKSt19__shared_ptr_accessIN8facebook5velox9RowVectorELN9__gnu_cxx12_Lock_policyE2ELb0ELb0EE6_M_getEv
/usr/include/c++/11/bits/shared_ptr_base.h:993
-> /home/runner/work/velox/velox/velox/velox/tool/trace/tests/AggregationReplayerTest.cpp
@ 000000000c33b2db _ZNKSt19__shared_ptr_accessIN8facebook5velox9RowVectorELN9__gnu_cxx12_Lock_policyE2ELb0ELb0EEptEv
/usr/include/c++/11/bits/shared_ptr_base.h:987
-> /home/runner/work/velox/velox/velox/velox/tool/trace/tests/AggregationReplayerTest.cpp
@ 000000000cc7dea9 _ZN8facebook5velox4tool5trace21TraceReplayTaskRunner4copyERKSt6vectorISt10shared_ptrINS0_9RowVectorEESaIS7_EE
/home/runner/work/velox/velox/velox/velox/tool/trace/TraceReplayTaskRunner.cpp:48
@ 000000000cc7dd13 _ZN8facebook5velox4tool5trace21TraceReplayTaskRunner3runEb
/home/runner/work/velox/velox/velox/velox/tool/trace/TraceReplayTaskRunner.cpp:35
@ 000000000cc590d4 _ZN8facebook5velox4tool5trace20OperatorReplayerBase3runEb
/home/runner/work/velox/velox/velox/velox/tool/trace/OperatorReplayerBase.cpp:87
@ 000000000c352b77 _ZN8facebook5velox4tool5trace4test42FilterProjectReplayerTest_projectOnly_Test8TestBodyEv
/home/runner/work/velox/velox/velox/velox/tool/trace/tests/FilterProjectReplayerTest.cpp:284
@ 000000001796619a _ZN7testing8internal38HandleSehExceptionsInMethodIfSupportedINS_4TestEvEET0_PT_MS4_FS3_vEPKc
/home/runner/work/velox/velox/velox/_build/debug/_deps/gtest-src/googletest/src/gtest.cc:2621
-> /home/runner/work/velox/velox/velox/_build/debug/_deps/gtest-src/googletest/src/gtest-all.cc
@ 000000001795ff54 _ZN7testing8internal35HandleExceptionsInMethodIfSupportedINS_4TestEvEET0_PT_MS4_FS3_vEPKc
/home/runner/work/velox/velox/velox/_build/debug/_deps/gtest-src/googletest/src/gtest.cc:2657
-> /home/runner/work/velox/velox/velox/_build/debug/_deps/gtest-src/googletest/src/gtest-all.cc
@ 000000001793d32f _ZN7testing4Test3RunEv
/home/runner/work/velox/velox/velox/_build/debug/_deps/gtest-src/googletest/src/gtest.cc:2696
-> /home/runner/work/velox/velox/velox/_build/debug/_deps/gtest-src/googletest/src/gtest-all.cc
@ 000000001793de4a _ZN7testing8TestInfo3RunEv
/home/runner/work/velox/velox/velox/_build/debug/_deps/gtest-src/googletest/src/gtest.cc:2845
-> /home/runner/work/velox/velox/velox/_build/debug/_deps/gtest-src/googletest/src/gtest-all.cc
@ 000000001793e797 _ZN7testing9TestSuite3RunEv
/home/runner/work/velox/velox/velox/_build/debug/_deps/gtest-src/googletest/src/gtest.cc:3004
-> /home/runner/work/velox/velox/velox/_build/debug/_deps/gtest-src/googletest/src/gtest-all.cc
@ 000000001794ec0f _ZN7testing8internal12UnitTestImpl11RunAllTestsEv
/home/runner/work/velox/velox/velox/_build/debug/_deps/gtest-src/googletest/src/gtest.cc:5890
-> /home/runner/work/velox/velox/velox/_build/debug/_deps/gtest-src/googletest/src/gtest-all.cc
@ 00000000179670ab _ZN7testing8internal38HandleSehExceptionsInMethodIfSupportedINS0_12UnitTestImplEbEET0_PT_MS4_FS3_vEPKc
/home/runner/work/velox/velox/velox/_build/debug/_deps/gtest-src/googletest/src/gtest.cc:2621
-> /home/runner/work/velox/velox/velox/_build/debug/_deps/gtest-src/googletest/src/gtest-all.cc
@ 000000001796100c _ZN7testing8internal35HandleExceptionsInMethodIfSupportedINS0_12UnitTestImplEbEET0_PT_MS4_FS3_vEPKc
/home/runner/work/velox/velox/velox/_build/debug/_deps/gtest-src/googletest/src/gtest.cc:2657
-> /home/runner/work/velox/velox/velox/_build/debug/_deps/gtest-src/googletest/src/gtest-all.cc
@ 000000001794d1f6 _ZN7testing8UnitTest3RunEv
/home/runner/work/velox/velox/velox/_build/debug/_deps/gtest-src/googletest/src/gtest.cc:5455
-> /home/runner/work/velox/velox/velox/_build/debug/_deps/gtest-src/googletest/src/gtest-all.cc
@ 000000000c3c156f _Z13RUN_ALL_TESTSv
/home/runner/work/velox/velox/velox/_build/debug/_deps/gtest-src/googletest/include/gtest/gtest.h:2314
-> /home/runner/work/velox/velox/velox/velox/tool/trace/tests/TableWriterReplayerTest.cpp
@ 000000000c3bc40a main
/home/runner/work/velox/velox/velox/velox/tool/trace/tests/TableWriterReplayerTest.cpp:524
@ 0000000000029d8f (unknown)
@ 0000000000029e3f __libc_start_main
@ 000000000c32ada4 _start
Fatal signal handler. ThreadDebugInfo object not found.

@kgpai
Contributor

kgpai commented Jan 13, 2025

cc: @duanmeng

@duanmeng duanmeng self-assigned this Jan 14, 2025
@jinchengchenghh
Contributor

I hit a different exception in this test. cc @duanmeng
https://github.com/facebookincubator/velox/actions/runs/12823939338/job/35759249546?pr=11869

[ RUN      ] FilterProjectReplayerTest.projectOnly
E20250117 07:12:05.974892 79179 Exceptions.h:66] Line: /__w/velox/velox/velox/common/memory/SharedArbitrator.cpp:430, Function:addPool, Expression: participants_.count(pool->name()) == 0 (1 vs. 0) Memory pool FilterProject_replayer already exists, Source: RUNTIME, ErrorCode: INVALID_STATE
unknown file: Failure
C++ exception with description "Exception: VeloxRuntimeError
Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: (1 vs. 0) Memory pool FilterProject_replayer already exists
Retriable: False
Expression: participants_.count(pool->name()) == 0
Function: addPool
File: /__w/velox/velox/velox/common/memory/SharedArbitrator.cpp
Line: 430
Stack trace:
# 0  _ZN8facebook5velox7process10StackTraceC1Ei
# 1  _ZN8facebook5velox14VeloxExceptionC1EPKcmS3_St17basic_string_viewIcSt11char_traitsIcEES7_S7_S7_bNS1_4TypeES7_
# 2  _ZN8facebook5velox6detail14veloxCheckFailINS0_17VeloxRuntimeErrorERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEvRKNS1_18VeloxCheckFailArgsET0_
# 3  _ZN8facebook5velox6memory16SharedArbitrator7addPoolERKSt10shared_ptrINS1_10MemoryPoolEE
# 4  _ZN8facebook5velox6memory13MemoryManager14createRootPoolENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERSt10unique_ptrINS1_15MemoryReclaimerESt14default_deleteISA_EERNS1_10MemoryPool7OptionsE
# 5  _ZN8facebook5velox6memory13MemoryManager11addRootPoolERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEElSt10unique_ptrINS1_15MemoryReclaimerESt14default_deleteISC_EE
# 6  _ZN8facebook5velox4tool5trace20OperatorReplayerBase14createQueryCtxEv
# 7  _ZN8facebook5velox4tool5trace20OperatorReplayerBase3runEb
# 8  _ZN8facebook5velox4tool5trace4test42FilterProjectReplayerTest_projectOnly_Test8TestBodyEv
# 9  _ZN7testing8internal35HandleExceptionsInMethodIfSupportedINS_4TestEvEET0_PT_MS4_FS3_vEPKc
# 10 _ZN7testing4Test3RunEv
# 11 _ZN7testing8TestInfo3RunEv
# 12 _ZN7testing9TestSuite3RunEv
# 13 _ZN7testing8internal12UnitTestImpl11RunAllTestsEv
# 14 _ZN7testing8UnitTest3RunEv
# 15 main
# 16 __libc_start_call_main
# 17 __libc_start_main
# 18 _start
" thrown in the test body.
[  FAILED  ] FilterProjectReplayerTest.projectOnly (130 ms)
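
The failing check in the log above, `participants_.count(pool->name()) == 0`, rejects a root pool whose name is already registered. A minimal sketch of that invariant follows. `PoolRegistry`, `removePool`, and `duplicateIsRejected` are hypothetical names used for illustration only, not Velox APIs, and the thread does not say why the name `FilterProject_replayer` was still registered.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a component that tracks memory pools by name.
// This is NOT the Velox SharedArbitrator; it only mirrors the logged check.
class PoolRegistry {
 public:
  // Throws if a pool with the same name is still registered.
  void addPool(const std::string& name) {
    if (participants_.count(name) != 0) {
      throw std::runtime_error("Memory pool " + name + " already exists");
    }
    participants_.insert(name);
  }

  // Unregisters a pool so its name can be reused.
  void removePool(const std::string& name) {
    participants_.erase(name);
  }

 private:
  std::set<std::string> participants_;
};

// Returns true if registering the same name twice, without removing it
// in between, is rejected with an exception.
bool duplicateIsRejected(const std::string& name) {
  PoolRegistry registry;
  registry.addPool(name);
  try {
    registry.addPool(name);
  } catch (const std::runtime_error&) {
    return true;
  }
  return false;
}
```

Under this sketch, a fixed pool name is only safe to reuse once the earlier pool with that name has been unregistered.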

facebook-github-bot pushed a commit that referenced this issue Jan 21, 2025
Summary:
Disable the tool trace test until a fix is found for issue #12071.

Pull Request resolved: #12136

Reviewed By: amitkdutta

Differential Revision: D68448746

Pulled By: kgpai

fbshipit-source-id: 99423db1632be497257b80dc87b9c926ac0e07bb
facebook-github-bot pushed a commit that referenced this issue Jan 22, 2025
Summary:
Enable the Velox trace tool test, as the flaky issue described in #12071 was fixed and merged.

Details in #12136 and #12124

Pull Request resolved: #12139

Reviewed By: gggrace14

Differential Revision: D68471205

Pulled By: xiaoxmeng

fbshipit-source-id: 80b3df271c3622b88e2f2296b1ff07f38e2ff514
duanmeng added a commit to duanmeng/velox that referenced this issue Feb 2, 2025
…tor#12124)

Summary:
The trace data file for some drivers might be empty, so the replay
results for those drivers are empty, which causes a segmentation
fault when `results[0]` is used to get the output type. We should
instead take it from `TraceReplayTaskRunner::cursorParams_::planNode`.

Fix facebookincubator#12071

Pull Request resolved: facebookincubator#12124

Reviewed By: kgpai

Differential Revision: D68428191

Pulled By: xiaoxmeng

fbshipit-source-id: d06a0306783d942eb968ec06ed1f8667949bc921
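
The fix summarized in the commit above replaces a read of `results[0]` with a value taken from the plan node. A minimal sketch of that pattern follows, assuming simplified types: `Batch`, `PlanNode`, `Copied`, and `copyResults` are hypothetical stand-ins for illustration, not the actual `TraceReplayTaskRunner` code.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins; these are not Velox types.
struct Batch {
  std::string type;
};
using BatchPtr = std::shared_ptr<Batch>;

struct PlanNode {
  std::string outputType;
};

struct Copied {
  std::string type;
  std::size_t numBatches;
};

// Unsafe pattern (not compiled here): `results[0]->type` is undefined
// behavior when `results` is empty, which matches the crash in
// std::shared_ptr::get() seen in the stack trace.
//
// Safer pattern: take the output type from the plan node, which is
// available even when a driver produced no output.
Copied copyResults(
    const PlanNode& plan,
    const std::vector<BatchPtr>& results) {
  Copied out{plan.outputType, 0};
  for (const auto& batch : results) {
    if (batch != nullptr) {
      ++out.numBatches;
    }
  }
  return out;
}
```

With this shape, an empty `results` vector yields a `Copied` value with the plan's output type and zero batches instead of dereferencing a missing element.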