
Handle lack of a temporary result file in the 'no output' warning #245

Closed
Totktonada opened this issue Nov 28, 2020 · 2 comments · Fixed by #246
Labels
bug Something isn't working

Comments

@Totktonada
Member

Steps to reproduce:

  1. Checkout tarantool-2.7.0-72-g33b0dd582.

  2. Patch it so that it does not exit after SIGTERM:

    diff --git a/src/main.cc b/src/main.cc
    index 2f48f474c..258d21a69 100644
    --- a/src/main.cc
    +++ b/src/main.cc
    @@ -171,7 +171,10 @@ signal_cb(ev_loop *loop, struct ev_signal *w, int revents)
     	 */
     	if (pid_file)
     		say_crit("got signal %d - %s", w->signum, strsignal(w->signum));
    +	say_crit("IGNORED signal %d - %s", w->signum, strsignal(w->signum));
    +#if 0
     	tarantool_exit(0);
    +#endif
     }
     
     static void

  3. Build:

    $ cmake . -DCMAKE_BUILD_TYPE=Debug -DENABLE_BACKTRACE=ON -DENABLE_DIST=ON -DENABLE_BUNDLED_LIBCURL=OFF && make -j

  4. Create a simple test:

    $ printf '42\n' > test/app/the-question.test.lua
    $ printf '42\n---\n- 42\n...\n' > test/app/the-question.result

  5. Run it:

    $ (cd test && ./test-run.py app/the-question.test.lua --debug)

Actual behaviour: test-run prints "No such file or directory: 'var/001_app/the-question.result'" along with a stack trace, and hangs while trying to show the 'No output during 10 seconds' warning.

More logs
[001] [2020-11-28 07:10:54.440643] DEBUG: [Instance app] Stopping the server...
[001] [2020-11-28 07:10:54.440886]  | Sending signal 15 (SIGTERM) to process 5689 [S (sleeping); tarantool app.lua <running>]
No output during 10 seconds. Will abort after 120 seconds without output. List of workers not reporting the status:
Traceback (most recent call last):
  File "./test-run.py", line 219, in <module>
    status = main_parallel()
  File "./test-run.py", line 134, in main_parallel
    res = main_loop_parallel()
  File "./test-run.py", line 106, in main_loop_parallel
    dispatcher.wait()
  File "/home/alex/projects/tarantool-meta/r/t-3/test-run/dispatcher.py", line 279, in wait
    objs = self.invoke_listeners(inputs, ready_inputs)
  File "/home/alex/projects/tarantool-meta/r/t-3/test-run/dispatcher.py", line 297, in invoke_listeners
    listener.process_timeout(self.report_timeout)
  File "/home/alex/projects/tarantool-meta/r/t-3/test-run/listeners.py", line 296, in process_timeout
    with open(task.task_tmp_result, 'r') as f:
IOError: [Errno 2] No such file or directory: 'var/001_app/the-question.result'

Expected behaviour: just report that the file does not exist.

The situation is a bit artificial, but we should handle this case in the code that prints the warning anyway.
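A minimal sketch of the expected behaviour, assuming the warning only needs a line count for each stuck worker. The helper name `describe_tmp_result` and its output format are hypothetical, not test-run's actual code. It counts lines in the temporary result file and reports a missing file instead of letting the error escape:

```python
import errno


def describe_tmp_result(path):
    """Return a status line for the 'no output' warning (hypothetical helper).

    Counts lines in the temporary result file, or reports that the file
    does not exist (e.g. it was already removed because the test passed).
    """
    try:
        with open(path, 'r') as f:
            count = sum(1 for _ in f)
    except IOError as e:
        # On Python 2 open() raises IOError for a missing file; on Python 3
        # IOError is an alias of OSError, which covers FileNotFoundError.
        if e.errno != errno.ENOENT:
            raise  # other I/O problems are still worth a traceback
        return '{}: file does not exist'.format(path)
    return '{}: {} lines'.format(path, count)
```

Catching the error on `open()`, rather than checking `os.path.exists()` first, presumably also avoids a race: the worker may remove the file right after such a check succeeds. Checking `errno.ENOENT` on `IOError` works on both Python 2 (the traceback above shows `IOError`) and Python 3.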


If you are working on this, you may also be interested in implementing #145.

@Totktonada Totktonada added the bug Something isn't working label Nov 28, 2020
@Totktonada Totktonada self-assigned this Nov 28, 2020
@Totktonada
Member Author

I found the problem while looking into PR #186, which came back to our attention because of https://github.com/tarantool/tarantool/issues/5573.

@Totktonada
Member Author

The temporary result file was removed because the test passed:

test-run/lib/test.py, lines 233 to 239 in 08a4817:

    elif (self.is_executed_ok and
          self.is_equal_result and
          self.is_valgrind_clean):
        short_status = 'pass'
        color_stdout("[ pass ]\n", schema='test_pass')
        if os.path.exists(self.tmp_result):
            os.remove(self.tmp_result)

But the worker gets stuck waiting for the server to stop.
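The hang itself presumably comes from an unbounded wait for the server to stop. A sketch of a bounded stop, assuming a `subprocess.Popen` handle on POSIX and Python 3.3+ (the name `stop_server` and the 5-second default are hypothetical, not test-run's API): send SIGTERM, wait for a while, then fall back to SIGKILL, which a process like the patched tarantool cannot ignore.

```python
import signal
import subprocess


def stop_server(proc, timeout=5.0):
    """Stop a child process with a bounded wait (hypothetical helper).

    Returns 'terminated' if the process exited after SIGTERM and 'killed'
    if it ignored SIGTERM for `timeout` seconds and had to be killed.
    """
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=timeout)
        return 'terminated'
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX: cannot be caught or ignored
        proc.wait()  # reap the child so it does not linger as a zombie
        return 'killed'
```

With such a fallback the worker eventually produces output again, so the warning is printed only if its delay is shorter than the stop timeout.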

Totktonada added a commit that referenced this issue Nov 29, 2020
When there is no output from workers for a long time (10 seconds by
default or 60 seconds when the --long argument is passed), test-run
prints a warning and shows the number of lines in the temporary result
file. This is useful for understanding which statement a test hangs on.

I reproduced the problem by patching tarantool to ignore the SIGTERM and
SIGINT signals and running a simple 'tarantool = core' test. The test
successfully passes, but the worker gets stuck waiting for the tarantool
server to stop.

This particular case should be resolved by PR #186, but only because the
timeout for stopping the server is less than the warning delay. This
assumption looks fragile, especially if we want to make some of those
timeouts / delays configurable. Let's handle the situation when the file
does not exist.

Found while looking into
https://github.com/tarantool/tarantool/issues/5573

Fixes #245
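A sketch of the timing argument in this commit message, with hypothetical names and defaults taken from the messages in this issue (a 10-second warning delay, 60 seconds with --long, and an abort after 120 seconds as printed in the log). After a passing test removes its temporary result file, the worker stays silent roughly until the server stop completes, so the warning can only hit the removed file if it fires before the stop finishes:

```python
def no_output_action(idle, warn_after=10.0, abort_after=120.0):
    """Hypothetical watchdog decision after `idle` seconds without output.

    Defaults mirror this issue: warn after 10 seconds (60 with --long)
    and abort after 120 seconds.
    """
    if idle >= abort_after:
        return 'abort'
    if idle >= warn_after:
        return 'warn'
    return None


def warning_may_see_removed_file(stop_timeout, warn_after):
    """Can the warning fire while the worker still waits for the server?

    The temporary result file is removed when the test passes, before the
    server is stopped. If stopping may take at least `warn_after` seconds,
    the warning can run against a file that no longer exists.
    """
    return stop_timeout >= warn_after
```

For example, with a hypothetical 5-second stop timeout, `warning_may_see_removed_file(5, 10)` is `False`, but `warning_may_see_removed_file(5, 2)` is `True` once the warning delay is made configurable and shorter. Handling the missing file removes the dependency on this ordering.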
avtikhon pushed a commit that referenced this issue Nov 30, 2020
avtikhon pushed a commit that referenced this issue Dec 1, 2020
avtikhon pushed a commit that referenced this issue Dec 3, 2020
Totktonada added a commit that referenced this issue Dec 4, 2020
Totktonada added a commit to tarantool/tarantool that referenced this issue Dec 4, 2020
This changeset fixes a problem that is unlikely to hit anyone, but in
theory it may be triggered by incorrect behaviour of tarantool.

In brief, if tarantool does not react to SIGTERM after executing all
tests and a test-run worker gets stuck waiting for the tarantool process
to terminate, the test-run listener fails when it attempts to access a
temporary result file that does not exist. See more details in [1].

[1]: tarantool/test-run#245
Totktonada added three more commits to tarantool/tarantool that referenced this issue Dec 4, 2020, each with the same message followed by "(cherry picked from commit e9579d5)".