-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move tarantoolctl to test-run tool submodule and add replication_sync_timeout to it #242
Conversation
Please, mark the first commit as part of #78. |
Please, also mention the issue in tarantool that motivates those changes. |
1638eaa
to
4320eff
Compare
Made all the changes as you suggested, please review. |
e4fc9fb
to
944f987
Compare
b190e0e
to
8928998
Compare
8928998
to
3a48123
Compare
Made several changes:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made some tweaks and now everything looks okay for me. I think we both spin around here enough time and the change can be pushed. I verified in-source and out-of-source builds before and after the corresponding tarantool patch and everything looks okay too.
Found a nice side effect: now we can run |
3a48123
to
976c836
Compare
test-run copies .tarantooctl file from a test/ directory to a worker's working directory (e.g. test/var/001_app). It would be quite intuitive if tarantoolctl would use this copy when operating on tarantool instances of the worker. Suprisingly, it is not so. tarantoolctl checks whether the file exists inside a directory pointed by the PWD environment variable, not a real current working directory. test-run did not set PWD prior to this commit, however everything work. Why? Because test-run.py is always executed from the test/ directory, which contains the same .tarantoolctl file. Even `make test` / `make test-force` targets issue commands like `cd <...>/test && <...>/test-run.py <...>` (where `cd` sets PWD). The time has come. Let's set PWD and use test/var/001_app/.tarantoolctl configuration for instances of the 001_app worker. Side note: it would be good to fix tarantoolctl too to use a real cwd, not PWD. This inconsistency was spotted, when we perform attempt to remove the test/.tarantoolctl config from the main tarantool repository, move it to the test-run repository, but keep copying into a worker directory. As a side effect, test-run now could be invoked without manual changing a current directory to test/. So, where we run a command like this (example for in-source build): | (cd test && ./test-run.py) The following command can be used: | ./test/test-run.py Nice! Sadly, the trick `Popen(<...>, env=dict(os.environ, PWD=self.vardir))` does not work for us, because we use os.putenv() in test-run and Python tests. os.putenv() does not update os.environ, so we would feed non-actual environment to the subprocess. Because of this, we just set PWD before and after Popen() call. Part of tarantool/tarantool#5504 Part of #78 Co-authored-by: Alexander V. Tikhonov <[email protected]>
Moved .tarantoolctl to test-run tool submodule repository as: <tarantool repository>/test-run/.tarantoolctl Also set backward compability to use old path location: <tarantool repository>/test/.tarantoolctl as its primary place. Needed for tarantool/tarantool#5504 Part of #78
Found that tests may fail due to hang in seek_wait() loop on starting and stopping instances. It happened, because instances were not synced within default output timeout which is by default 120 seconds, while replication sync happens only each 300 seconds by default. To fix it replication_sync_timeout should be decreased to some value lower than 'no-output-timeout', so decided to set it to 100 seconds. The issue looked like in tests: --- replication/gh-5140-qsync-casc-rollback.result Mon Oct 19 17:29:46 2020 +++ /rw_bins/test/var/070_replication/gh-5140-qsync-casc-rollback.result Mon Oct 19 17:31:35 2020 @@ -169,56 +169,3 @@ -- all the records are replayed one be one without yields for WAL writes, and -- nothing should change. test_run:cmd('restart server default') - | -test_run:cmd('restart server replica') - | --- - | - true - | ... - Part of tarantool/tarantool#5504
976c836
to
d36db6c
Compare
(Regarding the last force-push.) Fixed a typo in a commit message. |
See commits in the PR [1] for detailed description of the changes. User visible changes are the following. 1. Now test-run.py can be invoked from any directory without changing a current working directory to `test/`. 2. The `test/.tarantoolctl` configuration file is not mandatory and can be removed. It is shipped now within the test-run repository. 3. test-run sets the `replication_sync_timeout` box.cfg() option when the `test/.tarantoolctl` is not present in a parent repository. The value is controlled by the --replication-sync-timeout argument and defaults to 100 seconds (unlike tarantool's default, which is 300 seconds). The reason of the changes is to set default `replication_sync_timeout` for all tests to a value lower than `--no-output-timeout` (120 seconds) to allow instances to step into the orphan mode before this deadline and see more descriptive picture when it leads to failure of a test. What is also important, when a test fails before the `--no-output-timeout`, we able to restart it based on the `fragile` suite.ini option and / or collect artifacts to store them in CI. The `--no-output-timeout` deadline remains the show-stopper. We'll introduce a test execution timeout later to step into the general `--no-output-timeout` only in quite rare and unusual cases. The next commit will actually remove `test/.tarantoolctl`, so the new `replication_sync_timeout` will be in effect. [1]: tarantool/test-run#242 Part of #5504
See commits in the PR [1] for detailed description of the changes. User visible changes are the following. 1. Now test-run.py can be invoked from any directory without changing a current working directory to `test/`. 2. The `test/.tarantoolctl` configuration file is not mandatory and can be removed. It is shipped now within the test-run repository. 3. test-run sets the `replication_sync_timeout` box.cfg() option when the `test/.tarantoolctl` is not present in a parent repository. The value is controlled by the --replication-sync-timeout argument and defaults to 100 seconds (unlike tarantool's default, which is 300 seconds). The reason of the changes is to set default `replication_sync_timeout` for all tests to a value lower than `--no-output-timeout` (120 seconds) to allow instances to step into the orphan mode before this deadline and see more descriptive picture when it leads to failure of a test. What is also important, when a test fails before the `--no-output-timeout`, we able to restart it based on the `fragile` suite.ini option and / or collect artifacts to store them in CI. The `--no-output-timeout` deadline remains the show-stopper. We'll introduce a test execution timeout later to step into the general `--no-output-timeout` only in quite rare and unusual cases. The next commit will actually remove `test/.tarantoolctl`, so the new `replication_sync_timeout` will be in effect. [1]: tarantool/test-run#242 Part of #5504 (cherry picked from commit 6c8efa8)
See commits in the PR [1] for detailed description of the changes. User visible changes are the following. 1. Now test-run.py can be invoked from any directory without changing a current working directory to `test/`. 2. The `test/.tarantoolctl` configuration file is not mandatory and can be removed. It is shipped now within the test-run repository. 3. test-run sets the `replication_sync_timeout` box.cfg() option when the `test/.tarantoolctl` is not present in a parent repository. The value is controlled by the --replication-sync-timeout argument and defaults to 100 seconds (unlike tarantool's default, which is 300 seconds). The reason of the changes is to set default `replication_sync_timeout` for all tests to a value lower than `--no-output-timeout` (120 seconds) to allow instances to step into the orphan mode before this deadline and see more descriptive picture when it leads to failure of a test. What is also important, when a test fails before the `--no-output-timeout`, we able to restart it based on the `fragile` suite.ini option and / or collect artifacts to store them in CI. The `--no-output-timeout` deadline remains the show-stopper. We'll introduce a test execution timeout later to step into the general `--no-output-timeout` only in quite rare and unusual cases. The next commit will actually remove `test/.tarantoolctl`, so the new `replication_sync_timeout` will be in effect. [1]: tarantool/test-run#242 Part of #5504 (cherry picked from commit 6c8efa8)
See commits in the PR [1] for detailed description of the changes. User visible changes are the following. 1. Now test-run.py can be invoked from any directory without changing a current working directory to `test/`. 2. The `test/.tarantoolctl` configuration file is not mandatory and can be removed. It is shipped now within the test-run repository. 3. test-run sets the `replication_sync_timeout` box.cfg() option when the `test/.tarantoolctl` is not present in a parent repository. The value is controlled by the --replication-sync-timeout argument and defaults to 100 seconds (unlike tarantool's default, which is 300 seconds). The reason of the changes is to set default `replication_sync_timeout` for all tests to a value lower than `--no-output-timeout` (120 seconds) to allow instances to step into the orphan mode before this deadline and see more descriptive picture when it leads to failure of a test. What is also important, when a test fails before the `--no-output-timeout`, we able to restart it based on the `fragile` suite.ini option and / or collect artifacts to store them in CI. The `--no-output-timeout` deadline remains the show-stopper. We'll introduce a test execution timeout later to step into the general `--no-output-timeout` only in quite rare and unusual cases. The next commit will actually remove `test/.tarantoolctl`, so the new `replication_sync_timeout` will be in effect. [1]: tarantool/test-run#242 Part of #5504 (cherry picked from commit 6c8efa8)
Updated the test-run submodule in tarantool in the following commits: 2.7.0-97-g6c8efa849, 2.6.1-84-g0d738037b, 2.5.2-48-gfa36857a3, 1.10.8-33-g64e2fb849. |
Removed the |
Move tarantoolctl to test-run tool submodule
Moved tarantoolctl to test-run tool submodule repository as:
/test-run/.tarantoolctl
Also set backward compability to use old path location:
/test/.tarantoolctl
as its primary place.
To set PWD environment for tarantoolctl for popen() call tried to use
os.environ. It was needed for the later commits where tarantoolctl
would be moved from to test-run repository and its configuration file
would be read right from the working directory of the test. In this
case all os.putenv() calls changed to environment set with os.environ,
to be able to modify environment directly, check [1] ("Note: Calling
putenv() directly does not change os.environ"). But in this case all
the tests had to be changed to use os.environ instead of os.putenv(),
it will block the use of os.putenv() in the future tests. Decided to
avoid of this way and temporary change 'PWD' around the Popen() call.
Needed for test: flaky hang tests tarantool#5504
Part of Simplify configuration needed for setup tap / tarantool tests #78
[1] - https://docs.python.org/2/library/os.html#process-parameters
Setup replication_sync_timeout at .tarantoolctl
Found that tests may fail due to hang in seek_wait() loop on starting
and stopping instances. It happened, because instances were not synced
within default output timeout which is by default 120 seconds, while
replication sync happens only each 300 seconds by default. To fix it
replication_sync_timeout should be decreased to some value lower than
'no-output-timeout', so decided to set it to 100 seconds.
The issue looked like in tests:
Part of test: flaky hang tests tarantool#5504