This repository was archived by the owner on Jun 20, 2024. It is now read-only.

shorten smoke test execution time #942

Open
rade opened this issue Jun 17, 2015 · 7 comments

@rade
Member

rade commented Jun 17, 2015

Here's the timing of the tests, obtained from a recent CircleCI run

112.805s. 600_proxy_docker_py_test.sh
 46.653s. 130_expose_test.sh
 43.903s. 210_dns_multicast_test.sh
 31.475s. 500_weave_multi_cidr_test.sh
 22.787s. 280_with_or_without_dns_test.sh
 22.164s. 630_proxy_dns_test.sh
 20.838s. 100_cross_hosts_test.sh
 18.914s. 200_dns_test.sh
 16.206s. 640_proxy_restart_reattaches_test.sh
 15.922s. 150_connect_forget_test.sh
 15.914s. 220_dns_custom_test.sh
 15.385s. 270_use_name_as_hostname_test.sh
 13.828s. 260_dns_removal_test.sh
 10.957s. 660_proxy_ipam_test.sh
  9.847s. 240_dns_add_name_test.sh
  9.542s. 110_encryption_test.sh
  8.471s. 230_dns_unqualified_test.sh
  7.510s. 620_proxy_entrypoint_command_test.sh
  7.143s. 140_weave_local_test.sh
  6.952s. 290_dns_fallback_test.sh
  6.333s. 635_proxy_dns_unqualified_test.sh
  5.428s. 650_proxy_env_test.sh
  5.409s. 670_proxy_tls_test.sh
  5.274s. 610_proxy_wait_for_weave_test.sh

A few observations:

  • there is not much we can do to speed up 600_proxy_docker_py_test.sh. We didn't write it, and it does quite a lot. We could try to improve the code, but I doubt substantial gains will be easy to come by. Furthermore, very few changes should ever result in issues caught by that test, so I suggest we introduce a flag to disable it and any future tests with similar characteristics.
  • a few of the tests are dominated by image download times. This skews the results obtained from CircleCI, since the images are downloaded just once per shard, so whichever test happens to request an image first takes the hit. I suggest we make downloading of the two main test images part of run_all. This won't improve performance, but it will make the test times more consistent.
  • weave launch/launch-dns typically takes close to 2s. It should be half that. The reason is that we poll the HTTP interface of the container, and can only do so at an interval >= 1s, since POSIX sleep does not support fractional durations. We could check whether fractional sleep is supported, or check whether we are running with --local (and hence inside weavexec, which we know has fractional sleep).
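
The fractional-sleep check in the last bullet could look something like this (a minimal sketch; `POLL_INTERVAL` is a hypothetical name, not a variable from the weave script):

```shell
# Probe whether this shell's `sleep` accepts fractional durations, and
# pick the status-polling interval accordingly: 0.1s where supported,
# falling back to the POSIX-safe 1s otherwise.
if sleep 0.1 2>/dev/null; then
    POLL_INTERVAL=0.1
else
    POLL_INTERVAL=1
fi
echo "poll interval: ${POLL_INTERVAL}s"
```

On a busybox or strictly POSIX `sleep` the probe fails and the script keeps the 1s interval, so the sketch degrades safely.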
@rade rade added the chore label Jun 17, 2015
@rade
Member Author

rade commented Jun 17, 2015

Here's a neat way of getting fine-grained timing of shell execution:

$ WEAVE_DEBUG=1 weave launch |& while read line; do echo -n $(date -u -Ins); echo -e "\t$line"; done

Unfortunately this doesn't work for the test scripts, since assert.sh appears not to like being executed under sh -x.

Incidentally, the weave launch timing doesn't reveal anything unexpected. When I drop the wait_for_status sleep from 1s to 0.1s, overall execution time is just over a second. 0.2s is spent launching a docker image, which we do twice here - once for weavexec and once for weave. wait_for_status takes about 0.25s. That accounts for two thirds of the overall time; the rest is spread over all the other stuff we do in the script.

@rade
Member Author

rade commented Jun 17, 2015

assert.sh appears to not like being executed under sh -x.

Inserting set -x into config.sh just after we've sourced assert.sh gets around that.
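
The workaround can be sketched as follows (the dummy assert.sh written here merely stands in for the real one, whose contents aren't shown in this thread):

```shell
# Create a stand-in assert.sh so the sketch is self-contained.
cat > /tmp/assert.sh <<'EOF'
assert() { [ "$1" = "$2" ]; }
EOF

. /tmp/assert.sh   # source first, so tracing does not fire inside assert.sh
set -x             # then enable tracing for everything that follows
assert hello hello && echo "assertions work under tracing"
```

The point is the ordering: sourcing before `set -x` keeps assert.sh's own internals out of the trace output while still timing every command in the test body.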

A few observations from getting times for 130_*....

  • docker rm -f takes ages. Switching to docker kill; docker rm -f makes no difference.
  • launching a container takes about 3.5 seconds, vs 1s when I run the same command on my machine.
  • executing any weave command takes about 2 seconds
  • aside from the rm, nothing really sticks out; 130_* does a lot of container starts and runs a lot of weave commands, so it all just adds up.

Perhaps my test VM is somewhat anaemic.

@rade
Member Author

rade commented Jun 22, 2015

Most of our tests use just one host, so one way to shorten the execution time would be to make the test runner act as that host. I hacked this into config.sh, which cut the execution time of all single-host test cases from 14 minutes to 4 minutes on my machine.
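
The shape of that hack is roughly this (a sketch only; `HOST1` and `USE_LOCAL_HOST` are assumed names, and the actual config.sh change isn't shown in this thread):

```shell
# Default to the remote test VM, but let an opt-in flag point the first
# (and for most tests, only) host at the runner itself, skipping the
# VM round-trips entirely.
HOST1=${HOST1:-test-vm-1}
if [ "${USE_LOCAL_HOST:-0}" = "1" ]; then
    HOST1=localhost
fi
echo "single-host tests will run against: $HOST1"
```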

@tomwilkie
Contributor

tomwilkie commented Jun 22, 2015

Note circle has an ancient version of docker & ubuntu, so this is unlikely to work there.

@tomwilkie
Contributor

I was working on some improvements to gce.sh to make setup faster.

It currently takes ~2mins. My hypothesis was that we could shave off ~20s by making a single call to fetch all the IPs from GCE, and ssh'ing into each VM only once to set up the hosts file.
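
The batching idea can be illustrated without touching GCE at all (this is not the real gce.sh; the `name ip` listing below just mimics the shape a single `gcloud compute instances list` call would return for all VMs at once):

```shell
# Turn one combined "name ip" listing into an /etc/hosts fragment,
# which could then be pushed to each VM with a single ssh, instead of
# querying GCE and ssh'ing once per host.
hosts_fragment() {
    while read -r name ip; do
        printf '%s %s\n' "$ip" "$name"
    done
}

hosts_fragment <<'EOF'
host1 10.0.0.1
host2 10.0.0.2
EOF
# → 10.0.0.1 host1
#   10.0.0.2 host2
```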

Work in progress is here: 97d0d83

@rade
Member Author

rade commented Jul 14, 2015

A few observations from getting times for 130_*....

  • docker rm -f takes ages. Switching to docker kill; docker rm -f makes no difference.
  • launching a container takes about 3.5 seconds, vs 1s when I run the same command on my machine.
  • executing any weave command takes about 2 seconds
  • aside from the rm, nothing really sticks out; 130_* does a lot of container starts and runs a lot of weave commands, so it all just adds up.

Perhaps my test VM is somewhat anaemic.

Or perhaps it was using the worst docker storage driver ever! Fixed by #1125.

@rade
Member Author

rade commented Jul 14, 2015

  • a few of the tests are dominated by image download times. This skews the results obtained from CircleCI, since the images are downloaded just once per shard, so whichever test happens to request an image first takes the hit. I suggest we make downloading of the two main test images part of run_all. This won't improve performance, but it will make the test times more consistent.

Fixed in #989.

  • weave launch/launch-dns typically takes close to 2s. It should be half that. The reason is that we poll the HTTP interface of the container, and can only do so at an interval >= 1s, since POSIX sleep does not support fractional durations. We could check whether fractional sleep is supported, or check whether we are running with --local (and hence inside weavexec, which we know has fractional sleep).

Fixed in #992.

@rade rade added the icebox label Dec 29, 2015
@rade rade removed the icebox label Jul 4, 2016
@rade rade modified the milestone: icebox Jul 4, 2016