Service Timing Experiments

We discovered slow networking on GKE and reported the issue. This series of experiments investigates different aspects of it.

run1 was my original testing done on GKE to report the issue (May 4, 2023)
run2 was a second test that ran nslookup under different cluster flags (May 16, 2023)
run3 ran telnet in the worker pod to look at connection times/patterns to the broker leader (index 0)
run4 used a test deployment of the operator that wrapped flux start with strace
run5 attempts to remove DNS by getting pod IP addresses and writing them into /etc/hosts
run6 puts together the best practices we learned and reproduces the run1 experiments with improvements (May 17, 2023)
run7 is the same as run6 but adds CoreDNS back to see if it replicates the original error
run8 was one more attempt to reproduce the issue (done, with one huge timeout)
run9 was the final case to replicate (it did)
run10 is the equivalent experiment but scaled up to a larger cluster
run11 contains results from Dmitri on the Google networking team
run12 is a small run that tests the original experiment with only one hostname
run13 tests more random configurations, hoping for insight
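
The run5 idea (skip DNS by writing pod IP addresses into /etc/hosts) can be sketched roughly as below. This is only an illustration and not the script used in run5: the function names, the hostnames, and the IP addresses are all made up for the example, and how the pod IPs are collected is left out.

```python
def render_hosts_entries(pod_ips):
    """Render /etc/hosts lines from a {hostname: ip} mapping.

    The mapping is a hypothetical input; in practice it would come from
    whatever lists the pod IP addresses in the cluster.
    """
    # One "<ip><TAB><hostname>" line per pod, sorted by hostname so the
    # output is stable between runs.
    return "".join(
        f"{ip}\t{hostname}\n" for hostname, ip in sorted(pod_ips.items())
    )


def append_hosts(entries, path="/etc/hosts"):
    """Append the rendered entries to the hosts file (usually needs root)."""
    with open(path, "a") as handle:
        handle.write(entries)


if __name__ == "__main__":
    # Made-up hostnames and addresses, for illustration only.
    example = {
        "worker-1.example-service": "10.0.0.5",
        "worker-0.example-service": "10.0.0.4",
    }
    # Print instead of writing so the sketch is safe to run anywhere.
    print(render_hosts_entries(example), end="")
```

With entries like these in place, a hostname lookup that consults /etc/hosts first would not need to go out to the cluster DNS, which is the effect run5 was after.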