Randomize test ports #6915
Approved with a few comments. Thanks!
tests/testcore/onebox.go
@@ -949,3 +893,15 @@ func (c *TemporalImpl) overrideDynamicConfig(t *testing.T, name dynamicconfig.Ke
t.Cleanup(cleanup)
return cleanup
}

func portFromAddress(addr string) httpPort {
Add `must` to the name to clarify that this panics.
Suggested change:
- func portFromAddress(addr string) httpPort {
+ func mustPortFromAddress(addr string) httpPort {
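The review doesn't show the function body, so here is a minimal, hypothetical sketch of what a `must`-style `mustPortFromAddress` could look like. `httpPort` is assumed to be an integer type, and parsing via `net.SplitHostPort` is an assumption, not the PR's code.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// httpPort is assumed to be an integer type; its real definition is not part of the review.
type httpPort int

// mustPortFromAddress returns the port of a "host:port" address. Following
// Go's Must* naming convention, it panics instead of returning an error.
func mustPortFromAddress(addr string) httpPort {
	_, portStr, err := net.SplitHostPort(addr)
	if err != nil {
		panic(fmt.Sprintf("invalid address %q: %v", addr, err))
	}
	// Parse as an unsigned 16-bit value so out-of-range ports are rejected.
	port, err := strconv.ParseUint(portStr, 10, 16)
	if err != nil {
		panic(fmt.Sprintf("invalid port in address %q: %v", addr, err))
	}
	return httpPort(port)
}

func main() {
	fmt.Println(mustPortFromAddress("127.0.0.1:7243")) // prints 7243
}
```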
tests/testcore/test_cluster.go
// allocate ports
pp := temporalite.NewPortProvider()
hostsByProtocolByService := map[transferProtocol]map[primitives.ServiceName]static.Hosts{
grpcProtocol: map[primitives.ServiceName]static.Hosts{
Suggested change:
- grpcProtocol: map[primitives.ServiceName]static.Hosts{
+ grpcProtocol: {
tests/testcore/test_cluster.go
pp := temporalite.NewPortProvider()
hostsByProtocolByService := map[transferProtocol]map[primitives.ServiceName]static.Hosts{
grpcProtocol: map[primitives.ServiceName]static.Hosts{
primitives.FrontendService: static.Hosts{All: makeAddresses(pp, options.FrontendConfig.NumFrontendHosts)},
Suggested change:
- primitives.FrontendService: static.Hosts{All: makeAddresses(pp, options.FrontendConfig.NumFrontendHosts)},
+ primitives.FrontendService: {All: makeAddresses(pp, options.FrontendConfig.NumFrontendHosts)},
Ha, I wonder why (my) GoLand doesn't highlight this as redundant.
## What changed?
Follow-up to #6915: making sure clusters are started one by one.
## Why?
To make sure there are no port allocation conflicts when clusters are started in parallel.
## How did you test it?
## Potential risks
It can slow down functional test execution. I haven't yet measured how long a cluster start takes; if it turns out to be too long, I can go back to the drawing board on this.
## Documentation
## Is hotfix candidate?
## What changed?
New freeport implementation, taken from battle-tested Temporal projects.
## Why?
Follow-up to #6915, making it simpler and more robust.
## How did you test it?
- [x] works on Linux (i.e. CI)
- [x] works on Mac (i.e. my computer)

[First CI run](https://github.com/temporalio/temporal/actions/runs/12269035797/attempts/1?pr=6966): `TestNexusCallbackReplicated` failed with `Error while dialing: dial tcp 127.0.0.1:33279: connect: connection refused`. This looks like an unrelated data race (which I've reported in Slack).

[Second CI run](https://github.com/temporalio/temporal/actions/runs/12269035797/attempts/2?pr=6966) passed.

[Third CI run](https://github.com/temporalio/temporal/actions/runs/12269035797?pr=6966): the Versioning suite seems to have [timed out](https://github.com/temporalio/temporal/actions/runs/12269035797/job/34233452135?pr=6966#step:8:966). Unrelated.

So it seems to be fine from a random-port perspective.
## Potential risks
## Documentation
## Is hotfix candidate?
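The PR body doesn't include the implementation. A technique some free-port helpers use, sketched here as an assumption rather than as the code from this PR: listen on port 0 so the OS picks an ephemeral port, then connect to it and close the connection from the server side first. The server's end of that connection, which is bound to the port, then lingers in TIME_WAIT; the intent is to discourage the OS from handing the same port to another `:0` bind for a while, while an explicit bind still works because Go sets `SO_REUSEADDR` on listening sockets on Unix-like systems. `getFreePort` and `mustGetFreePort` are hypothetical names.

```go
package main

import (
	"fmt"
	"net"
	"runtime"
)

// getFreePort returns a TCP port on host that was free at the time of the call.
func getFreePort(host string) (int, error) {
	l, err := net.Listen("tcp", net.JoinHostPort(host, "0"))
	if err != nil {
		return 0, fmt.Errorf("failed to assign a free port: %w", err)
	}
	defer l.Close()
	addr := l.Addr().(*net.TCPAddr)

	// Skipped on Windows, where TIME_WAIT and port-reuse semantics differ.
	if runtime.GOOS != "windows" {
		client, err := net.DialTCP("tcp", nil, addr)
		if err != nil {
			return 0, fmt.Errorf("failed to dial free port: %w", err)
		}
		defer client.Close()
		server, err := l.Accept()
		if err != nil {
			return 0, fmt.Errorf("failed to accept on free port: %w", err)
		}
		// Closing the server side first makes it the active closer, so its
		// end of the connection (bound to the port) ends up in TIME_WAIT.
		server.Close()
	}
	return addr.Port, nil
}

// mustGetFreePort is the panicking variant for use in test setup.
func mustGetFreePort(host string) int {
	port, err := getFreePort(host)
	if err != nil {
		panic(err)
	}
	return port
}

func main() {
	fmt.Println(mustGetFreePort("127.0.0.1"))
}
```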
What changed?
Instead of using fixed ports in functional tests, use randomized ports.
The nested map is ... not pretty but effective. Happy to throw more design at it, if needed.
Why?
When starting two or more clusters (e.g. in XDC tests), each cluster has to bind to a different host address so that its fixed ports don't collide with the other clusters' ports; this causes issues on at least macOS.
How did you test it?
Existing tests.
Potential risks
Documentation
Is hotfix candidate?