Randomize test ports #6915

Merged on Dec 2, 2024 (2 commits)

@stephanos stephanos commented Nov 30, 2024

What changed?

Instead of using fixed ports in functional tests, use randomized ports.

The nested map is ... not pretty but effective. Happy to throw more design at it, if needed.

Why?

When starting two or more clusters (e.g. XDC tests) with fixed ports, each cluster has to use a different host so that the ports don't collide, and that causes issues on at least macOS.
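For readers unfamiliar with the technique, here is a minimal sketch of one common way to get a randomized port: bind to port 0 and let the OS pick. This is an illustration only; `freePort` is a hypothetical helper and not necessarily what `temporalite.NewPortProvider` does internally.

```go
package main

import (
	"fmt"
	"net"
)

// freePort is a hypothetical helper (not from this PR). It asks the OS for
// an unused TCP port on loopback by binding to port 0, then releases it.
func freePort() (int, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	// Closing the listener frees the port again, so another process could
	// in principle claim it before the caller binds to it.
	defer l.Close()
	return l.Addr().(*net.TCPAddr).Port, nil
}

func main() {
	port, err := freePort()
	if err != nil {
		panic(err)
	}
	fmt.Printf("127.0.0.1:%d\n", port)
}
```

With ports chosen this way, every cluster can share `127.0.0.1` instead of needing its own host.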

How did you test it?

Existing tests.

Potential risks

Documentation

Is hotfix candidate?

@stephanos stephanos force-pushed the onebox-random-ports branch 5 times, most recently from d577e22 to a73b5f8 Compare November 30, 2024 03:15
@stephanos stephanos marked this pull request as ready for review November 30, 2024 18:19
@stephanos stephanos requested a review from a team as a code owner November 30, 2024 18:19
@stephanos stephanos requested review from bergundy and dnr November 30, 2024 18:19

@bergundy bergundy left a comment

Approved with a few comments. Thanks!

@@ -949,3 +893,15 @@ func (c *TemporalImpl) overrideDynamicConfig(t *testing.T, name dynamicconfig.Ke
t.Cleanup(cleanup)
return cleanup
}

func portFromAddress(addr string) httpPort {

Add a `must` prefix to make it clear that this function panics.

Suggested change
- func portFromAddress(addr string) httpPort {
+ func mustPortFromAddress(addr string) httpPort {
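As a rough illustration of the `must` convention, here is a sketch of what such a helper could look like. The function body is not shown in this conversation, so everything below is an assumption: `httpPort` is modeled as a plain `int`, and parsing uses the standard library's `net.SplitHostPort` and `strconv.Atoi`.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// httpPort is assumed to be an integer type; the real definition is not
// shown in this conversation.
type httpPort int

// mustPortFromAddress extracts the port from a "host:port" address.
// Following the "must" convention, it panics instead of returning an error.
func mustPortFromAddress(addr string) httpPort {
	_, portStr, err := net.SplitHostPort(addr)
	if err != nil {
		panic(fmt.Errorf("invalid address %q: %w", addr, err))
	}
	port, err := strconv.Atoi(portStr)
	if err != nil {
		panic(fmt.Errorf("invalid port in %q: %w", addr, err))
	}
	return httpPort(port)
}

func main() {
	fmt.Println(mustPortFromAddress("127.0.0.1:7233")) // prints 7233
}
```

The prefix tells callers at a glance that a malformed address aborts the test rather than producing an error value to check.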

// allocate ports
pp := temporalite.NewPortProvider()
hostsByProtocolByService := map[transferProtocol]map[primitives.ServiceName]static.Hosts{
grpcProtocol: map[primitives.ServiceName]static.Hosts{

Suggested change
- grpcProtocol: map[primitives.ServiceName]static.Hosts{
+ grpcProtocol: {

pp := temporalite.NewPortProvider()
hostsByProtocolByService := map[transferProtocol]map[primitives.ServiceName]static.Hosts{
grpcProtocol: map[primitives.ServiceName]static.Hosts{
primitives.FrontendService: static.Hosts{All: makeAddresses(pp, options.FrontendConfig.NumFrontendHosts)},

Suggested change
- primitives.FrontendService: static.Hosts{All: makeAddresses(pp, options.FrontendConfig.NumFrontendHosts)},
+ primitives.FrontendService: {All: makeAddresses(pp, options.FrontendConfig.NumFrontendHosts)},

stephanos (Contributor, Author) replied:

Ha, I wonder why (my) Goland doesn't highlight this as redundant.
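For context on the two suggestions above: Go lets you omit the element type inside a composite literal when it is implied by the enclosing type, and `gofmt -s` applies that simplification. A small sketch with stand-in types (`protocol`, `service`, and `hosts` are hypothetical substitutes for `transferProtocol`, `primitives.ServiceName`, and `static.Hosts`):

```go
package main

import (
	"fmt"
	"reflect"
)

// Stand-in types for illustration only.
type protocol string
type service string
type hosts struct{ All []string }

// verboseHosts repeats the element types in the nested literals.
func verboseHosts() map[protocol]map[service]hosts {
	return map[protocol]map[service]hosts{
		"grpc": map[service]hosts{
			"frontend": hosts{All: []string{"127.0.0.1:7233"}},
		},
	}
}

// conciseHosts elides the redundant types; the value is identical.
func conciseHosts() map[protocol]map[service]hosts {
	return map[protocol]map[service]hosts{
		"grpc": {
			"frontend": {All: []string{"127.0.0.1:7233"}},
		},
	}
}

// sameHosts reports whether both spellings build equal values.
func sameHosts() bool {
	return reflect.DeepEqual(verboseHosts(), conciseHosts())
}

func main() {
	fmt.Println(sameHosts()) // prints true
}
```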

@stephanos stephanos enabled auto-merge (squash) December 2, 2024 19:15
@stephanos stephanos merged commit f1648b5 into temporalio:main Dec 2, 2024
49 checks passed
@stephanos stephanos deleted the onebox-random-ports branch December 2, 2024 19:40
@stephanos stephanos mentioned this pull request Dec 3, 2024
stephanos added a commit that referenced this pull request Dec 3, 2024
## What changed?

Follow-up to #6915: make sure clusters are started one by one.

## Why?

To make sure there are no port allocation conflicts for parallel cluster
starts.
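A minimal sketch of how cluster starts could be serialized, assuming a package-level mutex is enough. The names `startMu`, `startSerialized`, and `maxConcurrentStarts` are hypothetical and are not taken from the actual commit.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// startMu guards cluster startup so that only one cluster allocates and
// binds its ports at a time (hypothetical; not from the actual commit).
var startMu sync.Mutex

// startSerialized runs start while holding startMu.
func startSerialized(start func() error) error {
	startMu.Lock()
	defer startMu.Unlock()
	return start()
}

// maxConcurrentStarts launches n simulated cluster starts in parallel and
// returns the highest number observed running at the same time.
func maxConcurrentStarts(n int) int32 {
	var inFlight, maxSeen atomic.Int32
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = startSerialized(func() error {
				cur := inFlight.Add(1)
				if cur > maxSeen.Load() {
					maxSeen.Store(cur)
				}
				time.Sleep(time.Millisecond) // simulate startup work
				inFlight.Add(-1)
				return nil
			})
		}()
	}
	wg.Wait()
	return maxSeen.Load()
}

func main() {
	fmt.Println(maxConcurrentStarts(8)) // prints 1
}
```

The trade-off is that tests calling this in parallel now wait for each other during startup.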

## How did you test it?

## Potential risks

It can slow down functional test execution. I haven't measured yet how
long a cluster start takes.

If it turns out to be too long, I can go back to the drawing board on
this.

## Documentation

## Is hotfix candidate?
@stephanos stephanos mentioned this pull request Dec 11, 2024
2 tasks
stephanos added a commit that referenced this pull request Dec 12, 2024
## What changed?

New freeport implementation; taken from battle-tested Temporal projects.

## Why?

Follow-up to #6915, making it simpler and more robust.

## How did you test it?

- [x] works on Linux (i.e. CI)
- [x] works on Mac (i.e. my computer)

[First CI
run](https://github.com/temporalio/temporal/actions/runs/12269035797/attempts/1?pr=6966),
`TestNexusCallbackReplicated` failed with `Error while dialing: dial tcp
127.0.0.1:33279: connect: connection refused`. Looks like there was an
unrelated data race (that I've reported on Slack).

[Second CI
run](https://github.com/temporalio/temporal/actions/runs/12269035797/attempts/2?pr=6966)
passed.

[Third CI
run](https://github.com/temporalio/temporal/actions/runs/12269035797?pr=6966),
Versioning suite seems to have [timed
out](https://github.com/temporalio/temporal/actions/runs/12269035797/job/34233452135?pr=6966#step:8:966).
Unrelated.

So it seems to be fine from a random port perspective.

## Potential risks

## Documentation

## Is hotfix candidate?
stephanos added a commit that referenced this pull request Dec 20, 2024