Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple network interface binding #281

Merged
merged 16 commits into from
May 3, 2023

Conversation

al-rigazzi
Copy link
Collaborator

@al-rigazzi al-rigazzi commented Apr 26, 2023

This PR adds the possibility of binding Orchestrators and co-located DBs to multiple network interfaces. It also adds the --bind-source-addr argument to any Redis DB we deploy, making sure shards bind to an address we know a priori.

@codecov
Copy link

codecov bot commented Apr 26, 2023

Codecov Report

Merging #281 (12d21ac) into develop (950ef6b) will increase coverage by 0.04%.
The diff coverage is 92.30%.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #281      +/-   ##
===========================================
+ Coverage    87.65%   87.70%   +0.04%     
===========================================
  Files           60       60              
  Lines         3386     3374      -12     
===========================================
- Hits          2968     2959       -9     
+ Misses         418      415       -3     
Impacted Files Coverage Δ
smartsim/_core/control/manifest.py 94.78% <0.00%> (ø)
smartsim/_core/config/config.py 98.38% <100.00%> (+3.45%) ⬆️
smartsim/_core/launcher/colocated.py 93.10% <100.00%> (+0.16%) ⬆️
smartsim/database/orchestrator.py 85.13% <100.00%> (+0.15%) ⬆️

@al-rigazzi al-rigazzi marked this pull request as ready for review April 27, 2023 18:03
@al-rigazzi al-rigazzi requested a review from MattToast April 27, 2023 18:27
Copy link
Member

@MattToast MattToast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! a couple of minor style things and some concerns around the test that was changed, but otherwise lgtm!!

cmd = command + [f"--bind {lo_address}"]
else:
# bind to both addresses if the user specified a network
# address that exists and is not the loopback address
cmd = command + [f"--bind {lo_address} {ip_address}"]
cmd = command + [f"--bind {lo_address} " + " ".join(ip_addresses)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor knit pick, can we make this use either entierly f-strs or string concatanation just be be consistant? Maybe

[f"--bind {lo_address} {' '.join(ip_addresses)}"]

?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@@ -206,7 +212,7 @@ def main(
"\n\nCo-located database information\n"
+ "\n".join(
(
f"\tIP Address: {ip_address}",
f"\tIP Address(es): " + " ".join(ip_addresses + [lo_address]),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same unify f-str vs str-concat knit pick here. Would prefer f-str to be consistent with line 218

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense.

)
logger.warning(f"Found network interfaces are: {available}")
for interface in self._interfaces:
if interface not in net_if_addrs and self._interfaces != "lo":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this line

if interface not in net_if_addrs and self._interfaces != "lo":
#                                    ^^^^^^^^^^^^^^^^
# Should this be `interface` instead of `self._interfaces`?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense.

Comment on lines 340 to 354
exp.start(colo_model, block=True)
statuses = exp.get_status(colo_model)
assert all([stat == status.STATUS_COMPLETED for stat in statuses])
exp.start(colo_model, block=False)

completed = False
timeout = 90
check_interval = 5
while timeout and not completed:
timeout -= check_interval
time.sleep(check_interval)
statuses = exp.get_status(colo_model)
if all([stat == status.STATUS_COMPLETED for stat in statuses]):
completed = True

assert completed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand that change here from block=True to block=False with a timeout. Was this test hanging and this was a way to "fail fast"?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think it was hanging forever (even in CI/CD). Do you think there is a better solution?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope! just wanted make sure I understood the reason for the change!

completed = False
timeout = 90
check_interval = 5
while timeout and not completed:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just confirming: If the timeout is expended here and the test fails, is the colo_model run killed and cleaned up, or is the process left running?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, I'd say it could be left running. I changed the way we handle it now.

for net_if_addr in net_if_addrs:
if net_if_addr.startswith("hsn"):
interfaces.append(net_if_addr)
return interfaces
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This return type does not match the type annotation of the fn: expected str but got list[str]

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unionized, do you think it is fine? We could actually always return a list (even with just one item).

Copy link
Member

@MattToast MattToast May 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ooh I kinda like the idea of unifying the type and making it always return a list just for consistency's sake. That way I could always count on being doing something like

for network in Config().test_interface:
    # `network` is guaranteed to be a str of name of network
    # not chars in a str
    # Not sure why I would want to do this, but could ¯\_(ツ)_/¯
    ...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shrug is the way. I'll fix it.

Comment on lines 170 to 173
ip_addresses = []
ip_addresses.extend(
[current_ip(interface) for interface in network_interface.split(",")]
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can these lines be simplified to just be

ip_addresses = [current_ip(interface) for interface in network_interface.split(",")]

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think it makes sense.

current_ip(net_if) for net_if in network_interface.split(",")
]
ip_address = ip_addresses[0]
cmd = command + ["--bind " + " ".join(ip_addresses)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now purely stylistic: Can we also make this an f-str to be consistent with the colo script?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it always bothered me.

:param interface: network interface, defaults to "lo"
:type interface: str, optional
:param interface: network interface(s), defaults to "lo"
:type interface: str, list[str], optional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor doc thing: earlier in the docstrings, the type "string or list of strings" was conveyed as str | list[str]. Should we do that here as well/update the earlier one to be str, list[str] just to be consistent?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be Optional[str] | Optional[list[str]]?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question! The "proper" (i.e. mypy accepted) type hint would be Optional[Union[str, list[str]]] in py<=3.9 syntax or str | list[str] | None in py>=3.10 syntax

But this also brings up our somewhat inconsistent use of , optional. For the most part, when , optional is provided in the :type: tag we usually imply that a type of None is accepted and we simply do not annotate the None in the docstr :type:. I swear I've seen other packages do something similar, but cannot seem to recall any offhand.

Any preference between :type interface: str | list[str] | None, optional or :type interface: str | list[str], optional? I honestly could be swayed either way...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say we either put None or optional... both seem redundant. Maybe keep optional for now?


if lo_address == ip_address or not ip_address:
if (
all([lo_address == ip_address for ip_address in ip_addresses])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor "performance" thing: If you drop the square brackets here, all will evaluate this lazily and will return on the first instance of False

That said, this list is small and we are unlikely to see literally any difference, so this is an entirely optional refactor

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropping characters is always good.

@al-rigazzi al-rigazzi requested a review from MattToast May 2, 2023 18:25
Copy link
Member

@MattToast MattToast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two last conversations to close out, but otherwise LGTM!

Approving in advance bc everything else looks great!

@al-rigazzi al-rigazzi merged commit f0ed512 into CrayLabs:develop May 3, 2023
@al-rigazzi al-rigazzi deleted the multi-interface branch May 10, 2023 19:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants