Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable type checking #5936

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

holmanb
Copy link
Member

@holmanb holmanb commented Dec 18, 2024

Proposed Commit Message

chore: enable type checking for core Python modules

Add various datastructures and type annotations to main.py and stages.py
to support type checking and simplify logic.

Refactor logic in Init().

This change reduces McCabe complexity index in main.py:

    main_init: 31 -> 28
    status_wrapper: 20 -> 13

Remove the following from the mypy override list:

    cloudinit.analyze
    cloudinit.cmd.main
    cloudinit.config.modules
    cloudinit.distros
    cloudinit.distros.ubuntu
    cloudinit.helpers
    cloudinit.net.cmdline
    cloudinit.sources
    cloudinit.ssh_util
    cloudinit.stages
    cloudinit.templater

Additional context

Note to reviewers: most of the file changes should be self-explanatory, with two exceptions:

stages.py
---------

- Init.__init__(): signature change to accept new Stage type and to set cache_mode 
- renamed methods
- some new helper methods to simplify logic
- add module-level Stage variables to reduce magic string literals

main.py
-------

- add TypedDict to define status.json internal datastructures
- replace magic literal use with Stage types from stages.py
- stop using dsmode strings for controlling non-datasource control flow

Merge type

  • Squash merge using "Proposed Commit Message"
  • Rebase and merge unique commits. Requires commit messages per-commit each referencing the pull request number (#<PR_NUM>)

@@ -122,7 +122,7 @@ class SystemctlReader:

def __init__(self, property, parameter=None):
self.epoch = None
self.args = [subp.which("systemctl"), "show"]
self.args = ["systemctl", "show"]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

subp.which can return None, this is redundant

check = "check"


# used in status.json and some log strings
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section defines a variety of magic strings that were previously scattered throughout main.py and stages.py.


# These module variables store static data that is associated with each stage.
# https://docs.cloud-init.io/en/latest/explanation/boot.html
local: Final = BootStage(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These dataclasses hold all the (formerly) magic strings. Note that dsmode is not included because dsmode is a datasource & runtime concept, not a thing that is known before running the code.


stage: which Stage Init() is currently running in
reporter: a reporter object
cache_mode: force a specific CacheMode
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cache_mode is known before instantiation, so set it on instantiation

@@ -276,36 +305,6 @@ def attempt_cmdline_url(path, network=True, cmdline=None) -> Tuple[int, str]:
)


def purge_cache_on_python_version_change(init):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes more sense as an Init() method. A few other functions in this file should probably be moved as well (_should_bring_up_interfaces(), _maybe_persist_instance_data(), _maybe_set_hostname()), but this PR is already too big.

if args.local:
deps = [sources.DEP_FILESYSTEM]
stage = stages.local
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stage holds all of the (formerly) magic strings as well as deps - this is all statically defined data that is known prior to runtime

sys.stderr.write("%s\n" % (netinfo.debug_info()))
else:
existing = "check"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic still exists. See Init().cache_mode


if mode == sources.DSMODE_NETWORK:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using dsmode variables for flow control that isn't actually relevant to the dsmode is unnecessarily convoluted.

All stage-specific flow control* now uses the local stage variable and the stages.{local,network,config,final} dataclasses.

*with the exception of the (two?) spots in the code where datasource.dsmode modifies the flow of code


# Stage 5
bring_up_interfaces = _should_bring_up_interfaces(init, args)
try:
init.fetch(existing=existing)
# if in network mode, and the datasource is local
init.fetch()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cache_mode is either set at Init() instantiation, or discovered at runtime in Init().cache_mode

# Before network comes up, set any configured hostname to allow
# dhcp clients to advertize this hostname to any DDNS services
# LP: #1746455.
_maybe_set_hostname(init, stage="local", retry_stage="network")

init.apply_network_config(bring_up=bring_up_interfaces)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still gets called in both branches, just a small refactor to eliminate redundant branches.

LOG.debug(
"[%s] Exiting. datasource %s not in local mode.",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The [%s] was redundant. If the datasource is not in local mode, then it is in network mode. The surrounding logs also hold this information.

init.datasource,
)
return (init.datasource, [])
else:
LOG.debug(
"[%s] %s is in local mode, will apply init modules now.",
mode,
"%s is in local mode, will apply init modules now.",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The [%s] was redundant.

@@ -342,9 +587,7 @@ def _get_datasources(self):
cfg_list = self.cfg.get("datasource_list") or []
return (cfg_list, pkg_list)

def _restore_from_checked_cache(self, existing):
if existing not in ("check", "trust"):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since Init() is annotated with the stages.CacheMode, and all callsites of Init() are checked by mypy, we shouldn't need to do these runtime checks anymore.

@@ -431,7 +679,7 @@ def _get_ipath(self, subname=None):
)
return instance_dir

def _write_network_config_json(self, netcfg: dict):
def _write_network_config_json(self, netcfg: Optional[dict]):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because self._find_networking_config() has codepaths that return None as the first argument.

else:
raise_on_invalid_command(args)
bytes_args = [
x if isinstance(x, bytes) else x.encode("utf-8") for x in args
]
try:
with performance.Timed(
"Running {}".format(logstring if logstring else args)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mypy didn't like the multiple types that this could be. Standardize above to List[bytes] and log it as a List[string].

@holmanb holmanb marked this pull request as ready for review December 19, 2024 23:52
Copy link

github-actions bot commented Jan 7, 2025

Hello! Thank you for this proposed change to cloud-init. This pull request is now marked as stale as it has not seen any activity in 14 days. If no activity occurs within the next 7 days, this pull request will automatically close.

If you are waiting for code review and you are seeing this message, apologies! Please reply, tagging TheRealFalcon, and he will ensure that someone takes a look soon.

(If the pull request is closed and you would like to continue working on it, please do tag TheRealFalcon to reopen it.)

@github-actions github-actions bot added the stale-pr Pull request is stale; will be auto-closed soon label Jan 7, 2025
@holmanb holmanb removed the stale-pr Pull request is stale; will be auto-closed soon label Jan 7, 2025
Copy link

Hello! Thank you for this proposed change to cloud-init. This pull request is now marked as stale as it has not seen any activity in 14 days. If no activity occurs within the next 7 days, this pull request will automatically close.

If you are waiting for code review and you are seeing this message, apologies! Please reply, tagging TheRealFalcon, and he will ensure that someone takes a look soon.

(If the pull request is closed and you would like to continue working on it, please do tag TheRealFalcon to reopen it.)

@github-actions github-actions bot added the stale-pr Pull request is stale; will be auto-closed soon label Jan 22, 2025
@holmanb holmanb removed the stale-pr Pull request is stale; will be auto-closed soon label Jan 22, 2025
Copy link
Member

@TheRealFalcon TheRealFalcon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass. Overall some really nice changes. I left some comments inline.

Additionally, I know you mentioned it earlier, but how hard is it to split out some of the changes into commits? In particular, the BootStage changes along with read_cfg_paths changes have some far reaching implications, so it's hard to verify those with all of the other interleaved changes.

@@ -63,6 +75,180 @@
)


@dataclass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason not to keep this a named tuple?

]


@dataclass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(frozen=True) for the dataclasses defined here?

# [DEP_FILESYSTEM, DEP_NETWORK] - this means Network stage
#
# XXX: TODO - simplify this logic
Deps = Literal["FILESYSTEM", "NETWORK"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really see a value add here. Other than the next two lines, there's a single call point that could define the literal there.

DSMODE_NETWORK = "net"
DSMODE_PASS = "pass"
DsMode = Literal["disabled", "local", "net", "pass"]
DSMODE_DISABLED: Final[DsMode] = "disabled"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how typing these variables here as Literal helps us at all. Why not move the Literal typing to line 212?


UNSET = "_unset"
METADATA_UNKNOWN = "unknown"
UNSET: Final[dict] = {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This variable is referenced in a bunch of places outside of this file. I think I'd rather see us replace instances of this one by one rather than trying to do a big bang update here.

@@ -583,7 +583,7 @@ def update_hostname(self, hostname, fqdn, prev_hostname_fn):
# Remove duplicates (incase the previous config filename)
# is the same as the system config filename, don't bother
# doing it twice
update_files = set([f for f in update_files if f])
update_files = list(set([f for f in update_files if f]))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this necessary?



# used in status.json and some log strings
UserLongName = Literal[
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defining all of these string literals feels like overkill to me. As far as I can tell, after the BootStage refactor (which is really nice btw), the boot stages are all initialized once right below these definitions. Do we really need to define every possible string value that a welcome value can take or the internal name that we log? How does that benefit us?

My preference would be to kill all of the of the Literal defintions here, and have a single BootStage class defined as:

class BootStage:  # Or whatever we wanna call it
    deps: List[sources.Deps]
    description: str
    welcome: str
    name: str
    report: str
    long: str
    status: str
    internal: str

The other stages can use a "" for fields that aren't used for those stages. I don't think the benefit of getting rid of 3 unused fields (that only get logged/printed anyway) is worth the complexity of a whole class hierarchy.

@stappersg
Copy link
Contributor

stappersg commented Feb 9, 2025

Just a silly update to prevent that the git-actions-bot marks this merge request as stale. (Otherwise would human intervention be needed to remove the stale pr label.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants