-
Notifications
You must be signed in to change notification settings - Fork 911
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Warn user of unexpected run mode #5209
feat: Warn user of unexpected run mode #5209
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just checking some boxes:
- ✔ Alpine (openrc-based) will invoke cloud-init boot stages from non-1 PPID, but we are ignoring non-systemd cases
- ✔non-linux and therefore non-systemd is ignored
Here's a suggested diff to ensure the logs are actually written, and the log message has a bit more data about the launched boot stage. It doesn't resolve either the lack of logging PID on non-linux/non-systemd or the dropping of util.is_Linux
.
diff --git a/cloudinit/cmd/main.py b/cloudinit/cmd/main.py
index ad9910e04..aff4d380f 100644
--- a/cloudinit/cmd/main.py
+++ b/cloudinit/cmd/main.py
@@ -72,15 +72,20 @@ def print_exc(msg=""):
sys.stderr.write("\n")
-def log_ppid(distro):
+def log_ppid(distro, bootstage_name):
if distro.is_linux:
ppid = os.getppid()
log = LOG.info
extra_message = ""
if 1 != ppid and distro.uses_systemd():
log = LOG.warning
- extra_message = ("Not a supported configuration.",)
- log("PID [%s] started cloud-init. %s", ppid, extra_message)
+ extra_message = " Unsupported configuration: boot stage called outside of systemd"
+ log(
+ "PID [%s] started cloud-init '%s'.%s",
+ ppid,
+ bootstage_name,
+ extra_message,
+ )
def welcome(action, msg=None):
@@ -321,14 +326,11 @@ def main_init(name, args):
# objects config as it may be different from init object
# 10. Run the modules for the 'init' stage
# 11. Done!
- if not args.local:
- w_msg = welcome_format(name)
- else:
- w_msg = welcome_format("%s-local" % (name))
+ bootstage_name = "%s-local" % (name) if args.local else name
+ w_msg = welcome_format(bootstage_name)
init = stages.Init(ds_deps=deps, reporter=args.reporter)
# Stage 1
init.read_cfg(extract_fns(args))
- log_ppid(init.distro)
# Stage 2
outfmt = None
errfmt = None
@@ -354,6 +356,7 @@ def main_init(name, args):
# config applied. We send the welcome message now, as stderr/out have
# been redirected and log now configured.
welcome(name, msg=w_msg)
+ log_ppid(init.distro, bootstage_name)
# re-play early log messages before logging was setup
for lvl, msg in early_logs:
@@ -581,11 +584,11 @@ def main_modules(action_name, args):
# the modules objects configuration
# 5. Run the modules for the given stage name
# 6. Done!
- w_msg = welcome_format("%s:%s" % (action_name, name))
+ bootstage_name = "%s:%s" % (action_name, name)
+ w_msg = welcome_format(bootstage_name)
init = stages.Init(ds_deps=[], reporter=args.reporter)
# Stage 1
init.read_cfg(extract_fns(args))
- log_ppid(init.distro)
# Stage 2
try:
init.fetch(existing="trust")
@@ -620,6 +623,7 @@ def main_modules(action_name, args):
# now that logging is setup and stdout redirected, send welcome
welcome(name, msg=w_msg)
+ log_ppid(init.distro, bootstage_name)
if name == "init":
util.deprecate(
cloudinit/cmd/main.py
Outdated
@@ -324,6 +328,7 @@ def main_init(name, args): | |||
init = stages.Init(ds_deps=deps, reporter=args.reporter) | |||
# Stage 1 | |||
init.read_cfg(extract_fns(args)) | |||
log_ppid(init.distro) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This cannot happen until after log.setup_logging
has been called.
cloudinit/cmd/main.py
Outdated
@@ -580,6 +585,7 @@ def main_modules(action_name, args): | |||
init = stages.Init(ds_deps=[], reporter=args.reporter) | |||
# Stage 1 | |||
init.read_cfg(extract_fns(args)) | |||
log_ppid(init.distro) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to move this down a bit too after log.setup_logging
otherwise no logs are written to /var/log/cloud-init.log. To retain original behavior, we may want to log this just after the welcome(... w_msg)
to log it during the boot stage block.
cloudinit/cmd/main.py
Outdated
extra_message = "" | ||
if 1 != ppid and distro.uses_systemd(): | ||
log = LOG.warning | ||
extra_message = ("Not a supported configuration.",) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add details in the extra message to tell use what's unsupported about it.
Unsupported configuration: boot stage called outside of systemd
@@ -490,6 +490,12 @@ def multi_log( | |||
|
|||
@lru_cache() | |||
def is_Linux(): | |||
"""deprecated: prefer Distro object's `is_linux` property |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could just drop this function altogether if we can figure a better approach to sort cloud-init analyze dump
as the only other use-case right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep!
cloudinit/cmd/main.py
Outdated
if 1 != ppid and distro.uses_systemd(): | ||
log = LOG.warning | ||
extra_message = ("Not a supported configuration.",) | ||
log("PID [%s] started cloud-init. %s", ppid, extra_message) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we intentionally not logging the PID when not distro.is_linux
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what the code did before, but I suppose we should be able to log PPID on non-linux too. I'm curious whether the PPID=1 should hold true on any of the BSDs, and it wouldn't hurt to log it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code looks good and behaves well. One thing I think we need to sort here is integration tests which call cloud-init --local over SSH as they'll now exit non-zero due to the warning.
Tests affected are at least the following which both check response.ok
from a cloud-init init --local call.
- tests/integration_tests/datasources/test_ec2_ipv6.py
- tests/integration_tests/test_upgrade.py
We might be able to adapt those tests to generally disregard exit 2 or maybe specifically call verify_clean_boot with a non-empty ignore_warnings item to ignore this expected warning log.
Because we patched out the We would see an |
yep and yep. I think it's worth a followup quilt patch in ubuntu/noble:debian/patches/retain-ppid-unsupported-config-as-info-level.patch to ensure we don't set warning level on the logs in noble. |
we could always just make this a deprecation log and deal with this in the same way that we choose to deal with other new deprecations on older series - a patch per log like this seems really expensive to maintain in the long run |
That is a much better idea. Good suggestion, let's go with deprecation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So on stable releases, we'll patch the warning to a deprecation. Only thing left to land this PR is integration test fixes per this comment
I think this is just waiting on two integration test fixes. But, I'd like this in 24.3 as we are introducing a "breaking change" as far as warnings being emitted where there were not warnings before-hand and I'd like us to avoid trying to quickly stuff quilt patches into /noble for this across the SRU boundary for 24.2 |
Resolving this involves either:
I don't think that we want 1) (it involves muddying
In order to make I'll share what I've been working on shortly. |
@blackboxsw Last commit contains what I had in mind. With this commit I can successfully run
And it passes. Thoughts? |
Thanks @holmanb . I've added two supplemental commits to your branch for reviiew
|
bb67e92
to
370552d
Compare
tests/integration_tests/util.py
Outdated
).stdout.strip() | ||
) | ||
assert expected_traceback_count == traceback_count, ( | ||
f"{traceback_count - expected_traceback_count} unexpected traceback(s)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside: This fits the technical requirement for checking for tracebacks, but I'd prefer to have a nicer UX than "we found something we didn't expect, go figure it out", if possible.
Currently the one-line pytest failure summary will say something like:
Unexpected traceback found in /var/log/cloud-init.log
However, if the error message included the exception message I think that would be more useful while iterating during development or debugging.
Unexpected tracebacks: ZeroDivisionError: division by zero
Also, looking back at _format_found()
, we only include the unexpected log when there is a single unexpected error / warning / etc. I'd like to change this to include the first unexpected encountered rather than use a generic header - again so that the pytest summary has richer information about what failed.
We don't need to make these changes in this PR, but I just wanted to lay out some UI improvements that I'd like to see eventually which should make developing and debugging easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with this too, I did also look at processing/extracting tracebacks from the cloud-init.log and thought it a bit more complex than trying to piggy back on this PR. But, agreed a richer UX would be helpful in the event of failures and manually jumping into the logged artifacts to get this data after the fact is painful.
cloudinit/cmd/main.py
Outdated
DEPRECATE_BOOT_STAGE_MESSAGE = ( | ||
"Triggering cloud-init boot stages outside of intial system boot is not a" | ||
" fully supported operation which can lead to incomplete of incorrect" | ||
" configuration due to either race-conditions with other active services" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
race conditions and incomplete triggering are two reasons these may not work, but there are other reasons, for example: non-idempotent behavior
Maybe just drop the "due to ..."?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
cloudinit/cmd/main.py
Outdated
" fully supported operation which can lead to incomplete of incorrect" | ||
" configuration due to either race-conditions with other active services" | ||
" or incomplete triggering of all cloud-init boot stages. As such," | ||
" cloud-init is deprecating this undocumented feature in the future. If" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: these subcommands are documented (man page, docs), we just don't actively encourage their use
I think it would be best if we make them undocumented, or at least make the wording in the man page and docs more clear that these aren't intended to be used manually
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That probably makes sense in this PR too.
ee02b64
to
6134d93
Compare
@holmanb I think I resolved your last pass of review comments. Please take a look and make sure I didn't miss anything. |
@holmanb I think we want this #5209 in milestone 24.2 only if #5411 merged because #5491will give us an easy downstream/stable release mechanism to avoid warnings on "new" deprecation messages introduced across SRU boundary. If we think 5491 will be reviewable for milestone 24.2 today/tomorrow, I'll spend some time on that PR now and could provide some feedback here to get #5411 across the line. |
I think now that milestone 24.2 has been released. We can merge this pending your approval @holmanb |
Thanks @blackboxsw taking one last look at this now, I think that I'd like to see two minor changes before merge:
WDYT? |
|
On systemd, services are started by PID 1. When this doesn't happen, cloud-init is in an unknown run state and should warn the user. Reorder pid log to be able to reuse Distro information. Add docstring deprecating util.is_Linux().
…5209) Implement verify_clean_boot() to ignore certain expected logs in a platform-specific way.
6134d93
to
ee44a4e
Compare
…anonical#5209) Ensure ignore_warnings=True or ignore_errors=True is honored and not overridden by supplemental warning texts appended.
…l#5209) Avoid using warning level messages as there may be some use-cases in the wild that need to invoke cloud-init boot stages after boot for some reason unknown to upstream. Provide a detailed warning message informing admins to file issues against cloud-init to better represent those feature needs before dropping this feature altogether.
cloudinit/cmd/main.py
Outdated
"Unsupported configuration: boot stage called " | ||
f"by PID [{ppid}] outside of systemd" | ||
), | ||
deprecated_version="24.2", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be 24.3 now
…5209) Implement verify_clean_boot() to ignore certain expected logs in a platform-specific way.
…anonical#5209) Ensure ignore_warnings=True or ignore_errors=True is honored and not overridden by supplemental warning texts appended.
…l#5209) Avoid using warning level messages as there may be some use-cases in the wild that need to invoke cloud-init boot stages after boot for some reason unknown to upstream. Provide a detailed warning message informing admins to file issues against cloud-init to better represent those feature needs before dropping this feature altogether.
ee44a4e
to
8805555
Compare
On systemd, services are started by PID 1. When this doesn't happen, cloud-init is in an unknown run state and should warn the user. Reorder pid log to be able to reuse Distro information. Add docstring deprecating util.is_Linux().
Implement verify_clean_boot() to ignore certain expected logs in a platform-specific way.
…5209) Ensure ignore_warnings=True or ignore_errors=True is honored and not overridden by supplemental warning texts appended.
On systemd, services are started by PID 1. When this doesn't happen, cloud-init is in an unknown run state and should warn the user. Reorder pid log to be able to reuse Distro information. Add docstring deprecating util.is_Linux().
…5209) Implement verify_clean_boot() to ignore certain expected logs in a platform-specific way.
…anonical#5209) Ensure ignore_warnings=True or ignore_errors=True is honored and not overridden by supplemental warning texts appended.
…l#5209) Avoid using warning level messages as there may be some use-cases in the wild that need to invoke cloud-init boot stages after boot for some reason unknown to upstream. Provide a detailed warning message informing admins to file issues against cloud-init to better represent those feature needs before dropping this feature altogether.
On systemd, services are started by PID 1. When this doesn't happen, cloud-init is in an unknown run state and should warn the user. Reorder pid log to be able to reuse Distro information. Add docstring deprecating util.is_Linux().
Implement verify_clean_boot() to ignore certain expected logs in a platform-specific way.
…5209) Ensure ignore_warnings=True or ignore_errors=True is honored and not overridden by supplemental warning texts appended.
Avoid using warning level messages as there may be some use-cases in the wild that need to invoke cloud-init boot stages after boot for some reason unknown to upstream. Provide a detailed warning message informing admins to file issues against cloud-init to better represent those feature needs before dropping this feature altogether.
Proposed Commit Message
Merge type