Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

convert from print() to logging #233

Merged
merged 29 commits into from
Oct 17, 2015
Merged

Conversation

anarcat
Copy link
Contributor

@anarcat anarcat commented Oct 1, 2015

the logging level used varies: most is logging.info(), in some place
logging.warning() or logging.error() are used when the condition is
clearly an error or warning. in other cases, we keep using print, but
force writing to sys.stderr, unless we interact with the user.

there were 77 calls to print before this commit, now there are 7, most
of which in the archiver module, which interacts directly with the
user. in one case there, we still use print() only because logging is
not setup properly yet during argument parsing.

it could be argued that commands like info or list should use print
directly, but we have converted them anyways, without ill effects on
the unit tests

unit tests still use print() in some places

this switches all informational output to stderr, which should help
with, if not fix jborg/attic#312 directly

note that this does not add support for syslog or logfiles yet,
although that would be fairly trivial to add now.

i think that verbose should be an integer and that adding more -v flags would increase debugging information available. but this keeps the current behavior, which we can change easily later, if we choose to start using logging.debug() calls.

this will help when we want to refactor output functions

this function should definitely be replaced by a __repr__() or
__str__() however
the logging level varies: most is logging.info(), in some place
logging.warning() or logging.error() are used when the condition is
clearly an error or warning. in other cases, we keep using print, but
force writing to sys.stderr, unless we interact with the user.

there were 77 calls to print before this commit, now there are 7, most
of which in the archiver module, which interacts directly with the
user. in one case there, we still use print() only because logging is
not setup properly yet during argument parsing.

it could be argued that commands like info or list should use print
directly, but we have converted them anyways, without ill effects on
the unit tests

unit tests still use print() in some places

this switches all informational output to stderr, which should help
with, if not fix jborg/attic#312 directly
@anarcat anarcat mentioned this pull request Oct 1, 2015
@@ -650,7 +652,7 @@ def init_chunks(self):
def report_progress(self, msg, error=False):
if error:
self.error_found = True
print(msg, file=sys.stderr if error else sys.stdout)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should use a logger and use log.error/log.warn

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't that exactly what that patch does?

@codecov-io
Copy link

Current coverage is 83.36%

Merging #233 into master will increase coverage by +0.23% as of dbc183f

@@            master    #233   diff @@
======================================
  Files           31      33     +2
  Stmts         6748    6829    +81
  Branches         0       0       
  Methods          0       0       
======================================
+ Hit           5610    5693    +83
  Partial          0       0       
+ Missed        1138    1136     -2

Review entire Coverage Diff as of dbc183f


Uncovered Suggestions

  1. +0.62% via borg/fuse.py#126...167
  2. +0.50% via borg/xattr.py#199...232
  3. +0.42% via borg/xattr.py#166...194
  4. See 7 more...

Powered by Codecov. Updated on successful CI builds.

@@ -143,13 +144,14 @@ def update(self, size, csize, unique):

def print_(self, label, cache):
total_size, total_csize, unique_size, unique_csize, total_unique_chunks, total_chunks = cache.chunks.summarize()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems like at that place there should be newstyle string formating and a namedtuple return value

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i didn't change that code, i'm trying to keep from doing unrelated changes here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should do that later.

@RonnyPfannschmidt
Copy link
Contributor

blindly using the root logger is problematic, instead each module should have own loggers to give control over the granularity

@anarcat
Copy link
Contributor Author

anarcat commented Oct 1, 2015

i am still unclear what to do about --stats, it could be that the argument is redundant and we should just show stats if --verbose. right now, i have kept them shown...

also, the test coverage suffers a bit from the patch, not sure how to fix that...

@anarcat
Copy link
Contributor Author

anarcat commented Oct 1, 2015

blindly using the root logger is problematic, instead each module should have own loggers to give control over the granularity

well, it's a first pass... of course each "module" could use its own logger, but really, do we have modules?

we use logging.warning in info and list, but print() more usage bits.

we also now support logging.debug() and by default are more silent
@ThomasWaldmann
Copy link
Member

@anarcat modules like python modules. it is useful if one want rather lots of debug logging, but not from every module, but just some specific one which is being currently debugged.

@anarcat
Copy link
Contributor Author

anarcat commented Oct 1, 2015

true. i'm not sure how to architecture that, suggestions? (by "that" i mean per module loggers)

@ThomasWaldmann
Copy link
Member

About abuse of log levels:

steering verbosity ONLY via log level (in the sense of ERROR, WARNING, INFO, DEBUG) seems wrong, especially if it needs hacking the log level in a unnatural way. logging.warning should be only used if it is actually a warning that you're logging, not to make something visible that might not be visible because you reduced output by setting the minimum level to warning.

So I propose the following:

Use logging.error ONLY for errors, logging.warning ONLY for warning, logging.debug ONLY for debugging.

Informative output should be logged via logging.info.

This change should be implemented within this pull request.

Additionally, we could later introduce one of the alternatives:

a) There could be an additional "verbosity level" that controls how much informative output one gets (which could be controlled via verbose being an integer created either from -v0..9 or -v, -vv, -vvv - whatever is easiest with the argument parser).

b) There could be additional "output options" that control which kinds of informative outputs are emitted.
E.g. whether to show stats or not, whether to show this or that or not.

@anarcat
Copy link
Contributor Author

anarcat commented Oct 2, 2015

so one of the things i'm trying to do here is to silence borg by default. that means that logging.info() is not shown by default. the "rule of silence". that means that borg info doesn't work by default. that doesn't make sense.

so i agree with your proposal, but i would extend that things that need to talk to the user by default (help, info, list, prompts) should use print(file=sys.stderr) instead of logging. this is in accordance with the logging documentation as well.

as for levels, i think that having three verbosity (silent, verbose, really verbose) makes sense and is a common occurance in software programs. i would like to stick with that. that would map to WARNING, INFO and DEBUG levels.

as for output options, maybe using logging modules could help discriminating what output we see? don't know.

@ThomasWaldmann
Copy link
Member

As borg info is just the user interactively requesting to see some information on the terminal, I agree it should be print to sys.stderr.

I had the same kind of decision problems print vs. logging at quite some places.

About verbosity levels: defining what appears on what level needs a lot of thinking and agreement (and not every user might agree with the set of things we show to appear on some level). Thus, defining named features, we can still define some predefined feature sets that will be the official sets for silent, terse, verbose. One could also give a custom set via cmdline then to get whatever one wants.

@anarcat
Copy link
Contributor Author

anarcat commented Oct 2, 2015

i think we're in violent agreement. :)

@anarcat
Copy link
Contributor Author

anarcat commented Oct 2, 2015

so i looked again at the logging modules stuff, and it seems fairly trivial to log to different loggers based on the module name. i don't know what you guys had in mind, but it seems this could be sufficient, from:

import logging

logging.info('hello world')

we would instead do:

import logging
logger = logging.getLogger(__name__)

logger.info('hello world')

it's a slightly longer import, but then we can have different logging levels for different parts easily with an INI-style configuration file, or some Python dict object, that can be generated from anything we want. I would advise against creating a complex hierarchy of commandline options for this: there is a neat config file format that is well documented, let's use that.

also: the current format is basically "%(message)s" so it doesn't show the log level or the module name. I would add the module name, but i would leave the log level out: it's really annoying to have stuff like:

INFO: hello world!
INFO: hello again!

principle of silence again...

@ThomasWaldmann
Copy link
Member

@anarcat per module logger is good to have, but not a replacement for logging feature sets.

I am using per module logging in moin since long, but it is not flexible enough.

And at least for borgweb, there must be the log level in the output. And I think it is generally useful, some users misinterpreted attic/borgs output in the past because it was not clear it was just a warning, not a (fatal) error.

that way we have one logger per module, and we can pick and choose
which module we want verbose, for example
the heuristics i used are the following:

 1. if we are prompting the use, use print on stderr (input() may
    produce some stuff on stdout, but it's outside the scope of this
    patch). we do not want those prompts to end up on the standard
    output in case we are piping stuff around

 2. if the command is primarily producing output for the user on the
    console (`list`, `info`, `help`), we simply print on the default
    file descriptor.

 3. everywhere else, we use the logging module with varying levels of
    verbosity, as appropriate.
we should find other ways of forcing that to be shown, it seems
@anarcat
Copy link
Contributor Author

anarcat commented Oct 2, 2015

i've reverted parts of the print() to logging.warning() conversion. for stats, the heuristic i use is that when the command is primarly aimed at producing output interactively (info, list), we use print, on stdout. when the output is auxiliary (e.g. prune, create, etc), i use log.info() because we may want those to end up in logs instead of just on the console.

in prompts, we use print on stderr, just because i assume that we may be piping stuff and i don't want to clutter the output (a problem we don't have with list and info).

i have also pushed per-module loggers, which will give use crude, but at least some control on the output, per module.

it doesn't resolve the case of --stats. i am not sure how to deal with this: true, per modules controler are not enough for this, but what's the alternative?

maybe we are conflating two issues here... the way --stats is output can be treated separately from the logging conversion. in other words, we could keep throwing --stats on stderr and simply not log those at all. or we could check if the logging module will output to the console and avoid dupes...

not sure how to deal with that.

@anarcat
Copy link
Contributor Author

anarcat commented Oct 2, 2015

note that the output changes with the new formatter, maybe that's not desired. before:

$ borg init /tmp/borg-test
Initializing repository at "/tmp/borg-test"
Encryption NOT enabled.
Use the "--encryption=repokey|keyfile|passphrase" to enable encryption.
Synchronizing chunks cache...
Archives: 0, w/ cached Idx: 0, w/ outdated Idx: 0, w/o cached Idx: 0.
Done.

after:

$ borg init -v /tmp/borg-test3
borg.archiver: Initializing repository at "/tmp/borg-test3"
borg.key: Encryption NOT enabled.
Use the "--encryption=repokey|keyfile|passphrase" to enable encryption.
borg.cache: Synchronizing chunks cache...
borg.cache: Archives: 0, w/ cached Idx: 0, w/ outdated Idx: 0, w/o cached Idx: 0.
borg.cache: Done.

yuck. i think i'll just revert this. :)

@ThomasWaldmann
Copy link
Member

maybe printing the module name is not needed.

@anarcat
Copy link
Contributor Author

anarcat commented Oct 2, 2015

for --stats, i think that we could have topical flags that force console output. things like:

  • --progress - to output a progress bar (eventually?)
  • --changed - to output the file listing what -v currently outputs
  • --unchanged - to also include unchanged files :p (see do not show unchanged files #236 for that specific bikeshed)
  • --stats - to show stats in the end

we could have a standard way of handling that behavior. for example:

logging.info(stats)
if args.stats and not magically_figure_out_if_logging_wrote_to_stderr():
    print(stats, file=sys.stderr)

that is: print to the console if requested, but always log so that logfiles have the relevant info, and avoid duplicates. obviously, the above method is incomplete, as i have no idea how to avoid those duplicates.

one way to accomplish this would be through logging filters that allow censoring certain content. if we send only objects to the logger, the filter will see objects and can decide whether we need to censor it or not, depending on args or whatever. that would mean that each object shoved in there would need to have a proper __str__() representation, but that is something that seems necessary anyways.

then those --stats, --changed, and so on flags only become filters. this could look something like:

def map_args_to_cls(args):
    cls = []
    if args.stats:
        cls += [ThatsStatsClassName]
    # [...] maybe have some way of hooking in here?
    return cls

def filtertemplate(args):
    cls = map_args_to_cls(args)
    def filter(record):
        return type(record.msg) is in cls:
    return filter

def setup_logging(args, ...):
    # [...]
    # python 3.2+ specific, before it expects a class, not a single function
    stderr_stream_handler.addFilter(filtertemplate(args))
    # [...]

what do you think? notice how we filter only stderr, which allows us to log everything anyways.

this greatly simplifies the display of those objects, as the
__format__() parameter allows for arbitrary display of the internal
fields of both objects

this will allow us to display those summaries without having to pass a
label to the string representation. we can also print the objects
directly without formatting at all.
except AttributeError:
# somehow we failed to find our module
# return the logger module name by default
return __name__
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that fn is unusual, why do you need it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the idea was to standardise logger creation somewhere. the function is necessary to create per-modules loggers, otherwise:

def create_logger():
     return logger.getLogger(__name__)

will always return a logger with the name 'logger' (because that's where the function is defined).

maybe overengineered.

assert io_logger.getvalue() == "borg.testsuite.logger: hello world 2\n"

def test_parent_module():
assert find_parent_module() == __name__
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this module is not necessary. we can expect logging stdlib module has own tests / has been tested.
we only need to test our stuff, not whether stdlib works.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was necessary to test that the setup wasn't running twice and therefore adding mutliple stream handlers. i had seen such cases during development and was trying to catch it.

we also test that the module detection code is actually working - i had cases where find_parent_module() would work yet logging.info() would use a global logger instead of the module-specific one (i was naively doing a logger = logging.getLogger('') in logger.py)...

our default verbosity shows only warnings, we'd have to tweak tests to be verbose for this to work

This reverts commit 27be46a.
this was a remnant of when i was writing the converter/upgrader code,
and was destined to be a general progress message in the migration
process. i removed a more technical, internal debugging message in
exchange
Conflicts:
	borg/archive.py
	borg/archiver.py
	borg/cache.py
	borg/key.py

we use it to avoid errors and provide a more standard API.
"""
return logging.getLogger(name or find_parent_module())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems a bit too little to rectify an own function + that exotic support code.

in each module, it could be just:

import logging
logger = logging.getLogger(__name__)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well i don't know. that's how i originally did it, of course. then i switched in anarcat@42cc17c

the idea was that, as you mentionned, the logging module can be tricky to setup. we haven't had that case yet, but it could be that in some cases we'd want to make sure the logging is setup before creating loggers. having our own functions could enforce that, for example.

@anarcat anarcat self-assigned this Oct 9, 2015
@ThomasWaldmann
Copy link
Member

i suggest to either merge this to master or create a branch in borg repo so we can collaboratively work there. thanks for your effort!

@anarcat
Copy link
Contributor Author

anarcat commented Oct 13, 2015

i have pushed to logging-refactor in the borg repo for now. i'd like to shove more testing into this before merging it myself (this will happen soon!).

@ThomasWaldmann
Copy link
Member

ok, fine. :)

@anarcat
Copy link
Contributor Author

anarcat commented Oct 17, 2015

after further testing, i am satisfied with this, so I am merging it on master because a bunch of my other work depends on it.

we can improve on it later on, indeed, my verbosity branch does just that.

anarcat added a commit that referenced this pull request Oct 17, 2015
convert from print() to logging
@anarcat anarcat merged commit 128ff40 into borgbackup:master Oct 17, 2015
@anarcat anarcat deleted the logging-refactor branch October 21, 2015 13:34
@anarcat anarcat mentioned this pull request Nov 17, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Exit gracefully when output pipe is broken
4 participants