Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use PyYAML for formatting the rip .log file #384

Closed
wants to merge 1 commit into from

Conversation

Freso
Copy link
Member

@Freso Freso commented Mar 17, 2019

The .log file is intended to be in YAML, but until now we have "manually" formatted this output, which is prone to errors.

This new approach generates a dictionary containing the log information and then calls to PyYAML's YAML implementation to format it out to a string which can then be saved to a file.

Note that key order of dictionaries prior to Python 3.7 is non-deterministic (or not guaranteed to be) and thus may need further tweaking.
Also, PyYAML presently only supports YAML 1.1. It may be needed/wanted to switch to ruamel.yaml instead for YAML 1.2 support.

@Freso Freso added Status: in progress Issue/pull request which is currently being worked on Priority: low Low priority Improvement Minor improvement to code labels Mar 17, 2019
@Freso Freso self-assigned this Mar 17, 2019
@Freso Freso requested a review from MerlijnWajer March 17, 2019 01:19
@Freso Freso force-pushed the task/use-yaml-for-logging branch 2 times, most recently from 3d96749 to 40cf856 Compare March 17, 2019 03:28
The .log file is intended to be in YAML, but until now we have
"manually" formatted this output, which is prone to errors.

This new approach generates a dictionary containing the log information
and then calls to PyYAML's YAML implementation to format it out to a
string which can then be saved to a file.

Note that key order of dictionaries prior to Python 3.7 is
non-deterministic (or not guaranteed to be) and thus may need
further tweaking.
Also, PyYAML presently only supports YAML 1.1. It may be needed/wanted
to switch to ruamel.yaml[1] instead for YAML 1.2 support.

[1] https://yaml.readthedocs.io/

Signed-off-by: Frederik “Freso” S. Olesen <[email protected]>
@Freso Freso force-pushed the task/use-yaml-for-logging branch from 40cf856 to 515f110 Compare March 17, 2019 04:10
@Freso
Copy link
Member Author

Freso commented Mar 17, 2019

Sample .log output as of now:

CD metadata:
  CDDB Disc ID: 2d047f04
  MusicBrainz Disc ID: TX6lKZ481BHv1ZW6pd6007j6OY4-
  MusicBrainz lookup url: https://musicbrainz.org/cdtoc/attach?toc=1+4+86497+150+24687+47627+68002&tracks=4&id=TX6lKZ481BHv1ZW6pd6007j6OY4-
  Release: Ginuwine - Pony
Conclusive status report:
  AccurateRip summary: All tracks accurately ripped
  EOF: End of status report
  Health status: ''
Health status: No errors occurred
Log created by: whipper 0.7.4.dev52+g40cf856 (internal logger)
Log creation date: '2019-03-17T04:10:43Z'
Ripping phase information:
  CD-R detected: 'No'
  Defeat audio cache: 'Yes'
  Drive: TSSTcorpCDDVDW SN-208BB  (revision SB02)
  Extraction engine: cdparanoia III 10.2 libcdio 2.0.0 x86_64-pc-linux-gnu
  Gap detection: !!python/unicode 'cdrdao 1.2.4'
  Overread into lead-out: 'No'
  Read offset correction: 6
SHA-256 hash: 1C5D1F061CEFC9BE55495603C6F72447ED1A9D695A6AA7DBCEA69EBEF2926904
TOC:
  1:
    End sector: 24536
    Length: 05:27:12
    Start: 00:00:00
    Start sector: 0
  2:
    End sector: 47476
    Length: 05:05:65
    Start: 05:27:12
    Start sector: 24537
  3:
    End sector: 67851
    Length: 04:31:50
    Start: '10:33:02'
    Start sector: 47477
  4:
    End sector: 86346
    Length: 04:06:45
    Start: '15:04:52'
    Start sector: 67852
Tracks:
  1:
    AccurateRip v1:
      Confidence: 26
      Local CRC: BE390E39
      Remote CRC: BE390E39
      Result: Found, exact match
    AccurateRip v2:
      Confidence: 34
      Local CRC: 7A03BC55
      Remote CRC: 7A03BC55
      Result: Found, exact match
    Copy CRC: 255B4AFF
    Extraction quality: 100.00 %
    Extraction speed: 2.0 X
    Filename: !!python/unicode '/home/freso/tmp/Ginuwine - Pony (1996) [FLAC]/01.
      Ginuwine - Pony _album version_.flac'
    Peak level: '0.901611'
    Pre-emphasis: 'No'
    Status: Copy OK
    Test CRC: 255B4AFF
  2:
    AccurateRip v1:
      Confidence: 24
      Local CRC: 9E9AABC0
      Remote CRC: 9E9AABC0
      Result: Found, exact match
    AccurateRip v2:
      Confidence: 33
      Local CRC: '79008752'
      Remote CRC: '79008752'
      Result: Found, exact match
    Copy CRC: 59B050EA
    Extraction quality: 100.00 %
    Extraction speed: 2.2 X
    Filename: !!python/unicode '/home/freso/tmp/Ginuwine - Pony (1996) [FLAC]/02.
      Ginuwine - Pony _Ride It mix_.flac'
    Peak level: '0.988556'
    Pre-emphasis: 'No'
    Pre-gap length: 00:01:65
    Status: Copy OK
    Test CRC: 59B050EA
  3:
    AccurateRip v1:
      Confidence: 24
      Local CRC: E6B184BE
      Remote CRC: E6B184BE
      Result: Found, exact match
    AccurateRip v2:
      Confidence: 34
      Local CRC: B1D6AEFE
      Remote CRC: B1D6AEFE
      Result: Found, exact match
    Copy CRC: 3ECA8CF5
    Extraction quality: 100.00 %
    Extraction speed: 2.4 X
    Filename: !!python/unicode '/home/freso/tmp/Ginuwine - Pony (1996) [FLAC]/03.
      Ginuwine - Pony _Black Market radio mix_.flac'
    Peak level: '0.988556'
    Pre-emphasis: 'No'
    Pre-gap length: 00:01:07
    Status: Copy OK
    Test CRC: 3ECA8CF5
  4:
    AccurateRip v1:
      Confidence: 24
      Local CRC: D44CB1C3
      Remote CRC: D44CB1C3
      Result: Found, exact match
    AccurateRip v2:
      Confidence: 33
      Local CRC: 796B0878
      Remote CRC: 796B0878
      Result: Found, exact match
    Copy CRC: 979CAC27
    Extraction quality: 100.00 %
    Extraction speed: 2.7 X
    Filename: !!python/unicode '/home/freso/tmp/Ginuwine - Pony (1996) [FLAC]/04.
      Ginuwine - Hello.flac'
    Peak level: '0.966064'
    Pre-emphasis: 'No'
    Pre-gap length: 00:01:05
    Status: Copy OK
    Test CRC: 979CAC27

It's a bit scrambled and not very human-readable… and not nice with the !!python/unicodes in there either. But I feel this is a good first step. It's working and producing an actual log at least. I'll try and clean it up more tomorrow.

@Freso
Copy link
Member Author

Freso commented Mar 17, 2019

I'm thinking of replacing the "Yes"/"No" answers with proper booleans (ie., passing True/False instead of strings) which will change the YAML output to true/false. As of YAML 1.2 (which PyYAML doesn't support currently, so not what is being used in the PR… currently) using "Yes"/"No" (or "On"/"Off") is no longer supported and only true/false is an acceptable boolean. I feel like this is a change that should be done before 1.0.0 (if we want to do it) and that this PR is a good place to do it.

@MerlijnWajer
Copy link
Collaborator

True/False sounds fine. We want to get rid of the !!python stuff for sure though.

@itismadness
Copy link
Contributor

itismadness commented Jul 10, 2019

What is the status of this PR and is there something I could help with here? With using whipper log checking more out in the wild, I have seen some instances of invalid logs being generated that I cannot then parse. An example is that

lines.append(" Release: %s - %s" %
(ripResult.artist, ripResult.title))

can output a broken yaml if the artist or album contains a : sequence (e.g. Myst: The Soundtrack). Could submit a patch to correct these manually through string replaces, but it seems like this route is infinitely more preferable.

@Freso
Copy link
Member Author

Freso commented Jul 12, 2019

Well, I was trying to stick to PyYAML with this PR, but we recently made a change to the log format that is YAML 1.2 specific, which PyYAML does not support, so I may try and redo this using ruamel.yaml. I think the status of this PR is pretty much what the prior comments say, apart from the recent change in the log output (which is pretty much this comment anyway).

The main issue here is that supported versions of Python do not guarantee order of dictionaries which creates a "messy" .log file compared to handmade ones.

About the Release key sometimes resulting in broken YAML, please open a new, separate issue for that.

@itismadness
Copy link
Contributor

You'd only really need to switch from PyYAML to ruamel.yaml if you were loading stuff, but for dumping, you'll get the same value for outputting a booleans or None (the only difference really being that PyYAML outputs with a newline at the end and ruamel.yaml doesn't):

>>> import yaml
>>> from ruamel.yaml import YAML
>>> import sys
>>> r_yaml = YAML(typ='safe')
>>> yaml.safe_dump({'a': True, 'b': False, 'c': None})
'a: true\nb: false\nc: null\n'
>>> r_yaml.dump({'a': True, 'b': False, 'c': None}, sys.stdout)
{a: true, b: false, c: null}

With regards to this point:

The main issue here is that supported versions of Python do not guarantee order of dictionaries which creates a "messy" .log file compared to handmade ones.

Why not use OrderedDict which is in Python 2.7+. It ends up being a bit more verbose for writing, but gives a guaranteed order for the output (which I'd argue is mandatory for these logs so that they're human readable), using nested OrderedDicts as necessary:

from collections import OrderedDict
# ... implementation details

        riplog = OrderedDict()
        riplog["Log created by"] = logger_name
        riplog["Log creation date"] = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(epoch)).strip(),
        riplog["Ripping phase information"] = OrderedDict()
        riplog["Ripping phase information"]["Drive"] = drive
        riplog["Ripping phase information"]["Extraction engine"] = engine
        riplog["Ripping phase information"]["Defeat audio cache"] = cache_defeat
        riplog["Ripping phase information"]["Read offset correction"] = ripResult.offset
        riplog["Ripping phase information"]["Overread into lead-out"] = overread
        # Fully works only using patched cdparanoia package
        # "Fill up missing offset samples with silence": "Yes"
        riplog["Ripping phase information"]["Gap detection"] = "cdrdao %s" % ripResult.cdrdaoVersion
        riplog["Ripping phase information"]["CD-R detected"] = ripResult.isCdr is True
        # ... rest of the log information

and then just mandate PyYAML > 5.1 which you can then easily do:

yaml.dump(data, default_flow_style=False, sort_keys=False)

or if you wish to support older versions, then you'll have to do something like https://stackoverflow.com/a/16782282. ruamel.yaml I think has natively supported OrderedDict from it's initial release.

@JoeLametta
Copy link
Collaborator

Supplanted by #415?

@JoeLametta JoeLametta closed this Oct 21, 2019
@JoeLametta JoeLametta deleted the task/use-yaml-for-logging branch December 19, 2019 09:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Improvement Minor improvement to code Priority: low Low priority Status: in progress Issue/pull request which is currently being worked on
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants