
Give more helpful encoding exceptions in pack_sim #545

Merged: mferrera merged 3 commits into equinor:master from packsim-encoding on May 31, 2023


mferrera
Collaborator

mferrera commented May 31, 2023

Resolves #537

  • Added pytest-xdist to parallelize the tests
  • Pinned urllib3<2: urllib3>=2 is pulled in through the dependency tree and is incompatible with the OpenSSL version on RGS machines
  • Gave a more helpful exception message for UTF-8 decoding errors. It roughly mirrors how ERT handles them, but stops at the first invalid character
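A minimal sketch of the third bullet, assuming the file contents have already been read as bytes. The function name `decode_utf8_or_explain`, the choice of `ValueError`, and the message wording are hypothetical; this is not the code merged in this PR. Like the PR, it reports only the first undecodable byte, giving its line and (byte-based) column:

```python
def decode_utf8_or_explain(data: bytes, filename: str = "<data>") -> str:
    """Decode bytes as UTF-8, raising a descriptive error at the first bad byte.

    Hypothetical helper for illustration only.
    """
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as err:
        # Locate the first undecodable byte as a 1-based line and byte column.
        line_start = data.rfind(b"\n", 0, err.start) + 1
        line_no = data.count(b"\n", 0, err.start) + 1
        column = err.start - line_start + 1
        line_end = data.find(b"\n", err.start)
        if line_end == -1:
            line_end = len(data)
        # Show the offending line with undecodable bytes escaped, e.g. \xf8.
        snippet = data[line_start:line_end].decode("utf-8", errors="backslashreplace")
        raise ValueError(
            f"{filename} is not valid UTF-8: byte 0x{data[err.start]:02x} at "
            f"line {line_no}, column {column} ({err.reason}). "
            f"Offending line: {snippet!r}. Convert the file to UTF-8 and try again."
        ) from err
```

Stopping at the first bad byte keeps the message short and points the user at a concrete location to inspect.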

The error message could suggest a specific command to convert the file, but it is probably better not to be responsible if that command damages the file.

mferrera added 2 commits May 31, 2023 11:13
The `requests` package is pulled in through the dependency tree. As of
requests==2.30.0 it uses urllib3>=2 which is incompatible with the
version of OpenSSL found on RGS machines.
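The commit message does not say which OpenSSL version RGS machines have. Assuming the incompatibility is urllib3 2.x's documented requirement of OpenSSL 1.1.1 or newer, a quick check of whether a given Python installation could use urllib3>=2 might look like this (the function and constant names are hypothetical):

```python
import ssl

# Assumption: the RGS incompatibility is urllib3 2.x requiring OpenSSL 1.1.1+
# (urllib3 v2 refuses to import when Python's ssl module uses an older OpenSSL).
MIN_OPENSSL_FOR_URLLIB3_V2 = (1, 1, 1)


def supports_urllib3_v2(version_info: tuple = ssl.OPENSSL_VERSION_INFO) -> bool:
    """Return True if the OpenSSL version tuple meets urllib3 2.x's minimum.

    ssl.OPENSSL_VERSION_INFO is (major, minor, fix, patch, status);
    only the first three fields matter here.
    """
    return tuple(version_info[:3]) >= MIN_OPENSSL_FOR_URLLIB3_V2


if __name__ == "__main__":
    print(ssl.OPENSSL_VERSION)
    if not supports_urllib3_v2():
        print("OpenSSL is too old for urllib3>=2; keep the urllib3<2 pin.")
```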
mferrera requested review from alifbe and rnyb May 31, 2023 10:01

codecov-commenter commented May 31, 2023

Codecov Report

Merging #545 (46f08ba) into master (87a3dde) will increase coverage by 0.04%.
The diff coverage is 89.47%.

❗ Current head 46f08ba differs from pull request most recent head c9818c7. Consider uploading reports for the commit c9818c7 to get more accurate results

@@            Coverage Diff             @@
##           master     #545      +/-   ##
==========================================
+ Coverage   86.17%   86.21%   +0.04%     
==========================================
  Files          49       49              
  Lines        7025     7046      +21     
==========================================
+ Hits         6054     6075      +21     
  Misses        971      971              
Impacted Files                       Coverage Δ
src/subscript/pack_sim/pack_sim.py   90.10% <89.47%> (+0.82%) ⬆️


mferrera force-pushed the packsim-encoding branch from 4ae73c7 to 341f414 May 31, 2023 10:43
Collaborator

alifbe left a comment


@mferrera very nice 👍

Do we need to add this check to inspect_file and _md5checksum as well?

Also, would it be nice (or needed) to print the detected encoding for the end user? I see that komodo has the chardet library installed, which can guess a file's encoding; alternatively there is `file -ib`.
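A best-effort version of this suggestion could try strict UTF-8 first and only then fall back to chardet, if it is installed. The helper name `guess_encoding` is hypothetical; `chardet.detect` is chardet's real entry point and returns a dict whose `"encoding"` entry may be `None` when it cannot decide:

```python
from typing import Optional


def guess_encoding(data: bytes) -> Optional[str]:
    """Best-effort encoding guess (hypothetical helper).

    Returns "utf-8" if the bytes decode strictly as UTF-8 (this includes pure
    ASCII), otherwise chardet's guess if chardet is installed, else None.
    """
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    try:
        import chardet  # optional dependency
    except ImportError:
        return None
    # chardet.detect returns e.g. {"encoding": ..., "confidence": ..., "language": ...}
    return chardet.detect(data)["encoding"]
```

Any such guess is a heuristic: ISO-8859-1, for example, can decode every possible byte sequence, so a "successful" guess does not prove the file was written in that encoding.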

mferrera force-pushed the packsim-encoding branch from 341f414 to c9818c7 May 31, 2023 14:04
mferrera
Collaborator, Author

I don't think we need to check in inspect_file or _md5checksum. A root .DATA file containing an invalid character never reaches inspect_file, but INCLUDEd files may contain invalid characters. The script already checks for this, but it concludes that such files are binary and copies them without further processing, so it won't fail (though maybe it should). I added another test case for this.
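To illustrate the "maybe it should" option: a hypothetical classifier that separates likely mis-encoded text from genuinely binary data might look like this. It is not pack_sim's actual logic, and treating NUL bytes as the binary marker is an assumption (an imperfect one: UTF-16 text would also be flagged as binary):

```python
def classify_include(data: bytes) -> str:
    """Hypothetical classifier for an INCLUDEd file's raw bytes.

    Returns "text" for valid UTF-8, "binary" if the data contains NUL bytes,
    and "non-utf8-text" otherwise (likely text in another encoding, e.g. Latin-1).
    """
    # NUL bytes are a common, imperfect marker of binary data.
    if b"\x00" in data:
        return "binary"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # No NUL bytes, yet not UTF-8: probably text in a different encoding.
        return "non-utf8-text"
    return "text"
```

A caller could then warn or fail on `"non-utf8-text"` instead of silently copying the file unchanged.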

There are several possible improvements that could address possibly unintentional logic like this in the script, but they are probably out of scope here.

I'm not sure most users would gain much actionable insight if we told them the encoding. My guess is that the vast majority would find "ISO-8859-1" fairly meaningless 😅, but if you think it's a good idea I'm happy to add it.

alifbe self-requested a review May 31, 2023 14:26
mferrera force-pushed the packsim-encoding branch from c9818c7 to 6b59885 May 31, 2023 14:28
Collaborator

alifbe left a comment


@mferrera looks good to me.

Regarding printing the detected encoding, I'm not sure either. I saw in a Slack channel that users can convert the file encoding with the iconv command, specifying which encoding to convert from.
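For reference, a Python equivalent of such an iconv conversion (e.g. `iconv -f ISO-8859-1 -t UTF-8`) might look like this. The function name, the `.utf8` output suffix, and the default source encoding are assumptions for illustration; pack_sim itself deliberately does not suggest a conversion:

```python
from pathlib import Path


def convert_to_utf8(path: Path, source_encoding: str = "iso-8859-1") -> Path:
    """Re-encode a text file to UTF-8, similar to `iconv -f <enc> -t UTF-8`.

    Writes a new file with a ".utf8" suffix instead of overwriting the original,
    since a wrong source_encoding guess would silently garble characters.
    """
    # Work on raw bytes so line endings (e.g. \r\n) are preserved exactly.
    # ISO-8859-1 maps every byte, so decoding never fails even if the guess is
    # wrong; encodings such as cp1252 can raise UnicodeDecodeError instead.
    text = path.read_bytes().decode(source_encoding)
    target = path.with_name(path.name + ".utf8")
    target.write_bytes(text.encode("utf-8"))
    return target
```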


mferrera merged commit 0c80ede into equinor:master May 31, 2023
mferrera deleted the packsim-encoding branch May 31, 2023 14:46
Linked issue: pack_sim issues with none UTF files
3 participants