Jpeg EXIF comment missing last character #173

Closed
rustygreen opened this issue Jun 20, 2019 · 2 comments · Fixed by #277
Labels
bug, format-exif, help wanted, image-queue (Actionable issue with sample image)

Comments

@rustygreen

I have a JPEG image with an EXIF UserComment, and the last character of the comment shows up as a Unicode block character rather than the actual value "1":
[screenshot]

Here is the image that causes the issue:
0054012NNN01

drewnoakes added the format-exif, help wanted, image-queue (Actionable issue with sample image), and bug labels on Jun 21, 2019
@kwhopper
Collaborator

This is difficult to describe, but there are two things going on here:

  • ExifDescriptorBase.GetUserCommentDescription inspects 10 bytes for the encoding name and strips extra nulls and spaces from those 10, which is not correct. ExifTool always treats this comment header as exactly 8 bytes long, and uses conditionals on nulls and spaces to choose the decoder. Everything in the byte array after those 8 bytes, up to the end, is part of the comment text.
  • In this case the comment bytes are stored big-endian, but there is no BOM (as is usual for these byte arrays) to help decide which decoder to use.
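To make the 8-byte rule concrete, here is a minimal Python sketch (not the metadata-extractor or ExifTool code) that splits a raw UserComment value into its character code and comment bytes. It assumes the character-code IDs defined by the EXIF spec ("ASCII", "JIS", "UNICODE", or 8 nulls for undefined); the function name split_user_comment is hypothetical. Nulls and spaces are trimmed only inside the header, never from the comment bytes.

```python
def split_user_comment(raw: bytes) -> tuple[str, bytes]:
    """Return (character code, comment bytes) for a raw UserComment value.

    Hypothetical helper: the header is always exactly 8 bytes, and every
    byte after it belongs to the comment, even a leading \\0.
    """
    if len(raw) < 8:
        return "UNKNOWN", raw
    header, body = raw[:8], raw[8:]
    # Trim null/space padding only within the 8-byte header. A \0 right after
    # the header may be the high byte of the first UTF-16BE character.
    name = header.rstrip(b"\0 ")
    if name == b"":
        return "UNDEFINED", body
    if name in (b"ASCII", b"JIS", b"UNICODE"):
        return name.decode("ascii"), body
    return "UNKNOWN", body


# Usage: a UNICODE header followed by big-endian UTF-16 text.
code, body = split_user_comment(b"UNICODE\0" + "01".encode("utf-16-be"))
print(code, body)
```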

What happens here is that skipping \0 bytes within the first 10 ends up removing the first \0 byte, which is actually part of the text. Unicode (LE) decoding then appears to work, since it is (apparently) fed LE bytes, but the remaining bytes end one byte short of a complete LE character.
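The failure can be reproduced with a simplified model. The sketch below is hypothetical: decode_like_reported_bug only imitates the behaviour described above (skip nulls/spaces up to byte 10, then decode as UTF-16LE) and is not the actual ExifDescriptorBase code. The comment text "A1" is also made up for illustration.

```python
def decode_like_reported_bug(raw: bytes) -> str:
    """Hypothetical model of the faulty 10-byte header handling."""
    name = raw[:8].rstrip(b"\0 ")  # b"UNICODE"
    i = len(name)
    # Skip nulls/spaces after the name, looking as far as byte 10 instead of 8.
    # This swallows the \0 high byte of the first UTF-16BE character.
    while i < min(10, len(raw)) and raw[i] in (0x00, 0x20):
        i += 1
    # The shifted bytes now look like UTF-16LE, but their count is odd.
    return raw[i:].decode("utf-16-le", errors="replace")


def decode_with_8_byte_header(raw: bytes) -> str:
    """Skip exactly 8 header bytes and decode the rest as UTF-16BE."""
    return raw[8:].decode("utf-16-be", errors="replace")


raw = b"UNICODE\0" + "A1".encode("utf-16-be")  # hypothetical comment "A1"
print(decode_like_reported_bug(raw))   # "A" followed by U+FFFD instead of "1"
print(decode_with_8_byte_header(raw))  # "A1"
```

The lone trailing byte (0x31, the low byte of "1") cannot form a UTF-16 code unit, so the decoder substitutes U+FFFD, which renders as the block character in the report.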

I suspect this 10-byte handling was added to fix other comments stored in BE. The only real way to fix this is to 1) look at only the first 8 bytes, then 2) guess the byte order.
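One possible way to do step 2 is sketched below. This is an assumed heuristic, not ExifTool's or metadata-extractor's logic: honour a BOM if one is present; otherwise, for mostly Latin text, the zero high bytes fall at even offsets in big-endian data and at odd offsets in little-endian data. The function name decode_unicode_comment and the little-endian tie-break are hypothetical choices.

```python
import codecs


def decode_unicode_comment(body: bytes) -> str:
    """Decode UNICODE-coded UserComment bytes (after the 8-byte header),
    guessing the byte order. Heuristic sketch, not a reference algorithm."""
    # A BOM, when present, settles the question.
    if body.startswith(codecs.BOM_UTF16_BE):
        return body[2:].decode("utf-16-be", errors="replace")
    if body.startswith(codecs.BOM_UTF16_LE):
        return body[2:].decode("utf-16-le", errors="replace")
    # Count zero bytes at even vs. odd offsets: Latin text in BE puts the
    # 0x00 high byte first (even offsets), in LE second (odd offsets).
    zeros_at_even = body[0::2].count(0)
    zeros_at_odd = body[1::2].count(0)
    # Assumption: ties (e.g. text with no zero bytes) default to little-endian.
    codec = "utf-16-be" if zeros_at_even > zeros_at_odd else "utf-16-le"
    return body.decode(codec, errors="replace")


print(decode_unicode_comment("A1".encode("utf-16-be")))  # "A1"
print(decode_unicode_comment("A1".encode("utf-16-le")))  # "A1"
```

The heuristic is weak for text with few zero bytes (for example CJK), which is one argument for the alternative in the next comment.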

@kwhopper
Collaborator

Another option is to correct the 10-byte vs. 8-byte issue, then store the header and comment bytes separately in the directory and NOT try to guess the encoding or byte order (or at least not try very hard).

Users would then have access to the actual comment bytes that matter and could try other encodings or byte orders as they saw fit.
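As a rough illustration of this option, the hypothetical RawUserComment type below keeps the 8-byte header and the comment bytes untouched and leaves the choice of codec to the caller. It is a Python sketch only; it does not correspond to any existing metadata-extractor directory or tag API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawUserComment:
    """Hypothetical container: header and comment bytes stored as-is."""
    header: bytes         # the 8-byte character-code field, unmodified
    comment_bytes: bytes  # everything after the header, unmodified

    @classmethod
    def from_raw(cls, raw: bytes) -> "RawUserComment":
        return cls(header=raw[:8], comment_bytes=raw[8:])

    @property
    def character_code(self) -> str:
        # Padding is trimmed for display only; the stored header is unchanged.
        return self.header.rstrip(b"\0 ").decode("ascii", errors="replace")

    def decode(self, codec: str) -> str:
        """Let the caller try whichever encoding/byte order they prefer."""
        return self.comment_bytes.decode(codec, errors="replace")


comment = RawUserComment.from_raw(b"UNICODE\0" + "A1".encode("utf-16-be"))
print(comment.character_code)        # "UNICODE"
print(comment.decode("utf-16-be"))   # "A1"
```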
