Support for UTF-8 #42

Closed
MarvinJWendt opened this issue Jun 6, 2021 · 7 comments · Fixed by #45

Comments

@MarvinJWendt

MarvinJWendt commented Jun 6, 2021

Hi, I noticed that some characters are not shown properly:

image

I guess it's the file encoding.

Repository to reproduce: https://github.com/pterm/pterm

@MarvinJWendt
Author

This happens when I show the file with cat:
image

@princjef
Owner

Thanks for the report! I found the file in the repo (linking here so I can quickly refer back): https://github.com/pterm/pterm/blob/9d43bad55166341b6349c3c93cc2e764051d9de6/bigtext_printer.go#L135

I'm betting it's an issue with the escaping logic. Let me work up a test case to see if I can find the problem here.

@princjef
Owner

I created some tests, but they appear to produce the correct output. I also tried cloning pterm and running gomarkdoc . on it, and the output looks correct to me: https://gist.github.com/princjef/ba9265239132c477e438cabd55431186

Can you share some more information about how you're configuring/running gomarkdoc? That might help us get to the bottom of this.

@MarvinJWendt
Author

Hi @princjef, thanks for the reply. I just tried it again with a fresh clone of PTerm, and I still get the wrong characters when I run gomarkdoc . > test.md.

I don't have any custom configuration.

IntelliJ IDEA tells me that the output file is written in UTF-16LE encoding. I think this might be the problem. I am using Windows 10 with PowerShell.
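
For anyone who wants to confirm the encoding without an IDE, here is a minimal Go sketch that checks a file's leading byte order mark (BOM). It assumes the file starts with a BOM, which UTF-16LE files written by Windows tools usually do (bytes FF FE); a missing BOM does not prove the file is UTF-8. The name `detectBOM` is made up for this illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
)

// detectBOM reports which byte order mark, if any, starts b.
// Note: UTF-32LE also begins with FF FE; this sketch ignores UTF-32.
func detectBOM(b []byte) string {
	switch {
	case bytes.HasPrefix(b, []byte{0xEF, 0xBB, 0xBF}):
		return "UTF-8 (with BOM)"
	case bytes.HasPrefix(b, []byte{0xFF, 0xFE}):
		return "UTF-16LE"
	case bytes.HasPrefix(b, []byte{0xFE, 0xFF}):
		return "UTF-16BE"
	default:
		return "no BOM"
	}
}

func main() {
	// With a path argument, inspect that file; otherwise use in-memory samples.
	if len(os.Args) > 1 {
		data, err := os.ReadFile(os.Args[1])
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		fmt.Printf("%s: %s\n", os.Args[1], detectBOM(data))
		return
	}
	fmt.Println(detectBOM([]byte{0xFF, 0xFE, '#', 0x00}))
	fmt.Println(detectBOM([]byte("# plain text")))
}
```

Running it with test.md as the argument should show whether the redirected file carries a UTF-16LE BOM.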

@princjef
Owner

@MarvinJWendt ahh okay, I think PowerShell is the key detail here. I don't have a Windows machine handy to test, but it looks like when piping or redirecting output, PowerShell defaults the encoding to UTF-16: https://automationpie.com/changing-powershells-default-output-encoding-to-utf-8/

From what I'm seeing, it doesn't look like there's any way for me to signal the encoding from the gomarkdoc executable itself to override that behavior. I think there are two options:

  1. Change the encoding in your PowerShell settings as described in the linked page so that data is passed in UTF-8 encoding.
  2. Use the --output option so that gomarkdoc bypasses PowerShell's piping and writes the file directly. This should provide the right encoding for your output file: gomarkdoc --output test.md .

I'd be curious to see if option 2 works for you. If it does, then it may make sense to update the examples to use --output so that people get more consistent cross-platform behavior out of the box.
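
To illustrate why option 2 sidesteps the shell: a program that opens the output file itself puts its bytes on disk unchanged, while bytes sent to stdout are interpreted by whatever sits on the other end. The Go sketch below shows that general pattern only. `writeOutput` and `fileMatches` are hypothetical names for this example and are not taken from gomarkdoc's source.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"path/filepath"
)

// writeOutput writes content to the file at path, or to stdout when path is
// empty. Hypothetical helper; gomarkdoc's real implementation may differ.
func writeOutput(path, content string) error {
	if path == "" {
		// Stdout: the consumer of the stream decides how to decode the bytes.
		_, err := os.Stdout.WriteString(content)
		return err
	}
	// Direct write: the string's bytes (UTF-8 here) land on disk unchanged.
	return os.WriteFile(path, []byte(content), 0644)
}

// fileMatches reports whether the file at path contains exactly want's bytes.
func fileMatches(path, want string) (bool, error) {
	got, err := os.ReadFile(path)
	if err != nil {
		return false, err
	}
	return bytes.Equal(got, []byte(want)), nil
}

func main() {
	const sample = "block: █\n"
	path := filepath.Join(os.TempDir(), "direct-write-demo.md")
	defer os.Remove(path)

	if err := writeOutput(path, sample); err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	same, err := fileMatches(path, sample)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println("bytes on disk match the original:", same)
}
```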

@MarvinJWendt
Author

MarvinJWendt commented Jun 22, 2021

Hi @princjef, sorry for the late reply. Yes, the --output flag seems to work!

And the pipe seems to be the problem. Even as a long-time Windows user, I never experienced this.
I wrote a sample file with the chars that were producing the weird behaviour and piped it into another file. And look at that, the error occurs. So it might be best to update the example :)
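
A rough Go sketch of what may be going on in that experiment, under two assumptions that the thread does not confirm: the shell decodes the UTF-8 bytes with a single-byte code page (Latin-1 is used here as a stand-in for whatever code page is active), and it then writes the result as UTF-16LE with a BOM. The block character U+2588 is only an example; the exact characters from the report are not named in the thread. Both function names are made up for this illustration.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// misdecodeAsLatin1 treats every byte of s as its own character, the way a
// single-byte decoder would. Latin-1 stands in for the real code page.
func misdecodeAsLatin1(s string) string {
	runes := make([]rune, 0, len(s))
	for i := 0; i < len(s); i++ {
		runes = append(runes, rune(s[i]))
	}
	return string(runes)
}

// encodeUTF16LE encodes s as UTF-16LE with a leading byte order mark (FF FE).
func encodeUTF16LE(s string) []byte {
	units := utf16.Encode([]rune(s))
	out := make([]byte, 0, 2+2*len(units))
	out = append(out, 0xFF, 0xFE)
	for _, u := range units {
		out = append(out, byte(u), byte(u>>8)) // low byte first
	}
	return out
}

func main() {
	original := "█" // U+2588, three bytes in UTF-8: E2 96 88
	garbled := misdecodeAsLatin1(original)

	fmt.Printf("UTF-8 bytes:       % x\n", []byte(original))
	fmt.Printf("misdecoded runes:  %U\n", []rune(garbled))
	fmt.Printf("written to file:   % x\n", encodeUTF16LE(garbled))
}
```

Under these assumptions, one character turns into three unrelated ones before the file is written, which would match the garbled output seen after redirection.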

Have a nice day and thanks for the research :)

@princjef
Owner

No worries, thanks for checking it and reporting back!

I honestly wasn't aware of the PowerShell encoding quirk either, so I'm glad you reported the strangeness. I'll get the docs updated so people don't have the same issue in the future.
