Replies: 1 comment 2 replies
Hi, I'm open to improving the encoding handling. I know that dealing with multiple encodings is not a strong point of XNEdit. The original NEdit was never designed for that, so I took a practical approach and decided to use UTF-8 internally. Therefore any non-UTF-8 file is converted on load. Adding a warning when a file containing the U+FFFD Unicode replacement character is saved sounds like a good idea to me, and it is very easy to implement. However, encoding errors might not be the only problem: if the source encoding is guessed wrong, all characters may convert without errors, yet the resulting text is simply incorrect. Adding an option to not convert text to UTF-8 at all might be possible, but it would require some work. It would also have a small performance impact, because I need a Unicode string for text rendering.
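For reference, the kind of check I have in mind would be roughly like this (a minimal sketch, not actual XNEdit code; the function name is made up). U+FFFD encoded as UTF-8 is the byte sequence EF BF BD, so the save path could scan the buffer for it before writing and show a confirmation dialog if it is found:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* U+FFFD encoded in UTF-8 is the byte sequence EF BF BD. */
static const char REPLACEMENT_UTF8[] = "\xEF\xBF\xBD";

/*
 * Sketch: scan the (UTF-8) text buffer before saving and return true
 * if it contains at least one replacement character, so the caller
 * can ask "this file contains characters that could not be
 * converted - save anyway?" before writing.
 */
static bool contains_replacement_char(const char *text, size_t len)
{
    for (size_t i = 0; i + 3 <= len; i++) {
        if (memcmp(text + i, REPLACEMENT_UTF8, 3) == 0)
            return true;
    }
    return false;
}
```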
Hello.
First, thank you very much for making xnedit UTF-8 aware.
Now I was wondering if you could add a default behaviour that prevents files from (accidentally) being saved when xnedit opened them in UTF-8 mode and encountered "unknown characters".
The problem is: when all characters in the 128 .. 255 range are replaced by a single character (probably U+FFFD) on reading, information is lost. If the file is saved afterwards, there is no automated way to get this information back. The only way is to manually look up every occurrence of the U+FFFD symbol, read the text around it and type the correct UTF-8 character or ANSI character (depending on the ultimately desired file format).
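To make the irreversibility concrete, here is a tiny standalone illustration (not xnedit code, just the byte-level effect). Two different input bytes both end up as the same replacement character, so saving the converted text cannot restore them:

```c
#include <stdio.h>

int main(void)
{
    /* "Grüße" in ISO-8859-1: the bytes 0xFC (ü) and 0xDF (ß) are not
     * valid UTF-8 here, so a replacing converter turns each of them
     * into U+FFFD, i.e. the byte sequence EF BF BD. */
    const unsigned char latin1[] = { 'G', 'r', 0xFC, 0xDF, 'e' };

    /* Distinct input bytes map to the same replacement character. */
    printf("0x%02X -> EF BF BD\n", latin1[2]);
    printf("0x%02X -> EF BF BD\n", latin1[3]);
    return 0;
}
```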
In pre-existing German, French or other texts this can introduce thousands of errors and require either the good luck of noticing the problem immediately (and having good backups), days of rescue work, or very bad surprises later on, all caused by a single Ctrl-S.
The display of the charset selection / Reload buttons in the top right area is not sufficient to prevent this harmful side effect of UTF-8 use.
I experience this behaviour of UTF-8 applications as an extremely harmful approach. It is really annoying and gives me a tendency to fall back to ASCII only wherever I can :-( because after the "forced UTF-8 revolution", ASCII is now apparently the only character set which is both universally supported AND does not come with tools actively crippling (!) existing old files.
In my opinion, when an interpreter does not understand a character, it may display this problem - but it should never change the actual file content - especially not in a way that causes information loss and is therefore irreversible.
So if xnedit uses a (third-party?) read-from-file approach that replaces unknown character codes on the fly, losing information in the process, it would be helpful if it locked any file-save functionality as soon as it encounters the first "unknown character", until the user expressly states that he wants the information loss written back to the file. Even better would be to not replace unknown characters in the edited data at all, but to use the U+FFFD character only for display purposes and not write it back to the file upon saving.
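As a rough sketch of the first variant (names invented, not based on xnedit's actual internals): the loader could count replacements while converting, and the save path could refuse, or ask for confirmation, as long as that count is non-zero and the user has not acknowledged the loss:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-document state, filled in while the file is
 * converted to UTF-8 on load. */
typedef struct {
    size_t replacement_count;   /* how many bytes became U+FFFD on load */
    bool   loss_acknowledged;   /* user explicitly allowed lossy saving */
} DocEncodingState;

/* Called by the (hypothetical) loader for every byte it cannot convert. */
static void note_replacement(DocEncodingState *st)
{
    st->replacement_count++;
}

/* Called by the save path: only allow writing if nothing was lost,
 * or if the user has explicitly confirmed the loss. */
static bool save_allowed(const DocEncodingState *st)
{
    return st->replacement_count == 0 || st->loss_acknowledged;
}
```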
If you think these considerations might not be of interest to anybody else, I might also try to implement this myself. But I think that anybody handling a mixture of old and new files might be happy about this feature. So thank you very much for considering my suggestions.
Thank you and kind regards, Joerg