Currently hgrep supports only UTF-8 text. It prints UTF-16 text as if it were encoded in UTF-8, which results in badly broken output.
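For illustration, here is a minimal std-only Rust sketch (not hgrep's actual reading code) of what a UTF-8-only reader does with UTF-16LE input. The BOM bytes `FF FE` are invalid UTF-8 and turn into U+FFFD, and every ASCII character is followed by a NUL byte. The function and constant names are hypothetical.

```rust
/// Interpret raw file bytes as UTF-8, the way a UTF-8-only reader would.
/// Invalid sequences (such as the UTF-16 BOM bytes) become U+FFFD, while
/// valid but unintended bytes (the NUL half of each UTF-16 code unit)
/// pass through unchanged.
fn read_as_utf8(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}

/// "hi" encoded as UTF-16LE, preceded by the byte order mark FF FE.
const UTF16LE_HI: [u8; 6] = [0xFF, 0xFE, b'h', 0x00, b'i', 0x00];
```

Here `read_as_utf8(&UTF16LE_HI)` yields two replacement characters followed by `h`, NUL, `i`, NUL.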
ripgrep supports UTF-16 through its `--encoding` option, so technically hgrep can support it too. ripgrep transcodes UTF-16 to UTF-8 in memory, removing the BOM, using `encoding_rs_io::DecodeReaderBytesBuilder`. This means that the byte offsets ripgrep reports for matched regions refer to the transcoded UTF-8 text, not to the original file bytes.

https://github.com/BurntSushi/ripgrep/blob/d922b7ac114c24d6800ae5f79d2967481f380c83/crates/searcher/src/searcher/mod.rs#L720-L744
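A std-only sketch of that transcoding step, assuming UTF-16LE input. It stands in for `encoding_rs_io::DecodeReaderBytesBuilder` and is not ripgrep's code; the helper names are hypothetical. It also shows why the offsets differ: in `héllo world` encoded as UTF-16LE with a BOM, `world` starts at byte 14 of the file but at byte 7 of the transcoded UTF-8 text.

```rust
/// Decode UTF-16LE bytes into a UTF-8 `String`, dropping a leading BOM
/// (FF FE). Unpaired surrogates and a trailing odd byte become U+FFFD.
fn transcode_utf16le(bytes: &[u8]) -> String {
    // Strip the BOM so it does not show up as U+FEFF in the output.
    let body = if bytes.starts_with(&[0xFF, 0xFE]) { &bytes[2..] } else { bytes };
    // Each UTF-16 code unit is two little-endian bytes.
    let chunks = body.chunks_exact(2);
    let has_odd_byte = !chunks.remainder().is_empty();
    let units = chunks.map(|pair| u16::from_le_bytes([pair[0], pair[1]]));
    let mut text: String = char::decode_utf16(units)
        .map(|unit| unit.unwrap_or(char::REPLACEMENT_CHARACTER))
        .collect();
    if has_odd_byte {
        text.push(char::REPLACEMENT_CHARACTER);
    }
    text
}

/// Build test input: `s` encoded as UTF-16LE with a BOM.
fn encode_utf16le_with_bom(s: &str) -> Vec<u8> {
    let mut bytes = vec![0xFF, 0xFE];
    for unit in s.encode_utf16() {
        bytes.extend_from_slice(&unit.to_le_bytes());
    }
    bytes
}
```

Any match offset computed on the returned `String` is an offset into the UTF-8 text, so it cannot be used directly to index the original file.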
hgrep can transcode UTF-16 to UTF-8 in the same way when reading matched files. Currently hgrep reads file contents as-is (hgrep/src/chunk.rs, lines 220 to 223 in 6f49cb0). An `--encoding` (`-E`) option can be added by reading files through `encoding_rs` decoders.
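A minimal std-only sketch of what the `--encoding` handling could look like. The `TextEncoding` enum, `encoding_for_label`, and `decode` are hypothetical; a real implementation would use `encoding_rs::Encoding::for_label` and its decoders, which cover far more labels and encodings. Label matching here trims ASCII whitespace and ignores ASCII case, as the WHATWG Encoding Standard specifies, but only a few labels are listed. BOM handling is deliberately left out of this step.

```rust
/// Encodings this sketch can decode (hypothetical; a real implementation
/// would carry `&'static encoding_rs::Encoding` instead).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TextEncoding {
    Utf8,
    Utf16Le,
    Utf16Be,
}

/// Map an `--encoding` value to an encoding. Like WHATWG label matching,
/// this trims ASCII whitespace and ignores ASCII case, but it only knows
/// a handful of labels.
fn encoding_for_label(label: &str) -> Option<TextEncoding> {
    let label = label
        .trim_matches(|c: char| c.is_ascii_whitespace())
        .to_ascii_lowercase();
    match label.as_str() {
        "utf-8" | "utf8" | "unicode-1-1-utf-8" => Some(TextEncoding::Utf8),
        // Per the WHATWG Encoding Standard, plain "utf-16" means UTF-16LE.
        "utf-16" | "utf-16le" | "unicode" => Some(TextEncoding::Utf16Le),
        "utf-16be" => Some(TextEncoding::Utf16Be),
        _ => None,
    }
}

/// Decode a whole file into UTF-8 text. Malformed input becomes U+FFFD.
/// BOM handling is out of scope here.
fn decode(bytes: &[u8], encoding: TextEncoding) -> String {
    let unit_from_bytes: fn([u8; 2]) -> u16 = match encoding {
        TextEncoding::Utf8 => return String::from_utf8_lossy(bytes).into_owned(),
        TextEncoding::Utf16Le => u16::from_le_bytes,
        TextEncoding::Utf16Be => u16::from_be_bytes,
    };
    let chunks = bytes.chunks_exact(2);
    let has_odd_byte = !chunks.remainder().is_empty();
    let units = chunks.map(|pair| unit_from_bytes([pair[0], pair[1]]));
    let mut text: String = char::decode_utf16(units)
        .map(|unit| unit.unwrap_or(char::REPLACEMENT_CHARACTER))
        .collect();
    if has_odd_byte {
        text.push(char::REPLACEMENT_CHARACTER);
    }
    text
}
```

A caller would look up the label once while parsing arguments, report an error when it returns `None`, and call `decode` on each matched file's bytes before any further processing.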
- Detect a BOM with `Encoding::for_bom` and transcode input to UTF-8
- Add an `--encoding` option which accepts encoding labels (`Encoding::for_label`)
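A std-only sketch of the BOM step. `sniff_bom` mirrors the shape of `Encoding::for_bom`, which returns the detected encoding together with the BOM length (3 bytes for UTF-8, 2 for UTF-16). Letting a BOM take precedence over the `--encoding` value is an assumption of this sketch, as are all the names in it.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TextEncoding {
    Utf8,
    Utf16Le,
    Utf16Be,
}

/// Detect a byte order mark at the start of `bytes`. Returns the encoding
/// it indicates and the BOM length in bytes (hypothetical helper).
fn sniff_bom(bytes: &[u8]) -> Option<(TextEncoding, usize)> {
    if bytes.starts_with(&[0xEF, 0xBB, 0xBF]) {
        Some((TextEncoding::Utf8, 3))
    } else if bytes.starts_with(&[0xFF, 0xFE]) {
        Some((TextEncoding::Utf16Le, 2))
    } else if bytes.starts_with(&[0xFE, 0xFF]) {
        Some((TextEncoding::Utf16Be, 2))
    } else {
        None
    }
}

/// Pick the encoding for one file. Assumption: a BOM wins; otherwise use
/// the encoding requested via `--encoding`, falling back to UTF-8.
/// Returns the chosen encoding and the bytes with any BOM stripped.
fn choose_encoding(bytes: &[u8], requested: Option<TextEncoding>) -> (TextEncoding, &[u8]) {
    match sniff_bom(bytes) {
        Some((encoding, bom_len)) => (encoding, &bytes[bom_len..]),
        None => (requested.unwrap_or(TextEncoding::Utf8), bytes),
    }
}
```

The stripped bytes returned by `choose_encoding` can then be decoded to UTF-8, so the BOM never reaches the printed output.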