Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skip preview for binary files #62

Merged
merged 1 commit into from
Jan 20, 2025
Merged

Skip preview for binary files #62

merged 1 commit into from
Jan 20, 2025

Conversation

mmrwoods
Copy link
Collaborator

@mmrwoods mmrwoods commented Jan 19, 2025

Previewing binary files as text can mess up the screen, characters can
get rendered outside of the preview window which are not later cleared
when the popups are closed. It also looks pretty awful.

This adds a simple check for a NUL byte in the first 128 bytes, which is
fast and should be enough to detect a binary file. This is similar to
how ripgrep and other grep programs typically detect a binary, though
they may read the entire file. The 128 bytes might seem arbitrary, but
it comes from Leaderf, added in September 2023 and unchanged since then,
so presumably works well enough. Telescope.nvim does things differently,
actually running the file program to detect a mime type, but I wanted to
avoid that, at least until trying the simpler NUL byte check.

Note: as this code reads the first 128 bytes in binary mode and does no
encoding conversion, it will generate false positives for file encodings
that allow NUL bytes, like UTF-16 and UTF-32. This is IMHO an acceptable
trade-off to keep the code simple, Leaderf also detects encodings that
allow NUL bytes as binary, Telescope.nvim renders them as UTF-8 (the
default encoding in Neovim), so with a load of NUL byte marker garbage.
Rendering them in the Fuzzyy preview window would also include NUL byte
marker garbage as the file is read raw, without any encoding conversion.

Also note that this does not currently apply to the grep preview because
all the supported grep programs are configured to skip binaries anyway.

Before
before

After
after

@mmrwoods mmrwoods force-pushed the no-preview-binary branch 6 times, most recently from 9a24f0d to c97ecab Compare January 19, 2025 17:08
@mmrwoods mmrwoods changed the title WIP: Skip preview of binary files Skip preview of binary files Jan 19, 2025
@mmrwoods mmrwoods changed the title Skip preview of binary files Skip preview for binary files Jan 19, 2025
@mmrwoods mmrwoods requested a review from Donaldttt January 19, 2025 17:11
@mmrwoods mmrwoods changed the title Skip preview for binary files WIP: Skip preview for binary files Jan 19, 2025
@mmrwoods
Copy link
Collaborator Author

mmrwoods commented Jan 19, 2025

Still WIP, going to revisit and add checks for UTF-16 and UTF-32 BOMs, that should be good enough (Vim relies on a BOM to detect the encoding and convert to UTF-8, so files without a BOM aren't rendered correctly in Vim anyway)

Did some testing with Leaderf, Telescope and the Fuzzyy preview window... Leaderf and Telescope have similar problems with UTF-16 and UTF-32 encoded files, even with BOMs, though they deal with them slightly differently (Leaderf detects as binary, Telescope tries to render, but as UTF-8, so with a load of NUL byte marker garbage). The Fuzzyy preview window works the same way as Leaderf and Telescope, reading files raw and populating the buffer, so has the same issue trying to render files encoded in a different encoding to the default encoding (before adding the binary check, it will include the NUL byte marker garbage, just as Telescope does, after it will just show the binary file warning). Trying to render files with NUL types as UTF-8 also seems to be very slow, so I think the binary file warning is far better.

@mmrwoods mmrwoods force-pushed the no-preview-binary branch 3 times, most recently from 8ed575c to 169ae8a Compare January 20, 2025 07:43
@mmrwoods mmrwoods changed the title WIP: Skip preview for binary files Skip preview for binary files Jan 20, 2025
@mmrwoods
Copy link
Collaborator Author

Some screenshots showing how this changes the preview of text files encoded with UTF-16 (UTF-32 will be similar). Before the change the preview is littered with NUL byte markers, i.e. ^@. After the change a UTF-16 file is detected as binary.

Before
Screenshot 2025-01-20 at 08 13 36

After
Screenshot 2025-01-20 at 08 14 18

Previewing binary files as text can mess up the screen, characters can
get rendered outside of the preview window which are not later cleared
when the popups are closed. It also looks pretty awful.

This adds a simple check for a NUL byte in the first 128 bytes, which is
fast and should be enough to detect a binary file. This is similar to
how ripgrep and other programs typically detect a binary, though they
may read the entire file. The 128 bytes might seem arbitrary, but it
comes from Leaderf, added in September 2023 and unchanged since then, so
presumably works well enough. Telescope.nvim does things differently,
actually running the file program to detect a mime type, but I wanted to
avoid that, at least until trying the simpler NUL byte check.

Note: as this code reads the first 128 bytes in binary mode and does no
encoding conversion, it will generate false positives for file encodings
that allow NUL bytes, like UTF-16 and UTF-32. This is IMHO an acceptable
trade-off to keep the code simple, Leaderf also detects encodings that
allow NUL bytes as binary, Telescope.nvim renders them as UTF-8, so with
a load of NUL byte marker garbage, git even considers them to be binary.
Rendering them in the Fuzzyy preview window would also include NUL byte
marker garbage as the file is read raw, without any encoding conversion.

Also note that this does not currently apply to the grep preview because
all the supported grep programs are configured to skip binaries anyway.
@mmrwoods mmrwoods merged commit 21759db into main Jan 20, 2025
@mmrwoods mmrwoods deleted the no-preview-binary branch January 20, 2025 14:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants