-
-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More extensible syntax mapping mechanism #2755
Conversation
@keith-hall would you please take a look at |
I don't see any problems with this approach 👍 |
I'll leave it here for a week or two for people to take a look. If there are no major objections I will go ahead with the implementation. |
I like the idea in general. My main concern is startup time, but in the linked issues you are talking about doing clever stuff at compile time, which has the potential to even improve startup time. I don't think we can at this point "we promise to include your solution in bat", but I am very interested in looking at a prototype and evaluating that, then we can decide on the next steps. I hope am not sounding discouraging! Would be great if we could solve syntax mapping more elegantly. |
I was investigating the possibility of doing this with proc macros, because I suspected that would make the code somewhat more elegant. In conclusion I decided that using a build script is probably still the way to go (at least for the moment). This comment is just to document my research; I will add more info if they are worthy of documenting.
|
5c863c1
to
f8349a8
Compare
Rebased onto #2756. |
34567fe
to
e856d7f
Compare
Okay, I have an initial prototype of the codegen part working. Right now it makes something like this: // I formatted it manually to make it easier for the eye
static STATIC_RULES: [(&str, MappingTarget); 6] = [
(
r#"${XDG_CONFIG_HOME}/git/config"#,
MappingTarget::MapTo(r#"Git Config"#),
),
(
r#"${HOME}/.config/git/config"#,
MappingTarget::MapTo(r#"Git Config"#),
),
(
r#"${XDG_CONFIG_HOME}/git/ignore"#,
MappingTarget::MapTo(r#"Git Ignore"#),
),
(
r#"${HOME}/.config/git/ignore"#,
MappingTarget::MapTo(r#"Git Ignore"#),
),
(
r#"${XDG_CONFIG_HOME}/git/attributes"#,
MappingTarget::MapTo(r#"Git Attributes"#),
),
(
r#"${HOME}/.config/git/attributes"#,
MappingTarget::MapTo(r#"Git Attributes"#),
),
]; I'm thinking, I can actually take this one step further by moving the string searches to compile time. So instead of storing a string as the matcher, we store something like this: enum CompileTimeMatcher {
Static(&'static str),
Dynamic(Vec<MatcherSegment>),
}
// ... where
enum MatcherSegment {
Text(&'static str),
Env(&'static str),
} I assume this is faster, but the benefit might be minimal. |
Okay, I think I have a decent implementation of the aforementioned idea. This is what the generated code currently looks like: // formatted
static STATIC_RULES: [(BuiltinMatcher, MappingTarget); 7] = [
(
BuiltinMatcher::Dynamic(Lazy::new(|| {
join_segments(&[
MatcherSegment::Env(r#"XDG_CONFIG_HOME"#),
MatcherSegment::Text(r#"/git/config"#),
])
})),
MappingTarget::MapTo(r#"Git Config"#),
),
(
BuiltinMatcher::Dynamic(Lazy::new(|| {
join_segments(&[
MatcherSegment::Env(r#"HOME"#),
MatcherSegment::Text(r#"/.config/git/config"#),
])
})),
MappingTarget::MapTo(r#"Git Config"#),
),
(
BuiltinMatcher::Dynamic(Lazy::new(|| {
join_segments(&[
MatcherSegment::Env(r#"XDG_CONFIG_HOME"#),
MatcherSegment::Text(r#"/git/ignore"#),
])
})),
MappingTarget::MapTo(r#"Git Ignore"#),
),
(
BuiltinMatcher::Dynamic(Lazy::new(|| {
join_segments(&[
MatcherSegment::Env(r#"HOME"#),
MatcherSegment::Text(r#"/.config/git/ignore"#),
])
})),
MappingTarget::MapTo(r#"Git Ignore"#),
),
(
BuiltinMatcher::Dynamic(Lazy::new(|| {
join_segments(&[
MatcherSegment::Env(r#"XDG_CONFIG_HOME"#),
MatcherSegment::Text(r#"/git/attributes"#),
])
})),
MappingTarget::MapTo(r#"Git Attributes"#),
),
(
BuiltinMatcher::Dynamic(Lazy::new(|| {
join_segments(&[
MatcherSegment::Env(r#"HOME"#),
MatcherSegment::Text(r#"/.config/git/attributes"#),
])
})),
MappingTarget::MapTo(r#"Git Attributes"#),
),
(
BuiltinMatcher::Fixed(r#"*.conf"#),
MappingTarget::MapExtensionToUnknown,
),
]; The downside is that this uses a couple of new types and functions, which doesn't feel quite right wherever I put them:
Ended up going with option 1. I guess it's just a minor nuisance at the end of the day. It's worth it if there are tangible performance improvements. |
So I fat-fingered tab in the middle of typing, right before I hit space... And that closed the PR 🤦. What a UI design disaster. |
I previously said I'll leave it for a week or two, but I had time today so I thought I'd just do it. Maintainers, please don't hesitate to ask for big changes if you think they are warranted. |
3f0a772
to
cbe755f
Compare
Rebased onto |
I skimmed over the code and it looks like you know what you are doing. Nothing jumped out at me as a red flag. I didn't look very close at the code though. I don't want to risk wasting your time, but how much work would it be to port all of the existing syntax mappings to the new system? So that we can see how it impacts performance. Or maybe you can partly port it? So that half of syntax mappings use new and half use old. Maybe that is enough to git a sense for how this impacts performance. To check performance, do this (approximate instructions):
|
Thanks for the compliment. I'm not ready to start porting just yet - currently the codegen is working, but the As of the porting, there's not actually that much to port. Maybe 20 files (which can be done in like 30 minutes). I'll run the benchmark after it's all done. |
Impl is all done. Tests are failing because mappings haven't been ported. Will do said porting tomorrow. |
7127f0e
to
fc8eb15
Compare
I ran the benchmark like so (using patch in #2768): cd "$(git rev-parse --show-toplevel)/tests/benchmarks"
rm -rf results
mkdir -p results/{pre,post}
git checkout master
cargo build -r
RUN_COUNT=50 ./run-benchmarks.sh --release
cp benchmark-results/report.md results/pre/
git checkout syntax-mapping-refactor
cargo build -r
RUN_COUNT=50 ./run-benchmarks.sh --release
cp benchmark-results/report.md results/post/
bat results/*/*.md Current
|
Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
---|---|---|---|---|
bat |
3.9 ± 0.9 | 2.4 | 7.8 | 1.00 |
bat … small-CpuInfo-file.cpuinfo |
20.4 ± 1.7 | 18.6 | 28.4 | 1.00 |
bat … small-Markdown-file.md |
25.4 ± 1.8 | 23.5 | 34.1 | 1.00 |
bat … --language=txt numpy_test_multiarray.py |
5.4 ± 2.4 | 3.7 | 43.3 | 1.00 |
bat … grep-output-ansi-sequences.txt |
38.6 ± 1.8 | 36.3 | 44.0 | 1.00 |
bat … jquery.js |
520.4 ± 5.8 | 514.5 | 533.6 | 1.00 |
bat … miniz.c |
50.1 ± 1.7 | 48.1 | 56.0 | 1.00 |
bat … numpy_test_multiarray.py |
688.9 ± 5.0 | 683.3 | 697.1 | 1.00 |
bat … grep-output-ansi-sequences.txt |
36.1 ± 2.3 | 33.3 | 45.2 | 1.00 |
bat … jquery.js |
515.6 ± 5.0 | 511.1 | 527.1 | 1.00 |
bat … miniz.c |
49.6 ± 2.3 | 47.6 | 59.6 | 1.00 |
bat … numpy_test_multiarray.py |
688.0 ± 6.1 | 680.7 | 697.3 | 1.00 |
bat … --language=txt *.txt |
4.6 ± 1.0 | 3.3 | 8.8 | 1.00 |
syntax-mapping-refactor
Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
---|---|---|---|---|
bat |
2.2 ± 0.7 | 1.1 | 5.6 | 1.00 |
bat … small-CpuInfo-file.cpuinfo |
20.3 ± 1.6 | 18.6 | 28.2 | 1.00 |
bat … small-Markdown-file.md |
25.9 ± 1.6 | 24.3 | 31.7 | 1.00 |
bat … --language=txt numpy_test_multiarray.py |
3.6 ± 1.1 | 2.2 | 8.3 | 1.00 |
bat … grep-output-ansi-sequences.txt |
38.3 ± 1.4 | 36.6 | 45.6 | 1.00 |
bat … jquery.js |
522.3 ± 4.7 | 517.0 | 532.5 | 1.00 |
bat … miniz.c |
48.9 ± 2.5 | 46.4 | 59.8 | 1.00 |
bat … numpy_test_multiarray.py |
690.9 ± 4.2 | 685.4 | 700.7 | 1.00 |
bat … grep-output-ansi-sequences.txt |
36.4 ± 1.8 | 34.1 | 42.3 | 1.00 |
bat … jquery.js |
517.1 ± 5.6 | 513.0 | 532.4 | 1.00 |
bat … miniz.c |
48.8 ± 1.8 | 46.1 | 53.6 | 1.00 |
bat … numpy_test_multiarray.py |
689.0 ± 3.7 | 683.8 | 694.9 | 1.00 |
bat … --language=txt *.txt |
3.1 ± 1.1 | 1.7 | 7.8 | 1.00 |
Unsurprisingly, the simpler the task, the bigger the speedup. For heavier tasks there is barely a difference.
One thing I was somewhat concerned about was a possible increase of the binary size due to the pervasive use of cd "$(git rev-parse --show-toplevel)"
rm -rf res
mkdir -p res/{pre,post}
git checkout master
cargo build -r
cp target/release/bat res/pre/
git checkout syntax-mapping-refactor
cargo build -r
cp target/release/bat res/post/
du --summarize --bytes res/*/bat
Eh. 20KB difference seems fine. |
This is what the codegen looks like right now (Linux). I think this is pretty much what we will end up with: /// Generated by build script from /src/syntax_mapping/builtins/.
pub(crate) static BUILTIN_MAPPINGS: [(Lazy<Option<GlobMatcher>>, MappingTarget); 56] = [
(Lazy::new(|| Some(build_matcher_fixed(r#"httpd.conf"#))), MappingTarget::MapTo(r#"Apache Conf"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"/etc/apache2/**/*.conf"#))), MappingTarget::MapTo(r#"Apache Conf"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"/etc/apache2/sites-*/**/*"#))), MappingTarget::MapTo(r#"Apache Conf"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"**/bat/config"#))), MappingTarget::MapTo(r#"Bourne Again Shell (bash)"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"Containerfile"#))), MappingTarget::MapTo(r#"Dockerfile"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"*.h"#))), MappingTarget::MapTo(r#"C++"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#".clang-format"#))), MappingTarget::MapTo(r#"YAML"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"*.fs"#))), MappingTarget::MapTo(r#"F#"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"fish_history"#))), MappingTarget::MapTo(r#"YAML"#)),
(Lazy::new(|| build_matcher_dynamic(&[MatcherSegment::Env(r#"XDG_CONFIG_HOME"#), MatcherSegment::Text(r#"/git/config"#)])), MappingTarget::MapTo(r#"Git Config"#)),
(Lazy::new(|| build_matcher_dynamic(&[MatcherSegment::Env(r#"HOME"#), MatcherSegment::Text(r#"/.config/git/config"#)])), MappingTarget::MapTo(r#"Git Config"#)),
(Lazy::new(|| build_matcher_dynamic(&[MatcherSegment::Env(r#"XDG_CONFIG_HOME"#), MatcherSegment::Text(r#"/git/ignore"#)])), MappingTarget::MapTo(r#"Git Ignore"#)),
(Lazy::new(|| build_matcher_dynamic(&[MatcherSegment::Env(r#"HOME"#), MatcherSegment::Text(r#"/.config/git/ignore"#)])), MappingTarget::MapTo(r#"Git Ignore"#)),
(Lazy::new(|| build_matcher_dynamic(&[MatcherSegment::Env(r#"XDG_CONFIG_HOME"#), MatcherSegment::Text(r#"/git/attributes"#)])), MappingTarget::MapTo(r#"Git Attributes"#)),
(Lazy::new(|| build_matcher_dynamic(&[MatcherSegment::Env(r#"HOME"#), MatcherSegment::Text(r#"/.config/git/attributes"#)])), MappingTarget::MapTo(r#"Git Attributes"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"*.jsonl"#))), MappingTarget::MapTo(r#"JSON"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"*.ksh"#))), MappingTarget::MapTo(r#"Bourne Again Shell (bash)"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"/var/spool/mail/*"#))), MappingTarget::MapTo(r#"Email"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"/var/mail/*"#))), MappingTarget::MapTo(r#"Email"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"nginx.conf"#))), MappingTarget::MapTo(r#"nginx"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"mime.types"#))), MappingTarget::MapTo(r#"nginx"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"/etc/nginx/**/*.conf"#))), MappingTarget::MapTo(r#"nginx"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"/etc/nginx/sites-*/**/*"#))), MappingTarget::MapTo(r#"nginx"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"*.nse"#))), MappingTarget::MapTo(r#"Lua"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"/etc/os-release"#))), MappingTarget::MapTo(r#"Bourne Again Shell (bash)"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"/usr/lib/os-release"#))), MappingTarget::MapTo(r#"Bourne Again Shell (bash)"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"/etc/initrd-release"#))), MappingTarget::MapTo(r#"Bourne Again Shell (bash)"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"/usr/lib/extension-release.d/extension-release.*"#))), MappingTarget::MapTo(r#"Bourne Again Shell (bash)"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"/usr/share/libalpm/hooks/*.hook"#))), MappingTarget::MapTo(r#"INI"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"/etc/pacman.d/hooks/*.hook"#))), MappingTarget::MapTo(r#"INI"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"*.pac"#))), MappingTarget::MapTo(r#"JavaScript (Babel)"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"*.ron"#))), MappingTarget::MapTo(r#"Rust"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"*.sarif"#))), MappingTarget::MapTo(r#"JSON"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"/etc/profile"#))), MappingTarget::MapTo(r#"Bourne Again Shell (bash)"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"**/.ssh/config"#))), MappingTarget::MapTo(r#"SSH Config"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"**/systemd/**/*.conf"#))), MappingTarget::MapTo(r#"INI"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"**/systemd/**/*.example"#))), MappingTarget::MapTo(r#"INI"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"*.automount"#))), MappingTarget::MapTo(r#"INI"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"*.device"#))), MappingTarget::MapTo(r#"INI"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"*.dnssd"#))), MappingTarget::MapTo(r#"INI"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"*.link"#))), MappingTarget::MapTo(r#"INI"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"*.mount"#))), MappingTarget::MapTo(r#"INI"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"*.netdev"#))), MappingTarget::MapTo(r#"INI"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"*.network"#))), MappingTarget::MapTo(r#"INI"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"*.nspawn"#))), MappingTarget::MapTo(r#"INI"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"*.path"#))), MappingTarget::MapTo(r#"INI"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"*.service"#))), MappingTarget::MapTo(r#"INI"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"*.scope"#))), MappingTarget::MapTo(r#"INI"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"*.slice"#))), MappingTarget::MapTo(r#"INI"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"*.socket"#))), MappingTarget::MapTo(r#"INI"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"*.swap"#))), MappingTarget::MapTo(r#"INI"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"*.target"#))), MappingTarget::MapTo(r#"INI"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"*.timer"#))), MappingTarget::MapTo(r#"INI"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"*.conf"#))), MappingTarget::MapExtensionToUnknown),
(Lazy::new(|| Some(build_matcher_fixed(r#"build"#))), MappingTarget::MapToUnknown),
(Lazy::new(|| Some(build_matcher_fixed(r#"rails"#))), MappingTarget::MapToUnknown)
]; |
Maintainers, can you please take a look at my porting? I've classified the rules according to their applicable platforms; please help check if all mappings are in their correct subdirs. Thanks. |
d90cdb2
to
04ce25c
Compare
Will close #2747.
Checklist