Line ending detection #224

Merged
merged 28 commits into from
Jun 22, 2021
28 commits
3756c21 rebase on branch line_ending_detection (janhrastnik, Jun 16, 2021)
17f69a0 ran cargo clippy and cargo fmt (janhrastnik, Jun 11, 2021)
5eb6918 resolved conflict in rebase (janhrastnik, Jun 16, 2021)
9c419fe added more changes from pr review for line_ending_detection (janhrastnik, Jun 16, 2021)
e4849f4 fix typo (janhrastnik, Jun 13, 2021)
a9a718c added some tests and a line_ending helper function in document.rs (janhrastnik, Jun 13, 2021)
a4f5a01 trying out line ending helper functions in commands.rs (janhrastnik, Jun 14, 2021)
7cf0fa0 doc.line_ending() now returns &'static str (janhrastnik, Jun 16, 2021)
9c3eadb fixed some problems from rebasing (janhrastnik, Jun 16, 2021)
8bccd6d applied changes from pr review (janhrastnik, Jun 17, 2021)
ecb884d added get_line_ending from pr comment (janhrastnik, Jun 19, 2021)
97323dc ran cargo fmt (janhrastnik, Jun 19, 2021)
cdd9347 Merge remote-tracking branch 'origin/master' into line_ending_detection (janhrastnik, Jun 19, 2021)
1e80fbb fix merge issue (janhrastnik, Jun 19, 2021)
701eb0d changed some hardcoded newlines, removed a else if in line_ending.rs (janhrastnik, Jun 19, 2021)
8634e04 added the line_end helper function (janhrastnik, Jun 20, 2021)
5d22e3c Misc fixes and clean up of line ending detect code. (cessen, Jun 20, 2021)
4efd671 Work on moving code over to LineEnding instead of assuming '\n'. (cessen, Jun 20, 2021)
e686c3e Merge branch 'master' of github.com:helix-editor/helix into line_endi… (cessen, Jun 20, 2021)
3d3149e Silence clippy warning. (cessen, Jun 20, 2021)
7140020 Don't need getters/setters for line_ending property. (cessen, Jun 21, 2021)
07e2880 Add function to get the line ending of a str slice. (cessen, Jun 21, 2021)
23d6188 Update `replace` command to use document line ending setting. (cessen, Jun 21, 2021)
e436c30 Make split_selection_on_newline command handle all line endings. (cessen, Jun 21, 2021)
d333556 Convert remaining commands to use the document's line ending setting. (cessen, Jun 21, 2021)
7c4fa18 Fix clippy warnings. (cessen, Jun 21, 2021)
a18d50b Add command to set the document's default line ending. (cessen, Jun 21, 2021)
f2954fa Flesh out the line ending utility unit tests. (cessen, Jun 21, 2021)
1 change: 1 addition & 0 deletions Cargo.lock


2 changes: 1 addition & 1 deletion helix-core/src/auto_pairs.rs
@@ -12,7 +12,7 @@ pub const PAIRS: &[(char, char)] = &[
     ('`', '`'),
 ];

-const CLOSE_BEFORE: &str = ")]}'\":;> \n"; // includes space and newline
+const CLOSE_BEFORE: &str = ")]}'\":;> \n\r\u{000B}\u{000C}\u{0085}\u{2028}\u{2029}"; // includes space and newlines

 // insert hook:
 // Fn(doc, selection, char) => Option<Transaction>
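
Note on the `CLOSE_BEFORE` change: the widened set only matters at the point where the auto-pair hook looks at the character after the cursor. The rest of `auto_pairs.rs` is not shown in this hunk, so the following is a minimal sketch of that kind of check, not the code in the PR. The helper `should_close` and its treatment of end-of-document are assumptions made for illustration.

```rust
/// Same character set as the new `CLOSE_BEFORE` in the hunk above.
const CLOSE_BEFORE: &str = ")]}'\":;> \n\r\u{000B}\u{000C}\u{0085}\u{2028}\u{2029}";

/// Hypothetical helper (not from the PR): decide whether a closing
/// character may be auto-inserted, given the character that follows
/// the cursor. `None` stands for "end of document", which this sketch
/// assumes is always a valid place to close.
fn should_close(next: Option<char>) -> bool {
    match next {
        None => true,
        // `str::contains` accepts a `char` pattern.
        Some(ch) => CLOSE_BEFORE.contains(ch),
    }
}
```

With the old set, `should_close(Some('\r'))` would have been false, so a pair typed at the end of a CRLF-terminated line would not have been closed.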
122 changes: 107 additions & 15 deletions helix-core/src/chars.rs
@@ -1,25 +1,44 @@
-/// Determine whether a character is a line break.
-pub fn char_is_linebreak(c: char) -> bool {
-    matches!(
-        c,
-        '\u{000A}' | // LineFeed
-        '\u{000B}' | // VerticalTab
-        '\u{000C}' | // FormFeed
-        '\u{000D}' | // CarriageReturn
-        '\u{0085}' | // NextLine
-        '\u{2028}' | // Line Separator
-        '\u{2029}' // ParagraphSeparator
-    )
+use crate::LineEnding;
+
+#[derive(Debug, Eq, PartialEq)]
+pub enum CharCategory {
+    Whitespace,
+    Eol,
+    Word,
+    Punctuation,
+    Unknown,
+}
+
+#[inline]
+pub fn categorize_char(ch: char) -> CharCategory {
+    if char_is_line_ending(ch) {
+        CharCategory::Eol
+    } else if ch.is_whitespace() {
+        CharCategory::Whitespace
+    } else if char_is_word(ch) {
+        CharCategory::Word
+    } else if char_is_punctuation(ch) {
+        CharCategory::Punctuation
+    } else {
+        CharCategory::Unknown
+    }
+}
+
+/// Determine whether a character is a line ending.
+#[inline]
+pub fn char_is_line_ending(ch: char) -> bool {
+    LineEnding::from_char(ch).is_some()
 }

 /// Determine whether a character qualifies as (non-line-break)
 /// whitespace.
-pub fn char_is_whitespace(c: char) -> bool {
+#[inline]
+pub fn char_is_whitespace(ch: char) -> bool {
     // TODO: this is a naive binary categorization of whitespace
     // characters. For display, word wrapping, etc. we'll need a better
     // categorization based on e.g. breaking vs non-breaking spaces
     // and whether they're zero-width or not.
-    match c {
+    match ch {
         //'\u{1680}' | // Ogham Space Mark (here for completeness, but usually displayed as a dash, not as whitespace)
         '\u{0009}' | // Character Tabulation
         '\u{0020}' | // Space
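
Note on the branch order in `categorize_char`: `char::is_whitespace` returns true for every line-break character handled here, so the `Eol` check has to come first or those characters would be reported as `Whitespace`. The sketch below shows this with std only. `LineEnding` is defined in `line_ending.rs`, which is not part of this excerpt, so the enum, its variant names, and the body of `from_char` are stand-ins assumed for illustration. `Category` and `categorize` are likewise reduced, hypothetical versions of the names in the diff.

```rust
/// Stand-in for `helix_core::LineEnding`. The real type lives in
/// `line_ending.rs` (not shown here); variant names are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LineEnding {
    Lf,
    Vt,
    Ff,
    Cr,
    Nel,
    Ls,
    Ps,
}

impl LineEnding {
    /// Assumed behaviour: map a single line-break character to a variant.
    fn from_char(ch: char) -> Option<LineEnding> {
        match ch {
            '\u{000A}' => Some(LineEnding::Lf),
            '\u{000B}' => Some(LineEnding::Vt),
            '\u{000C}' => Some(LineEnding::Ff),
            '\u{000D}' => Some(LineEnding::Cr),
            '\u{0085}' => Some(LineEnding::Nel),
            '\u{2028}' => Some(LineEnding::Ls),
            '\u{2029}' => Some(LineEnding::Ps),
            _ => None,
        }
    }
}

/// Same one-line body as `char_is_line_ending` in the diff.
fn char_is_line_ending(ch: char) -> bool {
    LineEnding::from_char(ch).is_some()
}

/// Reduced, hypothetical version of `CharCategory`.
#[derive(Debug, PartialEq, Eq)]
enum Category {
    Eol,
    Whitespace,
    Other,
}

/// Line endings are tested before general whitespace, as in the diff.
fn categorize(ch: char) -> Category {
    if char_is_line_ending(ch) {
        Category::Eol
    } else if ch.is_whitespace() {
        Category::Whitespace
    } else {
        Category::Other
    }
}
```

Swapping the first two branches would make `categorize('\n')` return `Category::Whitespace`, since `'\n'.is_whitespace()` is true.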
@@ -34,8 +53,81 @@ pub fn char_is_whitespace(c: char) -> bool {
         // En Quad, Em Quad, En Space, Em Space, Three-per-em Space,
         // Four-per-em Space, Six-per-em Space, Figure Space,
         // Punctuation Space, Thin Space, Hair Space, Zero Width Space.
-        c if ('\u{2000}' ..= '\u{200B}').contains(&c) => true,
+        ch if ('\u{2000}' ..= '\u{200B}').contains(&ch) => true,

         _ => false,
     }
 }
+
+#[inline]
+pub fn char_is_punctuation(ch: char) -> bool {
+    use unicode_general_category::{get_general_category, GeneralCategory};
+
+    matches!(
+        get_general_category(ch),
+        GeneralCategory::OtherPunctuation
+            | GeneralCategory::OpenPunctuation
+            | GeneralCategory::ClosePunctuation
+            | GeneralCategory::InitialPunctuation
+            | GeneralCategory::FinalPunctuation
+            | GeneralCategory::ConnectorPunctuation
+            | GeneralCategory::DashPunctuation
+            | GeneralCategory::MathSymbol
+            | GeneralCategory::CurrencySymbol
+            | GeneralCategory::ModifierSymbol
+    )
+}
+
+#[inline]
+pub fn char_is_word(ch: char) -> bool {
+    ch.is_alphanumeric() || ch == '_'
+}
+
+#[cfg(test)]
+mod test {
+    use super::*;
+
+    #[test]
+    fn test_categorize() {
+        const EOL_TEST_CASE: &'static str = "\n\r\u{000B}\u{000C}\u{0085}\u{2028}\u{2029}";
+        const WORD_TEST_CASE: &'static str =
+            "_hello_world_あいうえおー12345678901234567890";
+        const PUNCTUATION_TEST_CASE: &'static str =
+            "!\"#$%&\'()*+,-./:;<=>?@[\\]^`{|}~!”#$%&’()*+、。:;<=>?@「」^`{|}~";
+        const WHITESPACE_TEST_CASE: &'static str = "      ";
+
+        for ch in EOL_TEST_CASE.chars() {
+            assert_eq!(CharCategory::Eol, categorize_char(ch));
+        }
+
+        for ch in WHITESPACE_TEST_CASE.chars() {
+            assert_eq!(
+                CharCategory::Whitespace,
+                categorize_char(ch),
+                "Testing '{}', but got `{:?}` instead of `Category::Whitespace`",
+                ch,
+                categorize_char(ch)
+            );
+        }
+
+        for ch in WORD_TEST_CASE.chars() {
+            assert_eq!(
+                CharCategory::Word,
+                categorize_char(ch),
+                "Testing '{}', but got `{:?}` instead of `Category::Word`",
+                ch,
+                categorize_char(ch)
+            );
+        }
+
+        for ch in PUNCTUATION_TEST_CASE.chars() {
+            assert_eq!(
+                CharCategory::Punctuation,
+                categorize_char(ch),
+                "Testing '{}', but got `{:?}` instead of `Category::Punctuation`",
+                ch,
+                categorize_char(ch)
+            );
+        }
+    }
+}
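
Note on `char_is_word`: it relies on `char::is_alphanumeric`, which follows Unicode rather than ASCII, so the Japanese letters in `WORD_TEST_CASE` count as word characters while ideographic punctuation does not. A small std-only sketch of that behaviour follows. `char_is_word` copies the one-line body from the diff; `leading_word_len` is a hypothetical helper added only to show the predicate in use.

```rust
/// Same body as `char_is_word` in the diff above.
fn char_is_word(ch: char) -> bool {
    ch.is_alphanumeric() || ch == '_'
}

/// Illustrative helper (not part of the PR): number of chars in the
/// leading run of word characters of `s`.
fn leading_word_len(s: &str) -> usize {
    s.chars().take_while(|&ch| char_is_word(ch)).count()
}
```

For example, `leading_word_len("foo_bar baz")` stops at the space, and `leading_word_len("あいう、えお")` stops at the ideographic comma.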
3 changes: 3 additions & 0 deletions helix-core/src/lib.rs
@@ -6,6 +6,7 @@ pub mod diagnostic;
 pub mod graphemes;
 pub mod history;
 pub mod indent;
+pub mod line_ending;
 pub mod macros;
 pub mod match_brackets;
 pub mod movement;
@@ -106,6 +107,7 @@ pub use tendril::StrTendril as Tendril;
 #[doc(inline)]
 pub use {regex, tree_sitter};

+pub use graphemes::RopeGraphemes;
 pub use position::{coords_at_pos, pos_at_coords, Position};
 pub use selection::{Range, Selection};
 pub use smallvec::SmallVec;
@@ -114,4 +116,5 @@ pub use syntax::Syntax;
 pub use diagnostic::Diagnostic;
 pub use state::State;

+pub use line_ending::{LineEnding, DEFAULT_LINE_ENDING};
 pub use transaction::{Assoc, Change, ChangeSet, Operation, Transaction};
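
Note on the new re-exports: `LineEnding` and `DEFAULT_LINE_ENDING` become available at the crate root, but their definitions are in `line_ending.rs`, which is outside this excerpt. The sketch below is a guess at the general shape, not the PR's code. It assumes a two-variant stand-in enum, an `as_str` method, and a platform-dependent default selected with `cfg`. The `push_line` caller is hypothetical.

```rust
/// Stand-in for the re-exported `LineEnding`; only two variants are
/// modelled here, and the names are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LineEnding {
    Lf,
    Crlf,
}

impl LineEnding {
    /// Assumed accessor: the literal text of the line ending.
    pub fn as_str(self) -> &'static str {
        match self {
            LineEnding::Lf => "\n",
            LineEnding::Crlf => "\r\n",
        }
    }
}

/// Assumed definition: CRLF on Windows, LF everywhere else.
#[cfg(windows)]
pub const DEFAULT_LINE_ENDING: LineEnding = LineEnding::Crlf;
#[cfg(not(windows))]
pub const DEFAULT_LINE_ENDING: LineEnding = LineEnding::Lf;

/// Hypothetical caller: append `text` followed by the given ending
/// instead of a hardcoded '\n'.
pub fn push_line(buf: &mut String, text: &str, ending: LineEnding) {
    buf.push_str(text);
    buf.push_str(ending.as_str());
}
```

A caller with no per-document setting could pass `DEFAULT_LINE_ENDING`, which is the kind of replacement for hardcoded newlines that the commit messages in this PR describe.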