FirstGraphemeCluster does not need to preserve state across grapheme clusters #58

Open
delthas opened this issue Jun 7, 2024 · 0 comments


Hi,

The FirstGraphemeCluster function can be used to iteratively extract grapheme clusters from a string (without additional allocations). Its documentation says that a state should be passed in (initially set to -1); the function returns an updated state, which should be passed back in on the next call in order to preserve some state across calls.

This state contains the current grapheme cluster parser state, and the property of the next codepoint.

It did not make sense to me that decoding a grapheme cluster would depend on earlier state: I'd have expected each grapheme cluster to be fully independent.

To test this, I took the full set of grapheme cluster boundary test cases for Unicode 14.0 (the version supported by the library) and ran a simple test, calling FirstGraphemeClusterInString and comparing the results with the spec:

  • When preserving the state across grapheme clusters: everything works (as expected: the library is compliant 😋)
  • When explicitly resetting the state to -1 before each call to FirstGraphemeClusterInString (which should be incorrect according to the documentation): everything still works, all tests pass!

This would mean that even when no state is preserved, the grapheme clusters that are returned are always the same.

So, from my understanding, there should be no need for any state between calls to the library, and the state parameter could be fully deprecated.

Full test case (see the TODO line), try running in the Go playground (prints All tests passed): https://gist.github.com/delthas/0965a2c198b3a114fbb6706435786b73
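
For readers who don't want to open the gist, here is a minimal, self-contained sketch of the comparison described above. It is not the code from the gist: the `segmenter` type, `split`, `sameClusters` and `toySegmenter` are hypothetical names made up for illustration, and `toySegmenter` is only a stand-in (one rune plus any following combining marks), not a UAX #29 implementation.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// segmenter mirrors the shape of a stateful "first cluster" function: it
// returns the first cluster, the remainder, and a new state.
// Hypothetical type for illustration only.
type segmenter func(s string, state int) (cluster, rest string, newState int)

// split collects all clusters of s. If reset is true, the state is forced
// back to -1 before every call instead of being carried over.
func split(seg segmenter, s string, reset bool) []string {
	var out []string
	state := -1
	for len(s) > 0 {
		if reset {
			state = -1
		}
		var c string
		c, s, state = seg(s, state)
		out = append(out, c)
	}
	return out
}

// sameClusters reports whether carrying the state over and resetting it
// yield exactly the same clusters for s.
func sameClusters(seg segmenter, s string) bool {
	a, b := split(seg, s, false), split(seg, s, true)
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// toySegmenter is a stand-in, NOT a real grapheme cluster parser: it takes
// one rune plus any following nonspacing marks (category Mn) and ignores
// the state entirely.
func toySegmenter(s string, state int) (string, string, int) {
	_, end := utf8.DecodeRuneInString(s)
	for end < len(s) {
		r, size := utf8.DecodeRuneInString(s[end:])
		if !unicode.Is(unicode.Mn, r) {
			break
		}
		end += size
	}
	return s[:end], s[end:], 0
}

func main() {
	s := "e\u0301a" // "e" + combining acute accent, then "a"
	fmt.Println(split(toySegmenter, s, true), sameClusters(toySegmenter, s))
}
```

To run the same check against the library, the stand-in could be replaced with a small wrapper such as `func(s string, st int) (string, string, int) { c, r, _, ns := uniseg.FirstGraphemeClusterInString(s, st); return c, r, ns }`, assuming the four-value return (cluster, rest, width, new state) of the current release.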

aymanbagabas added a commit to charmbracelet/x that referenced this issue Aug 2, 2024
We don't need to keep track of grapheme states, see rivo/uniseg#58
aymanbagabas added a commit to charmbracelet/x that referenced this issue Aug 5, 2024
We don't need to keep track of grapheme states, see rivo/uniseg#58