Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactoring of deinflection of verbs #1814

Merged
merged 2 commits into from
Jun 13, 2024
Merged

Conversation

enellis
Copy link
Contributor

@enellis enellis commented Jun 10, 2024

Related to #1811.

Overview

This pull request introduces three new intermediate types for deinflection of Godan verbs:
(Ichidan verbs, with their single stem, would not benefit from these changes.)

  • TaTeStem
  • DaDeStem
  • MasuStem

These intermediate types allow for the abstraction of inflections that follow them. Specifically:

  • After TaTeStem/DaDeStem-stem:

    • て-form and past (た-)form
    • たら
    • たり
    • とく
    • ちゃう
  • After MasuStem:

    • All forms of the polite helper verb ます
    • なさい
    • そう
    • たい
    • すぎる

To help with that, Reason.None has been introduced and is used when forwarding TaTe/DaDe-stem forms to the plain verb form. This step is irrelevant for subsequent data processing, so while deinflection, it will never be added to the reasons array.

Benefits

  • The specialized implementations for all the aforementioned forms could be deleted, as they are now covered by the new intermediate types.
  • Many irregular verbs exhibit irregularities just in their て-stem or ます-stem. With these changes, it becomes sufficient to implement the irregular stem, ensuring that all subsequent inflections are correctly parsed.

Copy link
Member

@birtles birtles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, this looks great. Thank you!

Two comments:

  1. Do you think we need to update the description of DeinflectRule given these recent changes?

    (While we're at it, perhaps we should (i) move DeinflectRule and Type before deinflectRuleData, (ii) Make DeinflectRule a type rather than an interface, (iii) label the tuple elements of deinflectRuleData, e.g. [from: string, to: string, toType: number, fromType: number, reason: Reason]?)

  2. I wonder how important defining InitialOrMasu and IchidanOrKuru are? My hunch is that it would be simpler and clearer just inlining the definitions for these?

@enellis
Copy link
Contributor Author

enellis commented Jun 11, 2024

Likewise, thank you for your input!

Do you think we need to update the description of DeinflectRule given these recent changes?

You're right, I completely overlooked this. I will update the description to reflect these changes including adding a few lines about the new intermediate types.

(While we're at it, perhaps we should (i) move DeinflectRule and Type before deinflectRuleData, (ii) Make DeinflectRule a type rather than an interface, (iii) label the tuple elements of deinflectRuleData, e.g. [from: string, to: string, toType: number, fromType: number, reason: Reason]?)

  1. I wonder how important defining InitialOrMasu and IchidanOrKuru are? My hunch is that it would be simpler and clearer just inlining the definitions for these?

The main reason for defining InitialOrMasu and IchidanOrKuru separately was because inlining these definitions causes the lines to become too long, leading to wrapping which makes the deinflectRuleData array a lot harder to read for me. The same would apply to implementing (ii) and (iii), I suppose.
Is there an option to tell Prettier not to wrap the contents of this specific array? Without the wrapping issue I'd be happy to make all of these changes and inline the definitions again.

@birtles
Copy link
Member

birtles commented Jun 11, 2024

Do you think we need to update the description of DeinflectRule given these recent changes?

You're right, I completely overlooked this. I will update the description to reflect these changes including adding a few lines about the new intermediate types.

Thank you!

  1. I wonder how important defining InitialOrMasu and IchidanOrKuru are? My hunch is that it would be simpler and clearer just inlining the definitions for these?

The main reason for defining InitialOrMasu and IchidanOrKuru separately was because inlining these definitions causes the lines to become too long, leading to wrapping which makes the deinflectRuleData array a lot harder to read for me. The same would apply to implementing (ii) and (iii), I suppose. Is there an option to tell Prettier not to wrap the contents of this specific array? Without the wrapping issue I'd be happy to make all of these changes and inline the definitions again.

Ah, yes, that makes sense.

Yes, you can tell Prettier not to wrap the array by preceding it with // prettier-ignore: https://prettier.io/docs/en/ignore.html#javascript

@enellis
Copy link
Contributor Author

enellis commented Jun 11, 2024

Update:

  • To be honest, I struggled a bit being precise and concise updating the DeinflectRule description. I hope it is clear enough now. Otherwise I'm very open for any suggestions.
  • When changing DeinflectRule to a named tuple, we would lose the ability to access the elements by dot notation later, like in getDeinflectRuleGroups(). When changing DeinflectRule to an object the whole deinflectRuleData becomes very verbose. What do you think is the best approach?

I will squash the commits after your review.

@enellis
Copy link
Contributor Author

enellis commented Jun 11, 2024

Thinking about it, I’d like to change the "reason" field to an array to eliminate the need for compound forms like PolitePast, PolitePastNegative, etc. This would be an user-facing change because for example, 食べませんでした would be parsed as < polite < past < negative (or better < polite < negative < past as this reflects the grammatical structure more closely) instead of < polite past negative.

In the future, this would allow us to parse forms like 食べへんかった as < dialectal < negative < past or 食べまして as < polite < te.

Would you be okay with this change? And if so, should that be a separate PR?

src/background/deinflect.ts Outdated Show resolved Hide resolved
src/background/deinflect.ts Outdated Show resolved Hide resolved
src/background/deinflect.ts Outdated Show resolved Hide resolved
src/background/deinflect.ts Outdated Show resolved Hide resolved
src/background/deinflect.ts Show resolved Hide resolved
@birtles
Copy link
Member

birtles commented Jun 12, 2024

Update:

  • To be honest, I struggled a bit being precise and concise updating the DeinflectRule description. I hope it is clear enough now. Otherwise I'm very open for any suggestions.

Looks good! I added a few minor suggestions just now.

  • When changing DeinflectRule to a named tuple, we would lose the ability to access the elements by dot notation later, like in getDeinflectRuleGroups(). When changing DeinflectRule to an object the whole deinflectRuleData becomes very verbose. What do you think is the best approach?

Sorry, I didn't mean to change DeinflectRule to a named tuple, just the typing for deinflectRuleData!

I will squash the commits after your review.

Thank you! Don't worry if it's too much work, but if it's possible to have one commit where we move the order of the deinflectRuleData and related types and one commit where we make the other changes I think that would produce the easiest commit history to bisect.

@birtles
Copy link
Member

birtles commented Jun 12, 2024

Thinking about it, I’d like to change the "reason" field to an array to eliminate the need for compound forms like PolitePast, PolitePastNegative, etc. This would be an user-facing change because for example, 食べませんでした would be parsed as < polite < past < negative (or better < polite < negative < past as this reflects the grammatical structure more closely) instead of < polite past negative.

I think "polite past negative" might be preferable to most Japanese learners. Even if "< polite < negative < past" is more accurate linguistically, I think most learners learn ~ませんでした as a set and are used to thinking of it as such so I think "< polite < negative < past" might actually be less useful.

In the future, this would allow us to parse forms like 食べへんかった as < dialectal < negative < past or 食べまして as < polite < te.

I've been looking forward to tackling dialects for a while. This will be great! I think in this example I'd actually prefer the annotation for 食べへんかった to read "(Kansai-ben) < negative < past".

Would you be okay with this change? And if so, should that be a separate PR?

Given that this is a functional change, I think a separate PR / issue would be better so we can discuss the user-facing impact.

Thank you so much!

@enellis enellis force-pushed the refactor-deinflect branch from 178c931 to 1e2f447 Compare June 12, 2024 05:38
enellis added 2 commits June 12, 2024 13:32
- add three intermediate types: TaTeStem, DaDeStem, and MasuStem for Godan verbs
- abstract inflections following these stems
- introduce Reason.None to forward TaTe/DaDe-stem forms to plain verb form
- remove specialized implementations now covered by new types
@enellis enellis force-pushed the refactor-deinflect branch from 1e2f447 to 5da4e53 Compare June 12, 2024 11:47
@enellis
Copy link
Contributor Author

enellis commented Jun 12, 2024

Don't worry if it's too much work, but if it's possible to have one commit where we move the order of the deinflectRuleData and related types and one commit where we make the other changes I think that would produce the easiest commit history to bisect.

The changes inside the deinflectRuleData array kind of depend on the changes around it, so i went with

  • Moving DeinflectRule and Type before deinflectRuleData and condense the array elements into a single line.
  • Everything else.

Is this acceptable as well?

@enellis
Copy link
Contributor Author

enellis commented Jun 12, 2024

I think "polite past negative" might be preferable to most Japanese learners. Even if "< polite < negative < past" is more accurate linguistically, I think most learners learn ~ませんでした as a set and are used to thinking of it as such so I think "< polite < negative < past" might actually be less useful.

(I really wonder why it became the standard learning Japanese in a complicated way. ^^)
Yeah, I can see that. Removing the compound forms was just a quick idea which isn't really necessary. The switch to an array would have some other good advantages.

I've been looking forward to tackling dialects for a while. This will be great! I think in this example I'd actually prefer the annotation for 食べへんかった to read "(Kansai-ben) < negative < past".
[...]
Given that this is a functional change, I think a separate PR / issue would be better so we can discuss the user-facing impact.

Cool, I have a working branch for better 敬語 support, which would also benefit from the possibility to add multiple reasons to one rule (ませ as < polite < imperative). I think I want to focus on that first and then open a new issue and take care of some 方言・関西弁.

@birtles birtles merged commit c306edd into birchill:main Jun 13, 2024
1 check passed
@birtles
Copy link
Member

birtles commented Jun 13, 2024

Sorry for the delay in reviewing. I wanted to go through it properly on a large screen. It looks great. Thanks for splitting up the commits too—that made it much easier to review.

@birtles
Copy link
Member

birtles commented Jun 13, 2024

I think "polite past negative" might be preferable to most Japanese learners. Even if "< polite < negative < past" is more accurate linguistically, I think most learners learn ~ませんでした as a set and are used to thinking of it as such so I think "< polite < negative < past" might actually be less useful.

(I really wonder why it became the standard learning Japanese in a complicated way. ^^) Yeah, I can see that. Removing the compound forms was just a quick idea which isn't really necessary. The switch to an array would have some other good advantages.

I'm not too worried which form we end up with. My initial reaction to "< polite < negative < past" was just that it looked complicated for something that is reasonably common but maybe that's more to do with the "<" delimiters. Maybe we should consider another way of combining the reasons?

I've been looking forward to tackling dialects for a while. This will be great! I think in this example I'd actually prefer the annotation for 食べへんかった to read "(Kansai-ben) < negative < past".
[...]
Given that this is a functional change, I think a separate PR / issue would be better so we can discuss the user-facing impact.

Cool, I have a working branch for better 敬語 support, which would also benefit from the possibility to add multiple reasons to one rule (ませ as < polite < imperative).

Sounds great. I have long-term ambitions of doing some really helpful things with お聞きしたい, ご利用になる, 御手紙, おビール etc. but the implementation for that will be quite different.

I think I want to focus on that first and then open a new issue and take care of some 方言・関西弁.

By all means. Looking forward to it! Thanks again!

@enellis
Copy link
Contributor Author

enellis commented Jun 13, 2024

Sorry for the delay in reviewing. I wanted to go through it properly on a large screen. It looks great. Thanks for splitting up the commits too—that made it much easier to review.

Absolutely no worries. Thank you for reviewing and merging this!

I'm not too worried which form we end up with. My initial reaction to "< polite < negative < past" was just that it looked complicated for something that is reasonably common but maybe that's more to do with the "<" delimiters. Maybe we should consider another way of combining the reasons?

Maybe it would help already making them a little bit smaller, like "‹ polite ‹ negative ‹ past"? (I will open a thread in Discussions for this.)

Sounds great. I have long-term ambitions of doing some really helpful things with お聞きしたい, ご利用になる, 御手紙, おビール etc. but the implementation for that will be quite different.

Interesting. I can imagine this to be a little bit more complicated to implement, but it would be a nice level up for 10ten.

@enellis enellis deleted the refactor-deinflect branch June 13, 2024 07:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants