Refactoring of deinflection of verbs #1814

enellis · 2024-06-10T13:40:58Z

Related to #1811.

Overview

This pull request introduces three new intermediate types for deinflection of Godan verbs:
(Ichidan verbs, with their single stem, would not benefit from these changes.)

TaTeStem
DaDeStem
MasuStem

These intermediate types allow for the abstraction of inflections that follow them. Specifically:

After TaTeStem/DaDeStem-stem:
- て-form and past (た-)form
- たら
- たり
- とく
- ちゃう
After MasuStem:
- All forms of the polite helper verb ます
- なさい
- そう
- たい
- すぎる

To help with that, Reason.None has been introduced and is used when forwarding TaTe/DaDe-stem forms to the plain verb form. This step is irrelevant for subsequent data processing, so while deinflection, it will never be added to the reasons array.

Benefits

The specialized implementations for all the aforementioned forms could be deleted, as they are now covered by the new intermediate types.
Many irregular verbs exhibit irregularities just in their て-stem or ます-stem. With these changes, it becomes sufficient to implement the irregular stem, ensuring that all subsequent inflections are correctly parsed.

birtles

In general, this looks great. Thank you!

Two comments:

Do you think we need to update the description of DeinflectRule given these recent changes?

(While we're at it, perhaps we should (i) move DeinflectRule and Type before deinflectRuleData, (ii) Make DeinflectRule a type rather than an interface, (iii) label the tuple elements of deinflectRuleData, e.g. [from: string, to: string, toType: number, fromType: number, reason: Reason]?)
I wonder how important defining InitialOrMasu and IchidanOrKuru are? My hunch is that it would be simpler and clearer just inlining the definitions for these?

enellis · 2024-06-11T06:06:00Z

Likewise, thank you for your input!

Do you think we need to update the description of DeinflectRule given these recent changes?

You're right, I completely overlooked this. I will update the description to reflect these changes including adding a few lines about the new intermediate types.

(While we're at it, perhaps we should (i) move DeinflectRule and Type before deinflectRuleData, (ii) Make DeinflectRule a type rather than an interface, (iii) label the tuple elements of deinflectRuleData, e.g. [from: string, to: string, toType: number, fromType: number, reason: Reason]?)

I wonder how important defining InitialOrMasu and IchidanOrKuru are? My hunch is that it would be simpler and clearer just inlining the definitions for these?

The main reason for defining InitialOrMasu and IchidanOrKuru separately was because inlining these definitions causes the lines to become too long, leading to wrapping which makes the deinflectRuleData array a lot harder to read for me. The same would apply to implementing (ii) and (iii), I suppose.
Is there an option to tell Prettier not to wrap the contents of this specific array? Without the wrapping issue I'd be happy to make all of these changes and inline the definitions again.

birtles · 2024-06-11T06:55:49Z

Do you think we need to update the description of DeinflectRule given these recent changes?

You're right, I completely overlooked this. I will update the description to reflect these changes including adding a few lines about the new intermediate types.

Thank you!

I wonder how important defining InitialOrMasu and IchidanOrKuru are? My hunch is that it would be simpler and clearer just inlining the definitions for these?

The main reason for defining InitialOrMasu and IchidanOrKuru separately was because inlining these definitions causes the lines to become too long, leading to wrapping which makes the deinflectRuleData array a lot harder to read for me. The same would apply to implementing (ii) and (iii), I suppose. Is there an option to tell Prettier not to wrap the contents of this specific array? Without the wrapping issue I'd be happy to make all of these changes and inline the definitions again.

Ah, yes, that makes sense.

Yes, you can tell Prettier not to wrap the array by preceding it with // prettier-ignore: https://prettier.io/docs/en/ignore.html#javascript

enellis · 2024-06-11T15:39:54Z

Update:

To be honest, I struggled a bit being precise and concise updating the DeinflectRule description. I hope it is clear enough now. Otherwise I'm very open for any suggestions.
When changing DeinflectRule to a named tuple, we would lose the ability to access the elements by dot notation later, like in getDeinflectRuleGroups(). When changing DeinflectRule to an object the whole deinflectRuleData becomes very verbose. What do you think is the best approach?

I will squash the commits after your review.

enellis · 2024-06-11T18:36:33Z

Thinking about it, I’d like to change the "reason" field to an array to eliminate the need for compound forms like PolitePast, PolitePastNegative, etc. This would be an user-facing change because for example, 食べませんでした would be parsed as < polite < past < negative (or better < polite < negative < past as this reflects the grammatical structure more closely) instead of < polite past negative.

In the future, this would allow us to parse forms like 食べへんかった as < dialectal < negative < past or 食べまして as < polite < te.

Would you be okay with this change? And if so, should that be a separate PR?

src/background/deinflect.ts

birtles · 2024-06-12T02:29:46Z

Update:

To be honest, I struggled a bit being precise and concise updating the DeinflectRule description. I hope it is clear enough now. Otherwise I'm very open for any suggestions.

Looks good! I added a few minor suggestions just now.

When changing DeinflectRule to a named tuple, we would lose the ability to access the elements by dot notation later, like in getDeinflectRuleGroups(). When changing DeinflectRule to an object the whole deinflectRuleData becomes very verbose. What do you think is the best approach?

Sorry, I didn't mean to change DeinflectRule to a named tuple, just the typing for deinflectRuleData!

I will squash the commits after your review.

Thank you! Don't worry if it's too much work, but if it's possible to have one commit where we move the order of the deinflectRuleData and related types and one commit where we make the other changes I think that would produce the easiest commit history to bisect.

birtles · 2024-06-12T02:34:52Z

Thinking about it, I’d like to change the "reason" field to an array to eliminate the need for compound forms like PolitePast, PolitePastNegative, etc. This would be an user-facing change because for example, 食べませんでした would be parsed as < polite < past < negative (or better < polite < negative < past as this reflects the grammatical structure more closely) instead of < polite past negative.

I think "polite past negative" might be preferable to most Japanese learners. Even if "< polite < negative < past" is more accurate linguistically, I think most learners learn ～ませんでした as a set and are used to thinking of it as such so I think "< polite < negative < past" might actually be less useful.

In the future, this would allow us to parse forms like 食べへんかった as < dialectal < negative < past or 食べまして as < polite < te.

I've been looking forward to tackling dialects for a while. This will be great! I think in this example I'd actually prefer the annotation for 食べへんかった to read "(Kansai-ben) < negative < past".

Would you be okay with this change? And if so, should that be a separate PR?

Given that this is a functional change, I think a separate PR / issue would be better so we can discuss the user-facing impact.

Thank you so much!

src/background/deinflect.ts

- add three intermediate types: TaTeStem, DaDeStem, and MasuStem for Godan verbs - abstract inflections following these stems - introduce Reason.None to forward TaTe/DaDe-stem forms to plain verb form - remove specialized implementations now covered by new types

enellis · 2024-06-12T11:51:07Z

Don't worry if it's too much work, but if it's possible to have one commit where we move the order of the deinflectRuleData and related types and one commit where we make the other changes I think that would produce the easiest commit history to bisect.

The changes inside the deinflectRuleData array kind of depend on the changes around it, so i went with

Moving DeinflectRule and Type before deinflectRuleData and condense the array elements into a single line.
Everything else.

Is this acceptable as well?

enellis · 2024-06-12T12:34:33Z

I think "polite past negative" might be preferable to most Japanese learners. Even if "< polite < negative < past" is more accurate linguistically, I think most learners learn ～ませんでした as a set and are used to thinking of it as such so I think "< polite < negative < past" might actually be less useful.

^{_{(I really wonder why it became the standard learning Japanese in a complicated way. ^^)}}
Yeah, I can see that. Removing the compound forms was just a quick idea which isn't really necessary. The switch to an array would have some other good advantages.

I've been looking forward to tackling dialects for a while. This will be great! I think in this example I'd actually prefer the annotation for 食べへんかった to read "(Kansai-ben) < negative < past".
[...]
Given that this is a functional change, I think a separate PR / issue would be better so we can discuss the user-facing impact.

Cool, I have a working branch for better 敬語 support, which would also benefit from the possibility to add multiple reasons to one rule (ませ as < polite < imperative). I think I want to focus on that first and then open a new issue and take care of some 方言・関西弁.

birtles · 2024-06-13T03:03:28Z

Sorry for the delay in reviewing. I wanted to go through it properly on a large screen. It looks great. Thanks for splitting up the commits too—that made it much easier to review.

birtles · 2024-06-13T03:10:58Z

I think "polite past negative" might be preferable to most Japanese learners. Even if "< polite < negative < past" is more accurate linguistically, I think most learners learn ～ませんでした as a set and are used to thinking of it as such so I think "< polite < negative < past" might actually be less useful.

(I really wonder why it became the standard learning Japanese in a complicated way. ^^) Yeah, I can see that. Removing the compound forms was just a quick idea which isn't really necessary. The switch to an array would have some other good advantages.

I'm not too worried which form we end up with. My initial reaction to "< polite < negative < past" was just that it looked complicated for something that is reasonably common but maybe that's more to do with the "<" delimiters. Maybe we should consider another way of combining the reasons?

I've been looking forward to tackling dialects for a while. This will be great! I think in this example I'd actually prefer the annotation for 食べへんかった to read "(Kansai-ben) < negative < past".
[...]
Given that this is a functional change, I think a separate PR / issue would be better so we can discuss the user-facing impact.

Cool, I have a working branch for better 敬語 support, which would also benefit from the possibility to add multiple reasons to one rule (ませ as < polite < imperative).

Sounds great. I have long-term ambitions of doing some really helpful things with お聞きしたい, ご利用になる, 御手紙, おビール etc. but the implementation for that will be quite different.

I think I want to focus on that first and then open a new issue and take care of some 方言・関西弁.

By all means. Looking forward to it! Thanks again!

enellis · 2024-06-13T06:37:14Z

Sorry for the delay in reviewing. I wanted to go through it properly on a large screen. It looks great. Thanks for splitting up the commits too—that made it much easier to review.

Absolutely no worries. Thank you for reviewing and merging this!

I'm not too worried which form we end up with. My initial reaction to "< polite < negative < past" was just that it looked complicated for something that is reasonably common but maybe that's more to do with the "<" delimiters. Maybe we should consider another way of combining the reasons?

Maybe it would help already making them a little bit smaller, like "‹ polite ‹ negative ‹ past"? (I will open a thread in Discussions for this.)

Sounds great. I have long-term ambitions of doing some really helpful things with お聞きしたい, ご利用になる, 御手紙, おビール etc. but the implementation for that will be quite different.

Interesting. I can imagine this to be a little bit more complicated to implement, but it would be a nice level up for 10ten.

birtles reviewed Jun 11, 2024

View reviewed changes

birtles reviewed Jun 12, 2024

View reviewed changes

enellis force-pushed the refactor-deinflect branch from 178c931 to 1e2f447 Compare June 12, 2024 05:38

enellis commented Jun 12, 2024

View reviewed changes

src/background/deinflect.ts Show resolved Hide resolved

enellis added 2 commits June 12, 2024 13:32

chore: tidy up deinflect.ts

512b83c

enellis force-pushed the refactor-deinflect branch from 1e2f447 to 5da4e53 Compare June 12, 2024 11:47

birtles merged commit c306edd into birchill:main Jun 13, 2024
1 check passed

enellis deleted the refactor-deinflect branch June 13, 2024 07:28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactoring of deinflection of verbs #1814

Refactoring of deinflection of verbs #1814

enellis commented Jun 10, 2024

birtles left a comment

enellis commented Jun 11, 2024 •

edited

Loading

birtles commented Jun 11, 2024

enellis commented Jun 11, 2024 •

edited

Loading

enellis commented Jun 11, 2024 •

edited

Loading

birtles commented Jun 12, 2024 •

edited

Loading

birtles commented Jun 12, 2024

enellis commented Jun 12, 2024

enellis commented Jun 12, 2024

birtles commented Jun 13, 2024

birtles commented Jun 13, 2024

enellis commented Jun 13, 2024

Refactoring of deinflection of verbs #1814

Refactoring of deinflection of verbs #1814

Conversation

enellis commented Jun 10, 2024

Overview

Benefits

birtles left a comment

Choose a reason for hiding this comment

enellis commented Jun 11, 2024 • edited Loading

birtles commented Jun 11, 2024

enellis commented Jun 11, 2024 • edited Loading

enellis commented Jun 11, 2024 • edited Loading

birtles commented Jun 12, 2024 • edited Loading

birtles commented Jun 12, 2024

enellis commented Jun 12, 2024

enellis commented Jun 12, 2024

birtles commented Jun 13, 2024

birtles commented Jun 13, 2024

enellis commented Jun 13, 2024

enellis commented Jun 11, 2024 •

edited

Loading

enellis commented Jun 11, 2024 •

edited

Loading

enellis commented Jun 11, 2024 •

edited

Loading

birtles commented Jun 12, 2024 •

edited

Loading