Why are the part-of-speech labels placed at the end? #436

Closed
SaltfishAmi opened this issue Nov 26, 2020 · 34 comments · Fixed by #464

@SaltfishAmi
Contributor

I think most dictionaries show them at the beginning of each sense.
Is there any particular reason for this?

By the way, I am finally able to get rid of the puzzling part-of-speech annotation in English (by deleting _locales\en and building my own version of 0.3.0)
I'd been really looking forward to this version, so congrats on release!

@birtles
Member

birtles commented Nov 26, 2020

I think most dictionaries show them at the beginning of each sense.
Is there any particular reason for this?

Yes, the reason is that putting the part-of-speech labels at the start makes it harder to scan the list of meanings. However, I'm quite happy to change that if people dislike it.

By the way, I am finally able to get rid of the puzzling part-of-speech annotation in English (by deleting _locales\en and building my own version of 0.3.0)

Which part do you find puzzling? There is a setting to show dictionary codes vs explanations vs nothing at all. What is the behavior you would prefer? Should there be another setting?

I'd been really looking forward to this version, so congrats on release!

Thank you and thank you for all your help!

@SaltfishAmi
Contributor Author

SaltfishAmi commented Nov 26, 2020

Which part do you find puzzling? There is a setting to show dictionary codes vs explanations vs nothing at all. What is the behavior you would prefer? Should there be another setting?

Nah. I did not learn Japanese using English, so anything explaining Japanese grammar in English is puzzling to me. Absolutely personal.
The definitions in English are fine though, as they are easy to understand, without any particular terms.
I'm now using the explanations in ja locale, as English explanations look the same as English codes.

@birtles
Member

birtles commented Nov 26, 2020

Nah. I did not learn Japanese using English, so anything explaining Japanese grammar in English is puzzling to me. Absolutely personal.
The definitions in English are fine though, as they are easy to understand, without any particular terms.
I'm now using the explanations in ja locale, as English explanations look the same as English codes.

Ah, I see. I use the Japanese version of Firefox, partly just so I can test the Japanese localization. One day it would be nice to be able to change the Rikaichamp language without having to change the browser language. Unfortunately the Web Extensions i18n framework doesn't allow that so it would have to be something custom.

@SaltfishAmi
Contributor Author

Yes, the reason is that putting the part-of-speech labels at the start makes it harder to scan the list of meanings. However, I'm quite happy to change that if people dislike it.

I'd like to have an option to put the labels at start, because I use the part-of-speech as a way to find the meaning that I might need.
Also, my instincts expect the labels to be aligned, that is, to appear at a similar relative position in every meaning. But putting the part-of-speech labels at the end means that I need to scan through the variable-length definition text before finding the labels, which feels weird and counter-intuitive. Every next label's position is unexpected.

@birtles
Member

birtles commented Nov 26, 2020

That's a good comment.

I'm probably going to do a point release to fix the problem that the SVG star icon can be too large for people who are upgrading and haven't yet reloaded the page (since it will use the old stylesheet).

I might try to change the position of the part-of-speech labels in the same point release.

@SaltfishAmi
Contributor Author

One day it would be nice to be able to change the Rikaichamp language without having to change the browser language.

And it also makes me feel weird that the part-of-speech labels and deinflection labels belong to the Rikaichamp extension locale instead of dictionary language. Very weird😂

@birtles
Member

birtles commented Nov 26, 2020

I tried changing the position of the part-of-speech labels, but I wonder if it makes the meanings less clear:

image

@birtles
Member

birtles commented Nov 26, 2020

And it also makes me feel weird that the part-of-speech labels and deinflection labels belong to the Rikaichamp extension locale instead of dictionary language. Very weird😂

Yeah, it's a bit odd I guess. It made sense to me simply because I know some Japanese people who use this add-on and I assume they would prefer the metadata to be localized.

Also, the JMdict data doesn't have part-of-speech information for non-English glosses. So even once we make it possible to select the dictionary language (hopefully in the next major release in December), the part-of-speech information will only show up for the English glosses.

@SaltfishAmi
Contributor Author

SaltfishAmi commented Nov 26, 2020

Hmm... I think it's fine, but we may need other people's opinions.

edit: I've already included this in my custom build, and it looks great.

And for the first 日 in the image, as the 3 meanings have the same p-o-s, is it possible to show only one label before all the meanings?

@birtles
Member

birtles commented Nov 26, 2020

And for the first 日 in the image, as the 3 meanings have the same p-o-s, is it possible to show only one label before all the meanings?

We could. Actually, JMdict used to follow that pattern, where any p-o-s label was assumed to apply to all following glosses unless they specified their own. However, that changed a few weeks ago when the labels seem to have been mass-copied to each gloss.

Perhaps we should drop redundant p-o-s labels and move the position of the labels in the same release.

@birtles
Member

birtles commented Nov 26, 2020

For now I put the change to move the part-of-speech label on a separate branch so I can ship the CSS fix separately.

SaltfishAmi pushed a commit to SaltfishAmi/rikaichamp that referenced this issue Nov 28, 2020
@nicolasmaia

I kinda like having the POS at the beginning for consistency; it's easy to know where to find them, every time. It might be worth having this as the default and trailing POS as an optional setting.

@SaltfishAmi
Contributor Author

And it also makes me feel weird that the part-of-speech labels and deinflection labels belong to the Rikaichamp extension locale instead of dictionary language. Very weird😂

And I recall that months ago I added an unconditional lang='ja' to the whole popup window to prevent Japanese from being rendered in a Chinese font. Now if I set my browser locale to zh-hans, it renders the Chinese tags in a Japanese font instead. Wow. Looks like every dog... ahem, every locale has its day!

@birtles
Member

birtles commented Nov 30, 2020

Ah, that's funny. I think now that we are using structured data for all the entries, however, we should be marking all the Japanese text as Japanese already so we don't need the root-level "lang=ja" attribute anymore.

For the localized strings, if we can work out which language we are using, we could add lang attributes to those elements too. As far as I can tell, we can use getUILanguage to work out the browser language but not necessarily which locale got applied unless we manually try to compare that against the list of locales we have prepared.
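
The manual comparison mentioned here could look roughly like the following. This is only a sketch: `guessAppliedLocale` and the locale list are hypothetical, and the fallback order (exact match, then language-only match, then the manifest's `default_locale`) is an assumption about how the browser picks a `_locales` folder, not something confirmed in this thread.

```typescript
// Hypothetical helper: guess which _locales/ folder the browser applied.
// `uiLanguage` would come from browser.i18n.getUILanguage(), e.g. "zh-CN".
function guessAppliedLocale(
  uiLanguage: string,
  availableLocales: readonly string[],
  defaultLocale: string
): string {
  // Locale folders use underscores ("zh_CN"); language tags use hyphens.
  const normalized = uiLanguage.replace(/-/g, "_");
  const find = (candidate: string): string | undefined =>
    availableLocales.find(
      (locale) => locale.toLowerCase() === candidate.toLowerCase()
    );

  // 1. Exact match, e.g. "zh_CN".
  const exact = find(normalized);
  if (exact !== undefined) return exact;

  // 2. Language-only match, e.g. "zh".
  const [language = normalized] = normalized.split("_");
  const languageOnly = find(language);
  if (languageOnly !== undefined) return languageOnly;

  // 3. Assumed fallback: the manifest's default_locale.
  return defaultLocale;
}
```

The weakness is that this duplicates the browser's own selection logic, so it can drift out of sync with what was actually applied.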

@SaltfishAmi
Contributor Author

It’s a bizarre design to assume that every extension happily follows the browser locale, being unable to choose their own. I’d say the i18n guys are not nerdy enough.

@birtles
Member

birtles commented Dec 1, 2020

I had a bit more of a fiddle with this today, using ditto marks for repeated part-of-speech as well as for the misc items too. This is a bit of an extreme case but...

image

@SaltfishAmi
Contributor Author

SaltfishAmi commented Dec 1, 2020

This looks really good in this image. But I'm pretty sure the ditto marks can introduce some kind of ambiguity?
Like, if there is an entry where senses 1-3 are nouns, senses 4-6 are verbs, and senses 7-9 are adjectives, maybe with some further detailed subcategories, or not. How will the ditto marks behave?

@birtles
Member

birtles commented Dec 1, 2020

Like, if there is an entry where senses 1-3 are nouns, senses 4-6 are verbs, and senses 7-9 are adjectives, maybe with some further detailed subcategories, or not. How will the ditto marks behave?

Thanks! I think you would just have:

  1. (noun) ...
  2. (〃) ...
  3. (〃) ...
  4. (verb) ...
  5. (〃) ...
  6. (〃) ...
  7. (adjective) ...
  8. (〃) ...
  9. (〃) ...
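
As a rough illustration of that rule (not the code on the branch), a ditto mark can be emitted whenever a sense's labels are identical to those of the sense just before it. The `Sense` shape and `formatWithDittoMarks` are hypothetical names used only for this sketch.

```typescript
// Hypothetical, simplified sense record.
interface Sense {
  pos: string[]; // part-of-speech labels, e.g. ["noun"]
  glosses: string[];
}

// Renders one line per sense, replacing a repeated label with 〃.
function formatWithDittoMarks(senses: readonly Sense[]): string[] {
  let previousLabel: string | null = null;
  return senses.map((sense, index) => {
    const label = sense.pos.join(", ");
    // Only repeat-mark a non-empty label identical to the previous one.
    const shown = label !== "" && label === previousLabel ? "〃" : label;
    previousLabel = label;
    const prefix = shown === "" ? "" : `(${shown}) `;
    return `${index + 1}. ${prefix}${sense.glosses.join("; ")}`;
  });
}
```

Because the comparison is only against the immediately preceding sense, a run of nouns followed by a run of verbs restarts with a full label at the first verb.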

For what it's worth, the code for the screenshot is on the make-pos-first branch.

@SaltfishAmi
Contributor Author

As far as I can tell, we can use getUILanguage to work out the browser language but not necessarily which locale got applied unless we manually try to compare that against the list of locales we have prepared.

How about adding a "current_locale": {"message": "(locale)"} entry to every _locales/(locale)/messages.json and using getMessage to determine the active locale?

@birtles
Member

birtles commented Dec 2, 2020

How about adding a "current_locale": {"message": "(locale)"} entry to every _locales/(locale)/messages.json and using getMessage to determine the active locale?

That's a brilliant idea. I'm going to make up a separate patch for fixing language tagging.
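
A minimal sketch of how that suggestion might be wired up, assuming each messages.json carries a `current_locale` entry as proposed. The `I18nLike` interface, `getActiveLocale`, and `tagLocalizedElement` are hypothetical names for illustration and are not taken from the actual patch; the sketch also assumes `getMessage` returns an empty string for an unknown message name.

```typescript
// Assumed entry in every _locales/<locale>/messages.json, e.g. for ja:
//   "current_locale": { "message": "ja" }

// Minimal shape of the part of the i18n API this sketch relies on,
// so that browser.i18n (or a stub) can be passed in.
interface I18nLike {
  getMessage(messageName: string): string;
}

// Returns a language tag for the locale whose messages.json was applied,
// or null if the "current_locale" message is missing.
function getActiveLocale(i18n: I18nLike): string | null {
  const locale = i18n.getMessage("current_locale");
  // Locale folder names use underscores; lang attributes use hyphens.
  return locale ? locale.replace(/_/g, "-") : null;
}

// Tags an element holding localized strings so that the browser can
// pick a font matching the language actually shown.
function tagLocalizedElement(element: { lang: string }, i18n: I18nLike): void {
  const locale = getActiveLocale(i18n);
  if (locale !== null) {
    element.lang = locale;
  }
}
```

The appeal of this approach is that the browser's own locale selection does the work, so there is no need to re-implement its fallback rules.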

@birtles
Member

birtles commented Dec 2, 2020

b425057 should fix the language tagging so that Chinese text now correctly shows up as such

@birtles
Member

birtles commented Dec 2, 2020

I had a go at the grouping feature and with a bit of tweaking I think it might work:

image

@birtles
Member

birtles commented Dec 2, 2020

I made a few tweaks including separating the different part-of-speech labels into separate tags

image

I've left the 〃 behavior for misc labels however:

image

But perhaps if we simply change the localization for "uK" etc. from "usually kanji" to just "kanji" maybe we wouldn't need the 〃?

@SaltfishAmi
Contributor Author

I made a few tweaks including separating the different part-of-speech labels into separate tags

This looks great! So the sense 3. counter for days has both suffix and counter right?

@birtles
Member

birtles commented Dec 2, 2020

This looks great! So the sense 3. counter for days has both suffix and counter right?

Thanks! Right, that's the idea.

It just tries to group on at least one common part-of-speech. If we group only senses having all the same parts-of-speech, for some entries we end up with too many groups and the display is too long.
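
One way to picture "group on at least one common part-of-speech" is the greedy pass below. It is a sketch under assumptions, not the branch's code: `Sense`, `PosGroup`, and `groupByCommonPos` are hypothetical, and only consecutive senses are merged.

```typescript
// Hypothetical, simplified sense record.
interface Sense {
  pos: string[];
  glosses: string[];
}

interface PosGroup {
  pos: string[]; // labels shared by every sense in the group (the heading)
  senses: Sense[];
}

// A sense joins the current group if it shares at least one label with
// the group's heading; the heading shrinks to the labels still shared.
function groupByCommonPos(senses: readonly Sense[]): PosGroup[] {
  const groups: PosGroup[] = [];
  for (const sense of senses) {
    const current = groups[groups.length - 1];
    const shared = current
      ? current.pos.filter((label) => sense.pos.includes(label))
      : [];
    if (current && shared.length > 0) {
      current.pos = shared;
      current.senses.push(sense);
    } else {
      // No overlap (or no labels at all): start a new group.
      groups.push({ pos: [...sense.pos], senses: [sense] });
    }
  }
  return groups;
}
```

Labels a sense has beyond the group heading would still need to be shown on that sense's own line, which matches the "suffix" plus "counter" case asked about above.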

@SaltfishAmi
Contributor Author

I really think the misc label should be put straight under the headword if and only if all senses under the entry share the same misc label, like the "usually kana" under 狐 or する.

@birtles
Member

birtles commented Dec 2, 2020

I really think the misc label should be put straight under the headword if and only if all senses under the entry share the same misc label, like the "usually kana" under 狐 or する.

Ok, I think that makes sense. I think it might make sense to keep the existing grouping by part-of-speech, and if all the senses within a part-of-speech group share a misc label of some sort, move it to the group heading.

@birtles
Member

birtles commented Dec 2, 2020

After doing that, I think we could probably drop the 〃 behavior.

@birtles
Member

birtles commented Dec 2, 2020

So I think we have two alternatives in mind here:

a) Move misc labels to the top (below the headwords and before any other groups) only when all the senses for an entry share it.
b) Move any misc labels shared by a part-of-speech group to the part-of-speech group heading.

I really don't know which is better and should probably implement both and screenshot them for comparison.
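
For comparison, both alternatives reduce to the same "labels shared by every sense" computation applied at a different scope. The sketch below assumes a simplified `Sense` shape; `sharedMisc`, `entryLevelMisc`, and `groupLevelMisc` are hypothetical names and not the code in the linked patches.

```typescript
// Hypothetical, simplified sense record.
interface Sense {
  pos: string[];
  misc: string[]; // misc labels, e.g. ["uk"]
  glosses: string[];
}

// Labels that appear in the misc list of every sense (empty for no senses).
function sharedMisc(senses: readonly Sense[]): string[] {
  const [first, ...rest] = senses;
  if (first === undefined) return [];
  return first.misc.filter((label) =>
    rest.every((sense) => sense.misc.includes(label))
  );
}

// Option (a): hoist only labels shared by every sense of the entry.
function entryLevelMisc(senses: readonly Sense[]): string[] {
  return sharedMisc(senses);
}

// Option (b): hoist labels shared within each part-of-speech group.
function groupLevelMisc(groups: readonly Sense[][]): string[][] {
  return groups.map((group) => sharedMisc(group));
}
```

With (b), a label can be hoisted for one group even when another group in the same entry lacks it, which (a) would never do.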

@birtles
Member

birtles commented Dec 2, 2020

Ok, first of all, here is option (b).


image

する
image

And for something where the misc labels actually appear on the group headings see 日
image

Or this one:
image

For my reference, the patch for this is here: https://gist.github.com/birtles/584cdbf3a7673e7131d4cdc5889b7656

@birtles
Member

birtles commented Dec 2, 2020

And for (a)


image

する
image


image

IN
image

Patch for my reference: https://gist.github.com/birtles/a01ff073bc71a8904ec26f4be31092e0

@SaltfishAmi
Contributor Author

I changed my mind. b) looks really great.
But there comes another problem: if there is only one sense in a group, the grouping seems unnecessary and redundant, as it turns a possibly one-line sense into two lines. But if we don't group them, ambiguity would be introduced again. Do you have any good ideas about this?

@birtles
Member

birtles commented Dec 3, 2020

I changed my mind. b) looks really great.
But there comes another problem: if there is only one sense in a group, the grouping seems unnecessary and redundant, as it turns a possibly one-line sense into two lines. But if we don't group them, ambiguity would be introduced again. Do you have any good ideas about this?

Thanks!

Currently the approach we use is to skip grouping unless there is at least one group with more than one item. So at least we avoid the worst-case situation where every line becomes two.
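
In code, that check is essentially a one-liner. The names `PosGroup` and `shouldGroup` are hypothetical and only illustrate the rule as described here.

```typescript
// Hypothetical group shape; T is whatever a sense record looks like.
interface PosGroup<T> {
  pos: string[];
  senses: T[];
}

// Grouping is only applied when at least one group saves a repeated label,
// i.e. when some group contains more than one sense.
function shouldGroup<T>(groups: readonly PosGroup<T>[]): boolean {
  return groups.some((group) => group.senses.length > 1);
}
```

If every group holds a single sense, the caller would fall back to the flat list with inline labels.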

For some cases like IN above, the grouping makes the definition quite a bit longer, but I think that's not too common and is ok?

@birtles
Member

birtles commented Dec 7, 2020

I've updated the branch now to do (b).

One possible approach we could take is to do the grouping and then compare how much longer the entry becomes as a result (by doing a naive count of the number of lines, ignoring word wrapping). If it is, say, 50% or more longer, we could abandon the grouping.
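
A naive version of that comparison might look like this. It assumes each sense takes one line and each group heading adds one extra line; `groupingIsWorthwhile` and the `maxGrowth` parameter are hypothetical names for this sketch.

```typescript
// groupSizes: number of senses in each part-of-speech group.
// maxGrowth: relative growth at which grouping is abandoned (0.5 = 50%).
function groupingIsWorthwhile(
  groupSizes: readonly number[],
  maxGrowth = 0.5
): boolean {
  // Ungrouped: one line per sense.
  const ungroupedLines = groupSizes.reduce((sum, size) => sum + size, 0);
  if (ungroupedLines === 0) return false;
  // Grouped: one line per sense plus one heading line per group.
  const groupedLines = ungroupedLines + groupSizes.length;
  // Abandon grouping when the entry grows by maxGrowth or more.
  return groupedLines < ungroupedLines * (1 + maxGrowth);
}
```

For example, three groups of three senses grow from 9 to 12 lines (about 33%) and stay grouped, while groups of sizes 1, 1, and 2 grow from 4 to 7 lines (75%) and would be shown ungrouped.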


3 participants