Long expressions #319

nicolasmaia · 2020-07-07T04:44:45Z

For some reason, Rikai can't recognize the expression 先生と言われるほどの馬鹿でなし.

It's been in JMdict since last year, though: https://www.edrdg.org/jmdictdb/cgi-bin/entr.py?svc=jmdict&sid=&e=2070202

birtles · 2020-07-07T04:56:01Z

Actually, ignore my previous comment. We have a hard-coded limit for how long a string we look up. It's currently set at 13 characters.

nicolasmaia · 2020-07-07T05:06:16Z

Can we extend it?

birtles · 2020-07-07T05:09:31Z

Sure. We should probably do some analysis on the database to see what the longest entries are.

There will be some performance impact too since we'll do a number of more lookups than we need to, but hopefully it's acceptable? It would be nice if we could do some performance comparison though.

SaltfishAmi · 2020-07-08T14:12:18Z

1       3193
2       58979
3       61481
4       75303
5       49891
6       39011
7       30063
8       21971
9       13759
10      8512
11      5391
12      3409
13      2072
14      1332
15      889
16      554
17      371
18      236
19      168
20      95
21      62
22      51
23      35
24      17
25      16
26      11
27      7
28      3
29      3
30      2
31      2
32      1
33      3
34      2
35      0
36      0
37      1

This is from the bundled dict.idx. Didn't find anything longer than 37 full width characters.
The numbers include both kana-only and mixed kana-kanji entries.

birtles · 2020-07-09T00:37:35Z

Thanks! That's super useful.

I'm concerned that once I (finally) switch to storing the data in IndexedDB the extra lookup times for each substring are going produce a pretty significant performance impact.

Perhaps we could extend the limit to 16 for now? 19 at most?

SaltfishAmi · 2020-07-13T05:22:31Z

Thanks! That's super useful.

I'm concerned that once I (finally) switch to storing the data in IndexedDB the extra lookup times for each substring are going produce a pretty significant performance impact.

Perhaps we could extend the limit to 16 for now? 19 at most?

Or maybe this should be adjustable in the options page?

Lebon14 · 2020-08-04T02:29:41Z

Thanks! That's super useful.
I'm concerned that once I (finally) switch to storing the data in IndexedDB the extra lookup times for each substring are going produce a pretty significant performance impact.
Perhaps we could extend the limit to 16 for now? 19 at most?

Or maybe this should be adjustable in the options page?

This!

birtles · 2020-08-11T00:55:27Z

I've updated the hard-coded limit to 16 for now in: 91846e1

I try really hard to avoid adding options so I'd prefer to wait until I can profile the performance difference properly and, if there's no noticeable impact, just increase the limit to 37.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Long expressions #319

Long expressions #319

nicolasmaia commented Jul 7, 2020

birtles commented Jul 7, 2020 •

edited

Loading

nicolasmaia commented Jul 7, 2020

birtles commented Jul 7, 2020

SaltfishAmi commented Jul 8, 2020 •

edited

Loading

birtles commented Jul 9, 2020

SaltfishAmi commented Jul 13, 2020

Lebon14 commented Aug 4, 2020 •

edited

Loading

birtles commented Aug 11, 2020

Long expressions #319

Long expressions #319

Comments

nicolasmaia commented Jul 7, 2020

birtles commented Jul 7, 2020 • edited Loading

nicolasmaia commented Jul 7, 2020

birtles commented Jul 7, 2020

SaltfishAmi commented Jul 8, 2020 • edited Loading

birtles commented Jul 9, 2020

SaltfishAmi commented Jul 13, 2020

Lebon14 commented Aug 4, 2020 • edited Loading

birtles commented Aug 11, 2020

birtles commented Jul 7, 2020 •

edited

Loading

SaltfishAmi commented Jul 8, 2020 •

edited

Loading

Lebon14 commented Aug 4, 2020 •

edited

Loading