Add option for selecting a language to reconcile with #7
Comments
Hi @hay, did you happen to see and try using the lowercase language codes for the labels, descriptions, aliases, and sitelinks?
@hay, thank you for testing and for your good example. I have added an automatic language detector using the langid library. It should easily detect Dutch. Alternatively, it's possible to specify 'language="lang-code"' in the annotate and contextual_matching functions. Note that your example is a one-column table. The important feature of bbw is contextual matching, which needs at least two cells in a row. We augment a one-column table by copying the column.
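To illustrate the one-column augmentation described above, here is a minimal sketch in plain Python. The helper name `augment_single_column` and the list-of-lists table representation are assumptions made for illustration; this is not bbw's actual implementation.

```python
def augment_single_column(rows):
    """Return a copy of the table; if every row has exactly one cell,
    duplicate that cell so each row has two cells.

    Hypothetical helper for illustration only, not part of bbw.
    """
    if rows and all(len(row) == 1 for row in rows):
        # One-column table: copy the column so each row has two cells.
        return [[row[0], row[0]] for row in rows]
    # Tables with more columns (or empty tables) are returned unchanged.
    return [list(row) for row in rows]


table = [["Hervormde Kerk"], ["Doopsgezinde Gemeente"]]
augmented = augment_single_column(table)
```

After this step every row has two cells, so a matcher that expects at least two cells per row can run on it, although the copied cell adds no new context.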
@shigapov I wonder if some of that info could be put into the docs or README.md? That seems useful to know. Hmm, maybe start a new /doc folder and begin putting some .md files in there, just as a starting point for users who could also contribute back with PRs!
Thanks, I tried running my list again on the same codebase and the language detection works pretty well. Thanks!
I tried using this tool to reconcile a list of about 100 church denominations (a gist can be found here). Unfortunately, the results were pretty mediocre (only around 5 got matched) because the list is in Dutch while the matching is done using only English labels.
I think it would be a very useful addition to make it possible to set the language code. For both the OpenRefine reconciliation endpoint and the WD query service this is very easy. Also see my wdreconcile tool for some inspiration on how something like that could be done.
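As a rough sketch of how a language code could be passed to the WD query service, the snippet below builds a SPARQL query that matches a label with a language tag and encodes it into a request URL. The function names are hypothetical, the exact-label match is only an illustration (it is not how bbw or wdreconcile do their matching), and no request is actually sent.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_label_query(label, language="en", limit=5):
    """Build a SPARQL query for items whose label equals `label`
    in the given language. Hypothetical helper for illustration."""
    # Language tags are written in lowercase here, e.g. "nl" for Dutch.
    language = language.strip().lower()
    # Escape backslashes and double quotes inside the string literal.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT ?item WHERE { "
        f'?item rdfs:label "{escaped}"@{language} . '
        f"}} LIMIT {int(limit)}"
    )


def build_request_url(label, language="en"):
    """Encode the query into a GET URL that asks for JSON results."""
    query = build_label_query(label, language)
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


url = build_request_url("Hervormde Kerk", language="nl")
```

Switching `language` from "en" to "nl" is the only change needed to look up Dutch labels in this sketch.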