Evaluate different ElasticSearch features #19

Open
antonisloizou opened this issue Jan 21, 2016 · 5 comments

Comments

@antonisloizou

So this is about things like stemming, synonyms and misspellings.
I guess many of these are nice to have, but the more we allow "non-exact" matches, the more false positives we might introduce in the results.

So we'll need to experiment to find a good balance.

@ianwdunlop
Member

Misspellings can be handled using 'fuzziness'. For example, a search for Asprin would (in the current index) list Aspirin as the second hit using the default 'AUTO' fuzziness setting. See https://www.elastic.co/guide/en/elasticsearch/guide/current/fuzzy-query.html for more details.
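To see why Asprin can still reach Aspirin under 'AUTO', here is a minimal Python sketch of the two ingredients, assuming the default 'AUTO' rule documented by Elasticsearch (query terms of length 0 to 2 must match exactly, 3 to 5 allow one edit, longer terms allow two). The function names are hypothetical, and the distance used is the optimal string alignment variant of Damerau-Levenshtein (insertions, deletions, substitutions and adjacent transpositions), which only approximates what Lucene does internally.

```python
def auto_fuzziness(term: str, low: int = 3, high: int = 6) -> int:
    """Maximum number of edits that fuzziness 'AUTO' allows for a query term."""
    n = len(term)
    if n < low:
        return 0  # lengths 0..2: must match exactly
    if n < high:
        return 1  # lengths 3..5: one edit allowed
    return 2      # longer terms: two edits allowed


def osa_distance(a: str, b: str) -> int:
    """Edit distance with insertions, deletions, substitutions and
    adjacent transpositions (optimal string alignment variant)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def fuzzy_matches(query_term: str, indexed_term: str) -> bool:
    """True if indexed_term is within the AUTO edit budget of query_term.
    Both are lowercased, as a typical analyzer would do (assumption)."""
    q, t = query_term.lower(), indexed_term.lower()
    return osa_distance(q, t) <= auto_fuzziness(q)


print(auto_fuzziness("asprin"), osa_distance("asprin", "aspirin"))  # prints: 2 1
```

Since "asprin" is six characters long, AUTO permits two edits, while its actual distance to "aspirin" is one (a single inserted "i"), so a fixed fuzziness of 1 is still enough to reach it.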

@ianwdunlop
Member

Setting the fuzziness to '1' makes Aspirin the top hit for Asprin.
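For reference, the two settings compared here could be expressed as query bodies along these lines. This is a sketch that assumes a match query (rather than the term-level fuzzy query from the linked guide) and a hypothetical field called "label"; the index's real field names may differ. The resulting body would be sent as JSON to the index's _search endpoint.

```python
import json


def fuzzy_match_query(field: str, text: str, fuzziness="AUTO") -> dict:
    """Build a match query body that tolerates misspellings in `text`.

    `fuzziness` is either the string "AUTO" or an integer edit distance.
    """
    if fuzziness != "AUTO" and fuzziness not in (0, 1, 2):
        raise ValueError("fuzziness must be 'AUTO' or 0, 1 or 2")
    return {
        "query": {
            "match": {
                field: {
                    "query": text,
                    "fuzziness": fuzziness,
                }
            }
        }
    }


# "label" is a hypothetical field name; substitute the index's real one.
auto_body = fuzzy_match_query("label", "Asprin")
one_edit_body = fuzzy_match_query("label", "Asprin", fuzziness=1)
print(json.dumps(one_edit_body))
```

Lucene caps fuzzy matching at two edits, which is why the sketch rejects larger integers.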

@ianwdunlop
Member

Is there a list of life sciences/pharmaceutical/chemistry synonyms?

@nicklynch

I suppose there are a few different options:
For alternate drug names: DrugBank
For general chemistry names:
Did ConceptWiki store that? Could we ask them?
There is a free service of sorts, but it's not the full data set, just an API: http://www.commonchemistry.org/

Not sure about common typos.
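Whichever source is used, the names would eventually need to become rules for Elasticsearch's synonym token filter. A minimal Python sketch, assuming the alternate names have already been extracted into a dict of preferred name to alternates (that extraction step, the filter name "drug_synonyms" and the analyzer name "drug_name_synonyms" are all hypothetical). Each rule uses the comma-separated equivalence format, so every name in a rule is treated as interchangeable.

```python
def synonym_rules(names: dict[str, list[str]]) -> list[str]:
    """Turn {preferred name: [alternate names]} into equivalence rules
    ("a, b, c") for Elasticsearch's synonym token filter."""
    rules = []
    for preferred, alternates in names.items():
        seen = set()
        terms = []
        for name in [preferred, *alternates]:
            term = name.strip().lower()
            # Commas separate terms in this rule format, so names that
            # contain them (common in chemical names) are skipped here.
            if not term or "," in term or term in seen:
                continue
            seen.add(term)
            terms.append(term)
        if len(terms) > 1:  # a rule needs at least two names
            rules.append(", ".join(terms))
    return rules


def synonym_index_settings(rules: list[str]) -> dict:
    """Index settings defining a synonym filter and an analyzer that uses it.
    The filter and analyzer names are hypothetical."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "drug_synonyms": {"type": "synonym", "synonyms": rules}
                },
                "analyzer": {
                    "drug_name_synonyms": {
                        "tokenizer": "standard",
                        "filter": ["lowercase", "drug_synonyms"],
                    }
                },
            }
        }
    }


# Illustrative input only; real data would come from a source such as DrugBank.
rules = synonym_rules({"Aspirin": ["acetylsalicylic acid", "ASA", "aspirin"]})
print(rules)  # ['aspirin, acetylsalicylic acid, asa']
settings = synonym_index_settings(rules)
```

The settings would be supplied when creating the index, with the analyzer then referenced from the relevant field mapping. If synonyms are applied at index time, changing the list later means reindexing.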

@ianwdunlop
Member

For stemming, we need to index an additional field at load time containing the stemmed version, and then include it in the searched fields via the API.
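One way to get that stemmed field without extra loader code is a multi-field: Elasticsearch indexes the same source value twice, once as-is and once through a stemming analyzer, and the query then searches both. This Python sketch assumes a hypothetical field "label", the built-in "english" analyzer as the stemmer, and the "text" field type of Elasticsearch 5 and later (2.x used "string"). It builds only the properties section of the mapping, which has to be in place before the data is loaded.

```python
def stemmed_mapping(field: str) -> dict:
    """Properties section of a mapping in which `field` is indexed as-is and,
    via a multi-field, a second time through the "english" (stemming) analyzer."""
    return {
        "properties": {
            field: {
                "type": "text",  # "string" on Elasticsearch 2.x
                "fields": {
                    "stemmed": {"type": "text", "analyzer": "english"}
                },
            }
        }
    }


def stemmed_search(field: str, text: str) -> dict:
    """Query both the original and the stemmed version of `field`."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": [field, f"{field}.stemmed"],
                "type": "most_fields",  # combine scores from both fields
            }
        }
    }


mapping = stemmed_mapping("label")  # "label" is a hypothetical field name
query = stemmed_search("label", "inhibitors")
```

The "most_fields" type combines the scores from each matching field, so a document that matches on both the original and the stemmed form tends to outrank one that only matches after stemming, which may help keep stemming-induced false positives further down the results.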
