synonyms.disablePhraseQueries is ignored if using WhitespaceTokenizer #34

Closed
nolanlawson opened this issue Nov 1, 2013 · 1 comment

@nolanlawson

Unfortunately, the fixes I made for #9 and #26 (switching the default config from the StandardTokenizer to the WhitespaceTokenizer) also prevent quoted phrases from ever being expanded, because the WhitespaceTokenizer treats the quotes as part of the adjacent words.

I'm thinking the best solution will be a somewhat hacky workaround that uses Token Filters to remove the quotes, since the WhitespaceTokenizer worked so well in just about every other case.
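
To make the failure concrete, here is a minimal Python sketch of the idea. It is not the plugin's Java code and does not use Lucene's API; `whitespace_tokenize` and `strip_quotes` are hypothetical names that only imitate a whitespace tokenizer followed by a quote-removing token filter.

```python
def whitespace_tokenize(text):
    """Split on runs of whitespace only (a rough imitation of a whitespace tokenizer)."""
    return text.split()


def strip_quotes(tokens):
    """Hypothetical token filter: remove double quotes and drop tokens left empty."""
    cleaned = (token.replace('"', "") for token in tokens)
    return [token for token in cleaned if token]


raw = whitespace_tokenize('"canis familiaris"')
print(raw)                # ['"canis', 'familiaris"']  (quotes stuck to the words)
print(strip_quotes(raw))  # ['canis', 'familiaris']    (now comparable to a synonym entry)
```

With the quotes attached, neither token can equal a word from the synonym file, which is why the phrase is never expanded; stripping them in a filter restores the match without giving up whitespace-only splitting.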

@nolanlawson

Some closing comments on this issue:

I find myself running into more and more problems with the fact that we use a single analyzer for both the query (during synonym expansion) and the synonym file. It almost feels like we need to move beyond the synonym file format and create a special parser to capture the kinds of synonyms people are using:

血と骨, Blood and Bones          # UTF-8, people expect no tokenization between characters
e-commerce, electronic commerce # people expect no tokenization on hyphens

and the kinds of queries they expect to "just work" in these scenarios:

# full phrase, people expect the quotes to be ignored when matched against synonym file
"canis familiaris"

# StandardTokenizer splits this into 2 tokens, but people expect the synonyms to work
e-commerce

# currently broken (due to the dot), because we're using a special
# whitespace-and-quotes tokenizer to favor proper parsing of the synonym file.
# Luckily most people don't type these characters when they search.
dog.
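
The trade-off between the three queries above can be sketched in a few lines of Python. Both tokenizers here are crude stand-ins written for illustration: `whitespace_and_quotes_tokenize` only approximates the special tokenizer mentioned above, and `word_tokenize` (a plain `\w+` regex) is a very rough assumption about how a StandardTokenizer-like splitter treats punctuation, not Lucene's actual algorithm.

```python
import re


def whitespace_and_quotes_tokenize(text):
    """Split on whitespace and double quotes; keep all other punctuation (illustrative only)."""
    return [token for token in re.split(r'[\s"]+', text) if token]


def word_tokenize(text):
    """Crude stand-in for a punctuation-splitting tokenizer: keep runs of word characters."""
    return re.findall(r"\w+", text)


for query in ['"canis familiaris"', "e-commerce", "dog."]:
    print(query, whitespace_and_quotes_tokenize(query), word_tokenize(query))
```

Under these assumptions the whitespace-and-quotes splitter keeps `e-commerce` whole and ignores the quotes, but leaves `dog.` with its trailing dot, while the punctuation-splitting one handles `dog.` and breaks `e-commerce` in two. Neither choice satisfies all three expectations at once, which is the motivation for treating the query and the synonym file differently.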
