Unfortunately, the fixes I made for #9 and #26 (to use the WhitespaceTokenizer instead of the StandardTokenizer as the default config) also cause phrases to never be expanded (since the quotes are considered part of the word).
I'm thinking the best solution to this will be a somewhat hacky workaround using Token Filters to remove the quotes, since the WhitespaceTokenizer seemed to work so well in just about every other case.
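Roughly what I have in mind, as a minimal sketch against a recent Lucene API (the exact `Analyzer`/`Tokenizer` constructors vary across Lucene versions, and in a real Solr/Elasticsearch config this would be expressed via the corresponding filter factory, e.g. `PatternReplaceFilterFactory`, rather than Java code): keep the WhitespaceTokenizer, then strip the quotes with a PatternReplaceFilter.

```java
import java.io.IOException;
import java.util.regex.Pattern;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.pattern.PatternReplaceFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class QuoteStrippingAnalyzer extends Analyzer {

    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        // Split on whitespace only, so multi-character and hyphenated terms stay intact...
        Tokenizer source = new WhitespaceTokenizer();
        // ...then strip any double quotes that ride along on the tokens, so a query
        // like "canis familiaris" yields [canis, familiaris] instead of ["canis, familiaris"].
        TokenStream result = new PatternReplaceFilter(source, Pattern.compile("\""), "", true);
        return new TokenStreamComponents(source, result);
    }

    public static void main(String[] args) throws IOException {
        try (Analyzer analyzer = new QuoteStrippingAnalyzer();
             TokenStream ts = analyzer.tokenStream("f", "\"canis familiaris\"")) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                System.out.println(term.toString());   // prints: canis, familiaris (quotes removed)
            }
            ts.end();
        }
    }
}
```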
I find myself running into more and more problems with the fact that we use a single analyzer for both the query (during synonym expansion) and the synonym file. It almost feels like we need to move beyond the synonym file format and create a special parser to capture the kinds of synonyms people are using:
```
血と骨, Blood and Bones            # UTF-8, people expect no tokenization between characters
e-commerce, electronic commerce    # people expect no tokenization on hyphens
```
and the kinds of queries they expect to "just work" in these scenarios:
```
# full phrase, people expect the quotes to be ignored when matched against the synonym file
"canis familiaris"

# StandardTokenizer splits this into 2 tokens, but people expect the synonyms to work
e-commerce

# currently broken (due to the dot), because we're using a special
# whitespace-and-quotes tokenizer to favor proper parsing of the synonym file.
# Luckily most people don't type these characters when they search.
dog.
```
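To make the mismatch concrete, here's a quick comparison sketch of what the two tokenizers emit for these inputs (again assuming a recent Lucene API, not the plugin's actual code):

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class TokenizerComparison {

    // Run a tokenizer over one string and collect the emitted terms.
    static List<String> tokens(Tokenizer tokenizer, String text) throws IOException {
        List<String> out = new ArrayList<>();
        tokenizer.setReader(new StringReader(text));
        CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
        tokenizer.reset();
        while (tokenizer.incrementToken()) {
            out.add(term.toString());
        }
        tokenizer.end();
        tokenizer.close();
        return out;
    }

    public static void main(String[] args) throws IOException {
        String[] inputs = { "\"canis familiaris\"", "e-commerce", "dog.", "血と骨" };
        for (String input : inputs) {
            // StandardTokenizer splits on hyphens and punctuation and between CJK characters;
            // WhitespaceTokenizer splits only on whitespace, keeping quotes and dots attached.
            System.out.println(input);
            System.out.println("  StandardTokenizer:   " + tokens(new StandardTokenizer(), input));
            System.out.println("  WhitespaceTokenizer: " + tokens(new WhitespaceTokenizer(), input));
        }
    }
}
```

The StandardTokenizer output is what we want for quoted phrases and trailing punctuation, while the WhitespaceTokenizer output is what keeps e-commerce and 血と骨 intact; neither choice handles all four inputs, which is why a single shared analyzer keeps biting us.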