Unfortunately, the fixes I made for #9 and #26 (to use the WhitespaceTokenizer instead of the StandardTokenizer as the default config) also cause phrases to never be expanded (since the quotes are considered part of the word).
I'm thinking the best solution to this will be a somewhat hacky workaround using Token Filters to remove the quotes, since the WhitespaceTokenizer seemed to work so well in just about every other case.
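Roughly what I have in mind, as a minimal sketch against a recent Lucene API (the exact `Analyzer`/`Tokenizer` constructors vary across Lucene versions, and in a real Solr/Elasticsearch config this would be expressed via the corresponding filter factory, e.g. `PatternReplaceFilterFactory`, rather than Java code): keep the WhitespaceTokenizer, then strip the quotes with a PatternReplaceFilter.

```java
import java.io.IOException;
import java.util.regex.Pattern;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.pattern.PatternReplaceFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class QuoteStrippingAnalyzer extends Analyzer {

    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        // Split on whitespace only, so multi-character and hyphenated terms stay intact...
        Tokenizer source = new WhitespaceTokenizer();
        // ...then strip any double quotes that ride along on the tokens, so a query
        // like "canis familiaris" yields [canis, familiaris] instead of ["canis, familiaris"].
        TokenStream result = new PatternReplaceFilter(source, Pattern.compile("\""), "", true);
        return new TokenStreamComponents(source, result);
    }

    public static void main(String[] args) throws IOException {
        try (Analyzer analyzer = new QuoteStrippingAnalyzer();
             TokenStream ts = analyzer.tokenStream("f", "\"canis familiaris\"")) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                System.out.println(term.toString());   // prints: canis, familiaris (quotes removed)
            }
            ts.end();
        }
    }
}
```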
I find myself running into more and more problems with the fact that we use a single analyzer for both the query (during synonym expansion) and the synonym file. It almost feels like we need to move beyond the synonym file format and create a special parser to capture the kinds of synonyms people are using:
```
血と骨, Blood and Bones            # UTF-8, people expect no tokenization between characters
e-commerce, electronic commerce    # people expect no tokenization on hyphens
```
and the kinds of queries they expect to "just work" in these scenarios:
```
# full phrase, people expect the quotes to be ignored when matched against the synonym file
"canis familiaris"

# StandardTokenizer splits this into 2 tokens, but people expect the synonyms to work
e-commerce

# currently broken (due to the dot), because we're using a special
# whitespace-and-quotes tokenizer to favor proper parsing of the synonym file.
# Luckily most people don't type these characters when they search.
dog.
```
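To make the mismatch concrete, here's a quick comparison sketch of what the two tokenizers emit for these inputs (again assuming a recent Lucene API, not the plugin's actual code):

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class TokenizerComparison {

    // Run a tokenizer over one string and collect the emitted terms.
    static List<String> tokens(Tokenizer tokenizer, String text) throws IOException {
        List<String> out = new ArrayList<>();
        tokenizer.setReader(new StringReader(text));
        CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
        tokenizer.reset();
        while (tokenizer.incrementToken()) {
            out.add(term.toString());
        }
        tokenizer.end();
        tokenizer.close();
        return out;
    }

    public static void main(String[] args) throws IOException {
        String[] inputs = { "\"canis familiaris\"", "e-commerce", "dog.", "血と骨" };
        for (String input : inputs) {
            // StandardTokenizer splits on hyphens and punctuation and between CJK characters;
            // WhitespaceTokenizer splits only on whitespace, keeping quotes and dots attached.
            System.out.println(input);
            System.out.println("  StandardTokenizer:   " + tokens(new StandardTokenizer(), input));
            System.out.println("  WhitespaceTokenizer: " + tokens(new WhitespaceTokenizer(), input));
        }
    }
}
```

The StandardTokenizer output is what we want for quoted phrases and trailing punctuation, while the WhitespaceTokenizer output is what keeps e-commerce and 血と骨 intact; neither choice handles all four inputs, which is why a single shared analyzer keeps biting us.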