Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explicitly set analyzers #414

Merged
merged 1 commit into from
Dec 16, 2019
Merged

Explicitly set analyzers #414

merged 1 commit into from
Dec 16, 2019

Conversation

missinglink
Copy link
Member

@missinglink missinglink commented Dec 13, 2019

cherry-picked from #412 and based on #413. view diff

This is something I've been wanting to do for a while, it explicitly sets the analyzer and search_analyzer property for all text fields.

The default behaviour of elasticsearch is to default all analyzers to standard when not otherwise defined..
... and to default the search_analyzer property to equal analyzer.

So while it's not totally necessary to define an explicit search_analyzer when it's equal to the analyzer I have made this mandatory and covered it with tests, this ensures that it is considered when adding new fields or adapting existing ones.

This is hopefully a no-op refactor (basing the search_analyzer settings on what's in the defaults for pelias/api).
It will benefit any queries where the analyzer was not set on the query for whatever reason.

 field                            type                             analyzer                  search_analyzer           normalizer
< name.*                           text                             peliasIndexOneEdgeGram    peliasIndexOneEdgeGram    n/a
< phrase.*                         text                             peliasPhrase              peliasPhrase              n/a
---
> name.*                           text                             peliasIndexOneEdgeGram    peliasQuery               n/a
> phrase.*                         text                             peliasPhrase              peliasQuery               n/a

@missinglink missinglink force-pushed the explicitly_set_analyzers branch from eb480d5 to ca9e38f Compare December 13, 2019 15:57
Copy link
Member

@Joxit Joxit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM splitting #412 was a good idea 👍

@missinglink
Copy link
Member Author

missinglink commented Dec 16, 2019

I know there are some queries such as this where the analyzer is not being set.

It's hopefully not a big deal right now because that query doesn't target the name.* and phrase.* fields which require a different search_analyzer from the index-time analyzer.

In the future, I'd like to explore having a new search_analyzer for address_parts.street since right now it's doing a bunch of synonym substitutions at query-time which can likely be avoided.

@missinglink
Copy link
Member Author

missinglink commented Dec 16, 2019

Also worth noting the parent.{placetype}.ngram fields were previously using an ngram analysis at query-time which is just plain wrong.

@missinglink
Copy link
Member Author

This looks good to me.
It should provide an immediate performance benefit for queries targeting the admin ngrams fields as well as any queries which were failing to correctly use the peliasQuery analyzer.

@missinglink missinglink merged commit 1098353 into master Dec 16, 2019
@missinglink missinglink deleted the explicitly_set_analyzers branch December 16, 2019 10:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants