When calling the _analyze API, it should be a 4xx when dictionary files are missing #121443

Closed
benwtrent opened this issue Jan 31, 2025 · 1 comment · Fixed by #121568
Assignees
Labels
>bug :Search Relevance/Analysis How text is split into tokens Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch

Comments

@benwtrent
Member

Elasticsearch Version

any

Installed Plugins

No response

Java Version

bundled

OS Version

any

Problem Description

The _analyze API can be used to explore how analysis behaves with various dictionary files. However, if a dictionary file is completely missing, the API returns a 5xx. For all analyzers that depend on user-provided dictionaries, we should return a 4xx, not a 5xx.

Example: when using hunspell, a missing dictionary should produce a 4xx, because the dictionary has to be provided by the admin of the cluster.
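
To make the requested behavior concrete, here is a minimal standalone sketch of treating a missing dictionary as a client error. None of the types below are Elasticsearch classes: `DictionaryStatusSketch`, `MissingDictionaryException`, `loadDictionary`, `statusFor`, and `analyzeStatus` are hypothetical names, and the map of installed dictionaries stands in for whatever the node has on disk. It assumes a convention where an `IllegalArgumentException` is reported as 400 and any other failure as 500.

```java
// Hypothetical sketch only; these are not Elasticsearch classes.
final class DictionaryStatusSketch {

    /** Signals that a user- or admin-provided dictionary is absent. */
    static final class MissingDictionaryException extends IllegalArgumentException {
        MissingDictionaryException(String locale) {
            super("Could not find hunspell dictionary [" + locale + "]");
        }
    }

    /** Looks up a dictionary by locale in a stand-in registry. */
    static String loadDictionary(java.util.Map<String, String> installed, String locale) {
        String dictionary = installed.get(locale);
        if (dictionary == null) {
            // The request names something the cluster does not have:
            // report it as bad input rather than as an internal failure.
            throw new MissingDictionaryException(locale);
        }
        return dictionary;
    }

    /** Assumed convention: bad input maps to 400, everything else to 500. */
    static int statusFor(Throwable failure) {
        return failure instanceof IllegalArgumentException ? 400 : 500;
    }

    /** Returns the HTTP status an analyze-like request would produce. */
    static int analyzeStatus(java.util.Map<String, String> installed, String locale) {
        try {
            loadDictionary(installed, locale);
            return 200;
        } catch (RuntimeException e) {
            return statusFor(e);
        }
    }
}
```

Under this convention the choice of exception type at the point of failure decides the status code, which is why an `IllegalStateException` raised for a missing dictionary surfaces as a 500.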

Steps to Reproduce

GET /_analyze
{
  "tokenizer": "standard",
  "filter": {"type": "hunspell", "locale": "en_US"},
  "text": "the knife foxes jumping quickly over knives"
}

Logs (if relevant)

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_state_exception",
        "reason": "failed to load hunspell dictionary for locale: en_US"
      }
    ],
    "type": "illegal_state_exception",
    "reason": "failed to load hunspell dictionary for locale: en_US",
    "caused_by": {
      "type": "exception",
      "reason": "Could not find hunspell dictionary [en_US]"
    }
  },
  "status": 500
}
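
The status code matters because callers commonly branch on it. The following is a hedged illustration of that client-side logic, not part of any Elasticsearch client: `AnalyzeResponseSketch`, `Action`, and `classify` are hypothetical names. With the response above, a caller following this pattern would treat the missing dictionary as a server fault instead of a problem with its own request.

```java
// Hypothetical client-side sketch; not part of any Elasticsearch client.
final class AnalyzeResponseSketch {

    enum Action { USE_RESULT, FIX_REQUEST, RETRY_OR_ESCALATE, UNEXPECTED }

    /** Decides what a caller would typically do for a given HTTP status. */
    static Action classify(int status) {
        if (status >= 200 && status < 300) {
            return Action.USE_RESULT;        // tokens were produced
        }
        if (status >= 400 && status < 500) {
            return Action.FIX_REQUEST;       // the request or cluster setup is wrong
        }
        if (status >= 500 && status < 600) {
            return Action.RETRY_OR_ESCALATE; // looks like a server-side failure
        }
        return Action.UNEXPECTED;            // anything else is outside this sketch
    }
}
```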

Full error:

org.elasticsearch.transport.RemoteTransportException: [es-es-search-5b7c555d59-xtvz7][100.66.132.194:9300][indices:admin/analyze[s]]
Caused by: java.lang.IllegalStateException: failed to load hunspell dictionary for locale: en_US
	at [email protected]/org.elasticsearch.indices.analysis.HunspellService.lambda$new$0(HunspellService.java:102)
	at java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1713)
	at [email protected]/org.elasticsearch.indices.analysis.HunspellService.getDictionary(HunspellService.java:119)
	at [email protected]/org.elasticsearch.index.analysis.HunspellTokenFilterFactory.<init>(HunspellTokenFilterFactory.java:34)
	at [email protected]/org.elasticsearch.indices.analysis.AnalysisModule.lambda$setupTokenFilters$0(AnalysisModule.java:163)
	at [email protected]/org.elasticsearch.plugins.AnalysisPlugin$1.get(AnalysisPlugin.java:128)
	at [email protected]/org.elasticsearch.index.analysis.AnalysisRegistry.getComponentFactory(AnalysisRegistry.java:132)
	at [email protected]/org.elasticsearch.index.analysis.AnalysisRegistry.buildCustomAnalyzer(AnalysisRegistry.java:267)
	at [email protected]/org.elasticsearch.action.admin.indices.analyze.TransportAnalyzeAction.buildCustomAnalyzer(TransportAnalyzeAction.java:218)
	at [email protected]/org.elasticsearch.action.admin.indices.analyze.TransportAnalyzeAction.analyze(TransportAnalyzeAction.java:143)
	at [email protected]/org.elasticsearch.action.admin.indices.analyze.TransportAnalyzeAction.shardOperation(TransportAnalyzeAction.java:129)
	at [email protected]/org.elasticsearch.action.admin.indices.analyze.TransportAnalyzeAction.shardOperation(TransportAnalyzeAction.java:66)
	at [email protected]/org.elasticsearch.action.support.single.shard.TransportSingleShardAction.lambda$asyncShardOperation$0(TransportSingleShardAction.java:113)
	at [email protected]/org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:79)
	at [email protected]/org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:76)
	at [email protected]/org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:101)
	at [email protected]/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1044)
	at [email protected]/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1575)
Caused by: org.elasticsearch.ElasticsearchException: Could not find hunspell dictionary [en_US]
	at [email protected]/org.elasticsearch.indices.analysis.HunspellService.loadDictionary(HunspellService.java:168)
	at [email protected]/org.elasticsearch.indices.analysis.HunspellService.lambda$new$0(HunspellService.java:100)
	... 20 more
@benwtrent benwtrent added :Search Relevance/Analysis How text is split into tokens >bug labels Jan 31, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Jan 31, 2025
@mayya-sharipova mayya-sharipova self-assigned this Jan 31, 2025
mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this issue Feb 3, 2025
If a custom analyzer provided in _analyze API
can not be built, return 400 instead of the current
500. This most probably means that the user's provided
analyzer specifications are wrong.

Closes elastic#121443
mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this issue Feb 5, 2025
If a custom analyzer provided in _analyze API can not be built, return
400 instead of the current 500. This most probably means that the user's
provided analyzer specifications are wrong.

Closes elastic#121443
elasticsearchmachine pushed a commit that referenced this issue Feb 5, 2025
If a custom analyzer provided in _analyze API can not be built, return
400 instead of the current 500. This most probably means that the user's
provided analyzer specifications are wrong.

Closes #121443