EZP-26807: First implementation for SolrCloud support #86

Closed
wants to merge 9 commits into from

@pspanja pspanja commented Dec 22, 2016

https://jira.ez.no/browse/EZP-26807

This targets Solr search engine v2.0

This implements support for SolrCloud. Support for running Solr in standalone mode is removed, meaning that from now on the Solr search engine must be used with a Solr backend running in cloud mode.

implicit routing

The approach taken here uses the implicit document router to control where documents are indexed. This is achieved by specifying a special field in the indexed document that holds the identifier of the shard it is to be indexed in. For that to work, the collection must be created with these parameters:

  • router.name=implicit
  • router.field=<field name>
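As a rough illustration of the two parameters above, the following sketch builds a Solr Collections API `CREATE` request for a collection with the implicit router, and a document that names its target shard in the routing field. The host, collection name, config set name, shard names and the routing field name `router_field` are all assumptions made for the example; they are not taken from this PR.

```shell
#!/usr/bin/env bash
# Sketch only: host, collection, config set, shard and field names are assumptions.

SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

# Build a Collections API CREATE request for a collection using the implicit router.
# With router.name=implicit the shard names must be listed explicitly in "shards".
create_collection_url() {
    local name="$1"
    local shards="$2"
    local router_field="$3"
    local config="$4"
    printf '%s/admin/collections?action=CREATE&name=%s&collection.configName=%s&router.name=implicit&router.field=%s&shards=%s' \
        "${SOLR_URL}" "${name}" "${config}" "${router_field}" "${shards}"
}

# Build a minimal JSON document that carries its target shard in the routing field.
routed_document() {
    local id="$1"
    local shard="$2"
    local router_field="$3"
    printf '{"id":"%s","%s":"%s"}' "${id}" "${router_field}" "${shard}"
}

create_collection_url ezplatform shard_eng,shard_ger router_field ezplatform_conf
echo
routed_document content-1-eng shard_eng router_field
echo
```

Against a running cluster the generated URL could be passed to `curl`; the sketch only prints it, so it can be inspected without a Solr instance.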

Unfortunately, implicit routing is not supported by the bundled Solr start script, which can only create a collection with the compositeId router, so setting up the cloud for local development will be a little more involved. The Solr initialization script for Travis provided here can be used as an example. For more information see:

Previously, running Solr in standalone/multicore mode made it possible to use a separate schema per core. Usually that meant a dedicated language analysis was configured per core. With Solr running in cloud mode, a collection and all its shards must share the same configuration. That means we will need to handle multiple (per language) full text fields in the same schema. That is yet to be implemented in this PR. For more details on the available options see Semantic & Multilingual Strategies in Lucene/Solr. For possible future support for dynamic analyzers see SOLR-6492.

compositeId routing

With compositeId routing, the exact destination shard for a document is not strictly controlled. We provide the shard key and Solr chooses the exact shard by itself. For us, that means it would not be possible to direct a document in a specific language to a shard dedicated to that language.
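To make the shard key idea concrete: with the compositeId router, the shard key is given as a prefix of the document id, separated by `!`, and Solr hashes that prefix to pick the shard. The sketch below only composes such an id; using a language code as the shard key and the id format `content-42` are assumptions for illustration, not something this PR implements.

```shell
#!/usr/bin/env bash
# Sketch only: the shard key and document id values are hypothetical.

# Compose a compositeId-style document id: "<shard key>!<document id>".
# Documents sharing a shard key are co-located, but Solr decides which shard that is.
composite_id() {
    local shard_key="$1"
    local doc_id="$2"
    printf '%s!%s' "${shard_key}" "${doc_id}"
}

composite_id eng-GB content-42
echo
```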

While this might not fit a multilingual setup, it would be desirable for a single language setup. The benefit of compositeId routing is that shards can be split, which is not possible when using implicit routing. While the theoretical document limit per shard is of no concern for us (more than 2 billion documents), being able to split shards is still practical, as a Solr node works best when it has enough memory to cache its data.
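For reference, splitting a shard is done through the Collections API `SPLITSHARD` action. The sketch below builds such a request; the host, collection name and shard name are placeholders chosen for the example.

```shell
#!/usr/bin/env bash
# Sketch only: host, collection and shard names are placeholders.

# Build a Collections API SPLITSHARD request for one shard of a collection.
split_shard_url() {
    local solr_url="$1"
    local collection="$2"
    local shard="$3"
    printf '%s/admin/collections?action=SPLITSHARD&collection=%s&shard=%s' \
        "${solr_url}" "${collection}" "${shard}"
}

split_shard_url http://localhost:8983/solr ezplatform shard1
echo
```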

Implementing support for compositeId routing is left for future improvement.

TODOs

@pspanja pspanja force-pushed the solr-cloud-support-2.0 branch from 7b5d46c to 755396a Compare December 23, 2016 11:30
@pspanja pspanja changed the title [WIP] First implementation for SolrCloud support EZP-26807: First implementation for SolrCloud support Dec 23, 2016

pspanja commented Dec 23, 2016

Now added an issue and some description. Any feedback welcome.


download() {
case ${SOLR_VERSION} in
4.10.4|6.3.0 )
6.3.0 )

rebase needed to add 6.4.1


adamwojs commented May 8, 2019

Closed in favor of #137

@adamwojs adamwojs closed this May 8, 2019
@andrerom andrerom deleted the solr-cloud-support-2.0 branch May 8, 2019 07:26