Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EZP-30282: Impl. indexing of related objects for the full text search #135

Merged
merged 1 commit into from
Mar 28, 2019

Conversation

adamwojs
Copy link
Member

@adamwojs adamwojs commented Mar 20, 2019

JIRA: https://jira.ez.no/browse/EZP-30282

Description

Impl. indexation of related objects for the full text search.

Indexing related content

Text of related content is stored in the meta_related_content_X__text_t index fields where X is relation level e.g. meta_related_content_1__text_t, meta_related_content_2__text_t

Boost factor for related content

As an User, I expect the following order of search results for query "Content A": Content A, Content B when content repository contains Content A and Content B with embedded Content A.

This is implemented by adding boost factor based on the following formula:

image

to the index fields with related content. Ref.

private function getBoostFactorForRelatedContent(int $depth): float
{
return 1.0 / pow(2.0, $depth);
}

Indexing depth configuration

As a Developer, you are able to define the maximum indexing depth using the following YAML configuration:

ez_search_engine_solr:
    # ...
    connections:
        default:
            # ...
            indexing_depth:
                default: 1          # Default value: 0 - no relation indexing, 1 - direct relations, 2nd level  relations, 3rd level  relations (maximum value).
                content_type:
                    article: 2      # Index depth defined for specyfic content type

Indexing of the related content is disabled by default, however, this might change in eZ Platform 3.X.

Checklist:

  • Coding standards ($ composer fix-cs)
  • Impl. feature / bugifx
  • Impl. tests
  • Ready for Code Review

@adamwojs adamwojs self-assigned this Mar 20, 2019
@adamwojs adamwojs requested a review from kmadejski March 20, 2019 12:52
@andrerom
Copy link
Contributor

andrerom commented Mar 20, 2019

New comment after looking at the updated diff 😄

Looks promising 👍
Would it make sense to only allow depth of 0-3? Also it seems we might need some recursion protection here (content1 -> content2 -> content1 or more advanced circular relations, mainly skipping content1 being indexes several times in the index)

@andrerom
Copy link
Contributor

andrerom commented Mar 20, 2019

Somewhat side here:

I'm wondering if this is maybe overlapping with the index boosting issue we have for fields that is being discussed with @kmadejski and @pspanja.

As in to add context on that; From my understand we either need to A. start to index all fields separately for fulltext in order to be able to boost them separately, or B. we need to consider using Solr Payloads in order to be able to boost terms.

If A, then this stuff could have been indexed as part of the relevant relation/relationList/page field full text index. But that said I realize it does not match needs for embeds on RichText which then will need either separate field or Payload as well... :)


So in the end, I guess I'm 👍 on the approach here for relations, just wanted to share in case others here look at it differently.

/**
* Return index field type for the given $contentType.
*
* @param \eZ\Publish\SPI\Persistence\Content\Type $contentType
* @param $depth
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to be some copy-paste mistake 😉

@alongosz alongosz changed the title EZP-30282: Impl. indexation of related objects for the full text search EZP-30282: Impl. indexing of related objects for the full text search Mar 22, 2019
Copy link
Member

@alongosz alongosz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like a rocket science to me. Formula for this is not documented anywhere in Solr.
I'm fine with new inventions, but some Search integration test would be useful to see what it does (especially that you yourself have put implementing tests in TODOs ;) )

But the "request changes" is for something else:

Copy link
Member

@alongosz alongosz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my side, after internal discussion, this is fine 👍

@micszo micszo self-assigned this Mar 28, 2019
@ezrobot

This comment has been minimized.

Copy link
Member

@micszo micszo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested successfully depth 1,2,3 with single, shared and multi core setup.
Works fine. 👍

@lserwatka lserwatka merged commit 08680b2 into master Mar 28, 2019
@lserwatka lserwatka deleted the ezp_30282 branch March 28, 2019 15:23
@micszo micszo removed their assignment Mar 29, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

7 participants