Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EZP-28176: Support for parallel and selective indexing command #113

Merged
merged 1 commit into from
Nov 16, 2017

Conversation

andrerom
Copy link
Contributor

@andrerom andrerom commented Nov 2, 2017

issue: https://jira.ez.no/browse/EZP-28176
Depends on ezsystems/ezpublish-kernel#2139

As a Developer I want a indexing command which is:

  • faster at indexing (parallel indexing)
  • able to refresh parts of index (by date or content id)
  • able to omit purging so index is gradually refreshed on running prod setup
  • able to remove index items if no longer existing (when I provide content id's to index)

Todo:

  • Split out SolrCreateIndexCommand.php changes into own PR and improvement for 1.0 and up
  • Re start travis once kernel PR has been merged

@MarioBlazek
Copy link
Contributor

@andrerom did you maybe thought about adding option for refreshing index for subtree ?

@andrerom
Copy link
Contributor Author

andrerom commented Nov 3, 2017

@MarioBlazek No, but that's a good idea. Things like that is very simple to add here now (on kernel side in the command there, the command here is deprecated, so I don't intend to add anything more on it here, PR's welcome tough if someone wants to port things over), but I'll definitely add that one asap ;)

@MarioBlazek
Copy link
Contributor

@andrerom we've implemented that on current project. I can do PR on kernel.

@andrerom
Copy link
Contributor Author

andrerom commented Nov 3, 2017

@MarioBlazek If so then do the PR on top of ezsystems/ezpublish-kernel#2139, you'll only need a new getStatement* method and handle the console option same way as since.

EDIT: subtree already added there now, thanks for the suggestion.

@andrerom andrerom force-pushed the advance_indexing branch 2 times, most recently from 7b1a113 to 5d394fe Compare November 8, 2017 16:34
lib/Indexer.php Outdated
} catch (NotFoundException $e) {
// Ignore content objects that have some sort of missing data on it
$this->logWarning($progress, 'Content with id ' . $content->versionInfo->id . ' has missing data, so skipped for indexing. Full exception: ' . $e->getMessage());
continue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we eliminate continue statement code will be more readable IMHO.

try {
    $info = $contentHandler->loadContentInfo($contentId);
    if ($info->isPublished) {
        $documents[] = $this->searchHandler->generateDocument(
            $contentHandler->load($contentId, $info->currentVersionNo)
        );
    }
} catch(NotFoundException $e) {
    // Catch this so we delete the index for this content below 
    $this->searchHandler->deleteContent($contentId);
}

Copy link
Contributor Author

@andrerom andrerom Nov 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are forgetting the non published code, it would mean:

try {
    $info = $contentHandler->loadContentInfo($contentId);
    if ($info->isPublished) {
        $documents[] = $this->searchHandler->generateDocument(
            $contentHandler->load($contentId, $info->currentVersionNo)
        );
    } else {
        $this->searchHandler->deleteContent($contentId);
    }
} catch(NotFoundException $e) {
    $this->searchHandler->deleteContent($contentId);
}

prefer this? (I'm agnostic to it tbh, but I can see this is probably slightly easier to read)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I prefer this version because it explicitly says when content is added/removed from the search index. But taking into account the forgotten case, we can stay as is 😉

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alongosz Any preferences here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and constructor suggestion fixed in 8a27d12

Also fixed both in kernel PR

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

current version looks ok

lib/Indexer.php Outdated

public function purge()
{
$this->checkSearchEngine();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to check if searchHandler is supported in each call of purge and updateSearchIndex?

Copy link
Contributor Author

@andrerom andrerom Nov 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alternative is that I overload ctor

EDIT:: Or that we change IncrementalIndexer to be interface so constructor is not shared, but then loosing the BC method in the process, unless we combine it with a temporary abstract for that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overloading the constructor is OK for me, let's keep it simple. WDYT @alongosz?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overloading the constructor is OK for me, let's keep it simple.

+1, makes sense

@andrerom andrerom merged commit 8fe61c2 into master Nov 16, 2017
@andrerom andrerom deleted the advance_indexing branch November 16, 2017 18:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants