EZP-30296: Large API subtree delete leads to serious Solr index inconsistencies #150

Merged: 4 commits, Oct 3, 2019
2 changes: 1 addition & 1 deletion composer.json
@@ -12,7 +12,7 @@
],
"require": {
"php": "~5.6|~7.0",
-        "ezsystems/ezpublish-kernel": "~6.7.10@dev|^6.13.6@dev|~7.3.5@dev|^7.4.3@dev",
+        "ezsystems/ezpublish-kernel": "~6.7.10@dev|^6.13.6@dev|~7.3.5@dev|~7.4.3@dev",
@andrerom (Contributor), Oct 2, 2019:

I'm ok with this, so +1 once PR is green ;)

TODO: Drop this line on merge up.
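The constraint change above is a single character, but it matters: in Composer, `~` and `^` expand to different upper bounds. A quick comparison (not part of the diff):

```
^7.4.3  →  >=7.4.3 <8.0.0   (any later 7.x release)
~7.4.3  →  >=7.4.3 <7.5.0   (7.4.x patch releases only)
```

So `~7.4.3@dev` pins this package to kernel 7.4.x, which is presumably why the TODO says to drop the line on merge up.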

"netgen/query-translator": "^1.0"
},
"require-dev": {
26 changes: 20 additions & 6 deletions lib/Handler.php
@@ -45,6 +45,12 @@
*/
class Handler implements SearchHandlerInterface, Capable
{
/* Solr's maxBooleanClauses config value is 1024 */
const SOLR_BULK_REMOVE_LIMIT = 1000;
    /* 16-bit max unsigned integer value (65535) due to Solr (JVM) limitations */
const SOLR_MAX_QUERY_LIMIT = 65535;
const DEFAULT_QUERY_LIMIT = 1000;

/**
* Content locator gateway.
*
@@ -328,7 +334,7 @@ public function commit($flush = false)
*/
protected function deleteAllItemsWithoutAdditionalLocation($locationId)
{
-        $query = $this->prepareQuery();
+        $query = $this->prepareQuery(self::SOLR_MAX_QUERY_LIMIT);
@andrerom (Contributor), Oct 3, 2019:

OK, I get why you wanted a really high limit now.

I think, unfortunately, you'll have to change the logic to iterate on this if you get back this amount of hits, or you might get the total count, which will tell you (roughly*) how many iterations you need to make, right?

* In the freak case of parallel async processes for some reason adding locations to this tree at the same time.

Member Author:

> I think unfortunately you'll have to change logic to iterate on this if you get back this amount of hits, or you might get total count which will tell you how many iterations you need to make right?

That was my first idea, but @adamwojs was really against this approach, since it might be expensive, performance-wise, to perform the same query twice: first to count and then to load the locations. WDYT @andrerom?

@andrerom (Contributor):

But that said, that can be a follow-up also. You definitely improved the logic here, as the previous limit was 1000...

So ignore this for now, unless you feel up for it in this PR.

@andrerom (Contributor):

As said above, this is already way better than what is there today. So I'm fine with this as is.

Member Author:

@andrerom cool, let's stay with what we have now. I will talk to @adamwojs once we both have time, and perhaps try to figure out some better way to improve it.
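The iterate-on-total-count alternative discussed in this thread could be sketched roughly as follows (illustrative Python, not part of the PR; `gateway.search` and its `(total, ids)` return shape are hypothetical stand-ins for the real gateway API):

```python
def collect_subtree_doc_ids(gateway, location_id, page_size=1000):
    """Page through all hits under a location instead of issuing one
    query with a huge limit. Assumes gateway.search(location_id,
    offset=..., limit=...) returns (total_count, ids_in_window)."""
    offset = 0
    doc_ids = []
    while True:
        total, ids = gateway.search(location_id, offset=offset, limit=page_size)
        doc_ids.extend(ids)
        offset += page_size
        if offset >= total:
            break
    return doc_ids
```

As noted above, this trades one large query for several smaller ones, and the count can drift if parallel processes add locations to the tree while the loop runs.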

$query->filter = new Criterion\LogicalAnd(
[
$this->allItemsWithinLocation($locationId),
@@ -340,9 +346,15 @@ protected function deleteAllItemsWithoutAdditionalLocation($locationId)
$this->gateway->searchAllEndpoints($query)
);

$contentDocumentIds = [];

foreach ($searchResult->searchHits as $hit) {
-            $idPrefix = $this->mapper->generateContentDocumentId($hit->valueObject->id);
-            $this->gateway->deleteByQuery("_root_:{$idPrefix}*");
+            $contentDocumentIds[] = $this->mapper->generateContentDocumentId($hit->valueObject->id) . '*';
}

foreach (\array_chunk(\array_unique($contentDocumentIds), self::SOLR_BULK_REMOVE_LIMIT) as $ids) {
$query = '_root_:(' . implode(' OR ', $ids) . ')';
$this->gateway->deleteByQuery($query);
}
}
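The bulk-remove logic above chunks document-ID prefixes so that no single delete query exceeds `SOLR_BULK_REMOVE_LIMIT` boolean clauses (Solr's `maxBooleanClauses` defaults to 1024). The query construction can be sketched in isolation (illustrative Python mirroring the `array_unique`/`array_chunk`/`implode` steps in the PHP above):

```python
def build_delete_queries(doc_id_prefixes, bulk_limit=1000):
    """Group unique document-ID prefixes into `_root_:(a* OR b* ...)`
    delete queries with at most `bulk_limit` clauses each."""
    unique = list(dict.fromkeys(doc_id_prefixes))  # de-duplicate, keep order
    return [
        "_root_:(" + " OR ".join(unique[i:i + bulk_limit]) + ")"
        for i in range(0, len(unique), bulk_limit)
    ]
```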

@@ -351,7 +363,7 @@ protected function deleteAllItemsWithoutAdditionalLocation($locationId)
*/
protected function updateAllElementsWithAdditionalLocation($locationId)
{
-        $query = $this->prepareQuery();
+        $query = $this->prepareQuery(self::SOLR_MAX_QUERY_LIMIT);
$query->filter = new Criterion\LogicalAnd(
[
$this->allItemsWithinLocation($locationId),
@@ -380,14 +392,16 @@ protected function updateAllElementsWithAdditionalLocation($locationId)
/**
* Prepare standard query for delete purpose.
*
* @param int $limit
*
* @return Query
*/
-    protected function prepareQuery()
+    protected function prepareQuery($limit = self::DEFAULT_QUERY_LIMIT)
{
return new Query(
[
'query' => new Criterion\MatchAll(),
-            'limit' => 1000,
+            'limit' => $limit,
'offset' => 0,
]
);