Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EZP-31569: Implemented Repository filtering #54

Merged

Conversation

alongosz
Copy link
Member

@alongosz alongosz commented Apr 23, 2020

Question Answer
JIRA issue EZP-31569
Type feature
Target eZ Platform version v3.1
BC breaks no
Doc needed yes

Summary

This PR introduces Content Repository filtering API allowing a Developer to query Content Repository to retrieve either Content or Location list without involvement of a search engine.

API endpoints

Content Service

use eZ\Publish\API\Repository\Values\Content\ContentList;
use eZ\Publish\API\Repository\Values\Filter\Filter;

interface ContentService {

    public function find(Filter $filter, ?array $languages = null): ContentList;
}

LocationService

use eZ\Publish\API\Repository\Values\Filter\Filter;
use eZ\Publish\API\Repository\Values\Content\LocationList;

interface LocationService {

    public function find(Filter $filter, ?array $languages = null): LocationList;
}

Notice that I've kept $languages argument which injects context languages and is used by SiteAccessAware layer. Initially I planned to rely entirely on the Filter criteria, but the code required for this was too inconsistent and too complicated when compared with current SiteAccessAware implementation and test coverage.
$language behaves the same as for the other existing API endpoints - if null, then it should be chosen from a context of a layer. This allows to set an empty array [] to force no extra languages.

API consumption

Filter is a fluent Value Object defining its own setters using grammar which encourages chained use. It does not introduce its own builder to avoid complicating already complex solution.

use eZ\Publish\API\Repository\Values\Content\Query\Criterion\ParentLocationId;
use eZ\Publish\API\Repository\Values\Content\Query\Criterion\LanguageCode;
use eZ\Publish\API\Repository\Values\Filter\Filter;

$filter = (new Filter())
    ->withCriterion(new ParentLocationId($parentFolder->contentInfo->mainLocationId))
    ->andWithCriterion(new LanguageCode(['eng-GB']));

$contentList = $contentService->find($filter);
foreach ($contentList as $contentItem) {
    // ...
}

Full list of Filter fluent setters:

  • withCriterion - accepts a single Criterion, Criterion cannot be already set
  • andWithCriterion - appends a Criterion using LogicalAnd operation, if no Criterion was set prior this operation, it will simply set it instead (like withCriterion)
  • orWithCriterion - append a Criterion using LogicalOr operation, the same rules apply as for andWithCriterion
  • withSortClause - append a Sort Clause
  • sliceBy - set limit and offset for pagination
  • reset - remove all Criteria, Sort Clauses, and pagination settings for fluent reuse.

It's worth to point out that a Filter stores only a single Criterion. If multiple Criteria are needed they should be nested inside one of the logical operators - LogicalAnd, LogicalOr, or LogicalNot.

On the other hand a list of Sort Clauses is just an array.

Setting pagination offset and limit is not a requirement, though it's a recommended practice. Unlike its search counterpart , a Query, Filter does not impose default page limit, making results unlimited when not set. This is in line with database queries behavior, so should be less surprising than what search Query offers. Still, it's recommended to set limit and use pagination.

To avoid making the solution even bigger than it is right now, all the search Criteria are reused for filtering. They all reside in the \eZ\Publish\API\Repository\Values\Content\Query\Criterion namespace, so there seems nothing wrong with that approach.

The same goes for Sort Clauses - the ones from the \eZ\Publish\API\Repository\Values\Content\Query\SortClause namespace are reused.

Important: Not all Criteria, nor the Sort Clauses are supported with Repository Filtering. Some of them will appear later on, others, like FullText Criterion were never meant to work on a bare Repository without a content indexed for search.

To avoid confusion, a Criterion which can be used for Repository Filtering needs to implement \eZ\Publish\SPI\Repository\Values\Filter\FilteringCriterion interface. Similarly,
a Sort Clause needs to implement \eZ\Publish\SPI\Repository\Values\Filter\FilteringSortClause interface. To improve DX, with proper IDE used, it will be immediately apparent that a given fluent setter does not accept a given Criterion or Sort Clause not implementing those interfaces.

Supported Criteria

  • Ancestor
  • ContentId
  • ContentTypeGroupId
  • ContentTypeId
  • ContentTypeIdentifier
  • DateMetadata
  • LanguageCode
  • ObjectStateId
  • ObjectStateIdentifier
  • ParentLocationId
  • RemoteId
  • SectionId
  • SectionIdentifier
  • Sibling
  • Subtree
  • Visibility
  • UserEmail
  • UserId
  • UserLogin
  • UserMetadata
  • IsUserBased
  • IsUserEnabled
  • LocationId
  • LocationRemoteId
  • Location\Depth
  • Location\IsMainLocation
  • Location\Priority
  • LogicalAnd
  • LogicalNot
  • LogicalOr
  • MatchAll
  • MatchNone

The difference with search here is that all Location Criteria are actually supported regardless of filtering Content or Locations. I don't see a lot of technical reasons (maybe performance a bit, you'll see nested DISTINCT magic in case of Content gateway implementation), not to allow this. For instance, a Location\Depth query for Content items will return only those Content items which have any Location satisfying the given depth criteria. The same goes for Location\Priority. Even Location\IsMainLocation can be found useful because for non-main Location requirement it will match only Content items which have more than one (so at least one non-main) Location.

Supported Sort Clauses

Content
  • ContentId
  • ContentName
  • DateModified
  • DatePublished
  • SectionIdentifier
  • SectionName
Location
  • Depth
  • Id
  • Path
  • Priority
  • Visibility

Service Provider Interfaces

The architecture relies on query builders both for Criteria and Sort Clauses which follow Visitor pattern.

For a Criterion to support Repository Filtering, it must:

  1. Implement the \eZ\Publish\SPI\Repository\Values\Filter\FilteringCriterion interface
  2. Provide an implementation of \eZ\Publish\SPI\Repository\Values\Filter\CriterionQueryBuilder interface

For a Sort Clause to support Repository filtering, it must:

  1. Implement the \eZ\Publish\SPI\Repository\Values\Filter\FilteringSortClause interface
  2. Provide an implementation of \eZ\Publish\SPI\Repository\Values\Filter\SortClauseQueryBuilder interface.

Implementation details

Implementation aims to be a separate module, so expect its parts mostly in dedicated namespaces prefixed either as Filter or Filtering.

One of most notable things is probably an existence of \eZ\Publish\SPI\Persistence\Filter\Doctrine\FilteringQueryBuilder which extends Doctrine DBAL Query Builder. It just felt right with the current architecture to add dedicated responsibility to query builder rather than introducing separate service. Query Builder has a BC promise, because it is injected into Query Builder implementations of Criteria and Sort Clauses. While extending 3rd party implementation is often risky, I haven't found any indication that this component is either gonna change in the future or is intended to be internal.

The implementation relies on JOINs combined with WHERE constraints for filtering data as this usually proves to be more efficient than WHERE IN (subquery) approach. Both handlers and a mapper were written specifically for this feature as the data returned by gateways are a bit different. It was also an opportunity to use current stack domain language in data set. Still, when it was clean and possible, existing mappers were used to map data.

Changes to known patterns

  • \eZ\Publish\API\Repository\Values\Content\ContentList is a Value, but instead of controlling properties by the usual ValueObject magic to avoid overriding them, proper getter has been introduced.
  • its constructor is @internal so it's clear that
new ContentList()

is not supported outside of Kernel.

TODO

  • Implement API for LocationService
  • Implement Criteria handlers
  • Implement Sort Clause handlers
  • Provide pagination support
  • Provide caching support // not feasible ATM, needs to be a separate Spike
  • Implement needed Criteria and Sort Clauses (TBD) // re-using Content Query Criteria with proper markers

Follow-ups (tracked by the EZP-31711 Epic)

  1. Support for the following Criteria:
    • MapLocationDistance
    • IsFieldEmpty
    • Field (maybe, TBD)
    • FieldRelation
  2. Better pagination support - a Pagerfanta adapter
  3. Caching, if possible.

Conclusion

There are still things to improve, but this seems like a reasonable MVP, covering a lot of Criteria and Sort Clauses.

Any Reviewer who knows Repository a bit could see a lot of similarities between Legacy Search Engine and this implementation. So the natural question would be - why we didn't reuse more of internal structure of LSE? Or maybe even why we didn't simply expose LSE in a search settings-independent way? The reason for that was an attempt to make more robust implementation by relying more on JOINs than WHERE content.id IN (subquery) pattern. As a side-effect we've got rid of LSE limitation when it comes to use some Location Criteria when searching for Content.

QA

This is a PHP API that can be used to retrieve Content or Location item lists from Repository.
Example EE project which relies on volume testing data set can be found in that diff and more specifically with that sample Filtering query.

Checklist:

  • PR description is updated.
  • Tests are implemented.
  • Added code follows Coding Standards (use $ composer fix-cs).
  • PR is ready for a review.

@andrerom
Copy link
Contributor

Productboard spec suggested this method to be called filter, ...

filter was never a good name, point was to avoid that people think this is search (find is sometimes used for search, we for sure do in SearchService). But it's not a big deal, the object given is different by name at least.

ContentList .. mark it internal

Then maybe document find to also return Content[] ? Or is there some other way we can doc hint that ContentList is collection of Content items?

mentioned spec also suggested array $languageSettings as a 2nd argument. I'm not

That was probably just to stay close to how it is everywhere else, however putting it on Filter is fine. Maybe we should do the same on Query then.

To self: would a generic Filter (not Content and Location specific) be "implementable"?

It's better if it is specific, so we avoid the Content vs Location mess we have with Query. With no clear type hint on what goes where, only known on run time.

Which Criteria and Sort Clauses are needed for MVP?

Ideally everything, except:

  • Search features (Fulltext, CustomField, ...)
  • Criteria/SortClauses with performance issues on SQL making it impossible to support them with this interface as we can't just say "Use Solr" (Field)

Repository Query Builder" alternative: (pending)

Interesting

@bdunogier
Copy link
Member

bdunogier commented Apr 28, 2020

About the language parameters:

$filter->languageCodes = ['eng-GB']; // empty/not set for all (?)

Actually it should respect the siteaccess aware repository approach (in the SA aware implementation), and return items depending on the current SA's language settings. How did the repo behave before we added that again ?

What about query types ? Can we somehow re-use those ?

About criteria and sort clauses, this is my prioritized list:

  • criteria
    • parent location id
    • type identifier(s)
    • ...to be continued
  • sort clauses
    • name
    • priority
    • publication / modification date
    • ...to be continued

About ContentList vs Content[], I think we could still use a value object. It could at least provide data such as us there a next / previous page.
Also, what's the matter if the one in API is an interface ? It's not meant to be instanciated by our API consumers, is it ?

What about visibility ? I'd expect that by default, invisible items aren't returned on the frontend (and can be set to be returned), and are returned on the backend (and can be set to not be returned). How could we manage that ?

@andrerom
Copy link
Contributor

andrerom commented Apr 28, 2020

What about visibility ? I'd expect that by default, invisible items aren't returned on the frontend (and can be set to be returned), and are returned on the backend (and can be set to not be returned). How could we manage that ?

+20 but this should ideally be done Repo wide for all API's. Either as SA aware layer injects config for you on hidden or not for instance. Or we move this to permissions (with something als SALimitation("admin")) to get permission system to do this for us.

How did the repo behave before we added that again ?

Basically it works by setting value from config if languageCodes argument is null (not set by API consumer), so it can work with this approach as well.

@bdunogier
Copy link
Member

Either as SA aware layer injects config for you on hidden or not for instance.

By default, I assumed (I should not) that it follows the same architecture than the rest of the repository, with that extra siteaccess aware layer (transparent). Can you confirm @alongosz ?
If yes, would that layer modify the query to add the language to the filter ? It sounds a bit odd, but it would work afaict.

Basically it works by setting value from config if languageCodes argument is null (not set by API consumer), so it can work with this approach as well.

Good. Let's do this for that API endpoint as well. If no language is specified, it uses the current SA's languages parameters. OK @alongosz ?

@alongosz
Copy link
Member Author

Thanks @andrerom and @bdunogier for your comments.

filter was never a good name, point was to avoid that people think this is search (find is sometimes used for search, we for sure do in SearchService). But it's not a big deal, the object given is different by name at least.

This is exactly the approach now - $contentService->find should not be easily mistaken for SearchService, but I'm still open to other suggestions. We can always use fetch to make Legacy users happy ;)

ContentList .. mark it internal

Then maybe document find to also return Content[] ? Or is there some other way we can doc hint that ContentList is collection of Content items?

It's already done via:

   /**
     * @return \eZ\Publish\API\Repository\Values\Content\Content[]|\Traversable
     */
    public function getIterator(): Traversable;

mentioned spec also suggested array $languageSettings as a 2nd argument. I'm not

That was probably just to stay close to how it is everywhere else, however putting it on Filter is fine. Maybe we should do the same on Query then.

+1

To self: would a generic Filter (not Content and Location specific) be "implementable"?

It's better if it is specific, so we avoid the Content vs Location mess we have with Query. With no clear type hint on what goes where, only known on run time.

TBD, I might indeed do this to provide strict API

Which Criteria and Sort Clauses are needed for MVP?

Ideally everything, except:

  • Search features (Fulltext, CustomField, ...)
  • Criteria/SortClauses with performance issues on SQL making it impossible to support them with this interface as we can't just say "Use Solr" (Field)

Ok, here comes the list, with questions and remarks:

Criteria:

  • Logical\LogicalAnd, Logical\LogicalOr, Logical\LogicalNot // keywords in use, hence the redundancy in namespace\name.
  • Content\Id
  • Content\RemoteId
  • Content\Visibilty // for "hidden" feature
  • ContentType\GroupId
  • ContentType\Id
  • ContentType\Identfier
  • DateMetadata
  • Field\Relation // maybe, TBD
  • Field\IsFieldEmpty // maybe, don't remember if it requires external storage
  • LanguageCode
  • Location\Id
  • Location\Depth
  • Location\IsMainLocation
  • Location\ParentId
  • Location\ParentRemoteId // I really miss that in Search
  • Location\Priority
  • Location\RemoteId
  • Location\Sibling
  • Location\Subtree
  • Location\Visibility // in Search known just as Visibility, but I have the other idea for that one
  • MatchNone // (MatchAll doesn't make much sense ATM to me, because this is default behavior of any SQL engine)
  • ObjectState\Id
  • ObjectState\Identifier
  • Section\Id
  • Section\Identifier
  • User\Email
  • User\Id
  • User\Login
  • User\Metadata
  • User\IsUserBased
  • User\IsUserEnabled
  • Version\Status // new one to filter out published/drafts/archived
  • Version\No // new one to filter out Version number
  • Version\Id // new one, should be used for fetched data, not hardcoded, not sure if necessary though
  • Visibility // a Composite on (Content\Visibility or Location\Visibility) - do we need it?

SortClauses:

  • Content\Id // does it really bring some value, maybe it should be default for Content filtering if none given[1]?
  • Content\Name
  • Date\Modified
  • Date\Trashed // as a follow-up, might be useful for searching in Trash Story // cc @SylvainGuittard
  • Date\Published
  • Location
  • Location\Depth
  • Location\Id // does it really bring some value, maybe it should be default for Location filtering if none given[1]?
  • Location\IsMainLocation
  • Location\Path
  • Location\Priority
  • Location\Visibility // does it bring some value to have hidden/not hidden elements first? AFAICS it's not even used by LSE (SQL Search)
  • Section\Identifier
  • Section\Name

Note that consumption would be a bit different, more strict. This would be actually DX advantage, because usage with IDE should be more intuitive.

The list is huge, so we should focus on the most important ones and deliver the remaining ones as a follow-up.

[1] IMHO we should have some fallback ORDER BY because otherwise order is SQL engine-dependent and sometimes non-deterministic.

Either as SA aware layer injects config for you on hidden or not for instance.

By default, I assumed (I should not) that it follows the same architecture than the rest of the repository, with that extra siteaccess aware layer (transparent). Can you confirm @alongosz ?

SA layer is enabled by default now, so some language always will be defined.

If yes, would that layer modify the query to add the language to the filter ? It sounds a bit odd, but it would work afaict.

It should set languages only if not set (empty($filter->languageCodes) === true).

Basically it works by setting value from config if languageCodes argument is null (not set by API consumer), so it can work with this approach as well.

Good. Let's do this for that API endpoint as well. If no language is specified, it uses the current SA's languages parameters.

+0.75 - I would like to avoid distinction between [] and null as it's confusing even for our Engineers. Let's stick with [], hence empty above.

What about query types ? Can we somehow re-use those ?

They're strictly typed for Search API.

Also, what's the matter if the one in API is an interface ? It's not meant to be instantiated by our API consumers, is it ?

That's exactly the point of having interface on API for strict referencing in 3rd party code w/o possibility to instantiate.

What about visibility ? I'd expect that by default, invisible items aren't returned on the frontend (and can be set to be returned), and are returned on the backend (and can be set to not be returned). How could we manage that ?

There's no way to distinguish front-end from backend. I've proposed 3 separate Criteria

  • Content\Visibility
  • Location\Visibility
  • Visibility // Content OR Location

The same goes for Version status, though we can probably assume that if someone asks for Content\*, but does not specify Version\No and/or Status, published one is asumed.
Due to the nature of SQL each Criterion will have atomic responsibility. We can manipulate query when pre-processing it to attach some default Criteria if not given.

@alongosz alongosz force-pushed the ezp-31569-content-location-filtering branch 2 times, most recently from f7a33b2 to 7d7c06d Compare April 28, 2020 18:28
@bdunogier
Copy link
Member

bdunogier commented Apr 29, 2020

(visibility) There's no way to distinguish front-end from backend. I've proposed 3 separate Criteria

Yes there is: the siteaccess. In legacy, we had a setting for showing invisible objects, enabled for backoffice, and disabled for front. It would be a perfect task for the SA aware repository layer (add the criterion if it wasn't added by the user, as a default.

What about query types ? Can we somehow re-use those ?

They're strictly typed for Search API.

Yes, I know that :)
The question isn't really technical, to be honest. The fact is that we added query types to simplify the Search API usage. We are now adding another API endpoint, very similar to search (criteria, sort clauses...), but without the simplification layer.

It is even more important now that we have added built-in types. I really insist that we try to make that consistent, even though I don't know how right now. In the end, 90% of what a query type returns can be used for other find operations (filter + sort + pagination).

In any case, we can NOT require users to add the visibility criteria to each and every call.

One other thing: the query field should definitely be changed at some point to use that API instead of Search... (and it should also be exposed over REST, as an alternative to POST /views.

@alongosz
Copy link
Member Author

alongosz commented Apr 29, 2020

Internal sync:

  • Due to huge amount of Criteria for now we've decided to try and re-use existing Criteria namespace.
  • The Criteria that are supported by Search only would just throw and exception.
  • We'll gonna add marker interfaces for Content/Location-filtering-capable Criteria.
  • Criterion handling will be delegated to a Content/Location filtering specific engine (implementations).
  • We'll need to update extensibility documentation for custom Criteria.
  • The same goes for SortClauses.
  • For now the first implementation will return published only items, with an option expand further to implement Version\* Criteria.
  • Pagination: solution should be flexible enough to support different ways of pagination (offset & limit, cursor-based).
  • As a follow-up: Query types support (re-using)
  • As a follow-up: a QueryService which would allow to detect which API (search, Content filtering, Location filtering) can be used for the given set of Criteria
  • Filtering DX change:
use eZ\Publish\API\Repository\Values\Filter\Filter;

$filter = (new Filter())
    ->withCriteria(new LogicalAnd(new...))
    ->withSortClauses(new SortClause(...));

@alongosz alongosz force-pushed the ezp-31569-content-location-filtering branch from 7d7c06d to c891fc5 Compare April 29, 2020 19:00
@alongosz alongosz force-pushed the ezp-31569-content-location-filtering branch from c891fc5 to f0ca777 Compare April 30, 2020 15:45
@alongosz alongosz changed the title [WiP][Spec] EZP-31569: Introduced first portion of spec for Content and Location filtering [WiP] EZP-31569: Provided Content/Location filtering preview Apr 30, 2020
src/lib/API/Repository/Value/Filter/Filter.php Outdated Show resolved Hide resolved
src/lib/API/Repository/Value/Filter/Filter.php Outdated Show resolved Hide resolved
src/lib/API/Repository/Value/Filter/Filter.php Outdated Show resolved Hide resolved
tests/lib/unit/API/Repository/Value/Filter/FilterTest.php Outdated Show resolved Hide resolved
tests/lib/unit/API/Repository/Value/Filter/FilterTest.php Outdated Show resolved Hide resolved
tests/lib/unit/API/Repository/Value/Filter/FilterTest.php Outdated Show resolved Hide resolved
tests/lib/unit/API/Repository/Value/Filter/FilterTest.php Outdated Show resolved Hide resolved
tests/lib/unit/API/Repository/Value/Filter/FilterTest.php Outdated Show resolved Hide resolved
@bdunogier
Copy link
Member

Could you try to find some time to update the description so that it matches the last decisions ? It would really make reviews and feedback easier.

@alongosz alongosz force-pushed the ezp-31569-content-location-filtering branch from f0ca777 to 2012e57 Compare May 26, 2020 16:17
@alongosz
Copy link
Member Author

@bdunogier should be up to date now with the current state after rebasing, but I already see handling of languages needs to be changed to follow known pattern used in SiteAccess-aware services layer. Moreover Filter vs. FilterBuilder is still unresolved.
Reverted Ibexa namespace changes as the related PR was declined.

@alongosz alongosz force-pushed the ezp-31569-content-location-filtering branch from 2012e57 to 9271d06 Compare June 5, 2020 18:02
@alongosz alongosz force-pushed the ezp-31569-content-location-filtering branch 7 times, most recently from eb7e753 to 51b0d4a Compare June 22, 2020 11:19
@alongosz
Copy link
Member Author

Internal sync: removed createFindFilter from LocationService and ContentService. new Filter is the only way now.

Copy link
Member

@adamwojs adamwojs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for API and SPI changes (besides things mentioned on Slack). I have some suggestions about implementation but we can apply this as follow up PR 😉

@alongosz
Copy link
Member Author

Internal sync: Moved Service Tags constants to dedicated ServiceTags class (30f125b).

@alongosz alongosz force-pushed the ezp-31569-content-location-filtering branch from 30f125b to cac3e77 Compare June 29, 2020 16:16
@alongosz
Copy link
Member Author

Rebased to resolve conflicts after another merge and squashed fixups.

@alongosz
Copy link
Member Author

Merging, QA will be done on the RC tag.

@alongosz alongosz merged commit 9f59de2 into ezsystems:master Jun 29, 2020
@alongosz alongosz deleted the ezp-31569-content-location-filtering branch June 29, 2020 18:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

Successfully merging this pull request may close these issues.

5 participants