Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve getCustomAttribute() performance #13308

Merged
merged 12 commits into from
Feb 22, 2018
Merged

Improve getCustomAttribute() performance #13308

merged 12 commits into from
Feb 22, 2018

Conversation

schmengler
Copy link
Contributor

@schmengler schmengler commented Jan 22, 2018

Description

When traversing a product collection and using getCustomAttribute() within the loop, it triggered a database query each time to retrieve the metadata from the eav_attributes table. This was due to the custom attributes being cached in each product model separately.

I moved it into the resource model so that only one query per entity type is necessary.

Manual testing scenarios

  1. On an instance with sample data or more than a handful of products

  2. Profile this code snippet:

     $productCollection = $this->productCollectionFactory->create();
     $productCollection->addAttributeToSelect('*');
     foreach ($productCollection as $product) { /** @var ProductInterface $product */
         $product->getCustomAttributes();
     }
    

More background and a test tool can be found in this blog by my colleague @avstudnitz who found the issue: https://www.integer-net.com/performance-trap-magento-2-getcustomattribute-function/

After the changes in this PR, database calls should been reduced by (n-1) with n = number of products.

Contribution checklist

  • Pull request has a meaningful description of its purpose
  • All commits are accompanied by meaningful commit messages
  • All new or changed code is covered with unit/integration tests (if applicable)
  • All automated tests passed successfully (all builds on Travis CI are green)

@schmengler
Copy link
Contributor Author

The same mechanism was used for categories, I updated the PR to reflect the changes there analogously

Without getName(), __toString() is triggered, which creates a string representation of the whole class and apparently also can lead to errors
@AVoskoboinikov
Copy link
Contributor

Hi @schmengler
I've cloned your PR to my local branch so I am able to use our internal infrastructure to test performance improvement. Will be back with results.

@AVoskoboinikov
Copy link
Contributor

AVoskoboinikov commented Jan 26, 2018

references for internal teams:
L1 bamboo build
L2 bamboo build
L3 bamboo build
L4 bamboo build
PAT bamboo build
Sample Data bamboo build
SVC bamboo build

@AVoskoboinikov
Copy link
Contributor

I confirm that PR has improvements in next scenarios:

  • Server Side EE API (Get Products - response time) - 41.5%

  • Server Side EE API (Get Products - created objects count) - 69.5%

  • Server Side EE API (Get Products - select requests count) - 23.6%

  • Server Side EE Catalog GraphQL (Query multiple products - response time) - 50%

  • Server Side EE Catalog GraphQL (Query multiple products - created objects count) - 79.7%

  • Server Side EE Catalog GraphQL (Query configurable product - response time) - 4%

  • Server Side EE Catalog GraphQL (Query configurable product - created objects count) - 19.9%

  • Server Side EE Catalog GraphQL (Query products with full text search and filter - response time) - 4%

  • Server Side EE Catalog GraphQL (Query products with full text search and filter - created objects count) - 60.7%

  • Server Side EE Catalog GraphQL (Query products with full text search and filter - select requests count) - 14.9%

  • Server Side EE Catalog GraphQL (Query products with full text search only - response time) - 42%

  • Server Side EE Catalog GraphQL (Query products with full text search only - created objects count) - 76.3%

Copy link
Contributor

@maghamed maghamed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a lot of BiC being introduced in the scope of performance improvement.
It looks better to introduce dedicated service for getCustomAttributesCodes logic.
And move all the logic there.
This service could be injected into Product model which will proxy call into it.
No need to bring this logic to Product/Category resource models which contain a lot of logic now (violating Single Responsibility), breaking their contract along with that.
The more granular single responsible services we introduce in M2 codebase - the easier it would be maintain this code in future

const KEY_PRODUCT_COUNT = 'product_count';
const KEY_CHILDREN_DATA = 'children_data';
/**#@-*/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

constants removal from class and moving them to interface is backward incompatible

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why? TheClass::THE_CONSTANT still works

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@schmengler
Possible fatal error if there is another implementation of the interface with same constant name declaration.
But this is more like the theoretical issue.

Currently, I raised this question for voting on the Internal Architectural Council to allow moving constants preserving names and values from Implementation to Interface being implemented. And thus fix our Backward Compatibility Guideline http://devdocs.magento.com/guides/v2.2/contributor-guide/backward-compatible-development/index.html saying

Removing or renaming constants
Do not remove or rename constants.

And to adapt internal tests which break on such changes.

I will get back to you shortly as soon as get results.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maghamed It wouldn't even be a problem: https://3v4l.org/H4mYd 😄

/**
* @var CategoryAttributeRepositoryInterface
*/
protected $metadataService;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new properties added to classes should be private, protected properties are now allowed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, I'll change it

* @param MetadataServiceInterface $metadataService
* @return string[]
*/
protected function getEavAttributesCodes(MetadataServiceInterface $metadataService)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inheritance based API not encouraged in Magento 2

Copy link
Contributor Author

@schmengler schmengler Feb 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know but I have to work with what is there (which is inheritance based resource models) or introduce huge changes, probably BIC. Do you have an idea for a simple solution?

Copy link
Contributor Author

@schmengler schmengler Feb 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's basically moved from \Magento\Framework\Model\AbstractExtensibleModel::getEavAttributesCodes (inheritance based model API) - so it's not that I am introducing something new here.

A possible non-inheritance solution I see is to to implement the method in the Category and Product resource models seperately in a non-generic way. Then, if we want to get rid of duplicated logic, add a method getCustomAttributeCodes($dataObjectClassName) to \Magento\Framework\Api\MetadataServiceInterface (BIC: new method in interface). IMHO this would be subject to a different PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

disregard this, I've read your top comment

/**
* @var ProductAttributeRepositoryInterface
*/
protected $metadataService;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

introducing of protected properties/methods are not encouraged in M2

@schmengler
Copy link
Contributor Author

It looks better to introduce dedicated service for getCustomAttributesCodes logic.
And move all the logic there.
This service could be injected into Product model which will proxy call into it.
No need to bring this logic to Product/Category resource models which contain a lot of logic now (violating Single Responsibility), breaking their contract along with that.

I'm going to think this through

@schmengler
Copy link
Contributor Author

@maghamed Here's my draft for such a service. I'd like to discuss it before updating the PR:

<?php
/**
 * Copyright © Magento, Inc. All rights reserved.
 * See COPYING.txt for license details.
 */

namespace Magento\Eav\Model\Entity;


use Magento\Framework\Api\MetadataServiceInterface;

class GetCustomAttributeCodes
{
    /**
     * @var string[]
     */
    private $customAttributesCodes;

    /**
     * Receive a list of custom EAV attributes using provided metadata service. The results are cached per entity type
     *
     * @param MetadataServiceInterface $metadataService Custom attribute metadata service to be used
     * @param string[] $interfaceAttributes Attribute codes that are part of the interface and should not be
     *                                      considered custom
     * @param string|null $entityType Entity type (class name), only needed if metadata service handles different
     *                                entities
     * @return string[]
     */
    public function execute(
        MetadataServiceInterface $metadataService,
        array $interfaceAttributes,
        string $entityType = null
    ): array {
        $cacheKey = get_class($metadataService) . '|' . $entityType;
        if (!isset($this->customAttributesCodes[$cacheKey])) {
            $customAttributesCodes = $this->getEavAttributesCodes($metadataService, $entityType);
            $this->customAttributesCodes[$cacheKey] = array_values(
                array_diff($customAttributesCodes, $interfaceAttributes)
            );
        }
        return $this->customAttributesCodes;
    }

    /**
     * Receive a list of EAV attributes using provided metadata service.
     *
     * @param MetadataServiceInterface $metadataService
     * @param string|null $entityType
     * @return string[]
     */
    private function getEavAttributesCodes(MetadataServiceInterface $metadataService, string $entityType = null)
    {
        $attributeCodes = [];
        $customAttributesMetadata = $metadataService->getCustomAttributesMetadata($entityType);
        if (is_array($customAttributesMetadata)) {
            /** @var $attribute \Magento\Framework\Api\MetadataObjectInterface */
            foreach ($customAttributesMetadata as $attribute) {
                $attributeCodes[] = $attribute->getAttributeCode();
            }
        }
        return $attributeCodes;
    }
}

This could be injected into the Product and Category models as you suggested and their getCustomAttributeCodes() would be replaced with

        return $this->getCustomAttributesCodes->execute(
            $this->metadataService,
            ProductInterface::ATTRIBUTES
        );

What I don't like about it is that the caller (the product/category model) is responsible to pass the non-custom interface attributes. This is something that the custom attribute metadata service should be able to provide, don't you think? Again, adding a method to \Magento\Framework\Api\MetadataServiceInterface would be a major BIC, which I have avoided so far.

What do you think?

Also just to be sure: what's the consensus on state in services? Because caching the results is absolutely necessary.

array_diff($this->customAttributesCodes, CategoryInterface::ATTRIBUTES)
);
}
return $this->customAttributesCodes;
Copy link

@jameshalsall jameshalsall Feb 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case of products we can improve this even further for larger datasets, as it stands here it will work great for a small number of attributes (< 500) but as soon as we start to go upward of that figure the performance will fall through the floor (each attribute loaded results in Magento\Eav\Model\ResourceModel\Entity\Attribute::_afterLoad being called which in turn runs a select query to join additional attribute data).

In my current project we have ~2200 custom attributes, but all of our attribute sets are small (< 100 custom attributes in each one), the problem is that this method loads every attribute regardless of the product's attribute set, to fix it I pulled in this PR as a patch and then changed this method to the following:

    public function getCustomAttributesCodes($attributeSetId)
    {
        if (!isset($this->customAttributesCodes[$attributeSetId])) {
            $codes = [];
            $criteria = $this->searchCriteriaBuilder->addFilter('attribute_set_id', $attributeSetId, 'eq');
            $attributes = $this->attributeRepository->getList($criteria->create())->getItems();

            if (is_array($attributes)) {
                /** @var $attribute \Magento\Framework\Api\MetadataObjectInterface */
                foreach ($attributes as $attribute) {
                    $codes[] = $attribute->getAttributeCode();
                }
            }

            $codes = array_values(
                array_diff($codes, ProductInterface::ATTRIBUTES)
            );

            $this->customAttributesCodes[$attributeSetId] = $codes;
        }

        return $this->customAttributesCodes[$attributeSetId];
    }

then had the Magento\Catalog\Model\Product pass through the product's attribute set ID, this way we only ever load the attributes for the current attribute set

I think the memoization based on attribute set ID in the $this->customAttributesCodes[$attributeSetId] is not required because resource model instances are not shared across products, but I might be wrong so I've left it in there for safety

whilst this would result in a few more queries (1 for each different attribute set to get the attributes relevant for that product) it means that creating a product is not affected by the total number of attributes in Magento, only by the number of attributes in the product's attribute set

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because resource model instances are not shared across products,

They are, this is what my initial approach is based on. Thanks a lot for your addition, it makes sense and I'd add it to the PR!

But first waiting for @maghamed to discuss the final architecture

@maghamed
Copy link
Contributor

maghamed commented Feb 14, 2018

Hey @schmengler ,

Your GetCustomAttributeCodes service looks good to me.
Not sure that we need to provide third parameter string $entityType = null
Actually, its presence in MetadataInterface also looks excessive as Default implementation not use it either.

/**
 * Default implementation of metadata service, which does not return any real attributes.
 */
class DefaultMetadataService implements MetadataServiceInterface
{
    /**
     * {@inheritdoc}
     */
    public function getCustomAttributesMetadata($dataObjectClassName = null)
    {
        return [];
    }
}

What I don't like about it is that the caller (the product/category model) is responsible to pass the non-custom interface attributes. This is something that the custom attribute metadata service should be able to provide, don't you think?

I agree with you that it's the responsibility of Metadata Service to know what attributes belong to Interface and what attributes are Custom ones. But as you mentioned we can't change existing interface because of BC requirements.

What we can do is to introduce interface for your service like this

interface GetCustomAttributeCodesInterface
{
    /**
      * Only one parameter - Metadata Service. No need to pass interface Attributes
      **/
    public function execute(MetadataServiceInterface $metadataService): array;
}

along with that you provide the implementation

class GetCustomAttributeCodes implements GetCustomAttributeCodesInterface
{
    /**
     * Receive a list of custom EAV attributes using provided metadata service. The results are cached per entity type
     *
     * @param MetadataServiceInterface $metadataService Custom attribute metadata service to be used
     *                                entities
     * @return string[]
     */
    public function execute(
        MetadataServiceInterface $metadataService
    ): array {
        $cacheKey = get_class($metadataService) ;
        if (!isset($this->customAttributesCodes[$cacheKey])) {
            $this->customAttributesCodes[$cacheKey]  = $this->getEavAttributesCodes($metadataService);
        }
        return $this->customAttributesCodes[$cacheKey];
    }

    /**
     * Receive a list of EAV attributes using provided metadata service.
     *
     * @param MetadataServiceInterface $metadataService
     * @return string[]
     */
    private function getEavAttributesCodes(MetadataServiceInterface $metadataService)
    {
        $attributeCodes = [];
        $customAttributesMetadata = $metadataService->getCustomAttributesMetadata();
        if (is_array($customAttributesMetadata)) {
            /** @var $attribute \Magento\Framework\Api\MetadataObjectInterface */
            foreach ($customAttributesMetadata as $attribute) {
                $attributeCodes[] = $attribute->getAttributeCode();
            }
        }
        return $attributeCodes;
    }
}

And introduce new Composite services for Product and Category.

class GetProductCustomAttributeCodes implements GetCustomAttributeCodesInterface
{

    /**
     * @param GetCustomAttributeCodes $baseCustomAttributeCodes
     */
    public function __construct(
        GetCustomAttributeCodes $baseCustomAttributeCodes
    ) {
        $this->baseCustomAttributeCodes = $baseCustomAttributeCodes;
    }


    public function execute(
        MetadataServiceInterface $metadataService
    ): array {
        $customAttributesCodes = $this->baseCustomAttributeCodes->execute($metadataService);
        return array_diff($customAttributesCodes, ProductInterface::ATTRIBUTES);
    }
}

class GetCategoryCustomAttributeCodes implements GetCustomAttributeCodesInterface
{

    /**
     * @param GetCustomAttributeCodes $baseCustomAttributeCodes
     */
    public function __construct(
        GetCustomAttributeCodes $baseCustomAttributeCodes
    ) {
        $this->baseCustomAttributeCodes = $baseCustomAttributeCodes;
    }


    public function execute(
        MetadataServiceInterface $metadataService
    ): array {
        $customAttributesCodes = $this->baseCustomAttributeCodes->execute($metadataService);
        return array_diff($customAttributesCodes, CategoryInterface::ATTRIBUTES);
    }
}

Now you make both Product and Category model depend on GetCustomAttributeCodesInterface interface.

And with DI Type you change the incoming object.

    <type name="Magento\Catalog\Model\Product">
        <arguments>
            <argument name="getCustomAttributeCode" xsi:type="object">GetProductCustomAttributeCodes</argument>
        </arguments>
    </type>

Do the same for Category.

So, summarizing all the above.

  • we don't make backward incompatible changes
  • we introduce a new interface for service which returns CustomAttributeCodes which uses Metadata. On the level of the interface we don't provide awareness from where we take the info about entity attributes, i.e. Lately we can use Metadata service for it.
  • Each entity provides own implementation of this service.

Do proposed solution make sense for you?

@schmengler
Copy link
Contributor Author

@maghamed Thank you for the detailed response, your suggestion sounds good!

Regarding the type argument:

Actually, its presence in MetadataInterface also looks excessive as Default implementation not use it either.

It's used in the implementation for address attributes, although on second look there is a default value and other values are never passed. I'll leave it out then.

@schmengler
Copy link
Contributor Author

@maghamed About the addition by @jameshalsall : I could add an optional attributeSetargument to the execute method but we would need to pass it along to Magento\Framework\Api\MetadataServiceInterface::getCustomAttributesMetadata() (i.e. add an optional argument to the interface method and its implementations). But I'm not sure if it's always a good idea and maybe too much for this PR. It could be done in a second one, and here I'd already add the optional parameter in the new interface to be prepared.

@jameshalsall
Copy link

If you make it easy for me to apply that fix later (by allowing the parameter in the interface method) then I can send a separate PR after this one is merged 👍 thanks

@magento-engcom-team
Copy link
Contributor

Hi @maghamed, thank you for the review.
MAGETWO-88021 has been created to process this Pull Request

* @var ResourceModel\Category
*/
protected $_resource;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while this is not necessary and not relevant for this PR anymore, it helps with static analysis and IDE support by specifying the concrete type of $_resource

{
return parent::_getResource();
}

Copy link
Contributor Author

@schmengler schmengler Feb 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as with $_resource. It's not super important but a minor improvement that I made while I was at it. Should it be a separate PR?

@magento-engcom-team magento-engcom-team merged commit f286a98 into magento:2.3-develop Feb 22, 2018
->willReturn([$customAttributeCode]);
$this->category->setData($interfaceAttributeCode, "sub");
$nameAttributeCode = 'name';
$descriptionAttributeCode = 'description';
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did you rename the variables? For the test it's not important that it's name and description, but that one is an interface attribute and the other is custom

->willReturn([$customAttributeCode]);
$this->model->setData($interfaceAttributeCode, 10);
$priceCode = 'price';
$colorAttributeCode = 'color';
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as with "name" and "description" above. you probably reverted both to the previous version


//The color attribute is not set, expect empty custom attribute array
$this->assertEquals([], $this->model->getCustomAttributes());

//Set the color attribute;
$initialCustomAttribueValue = "red";
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the variables for "initial value" and "new value" wer introduced to make connection between fixture and verification explicit and clear

@schmengler
Copy link
Contributor Author

@jameshalsall ooops, too late, sorry!

@mslabko
Copy link
Member

mslabko commented May 3, 2018

Hi @jameshalsall ,
Do you plan to create PR with optimization of loading attributes?

@jameshalsall
Copy link

jameshalsall commented May 3, 2018 via email

@mslabko
Copy link
Member

mslabko commented May 8, 2018

Hi @jameshalsall ,
We are going to start regression cycle for 2.3.0 and this performance improvement is very important for us because it affects a lot of test cases.
Will you have time to finish this work until 14/05 or we can do it from our side?

@jameshalsall
Copy link

@mslabko I will take a look at it now and see how much effort is involved

@jameshalsall
Copy link

I could add an optional attributeSetargument to the execute method but we would need to pass it along to Magento\Framework\Api\MetadataServiceInterface::getCustomAttributesMetadata() (i.e. add an optional argument to the interface method and its implementations). But I'm not sure if it's always a good idea and maybe too much for this PR. It could be done in a second one, and here I'd already add the optional parameter in the new interface to be prepared.

so it seems the option $attributeSetId parameter was not added in this PR, so I will assume it is safe to add it in the new PR - I don't think this is a problem because 2.3 is not yet released.

@mslabko
Copy link
Member

mslabko commented May 10, 2018

@jameshalsall , yes, for 2.3 it's not a problem to add second argument for GetCustomAttributeCodesInterface::execute.
But it must be required argument due to we do not need to load all attributes per entity, only for specified attribute set.

Also, I recommend do not use MetadataServiceInterface for retrieving attributes, due to this is not optimal from a performance perspective. Instead of MetadataServiceInterface it's better to use \Magento\Eav\Model\Config::getEntityAttributes, which already has a caching mechanism - permanent and application cache.
On 2.2-develop similar approach give us from 56-95% impoovements for all "product-scenarios" on medium profile

So I suggetst refactor GetCustomAttributeCodesInterface as

—  execute(MetadataServiceInterface $metadataService)
++ execute(int $attributeSetId)

@jameshalsall
Copy link

oops, I already added - #15135

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants