Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recursive injection with an example #2

Closed
wants to merge 2 commits into from

Conversation

ivanprado
Copy link
Contributor

Having recursive dependency injection can help to create smaller PageObject components that extracts only parts of a page, and then define compounded PageObjects with several of them. See example books_06.

The example of books_06 presents boilerplate. A lot of yield from self. and response repeated and repeated.

@kmike It would be nice if you can have a look and give your opinion about all that.

I was not able to check if cyclic dependency detection is working
because I failed to create an example. The following fails with the error NameError: name 'B' is not defined in method get_type_hints:

@attr.s(auto_attribs=True)
class A(WebPage):
    b: 'B'

@attr.s(auto_attribs=True)
class B(WebPage):
    a: A

I was not able to check if cyclic dependency detection is working
because I failed to create an example
@codecov
Copy link

codecov bot commented Feb 6, 2020

Codecov Report

Merging #2 into master will decrease coverage by 4.86%.
The diff coverage is 14.28%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master       #2      +/-   ##
==========================================
- Coverage   63.91%   59.04%   -4.87%     
==========================================
  Files           6        6              
  Lines          97      105       +8     
==========================================
  Hits           62       62              
- Misses         35       43       +8
Impacted Files Coverage Δ
scrapy_po/middleware.py 30.76% <14.28%> (-7.95%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a0776e7...fd191a4. Read the comment docs.

@kmike
Copy link
Member

kmike commented Feb 6, 2020

I wonder if a problem with get_type_hints can be solved using https://docs.python.org/3/whatsnew/3.7.html#pep-563-postponed-evaluation-of-annotations, though I kind-of expected it to work as-is.

Copy link
Member

@kmike kmike left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recursive injection makes total sense to me, I think that'd be a good feature to have. It is listed as "imitation of the implementation" in https://gist.github.com/kmike/2fa2b1b8c77bfb98f72b2636e66c5d36#dependency-injection, having a very similar application in mind.

from scrapy_po import WebPage


class ListingsPiece(WebPage):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any ideas about a better API? Inheriting "page part" component from a WebPage looks a bit strange, on a first sight

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes is strange. Also, all the "pieces" are incorporating the selector API without sharing it, which I don't like very much. Well, I have to think about it. This is very preliminary to me, I just noticed that I needed the recursive injection and started to experiment with it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, all the "pieces" are incorporating the selector API without sharing it, which I don't like very much.

Sorry, could you please explain it more?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the goal of inheriting is to be able to say "please do dependency injection for this class", it could make sense to have a base class for this (e.g. Component, or something like this - this is becoming Java pretty fast), and have PageObject inherit from Component.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, maybe I know what you mean. In the code I was showing you on a call (which is not commited), selector-specific logic is moved to a mixin:

class PageObject(abc.ABC):
    """
    All web page objects should inherit from this class.
    It is a marker for a middleware to pick anargument and populate it.

    Instead of inheriting you can also use ``PageObject.register(MyWebPage)``.
    """
    pass


class ItemPage(PageObject):
    """ Page Object which requires to_item method to be implemented. """
    @abc.abstractmethod
    def to_item(self):
        pass


class ResponseShortcutsMixin:
    """
    A mixin with common shortcut methods for working with self.response.
    It requires "response" attribute to be present.
    """
    _cached_selector = None

    @property
    def selector(self) -> parsel.Selector:
        from parsel.selector import Selector
        if self._cached_selector is None:
            self._cached_selector = Selector(self.html)
        return self._cached_selector

    @property
    def url(self):
        return self.response.url

    @property
    def html(self):
        return self.response.html

    def xpath(self, query, **kwargs):
        return self.selector.xpath(query, **kwargs)

    def css(self, query):
        return self.selector.css(query)


@attr.s(auto_attribs=True)
class WebPage(PageObject, ResponseShortcutsMixin):
    """
    Default WebPage class. It uses basic response data (html, url),
    and provides xpath / css shortcuts.

    This class is most useful as a base class for page objects, when
    extraction logic is similar to what's usually happening in scrapy
    spider callbacks.
    """
    response: ResponseData

Let me asend a PR with these changes.

Copy link
Member

@kmike kmike Feb 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#3 - could you please check it & merge if it makes things easier?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kmike ! I left some comments in the PR itself that I think covers yoursquestions.

@ivanprado
Copy link
Contributor Author

Thank @kmike for all your feedback. A quick sight to the get_type_hints problems reveals that the globals that get_type_hints use at the constructor method are limited to the class itself so module globals are not included (and they should). Probably andi should be changed to also include the module in globals. Could you give me rights in andi to propose PR requests?

@ivanprado
Copy link
Contributor Author

It seems that the problem with the type is related to an issue in attrs: python-attrs/attrs#593. I'll create a xfail test in andi for it.

@ivanprado
Copy link
Contributor Author

This has been implemented in #2 in a different way, probably better. And the problem with get_type_hints was addressed in andi. So closing.

@ivanprado ivanprado closed this Feb 18, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants