-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Recursive injection with an example #2
Conversation
I was not able to check if cyclic dependency detection is working because I failed to create an example
Codecov Report
@@ Coverage Diff @@
## master #2 +/- ##
==========================================
- Coverage 63.91% 59.04% -4.87%
==========================================
Files 6 6
Lines 97 105 +8
==========================================
Hits 62 62
- Misses 35 43 +8
Continue to review full report at Codecov.
|
I wonder if a problem with get_type_hints can be solved using https://docs.python.org/3/whatsnew/3.7.html#pep-563-postponed-evaluation-of-annotations, though I kind-of expected it to work as-is. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recursive injection makes total sense to me, I think that'd be a good feature to have. It is listed as "imitation of the implementation" in https://gist.github.com/kmike/2fa2b1b8c77bfb98f72b2636e66c5d36#dependency-injection, having a very similar application in mind.
from scrapy_po import WebPage | ||
|
||
|
||
class ListingsPiece(WebPage): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any ideas about a better API? Inheriting "page part" component from a WebPage looks a bit strange, on a first sight
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes is strange. Also, all the "pieces" are incorporating the selector API without sharing it, which I don't like very much. Well, I have to think about it. This is very preliminary to me, I just noticed that I needed the recursive injection and started to experiment with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, all the "pieces" are incorporating the selector API without sharing it, which I don't like very much.
Sorry, could you please explain it more?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the goal of inheriting is to be able to say "please do dependency injection for this class", it could make sense to have a base class for this (e.g. Component, or something like this - this is becoming Java pretty fast), and have PageObject inherit from Component.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha, maybe I know what you mean. In the code I was showing you on a call (which is not commited), selector-specific logic is moved to a mixin:
class PageObject(abc.ABC):
"""
All web page objects should inherit from this class.
It is a marker for a middleware to pick anargument and populate it.
Instead of inheriting you can also use ``PageObject.register(MyWebPage)``.
"""
pass
class ItemPage(PageObject):
""" Page Object which requires to_item method to be implemented. """
@abc.abstractmethod
def to_item(self):
pass
class ResponseShortcutsMixin:
"""
A mixin with common shortcut methods for working with self.response.
It requires "response" attribute to be present.
"""
_cached_selector = None
@property
def selector(self) -> parsel.Selector:
from parsel.selector import Selector
if self._cached_selector is None:
self._cached_selector = Selector(self.html)
return self._cached_selector
@property
def url(self):
return self.response.url
@property
def html(self):
return self.response.html
def xpath(self, query, **kwargs):
return self.selector.xpath(query, **kwargs)
def css(self, query):
return self.selector.css(query)
@attr.s(auto_attribs=True)
class WebPage(PageObject, ResponseShortcutsMixin):
"""
Default WebPage class. It uses basic response data (html, url),
and provides xpath / css shortcuts.
This class is most useful as a base class for page objects, when
extraction logic is similar to what's usually happening in scrapy
spider callbacks.
"""
response: ResponseData
Let me asend a PR with these changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#3 - could you please check it & merge if it makes things easier?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @kmike ! I left some comments in the PR itself that I think covers yoursquestions.
Thank @kmike for all your feedback. A quick sight to the |
It seems that the problem with the type is related to an issue in attrs: python-attrs/attrs#593. I'll create a |
This has been implemented in #2 in a different way, probably better. And the problem with |
Having recursive dependency injection can help to create smaller PageObject components that extracts only parts of a page, and then define compounded PageObjects with several of them. See example
books_06
.The example of
books_06
presents boilerplate. A lot ofyield from self.
andresponse
repeated and repeated.@kmike It would be nice if you can have a look and give your opinion about all that.
I was not able to check if cyclic dependency detection is working
because I failed to create an example. The following fails with the error
NameError: name 'B' is not defined
in methodget_type_hints
: