Close DB session before exiting method #2

Open
wants to merge 1 commit into
base: master

dalepotter

Prevents abandoned database connections when running `process_item`.

Tested manually by running `scrapy crawl quotes` twice. Only 50 rows were present in the quote table after each run.
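For readers without the diff at hand, the idea of closing the session before `process_item` returns can be sketched roughly as follows. This is a minimal sketch, not the actual change in this pull request: `SaveQuotesPipeline`, `session_factory`, and `RecordingSession` are hypothetical names, and `RecordingSession` is only a stand-in so the example runs without a database. In the real project the factory would presumably be a SQLAlchemy `sessionmaker` bound to the engine.

```python
class RecordingSession:
    """Hypothetical stand-in for a SQLAlchemy Session; it only records calls."""

    def __init__(self, fail_on_commit=False):
        self.calls = []
        self.fail_on_commit = fail_on_commit

    def add(self, obj):
        self.calls.append("add")

    def commit(self):
        self.calls.append("commit")
        if self.fail_on_commit:
            raise RuntimeError("simulated database error")

    def rollback(self):
        self.calls.append("rollback")

    def close(self):
        self.calls.append("close")


class SaveQuotesPipeline:
    """Sketch of a pipeline that releases its session on every code path."""

    def __init__(self, session_factory):
        # Assumption: in the real project this would be sessionmaker(bind=engine).
        self.session_factory = session_factory

    def process_item(self, item, spider):
        session = self.session_factory()
        try:
            session.add(item)
            session.commit()
        except Exception:
            # Undo the failed transaction, then let Scrapy see the error.
            session.rollback()
            raise
        finally:
            # Runs on success and on failure, so no connection is abandoned.
            session.close()
        return item


ok = RecordingSession()
SaveQuotesPipeline(lambda: ok).process_item({"quote_content": "example"}, spider=None)
print(ok.calls)  # ['add', 'commit', 'close']
```

Putting `close()` in a `finally` block means the session is released even when the commit raises, which is the case most likely to leave a connection behind.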

@Pa7rickStar

@dalepotter you are right - I totally agree!

Since this repository no longer seems to be maintained, I won't create a pull request for this:
There are only 50 of 100 quotes present because of the dupefilter. Whenever an author page is requested again, Scrapy filters the request as a duplicate, so the callback never runs and the item is never completed. A quick fix would be to deactivate the dupefilter at the yield line in the parse function like this:
yield response.follow(url=author_url, callback=self.parse_author, meta={'quote_item': quote_item}, dont_filter=True)
Then of course many pages will be parsed twice. A better solution would be to change the pipeline.
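To illustrate why items go missing, here is a toy model of the effect described above. It is not Scrapy's actual dupefilter (which fingerprints requests rather than comparing raw URLs); the `crawl` function and the sample data are made up for illustration.

```python
def crawl(quotes, dont_filter=False):
    """Toy model: return the quotes whose author request was actually made."""
    seen_urls = set()
    completed = []
    for text, author_url in quotes:
        if not dont_filter and author_url in seen_urls:
            # Request dropped as a duplicate: the author callback never runs,
            # so the quote item attached to it is never yielded.
            continue
        seen_urls.add(author_url)
        completed.append((text, author_url))
    return completed


# Hypothetical sample data: two quotes share one author page.
quotes = [
    ("q1", "/author/Albert-Einstein"),
    ("q2", "/author/J-K-Rowling"),
    ("q3", "/author/Albert-Einstein"),
]

print(len(crawl(quotes)))                    # 2
print(len(crawl(quotes, dont_filter=True)))  # 3
```

With filtering on, only the first quote per author survives; with `dont_filter=True` every quote completes, at the cost of fetching the same author page repeatedly.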

Another issue: in models.py, backref is used in both the Tag and the Quote class. According to the documentation, backref should be used in only one of them. However, backref is considered legacy in SQLAlchemy 2.0. Instead, [relationship.back_populates](https://docs.sqlalchemy.org/en/20/orm/relationship_api.html#sqlalchemy.orm.relationship.params.back_populates) should be used, like this:

class Tag(Base):
    __tablename__ = "tag"

    id = Column(Integer, primary_key=True)
    name = Column('name', String(30), unique=True)
    quotes = relationship('Quote', secondary='quote_tag', back_populates="tags")
    
class Quote(Base):
    __tablename__ = "quote"

    id = Column(Integer, primary_key=True)
    quote_content = Column('quote_content', Text())
    author_id = Column(Integer, ForeignKey('author.id'))  # Many quotes to one author
    tags = relationship('Tag', secondary='quote_tag', back_populates="quotes")  # M-to-M for quote and tag

By the same token, lazy='dynamic' is a legacy parameter and does nothing important in this context anyway.

Lastly, I think the declarative mapping has changed as well. I'm using SQLAlchemy for the first time today - so I'm not quite sure...

@harrywang
Owner

Thanks guys - just no time to maintain this.

@Pa7rickStar

no problem! The tutorial still was a good entry point for me 👍

@harrywang
Owner

> no problem! The tutorial still was a good entry point for me 👍

I am glad you find it useful :) cheers.
