Close DB session before exiting method #2

dalepotter · 2022-09-07T20:42:13Z

Prevents abandoned database connections when running `process_item`.

Tested manually by running `scrapy crawl quotes` twice. Only 50 rows were present in the `quote` table after each run.
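
For readers who have not seen the diff, here is a minimal sketch of the general pattern: close the session in a `finally` block so it is released whether or not the commit succeeds. This is not the actual change in this PR. `SaveQuotesPipeline` and `RecordingSession` are hypothetical names, and `RecordingSession` is only a stand-in that mimics the `add`/`commit`/`rollback`/`close` methods of a SQLAlchemy `Session` so the example runs without a database.

```python
class RecordingSession:
    """Hypothetical stand-in for a SQLAlchemy Session; records the calls made on it."""

    def __init__(self, fail_on_commit=False):
        self.fail_on_commit = fail_on_commit
        self.calls = []

    def add(self, obj):
        self.calls.append("add")

    def commit(self):
        if self.fail_on_commit:
            raise RuntimeError("simulated commit failure")
        self.calls.append("commit")

    def rollback(self):
        self.calls.append("rollback")

    def close(self):
        self.calls.append("close")


class SaveQuotesPipeline:
    """Hypothetical pipeline; session_factory is assumed to return a new session."""

    def __init__(self, session_factory):
        self.session_factory = session_factory

    def process_item(self, item, spider):
        session = self.session_factory()
        try:
            session.add(item)
            session.commit()
        except Exception:
            # Undo the failed transaction, then let the caller see the error.
            session.rollback()
            raise
        finally:
            # Runs on success and on failure, so the connection is never abandoned.
            session.close()
        return item
```

With a real SQLAlchemy setup, `session_factory` would typically be a `sessionmaker` bound to the engine.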

Pa7rickStar · 2023-06-02T20:13:00Z

@dalepotter you are right - I totally agree!

Since this repository no longer seems to be maintained, I won't create a pull request for this:
There are only 50/100 quotes present because of the dupefilter. Whenever an author page is requested again, Scrapy filters the request as a duplicate, so the callback never runs and the item is never completed. A quick fix would be to disable the dupefilter for that request at the yield line in the `parse` function, like this:
yield response.follow(url=author_url, callback=self.parse_author, meta={'quote_item': quote_item}, dont_filter=True)
Then, of course, many author pages will be parsed twice. A better solution would be to change the pipeline.
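
To make the effect concrete, here is a rough pure-Python simulation of what happens, not Scrapy code. It assumes each quote is only completed inside the author-page callback, and it approximates the dupefilter with a set of already-seen URLs (Scrapy actually compares request fingerprints). The `crawl` function and the sample data are made up for illustration.

```python
def crawl(quotes, dont_filter=False):
    """Simulate the spider: quotes is a list of (text, author_url) pairs.

    Returns the items that would be completed by the author-page callback.
    """
    seen_urls = set()  # simplified stand-in for Scrapy's dupefilter
    items = []
    for text, author_url in quotes:
        if not dont_filter and author_url in seen_urls:
            # Request dropped as a duplicate: the callback never runs,
            # so this quote is never turned into an item.
            continue
        seen_urls.add(author_url)
        items.append({"quote": text, "author_url": author_url})
    return items


sample = [
    ("first quote", "/author/a"),
    ("second quote", "/author/a"),  # same author as the first quote
    ("third quote", "/author/b"),
]
filtered = crawl(sample)
unfiltered = crawl(sample, dont_filter=True)
```

Here `filtered` holds two items and `unfiltered` holds three: every quote whose author was already requested is lost unless filtering is switched off.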

Another issue: in `models.py`, `backref` is used in both the `Tag` and the `Quote` class. According to the documentation, `backref` should be used in only one of them. In any case, `backref` is considered legacy in SQLAlchemy 2.0. Instead, [relationship.back_populates](https://docs.sqlalchemy.org/en/20/orm/relationship_api.html#sqlalchemy.orm.relationship.params.back_populates) should be used, like this:

class Tag(Base):
    __tablename__ = "tag"

    id = Column(Integer, primary_key=True)
    name = Column('name', String(30), unique=True)
    quotes = relationship('Quote', secondary='quote_tag', back_populates="tags")
    
class Quote(Base):
    __tablename__ = "quote"

    id = Column(Integer, primary_key=True)
    quote_content = Column('quote_content', Text())
    author_id = Column(Integer, ForeignKey('author.id'))  # Many quotes to one author
    tags = relationship('Tag', secondary='quote_tag', back_populates="quotes")  # M-to-M for quote and tag

By the same token, `lazy='dynamic'` is a legacy option and, in this context, isn't doing anything important anyway.

Lastly, I think the declarative mapping has changed as well. I'm using SQLAlchemy for the first time today - so I'm not quite sure...

harrywang · 2023-06-02T20:17:49Z

Thanks guys - just no time to maintain this.

Pa7rickStar · 2023-06-02T20:41:46Z

no problem! The tutorial still was a good entry point for me 👍

harrywang · 2023-06-02T20:45:24Z

> no problem! The tutorial still was a good entry point for me 👍

I am glad you find it useful :) cheers.

Close DB session before exiting method

4c4b5a3

Prevents abandoned database connections when running `process_item`.

Pa7rickStar mentioned this pull request Jun 2, 2023

Why only crawled 50 items? #1

Open
