Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert database_reaper to celery (PP-1470) #2324

Merged
merged 11 commits into from
Mar 10, 2025
Merged

Conversation

jonathangreen
Copy link
Member

@jonathangreen jonathangreen commented Mar 6, 2025

Description

Convert all of the ReaperMonitor scripts to be Celery Tasks:

  • CredentialReaper
  • PatronRecordReaper
  • WorkReaper
  • CollectionReaper
  • MeasurementReaper
  • LoanReaper
  • HoldReaper
  • IdlingAnnotationReaper

Updates some of our database foreign key relationships so its easier to use a database level delete cascade to clean up old records in the database, rather then relying on the ORM to do the cascade.

This PR is based on these PR (they should get merged first):

Motivation and Context

Working through the epic to convert all of our scripts into tasks.

JIRA: PP-1470

How Has This Been Tested?

  • Running tests in CI

Checklist

  • I have updated the documentation accordingly.
  • All new and existing tests passed.

Copy link

codecov bot commented Mar 6, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.28%. Comparing base (bb90e20) to head (602a726).
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2324      +/-   ##
==========================================
- Coverage   91.29%   91.28%   -0.02%     
==========================================
  Files         364      364              
  Lines       41402    41317      -85     
  Branches     8862     8870       +8     
==========================================
- Hits        37800    37715      -85     
  Misses       2340     2340              
  Partials     1262     1262              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

select(Hold)
.where(Hold.start < cutoff, or_(Hold.end == None, Hold.end < utc_now()))
.order_by(Hold.id)
.limit(batch_size)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This query has been simplified in two ways:

  • We aren't filtering out unlimited access and open access license pools, because you can't place holds on these items anyway.
  • We drop the concept of "source of truth" integrations.
    • The only source of truth integration we had was OPDS for distributors, which only supports unlimited access items, so you couldn't place a hold anyway.

So this query should be effectively equivalent, even though its dropped some of the conditions.

or_(
Loan.end < now,
and_(Loan.start < now - timedelta(days=90), Loan.end == None),
),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This query drops the concept of "source of truth" integrations.

Previously the only source of truth integration we had was OPDS for distributors, which only supports unlimited access items, so these items are filtered out by the LicensePool.unlimited_access == False check, so no need to complicate the query by filtering on the collection protocol.

So this query should be effectively equivalent to what we had before.

deletion_query = delete(Credential).where(Credential.expires < cutoff)
with task.transaction() as session:
rows_removed = _execute_delete(session, deletion_query)
task.log.info(f"Deleted {pluralize(rows_removed, 'expired credential')}.")
Copy link
Member Author

@jonathangreen jonathangreen Mar 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For as many of the reapers as I could, i updated them to use a direct SQL delete statement, rather then loading each record into the ORM, to just delete it. Should allow us to perform the deletes much faster.

This required setting ondelete="CASCADE" on the patron_id foreign keys, so that we are able to use delete statements to clean up patron records in the reaper, without needing the ORM to clean up the related records.

That is why this needs a DB migration.

ReaperMonitor.REGISTRY.append(MeasurementReaper)


class ScrubberMonitor(ReaperMonitor):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't bother migrating either of the srubber monitors, since we are not using the patron location feature. Its pretty NYPL specific. I'm going to instead just drop whats left of that code in a follow up PR to this one.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the follow up PR: #2326 (still in progress)

minute="40",
hour="2",
), # Once a day at 2:40 AM
},
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These all used to run at 2am every day. Since they are all separate task now, I decided to space them out a little bit.

back_populates="patron",
cascade="delete",
uselist=True,
passive_deletes=True,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See SQLAlchemy docs: https://docs.sqlalchemy.org/en/20/orm/cascades.html#passive-deletes

I updated these relationships to set the passive_deletes parameter since I added a DB level foreign key cascade.

from tests.fixtures.services import ServicesFixture


class TestLoanlikeReaperMonitor:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the bulk of these tests into test_reaper.py

@@ -1031,298 +1018,3 @@ def test_set_item(self, db: DatabaseTransactionFixture):
monitor = CustomListEntryWorkUpdateMonitor(db.session)
monitor.process_item(entry)
assert old_work == entry.work


class MockReaperMonitor(ReaperMonitor):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the bulk of these tests into test_reaper.py

@jonathangreen jonathangreen requested a review from a team March 7, 2025 01:51
@jonathangreen jonathangreen added feature New feature DB migration This PR contains a DB migration labels Mar 7, 2025
Copy link
Contributor

@tdilauro tdilauro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 🎸🤘🏽

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming that most of these just moved over, so I did't really give this much review. If there are specific things you want me to look at, just let me know.

@jonathangreen jonathangreen force-pushed the chore/refactor-analytics branch from 8c582d1 to 20a6c64 Compare March 7, 2025 17:33
@jonathangreen jonathangreen force-pushed the feature/reapers-celery branch from defe008 to c7ba85b Compare March 7, 2025 17:34
Base automatically changed from chore/refactor-analytics to main March 10, 2025 19:38
@jonathangreen jonathangreen force-pushed the feature/reapers-celery branch from c7ba85b to 628de50 Compare March 10, 2025 19:40
@jonathangreen jonathangreen force-pushed the feature/reapers-celery branch from 628de50 to 602a726 Compare March 10, 2025 19:49
@jonathangreen jonathangreen merged commit a389d5a into main Mar 10, 2025
19 checks passed
@jonathangreen jonathangreen deleted the feature/reapers-celery branch March 10, 2025 20:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
DB migration This PR contains a DB migration feature New feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants