-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Convert database_reaper to celery (PP-1470) #2324
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #2324 +/- ##
==========================================
- Coverage 91.29% 91.28% -0.02%
==========================================
Files 364 364
Lines 41402 41317 -85
Branches 8862 8870 +8
==========================================
- Hits 37800 37715 -85
Misses 2340 2340
Partials 1262 1262 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
select(Hold) | ||
.where(Hold.start < cutoff, or_(Hold.end == None, Hold.end < utc_now())) | ||
.order_by(Hold.id) | ||
.limit(batch_size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This query has been simplified in two ways:
- We aren't filtering out unlimited access and open access license pools, because you can't place holds on these items anyway.
- We drop the concept of "source of truth" integrations.
- The only source of truth integration we had was OPDS for distributors, which only supports unlimited access items, so you couldn't place a hold anyway.
So this query should be effectively equivalent, even though its dropped some of the conditions.
or_( | ||
Loan.end < now, | ||
and_(Loan.start < now - timedelta(days=90), Loan.end == None), | ||
), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This query drops the concept of "source of truth" integrations.
Previously the only source of truth integration we had was OPDS for distributors, which only supports unlimited access items, so these items are filtered out by the LicensePool.unlimited_access == False
check, so no need to complicate the query by filtering on the collection protocol.
So this query should be effectively equivalent to what we had before.
deletion_query = delete(Credential).where(Credential.expires < cutoff) | ||
with task.transaction() as session: | ||
rows_removed = _execute_delete(session, deletion_query) | ||
task.log.info(f"Deleted {pluralize(rows_removed, 'expired credential')}.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For as many of the reapers as I could, i updated them to use a direct SQL delete statement, rather then loading each record into the ORM, to just delete it. Should allow us to perform the deletes much faster.
This required setting ondelete="CASCADE"
on the patron_id
foreign keys, so that we are able to use delete
statements to clean up patron records in the reaper, without needing the ORM to clean up the related records.
That is why this needs a DB migration.
ReaperMonitor.REGISTRY.append(MeasurementReaper) | ||
|
||
|
||
class ScrubberMonitor(ReaperMonitor): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't bother migrating either of the srubber monitors, since we are not using the patron location feature. Its pretty NYPL specific. I'm going to instead just drop whats left of that code in a follow up PR to this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the follow up PR: #2326 (still in progress)
minute="40", | ||
hour="2", | ||
), # Once a day at 2:40 AM | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These all used to run at 2am every day. Since they are all separate task now, I decided to space them out a little bit.
back_populates="patron", | ||
cascade="delete", | ||
uselist=True, | ||
passive_deletes=True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See SQLAlchemy docs: https://docs.sqlalchemy.org/en/20/orm/cascades.html#passive-deletes
I updated these relationships to set the passive_deletes
parameter since I added a DB level foreign key cascade.
from tests.fixtures.services import ServicesFixture | ||
|
||
|
||
class TestLoanlikeReaperMonitor: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved the bulk of these tests into test_reaper.py
@@ -1031,298 +1018,3 @@ def test_set_item(self, db: DatabaseTransactionFixture): | |||
monitor = CustomListEntryWorkUpdateMonitor(db.session) | |||
monitor.process_item(entry) | |||
assert old_work == entry.work | |||
|
|||
|
|||
class MockReaperMonitor(ReaperMonitor): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved the bulk of these tests into test_reaper.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! 🎸🤘🏽
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming that most of these just moved over, so I did't really give this much review. If there are specific things you want me to look at, just let me know.
8c582d1
to
20a6c64
Compare
defe008
to
c7ba85b
Compare
c7ba85b
to
628de50
Compare
628de50
to
602a726
Compare
Description
Convert all of the
ReaperMonitor
scripts to be Celery Tasks:CredentialReaper
PatronRecordReaper
WorkReaper
CollectionReaper
MeasurementReaper
LoanReaper
HoldReaper
IdlingAnnotationReaper
Updates some of our database foreign key relationships so its easier to use a database level delete cascade to clean up old records in the database, rather then relying on the ORM to do the cascade.
This PR is based on these PR (they should get merged first):Motivation and Context
Working through the epic to convert all of our scripts into tasks.
JIRA: PP-1470
How Has This Been Tested?
Checklist