feat: a script to backfill commitments table #184
Conversation
Signed-off-by: Miroslav Bajtoš <[email protected]>
  GROUP BY published_as
  ORDER BY published_at
  ON CONFLICT DO NOTHING
  RETURNING cid -- I _think_ this helps to get correct row count reported by PG
I did not run this query, so there may be syntax errors.
WDYT, should I spend a bit of extra time to clone a subset of live measurements to my local PG database and run the query on that DB first? Otherwise, I'll go YOLO and debug this query while executing it against the live data.
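For what it's worth, a minimal sketch of what cloning a subset could look like, assuming a `measurements` table with the columns used by this query and two connection strings supplied via environment variables; the variable names and the 24-hour cut-off are illustrative, not the actual tooling:

```js
// Hypothetical helper: copy the last 24 hours of measurements from the live DB
// into a local DB so the backfill query can be tested safely.
import pg from 'pg'

const source = new pg.Client({ connectionString: process.env.LIVE_DATABASE_URL })
const target = new pg.Client({ connectionString: process.env.LOCAL_DATABASE_URL })
await source.connect()
await target.connect()

// Only the columns the backfill query touches are copied.
const { rows } = await source.query(`
  SELECT published_as, finished_at
  FROM measurements
  WHERE finished_at > now() - INTERVAL '24 hours'
`)

for (const row of rows) {
  await target.query(
    'INSERT INTO measurements (published_as, finished_at) VALUES ($1, $2)',
    [row.published_as, row.finished_at]
  )
}

await source.end()
await target.end()
```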
I think we can YOLO; if it goes wrong, we can just delete all commitments and start over, right?
> I think we can YOLO; if it goes wrong, we can just delete all commitments and start over, right?
Yes, we can 👍🏻
  // the "ON CONFLICT DO NOTHING" part tells PG to silently skip rows that conflict with existing commitments.
  const { rows, rowCount } = await client.query(`
    INSERT INTO commitments (cid, published_at)
    SELECT published_as as cid, MAX(finished_at) + INTERVAL '3 minutes' as published_at FROM measurements
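Pieced together, the two diff hunks shown in this thread amount to roughly the statement below. This is a reconstruction, not the script itself: anything between the fragments (for example a WHERE clause limiting the time window) is not visible on this page, and `client` is assumed to be an already-connected node-postgres client.

```js
// Reconstructed sketch of the backfill statement, assembled from the diff fragments above.
const { rows, rowCount } = await client.query(`
  INSERT INTO commitments (cid, published_at)
  SELECT published_as AS cid, MAX(finished_at) + INTERVAL '3 minutes' AS published_at
  FROM measurements
  -- (the real script likely filters measurements to a time window here)
  GROUP BY published_as
  ORDER BY published_at
  ON CONFLICT DO NOTHING  -- skip commitments that already exist
  RETURNING cid           -- helps report an accurate count of inserted rows
`)
console.log('backfilled %s commitments', rowCount)
```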
This 3-minute offset will lead to lots of overlap between the 1-hour windows; would it be easier to remove it?
I don't mind the overlaps; it gives me higher confidence that we don't accidentally miss any measurements. I would like to keep the estimated `published_at` value close to the real values we report for new commitments. We are reading the current time after the smart contract call completes, see here:
https://github.com/filecoin-station/spark-api/blob/ca10df39450cf50ff68be871869b763287bc0939/spark-publish/index.js#L81-L83

I can change that line to use `started` instead, which should bring us closer to `MAX(finished_at)`. It still would not be perfect because if there is a large backlog of unpublished measurements, we can commit a batch of measurements much later than they were recorded (finished).

I can also rework that line to compute `MAX(finished_at)` from the committed measurements. I think that's the most accurate solution, WDYT?

Also, maybe I am sweating this too much, and this few-minute difference does not really matter. WDYT?
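A rough sketch of that last option, purely to make it concrete. It assumes spark-publish has the batch of committed measurements in memory as an array called `measurements` with a `finished_at` field; those names are guesses based on this thread, not the actual spark-publish code.

```js
// Hypothetical rework of the linked spark-publish lines: derive published_at from
// the measurements included in this commitment instead of reading the wall clock
// after the smart-contract call completes.
const publishedAt = new Date(Math.max(
  ...measurements.map(m => new Date(m.finished_at).getTime())
))
// `publishedAt` would then be stored in place of the `new Date()` used today.
```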
Nah I don't mind at all, if you think this doesn't create any bad effects 👍
Next steps: …
The query stopped working because we had a long gap where no measurements were recorded. After bumping up the interval to 12 hours, the script runs again.
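For context on why a gap can stall the run: my reading of this thread is that the script walks forward one window at a time and stops advancing when a window produces nothing, so any gap longer than the window interval halts it. A sketch under that assumption, with an assumed starting timestamp and an already-connected `client`; this is not the actual script:

```js
const WINDOW_HOURS = 12 // originally 1 hour; bumped to get past gaps in the data
let cursor = new Date('2023-01-01T00:00:00Z') // assumed start of the backfill

while (cursor < new Date()) {
  const windowEnd = new Date(cursor.getTime() + WINDOW_HOURS * 3600_000)
  const { rowCount } = await client.query(`
    INSERT INTO commitments (cid, published_at)
    SELECT published_as AS cid, MAX(finished_at) + INTERVAL '3 minutes' AS published_at
    FROM measurements
    WHERE finished_at >= $1 AND finished_at < $2
    GROUP BY published_as
    ON CONFLICT DO NOTHING
  `, [cursor, windowEnd])

  // A window that inserts nothing ends the loop, which is why a long gap in
  // measurements stalls the backfill until the window interval is bumped up.
  if (rowCount === 0) break
  cursor = windowEnd
}
```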
Signed-off-by: Miroslav Bajtoš <[email protected]>
Could we make this part of …?
Another hiccup for 2023-12-07; I solved it by temporarily bumping the interval up to 24 hours.
The oldest measurement in our DB is from …
39145
This killed our DB server 😢
Running the two count queries and comparing the results: the counts are roughly the same. I think the difference is caused by the 3 minutes we are adding to `published_at`.

I am going to assume the migration was successful and will delete historical data older than 1 week later today.
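The two queries being compared were collapsed on this page; they were presumably something along the lines of the sketch below (an assumption, included only to make the comparison concrete; `client` is an already-connected node-postgres client):

```js
// Distinct commitments implied by the raw measurements vs. rows actually backfilled.
const { rows: [measured] } = await client.query(
  'SELECT COUNT(DISTINCT published_as) AS n FROM measurements'
)
const { rows: [backfilled] } = await client.query(
  'SELECT COUNT(*) AS n FROM commitments'
)
console.log('measurements imply %s commitments, commitments table has %s rows',
  measured.n, backfilled.n)
```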
Sounds good!
The DB is at 84% storage capacity, so we have enough time.
FWIW, running DELETE commands is not enough to release the disk space back to the OS; see https://www.postgresql.org/docs/current/routine-vacuuming.html#VACUUM-FOR-SPACE-RECOVERY
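Concretely, returning the freed space to the OS takes a VACUUM FULL on top of the DELETE. A sketch, assuming an already-connected `client` and a quiet period in which the table can be locked:

```js
// Plain VACUUM only marks the deleted rows' space as reusable within the existing files.
await client.query('VACUUM ANALYZE measurements')
// VACUUM FULL rewrites the table and returns the freed space to the OS,
// but it holds an ACCESS EXCLUSIVE lock on `measurements` while it runs.
await client.query('VACUUM FULL measurements')
```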
Deleted all measurements.
A useful command to know: list info about tables & indices, including their size.
Things occupying more than 1 GB: …
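The command itself was collapsed on this page; a query along the lines below produces that kind of listing (my own variant, not necessarily the one that was used):

```js
// List tables and indexes in the public schema with human-readable sizes, largest first.
const { rows } = await client.query(`
  SELECT c.relname AS name,
         c.relkind AS kind, -- 'r' = table, 'i' = index
         pg_size_pretty(pg_relation_size(c.oid)) AS size
  FROM pg_class c
  JOIN pg_namespace n ON n.oid = c.relnamespace
  WHERE n.nspname = 'public' AND c.relkind IN ('r', 'i')
  ORDER BY pg_relation_size(c.oid) DESC
`)
console.table(rows)
```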
I opened a PR to implement the cleanup; see #186.

Closing this PR unmerged, as the work here is finished.
I will keep this PR as a draft and then close it unmerged after completing the migration manually.
This is a follow-up for #182