Bring down infra cost #71
Asked on Slack whether the dashboard work will provide us with the total number of jobs performed. I believe this is the only metric that requires us to have infinite retention on the …
The dashboard work will replace that 👍 So we can turn on a bucket retention policy after these metrics land. In order to implement some quick cost reductions, I propose we perform a manual purge:
Wdyt @bajtos @patrickwoodhead?
As suggested by @bajtos: Before deleting all measurements, store them in cold storage. Try compressing using https://facebook.github.io/zstd/.
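For reference, a minimal sketch of that compression step, assuming the measurements are first exported to a local file and that the `zstd` CLI is installed (the file name below is a placeholder, not the actual export path):

```js
// Compress an export file with zstd before moving it to cold storage.
// The input path is a placeholder; `zstd` must be available on the machine.
import { spawn } from 'node:child_process'

const input = 'station-measurements-export.csv' // hypothetical export file

// -19: high compression level, --long: long-distance matching for large files
const zstd = spawn('zstd', ['-19', '--long', input], { stdio: 'inherit' })
zstd.on('close', (code) => {
  if (code !== 0) throw new Error(`zstd exited with code ${code}`)
  console.log(`wrote ${input}.zst`)
})
```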
The 30d retention will delete all data older than April 13th if enabled today. Your export will contain only data older than April 1st. We will lose measurements recorded between April 1st and 13th. Is that okay? Did I miss something?
Of course 😅 ok, I will include more data in the export. Updated the task list to go up to May 1st (to be sure).
Script used for the export, currently running: https://gist.github.com/juliangruber/cd50f1227d08e8b94d6b4b36620b4711
This export finished:
It suggests there were 32m jobs completed, while on the website we show 161m. I will repeat this export to see if it is deterministic.
Next run:
It's in the same ballpark, but not exact. Since no new events are being added to the old timeframe, the two runs should have produced identical results, so this export mechanism is flawed. Let's check if we can do something else.
Next run:
This time I used async iteration instead of the …
I assume we can improve our chances by performing many queries, maybe one for each day. I will try this now |
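A rough sketch of what that day-by-day export could look like, using the official `@influxdata/influxdb-client` package and its async-iterable query API. The measurement name, date range and environment variable names below are assumptions for illustration, not taken from the actual export script:

```js
// Export one day per Flux query so each request stays small and repeatable.
// Measurement name and the date range are assumptions for illustration.
import { InfluxDB } from '@influxdata/influxdb-client'

const queryApi = new InfluxDB({
  url: process.env.INFLUX_URL,
  token: process.env.INFLUX_TOKEN
}).getQueryApi(process.env.INFLUX_ORG)

const start = new Date('2022-11-01T00:00:00Z')
const end = new Date('2023-05-01T00:00:00Z')
const DAY = 24 * 60 * 60 * 1000

for (let day = start; day < end; day = new Date(+day + DAY)) {
  const stop = new Date(Math.min(+day + DAY, +end))
  const flux = `
    from(bucket: "station")
      |> range(start: ${day.toISOString()}, stop: ${stop.toISOString()})
      |> filter(fn: (r) => r._measurement == "jobs-completed")
  `
  let rows = 0
  // iterateRows() exposes the streamed query response as an async iterable
  for await (const { values, tableMeta } of queryApi.iterateRows(flux)) {
    const record = tableMeta.toObject(values)
    rows++
    // ...append `record` to the per-day export file here
  }
  console.log(day.toISOString().slice(0, 10), rows, 'rows')
}
```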
The oldest row it can find is from 2022-11-05. We landed the telemetry commit on Oct 31st (CheckerNetwork/desktop@6d135e6). I don't know what this means. |
The script ran until …. I'm going to continue the script tomorrow with that date as the new starting point, and will merge the result with the previous export.
The 1TB disk instance ran out of space. It's currently on …
The script is currently at …
Up to …
I will now evaluate deleting these old rows; more work needs to be done before we can turn on a retention policy.
I have deleted all rows from the … from …
I have paused the script as even with a 1s window it was bringing down the Influx cluster |
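For context, the kind of windowed delete described above can be issued against InfluxDB's `/api/v2/delete` endpoint; here is a minimal sketch using `@influxdata/influxdb-client-apis`, with the time range and environment variable names as illustrative assumptions (a real run would also need throttling and retries, which is exactly where the cluster load problem shows up):

```js
// Delete old rows in small time windows so each request stays bounded.
// Time range and env var names are assumptions for illustration.
import { InfluxDB } from '@influxdata/influxdb-client'
import { DeleteAPI } from '@influxdata/influxdb-client-apis'

const influx = new InfluxDB({ url: process.env.INFLUX_URL, token: process.env.INFLUX_TOKEN })
const deleteApi = new DeleteAPI(influx)

const WINDOW_MS = 1_000 // 1s windows, as in the paused script
const start = Date.parse('2022-11-01T00:00:00Z')
const stop = Date.parse('2023-04-01T00:00:00Z')

for (let t = start; t < stop; t += WINDOW_MS) {
  // Each call deletes all points whose timestamp falls inside the window
  await deleteApi.postDelete({
    org: process.env.INFLUX_ORG,
    bucket: 'station',
    body: {
      start: new Date(t).toISOString(),
      stop: new Date(Math.min(t + WINDOW_MS, stop)).toISOString()
    }
  })
}
```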
We are waiting to hear back from the Influx support team, which has taken on this case |
For now, I'm exporting all measurements from web3.storage, to get a job count without needing InfluxDB |
Still exporting Voyager, currently at 3.7TB size |
InfluxDB support told us that up to …
I've enabled a 90 days retention policy on the "station" bucket. This matches what we have for "peer-checker", "spark-evaluate" and "spark-publish". We can reduce it to 30 days if it's still too expensive. |
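For the record, the same retention change can also be applied programmatically; a minimal sketch using the `BucketsAPI` from `@influxdata/influxdb-client-apis`, assuming the bucket is looked up by name (env var names are placeholders):

```js
// Set a 90-day expiry rule on the "station" bucket.
// Env var names are assumptions for illustration.
import { InfluxDB } from '@influxdata/influxdb-client'
import { BucketsAPI } from '@influxdata/influxdb-client-apis'

const influx = new InfluxDB({ url: process.env.INFLUX_URL, token: process.env.INFLUX_TOKEN })
const bucketsApi = new BucketsAPI(influx)

const { buckets } = await bucketsApi.getBuckets({ name: 'station' })
if (!buckets?.length) throw new Error('bucket "station" not found')

await bucketsApi.patchBucketsID({
  bucketID: buckets[0].id,
  body: {
    retentionRules: [{ type: 'expire', everySeconds: 90 * 24 * 60 * 60 }]
  }
})
```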
FWIW, the …
The Grafana + JSON data integration doesn't work any more (on PL Grafana), so this has to be reimplemented once the "Station" dashboard has been moved to the Space Meridian Grafana. With that, this job is finally complete.
Tasks
- Upload data export to w3s
- spark-stats site-backend#14