MaxMind GeoLite2 infinite download attempts - problem still exists with 4.0.2 #2114
Comments
Unfortunately I'm going to need more details if we want to get to the bottom of this. Perhaps a docker-compose file including a configuration as close as possible to how you run Shlink would be helpful. If you could share logs showcasing those download attempts, that would also be great. The keywords to look for are [...]
Right, no problem. I didn't really think I could be the only one still having this problem. Thanks for replying; I'll do my best to find all the necessary information. The relevant section of my docker-compose file looks like this:
Looking at the logs, I notice that for weeks before April 16th there was no attempt to update the database file. I can see that I had similar errors before March 15th, though the wording was different then ("Finished downloading GeoLite2 db file" instead of "Finished updating GeoLite2 db file"). However, these messages stop on March 15th, when I updated to 4.0.2, and only started again on April 16th. Perhaps it's not a coincidence that there's close to one month in between? Suddenly on the 16th an update takes place, perhaps triggered by the GET request to [...]. As you can see, this happens again shortly afterwards, and continues from there.

Log file snippet from April 16th:
Now, I used some manual greps to find out how often this happened each day.

> grep -e '2024-04-16.*Finished updating GeoLite2 db file' 20240429-shlink.log | wc -l
10
> grep -e '2024-04-17.*Finished updating GeoLite2 db file' 20240429-shlink.log | wc -l
21
> grep -e '2024-04-18.*Finished updating GeoLite2 db file' 20240429-shlink.log | wc -l
16
> grep -e '2024-04-19.*Finished updating GeoLite2 db file' 20240429-shlink.log | wc -l
105
Obviously the number is suddenly much greater on the 19th, so I checked out that date. Initially everything looks like before: many successful downloads. Then suddenly the limit is reached and the download fails with a 429. Here's a log snippet that includes several of the last successful downloads on the 19th, and a couple of the unsuccessful attempts that follow.

Log snippet from April 19th, including failed (429) download attempts:
The attempts continue after that; I see 76 of them on the 19th:

> grep -e '2024-04-19.*database download failed' 20240429-shlink.log | wc -l
76
Please let me know how else I can help figure this out.
Thanks for the detailed information. For context, this is how Shlink handles GeoLite downloads:

When the first visit reaches an instance, Shlink detects the file does not exist and tries to download it for the first time, logging the message "Finished downloading GeoLite2 db file". In the context of Docker, "the first time" means it's a new container, which can happen for a number of reasons: autoscaling, the service was updated to a new version, etc. Every container will need to download its own GeoLite2 db file.

If the file exists, Shlink will check whether it needs to be updated based on the file's own metadata, by verifying if it's older than 35 days. This would explain your comment: "Perhaps not a coincidence that there's close to one month in between?" After those 35 days, Shlink will download the newest available version and log "Finished updating GeoLite2 db file".

The problem originally reported in #2021 happened because, once the first file was downloaded, Shlink kept a reference to its metadata in memory, causing all checks to result in a download once a container had been running for more than 35 days. Both that problem and probably this one would be mitigated by regenerating new containers, which is probably what makes this happen less often.

That said, I'm taking a few days off, but I'll try to debug a bit further as soon as I can.
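To make that flow easier to follow, here is a minimal sketch of the check described above. The function name and structure are hypothetical and only illustrate the idea; Shlink's actual implementation differs.

```php
<?php

use GeoIp2\Database\Reader;

// Hypothetical sketch of the decision described above, not Shlink's real code.
// The db is downloaded when the file is missing, or updated when its own
// metadata says it is older than 35 days.
function geoLiteDbNeedsDownload(string $dbPath, int $maxAgeDays = 35): bool
{
    if (!file_exists($dbPath)) {
        // Fresh container: the first visit triggers the initial download.
        return true;
    }

    // The build date comes from the file's own metadata.
    $buildEpoch = (new Reader($dbPath))->metadata()->buildEpoch;
    $ageInDays = (time() - $buildEpoch) / 86400;

    return $ageInDays > $maxAgeDays;
}
```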
Right. It looks to me as if this is still happening here. Perhaps it would be possible to add some information to the log output that would allow us to understand the decision-making process, e.g. something like "deciding to download GeoLite database because last download on XXX is YYY days ago", or whatever other details come into it.
Update: I added debug output to see the build date stored in the metadata. Clearly it is very outdated. Now perhaps I'll track down why the metadata has not been updated after any of the successful downloads that clearly happened.
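For reference, a debug line along these lines would produce the "Meta build date" log entries shown further below. This is a hypothetical version of that patch; the variable names are assumptions.

```php
// Hypothetical debug output; assumes $metadata is the GeoLite2 file's
// metadata object and $logger a PSR-3 logger. Produces lines like
// "Meta build date is 2024-03-12T14:30:24+00:00".
$buildDate = (new DateTimeImmutable('@' . $metadata->buildEpoch))
    ->format(DateTimeInterface::ATOM);
$logger->notice('Meta build date is ' . $buildDate);
```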
I don't think I know PHP well enough. I see that a new [...] is created, and before the metadata is extracted, the factory is called, here. Clearly we're still looking at outdated metadata, but I'm not sure why... unless the new [...]
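For illustration, this is the kind of caching that could explain a stale build date. It is only a sketch of the general pattern (a memoized reader instance), not Shlink's actual factory; the class and method names are made up.

```php
<?php

use GeoIp2\Database\Reader;

// Illustration only: if a Reader created once is reused, its metadata keeps
// reporting the build date of the file as it was when first opened, even
// after the file on disk has been replaced by a newer download.
final class CachedGeoLiteReaderFactory
{
    private ?Reader $reader = null;

    public function __construct(private readonly string $dbPath)
    {
    }

    public function getReader(): Reader
    {
        // Stale-metadata risk lives here: the instance is memoized.
        return $this->reader ??= new Reader($this->dbPath);
    }
}
```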
Yep, good idea.
This is the direction my thinking is going in, but I'm also a bit clueless as to why this happened. It did happen a few years ago that a new GeoLite file was published with incorrect metadata, but I don't think that is very likely here. I'll need to debug a bit further, but thanks for giving it a try; that was really helpful.
I'm not able to reproduce this with the latest Shlink. I have tried downloading an older version of the GeoLite2 db file, starting Shlink, and visiting a short URL. Shlink detects the file is too old and starts downloading a newer version. Once finished, if I visit a new short URL, Shlink properly checks that the new file is up to date and does not try to download it again. I have also tried visits in parallel. Thanks to the lock, one of them downloads the GeoLite file, then the lock is released, and by the time the rest are checked, they are checked against the new file.
I have also just done some testing with Shlink 4.0.1 and 4.0.2. With the first one, if I replace the database with a very old version, any visit continues to consider the database up to date, because of the references kept in memory. With v4.0.2, though, visits always detect the version is old, download a new one, and subsequent requests work as expected with the new db file. Could it be that you had some orphan or old Shlink containers running a version older than 4.0.2, which were the ones producing this issue?
Sorry, forgot to follow up here.
No. There was just the one container that identified itself as version 4.0.2. It was still running when we last wrote, and it is still active now, albeit with a small change to the source code that outputs a debug message for the build date read from the metadata. And that's where an interesting thing happened recently, which I spotted with the following command:

> docker compose logs -f shlink | grep "Meta build"
shlink | 2024-05-03T10:54:57+0000 INFO server [2024-05-03T10:54:57.374148+00:00] [-] Shlink.NOTICE - Meta build date is 2024-03-12T14:30:24+00:00
shlink | 2024-05-03T12:22:13+0000 INFO server [2024-05-03T12:22:13.495630+00:00] [-] Shlink.NOTICE - Meta build date is 2024-03-12T14:30:24+00:00
shlink | 2024-05-03T13:22:11+0000 INFO server [2024-05-03T13:22:11.708647+00:00] [-] Shlink.NOTICE - Meta build date is 2024-03-12T14:30:24+00:00
shlink | 2024-05-03T13:22:23+0000 INFO server [2024-05-03T13:22:23.467587+00:00] [-] Shlink.NOTICE - Meta build date is 2024-03-12T14:30:24+00:00
shlink | 2024-05-03T13:22:26+0000 INFO server [2024-05-03T13:22:26.821597+00:00] [-] Shlink.NOTICE - Meta build date is 2024-03-12T14:30:24+00:00
shlink | 2024-05-03T13:22:33+0000 INFO server [2024-05-03T13:22:33.285331+00:00] [-] Shlink.NOTICE - Meta build date is 2024-03-12T14:30:24+00:00
shlink | 2024-05-03T15:26:18+0000 INFO server [2024-05-03T15:26:18.569636+00:00] [-] Shlink.NOTICE - Meta build date is 2024-03-12T14:30:24+00:00
shlink | 2024-05-03T15:29:42+0000 INFO server [2024-05-03T15:29:42.649483+00:00] [-] Shlink.NOTICE - Meta build date is 2024-03-12T14:30:24+00:00
shlink | 2024-05-03T15:49:48+0000 INFO server [2024-05-03T15:49:48.798808+00:00] [-] Shlink.NOTICE - Meta build date is 2024-03-12T14:30:24+00:00
shlink | 2024-05-03T15:49:49+0000 INFO server [2024-05-03T15:49:49.377408+00:00] [-] Shlink.NOTICE - Meta build date is 2024-03-12T14:30:24+00:00
shlink | 2024-05-03T15:49:49+0000 INFO server [2024-05-03T15:49:49.774814+00:00] [-] Shlink.NOTICE - Meta build date is 2024-03-12T14:30:24+00:00
shlink | 2024-05-03T19:43:05+0000 INFO server [2024-05-03T19:43:05.519738+00:00] [-] Shlink.NOTICE - Meta build date is 2024-03-12T14:30:24+00:00
shlink | 2024-05-03T21:10:21+0000 INFO server [2024-05-03T21:10:21.658584+00:00] [-] Shlink.NOTICE - Meta build date is 2024-03-12T14:30:24+00:00
shlink | 2024-05-04T00:07:58+0000 INFO server [2024-05-04T00:07:58.281199+00:00] [-] Shlink.NOTICE - Meta build date is 2024-03-12T14:30:24+00:00
shlink | 2024-05-04T03:12:31+0000 INFO server [2024-05-04T03:12:31.709732+00:00] [-] Shlink.NOTICE - Meta build date is 2024-03-12T14:30:24+00:00
shlink | 2024-05-04T03:38:40+0000 INFO server [2024-05-04T03:38:40.165622+00:00] [-] Shlink.NOTICE - Meta build date is 2024-05-03T12:43:40+00:00
...

To summarize, the log output shows that after I added the debug output, there were 15 further occasions on May 3rd and May 4th where the build date was checked -- and each time it was found to be the same "old" date back in March. It appears that my system actually had an old metadata timestamp the whole time, resulting in countless downloads. Then at some point it received a download where the metadata was finally updated correctly. Why? Not a clue...
The only reason I can think of is that there were no actual successful downloads at all: eventually because of the API limits, but perhaps the first daily attempts did not succeed either. If you look for [...], maybe Shlink did in fact start downloading the file on the first 30 attempts every day, but then the download timed out, or it was not able to write the temp file or extract it, etc. Another reason, as I mentioned somewhere above, is that Shlink downloaded files with incorrect metadata, but this is hard to verify. I'll try to download some of the latest DBs and check them. Another useful thing to check is whether Shlink logged any successful download. Can you find any appearance of [...]?
Yes, check my previous reports. There were many apparently successful downloads on all days, including the days where 429 errors appeared later on. I have not found any errors with a different exception than the 429 one. It seems to me that it may be best at this point for me to rebuild the container from the latest state and leave it to run, in order to see whether the problem returns in the future. The evidence is there that something was odd with the metadata in my running container, and that this problem somehow went away after lots of occurrences, but I'm not sure there's anything more to learn in hindsight.
Thanks for your help debugging this. In any case, what's becoming clearer to me is that this approach to triggering GeoLite2 downloads is too brittle.

In an ideal world, Shlink would try to download the GeoLite2 db as soon as possible when it does not exist yet, then try to update it every N days, and schedule further attempts in case of error after a few hours/days, with a maximum number of attempts per day. The problem is that the technologies used by Shlink do not natively allow that kind of scheduling, so the way I thought I could mimic it was by running a job after every visit and checking whether those conditions are met based on date calculations. It seemed convenient and requires no interaction from the user, but it can lead to too many attempts if certain errors occur, which can make things even worse as soon as API limits are reached.

In the past, updating the GeoLite2 db was done via cronjobs, but that required manual interaction from users, and it's not very convenient in Docker contexts. It is still the recommended approach for people not using RoadRunner: https://shlink.io/documentation/long-running-tasks/#download-geolite2-db

I'll try to put some time into checking if there's a better way to do this.
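Purely as a sketch of the "maximum amount of attempts per day" idea mentioned above (not what #2124 actually ends up implementing), a per-day attempt limit could look roughly like this; persistence, locking and error classification are deliberately left out, and all names are hypothetical.

```php
<?php

// Rough sketch of a per-day download attempt limit. Hypothetical code.
final class GeoLiteDownloadAttempts
{
    /** @var array<string, int> number of attempts keyed by date (Y-m-d) */
    private array $attempts = [];

    public function __construct(private readonly int $maxPerDay = 3)
    {
    }

    public function canAttemptToday(): bool
    {
        return ($this->attempts[$this->today()] ?? 0) < $this->maxPerDay;
    }

    public function recordAttempt(): void
    {
        $this->attempts[$this->today()] = ($this->attempts[$this->today()] ?? 0) + 1;
    }

    private function today(): string
    {
        return (new DateTimeImmutable())->format('Y-m-d');
    }
}
```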
I have created #2124 in order to implement some improvements in the download approach. I'll close this for now.
Shlink version
4.0.2
PHP version
8.3.3
How do you serve Shlink
Docker image
Database engine
MariaDB
Database version
11.3.2
Current behavior
I see the exact behavior described in #2021 -- I received a notification from MaxMind yesterday, and the log on their end shows countless downloads on each of the last few days (I guess they are not countless but rather 30, but I didn't count). The log on my host shows many successful downloads for no apparent reason, until eventually a download fails because the limit is reached. Further downloads fail with a 429. Just summarizing here -- I read the thread in #2021 and I see that exact thing.
The snag is: I'm already using 4.0.2.
Am I the only one?
Expected behavior
GeoIP db to be downloaded only as needed, or at least limited to a number of download attempts per day compatible with the MaxMind free tier.
Minimum steps to reproduce
Good questions. Like others, I had this setup running for a long while without trouble. I updated on March 15th and went to 4.0.2, and (I just checked) the first time MaxMind complained by email about the database download limit was also March 15th. Strangely, nothing happened after that, until I received new warning emails on April 19th, 23rd and 28th.