Memory Limitation Bug with visit:process #307

Closed
jakkuh opened this issue Dec 5, 2018 · 14 comments
jakkuh (Contributor) commented Dec 5, 2018

When trying to run visit:process in the CLI I get the following error:

Allowed memory size of 268435456 bytes exhausted (tried to allocate 4096 bytes) in /shlink-1-13-1/vendor/doctrine/dbal/lib/Doctrine/DBAL/Driver/PDOConnection.php on line 107
jakkuh (Contributor, Author) commented Dec 5, 2018

Increasing the PHP memory limit to 512MB fixes this.

Correction: I had to increase the PHP memory limit all the way to 1024M to fix this issue.
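
For anyone else hitting this before a proper fix, the limit can be raised for a single run without touching php.ini; the `bin/cli` path is just how my install is laid out, so adjust as needed:

```sh
# Raise the memory limit for a single run only
php -d memory_limit=1024M bin/cli visit:process

# Or raise it globally in php.ini:
#   memory_limit = 1024M
```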

acelaya (Member) commented Dec 6, 2018

I would say it is still struggling to handle big datasets.

This should have been mitigated with the latest change, but I'll keep investigating what else could be consuming that much memory.

jakkuh (Contributor, Author) commented Dec 6, 2018 via email

acelaya (Member) commented Feb 17, 2019

Hi @tivyhosting. I'm trying to find where this memory leak is coming from.

Do you know how many visits your system processes every time you run the command? I just need an approximate value.

jakkuh (Contributor, Author) commented Feb 19, 2019

Unsure, but at the time I know at least 96,000 visits were unprocessed. My guess is somewhere in the 300-400k range.

But I have the command running every 15 minutes, and a lot of the time it might only be processing a couple hundred, yet I still get the issue. It's almost like the entire database is loaded in before checking.

acelaya (Member) commented Feb 20, 2019

Thanks @tivyhosting, that's really helpful.

Based on the huge amount of unprocessed visits and the kind of error, my guess is that the lower-level database query is trying to load too many rows into memory.

I already made sure the ORM was not trying to map too many objects at the same time, but the raw query still loads all pending visits with no limit.

I think that making sure only smaller subsets are loaded at the same time should solve the issue, but I will run some benchmarks with big datasets in order to confirm.
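
To illustrate what I mean, something along these lines (a simplified sketch with made-up table and column names, not the actual Shlink code):

```php
<?php
// Simplified sketch of loading pending visits in fixed-size blocks with
// Doctrine DBAL. Table/column names are made up for illustration.
use Doctrine\DBAL\Connection;

function processPendingVisitsInBlocks(Connection $conn, callable $process, int $blockSize = 10000): void
{
    $offset = 0;
    do {
        // Fetch one block instead of the whole dataset at once
        $rows = $conn->fetchAll(
            'SELECT * FROM visits WHERE processed = 0 ORDER BY id ASC LIMIT ? OFFSET ?',
            [$blockSize, $offset],
            [\PDO::PARAM_INT, \PDO::PARAM_INT]
        );

        foreach ($rows as $row) {
            $process($row);
        }

        $offset += $blockSize;
    } while (count($rows) === $blockSize);
}
```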

Having the whole exception trace would also help confirm this theory, but I guess you never got to run the command with higher verbosity.

I will let you know once I have done some tests.

acelaya (Member) commented Feb 21, 2019

@tivyhosting I have run several tests and managed to reproduce the issue when trying to process more than 400k pending visits.

My guess was right: the problem is that the query is trying to fetch the whole dataset at once.

I will try to provide a fix over the weekend and release v1.16. This is the only issue left for that milestone.

acelaya (Member) commented Feb 23, 2019

@tivyhosting I'm about to merge the fix for this bug.

I have managed to make it load the visits to be located in blocks of 10k, which drastically reduces the memory peak. Memory usage also no longer grows with the size of the dataset, so you can even process millions of visits without exhausting memory.

Also, as an interesting side effect, I realized the way located visits are then persisted could be improved. It is now done in blocks of 200, which makes the whole process much faster: I have processed 200k visits in less than 5 minutes, when it used to take about an hour with the previous approach.
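
The persistence side now roughly follows this pattern (simplified sketch; the Visit entity and function names here are stand-ins, not the exact implementation):

```php
<?php
// Simplified sketch of persisting located visits in blocks of 200,
// flushing and clearing the EntityManager periodically so tracked
// entities don't pile up in memory.
use Doctrine\ORM\EntityManagerInterface;

const PERSIST_BLOCK_SIZE = 200;

function persistLocatedVisits(EntityManagerInterface $em, iterable $locatedVisits): void
{
    $count = 0;
    foreach ($locatedVisits as $visit) {
        $em->persist($visit);

        if (++$count % PERSIST_BLOCK_SIZE === 0) {
            $em->flush(); // write the whole block in one go
            $em->clear(); // detach managed entities so memory is freed
        }
    }

    // Flush whatever is left in the final partial block
    $em->flush();
    $em->clear();
}
```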

I will release the new version as soon as possible (once builds finish and such).

acelaya (Member) commented Feb 23, 2019

@tivyhosting v1.16.0 is already available: https://github.com/shlinkio/shlink/releases/tag/v1.16.0

jakkuh (Contributor, Author) commented Feb 26, 2019

I noticed almost 30 seconds of constant spam of "Ignored visit with no IP address". It seems it does ignore these visits, but it doesn't seem to remove them from the "to be processed" queue. Maybe this is the reason for the memory leak?

After the update it does run fine, but it no longer prints the IPs it has processed to the console?

https://i.imgur.com/NTdUBrI.png

jakkuh (Contributor, Author) commented Feb 26, 2019

Just piped it to a file, and it's 550,000 occurrences of "Ignored visit with no IP address" every time I run the command.
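
For reference, I counted them with something like this (the CLI path may differ on your setup):

```sh
# Run verbosely, capture the output, then count the messages
php bin/cli visit:process -vv > visits.log
grep -c 'Ignored visit with no IP address' visits.log
```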

acelaya (Member) commented Feb 26, 2019

Nope, this is not the reason for the memory leak. That has been fixed, as explained a few comments above.

This is a new issue introduced with the new implementation. It shouldn't have a big impact, but I agree it's quite annoying.

I will provide a fix for this in v1.16.1.

acelaya (Member) commented Feb 26, 2019

I have to investigate a little further, but I'm now almost sure this is not a regression introduced with the latest version, but an issue that has always been there.

However, it has a very low impact, so it can go unnoticed.

I will provide a fix either way.

acelaya (Member) commented Feb 26, 2019

@tivyhosting I just confirmed this has always been there. The message is only displayed when running the visit:process command in verbose mode (with -v, -vv or -vvv).

I think it is something I wanted to fix at some point but got "lost". Now that there's a ticket tracking the issue, I will remember to find a more elegant solution that prevents these visits from being processed over and over.
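
The fix will probably be something along these lines, making the pending-visits query skip IP-less rows (sketch only; actual table and column names differ):

```php
<?php
// Sketch: exclude visits without an IP address from the "pending" query so
// they stop being picked up (and logged) on every run. Schema names are
// made up for illustration.
use Doctrine\DBAL\Connection;

function fetchLocatableVisits(Connection $conn, int $limit, int $offset): array
{
    return $conn->fetchAll(
        'SELECT * FROM visits
          WHERE visit_location_id IS NULL
            AND remote_addr IS NOT NULL
          ORDER BY id ASC
          LIMIT ? OFFSET ?',
        [$limit, $offset],
        [\PDO::PARAM_INT, \PDO::PARAM_INT]
    );
}
```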
