Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[5.8] Memory leak in DB join queries #28195

Closed
laurencei opened this issue Apr 12, 2019 · 11 comments · Fixed by #28220
Closed

[5.8] Memory leak in DB join queries #28195

laurencei opened this issue Apr 12, 2019 · 11 comments · Fixed by #28220
Labels

Comments

@laurencei
Copy link
Contributor

laurencei commented Apr 12, 2019

  • Laravel Version: 5.8.11
  • PHP Version: 7.2.11
  • Database Driver & Version: mySQL

Description:

There seems to be a runaway memory leak when using any of the join DB functions.

Steps To Reproduce:

On a clean Laravel 5.8 install just drop this inside your console.php:

// Memory usage consistent
Artisan::command('static-memory-test', function () {
    while(true) {
        $users = DB::table('users')->get();

        var_dump ('current memory usage: '.memory_get_usage());
    }
});

// Memory leak
Artisan::command('leak-memory-test', function () {
    while(true) {
        $users = DB::table('users')->join('password_resets', 'users.email', '=', 'password_resets.email')->get();

        var_dump ('current memory usage: '.memory_get_usage());
    }
});

Running php artisan static-memory-test shows:

string(30) "current memory usage: 12982728"
string(30) "current memory usage: 12982728"
string(30) "current memory usage: 12982728"
string(30) "current memory usage: 12982728"
...

But running php artisan leak-memory-test shows:

string(30) "current memory usage: 12819376"
string(30) "current memory usage: 12821864"
string(30) "current memory usage: 12824320"
string(30) "current memory usage: 12826776"
...

I noticed this, because I have a very complicated SQL query with lots of joins on a continuous loop as part of a long-running process. The script would keep crashing after just a few minutes saying it was out of memory. After many hours I managed track down the cause to the join statements.

What I've been able to work out is any/all join statements are affected -i.e. join(), leftJoin() etc.

I did some further testing, and discovered the bug does NOT exist in either Laravel 5.1 or Laravel 5.2. In both those versions you can run a loop with a join and the memory usage remains static, so we know it can be done.

But on Laravel 5.3.0 the memory usage issue appears. I think I've tracked down the PR that caused this to a refactor submitted here: #13576

The problem is there have been so many code changes and other PRs since then, it's difficult to work out the solution here.

Ping @acasar (original PR contributor)
Ping @staudenmeir (resident DB expert)

@laurencei
Copy link
Contributor Author

laurencei commented Apr 12, 2019

p.s. and the size of the memory leak is related to the size of the join dataset. So in the example above it looks small, but when you have large data sets, it causes the memory to rapidly get out of hand.

@driesvints driesvints added the bug label Apr 12, 2019
@staudenmeir
Copy link
Contributor

I can reproduce this, but the memory usage regularly "resets" (when the garbage collection kicks in?):

string(30) "current memory usage: 16562760"
string(30) "current memory usage: 16565160"
string(30) "current memory usage: 16567560"
string(30) "current memory usage: 12581160"
string(30) "current memory usage: 12583560"
string(30) "current memory usage: 12585960"

Do you get the same behavior?

@laurencei
Copy link
Contributor Author

laurencei commented Apr 12, 2019

Do you get the same behavior?

Interesting... yes I do.

If I add a counter, and check when the reset occurs, its the same at 1423 loops (at least for me).

The problem is for my "real" script - it only lasts 20-25 loops before the memory failure kicks in (due to the size of the queries).

If I add a gc_collect_cycles() into the loop, the memory issue goes away (at least on this basic script)... but doesnt solve it on my larger complex script (although I'll go see if I can tweak it somehow).

So what's the root cause though? I'm guessing a reference not being released somewhere...

@staudenmeir
Copy link
Contributor

The memory leak is caused by this line:

$this->parentQuery = $parentQuery;

Every JoinClause instance receives and stores the parent query. The instance itself then gets stored in the parent query's $joins property. This leads to some kind of "recursion".

The question is whether this is the expected behavior or a PHP bug.

@staudenmeir
Copy link
Contributor

staudenmeir commented Apr 12, 2019

Simple example:

class A
{
    public $b;

    public function b()
    {
        $this->b = new B($this);
    }
}

class B
{
    public $a;

    public function __construct($a)
    {
        $this->a = $a;
    }
}

while (true) {
    (new A)->b();

    var_dump(memory_get_usage());

    usleep(1000);
}

@laurencei
Copy link
Contributor Author

laurencei commented Apr 12, 2019

The question is whether this is the expected behavior or a PHP bug.

This seems relevant: https://medium.com/@johann.pardanaud/about-circular-references-in-php-10f71f811e9

According to that, a solution is coming in PHP7.4: https://wiki.php.net/rfc/weakrefs

I guess the only other thought is I'll have a look and see if we can refactor the code at all to remove the circular reference... but given the issue has gone undetected for so long, I might just have to use some raw SQL queries on my particular use case to as an intrim solution... 😢

@staudenmeir
Copy link
Contributor

Do your queries use the JoinClause::newQuery() and forSubQuery() methods?

If not, you could create your own JoinClause class and remove the $this->parentQuery = $parentQuery; line by overriding the constructor.

@staudenmeir
Copy link
Contributor

It looks like we can fix this by storing the parent query's connection, grammar, processor and class name separately instead of the whole query object.

@laurencei
Copy link
Contributor Author

laurencei commented Apr 14, 2019

Do your queries use the JoinClause::newQuery() and forSubQuery() methods? If not, you could create your own JoinClause class and remove the $this->parentQuery = $parentQuery; line by overriding the constructor.

Yeah, I dont use either, so that seems to help in this instance.

It looks like we can fix this by storing the parent query's connection, grammar, processor and class name separately instead of the whole query object.

How do we get around the call to newQuery() and forSubQuery() though if we dont have the parentQuery?

edit: oh - if we change the constructor, i guess that works...

@staudenmeir
Copy link
Contributor

The only downside is the breaking change. I'll submit a PR.

@laurencei
Copy link
Contributor Author

laurencei commented Apr 14, 2019

The only downside is the breaking change. I'll submit a PR.

Just target 5.9 I guess? It's been here since 5.3, so it's not like a new urgent bug...?

edit: I've changed my script and added the gc_collect_cycles() as an intrim until ?5.9 (if it gets accepted).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants