Spanner *runStream* cause a lot memory consumption #934

Closed
hadson19 opened this issue May 4, 2020 · 17 comments · Fixed by #936
Assignees: olavloite
Labels: api: spanner (Issues related to the googleapis/nodejs-spanner API.), priority: p2 (Moderately-important priority. Fix may not be included in next release.), type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)

Comments

@hadson19

hadson19 commented May 4, 2020

Environment details

  • OS: Windows
  • Node.js version: 10.x
  • npm version: 6.9.0
  • @google-cloud/spanner version: @google-cloud/[email protected]

Steps to reproduce

const { Spanner } = require("@google-cloud/spanner");
const fs = require("fs");
const util = require('util');
const stream = require('stream');
const through = require('through');

const pipeline = util.promisify(stream.pipeline);

const spanner = {
  "instanceId": "..",
  "databaseId": "..",
  "projectId": ".."
};
const filePath = "./users.data.txt";
const keyFilename = "service-account.json";

const spannerDatabase = new Spanner({
  projectId: spanner.projectId,
  keyFilename,
})
  .instance(spanner.instanceId)
  .database(spanner.databaseId);

const spannerStream = spannerDatabase.runStream({
  sql: 'select * from user',
  json: true,
  gaxOptions: {
    pageSize: 1000,
  },
});

const run = async () => {
  await pipeline(
    spannerStream,
    through(function (data) {
      return this.queue(`${JSON.stringify(data)}\n`);
    }),
    fs.createWriteStream(filePath)
  );

  process.exit(0);
};

run();

The stream consumes a lot of memory (RAM); sometimes it takes up to 600 MB.
The gaxOptions (pageSize, maxResults, retry) don't seem to have any effect.

Thanks!

@product-auto-label product-auto-label bot added the api: spanner Issues related to the googleapis/nodejs-spanner API. label May 4, 2020
@hadson19 hadson19 changed the title Spanner *runStream* cause a lot mamory consumption Spanner *runStream* cause a lot memory consumption May 4, 2020
@olavloite olavloite self-assigned this May 4, 2020
@olavloite olavloite added type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. type: question Request for information or clarification. Not an issue. and removed type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. labels May 4, 2020
@olavloite
Contributor

@hadson19
I'm not directly able to reproduce this. Could you elaborate a little bit on how you are measuring the memory consumption?

I've tried to reproduce the problem with the following script:

async function queryWithMemUsage(instanceId, databaseId, projectId) {
  // Imports the Google Cloud client library
  const {Spanner} = require('@google-cloud/spanner');

  // Creates a client
  const spanner = new Spanner({
    projectId: projectId,
  });

  // Gets a reference to a Cloud Spanner instance and database
  const instance = spanner.instance(instanceId);
  const database = instance.database(databaseId);

  const query = {
    sql: `SELECT *
          FROM TableWithAllColumnTypes
          ORDER BY ColInt64`,
    gaxOptions: {pageSize: 1000},
  };

  let count = 0;
  database
    .runStream(query)
    .on('data', row => {
      count++;
      if (count % 100 === 0) {
        console.log(`Current row: ${row.toJSON({wrapNumbers: true})}`);
        console.log(`Processed ${count} rows so far`);
        const used = process.memoryUsage().heapUsed / 1024 / 1024;
        console.log(`Current mem usage: ${Math.round(used * 100) / 100} MB`);
      }
    })
    .on('error', console.log)
    .on('end', () => {
      console.log(`Finished processing ${count} rows`);
      database.close();
    });
}

When running that script, the memory usage printed for every 100 rows varies between roughly 30 MB and 80 MB. The memory usage increases steadily to approximately 80 MB, at which point a garbage collection reduces it back to 30 MB.

Example log from a run of the script using a table which contains huge rows:

Current row: [object Object]
Processed 100 rows so far
Current mem usage: 32.75 MB
Current row: [object Object]
Processed 200 rows so far
Current mem usage: 38.73 MB
Current row: [object Object]
Processed 300 rows so far
Current mem usage: 48.99 MB
Current row: [object Object]
Processed 400 rows so far
Current mem usage: 34.41 MB
Current row: [object Object]
Processed 500 rows so far
Current mem usage: 32.81 MB
Current row: [object Object]
Processed 600 rows so far
Current mem usage: 31.85 MB
Current row: [object Object]
Processed 700 rows so far
Current mem usage: 37.9 MB
Current row: [object Object]
Processed 800 rows so far
Current mem usage: 39.96 MB
Current row: [object Object]
Processed 900 rows so far
Current mem usage: 54.63 MB
Current row: [object Object]
Processed 1000 rows so far
Current mem usage: 52.64 MB
Current row: [object Object]
Processed 1100 rows so far
Current mem usage: 62.48 MB
Current row: [object Object]
Processed 1200 rows so far
Current mem usage: 68.3 MB
Current row: [object Object]
Processed 1300 rows so far
Current mem usage: 79.03 MB
Current row: [object Object]
Processed 1400 rows so far
Current mem usage: 34.41 MB
Current row: [object Object]
Processed 1500 rows so far
Current mem usage: 39.45 MB
Current row: [object Object]
Processed 1600 rows so far
Current mem usage: 39.81 MB
Current row: [object Object]
Processed 1700 rows so far
Current mem usage: 49.65 MB
Current row: [object Object]
Processed 1800 rows so far
Current mem usage: 55.1 MB
Current row: [object Object]
Processed 1900 rows so far
Current mem usage: 65.69 MB
Current row: [object Object]
Processed 2000 rows so far
Current mem usage: 72.19 MB
Current row: [object Object]
Processed 2100 rows so far
Current mem usage: 72.7 MB
Current row: [object Object]
Processed 2200 rows so far
Current mem usage: 27.29 MB
Current row: [object Object]
Processed 2300 rows so far
Current mem usage: 33.06 MB
Current row: [object Object]
Processed 2400 rows so far
Current mem usage: 43.47 MB
Current row: [object Object]
Processed 2500 rows so far
Current mem usage: 48.39 MB
Current row: [object Object]
Processed 2600 rows so far
Current mem usage: 50.01 MB
Current row: [object Object]
Processed 2700 rows so far
Current mem usage: 59.59 MB
Current row: [object Object]
Processed 2800 rows so far
Current mem usage: 69.78 MB
Current row: [object Object]
Processed 2900 rows so far
Current mem usage: 75.44 MB
...

@hadson19
Author

hadson19 commented May 4, 2020

@olavloite
To measure memory usage I've used a mechanism similar to this:

 setInterval(() => {
   const used = process.memoryUsage().heapUsed / 1024 / 1024;
   console.log(`Current mem usage: ${Math.round(used * 100) / 100} MB`);
 }, 100);
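
(Side note, not from the thread: heapUsed only covers the V8 heap; Buffer-backed allocations, e.g. from the network layer, are reported under rss and external instead. A variant of the snippet above that logs those as well, as a rough sketch:)

setInterval(() => {
  const { rss, heapUsed, external } = process.memoryUsage();
  const mb = n => Math.round((n / 1024 / 1024) * 100) / 100;
  // rss is the total resident set size of the process; external covers
  // memory used by C++ objects and Buffers outside the V8 heap.
  console.log(`rss: ${mb(rss)} MB, heapUsed: ${mb(heapUsed)} MB, external: ${mb(external)} MB`);
}, 100);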

Can you try to write the result to a file?
It seems that backpressure in nodejs-spanner streams doesn't work.

Can you also check the pause method?
Consuming the data with a 'data' handler doesn't prove that the stream handles backpressure correctly.
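
(As an illustration, not from the thread: one way to exercise backpressure is to pipe the result stream into a deliberately slow Writable instead of attaching a 'data' handler, which switches the stream into flowing mode. The sketch below reuses spannerDatabase and the query from the reproduction script above; the highWaterMark and delay are arbitrary.)

const { Writable, pipeline } = require('stream');

const slowWriter = new Writable({
  objectMode: true,
  highWaterMark: 16,
  write(row, _enc, callback) {
    // Simulate a slow consumer: if backpressure is honored upstream,
    // heap usage should stay flat while rows trickle through here.
    setTimeout(callback, 50);
  },
});

pipeline(
  spannerDatabase.runStream({ sql: 'select * from user', json: true }),
  slowWriter,
  err => {
    if (err) console.error(err);
    const used = process.memoryUsage().heapUsed / 1024 / 1024;
    console.log(`Heap after run: ${Math.round(used * 100) / 100} MB`);
  }
);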

@olavloite
Contributor

@hadson19

I've changed my test case to the following:

async function queryWithMemUsage(instanceId, databaseId, projectId) {
  // Imports the Google Cloud client library
  const {Spanner} = require('@google-cloud/spanner');
  const fs = require('fs');
  const stream = require('stream');
  // eslint-disable-next-line node/no-extraneous-require
  const through = require('through');

  // Creates a client
  const spanner = new Spanner({
    projectId: projectId,
  });

  // Gets a reference to a Cloud Spanner instance and database
  const instance = spanner.instance(instanceId);
  const database = instance.database(databaseId);

  const query = {
    sql: `SELECT *
          FROM TableWithAllColumnTypes
          ORDER BY ColInt64`,
    gaxOptions: {pageSize: 1000},
  };

  let count = 0;
  const fileStream = fs.createWriteStream('/home/loite/rs.txt');
  const rs = database
    .runStream(query)
    .on('data', () => {
      count++;
      if (count % 100 === 0) {
        console.log(`Processed ${count} rows so far`);
        const used = process.memoryUsage().heapUsed / 1024 / 1024;
        console.log(`Current mem usage: ${Math.round(used * 100) / 100} MB`);
      }
    })
    .on('error', console.log)
    .on('end', () => {
      console.log('Finished writing file');
      database.close();
    });
  // eslint-disable-next-line node/no-unsupported-features/node-builtins
  await stream.pipeline(
    rs,
    through(function (data) {
      return this.queue(
        `${JSON.stringify(data.toJSON({wrapNumbers: true}))}\n`
      );
    }),
    fileStream
  );
}

This generates a result file of 2.1 GB. The memory usage is even lower than in the initial test; it now varies between approximately 30 MB and 55 MB.

Processed 100 rows so far
Current mem usage: 33.09 MB
Processed 200 rows so far
Current mem usage: 41.48 MB
Processed 300 rows so far
Current mem usage: 30.93 MB
Processed 400 rows so far
Current mem usage: 43.89 MB
Processed 500 rows so far
Current mem usage: 48.15 MB
Processed 600 rows so far
Current mem usage: 40.99 MB
Processed 700 rows so far
Current mem usage: 42.5 MB
Processed 800 rows so far
Current mem usage: 34.73 MB
Processed 900 rows so far
Current mem usage: 49.04 MB
Processed 1000 rows so far
Current mem usage: 35.42 MB
Processed 1100 rows so far
Current mem usage: 37.07 MB
Processed 1200 rows so far
Current mem usage: 26.57 MB
Processed 1300 rows so far
Current mem usage: 38.64 MB
Processed 1400 rows so far
Current mem usage: 47.67 MB
Processed 1500 rows so far
Current mem usage: 25.23 MB
Processed 1600 rows so far
Current mem usage: 36.7 MB
Processed 1700 rows so far
Current mem usage: 30.94 MB
Processed 1800 rows so far
Current mem usage: 42.25 MB
Processed 1900 rows so far
Current mem usage: 54.32 MB
Processed 2000 rows so far
Current mem usage: 38.51 MB
Processed 2100 rows so far
Current mem usage: 49.96 MB
Processed 2200 rows so far
Current mem usage: 32.65 MB
Processed 2300 rows so far
Current mem usage: 48.11 MB
Processed 2400 rows so far
Current mem usage: 32.9 MB
Processed 2500 rows so far
Current mem usage: 46.36 MB
Processed 2600 rows so far
Current mem usage: 43.44 MB
Processed 2700 rows so far
Current mem usage: 37.55 MB
Processed 2800 rows so far
Current mem usage: 49.52 MB
Processed 2900 rows so far
Current mem usage: 35.06 MB
Processed 3000 rows so far
Current mem usage: 44.52 MB
Processed 3100 rows so far
Current mem usage: 43.16 MB
Processed 3200 rows so far
Current mem usage: 41.11 MB
Processed 3300 rows so far
Current mem usage: 26.32 MB
Processed 3400 rows so far
Current mem usage: 37.54 MB
Processed 3500 rows so far
.....
Processed 49000 rows so far
Current mem usage: 42.61 MB
Processed 49100 rows so far
Current mem usage: 35.88 MB
Processed 49200 rows so far
Current mem usage: 47.36 MB
Processed 49300 rows so far
Current mem usage: 46.96 MB
Processed 49400 rows so far
Current mem usage: 31.99 MB
Processed 49500 rows so far
Current mem usage: 42.15 MB
Processed 49600 rows so far
Current mem usage: 33.06 MB
Processed 49700 rows so far
Current mem usage: 45.3 MB
Processed 49800 rows so far
Current mem usage: 31.49 MB
Processed 49900 rows so far
Current mem usage: 44.7 MB
Processed 50000 rows so far
Current mem usage: 48.18 MB
Finished writing file

[Screenshot from 2020-05-04 13-10-53]

@hadson19
Author

hadson19 commented May 4, 2020

Hmm... it seems that you have a fast SSD (the write latency is low).

Can you add a delaying Transform (to trigger backpressure)?

For example:

// Drop-in modification of the test script above: replace the pipeline call
// with the version below. Transform comes from the built-in stream module.
const { Transform } = require('stream');

// Delays every chunk to simulate a slow downstream consumer and trigger
// backpressure.
class PostponeTransform extends Transform {
  _transform(chunk, enc, next) {
    setTimeout(() => {
      this.push(chunk, enc);
      next();
    }, 100);
  }
}

await stream.pipeline(
  rs,
  new PostponeTransform({
    highWaterMark: 100,
    objectMode: true,
  }),
  through(function (data) {
    return this.queue(
      `${JSON.stringify(data.toJSON({wrapNumbers: true}))}\n`
    );
  }),
  fileStream
);

Because there is some latency to the storage (which emits a pause), and this pause doesn't have any effect on the nodejs-spanner library.

Thanks

@olavloite
Contributor

@hadson19 Yep, that seems to trigger the problem for me as well.

@stephenplusplus
Contributor

When you see the increased memory usage, is it beyond what the system is capable of handling? I have noticed while researching stream memory consumption issues in the past that Node's internals are in control, and while the usage gets high, it doesn't cause a crash, for example.

@olavloite
Contributor

When you see the increased memory usage, is it beyond what the system is capable of handling? I have noticed while researching stream memory consumption issues in the past that Node's internals are in control, and while the usage gets high, it doesn't cause a crash, for example.

I've been able to make it go out of memory on my local machine while testing this.

@olavloite olavloite added type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. and removed type: question Request for information or clarification. Not an issue. labels May 5, 2020
@yoshi-automation yoshi-automation added the triage me I really want to be triaged. label May 5, 2020
@olavloite olavloite added priority: p2 Moderately-important priority. Fix may not be included in next release. and removed triage me I really want to be triaged. labels May 5, 2020
@olavloite
Contributor

I've been able to make it go out of memory on my local machine while testing this.

I should note that this only happens when I try to write my 2GB result set to a file and deliberately add the PostponeTransform to the pipeline. Without the PostponeTransform or with a smaller result set, it does not go out of memory.

I have a solution locally that will fix this now, but I want to do some additional testing first.

@hadson19
Author

hadson19 commented May 5, 2020

The PostponeTransform is a primitive simulation of network latency (streaming the records to an external destination). The main idea is that nodejs-spanner should handle Node.js backpressure. Please correct me if I'm wrong.

olavloite added a commit to olavloite/nodejs-spanner that referenced this issue May 5, 2020
The request stream should be paused if the downstream is indicating
that it cannot handle any more data at the moment. The request stream
should be resumed once the downstream does accept data again. This
reduces memory consumption and potentially out of memory errors when
a result stream is piped into a slow writer.

Fixes googleapis#934
@olavloite
Contributor

The PostponeTransform is a primitive simulation of network latency (streaming the records to an external destination). The main idea is that nodejs-spanner should handle Node.js backpressure. Please correct me if I'm wrong.

Well, both should handle backpressure correctly. If you were to add some custom transformer or stream to your pipeline that itself does not handle backpressure correctly, you could still run into problems.

In this case it seems that one of the internal streams used by nodejs-spanner was not handling it well in all cases. That does not necessarily mean that the proposed fix will change the actual memory consumption in your specific case, but it should at least prevent any unnecessary build up of buffered data internally. See #936 (comment) for more information on a number of tests that I ran for this.
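
(For context, the generic Node.js pause/resume-on-backpressure pattern that the fix describes looks roughly like the sketch below. This is an illustration only, not the actual nodejs-spanner implementation.)

function forwardWithBackpressure(source, destination) {
  source.on('data', chunk => {
    // write() returns false when the destination's internal buffer is full.
    if (!destination.write(chunk)) {
      source.pause();
      // Resume reading only once the destination has drained its buffer.
      destination.once('drain', () => source.resume());
    }
  });
  source.on('end', () => destination.end());
}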

@skuruppu
Contributor

skuruppu commented May 7, 2020

@hadson19 would you be willing to try #936 to see if it fixes your issue? Note that the Node.js client library is going through a major version update so using what's in the master branch may be tricky. You may also just be able to patch the change locally on the version you're using.

@hadson19
Author

hadson19 commented May 7, 2020

@skuruppu @olavloite
I've used that pull request and I'm still seeing a lot of memory consumption, more than 300 MB.
Thanks.

@olavloite
Contributor

@hadson19 What is the size of the result set that you are streaming, in terms of number of columns and rows, and what is the size of the data in the columns (or, maybe easier: the size of the total result set)?

@hadson19
Author

hadson19 commented May 7, 2020

@olavloite
The table has about 20 columns.
The total row size (stringified) is about 510 characters (UTF-8, simple characters).
My test table has about 200,000 rows.
I've used the same PostponeTransform to check this stream:

// Reuses stream, pipeline, spannerDatabase, fs and filePath from the
// reproduction script in the issue description.
class PostponeTransform extends stream.Transform {
  _transform(chunk, enc, next) {
    setTimeout(() => {
      this.push(JSON.stringify(chunk), enc);
      next();
    }, 80);
  }
}

await pipeline(
  spannerDatabase.runStream({
    sql: 'select * from user',
    json: true,
    gaxOptions: {
      pageSize: 10,
    },
  }),
  new PostponeTransform({
    highWaterMark: 10,
    objectMode: true,
  }),
  fs.createWriteStream(filePath)
);

and the library (nodejs-spanner) consumes a lot of memory.
I've noticed that it doesn't go above ~300 MB.

@hadson19
Author

@olavloite
Unfortunately, after inserting millions of rows into Spanner and running the query above in Node.js, the memory used is more than 900 MB.

@olavloite
Contributor

@hadson19

Unfortunately, after inserting millions of rows into Spanner and running the query above in Node.js, the memory used is more than 900 MB.

What happens if you run the same query on the same table without this change?

I agree that 900 MB is a lot, and I'll look into that as well, but for me this change at least reduced the total memory consumption for these kinds of very large result sets to an amount that could be handled. In other words: it did not run out of memory.

@hadson19
Author

hadson19 commented May 25, 2020

@olavloite
The behavior is a little bit different with and without your pull request. At the beginning, memory increases to nearly 400 MB and waits (until it is consumed by the transformer). But after the transform stream resumes it, the memory increases to nearly 900 MB.

And after nearly one minute:
used 1395 MB
FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
1: 00007FF7032D879A v8::internal::GCIdleTimeHandler::GCIdleTimeHandler+4506
2: 00007FF7032B3206 node::MakeCallback+4534
3: 00007FF7032B3B80 node_module_register+2032
4: 00007FF7035D1AFE v8::internal::FatalProcessOutOfMemory+846
5: 00007FF7035D1A2F v8::internal::FatalProcessOutOfMemory+639
6: 00007FF7037B8224 v8::internal::Heap::MaxHeapGrowingFactor+9620
7: 00007FF7037AF206 v8::internal::ScavengeJob::operator=+24550
8: 00007FF7037AD85C v8::internal::ScavengeJob::operator=+17980
9: 00007FF7037ACD25 v8::internal::ScavengeJob::operator=+15109
10: 00007FF7037B6684 v8::internal::Heap::MaxHeapGrowingFactor+2548
11: 00007FF7038E06F8 v8::internal::Factory::AllocateRawArray+56
12: 00007FF7038E1072 v8::internal::Factory::NewFixedArrayWithFiller+66
13: 00007FF70390B2B6 v8::internal::Factory::NewCallHandlerInfo+113510
14: 00007FF703C5D680 v8::internal::compiler::JSTypedLowering::simplified+29264

Thanks

olavloite added a commit that referenced this issue Jun 3, 2020
* fix: pause request stream on backpressure

The request stream should be paused if the downstream is indicating
that it cannot handle any more data at the moment. The request stream
should be resumed once the downstream does accept data again. This
reduces memory consumption and potentially out of memory errors when
a result stream is piped into a slow writer.

Fixes #934

* fix: do not retry stream indefinitely

PartialResultSetStream should stop retrying to push data into the
stream after a configurable number of retries have failed.

* fix: process review comments

* fix: remove unused code

* tests: add test for pause/resume

* fix: return after giving up retrying + add test
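
(For illustration only: the "do not retry stream indefinitely" change above describes capping retries at a configurable limit. A rough, hypothetical sketch of that general pattern follows; the names and default limit are made up and are not taken from the nodejs-spanner code.)

// Hypothetical sketch: retry an operation a bounded number of times and
// give up with an error afterwards, instead of looping forever.
const DEFAULT_MAX_RETRIES = 5;

async function retryBounded(operation, maxRetries = DEFAULT_MAX_RETRIES) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      if (attempt >= maxRetries) {
        // Stop retrying and surface the error to the caller.
        throw err;
      }
      // Simple exponential backoff before the next attempt.
      await new Promise(resolve => setTimeout(resolve, 2 ** attempt * 100));
    }
  }
}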