
refactor!: rewrite streaming logic #136

Merged: 2 commits merged into googleapis:master on Jul 12, 2019
Conversation

@callmehiphop (Contributor) commented Jul 9, 2019

Fixes #134

  • Tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Jul 9, 2019
@stephenplusplus (Contributor)

I think this is completely fine; however, I would rather we pluck out the splitting and emitting of an array's members and put it into split-array-stream. Some/most of the code in this PR can remain regarding turning a method call into a stream. Then, instead of the iterative/flow-control logic being here, we can just pass an array of the right size to split-array-stream, and it does that for us.

The version of split-array-stream that can handle this exists as a prototype here: https://gist.github.com/stephenplusplus/888373161f80a45516be64e911a84db5

Assuming split-array-stream worked as it did in the above gist, would you have any objections to separating responsibility as I suggested above? You had some concerns in the discussion on this issue, although I didn't fully understand them.

@callmehiphop (Contributor, Author)

@stephenplusplus part of the reason I think we should drop split is because it makes implementing flow control difficult. Looking at your gist, it looks like the new strategy would be to recursively call either _read or _transform and push at least 1 item per call. Effectively all this would do is slow down how quickly items were pushed in, but it would still be susceptible to a memory leak because it doesn't apply back pressure.

I'm all for keeping it if we can make it work, but I'm not really sure what that is supposed to look like.
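For context, the back pressure being discussed is signaled by `push()`'s return value. A minimal sketch (hypothetical, not code from this PR) using Node's stream module:

```typescript
import {Readable} from 'stream';

// Hypothetical demo of back pressure: with a highWaterMark of 2 objects,
// the second push fills the internal buffer and push() returns false,
// which tells a well-behaved producer to stop until _read() is called again.
const source = new Readable({
  objectMode: true,
  highWaterMark: 2,
  read() {}, // no-op: we push manually for the demo
});

const hasRoomAfterFirst = source.push('a');  // buffer not yet full
const hasRoomAfterSecond = source.push('b'); // buffer is now full
```

Ignoring that `false` and continuing to push is what lets the internal buffer grow without bound, which is the memory concern above.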

@callmehiphop (Contributor, Author) commented Jul 9, 2019

@stephenplusplus ~~I think the line in question is https://gist.github.com/stephenplusplus/888373161f80a45516be64e911a84db5#file-index-js-L44

Basically that check would indicate that the buffer is full, so I assume we call it again at the end of the current loop in hopes that it made some room. On the next call at least 1 item will be pushed into the stream before it is determined if it actually has room for said item. I believe if the stream were to be paused it would continuously push items in.

If only the _read logic was used that would be ok.~~

I think I misread; it looks like it could work. Can you provide an example of how you would integrate with your prototype?

@stephenplusplus (Contributor)

Yes sir, using this PR as a commit, or just pseudo?

@stephenplusplus (Contributor)

Just in case you didn't see this part, I showed two ways you can use SAS: either by passing a function that resolves with an array (https://gist.github.com/stephenplusplus/888373161f80a45516be64e911a84db5#file-index-js-L126) or by creating a stream that emits arrays (https://gist.github.com/stephenplusplus/888373161f80a45516be64e911a84db5#file-index-js-L100).

@callmehiphop (Contributor, Author)

@stephenplusplus I'm a little fuzzy on it, but if you want to take a crack at updating split and adding a commit to this PR I'd be ok with it.

codecov bot commented Jul 11, 2019

Codecov Report

Merging #136 into master will not change coverage.
The diff coverage is 100%.


@@          Coverage Diff          @@
##           master   #136   +/-   ##
=====================================
  Coverage     100%   100%           
=====================================
  Files           1      2    +1     
  Lines          98     97    -1     
  Branches       19     20    +1     
=====================================
- Hits           98     97    -1
Impacted Files Coverage Δ
src/resource-stream.ts 100% <100%> (ø)
src/index.ts 100% <100%> (ø) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 65ff625...000c284. Read the comment docs.

import {Transform} from 'stream';
import {ParsedArguments} from './';

interface ResourceEvents<T> {
stephenplusplus (Contributor):

What is the reason to have to define this normal event-emitter/stream stuff?

callmehiphop (Contributor, Author):

Otherwise the types for 'data' will just be any and users would have to explicitly set a type instead of being able to infer one.
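To illustrate the point (a hypothetical sketch, not the PR's actual code): narrowing the `on` overload for 'data' is what lets consumers get an inferred payload type instead of `any`.

```typescript
import {PassThrough} from 'stream';

// Hypothetical illustration: overriding `on` gives the 'data' event a
// payload of T, so listeners infer a concrete type rather than `any`.
interface Resource {
  id: number;
}

class TypedStream<T> extends PassThrough {
  constructor() {
    super({objectMode: true});
  }
  on(event: 'data', listener: (chunk: T) => void): this;
  on(event: string | symbol, listener: (...args: any[]) => void): this;
  on(event: string | symbol, listener: (...args: any[]) => void): this {
    return super.on(event, listener);
  }
}

const stream = new TypedStream<Resource>();
stream.on('data', resource => {
  // `resource` is inferred as Resource here, not `any`
  void resource.id;
});
stream.write({id: 1});
stream.end();
```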

parsedArguments: ParsedArguments,
originalMethod: Function
): ResourceStream<T> {
return new ResourceStream<T>(parsedArguments, originalMethod);
stephenplusplus (Contributor) commented Jul 11, 2019:

This is a rough draft attempt at an option for how this could look if split-array-stream changes go through:

runAsStream_ (parsedArguments, originalMethod) {
  // assumes the prototype SplitArrayStream API from the gist above
  const numMaxApiCalls = parsedArguments.maxApiCalls === -1 ? Infinity : parsedArguments.maxApiCalls;
  let numRequestsMade = 0;
  let numResultsToSend = parsedArguments.maxResults === -1 ? Infinity : parsedArguments.maxResults;

  let query = parsedArguments.query;

  const splitStream = new SplitArrayStream(getPage);

  function getPage() {
    return new Promise((resolve, reject) => {
      originalMethod(query, (err, results, nextQuery) => {
        if (err) {
          splitStream.destroy(err);
          return;
        }

        if (numResultsToSend !== Infinity) {
          results = results.splice(0, numResultsToSend);
          numResultsToSend -= results.length;
        }

        const didMakeMaxCalls = ++numRequestsMade >= numMaxApiCalls;

        // a trailing null tells the stream to end once either limit is hit
        if (didMakeMaxCalls || numResultsToSend === 0) {
          results.push(null);
        } else {
          query = nextQuery;
        }

        resolve(results);
      });
    });
  }

  return splitStream;
}
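To make the contract concrete, here is a self-contained approximation of the prototype's idea (a sketch, not the real split-array-stream from the gist): `_read()` asks `getPage()` for an array, pushes its members, and a trailing `null` ends the stream, so items are only pushed when the consumer asks.

```typescript
import {Readable} from 'stream';

// A self-contained approximation (NOT the real split-array-stream) of the
// prototype's contract: getPage() resolves with an array of items and a
// trailing null signals end-of-stream. Pushing only happens on demand, so
// back pressure is respected.
class MiniSplitArrayStream<T> extends Readable {
  private fetching = false;
  private ended = false;
  constructor(private getPage: () => Promise<Array<T | null>>) {
    super({objectMode: true});
  }
  _read() {
    if (this.fetching || this.ended) return; // one page in flight at a time
    this.fetching = true;
    this.getPage().then(
      items => {
        this.fetching = false;
        if (this.ended) return; // a late page raced with end-of-stream
        let more = true;
        for (const item of items) {
          if (item === null) {
            this.ended = true;
            this.push(null); // end the stream
            return;
          }
          more = this.push(item); // false would mean the buffer is full
        }
        if (more && !this.ended) setImmediate(() => this._read());
      },
      err => this.destroy(err)
    );
  }
}
```

With a shape like this, the `runAsStream_` draft above only has to decide what each page resolves with; the stream mechanics live in one place.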

callmehiphop (Contributor, Author):

@stephenplusplus do you have a timeline of when this would be finished?

stephenplusplus (Contributor):

I can probably wrap up stephenplusplus/split-array-stream#4 tomorrow.

@callmehiphop callmehiphop marked this pull request as ready for review July 12, 2019 17:26
@callmehiphop callmehiphop changed the title refactor: rewrite streaming logic refactor!: rewrite streaming logic Jul 12, 2019
@callmehiphop callmehiphop merged commit 641d82d into googleapis:master Jul 12, 2019
}

if (more && !this._ended) {
setImmediate(() => this._read());
stephenplusplus (Contributor):
Do we need to call _read() again ourselves? As I understand it, the internals will ask for more data on their own (calling _read()), so the Readable implementor doesn't have to string a big series of reads together manually.

@callmehiphop (Contributor, Author) commented Jul 15, 2019:

As I understand it, Node will call _read when the user first requests data, and subsequent reads only happen after the buffer was full and it has since become safe to continue reading. In this case, because the buffer is not full, if we don't call it again ourselves we'll end up in a state where the stream hangs forever, I believe.
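The pattern under discussion can be reproduced in a small Readable. This is a hypothetical sketch (not the PR's actual ResourceStream) of re-scheduling _read() manually while push() reports room, guarded so nothing is pushed after the stream ends:

```typescript
import {Readable} from 'stream';

// Hypothetical sketch of the PR's pattern: each _read() fetches one "page"
// asynchronously, pushes its items, and re-schedules itself while push()
// says the buffer still has room.
class PagedStream extends Readable {
  private page = 0;
  private ended = false;
  constructor(private pages: number[][]) {
    super({objectMode: true});
  }
  _read() {
    setImmediate(() => {
      if (this.page >= this.pages.length) {
        if (!this.ended) {
          this.ended = true;
          this.push(null); // no more pages: end the stream
        }
        return;
      }
      let more = true;
      for (const item of this.pages[this.page]) {
        more = this.push(item); // false would mean the buffer is full
      }
      this.page++;
      if (more && !this.ended) {
        // mirror the diff above: schedule the next read ourselves
        setImmediate(() => this._read());
      }
    });
  }
}
```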

Successfully merging this pull request may close these issues.

split-array-stream is slow
4 participants