Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

src: introduce PullAll method to speed up blob.text/arrayBuffer #49873

Closed
wants to merge 168 commits into from

Conversation

debadree25
Copy link
Member

Made a small attempt at addressing nodejs/performance#118 essentially array buffer repeatedly calls in to the Pull method in c++ gathers all the buffers and then calls again to concatenate the buffers hence moved this entire process to the C++ side. There seems to be good performance improvment of ~3x but could not run the full comparison benchmark because it takes tooo long. My C++ skills are also pretty weak so would need some guidance 😅! Opening this PR to gather enough feedback to improve 😃

Refs: nodejs/performance#118

Thank You!

@debadree25 debadree25 requested a review from anonrig September 26, 2023 11:00
@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. needs-ci PRs that need a full CI run. labels Sep 26, 2023
src/node_blob.cc Outdated Show resolved Hide resolved
// from a non-memory resident blob part (e.g. file-backed blob).
// The error details the system error code.
const error = lazyDOMException('The blob could not be read', 'NotReadableError');
reject(error);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you pass resolve and reject methods to pullAll function, we don't even need to return anything and just handle it on C++ side. It might provide a performance boost. This would also reduce calling and mutating the callback function.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea!

src/node_blob.cc Outdated Show resolved Hide resolved
src/node_blob.cc Outdated
size_t offset = 0;
};

struct Impl {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use a little more descriptive struct names?

src/node_blob.cc Outdated Show resolved Hide resolved
src/node_blob.cc Outdated Show resolved Hide resolved
@anonrig anonrig requested a review from jasnell September 26, 2023 15:54
@debadree25
Copy link
Member Author

Okay so now the main problem how to replicate something like

reader.pull((status, buffer) => {
// ....
queueMicrotask(() => readNext());
}

in the C++ land
i tried something like this

std::shared_ptr<Impl> impl = std::make_shared<Impl>();
impl->reader = BaseObjectPtr<Blob::Reader>(reader);
impl->callback.Reset(env->isolate(), fn);
impl->env = env;
auto next = [impl_ptr = impl](int status, const DataQueue::Vec* vecs, size_t count, bob::Done doneCb) mutable {
  // ... initialising stuff and collecting the buffer
  if (status > 0) {
      impl_ptr->enqueue_cb();
  } else {
      // .. concatenate the buffers
      Local<Value> argv[2] = {Int32::New(env->isolate(), status), ArrayBuffer::New(env->isolate(), store)};
      impl_ptr->reader->MakeCallback(fn, arraysize(argv), argv);
  }
};

impl->enqueue_cb = [&]() {
    reader->inner_->Pull(next, node::bob::OPTIONS_END, nullptr, 0);
};

impl->enqueue_cb();

but the above gives a segmentation fault ☹️
i tried using unique_ptr too but that gives

note: copy constructor of '(lambda at ../../src/node_blob.cc:420:15)' is implicitly deleted because field '' has a deleted copy constructor

so any other ideas? I know wouldnt be doing a recursive call instead would call the Pull wrapped in enqueue microtask ref: nodejs/performance#118 (comment) but need to get the calling order correct first

src/node_blob.cc Outdated
@@ -291,6 +291,7 @@ Local<FunctionTemplate> Blob::Reader::GetConstructorTemplate(Environment* env) {
BaseObject::kInternalFieldCount);
tmpl->SetClassName(FIXED_ONE_BYTE_STRING(env->isolate(), "BlobReader"));
SetProtoMethod(env->isolate(), tmpl, "pull", Pull);
SetProtoMethod(env->isolate(), tmpl, "pullAll", PullAll);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also update typescript typings for blob?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updating

src/node_blob.cc Outdated
}

if (status > 0) {
impl_ptr->enqueue_cb();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay now this causes a stack overflow for large bufs we need to use microtask queue i tried something like this

env->RequestInterrupt([impl_ptr](Environment* env) {
        env->context()->GetMicrotaskQueue()->EnqueueMicrotask(
            env->isolate(),
            [](void* arg) {
              printf("Calling here?\n");
              auto impl_ptr = static_cast<std::shared_ptr<Impl>*>(arg);
              (*impl_ptr)->enqueue_cb();
            },
            impl_ptr.get());
      });

but this gives a very friendly segfault probably because i mangling the pointer somewhere ☹️ there arent much examples of usage with the microtask queue

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any ideas @anonrig ?😅

@anonrig anonrig requested review from joyeecheung and Qard September 30, 2023 02:53
@debadree25 debadree25 requested review from jasnell and removed request for jasnell October 28, 2023 17:07
@debadree25
Copy link
Member Author

@nodejs/cpp-reviewers if anyone could guide on this PR would be great help!

Thank you!

src/node_blob.cc Outdated
size_t total = 0;
for (size_t n = 0; n < count; n++) total += vecs[n].len;

std::shared_ptr<BackingStore> store = v8::ArrayBuffer::NewBackingStore(env->isolate(), total);
Copy link
Member

@lemire lemire Feb 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that NewBackingStore returns std::unique_ptr<BackingStore>, doesn't it? If I am right, and you are changing it to a shared_ptr, then I recommend you justify this choice with a comment.

src/node_blob.cc Outdated
if (status > 0) {
impl_ptr->enqueue_cb();
} else {
std::shared_ptr<BackingStore> store = ArrayBuffer::NewBackingStore(env->isolate(), impl_ptr->total_size);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same question here. If NewBackingStore returns a unique_ptr, changing the type might require an explanation.

src/node_blob.cc Outdated
std::copy(vecs[n].base, vecs[n].base + vecs[n].len, ptr);
ptr += vecs[n].len;
}
std::move(doneCb)(0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personnally do not understand why there is a std::move... If it is correct, I recommend adding a comment to explain it.

@lemire
Copy link
Member

lemire commented Feb 1, 2024

@debadree25 Compiling node with --enable-asan may help with segmentation faults and the like.

@debadree25
Copy link
Member Author

Good idea @lemire I havent had tried with asan build let me get back to you by trying with that, i may that have rewrite this again once to remember the flow i was attempting!, thank you for taking the time to review!

MoLow and others added 4 commits April 15, 2024 13:43
PR-URL: nodejs#49871
Refs: nodejs@fbe28e2
Reviewed-By: Marco Ippolito <[email protected]>
Reviewed-By: Michaël Zasso <[email protected]>
Reviewed-By: Chemi Atlow <[email protected]>
PR-URL: nodejs#49692
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Robert Nagy <[email protected]>
Reviewed-By: Benjamin Gruenbaum <[email protected]>
PR-URL: nodejs#49834
Reviewed-By: Benjamin Gruenbaum <[email protected]>
Reviewed-By: Robert Nagy <[email protected]>
Reviewed-By: Luigi Pinca <[email protected]>
Reviewed-By: Yagiz Nizipli <[email protected]>
PR-URL: nodejs#49878
Fixes: nodejs#49853
Reviewed-By: Benjamin Gruenbaum <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: Luigi Pinca <[email protected]>
Reviewed-By: Geoffrey Booth <[email protected]>
Reviewed-By: Chemi Atlow <[email protected]>
targos and others added 27 commits April 15, 2024 13:43
Original commit message:

    Reland^2 "[iterator-helpers] Unship due to incompat"

    This is a reland of commit bab67985346dbc0d566391ad561537b4554455b4

    Change since reland: A second breakage reported in chromium:1480783

    Original change's description:
    > Reland "[iterator-helpers] Unship due to incompat"
    >
    > This is a reland of commit 1a22cf9896d682a9dfca589f92ed97c7f875b8a2
    >
    > Change since revert: I mistakenly thought part of the finch
    > kill-switch playbook is to keep it enabled on ToT. It's actually
    > the opposite.
    >
    > Original change's description:
    > > [iterator-helpers] Unship due to incompat
    > >
    > > Bug: chromium:1474613, v8:13558
    > > Change-Id: Iccba26e5cd5dc1787172c78b6e4f6889ef67fcea
    > > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/4834350
    > > Reviewed-by: Adam Klein <[email protected]>
    > > Commit-Queue: Shu-yu Guo <[email protected]>
    > > Cr-Commit-Position: refs/heads/main@{#89741}
    >
    > Bug: chromium:1474613, v8:13558
    > Change-Id: Idc421a114303f036622bff681c9fa252c9110b9d
    > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/4843761
    > Commit-Queue: Rezvan Mahdavi Hezaveh <[email protected]>
    > Auto-Submit: Shu-yu Guo <[email protected]>
    > Commit-Queue: Shu-yu Guo <[email protected]>
    > Reviewed-by: Rezvan Mahdavi Hezaveh <[email protected]>
    > Cr-Commit-Position: refs/heads/main@{#89800}

    Bug: chromium:1474613, v8:13558
    Change-Id: Ia933c874508f1ec10ea4718c6378858cd5bec8f9
    Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/4854732
    Reviewed-by: Rezvan Mahdavi Hezaveh <[email protected]>
    Auto-Submit: Shu-yu Guo <[email protected]>
    Commit-Queue: Rezvan Mahdavi Hezaveh <[email protected]>
    Cr-Commit-Position: refs/heads/main@{#89905}

Refs: v8/v8@89b3702
PR-URL: nodejs#49639
Reviewed-By: Jiawen Geng <[email protected]>
Reviewed-By: Rafael Gonzaga <[email protected]>
Reviewed-By: Antoine du Hamel <[email protected]>
Original commit message:

    fix: EmbeddedTargetOs on IBM i with Python 3.9

    For some context, Python 3.9 on IBM i returns "os400" for sys.platform
    instead of "aix". We used to build with Python 3.6 which returned "aix"
    as the platform

    When attempting to build Node.js with python 3.9 on IBM i we run into a
    build error.

    Ref: nodejs#48056
    Ref: nodejs#48056 (comment)

    I'm not quite sure where target_os is being passed down to the function ToEmbeddedTargetOs.
    It seems as though target_os is being generated from sys.platform or
    similar call from python as we started running into this issue after
    building with Python 3.9.

    This PR supersedes initial changes proposed in:
    https://chromium-review.googlesource.com/c/v8/v8/+/4259330

    This PR contains the minimal changes to successfully build Node.js (builds v8 as an internal dep)
    on IBM i with Python 3.9.

    Change-Id: I32d43197bce994a72a0d85091e91f80eeea4482d
    Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/4846413
    Commit-Queue: Jakob Linke <[email protected]>
    Reviewed-by: Michael Achenbach <[email protected]>
    Reviewed-by: Jakob Linke <[email protected]>
    Cr-Commit-Position: refs/heads/main@{#89981}

Refs: v8/v8@8ec2651
PR-URL: nodejs#49639
Reviewed-By: Jiawen Geng <[email protected]>
Reviewed-By: Rafael Gonzaga <[email protected]>
Reviewed-By: Antoine du Hamel <[email protected]>
Refs: v8/v8@d9715ad
PR-URL: nodejs#49639
Reviewed-By: Jiawen Geng <[email protected]>
Reviewed-By: Rafael Gonzaga <[email protected]>
Reviewed-By: Antoine du Hamel <[email protected]>
PR-URL: nodejs#49639
Reviewed-By: Jiawen Geng <[email protected]>
Reviewed-By: Rafael Gonzaga <[email protected]>
Reviewed-By: Antoine du Hamel <[email protected]>
PR-URL: nodejs#49639
Reviewed-By: Jiawen Geng <[email protected]>
Reviewed-By: Rafael Gonzaga <[email protected]>
Reviewed-By: Antoine du Hamel <[email protected]>
PR-URL: nodejs#49639
Reviewed-By: Jiawen Geng <[email protected]>
Reviewed-By: Rafael Gonzaga <[email protected]>
Reviewed-By: Antoine du Hamel <[email protected]>
PR-URL: nodejs#49639
Reviewed-By: Jiawen Geng <[email protected]>
Reviewed-By: Rafael Gonzaga <[email protected]>
Reviewed-By: Antoine du Hamel <[email protected]>
PR-URL: nodejs#49639
Reviewed-By: Jiawen Geng <[email protected]>
Reviewed-By: Rafael Gonzaga <[email protected]>
Reviewed-By: Antoine du Hamel <[email protected]>
PR-URL: nodejs#49639
Reviewed-By: Jiawen Geng <[email protected]>
Reviewed-By: Rafael Gonzaga <[email protected]>
Reviewed-By: Antoine du Hamel <[email protected]>
PR-URL: nodejs#49639
Reviewed-By: Jiawen Geng <[email protected]>
Reviewed-By: Rafael Gonzaga <[email protected]>
Reviewed-By: Antoine du Hamel <[email protected]>
Accept a new `step` break message.

PR-URL: nodejs#49639
Reviewed-By: Jiawen Geng <[email protected]>
Reviewed-By: Rafael Gonzaga <[email protected]>
Reviewed-By: Antoine du Hamel <[email protected]>
`--no-harmony-sharedarraybuffer` was removed from V8 but it's still
possible to disable the feature with
`--enable-sharedarraybuffer-per-context`.

PR-URL: nodejs#49639
Reviewed-By: Jiawen Geng <[email protected]>
Reviewed-By: Rafael Gonzaga <[email protected]>
Reviewed-By: Antoine du Hamel <[email protected]>
PR-URL: nodejs#49639
Reviewed-By: Jiawen Geng <[email protected]>
Reviewed-By: Rafael Gonzaga <[email protected]>
Reviewed-By: Antoine du Hamel <[email protected]>
Syntax errors from `JSON.parse` contain more information and
can now be printed on two lines if they are long.

PR-URL: nodejs#49639
Reviewed-By: Jiawen Geng <[email protected]>
Reviewed-By: Rafael Gonzaga <[email protected]>
Reviewed-By: Antoine du Hamel <[email protected]>
Refs: nodejs#50050
PR-URL: nodejs#49639
Reviewed-By: Jiawen Geng <[email protected]>
Reviewed-By: Rafael Gonzaga <[email protected]>
Reviewed-By: Antoine du Hamel <[email protected]>
Refs: nodejs#50079
PR-URL: nodejs#49639
Reviewed-By: Jiawen Geng <[email protected]>
Reviewed-By: Rafael Gonzaga <[email protected]>
Reviewed-By: Antoine du Hamel <[email protected]>
JSTransferable wrapper object is a short-lived wrapper in the scope of
the serialization or the deserialization. Make the JSTransferable
wrapper object pointer as a strongly-referenced detached BaseObjectPtr
so that a JSTransferable wrapper object and its target object will never
be garbage-collected during a ser-des process, and the wrapper object
will be immediately destroyed when the process is completed.

PR-URL: nodejs#50026
Fixes: nodejs#49852
Fixes: nodejs#49844
Reviewed-By: Yagiz Nizipli <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
Notable changes:

This release addresses some regressions that appeared in Node.js 18.18.0:

- (Windows) FS can not handle certain characters in file name
  nodejs#48673
- 18 and 20 node images give error - Text file busy (after re-build images)
  nodejs/docker-node#1968
- libuv update in 18.18.0 breaks webpack's thread-loader
  nodejs#49911

The libuv 1.45.0 and 1.46.0 updates that were released in Node.js 18.18.0
have been temporarily reverted.

PR-URL: nodejs#50066
PR-URL: nodejs#48902
Fixes: nodejs#48640
Reviewed-By: Luigi Pinca <[email protected]>
Reviewed-By: Yagiz Nizipli <[email protected]>
Reviewed-By: Antoine du Hamel <[email protected]>
PR-URL: nodejs#50091
Reviewed-By: Yagiz Nizipli <[email protected]>
Reviewed-By: Joyee Cheung <[email protected]>
Reviewed-By: Tobias Nießen <[email protected]>
PR-URL: nodejs#50088
Reviewed-By: Antoine du Hamel <[email protected]>
Reviewed-By: Trivikram Kamat <[email protected]>
This commit adds a 'flush' option to the createWriteStream()
family of functions.

Refs: nodejs#49886
PR-URL: nodejs#50093
Reviewed-By: Yagiz Nizipli <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Marco Ippolito <[email protected]>
PR-URL: nodejs#50142
Reviewed-By: Michael Dawson <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Tobias Nießen <[email protected]>
Reviewed-By: Darshan Sen <[email protected]>
Reviewed-By: Gireesh Punathil <[email protected]>
Reviewed-By: Antoine du Hamel <[email protected]>
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Yagiz Nizipli <[email protected]>
Reviewed-By: Benjamin Gruenbaum <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
Reviewed-By: Rafael Gonzaga <[email protected]>
Reviewed-By: Ruy Adorno <[email protected]>
Reviewed-By: Luigi Pinca <[email protected]>
PR-URL: nodejs#50121
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Michael Dawson <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
Reviewed-By: Marco Ippolito <[email protected]>
Reviewed-By: Tobias Nießen <[email protected]>
This test was flaky on Windows when trying to clean up the
tmp directory, probably because it relied on child processes
timing out and being killed.

This commit updates the test to check for debug output
from the test runner. This should be adequate because the
original change was effectively:

let concurrency = getOptionValue('--test-concurrency') || true;

The test runner now logs the value of the concurrency variable.

Fixes: nodejs#50101
PR-URL: nodejs#50108
Reviewed-By: Antoine du Hamel <[email protected]>
Reviewed-By: Geoffrey Booth <[email protected]>
Reviewed-By: Luigi Pinca <[email protected]>
Reviewed-By: Yagiz Nizipli <[email protected]>
@debadree25
Copy link
Member Author

Will reopen have completely messed up my branch

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. needs-ci PRs that need a full CI run.
Projects
None yet
Development

Successfully merging this pull request may close these issues.