Add stream example #34
Conversation
Refs: nodejs/node#33240

I've updated the example to disable the transfer list in the custom duplex by default to avoid the Abort.
The code LGTM but I’m not really sure about this example as it exemplifies several things that are not best practices, including using Workers for file I/O and compression, both of which are already asynchronous operations in Node.js.
If you want a more real-world example, maybe a simple text-processing Worker (e.g. applying a complex regexp to a stream – even just filtering out valid JS identifiers is quite heavy if you want to do it correctly) would be better?
```js
_read () {
  // Do nothing here. A more complete example would
  // implement proper read/pause behavior.
}
```
Btw, I’ve thought about porting the code from our internal Worker stdio implementation to npm. This is definitely giving more motivation for that :)
Likely would be good I think. I'm considering the possibility of Piscina providing a more complete version of `PortDuplex` out of the box.
@jasnell The thing is, in the end streaming use cases for Workers are even rarer than use cases for Workers in general. I don’t think we should encourage this pattern if users don’t really need it.
That's absolutely fair. The key reason I'm thinking through this is to explore the various good and bad ways of using the workers and what the various caveats are.
Yep, this is actually a lead-in to some React server-side rendering tests I'm looking at, but I wanted to make sure the basic idea for the `PortDuplex` would be sound.

(To be clear, I'm not in a rush to land this particular example; we can keep tweaking it until it's demonstrating the right thing.)

@jasnell I'll try to port over the stream code from core, and I'll try to look into performance improvements there – theoretically, the same …
`examples/stream/port_duplex.js` (outdated)
```js
if (typeof chunk === 'string') {
  chunk = Buffer.from(chunk, encoding);
}
const transferList = this.#transfer ? [chunk.buffer] : undefined;
```
If the buffer was allocated with `allocUnsafe`, this will cause problems due to pooling, no?
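The pooling hazard being raised here is easy to demonstrate: small `allocUnsafe()` Buffers are views onto a shared pool, so `chunk.buffer` is a much larger `ArrayBuffer` that other Buffers may also be viewing, and transferring it would detach their memory too. A quick illustration:

```js
// Small allocUnsafe() Buffers are carved out of a shared pool, so the
// underlying ArrayBuffer is much larger than the Buffer itself.
const a = Buffer.allocUnsafe(16);
console.log(a.length);            // 16
console.log(a.buffer.byteLength); // Buffer.poolSize (8192 by default)
// Transferring a.buffer in a transferList would therefore detach
// memory that other, unrelated Buffers may still be viewing.
```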
Well, there are many ways this is problematic. I'll be updating this such that we always create a new `Uint8Array` copy of the `Buffer` and then transfer that new copy. That causes the fewest headaches overall.
`examples/stream/port_duplex.js` (outdated)
```js
// transferList. There are several reasons:
// a) Buffer instances are most often created off a pool
//    and share the same underlying common ArrayBuffer,
//    transferring those can break Node.js in many ways.
```
See the comment in the Node.js thread; this is true, but not necessarily a concern in the future. The second concern is definitely there, though :)
I think this example is complicated enough that it should either live in its own module or be a part of this one.

In that case I would recommend the name https://www.npmjs.com/package/scivolo (water slide).
`examples/stream/port_duplex.js` (outdated)
```js
// TODO(@jasnell): A more complete example would
// handle this error more appropriately.
this.#port.close();
console.error(err);
```
you should call the callback here.
`examples/stream/port_duplex.js` (outdated)
```js
_write (chunk, encoding, callback) {
  if (typeof chunk === 'string') {
    chunk = Buffer.from(chunk, encoding);
  }
```
This will always be a Buffer; you are not allowing the encoding to be changed.
```js
  // will cause unexpected and undefined behavior that
  // will likely crash the Node.js process.
  this.#port.postMessage(chunk);
  callback();
```
I have a feeling there should be some sort of backpressure built in.
Absolutely it should
I agree. It should definitely live in its own module, and we should be very prescriptive about how and why to use it. @addaleax, I know you were looking at porting the code we use in core to wrap stdio for workers; let's bundle that into the new project. I'll get that repo started by later tonight.
Converted this PR to a draft so it doesn't get landed. Will be iterating on this further.

This is going to be revisited later.
@addaleax @mcollina ... here's a super simple example of how streams can be supported.

@addaleax ... interestingly, this example hits an Abort inside Node.js when we replace `createBrotliCompress` with `createGzip` or `createDeflate` within the worker.js... The reason for the abort, as far as I can see, is the fact that the `PortDuplex` transfers ownership of the buffer chunk over to the main thread using a transferList, which highlights a significant grey area with the Streams API... when using a Writable and passing a `Buffer`, who takes ownership responsibility for that Buffer? The caller or the receiver? I don't think we have a clear answer for that, but we definitely need a fix that does not trigger an Abort.