Implements kj::HashSet support for jsg. #3550
base: main
What's the intended use case here?
IMO HashSet is not the right C++ representation to correspond to a JS Set. The problem is, in order to convert between JS Set and HashSet, you necessarily have to enumerate the whole set, which is O(n), which probably defeats the purpose of using a Set vs. just using an array or something.
Probably all use cases fall into one of two categories:
- The C++ consumer is going to iterate the whole set. In this case, building the hash index is a waste of time. It would be better to simply deliver an array.
- The C++ consumer only wants to look up certain values in the set. In this case, what is really wanted is some sort of a wrapper that does not iterate the whole set upfront, but instead looks up specific values on-demand.
Or conversely for the C++ -> JS direction:
- The C++ producer is going to construct a one-off set to pass to JS. There's again no reason to construct a hash index, the C++ representation should be an array or vector.
- The C++ producer is passing some long-lived set to JS. In this case, JS should really get a view into the set that doesn't convert the contents until requested, otherwise it's wasteful to re-convert the whole set every time it is requested.
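The consumer-side split above can be sketched with standard-library stand-ins. This is only an illustration under stated assumptions: `ForeignSet`, `totalLength`, and `SetView` are hypothetical names, `std::unordered_set` stands in for the JS Set, and nothing here is the jsg or V8 API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a JS Set owned by the other side of the binding.
struct ForeignSet {
  std::unordered_set<std::string> items;
  mutable std::size_t elementsConverted = 0;  // how many elements were copied out

  // Eager path: converts every element, O(n).
  std::vector<std::string> toArray() const {
    std::vector<std::string> out;
    out.reserve(items.size());
    for (const auto& item : items) {
      out.push_back(item);
      ++elementsConverted;
    }
    return out;
  }

  // Lazy path: one membership query, converts nothing.
  bool has(const std::string& key) const { return items.count(key) != 0; }
};

// Category 1: the consumer iterates everything, so a plain array is enough
// and building a second hash index would be wasted work.
std::size_t totalLength(const ForeignSet& set) {
  std::size_t total = 0;
  for (const auto& item : set.toArray()) total += item.size();
  return total;
}

// Category 2: the consumer only does lookups, so it holds a view that
// forwards each query on demand instead of enumerating the set upfront.
class SetView {
public:
  explicit SetView(const ForeignSet& set) : set(set) {}
  bool contains(const std::string& key) const { return set.has(key); }

private:
  const ForeignSet& set;
};
```

In this sketch the lookup-only view never increments `elementsConverted`, which is the point of the second category: no O(n) pass is paid unless the consumer actually asks for every element.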
The use case is #3538 (comment) where a C++ function is producing a HashSet and the JS side also wants a Set. My initial solution was to convert the set to an array and return that from the C++ side, then convert to a Set in JS. Dan suggested that jsg should support this so I implemented it.
Hmm, I see, this does appear to be an unusual use case where you:
So converting a HashSet to a JS Set is actually what you want here, and there's no wasted index-building. In the future we might want to add an alternative representation like:
Kind of like how But I guess we could also support |
I think we still do need an answer for the duplicate question when converting a Set to a HashSet. Possible answers are:
It does seem rather unfortunate that what this conversion pattern ends up with is:
It would seem to be more efficient to skip the creation of the kj::HashSet entirely if possible and have the C++ code directly create the v8::Set (wrapped as a Is the
Yeah, the function is also called from C++ in edgeworker to pre-load the correct Python packages.
5cb3816 to d91b4d8
Okay, I fixed the duplicate issue and added a test for it.
I'm not entirely sold on this just yet. If the hash sets are small and the objects in the set are simple enough then it's likely to be OK, but there are likely a number of performance issues associated with this model.
Consider, for instance, if I have a kj::HashSet<kj::String>. Let's say those strings end up being fairly non-trivial in size. To get that hash set out to JavaScript would mean multiple allocations and copies for each string in addition to iterating over the entire set.
kj::HashSet<kj::String> set;
set.insert(kj::str("here is a string")); // allocation
set.insert(kj::str("here is another string")); // allocation
Converting this to a JS Set means we perform an additional allocation in the v8 heap and do a UTF-8 write to copy the strings, in addition to v8 having to calculate the hashes of those for its own internal hash index. That just feels super wasteful at scale when the C++ code could just start with the JS Set in the first place.
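To make the per-element cost concrete, here is a rough sketch using standard-library stand-ins rather than kj or v8. `CountedString`, `copyCount`, and `convert` are hypothetical names, and copying into a second `std::unordered_set` is only assumed to be analogous to writing each string into the v8 heap and letting v8 hash it again.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>
#include <utility>

// Global counter of deep copies, for illustration only.
std::size_t copyCount = 0;

// A string wrapper that records every deep copy made of it.
struct CountedString {
  std::string value;

  explicit CountedString(std::string v) : value(std::move(v)) {}
  CountedString(const CountedString& other) : value(other.value) { ++copyCount; }
  CountedString(CountedString&&) noexcept = default;

  bool operator==(const CountedString& other) const { return value == other.value; }
};

struct CountedStringHash {
  std::size_t operator()(const CountedString& s) const {
    return std::hash<std::string>{}(s.value);
  }
};

using NativeSet = std::unordered_set<CountedString, CountedStringHash>;

// Stand-in for the C++ -> JS conversion: every element is visited, copied
// into the destination's own storage, and hashed again by the destination.
NativeSet convert(const NativeSet& in) {
  NativeSet out;
  out.reserve(in.size());
  for (const auto& item : in) out.insert(item);  // one copy + one rehash per element
  return out;
}
```

With two elements in the source set, `convert` performs two deep copies on top of the two allocations that built the source set, so the cost grows linearly with both the element count and the string sizes.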
Shouldn't have hit approve... meant to hit Request changes
@jasnell you are raising the same objections I did initially, but then Dominik pointed to his use case and indeed in that use case this conversion makes sense. He has a It's debatable whether that is a common enough case to build into JSG, but it does check out in this case.
Ok well, I won't block it but I'm still not convinced.
auto out = v8::Set::New(isolate);
for (const auto& item: set) {
  out = check(out->Add(context,
      static_cast<TypeWrapper*>(this)->wrap(context, creator, item).template As<v8::Value>()));
Should this actually be kj::mv(item)?
Hm, do you mean that I should pass an owned U to wrap? I don't think I can do that from a HashSet, certainly in the case of kj::String it doesn't work. Isn't passing a ref fine here?
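The reason a move is not available here can be shown with standard-library stand-ins. This is a sketch, assuming `std::move` behaves like `kj::mv` for this purpose and that iterating a const set yields const elements; `copyOut` is a hypothetical name. With a copyable type such as `std::string` the "move" silently degrades to a copy. With a move-only type it would presumably fail to compile, which matches the kj::String observation above.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <unordered_set>
#include <utility>
#include <vector>

// Tries to "move" every element out of a set that is only visible as const.
std::vector<std::string> copyOut(const std::unordered_set<std::string>& set) {
  std::vector<std::string> out;
  out.reserve(set.size());
  for (const auto& item : set) {
    // item is const, so std::move(item) is a const rvalue reference. That
    // cannot bind to a move constructor, only to the copy constructor.
    static_assert(
        std::is_same<decltype(std::move(item)), const std::string&&>::value,
        "moving from a const element yields a const rvalue");
    out.push_back(std::move(item));  // resolves to push_back(const T&): a copy
  }
  return out;
}
```

After `copyOut` returns, the source set still holds its strings unchanged, so passing a const reference to the wrapper loses nothing compared with attempting a move.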
d91b4d8 to 5d12f05
Test Plan