Skip to content
This repository has been archived by the owner on Nov 27, 2023. It is now read-only.

Maybe also have groupByMap? #3

Closed
bakkot opened this issue Jul 15, 2021 · 46 comments
Closed

Maybe also have groupByMap? #3

bakkot opened this issue Jul 15, 2021 · 46 comments

Comments

@bakkot
Copy link
Contributor

bakkot commented Jul 15, 2021

In the interest of having both the convenience of regular objects and the additional utility of Maps available, albeit not in the same object, perhaps we can have a method which returns a plain object (coalescing return values with ToPropertyKey) as well as a method which returns a Map (coalescing only 0 and -0, and otherwise by identity).

@jridgewell
Copy link
Member

My other thought is that we could have a 3rd argument similar to reduce, which acts as the init value. During the setting phase, we would check if this is a Map/WeakMap, and perform the appropriate set operations. Else, we would treat it as an object. If it's empty, we would initialize to a prototype-less object.

@ljharb
Copy link
Member

ljharb commented Jul 15, 2021

That opens up a bunch of questions - what if the init value is a Set, or a TypedArray (on the array method), or a Tuple, or a Record?

@bakkot
Copy link
Contributor Author

bakkot commented Jul 15, 2021

Interesting thought. To me I think that adds a lot of conceptual complexity for not much value, vs just having two methods.

@theScottyJam
Copy link

theScottyJam commented Jul 20, 2021

One option (similar to what @jridgewell was proposing) is to have the extra parameter expect an object with an "emplace" function (emplace for Map/WeakMap is proposed here). All update operations will happen through the provided emplace function. If this extra parameter is not provided, it'll default to special behavior for grouping into a plain object.

const getParity = i => i % 2 === 0 ? 'even': 'odd'
const mapOfResults = array.groupBy(getParity, new Map());
const weakMapOfResults = array.groupBy(getParity, new WeakMap());

Alternatively, we could create an "emplace" or "groupBy" protocol (via a new well-known symbol) that Map and WeakMap simply implement using their public emplace function, but others may implement through other means.

My preference might be to make this parameter expect an options object, to open up the possibility of customizing other aspects of the groupBy algorithm in the future.

const mapOfResults = array.groupBy(getParity, { into: new Map() });

@zloirock
Copy link
Contributor

I'm against adding a reducer argument since it makes the signature of this method more complex and less readable. Also, objects and maps / weakmaps haven't a common interface for adding new entries.

.groupByMap LGTM and, since maps are iterable and could store any keys, it could be used for converting results to anything:

const weakmap = new WeakMap(array.groupByMap(fn));

@bmeck
Copy link
Member

bmeck commented Jul 20, 2021

I actually think only 1 method would be enough at least for the initial proposal, then you can manage your identity side table rather than spec having nits about things like -0 and 0.

@theScottyJam
Copy link

I agree @bmeck - honestly, I'm not sure how many people would want a map as an output (maybe a lot - maybe my assumptions are wrong, but I would assume the majority just need an object output). People who need map support can build their own groupByMap() function, like they already have to do today. If there seems to be a lot of community need for such functionality, a follow-on proposal can be discussed to provide this functionality, either through an extra parameter or through a separate function.

@bmeck
Copy link
Member

bmeck commented Jul 20, 2021

I'd personally love a Map rather than an object, but I use Maps a lot to associate data.

@zloirock
Copy link
Contributor

zloirock commented Jul 20, 2021

@bmeck null prototype object good for object destructuring, maps for iteration with any keys.

const { foo, bar, baz } = array.groupBy(fn);
// ...
for (const [name, subarray] of array.groupByMap(fn)) { /* ... */ }

Both of those cases are useful, map can't be simply used in the first case, object in the second.

@bmeck
Copy link
Member

bmeck commented Jul 20, 2021

Object is usable in the second with Object.entries(arr.groupBy(fn))

@ljharb
Copy link
Member

ljharb commented Jul 20, 2021

The only thing that's not usable in the second case is if you want the "group by" key to be something other than a string or symbol.

Personally, i think this will be an extreme minority case, but it still seems reasonable to find a way to satisfy it.

@zloirock
Copy link
Contributor

@bmeck

with any keys

@ljharb I just agreed with your opinion written here -)

@bakkot
Copy link
Contributor Author

bakkot commented Jul 20, 2021

I want this quite frequently, personally. The most common case is when you want to bucket things which have two attributes which match - for example, you have a list of addresses and want to bucket those which have both the same city and country. If your cities and countries are represented as strings, you can come up with an awkward solution which creates a single string from both of those, but that's ugly and error-prone (if you just join with ,, what if your city or country has a , in it?), and you have to parse the values back afterwards. It would be much, much nicer to just group by a record containing both attributes.

(Also, it's pretty common to have attributes which are not strings which you want to match on. For example, even if you just care about grouping addresses by city, if your cities are represented as objects with identity you really need a Map.)

@ljharb
Copy link
Member

ljharb commented Jul 20, 2021

Without Records existing as a feature yet, would that be any more convenient than the string approach?

@bakkot
Copy link
Contributor Author

bakkot commented Jul 20, 2021

Yeah, you just have a memoizing "Pair" constructor which returns the same identity object given the same inputs, which approximates Records. In nontrivial code I will often have such a thing around already, but it's only a few lines to reproduce it as necessary:

let memo = new Map;
function makePair(left, right) {
  if (!memo.has(left)) {
    memo.set(left, new Map);
  }
  let lm = memo.get(left);
  if (!lm.has(right)) {
    lm.set(right, [left, right]);
  }
  return lm.get(right);
}

let grouped = addresses.groupByMap(a => makePair(a.city, a.country));

@RoyalIcing
Copy link

RoyalIcing commented Jul 28, 2021

What if instead groupBy returned key/value tuples, of the form that can be passed to either new Map constructor or Object.fromEntries?

@bakkot
Copy link
Contributor Author

bakkot commented Jul 28, 2021

That doesn't really work, because Object.fromEntries is last-wins, not coalescing.

That is:

[{ a: 0 }, { a: '0' }].groupBy(o => o.a); // { 0: [{ a: 0 }, { a: '0' }] }

Object.fromEntries([{ a: 0 }, { a: '0' }].groupByEntries(o => o.a)); // { 0: [{ a: '0' }] }, the first entry has been eaten

Also, point of this thread is that the object form is much more ergonomic for the common case. If you have to do Object.fromEntries() to get an object you can destructure, you might as well just stick with Maps.

@ljharb
Copy link
Member

ljharb commented Jul 28, 2021

@bakkot it'd work if the coalescing was done inside groupBy - that way the only choice left would be if the [0] of the tuple was PropertyKeyed or MapEntried.

Totally agree tho that needing to Object.fromEntries busts up the ergonomics.

@bakkot
Copy link
Contributor Author

bakkot commented Jul 28, 2021

Well, yes, but if you're going to be passing an argument to determine whether to coalesce-like-for-objects or coalesce-like-for-maps, you might as well just go all the way and produce an object or map at that point anyway, rather than a list of entries.

@ljharb
Copy link
Member

ljharb commented Jul 28, 2021

Although, one alternative would be to default to an object - the common case - and allow a simple argument that instead produces entries - so you can make a Map, Set, WeakMap, or whatever you want?

@bakkot
Copy link
Contributor Author

bakkot commented Jul 28, 2021

It's already trivial to get entries out of a Map, and that's the overwhelmingly common thing to want when not just doing a plain object. So I'd prefer to just produce a Map, and let people consume that however they like if they want it in a different form.

@ljharb
Copy link
Member

ljharb commented Jul 28, 2021

For the non-object case, assuming there is an object option, that seems like a reasonable position.

@RoyalIcing
Copy link

That doesn't really work, because Object.fromEntries is last-wins, not coalescing.

That is:

[{ a: 0 }, { a: '0' }].groupBy(o => o.a); // { 0: [{ a: 0 }, { a: '0' }] }

Object.fromEntries([{ a: 0 }, { a: '0' }].groupByEntries(o => o.a)); // { 0: [{ a: '0' }] }, the first entry has been eaten

Also, point of this thread is that the object form is much more ergonomic for the common case. If you have to do Object.fromEntries() to get an object you can destructure, you might as well just stick with Maps.

I should have written an example. I was thinking more like with the grouping done for you:

[{ a: 0 }, { a: '0' }].groupBy(o => o.a); // [ [0, [{ a: 0 }]], ['0', [{ a: '0' }]] ]

Or the example from the readme:

const array = [1, 2, 3, 4, 5];

// groupBy groups items by arbitrary key.
// In this case, we're grouping by even/odd keys
array.groupBy(i => {
  return i % 2 === 0 ? 'even': 'odd';
});

// =>  [ ['odd', [1, 3, 5]], ['even', [2, 4]] ]

I believe passing this result to Object.fromEntries() would produce the same result as in the original readme:

Object.fromEntries([ ['odd', [1, 3, 5]], ['even', [2, 4]] ])
// =>  { odd: [1, 3, 5], even: [2, 4] }

And would also work with Map:

new Map([ ['odd', [1, 3, 5]], ['even', [2, 4]] ])
// =>  Map(2) { 'odd' => [1, 3, 5], 'even' => [2, 4] }

This way you use existing Object & Map methods and open your self up to other data structures, instead of requiring a parameter.

And the top level result doesn’t necessarily have to be an array, as these methods accept any iterable (not sure if that opens up any optimizations).

@bakkot
Copy link
Contributor Author

bakkot commented Jul 28, 2021

I believe passing this result to Object.fromEntries() would produce the same result as in the original readme:

Yes, that particular example works, but the example I gave does not.

If

[{ a: 0 }, { a: '0' }].groupBy(o => o.a); // [ [0, [{ a: 0 }]], ['0', [{ a: '0' }]] ]

then

Object.fromEntries([{ a: 0 }, { a: '0' }].groupBy(o => o.a)); // { 0: [{ a: '0' }] }, the first entry has been eaten

as you can tell by doing

Object.fromEntries([ [0, [{ a: 0 }]], ['0', [{ a: '0' }]] ])

today.

@RoyalIcing
Copy link

Yes, that particular example works, but the example I gave does not.

Just to clarify, your particular example is highlighting how a Map may be better than an Object? I agree, it does sound like a better use.

What I was trying to advocate for was a destination-agnostic method, that can easily be used to create a Map, an Object, or some future immutable JavaScript data type.

I believe Object.fromEntries() is really useful, perhaps it’s a bit of a mouthful and so if it were made more ergonomic it would be more attractive.

Iterables work fantastically as a bridge between different data types, especially as they can be lazily evaluated. It feels as though this capability is quite often forgotten in JavaScript’s APIs, even new ones.

@jridgewell
Copy link
Member

What I was trying to advocate for was a destination-agnostic method, that can easily be used to create a Map, an Object, or some future immutable JavaScript data type.

What @bakkot is trying to explain is that you can't create a destination agnostic method. The destination affects the key equality mechanism. For Object, we need to use ToPropertyKey and string equality. For Maps, we need to just use SameValueZero. The output needs to know that when it's collating.

@CarterLi
Copy link

CarterLi commented Aug 2, 2021

IMO using Map is always a better option than using plain object because of the implicit string conversion. A map can be easily convert to object using Object.fromEntries ( without converting Map to entries )

Object.fromEntries(new Map([[1,1],[2,2]]))

So why not groupByEntries? Because using a hash map to group values is always faster than using plain arrays. Implementation may always use an internal map to group values. Making implementation convert map to entries then users converting entries to map back is completely unnecessary and wastes time

What if users do want to make 1 and '1' the same key? Just always return strings in the callback.

const map = [{ a: 0 }, { a: '0' }].groupByMap(o => o.a + '');
map.get('0') // => [{ a: 0 }, { a: '0' }]
map.get(0) // => undefined
Object.fromEntries(map) // => { '0': [{ a: 0 }, { a: '0' }] }

In addition, we have another proposal: https://github.com/tc39/proposal-collection-normalization

So we can also do:

const map = [{ a: 0 }, { a: '0' }].groupByMap(o => o.a, { coerceKey: String });
map.get('0') // => [{ a: 0 }, { a: '0' }];
map.get(0) // => [{ a: 0 }, { a: '0' }];
Object.fromEntries(map) // => { '0': [{ a: 0 }, { a: '0' }] }

@theScottyJam
Copy link

As a counter-example, compare these two code snippets:

// Example 1
const { minors, adults } = users.groupBy(u => u.age > 18 ? 'adults' : 'minors')
...

// Example 2
const minorsAndAdults = users.groupByMap(u => u.age > 18 ? 'adults' : 'minors')
const minors = minorsAndAdults.get('minors')
const adults = minorsAndAdults.get('adults')
...

I understand there's use cases for grouping by a map and not coercing keys, but I would be against not having a groupByObject option. Javascript developers understand that objects keys are always strings, and shouldn't be surprised by the coercion here any more than the coercing that happens wehe you do obj[notAString] = 2. Examples like the above will be a very common use case for groupBy, and it adds a whole lot of boilerplate and an awkward temporary variable when you use a map data structure.

@bakkot
Copy link
Contributor Author

bakkot commented Aug 3, 2021

You could also write your Example 2 more concisely as

const { minors, adults } = Object.fromEntries(users.groupByMap(u => u.age > 18 ? 'adults' : 'minors'))

but since we expect this to be a very common thing to want to do, it seems silly to require the user to provide the Object.fromEntries wrapper every time.

@rbuckton
Copy link

I brought this up in #9, but I'll quote it here since it somewhat crosses into both issues:

A groupBy into a non iterable/iterator still feels wrong to me, especially if you have more you want to do with the grouped results.

Here's an example from the tooling we use to generate GitHub PR comments for on-demand benchmarks in TypeScript PRs:

function formatReport(benchmark: Benchmark, options: BenchmarkOptions, noStyles: boolean) {
    const cell = formatCell.bind(undefined, noStyles);
    const measurements = from(benchmark.measurements)
        .flatMap(measurement => measurement.pivot())
        .orderBy(measurement => measurement.scenarioIndex)
        .thenBy(measurement => measurement.hostIndex)
        .thenBy(measurement => measurement.metricIndex);

    return html`
    <b>Benchmark Report</b>
    <table border="0" cellpadding="0" cellspacing="0" ${noStyles ? "" : `class="comparison"`}>
        <thead>
            <tr>
                <th ${noStyles ? "align=left" : `class="metric"`}>${cell("Metric")}</th>
                <th ${noStyles ? "align=right" : `class="current mean"`}>${cell("Average")}</th>
                <th ${noStyles ? "align=right" : `class="current best"`}>${cell("Best")}</th>
                <th ${noStyles ? "align=right" : `class="current worst"`}>${cell("Worst")}</th>
            </tr>
        </thead>${
            measurements
                .groupBy(measurement => measurement.measurementName)
                .map(group => html`
        <tr class="group"><th ${noStyles ? "align=left" : ""} colspan="4">${cell(group.key)}</th></tr>
        <tbody>${
            from(group)
                .map(measurement => html`
            <tr class="${getMeasurementClassNames(measurement)}">
                <td ${noStyles ? "align=left" : `class="${getMetricClassNames(measurement)}"`}>${cell(measurement.metric)}</td>
                <td ${noStyles ? "align=right" : `class="${getCurrentMeanClassNames(measurement)}"`}>${cell(formatMean(measurement))}</td>
                <td ${noStyles ? "align=right" : `class="current best"`}>${cell(formatUnit(measurement.minimum, measurement))}</td>
                <td ${noStyles ? "align=right" : `class="current worst"`}>${cell(formatUnit(measurement.maximum, measurement))}</td>
            </tr>`)}
        </tbody>`)}
    </table>`;
}

Here I'm using a .groupBy into a .map. If the result isn't iterable, then I'd have to write

Object.entries(measurements.groupBy(measurement => measurement.measurementName))
  .map(group => html`...`)

Perhaps another approach might be to have the result of groupBy be iterable and support stringified/symbol keys, in a way not too dissimilar from the groups property of result of RegExp.prototype.exec()?

In TS parlance, something like this:

type GroupByGroups<K, V, U extends V = V> =
  & { [P in Extract<K, string | symbol>]: V[] } // handles `string | symbol`
  & { [P in Extract<K, number> as `${P}`]: V[] } // handles Number.prototype.toString()
  & { [P in Extract<K, true> as `true`]: U[] } // handles `true` -> `"true"`
  & { [P in Extract<K, false> as `false`]: V[] } // handles `false` -> `"false"`;

interface GroupByResultArray<K, V, U extends V = V> extends Array<[K, V[]]> {
  groups: GroupByGroups<K, V, U>;
}

declare function groupBy<T, U extends T>(values: Iterable<T>, keySelector: (value: T) => value is U): GroupByResultArray<boolean, T, U>;
declare function groupBy<T, K>(values: Iterable<T>, keySelector: (value: T) => K): GroupByResultArray<K, T>;

// works like this
const { true: a, false: b } = groupBy([dog, cat, bear], (dog): dog is Dog).groups; 
a; // Dog[]
b; // Animal[]

// or like this
const { foo, bar } = groupBy([{ name: "foo" }, { name: "bar" }], x => x.name).groups; 
foo; // { name: string }[]
bar; // { name: string }[]

// or like this
for (const [key, values] of groupBy(people, person => person.familyName)) {
}

In that way you get the best of both worlds, and there's parity with another language feature so the behavior is easy to explain using existing functionality.

@theScottyJam
Copy link

I don't think making the object iterable solves this problem.

const data = [true, 'true']
data.groupBy(x => x) // { 'true': [true, 'true'] }
data.groupByMap(x => x) // Map { true => [true], 'true' => 'true'] }

If the returned object from data.groupBy were iterable, I would expect that it would yield entries that looked like ['true', [true, 'true']]. I would not expect the object to keep around information about what the entries would have looked like, had string conversion not been done. Thus, even if it's iterable, you shouldn't be able to turn the iterator into a map without losing information.


As for your links to how other languages do it - for most of those languages, they don't really have the option to use an object the same way we do in Javascript, so it's hard to compare against them saying "look, they don't use a Javascript-like object as an output" when that's not even an option for them. But, it is intriguing that some of them choose to use give a list of tuples instead of whatever map structure they do have - not sure the reasoning behind that.

@bakkot
Copy link
Contributor Author

bakkot commented Aug 12, 2021

@rbuckton Most of those languages do not have a first-class record type like JavaScript does and so are not directly comparable. In JavaScript libraries, the convention is very strongly to return a string-keyed object. See

That said, I agree that having a Map is strictly more useful, despite breaking with the ecosystem precedent and being less ergonomic, which is why I propose just having two versions. Do you see a downside to that?

(I don't think your proposal to expose a .groups property really solves the problem, since (as discussed above) you need to know whether to perform stringification before coalescing.)

@rbuckton
Copy link

I suppose I'm not opposed to two implementations. The only downside of producing a Map is that you can't continue to chain operations off of it (until/unless Map gets more array-like operations or the iteration helpers proposal makes it through). I'm very much of a fan of fluent APIs where I have a lot of flexibility:

import { from } from "@esfx/iter-query";

from([true, "true", false])
  .groupBy(x => `${x}`)
  .toObject(null, ({ key }) => key, ({ values }) => values.toArray()); // { true: [true, 'true'], false: [false] }
  
from(array)
  .groupBy(x => `${x}`)
  .toMap(({ key }) => key, ({ values }) => values.toArray()); // Map { 'true' => [true, 'true'], 'false' => [false] }

from(array)
  .groupBy(x => `${x}`)
  .map(({ key, values }) => `key: ${key}, values: ${JSON.stringify(values.toArray())}`)
  .toArray()
  .join(`;`); // 'key: true, values: [true,"true"]; key: false, values: [false]'

It may take awhile to get there with various proposals, but I'm hoping we do (at least in some form).

@CarterLi
Copy link

until/unless Map gets more array-like operations or the iteration helpers proposal makes it through.

It's coming: https://github.com/tc39/proposal-collection-methods#proposal

@Ginden
Copy link

Ginden commented Sep 9, 2021

It's coming:

@CarterLi Unfortunely, it isn't coming - it's basically stalled due to subclassing reasons.

@Ginden
Copy link

Ginden commented Sep 29, 2021

BTW, personally I think that we should have groupBy (returning Map) and groupByObject (returning dictionary).

groupByObject should be runtime type-checked - iterator function should be allowed only to return string | symbol (because, let's be honest - '[object Object]' wasn't ever good idea).

@ljharb
Copy link
Member

ljharb commented Sep 29, 2021

@Ginden that presumes the most common use case would be a Map, when i strongly suspect it'll be the Object.

@bakkot
Copy link
Contributor Author

bakkot commented Sep 30, 2021

groupByObject should be runtime type-checked - iterator function should be allowed only to return string | symbol

I feel strongly that the object version of this method should call ToPropertyKey, just like every other property setting/accessing method in the language.

@jridgewell
Copy link
Member

Closing with the committee consensus on groupBy and groupByMap.

@ephys
Copy link

ephys commented Dec 15, 2021

I'm way too late to the party but what about adding the second argument to groupByToMap?

We can't do it for groupBy because of #3 (comment), but it would still be interesting to have the ability to choose which constructor is used when using groupByToMap. It would let us use URLSearchParams, WeakMap, Headers (fetch), a userland Multimap.

@Ginden
Copy link

Ginden commented Dec 15, 2021

Unfortunately, SearchParams isn't subclass of Map.

With Map.prototype.merge you could do:

new MapSubclass().merge(arr.groupByToMap())

BTW, @jridgewell, you can add to Related section: https://tc39.es/proposal-collection-methods/#Map.groupBy

@ephys
Copy link

ephys commented Dec 15, 2021

None of the examples I listed are subclasses of Map, they don't need to be as long as their constructor accepts an array of entries

@zloirock
Copy link
Contributor

@ephys @Ginden it's rather a case for something like that tc39/proposal-iterator-helpers#36

@ephys
Copy link

ephys commented Dec 15, 2021

Yes something somewhat like Iterator#to(Constructor): Array#groupByToMap(cb, ContructorWhichAcceptsEntries) (unless what you meant was that tc39/proposal-iterator-helpers#36 makes this redundant in which case I would disagree as Iterator Helpers is lacking a groupByX method - and if added it should reflect the Array method imho).

Edit: we should move this to a new issue. I commented here as I was initially going to talk about dropping groupByToMap in favor of the constructor but keeping both makes sense so it's unrelated to this thread. More related to #15

@mhofman
Copy link
Member

mhofman commented Dec 15, 2021

One thing I believe was not discussed at the plenary and which was not delved into deeply here is that since groupByToMap uses SameValueZero, there is no way to create a grouping differentiating -0 and 0. If there was a groupByToEntries, the caller could be fully responsible for normalization, with SameValue being used on the user normalized value. If key normalization predicates could be exposed (e.g. on the target constructor), it would make this easier for programs.

assert(Object.is(Object.asKey(0), '0'));
assert(Object.is(Map.asKey(-0), 0));
const arr = [{ a: 0 }, { a: '0' }, {a: -0}];
Object.fromEntries(arr.groupByToEntries(o => Object.asKey(o.a))); // { '0': [{ a: 0 }, { a: '0' }, {a: -0}] };
const map = new Map(arr.groupByToEntries(o => Map.asKey(o.a)));
map.get('0'); // => [{ a: '0' }]
map.get(0); // => [{ a: 0 }, { a: -0 }]

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests