What data structures should belong in std? #922

0joshuaolson1 · 2018-04-14T14:03:30Z

Fake motivation:

Associative array data structures like some skiplists, B-trees, and radix/compact prefix trees need less to no randomness to get similar speed with less memory overhead. Ignoring compression, lower memory use means:

A friend's computer needs less RAM to run my interpreter, game/emulator, or crypto client while their browser/antivirus is running too.
Less granularity is required when doing greater-than-memory computations (faster batch processes)
A higher threshold before my server/router/cache/database hits swap (etc.), ulimits, or OOM.

And there's the problems/bikesheds of hashing/collision strategies and entropy source quality that I'd prefer not to leave to a standard library.

Real motivation:

I recently learned about a better crit-bit tree called a qp trie. Some people have measured speedup over hash maps (more for writes) and much better memory usage per key. There's a few Rust ports.

Caveats:

The qp trie's efficiency is somewhat dependent on the distribution of key prefixes (worst-case is still good) and allocation approach/API. This paper's approach to a trie may help.
The author's claims about a DNS organization using it in production show similar(?) results compared to hash array tries, though.

tiehuis · 2018-04-14T23:44:42Z

I think we should definitely add some more data structures in general, as right now we only have the following:

ArrayList
HashMap
LinkedList

Some extra things that may be useful to include.

BTree
Cuckoo Filter/Bloom Filter
Priority Queue
Fixed Ring Buffer (embedded)

Given that the stdlib will probably be heavily revised before 1.0 we can always experiment and figure out what works now with no huge worry. It would be important to clearly document the tradeoffs for each data structure to the user if there is potential overlap.

Aside: I have read that argument on SipHash. It seemed fairly flimsy though since if you have sufficient access to read the secret key from process memory, I think you have bigger issues/considerations.

BraedonWooding · 2018-04-15T01:27:26Z

Just want to note that boost has shown that circular buffers (or fixed ring buffers) are amazingly powerful for queues and stacks providing almost vector/arrayList like speed rather than the often 10x slower deque speed that they commonly are implemented with.

PavelVozenilek · 2018-04-15T23:55:17Z

Practical restriction would be to allow only fully documented data structures.

Simple types like 2D and 3D point should be added, so people are not tempted to create their own.

0joshuaolson1 · 2018-04-15T23:59:52Z

Simple types like 2D and 3D point should be added, so people are not tempted to create their own.

I disagree. Aren't they mostly for graphics, windows, games, and scientific computing? And people should be tempted to make their own in this case - a Point is like the simplest struct ever and has no clear associated functions to make a real module out of.

0joshuaolson1 · 2018-04-16T00:06:35Z

@tiehuis

that argument on SipHash. It seemed fairly flimsy

I don't personally care about SipHash (except its speed when good mixing isn't needed), but I do care that a server using a probabilistic data structure like a hashmap has to deal with (adversarial) worst case bucket balancing, blocking/crappy OS RNGs, etc.

Constant time operations are a whole 'nother matter, though.

0joshuaolson1 · 2018-04-16T00:22:18Z

Speaking of crypto, what about HTTP/2, encoding/decoding, regex, and command line parsing? Certain data structures may be necessary just to implement them, but on the other hand nontrivial algorithms exposed through the standard library increases the maintenance burden of Zig's std due to something as simple as discussing API etc. tradeoffs in issue trackers.

PavelVozenilek · 2018-04-16T07:25:43Z

@0joshuaolson1

Simple types available in standard library may result in compatible libraries. In C every graphics library (like Cairo) got their point and rectangle. Similar mess happened with boolean type in C, and (until stdint.h) with sized integers.

0joshuaolson1 · 2018-04-16T12:22:00Z

I hope Zig's comptime can overcome C/C++/Rust's problem where people need to reimplement parts of the standard library with macros/templates.

BraedonWooding · 2018-04-16T12:56:41Z

Little list of what I think should be in Zig in terms of data structures;

Pt2D/Pt3D
Vec2D/Vec3D
Complex Numbers (even C has it); 'technically' a data structure
Quaternions in all their glory, while Quaternions are complex numbers (or 4 tuples if you prefer) the operations on them are often specific to Quaternions and are often around rots, especially around conversion to angles or a rot like structure for games.
non-probabilistic hashmap
queues/stacks using a circular buffer, deque is just too damn slow, like a factor of up to 100
BigInt (I guess)
Decimal is a huge one too
Matrices are beautiful too

Just felt I should add my two cents

PavelVozenilek · 2018-04-17T13:10:58Z

@BraedonWooding

BigInt (I guess)

Here is detailed tutorial (consisting of 10 parts) how to implement multiple precision arithmetic, usable even for cryptographic applications.

BraedonWooding · 2018-04-23T01:12:33Z

Here is a quick vector thing I drafted up when I had a few hrs;
https://github.com/BraedonWooding/zig/blob/Vec/std/math/vector.zig
To me 300 lines is still like a little small to completely warrant a standard library entry, however I think that the whole keep it consistent business is important :).

fithisux · 2018-04-24T09:48:01Z

"What data structures should belong in std?"
IMHO none. Why? Because the idea behind Zig is to be a more ergonomic/safe C or actually a more ergonomic/safe/portable assembly. The data structures should belong to a library.

I love data structures, kd-trees R-trees and friends among others. Why not create a project outside Zig that has everything dreamed of in various modules, geometric data structures, probabilistic data structures, string processing data structures.

PavelVozenilek · 2018-04-25T14:48:13Z

@fithisux: projects outside would be suspected as being abandoned or obsolete. Being part of the distribution gives some assurance that the bloody thing may actually work.

0joshuaolson1 · 2018-08-07T17:22:09Z

Red-black trees were just merged into master.

andrewrk · 2018-08-07T18:25:52Z

Yep. To further clarify:

Nearly anything that is generally useful, and has minimal or no OS dependencies is welcome into the stdlib pre-1.0.0. This helps act as a test bed for the language, and helps give us real APIs to get working with each other as we work on the language and standard library.

Then, when we are nearing 1.0.0 and the language is stable, we'll have a comprehensive std lib audit. Everything gets fully documented. Many APIs are removed from std. Volunteers are welcome to take ownership of the removed code in separate repositories. APIs that remain are made to be consistent with each other in naming, style, and patterns. The overarching goal here is going to be for std to be as minimal as possible, except where such minimalism would harm the community or introduce dependency issues. Everyone should be happy with the zig package manager before we do this audit.

binary132 · 2018-08-09T16:17:43Z

As a language user I prefer a usable and clean standard library to a featureful and spaghettified one. I find it unlikely that things introduced will later be removed once people depend on them as the language grows.

Consider the enormous size of the Java standard library for instance.

0joshuaolson1 · 2018-08-10T00:11:29Z

I find it unlikely that things introduced will later be removed once people depend on them as the language grows.

If it's difficult to update imports when an std package is moved out, that's a language/tooling problem.

EDIT: And if imports worked a little more like IPFS (for example), nobody would even notice (besides maybe network fetching with version upgrades).

binary132 · 2018-08-10T11:30:16Z

First of all, you do not have control over imported packages. I cannot force someone to update their library to stop using someone else's library which uses deprecated stdlib features. Difficulty with deprecation is a very real problem.

As to API stability, let's not forget the impact of Rust's API stability issues on its community.

I'm not clear on your IPFS comparison. Are you suggesting that deprecated stdlib imports should alias to remotely resolved packages at build time?

PavelVozenilek · 2018-08-10T13:35:19Z

Libraries shipped with the compiler are expected to be usable. Random code somewhere on the internet is seen as a crap to be avoided.

What I'd like as the ideal solution:

stdlib contains plenty of (properly maintained) libraries.
Project explicitly specifies which libraries from stdlib it wants to use.
These selected libraries (and associated documentation and tests) will be automatically copied inside project file structure.
If a library offers optional features, then it should be possible possible to omit any such feature. When the library (and docs) is copied, this skipped feature is removed from the source code copy (and the docs).

With this the programmer won't be overloaded and would have a chance to customize stdlib code. Proposed band-aid features like struct extensions (#1170) won't be needed.

0joshuaolson1 · 2018-08-11T12:08:38Z

you do not have control over imported packages.

Neither can you control which Zig versions or platforms libraries are written for, but they're not always crap as PavelVozenilek put it. The problems of std API changes and library writers changing things are similar.

I'm not clear on your IPFS comparison. Are you suggesting that deprecated stdlib imports should alias to remotely resolved packages at build time?

I'd imagine package management can be done separately from building.

tiehuis changed the title ~~Would an alternative to hash maps/sets belong in std?~~ What data-structures should belong in std? Apr 14, 2018

0joshuaolson1 changed the title ~~What data-structures should belong in std?~~ What data structures should belong in std? Apr 16, 2018

andrewrk added this to the 1.0.0 milestone Apr 16, 2018

schmee mentioned this issue Feb 25, 2019

Add priority queue #2008

Merged

andrewrk added the question No questions on the issue tracker, please. label Apr 15, 2020

andrewrk closed this as completed Apr 15, 2020

andrewrk modified the milestones: 1.0.0, 0.7.0 Apr 15, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

What data structures should belong in std? #922

What data structures should belong in std? #922

0joshuaolson1 commented Apr 14, 2018 •

edited

Loading

tiehuis commented Apr 14, 2018

BraedonWooding commented Apr 15, 2018

PavelVozenilek commented Apr 15, 2018

0joshuaolson1 commented Apr 15, 2018 •

edited

Loading

0joshuaolson1 commented Apr 16, 2018 •

edited

Loading

0joshuaolson1 commented Apr 16, 2018 •

edited

Loading

PavelVozenilek commented Apr 16, 2018

0joshuaolson1 commented Apr 16, 2018 •

edited

Loading

BraedonWooding commented Apr 16, 2018

PavelVozenilek commented Apr 17, 2018 •

edited

Loading

BraedonWooding commented Apr 23, 2018

fithisux commented Apr 24, 2018

PavelVozenilek commented Apr 25, 2018

0joshuaolson1 commented Aug 7, 2018

andrewrk commented Aug 7, 2018

binary132 commented Aug 9, 2018 •

edited

Loading

0joshuaolson1 commented Aug 10, 2018 •

edited

Loading

binary132 commented Aug 10, 2018

PavelVozenilek commented Aug 10, 2018

0joshuaolson1 commented Aug 11, 2018 •

edited

Loading

What data structures should belong in std? #922

What data structures should belong in std? #922

Comments

0joshuaolson1 commented Apr 14, 2018 • edited Loading

tiehuis commented Apr 14, 2018

BraedonWooding commented Apr 15, 2018

PavelVozenilek commented Apr 15, 2018

0joshuaolson1 commented Apr 15, 2018 • edited Loading

0joshuaolson1 commented Apr 16, 2018 • edited Loading

0joshuaolson1 commented Apr 16, 2018 • edited Loading

PavelVozenilek commented Apr 16, 2018

0joshuaolson1 commented Apr 16, 2018 • edited Loading

BraedonWooding commented Apr 16, 2018

PavelVozenilek commented Apr 17, 2018 • edited Loading

BraedonWooding commented Apr 23, 2018

fithisux commented Apr 24, 2018

PavelVozenilek commented Apr 25, 2018

0joshuaolson1 commented Aug 7, 2018

andrewrk commented Aug 7, 2018

binary132 commented Aug 9, 2018 • edited Loading

0joshuaolson1 commented Aug 10, 2018 • edited Loading

binary132 commented Aug 10, 2018

PavelVozenilek commented Aug 10, 2018

0joshuaolson1 commented Aug 11, 2018 • edited Loading

0joshuaolson1 commented Apr 14, 2018 •

edited

Loading

0joshuaolson1 commented Apr 15, 2018 •

edited

Loading

0joshuaolson1 commented Apr 16, 2018 •

edited

Loading

0joshuaolson1 commented Apr 16, 2018 •

edited

Loading

0joshuaolson1 commented Apr 16, 2018 •

edited

Loading

PavelVozenilek commented Apr 17, 2018 •

edited

Loading

binary132 commented Aug 9, 2018 •

edited

Loading

0joshuaolson1 commented Aug 10, 2018 •

edited

Loading

0joshuaolson1 commented Aug 11, 2018 •

edited

Loading