Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What data structures should belong in std? #922

Closed
0joshuaolson1 opened this issue Apr 14, 2018 · 20 comments
Closed

What data structures should belong in std? #922

0joshuaolson1 opened this issue Apr 14, 2018 · 20 comments
Labels
question No questions on the issue tracker, please.
Milestone

Comments

@0joshuaolson1
Copy link

0joshuaolson1 commented Apr 14, 2018

Fake motivation:

Associative array data structures like some skiplists, B-trees, and radix/compact prefix trees need less to no randomness to get similar speed with less memory overhead. Ignoring compression, lower memory use means:

  • A friend's computer needs less RAM to run my interpreter, game/emulator, or crypto client while their browser/antivirus is running too.
  • Less granularity is required when doing greater-than-memory computations (faster batch processes)
  • A higher threshold before my server/router/cache/database hits swap (etc.), ulimits, or OOM.

And there's the problems/bikesheds of hashing/collision strategies and entropy source quality that I'd prefer not to leave to a standard library.

Real motivation:

I recently learned about a better crit-bit tree called a qp trie. Some people have measured speedup over hash maps (more for writes) and much better memory usage per key. There's a few Rust ports.

Caveats:

  • The qp trie's efficiency is somewhat dependent on the distribution of key prefixes (worst-case is still good) and allocation approach/API. This paper's approach to a trie may help.
  • The author's claims about a DNS organization using it in production show similar(?) results compared to hash array tries, though.
@tiehuis
Copy link
Member

tiehuis commented Apr 14, 2018

I think we should definitely add some more data structures in general, as right now we only have the following:

  • ArrayList
  • HashMap
  • LinkedList

Some extra things that may be useful to include.

  • BTree
  • Cuckoo Filter/Bloom Filter
  • Priority Queue
  • Fixed Ring Buffer (embedded)

Given that the stdlib will probably be heavily revised before 1.0 we can always experiment and figure out what works now with no huge worry. It would be important to clearly document the tradeoffs for each data structure to the user if there is potential overlap.

Aside: I have read that argument on SipHash. It seemed fairly flimsy though since if you have sufficient access to read the secret key from process memory, I think you have bigger issues/considerations.

@tiehuis tiehuis changed the title Would an alternative to hash maps/sets belong in std? What data-structures should belong in std? Apr 14, 2018
@BraedonWooding
Copy link
Contributor

Just want to note that boost has shown that circular buffers (or fixed ring buffers) are amazingly powerful for queues and stacks providing almost vector/arrayList like speed rather than the often 10x slower deque speed that they commonly are implemented with.

@PavelVozenilek
Copy link

Practical restriction would be to allow only fully documented data structures.

Simple types like 2D and 3D point should be added, so people are not tempted to create their own.

@0joshuaolson1
Copy link
Author

0joshuaolson1 commented Apr 15, 2018

Simple types like 2D and 3D point should be added, so people are not tempted to create their own.

I disagree. Aren't they mostly for graphics, windows, games, and scientific computing? And people should be tempted to make their own in this case - a Point is like the simplest struct ever and has no clear associated functions to make a real module out of.

@0joshuaolson1
Copy link
Author

0joshuaolson1 commented Apr 16, 2018

@tiehuis

that argument on SipHash. It seemed fairly flimsy

I don't personally care about SipHash (except its speed when good mixing isn't needed), but I do care that a server using a probabilistic data structure like a hashmap has to deal with (adversarial) worst case bucket balancing, blocking/crappy OS RNGs, etc.

Constant time operations are a whole 'nother matter, though.

@0joshuaolson1
Copy link
Author

0joshuaolson1 commented Apr 16, 2018

Speaking of crypto, what about HTTP/2, encoding/decoding, regex, and command line parsing? Certain data structures may be necessary just to implement them, but on the other hand nontrivial algorithms exposed through the standard library increases the maintenance burden of Zig's std due to something as simple as discussing API etc. tradeoffs in issue trackers.

@0joshuaolson1 0joshuaolson1 changed the title What data-structures should belong in std? What data structures should belong in std? Apr 16, 2018
@andrewrk andrewrk added this to the 1.0.0 milestone Apr 16, 2018
@PavelVozenilek
Copy link

@0joshuaolson1

Simple types available in standard library may result in compatible libraries. In C every graphics library (like Cairo) got their point and rectangle. Similar mess happened with boolean type in C, and (until stdint.h) with sized integers.

@0joshuaolson1
Copy link
Author

0joshuaolson1 commented Apr 16, 2018

I hope Zig's comptime can overcome C/C++/Rust's problem where people need to reimplement parts of the standard library with macros/templates.

@BraedonWooding
Copy link
Contributor

Little list of what I think should be in Zig in terms of data structures;

  • Pt2D/Pt3D
  • Vec2D/Vec3D
  • Complex Numbers (even C has it); 'technically' a data structure
  • Quaternions in all their glory, while Quaternions are complex numbers (or 4 tuples if you prefer) the operations on them are often specific to Quaternions and are often around rots, especially around conversion to angles or a rot like structure for games.
  • non-probabilistic hashmap
  • queues/stacks using a circular buffer, deque is just too damn slow, like a factor of up to 100
  • BigInt (I guess)
  • Decimal is a huge one too
  • Matrices are beautiful too

Just felt I should add my two cents

@PavelVozenilek
Copy link

PavelVozenilek commented Apr 17, 2018

@BraedonWooding

BigInt (I guess)

Here is detailed tutorial (consisting of 10 parts) how to implement multiple precision arithmetic, usable even for cryptographic applications.

@BraedonWooding
Copy link
Contributor

Here is a quick vector thing I drafted up when I had a few hrs;
https://github.com/BraedonWooding/zig/blob/Vec/std/math/vector.zig
To me 300 lines is still like a little small to completely warrant a standard library entry, however I think that the whole keep it consistent business is important :).

@fithisux
Copy link

"What data structures should belong in std?"
IMHO none. Why? Because the idea behind Zig is to be a more ergonomic/safe C or actually a more ergonomic/safe/portable assembly. The data structures should belong to a library.

I love data structures, kd-trees R-trees and friends among others. Why not create a project outside Zig that has everything dreamed of in various modules, geometric data structures, probabilistic data structures, string processing data structures.

@PavelVozenilek
Copy link

@fithisux: projects outside would be suspected as being abandoned or obsolete. Being part of the distribution gives some assurance that the bloody thing may actually work.

@0joshuaolson1
Copy link
Author

Red-black trees were just merged into master.

@andrewrk
Copy link
Member

andrewrk commented Aug 7, 2018

Yep. To further clarify:

Nearly anything that is generally useful, and has minimal or no OS dependencies is welcome into the stdlib pre-1.0.0. This helps act as a test bed for the language, and helps give us real APIs to get working with each other as we work on the language and standard library.

Then, when we are nearing 1.0.0 and the language is stable, we'll have a comprehensive std lib audit. Everything gets fully documented. Many APIs are removed from std. Volunteers are welcome to take ownership of the removed code in separate repositories. APIs that remain are made to be consistent with each other in naming, style, and patterns. The overarching goal here is going to be for std to be as minimal as possible, except where such minimalism would harm the community or introduce dependency issues. Everyone should be happy with the zig package manager before we do this audit.

@binary132
Copy link

binary132 commented Aug 9, 2018

As a language user I prefer a usable and clean standard library to a featureful and spaghettified one. I find it unlikely that things introduced will later be removed once people depend on them as the language grows.

Consider the enormous size of the Java standard library for instance.

@0joshuaolson1
Copy link
Author

0joshuaolson1 commented Aug 10, 2018

I find it unlikely that things introduced will later be removed once people depend on them as the language grows.

If it's difficult to update imports when an std package is moved out, that's a language/tooling problem.

EDIT: And if imports worked a little more like IPFS (for example), nobody would even notice (besides maybe network fetching with version upgrades).

@binary132
Copy link

First of all, you do not have control over imported packages. I cannot force someone to update their library to stop using someone else's library which uses deprecated stdlib features. Difficulty with deprecation is a very real problem.

As to API stability, let's not forget the impact of Rust's API stability issues on its community.

I'm not clear on your IPFS comparison. Are you suggesting that deprecated stdlib imports should alias to remotely resolved packages at build time?

@PavelVozenilek
Copy link

Libraries shipped with the compiler are expected to be usable. Random code somewhere on the internet is seen as a crap to be avoided.

What I'd like as the ideal solution:

  1. stdlib contains plenty of (properly maintained) libraries.
  2. Project explicitly specifies which libraries from stdlib it wants to use.
  3. These selected libraries (and associated documentation and tests) will be automatically copied inside project file structure.
  4. If a library offers optional features, then it should be possible possible to omit any such feature. When the library (and docs) is copied, this skipped feature is removed from the source code copy (and the docs).

With this the programmer won't be overloaded and would have a chance to customize stdlib code. Proposed band-aid features like struct extensions (#1170) won't be needed.

@0joshuaolson1
Copy link
Author

0joshuaolson1 commented Aug 11, 2018

you do not have control over imported packages.

Neither can you control which Zig versions or platforms libraries are written for, but they're not always crap as PavelVozenilek put it. The problems of std API changes and library writers changing things are similar.

I'm not clear on your IPFS comparison. Are you suggesting that deprecated stdlib imports should alias to remotely resolved packages at build time?

I'd imagine package management can be done separately from building.

@andrewrk andrewrk added the question No questions on the issue tracker, please. label Apr 15, 2020
@andrewrk andrewrk modified the milestones: 1.0.0, 0.7.0 Apr 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question No questions on the issue tracker, please.
Projects
None yet
Development

No branches or pull requests

7 participants