Add ZeroTrie, an efficient string-to-int collection #2722

Merged
merged 34 commits into from
Aug 18, 2023
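
For readers unfamiliar with the idea in the title, a "string-to-int collection" built on a trie can be sketched as below. This is only an illustration of the general concept, written under loud assumptions: `SketchTrie`, `insert`, and `get` are hypothetical names, and the pointer-based `BTreeMap` nodes are not the layout this PR adds. The commit messages (varint docs, layout docs, a `Store` type parameter) suggest the real ZeroTrie uses a compact serialized form, none of which is reproduced here.

```rust
use std::collections::BTreeMap;

/// Hypothetical pointer-based trie mapping strings to integers.
/// Illustrative only; this is NOT the ZeroTrie data layout.
#[derive(Default)]
struct SketchTrie {
    /// Value stored if a key ends exactly at this node.
    value: Option<usize>,
    /// Child nodes, keyed by the next byte of the string.
    children: BTreeMap<u8, SketchTrie>,
}

impl SketchTrie {
    /// Walks (and creates) one node per byte, then stores the value.
    fn insert(&mut self, key: &str, value: usize) {
        let mut node = self;
        for &byte in key.as_bytes() {
            node = node.children.entry(byte).or_default();
        }
        node.value = Some(value);
    }

    /// Follows the bytes of `key`; returns `None` if the path is
    /// missing or no value was stored at the final node.
    fn get(&self, key: &str) -> Option<usize> {
        let mut node = self;
        for byte in key.as_bytes() {
            node = node.children.get(byte)?;
        }
        node.value
    }
}
```

Keys that share a prefix ("ab" and "abc") share nodes, which is where the space savings of a trie come from; a lookup costs one step per byte of the key regardless of how many keys are stored.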
438b0e1
Squashed version of df11a8d74d7f60e91b2467d0b25328cc739c2426
sffc Jun 30, 2023
abb6b1d
Merge branch 'main' into zerotrie-squashed
sffc Jun 30, 2023
9c8242f
deps
sffc Jun 30, 2023
7829adb
inline
sffc Jun 30, 2023
c36537a
comments and docs
sffc Jun 30, 2023
52517fb
ZeroTrieFlavor
sffc Jun 30, 2023
993f0c2
Name the type parameter `Store`
sffc Jun 30, 2023
9290ec5
Rearrange code
sffc Jul 4, 2023
60a93af
Add From impls
sffc Jul 4, 2023
059ca93
Don't use ref-cast
sffc Jul 4, 2023
2f25994
fmt
sffc Jul 4, 2023
9df8cae
generate-readmes
sffc Jul 4, 2023
0d6f8fd
Merge branch 'main' into asciitrie
sffc Jul 5, 2023
8345f97
Docs, tests, and function names for varint
sffc Jul 10, 2023
25e71be
Start writing layout docs
sffc Jul 10, 2023
9739ce1
More layout docs
sffc Jul 11, 2023
1d55557
Add some more docs
sffc Jul 16, 2023
9327015
More docs
sffc Jul 16, 2023
b8e06dd
Move example byteph to a unit test and refactor exports of byte_phf
sffc Jul 16, 2023
f4ce8c2
More docs
sffc Jul 16, 2023
a8de97d
atbs_split_first --> atbs_pop_front
sffc Jul 16, 2023
ed65138
NodeType refactor and docs
sffc Jul 16, 2023
4896988
Docs for ZeroTrieIterator
sffc Jul 16, 2023
b484dc1
Move helper functions to helpers.rs
sffc Jul 16, 2023
0f933fc
"must be" comment
sffc Jul 16, 2023
2395d30
f2 docs
sffc Jul 16, 2023
9ab08a6
More docs on builder utilities
sffc Jul 16, 2023
7847652
Some more builder docs
sffc Jul 16, 2023
1a7ac1a
More code review comments
sffc Jul 16, 2023
4973911
Rob feedback
sffc Jul 16, 2023
8df4756
Merge branch 'main' into asciitrie
sffc Jul 16, 2023
1ccb26a
Fix criterion dependency
sffc Jul 16, 2023
c687cd8
fmt & tidy
sffc Jul 16, 2023
3b415bd
Write docs for the builder and simplify it slightly.
sffc Jul 17, 2023
17 changes: 17 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions Cargo.toml
@@ -43,6 +43,7 @@ members = [
"experimental/relativetime",
"experimental/relativetime/data",
"experimental/unicodeset_parser",
"experimental/zerotrie",
"ffi/capi_cdylib",
"ffi/capi_staticlib",
"ffi/diplomat",
67 changes: 67 additions & 0 deletions experimental/zerotrie/Cargo.toml
@@ -0,0 +1,67 @@
# This file is part of ICU4X. For terms of use, please see the file
# called LICENSE at the top level of the ICU4X source tree
# (online at: https://github.com/unicode-org/icu4x/blob/main/LICENSE ).

[package]
name = "zerotrie"
description = "A data structure that efficiently maps strings to integers"
version = "0.1.0"
authors = ["The ICU4X Project Developers"]
edition = "2021"
readme = "README.md"
repository = "https://github.com/unicode-org/icu4x"
license = "Unicode-DFS-2016"
# Keep this in sync with other crates unless there are exceptions
include = [
"src/**/*",
"examples/**/*",
"benches/**/*",
"tests/**/*",
"Cargo.toml",
"LICENSE",
"README.md"
]

[package.metadata.docs.rs]
all-features = true

[package.metadata.cargo-all-features]
# Bench feature gets tested separately and is only relevant for CI
denylist = ["bench"]

[dependencies]
zerovec = { path = "../../utils/zerovec", optional = true }
litemap = { path = "../../utils/litemap", default-features = false, features = ["alloc"], optional = true }
serde = { version = "1.0", optional = true }
displaydoc = { version = "0.2.3", default-features = false }

[dev-dependencies]
postcard = { version = "1.0", default-features = false, features = ["alloc"] }
serde = { version = "1.0", default-features = false }
zerovec = { path = "../../utils/zerovec", features = ["serde", "hashmap"] }
litemap = { path = "../../utils/litemap" }
criterion = "0.4"
icu_benchmark_macros = { path = "../../tools/benchmark/macros" }
serde_json = "1.0"
bincode = "1.0"
rand = "0.8"
rand_pcg = "0.3"

[lib]
bench = false # This option is required for Benchmark CI
path = "src/lib.rs"

[features]
default = []
bench = []

Review comment (Member): none of your benches are behind this feature

Reply (Member Author): Good observation; I put the non-ZeroTrie benches behind #[cfg(feature = "bench")]

alloc = []
litemap = ["dep:litemap", "alloc"]
serde = ["dep:serde", "dep:litemap", "alloc", "litemap/serde", "zerovec?/serde"]

[[bench]]
name = "overview"
harness = false

[[test]]
name = "builder_test"
required-features = ["alloc", "litemap"]
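
The review reply about putting the non-ZeroTrie benches behind `#[cfg(feature = "bench")]` can be sketched roughly as follows. This is a minimal illustration of feature gating, not the PR's actual bench code: the function names and the group names ("zerotrie", "btreemap", "hashmap") are hypothetical, chosen only to show that one set of benchmarks always compiles while the comparison set is compiled in only when the `bench` feature is enabled.

```rust
/// Comparison benchmarks (hypothetical names), compiled only when the
/// crate is built with `--features bench`.
#[cfg(feature = "bench")]
fn comparison_groups() -> Vec<&'static str> {
    vec!["btreemap", "hashmap"]
}

/// Fallback used when the `bench` feature is off: no comparison groups.
#[cfg(not(feature = "bench"))]
fn comparison_groups() -> Vec<&'static str> {
    Vec::new()
}

/// Lists every benchmark group that would be registered.
/// The ZeroTrie group is unconditional; the others depend on the feature.
fn bench_groups() -> Vec<&'static str> {
    let mut groups = vec!["zerotrie"];
    groups.extend(comparison_groups());
    groups
}
```

With this shape, a plain `cargo bench` would exercise only the unconditional group, and `cargo bench --features bench` would add the comparison groups, which matches the intent described in the reply.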
51 changes: 51 additions & 0 deletions experimental/zerotrie/LICENSE
@@ -0,0 +1,51 @@
UNICODE, INC. LICENSE AGREEMENT - DATA FILES AND SOFTWARE

See Terms of Use <https://www.unicode.org/copyright.html>
for definitions of Unicode Inc.’s Data Files and Software.

NOTICE TO USER: Carefully read the following legal agreement.
BY DOWNLOADING, INSTALLING, COPYING OR OTHERWISE USING UNICODE INC.'S
DATA FILES ("DATA FILES"), AND/OR SOFTWARE ("SOFTWARE"),
YOU UNEQUIVOCALLY ACCEPT, AND AGREE TO BE BOUND BY, ALL OF THE
TERMS AND CONDITIONS OF THIS AGREEMENT.
IF YOU DO NOT AGREE, DO NOT DOWNLOAD, INSTALL, COPY, DISTRIBUTE OR USE
THE DATA FILES OR SOFTWARE.

COPYRIGHT AND PERMISSION NOTICE

Copyright © 1991-2022 Unicode, Inc. All rights reserved.
Distributed under the Terms of Use in https://www.unicode.org/copyright.html.

Permission is hereby granted, free of charge, to any person obtaining
a copy of the Unicode data files and any associated documentation
(the "Data Files") or Unicode software and any associated documentation
(the "Software") to deal in the Data Files or Software
without restriction, including without limitation the rights to use,
copy, modify, merge, publish, distribute, and/or sell copies of
the Data Files or Software, and to permit persons to whom the Data Files
or Software are furnished to do so, provided that either
(a) this copyright and permission notice appear with all copies
of the Data Files or Software, or
(b) this copyright and permission notice appear in associated
Documentation.

THE DATA FILES AND SOFTWARE ARE PROVIDED "AS IS", WITHOUT WARRANTY OF
ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE
WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT OF THIRD PARTY RIGHTS.
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR HOLDERS INCLUDED IN THIS
NOTICE BE LIABLE FOR ANY CLAIM, OR ANY SPECIAL INDIRECT OR CONSEQUENTIAL
DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE,
DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER
TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
PERFORMANCE OF THE DATA FILES OR SOFTWARE.

Except as contained in this notice, the name of a copyright holder
shall not be used in advertising or otherwise to promote the sale,
use or other dealings in these Data Files or Software without prior
written authorization of the copyright holder.


Portions of ICU4X may have been adapted from ICU4C and/or ICU4J.
ICU 1.8.1 to ICU 57.1 © 1995-2016 International Business Machines Corporation and others.
41 changes: 41 additions & 0 deletions experimental/zerotrie/README.md

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
