This repository has been archived by the owner on Dec 19, 2021. It is now read-only.

Implement embedding a Trinary Search Tree #3

Open
wants to merge 2 commits into
base: master

tgulacsi

It is good for a lot of small files - fill the tree,
GOB encode, gzip then store as base64 in the source.
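The encode and decode steps can be sketched roughly as follows. This is a minimal illustration, not the code from this pull request: the `pack` and `unpack` names are hypothetical, and a plain `map[string][]byte` stands in for the tree so the example stays short.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/base64"
	"encoding/gob"
	"fmt"
	"sort"
)

// pack (hypothetical name) gob-encodes the files, gzips the result, and
// returns it as base64 text that could be stored in a Go string constant.
// Assumption: a map stands in for the tree used by the real generator.
func pack(files map[string][]byte) (string, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if err := gob.NewEncoder(zw).Encode(files); err != nil {
		return "", err
	}
	// Close flushes any buffered data and writes the gzip trailer.
	if err := zw.Close(); err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(buf.Bytes()), nil
}

// unpack (hypothetical name) reverses pack: base64 decode, gunzip, gob
// decode. It returns the sorted file names and the name-to-contents map.
func unpack(s string) ([]string, map[string][]byte, error) {
	raw, err := base64.StdEncoding.DecodeString(s)
	if err != nil {
		return nil, nil, err
	}
	zr, err := gzip.NewReader(bytes.NewReader(raw))
	if err != nil {
		return nil, nil, err
	}
	defer zr.Close()
	values := make(map[string][]byte)
	if err := gob.NewDecoder(zr).Decode(&values); err != nil {
		return nil, nil, err
	}
	names := make([]string, 0, len(values))
	for name := range values {
		names = append(names, name)
	}
	sort.Strings(names)
	return names, values, nil
}

func main() {
	// Placeholder contents, not real zoneinfo data.
	packed, err := pack(map[string][]byte{
		"Europe/Budapest": []byte("placeholder-1"),
		"UTC":             []byte("placeholder-2"),
	})
	if err != nil {
		panic(err)
	}
	names, values, err := unpack(packed)
	if err != nil {
		panic(err)
	}
	for _, name := range names {
		fmt.Printf("%s: %d bytes\n", name, len(values[name]))
	}
}
```

In generated source, the string returned by `pack` would be the embedded constant, and something like `unpack` would run at init time.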

On init, the code decodes the base64, uncompresses, decodes,
then fills the names slice and the values map from the tree.
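Filling the names slice and the values map from the tree might look like the sketch below. The `TST`, `Put`, `Get`, and `Walk` names are assumptions made for illustration and are not taken from this pull request; the point is only that an in-order walk of a ternary search tree yields the keys in sorted byte order.

```go
package main

import "fmt"

// node is one character position in a ternary search tree.
type node struct {
	ch         byte
	lo, eq, hi *node
	value      []byte
	terminal   bool // true if a key ends at this node
}

// TST is a hypothetical minimal ternary search tree keyed by file name.
type TST struct{ root *node }

// Put stores value under key. Empty keys are ignored in this sketch.
func (t *TST) Put(key string, value []byte) {
	if key == "" {
		return
	}
	t.root = put(t.root, key, 0, value)
}

func put(n *node, key string, i int, value []byte) *node {
	c := key[i]
	if n == nil {
		n = &node{ch: c}
	}
	switch {
	case c < n.ch:
		n.lo = put(n.lo, key, i, value)
	case c > n.ch:
		n.hi = put(n.hi, key, i, value)
	case i < len(key)-1:
		n.eq = put(n.eq, key, i+1, value)
	default:
		n.value, n.terminal = value, true
	}
	return n
}

// Get returns the value stored under key, if any.
func (t *TST) Get(key string) ([]byte, bool) {
	if key == "" {
		return nil, false
	}
	n, i := t.root, 0
	for n != nil {
		c := key[i]
		switch {
		case c < n.ch:
			n = n.lo
		case c > n.ch:
			n = n.hi
		case i < len(key)-1:
			n = n.eq
			i++
		default:
			return n.value, n.terminal
		}
	}
	return nil, false
}

// Walk calls fn for every key in sorted byte order.
func (t *TST) Walk(fn func(key string, value []byte)) { walk(t.root, nil, fn) }

func walk(n *node, prefix []byte, fn func(string, []byte)) {
	if n == nil {
		return
	}
	walk(n.lo, prefix, fn)
	cur := append(prefix, n.ch)
	if n.terminal {
		fn(string(cur), n.value)
	}
	walk(n.eq, cur, fn)
	walk(n.hi, prefix, fn)
}

func main() {
	var tree TST
	// Placeholder contents, not real zoneinfo data.
	tree.Put("UTC", []byte("placeholder-1"))
	tree.Put("Europe/Budapest", []byte("placeholder-2"))
	tree.Put("Europe/Berlin", []byte("placeholder-3"))

	var names []string
	values := make(map[string][]byte)
	tree.Walk(func(key string, value []byte) {
		names = append(names, key)
		values[key] = value
	})
	fmt.Println(names, len(values))
}
```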

Benchmarks with the zoneinfo.zip:

$ rm zoneinfo/*.go; time embedfiles -out zoneinfo/zi.go -pkg=tz -trie=false zoneinfo/ ; ls -l zoneinfo/*.go
real 0m1,276s
user 0m1,599s
sys 0m0,033s
-rw-r--r-- 1 gthomas gthomas 2721295 jan 25 15:51 zoneinfo/zi.go

$ rm zoneinfo/*.go; time embedfiles -out zoneinfo/zi.go -pkg=tz -trie=true zoneinfo/ ; ls -l zoneinfo/*.go
real 0m0,045s
user 0m0,031s
sys 0m0,019s
-rw-r--r-- 1 gthomas gthomas 295428 jan 25 15:51 zoneinfo/zi.go

So roughly a ninefold decrease in source size (2721295 to 295428 bytes).


leighmcculloch commented Feb 4, 2020

Hi @tgulacsi, wow thanks for optimizing how the files are stored. I wouldn't have expected such a huge time and space saving, but I guess it makes sense as I'm not really doing anything for performance.

Do you know how much benefit we get from using only one or the other? i.e. Does gzipping the files by itself cause a significant speed benefit as well, or is the speed benefit all from the TST?

Gzip alone is slower and bigger than the GOB'd trie, but takes less than
a third of the time and size of the original.

$ time embedfiles -out=zoneinfo.go -pkg=tz -trie=true zoneinfo/; l zoneinfo.go

real    0m0,033s
user    0m0,025s
sys     0m0,013s
-rw-r--r-- 1 gthomas gthomas 295091 febr   4 18:08 zoneinfo.go
:gthomas@redpath: ~/src/github.com/leighmcculloch/go-tz
$ time embedfiles -out=zoneinfo.go -pkg=tz -gzip=true zoneinfo/; l zoneinfo.go

real    0m0,310s
user    0m0,473s
sys     0m0,032s
-rw-r--r-- 1 gthomas gthomas 602555 febr   4 18:08 zoneinfo.go
:gthomas@redpath: ~/src/github.com/leighmcculloch/go-tz
$ time embedfiles -out=zoneinfo.go -pkg=tz -gzip=false zoneinfo/; l zoneinfo.go

real    0m1,334s
user    0m1,604s
sys     0m0,045s
-rw-r--r-- 1 gthomas gthomas 2721247 febr   4 18:08 zoneinfo.go

tgulacsi commented Feb 4, 2020

The fastest and least memory-consuming approach would be to uncompress only when retrieving a file: no map, just the file name list and a function that retrieves (decodes & uncompresses) the data (zoneinfo).
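A rough sketch of that lazy approach is below. Everything in it is an assumption made for illustration: the `store` type, `newStore`, and `Get` are hypothetical names, and each file is compressed separately here so that a single file can be decoded without touching the others.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/base64"
	"fmt"
	"io"
	"sort"
)

// store (hypothetical) keeps only the sorted file names and, at the same
// index, each file's gzip-compressed, base64-encoded contents.
// Nothing is decoded until Get is called.
type store struct {
	names []string
	blobs []string
}

// encodeBlob gzips data and returns it as base64 text.
func encodeBlob(data []byte) (string, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return "", err
	}
	if err := zw.Close(); err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(buf.Bytes()), nil
}

// newStore builds a store; a generator would do this ahead of time and
// emit the two slices as source.
func newStore(files map[string][]byte) (*store, error) {
	s := &store{}
	for name := range files {
		s.names = append(s.names, name)
	}
	sort.Strings(s.names)
	for _, name := range s.names {
		blob, err := encodeBlob(files[name])
		if err != nil {
			return nil, err
		}
		s.blobs = append(s.blobs, blob)
	}
	return s, nil
}

// Get decodes and uncompresses a single file on demand.
func (s *store) Get(name string) ([]byte, error) {
	i := sort.SearchStrings(s.names, name)
	if i == len(s.names) || s.names[i] != name {
		return nil, fmt.Errorf("file %q not found", name)
	}
	raw, err := base64.StdEncoding.DecodeString(s.blobs[i])
	if err != nil {
		return nil, err
	}
	zr, err := gzip.NewReader(bytes.NewReader(raw))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

func main() {
	// Placeholder contents, not real zoneinfo data.
	s, err := newStore(map[string][]byte{
		"UTC":        []byte("placeholder-1"),
		"Asia/Tokyo": []byte("placeholder-2"),
	})
	if err != nil {
		panic(err)
	}
	data, err := s.Get("UTC")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d names, UTC is %d bytes\n", len(s.names), len(data))
}
```

One trade-off of this variant: compressing each file on its own adds a gzip header and trailer per file and cannot exploit redundancy shared between files, so the embedded source would likely be larger than a single compressed blob.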
