Careful microoptimisations for 2x performance improvement #5

Merged
merged 5 commits into from
Jan 14, 2018


oxinabox
Member

This could do with some careful review.

It halves the constructor time (both when the string is found in the pool and when it is not),
and overall the test suite runs 30% faster.

@oxinabox oxinabox mentioned this pull request Jan 11, 2018
@oxinabox
Member Author

I think the final version might be 5x faster in some cases.

@xiaodaigh can you try this out on your task?
(String sorting was it?)

@oxinabox oxinabox merged commit 90c0c87 into master Jan 14, 2018
@oxinabox
Member Author

@xiaodaigh did you ever try this out?

@xiaodaigh

Yep, I tried it just now. It can convert a 100m-element vector with 1m unique values into a vector of InternedStrings in about 100 seconds. I then tried adding the same 100m vector again, and that completed in 93 seconds. This seems quite slow compared to R's 13s.

I then tried making an interned string pool with 1m unique strings and sampling from that to get 100m strings. Converting the 1m strings took 4.7s and the sampling took 120 seconds:

using InternedStrings

@time samplespace = InternedString.("id".*dec.(1:1_000_000, 10));
srand(1);
@time svec = rand(samplespace, 100_000_000);

See R's equivalent:

library(data.table)
pt = proc.time()
ss <- sprintf("id%010d",1:1e6)
system.time(x <- sample(ss, 1e8, replace=T))
print(x)
2+2
timetaken(pt)

@oxinabox
Member Author

On the v0.3.0 code (without the micro-opts)
I get:

julia> using InternedStrings
julia> @time samplespace = InternedString.("id".*dec.(1:1_000_000, 10));
 10.471583 seconds (31.21 M allocations: 656.495 MiB, 26.22% gc time)

julia> srand(1);
julia> @time svec = rand(samplespace, 100_000_000);
322.384380 seconds (799.95 M allocations: 14.156 GiB, 56.60% gc time)

On the v0.4.0 code (with the micro-opts) I get:

julia> using InternedStrings
julia> @time samplespace = InternedString.("id".*dec.(1:1_000_000, 10));
  6.838613 seconds (13.23 M allocations: 351.488 MiB, 11.40% gc time)
julia> srand(1);
julia> @time svec = rand(samplespace, 100_000_000);
 90.561751 seconds (100.00 M allocations: 2.235 GiB, 16.08% gc time)

I don't see how the R example is the same at all, though.
That isn't interning strings, is it?
My R is pretty weak.

To me that looks like

julia> @time samplespace = String.("id".*dec.(1:1_000_000, 10));
  0.231330 seconds (3.01 M allocations: 130.061 MiB)

julia> srand(1);
julia> @time svec = rand(samplespace, 100_000_000);
 22.346334 seconds (2.18 k allocations: 763.058 MiB, 75.21% gc time)

As an aside, these tests are kind of a worst-case scenario for Julia, and a bit dodgy, since they use non-const globals.
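
A minimal sketch of what avoiding the non-const globals could look like. It assumes a recent Julia (1.x), where `Random.seed!` and `string(i, pad=10)` stand in for the `srand` and `dec` calls used above. The helper names `make_samplespace` and `sample_ids` are hypothetical, and the sizes are reduced for illustration. How much difference this makes for these particular calls has not been measured here.

```julia
using Random

# Hypothetical helper: builds plain Strings such as "id0000000001".
# Swap in InternedString here to time the interning case instead.
make_samplespace(n) = ["id" * string(i, pad=10) for i in 1:n]

# Hypothetical helper: all the work happens on local variables inside a
# function, so nothing is looked up through a non-const global.
function sample_ids(n_unique, n_samples)
    Random.seed!(1)
    samplespace = make_samplespace(n_unique)
    return rand(samplespace, n_samples)
end

sample_ids(10, 10)                       # small warm-up call so @time mostly excludes compilation
@time svec = sample_ids(1_000, 100_000)  # sizes reduced for illustration
```

Declaring the sample space as `const` at top level would be another way to get the same effect without wrapping everything in a function.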

@ScottPJones ScottPJones deleted the ox/microopt branch May 11, 2018 23:58