Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Luna performance case — writing memory #239

Closed
mwu-tow opened this issue Aug 8, 2018 · 7 comments
Closed

Luna performance case — writing memory #239

mwu-tow opened this issue Aug 8, 2018 · 7 comments
Assignees
Labels
p-high Should be completed in the next sprint

Comments

@mwu-tow
Copy link
Contributor

mwu-tow commented Aug 8, 2018

Consider the following Luna code:

import Std.Base
import Std.Foreign.C.Value
import Std.Time

def main:
    count = 1000000
    a = Array CInt64 . alloc count
    def write n:
        a.uncheckedWriteAt n $ CInt64.fromInt n
        if n == 0 then None else write n-1

    t0 = Time.now
    write (count-1)
    t1 = Time.now
    print (t1.diff t0)

And the equivalent C++ code:

#include <chrono>
#include <iostream>
#include <string>
#include <vector>

void write(std::vector<int64_t> &array, int64_t n)
{
	array[n] = n;
	if(n)
		write(array, n-1);
}

template<typename F, typename ...Args>
static auto duration(F&& func, Args&&... args)
{
	const auto start = std::chrono::steady_clock::now();
	std::invoke(std::forward<F>(func), std::forward<Args>(args)...);
	return std::chrono::steady_clock::now() - start;
}

template<typename F, typename ...Args>
static auto measure(std::string text, F&& func, Args&&... args)
{
	const auto t = duration(std::forward<F>(func), std::forward<Args>(args)...);
	std::cout << text << " took " << std::chrono::duration_cast<std::chrono::microseconds>(t).count() / 1000.0 << " ms" << std::endl;
	return t;
}

int main()
{
	int count = 1'000'000;
	std::vector<int64_t> v;
	v.resize(count);
	measure("write " + std::to_string(count), write, v, count-1);
}

Basically the program allocates an array of one million 64-bit integers and then writes to each of them using recursive write function.

Luna output:

31630.7974ms

C++ output:

write 1000000 took 0.595 ms

Methodology: ran several times, took the best result
Luna: used shell luna from luna-core
C++: MSVC 15,7 optimized x64 build

Perhaps I am again unknowingly triggering some thunk-exploding lazy evaluation trap.
Otherwise, the results would suggest some serious performance issue, as such task shouldn't be more than 50 000 times slower than C++.

@kustosz
Copy link
Contributor

kustosz commented Aug 8, 2018

@mwu-tow can you please check how these times in Luna scale with the size of array? If they are linear, we have an awesome test case for @iamrecursion to look into :)

@mwu-tow
Copy link
Contributor Author

mwu-tow commented Aug 8, 2018

They seem linear enough.

image

Count Time [ms]
1 0.0
10 1.000900001
100 4.019999999
1000 37.9863
2500 110.9783
5000 220.929600001
10000 373.840300001
25000 946.707
50000 1815.5325
100000 3062.110000001
200000 6056.7706
300000 9237.2525
400000 12961.7436
500000 15301.819100001
600000 19188.2202
700000 21644.0454
800000 26398.623699999
900000 29034.9906
1000000 31063.763700001

@wdanilo wdanilo added D - Core Contributor p-high Should be completed in the next sprint labels Aug 8, 2018
@kustosz
Copy link
Contributor

kustosz commented Aug 8, 2018

Perfect. This means we have a "it's not optimizing well" problem instead of a "it's exponential, semantics are unclear and everything is on fire" problem.

@iamrecursion please take a look at what happens here. I think you can also get away with benchmarking things like recursive adding numbers instead of memory writes, the results are pretty disappointing there too (which I think is a good thing again – if you optimize that super simple code, other things should speed up automatically).

@iamrecursion
Copy link
Contributor

iamrecursion commented Aug 9, 2018

Brilliant catch! I'll add this to my 'to look at' list.

Please do add any performance-related issues to my epic.

@piotrMocz
Copy link

Are we doing tail-call-optimization in Luna by any chance?

@iamrecursion
Copy link
Contributor

Also on my to-look-at-list.

@mikusp
Copy link

mikusp commented Aug 10, 2018

One observation: running luna with +RTS -N1 resulted in better performance than default. I see that GC time rises dramatically, maybe because of contention of 8 threads running garbage collection?

turion pushed a commit to turion/luna that referenced this issue Jun 25, 2019
turion pushed a commit to turion/luna that referenced this issue Jun 25, 2019
turion pushed a commit to turion/luna that referenced this issue Jun 25, 2019
turion pushed a commit to turion/luna that referenced this issue Jun 25, 2019
turion pushed a commit to turion/luna that referenced this issue Jun 25, 2019
turion pushed a commit to turion/luna that referenced this issue Jun 25, 2019
@joenash joenash closed this as completed Jun 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
p-high Should be completed in the next sprint
Projects
None yet
Development

No branches or pull requests

7 participants