Added SharedArray #4939

amitmurthy · 2013-11-26T11:51:36Z

Note: This was originally "RFC: WIP on adding shmem support to DArrays". It has evolved considerably, and it was finally decided to implement SharedArray as a separate type.

Inspired by Tim Holy's SharedArrays PR.
The idea here is to add shmem support to DArrays, using shmem where possible and otherwise defaulting to the current DArray behaviour. Users can write code requesting DArrays with shmem=true, and it will work even in situations where shmem is not possible, albeit slower.
New keyword arguments shmem=false, safe_r=false, safe_w=true in the DArray constructors.
If the user is on Windows or if the requested procs() are not all on the same host, shmem will not be used.
If shmem=true and safe_r=true, performance will for all practical purposes be similar to the existing DArray, since requests will be fulfilled remotely via a remotecall into the process holding the relevant chunk.

TODO:

setindex for DArray in general
tests for DArray / DArray with shmem

Would like feedback whether we should go down this path, or keep shmem support distinct like Tim's SharedArrays or do both.

ivarne · 2013-11-26T11:56:35Z

This is a feature that might be worthy of a mention in the NEWS.md file.

timholy · 2013-11-26T15:44:52Z

Amit, thanks for tackling this.

One of my initial concerns about incorporating this into DArray was that A[i,j] can be guaranteed to be fast only if you're using shared memory. I was concerned that there might be circumstances where the choice of the best algorithm would depend on its internal representation.

However, it now occurs to me that if needed perhaps we could add another parameter to DArray:

type DArray{T,N,A,isShared}

where that last parameter is an integer, 0 or 1. Then one could perform dispatch on it.
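
A minimal sketch of that dispatch idea, written in current Julia syntax rather than the 2013 syntax used in this thread. It uses a Bool parameter (booleans are valid type parameters) where the comment suggests 0 or 1. `ToyArray` and `access_path` are hypothetical names for illustration, not code from the PR:

```julia
# Hypothetical toy type: the Bool parameter records whether the data is in shared memory.
struct ToyArray{T,Shared}
    data::Vector{T}
end

# Shared case: element access is a direct lookup into the mapped array.
access_path(::ToyArray{T,true}) where {T} = :direct
Base.getindex(a::ToyArray{T,true}, i::Int) where {T} = a.data[i]

# Non-shared case: stand-in for the slower route through the chunk's owner.
# A real DArray would do a remote fetch here; the toy just reads local data.
access_path(::ToyArray{T,false}) where {T} = :via_owner
Base.getindex(a::ToyArray{T,false}, i::Int) where {T} = a.data[i]

shared = ToyArray{Float64,true}([1.0, 2.0, 3.0])
remote = ToyArray{Float64,false}([1.0, 2.0, 3.0])
```

Because the flag is part of the type, the method is selected at compile time and the shared path carries no runtime branch.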

Now that I think this through, I suspect the best approach might be the following: (1) merge this rather than my SharedArrays (it's more flexible); and (2) if/when my concern manifests, we can extend the DArray type parameters as suggested above (but don't do this until it becomes an issue).

Thoughts?

amitmurthy · 2013-11-26T16:46:59Z

Your last comment was on the first commit. If you look at commit 24dc0b6, you will notice that d.local_shmmap will always have the correct parameterized type when it is used. I didn't understand what you meant by "where we'll need to use dispatch".

amitmurthy · 2013-11-26T16:54:21Z

Ah! OK, I get it. Will remove the Union from d.local_shmmap

timholy · 2013-11-26T16:56:59Z

> Ah! OK, I get it. Will remove the Union from d.local_shmmap

About to write a response, but you got there first...

A[i,j], when A is shared, needs to be no more complicated than a pointer-lookup (which is what it is when A is a plain Array). Otherwise there's too much overhead to referencing individual elements of the array.

ViralBShah · 2013-11-27T13:43:57Z

This is a really nice optimization to have.

ViralBShah · 2013-11-27T13:51:31Z

You probably haven't tried this on a mac yet, but I get:

julia> a = dzeros(100,100, shmem=true)
ERROR: shm_open() failed
 in shm_mmap_array at darray.jl:461
 in DArray at darray.jl:106
 in DArray at darray.jl:169
 in DArray at darray.jl:171
 in dzeros at darray.jl:279
 in dzeros at darray.jl:280

amitmurthy · 2013-11-27T15:28:42Z

cc: @tanmaykm

timholy · 2013-11-27T16:16:18Z

@amitmurthy, here's an example of where I think we should go with this. I should have started by forking your shmem branch to my github account, but instead I did it all locally then pushed to my account. I seem to be having trouble figuring out how to get GitHub to set your repository as the base fork for my PR (it doesn't list it in the drop-down box), so perhaps best is to see this commit: https://github.com/timholy/julia/commit/262d5e9d026e3f53f3e42dd5f2e065d983e9284c

My not-so-secret ambition is to get to the point where, for any sizable chunk of data, you might as well use a DArray as an Array. Then parallelism can start invading all kinds of algorithms (like sum, fill!, etc) in base Julia.

Test script:

A = ones(1000, 1100)
sum(A)
@time sum(A)
D = dones(1000, 1100; shmem=true)   # or set to false
sum(D)
@time sum(D)

Note that sum is not yet parallel; this is simply to measure whether there is any penalty for using a DArray.

With shmem=false (i.e., like what we had before Amit's change):

julia> include("/tmp/testdarray.jl")
elapsed time: 0.002323238 seconds (64 bytes allocated)
elapsed time: 3.764029668 seconds (793688032 bytes allocated)
1.1e6

With shmem=true:

julia> include("/tmp/testdarray.jl")
elapsed time: 0.003227348 seconds (64 bytes allocated)
elapsed time: 0.097602358 seconds (35201472 bytes allocated)
1.1e6

Much, much faster! But still far too slow compared to plain arrays (30x slower).

With the changes in that commit to my own fork:

julia> include("/tmp/testdarray.jl")
elapsed time: 0.002355822 seconds (64 bytes allocated)
elapsed time: 0.001969175 seconds (64 bytes allocated)
1.1e6

Here, there is no gap between Array and DArray. (The difference between the two appears to be noise; it's not that the DArray is faster.)

I'm not sure I understand what the safe_r and friends mean, so I just did the minimum needed to illustrate the point.

amitmurthy · 2013-11-27T16:47:31Z

This is really a nice learning for me on using dispatch for improving performance. Will incorporate it.

Thanks.

staticfloat · 2013-11-27T20:06:33Z

@timholy when you go to open a pull request, you can click "edit" on the righthand side of the base and head repos. The dropdown box next to the "base fork" allows you to type in whatever user you want to compare. Or, if you're impatient, you can just manually edit the URL in your browser to do the comparison you want. Like this.

timholy · 2013-11-27T20:44:34Z

Hmm, when I tried typing in the "filter" box amitmurthy/julia, it didn't accept that input. I like the URL solution, thanks!

staticfloat · 2013-11-27T20:52:01Z

Just type amitmurthy. It won't let you base off a different repo entirely.

amitmurthy · 2013-11-28T04:24:43Z

Tim,

safe_r was meant to serialize (as in not concurrent) reads via the worker holding the appropriate chunk. But since shmem=true, safe_r=true has the same behavior as a non-shmem DArray, I think I'll drop it altogether. shmem=true would mean that reads can be concurrent for any index into the array across all workers.

safe_w means safe write, i.e. serializing writes to a chunk via the appropriate worker. I'll retain this.

One more thing:
getindex{T,N,A}(d::DArray{T,N,A,1}, i::Int) = getindex(d.local_shmmap, i) is what made the code efficient. The current DArray implementation allows us to serialize (as in into a stream) DArray objects onto workers that do not have any chunks of the darray locally and it still works. In order to support the same, is there a way to deserialize d::DArray{T,N,A,1} as d::DArray{T,N,A,0} on workers that have not mapped the shmem segment?

amitmurthy · 2013-11-28T04:55:24Z

OK. I think I figured out how I can do it. On workers where it is not possible to map the shmem, darray's deserialize will make a copy of the deserialized d::DArray{T,N,A,1} as d::DArray{T,N,A,0} and return that one.

amitmurthy · 2013-11-28T07:17:31Z

Hey Tim, your suggestions have been incorporated.

A DArray on a worker without the shmem mapping works as a non-shmem DArray.

julia> addprocs(3)
julia> A = dones(100, 110);
julia> sum(A);
julia> @time sum(A);
elapsed time: 2.56274503 seconds (63934396 bytes allocated)

julia> D = dones(100, 110; shmem=true);   # or set to false
julia> sum(D);
julia> @time sum(D);
elapsed time: 1.3836e-5 seconds (64 bytes allocated)

julia> addprocs(1)

julia> remotecall_fetch(4, d->sum(d), D);
julia> @time remotecall_fetch(4, d->begin sum(d); @time sum(d) end, D)
        From worker 4:  elapsed time: 1.3544e-5 seconds (6740 bytes allocated)
elapsed time: 0.113249829 seconds (1346232 bytes allocated)

julia> remotecall_fetch(5, d->sum(d), D);
julia> @time remotecall_fetch(5, d->begin @time sum(d) end, D)
        From worker 5:  elapsed time: 2.620149135 seconds (63307272 bytes allocated)
elapsed time: 2.722422997 seconds (83304 bytes allocated)

ViralBShah · 2013-11-28T13:56:41Z

This is really cool. I can confirm that it works on OS X for me.

ViralBShah · 2013-11-29T05:17:01Z

I would prefer that we use shmem by default when it is applicable. The keyword argument is certainly useful to have for cases where you want to turn it off for debugging purposes. Are safe_r and safe_w gone now?

ViralBShah · 2013-11-29T05:17:30Z

base/multi.jl

@@ -1665,3 +1665,16 @@ function interrupt(pids::AbstractVector=workers())
        end
    end
 end
+
+
+function islocalconnection(id)


Should this always return false on Windows, or is this implementation likely to work on Windows?

amitmurthy (inline reply, Nov 29, 2013):

It should work on Windows too. It just checks whether the worker's TCP connection is on the same host. It is a non-exported function used only by DArray in shmem mode.

amitmurthy · 2013-11-29T05:21:04Z

safe_r is gone.
safe_w exists.
I have yet to put in setindex!

ViralBShah · 2013-11-29T05:25:50Z

Can we call it safe_write? I presume you are providing this for cases where we need to avoid unordered simultaneous writes by serializing them through the processor that owns the memory?

amitmurthy · 2013-11-29T05:28:00Z

Yes, that is the reason. safe_write is fine. By default it will be false in the shmem case. In non-shmem, it is necessarily true.

ViralBShah · 2013-11-29T05:28:24Z

How about mapping Array as a shmem segment across multiple processors? We could even map existing sparse matrices across various processors. This would then make it possible for operations that can easily parallelize to work on different parts of the array.

amitmurthy · 2013-11-29T05:31:37Z

It will still involve copying data from an existing Array into the shmem segment.

ViralBShah · 2013-11-29T09:00:52Z

I think it would be ok to copy it over the first time.

amitmurthy · 2013-11-29T11:20:54Z

Here is the current status:

DArray constructors have two additional parameters: shmem=false, safe_write=false. With shmem=true, the array is created in shared memory. If we are unable to create it in shared memory, it currently just gives a warning and defaults to a non-shmem DArray. If safe_write=true, all writes are serialized via the respective chunk owners.

Question: Should we just give an error if shmem cannot be supported, given that the performance difference between shmem and non-shmem is so large?

In case of shmem, the init function is called to allocate the entire array which is then copied onto the shmem segment. This has been done in order to have the same expectation from the init function for shmem as well as non-shmem.

Any suggestions on avoiding this?

@ViralBShah, DArray already had a distribute function to distribute a regular array and return a DArray. If you pass shmem=true, it will now just copy it directly.

Basic setindex! (using Int indexes only) is available. But since the whole idea of DArray is that computation will be distributed, and folks are expected to work off localparts using myindexes, it does not make sense to support the whole gamut of setindex! as defined for Array. The localpart is of type Array anyway.

@timholy, with respect to your desire of "...you might as well use a DArray as an Array.", it may not be completely possible. The following questions arise:

For the DArray{T,N,A,1}, i.e., using shmem, we can directly map many Array functions. However, unless we do this for the non-shmem cases too (a bit complicated given the chunked representation, and with poor performance too), there will be a disparity between DArray{T,N,A,1} and DArray{T,N,A,0}. Is this OK?
Array functions which result in the size of the array changing cannot be supported in DArray.
Some Array methods return new arrays (e.g., rotl90, rotr90, etc.). Should similar functions in DArray return an Array or new DArrays?

cc: @JeffBezanson : Can you have a look at this?

timholy · 2013-11-29T12:36:40Z

Amit, this looks great. Some questions and responses:

> On workers where it is not possible to map the shmem, darray's deserialize will make a copy of the deserialized d::DArray{T,N,A,1} as d::DArray{T,N,A,0} and return that one.

Is this something that the caller can determine in advance? We don't want to get into a situation like the following: suppose there are 5 processes, 4 can access the shmem. To parallelize an operation, the caller farms out a chunk to each of the processes. The 4 of them complete their chunk with blinding speed, but the 5th one is very slow, and this makes the entire operation slow. I'd much rather assign the task to just the 4 that can do shmem, and ignore the 5th altogether.

In other words, in my view a DArray that is declared shmem should not include processes that can't use it.

> Question: Should we just give an error if shmem cannot be supported, given that the performance difference between shmem and non-shmem is so large?

I'd favor this. People can always wrap in a try/catch block if they want to write code that won't fail. But is Windows the only platform that doesn't yet support this? I'd favor getting Windows working, too, if possible, and then perhaps this won't even be something we have to worry about. (Although we should plan for Julia to spread to other platforms too, like Android.) I'd be happy to do my best to help make this work (I don't know Windows very well at all, but I could at least give it a try, or perhaps there are others who might tackle this).
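
The try/catch pattern mentioned here could look roughly like the sketch below. `shared_zeros` is a hypothetical stand-in for a constructor that throws when shared memory cannot be used, and the `available` keyword exists only to simulate that failure; neither is part of the PR:

```julia
# Hypothetical stand-in for a shmem constructor that errors instead of
# silently falling back. `available` simulates whether shmem can be mapped.
function shared_zeros(dims...; available::Bool=true)
    available || error("shared memory not available")
    return zeros(dims...)
end

# Caller-side fallback: try the shared version, fall back explicitly on failure.
function zeros_with_fallback(dims...; available::Bool=true)
    try
        return shared_zeros(dims...; available=available), :shared
    catch err
        @warn "falling back to a non-shared array" exception=err
        return zeros(dims...), :fallback
    end
end

a, how = zeros_with_fallback(2, 3; available=false)
```

With an erroring constructor, the slow path is only taken when the caller has asked for it, which is the point being argued for.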

> In case of shmem, the init function is called to allocate the entire array which is then copied onto the shmem segment. This has been done in order to have the same expectation from the init function for shmem as well as non-shmem. Any suggestions on avoiding this?

I would favor changing the convention here. I guess the key question is whether DArray is supposed to allow chunks that are AbstractArrays but not Arrays. If so, then we have several issues to fix, including adding another parameter to DArray that represents the type of the local_shmmap. (@JeffBezanson, your input here is desired.) We might then need to pass two functions, one to allocate and one to initialize, and use just the latter in the case of shmem.

> But since the whole idea of DArray is that computation will be distributed, and folks are expected to work off localparts using myindexes, it does not make sense to support the whole gamut of setindex! as defined for Array. The localpart is of type Array anyway.

My view on this is a little different. There is going to be some overhead to farming a task out to workers. It might be faster to run my fill! operation from a single process (it might be faster than paying the overhead of starting up workers and synchronizing their completion). But then I'd like to distribute my maximum-intensity-projection algorithm to multiple processes (because it's a slow operation, and will benefit from parallelization), and finally I'll want to save the result to disk using a single process (because it's easier to think about I/O from a single process, and I'm lazy). In SharedArray, what I was aiming for was a type that can work seamlessly in both of these modes; if DArray is going to supersede it, perhaps we need to take seriously the possibility that a DArray will be used exactly like a regular Array.

> with respect to your desire of "...you might as well use a DArray as an Array.", it may not be completely possible. ... there will be a disparity between DArray{T,N,A,1} and DArray{T,N,A,0}. Is this OK?

As far as I'm concerned, yes. Right now, outside of libraries like OpenBLAS and FFTW, julia is not really doing much to exploit multi-core machines, and my main interest is changing that. I'd be very content with providing good base library support for just the shared-memory versions, and declaring that in other cases you're on your own (likely through a package).

> Array functions which result in the size of the array changing cannot be supported in DArray

That's OK. For reductions, etc, we'll probably want to have versions where the output can be pre-allocated.

> Some Array methods return new arrays (e.g., rotl90, rotr90, etc) - Should similar functions in DArray return an Array or new DArrays?

I'd say DArrays. We could define a similar function for DArray.

amitmurthy · 2013-11-29T15:15:57Z

> Is this something that the caller can determine in advance?

Yes. This is just a hypothetical situation like this:

addprocs(4)
d=dzeros(100,100; shmem=true) # d created on pid 1, also mapped on 2,3,4,5

addprocs("foo@some_other_host") # let's say this creates pid 6
remotecall_fetch(6, x->sum(x), d)

The above works with a regular DArray. But in the shmem case, while it
currently works, I think I'll change it to just throw an error.

> In other words, in my view a DArray that is declared shmem should not
> include processes that can't use it.

The existing DArray implementation supports accessing the darray even from
processes that were not involved in the construction process. But, I think
you are right, we will disallow the same in the shmem case.

> Question: Should we just give an error if shmem cannot be supported,
> given that the performance difference between shmem and non-shmem is
> so large?
>
> I'd favor this.....

There are two situations. One is Windows, where darray shmem support will
hopefully come sooner than later. The other is where the programmer
inadvertently does
d=dzeros((100,100), [2,3,4,5,6]; shmem=true) and process 6 happens to be
on some_other_host . I'll change both these situations to throw errors
instead of defaulting to non-shmem darray

> We might then need to pass two functions, one to allocate and one to
> initialize, and use just the latter in the case of shmem.

Actually if we decide to disallow any conversion between shmem darray and
non-shmem darray, this is a non-issue. In the shmem DArray case, the init
function will just be expected to initialize and not allocate. We just
document the same.

> My view on this is a little different.....

Again if a shmem darray will never be used as non-shmem darray and
vice-versa, this is trivial. The creating process has full visibility into
the shmem segment, while the workers have visibility into the full segment
as well as know which segment they need to work on. Mapping all of the
Array setindex! functions only in the context of DArray{T,N,A,1} is
simple.

The complexity is in supporting the same for DArray{T,N,A,0}, but since
performance will be relatively much poorer, I don't know if we should do it
right away. Maybe the non-shmem complete setindex! support can be added
independently later.

amitmurthy · 2013-11-29T15:34:26Z

Actually, the eltype is determined from the init function, so in the shmem case, if the init function does only initialization, we will have to get the eltype from elsewhere. I am thinking maybe kwarg shmem can be shmem::Union(Type, Bool)=false ....

JeffBezanson · 2013-11-30T06:36:51Z

I like this a lot! I basically agree with @timholy 's last comment and I'd say this discussion is going in the right direction.

DArrays do currently support any type of AbstractArray as chunks. It might not be possible to support this for shmem arrays, since Array is the only thing we know how to allocate in a shmem segment.

Switching init functions to only initialize and not allocate might be the right thing. This would allow the type information to flow "top down": asking for a DArray{Float64,2,Array{Float64,2}} would determine everything, which might make it easier to write init functions. For example init functions could use localpart, which could give a more uniform API.
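
A small sketch of the two conventions being compared, in current Julia syntax. `alloc_init` and `init!` are hypothetical names, and a plain Matrix stands in for the shmem segment:

```julia
# Allocating convention: the init function builds and returns the chunk,
# so the element type is only known after it has run.
alloc_init(I) = fill(1.0, map(length, I))

# Initialize-only convention: the container allocates storage of a known
# element type up front, and the init function fills its part in place.
init!(part) = fill!(part, 1.0)

chunk = alloc_init((1:4, 1:3))          # fresh 4x3 Matrix{Float64}

buf = Matrix{Float64}(undef, 4, 6)      # stand-in for an already-mapped segment
init!(view(buf, :, 1:3))                # one worker fills columns 1:3
init!(view(buf, :, 4:6))                # another fills columns 4:6
```

In the second form nothing is allocated and then copied, and the element type comes from the requested array type rather than from whatever the init function happens to return.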

This implementation seems not to use the correct memory layout. It looks like it is always basically column distributed. One thing that would work would be to make each chunk a SubArray based on its indexes.
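
As an illustration of the SubArray suggestion, using only Base (`view` in current Julia; a plain Matrix stands in for the shared array):

```julia
A = zeros(4, 4)                  # stand-in for the full shared array

# Chunk for index ranges (1:2, 3:4): a SubArray that aliases A, no copy.
chunk = view(A, 1:2, 3:4)
chunk .= 7.0                     # writes go straight through to A

# The chunk keeps A's column-major layout, so it is not contiguous in memory.
chunk isa SubArray               # true
A[1:2, 3:4] == fill(7.0, 2, 2)   # true
```

Each chunk then indexes with its own local coordinates while the parent array keeps one ordinary memory layout.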

Based on how this code is evolving, it seems to make more sense for shared arrays to be a separate type. The shared version of DArray has all its own fields, its own case in the constructor, and is dispatched differently for many functions. If we do that though, there should be a common ArrayDist type shared by both. By the way, booleans can be type parameters.

StefanKarpinski · 2013-12-04T01:44:08Z

That seems like a pretty reasonable idea.

timholy · 2013-12-04T11:46:56Z

I sprinkled in a couple of line comments, but overall this is looking very good. I agree with @StefanKarpinski about the concern re the proliferation of constructor names. However, is there any concern about the possibility of wanting to construct an array-of-arrays? Since zero(Array{Float64, 1}) is not defined, I doubt this is a problem, but I thought I should raise it.

Aside from these, it would probably be best to add some tests before merging, as these might catch problems (I haven't actually tried any of the code myself yet).

amitmurthy · 2013-12-04T18:08:25Z

d* deprecated
helper constructors (ones, zeros, et al) also dispatch based on DArray or SharedArray type argument
added a few tests
cleaned up based on Tim's comments
documentation to be added

timholy · 2013-12-04T19:31:52Z

From my perspective, I'd say this seems fine to merge; since it's not a breaking feature, we can always continue to improve this in base. Perhaps the only reason to hold off might be Windows support. I have a Windows machine at work I can develop & test on, but it won't be before the end of the week.

ViralBShah · 2013-12-05T02:23:25Z

I am ok with merging this and continuing further development in new PRs. Windows support can always be added later.

amitmurthy · 2013-12-05T07:14:52Z

Would like @JeffBezanson to have a look once before merging.

timholy · 2013-12-09T14:34:11Z

As an update, Amit has added documentation and a few other tweaks. And so there's no duplication of effort, I've got a draft implementation for Windows done---it would need someone with a build environment to test it.

One last thing we could do is start writing versions of algorithms that use SharedArrays. If folks think that would be a good thing to do before merging to master, I'd strongly advocate turning this into a branch in julia, rather than leaving it in Amit's fork.

ViralBShah · 2013-12-10T02:04:50Z

I think we should merge this in order to encourage wider usage and testing. As we write some algorithms, we can figure out what tweaks are required.

Bump @JeffBezanson

StefanKarpinski · 2013-12-10T02:11:51Z

I'm largely ok with that, but with the caveat that it be announced as an experimental feature that may not make it into 0.3 in the current form. Let's still see what @JeffBezanson has to say.

timholy · 2013-12-12T16:23:51Z

A discussion in #1790 makes me question whether we really want/need ArrayDist as part of this type. First, each process has complete access to the entire array, with equal efficiency at all indexes; the "local chunks" are far less important here than for DArrays. Second, sometimes you might want to partition a single SharedArray in different ways for different algorithms. For example, imagine in step 1 of your algorithm you multiply two matrices, and in step 2 you sum all the entries. Cache-friendliness tells you that the appropriate way to partition the matrices is by "tiles" (as in gemm) for step 1, but by columns for step 2.

Hence, perhaps the algorithm, rather than the container, needs to be in charge of the partitioning---as long as you know how many processes are working on the array, and they're all running the same function, there is no danger that they will step on each others' toes unless the algorithm is badly-written. We could leave ArrayDist as a "hint" for algorithms that don't really care, but I'm not convinced it's essential here.
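
A sketch of what algorithm-owned partitioning could look like. `split_range` is a hypothetical helper, not something from the PR; the same array dimensions are split by columns for one step and into tiles for another:

```julia
# Hypothetical helper: split 1:n into `parts` contiguous ranges of near-equal length.
function split_range(n::Int, parts::Int)
    base, extra = divrem(n, parts)
    ranges = UnitRange{Int}[]
    start = 1
    for p in 1:parts
        len = base + (p <= extra ? 1 : 0)   # the first `extra` ranges get one more
        push!(ranges, start:(start + len - 1))
        start += len
    end
    return ranges
end

# Step 2 of the example (a sum): partition an 8x10 array by columns over 4 workers.
cols = split_range(10, 4)

# Step 1 of the example (a multiply): partition the same array into 2x2 tiles.
tiles = [(r, c) for r in split_range(8, 2), c in split_range(10, 2)]
```

Each worker would be handed one entry of `cols` or `tiles` and index the shared array with it; the container itself never needs to know which scheme is in use.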

amitmurthy · 2013-12-13T04:36:54Z

I understand and agree. Which is why I documented the default partitioning provided by ArrayDist as

"While each worker has full visibility into the SharedArray, local chunks in SharedArrays
(of type SubArray) may be used to partition work across participating workers."

We probably should be even more explicit about this.

Also, ArrayDist can evolve to support more than one partitioning scheme - and, at least in the case of SharedArray, the user can switch schemes in the middle of a run too.

amitmurthy · 2013-12-20T08:50:40Z

Bump @JeffBezanson .

If there are any reservations on adding this to base at this time, I can always put it out as a standalone package for now.

JeffBezanson · 2013-12-20T09:22:48Z

This is a very nice PR.

The change to multi.jl looks generally useful; it should be done separately.

I think ArrayDist should only describe an array chunking scheme, and not hold remote references or anything like that. Part of the purpose of it is to write things like similar(A, distribution(B)), which would make some kind of array with the same distribution as B, where distribution returns an ArrayDist. ArrayDist is like a Dims tuple.

I understand wanting to get rid of dzeros etc. but other methods like these take an element type, not a container type. In a generic context rand(T, n) would be meaningless. We need to have a better general solution than separate functions for every way you might want to initialize an array (rand, randn, ones, zeros, trues, falses, infs, nans, threes, fours?)

Currently rand(T, size(A)) works, so maybe rand(T, ArrayDist(...)) makes sense. Of course that doesn't handle shared vs. distributed though.

JeffBezanson · 2013-12-20T09:28:37Z

Also functions like zeros don't make as much sense for distributed arrays. It implies that you're going to initialize by assigning to the array in a separate step, which is not a good idea.

ViralBShah · 2013-12-23T16:30:22Z

It may not be uncommon to allocate memory for a distributed array with zeros and then work on the localpart for some initialization. Although this is almost always avoidable, not having zeros will just lead to lots of questions about why it is not there.

JeffBezanson · 2013-12-23T17:20:26Z

I'm fine with zeros existing for DArrays if we can come up with a sane interface for it. So far the only thing I can think of is zeros(eltype, size, distribution), or zeros(n, m*p) :)

ViralBShah · 2013-12-25T18:08:25Z

Bump.

amitmurthy · 2013-12-31T09:00:41Z

Updated the PR and tried to get a good abstraction based on the discussion so far - though I am not yet fully satisfied with it.

Anyways, putting it out for further inputs:

ArrayDist is an abstract type with DimDist a concrete subtype. A tiled distribution (as suggested by Tim for certain workloads) may be implemented as a TileDist in the future.
DimDist has a field dmode (distribution mode) which specifies if the convenience constructors should create a DArray or a SharedArray. It is a cop-out, but I couldn't think of anything cleaner.
The d* functions are still deprecated. Only fill, rand and randn variants that accept a DimDist have been defined. zeros, ones and their ilk can be served via fill at least for distributed/shared arrays.
distribute has been deprecated in favor of similar.
The procs argument in the DArray/SharedArray constructors is now a keyword argument.
Doc updates are out of sync - will update them once we get a fix on the code.
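
A toy model of the shape described in this list, in current Julia syntax. `ToyArrayDist`, `ToyDimDist` and `toyfill` are hypothetical names, the field layout is an assumption, and `toyfill` returns an ordinary Array tagged with the mode rather than a real DArray or SharedArray:

```julia
abstract type ToyArrayDist end

# Hypothetical stand-in for DimDist: array size, pieces per dimension, and a mode flag.
struct ToyDimDist{N} <: ToyArrayDist
    dims::NTuple{N,Int}
    nparts::NTuple{N,Int}
    mode::Symbol               # :distributed or :shared, in the spirit of dmode
end

# A fill variant that takes the distribution; zeros and ones fall out of it.
toyfill(v, d::ToyDimDist) = (mode = d.mode, data = fill(v, d.dims))

d = ToyDimDist((4, 6), (1, 3), :shared)
z = toyfill(0.0, d)            # plays the role of zeros for this distribution
```

This shows why only fill-style constructors need a distribution-aware method: the value argument covers what separate zeros and ones functions would otherwise provide.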

Stuff I am not happy about:

In case of a TileDist (when implemented), localpart and myindexes will return an array of subparts / tile indexes, as opposed to a single entity for DimDist.
In the non-default case, where the distributed array is only created on some of the workers, the number of partitions has to be specified in the DimDist constructor too (in addition to the dprocs keyword arg in the DArray/SharedArray constructors).

amitmurthy · 2014-01-06T03:09:41Z

Bump @JeffBezanson , @timholy .

Any thoughts?

timholy · 2014-01-11T12:13:20Z

If I were writing a multiplication algorithm I would just ignore the pre-defined ArrayDist, and have the algorithm implement its own way of breaking up the array---the right partitioning scheme is specific to the algorithm, not the array. (See https://github.com/JuliaLang/julia/blob/master/base/linalg/matmul.jl#L380-L407 for an example.) So I'm not even convinced we need a TileDist. That cuts out most of your remaining concerns, I think.

I don't really have anything more to add.

ViralBShah · 2014-01-11T14:52:14Z

Also, we could have a package to have all the fancy array distributions and experiment with them. I would really like to have only the basic and simple stuff in Base.

amitmurthy · 2014-01-20T08:49:17Z

Will submit separate PRs for ArrayDist and DArray changes. Hence closing this.

ViralBShah reviewed Nov 29, 2013

kmsquire mentioned this pull request Dec 6, 2013

sortperm has poor performance #939

Open

timholy mentioned this pull request Dec 12, 2013

support shared-memory parallelism (multithreading) #1790

Closed

amitmurthy added 4 commits December 31, 2013 13:12

Introduced ArrayDist and SharedArray

f502e7c

fix conflict while rebasing

ea48b6a

Added doc. Checking for isbits for SharedArrays

39c0806

revised as per discussion

cf2d18c

amitmurthy mentioned this pull request Jan 13, 2014

SharedArray - take 2 #5380

Merged

amitmurthy closed this Jan 20, 2014
