
Added SharedArray #4939

Closed
wants to merge 4 commits into from

Conversation

amitmurthy
Contributor

Note: This started as "RFC: WIP on adding shmem support to DArrays". It has evolved considerably, and it was finally decided to implement SharedArray as a separate type.

  • Inspired by Tim Holy's SharedArrays PR.
  • The idea here is to add shmem support to DArrays, using shared memory where possible and defaulting to the current DArray behaviour otherwise. Users can thus write code requesting DArrays with shmem=true (see the usage sketch after this list), and it will work even in situations where shmem is not possible, albeit more slowly.
  • new kw args shmem=false, safe_r=false, safe_w=true in the DArray constructors.
  • If the user is on Windows or if the requested procs() are not all on the same host, shmem will not be used.
  • If shmem=true and safe_r=true then for all practical purposes, performance will be similar to existing DArray since requests will be remotely fulfilled via a remotecall into the process holding the relevant chunk.
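
For concreteness, a minimal usage sketch (dzeros and the shmem keyword are this PR's API; the fallback to a regular DArray is as described above):

julia> addprocs(3)                          # workers on the same host
julia> d = dzeros(1000, 1000; shmem=true)   # backed by a shared-memory segment
julia> d2 = dzeros(1000, 1000)              # regular chunked DArray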

TODO:

  • setindex! for DArray in general
  • tests for DArray / DArray with shmem

Would like feedback on whether we should go down this path, keep shmem support distinct like Tim's SharedArrays, or do both.

@ivarne
Member

ivarne commented Nov 26, 2013

This is a feature that might be worthy of a mention in the NEWS.md file.

@timholy
Member

timholy commented Nov 26, 2013

Amit, thanks for tackling this.

One of my initial concerns about incorporating this into DArray was that A[i,j] can be guaranteed to be fast only if you're using shared memory. I was concerned that there might be circumstances where the choice of the best algorithm would depend on its internal representation.

However, it now occurs to me that if needed perhaps we could add another parameter to DArray:

type DArray{T,N,A,isShared}

where that last parameter is an integer, 0 or 1. Then one could perform dispatch on it.
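
For illustration, dispatch on that parameter might look like the following sketch (the first method is the form this PR itself later uses; the owner and chunk_index helpers in the second are hypothetical):

# shared case: a direct lookup into the locally mapped segment
getindex{T,N,A}(d::DArray{T,N,A,1}, i::Int) = getindex(d.local_shmmap, i)
# non-shared case: fetch via the process owning the chunk
getindex{T,N,A}(d::DArray{T,N,A,0}, i::Int) =
    remotecall_fetch(owner(d, i), (dd, ii) -> localpart(dd)[chunk_index(dd, ii)], d, i)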

Now that I think this through, I suspect the best approach might be the following: (1) merge this rather than my SharedArrays (it's more flexible); and (2) if/when my concern manifests, we can extend the DArray type parameters as suggested above (but don't do this until it becomes an issue).

Thoughts?

@amitmurthy
Contributor Author

Your last comment was on the first commit. If you look at commit 24dc0b6, you will notice that d.local_shmmap will always have the correct parameterized type when it is used. I didn't understand what you meant by "where we'll need to use dispatch".

@amitmurthy
Contributor Author

Ah! OK, I get it. Will remove the Union from d.local_shmmap

@timholy
Member

timholy commented Nov 26, 2013

Ah! OK, I get it. Will remove the Union from d.local_shmmap

About to write a response, but you got there first...

A[i,j], when A is shared, needs to be no more complicated than a pointer-lookup (which is what it is when A is a plain Array). Otherwise there's too much overhead to referencing individual elements of the array.

@ViralBShah
Member

This is a really nice optimization to have.

@ViralBShah
Member

You probably haven't tried this on a mac yet, but I get:

julia> a = dzeros(100,100, shmem=true)
ERROR: shm_open() failed
 in shm_mmap_array at darray.jl:461
 in DArray at darray.jl:106
 in DArray at darray.jl:169
 in DArray at darray.jl:171
 in dzeros at darray.jl:279
 in dzeros at darray.jl:280

@amitmurthy
Contributor Author

cc: @tanmaykm

@timholy
Member

timholy commented Nov 27, 2013

@amitmurthy, here's an example of where I think we should go with this. I should have started by forking your shmem branch to my GitHub account, but instead I did it all locally and then pushed to my account. I seem to be having trouble figuring out how to get GitHub to set your repository as the base fork for my PR (it doesn't list it in the drop-down box), so perhaps it's best to see this commit: https://github.com/timholy/julia/commit/262d5e9d026e3f53f3e42dd5f2e065d983e9284c

My not-so-secret ambition is to get to the point where, for any sizable chunk of data, you might as well use a DArray as an Array. Then parallelism can start invading all kinds of algorithms (like sum, fill!, etc) in base Julia.

Test script:

A = ones(1000, 1100)
sum(A)
@time sum(A)
D = dones(1000, 1100; shmem=true)   # or set to false
sum(D)
@time sum(D)

Note that sum is not yet parallel; this simply measures whether there is any penalty for using a DArray.

With shmem=false (i.e., like what we had before Amit's change):

julia> include("/tmp/testdarray.jl")
elapsed time: 0.002323238 seconds (64 bytes allocated)
elapsed time: 3.764029668 seconds (793688032 bytes allocated)
1.1e6

With shmem=true:

julia> include("/tmp/testdarray.jl")
elapsed time: 0.003227348 seconds (64 bytes allocated)
elapsed time: 0.097602358 seconds (35201472 bytes allocated)
1.1e6

Much, much faster! But still far too slow compared to plain arrays (30x slower).

With the changes in that commit to my own fork:

julia> include("/tmp/testdarray.jl")
elapsed time: 0.002355822 seconds (64 bytes allocated)
elapsed time: 0.001969175 seconds (64 bytes allocated)
1.1e6

Here, there is no gap between Array and DArray. (The difference between the two appears to be noise, it's not that the DArray is faster.)

I'm not sure I understand what safe_r and friends mean, so I just did the minimum needed to illustrate the point.

@amitmurthy
Contributor Author

This is a really nice lesson for me in using dispatch to improve performance. Will incorporate it.

Thanks.

@staticfloat
Member

@timholy when you go to open a pull request, you can click "edit" on the righthand side of the base and head repos. The dropdown box next to the "base fork" allows you to type in whatever user you want to compare. Or, if you're impatient, you can just manually edit the URL in your browser to do the comparison you want. Like this.

@timholy
Member

timholy commented Nov 27, 2013

Hmm, when I tried typing in the "filter" box amitmurthy/julia, it didn't accept that input. I like the URL solution, thanks!

@staticfloat
Member

Just type amitmurthy. It won't let you base off a different repo entirely.

@amitmurthy
Contributor Author

Tim,

safe_r was meant to serialize (as in make non-concurrent) reads via the worker holding the appropriate chunk. But since shmem=true with safe_r=true has the same behavior as a non-shmem DArray, I think I'll drop it altogether. shmem=true would mean that reads can be concurrent for any index into the array across all workers.

safe_w means safe write, i.e. serializing writes to a chunk via the appropriate worker. I'll retain this.

One more thing:
getindex{T,N,A}(d::DArray{T,N,A,1}, i::Int) = getindex(d.local_shmmap, i) is what made the code efficient. The current DArray implementation allows us to serialize (as in into a stream) DArray objects onto workers that do not have any chunks of the darray locally and it still works. In order to support the same, is there a way to deserialize d::DArray{T,N,A,1} as d::DArray{T,N,A,0} on workers that have not mapped the shmem segment?

@amitmurthy
Contributor Author

OK. I think I figured out how I can do it. On workers where it is not possible to map the shmem, darray's deserialize will make a copy of the deserialized d::DArray{T,N,A,1} as d::DArray{T,N,A,0} and return that one.
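
A rough sketch of that idea (the field and helper names below are assumptions, not this PR's actual code):

function deserialize{T,N,A}(s, t::Type{DArray{T,N,A,1}})
    d = default_deserialize(s, t)   # hypothetical: reconstruct the fields
    if can_map_shmem(d)             # hypothetical: is the segment mappable here?
        return d
    end
    # same fields, retyped as the non-shmem variant, so all access goes
    # through remotecalls to the chunk owners as before
    DArray{T,N,A,0}(d.dims, d.chunks, d.pmap, d.indexes)
end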

@amitmurthy
Contributor Author

Hey Tim, your suggestions have been incorporated.

A DArray on a worker without the shmem mapping works as a non-shmem DArray.

julia> addprocs(3)
julia> A = dones(100, 110);
julia> sum(A);
julia> @time sum(A);
elapsed time: 2.56274503 seconds (63934396 bytes allocated)

julia> D = dones(100, 110; shmem=true);   # or set to false
julia> sum(D);
julia> @time sum(D);
elapsed time: 1.3836e-5 seconds (64 bytes allocated)

julia> addprocs(1)

julia> remotecall_fetch(4, d->sum(d), D);
julia> @time remotecall_fetch(4, d->begin sum(d); @time sum(d) end, D)
        From worker 4:  elapsed time: 1.3544e-5 seconds (6740 bytes allocated)
elapsed time: 0.113249829 seconds (1346232 bytes allocated)

julia> remotecall_fetch(5, d->sum(d), D);
julia> @time remotecall_fetch(5, d->begin @time sum(d) end, D)
        From worker 5:  elapsed time: 2.620149135 seconds (63307272 bytes allocated)
elapsed time: 2.722422997 seconds (83304 bytes allocated)

@ViralBShah
Member

This is really cool. I can confirm that it works on OS X for me.

@ViralBShah
Member

I would prefer that we use shmem by default when it is applicable. The keyword argument is certainly useful to have for cases where you want to turn it off for debugging purposes. Are safe_r and safe_w gone now?

@@ -1665,3 +1665,16 @@ function interrupt(pids::AbstractVector=workers())
end
end
end


function islocalconnection(id)
Member


Should this always return false on Windows, or is this implementation likely to work on Windows?

Contributor Author


It should work on Windows too. It just checks whether the worker's TCP connection is to the same host. It is a non-exported function used only by DArray in shmem mode.
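
A sketch of the kind of check described (the internals named here are assumptions, not the PR's actual code):

function islocalconnection(id)
    id == myid() && return true
    w = map_pid_wrkr[id]             # assumed: pid -> Worker lookup table
    w.bind_addr == local_bind_addr() # assumed: compare the connection's host to ours
end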

@amitmurthy
Contributor Author

safe_r is gone.
safe_w exists.
I have yet to put in setindex!

@ViralBShah
Member

Can we call it safe_write? I presume you are providing this for cases where we need to avoid unordered simultaneous writes, by serializing them through the process that owns the memory?

@amitmurthy
Contributor Author

Yes, that is the reason. safe_write is fine. By default it will be false in the shmem case. In non-shmem, it is necessarily true.
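
What safe_write=true implies, sketched (chunk_owner is a hypothetical lookup; d.local_shmmap is the locally mapped segment discussed above):

# route each write through the process owning the chunk, so simultaneous
# writes to the same chunk are serialized on one process (sketch)
setindex_safe!(d, v, i::Int) =
    remotecall_wait(chunk_owner(d, i),
                    (dd, vv, ii) -> (dd.local_shmmap[ii] = vv; nothing), d, v, i)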

@ViralBShah
Member

How about mapping an Array as a shmem segment across multiple processes? We could even map existing sparse matrices across various processes. This would then make it possible for operations that parallelize easily to work on different parts of the array.

@amitmurthy
Contributor Author

It will still involve copying data from an existing Array into the shmem segment.

@ViralBShah
Member

I think it would be ok to copy it over the first time.

@amitmurthy
Contributor Author

Here is the current status:

  • DArray constructors have two additional keyword arguments: shmem=false and safe_write=false. With shmem=true, the array is created in shared memory. If we are unable to create it in shared memory, currently it just gives a warning and defaults to a non-shmem DArray. If safe_write=true, all writes are serialized via the respective chunk owners.

Question: Should we just give an error if shmem cannot be supported, given that the performance difference between shmem and non-shmem is so large?

  • In case of shmem, the init function is called to allocate the entire array, which is then copied onto the shmem segment. This has been done in order to have the same expectation of the init function for shmem as well as non-shmem.

Any suggestions on avoiding this?

@ViralBShah, DArray already had a distribute function to distribute a regular array and return a DArray. If you pass shmem=true, it will now just copy it directly.

  • basic setindex! (using Int indexes only) is available. But since the whole idea of DArray is that computation is distributed, and folks are expected to work off localparts using myindexes (see the sketch below), it does not make sense to support the whole gamut of setindex! as defined for Array. The localpart is of type Array anyway.
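
The working style that bullet refers to, sketched (localpart, myindexes, procs, @sync and @spawnat are existing API; the loop body is illustrative):

@sync for p in procs(d)
    @spawnat p begin
        lp = localpart(d)        # this worker's chunk, a plain Array
        for i = 1:length(lp)
            lp[i] = 2 * lp[i]    # ordinary Array setindex! on the chunk
        end
    end
end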

@timholy, w.r.t. your desire that "...you might as well use a DArray as an Array", it may not be completely possible. The following questions arise:

  • For the DArray{T,N,A,1}, i.e., using shmem, we can directly map many Array functions. However, unless we do this for the non-shmem cases too - a bit complicated given the chunked representation, and with poor performance - there will be a disparity between DArray{T,N,A,1} and DArray{T,N,A,0}. Is this OK?
  • Array functions which result in the size of the array changing cannot be supported in DArray
  • Some Array methods return new arrays (e.g., rotl90, rotr90, etc.) - should similar functions in DArray return an Array or new DArrays?

cc: @JeffBezanson : Can you have a look at this?

@timholy
Member

timholy commented Nov 29, 2013

Amit, this looks great. Some questions and responses:

On workers where it is not possible to map the shmem, darray's deserialize will make a copy of the deserialized d::DArray{T,N,A,1} as d::DArray{T,N,A,0} and return that one.

Is this something that the caller can determine in advance? We don't want to get into a situation like the following: suppose there are 5 processes, 4 can access the shmem. To parallelize an operation, the caller farms out a chunk to each of the processes. The 4 of them complete their chunk with blinding speed, but the 5th one is very slow, and this makes the entire operation slow. I'd much rather assign the task to just the 4 that can do shmem, and ignore the 5th altogether.

In other words, in my view a DArray that is declared shmem should not include processes that can't use it.

Question: Should we just give an error if shmem cannot be supported, given that the performance difference between shmem and non-shmem is so large?

I'd favor this. People can always wrap in a try/catch block if they want to write code that won't fail. But is Windows the only platform that doesn't yet support this? I'd favor getting Windows working, too, if possible, and then perhaps this won't even be something we have to worry about. (Although we should plan for Julia to spread to other platforms too, like Android.) I'd be happy to do my best to help make this work (I don't know Windows very well at all, but I could at least give it a try, or perhaps there are others who might tackle this).

In case of shmem, the init function is called to allocate the entire array which is then copied onto the shmem segment. This has been done in order to have the same expectation from the init function for shmem as well as non-shmem. Any suggestions on avoiding this?

I would favor changing the convention here. I guess the key question is whether DArray is supposed to allow chunks that are AbstractArrays but not Arrays. If so, then we have several issues to fix, including adding another parameter to DArray that represents the type of the local_shmmap. (@JeffBezanson, your input here is desired.) We might then need to pass two functions, one to allocate and one to initialize, and use just the latter in the case of shmem.

But since the whole idea of DArray is that computation will be distributed, and folks are expected to work off localparts using myindexes, it does not make sense to support the whole gamut of setindex! as defined for Array. The localpart is of type Array anyway.

My view on this is a little different. There is going to be some overhead to farming a task out to workers. It might be faster to run my fill! operation from a single process (it might be faster than paying the overhead of starting up workers and synchronizing their completion). But then I'd like to distribute my maximum-intensity-projection algorithm to multiple processes (because it's a slow operation, and will benefit from parallelization), and finally I'll want to save the result to disk using a single process (because it's easier to think about I/O from a single process, and I'm lazy). In SharedArray, what I was aiming for was a type that can work seamlessly in both of these modes; if DArray is going to supersede it, perhaps we need to take seriously the possibility that a DArray will be used exactly like a regular Array.

w.r.t. your desire of "...you might as well use a DArray as an Array", it may not be completely possible... there will be a disparity between DArray{T,N,A,1} and DArray{T,N,A,0}. Is this OK?

As far as I'm concerned, yes. Right now, outside of libraries like OpenBLAS and FFTW, julia is not really doing much to exploit multi-core machines, and my main interest is changing that. I'd be very content with providing good base library support for just the shared-memory versions, and declaring that in other cases you're on your own (likely through a package).

Array functions which result in the size of the array changing cannot be supported in DArray

That's OK. For reductions, etc, we'll probably want to have versions where the output can be pre-allocated.

Some Array methods return new arrays (e.g., rotl90, rotr90, etc) - Should similar functions in DArray return an Array or new DArrays?

I'd say DArrays. We could define a similar function for DArray.

@amitmurthy
Contributor Author

Is this something that the caller can determine in advance?

Yes. Consider a hypothetical situation like this:

addprocs(4)
d = dzeros(100,100; shmem=true) # d created on pid 1, also mapped on 2,3,4,5

addprocs("foo@some_other_host") # let's say this creates pid 6
remotecall_fetch(6, x->sum(x), d)

The above works with a regular DArray. But in the shmem case, while it currently works, I think I'll change it to just throw an error.

In other words, in my view a DArray that is declared shmem should not include processes that can't use it.

The existing DArray implementation supports accessing the DArray even from processes that were not involved in the construction process. But I think you are right; we will disallow the same in the shmem case.

Question: Should we just give an error if shmem cannot be supported, given that the performance difference between shmem and non-shmem is so large?

I'd favor this...

There are two situations. One is Windows, where DArray shmem support will hopefully come sooner rather than later. The other is where the programmer inadvertently does d=dzeros((100,100), [2,3,4,5,6]; shmem=true) and process 6 happens to be on some_other_host. I'll change both these situations to throw errors instead of defaulting to a non-shmem DArray.

We might then need to pass two functions, one to allocate and one to initialize, and use just the latter in the case of shmem.

Actually, if we decide to disallow any conversion between shmem and non-shmem DArrays, this is a non-issue. In the shmem DArray case, the init function will just be expected to initialize and not allocate. We just document the same.
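
Side by side, the two conventions under discussion would look roughly like this (a sketch; the shmem form's signature is an assumption, not settled API):

# non-shmem today: init receives index ranges, allocates and returns the chunk
d = DArray(I -> fill(1.0, map(length, I)), (100,100))
# shmem (proposed): the segment is pre-allocated; init only fills it
d = DArray(A -> fill!(A, 1.0), (100,100); shmem=true)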

My view on this is a little different...

Again, if a shmem DArray will never be used as a non-shmem DArray and vice versa, this is trivial. The creating process has full visibility into the shmem segment, while the workers have visibility into the full segment and also know which part they need to work on. Mapping all of the Array setindex! functions only in the context of DArray{T,N,A,1} is simple.

The complexity is in supporting the same for DArray{T,N,A,0}, but since performance will be relatively much poorer, I don't know if we should do it right away. Maybe complete non-shmem setindex! support can be added independently later.

@amitmurthy
Contributor Author

Actually, the eltype is determined from the init function. So in the shmem case, if the init function does only initialization, we will have to get the eltype from elsewhere. I am thinking maybe the shmem kwarg could be shmem::Union(Type, Bool)=false ...

@JeffBezanson
Member

I like this a lot! I basically agree with @timholy's last comment, and I'd say this discussion is going in the right direction.

DArrays do currently support any type of AbstractArray as chunks. It might not be possible to support this for shmem arrays, since Array is the only thing we know how to allocate in a shmem segment.

Switching init functions to only initialize and not allocate might be the right thing. This would allow the type information to flow "top down": asking for a DArray{Float64,2,Array{Float64,2}} would determine everything, which might make it easier to write init functions. For example init functions could use localpart, which could give a more uniform API.
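
A sketch of that "top down" style (the do-block constructor form is an assumption; localpart and fill! are existing functions):

# hypothetical: the requested type determines eltype and chunk type;
# init receives the allocated array and only fills its local part
d = DArray{Float64,2,Array{Float64,2}}((1000,1000)) do A
    fill!(localpart(A), 0.0)
end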

This implementation seems not to use the correct memory layout. It looks like it is always basically column distributed. One thing that would work would be to make each chunk a SubArray based on its indexes.

Based on how this code is evolving, it seems to make more sense for shared arrays to be a separate type. The shared version of DArray has all its own fields, its own case in the constructor, and is dispatched differently for many functions. If we do that though, there should be a common ArrayDist type shared by both. By the way, booleans can be type parameters.

@StefanKarpinski
Member

That seems like a pretty reasonable idea.

@timholy
Member

timholy commented Dec 4, 2013

I sprinkled in a couple of line comments, but overall this is looking very good. I agree with @StefanKarpinski about the concern re the proliferation of constructor names. However, is there any concern about the possibility of wanting to construct an array-of-arrays? Since zero(Array{Float64, 1}) is not defined, I doubt this is a problem, but I thought I should raise it.

Aside from these, it would probably be best to add some tests before merging, as these might catch problems (I haven't actually tried any of the code myself yet).

@amitmurthy
Contributor Author

  • d* deprecated
  • helper constructors (ones, zeros, et al) also dispatch based on DArray or SharedArray type argument
  • added a few tests
  • cleaned up based on Tim's comments
  • documentation to be added

@timholy
Member

timholy commented Dec 4, 2013

From my perspective, I'd say this seems fine to merge; since it's not a breaking feature, we can always continue to improve this in base. Perhaps the only reason to hold off might be Windows support. I have a Windows machine at work I can develop & test on, but it won't be before the end of the week.

@ViralBShah
Member

I am ok with merging this and continuing further development in new PRs. Windows support can always be added later.

@amitmurthy
Contributor Author

Would like @JeffBezanson to have a look once before merging.

@timholy
Member

timholy commented Dec 9, 2013

As an update, Amit has added documentation and a few other tweaks. And, so that there's no duplication of effort: I've got a draft implementation for Windows done; it would need someone with a build environment to test it.

One last thing we could do is start writing versions of algorithms that use SharedArrays. If folks think that would be a good thing to do before merging to master, I'd strongly advocate turning this into a branch in julia, rather than leaving it in Amit's fork.

@ViralBShah
Member

I think we should merge this in order to encourage wider usage and testing. As we write some algorithms, we can figure out what tweaks are required.

Bump @JeffBezanson

@StefanKarpinski
Member

I'm largely ok with that, but with the caveat that it be announced as an experimental feature that may not make it into 0.3 in the current form. Let's still see what @JeffBezanson has to say.

@timholy
Member

timholy commented Dec 12, 2013

A discussion in #1790 makes me question whether we really want/need ArrayDist as part of this type. First, each process has complete access to the entire array, with equal efficiency at all indexes; the "local chunks" are far less important here than for DArrays. Second, sometimes you might want to partition a single SharedArray in different ways for different algorithms. For example, imagine in step 1 of your algorithm you multiply two matrices, and in step 2 you sum all the entries. Cache-friendliness tells you that the appropriate way to partition the matrices is by "tiles" (as in gemm) for step 1, but by columns for step 2.

Hence, perhaps the algorithm, rather than the container, needs to be in charge of the partitioning---as long as you know how many processes are working on the array, and they're all running the same function, there is no danger that they will step on each others' toes unless the algorithm is badly-written. We could leave ArrayDist as a "hint" for algorithms that don't really care, but I'm not convinced it's essential here.
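
For example, a parallel sum might carve up a shared array by columns on the fly, ignoring any built-in distribution (a sketch; it assumes S is shared so every worker can index all of it):

# definitions must exist on all workers
@everywhere function colsum(S, r)
    s = 0.0
    for j in r, i in 1:size(S,1)
        s += S[i,j]
    end
    s
end
# each worker takes its own contiguous block of columns: the algorithm,
# not the container, decides the partition
n = size(S, 2); nw = nworkers()
refs = [@spawnat p colsum(S, (div((w-1)*n, nw)+1):div(w*n, nw))
        for (w, p) in enumerate(workers())]
total = sum(map(fetch, refs))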

@amitmurthy
Contributor Author

I understand and agree, which is why I documented the default partitioning provided by ArrayDist as:

"While each worker has full visibility into the SharedArray, local chunks in SharedArrays
(of type SubArray) may be used to partition work across paticipating workers."

We probably should be even more explicit about this.

Also, ArrayDist can evolve to support more than one partitioning scheme - and, at least in the case of SharedArray, the user can switch schemes in the middle of a run too.

@amitmurthy
Contributor Author

Bump @JeffBezanson .

If there are any reservations on adding this to base at this time, I can always put it out as a standalone package for now.

@JeffBezanson
Member

This is a very nice PR.

The change to multi.jl looks generally useful; it should be done separately.

I think ArrayDist should only describe an array chunking scheme, and not hold remote references or anything like that. Part of the purpose of it is to write things like similar(A, distribution(B)), which would make some kind of array with the same distribution as B, where distribution returns an ArrayDist. ArrayDist is like a Dims tuple.

I understand wanting to get rid of dzeros etc., but other methods like these take an element type, not a container type. In a generic context rand(T, n) would be meaningless. We need to have a better general solution than separate functions for every way you might want to initialize an array (rand, randn, ones, zeros, trues, falses, infs, nans, threes, fours?).

Currently rand(T, size(A)) works, so maybe rand(T, ArrayDist(...)) makes sense. Of course that doesn't handle shared vs. distributed though.

@JeffBezanson
Member

Also functions like zeros don't make as much sense for distributed arrays. It implies that you're going to initialize by assigning to the array in a separate step, which is not a good idea.

@ViralBShah
Copy link
Member

It may not be uncommon to allocate memory for a distributed array with zeros and then work on the localpart for some initialization. Although this is almost always avoidable, not having zeros will just lead to lots of questions about why it is not there.

@JeffBezanson
Member

I'm fine with zeros existing for DArrays if we can come up with a sane interface for it. So far the only thing I can think of is zeros(eltype, size, distribution), or zeros(n, m*p) :)

@ViralBShah
Member

Bump.

@amitmurthy
Contributor Author

Updated the PR and tried to get a good abstraction based on the discussion so far - though I am not yet fully satisfied with it.

Anyway, putting it out for further input:

  • ArrayDist is an abstract type with DimDist a concrete subtype. A tiled distribution (as suggested by Tim for certain workloads) may be implemented as a TileDist in the future.
  • DimDist has a field dmode (distribution mode) which specifies if the convenience constructors should create a DArray or a SharedArray. It is a cop-out, but I couldn't think of anything cleaner.
  • The d* functions are still deprecated. Only fill, rand and randn variants that accept a DimDist have been defined (see the sketch after this list). zeros, ones and their ilk can be served via fill, at least for distributed/shared arrays.
  • distribute has been deprecated in favor of similar.
  • The procs argument in the DArray/SharedArray constructors is now a keyword argument.
  • Doc updates are out of sync - will update them once we get a fix on the code.
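
Call forms implied by the list above, sketched (assumed from the description, not verified against the branch):

dd = DimDist((1000, 1000))   # dimension-wise distribution; dmode picks DArray vs SharedArray
A = fill(1.0, dd)            # in place of dones / dfill
B = rand(Float64, dd)        # in place of drand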

Stuff I am not happy about:

  • In case of a TileDist (when implemented), localpart and myindexes will return an array of subparts / tile indexes, as opposed to a single entity for DimDist.
  • In the non-default case, where the distributed array is only created on some of the workers, the number of partitions has to be specified in the DimDist constructor too (in addition to the dprocs keyword arg in the DArray/SharedArray constructors).

@amitmurthy
Contributor Author

Bump @JeffBezanson , @timholy .

Any thoughts?

@timholy
Member

timholy commented Jan 11, 2014

If I were writing a multiplication algorithm I would just ignore the pre-defined ArrayDist, and have the algorithm implement its own way of breaking up the array---the right partitioning scheme is specific to the algorithm, not the array. (See https://github.com/JuliaLang/julia/blob/master/base/linalg/matmul.jl#L380-L407 for an example.) So I'm not even convinced we need a TileDist. That cuts out most of your remaining concerns, I think.

I don't really have anything more to add.

@ViralBShah
Member

Also, we could have a package to have all the fancy array distributions and experiment with them. I would really like to have only the basic and simple stuff in Base.

@amitmurthy amitmurthy mentioned this pull request Jan 13, 2014
@amitmurthy
Contributor Author

Will submit separate PRs for ArrayDist and DArray changes. Hence closing this.

@amitmurthy amitmurthy closed this Jan 20, 2014