-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added SharedArray #4939
Added SharedArray #4939
Conversation
This is a feature that might be worthy of a mention in the NEWS.md file. |
Amit, thanks for tackling this. One of my initial concerns about incorporating this into However, it now occurs to me that if needed perhaps we could add another parameter to
where that last parameter is an integer, 0 or 1. Then one could perform dispatch on it. Now that I think this through, I suspect the best approach might be the following: (1) merge this rather than my SharedArrays (it's more flexible); and (2) if/when my concern manifests, we can extend the Thoughts? |
Your last comment was on the first commit. If you look at commit 24dc0b6, you will notice that |
Ah! OK, I get it. Will remove the |
About to write a response, but you got there first...
|
This is a really nice optimization to have. |
You probably haven't tried this on a mac yet, but I get:
|
cc: @tanmaykm |
@amitmurthy, here's an example of where I think we should go with this. I should have started by forking your shmem branch to my github account, but instead I did it all locally then pushed to my account. I seem to be having trouble figuring out how to get GitHub to set your repository as the base fork for my PR (it doesn't list it in the drop-down box), so perhaps best is to see this commit: https://github.com/timholy/julia/commit/262d5e9d026e3f53f3e42dd5f2e065d983e9284c My not-so-secret ambition is to get to the point where, for any sizable chunk of data, you might as well use a Test script:
Note that With
With
Much, much faster! But still far too slow compared to plain arrays (30x slower). With the changes in that commit to my own fork:
Here, there is no gap between Array and DArray. (The difference between the two appears to be noise, it's not that the DArray is faster.) I'm not sure I understand what the |
This is really a nice learning for me on using dispatch for improving performance. Will incorporate it. Thanks. |
@timholy when you go to open a pull request, you can click "edit" on the righthand side of the base and head repos. The dropdown box next to the "base fork" allows you to type in whatever user you want to compare. Or, if you're impatient, you can just manually edit the URL in your browser to do the comparison you want. Like this. |
Hmm, when I tried typing in the "filter" box |
Just type amitmurthy. It won't let you base off a different repo entirely.
|
Tim,
One more thing: |
OK. I think I figured out how I can do it. On workers where it is not possible to map the shmem, darray's |
Hey Tim, your suggestions have been incorporated. Darray on a worker without the shmem mapping works as a non-shmem darray.
|
This is really cool. I can confirm that it works on OS X for me. |
I would prefer that we use shmem by default when it is applicable. The keyword argument is certainly useful to have for cases where you want to turn it off for debugging purposes. Are |
@@ -1665,3 +1665,16 @@ function interrupt(pids::AbstractVector=workers()) | |||
end | |||
end | |||
end | |||
|
|||
|
|||
function islocalconnection(id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this always return false on Windows, or is this implementation likely to work on Windows?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should work on windows too. It just checks if the worker TCP connection is on the same host. A non-exported function only used by DArray in shmem mode.
safe_r is gone. |
Can we call it |
Yes, that is the reason. |
How about mapping |
It will still involve copying data from an existing |
I think it would be ok to copy it over the first time. |
Here is the current status:
Question: Should we just give an error if shmem cannot be supported given that that the performance differences between shmem and non-shmem is so large.
Any suggestions on avoiding this? @ViralBShah, DArray already had a
@timholy , w.r.t to your desire of "...you might as well use a DArray as an Array.", it may not completely possible. The following questions arise:
cc: @JeffBezanson : Can you have a look at this? |
Amit, this looks great. Some questions and responses:
Is this something that the caller can determine in advance? We don't want to get into a situation like the following: suppose there are 5 processes, 4 can access the shmem. To parallelize an operation, the caller farms out a chunk to each of the processes. The 4 of them complete their chunk with blinding speed, but the 5th one is very slow, and this makes the entire operation slow. I'd much rather assign the task to just the 4 that can do shmem, and ignore the 5th altogether. In other words, in my view a DArray that is declared shmem should not include processes that can't use it.
I'd favor this. People can always wrap in a
I would favor changing the convention here. I guess the key question is whether
My view on this is a little different. There is going to be some overhead to farming a task out to workers. It might be faster to run my
As far as I'm concerned, yes. Right now, outside of libraries like OpenBLAS and FFTW, julia is not really doing much to exploit multi-core machines, and my main interest is changing that. I'd be very content with providing good base library support for just the shared-memory versions, and declaring that in other cases you're on your own (likely through a package).
That's OK. For reductions, etc, we'll probably want to have versions where the output can be pre-allocated.
I'd say DArrays. We could define a |
The above works with a regular DArray. But in the shmem case, while it
The existing DArray implementation supports accessing the darray even from
Again if a shmem darray will never be used as non-shmem darray and The complexity is in supporting the same for |
Actually the |
I like this a lot! I basically agree with @timholy 's last comment and I'd say this discussion is going in the right direction. DArrays do currently support any type of Switching This implementation seems not to use the correct memory layout. It looks like it is always basically column distributed. One thing that would work would be to make each chunk a Based on how this code is evolving, it seems to make more sense for shared arrays to be a separate type. The shared version of DArray has all its own fields, its own case in the constructor, and is dispatched differently for many functions. If we do that though, there should be a common |
That seems like a pretty reasonable idea. |
I sprinkled in a couple of line comments, but overall this is looking very good. I agree with @StefanKarpinski about the concern re the proliferation of constructor names. However, is there any concern about the possibility of wanting to construct an array-of-arrays? Since Aside from these, it would probably be best to add some tests before merging, as these might catch problems (I haven't actually tried any of the code myself yet). |
|
From my perspective, I'd say this seems fine to merge; since it's not a breaking feature, we can always continue to improve this in base. Perhaps the only reason to hold off might be Windows support. I have a Windows machine at work I can develop & test on, but it won't be before the end of the week. |
I am ok with merging this and continuing further development in new PRs. Windows support can always be added later. |
Would like @JeffBezanson to have a look once before merging. |
As an update, Amit has added documentation and a few other tweaks. And so there's no duplication of effort, I've got a draft implementation for Windows done---it would need someone with a build environment to test it. One last thing we could do is start writing versions of algorithms that use |
I think we should merge this in order to encourage wider usage and testing. As we write some algorithms, we can figure out what tweaks are required. Bump @JeffBezanson |
I'm largely ok with that, but with the caveat that it be announced as an experimental feature that may not make it into 0.3 in the current form. Let's still see what @JeffBezanson has to say. |
A discussion in #1790 makes me question whether we really want/need Hence, perhaps the algorithm, rather than the container, needs to be in charge of the partitioning---as long as you know how many processes are working on the array, and they're all running the same function, there is no danger that they will step on each others' toes unless the algorithm is badly-written. We could leave |
I understand and agree. Which is why I documented the default partitioning provided by ArrayDist as "While each worker has full visibility into the SharedArray, local chunks in SharedArrays We probably should be even more explicit about this. Also, |
Bump @JeffBezanson . If there are any reservations on adding this to base at this time, I can always put it out as a standalone package for now. |
This is a very nice PR. The change to multi.jl looks generally useful; it should be done separately. I think I understand wanting to get rid of Currently |
Also functions like |
It may not be uncommon to allocate memory for a distributed array with |
I'm fine with |
Bump. |
Updated the PR and tried to get a good abstraction based on the discussion so far - though I am not yet fully satisfied with it. Anyways, putting it out for further inputs:
Stuff I am not happy about:
|
Bump @JeffBezanson , @timholy . Any thoughts? |
If I were writing a multiplication algorithm I would just ignore the pre-defined I don't really have anything more to add. |
Also, we could have a package to have all the fancy array distributions and experiment with them. I would really like to have only the basic and simple stuff in Base. |
Will submit separate PRs for ArrayDist and DArray changes. Hence closing this. |
Note: This was originally
RFC : WIP on adding shmem support to DArrays
. Has evolved considerably and finally it was decided to implementSharedArray
as a separate type.shmem=false
,safe_r=false
,safe_w=true
in the DArray constructors.shmem
will not be used.shmem=true
andsafe_r=true
then for all practical purposes, performance will be similar to existing DArray since requests will be remotely fulfilled via a remotecall into the process holding the relevant chunk.TODO:
setindex
for DArray in generalWould like feedback whether we should go down this path, or keep shmem support distinct like Tim's SharedArrays or do both.