Cuda acceleration #112
Comments
Great to hear that you're interested in improving the performance! That definitely sounds feasible, and tackling it incrementally, one part at a time, is probably the best way forward. The critical functions would be |
Thanks, time for me to learn Cuda then :D |
Just in case you're generally looking for speedups and are not yet committed to CUDA: it's probably worth having a look at SIMD intrinsics (SSE) as well. These changes could be less intrusive than switching certain parts to CUDA. |
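To give a feel for what the SSE suggestion above could look like, here is a minimal sketch that adds a log-odds increment to four values at a time and clamps the result. It is an illustration only: `addAndClampLogOdds` is a hypothetical helper, not an OctoMap function, and it assumes the values sit in a contiguous `float` array, which is not how OctoMap stores its tree nodes. On targets without SSE (for example ARM boards such as the TK1) the scalar loop handles everything.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define SKETCH_HAVE_SSE 1
#endif

// Hypothetical helper (not part of OctoMap): adds `delta` to every value
// and clamps each result to the range [lo, hi].
inline void addAndClampLogOdds(float* values, std::size_t n,
                               float delta, float lo, float hi) {
    assert(lo <= hi);
    std::size_t i = 0;
#ifdef SKETCH_HAVE_SSE
    const __m128 vdelta = _mm_set1_ps(delta);
    const __m128 vlo = _mm_set1_ps(lo);
    const __m128 vhi = _mm_set1_ps(hi);
    // Process four floats per iteration; unaligned loads keep the sketch simple.
    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(values + i);
        v = _mm_add_ps(v, vdelta);
        v = _mm_min_ps(_mm_max_ps(v, vlo), vhi);
        _mm_storeu_ps(values + i, v);
    }
#endif
    // Scalar loop: handles the remaining elements, or all of them without SSE.
    for (; i < n; ++i) {
        const float v = values[i] + delta;
        values[i] = v < lo ? lo : (v > hi ? hi : v);
    }
}
```

Whether this pays off in practice depends on first getting the data into a flat layout, which is the intrusive part for a pointer-based tree.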
Thanks for the tip, and sorry for the late response. Unfortunately I only recently received my hardware, but SSE would certainly be interesting: that way I could work from home without needing the TK1. Please don't count on this too much, though; if it is ever ready, it will be at the end of the summer. |
Hi, can you point me to some resources that would help me understand octrees more intuitively? I understand segment trees and am also familiar with lazy updates in 1D segment trees. Octrees are a three-dimensional version of segment trees, but I find it difficult to imagine lazy updates in them. I would like to contribute. I am writing this comment because I also plan on parallelizing it, if that is even possible. |
The best documentation will be Wikipedia, the OctoMap AuRo journal paper, and the code, in order of increasing depth into the topic. |
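As a rough intuition for the segment-tree analogy raised above, the sketch below shows how a 3D coordinate can be discretized into one integer key per axis, and how one bit of each key selects one of the eight children at every level of the descent. The names (`VoxelKey`, `coordToKey`, `childIndex`) and the 16-level depth are assumptions made for illustration; this is not a copy of OctoMap's code, and it does no range checking.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical key type: one discretized coordinate per axis.
struct VoxelKey {
    std::uint16_t x, y, z;
};

// Discretizes a metric coordinate into a key. The offset of half the key
// range puts the origin in the middle, so negative coordinates are valid.
inline std::uint16_t coordToKey(double coord, double resolution,
                                unsigned treeDepth = 16) {
    const int half = 1 << (treeDepth - 1);
    const int cell = static_cast<int>(std::floor(coord / resolution));
    return static_cast<std::uint16_t>(cell + half);
}

// Returns which of the 8 children (0..7) to descend into at `level`,
// where level 0 is the root. Each axis contributes one bit, much like a
// 1D segment tree picks the left or right half of an interval.
inline unsigned childIndex(const VoxelKey& k, unsigned level,
                           unsigned treeDepth = 16) {
    assert(level < treeDepth);
    const unsigned bit = treeDepth - 1 - level;
    return ((k.x >> bit) & 1u)
         | (((k.y >> bit) & 1u) << 1)
         | (((k.z >> bit) & 1u) << 2);
}
```

Calling `childIndex` for levels 0 through 15 in turn yields the full path from the root to the leaf voxel that contains the point.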
Hi @andre-nguyen, how is the CUDA implementation for OctoMap going? I am also planning on implementing CUDA in OctoMap. Maybe I could help you. |
If you guys plan some specific tasks I would also love to help.
|
Hi, is there any update on the status of the CUDA implementation? |
Hi @ahornung, I have developed a CUDA-based replacement for computeUpdate() and computeRayKeys(). Could you please look at my fork https://github.com/saifullah3396/octomap and tell me whether it's good enough for a pull request? For now it does not conflict with the basic implementation. I'd really like further development on this to happen in this repository. The implementation can be tested by building the cuda-devel branch (add the cmake parameter -D__CUDA_SUPPORT__=ON) and running graph2tree as follows: |
Thanks for your contribution @saifullah3396, that sounds really useful! Do you have a first indication about processing times, ideally on the same benchmark data as used in the paper? Unfortunately, I won't have time for an in-depth review, so best would be a cleaned up pull request that can be iteratively discussed and improved by the community. |
@ahornung Well, in basic usage the current implementation is definitely faster, but before I produce results on the benchmark data I will work on the implementation a bit more to make it even faster. It might take me some time to add a CUDA-based hashmap, but it will definitely increase performance. I will share the benchmark results once I'm finished and send a PR! :) |
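For readers unfamiliar with the ray-casting step discussed in this thread, the sketch below walks a regular voxel grid from a sensor origin to a measured endpoint and collects the cells the ray passes through, using a standard 3D DDA traversal. It is plain C++ written for illustration, not code from the fork or from OctoMap; `Cell` and `traverseRay` are hypothetical names, and the endpoint's own cell is left out by choice. The relevant property for parallelization is that each ray is processed independently of all others.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

using Cell = std::array<int, 3>;  // hypothetical integer voxel index

// Returns the cells crossed by the segment origin -> end on a grid with
// cell size `res`, starting with the origin's cell and excluding the
// cell that contains `end`.
inline std::vector<Cell> traverseRay(const std::array<double, 3>& origin,
                                     const std::array<double, 3>& end,
                                     double res) {
    assert(res > 0.0);
    std::vector<Cell> cells;
    Cell cur, last;
    for (int i = 0; i < 3; ++i) {
        cur[i] = static_cast<int>(std::floor(origin[i] / res));
        last[i] = static_cast<int>(std::floor(end[i] / res));
    }
    if (cur == last) return cells;  // ray starts and ends in the same cell

    double dir[3];
    double length = 0.0;
    for (int i = 0; i < 3; ++i) {
        dir[i] = end[i] - origin[i];
        length += dir[i] * dir[i];
    }
    length = std::sqrt(length);

    const double inf = std::numeric_limits<double>::infinity();
    int step[3];
    double tMax[3];    // distance along the ray to the next boundary per axis
    double tDelta[3];  // distance along the ray between two boundaries per axis
    for (int i = 0; i < 3; ++i) {
        dir[i] /= length;
        step[i] = dir[i] > 0.0 ? 1 : (dir[i] < 0.0 ? -1 : 0);
        if (step[i] != 0) {
            const double border = (cur[i] + (step[i] > 0 ? 1 : 0)) * res;
            tMax[i] = (border - origin[i]) / dir[i];
            tDelta[i] = res / std::fabs(dir[i]);
        } else {
            tMax[i] = inf;
            tDelta[i] = inf;
        }
    }

    while (cur != last) {
        cells.push_back(cur);
        int axis = 0;  // axis whose boundary is reached first
        if (tMax[1] < tMax[axis]) axis = 1;
        if (tMax[2] < tMax[axis]) axis = 2;
        if (tMax[axis] > length) break;  // guard against rounding drift
        cur[axis] += step[axis];
        tMax[axis] += tDelta[axis];
    }
    return cells;
}
```

Because no ray reads another ray's state, a GPU version could in principle assign one thread per ray; merging the resulting cell lists into the tree is the part that needs synchronization.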
Hi @ahornung
I saw issue #29 and wasn't interested in the GPU-voxel approach. Many ROS applications clearly use OctoMap as a standard, so we would gain from parallelizing it. The advent of embedded GPUs such as the NVIDIA TK1 and TX1 makes this much more interesting for mobile robotics.
I would like to develop this incrementally by speeding up small parts of the code.
How feasible do you think this is, and do you have any pointers on where to start?