Implementation of multi_proposal_target_layer #1
Comments
The later part of the code only does some padding, setting invalid labels up to the maximum number of proposals. That doesn’t require much compute, so it is done in C++.
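To make the padding step concrete, here is a minimal C++ sketch of what "setting invalid labels up to the maximum number of proposals" could look like for one image. It is not the code from the repository: the function name `pad_labels` and the sentinel value `kInvalidLabel = -1` are assumptions chosen for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sentinel meaning "no valid label here"; the real layer may use another value.
constexpr float kInvalidLabel = -1.0f;

// Return a label list of exactly max_proposals entries: the given labels first,
// then the sentinel in every remaining slot. Extra labels are truncated.
std::vector<float> pad_labels(const std::vector<float>& labels,
                              std::size_t max_proposals) {
    std::vector<float> padded(max_proposals, kInvalidLabel);
    const std::size_t n =
        labels.size() < max_proposals ? labels.size() : max_proposals;
    for (std::size_t i = 0; i < n; ++i) {
        padded[i] = labels[i];
    }
    return padded;
}
```

The work is a single linear pass per image, which is consistent with the point that this part is cheap enough to leave on the CPU.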
How about the calculation of overlaps and targets (from line 481 to 572)?
You are right, there is more than padding over there. The number of proposals is typically 500, so the compute is only 500 x 500 x batch size. This wasn’t a big issue when the code was profiled. There could be a use case where it becomes an issue (when the number of proposals is much larger).
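As a rough illustration of that estimate, the sketch below counts pairwise overlap evaluations as proposals x proposals x batch size. The helper name `overlap_evaluations` and the batch size of 16 used in the note afterwards are assumptions for illustration, not values taken from the repository.

```cpp
#include <cstdint>

// Number of pairwise overlap evaluations under the estimate quoted above:
// num_proposals * num_proposals per image, times the number of images.
std::uint64_t overlap_evaluations(std::uint64_t num_proposals,
                                  std::uint64_t batch_size) {
    return num_proposals * num_proposals * batch_size;
}
```

With 500 proposals and an assumed batch of 16 this gives 4,000,000 evaluations; with 6000 proposals and the same batch it gives 576,000,000, which is the kind of growth that could make the CPU path a bottleneck.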
Thank you! Another question: the SNIPER repo mentions "2. NO PYTHON LAYERS (Every layer is optimized for large batch sizes in CUDA/C++)". How do you optimize it in CUDA/C++? I am really interested in it.
You can write CUDA kernels differently for different batch sizes. For example, in the proposal generation layer, NMS is run on the top 6000/12000 proposals and is optimized for a batch of 1, because people are concerned about latency as well (both blocks and threads are used to compute overlaps in NMS, which keeps the GPU underutilized). During training you want to maximize throughput, so you can write your kernels differently. For instance, you can give each image to a block (which goes to a separate multiprocessor) and do the overlap computation using threads (which execute in parallel on the cores inside that multiprocessor). Example: https://github.com/mahyarnajibi/SNIPER-mxnet/blob/master/src/operator/multi_proposal_target_mask.cu
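The "one image per block, threads over overlaps" mapping can be sketched on the CPU in plain C++, where the outer loop stands in for `blockIdx.x` and the next loop for `threadIdx.x`. This is a serial emulation of the indexing only, not the kernel from the linked file: `Box`, `iou`, `max_overlaps`, and the `threads_per_block` parameter are hypothetical names, and the IoU uses continuous coordinates without any "+1" pixel convention.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Box { float x1, y1, x2, y2; };
using Image = std::vector<Box>;  // all boxes belonging to one image

// Intersection over union of two axis-aligned boxes.
float iou(const Box& a, const Box& b) {
    const float iw = std::min(a.x2, b.x2) - std::max(a.x1, b.x1);
    const float ih = std::min(a.y2, b.y2) - std::max(a.y1, b.y1);
    if (iw <= 0.0f || ih <= 0.0f) return 0.0f;
    const float inter = iw * ih;
    const float area_a = (a.x2 - a.x1) * (a.y2 - a.y1);
    const float area_b = (b.x2 - b.x1) * (b.y2 - b.y1);
    return inter / (area_a + area_b - inter);
}

// For every image, the best IoU of each proposal against that image's
// ground-truth boxes. On a GPU the two outer loops would not exist as loops:
// each "block" iteration would be a CUDA block and each "thread" iteration
// a CUDA thread inside it.
std::vector<std::vector<float>> max_overlaps(
        const std::vector<Image>& proposals,
        const std::vector<Image>& gt_boxes,
        std::size_t threads_per_block) {
    std::vector<std::vector<float>> out(proposals.size());
    for (std::size_t block = 0; block < proposals.size(); ++block) {      // blockIdx.x: one image
        const Image& props = proposals[block];
        out[block].assign(props.size(), 0.0f);
        for (std::size_t thread = 0; thread < threads_per_block; ++thread) {  // threadIdx.x
            // Block-stride loop: thread t handles proposals t, t + T, t + 2T, ...
            for (std::size_t i = thread; i < props.size(); i += threads_per_block) {
                for (const Box& gt : gt_boxes[block]) {
                    out[block][i] = std::max(out[block][i], iou(props[i], gt));
                }
            }
        }
    }
    return out;
}
```

Because each proposal index is written by exactly one emulated thread, a real kernel with this layout would need no synchronization for this step, and a larger batch simply launches more blocks.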
Hi, I noticed that in "MultiProposalTargetGPUOp.cu" you do some calculations on the CPU and then move the results back to the GPU. What is the purpose?
SNIPER-mxnet/src/operator/multi_proposal_target.cu
Line 394 in 1678b4a