
Implementation of multi_proposal_target_layer #1

Open
dingjiansw101 opened this issue May 12, 2019 · 5 comments

Comments

@dingjiansw101
Hi, I noticed that in "MultiProposalTargetGPUOp.cu" you copy some intermediate results to the CPU, do part of the computation there, and then move the results back to the GPU. What is the purpose of this?

cudaMemcpy(gt_boxes, tgt_boxes.dptr_, 5 * sizeof(float) * num_images * 100, cudaMemcpyDeviceToHost);

@bharatsingh430 (Collaborator)

bharatsingh430 commented May 12, 2019

The later part of the code just does some padding, setting invalid labels up to the maximum number of proposals. That doesn't require much compute, so it's done in C++.
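A minimal sketch of the padding step being described, under the assumption that labels beyond the number of valid proposals are filled with a sentinel value. The names `pad_invalid_labels` and `kInvalidLabel` are illustrative, not taken from the repository:

```cpp
#include <vector>

// Assumed sentinel for "ignore this slot" in downstream loss computation.
constexpr float kInvalidLabel = -1.0f;

// After the valid proposals have been assigned labels, fill the remaining
// slots up to max_proposals with the invalid label. This is cheap, O(max
// proposals) per image, which is why it can stay on the CPU.
void pad_invalid_labels(std::vector<float>& labels, int num_valid, int max_proposals) {
    labels.resize(max_proposals);
    for (int i = num_valid; i < max_proposals; ++i) {
        labels[i] = kInvalidLabel;  // mark padded slots as invalid
    }
}
```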

@dingjiansw101 (Author)

How about the calculation of overlaps and targets (from line 481 to 572)?

@bharatsingh430 (Collaborator)

You are right, there is more than padding over there. The number of proposals is typically 500, so the compute is only about 500 × 500 × batch_size overlap evaluations. This wasn't a big issue when the code was profiled, but there could be use cases where it becomes one (when the number of proposals is much larger).
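For reference, the overlap being computed for each (proposal, ground-truth) pair is the intersection-over-union. A self-contained sketch, assuming the common (x1, y1, x2, y2) box layout (the actual layout in the operator may differ):

```cpp
#include <algorithm>
#include <cmath>

// Intersection-over-union of two axis-aligned boxes in (x1, y1, x2, y2)
// layout. Evaluating this for every proposal against every ground-truth box
// in every image gives the num_proposals x num_boxes x batch_size cost
// mentioned above.
float iou(const float a[4], const float b[4]) {
    float ix1 = std::max(a[0], b[0]), iy1 = std::max(a[1], b[1]);
    float ix2 = std::min(a[2], b[2]), iy2 = std::min(a[3], b[3]);
    float iw = std::max(0.0f, ix2 - ix1), ih = std::max(0.0f, iy2 - iy1);
    float inter = iw * ih;                              // intersection area
    float area_a = (a[2] - a[0]) * (a[3] - a[1]);
    float area_b = (b[2] - b[0]) * (b[3] - b[1]);
    return inter / (area_a + area_b - inter);           // union in denominator
}
```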

@dingjiansw101 (Author)

Thank you! Another question: the SNIPER repo says "2. NO PYTHON LAYERS (Every layer is optimized for large batch sizes in CUDA/C++)". How do you optimize it in CUDA/C++? I am really interested in this.

@bharatsingh430 (Collaborator)

You can write CUDA kernels differently for different batch sizes. For example, in the proposal generation layer, NMS is run on the top 6000/12000 proposals and is optimized for a batch size of 1, because people are concerned about latency as well (both blocks and threads are used to compute overlaps in NMS, which keeps the GPU underutilized). During training you want to maximize throughput, so you can write your kernels differently: give each image to a block (which goes to a separate multiprocessor) and do the overlap computation using threads (which execute in parallel on the cores inside an SM). Example: https://github.com/mahyarnajibi/SNIPER-mxnet/blob/master/src/operator/multi_proposal_target_mask.cu
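A hedged sketch of the block-per-image mapping described above (this is illustrative, not the SNIPER-mxnet code; all names, dimensions, and the (x1, y1, x2, y2) box layout are assumptions):

```cuda
#include <cuda_runtime.h>

// One thread block per image in the batch, so each image's work lands on its
// own SM; the threads of the block stride over all (proposal, ground-truth)
// pairs of that image and compute the overlaps in parallel.
__global__ void batch_overlap_kernel(const float* proposals,  // [B, P, 4]
                                     const float* gt_boxes,   // [B, G, 4]
                                     float* overlaps,         // [B, P, G]
                                     int P, int G) {
  int img = blockIdx.x;  // block index selects the image
  const float* props = proposals + img * P * 4;
  const float* gts   = gt_boxes  + img * G * 4;
  float* out         = overlaps  + img * P * G;
  // grid-stride loop over all pairs of this image
  for (int idx = threadIdx.x; idx < P * G; idx += blockDim.x) {
    int p = idx / G, g = idx % G;
    const float* a = props + p * 4;
    const float* b = gts   + g * 4;
    float ix1 = fmaxf(a[0], b[0]), iy1 = fmaxf(a[1], b[1]);
    float ix2 = fminf(a[2], b[2]), iy2 = fminf(a[3], b[3]);
    float iw = fmaxf(0.f, ix2 - ix1), ih = fmaxf(0.f, iy2 - iy1);
    float inter = iw * ih;
    float uni = (a[2]-a[0])*(a[3]-a[1]) + (b[2]-b[0])*(b[3]-b[1]) - inter;
    out[p * G + g] = inter / uni;  // IoU for this (proposal, gt) pair
  }
}

// Launched with one block per image, e.g.:
//   batch_overlap_kernel<<<B, 256>>>(d_props, d_gts, d_overlaps, P, G);
```

This throughput-oriented mapping trades the low-latency design (all blocks and threads working on a single image) for full SM occupancy across the batch.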
