DeviceSegmentedSort synchronizes default stream and produces wrong results when launched from a kernel #409

fkallen · 2021-12-02T14:29:43Z

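DeviceSegmentedSort copies `group_sizes` back to the host with `cudaMemcpy`, which runs on the default stream instead of the user-supplied `stream` and therefore synchronizes with the default stream.
When the algorithm is launched from device code, the plain `memcpy` path may execute before `group_sizes` and `num_selected_groups` have been computed. This leads to wrong results.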

    if (CUB_IS_HOST_CODE)
    {
      #if CUB_INCLUDE_HOST_CODE
      // Issue: cudaMemcpy runs on the default stream, not on the user-supplied stream.
      if (CubDebug(error = cudaMemcpy(h_group_sizes,
                                      group_sizes.get(),
                                      num_selected_groups * sizeof(unsigned int),
                                      cudaMemcpyDeviceToHost)))
      {
        return error;
      }
      #endif
    }
    else
    {
      #if CUB_INCLUDE_DEVICE_CODE
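      // Issue: nothing here waits for the work that computes group_sizes and
      // num_selected_groups, so this copy may read values that are not ready yet.
      memcpy(h_group_sizes,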
             group_sizes.get(),
             num_selected_groups * sizeof(unsigned int));
      #endif
    }

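A minimal sketch of a stream-ordered alternative for the host path, assuming only the CUDA runtime API: the device-computed counts are copied with `cudaMemcpyAsync` on the caller's stream, and `cudaStreamSynchronize(stream)` then waits for that stream alone. The kernel `compute_group_sizes` and the helpers `copy_group_sizes` and `run_demo` are hypothetical stand-ins (the kernel just writes fixed counts). This is not necessarily how #410 fixes CUB, and it does not cover the device-side path.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for the work that computes per-group segment counts.
// It writes fixed values so the example is self-contained.
__global__ void compute_group_sizes(unsigned int* group_sizes)
{
  group_sizes[0] = 3;
  group_sizes[1] = 5;
}

// Copies `n` device-computed counts to the host, ordered on `stream`.
// The async copy runs after any earlier work enqueued on the same stream,
// and cudaStreamSynchronize blocks the host until `stream` has drained.
cudaError_t copy_group_sizes(unsigned int* h_group_sizes,
                             const unsigned int* d_group_sizes,
                             int n,
                             cudaStream_t stream)
{
  cudaError_t error = cudaMemcpyAsync(h_group_sizes, d_group_sizes,
                                      n * sizeof(unsigned int),
                                      cudaMemcpyDeviceToHost, stream);
  if (error != cudaSuccess) return error;
  return cudaStreamSynchronize(stream);
}

// Hypothetical driver: launches the producer kernel and the copy on the
// same user stream, then releases its resources.
cudaError_t run_demo(unsigned int* h_group_sizes)
{
  const int num_selected_groups = 2;
  unsigned int* d_group_sizes = nullptr;
  cudaStream_t stream;

  // Non-blocking stream: the legacy default stream gives no implicit ordering.
  cudaError_t error = cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking);
  if (error != cudaSuccess) return error;

  error = cudaMalloc(&d_group_sizes, num_selected_groups * sizeof(unsigned int));
  if (error == cudaSuccess)
  {
    compute_group_sizes<<<1, 1, 0, stream>>>(d_group_sizes);
    error = cudaGetLastError();
    if (error == cudaSuccess)
    {
      error = copy_group_sizes(h_group_sizes, d_group_sizes,
                               num_selected_groups, stream);
    }
  }

  cudaFree(d_group_sizes);  // cudaFree(nullptr) is a no-op
  cudaStreamDestroy(stream);
  return error;
}
```

A non-blocking stream is used so that the default stream provides no implicit ordering with the kernel; the copy returns the right counts only because it is enqueued on the same `stream` as the work that produces them.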

gevtushenko · 2021-12-03T12:15:27Z

Thank you for reporting this! I'll create a PR soon.

gevtushenko · 2021-12-03T13:51:00Z

@fkallen could you check if the fix works for you?

fkallen · 2021-12-03T15:57:56Z

Your fix does work for me.

gevtushenko self-assigned this Dec 3, 2021

gevtushenko mentioned this issue Dec 3, 2021 in "Fix segmented sort device-side launch" #410 (merged)

gevtushenko closed this as completed in #410 Dec 11, 2021

gevtushenko mentioned this issue Mar 18, 2022 in "Eliminate device synchronization in CDP version of segmented sort" #445 (merged)
