Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix synchronisation problems in the clusterizer #102

Conversation

fwyzard
Copy link

@fwyzard fwyzard commented Jul 24, 2018

Fix synchronisation problems in the findClus kernel, and improve its documentation.
Avoid unnecessary cudaStreamSynchronize in makeClustersAsync.

@fwyzard fwyzard force-pushed the various_fixes_and_cleanup_part2 branch from be5c775 to d448e03 Compare July 24, 2018 15:24
moduleStart_d,
clusInModule_d, moduleId_d,
clus_d,
wordCounter
);
cudaDeviceSynchronize();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this cudaDeviceSynchronize() leftover from the debugging?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes...

#endif

first += threadIdx.x;
if (first>= numElements)
return;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my education, why is this replaced with the

bool active = (first < numElements);
if (active) {

?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unsure how __syncthreads is affected by threads that have returned, so I have removed all return statements.
I can see if putting them back reintroduces a problem or not...

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks for the explanation!

@fwyzard
Copy link
Author

fwyzard commented Jul 24, 2018

Looks like we just pushed back the crash to event 400+... back to debugging.

@fwyzard fwyzard force-pushed the various_fixes_and_cleanup_part2 branch 2 times, most recently from 177c055 to 0260b3e Compare July 24, 2018 16:37
@fwyzard
Copy link
Author

fwyzard commented Jul 24, 2018

@VinInn can you help us looking into a related crash ?
instructions to reproduce:

  • follow [the usual instructions(https://patatrack.web.cern.ch/patatrack/wiki/patatrackdevelopment.html#create-a-working-area-for-cmssw1020pre6patatrack) up to Check out the patatrack development branch
  • add this PR:
git-cms-merge-topic cms-patatrack:102
scram b -j
  • run the Zmumu step3:
cmsRun ~fwyzard/public/step3.py

It should crash after few events with

...
Begin processing the 8th record. Run 1, Event 902, LumiSection 10 on stream 0 at 24-Jul-2018 18:50:14.616 CEST
cmsRun: /tmp/fwyzard/CMSSW_10_2_0_pre6_Patatrack/src/RecoLocalTracker/SiPixelClusterizer/plugins/SiPixelRawToClusterHeterogeneous.cc:585: void SiPixelRawToClusterHeterogeneous::convertGPUtoCPU(const GPUProduct&, SiPixelRawToClusterHeterogeneous::CPUProduct&) const:
Assertion `gpu.clus_h[i]<256' failed.

:-(

@VinInn
Copy link

VinInn commented Jul 24, 2018

ok. will need to understand your changes
found this
http://www.irisa.fr/alf/downloads/collange/cours/gpuprog_ufmg_2015/gpu_ufmg_2015_5.pdf
will study...

@fwyzard
Copy link
Author

fwyzard commented Jul 24, 2018

Validation summary

Reference release CMSSW_10_2_0_pre6 at a674e1f
Development branch CMSSW_10_2_X_Patatrack at e61eb06
Testing PRs:

makeTrackValidationPlots.py plots

/RelValTTbar_13/CMSSW_10_2_0_pre6-PU25ns_102X_upgrade2018_realistic_v7-v1/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_2_0_pre6-102X_upgrade2018_realistic_v7-v1/GEN-SIM-DIGI-RAW

DQM GUI plots

/RelValTTbar_13/CMSSW_10_2_0_pre6-PU25ns_102X_upgrade2018_realistic_v7-v1/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_2_0_pre6-102X_upgrade2018_realistic_v7-v1/GEN-SIM-DIGI-RAW

logs and nvprof/nvvp profiles

/RelValTTbar_13/CMSSW_10_2_0_pre6-PU25ns_102X_upgrade2018_realistic_v7-v1/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_2_0_pre6-102X_upgrade2018_realistic_v7-v1/GEN-SIM-DIGI-RAW

Logs

The full log is available at https://fwyzard.web.cern.ch/fwyzard/patatrack/pulls/1c4a724ba6f15b84ff51466858e9f59a7accf3ff/log .

@fwyzard
Copy link
Author

fwyzard commented Jul 24, 2018

Not OK, there is still something very wrong with the workflow Zmumu GPU workflow:

image

@fwyzard fwyzard force-pushed the various_fixes_and_cleanup_part2 branch from 0260b3e to 3c139fc Compare July 25, 2018 14:49
@fwyzard
Copy link
Author

fwyzard commented Jul 25, 2018

I think I found the error in this PR:

bool done = true;
while (not done) {
  ...
}

was not a very useful loop :-(

@fwyzard fwyzard changed the title Fix synchronisation problems in the findClus kernel Fix synchronisation problems in the clusterizer Jul 25, 2018
Copy link

@makortel makortel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing #66.

countModules<<<blocks, threadsPerBlock, 0, stream.id()>>>(moduleInd_d, moduleStart_d, clus_d, wordCounter);
cudaCheck(cudaGetLastError());

// read the number of modules into a data memeber, used by getProduct())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo, memeber -> member

Copy link
Author

@fwyzard fwyzard Jul 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks


// clusters
cudaCheck(cudaMemcpyAsync(clus_h, clus_d, wordCounter*sizeof(uint32_t), cudaMemcpyDefault, stream.id()));
cudaCheck(cudaGetLastError());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the last cudaGetLastError() is not needed after removing the cudaStreamSynchronize(), possible errors in queueing the async copy should get caught already by its return value.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks

break;
}
default: if (debug) printf("Cabling check returned unexpected result, status = %i\n", status);
case(1) : {
Copy link

@makortel makortel Jul 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that you "point" to them, I find these parentheses weird.

@fwyzard
Copy link
Author

fwyzard commented Jul 25, 2018

Validation summary

Reference release CMSSW_10_2_0_pre6 at a674e1f
Development branch CMSSW_10_2_X_Patatrack at e61eb06
Testing PRs:

makeTrackValidationPlots.py plots

/RelValTTbar_13/CMSSW_10_2_0_pre6-PU25ns_102X_upgrade2018_realistic_v7-v1/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_2_0_pre6-102X_upgrade2018_realistic_v7-v1/GEN-SIM-DIGI-RAW

DQM GUI plots

/RelValTTbar_13/CMSSW_10_2_0_pre6-PU25ns_102X_upgrade2018_realistic_v7-v1/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_2_0_pre6-102X_upgrade2018_realistic_v7-v1/GEN-SIM-DIGI-RAW

  • reference DQM plots for reference release, workflow 10824.5
  • DQM plots for development release, workflow 10824.5
  • DQM plots for development release, workflow 10824.8 are missing
  • DQM plots for development release, workflow 10824.7
  • DQM plots for development release, workflow 10824.9
  • DQM plots for testing release, workflow 10824.5
  • DQM plots for testing release, workflow 10824.8 are missing
  • DQM plots for testing release, workflow 10824.7
  • DQM plots for testing release, workflow 10824.9
  • DQM comparison for reference workflow 10824.5
  • DQM comparison for workflow 10824.8
  • DQM comparison for workflow 10824.7
  • DQM comparison for workflow 10824.9

logs and nvprof/nvvp profiles

/RelValTTbar_13/CMSSW_10_2_0_pre6-PU25ns_102X_upgrade2018_realistic_v7-v1/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_2_0_pre6-102X_upgrade2018_realistic_v7-v1/GEN-SIM-DIGI-RAW

Logs

The full log is available at https://fwyzard.web.cern.ch/fwyzard/patatrack/pulls/f6cbca6edfac14cf4486c2a9f64f5d48699d30ad/log .

@fwyzard
Copy link
Author

fwyzard commented Jul 25, 2018

Validation summary

Reference release CMSSW_10_2_0_pre6 at a674e1f
Development branch CMSSW_10_2_X_Patatrack at e61eb06
Testing PRs:

makeTrackValidationPlots.py plots

/RelValTTbar_13/CMSSW_10_2_0_pre6-PU25ns_102X_upgrade2018_realistic_v7-v1/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_2_0_pre6-102X_upgrade2018_realistic_v7-v1/GEN-SIM-DIGI-RAW

DQM GUI plots

/RelValTTbar_13/CMSSW_10_2_0_pre6-PU25ns_102X_upgrade2018_realistic_v7-v1/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_2_0_pre6-102X_upgrade2018_realistic_v7-v1/GEN-SIM-DIGI-RAW

logs and nvprof/nvvp profiles

/RelValTTbar_13/CMSSW_10_2_0_pre6-PU25ns_102X_upgrade2018_realistic_v7-v1/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_2_0_pre6-102X_upgrade2018_realistic_v7-v1/GEN-SIM-DIGI-RAW

Logs

The full log is available at https://fwyzard.web.cern.ch/fwyzard/patatrack/pulls/83a4646dcf3ecd6d38722071096129a5aab1542c/log .

@fwyzard fwyzard merged commit f484029 into cms-patatrack:CMSSW_10_2_X_Patatrack Jul 25, 2018
@fwyzard fwyzard deleted the various_fixes_and_cleanup_part2 branch January 26, 2019 08:08
fwyzard added a commit that referenced this pull request Oct 8, 2020
Fix the use of `__syncthreads()` in the `calibDigis` and `findClus` kernels (possibly fixes #84).
Improve `SiPixelRawToClusterGPUKernel::makeClustersAsync()` to avoid calls to `cudaStreamSynchronize` (fixes #66).
Improve the documentation.
fwyzard added a commit that referenced this pull request Oct 19, 2020
Fix the use of `__syncthreads()` in the `calibDigis` and `findClus` kernels (possibly fixes #84).
Improve `SiPixelRawToClusterGPUKernel::makeClustersAsync()` to avoid calls to `cudaStreamSynchronize` (fixes #66).
Improve the documentation.
fwyzard added a commit that referenced this pull request Oct 20, 2020
Fix the use of `__syncthreads()` in the `calibDigis` and `findClus` kernels (possibly fixes #84).
Improve `SiPixelRawToClusterGPUKernel::makeClustersAsync()` to avoid calls to `cudaStreamSynchronize` (fixes #66).
Improve the documentation.
fwyzard added a commit that referenced this pull request Oct 23, 2020
Fix the use of `__syncthreads()` in the `calibDigis` and `findClus` kernels (possibly fixes #84).
Improve `SiPixelRawToClusterGPUKernel::makeClustersAsync()` to avoid calls to `cudaStreamSynchronize` (fixes #66).
Improve the documentation.
fwyzard added a commit that referenced this pull request Nov 6, 2020
Fix the use of `__syncthreads()` in the `calibDigis` and `findClus` kernels (possibly fixes #84).
Improve `SiPixelRawToClusterGPUKernel::makeClustersAsync()` to avoid calls to `cudaStreamSynchronize` (fixes #66).
Improve the documentation.
fwyzard added a commit that referenced this pull request Nov 16, 2020
Fix the use of `__syncthreads()` in the `calibDigis` and `findClus` kernels (possibly fixes #84).
Improve `SiPixelRawToClusterGPUKernel::makeClustersAsync()` to avoid calls to `cudaStreamSynchronize` (fixes #66).
Improve the documentation.
fwyzard pushed a commit that referenced this pull request Dec 25, 2020
Fix the use of `__syncthreads()` in the `calibDigis` and `findClus` kernels (possibly fixes #84).
Improve `SiPixelRawToClusterGPUKernel::makeClustersAsync()` to avoid calls to `cudaStreamSynchronize` (fixes #66).
Improve the documentation.
fwyzard added a commit that referenced this pull request Dec 29, 2020
Fix the use of `__syncthreads()` in the `calibDigis` and `findClus` kernels (possibly fixes #84).
Improve `SiPixelRawToClusterGPUKernel::makeClustersAsync()` to avoid calls to `cudaStreamSynchronize` (fixes #66).
Improve the documentation.
fwyzard added a commit that referenced this pull request Dec 29, 2020
Fix the use of `__syncthreads()` in the `calibDigis` and `findClus` kernels (possibly fixes #84).
Improve `SiPixelRawToClusterGPUKernel::makeClustersAsync()` to avoid calls to `cudaStreamSynchronize` (fixes #66).
Improve the documentation.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants