Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

null check postPhase in Worker.run to resolve race condition at terminate #211

Merged
merged 4 commits into from
Jul 20, 2022

Conversation

mbutrovich
Copy link
Collaborator

@mbutrovich mbutrovich commented Jul 20, 2022

ThreadBench and WorkloadState rely on WorkloadState.currentPhase being null as the indicator that the benchmark is over. For example:

if (this.currentPhase == null)
// Benchmark is over---wake everyone up so they can terminate

However, Worker relies on BenchmarkState.state being DONE as its indicator that the benchmark is over. For example, this switch assumes that because state is still MEASURE that the benchmark is still running and that the result of getCurrentPhase is safe to access:

switch (postState) {
case MEASURE:
// Non-serial measurement. Only measure if the state both
// before and after was MEASURE, and the phase hasn't
// changed, otherwise we're recording results for a query
// that either started during the warmup phase or ended
// after the timer went off.
Phase postPhase = workloadState.getCurrentPhase();
if (preState == MEASURE && postPhase.getId() == prePhase.getId()) {

The race occurs between the previously mentioned switch in Worker and in ThreadBench where this line advances to the next phase, which in the case of the benchmark ending, puts the current phase to null. If a Worker tries to read the current phase before the state is set to DONE when ThreadBench calls startCoolDown() then benchbase goes boom.

workState.switchToNextPhase();
lowestRate = Integer.MAX_VALUE;
phase = workState.getCurrentPhase();
interruptWorkers();
if (phase == null && !lastEntry) {
// Last phase
lastEntry = true;
testState.startCoolDown();

I believe the easiest fix here is a documented null check in Worker. Maybe there's a better way to sync up the indicators of the benchmark ending between all of these classes, but this seems like the lowest risk change.

Fixes #37.

@mbutrovich mbutrovich changed the title null check postPhase in Worker.run to resolve race condition at terminate null check postPhase in Worker.run to resolve race condition at terminate Jul 20, 2022
@apavlo apavlo self-requested a review July 20, 2022 19:58
Copy link
Member

@apavlo apavlo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mbutrovich mbutrovich merged commit 660e2d9 into cmu-db:main Jul 20, 2022
@mbutrovich mbutrovich deleted the null_check_phase branch July 20, 2022 20:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Find and fix race condition during post-phase checks.
2 participants