Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bugs with TPCH implementation #204

Merged
merged 4 commits into from
Jul 8, 2022
Merged

Fix bugs with TPCH implementation #204

merged 4 commits into from
Jul 8, 2022

Conversation

TheoVanderkooy
Copy link
Contributor

This changes fixes a few bugs with the TPCH queries where it does not match the specification. Note: each fix is a separate commit so they can be reviewed independently if that is easier. (I can also split into multiple PRs if preferred, but the changes are pretty small)

  1. For Q16, Q17, and Q19: The brand name in the query is capitalised wrong. It should be 'Brand#MN' (and this is what is generated when loading the data) but was using all-caps 'BRAND#MN'.
  2. For Q12: The query text is wrong in the java code. (it had Q11's query text) The XML files for different "dialects" had the correct query which is why this didn't error due to mismatched number & types of query parameters. (Maybe there is some database where this does produce an error, but I have only been using it with postgres)
  3. For Q11 there are 2 fixes. First it had 'ETHIOPIA' hardcoded for one of the parameters which should be chosen randomly and match the other [NATION] parameter. Second, this query is supposed to depend on the scale factor but it was previously ignoring this (assuming SF=1). To make the scale factor available I stored the workload configuration in the Procedure base class during initialize instead of just the database type, but if there is a better way to access the scale factor I am happy to change this.

…n meant that for at least some databases the correct query text is used anyways, which is why there were not errors from mismatching # and type of query parameters.
had 'ETHIOPIA' hardcoded for one of the nation keys.
@apavlo apavlo self-requested a review July 6, 2022 21:08
Copy link
Member

@apavlo apavlo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! Can you make it so that we pass in the scale factor as an input parameter to the getStatement() methods instead of storing the WorkloadConfiguration in Procedure?

Comment on lines 59 to 60
protected final <T extends Procedure> T initialize(WorkloadConfiguration workConf) {
this.workConf = workConf;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should do this. It makes it more brittle since WorkloadConfiguration is basically a dumping ground for everything. Let's revert this and only pass in DatabaseType.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -63,12 +63,12 @@ protected PreparedStatement getStatement(Connection conn, RandomGenerator rand)
String nation = TPCHUtil.choice(TPCHConstants.N_NAME, rand);

// FRACTION is chosen as 0.0001 / SF
// TODO: we should technically pass dbgen's SF down here somehow
double fraction = 0.0001;
double fraction = 0.0001 / this.workConf.getScaleFactor();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should pass this in as an input parameter to getStatement()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've refactored it to pass scale factor to getStatement. The changes are all restricted to benchmarks/tpch now instead of changing the generic api.

@TheoVanderkooy TheoVanderkooy requested a review from apavlo July 7, 2022 13:03
Copy link
Member

@apavlo apavlo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I will merge when it passes.

@apavlo apavlo merged commit bb8f25a into cmu-db:main Jul 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants