Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GenomicsDB upgrade fixes #7257

Merged
merged 16 commits into from
Jul 16, 2021
Merged

GenomicsDB upgrade fixes #7257

merged 16 commits into from
Jul 16, 2021

Conversation

mlathara
Copy link
Contributor

@mlathara mlathara commented May 14, 2021

This PR contains the GATK code necessary to enable some features present in the recent GenomicsDB update:

@droazen droazen self-requested a review June 7, 2021 16:37
@droazen droazen self-assigned this Jun 7, 2021
Copy link
Contributor

@droazen droazen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doWork() in GATKTool needs to remain final, as GATKTools should not be overriding it. Can you refactor to restore the final?

@@ -1051,7 +1051,7 @@ public void onTraversalStart() {}
public Object onTraversalSuccess() { return null; }

@Override
protected final Object doWork() {
protected Object doWork() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of making this non final. We should find a way around it by adding a mechanism to provide a different progress meter.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just a boolean-valued method that lets you disable the progress meter would suffice here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted back to final and added a boolean valued method to disable progress meter

logger.info("Done importing batch " + arg.batchCount + "/" + arg.totalBatchCount);
logger.info("List of samples imported in batch " + arg.batchCount + ":");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we want to output the list of every sample in the batch? That seems very verbose.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually requested this -- without a progress meter, knowing which samples you're on at least gives you some sense of where you are, and if something goes wrong you know which samples might be responsible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this should be debug output?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can make this debug if that's the consensus.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed list of sample to debug

final int sampleCount = sampleNameToVcfPath.size();
final int updatedBatchSize = (batchSize == DEFAULT_ZERO_BATCH_SIZE) ? sampleCount : batchSize;
final int startBatch = (arg.batchCount - 1) * updatedBatchSize;
final int stopBatch = arg.batchCount * updatedBatchSize;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like these calculations could be moved into the batch argument class itself but maybe that's not worth it since they're only used here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left these since they're only used here...

int index = 0;
final int sampleCount = sampleNameToVcfPath.size();
final int updatedBatchSize = (batchSize == DEFAULT_ZERO_BATCH_SIZE) ? sampleCount : batchSize;
final int startBatch = (arg.batchCount - 1) * updatedBatchSize;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see we decrement startBatch here but then we increment for the new batch later. Just checking to make sure we're not off by 1 in one direction.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there is an off by 1 issue here. Minus 1 here just to get the upper and lower bounds

@mlathara
Copy link
Contributor Author

@droazen @lbergelson can you take another look?

@mlathara mlathara requested review from lbergelson and droazen June 25, 2021 15:12
*
* @return true if this tools wants to disable progress meter output, otherwise false
*/
public boolean disableProgressMeter() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some issues with integrating this new method into the various traversals, but I will address those myself in a separate PR after this is merged.

@droazen droazen merged commit 934a39d into master Jul 16, 2021
@droazen droazen deleted the ml_gatk_updates branch July 16, 2021 19:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Regression in GenomicsDBImport progressMeter
4 participants