[query] Add query_matrix_table an analogue to query_table #14806

Open
wants to merge 10 commits into
base: main

Conversation

chrisvittal
Collaborator

CHANGELOG: Add query_matrix_table, an analogue of query_table

Part of #14499.

Security Assessment

Delete all except the correct answer:

  • This change has no security impact

Impact Description

Increases the Query Python API surface, but the change is confined to Query.

private val zippedReader = PartitionZippedNativeReader(rowsReader, entriesReader)

def contextType = rowsReader.contextType
def fullRowType = zippedReader.fullRowType
Collaborator Author

@patrick-schultz I'm getting an NPE during initialization. I think it's because this assertion in PartitionReader runs before zippedReader is initialized. Any thoughts on resolving that?

abstract class PartitionReader {
assert(fullRowType.hasField(uidFieldName))

Collaborator

That's annoying. My first thought is to make the sub-readers lazy vals. Maybe try that for now, and we can think more about whether there's a more satisfying fix.

Collaborator Author

Thanks for the suggestion, it worked. I guess a lazy val is initialized whenever it is first accessed, whereas a plain val is only valid once the class's own initializer has run.

Collaborator

Right, `lazy val foo = init` is basically `private var _foo = null` plus `def foo = { if (_foo == null) _foo = init; _foo }`. So it is initialized the first time it's accessed, rather than when the class initializer runs. Usually that means it's initialized later than the rest of the class, but in this case it happens earlier, because the first access comes from the superclass initializer.
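
For readers following along, here is a minimal sketch of the initialization-order problem discussed in this thread. `Base`, `Helper`, `EagerChild` and `LazyChild` are hypothetical stand-ins, not Hail classes: `Base` plays the role of `PartitionReader` with its assertion in the class body, and `Helper` plays the role of a sub-reader.

```scala
import scala.util.Try

// Hypothetical stand-in for PartitionReader: the assertion sits in the class
// body, so it runs as part of the superclass constructor.
abstract class Base {
  def label: String
  assert(label.nonEmpty)
}

// Hypothetical stand-in for a sub-reader.
final class Helper { val label: String = "helper" }

class EagerChild extends Base {
  // A plain val is assigned only after Base's constructor has finished,
  // so it is still null when the assertion in Base calls `label`.
  private val helper = new Helper
  def label: String = helper.label
}

class LazyChild extends Base {
  // A lazy val is computed on first access, here from inside Base's constructor.
  private lazy val helper = new Helper
  def label: String = helper.label
}

object InitOrderDemo {
  def main(args: Array[String]): Unit = {
    println(Try(new EagerChild).isFailure) // construction throws a NullPointerException
    println(Try(new LazyChild).isSuccess)  // construction succeeds
  }
}
```

The sketch only illustrates ordering; the real readers carry more state, and a lazy val also adds synchronization on first access.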

chrisvittal self-assigned this on Feb 3, 2025
chrisvittal force-pushed the query/query-matrix-table branch 4 times, most recently from e065c6d to 4e0dbe5, on February 4, 2025 18:11
@chrisvittal
Collaborator Author

I think this is ready for review.

chrisvittal force-pushed the query/query-matrix-table branch 2 times, most recently from b04b3f5 to cd1d3a0, on February 5, 2025 06:21
Comment on lines 1499 to 1533
private[this] class PartitionEntriesNativeIntervalReader(
  sm: HailStateManager,
  entriesPath: String,
  entriesSpec: AbstractTableSpec,
  uidFieldName: String,
  rowsTableSpec: AbstractTableSpec,
) extends PartitionNativeIntervalReader(sm, entriesPath, entriesSpec, uidFieldName) {
  override lazy val partitioner = rowsTableSpec.rowsSpec.partitioner(sm)
}

case class PartitionZippedNativeIntervalReader(
  sm: HailStateManager,
  mtPath: String,
  mtSpec: AbstractMatrixTableSpec,
  uidFieldName: String,
) extends PartitionReader {
  require(mtSpec.indexed)

  // XXX: rows and entries paths are hardcoded, see MatrixTableSpec
  private lazy val rowsReader =
    PartitionNativeIntervalReader(sm, mtPath + "/rows", mtSpec.rowsSpec, "__dummy")

  private lazy val entriesReader =
    new PartitionEntriesNativeIntervalReader(
      sm,
      mtPath + "/entries",
      mtSpec.entriesSpec,
      uidFieldName,
      rowsReader.tableSpec,
    )

  private lazy val zippedReader = PartitionZippedNativeReader(rowsReader, entriesReader)

  def contextType = rowsReader.contextType
  def fullRowType = zippedReader.fullRowType
Member

Forgive me Chris, I'm not really following what's mandating all these private lazy vals.
Since you're not using PartitionZippedNativeIntervalReader for pattern matching, can you make this a smart constructor that returns an anonymous instance of PartitionReader? Perhaps that might solve the initialisation order issue?
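
A rough sketch of what such a smart constructor could look like, using hypothetical stand-ins rather than the real Hail types (`Reader`, `SubReader` and `ZippedReader` are invented for illustration, and field names are modelled as plain strings). The idea is that the sub-readers are built as ordinary local values inside the factory, so they already exist before the anonymous subclass is constructed and before the superclass assertion runs.

```scala
// Hypothetical stand-in for PartitionReader, with the assertion in the class body.
abstract class Reader {
  def uidFieldName: String
  def fullRowType: Seq[String]
  assert(fullRowType.contains(uidFieldName))
}

// Hypothetical stand-in for a sub-reader that exposes its field names.
final case class SubReader(fields: Seq[String])

object ZippedReader {
  // Smart constructor: returns an anonymous Reader instead of a case class.
  def apply(rowFields: Seq[String], entryFields: Seq[String], uid: String): Reader = {
    // Locals are fully initialized here, before `new Reader { ... }` is evaluated.
    val rowsReader = SubReader(rowFields)
    val entriesReader = SubReader(entryFields :+ uid)

    new Reader {
      def uidFieldName: String = uid
      def fullRowType: Seq[String] = rowsReader.fields ++ entriesReader.fields
    }
  }
}
```

Under these assumptions, `ZippedReader(Seq("locus"), Seq("GT"), "__uid")` constructs without tripping the assertion and needs no lazy vals. The trade-off is that callers can no longer pattern match on a case class or rely on its generated equality.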

Member

ehigham left a comment

Great work. I have a few comments on readability + test parameterisation. Thanks!

hail/python/hail/expr/functions.py (outdated, resolved)
hail/python/test/hail/matrixtable/test_matrix_table.py (outdated, resolved)
chrisvittal force-pushed the query/query-matrix-table branch 2 times, most recently from 08bfa1a to 4999e0f, on February 7, 2025 18:37
chrisvittal force-pushed the query/query-matrix-table branch from 6bc01af to c1840f8 on February 10, 2025 17:26
3 participants