-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding InclusiveManifestEvaluator and ResidualEvaluator #205
Conversation
self.visitors = ManifestEvalVistor(self.expr) | ||
return self.visitors | ||
|
||
def __init__(self, spec, row_filter, case_sensitive=True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Codestyle) I believe __init__
should come first
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can change this, but the reason it's like this is because it's mirroring the ordering in the java implementation:
https://github.com/apache/incubator-iceberg/blob/master/api/src/main/java/org/apache/iceberg/expressions/StrictMetricsEvaluator.java#L50-L61
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually don't see anything in PEP 8 that defines the method order, so consistency is probably better for now. However, I believe that we may have an issue in that we're only creating one visitor as opposed to just creating a new one every time.
I think the internal self.stats
is holding state which would prevent this from being used in multiple threads. The easy option is to just always return a new instance, the harder option is to figure out the equivalent using threading.local
in python.
def visitor(self): | ||
if not hasattr(self.thread_local_data, "visitors"): | ||
self.thread_local_data.visitors = StrictMetricsEvaluator.MetricsEvalVisitor(self.expr, | ||
self.schema, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Formatting looks weird here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll clean this up in a future PR, I had refactored and didn't line the indent back up.
+1 This is a little different because we're storing the whole visitor instance as a thread local as opposed to just the internal state, but that's effectively the same. LGTM. |
* option-to-inject-kafka-metadata - SMT to add Kafka metadata (topic, partition, offset, timestamp) to Struct and Map types (cherry picked from commit 423b4a8b0f2e42f2dd7de315631e944c285dcb09)
These two evaluators will be needed in DataTableScan class, as well as several other classes in the core module.