Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding InclusiveManifestEvaluator and ResidualEvaluator #205

Merged
merged 1 commit into from
Jun 5, 2019

Conversation

TGooch44
Copy link
Contributor

@TGooch44 TGooch44 commented Jun 4, 2019

These two evaluators will be needed in DataTableScan class, as well as several other classes in the core module.

self.visitors = ManifestEvalVistor(self.expr)
return self.visitors

def __init__(self, spec, row_filter, case_sensitive=True):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Codestyle) I believe __init__ should come first

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can change this, but the reason it's like this is because it's mirroring the ordering in the java implementation:
https://github.com/apache/incubator-iceberg/blob/master/api/src/main/java/org/apache/iceberg/expressions/StrictMetricsEvaluator.java#L50-L61

Copy link
Contributor

@danielcweeks danielcweeks Jun 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually don't see anything in PEP 8 that defines the method order, so consistency is probably better for now. However, I believe that we may have an issue in that we're only creating one visitor as opposed to just creating a new one every time.

I think the internal self.stats is holding state which would prevent this from being used in multiple threads. The easy option is to just always return a new instance, the harder option is to figure out the equivalent using threading.local in python.

def visitor(self):
if not hasattr(self.thread_local_data, "visitors"):
self.thread_local_data.visitors = StrictMetricsEvaluator.MetricsEvalVisitor(self.expr,
self.schema,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Formatting looks weird here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll clean this up in a future PR, I had refactored and didn't line the indent back up.

@danielcweeks
Copy link
Contributor

+1

This is a little different because we're storing the whole visitor instance as a thread local as opposed to just the internal state, but that's effectively the same. LGTM.

@danielcweeks danielcweeks merged commit f2e8b78 into apache:master Jun 5, 2019
ismailsimsek pushed a commit to ismailsimsek/iceberg that referenced this pull request Jan 15, 2025
* option-to-inject-kafka-metadata

- SMT to add Kafka metadata (topic, partition, offset, timestamp) to Struct and Map types

(cherry picked from commit 423b4a8b0f2e42f2dd7de315631e944c285dcb09)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants