Fixes #11357 - Implement profiler custom metric processing #14021
class ThreadPoolMetrics(ConfigModel):
    """thread pool metric"""
what's a threadpool metric? 😅
They are just the pool of metrics that we submit for computation on the profiler side
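To illustrate the idea of a named model replacing a bare tuple for metrics submitted to a thread pool, here is a minimal sketch. It is not the PR's actual `ThreadPoolMetrics` class: `MetricBatch`, its fields, and `compute` are hypothetical names chosen for illustration, and a standard-library dataclass stands in for `ConfigModel`.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class MetricBatch:
    """Hypothetical stand-in for a thread pool metric model."""

    metrics: Tuple[str, ...]   # names of the metrics to compute together
    metric_type: str           # e.g. "Table" or "Static" (assumed labels)
    column: Optional[str]      # None for table-level metrics
    table: str


def compute(batch: MetricBatch) -> str:
    """Placeholder computation: named fields replace tuple indexing."""
    target = batch.column or batch.table
    return f"{batch.metric_type}:{target}:{len(batch.metrics)}"


batches = [
    MetricBatch(metrics=("rowCount",), metric_type="Table", column=None, table="users"),
    MetricBatch(metrics=("min", "max"), metric_type="Static", column="id", table="users"),
]

# Each batch is one unit of work submitted to the pool.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(compute, batches))

print(results)
```

With a tuple, the worker would have to unpack by position (`batch[2]`), which is easy to get wrong when a field is added; the named model makes each field explicit.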
@@ -13,7 +13,6 @@
for the profiler
"""
from sqlalchemy import inspect, or_, text
from trino.sqlalchemy.dialect import TrinoDialect
thanks
"_convert_table_to_list_of_dataframe_objects",
return_value=self.dfs,
):
self.sqa_profiler_interface = PandasProfilerInterface(
Can we pass these as keyword arguments here instead of unnamed positional ones? Not sure if we are actually supporting that on the interface itself.
what do you mean?
Sorry, just to understand the calls a bit better, I'd like to have something like:

    PandasProfilerInterface(
        param1=self.datalake_conn,
        param2=None,
        param3=table_entity,
        ...None,
        None,
        None,
        None,
        None,
        thread_count=1,
    )

Not sure if we are forcing this to be positional for any reason.
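As a sketch of what the reviewer is asking for, Python's bare `*` in a signature makes every following parameter keyword-only, so call sites cannot fall back to a run of unnamed `None` values. The class and parameter names below (`ProfilerInterfaceSketch`, `service_connection_config`, `entity`, `sample_config`) are hypothetical and are not the real `PandasProfilerInterface` signature.

```python
from typing import Any, Optional


class ProfilerInterfaceSketch:
    """Hypothetical interface whose constructor accepts keyword arguments only."""

    def __init__(
        self,
        *,  # everything after this marker must be passed by name
        service_connection_config: Any,
        entity: Any = None,
        sample_config: Optional[dict] = None,
        thread_count: int = 5,
    ) -> None:
        self.service_connection_config = service_connection_config
        self.entity = entity
        self.sample_config = sample_config
        self.thread_count = thread_count


# Named call: each argument documents itself at the call site.
interface = ProfilerInterfaceSketch(
    service_connection_config={"type": "datalake"},
    entity="table_entity",
    thread_count=1,
)

# Positional call: rejected with a TypeError instead of silently misassigning.
try:
    ProfilerInterfaceSketch({"type": "datalake"}, None, None, 1)
    positional_allowed = True
except TypeError:
    positional_allowed = False

print(interface.thread_count, positional_allowed)
```

Optional parameters with defaults can then simply be omitted in tests rather than padded with `None`.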
CustomMetric(
name="LastNameFilter",
columnName="id",
expression="'last_name' != Doe",
is the language of the expression SQL?
It is not. It should be a boolean expression. I'll add it to the documentation, pointing to https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.query.html
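For reference, this is roughly how a boolean expression behaves with `pandas.DataFrame.query`, assuming the custom metric expression is handed to that method as the linked documentation suggests. The data and the expression below are illustrative; note that in `query` syntax a column is referenced by its bare name and a string value is quoted, which is the opposite of the quoting used in the test fixture above.

```python
import pandas as pd

# Illustrative data, not taken from the PR's test fixtures.
df = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "last_name": ["Doe", "Smith", "Doe"],
    }
)

# Column name is bare, the string literal is quoted.
expression = 'last_name != "Doe"'

# query() evaluates the expression to a boolean mask and keeps matching rows.
filtered = df.query(expression)

print(filtered["id"].tolist())
```

An expression that does not evaluate to a boolean mask (or that references an undefined name) is rejected by `query` rather than silently returning all rows.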
[open-metadata-ingestion] Kudos, SonarCloud Quality Gate passed!
Describe your changes:
Fixes #11357
ThreadPoolMetrics model to get_all_metrics (vs tuple)
Type of change:
Checklist:
Fixes <issue-number>: <short explanation>
Improvement