Fixes #11357 - Implement profiler custom metric processing #14021

Merged
TeddyCr merged 20 commits into open-metadata:main on Nov 17, 2023

Conversation

TeddyCr
Contributor

@TeddyCr TeddyCr commented Nov 17, 2023

Describe your changes:

Fixes #11357

  • Pass a ThreadPoolMetrics model to get_all_metrics (instead of a tuple)
  • Add min/max support for datalake (DL) sources
  • Add logic to compute custom metrics
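
As a rough illustration of the min/max item, here is a minimal sketch of how a minimum and maximum could be computed for a datalake source that is read as a list of DataFrame chunks. It assumes that "DL" means datalake and that the data arrives in chunks; the function name `chunked_min_max` and the sample data are hypothetical and are not taken from the PR.

```python
from typing import List, Optional, Tuple

import pandas as pd


def chunked_min_max(dfs: List[pd.DataFrame], column: str) -> Optional[Tuple[float, float]]:
    """Return (min, max) of `column` across all chunks, or None if there are no values.

    Hypothetical helper for illustration only; not the PR's implementation.
    """
    mins, maxs = [], []
    for df in dfs:
        series = df[column].dropna()
        if series.empty:
            continue  # a chunk with no usable values contributes nothing
        mins.append(series.min())
        maxs.append(series.max())
    if not mins:
        return None
    # The global extremes are the extremes of the per-chunk extremes.
    return min(mins), max(maxs)


chunks = [
    pd.DataFrame({"age": [31, 25, None]}),
    pd.DataFrame({"age": [40, 28]}),
]
result = chunked_min_max(chunks, "age")
```

Aggregating per-chunk results this way avoids concatenating every chunk into one large DataFrame before computing the metric.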

Type of change:

  • Bug fix
  • Improvement
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

Improvement

  • I have added tests around the new logic.
  • For connector/ingestion changes: I updated the documentation.

@TeddyCr TeddyCr requested a review from a team as a code owner November 17, 2023 14:22
@github-actions github-actions bot added the Ingestion, backend, and safe to test labels Nov 17, 2023
@TeddyCr TeddyCr enabled auto-merge (squash) November 17, 2023 15:44


class ThreadPoolMetrics(ConfigModel):
"""thread pool metric"""
Collaborator

what's a threadpool metric? 😅

Contributor Author

They are just the pool of metrics that we submit for computation on the profiler side
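
To make the idea concrete, here is a minimal sketch of submitting a pool of metric groups to worker threads. It is not the profiler's actual code: `MetricJob`, `compute`, and `run_pool` are hypothetical names, and a standard-library dataclass stands in for the ConfigModel-based ThreadPoolMetrics.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class MetricJob:
    """Hypothetical unit of work: one group of metrics for one column."""

    column: str
    metrics: Dict[str, Callable[[List[int]], int]]


def compute(job: MetricJob, data: Dict[str, List[int]]) -> Dict[str, int]:
    """Compute every metric of one job against the column's values."""
    values = data[job.column]
    return {f"{job.column}.{name}": fn(values) for name, fn in job.metrics.items()}


def run_pool(jobs: List[MetricJob], data: Dict[str, List[int]], thread_count: int = 2) -> Dict[str, int]:
    """Submit all jobs to a thread pool and merge the partial results."""
    results: Dict[str, int] = {}
    with ThreadPoolExecutor(max_workers=thread_count) as pool:
        for partial in pool.map(lambda job: compute(job, data), jobs):
            results.update(partial)
    return results


data = {"id": [3, 1, 2], "age": [40, 25, 31]}
jobs = [
    MetricJob(column="id", metrics={"min": min, "max": max}),
    MetricJob(column="age", metrics={"min": min, "max": max}),
]
profile = run_pool(jobs, data)
```

Because each job is a named model rather than a bare tuple, the worker function reads fields by name instead of relying on element order.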

@@ -13,7 +13,6 @@
for the profiler
"""
from sqlalchemy import inspect, or_, text
from trino.sqlalchemy.dialect import TrinoDialect
Collaborator

thanks

"_convert_table_to_list_of_dataframe_objects",
return_value=self.dfs,
):
self.sqa_profiler_interface = PandasProfilerInterface(
Collaborator

Can we pass these as keyword arguments here instead of passing them without names? Not sure if we actually support that on the interface itself.

Contributor Author

what do you mean?

Collaborator

@pmbrull pmbrull Nov 17, 2023

Sorry, this is just to make the calls easier to understand. I meant something like:

PandasProfilerInterface(
                param1=self.datalake_conn,
                param2=None,
                param3=table_entity,
                ...None,
                None,
                None,
                None,
                None,
                thread_count=1,

Not sure if we are forcing these to be positional for any reason.
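
For context on the question above, Python lets a signature require keyword arguments by placing a bare `*` before them. The sketch below shows the mechanism only; `ProfilerInterfaceSketch` and its parameter names are hypothetical and do not reflect the real PandasProfilerInterface signature.

```python
from typing import Any, Optional


class ProfilerInterfaceSketch:
    """Hypothetical interface; parameter names are assumptions, not the PR's signature."""

    def __init__(
        self,
        *,  # everything after the bare star must be passed by keyword
        service_connection: Any,
        table_entity: Any,
        sample_config: Optional[Any] = None,
        thread_count: int = 5,
    ) -> None:
        self.service_connection = service_connection
        self.table_entity = table_entity
        self.sample_config = sample_config
        self.thread_count = thread_count


# Keyword call: each value is labelled, and optional ones can simply be omitted.
interface = ProfilerInterfaceSketch(
    service_connection="datalake-conn",
    table_entity="users",
    thread_count=1,
)

# Positional call: rejected by the keyword-only signature.
try:
    ProfilerInterfaceSketch("datalake-conn", "users")
    positional_rejected = False
except TypeError:
    positional_rejected = True
```

With keyword-only parameters, a long run of unnamed `None` values is no longer needed at the call site, since defaults cover the optional arguments.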

CustomMetric(
name="LastNameFilter",
columnName="id",
expression="'last_name' != Doe",
Collaborator

is the language of the expression SQL?

Contributor Author

It is not. It should return a boolean expression. I'll add it along with the documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.query.html
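
As a minimal sketch of what a boolean expression for `DataFrame.query` looks like, the example below filters rows and counts the matches across a list of DataFrames. How the PR turns the filtered rows into a metric value is not shown in this thread, so counting rows is an assumption, and `custom_metric_count` is a hypothetical helper. The quoting follows the pandas documentation: a bare column name and a quoted string literal.

```python
from typing import List

import pandas as pd


def custom_metric_count(dfs: List[pd.DataFrame], expression: str) -> int:
    """Count rows matching a DataFrame.query boolean expression across all chunks.

    Hypothetical helper for illustration; row counting is an assumed metric.
    """
    return sum(len(df.query(expression)) for df in dfs)


df = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "last_name": ["Doe", "Smith", "Doe"],
    }
)

# Bare column name on the left, quoted string literal on the right.
value = custom_metric_count([df], "last_name != 'Doe'")
```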

@TeddyCr TeddyCr merged commit c7ac28f into open-metadata:main Nov 17, 2023

[open-metadata-ingestion] Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 1 (rating A)

Coverage: 75.3%
Duplication: 0.0%


Successfully merging this pull request may close these issues.

Implement Processing of Custom Profiler Metrics
2 participants