feat: add gap classification function to data quality toolbox #395
base: main
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #395 +/- ##
==========================================
- Coverage 91.00% 90.81% -0.19%
==========================================
Files 110 111 +1
Lines 4178 4215 +37
Branches 550 556 +6
==========================================
+ Hits 3802 3828 +26
- Misses 232 242 +10
- Partials 144 145 +1
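The percentages in the report can be reproduced from the raw counts. The sketch below assumes coverage is hits divided by total lines and that the displayed value is truncated (not rounded) to two decimals, which is what the figures above appear consistent with; the helper name `coverage_pct` is hypothetical.

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Coverage percentage, truncated to two decimals (assumed display rule)."""
    # Integer arithmetic avoids floating-point surprises before truncation.
    return (hits * 10000 // lines) / 100


# Counts taken from the report: hits, misses, partials, total lines.
base_hits, base_misses, base_partials, base_lines = 3802, 232, 144, 4178
head_hits, head_misses, head_partials, head_lines = 3828, 242, 145, 4215

# Sanity check: the three categories add up to the line totals.
assert base_hits + base_misses + base_partials == base_lines
assert head_hits + head_misses + head_partials == head_lines

base = coverage_pct(base_hits, base_lines)
head = coverage_pct(head_hits, head_lines)
delta = round(head - base, 2)  # difference of the displayed values

print(f"main: {base:.2f}%  #395: {head:.2f}%  change: {delta:+.2f}%")
```

Of the 37 new lines, 26 are hit, which is why overall coverage drops slightly.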
@@ -0,0 +1,75 @@
# Copyright 2023 Cognite AS
Feel free to adjust the year :)
def gaps_classification(x: pd.Series, eps: float = 0.5, min_samples: int = 2, std_thresholds: List[int] = [1, 2, 3]):
    """Gaps Classification.

    Classify gaps in a time series dataset into categories based on duration and statistical properties.
This description will be visible to Charts end users if you plan to expose this function, so make sure it is clear and user-friendly.

    Args:
        x: Time series
        eps: The maximum distance between samples for clustering in DBSCAN. Defaults to 0.5.
You should split arguments into
    arg: Name
        Description
e.g.
-       eps: The maximum distance between samples for clustering in DBSCAN. Defaults to 0.5.
+       eps: The maximum distance between samples
+           The maximum distance between samples for clustering in DBSCAN. Defaults to 0.5.
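For reference, a minimal sketch of how the docstring might look with the suggested "name, then description" layout applied to `eps`. The function body is a placeholder, the `@check_types` decorator is left out so the snippet runs on its own, and the remaining arguments are omitted because their short names are not given in this thread.

```python
from typing import List

import pandas as pd


def gaps_classification(
    x: pd.Series, eps: float = 0.5, min_samples: int = 2, std_thresholds: List[int] = [1, 2, 3]
):
    """Gaps Classification.

    Classify gaps in a time series dataset into categories based on duration and statistical properties.

    Args:
        x: Time series
        eps: The maximum distance between samples
            The maximum distance between samples for clustering in DBSCAN. Defaults to 0.5.
    """
    # Placeholder (assumption): only the docstring layout is illustrated here.
    raise NotImplementedError("body omitted in this sketch")


# The short name and the longer description now sit on separate lines.
print(gaps_classification.__doc__)
```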
@check_types
def gaps_classification(x: pd.Series, eps: float = 0.5, min_samples: int = 2, std_thresholds: List[int] = [1, 2, 3]):
Add missing return type
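A minimal sketch of the signature with a return annotation, assuming the function keeps returning the `pd.DataFrame` described in its docstring. The body and the column names are placeholders, not the PR's implementation, and `@check_types` is omitted so the snippet is self-contained.

```python
from typing import List, get_type_hints

import pandas as pd


def gaps_classification(
    x: pd.Series,
    eps: float = 0.5,
    min_samples: int = 2,
    std_thresholds: List[int] = [1, 2, 3],  # default kept as written in the PR
) -> pd.DataFrame:  # return annotation added
    """Gaps Classification."""
    # Placeholder body (assumption): hypothetical column names, no real logic.
    return pd.DataFrame(columns=["start", "end", "duration", "classification"])


# The annotation is now visible to type checkers and to runtime introspection.
hints = get_type_hints(gaps_classification)
print(hints["return"])
```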
        std_thresholds: Thresholds for classifying gaps based on standard deviations. Defaults to [1, 2, 3].

    Returns:
        pd.DataFrame: A DataFrame with gap start, gap end, duration, and classification.
The Charts UI only supports pd.Series. Whether this matters depends on whether you want to expose this function in the Charts UI.
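If the function is meant for Charts, one possible way to return a `pd.Series` is to collapse the gap table into a single series, for example gap duration in seconds indexed by gap start. This is only a sketch: the helper `gaps_to_series` and the column names `start` and `end` are hypothetical, and the classification column would need a separate encoding.

```python
import pandas as pd


def gaps_to_series(gaps: pd.DataFrame) -> pd.Series:
    """Collapse a gap table into a Series of durations (seconds), indexed by gap start."""
    # Assumes datetime columns named "start" and "end" (hypothetical names).
    durations = (gaps["end"] - gaps["start"]).dt.total_seconds()
    return pd.Series(
        durations.to_numpy(),
        index=pd.DatetimeIndex(gaps["start"]),
        name="gap_duration_s",
    )


# Two made-up gaps: one hour and thirty minutes long.
gaps = pd.DataFrame(
    {
        "start": pd.to_datetime(["2023-01-01 00:00", "2023-01-01 06:00"]),
        "end": pd.to_datetime(["2023-01-01 01:00", "2023-01-01 06:30"]),
    }
)
series = gaps_to_series(gaps)
print(series)
```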
Description
Motivation and Context
How Has This Been Tested?
Screenshots (if appropriate):
Types of changes
Contributor Checklist:
fix: <description>, feat: <description>, etc.
Reviewer Checklist for Charts compliant functions: