feat: add gap classification function to data quality toolbox #395

Open
wants to merge 5 commits into
base: main
from

@karapeet95 karapeet95 commented Jan 22, 2025

Description

Motivation and Context

How Has This Been Tested?

Screenshots (if appropriate):

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • Refactor (non-breaking change which improves implementation)
  • Performance (non-breaking change which improves performance. Please add associated performance test and results)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Non-functional change (xml comments/documentation/etc)

Contributor Checklist:

  • My code follows the code style of this project.
  • I have added an example of my new feature and included it in the documentation.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • My Pull Request name follows the naming convention fix: <description>, feat: <description>, etc.

Reviewer Checklist for Charts compliant functions:

  • The docstrings of the new function follow the contributing guidelines.
  • The new function is professionally documented.
  • The new function and associated scripts are covered by one or more unit tests and code coverage did not decrease.
  • The new function is accompanied by an example and it is included in the Gallery of Charts.
  • The new function is reviewed in Chromatic. Access the storybook build results url and comment, approve or deny.
  • All function inputs, arguments, and outputs have a supported data type and have human readable names.
  • No code language is included in the description of the function or parameters (e.g., use "polynomial order" instead of "poly_order").

@karapeet95 karapeet95 requested a review from a team as a code owner January 22, 2025 14:27

codecov bot commented Jan 22, 2025

Codecov Report

Attention: Patch coverage is 70.27027% with 11 lines in your changes missing coverage. Please review.

Project coverage is 90.81%. Comparing base (53651e0) to head (3da4c88).

Files with missing lines | Patch % | Lines
indsl/data_quality/gaps_classification.py | 70.27% | 10 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #395      +/-   ##
==========================================
- Coverage   91.00%   90.81%   -0.19%     
==========================================
  Files         110      111       +1     
  Lines        4178     4215      +37     
  Branches      550      556       +6     
==========================================
+ Hits         3802     3828      +26     
- Misses        232      242      +10     
- Partials      144      145       +1     
Files with missing lines | Coverage Δ
indsl/data_quality/gaps_classification.py | 70.27% <70.27%> (ø)
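
The percentages in this report can be reproduced from the hit and line counts in the coverage diff. The sketch below is only a sanity check of that arithmetic; the helper name `coverage_pct` is made up for illustration, and truncating to two decimals (rather than rounding) is an inference from the reported 90.81%, not documented Codecov behavior.

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Coverage as a percentage, truncated (not rounded) to two decimals.

    Truncation is an assumption inferred from the report: 3828 / 4215 is
    about 90.8185%, which the report shows as 90.81%.
    """
    return (10000 * hits // lines) / 100


# Counts copied from the coverage diff table.
base_hits, base_lines = 3802, 4178
head_hits, head_lines = 3828, 4215

base = coverage_pct(base_hits, base_lines)
head = coverage_pct(head_hits, head_lines)

# Patch coverage only looks at what the PR added: new hits over new lines.
patch = coverage_pct(head_hits - base_hits, head_lines - base_lines)

delta = round(head - base, 2)

print(f"base={base}% head={head}% patch={patch}% delta={delta}%")
```

The 37 new lines split into 26 hits, 10 misses, and 1 partial, which is where the "11 lines missing coverage" figure comes from.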


github-actions bot commented Jan 22, 2025

Unit Test Results

    18 files  ±0       18 suites  ±0    32m 33s ⏱️ -41s
 1 189 tests  +7    1 189 ✅  +7    0 💤 ±0    0 ❌ ±0
15 345 runs   +63  15 337 ✅  +64   8 💤 -1    0 ❌ ±0

Results for commit 3da4c88. ± Comparison against base commit 53651e0.

♻️ This comment has been updated with latest results.

@@ -0,0 +1,75 @@
# Copyright 2023 Cognite AS
Contributor

Feel free to adjust the year :)

def gaps_classification(x: pd.Series, eps: float = 0.5, min_samples: int = 2, std_thresholds: List[int] = [1, 2, 3]):
    """Gaps Classification.

    Classify gaps in a time series dataset into categories based on duration and statistical properties.
Contributor

This description will be visible to Charts end users if you plan to expose this function, so make sure it is clear and user-friendly.


    Args:
        x: Time series
        eps: The maximum distance between samples for clustering in DBSCAN. Defaults to 0.5.
Contributor

You should split each argument into a human-readable name and a description:

arg: Name
   Description

e.g.

Suggested change
-        eps: The maximum distance between samples for clustering in DBSCAN. Defaults to 0.5.
+        eps: The maximum distance between samples
+            The maximum distance between samples for clustering in DBSCAN. Defaults to 0.5.
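
Applied to the whole docstring, that layout might look like the sketch below. The `eps` entry follows the suggestion above and the `std_thresholds` description comes from the PR; the display names for `min_samples` and `std_thresholds` and the `min_samples` description are assumed wording, and the body is a stub so that only the docstring shape is on show. The `@check_types` decorator is left out to keep the snippet free of the indsl import.

```python
from typing import List

import pandas as pd


def gaps_classification(
    x: pd.Series,
    eps: float = 0.5,
    min_samples: int = 2,
    std_thresholds: List[int] = [1, 2, 3],
):
    """Gaps Classification.

    Classify gaps in a time series dataset into categories based on duration and statistical properties.

    Args:
        x: Time series
        eps: The maximum distance between samples
            The maximum distance between samples for clustering in DBSCAN. Defaults to 0.5.
        min_samples: Minimum samples
            (Assumed wording) The minimum number of samples needed to form a cluster. Defaults to 2.
        std_thresholds: Standard deviation thresholds
            Thresholds for classifying gaps based on standard deviations. Defaults to [1, 2, 3].
    """
    # Stub only: this sketch demonstrates the docstring layout, not the logic.
    raise NotImplementedError("docstring layout sketch")
```

Each entry keeps the short display name on the `arg:` line and moves the longer explanation to an indented line beneath it.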



@check_types
def gaps_classification(x: pd.Series, eps: float = 0.5, min_samples: int = 2, std_thresholds: List[int] = [1, 2, 3]):
Contributor

Add missing return type
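
For instance, if the function keeps returning the DataFrame described in its "Returns" section, the annotated signature could look roughly like this. It is a sketch, not the PR's code: the body is a placeholder, the column names in `GAP_COLUMNS` are assumptions, and `@check_types` is omitted so the snippet runs without indsl installed.

```python
from typing import List

import pandas as pd

# Assumed column names; the PR only says the frame holds gap start, gap end,
# duration, and classification.
GAP_COLUMNS = ["gap_start", "gap_end", "duration", "classification"]


def gaps_classification(
    x: pd.Series,
    eps: float = 0.5,
    min_samples: int = 2,
    std_thresholds: List[int] = [1, 2, 3],
) -> pd.DataFrame:  # explicit return annotation
    """Stub that only demonstrates the annotated signature."""
    # Placeholder body: an empty frame with the documented columns.
    return pd.DataFrame(columns=GAP_COLUMNS)
```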

        std_thresholds: Thresholds for classifying gaps based on standard deviations. Defaults to [1, 2, 3].

    Returns:
        pd.DataFrame: A DataFrame with gap start, gap end, duration, and classification.
Contributor

The Charts UI only supports pd.Series as a return type. Whether that matters here depends on whether you want to expose this function in the Charts UI.
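
If exposing it in Charts is the goal, one option would be to collapse the gap table into a single series, for example one classification label per gap indexed by the gap start. The sketch below shows that idea only; `gaps_to_series`, the column names, and the "short"/"long" labels are all hypothetical, and whether Charts would need numeric values instead of labels is not something this thread settles.

```python
import pandas as pd


def gaps_to_series(gaps: pd.DataFrame) -> pd.Series:
    """Collapse a gap table into one label per gap, indexed by gap start.

    Hypothetical helper: assumes `gaps` has "gap_start" and "classification"
    columns, which are illustrative names rather than the PR's actual ones.
    """
    return pd.Series(
        gaps["classification"].to_numpy(),
        index=pd.DatetimeIndex(gaps["gap_start"]),
        name="gap_classification",
    )


# Made-up example data, only to show the shape of the conversion.
gaps = pd.DataFrame(
    {
        "gap_start": pd.to_datetime(["2025-01-01 00:10", "2025-01-01 02:00"]),
        "gap_end": pd.to_datetime(["2025-01-01 00:40", "2025-01-01 06:00"]),
        "classification": ["short", "long"],
    }
)

labels = gaps_to_series(gaps)
print(labels)
```

The gap end and duration are dropped in this form, so they would have to be exposed through separate series if they are needed in the UI.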

2 participants