[FEATURE] Add parsing of Expectation diagnostics to contrib packaging JSON object #4114

Merged
Changes from 73 commits
67 commits
843ed94
feat: add JSON object classes
cdkini Dec 30, 2021
d8dca58
Merge branch 'develop' of github.com:great-expectations/great_expecta…
cdkini Dec 30, 2021
b2899d4
feat: add script to JSON parse
cdkini Dec 31, 2021
f549285
Merge branch 'develop' of github.com:great-expectations/great_expecta…
cdkini Jan 4, 2022
8dc1775
chore: update enum value per @abegong
cdkini Jan 4, 2022
3d5d815
Merge branch 'develop' of github.com:great-expectations/great_expecta…
cdkini Jan 5, 2022
7c62428
refactor: move types into CLI tool
cdkini Jan 5, 2022
e0ca6ad
feat: add diagnostic parsing support to package class
cdkini Jan 5, 2022
232d07d
feat: implement dynamic importing to retrieve Expectations
cdkini Jan 5, 2022
8ff8812
refactor: use issubclass checks
cdkini Jan 5, 2022
2f7e1a5
feat: misc cleanup to ensure proper serialization
cdkini Jan 5, 2022
e443a16
fix: misc fixes to get hook working
cdkini Jan 5, 2022
150a858
chore: update logger calls
cdkini Jan 6, 2022
4ef74e2
Merge branch 'develop' of github.com:great-expectations/great_expecta…
cdkini Jan 10, 2022
214b32a
chore: add example Package JSON
cdkini Jan 10, 2022
95ad012
refactor: use enum for social media types
cdkini Jan 10, 2022
5bf7916
Merge branch 'develop' of github.com:great-expectations/great_expecta…
cdkini Jan 11, 2022
7439e3b
Merge branch 'develop' into feature/support-for-contrib-packaging-jso…
cdkini Jan 14, 2022
8231c67
Merge branch 'develop' into feature/support-for-contrib-packaging-jso…
cdkini Jan 15, 2022
f9c1edc
Merge branch 'feature/support-for-contrib-packaging-json-object' of g…
cdkini Jan 19, 2022
dc8e996
Merge branch 'develop' of github.com:great-expectations/great_expecta…
cdkini Jan 19, 2022
259bd6a
Merge branch 'develop' of github.com:great-expectations/great_expecta…
cdkini Jan 24, 2022
e6ef081
feat: make dataclasses frozen
cdkini Jan 24, 2022
c3647bf
Merge branch 'develop' of github.com:great-expectations/great_expecta…
cdkini Jan 31, 2022
85c11be
chore: make to_json_file func private
cdkini Jan 31, 2022
57a748f
chore: remove unnecessary imports
cdkini Jan 31, 2022
a824773
refactor: rename types to package
cdkini Jan 31, 2022
fcd2234
chore: add better exception/errors
cdkini Jan 31, 2022
0cfc069
chore: misc comments
cdkini Jan 31, 2022
568a4a1
Merge branch 'develop' of github.com:great-expectations/great_expecta…
cdkini Feb 1, 2022
b66bfad
refactor: move serialization methods out of class
cdkini Feb 1, 2022
846eb75
refactor: use Config suffix
cdkini Feb 1, 2022
cb5d6a1
Merge branch 'hackathon-docs' of github.com:great-expectations/great_…
cdkini Feb 1, 2022
9b7127b
refactor: remove Config in place of Manifest
cdkini Feb 1, 2022
0df786e
fix: rename config to manifest
cdkini Feb 2, 2022
05d87cd
feat: start parsing diagnostics
cdkini Feb 2, 2022
1478978
Merge branch 'hackathon-docs' of github.com:great-expectations/great_…
cdkini Feb 2, 2022
243ab37
feat: parse requirements
cdkini Feb 2, 2022
016e1bf
feat: update requirements parsing
cdkini Feb 3, 2022
1a132e9
Merge branch 'hackathon-docs' of github.com:great-expectations/great_…
cdkini Feb 3, 2022
d9e08b5
refactor: make manifest mutable
cdkini Feb 3, 2022
debd6c1
feat: continue impl
cdkini Feb 3, 2022
997b93a
Merge branch 'develop' of github.com:great-expectations/great_expecta…
cdkini Feb 6, 2022
11f9e0b
feat: continue update attr feature
cdkini Feb 7, 2022
29daafe
test: add tests for update methods
cdkini Feb 7, 2022
03514f1
chore: revert files back to hackathon-docs state
cdkini Feb 7, 2022
c1ee85d
fix: push unlinted changes
cdkini Feb 7, 2022
713efff
chore: revert docs
cdkini Feb 7, 2022
3a5299e
chore: revert azure yml
cdkini Feb 7, 2022
f6bb8fb
chore: delete extraneous files
cdkini Feb 7, 2022
6993a84
chore: revert additional files
cdkini Feb 7, 2022
6b4adf0
chore: revert final files
cdkini Feb 7, 2022
f6ddb0a
test: add test for static attrs
cdkini Feb 7, 2022
a80f26a
chore: revert cli.py
cdkini Feb 7, 2022
9d5b9ba
feat: exit early with bad requirements path
cdkini Feb 7, 2022
5c92972
feat: ensure enums are JSON serializable
cdkini Feb 7, 2022
4428ab1
feat: add sync cmd
cdkini Feb 7, 2022
2784ff8
chore: update comments
cdkini Feb 7, 2022
991872c
chore: additional cleanup
cdkini Feb 7, 2022
3bb9fe2
chore: misc updates
cdkini Feb 7, 2022
089d7bb
feat: add parsing support for package_info.yml
cdkini Feb 8, 2022
061f864
test: start writing tests for addl parsing
cdkini Feb 8, 2022
824723f
fix: fix filenotfound error issue
cdkini Feb 8, 2022
0a63e6b
feat: write build script for CI/CD uploading to S3
cdkini Feb 8, 2022
c7f37ef
chore: misc updates
cdkini Feb 9, 2022
a1a0145
chore: misc cleanup
cdkini Feb 9, 2022
53285fd
chore: bump black requirement
cdkini Feb 11, 2022
14 changes: 9 additions & 5 deletions contrib/cli/great_expectations_contrib/cli.py
@@ -6,7 +6,7 @@
init_cmd,
publish_cmd,
read_package_from_file,
write_package_to_disk,
sync_package,
)
from great_expectations_contrib.package import GreatExpectationsContribPackageManifest

@@ -39,16 +39,20 @@ def init() -> None:
@click.pass_obj
def publish(pkg: GreatExpectationsContribPackageManifest) -> None:
publish_cmd()
pkg.update_package_state()
write_package_to_disk(pkg, PACKAGE_PATH)
sync_package(pkg, PACKAGE_PATH)


@cli.command(help="Check your package to make sure it's met all the requirements")
@click.pass_obj
def check(pkg: GreatExpectationsContribPackageManifest) -> None:
check_cmd()
pkg.update_package_state()
write_package_to_disk(pkg, PACKAGE_PATH)
sync_package(pkg, PACKAGE_PATH)
Contributor

Do you want to call sync_package here or just check_cmd?

Contributor Author

The general idea is that the sync action updates the attributes based on the state of the package. While you can invoke sync on its own, it's meant to be a hook that runs automatically at the end of a given user action.

This keeps the underlying JSON object up to date as the user iterates on their package. Perhaps it's not entirely necessary, since we have the CI/CD script to parse these files, but I think it's okay for now.
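
A minimal sketch of the hook pattern described above, using only the standard library. `Manifest` is a hypothetical stand-in for `GreatExpectationsContribPackageManifest`, and its `update_package_state` body is a placeholder, not the real diagnostics parsing; `check` stands in for the CLI command without the `click` decorators.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Manifest:
    """Hypothetical stand-in for the real package manifest."""

    package_name: Optional[str] = None
    expectation_count: Optional[int] = None

    def update_package_state(self) -> None:
        # Placeholder: the real method derives state from Expectation diagnostics.
        self.expectation_count = 0


def write_package_to_disk(package: Manifest, path: str) -> None:
    # Drop unset (None) attributes so they do not appear in the JSON file.
    json_dict = {k: v for k, v in asdict(package).items() if v is not None}
    with open(path, "w") as f:
        f.write(json.dumps(json_dict, indent=4))


def sync_package(package: Manifest, path: str) -> None:
    # Recompute state first, then persist it.
    package.update_package_state()
    write_package_to_disk(package, path)


def check(package: Manifest, path: str) -> None:
    # The user-facing action would run here; the sync hook runs last.
    sync_package(package, path)
```

Calling `check(Manifest(), "package.json")` would leave a JSON file containing only the attributes that were set, here `expectation_count`.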



@cli.command(help="Manually sync your package state")
@click.pass_obj
def sync(pkg: GreatExpectationsContribPackageManifest) -> None:
sync_package(pkg, PACKAGE_PATH)
Comment on lines +52 to +55
Contributor Author

This is the hook that is called after each CLI invocation to update the underlying JSON object. I've exposed the functionality in a command to aid with debugging.



if __name__ == "__main__":
16 changes: 11 additions & 5 deletions contrib/cli/great_expectations_contrib/commands.py
@@ -162,7 +162,7 @@ def read_package_from_file(path: str) -> GreatExpectationsContribPackageManifest
contents = f.read()

data = json.loads(contents)
logger.info(f"Succesfully read existing package data from {path}")
logger.info(f"Successfully read existing package data from {path}")
return GreatExpectationsContribPackageManifest(**data)


@@ -176,11 +176,17 @@ def write_package_to_disk(
path: The relative path to the target package JSON file.
"""
json_dict = asdict(package)
to_delete = [key for key, val in json_dict.items() if val is None]
for key in to_delete:
del json_dict[key]

data = json.dumps(json_dict, indent=4)
with open(path, "w") as f:
f.write(data)
logger.info(f"Successfully wrote state to {path}.")


def sync_package(package: GreatExpectationsContribPackageManifest, path: str) -> None:
"""Evaluate the state of the contributor package and update the existing manifest.

Args:
package: The GreatExpectationsContribPackageManifest you wish to update/sync.
"""
package.update_package_state()
write_package_to_disk(package, path)
170 changes: 126 additions & 44 deletions contrib/cli/great_expectations_contrib/package.py
@@ -5,89 +5,92 @@
import sys
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, List, Optional, Type
from typing import Any, List, Optional, Type

import pkg_resources

from great_expectations.core.expectation_diagnostics.expectation_diagnostics import (
ExpectationDiagnostics,
)
from great_expectations.expectations.expectation import Expectation
from great_expectations.types import SerializableDictDot

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

# Type alias that will need to be updated to reflect the complex nature of the 'run_diagnostics' return object
Diagnostics = Dict[str, Any]


@dataclass(frozen=True)
class ExpectationCompletenessCheck:
@dataclass
class ExpectationCompletenessCheck(SerializableDictDot):
Comment on lines +25 to +26
Contributor Author

Wanted parity with the dataclasses written by @abegong and @kenwade4 for the diagnostics.

We require mutability since the object has to update its own underlying state. The immutable approach is difficult since we have certain attributes that are static (like name).
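
To illustrate the trade-off mentioned above, here is a small sketch of how a frozen dataclass rejects in-place updates while a plain one accepts them. `FrozenStatus`, `MutableStatus`, and `try_update` are hypothetical names used only for this example.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class FrozenStatus:
    total: int = 0


@dataclass
class MutableStatus:
    total: int = 0


def try_update(status, total: int) -> bool:
    """Return True if the attribute could be assigned in place."""
    try:
        status.total = total
    except FrozenInstanceError:
        # Frozen dataclasses raise on any attribute assignment.
        return False
    return True
```

With frozen classes, every state change would have to build a new object (for example via `dataclasses.replace`), which is awkward for a manifest that refreshes its own attributes.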

message: str
passed: bool


@dataclass(frozen=True)
class ExpectationCompletenessChecklist:
@dataclass
class ExpectationCompletenessChecklist(SerializableDictDot):
experimental: List[ExpectationCompletenessCheck]
beta: List[ExpectationCompletenessCheck]
production: List[ExpectationCompletenessCheck]


@dataclass(frozen=True)
class PackageCompletenessStatus:
@dataclass
class PackageCompletenessStatus(SerializableDictDot):
concept_only: int
experimental: int
beta: int
production: int
total: int


@dataclass(frozen=True)
class RenderedExpectation:
@dataclass
class RenderedExpectation(SerializableDictDot):
name: str
tags: List[str]
supported: List[str]
status: ExpectationCompletenessChecklist


@dataclass(frozen=True)
class Dependency:
@dataclass
class Dependency(SerializableDictDot):
text: str
link: str
version: Optional[str]
version: Optional[str] = None


@dataclass(frozen=True)
class GitHubUser:
@dataclass
class GitHubUser(SerializableDictDot):
username: str
full_name: Optional[str]
full_name: Optional[str] = None


class SocialLinkType(Enum):
class SocialLinkType(str, Enum):
Contributor Author

By inheriting from str, the enum becomes JSON serializable.
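
A short sketch of why the `str` mixin matters for `json.dumps`. `PlainMaturity`, `StrMaturity`, and `dumps_or_error` are hypothetical names for this illustration; only the standard `json` and `enum` modules are used.

```python
import json
from enum import Enum


class PlainMaturity(Enum):
    BETA = "BETA"


class StrMaturity(str, Enum):
    BETA = "BETA"


def dumps_or_error(value) -> str:
    """Serialize a value, or report the TypeError raised by json.dumps."""
    try:
        return json.dumps({"maturity": value})
    except TypeError as e:
        return f"TypeError: {e}"
```

The encoder treats members of `StrMaturity` as ordinary strings, whereas a plain `Enum` member has no default JSON representation and raises `TypeError`.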

TWITTER = "TWITTER"
INSTAGRAM = "INSTAGRAM"
LINKEDIN = "LINKEDIN"
MEDIUM = "MEDIUM"


@dataclass(frozen=True)
class SocialLink:
@dataclass
class SocialLink(SerializableDictDot):
account_type: SocialLinkType
identifier: str


@dataclass(frozen=True)
class DomainExpert:
@dataclass
class DomainExpert(SerializableDictDot):
full_name: str
social_links: List[SocialLink]
picture: str


class Maturity(Enum):
class Maturity(str, Enum):
CONCEPT_ONLY = "CONCEPT_ONLY"
EXPERIMENTAL = "EXPERIMENTAL"
BETA = "BETA"
PRODUCTION = "PRODUCTION"


@dataclass(frozen=True)
class GreatExpectationsContribPackageManifest:
@dataclass
class GreatExpectationsContribPackageManifest(SerializableDictDot):
# Core
package_name: Optional[str] = None
icon: Optional[str] = None
@@ -110,22 +113,97 @@ def update_package_state(self) -> None:
"""
Parses diagnostic reports from package Expectations and uses them to update JSON state
"""
diagnostics = self._retrieve_package_expectations_diagnostics()
diagnostics = (
GreatExpectationsContribPackageManifest.retrieve_package_expectations_diagnostics()
)
self._update_attrs_with_diagnostics(diagnostics)

def _update_attrs_with_diagnostics(self, diagnostics: List[Diagnostics]) -> None:
# TODO: Write logic to assign values to attrs
# This is a black box for now
# for diagnostic in diagnostics:
# pass
raise NotImplementedError

def _retrieve_package_expectations_diagnostics(self) -> List[Diagnostics]:
def _update_attrs_with_diagnostics(
self, diagnostics: List[ExpectationDiagnostics]
) -> None:
self._update_expectations(diagnostics)
self._update_dependencies("requirements.txt")
self._update_contributors(diagnostics)

def _update_expectations(self, diagnostics: List[ExpectationDiagnostics]) -> None:
expectations = []
status = {maturity.name: 0 for maturity in Maturity}

for diagnostic in diagnostics:
expectation = RenderedExpectation(
name=diagnostic.description.snake_name,
tags=diagnostic.library_metadata.tags,
supported=[],
status=diagnostic.maturity_checklist, # Should be converted to the proper type
Contributor Author

@abegong it seems as though ExpectationDiagnosticMaturityMessages and ExpectationCompletenessChecklist have quite a bit of overlap. Can we leverage the existing type?
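
For readers following the surrounding method, the maturity tally that `_update_expectations` performs can be sketched in isolation. This is an illustration under assumptions: `tally` is a hypothetical helper, and it takes maturity names as plain uppercase strings rather than reading them from diagnostics objects.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Maturity(str, Enum):
    CONCEPT_ONLY = "CONCEPT_ONLY"
    EXPERIMENTAL = "EXPERIMENTAL"
    BETA = "BETA"
    PRODUCTION = "PRODUCTION"


def tally(maturities: List[str]) -> Tuple[Dict[str, int], Maturity]:
    # Start every maturity level at zero so missing levels still appear.
    status = {maturity.name: 0 for maturity in Maturity}
    for name in maturities:
        status[name] += 1

    # Enum names are uppercase, but the status attributes are lowercase.
    lowercase_status = {k.lower(): v for k, v in status.items()}
    lowercase_status["total"] = sum(status.values())

    # The package maturity is the most common level among its Expectations.
    most_common = Maturity[max(status, key=status.get)]
    return lowercase_status, most_common
```

One consequence of using `max` this way is that ties resolve to the first level in enum definition order, so an empty package would be reported as `CONCEPT_ONLY`.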

)
expectations.append(expectation)

expectation_maturity = diagnostic.library_metadata.maturity
status[expectation_maturity] += 1

self.expectations = expectations
self.expectation_count = len(expectations)

# Enum is all caps but status attributes are lowercase
lowercase_status = {k.lower(): v for k, v in status.items()}
lowercase_status["total"] = sum(status.values())

self.status = PackageCompletenessStatus(**lowercase_status)
maturity = max(status, key=status.get)
self.maturity = Maturity[maturity]

def _update_dependencies(self, path: str) -> None:
if not os.path.exists(path):
logger.warning(f"Could not find requirements file {path}")
self.dependencies = []
return

with open(path) as f:
requirements = [req for req in pkg_resources.parse_requirements(f)]

def _convert_to_dependency(
requirement: pkg_resources.Requirement,
) -> Dependency:
name = requirement.project_name
pypi_url = f"https://pypi.org/project/{name}"
if requirement.specs:
# Stringify tuple of pins
version = ", ".join(
"".join(symbol for symbol in pin)
for pin in sorted(requirement.specs)
)
else:
version = None
return Dependency(text=name, link=pypi_url, version=version)

dependencies = list(map(_convert_to_dependency, requirements))
self.dependencies = dependencies

def _update_contributors(self, diagnostics: List[ExpectationDiagnostics]) -> None:
contributors = []
for diagnostic in diagnostics:
for contributor in diagnostic.library_metadata.contributors:
github_user = GitHubUser(contributor)
if github_user not in contributors:
contributors.append(github_user)

self.contributors = contributors

@staticmethod
def retrieve_package_expectations_diagnostics() -> List[ExpectationDiagnostics]:
try:
package = self._identify_user_package()
expectations_module = self._import_expectations_module(package)
expectations = self._retrieve_expectations_from_module(expectations_module)
diagnostics = self._gather_diagnostics(expectations)
package = GreatExpectationsContribPackageManifest._identify_user_package()
expectations_module = (
GreatExpectationsContribPackageManifest._import_expectations_module(
package
)
)
expectations = GreatExpectationsContribPackageManifest._retrieve_expectations_from_module(
expectations_module
)
diagnostics = GreatExpectationsContribPackageManifest._gather_diagnostics(
expectations
)
Comment on lines +232 to +243
Contributor Author

Made these all static since they do not use or modify state.

return diagnostics
except Exception as e:
# Exceptions should not break the CLI - this behavior should be working in the background
@@ -135,7 +213,8 @@ def _retrieve_package_expectations_diagnostics(self) -> List[Diagnostics]:
)
return []

def _identify_user_package(self) -> str:
@staticmethod
def _identify_user_package() -> str:
# Guaranteed to have a dir named '<MY_PACKAGE>_expectations' through Cookiecutter validation
packages = [
d for d in os.listdir() if os.path.isdir(d) and d.endswith("_expectations")
@@ -149,7 +228,8 @@ def _identify_user_package(self) -> str:

return packages[0]

def _import_expectations_module(self, package: str) -> Any:
@staticmethod
def _import_expectations_module(package: str) -> Any:
# Need to add user's project to the PYTHONPATH
cwd = os.getcwd()
sys.path.append(cwd)
@@ -159,8 +239,9 @@ def _import_expectations_module(self, package: str) -> Any:
except ModuleNotFoundError:
raise

@staticmethod
def _retrieve_expectations_from_module(
self, expectations_module: Any
expectations_module: Any,
) -> List[Type[Expectation]]:
expectations: List[Type[Expectation]] = []
names: List[str] = []
@@ -172,9 +253,10 @@ def _retrieve_expectations_from_module(
logger.info(f"Found {len(names)} expectation(s): {names}")
return expectations

@staticmethod
def _gather_diagnostics(
self, expectations: List[Type[Expectation]]
) -> List[Diagnostics]:
expectations: List[Type[Expectation]],
) -> List[ExpectationDiagnostics]:
diagnostics_list = []
for expectation in expectations:
instance = expectation()
Empty file added contrib/cli/tests/__init__.py