[FEATURE] Add parsing of Expectation diagnostics to contrib packaging JSON object #4114

Merged
67 commits
843ed94
feat: add JSON object classes
cdkini Dec 30, 2021
d8dca58
Merge branch 'develop' of github.com:great-expectations/great_expecta…
cdkini Dec 30, 2021
b2899d4
feat: add script to JSON parse
cdkini Dec 31, 2021
f549285
Merge branch 'develop' of github.com:great-expectations/great_expecta…
cdkini Jan 4, 2022
8dc1775
chore: update enum value per @abegong
cdkini Jan 4, 2022
3d5d815
Merge branch 'develop' of github.com:great-expectations/great_expecta…
cdkini Jan 5, 2022
7c62428
refactor: move types into CLI tool
cdkini Jan 5, 2022
e0ca6ad
feat: add diagnostic parsing support to package class
cdkini Jan 5, 2022
232d07d
feat: implement dynamic importing to retrieve Expectations
cdkini Jan 5, 2022
8ff8812
refactor: use issubclass checks
cdkini Jan 5, 2022
2f7e1a5
feat: misc cleanup to ensure proper serialization
cdkini Jan 5, 2022
e443a16
fix: misc fixes to get hook working
cdkini Jan 5, 2022
150a858
chore: update logger calls
cdkini Jan 6, 2022
4ef74e2
Merge branch 'develop' of github.com:great-expectations/great_expecta…
cdkini Jan 10, 2022
214b32a
chore: add example Package JSON
cdkini Jan 10, 2022
95ad012
refactor: use enum for social media types
cdkini Jan 10, 2022
5bf7916
Merge branch 'develop' of github.com:great-expectations/great_expecta…
cdkini Jan 11, 2022
7439e3b
Merge branch 'develop' into feature/support-for-contrib-packaging-jso…
cdkini Jan 14, 2022
8231c67
Merge branch 'develop' into feature/support-for-contrib-packaging-jso…
cdkini Jan 15, 2022
f9c1edc
Merge branch 'feature/support-for-contrib-packaging-json-object' of g…
cdkini Jan 19, 2022
dc8e996
Merge branch 'develop' of github.com:great-expectations/great_expecta…
cdkini Jan 19, 2022
259bd6a
Merge branch 'develop' of github.com:great-expectations/great_expecta…
cdkini Jan 24, 2022
e6ef081
feat: make dataclasses frozen
cdkini Jan 24, 2022
c3647bf
Merge branch 'develop' of github.com:great-expectations/great_expecta…
cdkini Jan 31, 2022
85c11be
chore: make to_json_file func private
cdkini Jan 31, 2022
57a748f
chore: remove unnecessary imports
cdkini Jan 31, 2022
a824773
refactor: rename types to package
cdkini Jan 31, 2022
fcd2234
chore: add better exception/errors
cdkini Jan 31, 2022
0cfc069
chore: misc comments
cdkini Jan 31, 2022
568a4a1
Merge branch 'develop' of github.com:great-expectations/great_expecta…
cdkini Feb 1, 2022
b66bfad
refactor: move serialization methods out of class
cdkini Feb 1, 2022
846eb75
refactor: use Config suffix
cdkini Feb 1, 2022
cb5d6a1
Merge branch 'hackathon-docs' of github.com:great-expectations/great_…
cdkini Feb 1, 2022
9b7127b
refactor: remove Config in place of Manifest
cdkini Feb 1, 2022
0df786e
fix: rename config to manifest
cdkini Feb 2, 2022
05d87cd
feat: start parsing diagnostics
cdkini Feb 2, 2022
1478978
Merge branch 'hackathon-docs' of github.com:great-expectations/great_…
cdkini Feb 2, 2022
243ab37
feat: parse requirements
cdkini Feb 2, 2022
016e1bf
feat: update requirements parsing
cdkini Feb 3, 2022
1a132e9
Merge branch 'hackathon-docs' of github.com:great-expectations/great_…
cdkini Feb 3, 2022
d9e08b5
refactor: make manifest mutable
cdkini Feb 3, 2022
debd6c1
feat: continue impl
cdkini Feb 3, 2022
997b93a
Merge branch 'develop' of github.com:great-expectations/great_expecta…
cdkini Feb 6, 2022
11f9e0b
feat: continue update attr feature
cdkini Feb 7, 2022
29daafe
test: add tests for update methods
cdkini Feb 7, 2022
03514f1
chore: revert files back to hackathon-docs state
cdkini Feb 7, 2022
c1ee85d
fix: push unlinted changes
cdkini Feb 7, 2022
713efff
chore: revert docs
cdkini Feb 7, 2022
3a5299e
chore: revert azure yml
cdkini Feb 7, 2022
f6bb8fb
chore: delete extraneous files
cdkini Feb 7, 2022
6993a84
chore: revert additional files
cdkini Feb 7, 2022
6b4adf0
chore: revert final files
cdkini Feb 7, 2022
f6ddb0a
test: add test for static attrs
cdkini Feb 7, 2022
a80f26a
chore: revert cli.py
cdkini Feb 7, 2022
9d5b9ba
feat: exit early with bad requirements path
cdkini Feb 7, 2022
5c92972
feat: ensure enums are JSON serializable
cdkini Feb 7, 2022
4428ab1
feat: add sync cmd
cdkini Feb 7, 2022
2784ff8
chore: update comments
cdkini Feb 7, 2022
991872c
chore: additional cleanup
cdkini Feb 7, 2022
3bb9fe2
chore: misc updates
cdkini Feb 7, 2022
089d7bb
feat: add parsing support for package_info.yml
cdkini Feb 8, 2022
061f864
test: start writing tests for addl parsing
cdkini Feb 8, 2022
824723f
fix: fix filenotfound error issue
cdkini Feb 8, 2022
0a63e6b
feat: write build script for CI/CD uploading to S3
cdkini Feb 8, 2022
c7f37ef
chore: misc updates
cdkini Feb 9, 2022
a1a0145
chore: misc cleanup
cdkini Feb 9, 2022
53285fd
chore: bump black requirement
cdkini Feb 11, 2022
94 changes: 94 additions & 0 deletions assets/scripts/build_package_gallery.py
@@ -0,0 +1,94 @@
# Purpose: Aggregate all contrib packages into a single JSON file to populate the gallery
#
# The generated file is sent to S3 through our CI/CD to be rendered on the front-end.


import json
import logging
import os
from dataclasses import asdict
from typing import List

from contrib.cli.great_expectations_contrib.commands import (
    read_package_from_file,
    sync_package,
)
from contrib.cli.great_expectations_contrib.package import (
    GreatExpectationsContribPackageManifest,
)

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)


def gather_all_contrib_package_paths() -> List[str]:
    """Iterate through contrib/ and identify the relative paths to all contrib packages.

    A contrib package is defined by the existence of a .great_expectations_package.json file.

    Returns:
        List of relative paths pointing to contrib packages
    """
    package_paths: List[str] = []
    for root, _, files in os.walk("contrib/"):
        for file in files:
            if file == ".great_expectations_package.json":
                package_paths.append(root)

    logger.info(f"Found {len(package_paths)} contrib packages")
    return package_paths


def gather_all_package_manifests(package_paths: List[str]) -> List[dict]:
    """Takes a list of relative paths to contrib packages and collects dictionaries to represent package state.

    Args:
        package_paths: A list of relative paths pointing to contrib packages

    Returns:
        A list of dictionaries that represents contributor package manifests
    """
    payload: List[dict] = []
    root = os.getcwd()
    for path in package_paths:
        try:
            # Go to package, read manifest, and sync it
            os.chdir(path)
            package: GreatExpectationsContribPackageManifest = read_package_from_file(
                path
            )
            sync_package(package, path)

            # Serialize to dict to append to payload
            json_data: dict = asdict(package)
            payload.append(json_data)
            logger.info(
                f"Successfully serialized {package.package_name} to dict and appended to manifest list"
            )
        except Exception as e:
            logger.warning(
                f"Something went wrong when syncing {path} and serializing to dict: {e}"
            )
        finally:
            # Always ensure we revert back to the project root
            os.chdir(root)

    return payload


def write_results_to_disk(path: str, package_manifests: List[dict]) -> None:
    """Take the list of package manifests and write to JSON file.

    Args:
        path: The relative path to write to
        package_manifests: A list of dictionaries that represents contributor package manifests
    """
    with open(path, "w") as outfile:
        json.dump(package_manifests, outfile)
        logger.info(f"Successfully wrote package manifests to {path}")


if __name__ == "__main__":
    package_paths = gather_all_contrib_package_paths()
    payload = gather_all_package_manifests(package_paths)
    write_results_to_disk("./package_manifests.json", payload)
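The discovery and directory-switching steps in the script above can be tried in isolation. The following is a minimal sketch, not the PR's code: `find_package_dirs`, `working_directory`, and `demo` are hypothetical names introduced for illustration, and the directory tree is a throwaway one built in a temp folder. Only the marker filename comes from the script.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator, List

# Marker filename taken from the build script; everything else here is illustrative.
MANIFEST_NAME = ".great_expectations_package.json"


def find_package_dirs(base: str) -> List[str]:
    """Return every directory under `base` that contains the manifest file."""
    return sorted(root for root, _, files in os.walk(base) if MANIFEST_NAME in files)


@contextmanager
def working_directory(path: str) -> Iterator[None]:
    """Hypothetical helper: chdir into `path`, then always restore the previous cwd."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)


def demo() -> List[str]:
    """Build a temp tree with two packages and one plain directory, then scan it."""
    with tempfile.TemporaryDirectory() as tmp:
        for name, is_package in [("pkg_a", True), ("pkg_b", True), ("notes", False)]:
            directory = os.path.join(tmp, "contrib", name)
            os.makedirs(directory)
            if is_package:
                with open(os.path.join(directory, MANIFEST_NAME), "w") as f:
                    f.write("{}")
        with working_directory(tmp):
            found = find_package_dirs("contrib")
        return [os.path.basename(p) for p in found]
```

Wrapping the `chdir` in a context manager gives the same guarantee as the script's `try`/`finally`: the working directory is restored even if processing a package raises.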
Empty file added contrib/cli/__init__.py
14 changes: 9 additions & 5 deletions contrib/cli/great_expectations_contrib/cli.py
@@ -6,7 +6,7 @@
     init_cmd,
     publish_cmd,
     read_package_from_file,
-    write_package_to_disk,
+    sync_package,
 )
 from great_expectations_contrib.package import GreatExpectationsContribPackageManifest

@@ -39,16 +39,20 @@ def init() -> None:
 @click.pass_obj
 def publish(pkg: GreatExpectationsContribPackageManifest) -> None:
     publish_cmd()
-    pkg.update_package_state()
-    write_package_to_disk(pkg, PACKAGE_PATH)
+    sync_package(pkg, PACKAGE_PATH)


 @cli.command(help="Check your package to make sure it's met all the requirements")
 @click.pass_obj
 def check(pkg: GreatExpectationsContribPackageManifest) -> None:
     check_cmd()
-    pkg.update_package_state()
-    write_package_to_disk(pkg, PACKAGE_PATH)
+    sync_package(pkg, PACKAGE_PATH)
Contributor

Do you want to call sync_package here or just check_cmd?

Contributor Author

The general idea is that the sync action updates the manifest's attributes based on the current state of the package. Although you can invoke sync on its own, it's meant to be a hook that runs automatically at the end of each user action.

This keeps the underlying JSON object up to date as the user iterates on their package. It may not be strictly necessary, since the CI/CD script also parses these files, but I think it's okay for now.
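The "sync runs at the end of every user action" idea can be illustrated without click. In the PR, each command calls `sync_package` explicitly; the decorator below is a different way to get the same ordering and is shown only as a sketch. `with_sync`, `sync_state`, and the `calls` list are hypothetical names, and the commands only record what happened.

```python
import functools
from typing import Callable, List

calls: List[str] = []  # records the order of actions for the demo


def sync_state(name: str) -> None:
    """Stand-in for sync_package: it only records that a sync happened."""
    calls.append(f"sync:{name}")


def with_sync(command: Callable[[str], None]) -> Callable[[str], None]:
    """Hypothetical decorator: run the wrapped command, then sync."""

    @functools.wraps(command)
    def wrapper(name: str) -> None:
        command(name)
        sync_state(name)

    return wrapper


@with_sync
def check(name: str) -> None:
    calls.append(f"check:{name}")


@with_sync
def publish(name: str) -> None:
    calls.append(f"publish:{name}")


check("my_package")
publish("my_package")
```

After both calls, `calls` holds each command followed by its sync entry, which is the ordering the comment describes.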



+@cli.command(help="Manually sync your package state")
+@click.pass_obj
+def sync(pkg: GreatExpectationsContribPackageManifest) -> None:
+    sync_package(pkg, PACKAGE_PATH)
Comment on lines +52 to +55
Contributor Author

This is the hook that is called after each CLI invocation to update the underlying JSON object. I've exposed the functionality in a command to aid with debugging.



if __name__ == "__main__":
16 changes: 11 additions & 5 deletions contrib/cli/great_expectations_contrib/commands.py
@@ -162,7 +162,7 @@ def read_package_from_file(path: str) -> GreatExpectationsContribPackageManifest
         contents = f.read()

     data = json.loads(contents)
-    logger.info(f"Succesfully read existing package data from {path}")
+    logger.info(f"Successfully read existing package data from {path}")
     return GreatExpectationsContribPackageManifest(**data)


@@ -176,11 +176,17 @@ def write_package_to_disk(
         path: The relative path to the target package JSON file.
     """
     json_dict = asdict(package)
-    to_delete = [key for key, val in json_dict.items() if val is None]
-    for key in to_delete:
-        del json_dict[key]
-
     data = json.dumps(json_dict, indent=4)
     with open(path, "w") as f:
         f.write(data)
     logger.info(f"Succesfully wrote state to {path}.")
+
+
+def sync_package(package: GreatExpectationsContribPackageManifest, path: str) -> None:
+    """Evaluate the state of the contributor package and update the existing manifest.
+
+    Args:
+        package: The GreatExpectationsContribPackageManifest you wish to update/sync.
+    """
+    package.update_package_state()
+    write_package_to_disk(package, path)
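The update-then-write flow in `sync_package`, together with the JSON round trip through `read_package_from_file`, can be sketched with a toy dataclass. This is an illustration under stated assumptions, not the real manifest: `Manifest`, its two fields, and the fixed values set by `update_package_state` are hypothetical, and the helper names are chosen for the example.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Manifest:
    """Hypothetical stand-in for GreatExpectationsContribPackageManifest."""

    package_name: Optional[str] = None
    expectation_count: Optional[int] = None

    def update_package_state(self) -> None:
        # The real method inspects the package; this one just sets fixed values.
        self.package_name = "demo_package"
        self.expectation_count = 3


def write_manifest(manifest: Manifest, path: str) -> None:
    """Serialize the dataclass to indented JSON; None values are written as null."""
    with open(path, "w") as f:
        f.write(json.dumps(asdict(manifest), indent=4))


def read_manifest(path: str) -> Manifest:
    """Load the JSON file and pass its keys to the dataclass constructor."""
    with open(path) as f:
        return Manifest(**json.load(f))


def sync(manifest: Manifest, path: str) -> None:
    """Refresh the in-memory state, then persist it."""
    manifest.update_package_state()
    write_manifest(manifest, path)


def demo() -> Manifest:
    """Sync an empty manifest to a temp file and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "manifest.json")
        sync(Manifest(), path)
        return read_manifest(path)
```

Because every field has a default and the JSON keys match the field names, the file written by `sync` can be loaded straight back with `Manifest(**data)`, mirroring the `GreatExpectationsContribPackageManifest(**data)` call in the diff.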