[FEATURE] Add requirements-dev-lite.txt and update tests/docs #4273

Merged: 6 commits from feature/add-requirements-dev-lite into develop on Mar 1, 2022

Conversation

kenwade4 (Contributor) commented:

To make it as easy as possible for potential contributors to set up a dev environment, provide a `requirements-dev-lite.txt` file with the fewest dependencies needed.

Changes proposed in this pull request:

  • Add requirements-dev-lite.txt and update tests/docs
  • Update the get_dataset and build_sa_validator_with_data funcs to not blow up if various SQL dialects are not installed/available

Definition of Done

  • My code follows the Great Expectations style guide
  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have run any local integration tests and made sure that nothing is broken.

Also update get_dataset and build_sa_validator_with_data funcs in self_check/util.py to not blow up if various SQL dialects are not installed/available.

netlify bot commented Feb 24, 2022:

✔️ Deploy Preview for niobium-lead-7998 ready!

🔨 Explore the source changes: 08503f3

🔍 Inspect the deploy log: https://app.netlify.com/sites/niobium-lead-7998/deploys/621e254c6d7e3a0007f885e6

😎 Browse the preview: https://deploy-preview-4273--niobium-lead-7998.netlify.app

github-actions bot (Contributor) commented:

HOWDY! This is your friendly 🤖 CHANGELOG bot 🤖

Please don't forget to add a clear and succinct description of your change under the Develop header in docs_rtd/changelog.rst, if applicable. This will help us with the release process. See the Contribution checklist in the Great Expectations documentation for the type of labels to use!

Thank you!

"url": "postgresql+psycopg2://username:password@host:65432/database",
},
)
if is_library_loadable(library_name="psycopg2"):

kenwade4 (Contributor, Author) commented:

The diff looks funky, but it's just moving that chunk of code under this conditional statement

kenwade4 enabled auto-merge (squash) February 24, 2022 13:37

abegong (Contributor) left a comment:

Thanks for turning this around so quickly, @kenwade4 . Something like this will be a huge help for new contributors to get started with far less friction.

I imagine that @cdkini and @donaldheppner will have feedback on the specific implementation.

  • I'd suggest we find people to friction log these instructions across OSes (e.g. mac, windows, ubuntu linux)
  • Friction logging in both conda and virtualenv setups would be good, too.
  • I'll friction log it myself as soon as I get a chance, but that likely won't be until next week.
  • If I read this right, developers who follow these instructions will be able to test against pandas and sqlite, but not other dialects of SQL, and not Spark. Is that correct? If so, we should make that clear in documentation, and provide steps for developers to progressively add Spark and other dialects to their test matrices.

Other comments inline.

```diff
@@ -385,7 +385,7 @@ def get_dataset(
         return PandasDataset(df, profiler=profiler, caching=caching)

     elif dataset_type == "sqlite":
-        if not create_engine:
+        if not create_engine or not SQLITE_TYPES:
```

A contributor commented:

It scares me that we need to make these kinds of changes.

This reinforces my view that we should not be relying on imports to determine which backends to test against. Instead, I would strongly advocate for an explicit config file for this.

kenwade4 (Contributor, Author) replied:

When imports fail for those dialects, we are setting the `*_TYPES` dicts to `{}`, but later in this function we are just trying to access them with square bracket notation (not `.get()`). There were already bailout points at `if not create_engine`, and I thought it made sense to bail out at that point if those dicts were empty.
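
A minimal sketch of the failure mode this guards against (illustrative only; names are taken from the diff, and the error message is hypothetical):

```python
# When the sqlite dialect import fails, the module-level fallbacks look like:
create_engine = None  # sqlalchemy's create_engine, None if the import failed
SQLITE_TYPES = {}     # dialect type lookup table, {} if the import failed

# Later, get_dataset indexes SQLITE_TYPES with square brackets rather than
# .get(), so an empty dict would raise KeyError on the first lookup. Hence
# the new early bailout, piggybacking on the existing create_engine check:
if not create_engine or not SQLITE_TYPES:
    raise ValueError("sqlite tests require sqlalchemy")  # hypothetical message
```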

Comment on lines 76 to 85
- postgresql: `psycopg2-binary>=2.7.6`
- mysql: `PyMySQL>=0.9.3,<0.10`
- mssql: `pyodbc>=4.0.30` (see step 6 for links on getting the odbc driver on your system first)
- athena: `pyathena>=1.11`
- bigquery: `sqlalchemy-bigquery>=1.3.0 google-cloud-secret-manager>=1.0.0 google-cloud-storage>=1.28.0`
- snowflake: `snowflake-connector-python==2.5.0 snowflake-sqlalchemy>=1.2.3 azure-storage-blob>=12.5.0`
- redshift: `sqlalchemy-redshift>=0.7.7`
- teradata: `teradatasqlalchemy==17.0.0.1`
- oracle: `cx_Oracle`
- dremio: `sqlalchemy-dremio>=1.2.1 pyarrow>=0.12.0 pyodbc>=4.0.30`

Shinnnyshinshin (Contributor) commented Feb 24, 2022:

One small suggestion for this, @kenwade4: would it be possible to have a reference here to the appropriate requirements.txt file? I imagine this section becoming out of date really quickly otherwise. I'm imagining something similar to what we do with python scripts in our how-to-guides:

```python file=../../../../tests/integration/docusaurus/connecting_to_your_data/database/mssql_yaml_example.py#L3-L6
```

kenwade4 (Contributor, Author) replied:

The requirements listed reference different requirements files and combine them in some instances (e.g. stuff from `requirements-dev-base.txt` and `requirements-dev-sqlalchemy.txt` for the bigquery line). The oracle requirement isn't in any of the files since we aren't testing on oracle DBs in CI/CD.

I'd be more inclined to not include that list of individual dialects at all and tell them to "try setting up the full dev environment (as mentioned in step 6) when you're ready for more robust testing of your custom Expectations" or something like that.

I'm pretty sure @donaldheppner is opposed to splitting up the `requirements-dev-*.txt` files into things like `requirements-dev-postgresql.txt` etc. for dialects (although `requirements-dev-spark.txt` only has a single line). If we were to split them up, THEN it would make sense to link to specific lines of files with `python file=../../../blah`.

Shinnnyshinshin (Contributor) replied:

Ok @kenwade4, that makes sense. I guess I keep coming back to the thought of the list becoming out of date. As another idea, would it be possible to add a (preferably loud) note to the requirements files as a reminder to update the contributing_setup doc if anything changes?
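
Such a note might look like this at the top of each requirements file (a hypothetical sketch; the exact doc path would need to match the repo):

```
# NOTE TO MAINTAINERS: the dialect-specific pins in this file are also listed
# in the contributing_setup doc. If you add, remove, or re-pin anything here,
# please update that doc in the same PR.
```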

kenwade4 disabled auto-merge February 24, 2022 18:42

cdkini (Contributor) left a comment:

Really nice work - just a few questions I'd love to talk through before approving.

```diff
@@ -132,6 +158,8 @@ Depending on which features of Great Expectations you want to work on, you may w

 * Caution: If another service is using port 3306, Docker may start the container but silently fail to set up the port.

+> If you have a Silicon Mac (M1) this Docker image does not work
```

A contributor commented:

Nitpick - maybe we should use a `:::warning` tag here?

kenwade4 (Contributor, Author) replied:

Not familiar with that syntax in markdown. With whatever theme we are using with docusaurus, inline comments like this have a bold yellow background that stands out pretty well.

Would it just become `::warning If you have a Silicon...`?
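
For reference, Docusaurus admonitions wrap the body in triple-colon fences, so the note would become something like:

```md
:::warning

If you have a Silicon Mac (M1) this Docker image does not work.

:::
```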

Comment on lines +62 to +67
```python
try:
    assert (
        PasswordMasker.mask_db_url(
            f"postgresql://scott:tiger@{db_hostname}:65432/mydatabase"
        )
        == f"postgresql://scott:***@{db_hostname}:65432/mydatabase"
```

A contributor commented:

Instead of this try/except, should we use pytest decorators to skip?

kenwade4 (Contributor, Author) replied:

That's following the already-existing logic that was there. I'm a fan of `@pytest.mark.skipif`, which I added in other places.
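
For comparison, a decorator-based version might look like this (a sketch: the test name is made up, and the `db_hostname` fixture and the import paths for `is_library_loadable` / `PasswordMasker` are assumptions):

```python
import pytest

from great_expectations.data_context.util import PasswordMasker
from great_expectations.util import is_library_loadable


@pytest.mark.skipif(
    not is_library_loadable(library_name="psycopg2"),
    reason="psycopg2 is not installed in this test environment",
)
def test_mask_db_url_psycopg2(db_hostname):
    # The whole test is skipped up front, so no try/except is needed here.
    assert (
        PasswordMasker.mask_db_url(
            f"postgresql+psycopg2://scott:tiger@{db_hostname}:65432/mydatabase"
        )
        == f"postgresql+psycopg2://scott:***@{db_hostname}:65432/mydatabase"
    )
```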

Comment on lines +86 to +92
```python
# psycopg2 (if installed in test environment)
try:
    assert (
        PasswordMasker.mask_db_url(
            f"postgresql+psycopg2://scott:tiger@{db_hostname}:65432/mydatabase"
        )
        == f"postgresql+psycopg2://scott:***@{db_hostname}:65432/mydatabase"
```

A contributor commented:

Same here and throughout

```diff
@@ -15,6 +15,7 @@
     list_gcs_keys,
     map_batch_definition_to_data_reference_string_using_regex,
     map_data_reference_string_to_batch_definition_list_using_regex,
+    storage,
```

A contributor commented:

Should we be importing the import from the source code or should we try to import GCS here?

kenwade4 (Contributor, Author) replied:

I purposely tried to import `storage` from the 3 different source files since those were the places where the imports were failing. The test skip reasons also reflect where the `storage` import failed.

Comment on lines +972 to +996
```python
try:
    dialect_classes["sqlite"] = sqlitetypes.dialect
    dialect_types["sqlite"] = SQLITE_TYPES
except AttributeError:
    pass
try:
    dialect_classes["postgresql"] = postgresqltypes.dialect
    dialect_types["postgresql"] = POSTGRESQL_TYPES
except AttributeError:
    pass
try:
    dialect_classes["mysql"] = mysqltypes.dialect
    dialect_types["mysql"] = MYSQL_TYPES
except AttributeError:
    pass
try:
    dialect_classes["mssql"] = mssqltypes.dialect
    dialect_types["mssql"] = MSSQL_TYPES
except AttributeError:
    pass
try:
    dialect_classes["bigquery"] = sqla_bigquery.BigQueryDialect
    dialect_types["bigquery"] = BIGQUERY_TYPES
except AttributeError:
    pass
```

A contributor commented:

Could we maybe for-loop this? And perhaps add a logging statement?

```python
dialects: List[Tuple[x, y]] = [
    ("sqlite", sqlitetypes.dialect, SQLITE_TYPES),
    ...
]

for name, class_, type_ in dialects:
    try:
        dialect_classes[name] = class_
        dialect_types[name] = type_
    except AttributeError:
        logging.debug(...)
```

What's the AttributeError here? I'm not immediately seeing how that can be triggered.

kenwade4 (Contributor, Author) replied:

`NoneType` has no attribute whatever. A lot of the `try/except ImportError` blocks just end up setting whatever failed to import to None.
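
A minimal illustration of that pattern (not the exact Great Expectations code; the import shown is an assumption):

```python
# Optional dialect imports are guarded, and a failed import is pinned to None.
try:
    from sqlalchemy.dialects import sqlite as sqlitetypes
except ImportError:
    sqlitetypes = None

dialect_classes = {}
try:
    # If the import above failed, sqlitetypes is None, so this raises
    # AttributeError: 'NoneType' object has no attribute 'dialect'.
    dialect_classes["sqlite"] = sqlitetypes.dialect
except AttributeError:
    pass  # the dialect simply stays unregistered
```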

kenwade4 merged commit f28243a into develop Mar 1, 2022
kenwade4 deleted the feature/add-requirements-dev-lite branch March 1, 2022 15:48