[FEATURE] Add requirements-dev-lite.txt and update tests/docs #4273
Conversation
Also update `get_dataset` and `build_sa_validator_with_data` funcs in `self_check/util.py` to not blow up if various SQL dialects are not installed/available.
"url": "postgresql+psycopg2://username:password@host:65432/database", | ||
}, | ||
) | ||
if is_library_loadable(library_name="psycopg2"): |
The diff looks funky, but it's just moving that chunk of code under this conditional statement
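For readers skimming the diff, the net effect is roughly the following. This is a minimal sketch: the `datasource_config` name and the `great_expectations.util` import path are assumptions for illustration, not the exact code in the PR.

```python
from great_expectations.util import is_library_loadable  # assumed import path

# Only exercise the postgresql+psycopg2 case when the psycopg2 driver can
# actually be imported in the test environment.
if is_library_loadable(library_name="psycopg2"):
    datasource_config = {  # illustrative name for the chunk that moved
        "url": "postgresql+psycopg2://username:password@host:65432/database",
    }
    # ... the rest of the moved block runs unchanged under this guard ...
```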
Thanks for turning this around so quickly, @kenwade4. Something like this will be a huge help for new contributors to get started with far less friction.
I imagine that @cdkini and @donaldheppner will have feedback on the specific implementation.
- I'd suggest we find people to friction log these instructions across OSes (e.g. Mac, Windows, Ubuntu Linux)
- Friction logging in both conda and virtualenv setups would be good, too.
- I'll friction log it myself as soon as I get a chance, but that likely won't be until next week.
- If I read this right, developers who follow these instructions will be able to test against pandas and sqlite, but not other dialects of SQL, and not Spark. Is that correct? If so, we should make that clear in documentation, and provide steps for developers to progressively add Spark and other dialects to their test matrices.
Other comments inline.
@@ -385,7 +385,7 @@ def get_dataset(
return PandasDataset(df, profiler=profiler, caching=caching)

elif dataset_type == "sqlite":
-    if not create_engine:
+    if not create_engine or not SQLITE_TYPES:
It scares me that we need to make these kinds of changes.
This reinforces my view that we should not be relying on imports to determine which backends to test against. Instead, I would strongly advocate for an explicit config file for this.
When imports fail for those dialects, we are setting the `*_TYPES` dicts to `{}`, but later in this function we are just trying to access them with square bracket notation (not `.get()`). There were already bailout points at `if not create_engine`, and I thought it made sense to bail out at that point if those dicts were empty.
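A minimal sketch of the pattern described above, assuming a simplified `SQLITE_TYPES` map and a much-reduced `get_dataset` signature (the real code in `self_check/util.py` is larger):

```python
# Sketch only: dialect type maps fall back to an empty dict when the import
# fails, and get_dataset bails out early instead of hitting a KeyError later.
try:
    import sqlalchemy.dialects.sqlite as sqlitetypes
    from sqlalchemy import create_engine

    SQLITE_TYPES = {
        "VARCHAR": sqlitetypes.VARCHAR,
        "INTEGER": sqlitetypes.INTEGER,
    }
except ImportError:
    sqlitetypes = None
    create_engine = None
    SQLITE_TYPES = {}  # empty when the dialect import fails


def get_dataset(dataset_type: str, schema: dict):
    if dataset_type == "sqlite":
        # Bail out here if sqlalchemy (or the sqlite type map) is unavailable,
        # instead of raising a KeyError on SQLITE_TYPES[...] further down.
        if not create_engine or not SQLITE_TYPES:
            raise ValueError(f"Unable to create {dataset_type} dataset")
        return {col: SQLITE_TYPES[type_name] for col, type_name in schema.items()}
    raise ValueError(f"Unknown dataset_type: {dataset_type}")
```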
- postgresql: `psycopg2-binary>=2.7.6`
- mysql: `PyMySQL>=0.9.3,<0.10`
- mssql: `pyodbc>=4.0.30` (see step 6 for links on getting the odbc driver on your system first)
- athena: `pyathena>=1.11`
- bigquery: `sqlalchemy-bigquery>=1.3.0 google-cloud-secret-manager>=1.0.0 google-cloud-storage>=1.28.0`
- snowflake: `snowflake-connector-python==2.5.0 snowflake-sqlalchemy>=1.2.3 azure-storage-blob>=12.5.0`
- redshift: `sqlalchemy-redshift>=0.7.7`
- teradata: `teradatasqlalchemy==17.0.0.1`
- oracle: `cx_Oracle`
- dremio: `sqlalchemy-dremio>=1.2.1 pyarrow>=0.12.0 pyodbc>=4.0.30`
One small suggestion for this @kenwade4: would it be possible to have a reference here to the appropriate `requirements.txt` file? I imagine this section becoming really quickly out of date otherwise. I'm imagining something similar to what we do with python scripts in our how-to-guides:

```python file=../../../../tests/integration/docusaurus/connecting_to_your_data/database/mssql_yaml_example.py#L3-L6
```
The requirements listed reference different requirements files and combine them in some instances (i.e. stuff from requirements-dev-base.txt and requirements-dev-sqlalchemy.txt for the bigquery line). The oracle requirement isn't in any of the files since we aren't testing on oracle DBs in CI/CD.
I'd be more inclined to not include that list of individual dialects at all and tell them to "try setting up the full dev environment (as mentioned in step 6) when you're ready for more robust testing of your custom Expectations" or something like that.
I'm pretty sure @donaldheppner is opposed to splitting up the requirements-dev-*.txt files into things like `requirements-dev-postgresql.txt` etc. for dialects (although `requirements-dev-spark.txt` only has a single line). If we were to split them up, THEN it would make sense to link to specific lines of files with `python file=../../../blah`.
Ok @kenwade4, that makes sense. I guess I keep coming back to the thought of the list becoming out of date. As another idea, would it be possible to add a (preferably loud) note to the `requirements` files as a reminder to update the `contributing_setup` doc if anything changes?
Really nice work - just a few questions I'd love to talk through before approving.
@@ -132,6 +158,8 @@ Depending on which features of Great Expectations you want to work on, you may w
* Caution: If another service is using port 3306, Docker may start the container but silently fail to set up the port.

> If you have a Silicon Mac (M1) this Docker image does not work
Nitpick - maybe we should use a `:::warning` tag here?
Not familiar with that syntax in markdown. With whatever theme we are using with docusaurus, inline comments like this have a bold yellow background that stands out pretty well.
Would it just become `::warning If you have a Silicon...`??
try:
    assert (
        PasswordMasker.mask_db_url(
            f"postgresql://scott:tiger@{db_hostname}:65432/mydatabase"
        )
        == f"postgresql://scott:***@{db_hostname}:65432/mydatabase"
Instead of this try/except, should we use pytest decorators to skip?
That's following the already-existing logic that was there. I'm a fan of `@pytest.mark.skipif`, which I added in other places.
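For comparison, the decorator-based approach mentioned here would look roughly like this. It's a sketch: the test name, reason string, and import paths are assumptions, though `PasswordMasker.mask_db_url` and `is_library_loadable` both appear in the diff and discussion above.

```python
import pytest

from great_expectations.core.util import PasswordMasker  # assumed location
from great_expectations.util import is_library_loadable  # assumed location


@pytest.mark.skipif(
    not is_library_loadable(library_name="psycopg2"),
    reason="psycopg2 is not installed in this test environment",
)
def test_mask_db_url_postgresql_psycopg2():
    db_hostname = "localhost"  # the real test derives this from the environment
    assert (
        PasswordMasker.mask_db_url(
            f"postgresql+psycopg2://scott:tiger@{db_hostname}:65432/mydatabase"
        )
        == f"postgresql+psycopg2://scott:***@{db_hostname}:65432/mydatabase"
    )
```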
# psycopg2 (if installed in test environment)
try:
    assert (
        PasswordMasker.mask_db_url(
            f"postgresql+psycopg2://scott:tiger@{db_hostname}:65432/mydatabase"
        )
        == f"postgresql+psycopg2://scott:***@{db_hostname}:65432/mydatabase"
Same here and throughout
@@ -15,6 +15,7 @@
    list_gcs_keys,
    map_batch_definition_to_data_reference_string_using_regex,
    map_data_reference_string_to_batch_definition_list_using_regex,
    storage,
Should we be importing this import from the source code, or should we try to import GCS directly here?
I purposely tried to import `storage` from the 3 different source files since those were the places where they were failing. The test skip reasons also reflect where `storage` failed to be imported.
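A sketch of the approach being described: `storage` is re-exported from the module under test (where it is set to `None` if `google-cloud-storage` fails to import), and the test skips based on that. The module path, test name, and reason string here are illustrative.

```python
import pytest

# `storage` comes from the module under test; it is None there when
# `from google.cloud import storage` raised ImportError at import time.
from great_expectations.datasource.data_connector.util import (  # illustrative path
    list_gcs_keys,
    storage,
)


@pytest.mark.skipif(
    storage is None,
    reason="google-cloud-storage could not be imported in data_connector.util",
)
def test_list_gcs_keys_smoke():
    # ... exercise list_gcs_keys against a mocked GCS client ...
    ...
```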
try:
    dialect_classes["sqlite"] = sqlitetypes.dialect
    dialect_types["sqlite"] = SQLITE_TYPES
except AttributeError:
    pass
try:
    dialect_classes["postgresql"] = postgresqltypes.dialect
    dialect_types["postgresql"] = POSTGRESQL_TYPES
except AttributeError:
    pass
try:
    dialect_classes["mysql"] = mysqltypes.dialect
    dialect_types["mysql"] = MYSQL_TYPES
except AttributeError:
    pass
try:
    dialect_classes["mssql"] = mssqltypes.dialect
    dialect_types["mssql"] = MSSQL_TYPES
except AttributeError:
    pass
try:
    dialect_classes["bigquery"] = sqla_bigquery.BigQueryDialect
    dialect_types["bigquery"] = BIGQUERY_TYPES
except AttributeError:
    pass
Could we maybe for-loop this? And perhaps add a logging statement?
dialects: List[Tuple[str, Any, dict]] = [
    ("sqlite", sqlitetypes.dialect, SQLITE_TYPES),
    ...
]
for name, class_, type_ in dialects:
    try:
        dialect_classes[name] = class_
        dialect_types[name] = type_
    except AttributeError:
        logging.debug(...)
What's the `AttributeError` here? I'm not immediately seeing how that can be triggered.
`NoneType` has no attribute whatever. A lot of the try/except ImportError blocks just end up setting whatever failed to import to None.
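In other words, the failure mode is roughly this (a sketch using the sqlite dialect as the example):

```python
try:
    import sqlalchemy.dialects.sqlite as sqlitetypes

    SQLITE_TYPES = {"VARCHAR": sqlitetypes.VARCHAR}
except ImportError:
    # The import guard leaves the module name bound to None...
    sqlitetypes = None
    SQLITE_TYPES = {}

dialect_classes = {}
dialect_types = {}

try:
    # ...so when sqlite isn't importable, this attribute access raises
    # AttributeError: 'NoneType' object has no attribute 'dialect'.
    dialect_classes["sqlite"] = sqlitetypes.dialect
    dialect_types["sqlite"] = SQLITE_TYPES
except AttributeError:
    pass
```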
To make it as easy as possible for potential contributors to get a dev environment set up, provide a `requirements-dev-lite.txt` file with the fewest dependencies needed.

Changes proposed in this pull request:

- Update `get_dataset` and `build_sa_validator_with_data` funcs to not blow up if various SQL dialects are not installed/available
Definition of Done