Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vdk-core: Greenplum support #361

Merged
merged 2 commits into from
Oct 8, 2021
Merged

Conversation

ivakoleva
Copy link
Contributor

Capability to connect to Greenplum database, and execute queries.

A vdk-greenplum plugin introduced, that starts a Greenplum instance in
docker, then runs a sample data job to persist data. Tests verify the
data has been populated, using the newly introduced vdk-core CLI
command "greenplum-query". README.md added, describing on how to use the
plugin, enlist greenplum-related vdk-core configurations, and test
locally.

Testing Done: test_vdk_greenplum_utils added

Signed-off-by: ikoleva [email protected]

Capability to connect to Greenplum database, and execute queries.

A vdk-greenplum plugin introduced, that starts a Greenplum instance in
docker, then runs a sample data job to persist data. Tests verify the
data has been populated, using the newly introduced vdk-core CLI
command "greenplum-query". README.md added, describing on how to use the
plugin, enlist greenplum-related vdk-core configurations, and test
locally.

Testing Done: test_vdk_greenplum_utils added

Signed-off-by: ikoleva <[email protected]>
@antoniivanov
Copy link
Collaborator

antoniivanov commented Oct 8, 2021

Please let us try to post a bit smaller PRs in the spirit of https://github.com/vmware/versatile-data-kit/wiki/How-to-prepare-a-new-PR#break-your-code-commits-into--small-self-contained-units .

This is almost too long (especially for python code, for java it's probably just fine :) )
I know it's rather routine to add new db driver with basic support but still -splitting it into 1. create plugin skeleton 2. create plugin may have been a bit easier to review. Those are just my thoughts . I am fine to leave it as single PR now.

Copy link
Collaborator

@antoniivanov antoniivanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@ivakoleva
Copy link
Contributor Author

Please let us try to post a bit smaller PRs in the spirit of https://github.com/vmware/versatile-data-kit/wiki/How-to-prepare-a-new-PR#break-your-code-commits-into--small-self-contained-units .

This is almost too long (especially for python code, for java it's probably just fine :) ) I know it's rather routine to add new db driver with basic support but still -splitting it into 1. create plugin skeleton 2. create plugin may have been a bit easier to review. Those are just my thoughts . I am fine to leave it as single PR now.

What comes to mind is https://mypy.readthedocs.io/en/stable/stubgen.html, so a basic vdk db plugin structure is generated based on an already existing one. Did try it, for me it was easier adapting the vdk-postgres plugin in this particular case

@ivakoleva ivakoleva enabled auto-merge (squash) October 8, 2021 12:35
@ivakoleva ivakoleva merged commit 0a2018c into main Oct 8, 2021
@ivakoleva ivakoleva deleted the person/ikoleva/greenplum_support branch October 8, 2021 12:40
Copy link
Contributor

@gabrielgeorgiev1 gabrielgeorgiev1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great submission overall, all my comments are minor.


# Configuration

Run vdk config-help - search for those prefixed with "GREENPLUM_" to see what configuration options are available.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes me think whether we should extend config-help to allow config variable sorting. If one has 15 or 20 plugins installed, it might be very annoying to have to scroll through them to find the one you're looking for.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is obviously unrelated to this PR.

name="vdk-greenplum",
version=__version__,
url="https://github.com/vmware/versatile-data-kit",
description="Versatile Data Kit SDK plugin provides support for Greenplum database and greenplum transformation templates.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

Suggested change
description="Versatile Data Kit SDK plugin provides support for Greenplum database and greenplum transformation templates.",
description="Versatile Data Kit SDK plugin provides support for connecting to a Greenplum database and Greenplum transformation templates.",

description="Versatile Data Kit SDK plugin provides support for Greenplum database and greenplum transformation templates.",
long_description=pathlib.Path("README.md").read_text(),
long_description_content_type="text/markdown",
install_requires=["vdk-core", "psycopg2-binary"],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't tabulate also be included here?

root_command.add_command(greenplum_query)


@click.command(name="greenplum-query", help="executes SQL query against Greenplum")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@click.command(name="greenplum-query", help="executes SQL query against Greenplum")
@click.command(name="greenplum-query", help="Executes SQL query against Greenplum.")

os.environ,
{
VDK_DB_DEFAULT_TYPE: "GREENPLUM",
VDK_GREENPLUM_DBNAME: "postgres",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems weird, unless it's some joke I'm not getting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greenplum is PosgtreSQL-based :) using same postgres default schema

VDK_GREENPLUM_PORT: "5432",
},
)
class GreenplumUtilsTests(unittest.TestCase):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this class, and the file as well, referencing Greenplum Utils, when it seems to be testing the connection, and there is no greenplum_utils.py file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't quite catch that, could you elaborate?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants