vdk-trino: collect lineage for select/insert and rename table only #756

Merged: 9 commits merged into main from person/palexiev/trino_lineage on Mar 15, 2022

philip-alexiev (Contributor)

Why:
To make lineage collection more production-ready,
some improvements are needed.

What:
To reduce the load on the query engine,
query plans are now calculated only for INSERT and SELECT queries.
For RENAME TABLE queries, the query plan does not provide lineage information,
so the query text is parsed and the table names are extracted from it.
Counting the rows in the output table before and after each query has been removed,
again to reduce the load on the query engine.
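The query routing described above can be sketched in Python as follows. This is a minimal illustration, not the code merged in this PR: the helper names (`needs_query_plan`, `explain_statement`, `parse_rename_table`, `lineage_action`) are hypothetical, and it assumes the plan is obtained with Trino's `EXPLAIN (TYPE IO, FORMAT JSON)`, which reports a query's input and output tables. Only SELECT/INSERT statements (including `WITH ... SELECT`) are sent to the planner; `ALTER TABLE ... RENAME TO` is handled by matching the query text with a regular expression.

```python
import re
from typing import Optional, Tuple

# Hypothetical helpers for illustration; NOT the actual vdk-trino implementation.

# Statements worth an EXPLAIN round-trip. WITH ... SELECT is treated as a SELECT.
_PLANNED_RE = re.compile(r"(select|insert|with)\b", re.IGNORECASE)

# An identifier is either "double quoted" or a bare name; a table name may be
# qualified as catalog.schema.table (up to two dots, no spaces around them).
_IDENT = r'(?:"[^"]+"|[A-Za-z_][\w$]*)'
_QUALIFIED = rf"{_IDENT}(?:\.{_IDENT}){{0,2}}"

# Trino syntax: ALTER TABLE [ IF EXISTS ] name RENAME TO new_name
_RENAME_RE = re.compile(
    rf"^\s*alter\s+table\s+(?:if\s+exists\s+)?(?P<src>{_QUALIFIED})"
    rf"\s+rename\s+to\s+(?P<dst>{_QUALIFIED})\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)


def _strip_leading_comments(query: str) -> str:
    """Drop leading '-- ...' line comments and '/* ... */' block comments."""
    return re.sub(r"^(\s*(--[^\n]*\n|/\*.*?\*/))*", "", query, flags=re.DOTALL).strip()


def needs_query_plan(query: str) -> bool:
    """True only for SELECT/INSERT queries, the ones whose plan is calculated."""
    return _PLANNED_RE.match(_strip_leading_comments(query)) is not None


def explain_statement(query: str) -> str:
    """Assumption: the IO plan in JSON format is used to read input/output tables."""
    return f"EXPLAIN (TYPE IO, FORMAT JSON) {query}"


def parse_rename_table(query: str) -> Optional[Tuple[str, str]]:
    """Return (old_name, new_name) for ALTER TABLE ... RENAME TO, else None."""
    match = _RENAME_RE.match(_strip_leading_comments(query))
    if match is None:
        return None
    return match.group("src"), match.group("dst")


def lineage_action(query: str) -> str:
    """Decide how lineage would be collected for one query (no row counting)."""
    if needs_query_plan(query):
        return "plan"   # ask the engine for the plan
    if parse_rename_table(query) is not None:
        return "parse"  # the plan has no lineage info; parse the query text
    return "skip"       # no plan requested, so no extra load on the engine
```

For example, `parse_rename_table("ALTER TABLE memory.default.src RENAME TO memory.default.dst")` yields the pair of old and new table names, while a `DROP TABLE` query is skipped entirely. A regex is enough for the narrow RENAME TO grammar, but it does not handle comments in the middle of a statement; a real SQL parser would be more robust.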

How has this been tested:
Extended the test_vdk_trino_lineage.py test
to be more comprehensive and to cover all scenarios.

What type of change are you making?
Bug fix (non-breaking change which fixes an issue)
or a cosmetic change/minor improvement

Signed-off-by: Philip Alexiev ([email protected])

antoniivanov (Collaborator) commented Mar 8, 2022

I am not sure if you noticed, but the CI tests failed (see ci/gitlab/gitlab.com, then click Details): https://gitlab.com/vmware-analytics/versatile-data-kit/-/jobs/2176218096

philip-alexiev self-assigned this Mar 8, 2022
antoniivanov (Collaborator) left a comment

Looks good to me. You could add a few more tests for corner cases.

philip-alexiev (Contributor, Author)

@tozka Thank you for the review and valuable comments.

philip-alexiev merged commit 15a119c into main Mar 15, 2022
philip-alexiev deleted the person/palexiev/trino_lineage branch Mar 15, 2022 14:48
ivakoleva pushed a commit that referenced this pull request Mar 22, 2022