Parse information returned by list_relations_without_caching macro to speed up catalog generation #160
Merged: jtcohen6 merged 10 commits into dbt-labs:master from franloza:feature/93-catalog-generation on Apr 17, 2021
10 commits
07e4c55 Parse information returned by show table extended
d98588d Fix linter errors
fe50b05 Add logic when relation is None
2e307f6 Revert previous commit and fix bug
3b54482 Rename method and add unit test
e1bf654 Fix bug in column_index
64870e9 Add test with view
2c6a5d8 Update CHANGELOG.md
7612fcb Parse statistics
36367e6 Fix inter errors
@@ -0,0 +1,38 @@

    import unittest

    from dbt.adapters.spark import SparkColumn


    class TestSparkColumn(unittest.TestCase):

        def test_convert_table_stats_with_no_statistics(self):
            self.assertDictEqual(
                SparkColumn.convert_table_stats(None),
                {}
            )

        def test_convert_table_stats_with_bytes(self):
            self.assertDictEqual(
                SparkColumn.convert_table_stats("123456789 bytes"),
                {
                    'stats:bytes:description': '',
                    'stats:bytes:include': True,
                    'stats:bytes:label': 'bytes',
                    'stats:bytes:value': 123456789
                }
            )

        def test_convert_table_stats_with_bytes_and_rows(self):
            self.assertDictEqual(
                SparkColumn.convert_table_stats("1234567890 bytes, 12345678 rows"),
                {
                    'stats:bytes:description': '',
                    'stats:bytes:include': True,
                    'stats:bytes:label': 'bytes',
                    'stats:bytes:value': 1234567890,
                    'stats:rows:description': '',
                    'stats:rows:include': True,
                    'stats:rows:label': 'rows',
                    'stats:rows:value': 12345678
                }
            )
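
For readers who want to see what behavior these tests pin down, here is a minimal sketch of a function that would satisfy them. It is written as a standalone function rather than the `SparkColumn` static method, and it is an assumption reconstructed from the expected dictionaries in the tests, not necessarily the code merged in this PR.

```python
from typing import Any, Dict, Optional


def convert_table_stats(raw_stats: Optional[str]) -> Dict[str, Any]:
    """Turn a string such as '123 bytes, 45 rows' into 'stats:<name>:<field>' keys.

    Hypothetical stand-in for SparkColumn.convert_table_stats, inferred
    from the unit tests only.
    """
    table_stats: Dict[str, Any] = {}
    if not raw_stats:
        # None or an empty string means no statistics were reported.
        return table_stats

    # Each comma-separated part looks like '<integer> <label>'.
    for part in raw_stats.split(", "):
        value, label = part.split(" ", 1)
        table_stats[f"stats:{label}:label"] = label
        table_stats[f"stats:{label}:value"] = int(value)
        table_stats[f"stats:{label}:description"] = ""
        table_stats[f"stats:{label}:include"] = True
    return table_stats
```

Because the loop handles however many comma-separated parts are present, a bytes-only string and a bytes-plus-rows string go through the same code path.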
While we're here... any chance I could convince you to regex for stats as well? :)

From some (anecdotal) testing, delta tables include a line like `Statistics: 1109049927 bytes`, whereas (e.g.) parquet tables include a line like `Statistics: 1109049927 bytes, 14093476 rows`.

The `SparkColumn` object takes a `table_stats` argument; we'd just need to adjust `convert_table_stats` to handle the delta table case, which is missing the `rows` bit after the comma split.

Alternatively, if you wanted to do it all in regex, I guess you could:
@jtcohen6 This is a great idea! This is useful information and I think it's worth trying to parse it. I'll give it a try and add an extra test for parquet tables. I'll let you know when I have this implemented. (Switching back to WIP)
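
As an illustration of the regex idea discussed in this thread, the sketch below pulls the `Statistics:` line out of a multi-line information string and treats the `rows` part as optional, so both the delta and parquet examples quoted above parse. The pattern, the function name `parse_statistics`, and the surrounding sample lines are assumptions made for this example; this is not the snippet from the review comment nor the code merged in the PR.

```python
import re
from typing import Dict

# Hypothetical pattern: bytes is required, ', <n> rows' is optional.
STATISTICS_REGEX = re.compile(
    r"^Statistics: (?P<bytes>\d+) bytes(?:, (?P<rows>\d+) rows)?$",
    re.MULTILINE,
)


def parse_statistics(information: str) -> Dict[str, int]:
    """Return the statistics found in `information`, e.g. {'bytes': 1, 'rows': 2}."""
    match = STATISTICS_REGEX.search(information)
    if match is None:
        return {}
    # Groups that did not participate in the match (rows, for delta) are None.
    return {
        name: int(value)
        for name, value in match.groupdict().items()
        if value is not None
    }


# Made-up sample blobs; only the Statistics lines come from the thread above.
delta_info = "Provider: delta\nStatistics: 1109049927 bytes\nOwner: root"
parquet_info = "Provider: parquet\nStatistics: 1109049927 bytes, 14093476 rows"

print(parse_statistics(delta_info))
print(parse_statistics(parquet_info))
```

Making `rows` an optional group keeps the delta case from failing the match outright, at the cost of hard-coding the two statistic names in the pattern.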