BigQuery Storage: Add more in-depth system tests covering all data formats and field data types #8983

Closed
9 tasks
tswast opened this issue Aug 6, 2019 · 1 comment · Fixed by #8992

tswast commented Aug 6, 2019

Simple functional tests

  • Simple correctness: create a table with some initial data, create a read session over the table, and verify that the expected number of rows is returned.
  • Filtering: create a table with some initial data, create a read session over the table with a push-down filter which excludes some data, and verify that the expected number of rows is returned. (Avro-only)
  • Column selection: create a table with some initial data, create a read session over the table with a list of columns specified, and verify that the expected columns and rows are returned.
  • Snapshot test: create a table with some initial data, load some additional data as a separate step, create a read session using the timestamp of the initial load, read the data, and verify that the initial data is returned and the additional data is not. (Avro-only; see the snapshot sketch after this list.)
  • Column-partitioned table test: create a column-partitioned table, read from the table with a partition filter specified, and verify that only the expected data is returned. (Avro-only)
  • Naturally-partitioned table test: create a date-partitioned table, read from the table with a partition filter specified, and verify that only the expected data is returned.
  • Data decoding: create a table with at least one field of each type supported by BigQuery -- including numeric, geographic, etc. -- and verify that the fields are decoded successfully.
  • Resuming a read at an offset: for example, a test using the Shakespeare samples table that reads halfway through a stream and then resumes from that offset (see the sketch after this list).
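
A minimal sketch of the snapshot and offset-resumption cases, assuming the current Python `google-cloud-bigquery-storage` v1 client (the v1beta1 surface available when this issue was filed differs); the project, table path, timestamp, and row counts below are hypothetical placeholders:

```python
import datetime

from google.cloud import bigquery_storage_v1
from google.cloud.bigquery_storage_v1 import types

client = bigquery_storage_v1.BigQueryReadClient()

# Snapshot read: the timestamp falls after the initial load but before the
# second load, so only the initial rows should come back. All names and
# values here are placeholders.
requested_session = types.ReadSession(
    table="projects/my-project/datasets/my_dataset/tables/my_table",
    data_format=types.DataFormat.AVRO,
    table_modifiers=types.ReadSession.TableModifiers(
        snapshot_time=datetime.datetime(2019, 8, 6, tzinfo=datetime.timezone.utc)
    ),
)
session = client.create_read_session(
    parent="projects/my-project",  # placeholder billing project
    read_session=requested_session,
    max_stream_count=1,
)
reader = client.read_rows(session.streams[0].name)
initial_row_count = 100  # hypothetical: rows loaded in the first step
assert sum(1 for _ in reader.rows(session)) == initial_row_count

# Offset resumption: re-open the same stream halfway through; the server
# replays rows starting at the requested offset.
resumed = client.read_rows(session.streams[0].name, offset=initial_row_count // 2)
resumed_count = sum(1 for _ in resumed.rows(session))
assert resumed_count == initial_row_count - initial_row_count // 2
```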

Many of these tests can be implemented using a pre-created sample table where appropriate -- the first three (correctness, filtering, and column selection) use the Shakespeare samples table in our internal tests, for example, as in the sketch below.
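
Those three cases can share one read session over the public Shakespeare table. A minimal sketch, again assuming the v1 Python client; the billing project and expected row count are placeholders:

```python
from google.cloud import bigquery_storage_v1
from google.cloud.bigquery_storage_v1 import types

client = bigquery_storage_v1.BigQueryReadClient()

requested_session = types.ReadSession(
    table="projects/bigquery-public-data/datasets/samples/tables/shakespeare",
    data_format=types.DataFormat.AVRO,
    read_options=types.ReadSession.TableReadOptions(
        # Column selection: only these fields should appear in each row.
        selected_fields=["word", "word_count"],
        # Filtering: a push-down filter that excludes most of the table.
        row_restriction='corpus = "hamlet"',
    ),
)
session = client.create_read_session(
    parent="projects/my-project",  # placeholder billing project
    read_session=requested_session,
    max_stream_count=1,
)

reader = client.read_rows(session.streams[0].name)
rows = list(reader.rows(session))

expected_hamlet_rows = 5318  # hypothetical control value from a query
assert len(rows) == expected_hamlet_rows  # simple correctness + filtering
assert set(rows[0].keys()) == {"word", "word_count"}  # column selection
```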

Long-running tests

  • Open a set of streams in parallel and read the full contents of a large table (see the sketch below) -- our google3 internal equivalent uses the Wikipedia sample table (about 35 GB) and runs for 10 to 20 minutes. This test should detect issues with long-running streams and should, over time, add coverage for transparent stream resumption in Java and Python. [tswast] I don't think we want to block client presubmits with a 10-to-20 minute system test. I'll work with @shollyman and the backend team to figure out a more appropriate home for these long-running tests.
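
A minimal sketch of the parallel read, under the same v1-client assumption; the stream count, billing project, and expected total are placeholders:

```python
import concurrent.futures

from google.cloud import bigquery_storage_v1
from google.cloud.bigquery_storage_v1 import types

client = bigquery_storage_v1.BigQueryReadClient()

session = client.create_read_session(
    parent="projects/my-project",  # placeholder billing project
    read_session=types.ReadSession(
        table="projects/bigquery-public-data/datasets/samples/tables/wikipedia",
        data_format=types.DataFormat.AVRO,
    ),
    max_stream_count=8,  # ask the server for up to 8 parallel streams
)

def drain(stream):
    # Read one stream to completion and return its row count.
    reader = client.read_rows(stream.name)
    return sum(1 for _ in reader.rows(session))

with concurrent.futures.ThreadPoolExecutor(len(session.streams)) as pool:
    total = sum(pool.map(drain, session.streams))

expected_total = 313_797_035  # hypothetical: full-table row count
assert total == expected_total
```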

This was originally filed internally as bug 133243219.

@tswast added the api: bigquerystorage and type: feature request labels on Aug 6, 2019