BigQuery Storage: Add more in-depth system tests covering all data formats and field data types #8983

Closed
9 tasks
tswast opened this issue Aug 6, 2019 · 1 comment · Fixed by #8992

tswast commented Aug 6, 2019

Simple functional tests

  • Simple correctness: create a table with some initial data, create a read session over the table, and verify that the expected number of rows is returned.
  • Filtering: create a table with some initial data, create a read session over the table with a push-down filter which excludes some data, and verify that the expected number of rows is returned. (Avro-only)
  • Column selection: create a table with some initial data, create a read session over the table with a list of columns specified, and verify that the expected columns and rows are returned.
  • Snapshot test: create a table with some initial data, load some additional data as a separate step, create a read session using the timestamp of the initial load, read the data, and verify that the initial data is returned and the additional data is not. (Avro-only; see the snapshot sketch after this list.)
  • Column-partitioned table test: create a column-partitioned table, read from the table with a partition filter specified, and verify that only the expected data is returned. (Avro-only)
  • Naturally-partitioned table test: create a date-partitioned table, read from the table with a partition filter specified, and verify that only the expected data is returned.
  • Data decoding: create a table with at least one field of each type supported by BigQuery -- including numeric, geographic, etc. -- and verify that the fields are decoded successfully.
  • Resuming a read at an offset: for example, a test using the Shakespeare samples table that reads halfway through a stream and then resumes from that offset (see the sketch after this list).
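
A minimal sketch of the snapshot and offset-resumption cases, assuming the current Python `google-cloud-bigquery-storage` v1 client (the v1beta1 surface available when this issue was filed differs); the project, table path, timestamp, and row counts below are hypothetical placeholders:

```python
import datetime

from google.cloud import bigquery_storage_v1
from google.cloud.bigquery_storage_v1 import types

client = bigquery_storage_v1.BigQueryReadClient()

# Snapshot read: the timestamp falls after the initial load but before the
# second load, so only the initial rows should come back. All names and
# values here are placeholders.
requested_session = types.ReadSession(
    table="projects/my-project/datasets/my_dataset/tables/my_table",
    data_format=types.DataFormat.AVRO,
    table_modifiers=types.ReadSession.TableModifiers(
        snapshot_time=datetime.datetime(2019, 8, 6, tzinfo=datetime.timezone.utc)
    ),
)
session = client.create_read_session(
    parent="projects/my-project",  # placeholder billing project
    read_session=requested_session,
    max_stream_count=1,
)
reader = client.read_rows(session.streams[0].name)
initial_row_count = 100  # hypothetical: rows loaded in the first step
assert sum(1 for _ in reader.rows(session)) == initial_row_count

# Offset resumption: re-open the same stream halfway through; the server
# replays rows starting at the requested offset.
resumed = client.read_rows(session.streams[0].name, offset=initial_row_count // 2)
resumed_count = sum(1 for _ in resumed.rows(session))
assert resumed_count == initial_row_count - initial_row_count // 2
```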

Many of these tests can be implemented using a pre-created sample table where appropriate -- the first three (correctness, filtering, and column selection) use the Shakespeare samples table in our internal tests, for example, as in the sketch below.
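
Those three cases can share one read session over the public Shakespeare table. A minimal sketch, again assuming the v1 Python client; the billing project and expected row count are placeholders:

```python
from google.cloud import bigquery_storage_v1
from google.cloud.bigquery_storage_v1 import types

client = bigquery_storage_v1.BigQueryReadClient()

requested_session = types.ReadSession(
    table="projects/bigquery-public-data/datasets/samples/tables/shakespeare",
    data_format=types.DataFormat.AVRO,
    read_options=types.ReadSession.TableReadOptions(
        # Column selection: only these fields should appear in each row.
        selected_fields=["word", "word_count"],
        # Filtering: a push-down filter that excludes most of the table.
        row_restriction='corpus = "hamlet"',
    ),
)
session = client.create_read_session(
    parent="projects/my-project",  # placeholder billing project
    read_session=requested_session,
    max_stream_count=1,
)

reader = client.read_rows(session.streams[0].name)
rows = list(reader.rows(session))

expected_hamlet_rows = 5318  # hypothetical control value from a query
assert len(rows) == expected_hamlet_rows  # simple correctness + filtering
assert set(rows[0].keys()) == {"word", "word_count"}  # column selection
```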

Long-running tests

  • Open a set of streams in parallel and read the full contents of a large table (see the sketch below) -- our google3 internal equivalent uses the Wikipedia sample table (about 35 GB) and runs for 10 to 20 minutes. This test should detect issues with long-running streams and should, over time, add coverage for transparent stream resumption in Java and Python. [tswast] I don't think we want to block client presubmits with a 10-to-20 minute system test. I'll work with @shollyman and the backend team to figure out a more appropriate home for these long-running tests.
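
A minimal sketch of the parallel read, under the same v1-client assumption; the stream count, billing project, and expected total are placeholders:

```python
import concurrent.futures

from google.cloud import bigquery_storage_v1
from google.cloud.bigquery_storage_v1 import types

client = bigquery_storage_v1.BigQueryReadClient()

session = client.create_read_session(
    parent="projects/my-project",  # placeholder billing project
    read_session=types.ReadSession(
        table="projects/bigquery-public-data/datasets/samples/tables/wikipedia",
        data_format=types.DataFormat.AVRO,
    ),
    max_stream_count=8,  # ask the server for up to 8 parallel streams
)

def drain(stream):
    # Read one stream to completion and return its row count.
    reader = client.read_rows(stream.name)
    return sum(1 for _ in reader.rows(session))

with concurrent.futures.ThreadPoolExecutor(len(session.streams)) as pool:
    total = sum(pool.map(drain, session.streams))

expected_total = 313_797_035  # hypothetical: full-table row count
assert total == expected_total
```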

This was originally filed internally as bug 133243219.

@tswast added the api: bigquerystorage and type: feature request labels on Aug 6, 2019