BigQuery usage docs do not explain references #4930

inglesp · 2018-02-25T21:19:52Z

The BigQuery usage docs do not explain what TableReferences are, when/why you'd need to use one instead of a Table, and how to get a TableReference from a Table or vice versa.

(And similarly for DatasetReferences and Datasets.)

tswast · 2018-02-26T20:33:01Z

Yes, this could use some additional explanation in the docs.

The main purpose of TableReference and DatasetReference is to indicate that the object is only a pointer to a Table or Dataset. Several properties in the REST API only accept / return a pointer to a table, such as QueryJob.destination.

It is possible to go from a Table to a TableReference with Table.reference and, likewise, from a Dataset to a DatasetReference with Dataset.reference.

To go from a reference to a full object, use the client to fetch the full object from the API with get_table() or get_dataset(). If a Table or Dataset does not exist (for example, you want to create one with create_table or create_dataset), the Table and Dataset constructors accept a reference as their argument.

tswast · 2018-02-26T20:35:36Z

Note: the usage docs do have examples for create_table(), get_table(), create_dataset() and get_dataset().

I agree that examples using the table.reference and dataset.reference properties would be helpful.

inglesp · 2018-02-27T09:47:50Z

Thanks for your comments here. It'd be really helpful to have this in the documentation!

max-sixty · 2018-04-21T19:30:18Z

I did find this very confusing. For example, client.dataset(dataset_name) returns a DatasetReference, in spite of its name.

Is this something we're coupled to because of the REST API? Could we at least add options to supply strings, so bigquery.Dataset(dataset_name) returned a dataset?

Overall, the API is extremely class-heavy for a python library. A recent frustrating example was client.list_datasets() doesn't return a list, or even a generator, it returns a google.api_core.page_iterator.HTTPIterator (though if you try and use it as an iterator you get TypeError: HTTPIterator object is not an iterator!)

tswast · 2018-04-21T21:40:46Z

> client.dataset(dataset_name) returns a DatasetReference, in spite of its name

It's funny you mention that method. It's probably the only thing that didn't change in the 0.27 to 0.28 rewrite. In 0.27 and earlier, the dataset() method returned a Dataset class but it was really just a reference. Confusingly, even though it was a Dataset none of the properties were populated besides the ID!

> Could we at least add options to supply strings, so bigquery.Dataset(dataset_name) returned a dataset?

In my first version of the rewrite, I proposed exactly this (allowing either string or reference), but the number of combinations exploded pretty fast. Some folks on the Datalab / Colab teams gave me some feedback that only allowing references would greatly simplify the implementation (which I agree it did).

For example, one trouble with bigquery.Dataset(dataset_name) is that in that case you don't have a project associated with the dataset, because only the client has that info. This would require the API to have hooks to handle partial references that get filled with defaults, anywhere the current API can just use the full path from the reference.

Also, yes, we are slightly tied to having reference objects because of the REST API. For example, there are 16 instances of TableReference in the Jobs resource alone (https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs). In 0.27, the Python API again pretended these were full Table objects, but they only had the ID fields populated, which I thought was even more confusing.

> Overall, the API is extremely class-heavy for a python library. A recent frustrating example was client.list_datasets() doesn't return a list, or even a generator, it returns a google.api_core.page_iterator.HTTPIterator

True, it's not an iterator. But it is an iterable. For example list(client.list_datasets()) does build a list of values from all pages of the API response.
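To illustrate the iterable-versus-iterator distinction in plain Python, here is a small sketch. The Pages class is a hypothetical stand-in, not the real HTTPIterator; it only mimics the behaviour described in this thread (it defines __iter__ but not __next__).

```python
class Pages:
    """Hypothetical stand-in for a paged result: iterable, not an iterator."""

    def __init__(self, pages):
        self._pages = pages

    def __iter__(self):
        # Each call returns a fresh generator over every item in every page.
        for page in self._pages:
            yield from page


results = Pages([["ds_a", "ds_b"], ["ds_c"]])

try:
    next(results)  # no __next__ method, so this raises TypeError
except TypeError as exc:
    error = str(exc)

first = next(iter(results))  # iter() hands back a real iterator
everything = list(results)   # list() calls iter() internally
```

So wrapping the result in iter() or list() works even though calling next() on it directly does not.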

max-sixty · 2018-04-21T22:48:35Z

Thanks for the reply @tswast

I still don't understand what a DatasetReference has over a 'project.dataset' string - looking through the implementation the only advantage I can see is creating a table from the object. Is it that it could be project:dataset or dataset-with-implicit-project, and we don't want to support all the permutations?

It sounds like you've thought about it a lot, so I pause in humility. But I remain confused why (I think) this is what's currently required to create a dataset, where I would have expected client.create_dataset(name):

dataset = client.create_dataset(
    bigquery.Dataset(
        client.dataset(dataset_name)
    )
)

> it's not an iterator. But it is an iterable

Yes that's fair, and my original comment probably wasn't balanced. Still, calling next(dataset_list) and getting an error isn't ideal, even though it's minor

tswast · 2018-04-21T23:29:22Z

Honestly since the REST API separates everything out like

{"projectId": "my-project", "datasetId": "my_dataset"}

it hadn't crossed my mind to accept a fully-qualified dataset ID. I would be open to a PR that modifies Dataset to accept strings like "project.dataset" as an option where there is a Dataset reference.
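As a rough sketch of what accepting such strings involves, the hypothetical helper below (parse_dataset_id is not part of the library) splits an ID into the REST-style fields shown above and fills in a default project for a bare dataset name. It assumes IDs are separated by a single dot and does not handle project IDs that themselves contain dots.

```python
def parse_dataset_id(dataset_id, default_project=None):
    """Hypothetical helper: split 'project.dataset' into REST-style fields."""
    parts = dataset_id.split(".")
    if len(parts) == 2:
        project, dataset = parts
    elif len(parts) == 1 and default_project is not None:
        # Partial reference: the project has to come from somewhere else,
        # which in the real library would be the client.
        project, dataset = default_project, parts[0]
    else:
        raise ValueError(f"expected 'project.dataset', got {dataset_id!r}")
    return {"projectId": project, "datasetId": dataset}


full = parse_dataset_id("my-project.my_dataset")
partial = parse_dataset_id("my_dataset", default_project="my-project")
```

The second branch is the "partial reference filled with defaults" case mentioned earlier in the thread: every entry point that accepts a bare name needs access to a default project.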

tswast · 2018-04-27T17:28:49Z

Re: my previous comment.

I've sent #5255 to add Dataset/Table.from_string(fully_qualified_id), which I think will address the concern that it requires too many objects to create a table/dataset.

tseaver · 2018-05-29T17:32:39Z

@tswast With #5255 merged, should this issue be closed?

tswast · 2018-05-29T17:57:47Z

Yeah, I think the combo of #5255 plus #5340 covers this issue well enough.

References are now documented at

chemelnucfin added documentation api: bigquery Issues related to the BigQuery API. type: cleanup An internal cleanup or hygiene concern. labels Feb 26, 2018

chemelnucfin self-assigned this Feb 26, 2018

theacodes assigned tswast Feb 26, 2018

tseaver unassigned chemelnucfin Apr 10, 2018

tswast closed this as completed May 29, 2018

theacodes unassigned tswast Sep 28, 2018

JustinBeckwith assigned tswast Feb 1, 2021
