Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Illogical use "full_table_id" #115

Closed
pbkool opened this issue May 19, 2020 · 3 comments
Closed

Illogical use "full_table_id" #115

pbkool opened this issue May 19, 2020 · 3 comments
Assignees
Labels
api: bigquery Issues related to the googleapis/python-bigquery API. type: question Request for information or clarification. Not an issue.

Comments

@pbkool
Copy link

pbkool commented May 19, 2020

Hi there! Up front, I am not that experienced with either Python nor the BigQuery API, so this may be intended. But if it is, I do not understand why 😅

I have the following simple script that looks for a given data set, and then loops through tables inside that data set and fires queries. Right now it does a simple select, but I intend to delete some old partitions that are no longer needed.

import os
import logging
from google.cloud import bigquery

logging.basicConfig(level=logging.INFO,format='%(levelname)s %(asctime)s %(message)s', datefmt='%m/%d/%Y %I:%M:%S %p')
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "FILE_NAME"

bq = bigquery.Client(project="XXXX")
SEARCH_DATASET = "YYYY"                
MAX_AGE_PARTITIONS = 7

for database in list(bq.list_datasets()):
    if SEARCH_DATASET in database.dataset_id:
        temp_tables_object = bq.list_tables(database.dataset_id)
        for table in temp_tables_object:
            if (str(table.table_id).startswith("p_")
                and "stats" not in str(table.table_id).lower()): 
                    logging.info(  "Currently processing partions for table " + str(table.table_id) + " It contains " + str(len(bq.list_partitions(table))) +
                                    " partitions. With a total size of " + str(round(bq.get_table(table).num_bytes /pow(10,9),2)) + " GB" )
                    logging.info( "Deleting the partitions older than "  + str(MAX_AGE_PARTITIONS))
                    QUERY = (
                                "SELECT * FROM " + 
                                str(bq.get_table(table).full_table_id) + 
                                " where _PARTITIONTIME < TIMESTAMP(DATE_SUB(current_date(), INTERVAL" + str(MAX_AGE_PARTITIONS) + "DAY)) LIMIT 1")
                    query_job = bq.query(QUERY)
                    result = query_job.result()
                    logging.info(result)

I am stuck at the part of str(bq.get_table(table).full_table_id) I expected a table reference such as bigquery-public-data.austin_311.311_request but it returns something like bigquery-public-data:austin_311.311_request (note the : )

And as expected my script fails with google.api_core.exceptions.BadRequest: 400 Syntax error: Unexpected ":" at [1:27]

pip freeze
gcloud==0.18.3
google-api-core==1.17.0
google-auth==1.14.1
google-cloud==0.34.0
google-cloud-bigquery==1.24.0
google-cloud-core==1.3.0
google-resumable-media==0.5.0
googleapis-common-protos==1.51.0

python --version Python 3.8.1
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery API. label May 19, 2020
@yoshi-automation yoshi-automation added the triage me I really want to be triaged. label May 20, 2020
@meredithslota meredithslota added type: question Request for information or clarification. Not an issue. and removed triage me I really want to be triaged. labels May 20, 2020
@IlyaFaer
Copy link

IlyaFaer commented May 22, 2020

@pbkool, thank you for detailing the question.

Yes, full ids are expected to be in a form {project_id}:{dataset_id}.{table_id}, there are unit tests to check if it's working this way. It cames from a backend (see "id" on the datasets docs page), so it's something expected on other languages as well (see Go code for an example).

There are even more symbols used in BigQuery, for example, $ in ingestion-time partitioned tables, and @ for snapshots. These are special symbols to make things easier, I believe, it's the only answer to a question "why it is there?".

Looking at your code snippet, I don't actually think you need to pass a full id into your SQL code, as you're already set project onto your client.

@IlyaFaer
Copy link

@pbkool, take a look at this. Probably it can help you to solve the problem with minimal changes

@tswast
Copy link
Contributor

tswast commented Jun 2, 2020

The : is leftover from the days of "legacy SQL". This field is provided by the backend API, which would consider changing the full table ID format to the standard SQL format to be a breaking change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery API. type: question Request for information or clarification. Not an issue.
Projects
None yet
Development

No branches or pull requests

5 participants