Merge pull request #1016 from tseaver/bigquery-devx-jobs-export_copy
Add examples for browsing / copying / exporting table data.
tseaver committed Jul 28, 2015
2 parents ee5039f + 2624822 commit 25545c8
Showing 3 changed files with 131 additions and 5 deletions.
132 changes: 129 additions & 3 deletions docs/bigquery-usage.rst
@@ -227,6 +227,23 @@ Update all writable metadata for a table
... SchemaField(name='age', type='int', mode='required')]
>>> table.update() # API request

Get rows from a table's data:

.. doctest::

>>> from gcloud import bigquery
>>> client = bigquery.Client()
>>> dataset = client.dataset('dataset_name')
>>> table = dataset.table(name='person_ages')
>>> rows, next_page_token = table.data(max_results=100) # API request
>>> rows.csv.headers
('full_name', 'age')
>>> list(rows.csv)
[('Abel Adamson', 27), ('Beverly Bowman', 33)]
>>> for row in rows:
...     for field, value in zip(table.schema, row):
...         do_something(field, value)
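
Fetch later pages by passing the returned token back in. A minimal sketch, assuming ``table.data`` also accepts a ``page_token`` keyword (an assumption; only ``max_results`` appears above):

.. doctest::

>>> all_rows = []
>>> token = None
>>> while True:
...     rows, token = table.data(max_results=100, page_token=token) # API request
...     all_rows.extend(rows)
...     if token is None:
...         break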

Delete a table:

.. doctest::
@@ -307,7 +324,7 @@ Background a query, loading the results into a table:
>>> job.job_id
'e3344fba-09df-4ae0-8337-fddee34b3840'
>>> job.type
'query'
>>> job.created
None
>>> job.state
@@ -377,8 +394,8 @@ Inserting data (asynchronous)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Start a job loading data asynchronously from a set of CSV files, located on
Google Cloud Storage, appending rows into an existing table. First, create
the job locally:

.. doctest::

@@ -429,3 +446,112 @@ Poll until the job is complete:
'done'
>>> job.ended
datetime.datetime(2015, 7, 23, 9, 30, 21, 334792, tzinfo=<UTC>)

Exporting data (async)
~~~~~~~~~~~~~~~~~~~~~~

Start a job exporting a table's data asynchronously to a set of CSV files,
located on Google Cloud Storage. First, create the job locally:

.. doctest::

>>> from gcloud import bigquery
>>> client = bigquery.Client()
>>> table = dataset.table(name='person_ages')
>>> job = table.export_to_storage(bucket_name='bucket-name',
... object_name_glob='export-prefix*.csv',
... destination_format='CSV',
... print_header=True,
... write_disposition='truncate')
>>> job.job_id
'e3344fba-09df-4ae0-8337-fddee34b3840'
>>> job.type
'extract'
>>> job.created
None
>>> job.state
None

.. note::

- ``gcloud.bigquery`` generates a UUID for each job.
- The ``created`` and ``state`` fields are not set until the job
is submitted to the BigQuery back-end.

Then, begin executing the job on the server:

.. doctest::

>>> job.submit() # API call
>>> job.created
datetime.datetime(2015, 7, 23, 9, 30, 20, 268260, tzinfo=<UTC>)
>>> job.state
'running'

Poll until the job is complete:

.. doctest::

>>> import time
>>> retry_count = 100
>>> while retry_count > 0 and job.state == 'running':
... retry_count -= 1
... time.sleep(10)
... job.reload() # API call
>>> job.state
'done'
>>> job.ended
datetime.datetime(2015, 7, 23, 9, 30, 21, 334792, tzinfo=<UTC>)
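
Once the job is done, the exported files live in the named bucket. A hypothetical sketch of listing them via the companion ``gcloud.storage`` API, assuming it exposes ``Client.get_bucket`` and ``Bucket.list_blobs`` as in later releases (the object name shown is illustrative):

.. doctest::

>>> from gcloud import storage
>>> storage_client = storage.Client()
>>> bucket = storage_client.get_bucket('bucket-name')
>>> [blob.name for blob in bucket.list_blobs(prefix='export-prefix')]
['export-prefix000000000000.csv']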


Copying tables (async)
~~~~~~~~~~~~~~~~~~~~~~

First, create the job locally:

.. doctest::

>>> from gcloud import bigquery
>>> client = bigquery.Client()
>>> source_table = dataset.table(name='person_ages')
>>> destination_table = dataset.table(name='person_ages_copy')
>>> job = source_table.copy_to(destination_table)
>>> job.job_id
'e3344fba-09df-4ae0-8337-fddee34b3840'
>>> job.type
'copy'
>>> job.created
None
>>> job.state
None
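
The same call copies between datasets. A sketch, assuming a second dataset exists (the name ``dataset_backup`` is hypothetical):

.. doctest::

>>> backup_dataset = client.dataset('dataset_backup')
>>> job = source_table.copy_to(backup_dataset.table(name='person_ages'))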

.. note::

- ``gcloud.bigquery`` generates a UUID for each job.
- The ``created`` and ``state`` fields are not set until the job
is submitted to the BigQuery back-end.

Then, begin executing the job on the server:

.. doctest::

>>> job.submit() # API call
>>> job.created
datetime.datetime(2015, 7, 23, 9, 30, 20, 268260, tzinfo=<UTC>)
>>> job.state
'running'

Poll until the job is complete:

.. doctest::

>>> import time
>>> retry_count = 100
>>> while retry_count > 0 and job.state == 'running':
... retry_count -= 1
... time.sleep(10)
... job.reload() # API call
>>> job.state
'done'
>>> job.ended
datetime.datetime(2015, 7, 23, 9, 30, 21, 334792, tzinfo=<UTC>)
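
The polling loop above repeats verbatim for the load, export, and copy jobs. A sketch of factoring it into a helper, using only the ``state`` and ``reload()`` members shown above:

.. doctest::

>>> import time
>>> def wait_for_job(job, retries=100, interval=10):
...     """Poll until the job leaves the 'running' state."""
...     while retries > 0 and job.state == 'running':
...         retries -= 1
...         time.sleep(interval)
...         job.reload() # API call
...     return job.state
>>> wait_for_job(job)
'done'
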
2 changes: 1 addition & 1 deletion gcloud/bigquery/__init__.py
@@ -12,7 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.

"""GCloud BigQuery API wrapper.
"""Google Cloud BigQuery API wrapper.
The main concepts with this API are:
2 changes: 1 addition & 1 deletion gcloud/pubsub/__init__.py
@@ -12,7 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.

"""GCloud Pubsub API wrapper.
"""Google Cloud Pubsub API wrapper.
The main concepts with this API are: