Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BigQuery: Add tqdm progress bar for downloads #7552

Merged
merged 12 commits into from
Mar 28, 2019
Prev Previous commit
Next Next commit
tqdm import to pandas pattern
  • Loading branch information
JohnPaton committed Mar 27, 2019
commit 3c1f0081c2ebc25063933a6fb74a2e2a63b1d602
19 changes: 14 additions & 5 deletions bigquery/google/cloud/bigquery/table.py
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,11 @@
except ImportError: # pragma: NO COVER
pandas = None

try:
import tqdm
except ImportError: # pragma: NO COVER
tqdm = None

from google.api_core.page_iterator import HTTPIterator

import google.cloud._helpers
Expand Down Expand Up @@ -1336,11 +1341,15 @@ def _to_dataframe_tabledata_list(self, dtypes):
frames = []

# report progress if tqdm installed
try:
from tqdm import tqdm
pbar = tqdm(desc="Downloading", total=self.total_rows, unit="rows")
except (ImportError, KeyError, TypeError):
pbar = None
pbar = None
if tqdm is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a progress_bar_type (string) parameter to to_dataframe and check that it also is not None.

It should default to None for now, until we update the magics module to support turning it off in the Context and also update the magics to use the tqdm_notebook option instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should default to None for now.

Actually, I think defaulting to 'tqdm' is fine. When @alixhami tests notebooks, he can set google.cloud.bigquery.table.tqdm = None like you do in your tests.

Copy link
Contributor

@alixhami alixhami Mar 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should default to None because otherwise it will throw errors for users who don't have tqdm installed who aren't using the parameter.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in the latest commits.

try:
pbar = tqdm.tqdm(
desc="Downloading", total=self.total_rows, unit="rows"
)
except (KeyError, TypeError):
# tqdm error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like this comment to explain a little more why these errors might happen. And more importantly why we are letting them pass (because a broken progress bar shouldn't stop us from downloading results).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking here was to protect from something like an interface change in tqdm. Indeed the fallback should just be to not show a progress bar.

pass

for page in iter(self.pages):
frames.append(self._to_dataframe_dtypes(page, column_names, dtypes))
Expand Down