-
Notifications
You must be signed in to change notification settings - Fork 310
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
process: make mypy happy with type annotations #1036
Conversation
Several dependencies lack type annotations, or they don't advertise themselves as type-annotated. We do not want `mypy` to complain about these.
I tried to keep the |
@@ -144,7 +144,7 @@ def from_api_repr(cls, stats: Dict[str, str]) -> "DmlStats": | |||
|
|||
args = ( | |||
int(stats.get(api_field, default_val)) | |||
for api_field, default_val in zip(api_fields, cls.__new__.__defaults__) | |||
for api_field, default_val in zip(api_fields, cls.__new__.__defaults__) # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is barfing because the Dict[str, str]
above mismatches the int
of the __defaults__
. Does the backend really return the counts as strings? That would seem odd, unless they are expecting to be returning > 2^64 values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, the error here was Callable[...] does not have the attribute __defaults__
, or something like that. I actually expected mypy
to recognize that cls
is a namedtuple and its __new__()
has the attribute __defaults__
, but it seems that it did not for some reason.
Maybe if I explicitly annotate cls
, hm...
Update: Nope, sadly that did not work.
@@ -130,7 +130,7 @@ def _view_use_legacy_sql_getter(table): | |||
class _TableBase: | |||
"""Base class for Table-related classes with common functionality.""" | |||
|
|||
_PROPERTY_TO_API_FIELD = { | |||
_PROPERTY_TO_API_FIELD: Dict[str, Union[str, List[str]]] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh, the mix-and-match of strings / lists of strings here has always given me heartburn. I'm not surprised mypy
bitched about it, and frankly, giving it a union type doesn't make the pattern any better.
@@ -526,12 +538,12 @@ def _ensure_bqstorage_client( | |||
bqstorage_client = bigquery_storage.BigQueryReadClient( | |||
credentials=self._credentials, | |||
client_options=client_options, | |||
client_info=client_info, | |||
client_info=client_info, # type: ignore # (None is also accepted) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to update google-cloud-bigquery-storage
/ the generated clients to allow None for client_info
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It appears so, yes. The docstring says that None
is allowed, but mypy
complains that Optional[...ClientInfo]
is not acceptable here as an argument.
It's probably because the default value of client_info
is not a literal None
, but DEFAULT_CLIENT_INFO
instead.
@@ -1934,7 +1944,7 @@ def create_job( | |||
return self.load_table_from_uri( | |||
source_uris, | |||
destination, | |||
job_config=load_job_config, | |||
job_config=typing.cast(LoadJobConfig, load_job_config), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit surprised the cast
is needed here. Would specifying the type of load_job_config
at assignment time do the same thing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise for the other config objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that from_api_repr()
is defined on the base _JobConfig
class and its return type is annotated as _JobConfig
. Which makes sense on the base class, but in practice the actual cls
subclass is chosen here dynamically by introspecting job_config
, and mypy
cannot understand that.
Specifying the type of load_job_config
at assignment time would not help, because again a base type (_JobConfig
) is assigned to load_job_config
with a specific sub-type, and mypy
would still complain. Cast solves this.
The alternative could be changing the return type on the base class to include all sub-classes, but that doesn't feel right. It can get of of sync if a new sub-type is created, and a circular import would be needed.
Besides, mypy
would probably still complain if "Union[All, Config, Subtypes]"
is assigned to a specific sub-class type.
google/cloud/bigquery/client.py
Outdated
@@ -2064,15 +2077,19 @@ def get_job( | |||
timeout=timeout, | |||
) | |||
|
|||
return self.job_from_resource(resource) | |||
job_instance = self.job_from_resource(resource) # never an UnknownJob |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we sure about that?
I recall that UnknownJob
was introduced in list_jobs
because someone might have permission to know that a job happened but not fetch the details about it.
I'd be interested in a little experiment to see what happens when we try to call get_job
on one of those unknown jobs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm... I was under impression that job_from_resource()
is a helper that only returns UnknownJob
if somebody manually provides an "invalid"/incomplete resource dict.
However, if the backend can also return such resource (e.g. due to permissions), the assumption in the comment does not hold (although the return type in the docstring already excludes UnknownJob
).
Do you maybe remember the combination of permissions that result in an UnknownJob
?
(also, I might need assistance here, because my access to cloud console is now severely limited - I'll have to check)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you maybe remember the combination of permissions that result in an UnknownJob?
(also, I might need assistance here, because my access to cloud console is now severely limited - I'll have to check)
It related to creating a job with one account and trying to list all jobs for the project with another account.
See: jobs.list
versus jobs.listAlll
here: https://cloud.google.com/bigquery/docs/access-control
Permission | Description |
---|---|
bigquery.jobs.list | List all jobs and retrieve metadata on any job submitted by any user.1 For jobs submitted by other users, details and metadata are redacted. |
bigquery.jobs.listAll | List all jobs and retrieve metadata on any job submitted by any user.1 |
1 For any job you create, you automatically have the equivalent of the bigquery.jobs.get and bigquery.jobs.update permissions for that job.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tswast My access to Cloud Console is limited now and I cannot create an account anymore that only has bigquery.jobs.list
permission (and possibly a few others to access jobs on the project), while the current development account that I use has the bigquery.jobs.listAll
permission.
In other words, I don't think I can actually retrieve an UnknownJob
.
Could you run the below script and see what happens if we call get_job()
on an UnknownJob
?
unknown_jobs.py
from collections import Counter
from google.cloud import bigquery
from google.oauth2 import service_account
PROJECT_ID = "TODO: set"
DATASET_ID = "TODO: set"
TABLE_ID = "TODO: set"
MAIN_CRED_FILE = "TODO: /path/to/service_account.json"
# NOTE: This one must have "bigquery.jobs.list" permission, but NOT "bigquery.jobs.listAll".
# We want metadata of jobs submitted by other users redacted.
OTHER_CRED_FILE = "TODO: /path/to/other/service_account.json"
def main():
credentials = service_account.Credentials.from_service_account_file(MAIN_CRED_FILE)
client = bigquery.Client(credentials=credentials)
# Create table if needed.
table_name_full = f"{PROJECT_ID}.{DATASET_ID}.{TABLE_ID}"
table = client.create_table(table_name_full, exists_ok=True)
# Create at least one job (load job in this case).
job_config = bigquery.LoadJobConfig(
schema=[
bigquery.SchemaField("user_id", "STRING"),
bigquery.SchemaField("name", "STRING"),
],
write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)
row = {"user_id": "1234567890", "name": "John"}
load_job = client.load_table_from_json([row], destination=table, job_config=job_config)
result = load_job.result()
print(result)
print("=" * 20)
# Using another client, use other users' jobs. The client should NOT have
# "bigquery.jobs.listAll" permission.
credentials = service_account.Credentials.from_service_account_file(OTHER_CRED_FILE)
other_client = bigquery.Client(credentials=credentials)
jobs_iter = other_client.list_jobs(
project=PROJECT_ID, max_results=10_000, all_users=True
)
retrieved_jobs = list(jobs_iter)
jobs_count = Counter(type(job) for job in retrieved_jobs)
print(jobs_count)
try:
unknown_job = next(
job for job in retrieved_jobs if isinstance(job, bigquery.UnknownJob)
)
except StopIteration:
print("No UnknownJobs received, cannot test get_job().")
return
unknown_job = other_client.get_job(unknown_job)
print(unknown_job._properties)
if __name__ == "__main__":
main()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, I preemptively changed the return type annotation to also include UnknownJob
.
More recent releases of google-api-core have typing enabled.
Co-authored-by: Tim Swast <[email protected]>
A subtlety here is that type checks will only pass of As the time passes, this will eventually become a non-issue, but in the transitional period people might experience failures if they have less recent |
The Opentelemetry APi has changed from the minimum version the BigQuery client currently uses, we thus need to bound the maximum Opentelemetry version. In addition, that maximum version does not yet support type checks, thus it is ignored.
It appears some more adjustments are needed to opentelemetry tests (or code). Update: The reason why opentelemetry tests fail in Python 3.7+ is because only in Python 3.6 we pin the dependency to This restriction affects our fixtures in some way, causing the opentelemetry tests to fail if run in the entire test suite. A short term workaround could be pinning opentelemetry to |
🎉
|
* process: add mypy types check to nox sessions * Ignore type errors for not annotated modules Several dependencies lack type annotations, or they don't advertise themselves as type-annotated. We do not want `mypy` to complain about these. * Fix mypy complaints (batch 1) * Fix mypy complaints (batch 2) * Fix mypy complaints (batch 3) * Fix mypy false positive errors * Simplify external config options instantiation * Do not ignore api-core in type checks More recent releases of google-api-core have typing enabled. * Remove unneeded __hash__ = None lines * Use an alias for timeout type in client.py * Fix PathLike subscription error in pre-Python 3.9 * Fix a typo in docstring Co-authored-by: Tim Swast <[email protected]> * Add mypy to the list of nox sessions to run * Fix opentelemetry type error The Opentelemetry APi has changed from the minimum version the BigQuery client currently uses, we thus need to bound the maximum Opentelemetry version. In addition, that maximum version does not yet support type checks, thus it is ignored. * 🦉 Updates from OwlBot See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md * Exclude type-checking code from coverage * Fix patching opentelemetry tracer pvoider * Adjust get_job() return type, ignore opentelemetry Co-authored-by: Tim Swast <[email protected]> Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>
Closes #987.
PR checklist: