Validator throws KeyError 'table.head' while interactively creating Expectation Suite on a BigQuery datasource #5424

Closed
riwim opened this issue Jul 1, 2022 · 4 comments · Fixed by #5630
Labels
community devrel This item is being addressed by the Developer Relations Team

Comments

riwim commented Jul 1, 2022

Describe the bug
I am getting the same error as described in #3540 when interactively creating an Expectation Suite on a BigQuery datasource via CLI. As requested in the discussion, I am opening a new issue for this.

In the "Edit Your Expectation Suite" notebook provided by great_expectations suite new, the following function call throws an error:

validator.head(n_rows=5, fetch_all=False)

Thrown error:

KeyError                                  Traceback (most recent call last)
Input In [11], in <cell line: 1>()
----> 1 validator.head(n_rows=5, fetch_all=False)

File some-path/.venv/lib/python3.9/site-packages/great_expectations/validator/validator.py:2146, in Validator.head(self, n_rows, domain_kwargs, fetch_all)
   2141 if domain_kwargs is None:
   2142     domain_kwargs = {
   2143         "batch_id": self.execution_engine.active_batch_data_id,
   2144     }
-> 2146 data: Any = self.get_metric(
   2147     metric=MetricConfiguration(
   2148         metric_name="table.head",
   2149         metric_domain_kwargs=domain_kwargs,
   2150         metric_value_kwargs={
   2151             "n_rows": n_rows,
   2152             "fetch_all": fetch_all,
   2153         },
   2154     )
   2155 )
   2157 df: pd.DataFrame
   2158 if isinstance(
   2159     self.execution_engine, (PandasExecutionEngine, SqlAlchemyExecutionEngine)
   2160 ):

File some-path/.venv/lib/python3.9/site-packages/great_expectations/validator/validator.py:891, in Validator.get_metric(self, metric)
    889 def get_metric(self, metric: MetricConfiguration) -> Any:
    890     """return the value of the requested metric."""
--> 891     return self.get_metrics(metrics={metric.metric_name: metric})[
    892         metric.metric_name
    893     ]

File some-path/.venv/lib/python3.9/site-packages/great_expectations/validator/validator.py:856, in Validator.get_metrics(self, metrics)
    848 """
    849 metrics: Dictionary of desired metrics to be resolved, with metric_name as key and MetricConfiguration as value.
    850 Return Dictionary with requested metrics resolved, with metric_name as key and computed metric as value.
    851 """
    852 resolved_metrics: Dict[Tuple[str, str, str], Any] = self.compute_metrics(
    853     metric_configurations=list(metrics.values())
    854 )
--> 856 return {
    857     metric_configuration.metric_name: resolved_metrics[metric_configuration.id]
    858     for metric_configuration in metrics.values()
    859 }

File some-path/.venv/lib/python3.9/site-packages/great_expectations/validator/validator.py:857, in <dictcomp>(.0)
    848 """
    849 metrics: Dictionary of desired metrics to be resolved, with metric_name as key and MetricConfiguration as value.
    850 Return Dictionary with requested metrics resolved, with metric_name as key and computed metric as value.
    851 """
    852 resolved_metrics: Dict[Tuple[str, str, str], Any] = self.compute_metrics(
    853     metric_configurations=list(metrics.values())
    854 )
    856 return {
--> 857     metric_configuration.metric_name: resolved_metrics[metric_configuration.id]
    858     for metric_configuration in metrics.values()
    859 }

KeyError: ('table.head', 'batch_id=15a077d486452b3e1c894458758b7972', '04166707abe073177c1dd922d3584468')
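
The last two frames point at the mechanism: `get_metrics` looks up each metric's `id` tuple in whatever `compute_metrics` returned, so a metric that is missing from that dictionary makes the lookup itself raise the `KeyError`. Below is a minimal sketch of that pattern in plain Python. It assumes (the traceback does not confirm this) that a metric whose backend query fails is silently left out of the result. `compute_metrics_stub`, the `failing` parameter, and the example id values are hypothetical stand-ins, not Great Expectations APIs.

```python
from typing import Any, Dict, Iterable, Set, Tuple

MetricId = Tuple[str, str, str]


def compute_metrics_stub(
    metric_ids: Iterable[MetricId], failing: Set[str]
) -> Dict[MetricId, Any]:
    """Hypothetical stand-in: resolve each metric, silently skipping failures."""
    resolved: Dict[MetricId, Any] = {}
    for metric_id in metric_ids:
        if metric_id[0] in failing:
            # Assumption: the backend query failed and the metric is dropped.
            continue
        resolved[metric_id] = f"value of {metric_id[0]}"
    return resolved


def get_metrics(metric_ids: Iterable[MetricId], failing: Set[str]) -> Dict[str, Any]:
    """Same lookup shape as the dict comprehension shown in the traceback."""
    metric_ids = list(metric_ids)
    resolved = compute_metrics_stub(metric_ids, failing)
    # Raises KeyError if any requested id is absent from `resolved`.
    return {metric_id[0]: resolved[metric_id] for metric_id in metric_ids}


# Illustrative id only; real ids carry a batch id and a hash of the value kwargs.
head_id: MetricId = ("table.head", "batch_id=example", "value_kwargs_hash")

try:
    get_metrics([head_id], failing={"table.head"})
except KeyError as err:
    print("KeyError:", err.args[0])
```

If this is what happens, the `KeyError` is only a symptom, and the underlying failure has to be found elsewhere, for example in the backend's own logs.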

To Reproduce
Steps to reproduce the behavior:

  1. Initialize GE project
  2. Add a BigQuery datasource via great_expectations datasource new
  3. Create a new Expectation Suite via great_expectations suite new
  4. Choose Interactively and select your datasource and data asset
  5. Execute notebook code including the validator.head() call
  6. See error above

Expected behavior
Calling validator.head() should not raise a KeyError.

Environment

  • Operating System: macOS 12.3.1
  • Great Expectations Version: 0.15.11

Additional context

I have examined the GCP logs for the period in which validator.head() was called. I have ruled out a permission error, because the service account used for debugging has maximum rights on the GCP project. However, the BigQuery service logs errors for the JobService.InsertJob method that are not caused by insufficient permissions:

"serviceName": "bigquery.googleapis.com",
"methodName": "google.cloud.bigquery.v2.JobService.InsertJob",
"authorizationInfo": [
  {
    "resource": "projects/my-project",
    "permission": "bigquery.jobs.create",
    "granted": true,
    "resourceAttributes": {}
  }
],

The error itself is reported in the response object jobStatus:

"jobStatus": {
  "errors": [
    {
      "code": 3,
      "message": "Cannot access field id on a value with type ARRAY<STRUCT<id STRING>> at [1:4656]"
    }
  ],
  "errorResult": {
    "message": "Cannot access field id on a value with type ARRAY<STRUCT<id STRING>> at [1:4656]",
    "code": 3
  },
  "jobState": "DONE"
},
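
Note that `jobState` is `DONE` even though the job failed, so the state alone does not reveal the problem. As a rough sketch, the messages can be pulled out of a `jobStatus` object like the one above with a small helper. `job_errors` is a hypothetical name, and the dictionary below just mirrors the log excerpt.

```python
from typing import Any, Dict, List


def job_errors(job_status: Dict[str, Any]) -> List[str]:
    """Return the distinct error messages found in a jobStatus object."""
    candidates = list(job_status.get("errors", []))
    if "errorResult" in job_status:
        candidates.append(job_status["errorResult"])

    messages: List[str] = []
    for error in candidates:
        message = error.get("message")
        # errorResult usually repeats the first entry of errors, so deduplicate.
        if message and message not in messages:
            messages.append(message)
    return messages


MESSAGE = (
    "Cannot access field id on a value with type "
    "ARRAY<STRUCT<id STRING>> at [1:4656]"
)
job_status = {
    "errors": [{"code": 3, "message": MESSAGE}],
    "errorResult": {"message": MESSAGE, "code": 3},
    "jobState": "DONE",
}

for message in job_errors(job_status):
    print(message)
```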

Some fields of the table I use are nested fields. Does the validator have problems with these?
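
The BigQuery message reads like a query that addresses `id` directly on a repeated record (an array of structs) instead of on its elements, which in BigQuery SQL normally requires `UNNEST`. The following is only an analogy in plain Python, assuming a hypothetical column named `items`. It does not show what the validator actually generates.

```python
from typing import Dict, List

# Hypothetical row with a repeated record column, i.e. ARRAY<STRUCT<id STRING>>.
row: Dict[str, List[Dict[str, str]]] = {"items": [{"id": "a"}, {"id": "b"}]}

try:
    # Analogous to `items.id` in SQL: the field is requested on the array itself.
    row["items"]["id"]
except TypeError as err:
    print("direct access fails:", type(err).__name__)

# Analogous to UNNEST(items): visit each element, then read its field.
ids = [item["id"] for item in row["items"]]
print(ids)
```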

@talagluck
Contributor

Hi @riwim - thanks for raising this. I would have expected that this would be due to issues with temp table creation (or table creation in the case of BigQuery). Could you please share the BatchRequest that you used with this?

omerjakub commented Nov 8, 2022

Hello guys,
I get the same error when I try to create a suite on ClickHouse via the CLI:
[screenshot]

Does anybody else have the same issue?

@talagluck
Contributor

Hi @poohdini1994 - that makes sense, since we don't yet have full support for Clickhouse, and that metric is not yet implemented for Clickhouse.

@omerjakub

Hi @talagluck, do you know when it will be supported?
