Getting KeyError 'table.head' repeatedly when running validator.head() using runtime spark dataframe #5478

Closed
mramakrishnan12 opened this issue Jul 11, 2022 · 3 comments
Labels
triage Used by the GE core team to flag issues that were not yet triaged

Comments

@mramakrishnan12

Describe the bug
Continuously getting KeyError when running validator.head(5)

for metric_configuration in metrics.values():
    logger.warning(f"Printing metric_configuration {metric_configuration}")
return {
    metric_configuration.metric_name: resolved_metrics[metric_configuration.id]
    for metric_configuration in metrics.values()
}

KeyError: ('table.head', 'batch_id=574f509507d4b19632b35f50cc8a275f', '04166707abe073177c1dd922d3584468')
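
To make the failure mode concrete, here is a minimal pure-Python sketch of how a dict comprehension shaped like the one in the traceback raises this error when a requested metric id is absent from the resolved metrics. `FakeMetricConfiguration`, its field names, `collect_metrics`, and `missing_metric_ids` are hypothetical stand-ins for illustration, not Great Expectations classes or functions, and the empty `resolved` dict is an assumption about what happened rather than a confirmed diagnosis. The id values are copied from the error above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeMetricConfiguration:
    """Hypothetical stand-in for a metric configuration (not the GE class)."""
    metric_name: str
    domain_id: str
    value_id: str

    @property
    def id(self):
        # Tuple id with the same shape as the key shown in the KeyError.
        return (self.metric_name, self.domain_id, self.value_id)


def collect_metrics(metrics, resolved_metrics):
    """Same lookup shape as the comprehension in the traceback."""
    return {
        cfg.metric_name: resolved_metrics[cfg.id]
        for cfg in metrics.values()
    }


def missing_metric_ids(metrics, resolved_metrics):
    """Return the ids that were requested but have no resolved value."""
    return [cfg.id for cfg in metrics.values() if cfg.id not in resolved_metrics]


requested = {
    "table.head": FakeMetricConfiguration(
        "table.head",
        "batch_id=574f509507d4b19632b35f50cc8a275f",
        "04166707abe073177c1dd922d3584468",
    )
}
resolved = {}  # assumption: no value was produced for table.head

print(missing_metric_ids(requested, resolved))
try:
    collect_metrics(requested, resolved)
except KeyError as err:
    print("KeyError:", err)
```

In this sketch the lookup fails only because the id was never stored, so the useful thing to establish in the real run is why no value for `table.head` reached `resolved_metrics`.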

To Reproduce
Below is my data source

datasources = {
    "name": "test_spark_df",
    "class_name": "Datasource",
    "module_name": "great_expectations.datasource",
    "execution_engine": {
        "class_name": "SparkDFExecutionEngine",
        "module_name": "great_expectations.execution_engine",
    },
    "data_connectors": {
        "default_runtime_data_connector_name": {
            "class_name": "RuntimeDataConnector",
            "module_name": "great_expectations.datasource.data_connector",
            "batch_identifiers": ["batch_id"],
        }
    },
}


df = spark.read.csv('****/provider.csv', header='true')
df.show(5)

batch_request_pass = RuntimeBatchRequest(
    datasource_name="test_spark_df_2",
    data_connector_name="default_runtime_data_connector_name",
    data_asset_name="df",
    batch_identifiers={"batch_id": "df"},
    runtime_parameters={"batch_data": df},
)

context.create_expectation_suite(
    expectation_suite_name="test_suite", overwrite_existing=True
)

validator = context.get_validator(
    batch_request=batch_request_pass, expectation_suite_name="test_suite"
)

validator.head()

KeyError: ('table.head', 'batch_id=574f509507d4b19632b35f50cc8a275f', '04166707abe073177c1dd922d3584468')
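
One detail in the snippets above may be worth checking: the datasource is configured with the name "test_spark_df", while the batch request uses "test_spark_df_2". Nothing in this thread establishes that this is related to the KeyError, and the datasource may have been registered under another name elsewhere. The sketch below is a plain-Python consistency check over the values copied from the report; `find_mismatches` is a hypothetical helper, not part of Great Expectations, and the two dictionaries only mirror the relevant fields.

```python
# Relevant fields copied from the datasource config in the report.
datasource_config = {
    "name": "test_spark_df",
    "data_connectors": {
        "default_runtime_data_connector_name": {
            "batch_identifiers": ["batch_id"],
        }
    },
}

# Relevant fields copied from the RuntimeBatchRequest in the report.
batch_request_fields = {
    "datasource_name": "test_spark_df_2",
    "data_connector_name": "default_runtime_data_connector_name",
    "batch_identifiers": {"batch_id": "df"},
}


def find_mismatches(config, request):
    """Hypothetical helper: list naming inconsistencies between the two dicts."""
    problems = []
    if request["datasource_name"] != config["name"]:
        problems.append(
            f"datasource_name {request['datasource_name']!r} "
            f"does not match configured name {config['name']!r}"
        )
    connector = config["data_connectors"].get(request["data_connector_name"])
    if connector is None:
        problems.append(
            f"data connector {request['data_connector_name']!r} is not configured"
        )
    else:
        expected = set(connector.get("batch_identifiers", []))
        given = set(request["batch_identifiers"])
        if expected != given:
            problems.append(
                f"batch identifiers {sorted(given)} do not match {sorted(expected)}"
            )
    return problems


for problem in find_mismatches(datasource_config, batch_request_fields):
    print(problem)
```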

Expected behavior
Expected to see the first few rows of the Spark DataFrame.

Environment:

  • Linux
  • Great Expectations, version 0.15.11


@mramakrishnan12 changed the title from "Getting KeyError repeatedly when running validator.head()" to "Getting KeyError repeatedly when running validator.head() runtime spark dataframe" on Jul 11, 2022
@mramakrishnan12 changed the title from "Getting KeyError repeatedly when running validator.head() runtime spark dataframe" to "Getting KeyError: ('table.head' , 'batch_id=574f509507d4b19632b35f50cc8a275f') repeatedly when running validator.head() runtime spark dataframe" on Jul 11, 2022
@mramakrishnan12 changed the title from "Getting KeyError: ('table.head' , 'batch_id=574f509507d4b19632b35f50cc8a275f') repeatedly when running validator.head() runtime spark dataframe" to "Getting KeyError 'table.head' repeatedly when running validator.head() runtime spark dataframe" on Jul 11, 2022
@mramakrishnan12 changed the title from "Getting KeyError 'table.head' repeatedly when running validator.head() runtime spark dataframe" to "Getting KeyError 'table.head' repeatedly when running validator.head() using runtime spark dataframe" on Jul 11, 2022
@kyleaton added the triage label on Sep 23, 2022
@rdodev
Contributor

rdodev commented Mar 9, 2023

Hey @mramakrishnan12 if you remove the line validator.head() the issue should go away.

@alexsherstinsky
Contributor

@mramakrishnan12 Thank you very much for reporting this issue. Much has changed since you first reported it (almost a year has passed), and the API for connecting to data in GX has since been significantly improved by the new "Fluent Datasources" approach, which makes connecting easier than before. Would it be possible for you to rerun your test case and let us know whether the error you observed still occurs? If it does, the full code and data needed for us to reproduce it would be greatly appreciated. Thank you, and thank you for using Great Expectations!

@alexsherstinsky
Contributor

@mramakrishnan12 I am closing this issue for now. Please feel free to reopen and let us know at any time.
