[druid] fix 'Unorderable types' when col has nulls
Error "unorderable types: str() < int()" occurs when grouping by a
numerical Druid column that contains null values.

* druid/pydruid returns strings in the dataframe, with NAs for nulls
* Superset has custom logic around get_fillna_for_col that fills in the
NULLs based on declared column type (FLOAT here), so now we have a mixed
bag of types in the series
* pandas chokes on pivot_table or groupby operations as it cannot sort
mixed types
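
The root cause in the list above can be reproduced without Druid or pandas. The following is a minimal sketch in plain Python 3, assuming the filled-in series ends up holding strings next to an integer 0; the sample values and the `try_sort` helper are hypothetical and not part of the commit. The exact wording of the exception depends on the Python version.

```python
# Hypothetical values: strings as returned for the grouped column, plus
# an integer 0 standing in for a NULL that was filled with a numeric default.
mixed_values = ['1.5', 0, '2.0']


def try_sort(values):
    """Return the sorted list, or the TypeError message if sorting fails."""
    try:
        return sorted(values)
    except TypeError as exc:
        # Python 3 refuses to order str against int.
        return 'TypeError: {}'.format(exc)


mixed_result = try_sort(mixed_values)
print(mixed_result)
```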

The approach here is to stringify and fillna('<NULL>') to get a
consistent series.
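
As a rough illustration of that approach, here is a standalone sketch that applies the same `fillna('<NULL>').astype(str)` chain outside of Superset. The sample dataframe and its `price` and `count` columns are made up for the example; only the body of `homogenize_types` mirrors the change in this commit.

```python
import pandas as pd


def homogenize_types(df, groupby_cols):
    """Turn every GROUPBY column into a plain string series."""
    for col in groupby_cols:
        # Nulls become the '<NULL>' marker, then everything is cast to str.
        df[col] = df[col].fillna('<NULL>').astype(str)
    return df


# Hypothetical result set: a numeric dimension returned as strings,
# with one null value.
df = pd.DataFrame({
    'price': ['1.5', None, '2.0', '1.5'],
    'count': [1, 2, 3, 4],
})

df = homogenize_types(df, ['price'])

# Every key is now a str, so grouping has a single type to sort.
totals = df.groupby('price')['count'].sum().to_dict()
print(totals)
```

The '<NULL>' marker keeps null rows as their own group instead of dropping them or merging them with a real value such as 0.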
mistercrunch committed Apr 6, 2018
1 parent 92230b8 commit 6aaab71
Showing 1 changed file with 17 additions and 0 deletions.
17 changes: 17 additions & 0 deletions superset/connectors/druid/models.py
@@ -1277,13 +1277,30 @@ def run_query(  # noqa / druid
            client.query_builder.last_query.query_dict, indent=2)
        return query_str

    @staticmethod
    def homogenize_types(df, groupby_cols):
        """Converting all GROUPBY columns to strings
        When grouping by a numeric (say FLOAT) column, pydruid returns
        strings in the dataframe. This creates issues downstream related
        to having mixed types in the dataframe
        Here we replace None with <NULL> and make the whole series a
        str instead of an object.
        """
        for col in groupby_cols:
            df[col] = df[col].fillna('<NULL>').astype(str)
        return df

    def query(self, query_obj):
        qry_start_dttm = datetime.now()
        client = self.cluster.get_pydruid_client()
        query_str = self.get_query_str(
            client=client, query_obj=query_obj, phase=2)
        df = client.export_pandas()

        df = self.homogenize_types(df, query_obj.get('groupby', []))

        if df is None or df.size == 0:
            raise Exception(_('No data was returned.'))
        df.columns = [