Improve support for BigQuery, Redshift, Oracle, Db2, Snowflake #5827
Conversation
Codecov Report
@@            Coverage Diff             @@
##           master    #5827      +/-   ##
==========================================
- Coverage   73.32%   73.25%   -0.08%
==========================================
  Files          67       67
  Lines        9604     9615      +11
==========================================
+ Hits         7042     7043       +1
- Misses       2562     2572      +10
Continue to review full report at Codecov.
Hey @villebro I'm trying to run some tests and ran into some weird issues on Redshift. It's the same error I've been running into with Redshift aggregations.
Also, here's a screenshot.
Thanks @minh5 for testing, I'll make some adjustments and push through an update soon.
I read up on the Redshift dialect, can you give it another go @minh5?
I may not be doing this right; I'm not too familiar with npm. But I got the same error
Then there's this error when I run
@minh5 I'm also now getting some npm errors which I think are coming from master. Regarding testing on Redshift, I've spun up a Redshift cluster to make it easier to test, hoping to complete this feature during the weekend.
One question @minh5, is
Yeah, right now the SUM is an "old school" metric. I used to get around this issue by changing
In this branch adhoc metrics should now work, but not old school ones. By the looks of it, making the old school ones work automatically is slightly more challenging, as they don't have a separate label to override.
@minh5 Ready for another round of testing.
Same error. The traceback is below. I'm pretty new to this but I just want to make sure my dev environment is set up. I'm running
@minh5 Hmm, the stacktrace is referencing code that has changed in this branch. Are you sure you are using the
Hey @villebro, got another error this time around
@minh5 Sorry, just putting on the finishing touches, I think I just got the last bugs sorted. Feeling pretty confident about this PR, but wouldn't be surprised if there are still some small typos lurking somewhere.
I'm not sure whether this is the right approach. There's a lot going on in here, and passing the
@mistercrunch I agree that this might seem excessive, but let me explain the reasoning behind the changes: The main argument is that
Currently (in
What this PR attempts to do is move from 2) to 1), i.e. encapsulate all SQLA-specific logic in db_engine_specs.py. While it might appear excessively complicated, I think the heterogeneous nature of the SQLA ecosystem requires a lot of flexibility from the backend to be able to conform to the quirks of every individual engine. However, if this still feels like the wrong approach I am open to suggestions.
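To make the encapsulation argument concrete, here is a minimal sketch; the class names, the mutate_label hook placement, and the alias-map helper are illustrative assumptions, not Superset's actual API:

```python
# Illustrative sketch only: names and structure are assumptions, not Superset code.
class BaseEngineSpec:
    @staticmethod
    def mutate_label(label):
        """Default behavior: engines without naming quirks keep labels unchanged."""
        return label


class LowercasingEngineSpec(BaseEngineSpec):
    @staticmethod
    def mutate_label(label):
        """Engines that fold unquoted identifiers to lowercase get lowercase aliases."""
        return label.lower()


def build_alias_map(labels, spec):
    """Map each mutated alias (used in the generated SQL) back to its original label,
    so result columns can be renamed back before being shown to the user."""
    return {spec.mutate_label(label): label for label in labels}


alias_map = build_alias_map(["SUM(Sales)", "COUNT(*)"], LowercasingEngineSpec)
print(alias_map)  # {'sum(sales)': 'SUM(Sales)', 'count(*)': 'COUNT(*)'}
```

The point is that the query-building code never needs to know which engine it is talking to; each quirk lives in exactly one spec class.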
Anyway, I'll park this for now. @minh5 the functionality should now be testable, and I would appreciate feedback on whether or not this works in your context.
No problem, @villebro, I don't mind testing since I would really love for this bug to be ironed out. Right now I just have a very hacky way of dealing with Redshift data since my org only uses Redshift. However, running the latest test I've run into this
@minh5 I'm thinking this error might be related to the timestamp with/without tz bug that's been reported by Redshift/Postgres users. Can you test other charts that don't have a time dimension, e.g. table viz and such, or plot the line chart without a time grain, which I think someone reported now works?
I think pushing that dict around is very error prone and super hard to reason about. It adds a layer on top of an already overloaded model and breaks all sorts of design patterns. Personally I think we need to go back to the design board on this one. |
@villebro Where is this replacement happening in 'master'? Can you point me to the code please? I can't seem to find it. :-)
@mistercrunch I think the label mutation logic is sound, but do agree that the dict pushing is overly complicated. As the original code wasn't designed to handle this type of added complexity, some changes are probably inevitable, but should be less invasive than was proposed here. Will revert with a better proposal. @JamshedRahman check the following lines: https://github.com/apache/incubator-superset/blob/7098ada8c5e241ba59b985478c1249da89b9b676/superset/db_engine_specs.py#L1384-L1394
@minh5 This WIP together with the fix from #6453 should now make life easier on Redshift. This PR has been in use in production for a few months and has been tested to work very reliably with Snowflake. BigQuery and Redshift have also been tested to work, although not as extensively. I also don't see any reason why Oracle and DB2 shouldn't now work (along with any other quirky dialect/engine), but haven't tested against them. If you have the time to give this a go I would be very thankful. If all is good I can make a last thorough check of the code before submitting this as a SIP, as I'm sure this will need to be thoroughly reviewed.
…in db_engine_specs
This is much needed and looks good to me on a first pass. I'm adding a label about this being a bit hard to test, though outside of the platforms where it's needed it should be straightforward.
@staticmethod
def mutate_label(label):
    """
    Oracle 12.1 and earlier support a maximum of 30 byte length object names, which
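The method body is cut off in the excerpt above; based on the description further down in this thread ("the first 30 chars from a MD5 hash"), a hedged sketch of what the implementation could look like (the wrapper class name and the exact cutoff handling are assumptions):

```python
import hashlib


class OracleEngineSpecSketch:  # hypothetical wrapper class, for illustration only
    @staticmethod
    def mutate_label(label):
        """Replace labels longer than 30 characters with the first 30 characters
        of an MD5 hash of the original label, to stay within Oracle's limit."""
        if len(label) > 30:
            return hashlib.md5(label.encode("utf-8")).hexdigest()[:30]
        return label


# A long metric label gets a stable, deterministic 30-character replacement:
print(OracleEngineSpecSketch.mutate_label("SUM(a_rather_long_column_name_indeed)"))
```

Because the hash is derived from the original label, the same metric always maps to the same 30-character alias.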
OMG they finally fixed this in Oracle!!!!!!!!!!!!!!! I thought this day would never happen.
@staticmethod
def mutate_label(label):
    """
    Db2 for z/OS supports a maximum of 30 byte length object names, which usually
OMG db2 too does this? Wow.
Did another pass and it LGTM. We may want another set of 👀 on this one though. Maybe @john-bodley or someone else from Airbnb.
I have actually tested this semi-thoroughly on Snowflake, BigQuery, Redshift and Oracle, so the main stuff should work. But the devil is in the details. Will update the description to describe exactly what is going on and why, and change this to a SIP today if needed.
Thanks for reviewing @mistercrunch. I changed the title to a SIP to highlight the fact that this is a fairly substantial change that might bring regressions with it. I also updated the original description (with pics!) to make it easier for @john-bodley or other reviewers to understand the reasoning behind the changes and see before/after.
I'm thinking about merging this and shipping as
Ping @mistercrunch @john-bodley, any chance of getting additional comments or merging this?
A continuation of PR #5686 for databases that have non-standard handling of column/alias names. The purpose of this PR is to make all engines 'just work' regardless of connector quirks or database restrictions. Based on available documentation and empirical experience, the following holds for dbapi1 query results used by Superset:
This PR changes the following:
- Label mutation is now handled in /connectors/sqla/models.py, which modifies labels as necessary on the fly and returns a dataframe with the original column headers, irrespective of database type. All SQL Alchemy specific logic has been removed from viz.py.
- A mutate_label method in db_engine_specs.py changes the label as needed; for BigQuery the logic is sketched below.

Below are some examples of before and after this PR.
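First, a minimal sketch of what the BigQuery label mutation could look like. The exact steps are not spelled out in this excerpt, so the helper name and sanitization rules below are assumptions, reconstructed from the notes further down (parentheses replaced by underscores, plus a hash appended to avoid collisions):

```python
import hashlib
import re


def mutate_bigquery_label(label):
    """Sketch only: keep characters BigQuery accepts in column names and append a
    short hash of the original label so that e.g. SUM(x) and SUM[x] stay distinct."""
    sanitized = re.sub(r"[^\w]+", "_", label).strip("_")
    suffix = hashlib.md5(label.encode("utf-8")).hexdigest()[:6]
    return "{}_{}".format(sanitized, suffix)


print(mutate_bigquery_label("SUM(x)"))  # e.g. SUM_x_ab12cd
print(mutate_bigquery_label("SUM[x]"))  # same sanitized prefix, different hash suffix
```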
BigQuery
Currently BigQuery mutates column names to comply with its naming requirements. This causes funny column names where e.g. parentheses are replaced by underscores:
This PR changes the columns back to their original state:
Looking at the query one can see that a hash has been added to the mutated column to avoid collisions, e.g. if the columns SUM(x) and SUM[x] are both present.

Redshift
Currently Redshift shows all lowercase column names in tables:
On the other hand timeseries don't work at all:
After this PR timeseries work just fine:
Oracle
Currently column names that exceed 30 characters don't work:
This PR makes it possible to use arbitrarily long column names:
This is done by replacing column names that exceed 30 characters with the first 30 characters of an MD5 hash in the query:
Snowflake
Currently timeseries graphs don't work due to forced quotes being missing from temporal column names (my bad, I forgot to add it to the original PR):
After this PR timeseries work fine:
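A note on the "forced quotes" mentioned above: SQLAlchemy can be told to always quote an identifier by wrapping it in quoted_name, which preserves the label exactly for case-sensitive engines such as Snowflake. The snippet below is only a minimal demonstration of that mechanism (using SQLAlchemy's 1.x list-style select), not the code path used in this PR:

```python
from sqlalchemy import column, select
from sqlalchemy.sql.elements import quoted_name

# Without forcing quotes, the alias is emitted unquoted and many engines normalize it.
plain = select([column("ds").label("__timestamp")])

# quoted_name(..., quote=True) tells the compiler to always quote the identifier.
forced = select([column("ds").label(quoted_name("__timestamp", True))])

print(plain)   # roughly: SELECT ds AS __timestamp
print(forced)  # roughly: SELECT ds AS "__timestamp"
```

Forcing quotes this way keeps labels such as __timestamp or mixed-case names intact on engines that would otherwise fold or normalize unquoted identifiers.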
Db2
Not tested at all, but should work similarly to Oracle.