SUMMARY
Some DB API 2.0 drivers use `pyformat` for `paramstyle`. This means that queries should be parameterized with `%s` (or `%(name)s` for named parameters) for the placeholders. The driver then performs "old-school" string interpolation.
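As a minimal sketch of that interpolation step: `quote` and `render` below are hypothetical helpers written for this illustration, not part of any real driver, and the quoting is deliberately naive.

```python
def quote(value):
    """Naive literal quoting, for illustration only; real drivers are more careful."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def render(operation, parameters):
    """Hypothetical stand-in for a driver binding parameters client-side with %."""
    return operation % tuple(quote(p) for p in parameters)


sql = render("SELECT * FROM users WHERE name = %s AND age > %s", ("Alice", 30))
print(sql)  # SELECT * FROM users WHERE name = 'Alice' AND age > 30
```

Since the whole statement goes through `%` formatting, any literal percent sign in the SQL text has to be written as `%%` to survive.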
Because of the percent interpolation, when SQLAlchemy compiles a query for a database that uses `pyformat` or `format`, any percent symbols in the query are escaped by being replaced with `%%`:
https://github.com/sqlalchemy/sqlalchemy/blob/6888cf79db79d5e5660300ccf2a2a91f1eecf75f/lib/sqlalchemy/sql/compiler.py#L2652-L2653
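The round trip can be sketched in plain Python. The `escape_percents` helper and the sample query are assumptions made for this illustration; this is not SQLAlchemy's code, only the same idea of doubling percents before the driver interpolates.

```python
def escape_percents(sql):
    """Double every percent sign so it survives a later % interpolation."""
    return sql.replace("%", "%%")


original = "SELECT * FROM t WHERE name LIKE 'a%'"
compiled = escape_percents(original)  # the literal percent is now '%%'

# A driver that always interpolates turns '%%' back into '%'.
executed = compiled % ()

# Without the escaping, the lone '%' is read as a broken format specifier.
try:
    original % ()
    failed = False
except (TypeError, ValueError):
    failed = True
```

Under these assumptions `executed` equals `original`, so the doubling is invisible as long as the driver really performs the interpolation.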
For some reason we undo that process (introduced in #5178):
superset/superset/models/core.py
Lines 668 to 669 in 76d897e
The code above doesn't make sense. If SQLAlchemy is replacing `%` with `%%` for databases where `dialect.identifier_preparer._double_percents` is true, why would we reverse it when we compile the query for the same databases?
One clue can be found in another codepath where we compile the query, but that replacement is missing. In `values_for_column`:
superset/superset/models/helpers.py
Lines 1380 to 1384 in 44690fb
@Vitor-Avila noticed that here, when the column is a calculated column containing a percent symbol, the generated SQL being sent to the database contains the doubled percent (`%%`) instead of the single one.
Note that the generated query is completely valid and semantically equivalent to the original one, so everything works as expected when we run it... except that in Druid, the query performs extremely poorly compared to the one with a single percent. This suggests that the fix is to add `sql = sql.replace("%%", "%")` to the `values_for_column` method as well.
But again... why are we undoing what SQLAlchemy is doing?
Looking deeper into the problem, I found a bug in pydruid:
https://github.com/druid-io/pydruid/blob/1d72d26c3e14bc9a7c6725dfa877c98a7afbe6f3/pydruid/db/api.py#L430-L435
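A simplified sketch of the kind of early return at issue follows. The function names are hypothetical and this is not a copy of the pydruid source linked above; the "fixed" variant is just one possible shape of a fix, not necessarily what the pydruid change does.

```python
def apply_parameters_buggy(operation, parameters=None):
    """Return early when there are no parameters, skipping interpolation."""
    if not parameters:
        return operation  # '%%' is never unescaped on this path
    return operation % parameters


def apply_parameters_fixed(operation, parameters=None):
    """Always interpolate, so '%%' collapses to '%' even without parameters."""
    return operation % (parameters or {})


compiled = "SELECT * FROM t WHERE name LIKE 'a%%'"
buggy = apply_parameters_buggy(compiled)
fixed = apply_parameters_fixed(compiled)
```

With no parameters, `buggy` still contains `%%`, while `fixed` contains the single `%` that the query originally had.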
Note that in the linked code, when no parameters are passed to `execute` (which is the case when Superset calls the method), the string interpolation never happens, because the SQL is returned early! This means that any escaped percent symbols (`%%`) will not be unescaped to `%`.
I've fixed pydruid in druid-io/pydruid#317, and made a new release. This PR bumps the version to the fixed one, and removes the `sql = sql.replace("%%", "%")` logic completely. This way, when double percents are passed to pydruid, they will be unescaped by the driver, as expected.
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
TESTING INSTRUCTIONS
I tested with Postgres, since psycopg2 uses `pyformat`. Queries run as expected.
ADDITIONAL INFORMATION