Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQLAlchemy connector returning incorrect case for quoted uppercase alias #73

Closed
mike-seekwell opened this issue Dec 18, 2018 · 8 comments
Labels

Comments

@mike-seekwell
Copy link

mike-seekwell commented Dec 18, 2018

Please answer these questions before submitting your issue. Thanks!

  1. What version of Python are you using (python --version)?
    3.7

  2. What operating system and processor architecture are you using (python -c 'import platform; print(platform.platform())')?
    Darwin-17.7.0-x86_64-i386-64bit

  3. What are the component versions in the environment (pip list)?
    Name Version Build Channel
    appnope 0.1.0 py37_0
    asn1crypto 0.24.0
    azure-common 1.1.16
    azure-storage-blob 1.4.0
    azure-storage-common 1.4.0
    backcall 0.1.0 py37_0
    blas 1.0 mkl
    bleach 3.0.2 py37_0
    boto3 1.9.67
    botocore 1.12.67
    ca-certificates 2018.03.07 0
    certifi 2018.11.29 py37_0
    cffi 1.11.5
    chardet 3.0.4
    cryptography 2.4.2
    cycler 0.10.0 py37_0
    dbus 1.13.2 h760590f_1
    decorator 4.3.0 py37_0
    docutils 0.14
    entrypoints 0.2.3 py37_2
    expat 2.2.6 h0a44026_0
    freetype 2.9.1 hb4e5f40_0
    future 0.17.1
    gettext 0.19.8.1 h15daf44_3
    glib 2.56.2 hd9629dc_0
    icu 58.2 h4b95b61_1
    idna 2.8
    ijson 2.3
    intel-openmp 2019.1 144
    ipykernel 5.1.0 py37h39e3cac_0
    ipython 7.2.0 py37h39e3cac_0
    ipython_genutils 0.2.0 py37_0
    ipywidgets 7.4.2 py37_0
    jedi 0.13.2 py37_0
    jinja2 2.10 py37_0
    jmespath 0.9.3
    jpeg 9b he5867d9_2
    jsonschema 2.6.0 py37_0
    jupyter 1.0.0 py37_7
    jupyter_client 5.2.4 py37_0
    jupyter_console 6.0.0 py37_0
    jupyter_core 4.4.0 py37_0
    keyring 17.0.0
    kiwisolver 1.0.1 py37h0a44026_0
    libcxx 4.0.1 hcfea43d_1
    libcxxabi 4.0.1 hcfea43d_1
    libedit 3.1.20170329 hb402a30_2
    libffi 3.2.1 h475c297_4
    libgfortran 3.0.1 h93005f0_2
    libiconv 1.15 hdd342a3_7
    libpng 1.6.35 ha441bb4_0
    libsodium 1.0.16 h3efe00b_0
    libxml2 2.9.8 hab757c2_1
    libxslt 1.1.32 hb819dd2_0
    lxml 4.2.5 py37hef8c89e_0
    markupsafe 1.1.0 py37h1de35cc_0
    matplotlib 3.0.2 py37h54f8f79_0
    mistune 0.8.4 py37h1de35cc_0
    mkl 2019.1 144
    mkl_fft 1.0.6 py37h27c97d8_0
    mkl_random 1.0.2 py37h27c97d8_0
    nbconvert 5.3.1 py37_0
    nbformat 4.4.0 py37_0
    ncurses 6.1 h0a44026_1
    notebook 5.7.4 py37_0
    numpy 1.15.4 py37hacdab7b_0
    numpy-base 1.15.4 py37h6575580_0
    openssl 1.1.1a h1de35cc_0
    pandas 0.23.4 py37h6440ff4_0
    pandoc 2.2.3.2 0
    pandocfilters 1.4.2 py37_1
    parso 0.3.1 py37_0
    pcre 8.42 h378b8a2_0
    pexpect 4.6.0 py37_0
    pickleshare 0.7.5 py37_0
    pip 18.1 py37_0
    prometheus_client 0.5.0 py37_0
    prompt_toolkit 2.0.7 py37_0
    ptyprocess 0.6.0 py37_0
    pyasn1 0.4.4
    pyasn1-modules 0.2.2
    pycparser 2.19
    pycryptodomex 3.7.2
    pygments 2.3.1 py37_0
    PyJWT 1.7.1
    pyOpenSSL 18.0.0
    pyparsing 2.3.0 py37_0
    pyqt 5.9.2 py37h655552a_2
    python 3.7.1 haf84260_7
    python-dateutil 2.7.5 py37_0
    pytz 2018.7 py37_0
    pyzmq 17.1.2 py37h1de35cc_0
    qt 5.9.7 h468cd18_1
    qtconsole 4.4.3 py37_0
    readline 7.0 h1de35cc_5
    requests 2.21.0
    s3transfer 0.1.13
    send2trash 1.5.0 py37_0
    setuptools 40.6.3 py37_0
    sip 4.19.8 py37h0a44026_0
    six 1.12.0 py37_0
    snowflake-connector-python 1.7.3
    snowflake-sqlalchemy 1.1.4
    sqlalchemy 1.2.15 py37h1de35cc_0
    sqlite 3.26.0 ha441bb4_0
    terminado 0.8.1 py37_1
    testpath 0.4.2 py37_0
    tk 8.6.8 ha441bb4_0
    tornado 5.1.1 py37h1de35cc_0
    traitlets 4.3.2 py37_0
    urllib3 1.24.1
    wcwidth 0.1.7 py37_0
    webencodings 0.5.1 py37_1
    wheel 0.32.3 py37_0
    widgetsnbextension 3.4.2 py37_0
    xz 5.2.4 h1de35cc_4
    zeromq 4.2.5 h0a44026_1
    zlib 1.2.11 h1de35cc_3

  4. What did you do?

sql = '''SELECT CC_CALL_CENTER_ID,
    CC_CITY AS "cc_city",
    CC_CITY AS "CC_CITYU",
    cc.CC_REC_END_DATE
FROM TPCDS_SF100TCL.CALL_CENTER as cc
LIMIT 100'''
 
res = engine.execute(sql)
print(res.keys())
  1. What did you expect to see?
    ['cc_call_center_id', 'cc_city', 'CC_CITYU', 'cc_rec_end_date']

  2. What did you see instead?
    ['cc_call_center_id', 'cc_city', 'cc_cityu', 'cc_rec_end_date']

Please see here for more details
https://support.snowflake.net/s/question/0D50Z00008hyC4pSAE/sqlalchemy-connector-returning-incorrect-case-upper-lower-for-queries

@smtakeda
Copy link
Contributor

@mike-seekwell it seems tough to address this, because Snowflake dialect may not detect the given name, e.g., CC_CITYU, is in case-insensitive or not. I would say this is a known limitation.

@mike-seekwell
Copy link
Author

mike-seekwell commented Dec 18, 2018

Any suggestions on a workaround? This is what I'm doing now, which works for my usecase, but I'm sure there are a different set of issues with it.

from sqlalchemy.sql.elements import quoted_name
def normalize_name(self, name):
    if name is None:
        return None
    if name.upper() == name:
        return name
    if name.upper() == name and not \
            self.identifier_preparer._requires_quotes(name.lower()):
        return name.lower()
    elif name.lower() == name:
        return quoted_name(name, quote=True)
    else:
        return name
import snowflake.sqlalchemy
snowflake.sqlalchemy.base.SnowflakeDialect.normalize_name = normalize_name

@villebro
Copy link

By using dbapi1 you should be able to get the correct case from your query (you seem to be using dbapi2). If you need further help let me know

@smtakeda
Copy link
Contributor

@mike-seekwell with the proposed change, lots of existing tests will fail, e.g., https://github.com/snowflakedb/snowflake-sqlalchemy/blob/master/test/test_core.py#L346 where the column names are expected to be in lower case.

A challenge is when running the following query, the column names are COL1 and COL1, which are not distinguishable by the clients:

select 1 as col1, 2 as 'COL1'

@smtakeda
Copy link
Contributor

@villebro could you elaborate it? I'm not sure how the clients can detect the cases.

@villebro
Copy link

@smtakeda if I remember correctly, the cursor description retains the correct case for all columns.

@smtakeda
Copy link
Contributor

Ok, I think I understand the issue here. A short answer is it cannot be addressed unless we have lower case metadata mode in the Snowflake DB.

Snowflake DB stores all object identifiers in upper case by default. But if the quotes are added, the cases are preserved. For example, the column names of the following query will be COL1, Col2 and COL3:

> create table tbl (col1 number, "Col2" number, "COL3" number);
> desc table tbl;
+------+--------------+--------+-------+---------+-------------+------------+-------+------------+---------+
| name | type         | kind   | null? | default | primary key | unique key | check | expression | comment |
|------+--------------+--------+-------+---------+-------------+------------+-------+------------+---------|
| COL1 | NUMBER(38,0) | COLUMN | Y     | NULL    | N           | N          | NULL  | NULL       | NULL    |
| Col2 | NUMBER(38,0) | COLUMN | Y     | NULL    | N           | N          | NULL  | NULL       | NULL    |
| COL3 | NUMBER(38,0) | COLUMN | Y     | NULL    | N           | N          | NULL  | NULL       | NULL    |
+------+--------------+--------+-------+---------+-------------+------------+-------+------------+---------+

Note the table name tbl is stored in upper case, i.e., TBL.

>show tables like 'tbl';
+---------------------------------+------+---------------+-------------+-------+---------+------------+------+-------+----------+----------------+
| created_on                      | name | database_name | schema_name | kind  | comment | cluster_by | rows | bytes | owner    | retention_time |
|---------------------------------+------+---------------+-------------+-------+---------+------------+------+-------+----------+----------------|
| Sun, 30 Dec 2018 10:12:00 -0800 | TBL  | TESTDB        | TESTSCHEMA  | TABLE |         |            |    0 |     0 | SYSADMIN | 1              |
+---------------------------------+------+---------------+-------------+-------+---------+------------+------+-------+----------+----------------+

This implementation is Snowflake DB's choice, and the SQL standard has no implementation details as long as the names are treated case insensitively by default.

On the other hand, SQLAlchemy expects the object identifiers are in lower case by default if not quoted. The above table can be referenced by tbl as well as col1, Col2, col3:

meta = MetaData()
tbl = Table(
    'tbl', meta,
     autoload=True, autoload_with=engine)
assert tbl.c.col1 is not None
assert tbl.c.Col2 is not None
assert tbl.c.col3 is not None

Then the following will raise an exception:

assert tbl.c.COL3 is not None

because Snowflake SQLAlchemy dialect does normalize and denormalize object identifiers:
https://github.com/snowflakedb/snowflake-sqlalchemy/blob/master/base.py#L501
https://github.com/snowflakedb/snowflake-sqlalchemy/blob/master/base.py#L512

In fact, this is exactly the same behaviors as Oracle:
https://github.com/zzzeek/sqlalchemy/blob/master/lib/sqlalchemy/dialects/oracle/base.py#L1153
https://github.com/zzzeek/sqlalchemy/blob/master/lib/sqlalchemy/dialects/oracle/base.py#L1167

Meanwhile Postgres and some others store them in lower cases.

Alternatively, Snowflake SQLAlchemy dialect could have had no normalize/denormalize functions so that the the following codes can pass:

assert tbl.c.COL1 is not None
assert tbl.c.Col2 is not None
assert tbl.c.COL3 is not None

But since we are sure this would break majority of existing SQLAlchemy applications migrated from other dbs, we didn't take the option.

@villebro
Copy link

It is my understanding that this can be closed, as the correct cases can be found from the cursor description.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants