
SQL: SYS TYPES adjustments for ODBC #30386

Closed
bpintea opened this issue May 4, 2018 · 4 comments
bpintea commented May 4, 2018

Elasticsearch version: master @ 65dbc17

The ODBC driver now uses the results returned by the SYS TYPES query to answer an application's inquiries about column types. The following is a list of proposed changes needed to stay in line with the ODBC spec:

  • PRECISION and AUTO_INCREMENT columns:
    These have been renamed in ODBC 3.x to COLUMN_SIZE (just like the one already returned by SYS COLUMNS) and AUTO_UNIQUE_VALUE, respectively. The plugin currently returns the JDBC nomenclature. I'm not positive what we should do here: in ODBC, no application should ever look at the column names, because ODBC specifies what should be returned in each column by position, not by name. If this is the case for JDBC as well, I'd recommend renaming to the ODBC standard.
  • DATE data type:
    • This type is currently assigned the value 93 in column DATA_TYPE. In both ODBC and JDBC specs, this value corresponds to a TIMESTAMP type (SQL_TYPE_TIMESTAMP). Moreover, ES/SQL does return a timestamp representation of it (ISO8601). So I'm not sure what we want to have here, but I believe we should either correct the value to 91 (SQL_TYPE_DATE) or rename the type to 'TIMESTAMP'.
    • The value for column PRECISION(/COLUMN_SIZE) for this type should be 20 (not 19). This is because this indicates the “number of characters in the character representation” and len("1953-09-02T00:00:00Z")==20 (interestingly, SYS COLUMNS does return value 20 for COLUMN_SIZE column already). Note: this might need to be adjusted to 24, if SQL: inconsistent date(time) format handling. #30002 is implemented; alternatively, if we choose not to implement that, we need to make sure we always return the date/time representation without the milliseconds component.
  • SQL_DATA_TYPE column:
    The values in this column are always 0. The JDBC doesn't make use of this column ("unused"). The ODBC spec mandates that “[t]his column is the same as the DATA_TYPE column, except for interval and datetime data types.”
    Now, we don't have interval types, so that's fine. For the DATE data type, this column should have the value 9 (ODBC's SQL_DATETIME value) and column SQL_DATETIME_SUB should have either value 3 (ODBC's SQL_CODE_TIMESTAMP value) or 1 (ODBC's SQL_CODE_DATE value), according to the decision for the point above. Codes are concisely explained here and the values lifted from the specs.
  • SQL_DATETIME_SUB column:
    This is currently null. JDBC doesn't make use of the column ("unused"). For ODBC, it should be 0 for all types, except for DATE/TIMESTAMP, where it should be 3 or 1 (as per above).
  • AUTO_INCREMENT(/AUTO_UNIQUE_VALUE) column:
    The column is currently always null. However, SYS COLUMNS returns NO in the equivalent IS_AUTOINCREMENT column, instead of the empty string corresponding to the unknown/not-applicable case (which is what null indicates). I believe we should synchronise these two columns to return either null/"" or, preferably, false/NO.
  • MINIMUM_SCALE and MAXIMUM_SCALE columns:
    The values here are always null, while they should be 0 for the exact numeric types (the ints); also, if DATE is transitioned fully to TIMESTAMP, its corresponding values should be 0 or 3, depending on how SQL: inconsistent date(time) format handling. #30002 is implemented.
  • PRECISION column:
    For numeric types, this column should contain “the maximum number of digits used by the data type”. The values currently returned match the ODBC spec for all the integral types and for the FLOAT data type (7). For DOUBLE, it is 1 higher (16 returned vs 15 in the spec) and HALF_ and SCALED_FLOAT have no direct equivalent since their data type (6) is reserved for variable precision floats. So I just wanted to make sure that the non-aligned values are accurate (which is possible).
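Putting the proposals above together, here is a minimal sketch (in Python, with the numeric codes taken from the ODBC 3.x spec) of what an ODBC-aligned type-info row for the date type would contain. The row layout and field selection are illustrative, not the plugin's actual implementation:

```python
# ODBC 3.x codes discussed above (values from the ODBC spec headers).
SQL_TYPE_TIMESTAMP = 93   # DATA_TYPE: concise code for a timestamp
SQL_DATETIME = 9          # SQL_DATA_TYPE: verbose "datetime" class code
SQL_CODE_TIMESTAMP = 3    # SQL_DATETIME_SUB: subcode within the datetime class

def date_type_info_row():
    """Illustrative (column_name -> value) pairs for the ES date type."""
    iso_sample = "1953-09-02T00:00:00Z"   # ISO 8601, no fractional seconds
    return {
        "TYPE_NAME": "DATE",
        "DATA_TYPE": SQL_TYPE_TIMESTAMP,
        "COLUMN_SIZE": len(iso_sample),   # 20: characters in the representation
        "SQL_DATA_TYPE": SQL_DATETIME,
        "SQL_DATETIME_SUB": SQL_CODE_TIMESTAMP,
        "AUTO_UNIQUE_VALUE": False,       # rather than null, per the point above
        "MINIMUM_SCALE": 0,
        "MAXIMUM_SCALE": 0,
    }

row = date_type_info_row()
assert row["COLUMN_SIZE"] == 20
assert (row["SQL_DATA_TYPE"], row["SQL_DATETIME_SUB"]) == (9, 3)
```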
elasticmachine commented
Pinging @elastic/es-search-aggs


costin commented May 5, 2018

  1. It might be a mistake in the JDBC spec but according to the javadocs, the column names are important.
    The same page lists at the top that:

Additional columns beyond the columns defined to be returned by the ResultSet object for a given method can be defined by the JDBC driver vendor and must be accessed by their column label.

Which further indicates that naming (as opposed to indexing) is important. As such, I propose to leave this as is.

  2. My understanding of the spec is that the type name should be that of the type inside the storage, while the value should be that of the SQL spec.
    In ES, the data type is called date, which can be confusing since it's really a date-time; but regardless, that's its name and that's what's being returned. And the mapping to the SQL code seems correct (namely timestamp).

  3. Confirmed the PRECISION (JDBC)/COLUMN_SIZE (ODBC) bug.
    For date, different values are returned, since a date has a precision of 19 (the timezone is always UTC) and a size of 19+1 (the trailing Z). However, the JDBC spec indicates that:

The PRECISION column represents the maximum column size that the server supports for the given datatype. For numeric data, this is the maximum precision. For character data, this is the length in characters. For datetime datatypes, this is the length in characters of the String representation (assuming the maximum allowed precision of the fractional seconds component).

In SYS COLUMNS, the COLUMN_SIZE is returned, not the PRECISION, which is why 20 is returned.
It seems that for date the precision needs to be the column size, so, to avoid special handling, I've made both the precision and the column size 20.

  4. Fixed SQL_DATA_TYPE values (please confirm in the PR that the values are correct). Do note that these values are returned as int; I think this is another mistake in the JDBC spec (as opposed to ODBC). However, it is consistent: even internally, DATA_TYPE appears as an integer instead of a smallint/short, so it's likely JDBC users expect it as such despite the value being a smallint/short.
    Please handle this in the driver.

  5. SQL_DATETIME_SUB
    Fixed this value as well (see the note above). To confirm: ODBC considers NULL as 0, correct?

  6. AUTO_INCREMENT and IS_AUTOINCREMENT are different

This is another example of JDBC vs ODBC. The first is declared by getTypeInfo while the latter by getColumns.
From what I can tell, columns 19+ are JDBC specific and do not appear in ODBC. Furthermore, the JDBC spec indicates that NO should be used as opposed to empty string (which means no idea).

  7. PRECISION
    I've fixed the precision for double (I was rounding instead of flooring the number).
    Regarding HALF/SCALED floats, I wonder whether we should treat them as DECIMAL or NUMERIC types. This might better differentiate them from ODBC FLOAT/REAL.
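The SQL_DATA_TYPE/SQL_DATETIME_SUB rule from points 4 and 5 can be sketched as follows. The codes are the ODBC ones; the function itself is illustrative, not actual plugin code:

```python
# ODBC rule: for datetime types the verbose SQL_DATA_TYPE is SQL_DATETIME (9)
# and SQL_DATETIME_SUB identifies the concrete type; for everything else,
# SQL_DATA_TYPE mirrors the concise DATA_TYPE and the subcode is 0.
SQL_DATETIME = 9
DATETIME_SUBCODES = {
    91: 1,   # SQL_TYPE_DATE      -> SQL_CODE_DATE
    92: 2,   # SQL_TYPE_TIME      -> SQL_CODE_TIME
    93: 3,   # SQL_TYPE_TIMESTAMP -> SQL_CODE_TIMESTAMP
}

def verbose_codes(data_type: int):
    """Return (SQL_DATA_TYPE, SQL_DATETIME_SUB) for a concise DATA_TYPE code."""
    if data_type in DATETIME_SUBCODES:
        return SQL_DATETIME, DATETIME_SUBCODES[data_type]
    return data_type, 0   # non-datetime type: same code, subcode 0

assert verbose_codes(93) == (9, 3)   # TIMESTAMP, as fixed in the PR
assert verbose_codes(4) == (4, 0)    # INTEGER stays as-is
```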

costin removed their assignment May 5, 2018
costin added a commit to costin/elasticsearch that referenced this issue May 7, 2018
Tweak the return data, in particular with regards for ODBC columns to
better align with the spec

Fix elastic#30386

bpintea commented May 7, 2018

  5. To confirm: ODBC considers NULL as 0, correct?

Not as a general rule. Split answer:

  • for column type support, yes, the driver will convert a null to 0/false; so wherever there's a null value for an int column in the SYS TYPES answers (e.g. AUTO_INCREMENT), the driver will return 0/false when the client app inquires about a column data type (e.g. whether a column is auto-incrementable).
  • null has an equivalence to empty string for catalog ODBC functions (where not otherwise specified).

But generally, null has the usual significance, no value available, and this is conveyed to the client app through interface-specific means (the app can specify a buffer where the driver will write if a cell is null, which is different from the buffer that would hold the actual value).
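The first bullet above, a driver collapsing SQL NULL to 0/false for type-support attributes, can be sketched like this. This is purely illustrative; a real ODBC driver reports NULL to the application through the length/indicator buffer, not by rewriting the value:

```python
# Hypothetical helper: map a nullable SYS TYPES integer cell to the value a
# driver would hand back when answering a type-support inquiry. SQL NULL
# (modeled here as Python None) collapses to 0, i.e. false/not supported.
def as_type_attribute(cell):
    """Return the attribute value surfaced to the client application."""
    return 0 if cell is None else cell

assert as_type_attribute(None) == 0   # e.g. AUTO_INCREMENT null -> not auto-incrementable
assert as_type_attribute(3) == 3     # concrete values pass through unchanged
```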

  6. AUTO_INCREMENT and IS_AUTOINCREMENT are different

This is another example of JDBC vs ODBC. The first is declared by getTypeInfo while the latter by getColumns.
From what I can tell, columns 19+ are JDBC specific and do not appear in ODBC. Furthermore, the JDBC spec indicates that NO should be used as opposed to empty string (which means no idea).

Sure, that's clear. The suggestion I was trying to make: if getColumns()/IS_AUTOINCREMENT returns non-empty - i.e. it has an idea - then getTypeInfo()/AUTO_INCREMENT should probably also have an idea (and thus return non-null, i.e. false).
Only so that the answers are consistent. The end result, as far as the driver is concerned, is going to be the same.

  7. Regarding HALF/SCALED floats, I wonder whether we should treat them as DECIMAL or NUMERIC types.

DECIMAL and NUMERIC are exact types.
I think the current mapping should be correct.
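For context on why the exactness matters here: in ODBC, DECIMAL (3) and NUMERIC (2) are exact numeric types, while FLOAT (6), REAL (7), and DOUBLE (8) are approximate ones. A small sketch, with the type codes taken from the spec and the set grouping being my own illustration:

```python
# ODBC concise type codes, grouped by numeric exactness (per the ODBC spec's
# data type appendix). The groupings below are illustrative helper sets, not
# anything the plugin or driver actually declares.
EXACT_NUMERIC = {
    2,    # SQL_NUMERIC
    3,    # SQL_DECIMAL
    4,    # SQL_INTEGER
    5,    # SQL_SMALLINT
    -5,   # SQL_BIGINT
    -6,   # SQL_TINYINT
}
APPROXIMATE_NUMERIC = {6, 7, 8}   # SQL_FLOAT, SQL_REAL, SQL_DOUBLE

def is_exact(code: int) -> bool:
    """True if the ODBC code denotes an exact numeric type."""
    return code in EXACT_NUMERIC

assert is_exact(3)        # DECIMAL is exact
assert not is_exact(7)    # REAL is approximate, like the float-family ES types
```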


costin commented May 7, 2018

Not as a general rule.

Hmm - so it depends. That means I'll keep pestering you about it :)

Sure, that's clear. The suggestion I was trying to make: if getColumns()/IS_AUTOINCREMENT returns non-empty - i.e. it has an idea - then getTypeInfo()/AUTO_INCREMENT should probably also have an idea (and thus return non-null, i.e. false).
Only so that the answers are consistent. The end result, as far as the driver is concerned, is going to be the same.

Ah, I see. I've updated AUTO_INCREMENT to return false (which is also consistent with the JDBC API).

The DECIMAL and NUMERIC are exact types.

Hmm, this means it can be applied to the SCALED float, but I'm not sure about the HALF one.

This note might be relevant:

[4] SQL_DECIMAL and SQL_NUMERIC data types differ only in their precision. The precision of a DECIMAL(p,s) is an implementation-defined decimal precision that is no less than p, whereas the precision of a NUMERIC(p,s) is exactly equal to p.

Got it - I've changed this to false (which is actually correct, as the spec indicates a primitive should be returned).

costin added a commit that referenced this issue May 11, 2018
Tweak the return data, in particular with regards for ODBC columns to
better align with the spec
Fix order for SYS TYPES and TABLES according to the JDBC/ODBC spec

Fix #30386
Fix #30521