Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BoundsError Issue when converting to DataFrames regression #306 #328

Closed
pankgeorg opened this issue Sep 24, 2021 · 5 comments · Fixed by #346
Closed

BoundsError Issue when converting to DataFrames regression #306 #328

pankgeorg opened this issue Sep 24, 2021 · 5 comments · Fixed by #346

Comments

@pankgeorg
Copy link
Contributor

pankgeorg commented Sep 24, 2021

#306

ODBC v1.0.4, julia 1.7
The violating character is \xce\0

image

Branch jq/fixdec works normally.
"Select @@version" gives
Microsoft SQL Server 2014 - 12.0.2269.0 (X64) \n\tJun 10 2015 03:35:45 \n\tCopyright (c) Microsoft Corporation\n\tExpress Edition (64-bit) on Windows NT 6.1 <X64> (Build 7601: Service Pack 1)\n"

Update: The violating character is not \xce\0 that's what the branch jq/fixdec shows. The correct string is much weirder, it's

"Α ΕΛΕΥΘΕΡΩΝ ΕΠΑΓΓΕΛΜΑΤΩΝ ΑΘΗΝΩΝ" vs "A ΕΛΕΥΘΕΡΩΝ ΕΠΑΓΓΕΛΜΑΤΩΝ Α\xce\0". Collation is Greek_CI_AS and max_size = 50.
Most likely this is an encoding issue. The error mentions something about accessing positions 1-59 of string with 50 characters.

@quinnj
Copy link
Member

quinnj commented Sep 24, 2021

I've opened the PR again; but you're saying it doesn't fix the issue?

@pankgeorg
Copy link
Contributor Author

I've opened the PR again; but you're saying it doesn't fix the issue?

It does work with the PR, meaning it doesn't explode, but it returns a truncated version of the string. I think there is an underlying encoding issue.

@jakobnissen
Copy link

I've just hit this issue. The string that triggers it is "Lokalt prøvenr.: IKKE ANGIVET Uoplyst prøvetagningsdato Overføres til konto 65", and the index error is ERROR: BoundsError: attempt to access 81-element Vector{UInt8} at index [1:82]. Indeed, there are 82 codeunits in the string. So the issue must be that somehow, not enough space is allocated to the string.

@jakobnissen
Copy link

jakobnissen commented Jan 3, 2022

Bump - is there any MWE or similar I can provide to have this resolved?

It seems to be a matter of characters vs. codeunits. The aforementioned field in my database has VARCHAR(80), i.e. a cap of 80 characters. For some reason ODBC.jl treats this as a cap of 80 bytes instead. Since SQL uses UTF16 by default, shouldn't it be 160 bytes max?

Supporting this hypothesis, changing line 355 in dbinterface.jl from bytes = data[1:b.totallen] to bytes = data[1:min(length(data), b.totallen)] means it no longer errors. Instead, however, the string listed above is truncated to "Lokalt prøvenr.: IKKE ANGIVET Uoplyst prøvetagningsdato Overføres til konto \0". This is the first 80 bytes of the original string plus an errant null byte terminator.

@pankgeorg
Copy link
Contributor Author

I've been able to work around this by casting the result into a TEXT type on the SQL side:

SELECT CAST(name AS TEXT), .... FROM t_cust instead of SELECT name, .... FROM t_cust

This is not a solution (As you can't use sql's *) but it got me going for now!

quinnj added a commit that referenced this issue Aug 11, 2022
Fixes #328. Ok, a bit nasty, but here's the rundown:
  * This is mostly outlined [here](https://docs.microsoft.com/en-us/sql/odbc/reference/syntax/sqlgetdata-function?view=sql-server-ver16#retrieving-data-with-sqlgetdata)
  * The issue is that for our result set, the cursor may be told that a give column has a certain size in bytes, particularly for variable-sized columns like VARCHAR
  * BUT, the driver then may have to "convert to the target type" and this conversion may result in the _actual_ data size being larger than we orignally allocated for
  * But wait, you may ask, don't we already account for this case [here](https://github.com/JuliaDatabases/ODBC.jl/blob/f38f771557a5763f777ed0f441c2cc5a72f41c70/src/utils.jl#L328) by resizing the buffer and calling SQLGetData again to retrieve the rest of the data?
  * Why yes, we do, but only for "long" data types, which is the primary use case for the multiple SQLGetData calls
  * So basically this is _another_ case where, totally depending on the db driver, we might end up with truncated data which will result in `API.SQL_SUCCESS_WITH_INFO` being returned from SQLGetData

Thankfully the fix is relatively easy: we just need to use our buffer-resizing branch if the data type is long OR SQLGetData returns this success with info status code.
quinnj added a commit that referenced this issue Aug 11, 2022
* Fix case when SQLGetData data size exceeds column size

Fixes #328. Ok, a bit nasty, but here's the rundown:
  * This is mostly outlined [here](https://docs.microsoft.com/en-us/sql/odbc/reference/syntax/sqlgetdata-function?view=sql-server-ver16#retrieving-data-with-sqlgetdata)
  * The issue is that for our result set, the cursor may be told that a give column has a certain size in bytes, particularly for variable-sized columns like VARCHAR
  * BUT, the driver then may have to "convert to the target type" and this conversion may result in the _actual_ data size being larger than we orignally allocated for
  * But wait, you may ask, don't we already account for this case [here](https://github.com/JuliaDatabases/ODBC.jl/blob/f38f771557a5763f777ed0f441c2cc5a72f41c70/src/utils.jl#L328) by resizing the buffer and calling SQLGetData again to retrieve the rest of the data?
  * Why yes, we do, but only for "long" data types, which is the primary use case for the multiple SQLGetData calls
  * So basically this is _another_ case where, totally depending on the db driver, we might end up with truncated data which will result in `API.SQL_SUCCESS_WITH_INFO` being returned from SQLGetData

Thankfully the fix is relatively easy: we just need to use our buffer-resizing branch if the data type is long OR SQLGetData returns this success with info status code.

* bump some versions and update ci
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants