
Add feature to bind as SQLWCHAR #292

Open
kadler opened this issue Oct 3, 2022 · 20 comments

Comments

@kadler
Member

kadler commented Oct 3, 2022

Windows is very UTF-16 centric, while Node.js is very UTF-8 centric. While Node.js can also handle UTF-16, it does not deal with other encodings well at all (at least in the NAPI addon). As a result, the package has been built as UNICODE since July 2020. This has fixed many encoding issues, since various strings are now passed back and forth as UTF-16 without issue. However, when binding, the SQL_CHAR, SQL_VARCHAR, and SQL_CLOB types are still bound as SQL_C_CHAR, not SQL_C_WCHAR. This works most of the time, but there have been many issues when databases use the native Windows encoding (which, outside of an experimental feature, is never UTF-8). See #284, #290, and #291 for examples.

While some drivers have settings to use UTF-8 instead, that is driver-specific and not all of them do. It would be much better to either always bind as SQLWCHAR when UNICODE is defined (i.e. on Windows) or to add a runtime option to enable it (enabled by default when UNICODE is defined?)

@fybo

fybo commented Oct 17, 2022

I added this code in odbc_connection.cpp at line 4176, in place of the usual SQL_CHAR processing:

// Napi::String (char)
case SQL_CHAR :
case SQL_VARCHAR :
case SQL_LONGVARCHAR :
default:
// *******************
// Begin addition (DC)
// *******************
static const wchar_t CP1252_UNICODE_TABLE[] =
L"\u0000\u0001\u0002\u0003\u0004\u0005\u0006\u0007"
L"\u0008\u0009\u000A\u000B\u000C\u000D\u000E\u000F"
L"\u0010\u0011\u0012\u0013\u0014\u0015\u0016\u0017"
L"\u0018\u0019\u001A\u001B\u001C\u001D\u001E\u001F"
L"\u0020\u0021\u0022\u0023\u0024\u0025\u0026\u0027"
L"\u0028\u0029\u002A\u002B\u002C\u002D\u002E\u002F"
L"\u0030\u0031\u0032\u0033\u0034\u0035\u0036\u0037"
L"\u0038\u0039\u003A\u003B\u003C\u003D\u003E\u003F"
L"\u0040\u0041\u0042\u0043\u0044\u0045\u0046\u0047"
L"\u0048\u0049\u004A\u004B\u004C\u004D\u004E\u004F"
L"\u0050\u0051\u0052\u0053\u0054\u0055\u0056\u0057"
L"\u0058\u0059\u005A\u005B\u005C\u005D\u005E\u005F"
L"\u0060\u0061\u0062\u0063\u0064\u0065\u0066\u0067"
L"\u0068\u0069\u006A\u006B\u006C\u006D\u006E\u006F"
L"\u0070\u0071\u0072\u0073\u0074\u0075\u0076\u0077"
L"\u0078\u0079\u007A\u007B\u007C\u007D\u007E\u007F"
L"\u20AC\u0020\u201A\u0192\u201E\u2026\u2020\u2021"
L"\u02C6\u2030\u0160\u2039\u0152\u0020\u017D\u0020"
L"\u0020\u2018\u2019\u201C\u201D\u2022\u2013\u2014"
L"\u02DC\u2122\u0161\u203A\u0153\u0020\u017E\u0178"
L"\u00A0\u00A1\u00A2\u00A3\u00A4\u00A5\u00A6\u00A7"
L"\u00A8\u00A9\u00AA\u00AB\u00AC\u00AD\u00AE\u00AF"
L"\u00B0\u00B1\u00B2\u00B3\u00B4\u00B5\u00B6\u00B7"
L"\u00B8\u00B9\u00BA\u00BB\u00BC\u00BD\u00BE\u00BF"
L"\u00C0\u00C1\u00C2\u00C3\u00C4\u00C5\u00C6\u00C7"
L"\u00C8\u00C9\u00CA\u00CB\u00CC\u00CD\u00CE\u00CF"
L"\u00D0\u00D1\u00D2\u00D3\u00D4\u00D5\u00D6\u00D7"
L"\u00D8\u00D9\u00DA\u00DB\u00DC\u00DD\u00DE\u00DF"
L"\u00E0\u00E1\u00E2\u00E3\u00E4\u00E5\u00E6\u00E7"
L"\u00E8\u00E9\u00EA\u00EB\u00EC\u00ED\u00EE\u00EF"
L"\u00F0\u00F1\u00F2\u00F3\u00F4\u00F5\u00F6\u00F7"
L"\u00F8\u00F9\u00FA\u00FB\u00FC\u00FD\u00FE\u00FF";
size_t size = strlen((const char *)storedRow[j].char_data);
std::wstring unicode(size, L' ');
for (size_t i = 0; i < size; ++i) {
  unicode[i] = CP1252_UNICODE_TABLE[(uint8_t)storedRow[j].char_data[i]];
}
// Note: std::codecvt_utf8 is deprecated since C++17, but still works here.
std::wstring_convert<std::codecvt_utf8<wchar_t>> unicode_to_utf8;
std::string utf8encoded = unicode_to_utf8.to_bytes(unicode);
// *******************
// End of addition (DC)
// *******************

value = Napi::String::New(env, utf8encoded);
break;

Now it works fine!

Also, I changed the SQL_LONGVARBINARY processing at line 3387 so that it is handled like SQL_BINARY or SQL_VARBINARY.

It works like a charm!

@stale

stale bot commented Nov 16, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale This issue hasn't seen any interaction in 30 days. label Nov 16, 2022
@geofmigliacci

Hi,
Will this be added in an update? I need it for one of my projects.
Thank you.

@stale stale bot removed the stale This issue hasn't seen any interaction in 30 days. label Nov 17, 2022

@stale stale bot added the stale This issue hasn't seen any interaction in 30 days. label Jan 6, 2023
@stale stale bot closed this as completed Jan 20, 2023
@kadler kadler reopened this Jan 20, 2023
@stale stale bot removed the stale This issue hasn't seen any interaction in 30 days. label Jan 20, 2023
@TheCryptos

Related to the other issues (#284, #290, #291), we encounter the same problems in our project. @kadler, do you have an idea of when this will be implemented?

@SoporteMurgiverde

Hello, do you need to recompile the code after changing the .cpp?
I made the changes but the result is wrong. I think it probably needs to be recompiled, but I'm not sure. And if that is necessary, how do you do it?
Thank you!


@stale stale bot added the stale This issue hasn't seen any interaction in 30 days. label Mar 26, 2023
@markdirish
Contributor

Not stale

@stale stale bot removed the stale This issue hasn't seen any interaction in 30 days. label Mar 27, 2023

@stale stale bot added the stale This issue hasn't seen any interaction in 30 days. label Apr 26, 2023
@stale stale bot closed this as completed May 10, 2023
@markdirish
Contributor

Not stale

@markdirish markdirish reopened this May 23, 2023
@stale stale bot removed stale This issue hasn't seen any interaction in 30 days. labels May 23, 2023

@stale stale bot added the stale This issue hasn't seen any interaction in 30 days. label Jun 22, 2023
@CienappsDAG

We use an HFSQL database, which uses the native Windows encoding. Accented characters are lost. This is a big showstopper for working with Node.js.

@stale stale bot removed the stale This issue hasn't seen any interaction in 30 days. label Jul 3, 2023
@TheCryptos

TheCryptos commented Jul 4, 2023

@CienappsDAG We encountered the same issues in our project; here is how we solved it.

You need to get all the columns of the table and branch on each column's data type, like so:

switch (column.dataType) {
  case -4: // SQL_LONGVARBINARY
  case -1: // SQL_LONGVARCHAR
  case 1:  // SQL_CHAR
  case 12: // SQL_VARCHAR
    // Fixes an issue when converting an empty string:
    // the HFSQL ODBC driver doesn't allow null, and
    // converting to UTF-8 yields null if the string is empty.
    concatQuery.push(
      `CASE WHEN TRIM(${column.name}) = '' THEN Char(32)+Char(32) ELSE CONVERT(${column.name} using 'utf8') END AS ${column.name}`
    );
    break;
  default:
    concatQuery.push(column.name);
    break;
}

Next, you need to decode the converted strings:

const enc = new TextDecoder("utf-8");

/**
 * Remove null bytes from a string
 * @param str - String to remove null bytes from
 * @returns The input string without null bytes
 */
export function removeNullBytesFromString(str: string): string {
  return str
    .split("")
    .filter((c) => c.charCodeAt(0))
    .join("");
}

/**
 * Format the object: remove null bytes and trim strings
 * @param obj - Object to format
 */
export function format(obj: any) {
  Object.keys(obj).forEach((key) => {
    if (obj[key] instanceof ArrayBuffer) {
      obj[key] = enc.decode(obj[key]);
    }
    if (typeof obj[key] === "string") {
      obj[key] = removeNullBytesFromString(obj[key]).trim();
    }
  });
}
Hope this works. If you have any questions, don't hesitate to ask me.

@CienappsDAG

I tried your solution, but I encountered the same issue in another layer.

The issue is that I have some accented characters in table and column names (I know it's very bad, but I have to make do).

HFSQL doesn't provide a table like Information_Schema to query directly.

Therefore, I encounter the same encoding issue when using the odbc.tables() or odbc.columns() functions.

@CienappsDAG

Unfortunately, my solution for now is to revert to version 1 of odbc.

@louia

louia commented Sep 27, 2023

Any news about this issue?

@Relbot

Relbot commented Nov 2, 2023

Would be awesome if this could be added :)

@JonathanCabezas

I made a brute-force fix in PR #366

@kadler
Member Author

kadler commented Nov 19, 2024

Here's what I think the solution would look like:

The main crux of the change is here: https://github.com/IBM/node-odbc/blob/main/src/odbc_connection.cpp#L3297-L3326. After the call to SQLDescribeCol, we'd check whether the flag is enabled, and if so, check the described column type. If the column type is one of the character types, we convert it to the equivalent wide type, e.g.

switch(column->DataType) {
    case SQL_CHAR:
        column->DataType = SQL_WCHAR;
        break;
    // check SQL_VARCHAR, SQL_LONGVARCHAR, etc.
}

This should handle all the cases correctly, since we already handle the wide types below. The only thing is we need access to that flag.

From a user's point of view, they would pass the option in on the connection or pool constructor, e.g.

const connection = await odbc.connect({
    connectionString: 'DSN=MYDSN',
    utf16Binds: true, // not sure on what best name would be here
});

So then it's only a matter of figuring out how to store the flag so that it can be used later. It's likely a case of saving the option on the ODBCConnection object and then passing the option along to prepare_to_fetch and then on to bind_buffers, assuming that all of those callers have access to the ODBCConnection object.
