Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MariaDB] Error 1366: Incorrect string value #1567

Closed
solracsf opened this issue Mar 13, 2022 · 21 comments
Closed

[MariaDB] Error 1366: Incorrect string value #1567

solracsf opened this issue Mar 13, 2022 · 21 comments
Assignees
Labels
kind/bug Something isn't working
Milestone

Comments

@solracsf
Copy link
Contributor

solracsf commented Mar 13, 2022

What happened:

While rsync from local disk to juicefs mount, it suddenly (after several hours) stopped with

Failed to sync with 11 errors: last error was: open /mnt/juicefs/folder/file.xls: input/output error

On the jfs log, i can find these:

juicefs[187516] <ERROR>: error: Error 1366: Incorrect string value: '\xE9sa sa...' for column `jfsdata`.`jfs_edge`.`name` at row 1
goroutine 43510381 [running]:
runtime/debug.Stack()
        /usr/local/go/src/runtime/debug/stack.go:24 +0x65
github.com/juicedata/juicefs/pkg/meta.errno({0x2df8860, 0xc0251d34a0})
        /go/src/github.com/juicedata/juicefs/pkg/meta/utils.go:76 +0xc5
github.com/juicedata/juicefs/pkg/meta.(*dbMeta).doMknod(0xc0000e0c40, {0x7fca7e165300, 0xc00ed28040}, 0x399f6f, {0xc025268160, 0x1f}, 0x1, 0x1b4, 0x0, 0x0, ...)
        /go/src/github.com/juicedata/juicefs/pkg/meta/sql.go:1043 +0x29e
github.com/juicedata/juicefs/pkg/meta.(*baseMeta).Mknod(0xc0000e0c40, {0x7fca7e165300, 0xc00ed28040}, 0x399f6f, {0xc025268160, 0x1f}, 0xc0, 0x7b66, 0x7fca, 0x0, ...)
        /go/src/github.com/juicedata/juicefs/pkg/meta/base.go:594 +0x275
github.com/juicedata/juicefs/pkg/meta.(*baseMeta).Create(0xc0000e0c40, {0x7fca7e165300, 0xc00ed28040}, 0x26b5620, {0xc025268160, 0x2847500}, 0x8040, 0xed2, 0x8241, 0xc025267828, ...)
        /go/src/github.com/juicedata/juicefs/pkg/meta/base.go:601 +0x109
github.com/juicedata/juicefs/pkg/vfs.(*VFS).Create(0xc000140640, {0x2e90348, 0xc00ed28040}, 0x399f6f, {0xc025268160, 0x1f}, 0x81b4, 0x22a4, 0xc0)
        /go/src/github.com/juicedata/juicefs/pkg/vfs/vfs.go:357 +0x256
github.com/juicedata/juicefs/pkg/fuse.(*fileSystem).Create(0xc000153900, 0xc024980101, 0xc022a48a98, {0xc025268160, 0x1f}, 0xc022a48a08)
        /go/src/github.com/juicedata/juicefs/pkg/fuse/fuse.go:221 +0xcd
github.com/hanwen/go-fuse/v2/fuse.doCreate(0xc022a48900, 0xc022a48900)
        /go/pkg/mod/github.com/juicedata/go-fuse/[email protected]/fuse/opcode.go:163 +0x68
github.com/hanwen/go-fuse/v2/fuse.(*Server).handleRequest(0xc00179c000, 0xc022a48900)
        /go/pkg/mod/github.com/juicedata/go-fuse/[email protected]/fuse/server.go:483 +0x1f3
github.com/hanwen/go-fuse/v2/fuse.(*Server).loop(0xc00179c000, 0x20)
        /go/pkg/mod/github.com/juicedata/go-fuse/[email protected]/fuse/server.go:456 +0x110
created by github.com/hanwen/go-fuse/v2/fuse.(*Server).readRequest
        /go/pkg/mod/github.com/juicedata/go-fuse/[email protected]/fuse/server.go:323 +0x534
 [utils.go:76]

Nothing was logged at MariaDB side.

Environment:

  • juicefs version 1.0.0-beta2+2022-03-04T03:00:41Z.9e26080
  • Ubuntu 20.04
  • MariaDB 10.4
@solracsf solracsf added the kind/bug Something isn't working label Mar 13, 2022
@zhijian-pro
Copy link
Contributor

This filename may contain special characters \xE9.
You can rename the file and try cp again .

@solracsf
Copy link
Contributor Author

I'll check it. so there are incompatibilities between a standard FS (zfs in this case) and JuiceFS ? Is this documented ?

@solracsf
Copy link
Contributor Author

solracsf commented Mar 14, 2022

@zhijian-pro filename is école-200311.xlsx
Please note that MySQL DB/tables collation is utf8mb4_unicode_ci

@solracsf solracsf changed the title Error 1366: Incorrect string value with MariaDB [MySQL] Error 1366: Incorrect string value Mar 14, 2022
@zhijian-pro
Copy link
Contributor

We think this is a compatibility issue with mariadb, the same logic can run normally on mysql.
In addition, the encoding of jfs_edge after run juicefs format should be utf8mb4 COLLATE utf8mb4_bin , it is not clear why yours is different. This encoding set can write special characters normally on mysql, but not on mariadb, so we think this is a mariadb compatibility problem.

@solracsf
Copy link
Contributor Author

solracsf commented Mar 15, 2022

I can confirm, but server is set to that collation.

MariaDB [jfs_data]> SELECT TABLE_SCHEMA, TABLE_NAME, TABLE_COLLATION
    -> FROM INFORMATION_SCHEMA.TABLES
    -> WHERE TABLE_NAME = 'jfs_edge';
+--------------+------------+--------------------+
| TABLE_SCHEMA | TABLE_NAME | TABLE_COLLATION    |
+--------------+------------+--------------------+
| jfs_data     | jfs_edge   | utf8mb4_unicode_ci |
+--------------+------------+--------------------+
1 row in set (0.014 sec)

Global configuration

[mysql]
default-character-set = utf8mb4

[mysqld]
collation-server = utf8mb4_unicode_ci
init-connect = 'SET NAMES utf8mb4'
character-set-server = utf8mb4

But this can be normally be override at table creation, or after, like here below:

MariaDB [jfs_data]> ALTER TABLE jfs_edge CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
Query OK, 0 rows affected (0.023 sec)

MariaDB [jfs_data]> SELECT TABLE_SCHEMA, TABLE_NAME, TABLE_COLLATION
    -> FROM INFORMATION_SCHEMA.TABLES
    -> WHERE TABLE_NAME = 'jfs_edge';
+--------------+------------+-----------------+
| TABLE_SCHEMA | TABLE_NAME | TABLE_COLLATION |
+--------------+------------+-----------------+
| jfs_data     | jfs_edge   | utf8mb4_bin     |
+--------------+------------+-----------------+
1 row in set (0.002 sec)
Records: 0  Duplicates: 0  Warnings: 0

@zhijian-pro
Copy link
Contributor

We did something similar after creating the table, you can see the corresponding logic here. When I tested mariadb, the encoding was utf8mb4_bin as we expected

@solracsf
Copy link
Contributor Author

solracsf commented Mar 15, 2022

Yeah but even after manually changing the collation, the same error is produced, and column jfsdata.jfs_edge.name is now utf8mb4_bin...

image

@markus456 can we get your input here why MariaDB (10.4) is not behaving correctly here?

@solracsf solracsf changed the title [MySQL] Error 1366: Incorrect string value [MariaDB] Error 1366: Incorrect string value Mar 15, 2022
@zhijian-pro
Copy link
Contributor

We think this is a compatibility issue with mariadb, you can insert unicode characters into the jfs_edge table using your familiar programming language. Our test result is that the same is utf8mb4_bin encoding, mysql can be inserted successfully, but mariadb cannot be inserted successfully. So we think it's a compatibility issue with mariadb.

Below is the test using go language.
Before testing, please prepare mariadb and mysql databases, and execute the juicefs format command respectively. Then run the test program and you can see the output is

mysql test insert unicode char success
mariadb test insert unicode char error: Error 1366: Incorrect string value: '\xE9sa' for column `dev`.`jfs_edge`.`name` at row 1

This is the test program

package main

import (
	"database/sql"
	"fmt"
	"math/rand"
	"time"

	_ "github.com/go-sql-driver/mysql"
)

func main() {

	mariadb := "user:password@(ip:port)/database"
	mysql := "user:password@(ip:port)/database"

	_, err := testInsertUnicodeChar(mysql)
	if err != nil {
		fmt.Println("mysql test insert unicode char error:", err)
	} else {
		fmt.Println("mysql test insert unicode char success")
	}

	_, err = testInsertUnicodeChar(mariadb)
	if err != nil {
		fmt.Println("mariadb test insert unicode char error:", err)
	} else {
		fmt.Println("mariadb test insert unicode char success")
	}
}

func testInsertUnicodeChar(jdbcStr string) (int64, error) {
	db, err := sql.Open("mysql", jdbcStr)
	if err != nil {
		return 0, err
	}
	// unicode char
	s := "\xE9sa"

	// random seed
	rand.Seed(time.Now().UnixNano())

	res, err := db.Exec("INSERT INTO jfs_edge VALUES (?,?,?,?)", rand.Intn(1000), s, 1, 1)
	if err != nil {
		return 0, err
	}
	id, err := res.LastInsertId()
	if err != nil {
		return 0, err
	}
	return id, nil
}

@markus456
Copy link

@markus456 can we get your input here why MariaDB (10.4) is not behaving correctly here?

@solracsf I've got no idea. Wrong person by any chance?

@solracsf
Copy link
Contributor Author

solracsf commented Mar 15, 2022

Ok I found the root cause. MariaDB sets sql_mode to:

STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION,

Once set to:

ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION (so, removing STRICT_TRANS_TABLES)

After that, executing your program, you get

mysql test insert unicode char success
mariadb test insert unicode char success

But i don't know how to solve this JuiceFS side...
See https://mariadb.com/kb/en/sql-mode/

@solracsf
Copy link
Contributor Author

solracsf commented Mar 15, 2022

Test also passes by replacing, in your program, line

mariadb := "user:password@(ip:port)/database"

by

mariadb := "user:password@(ip:port)/database?sql_mode='ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION'"

@solracsf
Copy link
Contributor Author

Also, take this simple test:

CREATE DATABASE test;
CREATE TABLE test (test varchar(255));

with the default sql_mode:

mysql -e "USE test; INSERT INTO test VALUES ('🅰');"
ERROR 1366 (22007) at line 1: Incorrect string value: '\xF0\x9F\x85\xB0' for column `test`.`test`.`col1` at row 1

but this works as expected:

mysql -e "SET NAMES utf8mb4; USE test; INSERT INTO test VALUES ('🅰');"

@grooverdan
Copy link

grooverdan commented Mar 15, 2022

I found this bit of the code that should alter it.

If you create the table with:

MariaDB [test]> CREATE TABLE test (test varchar(255)) character set=utf8mb4;
Query OK, 0 rows affected (0.002 sec)

It works also. Note character set is the range of what it can store. Collation is its sort ordering. Focus on character set.

Per @zhijian-pro 's second comment, what is your show create table jfs_edge? You can manually alter it per the alter table from the code.

Do filesystems actually enforce a correct encoding of utf8 filenames? If not maybe varbinary(255) would be a better database choice.

@zhijian-pro
Copy link
Contributor

zhijian-pro commented Mar 16, 2022

@solracsf It can indeed be solved by changing the sql_mode.
You can add the sql_mode parameter after the metaurl, and then mount.
example:

juicefs mount "user:password@(ip:port)/database?sql_mode='ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION'"  /test1

@zhijian-pro
Copy link
Contributor

@grooverdan the varbinary(255) is a good solution, we will upgrade it on later version.

@grooverdan
Copy link

Keep in mind the varbinary is 1 byte per char and the utf8mb4 was obviously 4 bytes per char. Note that the 255 length, for all its common usage, is as far as modern MySQL variants goes a rather arbitrary length (the maximum is 65,532) and the only small overhead is the number of bytes represent the chosen.

@solracsf
Copy link
Contributor Author

solracsf commented Mar 16, 2022

@grooverdan

Per @zhijian-pro 's second comment, what is your show create table jfs_edge?

Here it is:

MariaDB [jfs_data]> show create table jfs_edge;
+----------+---------------------------------------------------------------------------+
| Table    | Create Table                                                                                                                                                                                                                                                                           |
+----------+---------------------------------------------------------------------------+
| jfs_edge | CREATE TABLE `jfs_edge` (
  `parent` bigint(20) NOT NULL,
  `name` varchar(255) COLLATE utf8mb4_bin NOT NULL,
  `inode` bigint(20) NOT NULL,
  `type` int(11) NOT NULL,
  UNIQUE KEY `UQE_jfs_edge_edge` (`parent`,`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin |
+----------+---------------------------------------------------------------------------+
1 row in set (0.001 sec)

@mariadb-RoelVandePaar
Copy link

Ok I found the root cause.

Removing STRICT_TRANS_TABLES (https://mariadb.com/kb/en/sql-mode/#strict_trans_tables) simply turns off strict mode. It is not the root cause. MySQL only sets it as default starting with 5.7. It would be interesting to run the test against MySQL 5.6 and MySQL 5.7. MariaDB sets it as default already on 10.2 and possibly earlier.

@mariadb-RoelVandePaar
Copy link

I second comments by @grooverdan to be mindful of the length of the field chosen and possible implications (off-by-one, buffer overflows, variable char lengths, special char handling, translations of file names between different file systems, etc.). When it comes to filenames, this is an area test engineers (and likely hackers, regrettably) love for a reason. You will want to test this well.

@solracsf
Copy link
Contributor Author

Closed by #1762

@davies
Copy link
Contributor

davies commented Apr 13, 2022

@solracsf @grooverdan @mariadb-RoelVandePaar We have changed it to use varbinary(255), thanks for the feedback! For existing JuiceFS volumes, please use the latest JuiceFS from main branch and change the column manually.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Something isn't working
Projects
None yet
Development

No branches or pull requests

6 participants