remove uuid (#253)
* fix: negative hex
improve performance of checkVidFormat

* fix: remove uuid
veezhang authored Dec 15, 2022
1 parent 1e1882a commit 9a04c22
Showing 10 changed files with 82 additions and 63 deletions.
10 changes: 5 additions & 5 deletions README.md
@@ -235,7 +235,7 @@ schema:

 * `index`: **Optional**. The column number in the CSV file. Starts from 0. The default value is 0.
 * `concatItems`: **Optional**. The concat item can be `string`, `int` or mixed. `string` represents a constant, and `int` represents an index column. All items are then concatenated. If set, the above `index` will have no effect.
-* `function`: **Optional**. Functions to generate the VIDs. Currently, we only support function `hash` and `uuid`.
+* `function`: **Optional**. Functions to generate the VIDs. Currently, we only support function `hash`.
 * `type`: **Optional**. The type for VIDs. The default value is `string`.
 * `prefix`: **Optional**. Add prefix to the original vid. When `function` is also specified, `prefix` is applied to the original vid before `function`.

@@ -271,7 +271,7 @@ schema:
   function: hash
 dstVID:
   index: 1
-  function: uuid
+  function: hash
 rank:
   index: 2
 props:
@@ -348,7 +348,7 @@ Take vertex course as example:
 ```csv
 :LABEL,:VID,course.name,building.name:string,:IGNORE,course.credits:int
 +,"hash(""Math"")",Math,No5,1,3
-+,"uuid(""English"")",English,"No11 B\",2,6
++,"hash(""English"")",English,"No11 B\",2,6
 ```

 ##### LABEL (optional)
@@ -367,10 +367,10 @@ Indicates the column is the insertion (+) or deletion (-) operation.
 :VID
 123,
 "hash(""Math"")",
-"uuid(""English"")"
+"hash(""English"")"
 ```

-In the `:VID` column, in addition to the common integer values (such as 123), you can also use the two built-in functions `hash` and `uuid` to automatically generate the VID for the vertices (for example, hash("Math")).
+In the `:VID` column, in addition to the common integer values (such as 123), you can also use the built-in function `hash` to automatically generate the VID for the vertices (for example, hash("Math")).

 > **NOTE**: The double quotes (") are escaped in the CSV file. For example, `hash("Math")` must be written as `"hash(""Math"")"`.
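
The escaping described in the note is ordinary CSV quoting, so a CSV writer should produce it without manual work. The following is a minimal sketch using Go's standard `encoding/csv` package. The helper name `encodeVIDCell` is hypothetical and is not part of nebula-importer.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
	"strings"
)

// encodeVIDCell is a hypothetical helper (not part of nebula-importer).
// It writes a one-field CSV record and returns the encoded cell
// without the trailing newline.
func encodeVIDCell(cell string) (string, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	if err := w.Write([]string{cell}); err != nil {
		return "", err
	}
	w.Flush()
	if err := w.Error(); err != nil {
		return "", err
	}
	return strings.TrimRight(buf.String(), "\n"), nil
}

func main() {
	cell, err := encodeVIDCell(`hash("Math")`)
	if err != nil {
		panic(err)
	}
	// The writer wraps the field in quotes and doubles the inner quotes.
	fmt.Println(cell) // prints "hash(""Math"")"
}
```

Plain integer VIDs such as `123` contain no special characters, so the writer leaves them unquoted.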

10 changes: 5 additions & 5 deletions README_zh-CN.md
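@@ -208,7 +208,7 @@ schema:
 - `index`: **Optional**. The column index in the CSV file, counted from 0. The default value is 0.
 - `concatItems`: **Optional**. The concat items can be `string`, `int`, or mixed. `string` represents a constant and `int` represents an index column. All items are then concatenated. If set, the `index` above has no effect.
-- `function`: **Optional**. The function used to generate the VID. Two functions are available: `hash` and `uuid`.
+- `function`: **Optional**. The function used to generate the VID. The `hash` function is supported.
 - `prefix`: **Optional**. A prefix added to the original vid. When `function` is also specified, the VID is generated by first adding the `prefix` and then applying `function`.
 ##### `schema.vertex.tags`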
@@ -243,7 +243,7 @@ schema:
   function: hash
 dstVID:
   index: 1
-  function: uuid
+  function: hash
 rank:
   index: 2
 props:
@@ -320,7 +320,7 @@ Sample of the course vertex in the example:
 ```csv
 :LABEL,:VID,course.name,building.name:string,:IGNORE,course.credits:int
 +,"hash(""Math"")",Math,No5,1,3
-+,"uuid(""English"")",English,"No11 B\",2,6
++,"hash(""English"")",English,"No11 B\",2,6
 ```
 ##### LABEL (optional)
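@@ -339,10 +339,10 @@ Sample of the course vertex in the example:
 :VID
 123,
 "hash(""Math"")",
-"uuid(""English"")"
+"hash(""English"")"
 ```
-In the `:VID` column, in addition to common integer values (such as 123), you can also use the two built-in functions `hash` and `uuid` to automatically compute the VID of a vertex (for example, hash("Math")).
+In the `:VID` column, in addition to common integer values (such as 123), you can also use the built-in function `hash` to automatically compute the VID of a vertex (for example, hash("Math")).
 > Note that double quotes (") must be escaped in the CSV file. For example, `hash("Math")` must be written as `"hash(""Math"")"`.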
1 change: 1 addition & 0 deletions examples/v1/choose-hex.csv
@@ -3,3 +3,4 @@
 0x201,00102,3
 0X202,02102,3
 0x2af,0,3
+-0X202,-02102,6
2 changes: 1 addition & 1 deletion examples/v1/example.yaml
@@ -297,7 +297,7 @@ files:
   type: int
 dstVID:
   index: 1
-  # function: uuid
+  # function: hash
   type: int
 rank:
   index: 2
3 changes: 0 additions & 3 deletions examples/v2/example.yaml
@@ -314,7 +314,6 @@ files:
 vertex:
   vid:
     index: 1
-    # function: uuid
   tags:
     - name: student
       props:
@@ -344,7 +343,6 @@ files:
   # function: hash
 dstVID:
   index: 1
-  # function: uuid
 rank:
   index: 2
 props:
@@ -383,7 +381,6 @@ files:
   # function: hash
 dstVID:
   index: 1
-  # function: uuid
 rank:
   index: 2
 props:
24 changes: 12 additions & 12 deletions pkg/config/config.go
@@ -7,25 +7,21 @@ import (
 	"net/url"
 	"os"
 	"path/filepath"
-	"regexp"
 	"strings"
 	"time"

 	"github.com/vesoft-inc/nebula-importer/pkg/base"
 	ierrors "github.com/vesoft-inc/nebula-importer/pkg/errors"
 	"github.com/vesoft-inc/nebula-importer/pkg/logger"
 	"github.com/vesoft-inc/nebula-importer/pkg/picker"
+	"github.com/vesoft-inc/nebula-importer/pkg/utils"
 	"gopkg.in/yaml.v2"
 )

 const (
 	dbNULL = "NULL"
 )

-var (
-	reTimestampInteger = regexp.MustCompile(`^(0[xX][0-9a-fA-F]+|0[0-7]+|\d+)$`)
-)
-
 type NebulaClientConnection struct {
 	User     *string `json:"user" yaml:"user"`
 	Password *string `json:"password" yaml:"password"`
@@ -546,10 +542,9 @@ func (v *VID) FormatValue(record base.Record) (string, error) {
 func (v *VID) checkFunction(prefix string) error {
 	if v.Function != nil {
 		switch strings.ToLower(*v.Function) {
-		// FIXME: uuid is not supported in nebula-graph-v2, and hash returns int which is not the valid vid type.
-		case "", "hash", "uuid":
+		case "", "hash":
 		default:
-			return fmt.Errorf("Invalid %s.function: %s, only following values are supported: \"\", hash, uuid", prefix, *v.Function)
+			return fmt.Errorf("Invalid %s.function: %s, only following values are supported: \"\", hash", prefix, *v.Function)
 		}
 	}
 	return nil
@@ -628,11 +623,16 @@ func (r *Rank) validateAndReset(prefix string, defaultVal int) error {
 	return nil
 }

-var re = regexp.MustCompile(`^(0[xX][0-9a-fA-F]+|0[0-7]+|[+-]?\d+|hash\(".+"\)|uuid\(".+"\))$`)
-
 func checkVidFormat(vid string, isInt bool) error {
-	if isInt && !re.MatchString(vid) {
-		return fmt.Errorf("Invalid vid format: %s", vid)
+	if isInt {
+		if utils.IsInteger(vid) {
+			return nil
+		}
+		vidLen := len(vid)
+		if vidLen > 8 /* hash("") */ && strings.HasSuffix(vid, "\")") && strings.HasPrefix(vid, "hash(\"") {
+			return nil
+		}
+		return fmt.Errorf("Invalid vid format: %s", vid)
 	}
 	return nil
 }
9 changes: 9 additions & 0 deletions pkg/config/config_test.go
@@ -1181,3 +1181,12 @@ func TestParseFunction(t *testing.T) {
 		})
 	}
 }
+
+func Benchmark_checkVidFormat(b *testing.B) {
+	for i := 0; i < b.N; i++ {
+		_ = checkVidFormat("-0xfedcba9876543210", true)
+		_ = checkVidFormat("-076543210", true)
+		_ = checkVidFormat("-9876543210", true)
+		_ = checkVidFormat("hash(\"abcdefg\")", true)
+	}
+}
4 changes: 3 additions & 1 deletion pkg/picker/converter-type.go
@@ -3,6 +3,8 @@ package picker
 import (
 	"fmt"
 	"strings"
+
+	"github.com/vesoft-inc/nebula-importer/pkg/utils"
 )

 var (
@@ -113,7 +115,7 @@ func (tc TypeStringConverter) Convert(v *Value) (*Value, error) {
 }

 func (tc TypeTimestampConverter) Convert(v *Value) (*Value, error) {
-	if isUnsignedInteger(v.Val) {
+	if utils.IsUnsignedInteger(v.Val) {
 		return tc.fc.Convert(v)
 	}
 	return tc.fsc.Convert(v)
36 changes: 0 additions & 36 deletions pkg/picker/utils.go

This file was deleted.

46 changes: 46 additions & 0 deletions pkg/utils/string.go
@@ -0,0 +1,46 @@
+package utils
+
+func IsInteger(s string) bool {
+	if s == "" {
+		return false
+	}
+	if s[0] == '+' || s[0] == '-' {
+		s = s[1:]
+	}
+	return IsUnsignedInteger(s)
+}
+
+func IsUnsignedInteger(s string) bool {
+	switch len(s) {
+	case 0:
+		return false
+	case 1:
+		return IsDigit(s[0])
+	case 2:
+		return IsDigit(s[0]) && IsDigit(s[1])
+	}
+	return isUnsignedIntegerSlow(s)
+}
+
+func isUnsignedIntegerSlow(s string) bool {
+	f := IsDigit
+	if len(s) > 2 && s[0] == '0' && (s[1] == 'x' || s[1] == 'X') {
+		s = s[2:]
+		f = IsHexDigit
+	}
+
+	for _, b := range []byte(s) {
+		if !f(b) {
+			return false
+		}
+	}
+	return true
+}
+
+func IsDigit(b byte) bool {
+	return '0' <= b && b <= '9'
+}
+
+func IsHexDigit(b byte) bool {
+	return IsDigit(b) || ('a' <= b && b <= 'f') || ('A' <= b && b <= 'F')
+}
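
To see which inputs the new helpers accept, the sketch below copies the functions from `pkg/utils/string.go` as added in this commit into a standalone program. It also adds `isHashCall`, a hypothetical name for the inline `hash("...")` check that `checkVidFormat` performs. This is an illustration under those assumptions, not code from the repository.

```go
package main

import (
	"fmt"
	"strings"
)

// The next five functions are copied from pkg/utils/string.go in this commit.

func IsInteger(s string) bool {
	if s == "" {
		return false
	}
	if s[0] == '+' || s[0] == '-' {
		s = s[1:] // an optional sign is allowed, including before a hex literal
	}
	return IsUnsignedInteger(s)
}

func IsUnsignedInteger(s string) bool {
	switch len(s) {
	case 0:
		return false
	case 1:
		return IsDigit(s[0])
	case 2:
		return IsDigit(s[0]) && IsDigit(s[1])
	}
	return isUnsignedIntegerSlow(s)
}

func isUnsignedIntegerSlow(s string) bool {
	f := IsDigit
	if len(s) > 2 && s[0] == '0' && (s[1] == 'x' || s[1] == 'X') {
		s = s[2:]
		f = IsHexDigit
	}
	for _, b := range []byte(s) {
		if !f(b) {
			return false
		}
	}
	return true
}

func IsDigit(b byte) bool { return '0' <= b && b <= '9' }

func IsHexDigit(b byte) bool {
	return IsDigit(b) || ('a' <= b && b <= 'f') || ('A' <= b && b <= 'F')
}

// isHashCall is a hypothetical helper mirroring the inline check in
// checkVidFormat: the string must look like hash("...") with a
// non-empty argument (8 is the length of `hash("")`).
func isHashCall(vid string) bool {
	return len(vid) > 8 && strings.HasSuffix(vid, "\")") && strings.HasPrefix(vid, "hash(\"")
}

func main() {
	for _, s := range []string{"-0X202", "-02102", "0x", "+", "12a", "089"} {
		fmt.Printf("IsInteger(%q) = %v\n", s, IsInteger(s))
	}
	for _, s := range []string{`hash("a")`, `hash("")`, `uuid("a")`} {
		fmt.Printf("isHashCall(%q) = %v\n", s, isHashCall(s))
	}
}
```

The check is purely lexical: `089` passes even though it is not a valid octal literal, and no range check is made on the value.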
