Provide IMPORT DATA statement #476

sherman-the-tank · 2019-05-30T07:48:34Z · edited by jude-zhu

We want to support bulk loading from the console. The statement could look like this:

IMPORT DATA FROM // Will be executed in console

IMPORT DATA FROM SERVER // Will be executed on the query engine
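To make the two proposed forms concrete, here is a minimal sketch of how a client could decide where a statement runs. It assumes a hypothetical single path argument after FROM (the proposal leaves the argument unspecified), and route_import is an illustrative name, not Nebula's parser.

```python
import re

# Hypothetical grammar: IMPORT DATA FROM [SERVER] <path>
# The <path> argument is an assumption; the proposal does not specify it.
_IMPORT_RE = re.compile(
    r"^\s*IMPORT\s+DATA\s+FROM\s+(?:(SERVER)\s+)?([^\s;]+)\s*;?\s*$",
    re.IGNORECASE,
)


def route_import(statement: str) -> tuple:
    """Return (where_to_execute, path) for an IMPORT DATA statement."""
    m = _IMPORT_RE.match(statement)
    if m is None:
        raise ValueError(f"not an IMPORT DATA statement: {statement!r}")
    server_kw, path = m.group(1), m.group(2)
    if server_kw is None and path.upper() == "SERVER":
        # "IMPORT DATA FROM SERVER" with nothing after it
        raise ValueError("missing path after SERVER")
    # Without SERVER the console reads the file itself;
    # with SERVER the statement is forwarded to the query engine.
    return ("query_engine" if server_kw else "console", path)
```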

spacewalkman · 2019-06-21T10:04:04Z

If this one is not taken, I want to give it a shot. The picture above shows what I have in mind.

spacewalkman · 2019-06-21T10:34:58Z

> IMPORT DATA FROM // Will be executed in console
> IMPORT DATA FROM SERVER // Will be executed on the query engine

Some clarification is needed here, @sherman-the-tank:

We are supposed to import files only in CSV format when running in the console, right?
When running in SERVER mode, are we going to support BOTH the CSV and SST file formats?

dangleptr · 2019-06-21T10:35:02Z

@darionyaphet has submitted a patch for the download procedure.
It differs from your idea in a few points:

The Meta server will control the whole process instead of the GraphServer.
All communication goes through HTTP.

dangleptr · 2019-06-21T10:37:11Z

@spacewalkman, you could work on the ingest procedure.

dangleptr · 2019-06-21T10:46:22Z

Since the patch has not been merged yet, we can still discuss which communication method is better: RPC or HTTP?

spacewalkman · 2019-06-21T10:48:48Z

> The Meta server will control the whole process instead of the GraphServer.

Why should the MetaServer be in charge? IMO, the MetaServer is just a metadata provider and should not be involved in procedures like data manipulation. It's the QueryServer that receives the IMPORT request in the first place. Letting the GraphServer do it will keep the MetaServer simple and tidy.

> All communication goes through HTTP.

If it's all about communication, why not let Thrift do it? Introducing an extra HTTP layer would add security risks and communication overhead.

WDYT? @dangleptr @darionyaphet

dangleptr · 2019-06-21T11:01:55Z

> Why should the MetaServer be in charge?

That's a good question. Beyond download/ingest, we should take some upcoming features into account too, for example compaction, balance, snapshot, etc.
IMO, all admin operations on storage servers should be handled by the Meta server, because the Meta server knows which storage servers are still alive and has all the information about each of them.
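A minimal sketch of the liveness argument, assuming the Meta server learns about storage servers from periodic heartbeats. HostRegistry, the 10-second window, and the host names are hypothetical; the point is only that an admin operation fans out to the hosts the Meta server currently considers alive.

```python
from dataclasses import dataclass, field


@dataclass
class HostRegistry:
    """Hypothetical Meta-side view of storage hosts, fed by heartbeats."""
    expire_secs: float = 10.0  # assumed liveness window, not a Nebula default
    _last_seen: dict = field(default_factory=dict)

    def heartbeat(self, host: str, now: float) -> None:
        # Each storage server periodically reports in; remember when.
        self._last_seen[host] = now

    def alive_hosts(self, now: float) -> list:
        # A host counts as alive if it reported within the window.
        return sorted(h for h, t in self._last_seen.items()
                      if now - t <= self.expire_secs)

    def dispatch(self, op: str, now: float) -> dict:
        """Fan an admin op (e.g. "compaction") out to alive hosts only."""
        return {h: op for h in self.alive_hosts(now)}
```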

> Why not let Thrift do it?

The only advantage of HTTP is that it can be accessed from different clients, for example a web console.

darionyaphet · 2019-06-21T11:18:33Z

The MetaServer holds the view of the whole cluster.

We will make sure the number of files matches Nebula's partitions, and decide how to assign the SST files for ingestion.
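One way to read "the number of files matches the partitions": a sketch that checks there is exactly one SST file per partition and builds a per-host ingest plan from the partition-to-host map the Meta server holds. The "<part_id>.sst" naming, assign_sst_files, and sending each file to every replica host are assumptions; the thread does not settle whether ingestion goes to all replicas or only via the leader.

```python
def assign_sst_files(files: list, part_hosts: dict) -> dict:
    """Map each SST file to every host that holds a replica of its partition.

    Assumes (hypothetically) one file per partition, named '<part_id>.sst'.
    part_hosts maps partition id -> list of host addresses.
    """
    by_part = {}
    for f in files:
        name = f.rsplit("/", 1)[-1]
        if not name.endswith(".sst"):
            raise ValueError(f"unexpected file: {f}")
        by_part[int(name[:-4])] = f
    # The set of files must line up exactly with the space's partitions.
    if set(by_part) != set(part_hosts):
        raise ValueError("SST files do not match the space's partitions")
    plan = {}
    for part_id in sorted(part_hosts):
        for host in part_hosts[part_id]:
            plan.setdefault(host, []).append(by_part[part_id])
    return plan
```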

spacewalkman · 2019-06-21T12:01:29Z

@darionyaphet @dangleptr
You both point out that the MetaServer holds the information needed to do the IMPORT, but that doesn't prevent it from passing that information to someone else (like the QueryEngine) to do the actual importing job.

dangleptr · 2019-06-24T04:34:41Z

> @darionyaphet @dangleptr
> You both point out that the MetaServer holds the information needed to do the IMPORT, but that doesn't prevent it from passing that information to someone else (like the QueryEngine) to do the actual importing job.

It's not only about the information. Some admin operations involve a long procedure, and we may want to record the state step by step so that we can fail over. The QueryEngine has no state and no leader, so if it crashes, we cannot fail over.
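A sketch of the "record the state step by step" idea, assuming the Meta service persists job progress in its replicated KV store (modeled here as a plain dict). ImportJob and the "job/<id>/done" key are hypothetical. Because the checkpoint survives a crash, a newly elected Meta leader can rerun the job and skip the steps already done, which a stateless QueryEngine could not do.

```python
from typing import Callable, Dict

STEPS = ("download", "ingest")  # the two phases discussed in this thread


class ImportJob:
    """Hypothetical Meta-driven job that checkpoints after every step."""

    def __init__(self, job_id: int, store: Dict[str, str],
                 actions: Dict[str, Callable[[], None]]):
        self.job_id = job_id
        self.store = store      # stand-in for the Meta service's KV store
        self.actions = actions  # step name -> function doing the work

    def _key(self) -> str:
        return f"job/{self.job_id}/done"

    def completed(self) -> list:
        raw = self.store.get(self._key(), "")
        return raw.split(",") if raw else []

    def run(self) -> None:
        done = self.completed()
        for step in STEPS:
            if step in done:
                continue  # finished before a crash/failover; skip it
            self.actions[step]()  # may raise; checkpoint stays at last step
            done.append(step)
            self.store[self._key()] = ",".join(done)  # persist progress
```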

spacewalkman · 2019-06-24T06:03:55Z

> > @darionyaphet @dangleptr
> > You both point out that the MetaServer holds the information needed to do the IMPORT, but that doesn't prevent it from passing that information to someone else (like the QueryEngine) to do the actual importing job.
>
> It's not only about the information. Some admin operations involve a long procedure, and we may want to record the state step by step so that we can fail over. The QueryEngine has no state and no leader, so if it crashes, we cannot fail over.

Fair enough.

spacewalkman · 2019-06-24T06:23:45Z

Updated the concept picture to reflect that the MetaServer is in charge.

dangleptr · 2019-06-24T09:29:04Z

I have two questions:

1. After typing the command "IMPORT DATA xxx" in the console, is the console blocked?
2. Are Download and Ingest two commands or one command for users?

spacewalkman · 2019-06-24T10:42:18Z

> 1. After typing the command "IMPORT DATA xxx" in the console, is the console blocked?

IMO, it's a long-running task, so we should not block the console but return a handle for periodically polling the task status. BUT this has the following cons:

- Normally, we would like to notify the user of the progress (percentage), but that progress output may be interleaved with the user's other interactions. For example, the user may switch to another graph space with USE and run other queries.
- We need a way to abort the whole procedure, even after the user has closed the console that issued the IMPORT command (in blocking mode, we would just press Ctrl-C to abort).

> 2. Are Download and Ingest two commands or one command for users?

Download & Ingest are just two conceptual PHASES of the single IMPORT command. But making them two commands does no harm, does it? WDYT?

dangleptr · 2019-06-25T02:20:12Z

> Download & Ingest are just two conceptual PHASES of the single IMPORT command. But making them two commands does no harm, does it? WDYT?

For now, we'd better use two commands to control the whole procedure.

> We need a way to abort the whole procedure, even after the user has closed the console that issued the IMPORT command (in blocking mode, we would just press Ctrl-C to abort).

Yes, we need this feature.

sherman-the-tank · 2019-06-30T03:05:22Z

Awesome discussion thread 👍 Way to go, guys!!

Here are some of my thoughts:

- The IMPORT DATA ... statement has two modes:
- In LOCAL mode, the statement specifies a local CSV file path; the console will read the file and generate bulk INSERT statements, which will be sent to the Graph Engine for execution.
- In SERVER mode, the execution process is very similar to @spacewalkman's picture above. The IMPORT DATA statement will be sent to the Graph Engine, and the Graph Engine will contact the Meta Service to kick off an asynchronous task (the task will orchestrate the SST file download process; other possible tasks include index repair and so on). The task ID will be returned to the console. The statement is non-blocking (we should never block in a distributed environment).
- Users should be able to query the task list from the Meta Service using the statement SHOW TASKS...
- Users should be able to check the status of a specific task using the statement SHOW TASK STATUS <taskid>

The last two points also apply to the index repair and other tasks
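A sketch of the LOCAL mode conversion, assuming each CSV row is vid, prop1, prop2, ... and the nGQL 1.x form INSERT VERTEX tag(props) VALUES vid:(values), ... (check the syntax against your Nebula version). csv_to_inserts and its batch_size parameter are hypothetical, and only int and string properties are handled.

```python
import csv
import io
from typing import Iterator, List, Tuple


def _literal(value: str, kind: str) -> str:
    # Render one CSV cell as a literal; only int/string in this sketch.
    if kind == "int":
        return str(int(value))
    if kind == "string":
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    raise ValueError(f"unsupported type: {kind}")


def csv_to_inserts(text: str, tag: str, props: List[Tuple[str, str]],
                   batch_size: int = 100) -> Iterator[str]:
    """Yield bulk INSERT VERTEX statements with up to batch_size rows each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    names = ", ".join(name for name, _ in props)
    batch = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        vid, cells = int(row[0]), row[1:]
        if len(cells) != len(props):
            raise ValueError(f"expected {len(props)} properties, got {len(cells)}")
        values = ", ".join(_literal(c, k) for c, (_, k) in zip(cells, props))
        batch.append(f"{vid}:({values})")
        if len(batch) == batch_size:
            yield f"INSERT VERTEX {tag}({names}) VALUES {', '.join(batch)};"
            batch = []
    if batch:  # flush the last partial batch
        yield f"INSERT VERTEX {tag}({names}) VALUES {', '.join(batch)};"
```

The console would then send each yielded statement to the Graph Engine; batching keeps each request bounded instead of issuing one INSERT per row.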

sherman-the-tank · 2019-06-30T03:07:54Z

Regarding tasks: as soon as a task is created on the Meta Service, it is global and not associated with any space.
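A sketch of a global task registry backing SHOW TASKS and SHOW TASK STATUS <taskid>, assuming tasks are keyed only by a cluster-wide ID with no space attached, so every console sees the same list. TaskRegistry, the Status values, and the task kinds are hypothetical.

```python
import itertools
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    QUEUED = "QUEUED"
    RUNNING = "RUNNING"
    FINISHED = "FINISHED"
    FAILED = "FAILED"


@dataclass
class Task:
    task_id: int
    kind: str  # e.g. "download", "ingest", "index repair"
    status: Status = Status.QUEUED


class TaskRegistry:
    """Hypothetical Meta-side registry; no graph space is stored per task."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._tasks = {}

    def create(self, kind: str) -> int:
        task = Task(next(self._ids), kind)
        self._tasks[task.task_id] = task
        return task.task_id  # handed back to the console right away

    def set_status(self, task_id: int, status: Status) -> None:
        self._tasks[task_id].status = status

    def show_tasks(self) -> list:
        # Backs SHOW TASKS: one row per task, whatever space the user is in.
        return [(t.task_id, t.kind, t.status.value)
                for t in sorted(self._tasks.values(), key=lambda t: t.task_id)]

    def show_task_status(self, task_id: int) -> str:
        # Backs SHOW TASK STATUS <taskid>.
        task = self._tasks.get(task_id)
        if task is None:
            raise KeyError(f"no such task: {task_id}")
        return task.status.value
```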


sherman-the-tank added this to the v1_beta_release milestone May 30, 2019

dangleptr assigned spacewalkman and darionyaphet Jun 21, 2019

dangleptr added the feature label Jun 21, 2019

dangleptr mentioned this issue Jun 21, 2019

Implement download procedure #519

Merged

dangleptr closed this as completed Jun 24, 2019

dangleptr reopened this Jun 24, 2019

jude-zhu added the R201910_beta label Aug 2, 2019

dangleptr closed this as completed Aug 21, 2019

yixinglu pushed a commit to yixinglu/nebula that referenced this issue Mar 21, 2022

fix crash when the expression exceed the depth (vesoft-inc#476)

1f901a3

Co-authored-by: endy.li <[email protected]>

