Explain data structure change in 3.0 (#1119)
* Explain data structure change in 3.0

* Update 4.storage-service.md

* Update 4.storage-service.md
randomJoe211 authored Feb 24, 2022
1 parent 134624f commit 5ff6e1e
Showing 1 changed file with 15 additions and 16 deletions.
@@ -70,37 +70,37 @@ Therefore, Nebula Graph develops its own KVStore with RocksDB as the local stora

- One Nebula Graph KVStore cluster supports multiple graph spaces, and each graph space has its own partition number and replica copies. Different graph spaces are isolated physically from each other in the same cluster.

<!--
## Data storage formats
## Data storage structure

Nebula Graph stores vertices and edges. Efficient property filtering is critical for a Graph Database. So, Nebula Graph uses keys to store vertices and edges, while uses values to store the related properties.
Graphs consist of vertices and edges. Nebula Graph uses key-value pairs to store vertices, edges, and their properties. Vertices and edges are stored in keys and their properties are stored in values. Such structure enables efficient property filtering.

Nebula Graph {{ nebula.base20 }} has changed a lot over its releases. The following will introduce the old and new data storage formats and cover their differences.
- The storage structure of vertices

- Vertex format
Different from Nebula Graph version 2.x, version 3.x added a new key for each vertex. Compared to the old key that still exists, the new key has no `TagID` field and no value. Vertices in Nebula Graph can now live without tags owing to the new key.

![The vertex format of storage service](https://docs-cdn.nebula-graph.com.cn/docs-2.0/1.introduction/2.nebula-graph-architecture/storage-vertex-format.png)
![The vertex structure of Nebula Graph](https://github.com/vesoft-inc/nebula-docs-cn/blob/{{nebula.branch}}/docs-2.0/1.introduction/3.nebula-graph-architecture/3.0-vertex-key.png?raw=true)

|Field|Description|
|:---|:---|
|`Type`|One byte, used to indicate the key type.|
|`PartID`|Three bytes, used to indicate the sharding partition and to scan the partition data based on the prefix when re-balancing the partition.|
|`VertexID`|Used to indicate vertex ID. For an integer VertexID, it occupies eight bytes. However, for a string VertexID, it is changed to `fixed_string` of a fixed length which needs to be specified by users when they create the space.|
|`VertexID`|The vertex ID. For an integer VertexID, it occupies eight bytes. However, for a string VertexID, it is changed to `fixed_string` of a fixed length which needs to be specified by users when they create the space.|
|`TagID`|Four bytes, used to indicate the tag that the vertex relates to.|
|`SerializedValue`|The serialized value of the key. It stores the property information of the vertex.|

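The vertex key layout in the table above can be sketched in a few lines of Python. This is only an illustration of the field widths listed in the table, not the storage service's actual encoder: the key-type byte values, the little-endian byte order, the NUL padding for string VIDs, and the function names `encode_vid`, `vertex_tag_key`, and `vertex_key` are all assumptions made for this example.

```python
import struct

# Hypothetical key-type bytes; the real values are not given in this document.
KEY_TYPE_TAG = 0x01     # assumed: key that carries a TagID and a serialized value
KEY_TYPE_VERTEX = 0x07  # assumed: 3.x key with no TagID and no value


def encode_vid(vid, fixed_len=None):
    """Encode a VertexID: 8 bytes for an integer, fixed length for a string."""
    if isinstance(vid, int):
        return struct.pack("<q", vid)  # byte order is an assumption
    raw = vid.encode("utf-8")
    if fixed_len is None or len(raw) > fixed_len:
        raise ValueError("string VID needs a fixed length that can hold it")
    return raw.ljust(fixed_len, b"\x00")  # NUL padding is an assumption


def key_prefix(key_type, part_id):
    """Type (1 byte) followed by PartID (3 bytes)."""
    return bytes([key_type]) + part_id.to_bytes(3, "little")


def vertex_tag_key(part_id, vid, tag_id, fixed_len=None):
    """Type + PartID + VertexID + TagID; the value would hold the properties."""
    return (key_prefix(KEY_TYPE_TAG, part_id)
            + encode_vid(vid, fixed_len)
            + struct.pack("<i", tag_id))


def vertex_key(part_id, vid, fixed_len=None):
    """Type + PartID + VertexID only, matching the tagless key added in 3.x."""
    return key_prefix(KEY_TYPE_VERTEX, part_id) + encode_vid(vid, fixed_len)
```

With an integer VID, `vertex_tag_key(1, 100, 2)` is 1 + 3 + 8 + 4 = 16 bytes and `vertex_key(1, 100)` is 1 + 3 + 8 = 12 bytes. Because `PartID` sits directly after the type byte, all keys of one kind that belong to the same partition share a 4-byte prefix, which is what makes a prefix scan per partition possible.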
- Edge Format
- The storage structure of edges

![The edge format of storage service](https://docs-cdn.nebula-graph.com.cn/docs-2.0/1.introduction/2.nebula-graph-architecture/storage-edge-format.png)
![The edge structure of Nebula Graph](https://github.com/vesoft-inc/nebula-docs-cn/blob/{{nebula.branch}}/docs-2.0/1.introduction/3.nebula-graph-architecture/3.0-edge-key.png?raw=true)

|Field|Description|
|:---|:---|
|`Type`|One byte, used to indicate the key type.|
|`PartID`|Three bytes, used to indicate the sharding partition. This field can be used to scan the partition data based on the prefix when re-balancing the partition.|
|`VertexID`|Used to indicate vertex ID. The former VID refers to source VID in out-edge and dest VID in in-edge, while the latter VID refers to dest VID in out-edge and source VID in in-edge.|
|`Edge Type`|Four bytes, used to indicate edge type. Greater than zero means out-edge, less than zero means in-edge.|
|`PartID`|Three bytes, used to indicate the partition ID. This field can be used to scan the partition data based on the prefix when re-balancing the partition.|
|`VertexID`|Used to indicate vertex ID. The former VID refers to the source VID in the outgoing edge and the dest VID in the incoming edge, while the latter VID refers to the dest VID in the outgoing edge and the source VID in the incoming edge.|
|`Edge Type`|Four bytes, used to indicate the edge type. Greater than zero indicates out-edge, less than zero means in-edge.|
|`Rank`|Eight bytes, used to indicate multiple edges in one edge type. Users can set the field based on needs and store weight, such as transaction time and transaction number.|
|`PlaceHolder`|One byte. Reserved.|
-->
|`SerializedValue`|The serialized value of the key. It stores the property information of the edge.|

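As a rough sketch of the edge key fields above, the following Python builds the outgoing and incoming keys for one logical edge with integer VIDs. The field order (`Type`, `PartID`, first `VertexID`, `Edge Type`, `Rank`, second `VertexID`, `PlaceHolder`) follows the order in which the table lists them and is an assumption here, as are the key-type byte value, the little-endian byte order, and the names `edge_key` and `edge_key_pair`.

```python
import struct

EDGE_KEY_TYPE = 0x02  # hypothetical key-type byte, not taken from this document


def edge_key(part_id, first_vid, edge_type, rank, second_vid):
    """Pack one edge key using the field widths listed in the table."""
    return (
        bytes([EDGE_KEY_TYPE])             # Type: 1 byte
        + part_id.to_bytes(3, "little")    # PartID: 3 bytes
        + struct.pack("<q", first_vid)     # former VertexID: 8 bytes (integer VID)
        + struct.pack("<i", edge_type)     # Edge Type: 4 bytes, sign gives direction
        + struct.pack("<q", rank)          # Rank: 8 bytes
        + struct.pack("<q", second_vid)    # latter VertexID: 8 bytes (integer VID)
        + b"\x00"                          # PlaceHolder: 1 byte, reserved
    )


def edge_key_pair(src_part, dst_part, src_vid, dst_vid, edge_type, rank=0):
    """Return (out_key, in_key) for one logical edge.

    The outgoing key uses a positive edge type and puts the source VID first.
    The incoming key negates the edge type and puts the destination VID first.
    """
    if edge_type <= 0:
        raise ValueError("pass the positive edge type; the sign is set here")
    out_key = edge_key(src_part, src_vid, edge_type, rank, dst_vid)
    in_key = edge_key(dst_part, dst_vid, -edge_type, rank, src_vid)
    return out_key, in_key
```

With 8-byte integer VIDs each key in this sketch is 1 + 3 + 8 + 4 + 8 + 8 + 1 = 33 bytes. Giving the incoming key the destination vertex's partition is also an assumption of the sketch; it reflects the table's statement that the former VID is the dest VID for an incoming edge.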
### Property descriptions

@@ -114,12 +114,11 @@ Since in an ultra-large-scale relational network, vertices can be as many as ten

![data partitioning](https://www-cdn.nebula-graph.com.cn/nebula-blog/DataModel02.png)

<!--
### Edge and storage amplification
### Edge partitioning and storage amplification

In Nebula Graph, an edge corresponds to two key-value pairs on the hard disk. When there are lots of edges and each has many properties, storage amplification will be obvious. The storage format of edges is shown in the figure below.

![edge storage](https://docs-cdn.nebula-graph.com.cn/docs-2.0/1.introduction/2.nebula-graph-architecture/two-edge-format.png)
![partitioning by edge](https://github.com/vesoft-inc/nebula-docs-cn/blob/{{nebula.branch}}/docs-2.0/1.introduction/3.nebula-graph-architecture/edge-division.png?raw=true)

In this example, SrcVertex connects DstVertex via EdgeA, forming the path of `(SrcVertex)-[EdgeA]->(DstVertex)`. SrcVertex, DstVertex, and EdgeA will all be stored in Partition x and Partition y as four key-value pairs in the storage layer. Details are as follows:
