This repository has been archived by the owner on Oct 22, 2021. It is now read-only.

Add gdrive for go-storage design #14

Merged
merged 5 commits into from
Jul 24, 2021
41 changes: 41 additions & 0 deletions docs/rfcs/0-example.md
@@ -0,0 +1,41 @@
- Author: (fill me in with `name <mail>`, e.g., Xuanwo <[email protected]>)
- Start Date: (fill me in with today's date, YYYY-MM-DD)
- RFC PR: [beyondstorage/go-service-gdrive#0](https://github.com/beyondstorage/go-service-gdrive/issues/0)
- Tracking Issue: [beyondstorage/go-service-gdrive#0](https://github.com/beyondstorage/go-service-gdrive/issues/0)

# RFC-0: <proposal name>

- Updates: (delete this part if not applicable)
- [RFC-20](./20-abc): Deletes something
- Updated By: (delete this part if not applicable)
- [RFC-10](./10-do-be-do-be-do): Adds something
- [RFC-1000](./1000-lalala): Deprecates this RFC

## Background

Explain why we are doing this.

Related issues and early discussions can be linked, but the RFC should try to be self-contained if possible.

## Proposal

<proposal's content>

## Rationale

<proposal's rationale content, other implementations>

Possible content:

- Design Principles
- Drawbacks
- Alternative implementations and comparison
- Possible Q&As

## Compatibility

<proposal's compatibility statement>

## Implementation

Explain what steps should be done to implement this proposal.
47 changes: 47 additions & 0 deletions docs/rfcs/14-gdrive-for-go-storage-design.md
@@ -0,0 +1,47 @@
- Author: Jun [email protected]
- Start Date: 2021-07-18
- RFC PR: [beyondstorage/go-service-gdrive#14](https://github.com/beyondstorage/go-service-gdrive/issues/14)
- Tracking Issue: [beyondstorage/go-service-gdrive#15](https://github.com/beyondstorage/go-service-gdrive/issues/15)

# RFC-14: Gdrive for go-storage design

## Background

The Google Drive API has many notions that differ from those of `go-storage`, and we briefly discussed them in [Gdrive use FileId to manipulate data instead of file name #11](https://github.com/beyondstorage/go-service-gdrive/issues/11). I would now like to start an RFC so that we can make things clearer.

In the Google Drive API, `FileID` is a critical attribute of a file (or directory). The API uses it, rather than a path, to manipulate data. In fact, paths carry little weight in gdrive: we can create several files with the same name in the same location. In other words, paths can be duplicated in gdrive, and this behavior can cause problems for our path-based API.

## Proposal

**We manually stipulate that every path is unique.**

When users call `Write` on an existing file, we update its content instead of creating another file with the same name.

**We will do a conversion between path and `FileID`.**

In this way, every path can be converted to `FileID`, so we are able to build a good bridge between `go-storage` API and gdrive API.

**We will cache `path -> id` in memory with TTL.**

For performance reasons, we will cache the ids of the files as they are created, and we will only look up their ids when the cache expires.

## Implementation

When users try to call `Write("foo/bar/test.txt")`, we will do this:

First, we look up the `FileID` of `foo` in the cache, and search for its `FileID` if the entry is missing or has expired. Then we do the same for `bar` and `test.txt`. Note that when we cannot find the `FileID` of a directory, we do not continue searching its subdirectories. In this case, we consider the file nonexistent.

After that, there are two possibilities:

When `foo/bar/test.txt` doesn't exist, we create the missing folders one by one and then the file itself. At the same time, we cache their `FileID`s.

When `foo/bar/test.txt` already exists, we update its content instead of creating another file.
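
The two cases can be illustrated with a toy in-memory stand-in for gdrive. This is only a sketch: `memDrive`, `create`, and `write` are hypothetical names, no real Drive calls are made, and the cache is left out to keep the flow visible.

```go
package main

import (
	"fmt"
	"strings"
)

// memDrive is a toy in-memory stand-in for gdrive (an assumption for illustration).
type memDrive struct {
	nextID   int
	children map[string]string // "parentID/name" -> FileID
	content  map[string]string // FileID -> content
}

func newMemDrive() *memDrive {
	return &memDrive{children: map[string]string{}, content: map[string]string{}}
}

// create registers a new child under parentID and returns its fresh FileID.
func (d *memDrive) create(parentID, name string) string {
	d.nextID++
	id := fmt.Sprintf("id-%d", d.nextID)
	d.children[parentID+"/"+name] = id
	return id
}

// write creates missing folders one by one, then creates the file or
// updates the existing one, so a path never maps to two files.
func (d *memDrive) write(path, data string) string {
	parts := strings.Split(path, "/")
	parent := "root"
	for _, dir := range parts[:len(parts)-1] {
		id, ok := d.children[parent+"/"+dir]
		if !ok {
			id = d.create(parent, dir)
		}
		parent = id
	}
	name := parts[len(parts)-1]
	id, ok := d.children[parent+"/"+name]
	if !ok {
		id = d.create(parent, name)
	}
	d.content[id] = data // create and update both end by storing the content
	return id
}

func main() {
	d := newMemDrive()
	first := d.write("foo/bar/test.txt", "v1")
	second := d.write("foo/bar/test.txt", "v2")
	fmt.Println(first == second, d.content[second])
}
```

Writing the same path twice yields the same ID with the newer content, which is the uniqueness rule the proposal stipulates.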

The key function, `pathToId`, can be implemented like this:

If the file is in the root folder, we do a simple search with `drive.service.Files.List().Q(searchArgs).Do()`. In the v3 Go client this call returns a `*drive.FileList`; each entry in its `Files` field is a `*drive.File`, and that entry's `Id` field is what we need.
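
The RFC does not spell out `searchArgs`. One plausible shape, assuming the Drive v3 query syntax (`name = '...'`, `'...' in parents`, `trashed = false`) and the `root` alias for the root folder, is sketched below. `buildSearchArgs` and `escapeQueryValue` are hypothetical helpers, and the backslash escaping is an assumption beyond the single-quote escaping the Drive documentation describes.

```go
package main

import (
	"fmt"
	"strings"
)

// escapeQueryValue escapes backslashes and single quotes so that a name
// can sit inside a single-quoted Drive query literal.
func escapeQueryValue(s string) string {
	s = strings.ReplaceAll(s, `\`, `\\`)
	return strings.ReplaceAll(s, `'`, `\'`)
}

// buildSearchArgs is a hypothetical helper that builds a query matching
// a non-trashed child called name directly under parentID.
func buildSearchArgs(parentID, name string) string {
	return fmt.Sprintf("name = '%s' and '%s' in parents and trashed = false",
		escapeQueryValue(name), escapeQueryValue(parentID))
}

func main() {
	fmt.Println(buildSearchArgs("root", "test.txt"))
}
```

Restricting the query to one parent is what lets the same helper serve both the root-folder case and each step of a nested path.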

But if the file path is nested, like `foo/bar/demo.txt`, the lookup is a little more complex.
Contributor:

If the path is complex like a/b/c/d/e/f/h/q.txt, it looks like we need to repeat the search many times.

How about caching `path -> id` in memory with a TTL?

Contributor:

ping @junaire for a look.

@junaire (Member, Author), Jul 23, 2021:

OK, I'll work on it. However, I don't have much experience with this, so I would like to ask: should we use a third-party library like go-cache, or simply use a map from the standard library?

Contributor:

To simplify the implementation, using a map with a Mutex is OK for now.

We can implement the TTL logic later with https://github.com/dgraph-io/ristretto or other libs.


First, we get the `FileID` of the directory `foo` as described above, then we use this `FileID` to list all of its contents. This way, we can find the directory named `bar` and its `FileID`. Finally, we repeat the same step inside `bar` and get the `FileID` of `demo.txt`.