Hive sync: Let's discuss it. #49

Closed
simc opened this issue Sep 12, 2019 · 23 comments
Labels
enhancement New feature or request

Comments

@simc
Member

simc commented Sep 12, 2019

In the future I want to support syncing Hive with a remote database.

It would be helpful if you could share your needs & ideas.

One of the most obvious use cases is backing up data (for example, settings or messages). I think Firebase should be one of the first supported remotes.

@simc simc added the enhancement New feature or request label Sep 12, 2019
@simc simc pinned this issue Sep 12, 2019
@ThinkDigitalSoftware

I like the idea. If it has functions that need to be set up on initialization like FCM does, it would be easy for a custom solution to be added. But it will still require a lot of input from the user, because Hive has to be updated when the app reopens and the remote DB has changed, etc.

@simc
Member Author

simc commented Sep 13, 2019

> But it will still require a lot of input from the user, because Hive has to be updated when the app reopens and the remote DB has changed, etc.

Yes, that's true. An implementation that just creates a backup of Hive would be much easier.

The goal is to support full sync (including support for remote changes).

@ThinkDigitalSoftware

ThinkDigitalSoftware commented Sep 13, 2019 via email

@simc
Member Author

simc commented Sep 13, 2019

Yes, I'll start experimenting once queries and asset DBs are ready...

If anyone has time and wants to contribute, I'll be there to help.

@leedstyh

Are S3, OneDrive, Google Drive, and Dropbox part of the plan?

@simc
Member Author

simc commented Sep 13, 2019

I hope to make it very easy to write a sync solution for every service. This will take time, though.

@chemickypes

You could provide an interface so that every user can implement whatever they want.

@simc
Member Author

simc commented Sep 17, 2019

It would be great to know which information users actually need. Do you have something in mind?

@chemickypes

chemickypes commented Sep 17, 2019

I'm thinking as I write, so forgive some reasoning holes.

My idea is to have a simple interface with 2 or 4 functions like these:

// Read and write a single element.
bool writeToServer(String key, dynamic value, String boxName, Type runtimeType);
T readFromServer<T>(String key, String boxName, Type runtimeType);

// Read and write all values at once.
// (Renamed, because Dart does not support method overloading.)
bool writeAllToServer(Map<String, dynamic> values, String boxName, Type runtimeType);
Map<String, dynamic> readAllFromServer(String boxName);

The boxName parameter (and the others) can help the user decide which service to call.

I suppose that the first two functions can be used with a lazy box.

The user then has to implement this interface and inject the implementation into Hive, so Hive can call this object to sync remotely.

This is just a raw idea.

PS. Sorry for my pseudocode
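
As a rough illustration of the idea above, here is a minimal sketch of such an interface with an in-memory implementation standing in for a real server. Everything here is an assumption for illustration: `RemoteSync` and `InMemoryRemote` are hypothetical names and not part of Hive, the bulk functions are renamed because Dart has no method overloading, the `runtimeType` parameter is left out for brevity, and a real remote would most likely return Futures instead of plain values.

```dart
/// Hypothetical contract modeled on the four proposed functions.
/// Not part of Hive; a real remote would likely return Futures.
abstract class RemoteSync {
  bool writeToServer(String key, dynamic value, String boxName);
  T? readFromServer<T>(String key, String boxName);
  bool writeAllToServer(Map<String, dynamic> values, String boxName);
  Map<String, dynamic> readAllFromServer(String boxName);
}

/// In-memory stand-in for a server, e.g. for tests.
class InMemoryRemote implements RemoteSync {
  // One map of key/value pairs per box name.
  final Map<String, Map<String, dynamic>> _boxes = {};

  Map<String, dynamic> _box(String boxName) =>
      _boxes.putIfAbsent(boxName, () => <String, dynamic>{});

  @override
  bool writeToServer(String key, dynamic value, String boxName) {
    _box(boxName)[key] = value;
    return true; // An in-memory write cannot fail.
  }

  @override
  T? readFromServer<T>(String key, String boxName) =>
      _box(boxName)[key] as T?;

  @override
  bool writeAllToServer(Map<String, dynamic> values, String boxName) {
    _box(boxName).addAll(values);
    return true;
  }

  // Returns a copy so callers cannot mutate the "server" state.
  @override
  Map<String, dynamic> readAllFromServer(String boxName) =>
      Map<String, dynamic>.of(_box(boxName));
}
```

An implementation for Firebase, S3, or a custom back-end would implement the same four methods, and the code that calls them would not need to know which one it is talking to.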

@simc
Member Author

simc commented Sep 17, 2019

Looks good! Thanks.

@leedstyh

leedstyh commented Sep 18, 2019

@chemickypes The problem with this approach is that we have to encrypt and decrypt the data on both the client and the server.

The best way is to reuse the binary format of Hive, as I posted in this issue.

Also, if we sync to S3, OneDrive, Google Drive, or Dropbox, we have to do the encryption and decryption on the client.

@simc
Member Author

simc commented Sep 18, 2019

@leedstyh
Since the data is stored unencrypted in memory, it will not be necessary to decrypt it before syncing.

I'm not sure about using the binary format. It would be necessary to run Hive on the server too since the binary format can only be used by Hive.

@leedstyh

Nope, the server does not need to run Hive. The server will not process the data, just store it. Think about syncing to Google Drive.
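
To make the "just store it" idea concrete, here is a small sketch of a remote that only ever sees opaque bytes. `BlobStore` and `InMemoryBlobStore` are hypothetical names invented for this example, not Hive or cloud-provider APIs; a real implementation would upload the bytes to a service such as Google Drive.

```dart
import 'dart:typed_data';

/// Hypothetical storage contract: the remote never interprets the bytes,
/// so it needs neither Hive nor the encryption key.
abstract class BlobStore {
  void put(String name, Uint8List bytes);
  Uint8List? get(String name);
}

/// In-memory stand-in for a file hosting service.
class InMemoryBlobStore implements BlobStore {
  final Map<String, Uint8List> _blobs = {};

  @override
  void put(String name, Uint8List bytes) {
    // Copy, so later changes on the client do not alter the stored blob.
    _blobs[name] = Uint8List.fromList(bytes);
  }

  @override
  Uint8List? get(String name) => _blobs[name];
}
```

On the client, the bytes could come from reading the (possibly encrypted) box file from disk after closing the box, for example with `File(path).readAsBytesSync()` from `dart:io`.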

@chemickypes

@leedstyh I think that we can split this problem into two smaller problems:

  • Use the remote server as an extension of the local Hive
  • Use Hive as a delegate whose job is to get the data from the server when needed; the user cannot tell where the data comes from because there is only one access point.

I don't know what @leisim wants to do with Hive.

@joeblew99

If the goal is offline editing, then we are probably in CRDT and vector clock territory.

Typically the domain model needs to think in terms of mutations or ops. This works when offline editing is not allowed, because the server time is the global time.
When you want to support offline editing, you need a way to merge changes.
CRDTs, ops, and vector clocks belong to this area. The changes now happen in different time domains.
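
For readers unfamiliar with vector clocks, here is a minimal sketch of the idea: each device keeps one counter per known device, and comparing two clocks tells you whether one change happened before the other or whether they are concurrent and need merging. The class and enum names are made up for this example and are not taken from Hive or zefyr.

```dart
enum ClockOrder { before, after, equal, concurrent }

class VectorClock {
  /// One logical counter per node (device) id.
  final Map<String, int> counters;

  VectorClock([Map<String, int>? initial])
      : counters = <String, int>{...?initial};

  /// Records a local event on [nodeId].
  void tick(String nodeId) {
    counters[nodeId] = (counters[nodeId] ?? 0) + 1;
  }

  /// Absorbs knowledge from [other] by taking the per-node maximum.
  void merge(VectorClock other) {
    other.counters.forEach((node, count) {
      final mine = counters[node] ?? 0;
      counters[node] = count > mine ? count : mine;
    });
  }

  /// Compares this clock with [other]; missing counters count as 0.
  ClockOrder compare(VectorClock other) {
    var less = false;
    var greater = false;
    final nodes = <String>{...counters.keys, ...other.counters.keys};
    for (final node in nodes) {
      final a = counters[node] ?? 0;
      final b = other.counters[node] ?? 0;
      if (a < b) less = true;
      if (a > b) greater = true;
    }
    if (less && greater) return ClockOrder.concurrent;
    if (less) return ClockOrder.before;
    if (greater) return ClockOrder.after;
    return ClockOrder.equal;
  }
}
```

When `compare` returns `ClockOrder.concurrent`, two devices edited the data without seeing each other's changes, which is exactly the case where a merge strategy is needed.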

Here is something to get the ball rolling, maybe.
It is a Flutter example that has basic support for offline editing.
https://github.com/memspace/zefyr
It uses operations and logs them.
But it does not have vector clock support.
The data model uses the Quill approach.
https://github.com/memspace/zefyr/blob/master/packages/notus/lib/src/heuristics.dart#L6

https://github.com/pulyaevskiy/quill-delta-dart

  • This is where the real OT (Operational Transform) guts are.

Rather than using vector clocks, you can sometimes use context within the data.
I think this is what zefyr uses, but I am not sure.

@joeblew99

I happened to stumble on this CRDT implementation.

docs: https://cluster.ipfs.io/documentation/guides/consensus/
At the bottom, it's nice to see that they properly make the distinction between CRDT and Raft.

It means that you can have data of the same types on many devices, and merge them independently.
You don't need to write OTs (Operational Transforms), which is very painful and limiting.
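
To show what "merge independently" can look like for a key-value store, here is a sketch of a last-writer-wins map, one of the simplest state-based CRDTs. This is an illustration only: it is not the design used by go-ds-crdt, which is more elaborate, and the names `LwwEntry` and `LwwMap` are invented for this example.

```dart
class LwwEntry {
  final dynamic value;
  final int timestamp; // Time supplied by the writer (logical or wall clock).
  final String nodeId; // Breaks ties between equal timestamps.
  const LwwEntry(this.value, this.timestamp, this.nodeId);

  /// True if this entry should replace [other].
  bool wins(LwwEntry other) =>
      timestamp > other.timestamp ||
      (timestamp == other.timestamp && nodeId.compareTo(other.nodeId) > 0);
}

class LwwMap {
  final Map<String, LwwEntry> entries = {};

  void put(String key, dynamic value, int timestamp, String nodeId) {
    _apply(key, LwwEntry(value, timestamp, nodeId));
  }

  dynamic get(String key) => entries[key]?.value;

  /// Merging is commutative, associative, and idempotent, so replicas
  /// converge no matter in which order they exchange state.
  void merge(LwwMap other) => other.entries.forEach(_apply);

  void _apply(String key, LwwEntry incoming) {
    final current = entries[key];
    if (current == null || incoming.wins(current)) {
      entries[key] = incoming;
    }
  }
}
```

Two devices can each call `put` while offline and later call `merge` in either direction; both end up with the same value per key. The trade-off is that the losing write is silently discarded.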

This is the core library.
https://github.com/ipfs/go-ds-crdt
That lib is used for IPFS Cluster to allow it to synchronise data.
https://github.com/ipfs/ipfs-cluster/blob/master/consensus/crdt/consensus.go

I really hope this is picked up by Hive.

I think it's an excellent basis for Hive Sync.

@simc
Member Author

simc commented Sep 24, 2019

Thanks for your valuable input. I'll definitely take a look at these projects and try to implement something similar with hive.

@simc simc added this to the Stretch Goals milestone Oct 11, 2019
@simc simc added the hive label Oct 11, 2019
@simc simc unpinned this issue Oct 19, 2019
@zenkog

zenkog commented Dec 3, 2019

Any plans to sync with Firestore? That would be awesome

@ghost

ghost commented Dec 26, 2019

+1000 to this! #49 (comment)

@simc simc removed this from the Stretch Goals milestone Feb 7, 2020
@simc simc removed the hive label Feb 12, 2020
@Manuelbaun

Manuelbaun commented Jul 3, 2020

Are there any updates on this? For a uni project, I was looking into a synchronization layer design using CRDTs. Here is my repo in Dart: https://github.com/Manuelbaun/sync_layer_crdt_playground.

I actually use some form of delta-CRDTs, sending only the mutation instead of the operation or the full state. It works nicely, but it adds a lot of overhead. For time tracking, I use hybrid logical clocks. In my use cases (just prototyping) my design worked fine, but it still needs a lot of work. For instance, I am not deleting anything, and garbage collection will be needed at some point.

And of course, there are a lot of design issues 😆 and I haven't made it into a library just yet.
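
For context, a hybrid logical clock combines a wall-clock reading with a logical counter, so that timestamps stay ordered even when device clocks disagree. The sketch below follows the commonly described HLC update rules; it is not taken from the linked repository, the `Hlc` class is a made-up name, and the current time is passed in as a parameter to keep the example deterministic.

```dart
import 'dart:math';

class Hlc implements Comparable<Hlc> {
  final int wallMillis; // Highest wall-clock time seen so far.
  final int counter; // Orders events that share the same wallMillis.
  const Hlc(this.wallMillis, this.counter);

  /// Timestamp for a local or send event at physical time [nowMillis].
  Hlc send(int nowMillis) {
    if (nowMillis > wallMillis) return Hlc(nowMillis, 0);
    return Hlc(wallMillis, counter + 1);
  }

  /// Timestamp after receiving [remote] at physical time [nowMillis].
  Hlc receive(Hlc remote, int nowMillis) {
    final wall = max(wallMillis, max(remote.wallMillis, nowMillis));
    int next;
    if (wall == wallMillis && wall == remote.wallMillis) {
      next = max(counter, remote.counter) + 1;
    } else if (wall == wallMillis) {
      next = counter + 1;
    } else if (wall == remote.wallMillis) {
      next = remote.counter + 1;
    } else {
      next = 0; // The physical clock is ahead of both logical clocks.
    }
    return Hlc(wall, next);
  }

  @override
  int compareTo(Hlc other) => wallMillis != other.wallMillis
      ? wallMillis.compareTo(other.wallMillis)
      : counter.compareTo(other.counter);
}
```

In an app, `nowMillis` would typically come from `DateTime.now().millisecondsSinceEpoch`, and the resulting timestamps could be attached to each mutation before it is sent to other devices.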

@themisir themisir closed this as completed Aug 9, 2020
@RastislavMirek

Is this still planned?

@CodingArcher

Any further news on how the offline to online sync is coming along?
Has anyone been able to work something out?

@themisir
Contributor

> Any further news on how the offline to online sync is coming along? Has anyone been able to work something out?

There are no plans to implement this feature anytime soon. Why?

  1. It's best to keep things simple. It's nice to have something that does lots of things for you. But as system complexity increases, the system becomes harder to maintain and the software itself becomes "bloated" over time. Also, having a Swiss army knife in a big project usually causes issues, because the bloated software usually has its own workflow that doesn't play well with the existing software architecture. In general, I currently prefer to keep the current implementation stable rather than add new features.

  2. An online sync implementation depends on how the server side is implemented, and that usually differs from project to project. One project may need static bearer key authentication while another requires an authentication token based on the logged-in user; one might consume XML while another can only consume JSON. We could provide a library for back-end schema / rules, but it might not play well with an existing back-end.

  3. You can implement your own sync system based on whatever is currently available. You can either serialize data into JSON and send it to your server, or send the contents of the database file. (I would prefer the first option.) How you implement it really comes down to your own design decisions.
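
As a rough sketch of the first option in point 3, the helpers below turn a snapshot of a box into a JSON payload and back. The function names and payload shape are assumptions made for this example; `entries` stands in for the map you would get from an open box (for example via `box.toMap()`), and all values are assumed to be JSON-encodable already.

```dart
import 'dart:convert';

/// Encodes a box snapshot as a JSON string.
/// Keys are converted to strings because JSON object keys must be strings,
/// so integer keys lose their type on the way.
String encodeBox(String boxName, Map<dynamic, dynamic> entries) {
  return jsonEncode({
    'box': boxName,
    'entries': entries.map((key, value) => MapEntry(key.toString(), value)),
  });
}

/// Decodes the entries of a payload produced by [encodeBox].
Map<String, dynamic> decodeEntries(String payload) {
  final decoded = jsonDecode(payload) as Map<String, dynamic>;
  return Map<String, dynamic>.from(decoded['entries'] as Map);
}
```

The resulting string can be sent with whatever HTTP client and authentication scheme your back-end requires. Custom objects stored through type adapters would need their own conversion to JSON first.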

I'm open to feedback and suggestions on that.
