Minotaur

Distribution framework for timeseries data access and distributed tasks

Layers

|----------|--------|-----|
| Recorder | Cursor |     |
|----------|--------|     |
|   Column stream   |     |
|-------------------| IDb |
|       Codec       |     |
|-------------------|     |
|     IO stream     |     |
|-------------------|-----|

Meta

  • One file per symbol.
  • A column's meta contains the Name, the Type, and the timeline. (TotalSize or statistics?)
  • Timeline indexer: the key [Symbol + Column] defines the timeline and the associated file for each entry.
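As a sketch of the structures above (all names here are hypothetical, not the actual Minotaur types), the per-column meta and the timeline indexer could look like:

```python
from dataclasses import dataclass, field

@dataclass
class ColumnMeta:
    """Hypothetical shape of the per-column meta described above:
    the name, the type, and the timeline (one entry per data file)."""
    name: str
    type: str
    timeline: list = field(default_factory=list)  # [(start, end, file), ...]

# Timeline indexer: (symbol, column) -> timeline entries and their files.
index = {
    ("EURUSD", "bid"): ColumnMeta("bid", "double",
                                  [(0, 99, "bid_0.bin"),
                                   (100, 199, "bid_1.bin")]),
}
```

Each timeline entry maps a time range to the file holding it, which is what lets a cursor resolve a requested time slice to a list of files.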

Read/Write concurrency access management

Concurrent access is handled mainly by the meta manager, which locks files symbol by symbol. Collisions between reads, writes, and commits are handled by the FileTimeSeriesDb class.
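The symbol-by-symbol locking can be sketched as one lock per symbol (an illustrative sketch, not the actual Minotaur API): access to different symbols never blocks, while accesses to the same symbol serialize.

```python
import threading
from collections import defaultdict

class MetaManager:
    """Illustrative per-symbol lock registry: the meta manager hands out
    one lock per symbol, created on first use."""

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = defaultdict(threading.RLock)

    def lock(self, symbol):
        # Creating the per-symbol lock is itself guarded.
        with self._guard:
            return self._locks[symbol]

manager = MetaManager()
with manager.lock("EURUSD"):
    pass  # read or write EURUSD files here
```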

Write

Insert new data

To insert new data for a symbol, the Db creates an ITimeSeriesRecorder for you. The recorder always works in the temporary folder until you choose to commit or revert.

Revert

The revert simply removes the files generated by the recorder in the temporary folder.

Commit

If there is no merge to do with the existing data, the file is simply moved from the temporary folder to the data folder. If the new timeline overlaps the existing one, a merge is performed.
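The commit rule above can be sketched as follows (a hypothetical sketch: `overlaps`, `commit`, and the range bookkeeping are placeholders, not Minotaur code). If the new timeline intersects no existing file, the temporary file is just promoted; otherwise the overlapping files are merged.

```python
import os
import shutil

def overlaps(a, b):
    """True when two [start, end] timelines intersect."""
    return a[0] <= b[1] and b[0] <= a[1]

def commit(tmp_file, data_dir, new_range, existing, merge):
    # existing maps a data-folder file -> its [start, end] timeline.
    colliding = [f for f, r in existing.items() if overlaps(new_range, r)]
    if not colliding:
        # No merge needed: just move the file out of the temporary folder.
        shutil.move(tmp_file, os.path.join(data_dir, os.path.basename(tmp_file)))
    else:
        # New timeline overlaps existing data: merge all affected files.
        merge([tmp_file] + colliding, data_dir)
```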

Merge

The merge consumes the existing files and pushes the data, sorted by timestamp, into new files. During this process all generated files have the same size (500 MB by default). Each source file and each new file is write-locked independently, only while it is in use. If some readers are still consuming a mergeable file, the merge waits with a timeout (todo: define the timeout). Once the timeout is reached, the file is merged even if readers have not unlocked it. Writes are prioritized over reads, because we never know whether an unknown process has been paused indefinitely in the middle of a read; it is preferable to crash that client and unblock all the others.
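The core of the merge is a k-way merge by timestamp with fixed-size output files. A minimal sketch, assuming each input file yields (timestamp, value) rows already sorted by timestamp; the row limit stands in for the fixed 500 MB file size:

```python
import heapq

def merge_sorted_files(readers, max_rows_per_file):
    """K-way merge sketch: stream all inputs globally sorted by timestamp
    and cut the output into equally sized chunks (the 'new files')."""
    out, current = [], []
    for row in heapq.merge(*readers):          # rows come out timestamp-sorted
        current.append(row)
        if len(current) == max_rows_per_file:  # size threshold reached:
            out.append(current)                # close this file, start a new one
            current = []
    if current:
        out.append(current)
    return out
```

`heapq.merge` is lazy, so only one row per input file needs to be in memory at a time, which matches merging many large persisted files.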

Read

The db reads files through a cursor. The cursor can span many files, depending on the requested time slice; the files are accessed lazily, and each access locks(R) the file. On each new file access the db checks whether the meta has changed, which means a write has been processed on the current symbol. If a change is detected, the meta is reloaded, the remaining time slice is requested, and the lazy loop continues over the new request.
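The lazy loop with the meta re-check can be sketched as a generator (the `db.meta_version`, `db.files_for`, and `db.read` interface is hypothetical, and integer timestamps are assumed for re-requesting the remaining slice):

```python
def cursor(db, symbol, start, end):
    """Lazy cursor sketch: open files one by one, and after each file
    re-check the meta version; if a writer committed in the meantime,
    re-plan the remaining time slice against the new file layout."""
    version = db.meta_version(symbol)
    files = db.files_for(symbol, start, end)
    while files:
        f = files.pop(0)
        for ts, value in db.read(f):            # lock(R) would be held here
            if start <= ts <= end:
                yield ts, value
                start = ts                       # remember progress in the slice
        if db.meta_version(symbol) != version:   # a write was committed:
            version = db.meta_version(symbol)    # reload meta...
            files = db.files_for(symbol, start + 1, end)  # ...and re-request
```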

Todo List

  • Move the meta data into a dedicated service and manage a synchronization point there to support multiple threads and multiple processes.
  • Complete the FileTimeSeriesDb implementation by implementing the Insert and Delete methods based on recorders and cursors.
  • Implement and use a multi-version codec. Maybe provide an entry point to let the user define their own.
  • Run a scan regularly to merge the timeseries persisted files and optimize the btree and the number of files.
  • Propose a remote implementation.
  • Propose a distributed implementation.
