Distribution framework for timeseries data access and distributed tasks
```
|----------|--------|-----|
| Recorder | Cursor |     |
|----------|--------|     |
|   Column stream   |     |
|-------------------| IDb |
|       Codec       |     |
|-------------------|     |
|     IO stream     |     |
|-------------------|-----|
```
- One file per symbol.
- A column's meta contains its Name, Type, and timeline. (TotalSize or statistics?)
- Timeline indexer: the key [Symbol + Column] defines the timeline, and each entry maps to its associated file.
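To make the indexer bullet concrete, here is a minimal sketch of a timeline index keyed by (symbol, column), where each entry ties a time slice to the file that stores it. All names (`TimelineEntry`, `TimelineIndex`, `files_for`) are illustrative assumptions, not the actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class TimelineEntry:
    start: int       # inclusive timestamp of the slice
    end: int         # exclusive timestamp of the slice
    file_path: str   # data file holding this slice

@dataclass
class TimelineIndex:
    # (symbol, column) -> list of TimelineEntry
    entries: dict = field(default_factory=dict)

    def add(self, symbol, column, entry):
        self.entries.setdefault((symbol, column), []).append(entry)

    def files_for(self, symbol, column, start, end):
        """Files whose slice overlaps the requested [start, end) range."""
        return [e.file_path
                for e in self.entries.get((symbol, column), [])
                if e.start < end and e.end > start]
```

A cursor could then resolve a requested time slice to the list of files it must visit, in timeline order.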
Concurrent access is mainly handled by the meta manager, which locks files symbol by symbol. Collisions between reads, writes, and commits are handled by the FileTimeSeriesDb class.
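A minimal sketch of the symbol-by-symbol locking idea, assuming in-process coordination (the class name and API are hypothetical; the real meta manager must also handle multi-process locking):

```python
import threading
from collections import defaultdict

class SymbolLocks:
    """One lock per symbol: writers on different symbols never block
    each other, while operations on the same symbol are serialized."""
    def __init__(self):
        self._guard = threading.Lock()
        self._locks = defaultdict(threading.RLock)

    def lock_for(self, symbol):
        with self._guard:          # protect the dict itself
            return self._locks[symbol]

locks = SymbolLocks()
with locks.lock_for("AAPL"):
    pass  # safe to touch AAPL's files and meta here
```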
To insert new data for a symbol, the Db creates an ITimeSeriesRecorder for you.
The recorder always works in the temporary folder until you choose to commit or revert.
A revert simply removes the files generated by the recorder in the temporary folder.
On commit, if there is no merge to do with the existing data, the file is moved from the temporary folder to the data folder. If the new timeline overlaps the existing one, we proceed to a merge.
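The commit/revert flow above can be sketched as follows, under the assumption that a recorder writes one file into a temporary folder (names and paths are hypothetical; the overlap/merge path is only stubbed):

```python
import os
import shutil
import tempfile

class Recorder:
    """Sketch of a recorder: writes to a temp folder, then commits or reverts."""
    def __init__(self, data_dir, symbol):
        self.data_dir = data_dir
        self.symbol = symbol
        self.tmp_dir = tempfile.mkdtemp()
        self.tmp_file = os.path.join(self.tmp_dir, symbol + ".dat")

    def append(self, payload: bytes):
        with open(self.tmp_file, "ab") as f:
            f.write(payload)

    def revert(self):
        # Revert simply discards the temporary output.
        shutil.rmtree(self.tmp_dir, ignore_errors=True)

    def commit(self, overlaps_existing: bool):
        if overlaps_existing:
            raise NotImplementedError("overlapping timelines require a merge")
        # No overlap: just move the file from the temp folder to the data folder.
        target = os.path.join(self.data_dir, self.symbol + ".dat")
        shutil.move(self.tmp_file, target)
        shutil.rmtree(self.tmp_dir, ignore_errors=True)
        return target
```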
The merge consumes the existing files involved and pushes the data, sorted by timestamp, into new files. During this process all generated files have the same size (500 MB by default). Each source file and each new file is locked (W) independently, at the moment it is used. If some readers are consuming a mergeable file, the merge waits with a timeout (todo: define the timeout). Once the timeout is hit, the file is merged even if readers have not unlocked it. We prioritize writes over reads: since we never know whether an unknown process has been paused for a while during a read, I prefer to crash that client and unblock all the others.
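The core of the merge step is a k-way merge by timestamp, re-split into fixed-size output files. A hedged sketch, with lists standing in for files, a record count standing in for the 500 MB size limit, and all locking/timeout handling omitted:

```python
import heapq

def merge(sources, target_size):
    """k-way merge of sorted (timestamp, value) sources into fixed-size chunks.

    sources: iterables already sorted by timestamp.
    Returns a list of output 'files', each holding at most target_size records.
    """
    files, current = [], []
    for record in heapq.merge(*sources):   # global order by timestamp
        current.append(record)
        if len(current) >= target_size:    # roll over to a new output file
            files.append(current)
            current = []
    if current:
        files.append(current)
    return files
```

For example, merging two sorted sources with `target_size=3` yields one full chunk and one remainder chunk.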
The db reads a file through a cursor. A cursor can access many files, depending on the requested time slice; the files are accessed lazily, and each access locks (R) the file. On each new file access, the db checks whether the meta has changed, which would mean a write has been processed on the current symbol. If a change is detected, the meta is reloaded, the remaining time slice is requested again, and the lazy loop continues over the new request.
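The lazy cursor loop above can be sketched like this. `FakeDb` and every method name (`meta_version`, `plan_files`, `read_locked`) are assumptions made so the loop can run, not the real API:

```python
class FakeDb:
    """Tiny in-memory stand-in for the file db."""
    def __init__(self, files):
        self.files = files        # file name -> sorted list of (ts, value)
        self.version = 0          # bumped when a write changes the meta

    def meta_version(self, symbol):
        return self.version

    def plan_files(self, symbol, start, end):
        # Files whose records overlap the requested [start, end) slice.
        return [name for name, recs in self.files.items()
                if any(start <= ts < end for ts, _ in recs)]

    def read_locked(self, name):
        return self.files[name]   # a real db would hold lock(R) here

def cursor_read(db, symbol, start, end):
    """Lazily stream (ts, value) pairs, re-planning if the meta changed."""
    version = db.meta_version(symbol)
    plan = db.plan_files(symbol, start, end)
    while plan:
        if db.meta_version(symbol) != version:
            # A write was processed on this symbol: reload the meta and
            # request the remaining time slice again.
            version = db.meta_version(symbol)
            plan = db.plan_files(symbol, start, end)
            continue
        name = plan.pop(0)        # lazy: one file at a time
        for ts, value in db.read_locked(name):
            if start <= ts < end:
                yield ts, value
                start = ts + 1    # shrink the remaining slice (integer ts)
```

Note the version check happens per file, matching the text: a meta change is only noticed on the next file access, not mid-file.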
Todo:
- Move the metadata into a dedicated service and manage a synchronization point there to support multiple threads and multiple processes.
- Complete the FileTimeSeriesDb implementation by implementing the Insert and Delete methods based on recorders and cursors.
- Implement and use a multi-version codec. Maybe provide an entry point to let users define their own.
- Run a scan regularly to merge the timeseries in the persisted files and optimize the btree and the number of files.
- Propose a remote implementation.
- Propose a distributed implementation.