Storage abstraction #169
Per our discussion on Slack, I have updated this branch with changes to make certain destructive
Overall functionality looks great! Some minor changes requested, mainly to do with naming, documentation, and user-facing elements.
    def reinitialize_from(self, other: Union["ConvoKitMeta", dict]):
        """
        Reinitialize this ConvoKitMeta instance with the data from other
nit: from another ConvoKitMeta instance or a dictionary with metadata key-value pairs
Good to go!
@jpwchang Can we merge this?
Description
Introduces a new layer of abstraction between Corpus components (Utterance, Speaker, Conversation, ConvoKitMeta) and concrete data storage. Data storage is now handled by a StorageManager instance variable in the Corpus. StorageManager defines an abstract class interface that can have concrete implementations with varying storage schemes; this pull request implements MemStorageManager, which stores all data in in-memory dicts, thus replicating the existing behavior.
Motivation and Context
In previous versions of ConvoKit, Corpus components have stored their data and metadata directly as instance variables. This scheme inherently means that the data must live directly in memory. Plans are currently underway to introduce alternative data backings such as database-backed storage. Introducing an abstraction layer for data storage will allow these alternative data models to be implemented without having to reimplement every Corpus component class from scratch.
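To make the idea concrete, here is a minimal sketch of such an abstraction layer. It is not the code from this pull request: the class names (StorageManagerSketch, MemStorageSketch, UtteranceSketch) and the method names (initialize, get, set) are hypothetical and chosen only for illustration. The point is that the component holds no data of its own and routes every read and write through the storage backend, so a different backend could be swapped in without touching the component class.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class StorageManagerSketch(ABC):
    """Hypothetical interface: maps (component type, id) to a dict of fields."""

    @abstractmethod
    def initialize(self, component_type: str, component_id: str) -> None:
        """Create an empty record for a component if none exists yet."""

    @abstractmethod
    def get(self, component_type: str, component_id: str, field: str) -> Any:
        """Read one field of a component's record."""

    @abstractmethod
    def set(self, component_type: str, component_id: str, field: str, value: Any) -> None:
        """Write one field of a component's record."""


class MemStorageSketch(StorageManagerSketch):
    """In-memory backend: nested dicts keyed by type, then id, then field."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, Dict[str, Any]]] = {}

    def initialize(self, component_type: str, component_id: str) -> None:
        self._data.setdefault(component_type, {}).setdefault(component_id, {})

    def get(self, component_type: str, component_id: str, field: str) -> Any:
        return self._data[component_type][component_id][field]

    def set(self, component_type: str, component_id: str, field: str, value: Any) -> None:
        self._data[component_type][component_id][field] = value


class UtteranceSketch:
    """Hypothetical component that delegates all data access to the backend."""

    def __init__(self, storage: StorageManagerSketch, utt_id: str, text: str) -> None:
        self._storage = storage
        self.id = utt_id
        storage.initialize("utterance", utt_id)
        storage.set("utterance", utt_id, "text", text)

    @property
    def text(self) -> str:
        return self._storage.get("utterance", self.id, "text")

    @text.setter
    def text(self, value: str) -> None:
        self._storage.set("utterance", self.id, "text", value)


# Usage: the component and the backend always agree, because the
# component never keeps a private copy of its data.
storage = MemStorageSketch()
utt = UtteranceSketch(storage, "u1", "hello")
utt.text = "hello again"
```

A database-backed variant would, under the same assumptions, subclass StorageManagerSketch and implement the three methods against its own store, leaving UtteranceSketch unchanged.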
How has this been tested?
Besides ensuring that all existing unit tests pass, tests have also been expanded to add new checks for StorageManager integrity after complex operations such as merges.