Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use TinyDB as the default document database #15

Closed
signebedi opened this issue Mar 17, 2024 · 8 comments
Closed

Use TinyDB as the default document database #15

signebedi opened this issue Mar 17, 2024 · 8 comments

Comments

@signebedi
Copy link
Owner

Like using SQLite as the default relational database, using TinyDB by default makes the application plug-and-play and solves a number of issues with containerization (see #12). TinyDB is a pip package; like using uvicorn (which means we no longer need celery or rabbitmq) we can reduce the operating system dependencies and keep most (if not all) major dependencies within the python environment.

There are some obvious differences between TinyDB and mongodb. For one, mongodb uses ObjectIDs. I think we can use Pydantic to enforce standardized data structures and types across both databases.

@signebedi
Copy link
Owner Author

signebedi commented Mar 17, 2024

I think that this can be handled with a MONGODB_ENABLED bool config, paired with a MONGODB_URI string config. If the bool assesses to True, then we add an assumption that the URI is not an empty string and possibly validate it using pydantic, https://docs.pydantic.dev/2.1/api/networks/#pydantic.networks.MongoDsn.

from pydantic import validator, ValidationError
from pydantic.networks import MongoDsn

....


    MONGODB_ENABLED:bool = os.getenv('MONGODB_ENABLED:', 'False') == 'True'
    MONGODB_URI: str = "" # Default to empty string

    @validator('MONGODB_URI', pre=True, always=True)
    def validate_mongodb_uri(cls, v):
        # Attempt to read from environment variable if not set
        uri = os.getenv('MONGODB_URI', '')
        if uri == '':
            return uri  # Allow empty strings
        # Utilize MongoDsn for validation if not empty
        return MongoDsn.validate(uri)

And then we add the following assertion in utils/scripts:

def validate_mongodb_configuration(config):
    # If MongoDB is enabled, ensure the URI is not an empty string
    if config.MONGODB_ENABLED and config.MONGODB_URI == '':
        raise ConfigurationError("MongoDB URI cannot be an empty string ('MONGODB_URI') when MongoDB is enabled ('MONGODB_ENABLED' = True).")

@signebedi
Copy link
Owner Author

We should implement methods with the same names and that take the same params for both TinyDB and MongoDB... this means that the interface in the application logic will be the same. As of now, we've implemented the TinyDB logic but not the MongoDB. We also need to add a pydantic model in the document_database library, which both classes can import and use to validate data on write.

Additionally, I continued the tradition of representing metadata field names as variables, like self.is_deleted_field = "_is_deleted". This has pros and cons, but at the very least allows us to maintain the same interface between mongodb and tinydb despite differences eg. in exact field naming and data structure.

@signebedi
Copy link
Owner Author

TinyDB has this weird Query() struct that is used in query conditions. The risk here is that there is significant different logic than mongodb... and I'd like to keep the application logic *** as close to identical *** as possible. One way to handle this is to instantiate a class variable for Query() within TinyDB and then to create a unique method as follows:

from typing import Callable

    self.Form = Query()

    ....

    def _check_condition(self, condition: Callable):
        if condition(self.Form):
            return True
        else:
            return False


# Call using a lambda function

result = YourObject._check_condition(lambda x: x > 5)

Of course, we want this to return the results of the query, not a bool... but it's a start, I think.

@signebedi
Copy link
Owner Author

I am going to go ahead with creating a ManageDocumentDatabase parent class because the situation is well aligned to the open/closed principle. We want a single interface but need to account for differences between MongoDB and TinyDB.

@signebedi
Copy link
Owner Author

signebedi commented Mar 21, 2024

TinyDB's use of integers for the document_id is somewhat vexing. I reviewed the source and found that document_id is handled in the tables script. Looking at a past issue (msiemens/tinydb#351), it looks like you can override the int default:

TinyDB uses the type specified in Table.document_id_class to convert the document ID before storing/after loading. If you use a custom table class that sets this to string and also provides a custom _get_next_id implementation, strings will be supported as document IDs.

Great. Because the app is async and we ideally run the "create_document" method as a background task, it may be ideal to set the document_id in the client, but of course to fall back to a self._generate_document_id method. MongoDB would support this, too, so there is an opportunity to standardize and permit data acceptance, run in the background, but immediately give the user a reference to the document when it is ready.

If subclassing the Table class doesn't work correctly. the alternative is to add a custom document_id metadata field that we can query against.

@signebedi
Copy link
Owner Author

signebedi commented Mar 22, 2024

Ok, we've implemented the overrides in 90f9a54. This may not be maintainable if there are major changes to the TinyDB codebase (cf. it's been 8 months since the last commit against that library, at the time of writing this comment) but I might think about making a pull request against the TinyDB codebase that allows users to choose between a string or int document ID and pass a callable override for the value setting.

@signebedi
Copy link
Owner Author

Replicated the TinyDB logic in a ManageMongoDB class
This is a natural follow on to #15. Once we get our data model to a place of greater maturity, see eg. #20 but also think about the class methods we want to use, then we should implement a ManageMongoDB class.

@signebedi
Copy link
Owner Author

Is it worth consolidating TinyDB collections into a single file?
Right now, each form gets its own file. Is there value in putting them all in a single file, and then adding form_name as one of the query params (or just filtering iteratively) . This is possible because we now store form name in the document metadata field. Does this get us the benefits we might hope? Reduced sprawl? Reduced complexity? Or, does it increase complexity and make it more difficult to distinguish data sets from one another?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant