Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[2.0] Re-implement GridFS #1697

Closed
alcaeus opened this issue Dec 21, 2017 · 1 comment
Closed

[2.0] Re-implement GridFS #1697

alcaeus opened this issue Dec 21, 2017 · 1 comment

Comments

@alcaeus
Copy link
Member

alcaeus commented Dec 21, 2017

GridFS has been removed from 2.0.x-dev to limit the complexity of the switch to ext-mongodb. GridFS support needs to be readded for 2.0.

Here are the notes from previous hangouts collecting ideas for a new implementation of GridFS support:

Mapping:

/**
 * @ODM\File(bucketName=””, chunkSizeBytes=””, readConcern=””)
 * @ODM\WriteConcern() - optional
 * @ODM\ReadPreference() - optional
 */
private $file;

ODM will no longer inject additional properties like md5 or uploadDate to an owning document - thanks to this we’ll enable having more than one @odm\File annotated field in a document.

A field mapped as File can accept several types. From user perspective, they can assign a string or stream to the field for ODM to transform it during persist (similar to collections that are array - they are transformed to a persistent collection). When the data is hydrated (or said transformation happens) ODM will use specialized class that wraps MongoDB\GridFS\Bucket, limiting its original API to what is needed to interact with particular file and exposing access to properties like MD5 hash, chunkFileSize et al. Effectively the class will look similar to:

class GridFSFile
{
    public function __construct(Bucket $bucket, ObjectId $id, array $options);

    public function downloadToStream();

    public function openDownloadStream();

    public function getMD5Hash();

    // other getters for file’s metadata
}

Whether we want to expose methods like delete or rename is subject to a discussion, best would be to base it on real life usages which, unfortunately, are not known to me. Removing a file could also happen any time user changes value of a field mapped as File, controlled by a mapping property.

UPDATE: Actually we might provide another implementation of said GridFSFile that would accept a string or a stream. This way end-user could rely on an (internal) interface and storage would be transparent to him:

class NotPersistedFile implements File
{
    private function __construct();

    public static function fromPath($path);

    public static function fromStream();

    // other methods from the interface (thus GridFSFile)
}

Issues with current GridFS implementation

  • Files are mutable - we should no longer allow changing the contents of a file document and instead treat them as write-once.
  • It’s possible to store data in the root document in fs.files - this is not supported by the new driver and needs to go
  • Mapping a single property as file feels wrong, the whole document is the “file”. What is mapped with @File is merely a placeholder for whatever currently holds the data (e.g. a string containing a path)
  • The “pass a string, get an object” mechanism of the file property leads to an inconsistent API. Instead, users should be able to construct the file object themselves to have a clean API. See https://gist.github.com/alcaeus/4114209a272c869c45cf7b6ed176ec06 for a sample implementation around that problem
  • The fields stored by MongoDB (e.g. length, md5, etc.) need to be mapped separately - this should not be necessary. Standardizing the documents involved in GridFS would solve that problem
  • The fields mentioned above are not filled automatically after a flush operation, instead the document needs to be refreshed once

Wish list for ODM 2.0 GridFS API

  • By default, don’t allow arbitrary fields in the fs.files document to be mapped
  • Consider allowing a legacyMode (or similarly named) flag to allow other fields to be mapped (preferably with NotSaved)
  • Reserved fields for in the fs.files document should be NotSaved (e.g. length, md5, filename)
  • Use an interface and corresponding trait for the fs.files document
  • The top-level fs.files document can be mapped with a GridFSDocument or FileDocument annotation.
    Behaviorally, it is a subset of the generic Document type
  • Allow a custom class to be specified as the EmbedOne targetDocument for the “metadata” property
  • When you first create a GridFS file from a readable stream, we return a proxy for the fs.files document
  • Thereafter, the fs.files document class does not allow the stream to be modified
  • WriteConcern and ReadPreference are top-level document mappings. Hopefully we also add ReadConcern.
@alcaeus
Copy link
Member Author

alcaeus commented Aug 17, 2018

GridFS support has landed in master, closing here.

@alcaeus alcaeus closed this as completed Aug 17, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant