Suppose I want to use git to store and version a dataset. How would I open a particular version of that dataset?
One option mentioned elsewhere by @pfitzseb would be to have dataset("name") load the latest version and have some syntax like dataset("name@v2") / dataset("name#hk98s2") load a specific version/hash, much like Pkg.
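To make the Pkg-like option concrete, here is a minimal sketch of how such a string could be split into a name plus an optional version or hash. It is written in Python purely for illustration. DatasetSpec, parse_pkg_style, and the reading of @ as "version" and # as "hash" are assumptions for this sketch, not an existing dataset() implementation.

```python
from typing import NamedTuple, Optional


class DatasetSpec(NamedTuple):
    """Hypothetical parsed form of a Pkg-like dataset specifier."""
    name: str
    version: Optional[str] = None   # filled from "name@v2"
    revision: Optional[str] = None  # filled from "name#hk98s2"


def parse_pkg_style(spec: str) -> DatasetSpec:
    """Split "name", "name@version" or "name#hash" into its parts."""
    if "@" in spec:
        name, version = spec.split("@", 1)
        return DatasetSpec(name, version=version)
    if "#" in spec:
        name, revision = spec.split("#", 1)
        return DatasetSpec(name, revision=revision)
    # No suffix: the caller would fall back to the latest version.
    return DatasetSpec(spec)


latest = parse_pkg_style("name")
tagged = parse_pkg_style("name@v2")
pinned = parse_pkg_style("name#hk98s2")
```

Note that this form only has room for a version or a hash. Any other parameter would need yet another piece of syntax.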
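Another similar idea would be to add keyword arguments to dataset(). But I do think there are some benefits to using syntax within the string rather than keywords. URLs show how useful a standard string representation of resources-with-parameters can be.
The URN RFC is a good source of inspiration here. In particular, it specifies three sets of parameters, the r-component, q-component and f-component (see https://datatracker.ietf.org/doc/html/rfc8141#page-12):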
r-component - parameters passed to the name resolver. (For us, this corresponds to passing parameters to the AbstractDataProject.)
q-component - parameters passed to either the named resource or a system that can supply the requested service. The q-component is specified to have the same syntax as the query part of a URL. (For us, I guess this corresponds to passing parameters to the storage backend when it open()s the dataset.)
f-component - interpreted by the client as a specification for a location within, or region of, the named resource; similar to the fragment of a URL. (For us, this would be parameters applied to the object which comes from open()ing a dataset. For example, to supply a relative path within a BlobTree.)
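The three components can be pulled apart mechanically. The sketch below, in Python for illustration only, splits a string written in the RFC 8141 layout (name, then ?+ r-component, then ?= q-component, then # f-component) and parses the q-component as a URL query. UrnParts, split_urn_style, and the parameter names in the example string are hypothetical. The r-component is left as a raw string here, which is a choice made for this sketch.

```python
from typing import Dict, List, NamedTuple
from urllib.parse import parse_qs


class UrnParts(NamedTuple):
    """Hypothetical container for the pieces of a URN-style specifier."""
    name: str
    r_component: str                 # would go to the name resolver (data project)
    q_component: Dict[str, List[str]]  # would go to the storage backend on open()
    f_component: str                 # would be applied to the opened object


def split_urn_style(spec: str) -> UrnParts:
    # The fragment comes last, so peel it off first.
    rest, _, f_component = spec.partition("#")
    # Then the q-component, which is introduced by "?=".
    rest, _, q_raw = rest.partition("?=")
    # What remains is the name, optionally followed by "?+" and the r-component.
    name, _, r_component = rest.partition("?+")
    return UrnParts(name, r_component, parse_qs(q_raw), f_component)


# Example string with made-up parameter names.
parts = split_urn_style("name?+project=default?=version=v2#path/to/file.csv")
```

Under the mapping described above, parts.r_component would be handed to the AbstractDataProject, parts.q_component to the storage backend, and parts.f_component would select a path within something like a BlobTree.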
While I think the URN RFC has some useful concepts, I'm not super keen on its syntax, which is like URI syntax but confusingly, subtly different: the r-component is prefixed with ?+ and the normal query part with ?=, giving name?+rcomponent?=qcomponent#fcomponent.
But I'm also not sure the Pkg syntax is quite what we want. For packages, it's useful to make versioning very central in the syntax, to the extent of taking up two different types of syntax (@ and #) just to specify versions. Unlike Pkg, I think there could be other parameters we might want to pass when addressing data storage, not just a version.
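One way to read this is that a version could be just one named parameter among several. The sketch below, in Python for illustration only, parses a specifier that uses ordinary URI query and fragment syntax. parse_uri_style and the parameter names version and decompress are made up for this example. Plain URI syntax is only one possible choice, not something settled in this issue.

```python
from typing import Dict, Tuple
from urllib.parse import parse_qsl, urlsplit


def parse_uri_style(spec: str) -> Tuple[str, Dict[str, str], str]:
    """Split "name?key=value&...#fragment" into (name, params, fragment)."""
    # urlsplit treats a string without a scheme as a relative reference, so
    # the dataset name ends up in .path. A name containing ":" could be
    # mistaken for a scheme and would need extra handling.
    parts = urlsplit(spec)
    params = dict(parse_qsl(parts.query))
    return parts.path, params, parts.fragment


name, params, fragment = parse_uri_style(
    "name?version=v2&decompress=true#subdir/file.csv"
)
```

Here the version is no longer special syntax: it sits in params next to any other parameter, at the cost of being slightly more verbose than name@v2.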