-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use metadata schemas #82
Conversation
Whoops - there's a number of remaining errors. I see that one problem is that SLiM itself is counting on empty metadata as signifying "not really a population", so can't reload these tree sequences. We'll have to come up with a way to specify "not really a population" in the metadata. |
This is interesting that SLiM uses empty metadata as a signal. It would be possible to modify tskit's metadata handling to allow b'' to be decoded to |
This wasn't a design choice we thought much about, but it makes a bit of sense - it avoids us having to make up e.g. sex ratios for populations that didn't exist. Allowing null values would avoid some breakage out there in the wild; in particular everyone's scripts that are doing |
This sounds like a good idea to me - could we specify that the top-level object can be null directly for SLiM's schema, rather than in the metaschema? |
Yes, JSON schema allows union types so how I saw this working is that the metaschema allows the schema to specify a union of I'm not sure we have discussed the restriction on "fixed objects" (all keys present) enough. It's related to this as I can imagine software modifying a tree sequence only adding a particular metadata key to some rows, and the current |
Sounds good to me. I think it's fine to have all keys required in the struct codec for now (with this one exception). |
Filed as tskit-dev/tskit#659 |
Thanks a bunch. I'll have a go at this. |
0af6ae1
to
e44f268
Compare
e44f268
to
c0254d4
Compare
Closing in favor of #89 |
The interface for metadata is unchanged, but under the hood uses the new metadata schemas in tskit. We shouldn't merge this until SLiM writes out metadata schemas to the tables: mostly, everything is the same, except that previously, we allowed non-SLiM metadata (eg empty metadata), but that will no longer be allowed; this may break previous code that relied on checking the presence of metadata to see if mutations or nodes were added by SLiM or if populations were used.
Furthermore,
msprime.mutate( )
will no longer return a trivially SlimTreeSequence-able tree sequence, because newly added mutations will need SLiM metadata. I suppose this calls for apyslim.mutate( )
method, and previous code, which didpyslim.SlimTreeSequence(msprime.mutate(...))
will break - unless we havepyslim.SlimTreeSequence( )
add default metadata to entries in need of such. We'll probably want to write apyslim.mutate( )
method using the slim mutation generation ability of msprime, anyhow.Note that the code in this PR already adds default metadata for populations without metadata.
Similarly,
pyslim.recapitate( )
will need to be modified to add metadata to newly added nodes.It would also be very nice to ditch the metadata processing we do here enitrely - leave the metadata as dicts as returned by tskit instead of translating them to objects. That would be a more serious change, although if we're making breaking changes, then perhaps it's OK - it would only require changing code from e.g.
node.metadata.slim_id
tonode.metadata['slim_id']
.