-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: Deprecate CollectionDescription.Schema #1939
refactor: Deprecate CollectionDescription.Schema #1939
Conversation
d5a02d4
to
f578d13
Compare
Codecov ReportAttention:
@@ Coverage Diff @@
## develop #1939 +/- ##
===========================================
- Coverage 74.94% 74.81% -0.13%
===========================================
Files 241 240 -1
Lines 23616 23645 +29
===========================================
- Hits 17699 17689 -10
- Misses 4715 4751 +36
- Partials 1202 1205 +3
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 8 files with indirect coverage changes Continue to review full report in Codecov by Sentry.
|
cli/collection_describe.go
Outdated
@@ -16,6 +16,11 @@ import ( | |||
"github.com/sourcenetwork/defradb/client" | |||
) | |||
|
|||
type CollectionDefinition struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: does it need to be a public definition of cli
package?
Seems like it doesn't belong here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is the public output type of the command. And this is the only place that outputs it.
Where were you thinking it should be?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The struct doesn't need to be public. Only its fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The struct doesn't need to be public. Only its fields.
That is technically correct, but, IMO conceptually incorrect. This is a public return type, it is just that it is returned as json - declaring it as private would be misleading.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's specific to this CLI command and not used anywhere else. I really think this struct shouldn't be public. If you absolutely want it to be public, please at some documentation to the type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is returned from a public endpoint but that public endpoint isn't a public Go function. That endpoint won't show up in the godocs and the function itself is returning an error. The type/shape should be documented elsewhere but not with a public Go type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My preference would be to define it in the client
package and reference it in the cli
and http
. That would also get rid of the duplicate definition in the http
client.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be happy with that option :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice okay - I'll try and figure something out. Some other stuff might get re-jigged slightly as we already have a CollectionDefinition (interface) in client, but hopefully everything will be the better for it.
- Bring http and cli CollectionDefinitions into client
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorted, client.CollectionDefinition is now a struct
f578d13
to
965a7e4
Compare
if field.ID == id { | ||
return field, true | ||
} | ||
func (col CollectionDescription) GetFieldByID(id FieldID, schema *SchemaDescription) (FieldDescription, bool) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: This and the couple other methods bellow no longer need the CollectionDescription
. Do you intend on turning those into methods of SchemaDescription
in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this and the others remain here as whilst they currently only access stuff on the schema, that is because the stuff they access is incorrectly on the schema.
For example, in this case we are getting fields by their ID, FieldID is a local concept that should exist only on collection, not on schema.
The function is located in the correct place, but for historical reasons the data structures are incorrect and need to be changed along with the implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarifying
db/collection.go
Outdated
existingDescriptionsByName map[string]client.CollectionDescription, | ||
proposedDescriptionsByName map[string]client.CollectionDescription, | ||
existingSchemaByName map[string]client.SchemaDescription, | ||
proposedDescriptionsByName map[string]client.SchemaDescription, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought(out of scope): I would like to point out that this in a really confusing set of params. We are updateing one collection but we are passing all collection, all schemas and all proposed schemas. In my opinion, only what is directly relevant should be passed to this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is fiddly, and I agree that it is a bit of an eyesore. However they are all used and all need to be there (or within this func, we could probably (re)fetch them) due to relational fields.
It does get improved slightly in a later PR when we are able to change all their types to SchemaDescriptions (allowing us to only pass in two maps instead of 3). Might be worth re-visiting this in #1965
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a todo in that PR, as one of the params still makes no sense there: https://github.com/sourcenetwork/defradb/pull/1965/files#r1358767728
Removing it leaves us with:
func (db *db) updateCollection(
ctx context.Context,
txn datastore.Txn,
existingSchemaByName map[string]client.SchemaDescription,
proposedDescriptionsByName map[string]client.SchemaDescription,
schema client.SchemaDescription,
setAsDefaultVersion bool,
) (client.Collection, error) {
existingSchemaByName
could be re-fetched inside updateCollection
, but that feels fairly wasteful atm (although a planned repository mechanic would remove that issue).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are used but in a wasteful way as we only care about one in each of the maps. The only proposedDescription that we care about is already passed as schema
so that one can be removed completely. It should probably look like this:
func (db *db) updateCollection(
ctx context.Context,
txn datastore.Txn,
existingSchema client.SchemaDescription,
schema client.SchemaDescription,
desc client.CollectionDescription
setAsDefaultVersion bool,
) (client.Collection, error) {
Although the checks on the schema should probably be done before calling updateCollection
and we would end up with just:
func (db *db) updateCollection(
ctx context.Context,
txn datastore.Txn,
schema client.SchemaDescription,
desc client.CollectionDescription
setAsDefaultVersion bool, // if we call updateCollection, this should be implicit and removed from params.
) (client.Collection, error) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only proposedDescription that we care about is already passed as schema so that one can be removed completely
This is not true, we must validate that all relational fields point to a valid schema. That schema is rarely the one being updated.
Although the checks on the schema should probably be done before calling updateCollection
I'me very much against that, as it moves the validation out of a request-language agnostic part of the codebase (db.collection.go) into a request-language specific area (patchSchema) of the codebase. It would mean that if we add an alterative to patchSchema (including a typed embedded Go call) we'd have to rework the validation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the problem then is that the name of the function is updateCollection
but it handles both schema and collection in the same function. Maybe once we get further in cleaning this up it won't be a problem anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for-reference: We spoke about this last Friday, and there is a task on a later PR to rename this function (when it stops being updateCollection)
request/graphql/schema/collection.go
Outdated
@@ -24,9 +24,32 @@ import ( | |||
"github.com/graphql-go/graphql/language/source" | |||
) | |||
|
|||
type collectionDefinition struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought: You'll be able to use the one in the client package after you move it there instead of redefining it here.
db/collection_test.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: you removed so many tests. Are they already covered by some other tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, our integration tests cover all of this, in significantly more depth.
@@ -324,28 +304,6 @@ func TestCreateIndex_IfFieldHasNoDirection_DefaultToAsc(t *testing.T) { | |||
assert.Equal(t, client.Ascending, newDesc.Fields[0].Direction) | |||
} | |||
|
|||
func TestCreateIndex_IfNameIsNotSpecified_Generate(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: Why did you remove this test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is documented in the commit message:
One test was removed, TestCreateIndex_IfNameIsNotSpecified_Generate - this is because it only pretends to test what it tests - the name is set later by the unit test framework, and as such is identical to other tests here.
http/handler_store.go
Outdated
@@ -22,6 +22,11 @@ import ( | |||
|
|||
type storeHandler struct{} | |||
|
|||
type CollectionDefinition struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: whatever solution is for identical struct in cli/collection_describe.go
that we discussed should also apply here I believe
It adds no value in any location, and will always return false anyway. Function likely originates from a time where schemaless collections were still being considered a possibility.
Unrelated to main body of work, but might as well fix it here.
Note: The only place that calls this only wants the field name, which long term will be available on the CollectionFieldDescription anyway. I hope this param can be removed again soon. Additionally all properties of Schema used in that calling location will soon be on Collection itself, so the reference to Schema there is hopefully a very temporary thing too.
This will be temporary, as the properties required by this function will medium-long term on the Collection field set, not the schema.
This was validating stuff on the schema. This stuff has already been validated on the schema before reaching this code block.
It should no longer be accessed via [collection] desc
Just get it from col.Schema instead of having to maintain the same prop in multiple places - this caused a bug briefly.
Just get it from col.Schema instead of having to maintain the same prop in multiple places.
The private funtion is not the one under test, and so we should not be forced to change the P2P tests every time we wish to change this private function. May be a historical artifact.
The main change in this commit is the removal of the call to `db.createCollection`, which is a private function used as a utility _only_ in these tests, it is not a function under test, but a means to an end. Using it coupled the index tests to the function, whose only reason-to-be is from within the `db.addSchema` (GQL SDL stuff), this coupling is unwanted and needs to be removed in order to change the behaviour/signature of that private function in a later commit. The unit tests now use the public `db.AddSchema` and `db.GetCollectionByName` functions. One test was removed, TestCreateIndex_IfNameIsNotSpecified_Generate - this is because it only pretends to test what it tests - the name _is_ set later by the unit test framework, and as such is identical to other tests here.
Similar to the changes to the fetcher(s), this commit replaces CollectionDescription objects with Collections. It also removes the DescriptionsRepo, replacing calls to it with calls to `store.GetCollectionByName`. At some point the repo was caching descriptions to avoid fetching them from storage multiple times, but this has since been removed, making the type fairly pointless.
Splits the collection definition/metadata out from the interface that declares what a Collection can _do_. This allows the trimmed down metadata to be passed around to code-locations which should not have access to database operations, this will be used in later commit(s).
Uses the new(er) client.CollectionDefinition interface to minimize the number of references to collectionDescription.Schema when creating new collections/schema from a GQL SDL. I do not believe the gql parser code should have access to the database, so CollectionDefinition is passed around instead of Collection. Some more work will need to be made to this function and its children as the splitting of collection and schema description is completed, however this should get us a lot of the way there without changing public behaviour/signatures.
Previously the return values would include schema within collectionDescription. This commit makes them siblings. This is a public behaviour change (I think the only one in the PR).
This function will be broken up further outside of this PR, the remaining changes will be breaking and will change both storage and the public interfaces.
09f67a4
to
95edff0
Compare
95edff0
to
cf14555
Compare
@@ -16,24 +16,34 @@ import ( | |||
"github.com/sourcenetwork/defradb/datastore" | |||
) | |||
|
|||
// CollectionDefinition contains the metadata defining what a Collection is. | |||
type CollectionDefinition struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought: I'm a little worried that the name is so close to CollectionDescription
that it will be a source of confusion. What do you think of CollectionInfo
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dislike CollectionInfo
as it suffers from being a little ambiguous - could be read as containing info about what data is in the collection too (e.g. doc count), CollectionDefinition
suffers less from that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably think of something better then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably think of something better then.
I tried, and CollectionDefinition
was the best I could come up with without spending an unreasonable amount of time on it. I'm fairly happy with it given the problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just noting here for future reference:
I feel like client.SchemaDescription
should just be client.Schema
and client.CollectionDescription
should be client.Collection
. The interface should have a different name. Probably something like client.CollectionAPI
or client.CollectionHandler
.
This way, client.CollectionDefinition
would work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Nice work Andy!
schema := def.Schema | ||
desc := def.Description |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: Why reallocate when you've already passed a copy of the definition? Using def
directly would be clearer and you could assign to col.def
directly from it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a struct, and its children are structs, mutating them does not mutate the one on the parent, the parent must instead later be overwritten with a new inner.
// foo.bar.foobar is false
foo.bar.foobar = true
// foo.bar.foobar is still false, a new (and unreachable) bar instance has value true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that's true: https://go.dev/play/p/FHce-ajgFyr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😆 Maybe not, but something was very broken before I made that change. It is very deliberate and it doesnt work without it. Could be something similar in the more complex real-world code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just tested it and all tests are passing. Not sure what would have been broken.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot see what you have changed, most likely it was schema that was important here, as that is what is being mutated most.
UpdateCollection already does this, it is safer, and we dont care about the performance cost here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks Andy!
## Relevant issue(s) Resolves sourcenetwork#1955 ## Description Deprecate CollectionDescription.Schema. Schema is not a sub-property of collection. Removes as many references to CollectionDescription.Schema as possible without making any breaking changes. Breaking changes will be made in a later PR. An exception has been made to the http API, which does have a breaking change in this PR.
Relevant issue(s)
Resolves #1955
Description
Deprecate CollectionDescription.Schema. Schema is not a sub-property of collection.
Removes as many references to CollectionDescription.Schema as possible without making any breaking changes. Breaking changes will be made in a later PR. An exception has been made to the http API, which does have a breaking change in this PR - it can be pulled out of here if people want it more obvious in the develop commit history/change log.
This should minimise the conflicts generated by getting the large broad changes sorted out early, instead of being bogged down behind the more complex, breaking, changes coming later.
Commits should be clean, and should bundle up the work into more readable chunks, would recommend reviewing them one by one.