-
Notifications
You must be signed in to change notification settings - Fork 211
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multiple-create: Order determination for insertions #2056
Merged
ayush3797
merged 195 commits into
main
from
dev/agarwalayush/nestedInsertionOrderHelper
Apr 22, 2024
Merged
Multiple-create: Order determination for insertions #2056
ayush3797
merged 195 commits into
main
from
dev/agarwalayush/nestedInsertionOrderHelper
Apr 22, 2024
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…hether entity is linking or not
…h/AuthZForNestedInsertions
…h/AuthZForNestedInsertions
…ithub.com/Azure/data-api-builder into dev/agarwalayush/AuthZForNestedInsertions
…ue (#2116) ## Why make this change? - Closes #1951 - PR #1983, #2103 add CLI options to enable or disable multiple mutation/multiple create operation through CLI. With the changes introduced in the mentioned PRs, the configuration properties successfully gets written to the config file. Also, during deserialization, the properties are read and the `MultipleMutationOptions`, `MultipleCreateOptions`, `GraphQLRuntimeOptions` objects are created accordingly. - The above-mentioned PRs do not introduce any change in DAB engine behavior depending on the configuration property values. - This PR introduces changes to read these fields and enable/disable multiple create operation depending on whether the feature is enabled/disabled through the config file. This is achieved by introducing behavior changes in the schema generation. ## What is this change? - This PR builds on top of a) Schema generation PR #1902 b) Rename nested-create to multiple create PR #2103 - When multiple create operation is disabled, > i) Fields belonging to the related entities are not created in the input object type are not created. > ii) Many type multiple create mutation nodes (ex: `createbooks`, `createpeople_multiple` ) are not created. > iii) ReferencingField directive is not applied on relationship fields, so they continue to remain required fields for the create mutation operation. > iv) Entities for linking objects are not created as they are relevant only in the context of multiple create operations. ## How was this tested? - [x] Unit Tests and Integration Tests - [x] Manual Tests **Note:** At the moment, multiple create operation is disabled in the config file generated for integration tests. This is because of the plan to merge in the Schema generation, AuthZ/N branches separately to the main branch. With just these 2 PRs, a multiple create operation will fail, hence, the disabling multiple create operation. At the moment, tests that perform validations specific to multiple create feature enable it by i) updating the runtime object (or) ii) creating a custom config in which the operation is enabled. ## Sample Request(s) ### When Multiple Create operation is enabled - MsSQL #### Related entity fields are created in the input object type  #### Multiple type create operation is created in addition to point create operation  #### Querying related entities continue to work successfully  ### When Multiple Create operation is disabled - MsSQL #### Only fields belonging to the given entity are created in the input object type  #### Multiple type create operation is not created ### When Multiple Create operation is enabled - Other relational database types #### Only fields belonging to the given entity are created in the input object type  #### Multiple type create operation is not created --------- Co-authored-by: Ayush Agarwal <[email protected]>
seantleonard
added a commit
that referenced
this pull request
Apr 1, 2024
## Why make this change? If there is a relationship defined for a (source,target) in the database with right cardinality as `ONE`, we add the relationship in both the directions (source->target and target->source) to the source's `RelationshipMetadata` corresponding to the target entity's entry in `RelationshipMetadata.TargetEntityToFkDefinitionMap` dictionary. We do this because the relationship can be 1:1 or N:1. In case of N:1, the first FK (from source->target) would be present. In case of 1:1, either or both of them can be present. But all this is known to us only at a later stage when we collect metadata from the database and actually infer which of the two relationships (or both) is/are valid. After validating the correct relationship, we don't remove the other relationship which we added earlier. The redundant relationship jadds a redundant `JOIN` condition (I think this is something which is visible in the sql query generated by DAB highlighted in this issue: #1940). ## What is this change? The functioning of `SqlMetadatProvider.FillInferredFkInfo()` is changed such that for all the FK definitions present for a target entity in the source entity's definition, the FK definitions for a target entity in the source entity's definition contain: 1. One entry for relationships defined between source to target in the database (via FK constraint) in the right order of referencing/referenced entity. 2. Two entries for relationships defined in the config between source to target (which don't exist in the database). The above is achieved via a call to a newly added method `SqlMetadataProvider.GetValidatedFKs()` which loops over all the foreign key definitions defined for the target entity in the source entity's definition and adds to the set of validated FK definitions: 1. All the FK definitions which actually map to a foreign key constraint defined in the database. In such a case, if the source/target fields are also provided in the config, they are given precedence over the FK constraint. 2. FK definitions for custom relationships defined by the user in the configuration file where no FK constrain t exists between the pair of (source, target) entities. ## How was this tested? - Added tests to `SqlMetadataProviderUnitTests` for Pg/Ms/MySql for the cases when an FK constraint exists between the source/target entities. ## Manual Testing 1. Did manual testing to ensure the redundant join condition is not added in the query. For a nested query like: <img width="422" alt="image" src="https://github.com/Azure/data-api-builder/assets/34566234/e49c98ab-5c6a-49d9-8cfd-97382dd1bf94"> where the configuration for Book entity looks like: <img width="299" alt="image" src="https://github.com/Azure/data-api-builder/assets/34566234/ab83e96e-8d13-4d71-b6d1-b89d20b13612"> the query generated before the fix (in main): ```tsql SELECT TOP 100 [table0].[title] AS [title], JSON_QUERY ([table1_subq].[data]) AS [publishers] FROM [dbo].[books] AS [table0] OUTER APPLY ( SELECT TOP 1 [table1].[name] AS [name] FROM [dbo].[publishers] AS [table1] WHERE [table0].[publisher_id] = [table1].[id] AND [table1].[id] = [table0].[publisher_id] ORDER BY [table1].[id] ASC FOR JSON PATH, INCLUDE_NULL_VALUES,WITHOUT_ARRAY_WRAPPER ) AS [table1_subq]([data]) WHERE 1 = 1 ORDER BY [table0].[id] ASC FOR JSON PATH, INCLUDE_NULL_VALUES ``` after the fix: ```tsql SELECT TOP 100 [table0].[title] AS [title], JSON_QUERY ([table1_subq].[data]) AS [publishers] FROM [dbo].[books] AS [table0] OUTER APPLY ( SELECT TOP 1 [table1].[name] AS [name] FROM [dbo].[publishers] AS [table1] WHERE [table0].[publisher_id] = [table1].[id] ORDER BY [table1].[id] ASC FOR JSON PATH, INCLUDE_NULL_VALUES,WITHOUT_ARRAY_WRAPPER ) AS [table1_subq]([data]) WHERE 1 = 1 ORDER BY [table0].[id] ASC FOR JSON PATH, INCLUDE_NULL_VALUES ``` <Observe the removal of redundant join condition in the 4th line in the second query> Similar observations for MySql: ```mysql SELECT COALESCE(JSON_ARRAYAGG(JSON_OBJECT(@param3, `subq6`.`title`, @param4, `subq6`.`publishers`)), JSON_ARRAY()) AS `data` FROM ( SELECT `table0`.`title` AS `title`, `table1_subq`.`data` AS `publishers` FROM `books` AS `table0` LEFT OUTER JOIN LATERAL (SELECT JSON_OBJECT(@Param2, `subq5`.`name`) AS `data` FROM ( SELECT `table1`.`name` AS `name` FROM `publishers` AS `table1` WHERE `table0`.`publisher_id` = `table1`.`id` ORDER BY `table1`.`id` ASC LIMIT 1 ) AS `subq5`) AS `table1_subq` ON TRUE WHERE 1 = 1 ORDER BY `table0`.`id` ASC LIMIT 100 ) AS `subq6` ``` For pgSql: ```psql SELECT COALESCE(jsonb_agg(to_jsonb("subq6")), '[]') AS "data" FROM ( SELECT "table0"."title" AS "title", "table1_subq"."data" AS "publishers" FROM "public"."books" AS "table0" LEFT OUTER JOIN LATERAL (SELECT to_jsonb("subq5") AS "data" FROM ( SELECT "table1"."name" AS "name" FROM "public"."publishers" AS "table1" WHERE "table0"."publisher_id" = "table1"."id" ORDER BY "table1"."id" ASC LIMIT 1 ) AS "subq5") AS "table1_subq" ON TRUE WHERE 1 = 1 ORDER BY "table0"."id" ASC LIMIT 100 ) AS "subq6" ``` 2. Did manual testing to confirm that the relationship fields provided by user take precedence: (Creating issue here to add a test later:#2104) Config: <img width="242" alt="image" src="https://github.com/Azure/data-api-builder/assets/34566234/3bf60082-309c-4fc2-bd6b-d307061d61ba"> GraphQL query: <img width="282" alt="image" src="https://github.com/Azure/data-api-builder/assets/34566234/6f9bf50d-f679-4c6c-87a3-377fc86c423b"> Generated SQL query: <img width="954" alt="image" src="https://github.com/Azure/data-api-builder/assets/34566234/9b082354-1217-4ecf-b05b-263d5e2e5a13"> 3. Manually tested that for relationships which are not backed by an FK constraint, we infer both the source/target entities as the referencing entity. Test to be added post the merge of : #2056 --------- Co-authored-by: Shyam Sundar J <[email protected]> Co-authored-by: Sean Leonard <[email protected]>
severussundar
approved these changes
Apr 2, 2024
/azp run |
seantleonard
approved these changes
Apr 19, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, please incorporate suggestions where you deem appropriate. Clarifying comments and such.
src/Service.Tests/Unittests/MultipleCreateUnitTests/MultipleCreateOrderHelperUnitTests.cs
Outdated
Show resolved
Hide resolved
src/Service.Tests/Unittests/MultipleCreateUnitTests/MultipleCreateOrderHelperUnitTests.cs
Outdated
Show resolved
Hide resolved
src/Service.Tests/Unittests/MultipleCreateUnitTests/MultipleCreateOrderHelperUnitTests.cs
Outdated
Show resolved
Hide resolved
src/Service.Tests/Unittests/MultipleCreateUnitTests/MultipleCreateOrderHelperUnitTests.cs
Outdated
Show resolved
Hide resolved
src/Service.Tests/Unittests/MultipleCreateUnitTests/MultipleCreateOrderHelperUnitTests.cs
Outdated
Show resolved
Hide resolved
/azp run |
/azp run |
seantleonard
approved these changes
Apr 21, 2024
ayush3797
added a commit
that referenced
this pull request
Apr 25, 2024
## Why make this change? With support for multiple-create, we introduce additional flexibility in the GraphQL schema where we make the referencing fields in an entity optional because their value can be derived from insertions in the referenced entity. So to say, _the values of referencing fields in a referencing entity can have 3 sources of truth_. The value for a referencing field can be derived via: 1. User input data 2. Insertion in the parent referenced entity (if current entity is a referencing entity for a relationship with its parent entity) 3. Insertion in the child referenced entity (if the current entity is a referencing entity for a relationship with its child entity) **_They can all exist at the same time, or none of them can. Presence of more than one sources of truth leads to possible conflicting values for the referencing column. Thus, at a time, we want exactly one of them to be present - so that we exactly one source of truth_**. This needs us to do request validation during request execution, so that only valid GQL mutations reach the query generation/query execution stage. This PR adds the logic for the same. ## What is this change? - Added class `MultipleMutationInputValidator.cs` to perform the validations for a multiple-create mutation. - Added class `MultipleMutationEntityInputValidationContext` to store relevant information while performing validations on a particular entity. - A call to the method `MultipleMutationInputValidator.ValidateGraphQLValueNode()` will be made _just after Authorization_ - request validation only occurs when the user is authorized to execute the mutation. - The method `MultipleMutationInputValidator.ValidateGraphQLValueNode()` makes recursive call to itself to parse out the entire create mutation request body based on an entity or relationship basis. Meaning, this method will be called on the top-level entity, followed by all the relationships for that entity, followed by all the relationships for the entity's relationships and so on.... - The actual validation for the input data for an entity (top-level entity or a nested entity) which includes column fields and relationship fields is done by the method `GraphQLRequestValidator.ValidateObjectFieldNodes()` which is called from the method `ValidateGraphQLValueNode()`. - The method `ValidateObjectFieldNodes()` contains logic to ensure that when the current entity acts as the referencing entity for the relationships with its target entity, then the current entity should not specify a value for the referencing fields - as the values for the referencing fields will be derived from the insertion in the referenced target entity. Note: The determination of referencing entity is done via a call to `MultipleCreateOrderHelper.GetReferencingEntityName()` method added via: #2056. - The method `ValidateObjectFieldNodes()` makes calls to 1. `ValidateAbsenceOfReferencingColumnsInTargetEntity()`: To validate that the current entity does not specify value for referencing columns to its parent entity (when parent entity is the referenced entity). 2. `ProcessRelationshipField()`: To process a relationship field and does several things. Please refer to method summary. 3. `ValidateAbsenceOfRepeatedReferencingColumn()`: This method is called for all the relationships with the current entity which are included in the request body. This ensures that referencing columns in the referencing entity (whichever entity, either source or target turns out as the referencing entity) does not hold references to multiple columns in the referenced entity - to avoid multiple sources of truth for the referencing column's value. (Issue: #2019) 4. `ValidatePresenceOfRequiredColumnsForInsertion()`: To ensure that we have one source of truth for values for referencing fields. 5. `ValidateRelationshipFields()`: Recursively perform the same validations for all the relationships with the current entity included in the mutation request body. ` ## How was this tested? - [x] Integration Tests - Added tests to `MultipleMutationIntegrationTests.cs` to validate: 1. Throwing exception for absence of source of truth for referencing column for create one mutations 2. Throwing exception for absence of source of truth for referencing column for create multiple mutations. 3. Throwing exception for presence of multiple sources of truth for referencing column for create one mutations. 4. Throwing exception for presence of multiple sources of truth for referencing column for create one mutations. ## Sample Request(s) 1. Request:  Response:  2. Request:  Response:  3. Request and response:  --------- Co-authored-by: Shyam Sundar J <[email protected]> Co-authored-by: Sean Leonard <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Why make this change?
Using multiple-create, we want to perform insertions in two related entities. The insertion is first performed on the referenced entity followed by insertion in the referencing entity, and we want to determine exactly this - which entities for the mutation act as the referencing/referenced entity, respectively. This PR's changes adds this very logic to determine the referencing/referenced entities.
What is this change?
MultipleCreateOrderHelper
which will expose public methodGetReferencingEntityName()
to get the referencing entity name for a related pair of source -> target entities for which insertion is to be performed in both the source and target entities.GetReferencingEntityName()
tries to determine the referencing entity name based on the metadata collected at startup using a call to the methodDetermineReferencingEntityBasedOnEntityRelationshipMetadata()
. This is achievable for 1:N relationships and for relationships backed by an FK constraint.DetermineReferencingEntityBasedOnEntityRelationshipMetadata()
method fails to determine the referencing entity name, a subsequent call is made to theDetermineReferencingEntityBasedOnRequestBody()
method which tries to determine the referencing entity name using the input data for the source and target entities.DetermineReferencingEntityBasedOnRequestBody()
method fails to determine the referencing entity name.How was this tested?
NOTE:
SqlMetadatProvider.cs
changes as those are from a different PR : Removing wrong FK definition #2051 but were needed for correct functioning of the code changes done in this PR.SqlInsertQueryStructure.cs::73