
DHS e2e example changes #1585

Merged (9 commits, Dec 21, 2018)

Conversation

bsrikan (Contributor) commented Nov 27, 2018

This PR contains changes to the DHS e2e example: renaming flows so they make sense, formatting the README, and adding more information on how to look up inserted documents.



___You can verify via REST port 8004 for ingested/harmonized docs___
# Running Data Hub Service end-end #


Suggested change
# Running Data Hub Service end-end #
# Running Data Hub Service end-to-end #

___You can verify via REST port 8004 for ingested/harmonized docs___
# Running Data Hub Service end-end #

There are 2 projects one for each DHS and DSF. DHS is the project to run DHF flows in the stack spun up by DHS. Whereas DSF is the project that consumes curated data in the FINAL database. Commands to be executed in sequence are as under:


Suggested change
There are 2 projects one for each DHS and DSF. DHS is the project to run DHF flows in the stack spun up by DHS. Whereas DSF is the project that consumes curated data in the FINAL database. Commands to be executed in sequence are as under:
There are two projects — one for running DHF flows in Data Hub Service (DHS), and one for consuming data from the final database with Data Services (DSF).


There are 2 projects one for each DHS and DSF. DHS is the project to run DHF flows in the stack spun up by DHS. Whereas DSF is the project that consumes curated data in the FINAL database. Commands to be executed in sequence are as under:

## Pre-req ##


Suggested change
## Pre-req ##
## Prerequisites ##

There are 2 projects one for each DHS and DSF. DHS is the project to run DHF flows in the stack spun up by DHS. Whereas DSF is the project that consumes curated data in the FINAL database. Commands to be executed in sequence are as under:

## Pre-req ##
Gradle 4.x+ installed globally.


Suggested change
Gradle 4.x+ installed globally.
Gradle 4.x+ is installed globally.


I don't think that you should use a global gradle install step. gradlew is much safer to use and less invasive. I actually have never installed gradle on my laptop.

Contributor Author

OK. I added it because I was running into issues with gradle versions, since DHF expects 3.4+ and DSF expected 3.8+, I think. So I installed 4.x, just used that, and thought it would be easier. I can include the wrapper within these projects so folks trying this out don't have to install gradle.

Contributor

DHF requires a minimum of 3.4, but we support 4.x. Packaging one would be even easier, I agree, and help insulate the DHS from 'DHF-esque' deployment specifics.
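Packaging the wrapper as discussed could look like this minimal build.gradle fragment (a sketch only; the pinned version follows the 4.2.1 distribution referenced in the README's prerequisite steps, and is an assumption, not the PR's actual change):

```groovy
// Sketch: define a wrapper task so users run ./gradlew instead of relying on a
// global gradle install. Running `gradle wrapper` once generates gradlew and
// gradlew.bat, which can then be committed to the project.
task wrapper(type: Wrapper) {
    gradleVersion = '4.2.1'
}
```

With the generated `gradlew` scripts committed, the global install and PATH steps below become unnecessary.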

* `export PATH=$PATH:<unzipped dir>/gradle4.2.1/bin`
* source ~/.bash_profile

There is a gradle task “importAllCustomers” to ingest source documents. So you can either run that task or use your locally installed mlcp.sh as described later.


Suggested change
There is a gradle task “importAllCustomers” to ingest source documents. So you can either run that task or use your locally installed mlcp.sh as described later.
Install MarkLogic Content Pump (mlcp). Note: This example uses mlcp. If you prefer not to use mlcp, you can use the gradle task “importAllCustomers” to ingest source documents.


## Pre-req ##
Gradle 4.x+ installed globally.
* wget https://services.gradle.org/distributions/gradle-4.2.1-bin.zip


Does 'wget' mean something or is it a typo?


wget is a common utility for fetching things from the web, an alternative to curl.

## Pre-req ##
Gradle 4.x+ installed globally.
* wget https://services.gradle.org/distributions/gradle-4.2.1-bin.zip
* unzip to a dir of choice


Suggested change
* unzip to a dir of choice
* Unzip to a directory of your choice

Gradle 4.x+ installed globally.
* wget https://services.gradle.org/distributions/gradle-4.2.1-bin.zip
* unzip to a dir of choice
* update env var PATH in bash_profile


Suggested change
* update env var PATH in bash_profile
* Update the PATH environment variable in bash_profile


I appreciate that you documented everything so well, but this is specific to bash, and .bash_profile is specific to some operating systems. Not important, of course.

* unzip to a dir of choice
* update env var PATH in bash_profile
* `export PATH=$PATH:<unzipped dir>/gradle4.2.1/bin`
* source ~/.bash_profile


Is this a command they have to run or something else?

Contributor Author

Yes, it's a command to run so that the changes in .bash_profile take effect.

* source ~/.bash_profile


After creating stack in AWS make sure you have created users with appropriate roles. Update DHS/gradle.properties and DSF/gradle.properties to use them.


Suggested change
After creating stack in AWS make sure you have created users with appropriate roles. Update DHS/gradle.properties and DSF/gradle.properties to use them.
After creating your stack in AWS, make sure you have created users with appropriate roles. Update DHS/gradle.properties and DSF/gradle.properties to use them.


After creating stack in AWS make sure you have created users with appropriate roles. Update DHS/gradle.properties and DSF/gradle.properties to use them.

* For DHS project


Suggested change
* For DHS project
* For the DHS project

* For DHS project
* User with flowDeveloper role can create, load into modules DB and run flows
* User with flowOperator role can only run flows
* For DSF project


Suggested change
* For DSF project
* For the DSF project

After creating stack in AWS make sure you have created users with appropriate roles. Update DHS/gradle.properties and DSF/gradle.properties to use them.

* For DHS project
* User with flowDeveloper role can create, load into modules DB and run flows


Suggested change
* User with flowDeveloper role can create, load into modules DB and run flows
* Users with the flowDeveloper role can load flows into the modules database and run them


* For DHS project
* User with flowDeveloper role can create, load into modules DB and run flows
* User with flowOperator role can only run flows


Suggested change
* User with flowOperator role can only run flows
* Users with the flowOperator role can only run flows

* User with flowDeveloper role can create, load into modules DB and run flows
* User with flowOperator role can only run flows
* For DSF project
* User with endpointDeveloper role can load into modules DB and call the DSF API
@sbayatpur Nov 28, 2018

Suggested change
* User with endpointDeveloper role can load into modules DB and call the DSF API
* Users with endpointDeveloper role can load documents into the modules database and call the DSF API

* User with flowOperator role can only run flows
* For DSF project
* User with endpointDeveloper role can load into modules DB and call the DSF API
* User with endpointUser role can only call the DSF API


Suggested change
* User with endpointUser role can only call the DSF API
* Users with the endpointUser role can only call the DSF API



## Assumptions and things to note ##
* This project assumes that DHS environment is already provisioned. All the app servers, databases are provisioned and required roles created as described in pre-req above


Suggested change
* This project assumes that DHS environment is already provisioned. All the app servers, databases are provisioned and required roles created as described in pre-req above
* This project assumes that the DHS environment is already provisioned: all the app servers and databases are provisioned, and the required roles are created as described in the prerequisites above

## Assumptions and things to note ##
* This project assumes that DHS environment is already provisioned. All the app servers, databases are provisioned and required roles created as described in pre-req above
* This example was executed from bastion host
* DHS enables you to configure the endpoints to be private or public. If they are public, you can run this project from your laptop. If the endpoints are private, then these hosts are only accessible from the VPC that is peered to the MarkLogic VPC (This can be accessed from your local by ssh tunneling). In either case please update mlHost in DHS/gradle.properties and DSF/gradle.properties to use Flows endpoint.


Is ssh tunneling in addition to using a bastion host?

Contributor Author

Not an addition but an alternative to the bastion host, so you can run things from your laptop.
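As a concrete illustration of the tunneling alternative mentioned above, an SSH config entry could forward the private endpoint's port to the laptop (a sketch only; the host names and user are placeholders, not values from this project):

```
# Sketch: hypothetical ~/.ssh/config entry for reaching a private DHS endpoint.
# Forwards local port 8004 through the bastion host to the curation endpoint,
# so mlHost in gradle.properties can point at localhost.
Host dhs-tunnel
    HostName BASTION_HOST
    User ec2-user
    LocalForward 8004 CURATION_ENDPOINT:8004
```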

task importAllCustomers(type: com.marklogic.gradle.task.MlcpTask) {
doFirst {
classpath = configurations.mlcp
input_file_path = "/Users/sbalasub/git/marklogic-data-hub/examples/DHS-e2e/DHS/input/json/customers/"

How will you accommodate this use of your home directory?

Contributor Author

I have asked folks to edit the input file path in the README.
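One way to avoid the hardcoded home-directory path entirely would be to resolve the input path against the project checkout (a sketch following the shape of the quoted build.gradle task above; using `projectDir` is an assumption, not the PR's actual fix):

```groovy
// Sketch: same MlcpTask, but with the input path resolved relative to the
// project directory instead of a developer's home directory.
task importAllCustomers(type: com.marklogic.gradle.task.MlcpTask) {
    doFirst {
        classpath = configurations.mlcp
        input_file_path = "${projectDir}/input/json/customers/"
    }
}
```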

import com.fasterxml.jackson.databind.JsonNode;


public class testCustomer {

In Java, always capitalize class names.

Contributor Author

Will do.
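The rename amounts to following Java's standard UpperCamelCase convention for class names; a minimal self-contained sketch (the class body here is illustrative only, not the example project's real code):

```java
// Sketch: class renamed from testCustomer to TestCustomer per Java conventions.
public class TestCustomer {
    private final String name;

    public TestCustomer(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }

    public static void main(String[] args) {
        System.out.println(new TestCustomer("Alice").getName());
    }
}
```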

* DHS enables you to configure the endpoints to be private or public. If they are public, you can run this project from your laptop. If the endpoints are private, then these hosts are only accessible from the VPC that is peered to the MarkLogic VPC (This can be accessed from your local by ssh tunneling). In either case please update mlHost in DHS/gradle.properties and DSF/gradle.properties to use Flows endpoint.


# Steps to run end to end #


Suggested change
# Steps to run end to end #
# Steps #

## Getting your flows in DHS ##
1. cd `<path to DHS>`
2. Install data-hub core MODULES
1. gradle hubInstallModules


Suggested change
1. gradle hubInstallModules
1. Run gradle hubInstallModules

1. cd `<path to DHS>`
2. Install data-hub core MODULES
1. gradle hubInstallModules
2. data-hub-MODULES should have 133 documents. You can verify this in your browser:


Suggested change
2. data-hub-MODULES should have 133 documents. You can verify this in your browser:
2. Verify that data-hub-MODULES has 133 documents from your browser:

2. data-hub-MODULES should have 133 documents. You can verify this in your browser:
___http://CURATION_ENDPOINT:8004/v1/search?database=data-hub-MODULES___
3. Load your modules for input/harmonization flows
1. gradle mlLoadModules


Suggested change
1. gradle mlLoadModules
1. Run gradle mlLoadModules

Contributor Author

It's a command, so I didn't want to include it in a sentence. The sub-bullet is the command to run to accomplish what the main bullet describes.

___http://CURATION_ENDPOINT:8004/v1/search?database=data-hub-MODULES___
3. Load your modules for input/harmonization flows
1. gradle mlLoadModules
2. data-hub-MODULES should now have 145 documents. You can verify this in your browser:


Suggested change
2. data-hub-MODULES should now have 145 documents. You can verify this in your browser:
2. Verify that data-hub-MODULES has 145 documents from your browser:

1. gradle mlLoadModules
2. data-hub-MODULES should now have 145 documents. You can verify this in your browser:
___http://CURATION_ENDPOINT:8004/v1/search?database=data-hub-MODULES___
4. Run input flow


Suggested change
4. Run input flow
4. Run the input flow

___Alternately you can run___

2. gradle importAllCustomers
1. Ensure to update path to input documents in DHS/build.gradle where the task is defined


Suggested change
1. Ensure to update path to input documents in DHS/build.gradle where the task is defined
- Make sure to update the path to the input documents in DHS/build.gradle, where the task is defined


2. gradle importAllCustomers
1. Ensure to update path to input documents in DHS/build.gradle where the task is defined
3. Post ingestion, there should be 11 documents in data-hub-STAGING:


Suggested change
3. Post ingestion, there should be 11 documents in data-hub-STAGING:
3. Post ingestion, verify there are 11 documents in data-hub-STAGING:

1. Ensure to update path to input documents in DHS/build.gradle where the task is defined
3. Post ingestion, there should be 11 documents in data-hub-STAGING:
___http://CURATION_ENDPOINT:8004/v1/search?database=data-hub-STAGING___
5. Run harmonization flow


Suggested change
5. Run harmonization flow
5. Run the harmonization flow

3. Post ingestion, there should be 11 documents in data-hub-STAGING:
___http://CURATION_ENDPOINT:8004/v1/search?database=data-hub-STAGING___
5. Run harmonization flow
1. gradle hubRunFlow -PentityName=Customer -PflowName=customerHarmonize


Suggested change
1. gradle hubRunFlow -PentityName=Customer -PflowName=customerHarmonize
1. Run gradle hubRunFlow -PentityName=Customer -PflowName=customerHarmonize

___http://CURATION_ENDPOINT:8004/v1/search?database=data-hub-STAGING___
5. Run harmonization flow
1. gradle hubRunFlow -PentityName=Customer -PflowName=customerHarmonize
2. Post harmonization, there should be 11 documents in data-hub-FINAL:


Suggested change
2. Post harmonization, there should be 11 documents in data-hub-FINAL:
2. Verify there are 11 documents in data-hub-FINAL:

2. Post harmonization, there should be 11 documents in data-hub-FINAL:
___http://CURATION_ENDPOINT:8004/v1/search?database=data-hub-FINAL___

## Consuming curated data from FINAL database ##


Suggested change
## Consuming curated data from FINAL database ##
## Consuming curated data from the final database ##

___http://CURATION_ENDPOINT:8004/v1/search?database=data-hub-FINAL___

## Consuming curated data from FINAL database ##
1. cd `<path to DSF>`


Suggested change
1. cd `<path to DSF>`
1. Navigate to the DSF directory (cd `<path to DSF>`)

## Consuming curated data from FINAL database ##
1. cd `<path to DSF>`
2. Load your APIs into data-hub-MODULES database


Suggested change
2. Load your APIs into data-hub-MODULES database
2. Load your APIs into the data-hub-MODULES database

## Consuming curated data from FINAL database ##
1. cd `<path to DSF>`
2. Load your APIs into data-hub-MODULES database
1. gradle mlLoadModules


Suggested change
1. gradle mlLoadModules
1. Run gradle mlLoadModules

1. cd `<path to DSF>`
2. Load your APIs into data-hub-MODULES database
1. gradle mlLoadModules
1. If you are using same project to run against another AWS stack, delete `module-timestamps.properties` under `build/ml-javaclient-util` dir


Should they do this after or before running gradle mlLoadModules?

2. Load your APIs into data-hub-MODULES database
1. gradle mlLoadModules
1. If you are using same project to run against another AWS stack, delete `module-timestamps.properties` under `build/ml-javaclient-util` dir
2. data-hub-MODULES should now have 152 documents. You can verify this in your browser:


Suggested change
2. data-hub-MODULES should now have 152 documents. You can verify this in your browser:
2. Verify that data-hub-MODULES has 152 documents from your browser:

2. data-hub-MODULES should now have 152 documents. You can verify this in your browser:
___http://CURATION_ENDPOINT:8004/v1/search?database=data-hub-MODULES___
3. Call the API. The API runs a query on FINAL database to return all the Customers who have "Sales" in their title
1. gradle runMain
@sbayatpur Nov 28, 2018

Is there a way to indent without adding a bullet point or number in front of 'gradle runMain'? Also, for all these commands, can we format them so they appear differently than normal text?
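To the indentation question: in Markdown, a command can be nested under a numbered item without adding a bullet or number by indenting a fenced code block to the item's content level, which also renders it in monospace (a sketch of the convention, not the PR's final formatting):

````markdown
3. Call the API

   ```
   gradle runMain
   ```
````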

4. Run input flow
1. mlcp.sh import -mode "local" -host "`Ingest/Flows endpoint`" -port "8006" -username "xx" -password "yy" -input_file_path "`path to DHS/input/json/customers/`" -input_file_type "documents" -output_collections "Customer,DHS" -output_permissions "rest-reader,read,rest-writer,update" -output_uri_replace "`path to DHS/input/json`,''" -document_type "json" -transform_module "/data-hub/4/transforms/mlcp-flow-transform.sjs" -transform_namespace "http://marklogic.com/data-hub/mlcp-flow-transform" -transform_param "entity-name=Customer,flow-name=customerInput" -restrict_hosts true

___Alternately you can run___


Suggested change
___Alternately you can run___
___Alternatively you can run___


___Alternately you can run___

2. gradle importAllCustomers


This shouldn't be #2 if it's not a sequential step... you can try adding a bullet underneath 1. instead.

@bsrikan bsrikan force-pushed the develop branch 2 times, most recently from a29b8c5 to 593ee3d Compare December 5, 2018 23:35
@bsrikan bsrikan force-pushed the develop branch 4 times, most recently from 203898d to 4fab171 Compare December 20, 2018 17:52
@aebadirad (Contributor)
@bsrikan Are we good on this?

@bsrikan (Contributor, Author) commented Dec 21, 2018

@aebadirad All ready to merge.

@aebadirad aebadirad merged commit af5dddd into Marklogic-retired:develop Dec 21, 2018