#622 - update to MarkLogic Spring Batch 1.7.0 #625

Merged
Changes from all commits
6 changes: 3 additions & 3 deletions examples/spring-batch/.gitignore
@@ -1,3 +1,3 @@
build/
.gradle
.tmp
build/
.gradle
.tmp
87 changes: 19 additions & 68 deletions examples/spring-batch/README.md
@@ -1,68 +1,19 @@
# Spring Batch Job using the Data Hub

This example demonstrates how to run a custom Spring Batch job against the Data Hub Framework.

Learning [Spring Batch](http://docs.spring.io/spring-batch/reference/html/spring-batch-intro.html) is beyond the scope of this README. But let's pretend you know enough to be dangerous.

This example uses Spring Batch to load data from the [Invoice Database](./invoices-sql-diagram.jpg), a relational database, into MarkLogic.

## What's the Big Idea?
The idea is pretty simple. You read the data into a tabular format using a SQL query (SELECT * FROM TABLE), transform the row into an XML document, and then write it into MarkLogic. But to properly integrate with the Data Hub Framework you need to run your data through an [input flow](https://github.com/marklogic-community/marklogic-data-hub/wiki/The-MarkLogic-Data-Hub-Overview#ingest). The MarkLogic Spring Batch project provides an interface called the DataHubItemWriter that runs the appropriate input flow.
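
The transform step above can be sketched as plain Java. This is a minimal, hypothetical illustration of turning one SQL result row into an XML string; the real job delegates the actual writing (and running of the input flow) to the MarkLogic Spring Batch DataHubItemWriter, and the class and method names here are invented for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RowToXml {

    // Hypothetical helper: converts one SQL result row (column name -> value)
    // into a simple XML document string whose root element is given by the caller.
    static String rowToXml(String rootElement, Map<String, String> row) {
        StringBuilder sb = new StringBuilder();
        sb.append('<').append(rootElement).append('>');
        for (Map.Entry<String, String> e : row.entrySet()) {
            // Each column becomes a child element named after the column.
            sb.append('<').append(e.getKey()).append('>')
              .append(e.getValue())
              .append("</").append(e.getKey()).append('>');
        }
        sb.append("</").append(rootElement).append('>');
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("invoiceId", "42");
        row.put("customer", "Acme");
        // prints <invoice><invoiceId>42</invoiceId><customer>Acme</customer></invoice>
        System.out.println(rowToXml("invoice", row));
    }
}
```

In the actual job the reader produces the rows, a processor performs a conversion like this one, and the DataHubItemWriter submits each document through the configured input flow.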

## How does it work?
This example includes a sample Spring Batch Configuration [SqlDbToHubJobConfig.java](https://github.com/marklogic-community/marklogic-data-hub/blob/develop/examples/spring-batch/src/main/java/com/marklogic/hub/job/SqlDbToHubJobConfig.java) that configures a job. To execute the job, we are utilizing the CommandLineJobRunner from the MarkLogic Spring Batch project.

This example depends on a properties file called job.properties. This project provides a sample job.properties file, but you may need to change the host or port numbers for your environment.
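
For orientation, a job.properties file for this example might look roughly like the sketch below. The property names `marklogic.port` and `marklogic.jobrepo.port` are mentioned elsewhere in this example; the host and port values here are placeholders, so substitute the values for your environment and treat any key not shown in your sample file as hypothetical.

```properties
# Hypothetical job.properties sketch; adjust hosts and ports for your environment.
marklogic.host=localhost
# Port of the data-hub-STAGING app server (destination for ingested documents)
marklogic.port=8010
# Port of the MarkLogic Job Repository app server
marklogic.jobrepo.port=8011
```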

## How do I Run this Example?

1. [Deploy the MarkLogic Job Repository](https://github.com/marklogic-community/marklogic-spring-batch/wiki/MarkLogicJobRepository). When the Spring Batch application starts, it needs to persist job metadata into MarkLogic.
1. Modify ./gradle.properties to meet your needs.
1. Deploy the Data Hub Framework `./gradlew mlDeploy`
1. Deploy Hub Framework Modules `./gradlew mlLoadModules`
1. Modify job.properties to point to your staging database and the job repo
1. Execute the job with the following Gradle command: `./gradlew ingestInvoices`. This reads invoice, customer, and item data from a relational database called H2 and ingests it into MarkLogic. During ingest the data is passed through the Invoice:ingest-invoice-db input flow.

### Optional
If you want to view the SQL data in H2, run the following command:

```
./gradlew runH2
```

This command should launch your web browser. If the page at the IP address in the address bar does not load, replace the IP address with `localhost`.

## How do I add this to my existing Data Hub Project?

In this example the Data Hub project artifacts have already been initialized. To add this capability to an existing Data Hub project, you only need to modify the build.gradle file.

```gradle
plugins {
// existing ids...
...

// add application
id 'application'
}

dependencies {
// existing dependencies

// add this one:
compile "com.marklogic:marklogic-spring-batch-core:1.0.1"
}

task ingestInvoices(type: JavaExec) {
classpath = sourceSets.main.runtimeClasspath
main = "com.marklogic.spring.batch.core.launch.support.CommandLineJobRunner"
args = [
"--job_path", "com.marklogic.hub.job.MigrateInvoicesConfiguration",
"--job_id", "job",
"--entity", "Invoice",
"--flow", "ingest-invoice-db"
]
}


```

# What does this example do?

This example demonstrates how to execute a simple DataHub input flow using the Spring Batch Framework. It provides an alternative to MarkLogic Content Pump.

Check out [Spring Batch](http://docs.spring.io/spring-batch/reference/html/spring-batch-intro.html) to learn more about the framework.

This example loads relational data from an [Invoice Database](./invoices-sql-diagram.jpg), transforms each row into an XML document, and submits each document to the input flow of a MarkLogic Data Hub.

## How do I run this example?

1. [Deploy the MarkLogic Job Repository](https://github.com/marklogic-community/marklogic-spring-batch/wiki/MarkLogicJobRepository). When the application starts, it needs to persist the job metadata into MarkLogic. Note that this metadata is different from Data Hub job metadata.
1. Start the Data Hub QuickStart and deploy the project from the examples/spring-batch directory. Verify that the Customer entity and customer-flow exist.
1. Verify the property values in src/test/resources/job.properties for your environment.
1. Verify the hosts for the STAGING database and mlJobRepo
1. Verify the destination port where the data will be written (marklogic.port); it should match the data-hub-STAGING app server.
1. Make sure that the port specified in Step 1 for your MarkLogic JobRepository is the same as the marklogic.jobrepo.port property.
1. Execute the job with the following Gradle command: `./gradlew ingestCustomers`. This task deploys the H2 database and kicks off the input-flow job.
1. Browse the data in the STAGING database in the QuickStart.

124 changes: 62 additions & 62 deletions examples/spring-batch/build.gradle
@@ -1,62 +1,62 @@
plugins {
id 'java'
id 'application'
id 'idea'
id 'net.saliman.properties' version '1.4.6'
id 'com.marklogic.ml-data-hub' version '2.0.4'
}
repositories {
mavenLocal()
jcenter()
maven {url 'http://developer.marklogic.com/maven2/'}
}
dependencies {
compile 'com.marklogic:marklogic-data-hub:2.0.4'
compile "com.marklogic:marklogic-spring-batch-core:1.0.1"
testCompile "com.h2database:h2:1.4.193"
testCompile "com.marklogic:marklogic-spring-batch-test:1.0.1"
runtime "com.h2database:h2:1.4.193"
runtime "ch.qos.logback:logback-classic:1.1.8"
runtime "org.slf4j:jcl-over-slf4j:1.7.22"
runtime "org.slf4j:slf4j-api:1.7.22"
}
mainClassName = "com.marklogic.spring.batch.core.launch.support.CommandLineJobRunner"
task ingestInvoices(type: JavaExec) {
classpath = sourceSets.main.runtimeClasspath
main = "com.marklogic.spring.batch.core.launch.support.CommandLineJobRunner"
args = [
"--job_path", "com.marklogic.hub.job.MigrateInvoicesConfiguration",
"--job_id", "job",
"--entity", "Invoice",
"--flow", "ingest-invoice-db",
"--hubJobId", UUID.randomUUID().toString()
]
}
// This task is for running the examples
task runH2DataManager(type: JavaExec) {
classpath = configurations.runtime
main = "org.h2.tools.Console"
args = [
"-url", "jdbc:h2:file:./input/sample",
"-user", "sa"
]
}
task loadH2Data(type: JavaExec) {
classpath = configurations.runtime
main = "org.h2.tools.RunScript"
args = [
"-url", "jdbc:h2:file:./input/sample",
"-user", "sa",
"-script", "./src/test/resources/db/sampledata.sql"
]
}
ingestInvoices.dependsOn loadH2Data
plugins {
id 'java'
id 'application'
id 'idea'
id 'net.saliman.properties' version '1.4.6'
id 'com.marklogic.ml-data-hub' version '2.0.4'
}

repositories {
jcenter()
maven {url 'http://developer.marklogic.com/maven2/'}
}

dependencies {
compile 'com.marklogic:marklogic-data-hub:2.0.4'
compile "com.marklogic:marklogic-spring-batch-core:1.7.0"
compile 'com.marklogic:spring-batch-rdbms:1.7.0'

testCompile "com.h2database:h2:1.4.193"
testCompile "com.marklogic:marklogic-spring-batch-test:1.7.0"

runtime "com.h2database:h2:1.4.193"
runtime "ch.qos.logback:logback-classic:1.1.8"
runtime "org.slf4j:jcl-over-slf4j:1.7.22"
runtime "org.slf4j:slf4j-api:1.7.22"
}

mainClassName = "com.marklogic.spring.batch.core.launch.support.CommandLineJobRunner"

task runInputCustomersFlow(type: JavaExec) {
classpath = sourceSets.test.runtimeClasspath
main = "com.marklogic.spring.batch.core.launch.support.CommandLineJobRunner"
args = [
"--job_path", "com.marklogic.hub.job.SqlDbToHubJobConfig",
"--job_id", "sqlDbToHubJob",
"--entity", "customer",
"--flow", "customerInput",
"--hubJobId", UUID.randomUUID().toString()
]
}

// This task is for running the examples
task runH2DataManager(type: JavaExec) {
classpath = configurations.runtime
main = "org.h2.tools.Console"
args = [
"-url", "jdbc:h2:file:./input/sample",
"-user", "sa"
]
}

task loadH2Data(type: JavaExec) {
classpath = configurations.runtime
main = "org.h2.tools.RunScript"
args = [
"-url", "jdbc:h2:file:./input/sample",
"-user", "sa",
"-script", "./src/test/resources/db/sampledata.sql"
]
}

runInputCustomersFlow.dependsOn loadH2Data

This file was deleted.

This file was deleted.

28 changes: 0 additions & 28 deletions examples/spring-batch/entity-config/entity-options.xml

This file was deleted.

72 changes: 0 additions & 72 deletions examples/spring-batch/gradle.properties

This file was deleted.

12 changes: 6 additions & 6 deletions examples/spring-batch/gradle/wrapper/gradle-wrapper.properties
@@ -1,6 +1,6 @@
#Thu Aug 03 23:06:41 EDT 2017
distributionBase=GRADLE_USER_HOME
distributionPath=wrapper/dists
zipStoreBase=GRADLE_USER_HOME
zipStorePath=wrapper/dists
distributionUrl=https\://services.gradle.org/distributions/gradle-4.0-bin.zip
#Thu Aug 03 23:06:41 EDT 2017
distributionBase=GRADLE_USER_HOME
distributionPath=wrapper/dists
zipStoreBase=GRADLE_USER_HOME
zipStorePath=wrapper/dists
distributionUrl=https\://services.gradle.org/distributions/gradle-4.0-bin.zip