Feature/473 msb example #480

Merged
4 changes: 4 additions & 0 deletions .gitignore
@@ -23,3 +23,7 @@ node_modules/
.tmp/
*.sublime-*
.vscode/
*.iml
*.ipr
*.iws
*.db
43 changes: 17 additions & 26 deletions examples/spring-batch/README.md
@@ -4,26 +4,23 @@ This example demonstrates how to run a custom Spring Batch job against the Data

Learning [Spring Batch](http://docs.spring.io/spring-batch/reference/html/spring-batch-intro.html) is beyond the scope of this README. But let's pretend you know enough to be dangerous.

Now you want to use Spring Batch to load a bunch of data into MarkLogic. Maybe that data is coming from a message queue. Maybe it's a bunch of files in a folder. It can be anything really.
Now you want to use Spring Batch to load a bunch from a relational database into MarkLogic.
Review comment (Contributor): make this "to load data from a relational database into MarkLogic."


## What's the Big Idea?
The idea is pretty simple. You read the data, do a little processing (maybe), and then write it into MarkLogic. But to properly integrate with the Data Hub Framework you will want to run your data through an [input flow](https://github.com/marklogic-community/marklogic-data-hub/wiki/The-MarkLogic-Data-Hub-Overview#ingest).
The idea is pretty simple. You read the data into a tabular format using a SQL query (SELECT * FROM TABLE), transform the row into an XML document, and then write it into MarkLogic. But to properly integrate with the Data Hub Framework you need to run your data through an [input flow](https://github.com/marklogic-community/marklogic-data-hub/wiki/The-MarkLogic-Data-Hub-Overview#ingest). The MarkLogic Spring Batch project provides an interface called the DataHubItemWriter that runs the appropriate input flow.
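
To make the "transform the row into an XML document" step concrete, here is a minimal sketch in plain Java. It is not the code in this pull request and does not use the MarkLogic Spring Batch API: the class name `RowToXmlSketch`, the method `rowToXml`, and the sample column names are hypothetical, and the row is assumed to arrive as a map of column name to value with column names that are already valid XML element names.

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper, for illustration only.
class RowToXmlSketch {

    // Turns one tabular row into a flat XML document: one child element per column.
    // Assumes every column name is a valid XML element name.
    public static String rowToXml(String rootName, Map<String, Object> row)
            throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        xml.writeStartElement(rootName);
        for (Map.Entry<String, Object> column : row.entrySet()) {
            xml.writeStartElement(column.getKey());
            Object value = column.getValue();
            if (value != null) {
                // writeCharacters escapes markup characters such as & and <
                xml.writeCharacters(value.toString());
            }
            xml.writeEndElement();
        }
        xml.writeEndElement();
        xml.flush();
        xml.close();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Sample row with made-up columns; LinkedHashMap keeps column order.
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("ID", 1);
        row.put("CUSTOMER", "Smith & Sons");
        System.out.println(rowToXml("invoice", row));
    }
}
```

In a real job this logic would sit in the processing step between the SQL reader and the writer that runs the input flow; the sketch only shows the row-to-document mapping.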

## How does it work?
This example includes a sample Spring Boot Configuration [LoadAndRunFlow.java](https://github.com/marklogic-community/marklogic-data-hub/blob/develop/examples/spring-batch/src/main/java/example/LoadAndRunFlow.java) that configures a job to ingest some xml files and run a flow.
This example includes a sample Spring Boot Configuration [SqlDbToHubJobConfig.java](https://github.com/marklogic-community/marklogic-data-hub/blob/develop/examples/spring-batch/src/main/java/com/marklogic/hub/job/SqlDbToHubJobConfig.java) that configures a job.

This example depends on a runtime class **com.marklogic.spring.batch.hub.HubJobRunner** that is responsible for reading command line parameters and connects to the Data Hub by reading your gradle project files.
This example depends on a properties file called job.properties. This project provides a sample job.properties file but you may need to change the host or port numbers for your environment.

## How do I Run this Example?

First you compile it.
`gradle ingestInvoices`

`gradle installDist`

Then you launch it.

`./run.sh`
This reads invoice, customer, item, and customer data from a relational database called H2. The data can be viewed in H2 by calling the following command.

`gradle runH2`

## How do I add this to my existing Data Hub Project?

@@ -42,26 +39,20 @@ dependencies {
// existing dependencies

// add this one:
compile "com.marklogic:marklogic-spring-batch-core:0.6.0"
compile "com.marklogic:marklogic-spring-batch-core:1.0.1"
}


// add the distributions section
distributions {
main {
baseName = 'baseJob'
}
task ingestInvoices(type: JavaExec) {
classpath = sourceSets.main.runtimeClasspath
main = "com.marklogic.spring.batch.core.launch.support.CommandLineJobRunner"
args = [
"--job_path", "com.marklogic.hub.job.MigrateInvoicesConfiguration",
"--job_id", "job",
"--entity", "Invoice",
"--flow", "ingest-invoice-db"
]
}

// add the mainClassName to specify the HubJobRunner
mainClassName = "com.marklogic.spring.batch.hub.HubJobRunner"

```

Then drop in your custom Java Config class in src/main/java/.....

Next you simply Compile your code with `gradle installDist`.

Then you can run take a look at the [run.sh script](https://github.com/marklogic-community/marklogic-data-hub/blob/develop/examples/spring-batch/run.sh) to see how to run your custom config.

Note that this is not the only way to run it. It's merely the easiest. Java Ninjas can directly call the main() function of the [HubJobRunner class](https://github.com/marklogic-community/marklogic-data-hub/blob/develop/marklogic-data-hub/src/main/java/com/marklogic/spring/batch/hub/HubJobRunner.java). Or you can make your own class to start up Spring Batch by reading the HubJobRunner code and doing something similar.
53 changes: 38 additions & 15 deletions examples/spring-batch/build.gradle
@@ -1,6 +1,7 @@
plugins {
id 'java'
id 'application'
id 'idea'
id 'net.saliman.properties' version '1.4.6'
id 'com.marklogic.ml-data-hub' version '2.0.0-rc.1'
}
@@ -13,26 +14,48 @@ repositories {

dependencies {
compile 'com.marklogic:marklogic-data-hub:2.0.0-rc.1'
compile "com.marklogic:marklogic-spring-batch-core:0.7.4"
compile 'com.marklogic:ml-javaclient-util:4.0.alpha4'
compile "com.marklogic:marklogic-spring-batch-core:1.0.1"

testCompile "com.marklogic:marklogic-spring-batch-test:0.7.4"
runtime "com.marklogic:marklogic-spring-batch-core:0.7.2"
testCompile "com.h2database:h2:1.4.193"
testCompile "com.marklogic:marklogic-spring-batch-test:1.0.1"

runtime "com.h2database:h2:1.4.193"
runtime "ch.qos.logback:logback-classic:1.1.8"
runtime "org.slf4j:jcl-over-slf4j:1.7.22"
runtime "org.slf4j:slf4j-api:1.7.22"
}

mainClassName = "com.marklogic.spring.batch.Main"
mainClassName = "com.marklogic.spring.batch.core.launch.support.CommandLineJobRunner"

task importMonsters(type: JavaExec) {
task ingestInvoices(type: JavaExec) {
classpath = sourceSets.main.runtimeClasspath
main = "com.marklogic.spring.batch.Main"
main = "com.marklogic.spring.batch.core.launch.support.CommandLineJobRunner"
args = [
"--job_path", "com.marklogic.hub.job.MigrateInvoicesConfiguration",
"--job_id", "job",
"--entity", "Invoice",
"--flow", "ingest-invoice-db"
]
}

// This task is for running the examples
task runH2DataManager(type: JavaExec) {
classpath = configurations.runtime
main = "org.h2.tools.Console"
args = [
"--config", "example.LoadAndRunFlow",
"--project_dir", ".",
"--env", "local",
"--input_file_path", "./input",
"--input_file_pattern", ".*\\.xml",
"--entity_name", "Monster",
"--flow_name", "ingest-monster",
"--chunk", "100"
"-url", "jdbc:h2:file:./input/sample",
"-user", "sa"
]
}

task loadH2Data(type: JavaExec) {
classpath = configurations.runtime
main = "org.h2.tools.RunScript"
args = [
"-url", "jdbc:h2:file:./input/sample",
"-user", "sa",
"-script", "./src/test/resources/db/sampledata.sql"
]
}

ingestInvoices.dependsOn loadH2Data
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
{"path-namespace":[{"prefix":"es", "namespace-uri":"http://marklogic.com/entity-services"}], "element-word-lexicon":[], "range-path-index":[], "range-element-index":[]}
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
{"path-namespace":[{"prefix":"es", "namespace-uri":"http://marklogic.com/entity-services"}], "element-word-lexicon":[], "range-path-index":[], "range-element-index":[]}
28 changes: 28 additions & 0 deletions examples/spring-batch/entity-config/entity-options.xml
@@ -0,0 +1,28 @@
<?xml version="1.0" encoding="UTF-8"?>
<options xmlns="http://marklogic.com/appservices/search">
<constraint name="Collection">
<collection/>
</constraint>
<constraint name="entity-type" xmlns:search="http://marklogic.com/appservices/search">
<value>
<element ns="http://marklogic.com/entity-services" name="title"/>
</value>
</constraint>
<!--Uncomment to return no results for a blank search, rather than the default of all results
<term xmlns="http://marklogic.com/appservices/search">
<empty apply="no-results"/>
</term>
-->
<values name="uris">
<uri/>
</values>
<!--Change to 'filtered' to exclude false-positives in certain searches-->
<search-option>unfiltered</search-option>
<!--Modify document extraction to change results returned-->
<extract-document-data selected="include">
<extract-path>/*:envelope/*:instance/(Invoice)</extract-path>
</extract-document-data>
<return-facets>true</return-facets>
<!--To return snippets, comment out or remove this option-->
<transform-results apply="empty-snippet"/>
</options>
Binary file not shown.
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
#Thu Aug 03 23:06:41 EDT 2017
distributionBase=GRADLE_USER_HOME
distributionPath=wrapper/dists
zipStoreBase=GRADLE_USER_HOME
zipStorePath=wrapper/dists
distributionUrl=https\://services.gradle.org/distributions/gradle-4.0-bin.zip
172 changes: 172 additions & 0 deletions examples/spring-batch/gradlew
@@ -0,0 +1,172 @@
#!/usr/bin/env sh

##############################################################################
##
## Gradle start up script for UN*X
##
##############################################################################

# Attempt to set APP_HOME
# Resolve links: $0 may be a link
PRG="$0"
# Need this for relative symlinks.
while [ -h "$PRG" ] ; do
ls=`ls -ld "$PRG"`
link=`expr "$ls" : '.*-> \(.*\)$'`
if expr "$link" : '/.*' > /dev/null; then
PRG="$link"
else
PRG=`dirname "$PRG"`"/$link"
fi
done
SAVED="`pwd`"
cd "`dirname \"$PRG\"`/" >/dev/null
APP_HOME="`pwd -P`"
cd "$SAVED" >/dev/null

APP_NAME="Gradle"
APP_BASE_NAME=`basename "$0"`

# Add default JVM options here. You can also use JAVA_OPTS and GRADLE_OPTS to pass JVM options to this script.
DEFAULT_JVM_OPTS=""

# Use the maximum available, or set MAX_FD != -1 to use that value.
MAX_FD="maximum"

warn () {
echo "$*"
}

die () {
echo
echo "$*"
echo
exit 1
}

# OS specific support (must be 'true' or 'false').
cygwin=false
msys=false
darwin=false
nonstop=false
case "`uname`" in
CYGWIN* )
cygwin=true
;;
Darwin* )
darwin=true
;;
MINGW* )
msys=true
;;
NONSTOP* )
nonstop=true
;;
esac

CLASSPATH=$APP_HOME/gradle/wrapper/gradle-wrapper.jar

# Determine the Java command to use to start the JVM.
if [ -n "$JAVA_HOME" ] ; then
if [ -x "$JAVA_HOME/jre/sh/java" ] ; then
# IBM's JDK on AIX uses strange locations for the executables
JAVACMD="$JAVA_HOME/jre/sh/java"
else
JAVACMD="$JAVA_HOME/bin/java"
fi
if [ ! -x "$JAVACMD" ] ; then
die "ERROR: JAVA_HOME is set to an invalid directory: $JAVA_HOME

Please set the JAVA_HOME variable in your environment to match the
location of your Java installation."
fi
else
JAVACMD="java"
which java >/dev/null 2>&1 || die "ERROR: JAVA_HOME is not set and no 'java' command could be found in your PATH.

Please set the JAVA_HOME variable in your environment to match the
location of your Java installation."
fi

# Increase the maximum file descriptors if we can.
if [ "$cygwin" = "false" -a "$darwin" = "false" -a "$nonstop" = "false" ] ; then
MAX_FD_LIMIT=`ulimit -H -n`
if [ $? -eq 0 ] ; then
if [ "$MAX_FD" = "maximum" -o "$MAX_FD" = "max" ] ; then
MAX_FD="$MAX_FD_LIMIT"
fi
ulimit -n $MAX_FD
if [ $? -ne 0 ] ; then
warn "Could not set maximum file descriptor limit: $MAX_FD"
fi
else
warn "Could not query maximum file descriptor limit: $MAX_FD_LIMIT"
fi
fi

# For Darwin, add options to specify how the application appears in the dock
if $darwin; then
GRADLE_OPTS="$GRADLE_OPTS \"-Xdock:name=$APP_NAME\" \"-Xdock:icon=$APP_HOME/media/gradle.icns\""
fi

# For Cygwin, switch paths to Windows format before running java
if $cygwin ; then
APP_HOME=`cygpath --path --mixed "$APP_HOME"`
CLASSPATH=`cygpath --path --mixed "$CLASSPATH"`
JAVACMD=`cygpath --unix "$JAVACMD"`

# We build the pattern for arguments to be converted via cygpath
ROOTDIRSRAW=`find -L / -maxdepth 1 -mindepth 1 -type d 2>/dev/null`
SEP=""
for dir in $ROOTDIRSRAW ; do
ROOTDIRS="$ROOTDIRS$SEP$dir"
SEP="|"
done
OURCYGPATTERN="(^($ROOTDIRS))"
# Add a user-defined pattern to the cygpath arguments
if [ "$GRADLE_CYGPATTERN" != "" ] ; then
OURCYGPATTERN="$OURCYGPATTERN|($GRADLE_CYGPATTERN)"
fi
# Now convert the arguments - kludge to limit ourselves to /bin/sh
i=0
for arg in "$@" ; do
CHECK=`echo "$arg"|egrep -c "$OURCYGPATTERN" -`
CHECK2=`echo "$arg"|egrep -c "^-"` ### Determine if an option

if [ $CHECK -ne 0 ] && [ $CHECK2 -eq 0 ] ; then ### Added a condition
eval `echo args$i`=`cygpath --path --ignore --mixed "$arg"`
else
eval `echo args$i`="\"$arg\""
fi
i=$((i+1))
done
case $i in
(0) set -- ;;
(1) set -- "$args0" ;;
(2) set -- "$args0" "$args1" ;;
(3) set -- "$args0" "$args1" "$args2" ;;
(4) set -- "$args0" "$args1" "$args2" "$args3" ;;
(5) set -- "$args0" "$args1" "$args2" "$args3" "$args4" ;;
(6) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" ;;
(7) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" ;;
(8) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" "$args7" ;;
(9) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" "$args7" "$args8" ;;
esac
fi

# Escape application args
save () {
for i do printf %s\\n "$i" | sed "s/'/'\\\\''/g;1s/^/'/;\$s/\$/' \\\\/" ; done
echo " "
}
APP_ARGS=$(save "$@")

# Collect all arguments for the java command, following the shell quoting and substitution rules
eval set -- $DEFAULT_JVM_OPTS $JAVA_OPTS $GRADLE_OPTS "\"-Dorg.gradle.appname=$APP_BASE_NAME\"" -classpath "\"$CLASSPATH\"" org.gradle.wrapper.GradleWrapperMain "$APP_ARGS"

# by default we should be in the correct project dir, but when run from Finder on Mac, the cwd is wrong
if [ "$(uname)" = "Darwin" ] && [ "$HOME" = "$PWD" ]; then
cd "$(dirname "$0")"
fi

exec "$JAVACMD" "$@"