DHFPROD-646 3.x documentation, Understanding #924

Merged
2 changes: 1 addition & 1 deletion _data/sidenav.yml
@@ -5,7 +5,7 @@ main:
text: 'Understanding DHF',
childPages: [
{ url: '/understanding/concepts/', text: 'Concepts' },
{ url: '/understanding/architecture/', text: 'Architecture' },
{ url: '/understanding/architecture/', text: 'Organization' },
{ url: '/understanding/how-it-works/', text: 'How It Works'},
{ url: '/understanding/flowtracing/', text: 'Flow Tracing'},
{ url: '/understanding/project-structure/', text: 'Project Structure'}
16 changes: 8 additions & 8 deletions _pages/understanding/architecture.md
@@ -1,30 +1,30 @@
---
layout: inner
title: Data Hub Framework Architecture
title: Data Hub Framework Organization
permalink: /understanding/architecture/
---

# At a Glance
The Data Hub Framework (DHF) consists of three tools:
1. [QuickStart User Interface](#quickstart-user-interface)
1. [Gradle plugin: ml-data-hub](#gradle-plugin-ml-data-hub)
1. [DHF Java library](#dhf-java-library)
1. [Gradle Plugin: ml-data-hub](#gradle-plugin-ml-data-hub)
1. [DHF Java Library](#dhf-java-library)

### Which one should I use?

**Just Getting Started** - If you are brand new to the DHF then we recommend you start with the QuickStart UI. This is the easiest way to get up and running because you don't need to install dependencies (except for an Oracle Java 8 runtime).
**Just Getting Started** - If you are brand new to DHF then we recommend you start with the QuickStart UI. This is the easiest way to get up and running because you don't need to install dependencies (except for Oracle Java 8).

**Command Line Ninjas** - If you fancy yourself a command line ninja then you may want to start with the ml-data-hub Gradle plugin. This is the second easiest approach, and you can be up and running in seconds if you already have Gradle installed.

**Production and Beyond** - If you are running in production then you will definitely want to be using the ml-data-hub Gradle plugin or the Jar file via your own custom Java code. You may still use the Quickstart for development tasks, but to run your harmonize flows you will want to go ninja.
**Production and Beyond** - If you are running in production then you will definitely want to be using the ml-data-hub Gradle plugin or the JAR file via your own custom Java code. You may still use QuickStart for development tasks, but to run your harmonize flows you will want to go ninja.

### QuickStart User Interface
The Quickstart UI is a visual development tool. It's great for showing off the DHF functionality. The QuickStart is meant for development and not for running in production. Think of it as a code editor and scaffold generator, not something that runs your enterprise. QuickStart is the easiest way to get started using the DHF.
The QuickStart UI is a visual development tool. It's great for showing off DHF functionality. QuickStart is meant for development and not for running in production. Think of it as a code editor and scaffold generator, not something that runs your enterprise. QuickStart is the easiest way to get started using DHF.

### Gradle Plugin: ml-data-hub
This Gradle plugin allows you to interact with the DHF Java Library from the command line. The plugin runs inside Gradle and inherits functionality from the ml-gradle project.

Everything you need to do with the DHF you can do via the ml-data-hub gradle plugin.
Everything you need to do with DHF, you can do via the ml-data-hub Gradle plugin.
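
To use the plugin, you apply it in your project's `build.gradle`. A minimal sketch (the version number is a placeholder; use the release that matches your DHF version):

```groovy
// Illustrative build.gradle fragment -- the version shown is a placeholder,
// not a recommendation; pin it to the DHF release you are actually using.
plugins {
    id 'com.marklogic.ml-data-hub' version '3.0.0'
}
```

Once the plugin is applied, running `./gradlew tasks` should list the hub-related tasks the plugin contributes alongside the standard ml-gradle tasks.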

### DHF Java Library
The Data Hub Framework Java Library is the core of the DHF. This library handles the MarkLogic setup and deploy as well as the running of Harmonization flows.
The Data Hub Framework Java Library is the core of DHF. This library handles MarkLogic setup and deployment, as well as running harmonization flows.
8 changes: 4 additions & 4 deletions _pages/understanding/concepts.md
@@ -13,13 +13,13 @@ The following diagram illustrates the architectural view of an ODH.

### An ODH performs four key functions:

1. **Ingest** - Load Data from upstream system
1. **Ingest** - Load data from upstream systems
1. **Govern** - Provide trust in your data. Where did it come from? Is the data valid?
1. **Harmonize** - Harmonize the incoming Data into consistent, usable formats
1. **Harmonize** - Harmonize the incoming data into consistent, usable formats
1. **Serve** - Serve the harmonized data to other systems

### Ingest
First thing is first. Load all of your data into MarkLogic... every last bit. Upon ingest, data is stored in a staging area. During the ingest phase you can enhance your data with extra metadata like provenance. _Where did this data come from and when did it get ingested?_ See our [Ingest page](../ingest/ingest.md) for more details on ingesting data.
First things first. Load all of your data into MarkLogic... every last bit. Upon ingest, data is stored in a staging area. During the ingest phase you can enhance your data with extra metadata like provenance. _Where did this data come from and when did it get ingested?_ See our [ingest page](../ingest/ingest.md) for more details on ingesting data.
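
For bulk loads into the staging area, a common approach is MarkLogic Content Pump (mlcp). As a sketch, a hypothetical mlcp options file might look like this (the host, port, credentials, paths, and collection names are all placeholders for your own environment and project settings):

```
import
-host
localhost
-port
8010
-username
admin
-input_file_path
/data/source-systems/crm
-output_collections
crm,ingest
```

You would then point mlcp at it with its `-options_file` flag, supplying the password separately.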

### Govern
In order to trust your data you need to know where it came from, how it maps to the sources, how and when it was transformed, if there were errors on ingest or harmonize, and if the data is valid.
@@ -37,7 +37,7 @@ Harmonization is the process of creating a canonical model of your data using on
- Enrich data with additional information
- Extract important data into indexes for faster searching
- Leverage semantic triples to enrich your data
- Denormalizing multiple data sources into one document
- Denormalize multiple data sources into one document

While not all of these are explicitly "harmonization" tasks, they do tend to happen during this phase.

49 changes: 25 additions & 24 deletions _pages/understanding/flowtracing.md
@@ -5,64 +5,65 @@ lead_text: ''
permalink: /understanding/flowtracing/
---

# Flow Tracing Overview
Flow Tracing produces a detailed view of the flows as they happened. For each plugin in a flow the inputs and outputs to that plugin are recorded into the `Traces` database. Flow Tracing is great for debugging your flows because you can see exactly what came in and went out of each step. You can use Flow Tracing to track down bugs in your flows as well as performance issues.
<!--- DHFPROD-646 TODO since this is primarily a debugging thing, does it make more sense under Using the DHF than under Concepts? -->

Flow Tracing can be enabled and disabled as needed. We recommend you disable Flow Tracing in production as there is a performance penalty for writing the additional trace information. Any uncaught exception will always result in a Flow Tracing event, regardless of whether tracing is currently enabled.
Flow tracing produces a detailed view of the flows as they happened. For each plugin in a flow, the inputs and outputs to that plugin are recorded into the Traces database. Flow tracing is great for debugging your flows because you can see exactly what came in and went out of each step. You can use flow tracing to track down bugs in your flows as well as performance issues.

Flow Tracing can be viewed with several UIs, described below.
Flow tracing can be enabled and disabled as needed. We recommend you disable flow tracing in production as there is a performance penalty for writing the additional trace information. Any uncaught exception will always result in a flow tracing event, regardless of whether tracing is currently enabled.

# Controlling Flow Tracing Events
A flag in the Modules database controls whether flow tracing is turned on. There are two ways to enable and disable Flow Tracing: using a gradle task, or via the Quickstart UI.
Flow tracing can be viewed with several UIs, described below.

## Enabling and Disabling via Gradle
## Controlling Flow Tracing Events
A flag in the Modules database controls whether flow tracing is turned on. There are two ways to enable and disable flow tracing: using a Gradle task or via the QuickStart UI.

### Enabling and Disabling via Gradle
In the directory where the project framework code lives, run these commands to enable/disable flow tracing:

### Enable
#### Enable
{% include ostabs.html linux="./gradlew hubEnableTracing" windows="gradlew.bat hubEnableTracing" %}

### Disable:
#### Disable
{% include ostabs.html linux="./gradlew hubDisableTracing" windows="gradlew.bat hubDisableTracing" %}
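
These tasks connect to MarkLogic using the settings in your project's `gradle.properties`. A minimal sketch of the relevant connection properties (all values are placeholders for your own environment):

```properties
# Illustrative gradle.properties fragment -- values are placeholders.
mlHost=localhost
mlUsername=admin
mlPassword=admin
```

In practice you would keep credentials out of version control, for example by overriding them per environment or on the command line with `-P` properties.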

## Enabling and Disabling via the Quickstart UI
Navigate to the Settings screen in Quickstart. On that page, you will find a selector to enable and disable Flow Tracing.
### Enabling and Disabling via the QuickStart UI
Navigate to the Settings screen in QuickStart. On that page, you will find a selector to enable and disable flow tracing.

![Quickstart enable disable]({{site.baseurl}}/images/traces/FlowTracingEnableDisableViaQuickStart.png)

# Flow Tracing Database
All Flow Tracing events are stored to a separate database created when you initialized your project. By default, the database is called _your-project-name_-TRACING. An Application server is created that is associated with this database, which provides a UI you can use to view the trace events. The default port for this Application server is 8012.
## Flow Tracing Database
All flow tracing events are stored in a separate database created when you initialized your project. By default, the database is called _your-project-name_-TRACING. An application server is created that is associated with this database, which provides a UI you can use to view the trace events. The default port for this application server is 8012.

# Viewing Flow Tracing
## Viewing with Quickstart
You can view Flow Tracing events with Quickstart.
## Viewing Flow Tracing
### Viewing with QuickStart
You can view flow tracing events with QuickStart.

From the main Quickstart Dashboard, select Traces.
From the main QuickStart dashboard, select Traces.

![Displaying traces Quickstart 1]({{site.baseurl}}/images/traces/DisplayingTracingInQuickstartScreen1.png)

This will show a list of all events currently in the database. Note that you can search the text of the Trace events via the search bar. All text in the trace events is indexed and searchable.
This will show a list of all events currently in the database. Note that you can search the text of the trace events via the search bar. All text in the trace events is indexed and searchable.

![Displaying traces Quickstart 2]({{site.baseurl}}/images/traces/DisplayingTracingInQuickstartScreen2.png)

Selecting a single trace event will display the detailed flow.

![Displaying single trace Quickstart]({{site.baseurl}}/images/traces/DisplayingSingleTraceInQuickstart.png)

## Viewing with Flow Tracing Viewer
You can also view Flow Tracing events with a Trace Viewer provided in the application server associated with the TRACING database (by default installed on port 8012). This UI is installed into MarkLogic and you do not need a separate tool to view it.
### Viewing with the Trace Viewer
You can also view flow tracing events with a Trace Viewer provided in the application server associated with the TRACING database (by default installed on port 8012). This UI is installed into MarkLogic and you do not need a separate tool to view it.

Navigate your browser to the port running the `TRACES` Application server, by default on port 8012. You will be presented with the dedicated Trace Viewer application.
Navigate your browser to the port running the `TRACES` application server, by default on port 8012. You will be presented with the dedicated Trace Viewer application.

![Displaying all traces dedicated]({{site.baseurl}}/images/traces/DisplayingTracingInDedicatedApp.png)

Selecting a single tracing event will display the detailed flow.

![Displaying single trace dedicated]({{site.baseurl}}/images/traces/DisplayingSingleTraceInDedicatedApp.png)

# Cleaning up Traces
## Cleaning up Traces
You can delete traces by deleting the job that created them. To do so, go to the Jobs page, click the checkboxes for the jobs you wish to delete, click `ACTION`, then select "Delete Jobs and Traces". After confirming, the selected jobs and their associated traces will be removed from the Jobs and Traces databases.

![Displaying deletion of a job]({{site.baseurl}}/images/traces/DeleteJobs.png)

# Exporting TRACES
You can export jobs and traces associated with those jobs. Go to the Jobs page, click the checkboxes for the Jobs you wish to export, click `ACTION`, then select "Export Jobs and Traces". After confirming, the selected jobs and their associated traces will be exported to a zip file, which your browser will download. This feature is generally used to help communicate with MarkLogic's Support team.
## Exporting TRACES
You can export jobs and traces associated with those jobs. Go to the Jobs page, click the checkboxes for the jobs you wish to export, click `ACTION`, then select "Export Jobs and Traces". After confirming, the selected jobs and their associated traces will be exported to a ZIP file, which your browser will download. This feature is generally used to help communicate with MarkLogic's support team.