
Aws-POC merge to develop #1632

Merged
merged 38 commits on Jan 29, 2021
Changes from 1 commit
38 commits
18cb47f
1422 and 1423 Remove HDFS and Oozie from Menas
Zejnilovic Jul 2, 2020
dcde55e
#1422 Fix HDFS location validation
Zejnilovic Jul 13, 2020
a817cfe
#1424 Add Menas Dockerfile
Zejnilovic Jul 13, 2020
e4432d1
Merge branch 'develop' into aws-poc
dk1844 Aug 13, 2020
ff10529
Merge pull request #1484 from AbsaOSS/aws-poc-update-from-develop-2.1…
dk1844 Aug 13, 2020
60d2948
#1416 hadoop-aws 2.8.5 + s3 aws sdk 2.13.65 compiles.
dk1844 Aug 5, 2020
d41cc7e
#1416 - enceladus on S3:
dk1844 Aug 12, 2020
67e4012
#1416 - enceladus on S3 - (crude) conformance works on s3 (s3 std inp…
dk1844 Aug 12, 2020
ac0785d
ref issue = 1416
dk1844 Aug 13, 2020
a8d53f9
related test cases ignored (issue reference added)
dk1844 Aug 13, 2020
85570d7
PR updates
dk1844 Aug 17, 2020
7be3afb
Merge spline 0.5.3 into aws-poc
Zejnilovic Aug 19, 2020
aa69593
Update spline to 0.5.4 for AWS PoC
Zejnilovic Aug 24, 2020
0bd704c
Merge branch 'aws-poc' into feature/1416-aws-emr-poc
dk1844 Aug 24, 2020
0b60b1a
Merge pull request #1483 from AbsaOSS/feature/1416-aws-emr-poc
dk1844 Aug 24, 2020
0450459
#1503 Remove HDFS url Validation
Zejnilovic Aug 27, 2020
2b3c39c
New dockerfile - smaller image
Zejnilovic Aug 27, 2020
5dfc40f
s3 persistence (atum, sdk fs usage, ...) (#1526)
dk1844 Oct 16, 2020
5b5628e
Feature/1556 file access PoC using Hadoop FS API (#1586)
dk1844 Nov 9, 2020
4598946
1554 Tomcat with TLS in Docker container (#1585)
AdrianOlosutean Nov 13, 2020
a9efecd
#1499 Add authentication to /lineage + update spline to 0.5.5
Adrian-Olosutean Nov 23, 2020
5da8e36
Merge pull request #1606 from AbsaOSS/feature/1499-spline-in-menas-aws
lokm01 Nov 24, 2020
34aefbb
#1618 - fixes failing spline 0.5.5 integration by providing compatibl…
dk1844 Dec 14, 2020
b4debed
Merge branch 'aws-poc' into aws-merge-to-develop
Zejnilovic Dec 22, 2020
c6094e1
WIP fixing merge issues
Zejnilovic Dec 22, 2020
9d39ca0
* Merge compiles
benedeki Dec 29, 2020
9e277b2
* put back HDFS browser
benedeki Jan 7, 2021
4689308
* AWS SDK Exclusion
benedeki Jan 8, 2021
0d95c42
* New ATUM version
benedeki Jan 11, 2021
267a457
* Adding missing files
benedeki Jan 11, 2021
105904b
Merge branch 'develop' into aws-merge-to-develop
benedeki Jan 11, 2021
8702f88
1622: Merge of aws-poc to develop brach
benedeki Jan 20, 2021
7999644
Merge branch 'develop' into aws-merge-to-develop
benedeki Jan 20, 2021
2391788
* comments improvement
benedeki Jan 22, 2021
3050cbc
1434 Add new way of serving properties to Docker
Zejnilovic Jan 25, 2021
88bcd9a
Merge branch 'develop' into aws-merge-to-develop
benedeki Jan 28, 2021
d0af27b
* Scopt 4.0.0
benedeki Jan 29, 2021
8b0634d
Merge branch 'aws-merge-to-develop' of https://github.com/AbsaOSS/enc…
benedeki Jan 29, 2021
README.md (10 changes: 5 additions & 5 deletions)
@@ -67,11 +67,11 @@ The coverage reports are written in each module's `target` directory and aggrega
#### Menas requirements:
- [**Tomcat 8.5/9.0** installation](https://tomcat.apache.org/download-90.cgi)
- [**MongoDB 4.0** installation](https://docs.mongodb.com/manual/administration/install-community/)
- [**Spline service deployment**](https://absaoss.github.io/spline/#get-spline)
- [**Spline UI deployment**](https://absaoss.github.io/spline/) - place the [spline.war](https://search.maven.org/remotecontent?filepath=za/co/absa/spline/spline-web/0.3.9/spline-web-0.3.9.war)
in your Tomcat webapps directory (rename after downloading to _spline.war_); NB! don't forget to set up the `spline.mongodb.url` configuration for the _war_
- **HADOOP_CONF_DIR** environment variable, pointing to the location of your hadoop configuration (pointing to a hadoop installation)

The _Spline service_ can be omitted; in such case the **Standardization** and **Conformance** `spline.producer.url` setting
as well as **Menas** `menas.lineage.readApiUrl` and `menas.oozie.lineageWriteApiUrl` settings should be all set to empty string.
The _Spline UI_ can be omitted; in such case the **Menas** `spline.urlTemplate` setting should be set to empty string.
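For illustration only, a sketch of what "set to empty string" can look like in the relevant configuration, using the setting names quoted in the paragraphs above (which names apply depends on the Spline version in use; this snippet is not part of this PR's diff):

spline.producer.url=
menas.lineage.readApiUrl=
menas.oozie.lineageWriteApiUrl=
spline.urlTemplate=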

#### Deploying Menas
Simply copy the **menas.war** file produced when building the project into Tomcat's webapps directory.
@@ -106,7 +106,7 @@ password=changeme
--deploy-mode <client/cluster> \
--driver-cores <num> \
--driver-memory <num>G \
--conf "spark.driver.extraJavaOptions=-Dmenas.rest.uri=<menas_api_uri:port> -Dstandardized.hdfs.path=<path_for_standardized_output>-{0}-{1}-{2}-{3} -Dspline.producer.url=<url_for_spline_consumer> -Dhdp.version=<hadoop_version>" \
--conf "spark.driver.extraJavaOptions=-Dmenas.rest.uri=<menas_api_uri:port> -Dstandardized.hdfs.path=<path_for_standardized_output>-{0}-{1}-{2}-{3} -Dspline.mongodb.url=<mongo_url_for_spline> -Dspline.mongodb.name=<spline_database_name> -Dhdp.version=<hadoop_version>" \
--class za.co.absa.enceladus.standardization.StandardizationJob \
<spark-jobs_<build_version>.jar> \
--menas-auth-keytab <path_to_keytab_file> \
@@ -130,7 +130,7 @@ password=changeme
--driver-cores <num> \
--driver-memory <num>G \
--conf 'spark.ui.port=29000' \
--conf "spark.driver.extraJavaOptions=-Dmenas.rest.uri=<menas_api_uri:port> -Dstandardized.hdfs.path=<path_of_standardized_input>-{0}-{1}-{2}-{3} -Dconformance.mappingtable.pattern=reportDate={0}-{1}-{2} -Dspline.producer.url=<url_for_spline_consumer> -Dhdp.version=<hadoop_version>" \
--conf "spark.driver.extraJavaOptions=-Dmenas.rest.uri=<menas_api_uri:port> -Dstandardized.hdfs.path=<path_of_standardized_input>-{0}-{1}-{2}-{3} -Dconformance.mappingtable.pattern=reportDate={0}-{1}-{2} -Dspline.mongodb.url=<mongo_url_for_spline> -Dspline.mongodb.name=<spline_database_name>" -Dhdp.version=<hadoop_version> \
--packages za.co.absa:enceladus-parent:<version>,za.co.absa:enceladus-conformance:<version> \
--class za.co.absa.enceladus.conformance.DynamicConformanceJob \
<spark-jobs_<build_version>.jar> \
@@ -23,10 +23,9 @@ import za.co.absa.enceladus.dao.auth.MenasKerberosCredentials
import za.co.absa.enceladus.dao.rest.RestDaoFactory
import za.co.absa.enceladus.examples.interpreter.rules.custom.UppercaseCustomConformanceRule
import za.co.absa.enceladus.model.Dataset
import za.co.absa.enceladus.utils.testUtils.HadoopFsTestBase
import za.co.absa.enceladus.utils.time.TimeZoneNormalizer

object CustomRuleSample1 extends HadoopFsTestBase {
object CustomRuleSample1 extends CustomRuleSampleFs {

case class ExampleRow(id: Int, makeUpper: String, leave: String)
case class OutputRow(id: Int, makeUpper: String, leave: String, doneUpper: String)
@@ -24,10 +24,9 @@ import za.co.absa.enceladus.dao.auth.MenasKerberosCredentials
import za.co.absa.enceladus.dao.rest.{MenasConnectionStringParser, RestDaoFactory}
import za.co.absa.enceladus.examples.interpreter.rules.custom.LPadCustomConformanceRule
import za.co.absa.enceladus.model.Dataset
import za.co.absa.enceladus.utils.testUtils.HadoopFsTestBase
import za.co.absa.enceladus.utils.time.TimeZoneNormalizer

object CustomRuleSample2 extends HadoopFsTestBase {
object CustomRuleSample2 extends CustomRuleSampleFs {

case class ExampleRow(id: Int, addPad: String, leave: String)
case class OutputRow(id: Int, addPad: String, leave: String, donePad: String)
@@ -24,10 +24,9 @@ import za.co.absa.enceladus.dao.auth.MenasKerberosCredentials
import za.co.absa.enceladus.dao.rest.{MenasConnectionStringParser, RestDaoFactory}
import za.co.absa.enceladus.examples.interpreter.rules.custom.{LPadCustomConformanceRule, UppercaseCustomConformanceRule}
import za.co.absa.enceladus.model.Dataset
import za.co.absa.enceladus.utils.testUtils.HadoopFsTestBase
import za.co.absa.enceladus.utils.time.TimeZoneNormalizer

object CustomRuleSample3 extends HadoopFsTestBase {
object CustomRuleSample3 extends CustomRuleSampleFs {
implicit val spark: SparkSession = SparkSession.builder
.master("local[*]")
.appName("CustomRuleSample3")
@@ -26,10 +26,9 @@ import za.co.absa.enceladus.dao.auth.MenasKerberosCredentials
import za.co.absa.enceladus.dao.rest.{MenasConnectionStringParser, RestDaoFactory}
import za.co.absa.enceladus.examples.interpreter.rules.custom.{LPadCustomConformanceRule, UppercaseCustomConformanceRule}
import za.co.absa.enceladus.model.Dataset
import za.co.absa.enceladus.utils.testUtils.HadoopFsTestBase
import za.co.absa.enceladus.utils.time.TimeZoneNormalizer

object CustomRuleSample4 extends HadoopFsTestBase {
object CustomRuleSample4 extends CustomRuleSampleFs {
TimeZoneNormalizer.normalizeJVMTimeZone() //normalize JVM time zone as soon as possible

/**
@@ -0,0 +1,27 @@
/*
* Copyright 2018 ABSA Group Limited
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package za.co.absa.enceladus.examples

import org.apache.hadoop.fs.FileSystem
import org.apache.spark.sql.SparkSession
import za.co.absa.enceladus.utils.fs.HadoopFsUtils

trait CustomRuleSampleFs {
def spark: SparkSession

implicit val fs: FileSystem = FileSystem.get(spark.sparkContext.hadoopConfiguration)
implicit val fsUtils: HadoopFsUtils = HadoopFsUtils.getOrCreate(fs)
}
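A minimal usage sketch for this trait follows; the object name, Spark settings, and main body are illustrative assumptions rather than code from this PR, but the CustomRuleSample1-4 objects above follow the same pattern of supplying the SparkSession that the trait turns into an implicit FileSystem and HadoopFsUtils:

package za.co.absa.enceladus.examples

import org.apache.spark.sql.SparkSession

// Hypothetical sample object, not part of this PR.
object CustomRuleSampleSketch extends CustomRuleSampleFs {
  // Implemented as a def so the session is already obtainable while the trait's vals initialise.
  def spark: SparkSession = SparkSession.builder()
    .master("local[*]")
    .appName("CustomRuleSampleSketch")
    .getOrCreate()

  def main(args: Array[String]): Unit = {
    // fs and fsUtils are inherited from CustomRuleSampleFs
    println(s"Default filesystem: ${fs.getUri}")
  }
}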
@@ -72,9 +72,6 @@ class WebSecurityConfig @Autowired()(beanFactory: BeanFactory,
.and()
.addFilterBefore(kerberosFilter, classOf[UsernamePasswordAuthenticationFilter])
.addFilterAfter(jwtAuthFilter, classOf[SpnegoAuthenticationProcessingFilter])
.headers()
.frameOptions()
.sameOrigin()
}

@Bean
menas/ui/css/style.css (14 changes: 7 additions & 7 deletions)
@@ -44,25 +44,25 @@ html, body {
}

.monitoringTitle {
textAlign: Center;
text-align: Center;
width: 100%;
}

.monitoringRecordspopover {
.monitoringRecordsPopover {
top: 5rem !important;
left: calc(50% - 18rem) !important;
}

.auditTrailNestedListItem {
max-width: 100%;
background-color: inherit !important;
border-bottom: 0px !important;
border-bottom: 0 !important;
}

.auditTrailNestedList > ul > .sapMListNoData {
background-color: inherit !important;
border-bottom: 0px !important;
padding-left: 0px !important;
border-bottom: 0 !important;
padding-left: 0 !important;
}

.lineageIframe {
@@ -72,8 +72,8 @@

.auditTrailNestedList > ul > .sapMListNoData {
background-color: inherit !important;
border-bottom: 0px !important;
padding-left: 0px !important;
border-bottom: 0 !important;
padding-left: 0 !important;
}

.lineageIframe {
pom.xml (2 changes: 1 addition & 1 deletion)
@@ -178,7 +178,7 @@
<scala.version>2.11.12</scala.version>
<scalatest.maven.version>2.0.0</scalatest.maven.version>
<scalatest.version>3.2.2</scalatest.version>
<scopt.version>4.0.0-RC2</scopt.version>
<scopt.version>4.0.0</scopt.version>
<spark.compat.version>2.4</spark.compat.version>
<spark.hats.version>0.2.1</spark.hats.version>
<spark.hofs.version>0.4.0</spark.hofs.version>
scripts/bash/enceladus_env.template.sh (5 changes: 3 additions & 2 deletions)
@@ -16,9 +16,10 @@
# Environment configuration
STD_HDFS_PATH="/bigdata/std/std-{0}-{1}-{2}-{3}"

# Configuration for Spline
# MongoDB connection configuration for Spline
# Important! Special characters should be escaped using triple backslashes (\\\)
SPLINE_PRODUCER_URL="http://localhost:8080/spline/producer"
SPLINE_MONGODB_URL="mongodb://localhost:27017"
SPLINE_MONGODB_NAME="spline"

export SPARK_HOME="/opt/spark-2.4.4"
SPARK_SUBMIT="$SPARK_HOME/bin/spark-submit"
scripts/bash/run_enceladus.sh (2 changes: 1 addition & 1 deletion)
@@ -416,7 +416,7 @@ if [ "$DRA_ENABLED" = true ] ; then
fi

JVM_CONF="spark.driver.extraJavaOptions=-Dstandardized.hdfs.path=$STD_HDFS_PATH \
-Dspline.producer.url=$SPLINE_PRODUCER_URL -Dhdp.version=$HDP_VERSION \
-Dspline.mongodb.url=$SPLINE_MONGODB_URL -Dspline.mongodb.name=$SPLINE_MONGODB_NAME -Dhdp.version=$HDP_VERSION \
$MT_PATTERN"

if [ "$HELP_CALL" == "1" ]; then
spark-jobs/src/main/resources/spline.properties.template (22 changes: 14 additions & 8 deletions)
@@ -4,7 +4,6 @@
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
@@ -13,12 +12,19 @@
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Spline mode - the way how Spline is integrated. For details see Spline documentation
# possible values (default is BEST_EFFORT):
# DISABLED - no Spline integration (no lineage will be recorded)
# REQUIRED - Spline service has to be running on the spline.producer.url address; if not, job exits without execution
# BEST_EFFORT - job tries to connect to the provided Spline service (spline.producer.url address); but if that fails, job will still execute
spline.mode=BEST_EFFORT

#
# Spline properties template.
# Uncomment the following lines to override corresponding Hadoop environment configuration properties.
#
# Set of properties for setting up persistence to MongoDB.
#
spline.persistence.factory=za.co.absa.spline.persistence.api.composition.ParallelCompositeFactory
spline.persistence.composition.factories=za.co.absa.spline.persistence.mongo.MongoPersistenceFactory,za.co.absa.spline.persistence.hdfs.HdfsPersistenceFactory

spline.mongodb.url=mongodb://localhost:27017
spline.mongodb.name=spline

#
spline.producer.url=http://localhost:8080/spline/producer
# A property for setting up persistence to Apache Atlas. Additional properties defining connectivity to Atlas are required to be part of this configuration file. (see Atlas configuration file)
# spline.persistence.factory=za.co.absa.spline.persistence.atlas.AtlasPersistenceFactory
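As a worked example of the spline.mode semantics documented in this template: a job that must not run without lineage tracking would be submitted with the mode overridden to REQUIRED on the driver. The -D option style mirrors the spark-submit commands shown earlier in this PR; the exact values here are assumptions, not part of the diff.

--conf "spark.driver.extraJavaOptions=-Dspline.mode=REQUIRED -Dspline.producer.url=http://localhost:8080/spline/producer"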