kafka-value-parser #29
Conversation
Codecov Report
@@             Coverage Diff              @@
##            master      #29      +/-   ##
============================================
- Coverage     28.24%   28.04%    -0.21%
  Complexity        6        6
============================================
  Files            24       24
  Lines          2078     2111       +33
  Branches        388      396        +8
============================================
+ Hits            587      592        +5
- Misses         1389     1412       +23
- Partials        102      107        +5
Continue to review full report at Codecov.
-class KafkaReader(override val session: SparkSession, kafkaConfig: KafkaSourceConfigEntry)
+class KafkaReader(override val session: SparkSession,
+                  kafkaConfig: KafkaSourceConfigEntry,
+                  fields: List[String])
Please deduplicate `fields` (e.g. with `distinct`) before using it.
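As a sketch of this suggestion (the field names here are illustrative, not from the PR), Scala's `List.distinct` drops later duplicates while preserving the order of first occurrences:

```scala
object DistinctFields {
  def main(args: Array[String]): Unit = {
    // Illustrative only: a configured field list containing a duplicate.
    val fields = List("src", "dst", "name", "src")

    // `distinct` keeps the first occurrence of each element, in order.
    val deduped = fields.distinct
    println(deduped) // List(src, dst, name)
  }
}
```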
.selectExpr("CAST(value AS STRING)")
.as[(String)]
.withColumn("value", from_json(col("value"), jsonSchema))
.select("value.*")
Don't we need to alias the DataFrame's column names to the names in `fields`?
I printed the column names and they are the names from `fields`. It works on my machine.
Yeah, I tested it and the schema is the same as `fields`. Great work~
@@ -172,7 +173,8 @@ object Exchange {
   LOG.info(s"field keys: ${fieldKeys.mkString(", ")}")
   val nebulaKeys = edgeConfig.nebulaFields
   LOG.info(s"nebula keys: ${nebulaKeys.mkString(", ")}")
   val data = createDataSource(spark, edgeConfig.dataSourceConfigEntry)
+  val fields = edgeConfig.sourceField::edgeConfig.targetField::edgeConfig.fields
`edgeConfig.rankField` should also be added.
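A minimal illustration of the suggested change (the case class below is a hypothetical stand-in for the real edge config entry, and `rankField` is assumed to be an `Option[String]`; names differ in the actual codebase):

```scala
object EdgeFieldList {
  // Hypothetical stand-in for the real edge config entry.
  case class EdgeConfig(sourceField: String,
                        targetField: String,
                        rankField: Option[String],
                        fields: List[String])

  // Prepend source/target and append the rank column when configured.
  def fieldList(c: EdgeConfig): List[String] =
    (c.sourceField :: c.targetField :: c.fields) ++ c.rankField.toList

  def main(args: Array[String]): Unit = {
    val cfg = EdgeConfig("src", "dst", Some("rank"), List("name", "age"))
    println(fieldList(cfg)) // List(src, dst, name, age, rank)
  }
}
```

Using `Option#toList` means edges without a rank column simply contribute nothing to the list.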
    batch: 10
    interval.seconds: 10
  }
# {
Please roll back these changes.
A new check was added to the config parser that throws an exception if any other data source is defined after a Kafka source; see Config.scala. However, there are two Kafka sources defined in application.conf, so if I don't comment this section out, the test won't pass.
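A rough sketch of the check being described (the function and category names here are illustrative; the real implementation lives in Config.scala):

```scala
object KafkaLastCheck {
  // Illustrative: reject any data source configured after a Kafka source.
  def validateSourceOrder(sourceCategories: List[String]): Unit = {
    val i = sourceCategories.indexOf("kafka")
    if (i >= 0 && i != sourceCategories.length - 1)
      throw new IllegalArgumentException(
        "no data source may be configured after a Kafka source")
  }

  def main(args: Array[String]): Unit = {
    validateSourceOrder(List("csv", "kafka")) // ok: kafka is last
    try {
      validateSourceOrder(List("kafka", "csv")) // throws
      assert(false, "expected an exception")
    } catch {
      case _: IllegalArgumentException => println("rejected as expected")
    }
  }
}
```

Under such a rule, a second `kafka` block in the test's application.conf would necessarily sit after the first one and fail validation, which explains why the section had to be commented out.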
OK, I'll split the Kafka config out into a single separate config file for Kafka later.
Great PR ~
issue #8
Import data from Kafka to Nebula.