[Improve][Connector-V2][Kafka] Support to specify multiple partition keys #3230
Conversation
@@ -51,7 +51,7 @@ In AT_LEAST_ONCE, producer will wait for all outstanding messages in the Kafka b

 NON does not provide any guarantees: messages may be lost in case of issues on the Kafka broker and messages may be duplicated.

-### partition_key [string]
+### partition_key [array]

 Configure which field is used as the key of the kafka message.
Please add a change log.
Reference: https://github.com/apache/incubator-seatunnel/blob/dev/docs/en/connector-v2/source/HdfsFile.md
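For reference, a change log section of the kind used in the linked doc might look roughly like this (the version label is a placeholder):

## Changelog

### next version

- [Improve] Support specifying multiple partition keys for the Kafka sink (#3230)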
    }
    SeaTunnelRowType keySeaTunnelRowType = new SeaTunnelRowType(keyFieldNames, keyFieldTypes);
    return new DefaultSeaTunnelRowSerializer(this.topic, keySeaTunnelRowType, seaTunnelRowType);
} else {
Suggest replacing

if () {
    return xx;
} else if () {
    return xx;
} else {
    return xx;
}

with

if () {
    return xx;
}
if () {
    return xx;
}
return xx;
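Applied to this connector's serializer selection, the guard-clause style might look like this (a sketch only; PARTITION, TOPIC, PARTITION_KEY_FIELDS and getPartitionKeyFields are the names used elsewhere in this review):

private SeaTunnelRowSerializer<byte[], byte[]> getSerializer(Config pluginConfig, SeaTunnelRowType seaTunnelRowType) {
    // Guard clauses: each condition returns immediately, no else chain.
    if (pluginConfig.hasPath(PARTITION)) {
        return new DefaultSeaTunnelRowSerializer(pluginConfig.getString(TOPIC),
            pluginConfig.getInt(PARTITION), seaTunnelRowType);
    }
    if (pluginConfig.hasPath(PARTITION_KEY_FIELDS)) {
        return new DefaultSeaTunnelRowSerializer(pluginConfig.getString(TOPIC),
            getPartitionKeyFields(pluginConfig, seaTunnelRowType), seaTunnelRowType);
    }
    return new DefaultSeaTunnelRowSerializer(pluginConfig.getString(TOPIC), seaTunnelRowType);
}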
.filter(f -> {
    if (Arrays.asList(seaTunnelRowType.getFieldNames()).contains(f)) {
        return true;
    } else {
The else is redundant.
@EricJoy2048 Done.
LGTM @hailin0 PTAL
LGTM
@TestTemplate
public void testSinkKafka(TestContainer container) throws IOException, InterruptedException {
    Container.ExecResult execResult = container.executeJob("/kafkasink_fake_to_kafka.conf");
    Assertions.assertEquals(0, execResult.getExitCode());
Thank you for your contribution. In addition to checking that the job exits properly, you also need to check that the data is successfully written to Kafka.
Add checks for the consumed data size and the key message schema.
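A minimal sketch of such a check in the e2e test, assuming the usual org.apache.kafka.clients.consumer imports; the topic name and expected count of 10 are placeholders, and consumerProps() is a hypothetical helper returning bootstrap/deserializer settings:

try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(consumerProps())) {
    consumer.subscribe(Collections.singletonList("test_topic"));
    List<ConsumerRecord<byte[], byte[]>> records = new ArrayList<>();
    // A real test may need to poll in a loop until enough records arrive.
    consumer.poll(Duration.ofSeconds(10)).forEach(records::add);
    // Check the consumed data size against the generated test data.
    Assertions.assertEquals(10, records.size());
    // Check that every record carries a key produced by the key serializer.
    records.forEach(record -> Assertions.assertNotNull(record.key()));
}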
        .build();
generateTestData(row -> new ProducerRecord<>("test_topic_text", null, serializer.serialize(row)));
Container.ExecResult execResult = container.executeJob("/kafkasource_text_to_console.conf");
Assertions.assertEquals(0, execResult.getExitCode(), execResult.getStderr());
Ditto; it is suggested to use AssertSink or FileSink to check whether the output data is as expected.
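For example, the Console sink in the test config could be replaced with an Assert sink along these lines (a sketch only; the field name is an assumption based on the test schema):

sink {
    Assert {
        rules {
            field_rules = [
                {
                    field_name = c_string
                    field_type = string
                    field_value = [
                        { rule_type = NOT_NULL }
                    ]
                }
            ]
        }
    }
}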
Already using AssertSink.
DefaultSeaTunnelRowSerializer serializer = new DefaultSeaTunnelRowSerializer("test_topic_json", SEATUNNEL_ROW_TYPE);
generateTestData(row -> serializer.serializeRow(row));
Container.ExecResult execResult = container.executeJob("/kafkasource_json_to_console.conf");
Assertions.assertEquals(0, execResult.getExitCode(), execResult.getStderr());
Ditto
Already using AssertSink.
@TaoZex please help to review
Kafka {
    bootstrap.servers = "kafkaCluster:9092"
    topic = "test_topic"
    partition_key = ["c_map","c_string"]
How can a fixed string be used as the partition key value? E.g. partition_key = 'aaa'
It is recommended to implement it this way, which is simpler for users:

partition_key = "${c_map},${c_string}"
The value of partition_key can be a plain string or an expression that extracts field content. We already supported single-field extraction before.
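A rough sketch of how such ${field} extraction could be resolved against a row, assuming java.util.regex; resolvePartitionKey is an illustrative helper, not the existing implementation:

private static final Pattern FIELD_REF = Pattern.compile("\\$\\{(\\w+)}");

// Replaces each ${field} reference with the row's value for that field;
// an expression without references behaves as a fixed key, as before.
private static String resolvePartitionKey(String expression, SeaTunnelRowType rowType, SeaTunnelRow row) {
    Matcher matcher = FIELD_REF.matcher(expression);
    StringBuffer resolved = new StringBuffer();
    while (matcher.find()) {
        int index = rowType.indexOf(matcher.group(1));
        matcher.appendReplacement(resolved, Matcher.quoteReplacement(String.valueOf(row.getField(index))));
    }
    matcher.appendTail(resolved);
    return resolved.toString();
}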
@hailin0 You want to keep the logic of the following code snippet, right?

if (!fieldNames.contains(partitionKey)) {
    return row -> partitionKey;
}
yes
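That is, roughly the following (a sketch preserving the released fixed-string fallback; names are taken from the snippet above):

// A partition_key that is not a field name is treated as a fixed literal key.
private Function<SeaTunnelRow, String> createPartitionExtractor(String partitionKey, SeaTunnelRowType rowType) {
    List<String> fieldNames = Arrays.asList(rowType.getFieldNames());
    if (!fieldNames.contains(partitionKey)) {
        return row -> partitionKey;
    }
    int index = rowType.indexOf(partitionKey);
    return row -> String.valueOf(row.getField(index));
}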
BTW, a fixed string partition key causes all of the data to be written to a single partition of the topic; specifying field-based partition key(s), or omitting the key entirely, is better practice.
I agree
I'm worried about breaking the released feature
Perhaps renaming the config key to partition_key_fields would be more appropriate.
done.
@@ -0,0 +1,22 @@
+#
Remove this file; this is already covered by the common e2e configuration:
https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-e2e/seatunnel-e2e-common/src/test/resources/log4j2.properties
kafkaContainer = new KafkaContainer(DockerImageName.parse("confluentinc/cp-kafka:6.2.1"))
        .withNetwork(NETWORK)
        .withNetworkAliases(KAFKA_HOST)
        .withLogConsumer(new Slf4jLogConsumer(log));
Suggested change:

-        .withLogConsumer(new Slf4jLogConsumer(log));
+        .withLogConsumer(new Slf4jLogConsumer(DockerLoggerFactory.getLogger("confluentinc/cp-kafka:6.2.1")));
private List<String> createPartitionKeys(Config pluginConfig, SeaTunnelRowType seaTunnelRowType) {
    if (pluginConfig.hasPath(PARTITION_KEY_FIELDS)) {
        return pluginConfig.getStringList(PARTITION_KEY_FIELDS).stream()
                .filter(f -> Arrays.asList(seaTunnelRowType.getFieldNames()).contains(f))
                .collect(Collectors.toList());
    }
    return null;
}
Suggestion:

private List<String> getPartitionKeyFields(Config pluginConfig, SeaTunnelRowType seaTunnelRowType) {
    if (pluginConfig.hasPath(PARTITION_KEY_FIELDS)) {
        List<String> partitionKeyFields = pluginConfig.getStringList(PARTITION_KEY_FIELDS);
        List<String> rowTypeFieldNames = Arrays.asList(seaTunnelRowType.getFieldNames());
        for (String partitionKeyField : partitionKeyFields) {
            if (!rowTypeFieldNames.contains(partitionKeyField)) {
                throw new IllegalArgumentException(String.format(
                    "Partition key field not found: %s, rowType: %s", partitionKeyField, rowTypeFieldNames));
            }
        }
        return partitionKeyFields;
    }
    return Collections.emptyList();
}
-private Function<SeaTunnelRow, String> createPartitionExtractor(Config pluginConfig,
-                                                                SeaTunnelRowType seaTunnelRowType) {
-    if (!pluginConfig.hasPath(PARTITION_KEY)) {
+private Function<SeaTunnelRow, SeaTunnelRow> createPartitionExtractor(SeaTunnelRowType seaTunnelRowType) {
Suggestion: remove this method.
@@ -162,12 +168,25 @@ private Properties getKafkaProperties(Config pluginConfig) {

 // todo: parse the target field from config
 private SeaTunnelRowSerializer<byte[], byte[]> getSerializer(Config pluginConfig, SeaTunnelRowType seaTunnelRowType) {
Suggestion:

private SeaTunnelRowSerializer<byte[], byte[]> getSerializer(Config pluginConfig, SeaTunnelRowType seaTunnelRowType) {
    if (pluginConfig.hasPath(PARTITION)) {
        return new DefaultSeaTunnelRowSerializer(pluginConfig.getString(TOPIC),
            pluginConfig.getInt(PARTITION), seaTunnelRowType);
    } else {
        return new DefaultSeaTunnelRowSerializer(pluginConfig.getString(TOPIC),
            getPartitionKeyFields(pluginConfig, seaTunnelRowType), seaTunnelRowType);
    }
}
 * @param row seatunnel row
 * @return kafka record.
 */
-ProducerRecord<K, V> serializeRowByKey(String key, SeaTunnelRow row);
+ProducerRecord<K, V> serializeRowByKey(SeaTunnelRow key, SeaTunnelRow row);
Suggestion: remove this method.

-ProducerRecord<K, V> serializeRowByKey(SeaTunnelRow key, SeaTunnelRow row);
@@ -67,13 +72,12 @@ public class KafkaSinkWriter implements SinkWriter<SeaTunnelRow, KafkaCommitInfo

 // check config
 @Override
 public void write(SeaTunnelRow element) {
Suggestion:

public void write(SeaTunnelRow element) {
    ProducerRecord<byte[], byte[]> producerRecord = seaTunnelRowSerializer.serializeRow(element);
    kafkaProducerSender.send(producerRecord);
}
@@ -25,34 +25,42 @@

 public class DefaultSeaTunnelRowSerializer implements SeaTunnelRowSerializer<byte[], byte[]> {
Suggestion:

public class DefaultSeaTunnelRowSerializer implements SeaTunnelRowSerializer<byte[], byte[]> {
    private Integer partition;
    private final String topic;
    private final SerializationSchema keySerialization;
    private final SerializationSchema valueSerialization;

    public DefaultSeaTunnelRowSerializer(String topic, SeaTunnelRowType seaTunnelRowType) {
        this(topic, element -> null, createSerializationSchema(seaTunnelRowType));
    }

    public DefaultSeaTunnelRowSerializer(String topic, Integer partition, SeaTunnelRowType seaTunnelRowType) {
        this(topic, seaTunnelRowType);
        this.partition = partition;
    }

    public DefaultSeaTunnelRowSerializer(String topic,
                                         List<String> keyFieldNames,
                                         SeaTunnelRowType seaTunnelRowType) {
        this(topic, createKeySerializationSchema(keyFieldNames, seaTunnelRowType),
            createSerializationSchema(seaTunnelRowType));
    }

    public DefaultSeaTunnelRowSerializer(String topic,
                                         SerializationSchema keySerialization,
                                         SerializationSchema valueSerialization) {
        this.topic = topic;
        this.keySerialization = keySerialization;
        this.valueSerialization = valueSerialization;
    }

    @Override
    public ProducerRecord<byte[], byte[]> serializeRow(SeaTunnelRow row) {
        return new ProducerRecord<>(topic, partition,
            keySerialization.serialize(row), valueSerialization.serialize(row));
    }

    private static SerializationSchema createSerializationSchema(SeaTunnelRowType rowType) {
        return new JsonSerializationSchema(rowType);
    }

    private static SerializationSchema createKeySerializationSchema(List<String> keyFieldNames,
                                                                    SeaTunnelRowType seaTunnelRowType) {
        int[] keyFieldIndexArr = new int[keyFieldNames.size()];
        SeaTunnelDataType[] keyFieldDataTypeArr = new SeaTunnelDataType[keyFieldNames.size()];
        for (int i = 0; i < keyFieldNames.size(); i++) {
            String keyFieldName = keyFieldNames.get(i);
            int rowFieldIndex = seaTunnelRowType.indexOf(keyFieldName);
            keyFieldIndexArr[i] = rowFieldIndex;
            keyFieldDataTypeArr[i] = seaTunnelRowType.getFieldType(rowFieldIndex);
        }
        SeaTunnelRowType keyType = new SeaTunnelRowType(keyFieldNames.toArray(new String[0]), keyFieldDataTypeArr);
        SerializationSchema keySerializationSchema = createSerializationSchema(keyType);
        Function<SeaTunnelRow, SeaTunnelRow> keyDataExtractor = row -> {
            Object[] keyFields = new Object[keyFieldIndexArr.length];
            for (int i = 0; i < keyFieldIndexArr.length; i++) {
                keyFields[i] = row.getField(keyFieldIndexArr[i]);
            }
            return new SeaTunnelRow(keyFields);
        };
        return row -> keySerializationSchema.serialize(keyDataExtractor.apply(row));
    }
}
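Under this refactor, a sink picks the constructor that matches its config and just calls serializeRow; roughly like so (a sketch: the topic and key field names are taken from the e2e config above, and kafkaProducerSender from the sink writer):

// Key fields drive the record key; the full row becomes the record value.
DefaultSeaTunnelRowSerializer serializer = new DefaultSeaTunnelRowSerializer(
    "test_topic", Arrays.asList("c_map", "c_string"), seaTunnelRowType);
ProducerRecord<byte[], byte[]> record = serializer.serializeRow(row);
kafkaProducerSender.send(record);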
@hailin0 Great suggestions! Actually, you have refactored the sink serialization.
LGTM
+1
Please resolve the conflict, thanks!
Hi @harveyyue, please resolve the conflicts.
Sorry for the late response; I have fixed the conflicts.
Please merge the dev code after #3401 is merged.
@hailin0 @Hisoka-X @EricJoy2048 PTAL.
+1
LGTM
Purpose of this pull request

Support specifying multiple partition keys for the Kafka sink connector: the partition key option accepts an array of field names (renamed to partition_key_fields during review).
Check list
New License Guide