Spark Library for Bulk Loading into Cassandra
This project is based on Spark2Cassandra and upgrades its Spark and Cassandra dependency versions. It bulk-loads data in two steps:
- Convert an RDD or DataFrame into SSTable files.
- Stream the SSTable files to the Cassandra nodes.
Spark2CassandraBulkLoad supports Spark 2.x and above.
Spark2CassandraBulkLoad Version | Spark Cassandra Connector Version | Cassandra Java Driver Version | JDK Version |
---|---|---|---|
1.X.X | [2.0, 2.5) | [,4.0) | 1.8+ |
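The ranges in the table read like Maven-style version intervals: a square bracket includes the bound, a parenthesis excludes it, and an empty side is unbounded. So `[2.0, 2.5)` means 2.0 or later but below 2.5, and `[,4.0)` means anything below 4.0. The sketch below is a hypothetical helper (`VersionRange` is not part of this library) for checking a version against such a range; it assumes purely numeric, dot-separated versions and does not handle qualifiers such as `-M1`.

```scala
// Hypothetical helper, not part of Spark2CassandraBulkLoad.
// Checks a dotted numeric version against interval notation like "[2.0, 2.5)".
object VersionRange {
  // Compare versions component by component; missing components count as 0,
  // so "2.5" and "2.5.0" compare as equal.
  def compare(a: String, b: String): Int = {
    val xs = a.split('.').map(_.toInt).toVector
    val ys = b.split('.').map(_.toInt).toVector
    val n = math.max(xs.length, ys.length)
    (0 until n).iterator
      .map(i => Integer.compare(xs.lift(i).getOrElse(0), ys.lift(i).getOrElse(0)))
      .find(_ != 0)
      .getOrElse(0)
  }

  // '[' / ']' include the bound, '(' / ')' exclude it; an empty bound is unbounded.
  def contains(range: String, version: String): Boolean = {
    val r = range.trim
    val lowerInclusive = r.head == '['
    val upperInclusive = r.last == ']'
    // limit -1 keeps an empty upper bound, e.g. "[2.0,)" -> ["2.0", ""]
    val parts = r.substring(1, r.length - 1).split(",", -1).map(_.trim)
    require(parts.length == 2, s"malformed range: $range")
    val lo = parts(0)
    val hi = parts(1)
    val aboveLower =
      lo.isEmpty || { val c = compare(version, lo); if (lowerInclusive) c >= 0 else c > 0 }
    val belowUpper =
      hi.isEmpty || { val c = compare(version, hi); if (upperInclusive) c <= 0 else c < 0 }
    aboveLower && belowUpper
  }
}
```

For example, `VersionRange.contains("[2.0, 2.5)", "2.4.3")` is `true`, while `VersionRange.contains("[2.0, 2.5)", "2.5.0")` is `false` because the upper bound is exclusive.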
SBT:
```scala
libraryDependencies += "com.joswlv.spark.cassandra.bulk" %% "Spark2CassandraBulkLoad" % "1.0.3"
```
Maven:
```xml
<dependency>
  <groupId>com.joswlv.spark.cassandra.bulk</groupId>
  <artifactId>Spark2CassandraBulkLoad</artifactId>
  <version>1.0.3</version>
</dependency>
```
Gradle:
```groovy
implementation 'com.joswlv.spark.cassandra.bulk:Spark2CassandraBulkLoad:1.0.3'
```
```scala
// Import the following to get the `bulkLoadToCass()` method on RDDs and DataFrames.
import com.joswlv.spark.cassandra.bulk.rdd._
import com.joswlv.spark.cassandra.bulk.sql._

// RDD: `rdd` is an existing RDD; specify the `keyspaceName` and `tableName` to write to.
rdd.bulkLoadToCass(
  keyspaceName = "keyspaceName",
  tableName = "tableName"
)

// DataFrame: `df` is an existing DataFrame; specify the `keyspaceName` and `tableName` to write to.
df.bulkLoadToCass(
  keyspaceName = "keyspaceName",
  tableName = "tableName"
)
```
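The imports are needed because `bulkLoadToCass` is not a method on Spark's own RDD or DataFrame types. Scala libraries commonly add such methods through an implicit class that an import brings into scope. The self-contained sketch below illustrates that pattern only, assuming this library works similarly (its actual internals may differ): `BulkLoadSyntax` is a hypothetical object, a plain `Seq` stands in for an RDD, and nothing here touches Spark or Cassandra.

```scala
// Hypothetical illustration of "extension method via import"; not this library's code.
object BulkLoadSyntax {
  // The implicit class wraps any Seq and adds a `bulkLoadToCass` method to it.
  implicit class BulkLoadOps[T](rows: Seq[T]) {
    // A real loader would write SSTables and stream them; this stub only describes the call.
    def bulkLoadToCass(keyspaceName: String, tableName: String): String =
      s"would write ${rows.size} rows to $keyspaceName.$tableName"
  }
}

object Usage {
  import BulkLoadSyntax._
  // This line compiles only because the import above brings the implicit class into scope.
  val message: String = Seq("a", "b", "c").bulkLoadToCass(keyspaceName = "ks", tableName = "tbl")
}
```

Removing `import BulkLoadSyntax._` makes the `Usage` object fail to compile, which mirrors why the `rdd._` and `sql._` imports above are required.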