Skip to content

Implementation of hadoop FileSystem on Cassandra to store Structured Streaming checkpoints

Notifications You must be signed in to change notification settings

QuentinAmbard/cassandracheckointingfs

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

cassandra Checkointing File System

Example of an implementation of hadoop FileSystem on Cassandra. Allow faster checkpointing for structured streaming (<100ms)

alt text

usage

Create C* table:

create table $keyspace.$table (path text, name text, is_dir boolean, length bigint, value blob, primary key ((path), name));

Spark configuration:

SparkSession.builder
    .config("spark.hadoop.fs.ckfs.impl", "exactlyonce.CassandraSimpleFileSystem")
    //optional:
    .config("spark.hadoop.cassandra.host", "127.0.0.1") 
    .config("spark.hadoop.cassandra.checkpointfs.keyspace", "checkpointfs")
    .config("spark.hadoop.cassandra.checkpointfs.table", "file")
    
    
val query = ds.writeStream
    .option("checkpointLocation", "ckfs://127.0.0.1/checkpointing/exactlyOnce")
    .queryName("exactlyOnce").foreach(writer).start

About

Implementation of hadoop FileSystem on Cassandra to store Structured Streaming checkpoints

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages