Cassandra Checkpointing File System

Example implementation of the Hadoop FileSystem API on top of Cassandra. It allows faster checkpointing for Spark Structured Streaming (under 100 ms).

Usage

Create the Cassandra (C*) table, replacing $keyspace and $table with your own names:

create table $keyspace.$table (path text, name text, is_dir boolean, length bigint, value blob, primary key ((path), name));
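
The schema above keys each row by a `path` partition and a `name` clustering column. As a rough illustration only, the sketch below assumes that `path` holds the parent directory and `name` the entry inside it; the source does not confirm this mapping, and `FileKey` is a hypothetical helper that is not part of the project.

```scala
import java.net.URI

// Hypothetical helper (not part of the project): splits a checkpoint URI into
// the (path, name) pair used as primary key, ASSUMING path = parent directory
// and name = last path segment.
final case class FileKey(path: String, name: String)

object FileKey {
  def fromUri(uri: String): FileKey = {
    // Keep only the path component and drop any trailing slash.
    val full = new URI(uri).getPath.stripSuffix("/")
    val idx = full.lastIndexOf('/')
    // Entries directly under the root get "/" as their parent.
    val parent = if (idx <= 0) "/" else full.substring(0, idx)
    FileKey(parent, full.substring(idx + 1))
  }
}
```

Under that assumption, listing a directory would be a single-partition read on `path`, which is one plausible reason for choosing this key layout.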

Spark configuration:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
    // Register the file system implementation for the ckfs:// scheme.
    .config("spark.hadoop.fs.ckfs.impl", "exactlyonce.CassandraSimpleFileSystem")
    // optional:
    .config("spark.hadoop.cassandra.host", "127.0.0.1")
    .config("spark.hadoop.cassandra.checkpointfs.keyspace", "checkpointfs")
    .config("spark.hadoop.cassandra.checkpointfs.table", "file")
    .getOrCreate()

// `ds` is a streaming Dataset and `writer` a ForeachWriter, both defined elsewhere.
val query = ds.writeStream
    .option("checkpointLocation", "ckfs://127.0.0.1/checkpointing/exactlyOnce")
    .queryName("exactlyOnce")
    .foreach(writer)
    .start()

QuentinAmbard/cassandracheckointingfs