Connection Management

Michael Nitschinger edited this page Feb 17, 2015 · 3 revisions

Introduction

Before you can work with Couchbase, you need to provide the appropriate credentials as configuration params. Since Spark needs to fan out the configuration to all the workers, the only sane way to handle configuration is to add it to the SparkConf.

In the background, the configuration is passed over to the executors, and the Couchbase connections are created lazily when they are first needed.
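
The lazy creation described above can be pictured with a small plain-Scala sketch. It is not the connector's code: `Settings`, `FakeConnection`, and `ConnectionHolder` are hypothetical names, and a counter stands in for real network I/O. The idea is that only a small, serializable settings object would need to travel to the executors, while the connection itself is opened on first use and then reused.

```scala
// Hypothetical stand-ins; none of these names come from the connector.
// Case classes are serializable, so a value like this could be shipped to workers.
final case class Settings(nodes: Seq[String], bucket: String, password: String)

// Placeholder for a real connection; it performs no I/O.
final class FakeConnection(val settings: Settings)

object ConnectionHolder {
  // Counts how many times a "connection" has been opened in this JVM.
  @volatile var opened: Int = 0

  private var cached: Option[FakeConnection] = None

  // Opens the connection on first use only; later calls reuse the cached one.
  def get(settings: Settings): FakeConnection = synchronized {
    cached.getOrElse {
      opened += 1
      val conn = new FakeConnection(settings)
      cached = Some(conn)
      conn
    }
  }
}
```

Calling `ConnectionHolder.get` twice with the same settings returns the same object and increments the counter once, which is the behavior "lazily created" is meant to convey here.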

Default Settings

If you do not provide any settings, the following params will be applied:

  • nodes: 127.0.0.1
  • bucket: default
  • password: <empty>

This is helpful during development for getting up and running quickly, but of course in a production setup you want something different. If you just want to stick with the defaults, this is enough:

import org.apache.spark.{SparkConf, SparkContext}

// No Couchbase-specific config needed
val conf = new SparkConf().setMaster("local[*]").setAppName("myapp")

// Start your Spark context
val sc = new SparkContext(conf)

Connecting To One Bucket

In general you only want to connect to one bucket and provide a number of bootstrap nodes for the cluster. The proper way to configure this is:

val conf = new SparkConf()
  // spark specific params
  .setMaster("local[*]")
  .setAppName("myapp")
  // couchbase specific params
  .set("com.couchbase.nodes", "192.168.56.101;192.168.56.102")
  .set("com.couchbase.bucket.mybucket", "password")

This will use 192.168.56.101 and 192.168.56.102 to bootstrap the Couchbase cluster and connect to the bucket mybucket with the password "password".
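
To make the key layout concrete, here is a minimal sketch of how such settings could be read back out of a plain key/value map. `CouchbaseKeys` and its methods are hypothetical helpers written for illustration; they mirror the key names and defaults documented on this page, not the connector's internals.

```scala
// Hypothetical helper; it mirrors the documented key layout, not the connector's code.
object CouchbaseKeys {
  val NodesKey = "com.couchbase.nodes"
  val BucketPrefix = "com.couchbase.bucket."

  // Splits the semicolon-separated node list, falling back to the documented default.
  def nodes(settings: Map[String, String]): Seq[String] =
    settings.get(NodesKey)
      .map(_.split(";").map(_.trim).filter(_.nonEmpty).toList)
      .getOrElse(List("127.0.0.1"))

  // Collects bucket name -> password pairs, falling back to the documented default bucket.
  def buckets(settings: Map[String, String]): Map[String, String] = {
    val found = settings.collect {
      case (key, password) if key.startsWith(BucketPrefix) =>
        key.stripPrefix(BucketPrefix) -> password
    }
    if (found.isEmpty) Map("default" -> "") else found
  }
}
```

With a real SparkConf, `conf.getAll.toMap` would supply a map of this shape. Keys that do not start with the Couchbase prefixes, such as `spark.app.name`, are simply ignored by the sketch.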

All bucket-scoped methods take a bucketName, but you can leave it out (or set it to null) if only one bucket is configured. The client will pick it up.

Connecting To Multiple Buckets

Since the Couchbase SDK supports multiple buckets, it is also possible to configure more than one. You just pass in additional com.couchbase.bucket.<bucket> settings:

val conf = new SparkConf()
  // spark specific params
  .setMaster("local[*]")
  .setAppName("myapp")
  // couchbase specific params
  .set("com.couchbase.nodes", "192.168.56.101;192.168.56.102")
  .set("com.couchbase.bucket.bucket1", "passwd")
  .set("com.couchbase.bucket.bucket2", "passwd")

If you configure more than one bucket, you can no longer use null for the bucketName and must always pass it in, since the name can't be inferred anymore. If you still pass in null, you'll get:

java.lang.IllegalStateException: The bucket name can only be inferred if there is exactly 1 bucket set on the config
	at com.couchbase.spark.connection.CouchbaseConnection.bucket(CouchbaseConnection.scala:46)
	at com.couchbase.spark.RDDFunctions$$anonfun$couchbaseGet$1.apply(RDDFunctions.scala:51)
	at com.couchbase.spark.RDDFunctions$$anonfun$couchbaseGet$1.apply(RDDFunctions.scala:47)
	at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:618)
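
The inference rule behind this exception can be sketched in a few lines of plain Scala. `BucketNames.resolve` is a hypothetical function written for illustration, assuming the rule is exactly as described above; only the exception type and message are taken from the stack trace.

```scala
// Hypothetical illustration of the inference rule; not the connector's implementation.
object BucketNames {
  // Resolves which bucket to use. A null name is only accepted
  // when exactly one bucket is configured.
  def resolve(requested: String, configured: Seq[String]): String =
    Option(requested) match {
      case Some(name) => name
      case None if configured.size == 1 => configured.head
      case None =>
        throw new IllegalStateException(
          "The bucket name can only be inferred if there is exactly 1 bucket set on the config")
    }
}
```

Under this rule, `resolve(null, Seq("mybucket"))` yields the single configured bucket, while `resolve(null, Seq("bucket1", "bucket2"))` throws. Passing the name explicitly avoids the problem in both setups.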

Direct Access

You can also directly access the underlying cluster and bucket if you want to:

// Couchbase config based on the spark config
val cfg = CouchbaseConfig(conf)

// Get access to the lazily created cluster object
val cluster = CouchbaseConnection().cluster(cfg)

// Get access to one of the buckets on the config
val bucket = CouchbaseConnection().bucket("beer-sample", cfg)