Setting up multiple Sparkline ThriftServers (Load Balancing & HA)

To handle many concurrent client connections, it is often necessary to set up more than one Sparkline Thrift Server. Client applications can be made agnostic of the individual servers by using the Dynamic Service Discovery feature of the Sparkline Thrift Server. This feature is similar to HiveServer2 Dynamic Service Discovery.

Follow the steps below to enable Dynamic Service Discovery, load balancing, and HA:

1. Install Spark and the Sparkline Accelerator jar on all the machines.
2. Update hive-site.xml with the following properties:

   hive.server2.support.dynamic.service.discovery=true
   hive.server2.zookeeper.namespace=hiveserver2 (change this to the namespace that you want to use)
   hive.zookeeper.quorum=localhost:2181 (ZooKeeper host:port; use a comma-separated list for a ZooKeeper ensemble)

3. Bring up the Sparkline Thrift Server on each machine.
4. Change the client connection URL to point to ZooKeeper:

   jdbc:hive2://<zookeeper_ensemble>/<db>;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=<hiveserver2_namespace>
   Example: jdbc:hive2://localhost:2181/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
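
The settings and the connection URL from the steps above can be sketched as a small Python helper. This is only an illustration: the function names `hive_site_properties` and `discovery_url` are hypothetical, and the `<property>`/`<name>`/`<value>` layout is the usual Hadoop-style configuration format that hive-site.xml is assumed to follow here. The property names and URL parameters are the ones listed above.

```python
import xml.etree.ElementTree as ET


def hive_site_properties(namespace, quorum):
    """Render the three discovery settings as hive-site.xml <property> elements.

    namespace: ZooKeeper namespace the Thrift Servers register under
    quorum:    list of ZooKeeper "host:port" strings
    """
    settings = {
        "hive.server2.support.dynamic.service.discovery": "true",
        "hive.server2.zookeeper.namespace": namespace,
        "hive.zookeeper.quorum": ",".join(quorum),
    }
    blocks = []
    for name, value in settings.items():
        prop = ET.Element("property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
        blocks.append(ET.tostring(prop, encoding="unicode"))
    return "\n".join(blocks)


def discovery_url(quorum, database, namespace):
    """Build the client JDBC URL that points at ZooKeeper, not at a single server."""
    return (
        "jdbc:hive2://" + ",".join(quorum) + "/" + database
        + ";serviceDiscoveryMode=zooKeeper"
        + ";zooKeeperNamespace=" + namespace
    )


if __name__ == "__main__":
    quorum = ["localhost:2181"]  # single-node ZooKeeper, as in the example above
    print(hive_site_properties("hiveserver2", quorum))
    print(discovery_url(quorum, "default", "hiveserver2"))
```

The namespace passed to both functions must be the same value, since the client looks up servers under the namespace they registered with.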

Each Thrift Server registers itself with ZooKeeper when it comes up and removes itself when it goes down. The client specifies the ZooKeeper address and namespace (the value of hive.server2.zookeeper.namespace) and is then redirected to one of the available Thrift Servers. ZooKeeper load balances client connections in round-robin fashion. Note that there is no transparent failover for active sessions if a Thrift Server goes down.
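
Because active sessions do not fail over, a client that wants to survive a server going down has to open a new connection and resubmit its work. A minimal sketch of that pattern follows. It assumes two hypothetical callables supplied by the application: `connect`, which opens a new connection through the ZooKeeper discovery URL, and `run_query`, which executes the work on that connection. It also assumes that a lost server surfaces as Python's `ConnectionError`; a real driver may raise a different exception type.

```python
import time


def run_with_reconnect(connect, run_query, attempts=3, delay_seconds=1.0):
    """Run a query, opening a fresh connection if the current server goes away.

    connect:   hypothetical callable returning a new connection obtained
               through the ZooKeeper discovery URL
    run_query: hypothetical callable taking that connection and returning a result
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            connection = connect()  # discovery hands back one of the live servers
            return run_query(connection)
        except ConnectionError as error:  # assumption: lost servers raise ConnectionError
            last_error = error
            if attempt < attempts - 1:
                time.sleep(delay_seconds)  # give ZooKeeper time to drop the dead server
    raise last_error
```

Each retry starts from a new session, so any session-level state (temporary tables, settings applied on the old connection) has to be re-created inside `run_query`.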

