Do multiple stores on the same node work correctly? #3531

Closed
tildeleb opened this issue Dec 27, 2015 · 4 comments

I am trying to get a configuration with two stores on a single node working. The volumes each have about 38 GB of storage available. I issued the following commands to set up a two-store node using volumes /ssd and /mnt.

$ mkdir /ssd/cdb /mnt/cdb
$ ./cockroach init --stores=ssd=/ssd/cdb
I1226 06:59:07.831951 32115 server/context.go:182  initialized 1 storage engine(s)
I1226 06:59:07.832123 32115 storage/engine/rocksdb.go:107  opening rocksdb instance at "/ssd/cdb"
I1226 06:59:07.848661 32115 multiraft/multiraft.go:999  node 1 campaigning because initial confstate is [1]
I1226 06:59:07.849213 32115 storage/replica_command.go:1209  range 1: new leader lease replica {1 1 1} 1970-01-01 00:00:00 +0000 UTC +1451113148.848s
I1226 06:59:07.850029 32115 cli/start.go:84  cockroach cluster 0dd338fb-646b-4489-a3fd-e65154ec5078 has been initialized
I1226 06:59:07.850103 32115 storage/engine/rocksdb.go:141  closing rocksdb instance at "/ssd/cdb"
$ nohup ./cockroach start --stores=ssd=/ssd/cdb,ssd=/mnt/cdb  --gossip=self= --addr 10.184.154.141:8080 --insecure >& log.txt &

After that I ran a program to load 70 GB of randomly generated keys and values. All the ranges ended up on /ssd/cdb, and when that volume ran out of space cockroach panicked with "No space left on device".

Questions:

  1. Is this a config error on my part or a bug?
  2. Am I correct that you only init the first volume because init puts the UID on the first volume?
  3. If a bug, would adding --balance-mode: "range count" be a workaround?
  4. I would really like to get this working.

The following log entries would seem to be relevant:

server/context.go:182  initialized 2 storage engine(s)
storage/engine/rocksdb.go:107  opening rocksdb instance at "/ssd/cdb"
server/node.go:284  initialized store store=1:1 ([ssd]=/ssd/cdb): {Capacity:42133159936 Available:39935733760 RangeCount:0}
storage/engine/rocksdb.go:107  opening rocksdb instance at "/mnt/cdb"
server/node.go:271  store store=0:0 ([ssd]=/mnt/cdb) not bootstrapped
server/node.go:211  node ID 1 initialized
Started node with [[ssd]=/ssd/cdb [ssd]=/mnt/cdb] engine(s) and attributes []
server/node.go:335  bootstrapping 1 store(s)
server/node.go:362  bootstrapped store store=1:2 ([ssd]=/mnt/cdb)

The following is a partial log from the beginning and the end of the log file.

I1226 07:00:57.263748 32128 cli/start.go:114  build Vers: go1.5.1
I1226 07:00:57.263805 32128 cli/start.go:115  build Tag:  alpha-6416-g2869b15
I1226 07:00:57.263819 32128 cli/start.go:116  build Time: 2015/12/16 20:13:29
I1226 07:00:57.263830 32128 cli/start.go:117  build Deps:
I1226 07:00:57.263883 32128 server/context.go:182  initialized 2 storage engine(s)
I1226 07:00:57.263911 32128 cli/start.go:146  starting cockroach cluster
W1226 07:00:57.263939 32128 server/server.go:98  running in insecure mode, this is strongly discouraged. See --insecure and --certs.
I1226 07:00:57.264500 32128 storage/engine/rocksdb.go:107  opening rocksdb instance at "/ssd/cdb"
I1226 07:00:57.272684 32128 server/node.go:284  initialized store store=1:1 ([ssd]=/ssd/cdb): {Capacity:42133159936 Available:39935733760 RangeCount:0}
I1226 07:00:57.272768 32128 storage/engine/rocksdb.go:107  opening rocksdb instance at "/mnt/cdb"
I1226 07:00:57.273284 32128 multiraft/multiraft.go:999  node 1 campaigning because initial confstate is [1]
I1226 07:00:57.279825 32128 server/node.go:271  store store=0:0 ([ssd]=/mnt/cdb) not bootstrapped
I1226 07:00:57.279854 32128 server/node.go:211  node ID 1 initialized
I1226 07:00:57.279979 32128 gossip/gossip.go:189  setting node descriptor node_id:1 address:<network_field:"tcp" address_field:"10.184.154.141:8080" > attrs:<>
I1226 07:00:57.280034 32128 server/node.go:374  connecting to gossip network to verify cluster ID...
I1226 07:00:57.280062 32128 server/node.go:391  node connected via gossip and verified as part of cluster "0dd338fb-646b-4489-a3fd-e65154ec5078"
I1226 07:00:57.280123 32128 server/node.go:249  Started node with [[ssd]=/ssd/cdb [ssd]=/mnt/cdb] engine(s) and attributes []
I1226 07:00:57.280153 32128 server/server.go:230  starting http server at 10.184.154.141:8080
I1226 07:00:57.280360 32128 server/node.go:335  bootstrapping 1 store(s)
I1226 07:00:57.282738 32128 server/node.go:362  bootstrapped store store=1:2 ([ssd]=/mnt/cdb)
E1226 07:01:02.274745 32128 storage/queue.go:374  failure processing replica range=1 [/Min-/Max) from replicate queue: storage/allocator.go:255: unable to allocate a target store; no candidates available
E1226 07:01:07.275042 32128 storage/queue.go:374  failure processing replica range=1 [/Min-/Max) from replicate queue: storage/allocator.go:255: unable to allocate a target store; no candidates available

.
.
.

E1227 07:39:28.094412 32128 storage/queue.go:374  failure processing replica range=1578 ["ks0027:pultitsdxdufi"-"ks0027:qghdbastrsnq") from replicate queue: storage/allocator.go:255: unable to allocate a target store; no candidates available
E1227 07:39:28.440789 32128 storage/queue.go:374  failure processing replica range=1579 ["ks0027:qghdbastrsnq"-"ks0027:qscrbxuifkotraw") from replicate queue: storage/allocator.go:255: unable to allocate a target store; no candidates available
E1227 07:39:28.787182 32128 storage/queue.go:374  failure processing replica range=1580 ["ks0027:qscrbxuifkotraw"-"ks0027:rdzaoiwgrotx") from replicate queue: storage/allocator.go:255: unable to allocate a target store; no candidates available
panic: IO error: /ssd/cdb/004632.log: No space left on device

goroutine 66 [running]:
github.com/cockroachdb/cockroach/multiraft.(*writeTask).start.func1()
        /e/gotest/src/github.com/cockroachdb/cockroach/multiraft/storage.go:244 +0xa4e
github.com/cockroachdb/cockroach/util/stop.(*Stopper).RunWorker.func1(0xc82013afc0, 0xc8202a2000)
        /e/gotest/src/github.com/cockroachdb/cockroach/util/stop/stopper.go:88 +0x52
created by github.com/cockroachdb/cockroach/util/stop.(*Stopper).RunWorker
        /e/gotest/src/github.com/cockroachdb/cockroach/util/stop/stopper.go:89 +0x62
@tbg tbg self-assigned this Dec 27, 2015
tbg (Member) commented Dec 27, 2015

Hi @tildeleb,

you're running into #2067. The tl;dr is that you can't have two copies of the same replica on a single node, which prevents what should happen in your case from happening: the whole keyspace starts out on your first store, and parts of it can only reach the second store by being moved there. But the strategy for moving a replica is basically copy-then-delete, which for a short period of time means two copies of the same replica on the same node.
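
To illustrate the constraint, here is a minimal sketch with made-up types (not the actual allocator code): a store whose node already holds a replica of the range is filtered out as a rebalance target, so a single-node, two-store setup leaves the replicate queue with no candidates, which is exactly the "unable to allocate a target store; no candidates available" error in your log.

package main

import "fmt"

type StoreDesc struct {
	StoreID int
	NodeID  int
}

type ReplicaDesc struct {
	NodeID  int
	StoreID int
}

// candidateStores returns the stores eligible to receive a new copy of a
// range: any store whose node already holds a replica is skipped.
func candidateStores(all []StoreDesc, existing []ReplicaDesc) []StoreDesc {
	onNode := map[int]bool{}
	for _, r := range existing {
		onNode[r.NodeID] = true
	}
	var out []StoreDesc
	for _, s := range all {
		if onNode[s.NodeID] {
			continue // this node already has a copy; a second store there doesn't help
		}
		out = append(out, s)
	}
	return out
}

func main() {
	// One node, two stores; the whole keyspace lives on store 1.
	stores := []StoreDesc{{StoreID: 1, NodeID: 1}, {StoreID: 2, NodeID: 1}}
	replicas := []ReplicaDesc{{NodeID: 1, StoreID: 1}}
	fmt.Println(candidateStores(stores, replicas)) // prints [], i.e. no candidates available
}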

So,

  1. bug
  2. yes, you init a specific store and the rest happens at runtime (or, rather, should when we fix this)
  3. no, that only changes the decision making, but it can't ever decide to do what you want
  4. this is something we'll get working before beta. More intensive multi-store testing is just beginning, so we'll get there.

As of today, you'll need to run #of-replicas+1 nodes locally and then the ranges will be able to move around.
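
For example, with the default replication factor of 3 that means four local nodes, each with its own store directory and its own --addr port. A rough sketch only; I haven't verified the exact --gossip value for pointing the extra nodes at the first one against this alpha build, so check ./cockroach start --help:

$ mkdir -p /ssd/cdb/node1 /ssd/cdb/node2 /ssd/cdb/node3 /ssd/cdb/node4
$ ./cockroach init --stores=ssd=/ssd/cdb/node1
$ ./cockroach start --stores=ssd=/ssd/cdb/node1 --gossip=self= --addr localhost:8080 --insecure &
$ ./cockroach start --stores=ssd=/ssd/cdb/node2 --gossip=localhost:8080 --addr localhost:8081 --insecure &
$ ./cockroach start --stores=ssd=/ssd/cdb/node3 --gossip=localhost:8080 --addr localhost:8082 --insecure &
$ ./cockroach start --stores=ssd=/ssd/cdb/node4 --gossip=localhost:8080 --addr localhost:8083 --insecure &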

This is a bit of a bummer, but stay tuned - you'll be able to do this soon.

I'm glad you're taking the time to test us; let us know if you run into anything else that doesn't work as expected.

tildeleb (Author) commented

@tschottdorf,

Thanks. I did a search before I wrote this issue using keywords like store and EOF, but didn't see #2067. I should have searched on replica.

Also to be considered along with this bug: what happens when storage runs out? A panic is probably not what customers want. I would suggest warnings well ahead of running out of storage, for example at 70%, 80%, 90%, 95%, 96%, and so on. A simple mechanism should suffice, like a URL that gets hit when a threshold is crossed.
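
Something along these lines, purely illustrative (hypothetical names, not cockroach code): check store usage against a ladder of thresholds and hit a configured URL the first time each one is crossed.

package main

import (
	"fmt"
	"net/http"
	"strings"
)

var thresholds = []float64{0.70, 0.80, 0.90, 0.95, 0.96}

type storeUsage struct {
	capacity  int64
	available int64
	notified  map[float64]bool // thresholds already reported once
}

// check fires the webhook the first time each usage threshold is crossed.
func (s *storeUsage) check(webhookURL string) {
	used := 1 - float64(s.available)/float64(s.capacity)
	for _, t := range thresholds {
		if used >= t && !s.notified[t] {
			s.notified[t] = true
			msg := fmt.Sprintf("store is at %.0f%% of capacity", used*100)
			if resp, err := http.Post(webhookURL, "text/plain", strings.NewReader(msg)); err != nil {
				fmt.Println("webhook notification failed:", err)
			} else {
				resp.Body.Close()
			}
		}
	}
}

func main() {
	s := &storeUsage{capacity: 42133159936, available: 2000000000, notified: map[float64]bool{}}
	s.check("http://example.com/storage-alert") // hypothetical endpoint
}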

I guess that raises the question of being able to add stores/nodes on the fly?

If this is a dup of #2067, please feel free to close it out.

tbg (Member) commented Dec 28, 2015

Yes, there will be prominent warnings in the UI, emanating from the cluster, when a node runs out of storage. You should already be able to add nodes on the fly (but you need more nodes than the replication factor to avoid the bug above).
I'm going to close this; watch for activity on #2067 to see when it's fixed.
Thanks again for reporting!

@tbg tbg closed this as completed Dec 28, 2015
bdarnell (Contributor) commented

> As of today, you'll need to run #of-replicas+1 nodes locally and then the ranges will be able to move around.

Note that as long as those nodes are all on the same disk, you'll also need --balance-mode: "range count"; otherwise all the nodes will show equal amounts of "space available" and the rebalancer won't do anything.
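
To illustrate why, here is a minimal sketch with made-up thresholds (not the real allocator): when the balancing criterion is free disk space, stores sharing a disk look identical and nothing ever moves; a range-count criterion still sees the imbalance.

package main

import "fmt"

type store struct {
	available  int64 // bytes free on the underlying disk
	rangeCount int
}

// Move only if the target has meaningfully more free space (made-up 1 GiB margin).
func shouldRebalanceBySpace(from, to store) bool {
	return to.available > from.available+(1<<30)
}

// Move only if the target holds meaningfully fewer ranges.
func shouldRebalanceByRangeCount(from, to store) bool {
	return from.rangeCount > to.rangeCount+1
}

func main() {
	// Two stores backed by the same disk: identical free space, very different load.
	full := store{available: 40 << 30, rangeCount: 1580}
	empty := store{available: 40 << 30, rangeCount: 0}
	fmt.Println(shouldRebalanceBySpace(full, empty))      // false: free space looks balanced
	fmt.Println(shouldRebalanceByRangeCount(full, empty)) // true: range counts are not
}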
