clustering: influxdb 0.9.0-rc23 panics when doing a GET with merge_metrics in a 3 node cluster #2272
Comments
I understand if this is a feature that hasn't been implemented yet, but the same commands worked with RC19, so we have written many regression tests with merge_metrics in the GET that now fail. |
OK, this may be a regression since we did change the query engine. Can you supply a sequence of https://github.com/influxdb/influxdb/blob/master/CONTRIBUTING.md#bug-reports |
this is the curl command I use to reproduce it: |
I just reproduced it with influxdb stand-alone as follows:
curl -G 'http://localhost:8086/query' --data-urlencode "db=mydb" --data-urlencode "q=SELECT value FROM cpu_load_short WHERE region='us-west'"
curl -XPOST 'http://localhost:8086/write' -d ' {
curl -G 'http://localhost:8086/query' --data-urlencode "db=mydb" --data-urlencode "q=SELECT value FROM cpu_load_short WHERE region='us-west'"
influxdb fails with the panic message from the original comment |
From: https://github.com/influxdb/influxdb/blob/master/tx.go#L144, it looks like this is triggered because the replication factor is less than the number of servers in the cluster. I believe the default replication factor is 1 for the default retention policy and you are using a 3 node cluster. You might try creating a retention policy with a replication factor of 3 and specifying that RP in your writes as a workaround. @otoolep is this panic still needed though? |
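A minimal sketch of the workaround suggested above, assuming the InfluxQL `CREATE RETENTION POLICY ... REPLICATION n DEFAULT` form and the JSON write body shown elsewhere in this thread. The policy name `three_copies` and the `30d` duration are hypothetical placeholders, not values from the report.

```python
import json


def create_rp_statement(name, database, duration, replication, default=True):
    """Build an InfluxQL statement that creates a retention policy.

    The statement shape is an assumption based on 0.9-era InfluxQL.
    """
    stmt = (
        f"CREATE RETENTION POLICY {name} ON {database} "
        f"DURATION {duration} REPLICATION {replication}"
    )
    if default:
        stmt += " DEFAULT"
    return stmt


def write_body(database, retention_policy, points):
    """Build a JSON write body that names the retention policy explicitly."""
    return json.dumps(
        {
            "database": database,
            "retentionPolicy": retention_policy,
            "points": points,
        }
    )


# Hypothetical names: a policy called "three_copies" for a 3 node cluster.
statement = create_rp_statement("three_copies", "mydb", "30d", 3)
body = write_body(
    "mydb",
    "three_copies",
    [
        {
            "name": "cpu_load_short",
            "tags": {"region": "us-west"},
            "fields": {"value": 0.64},
        }
    ],
)
```

The statement would be sent to the `/query` endpoint and the body to `/write`, in the same way as the curl commands in this thread.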
Yeah, that could be an issue. Let me look into it, we may not need the explicit panic any longer. |
I resolved the issue in my environment by adding DEFAULT to the database creation so that the correct retention policy and replication factor get set; no opinion on whether that panic should still occur if the replication factor doesn't match the clustered env, perhaps a more informative message would help. |
Here is the requested documentation. |
this works for me now |
Facing the same issue on rc25. 3 server setup, all 3 are both broker and data-node. Then I put some data:
curl -XPOST 'http://influxdb:8086/write' -d '
{
"database": "ilia",
"retentionPolicy": "default",
"points": [
{
"name": "cpu",
"tags": {
"host": "server1",
"region": "nl"
},
"timestamp": "2015-04-10T15:00:00Z",
"fields": {
"value": 10.64
}
},
{
"name": "cpu",
"tags": {
"host": "server1",
"region": "nl"
},
"timestamp": "2015-04-10T15:05:00Z",
"fields": {
"value": 20.00
}
},
{
"name": "cpu",
"tags": {
"host": "server1",
"region": "nl"
},
"timestamp": "2015-04-10T15:10:00Z",
"fields": {
"value": 25.00
}
},
{
"name": "cpu",
"tags": {
"host": "server1",
"region": "nl"
},
"timestamp": "2015-04-10T15:15:00Z",
"fields": {
"value": 35.01
}
}
]
}'
Then I run:
curl -G http://influxdb:8086/query?pretty=true --data-urlencode "q=SELECT * FROM cpu" --data-urlencode "db=ilia"
And it crashed the node. Then I found this issue. I created the policy with replicaN = 2 and it started to work. But that's weak: the default retention policy has replicaN = 1 (which is not more than the number of servers in the cluster). Why is it running into the panic? Also, in <0.9.0 there was an option to set the default replication factor, and in 0.9.0 there is not. |
even though this works when the replication factor matches the number of nodes in the cluster, it sounds like there is still a question around whether the panic should occur if the replication factor is less than the number of clustered nodes. |
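One possible explanation for the reports above, sketched under an assumption: that a shard group is split into `data_nodes // replicaN` shards, and that the panic fires whenever a group holds more than one shard. This formula is not confirmed in this thread, but it matches what was observed here (3 nodes with replicaN = 1 panics, replicaN = 2 or 3 works).

```python
def shards_per_group(data_nodes, replica_n):
    """Assumed shard count for one shard group.

    Assumption: the replication factor is clamped to [1, data_nodes] and
    the group gets data_nodes // replica_n shards, never fewer than one.
    """
    if data_nodes < 1:
        raise ValueError("a cluster needs at least one data node")
    replica_n = max(1, min(replica_n, data_nodes))
    return max(1, data_nodes // replica_n)


def would_panic(data_nodes, replica_n):
    """True when a query would touch more than one shard in the group."""
    return shards_per_group(data_nodes, replica_n) > 1


# Walk through the replication factors mentioned in this thread.
for replica_n in (1, 2, 3):
    print(replica_n, shards_per_group(3, replica_n), would_panic(3, replica_n))
```

Under that assumption, a replication factor that is smaller than the node count is not a problem in itself; the panic only appears once the group ends up with more than one shard.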
Fixes #2272 There was previously an explicit panic put in the query engine to prevent queries where the number of shards was not equal to the number of data nodes in the cluster. This was waiting for the distributed queries branch to land but was not removed when that landed. There may be a more efficient way to fix this, but this fix simply queries all the shards and merges their outputs. Previously, the code assumed that only one shard would be hit. Querying multiple shards ended up producing duplicate values during the map phase, so the map output needed to be merged as opposed to appended to avoid the dups.
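The difference between appending and merging map output can be illustrated with a small sketch. This is not the code from the fix; it assumes each shard returns `(timestamp, value)` pairs and that duplicates share a timestamp, and both function names are hypothetical.

```python
def append_map_outputs(per_shard_outputs):
    """Concatenate shard outputs as-is; overlapping points appear twice."""
    combined = []
    for output in per_shard_outputs:
        combined.extend(output)
    return combined


def merge_map_outputs(per_shard_outputs):
    """Merge shard outputs, keeping one value per timestamp."""
    merged = {}
    for output in per_shard_outputs:
        for timestamp, value in output:
            # Keep the first value seen for a timestamp, drop later copies.
            merged.setdefault(timestamp, value)
    return sorted(merged.items())


# Two shards that both report the 15:05 point.
shard_a = [("2015-04-10T15:00:00Z", 10.64), ("2015-04-10T15:05:00Z", 20.0)]
shard_b = [("2015-04-10T15:05:00Z", 20.0), ("2015-04-10T15:10:00Z", 25.0)]

appended = append_map_outputs([shard_a, shard_b])
merged = merge_map_outputs([shard_a, shard_b])
```

With these inputs the appended list carries the 15:05 point twice, while the merged list carries it once.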
@jwilder how do I configure default amount of shards? |
@svscorp You should be able to change the replication factor using the CLI
You could also create a new retention policy and mark it as default
|
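The statements referred to in the comment above did not survive in this copy of the thread. As a rough sketch only, assuming the InfluxQL `ALTER RETENTION POLICY ... REPLICATION n DEFAULT` form and the `/query` endpoint used in the curl commands here, the request could be built like this (the helper name is hypothetical):

```python
from urllib.parse import urlencode


def alter_rp_query_url(base_url, database, policy, replication, make_default=True):
    """Build a /query URL that changes a policy's replication factor.

    The statement shape is an assumption based on 0.9-era InfluxQL.
    """
    stmt = f"ALTER RETENTION POLICY {policy} ON {database} REPLICATION {replication}"
    if make_default:
        stmt += " DEFAULT"
    # urlencode escapes spaces as "+", which is valid in a query string.
    return f"{base_url}/query?" + urlencode({"db": database, "q": stmt})


url = alter_rp_query_url("http://localhost:8086", "mydb", "default", 3)
print(url)
```

The resulting URL can be fetched with curl or any HTTP client, in the same way as the SELECT queries earlier in the thread.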
@jwilder I get it, my question was about some kind of configuration, the way it was done in <0.9. But that's okay, it means I need to execute the ALTER query after I create a database. But I think it would be wise to have a configuration option for applying the default replication factor to all new databases created with the 'default' key. It's easy to lose that bit: you always need to make one more query to get the replication factor you need (or you need to include it in the create db query). What do you think? |
Also, here it is hardcoded: https://github.com/influxdb/influxdb/blob/master/server.go#L1104 maybe it makes sense to define it as a constant or something (or again, the configuration)? |
influxdb.log: panic: distributed queries not implemented yet and there are too many shards in this group
the command I use to get the panic is: monasca measurement-list cpu.idle_perc 1970 --merge_metrics