During bulk import of historical data, points written to incorrect shard (according to timestamp) #7482

Closed
mark-rushakoff opened this issue Oct 18, 2016 · 0 comments

I am reliably reproducing a bulk import that writes data to the wrong shard.

OSX, InfluxDB from master @ b50d955; same behavior observed on 1.0.2 open source.

  1. Start with a clean database, preferably with monitoring disabled to avoid creating the _internal database (set the environment variable INFLUXDB_MONITOR_STORE_ENABLED=false).
  2. Import the NOAA data:

     curl https://s3.amazonaws.com/noaa.water-database/NOAA_data.txt -o NOAA_data.txt
     influx -import -path=./NOAA_data.txt -precision=s
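
For reference, here is a minimal Python sketch of what `-precision=s` implies for the import: each integer timestamp in the file is read as whole seconds since the Unix epoch. The helper name `epoch_to_rfc3339` and the sample value 1439856000 are illustrative assumptions, not taken from the import file; the value was chosen because it equals 2015-08-18T00:00:00Z when read as seconds.

```python
from datetime import datetime, timezone

# Units per second for a few precision names (illustrative subset).
UNITS_PER_SECOND = {"s": 1, "ms": 10**3, "u": 10**6, "ns": 10**9}

def epoch_to_rfc3339(value, precision="s"):
    """Interpret an integer epoch timestamp at the given precision.

    Hypothetical helper for illustration only; sub-second parts are dropped.
    """
    seconds = value // UNITS_PER_SECOND[precision]
    t = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return t.strftime("%Y-%m-%dT%H:%M:%SZ")

# Read as seconds, the sample value lands on the first santa_monica point.
print(epoch_to_rfc3339(1439856000, "s"))
# Read as nanoseconds, the same integer lands about one second after the epoch.
print(epoch_to_rfc3339(1439856000, "ns"))
```

The same integer maps to very different times depending on the precision, so it is worth confirming the precision before comparing point timestamps against shard boundaries.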

Querying specifically for location='santa_monica' with no time range returns the correct results:

> SELECT "water_level" FROM "h2o_feet" WHERE "location"='santa_monica' limit 6
name: h2o_feet
time                    water_level
----                    -----------
2015-08-18T00:00:00Z    2.064
2015-08-18T00:06:00Z    2.116
2015-08-18T00:12:00Z    2.028
2015-08-18T00:18:00Z    2.126
2015-08-18T00:24:00Z    2.041
2015-08-18T00:30:00Z    2.051

But querying with explicit time bounds that match the previous results incorrectly returns nothing:

> SELECT "water_level" FROM "h2o_feet" WHERE "location"='santa_monica' AND time >= '2015-08-18T00:00:00Z' AND time <= '2015-08-18T00:30:00Z'

Okay, what if we pick a specific value to match and supply a time range? With an upper bound of 2015-08-19 the query returns nothing, even though 2.028 was recorded at 2015-08-18T00:12:00Z. With an upper bound of 2015-08-29 it returns only a later point:

> SELECT "water_level", location FROM "h2o_feet" where (water_level = 2.028) AND location = 'santa_monica' AND time < '2015-08-19T00:00:00Z'
> SELECT "water_level", location FROM "h2o_feet" where (water_level = 2.028) AND location = 'santa_monica' AND time < '2015-08-29T00:00:00Z'
name: h2o_feet
time                    water_level     location
----                    -----------     --------
2015-08-28T21:00:00Z    2.028           santa_monica

And if we extend the upper time bound far enough, we eventually get the result from 2015-08-18, indicating that the results for 2015-08-18 are contained in the wrong shard:

> SELECT "water_level", location FROM "h2o_feet" where (water_level = 2.028) AND location = 'santa_monica' AND time < '2015-09-09T00:00:00Z'
name: h2o_feet
time                    water_level     location
----                    -----------     --------
2015-08-18T00:12:00Z    2.028           santa_monica
2015-08-24T05:12:00Z    2.028           santa_monica
2015-08-26T11:30:00Z    2.028           santa_monica
2015-08-28T21:00:00Z    2.028           santa_monica
2015-08-30T08:54:00Z    2.028           santa_monica
2015-09-01T23:24:00Z    2.028           santa_monica
2015-09-03T13:48:00Z    2.028           santa_monica
2015-09-07T10:12:00Z    2.028           santa_monica

> show shards
name: NOAA_water_database
id      database                retention_policy        shard_group     start_time              end_time                expiry_time             owners
--      --------                ----------------        -----------     ----------              --------                -----------             ------
1       NOAA_water_database     autogen                 1               2015-08-17T00:00:00Z    2015-08-24T00:00:00Z    2015-08-24T00:00:00Z
2       NOAA_water_database     autogen                 2               2015-08-24T00:00:00Z    2015-08-31T00:00:00Z    2015-08-31T00:00:00Z
3       NOAA_water_database     autogen                 3               2015-08-31T00:00:00Z    2015-09-07T00:00:00Z    2015-09-07T00:00:00Z
4       NOAA_water_database     autogen                 4               2015-09-07T00:00:00Z    2015-09-14T00:00:00Z    2015-09-14T00:00:00Z
5       NOAA_water_database     autogen                 5               2015-09-14T00:00:00Z    2015-09-21T00:00:00Z    2015-09-21T00:00:00Z
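
To make the expected placement concrete, here is a small Python sketch that maps a timestamp to the shard group whose time range contains it, using the boundaries from the `show shards` output above. It assumes each range is half-open, [start_time, end_time); the function names are illustrative and not part of InfluxDB.

```python
from datetime import datetime, timezone

# (shard id, start_time, end_time) copied from the SHOW SHARDS output.
SHARD_GROUPS = [
    (1, "2015-08-17T00:00:00Z", "2015-08-24T00:00:00Z"),
    (2, "2015-08-24T00:00:00Z", "2015-08-31T00:00:00Z"),
    (3, "2015-08-31T00:00:00Z", "2015-09-07T00:00:00Z"),
    (4, "2015-09-07T00:00:00Z", "2015-09-14T00:00:00Z"),
    (5, "2015-09-14T00:00:00Z", "2015-09-21T00:00:00Z"),
]

def parse_rfc3339(ts):
    """Parse a UTC timestamp of the form 2015-08-18T00:12:00Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)

def expected_shard(ts):
    """Return the id of the shard whose [start, end) range contains ts, or None."""
    t = parse_rfc3339(ts)
    for shard_id, start, end in SHARD_GROUPS:
        # Assumption: start is inclusive, end is exclusive.
        if parse_rfc3339(start) <= t < parse_rfc3339(end):
            return shard_id
    return None

for ts in ("2015-08-18T00:12:00Z", "2015-08-28T21:00:00Z", "2015-09-07T10:12:00Z"):
    print(ts, "-> shard", expected_shard(ts))
```

By these boundaries the 2015-08-18T00:12:00Z point belongs in shard 1, yet the queries above only return it once the upper time bound reaches into shard 4's range.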

I'm out of the office Wednesday the 19th, but @desa and @rkuchan both have context if additional info is needed.
