
Queries slow down hundreds times after overwriting points #6611

Closed
kub00n opened this issue May 12, 2016 · 1 comment · Fixed by #6668
kub00n commented May 12, 2016

Bug report

System info:
InfluxDB 0.13, InfluxDB 0.12.2
Ubuntu 14.04.1

Steps to reproduce:

  1. Add some measurements (5M points, time range: 5000 s)
...
test_series,tag=tag1 value=2 1462060800001
test_series,tag=tag2 value=3 1462060800002
test_series,tag=tag0 value=4 1462060800003
...
  2. Run a query
  3. Insert exactly the same points again (overwriting the old ones)
  4. Run the query again
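A minimal sketch of the points inserted in step 1, following the sample lines above (tags cycle tag0–tag2, values and millisecond timestamps increase by one; the generator name is my own, not from the original script):

```python
def line_protocol_points(n, start_ms=1462060800000):
    """Generate line-protocol strings matching the samples above."""
    for i in range(n):
        yield "test_series,tag=tag%d value=%d %d" % (i % 3, i + 1, start_ms + i)

# The first few generated points:
# test_series,tag=tag0 value=1 1462060800000
# test_series,tag=tag1 value=2 1462060800001
```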

Expected behavior:
Both queries return the same results in a similar amount of time

Actual behavior:
The results are the same, but the second query (after the overwrite) is almost 500 times slower than the first. Repeating the query after 10 s is a bit faster, but still far slower than the first query.

Additional info:
I wrote a simple Python script to reproduce it: influx_overwrite_bug.py:

==CREATING DATABASE==
==INSERTING SERIES==
* inserting 5000000 points: from 1462060800000ms - to 1462065800000ms
==QUERY==
* query: select * from test_series LIMIT 1
* result: {u'tag': u'tag0', u'value': 1, u'time': 1462060800000}
* duration: 0.0093s
==INSERTING SERIES==
* inserting 5000000 points: from 1462060800000ms - to 1462065800000ms
==QUERY==
* query: select * from test_series LIMIT 1
* result: {u'tag': u'tag0', u'value': 1, u'time': 1462060800000}
* duration: 4.4923s
==SLEEP 10s==
==QUERY==
* query: select * from test_series LIMIT 1
* result: {u'tag': u'tag0', u'value': 1, u'time': 1462060800000}
* duration: 3.1518s
==REMOVING DATABASE==
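The per-query durations in the log above can be captured with an ordinary wall-clock wrapper (a sketch only; the original script is not shown here, and `client.query` below is a hypothetical InfluxDB client call):

```python
import time

def timed(fn):
    """Run fn() and return (result, elapsed seconds), like the
    'duration' lines printed by the script."""
    t0 = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - t0

# Hypothetical usage against an InfluxDB client:
# rows, dt = timed(lambda: client.query("select * from test_series LIMIT 1"))
```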
@jwilder jwilder added this to the 1.0.0 milestone May 13, 2016
jwilder added a commit that referenced this issue May 18, 2016
If there were duplicate points in multiple blocks, we would correctly
dedup the points and mark the regions of the blocks we've read.
Unfortunately, we were not excluding the already-read points as the
cursor moved to points in the later blocks, which could cause points
to be returned twice incorrectly.

Fixes #6611
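The commit message above can be illustrated with a simplified dedup over overlapping blocks (a sketch of the idea only, not the actual TSM cursor code): when the same timestamp appears in several blocks, the later write wins, and a timestamp already emitted must be excluded as the cursor reaches later blocks so no point is returned twice.

```python
def dedup_blocks(blocks):
    """Merge blocks of (timestamp, value) points.
    Later blocks overwrite earlier duplicates, and each timestamp
    is returned exactly once -- the behavior the fix restores."""
    merged = {}
    for block in blocks:        # blocks in write order: later wins
        for ts, val in block:
            merged[ts] = val
    return sorted(merged.items())
```

For example, `dedup_blocks([[(1, "a"), (2, "b")], [(2, "B"), (3, "c")]])` yields each timestamp once, with the overwritten value `"B"` at timestamp 2.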

jwilder commented May 18, 2016

@kub00n Thanks for providing the python script. That made it really easy to reproduce and track down the issue. The perf issue is fixed in master, but your script highlighted a correctness issue with deduplicating overwritten points. See #6668 which will fix that issue.
