
Performance issue when using expand #149

Open
tebben opened this issue Mar 7, 2018 · 4 comments


tebben commented Mar 7, 2018

After changing the query to fix the number of returned entities when using an expand, performance took a big hit.

Test 1

  • Added 1 Thing with a Location and Datastream
  • Posted 2 million observations using JMeter
  • Ran the query generated by QueryBuilder.go requesting localhost:8080/v1.0/Things?$expand=Datastreams/Observations

Query returned in 12ms

Test 2

  • Added 20 Things with a Location and Datastream
  • Posted 2.5 million observations using JMeter spread over 4 different datastreams
  • Ran the query generated by QueryBuilder.go requesting localhost:8080/v1.0/Things?$expand=Datastreams/Observations

Query returned in 17.5 seconds

Test 3

  • Removed the ORDER BY from the left-joined observation part, with Test 2 as base

Query returned in 22ms

Test 4

  • Added 20 Things with a Location and Datastream
  • Posted 3 million observations using JMeter spread over all datastreams
  • Ran the query generated by QueryBuilder.go requesting localhost:8080/v1.0/Things?$expand=Datastreams/Observations

Query returned in 1.1 seconds

Test 5

  • Removed the ORDER BY from the left-joined observation part, with Test 4 as base

Query returned in 86ms

It seems like the main problem is joining observations on datastreams where a datastream has no observations. Tests were done on gost-db:latest, which uses Postgres 9.6. Postgres 10 has some nice new features, including query planner improvements, which I'm going to try out first to see if they fix our issue.
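The query shape that seems to trigger the slow plan can be sketched roughly as follows. This is a minimal sketch, not the exact SQL emitted by QueryBuilder.go; the schema names match the index DDL later in this thread, but the column names and the ROW_NUMBER construction are assumptions:

```sql
-- Hypothetical sketch: expanding Observations under each Datastream
-- requires ordering observations per datastream inside the left join.
-- For a datastream with no observations, Postgres 9.6 can still end up
-- sorting a large part of v1.observation before the join filters it.
SELECT thing.id AS thing_id, datastream.id AS datastream_id, obs.id AS observation_id
FROM v1.thing
LEFT JOIN v1.datastream ON datastream.thing_id = thing.id
LEFT JOIN (
    SELECT id, stream_id,
           ROW_NUMBER() OVER (PARTITION BY stream_id ORDER BY id DESC) AS rn
    FROM v1.observation
) obs ON obs.stream_id = datastream.id AND obs.rn <= 1;
```

Dropping the ORDER BY (as in Tests 3 and 5) removes the per-datastream sort, which would explain the large difference in timings.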


tebben commented Mar 7, 2018

Created a Docker image geodan/gost-db:test with PostGIS on Postgres 10 (mdillion/10-alpine). Ran the same tests and only saw a 3-second improvement over Test 2, which could also be caused by the slightly lower observation count. Fiddled around with some Postgres config settings, which didn't seem to increase performance.

After looking further into the EXPLAIN ANALYZE output and using my Google skills, I created this YOLO index:

CREATE INDEX fki_id_stream_id
ON v1.observation
USING btree
(stream_id, id);

which brought the query down from 14s to 34ms.
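A quick way to check that the planner actually picks up the new index is a per-datastream ordered lookup; the stream_id value here is just an example, not from the tests above:

```sql
-- Compare this plan before and after creating fki_id_stream_id.
-- With the composite (stream_id, id) btree, an ordered lookup within
-- one datastream can be answered straight from the index, with no
-- separate sort step.
EXPLAIN ANALYZE
SELECT id
FROM v1.observation
WHERE stream_id = 1
ORDER BY id
LIMIT 1;
```

Before the index, the plan should show a scan on the single-column stream_id index followed by a sort; afterwards it should show an (index-only) scan on fki_id_stream_id.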

Next steps:

  • Go back to geodan/gost-db:latest and test the same index there
  • If query execution time is improved figure out why this index is needed
  • If query execution time is improved figure out if old stream_id index can be removed
  • If query execution time is improved check what impact it has on inserts
  • Check if we also have performance issues on expanding Locations, HistoricalLocations and FeaturesOfInterest (and fix)
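On the question of removing the old stream_id index: a btree on (stream_id, id) can also serve predicates on stream_id alone, since stream_id is the leading column, so the single-column index may become redundant. A minimal sketch; the old index name here is an assumption, not taken from the schema:

```sql
-- Hypothetical: the composite fki_id_stream_id index can answer
-- WHERE stream_id = ... on its own, so the old single-column index
-- may be droppable. (The index name below is an assumption.)
DROP INDEX IF EXISTS v1.fki_observation_stream_id;
```

Worth verifying with EXPLAIN on the common queries before dropping anything, and keeping in mind that every extra index adds write overhead on inserts.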


tebben commented Mar 20, 2018

The problem was that the query planner was using the wrong strategy. To fix this, I had to change some things on the observation table in the database and in the query builder used in the server.

Since some changes were needed to the database anyway, I took the opportunity to upgrade Postgres to 10 and add GeoJSON fields to Location and FeatureOfInterest; previously, properties were left out of location.location and featureofinterest.feature because the GeoJSON was converted to geom. The new database is not compatible with the old version, so cleaning up the old Docker volume is required.

Tested performance on a dataset with 20 Things, Locations and Datastreams, and 17,000,000 observations spread over the datastreams. The query generated by the request v1.0/Things?$count=true&$expand=Locations($select=location),Datastreams($select=id,name),Datastreams/Observations($select=id,phenomenonTime,result;$top=1)&$select=id,name,properties took 33ms, and after a manual VACUUM ANALYZE on the observation table it went down to 14ms.
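For reference, the manual statistics refresh that produced the second number was simply:

```sql
-- Refresh planner statistics and the visibility map after the bulk
-- load. Fresh statistics let the planner cost the composite index
-- correctly, and an up-to-date visibility map makes index-only scans
-- cheap, which is consistent with the 33ms -> 14ms drop.
VACUUM ANALYZE v1.observation;
```

Autovacuum would get there eventually, but after a large bulk insert a manual run makes the effect visible immediately.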

Still have to check the performance on expanding Locations/HistoricalLocations.


tomvantilburg commented Mar 21, 2018

Since you are breaking the compatibility anyway, would this be a good time to look at:
gost/gost-db#3 and gost/gost-db#4 before releasing?


bertt commented Mar 22, 2018

@tomvantilburg plan is to release 0.6 first, after that we'll take a look at TimescaleDB. But a PR is always welcome :-)
