Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

encoding/mvt: stable marshalling #93

Merged
merged 2 commits into from
Apr 2, 2022
Merged

Conversation

tgrigsby-sc
Copy link
Contributor

@tgrigsby-sc tgrigsby-sc commented Mar 25, 2022

Hi! We've been using the library and are big fans. I've noticed one opportunity to (potentially) improve things. I'm trying to take an MD5 hash of the marshaled bytes to use as a fingerprint that I can compare with what the client sends me to see if they match, and it doesn't work quite right.

Currently, encode_properties iterates over a map to encode. Since there are no guarantees on iteration order for go maps, the output can differ each time for the same input. In practice, I've found the iteration order to be different almost every time.

func TestStableMarshalling(t *testing.T) {
layers := NewLayers(loadGeoJSON(t, maptile.New(17896, 24449, 16)))
values := make(map[string]bool)

for i := 0; i < 100; i++ {
	marshal, _ := Marshal(layers)
	checksum := md5.Sum(marshal)
	sum := hex.EncodeToString(checksum[:])
	values[sum] = true
}

if len(values) != 1 {
	t.Errorf("multiple values (%d) for marshalled bytes", len(values))
}

}
The above test will typically produce 100 different md5 values.

This change makes encode_properties stable by getting the keys to the map, sorting them, and iterating over them. The benchmark numbers are:

Unstable:

BenchmarkMarshal
BenchmarkMarshal-16          	    1819	    606970 ns/op	  368286 B/op	    4790 allocs/op
BenchmarkUnmarshal
BenchmarkUnmarshal-16        	    4959	    245290 ns/op	  208822 B/op	    2468 allocs/op
BenchmarkProjectToTile
BenchmarkProjectToTile-16    	   14942	     80458 ns/op	    6672 B/op	     309 allocs/op
BenchmarkProjectToGeo
BenchmarkProjectToGeo-16     	   19521	     64411 ns/op	    6672 B/op	     309 allocs/op
PASS

Stable:

BenchmarkMarshal
BenchmarkMarshal-16          	    1363	    817600 ns/op	  411622 B/op	    5354 allocs/op
BenchmarkUnmarshal
BenchmarkUnmarshal-16        	    4479	    245967 ns/op	  208905 B/op	    2468 allocs/op
BenchmarkProjectToTile
BenchmarkProjectToTile-16    	   15022	     83612 ns/op	    6672 B/op	     309 allocs/op
BenchmarkProjectToGeo
BenchmarkProjectToGeo-16     	   19256	     61878 ns/op	    6672 B/op	     309 allocs/op
PASS

I'm happy to go whatever route you'd prefer here. I understand you may not want to impact the perf since the stability hasn't mattered up to now. I can perhaps add a separate method like MarshalStable?

Much appreciated!

@paulmach
Copy link
Owner

Hi, thanks for the contribution. LGTM.

You could put a shared []string buffer on the keyValueEncoder so reduce allocs. Might help a lot.

On the performance thing. If you guys are "big fans" and okay with it, I am too :). I don't have enough context to really make the call. I like the idea of it being stable for better caching and comparison, as you mentioned.

@tgrigsby-sc
Copy link
Contributor Author

Ok I updated with 4408fe0 to do what I believe you were suggesting - let me know if this isn't what you had in mind. The new benchmark numbers are:

BenchmarkMarshal
BenchmarkMarshal-16          	    1450	    755798 ns/op	  377562 B/op	    5102 allocs/op
BenchmarkUnmarshal
BenchmarkUnmarshal-16        	    4641	    249363 ns/op	  208880 B/op	    2468 allocs/op
BenchmarkProjectToTile
BenchmarkProjectToTile-16    	   15543	     75910 ns/op	    6672 B/op	     309 allocs/op
BenchmarkProjectToGeo
BenchmarkProjectToGeo-16     	   19398	     60365 ns/op	    6672 B/op	     309 allocs/op
PASS

Looks like improvements on ns, B, and allocs. Great suggestion!

@paulmach
Copy link
Owner

paulmach commented Apr 2, 2022

Thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants