Accidental O(N^2) behaviour? #23
Comments
Is it just export? What happens if you leave out export?
Behaviour is the same without export.
If I double the samples, I see 3.2x the number of cycles spent on a single 'test' instruction.
On a filesystem with a very small number of files, I expect performance to be linear. On a filesystem with very many files (e.g. one hypothetical extreme being that every sample lands in a never-seen-before file), performance should be very close to O(n^2). This is because siblings are stored as a linked list. At some point siblings were stored as a hash table, but I swapped that out because keeping a hash table on every node had significant overhead, and memory use seems to be the dominant source of performance pressure for typical filesystems right now. Maybe we can try something with less overhead but still better than linear, such as a red-black tree.
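To illustrate the trade-off, here's a minimal C++ sketch (names and structure are hypothetical, not this project's actual code): with siblings in a linked list, find-or-create scans every existing sibling, so inserting n distinct children costs O(n^2) total, while a sorted-tree container such as std::map (typically a red-black tree) makes each step O(log n) at the cost of extra per-node memory:

```cpp
#include <map>
#include <string>

// Hypothetical node with siblings in a singly linked list (the current
// scheme described above). Finding-or-creating a child scans every
// existing sibling: O(k) for k children, so n distinct children cost
// O(n^2) total -- the worst case where every sample hits a new file.
struct ListNode {
    std::string name;
    ListNode* firstChild = nullptr;
    ListNode* nextSibling = nullptr;

    ListNode* getChild(const std::string& childName) {
        for (ListNode* c = firstChild; c; c = c->nextSibling)
            if (c->name == childName)
                return c;
        // Not found: prepend a new sibling (O(1) once the scan is done).
        ListNode* c = new ListNode{childName, nullptr, firstChild};
        firstChild = c;
        return c;
    }
};

// The same node with children kept in a std::map, which is typically
// implemented as a red-black tree: find-or-create is O(log k), so n
// distinct children cost O(n log n). The price is a map header on
// every node plus per-child tree-node overhead -- exactly the kind of
// memory cost mentioned above.
struct TreeNode {
    std::map<std::string, TreeNode*> children;

    TreeNode& getChild(const std::string& childName) {
        TreeNode*& slot = children[childName]; // nullptr if newly inserted
        if (!slot)
            slot = new TreeNode;
        return *slot;
    }
};
```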
Happy to provide test results if you want to experiment, but I don't consider this too important.
@CyberShadow Do you want to close this one also or are you considering a tweak?
I think it's actionable. |
This is somewhat academic, since you clearly get very good resolution with a low number of samples, but maybe there are non-academic issues lurking that you might be interested in.
I observe that if I double the samples, it takes approx 4x as long to generate an export. Is there some accidental N^2 complexity somewhere? I don't think this is necessarily a big deal (so feel free to close), but thought you might like the data:
Note I'm using handwavy order-of-magnitude math, but I might have expected the latter run to take (5.3s x 4 =) ~20s rather than ~40s.
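As a rough sanity check (assuming the two runs differ by 4x in sample count, as the 5.3s x 4 arithmetic implies), the observed growth exponent would be

log(40 / 5.3) / log(4) ≈ 1.46

i.e. clearly superlinear but still well short of a full n^2, which would fit only part of the work being quadratic.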
(Chart: X axis: # samples; Y axis: time per sample in microseconds, i.e. it's getting more costly per sample.)