Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve perf or String.iter and String.iteri by 25% #9497

Closed

Conversation

abelbraaksma
Copy link
Contributor

Following up on the findings and timings presented at #9390 (comment), this PR improves performance to about 25%, sometimes more on modern processors.

Copy link
Contributor

@cartermp cartermp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Just some suggestions about the comments. The code is reasonable and there is evidence in the issue that this works well.

src/fsharp/FSharp.Core/string.fs Outdated Show resolved Hide resolved
src/fsharp/FSharp.Core/string.fs Outdated Show resolved Hide resolved
src/fsharp/FSharp.Core/string.fs Outdated Show resolved Hide resolved
src/fsharp/FSharp.Core/string.fs Outdated Show resolved Hide resolved
@abelbraaksma
Copy link
Contributor Author

abelbraaksma commented Jun 18, 2020

Strange, after i clicked the "Commit suggestion" in the online version of Github and clicked "resolve conversation", it marked the other suggestion as resolved automatically. Maybe because the sentences were equal, even though on different locations in the file.

Anyway, all resolved now, tx for review! And I added some tests, since unrolling can introduce subtle new bugs: b1d250e

and there is evidence in the issue that this works well.

@cartermp, tx, though critics might say that "one source is no source". Though it's highly unlikely that this introduces a regression on anyone's system, cpu cache sizes and speed, prefetch, branch prediction, physical cores etc can produce different timings here.

Of note is also that the CLR team is endeavoring to get better loop-unrolling done in RyuJIT, but this may still be a couple of months/years off.

@cartermp
Copy link
Contributor

Looks like there are more failures.

@saul
Copy link
Contributor

saul commented Jun 18, 2020

It seems very strange to have to unroll manually... it's trivially optimisable by the JIT. Is it possible to debug why the JIT isn't unrolling when it's a simple for loop?

@abelbraaksma
Copy link
Contributor Author

abelbraaksma commented Jun 18, 2020

@saul, my guess is that the hof function makes it hard to predict for the JIT that unrolling is safe or guaranteed to improve perf. Also, unrolling in the JIT is done quite rarely (the conditions have to be "just right"), there are efforts underway to improve the situation.

It's not uncommon to unroll, but it depends on many factors whether it makes sense. If the callback function is nontrivial, the benefit of unrolling gets undone. But for small inlineable functions, it shows significant improvement.

@abelbraaksma
Copy link
Contributor Author

abelbraaksma commented Jun 18, 2020

Looks like there are more failures.

Maybe the new tests. I can't see the failure on mobile, but I'll check as soon as I'm back.

@abelbraaksma
Copy link
Contributor Author

abelbraaksma commented Jun 18, 2020

@saul, this report gives some insight: dotnet/runtime#4248

As can be seen, even the most trivial loops do not get unrolled presently in RyuJIT (legacy JIT and x86 RyuJIT do better here) and the examples there are even way simpler than this. Since that was 5 years ago, I don't think we should expect this to change anytime soon.

@abelbraaksma
Copy link
Contributor Author

abelbraaksma commented Jun 18, 2020

@cartermp: there was indeed a bug, glad I wrote the extra tests, which showed it.

Since this changed the code for String.iteri, I ran the benchmark again, but the benefit is largely the same, so that's good:

image

@abelbraaksma abelbraaksma requested a review from KevinRansom June 19, 2020 17:02
@abelbraaksma
Copy link
Contributor Author

After discussion with @KevinRansom, we decided to drop this PR on the grounds that loop-unrolling should eventually be done by the JIT team. This PR can be used as an example where they can improve.

@abelbraaksma abelbraaksma deleted the String-iter-and-iteri-perf branch June 24, 2020 20:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants