Optimize text rendering by caching UBreakIterator instances. #102129

Open. Ivorforce wants to merge 1 commit into base: master

@Ivorforce Ivorforce commented Jan 28, 2025

Today I profiled Godot launch times again.
Through caching, I was able to shave about 4% off the total launch time of the Godot editor (and, additionally, improve text rendering time).

I am not familiar with the advanced text server, so this change should be well tested and reviewed.

Explanation

It appears that ubrk_open converts UTF-8 data to UTF-16 on every call (via u_strFromUTF8WithSub, per the profile below), allocating memory. This is surprisingly slow, and ubrk_open is called quite often too.

The implementation of BreakIterator::createLineInstance(Locale(locale), *status); does not appear to offer a way to avoid this behavior, e.g. by using UTF-32 strings directly.
Instead, it is easy to cache UBreakIterator instances. sd->mutex is already locked at this point, so the cached variable should not be subject to contention between threads.
Using a UBreakIterator pool independent of sd might be even better; I experimented with this and got the launch time down by almost 12% (almost the entire contribution of createLineInstance). However, I am not aware of a precedent for pools like this, so that may require a more complex solution.
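The caching idea described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `FakeBreakIterator`, `open_break_iterator`, `BreakIteratorCache`, and `g_open_calls` are hypothetical stand-ins so the example compiles without ICU. In the real text server the expensive call would be ubrk_open, and a cached iterator would presumably be pointed at new text with ubrk_setText rather than reopened.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for an ICU UBreakIterator.
struct FakeBreakIterator {
    std::string locale;
};

// Counts how often the expensive "open" path runs (illustration only).
static int g_open_calls = 0;

// Hypothetical stand-in for the expensive ubrk_open() call.
std::unique_ptr<FakeBreakIterator> open_break_iterator(const std::string &locale) {
    ++g_open_calls;
    auto bi = std::make_unique<FakeBreakIterator>();
    bi->locale = locale;
    return bi;
}

// Hypothetical cache keyed by locale. It is not thread-safe on its own:
// the caller is assumed to already hold a lock (the PR relies on sd->mutex).
class BreakIteratorCache {
public:
    // Returns the cached iterator for `locale`, opening one only on a miss.
    FakeBreakIterator *get(const std::string &locale) {
        auto it = cache_.find(locale);
        if (it == cache_.end()) {
            it = cache_.emplace(locale, open_break_iterator(locale)).first;
        }
        assert(it->second != nullptr);
        return it->second.get();
    }

    // Number of distinct locales currently cached.
    std::size_t size() const { return cache_.size(); }

private:
    std::unordered_map<std::string, std::unique_ptr<FakeBreakIterator>> cache_;
};
```

Calling `get("en")` twice on the same cache returns the same pointer and runs the expensive open path only once; the cost is that every cache keeps its iterators alive for as long as the owning object exists.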

Profiling details

I used Apple Instruments' CPU Profiler and inverted the call tree.

Launch

Before update

1.35 Gc  12,2 %	  u_strFromUTF8WithSub_76_godot
1.35 Gc  12,2 %	   icu_76_godot::UnicodeString::setToUTF8(icu_76_godot::StringPiece)
1.35 Gc  12,2 %	    icu_76_godot::RBBIDataWrapper::init(icu_76_godot::RBBIDataHeader const*, UErrorCode&)
1.35 Gc  12,2 %	     icu_76_godot::RBBIDataWrapper::RBBIDataWrapper(UDataMemory*, UErrorCode&)
1.35 Gc  12,2 %	      icu_76_godot::RBBIDataWrapper::RBBIDataWrapper(UDataMemory*, UErrorCode&)
1.35 Gc  12,2 %	       icu_76_godot::RuleBasedBreakIterator::RuleBasedBreakIterator(UDataMemory*, UErrorCode&)
1.35 Gc  12,2 %	        icu_76_godot::RuleBasedBreakIterator::RuleBasedBreakIterator(UDataMemory*, signed char, UErrorCode&)
1.35 Gc  12,2 %	         icu_76_godot::BreakIterator::buildInstance(icu_76_godot::Locale const&, char const*, UErrorCode&)
1.35 Gc  12,2 %	          icu_76_godot::BreakIterator::makeInstance(icu_76_godot::Locale const&, int, UErrorCode&)
1.35 Gc  12,2 %	           ubrk_open_76_godot
1.35 Gc  12,2 %	            TextServerAdvanced::_shaped_text_update_breaks(RID const&)
1.35 Gc  12,1 %	             TextServer::shaped_text_get_line_breaks(RID const&, double, long long, BitField<TextServer::LineBreakFlag>) const

After update

882.78 Mc   8,2 %	  u_strFromUTF8WithSub_76_godot
882.78 Mc   8,2 %	   icu_76_godot::UnicodeString::setToUTF8(icu_76_godot::StringPiece)
882.78 Mc   8,2 %	    icu_76_godot::RBBIDataWrapper::init(icu_76_godot::RBBIDataHeader const*, UErrorCode&)
882.78 Mc   8,2 %	     icu_76_godot::RBBIDataWrapper::RBBIDataWrapper(UDataMemory*, UErrorCode&)
882.78 Mc   8,2 %	      icu_76_godot::RBBIDataWrapper::RBBIDataWrapper(UDataMemory*, UErrorCode&)
882.78 Mc   8,2 %	       icu_76_godot::RuleBasedBreakIterator::RuleBasedBreakIterator(UDataMemory*, UErrorCode&)
882.78 Mc   8,2 %	        icu_76_godot::RuleBasedBreakIterator::RuleBasedBreakIterator(UDataMemory*, signed char, UErrorCode&)
882.78 Mc   8,2 %	         icu_76_godot::BreakIterator::buildInstance(icu_76_godot::Locale const&, char const*, UErrorCode&)
882.78 Mc   8,2 %	          icu_76_godot::BreakIterator::makeInstance(icu_76_godot::Locale const&, int, UErrorCode&)
882.78 Mc   8,2 %	           ubrk_open_76_godot
882.78 Mc   8,2 %	            TextServerAdvanced::ShapedTextDataAdvanced::_get_break_iterator_for_locale(String const&, UErrorCode*)
882.78 Mc   8,2 %	             TextServerAdvanced::_shaped_text_update_breaks(RID const&)
878.78 Mc   8,2 %	              TextServer::shaped_text_get_line_breaks(RID const&, double, long long, BitField<TextServer::LineBreakFlag>) const

Caveats

This change is likely to use some additional RAM. I don't have a sophisticated way to measure this, but my system reports a potential increase of about 70 MB:

  • 1,22 GB on master
  • 1,29 GB on this PR

It would be good if this could be measured more precisely to weigh it against the performance improvement.
Using a UBreakIterator pool (explained above) would likely further improve RAM efficiency.
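For comparison, a shared pool could look roughly like the sketch below. It is an assumption-laden illustration rather than the experiment mentioned above: `PooledIterator` and `BreakIteratorPool` are hypothetical names, and the pool carries its own mutex because it would no longer be protected by sd->mutex. The point is that the number of live iterators is bounded by peak concurrent use instead of by the number of shaped texts.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for an ICU UBreakIterator.
struct PooledIterator {
    std::string locale;
};

// Hypothetical shared pool: iterators are borrowed and handed back, so idle
// instances are reused across callers instead of being stored per object.
class BreakIteratorPool {
public:
    // Borrow an iterator for `locale`, creating one only if none is idle.
    std::unique_ptr<PooledIterator> acquire(const std::string &locale) {
        std::lock_guard<std::mutex> lock(mutex_);
        std::vector<std::unique_ptr<PooledIterator>> &idle = idle_[locale];
        if (!idle.empty()) {
            std::unique_ptr<PooledIterator> it = std::move(idle.back());
            idle.pop_back();
            return it;
        }
        ++created_;
        auto it = std::make_unique<PooledIterator>();
        it->locale = locale;
        return it;
    }

    // Hand an iterator back so a later acquire() can reuse it.
    void release(std::unique_ptr<PooledIterator> it) {
        assert(it != nullptr);
        std::lock_guard<std::mutex> lock(mutex_);
        std::vector<std::unique_ptr<PooledIterator>> &idle = idle_[it->locale];
        idle.push_back(std::move(it));
    }

    // Total number of iterators ever created by this pool.
    std::size_t created() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return created_;
    }

private:
    mutable std::mutex mutex_;
    std::size_t created_ = 0;
    std::unordered_map<std::string, std::vector<std::unique_ptr<PooledIterator>>> idle_;
};
```

An acquire, release, acquire sequence for the same locale reuses the first instance; a second instance is only created when two callers hold an iterator for that locale at the same time.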

@Calinou Calinou left a comment

Tested locally, it works as expected. Code looks good to me.

Benchmark

PC specifications
  • CPU: Intel Core i9-13900K
  • GPU: NVIDIA GeForce RTX 4090
  • RAM: 64 GB (2×32 GB DDR5-5800 C30)
  • SSD: Solidigm P44 Pro 2 TB
  • OS: Linux (Fedora 41)

Using an optimized editor build with LTO.

Startup + shutdown times of the editor on an empty project:

❯ hyperfine -iw1 -m50 "bin/godot.linuxbsd.editor.x86_64 /tmp/5/project.godot --quit" "bin/godot.linuxbsd.editor.x86_64.master /tmp/5/project.godot --quit"
Benchmark 1: bin/godot.linuxbsd.editor.x86_64 /tmp/5/project.godot --quit
  Time (mean ± σ):      4.528 s ±  0.435 s    [User: 2.589 s, System: 0.712 s]
  Range (min … max):    3.770 s …  5.322 s    50 runs
 
Benchmark 2: bin/godot.linuxbsd.editor.x86_64.master /tmp/5/project.godot --quit
  Time (mean ± σ):      4.620 s ±  0.340 s    [User: 2.597 s, System: 0.693 s]
  Range (min … max):    3.753 s …  4.825 s    50 runs

On average, this PR is about 0.1 s faster (4.528 s vs. 4.620 s mean) to start up and shut down the editor on an empty project.
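The arithmetic behind that figure can be checked with a small helper. This is only a sketch using the two hyperfine means reported above; the function names are made up for illustration. Note that the reported standard deviations (0.435 s and 0.340 s) are larger than the difference between the means, so the result is best read as indicative.

```cpp
#include <cassert>

// Absolute time saved, in seconds, given two mean run times.
double absolute_saving(double baseline_s, double candidate_s) {
    return baseline_s - candidate_s;
}

// Time saved relative to the baseline, in percent.
double relative_saving_percent(double baseline_s, double candidate_s) {
    assert(baseline_s > 0.0);
    return 100.0 * (baseline_s - candidate_s) / baseline_s;
}
```

With the master mean of 4.620 s as the baseline and the PR mean of 4.528 s as the candidate, the absolute saving is roughly 0.092 s, or just under 2% of the total startup plus shutdown time.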
