Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added StringBuilder class #10860

Merged
merged 1 commit into from
Sep 4, 2017

Conversation

karroffel
Copy link
Contributor

When doing large string concatenations the default push_back on the
String class can slow down things quite a bit. This is because it
has to constantly reallocate the memory and copy the contents. This
StringBuilder class delays the concatenation until the size of the
resulting string is known.

@karroffel karroffel added this to the 3.0 milestone Sep 1, 2017
@karroffel karroffel force-pushed the bob-the-string-builder branch 2 times, most recently from 816fd31 to 5722eb1 Compare September 1, 2017 13:59
@karroffel karroffel force-pushed the bob-the-string-builder branch 2 times, most recently from e7b4add to 02db945 Compare September 1, 2017 14:14
@karroffel karroffel requested a review from reduz September 1, 2017 14:15

class StringBuilder : public Reference {
GDCLASS(StringBuilder, Reference)
private:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

private is unnecessary here.

@karroffel karroffel force-pushed the bob-the-string-builder branch from 02db945 to 3cb06bd Compare September 1, 2017 14:46
@neikeq
Copy link
Contributor

neikeq commented Sep 1, 2017

Would be great if we could specify a custom bucket size. For example, for a bindings generator I know I will append a lot, so I would benefit from a bigger bucket.


#include "core/ustring.h"

#define BUCKET_SIZE 32
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better put this in an anonymous enum, no need to pollute the #define space with a generic macro name

};

uint32_t string_length = 0;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably too complex and both String and const char * are really small (both 8 bytes), and have no initialization/freeing cost if empty, so i would just do:


struct StringData {
   String string;
   cosnt char* cstring; //if not null, use this, otherwise use string.
}; // should be at most 16 bytes, which is tiny

Vector<StringData> strings; //resizes every powers of 2, so this is should already be more efficient and cache local than buckets


#define BUCKET_SIZE 32

class StringBuilder : public Reference {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Object is a pretty fatty class, which lots of code for initializing/deinitializing, using it would beat the purpose and it would just be faster to use this as a local class.

@karroffel karroffel force-pushed the bob-the-string-builder branch from 3cb06bd to bb7af10 Compare September 1, 2017 15:46
@Zylann
Copy link
Contributor

Zylann commented Sep 1, 2017

Is it possible to re-use the StringBuilder? (random question, I just know the C# one can by setting Length back to 0) Also is it planned to have it available in GDScript? (oh wait if it's not an Object then it can't...)

@karroffel
Copy link
Contributor Author

@Zylann currently it can't be reused. I could implement a reset method if the demand is there.

I had it be a Reference type and tested it in GDScript and all, so that worked, but @reduz said (rightfully so) that the Object properties make it pretty heavyweight which it doesn't need to be.
Maybe we can have a Reference wrapper so it can be used from Scripting languages, I wouldn't mind that.

@Zylann
Copy link
Contributor

Zylann commented Sep 1, 2017

The re-using feature was a bit related to inheriting Object though, but I understand it can be useless when used on the engine side


const char *s = c_strings[c_string_elem];

String converted_string = String::utf8(s, appended_strings[i]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should not assume that this is utf8. Godot simply does not have this datatype around in this format, as it's always converted when parsed. I would assume it a regular ascii C string. and copy char -> wchar_t with a for() loop. It will be much faster for existing use cases we intend to use this for.

@karroffel karroffel force-pushed the bob-the-string-builder branch from bb7af10 to 80ce82b Compare September 1, 2017 17:52
When doing large string concatenations the default push_back on the
String class can slow down things quite a bit. This is because it
has to constantly reallocate the memory and copy the contents. This
StringBuilder class delays the concatenation until the size of the
resulting string is known.
@karroffel karroffel force-pushed the bob-the-string-builder branch from 80ce82b to 95a0886 Compare September 1, 2017 17:52
@MednauN
Copy link
Contributor

MednauN commented Sep 1, 2017

Hello there, I came from #10844. My PR will likely depend on the StringBuffer here, so I kinda want to request a few optimizations.
Namely, the usage of short pre-allocated array of characters for building short strings. This will save us from memory allocations at all. And, FYI this is actually how most implementations of std::string work - they use internal buffer for string up to 16 symbols (clang can store up to 22 if i'm not mistaken), so it reduces number of memory allocations quite a big time.
Personally, I think StringBuilder should look kinda like this:

class StringBuilder {
	CharType short_buffer[SHORT_BUFFER_LEN];
	vector<CharType> buffer;
	uint32_t string_length;
...
}

So, it'll store everything in short_buffer while there is enough space and once length is more than SHORT_BUFFER_LEN it copies all data info buffer and uses it instead (or maybe not copies, but uses both buffers and appends them in as_string method). And if buffer will be resized with grow strategy about 1.5, it'll have very few memory re-allocations for most of the cases.
This optimization is important for that PR of mine because every memory allocation counts there.

Also, it would be nice to have to_int and to_double methods in StringBuilder, because you can use them with const CharType* without allocating a new string and that saves us from another redundant memory allocation.
Oh, and a method to append a single character would be nice too)

Well, I know this is quite a request) If something of this will be done, that's nice. If I got the "go ahead" to implement all these things in my PR, that would be nice too) The point that I really need these optimizations to use StringBuilder in my PR.

@karroffel
Copy link
Contributor Author

@MednauN I see, I made the StringBuilder with the shader compiler and the code that GDNative and mono/C# need to generate in mind, so it tries to minimize allocations until you need the final string. You want a StringBuffer, which is a bit different.

I can add one of those too if @reduz is not against it, but as I said, this was made with huge strings in mind, not constructing small ones.

@endragor
Copy link
Contributor

endragor commented Sep 4, 2017

@karroffel I don't see though how @MednauN's suggestions contradict or worsen the use case of constructing large strings, and it's better not to increase the number of entities when you can.

@karroffel
Copy link
Contributor Author

@endragor his implementation is basically a buffer which you can write to. This is not any better than just using String::push_back directly, but it has a small buffer to save for an initial allocation in case your constructed string is small. That also means that if you have filled the buffer and it needs to grow you might need to copy the whole contents into the newer larger buffer.

You don't win anything over String::push_back apart from the saved initial allocation.

My implementation saves pointers to the data. If you append more data then you might need to copy the pointers around, but you don't copy the data itself. So in essence:

  • his implementation adds a local buffer to save the initial allocation
  • my implementation delays the final allocation for as long as possible, but doesn't reduce small allocations (because the pointers are in heap allocated too).

That said, I could add such a small buffer, but it's not really the purpose of that data structure.

What this StringBuilder does is basically just [s0, s1, s2, ... sn].join(""), not more. It's to concatenate many strings together without huge temporary buffers that might need copying.

I'd rather add a StringBuffer that is used to construct strings from scratch, which is basically what he needs.

@karroffel karroffel merged commit 84c33cf into godotengine:master Sep 4, 2017
@karroffel karroffel deleted the bob-the-string-builder branch December 4, 2017 10:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants