-
Notifications
You must be signed in to change notification settings - Fork 112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
What to do about the zero terminator of a string in a Span? #273
Comments
@bricelam Have you folks at Microsoft developed any recommended guidelines for using Span with unmanaged libraries that deal in zero-terminated strings? |
@divega might know who to ask... |
I would start with @jkotas. |
For public .NET Core APIs that take or return
The internal .NET Core implementation has a few places that create and operate on Spans that are zero terminated. This is local internal implementation detail that does not leak out through public APIs. |
@jkotas That makes sense. For a .NET-only API. But what if you were passing a Span through P/Invoke to a C function that expects the zero terminator? Would you include that zero byte in the span length? |
P/Invoke methods cannot take Spans directly today. I think you are really asking whether you should include zero byte in the Span length in your internal implementation details that prepare string passed to PInvoke. It is up to you and what makes sense for the type of code that you are writing. For public APIs, I would recommend to be on the same plan as public .NET Core APIs. |
Okay. I admit this issue is arcane and my explanation of it above is tedious. Anyway, at some point I will eventually cross paths with somebody else who is weighing the pros and cons of this. Thanks for the replies. |
For others who might visit this issue, I want to add a bit more explanation for why "be on the same plan as public .NET Core APIs" doesn't seem helpful in this context. The underlying C function requires a zero terminated string. That's just the way it is. So let's say I expose a C# wrapper for that function which takes a Span, and to be consistent with .NET practices, the caller is not expected to include a zero terminator. What that means is that I have to make a copy of the data in order to append the zero terminator before passing it down to the unmanaged function. And this would defeat the whole point of using Span. |
Maybe you can write a |
@tmds I've been wondering about that possibility myself. |
A tidbit of good news noticed by sgjennings on Twitter: This writeup on the upcoming Utf8String type: contains this bit: "ensuring a null terminator (important for p/invoke scenarios)" which indicates an awareness of this issue. I was aware of Utf8String, but had not yet seen any indication that dealing with zero terminator cases was (or was not) being considered as part of the design. |
…simply encapsulates (and hides) a ReadOnlySpan of byte, with the additional property of knowing for certain that it includes a zero terminator. it also supports the fixed pattern. and it has a constructor that takes a string. change ISQLite3Provider.sqlite3_open() to accept one of these instead of ROS byte. #273
@tmds So yeah, as a first attempt, that idea was easier than I thought. See 17785ea So The notion here is that As mentioned in the commit message, this is an experimental idea. I'm just throwing some code around to see what works. |
…d of ROS byte. requires sz to have new ways of constructing, while continuing to ensure the zero terminator. requires sz.ToString(), called by extension method utf8_to_string. #273
…delegate to use sz instead of ROS byte. #273
After continuing the I'm starting to believe the best practice here is to NOT publicly use As an alternative, the This may in fact be what @jkotas meant when he said "in your internal implementation details that prepare string passed to PInvoke", but I initially didn't understand it because I had not yet implemented my own |
I agree that the type safe wrapper makes in your case since you are passing the value around. I have not thought about it, but you have figure it out. It looks pretty neat. |
cool!
Compared to the Span approach, afaik, Utf8String won't allow you to wrap native memory. Nor can you stackalloc the string.
In a callback the |
Yeah, Utf8String (someday) will be complementary, but occupies a different place in the world. I'm trying to make it possible to use SQLitePCLRaw in a situation where everything is UTF-8, with minimal conversion and copying. For now, if the enclosing app or library is built around System.String, it won't be able to take full advantage, but I'll be more ready for the day when Utf8String is available.
Yeah, the method I experimentally converted over was already returning Thanks. Cheers. |
…hich uses the length) or sz (which does not), since the underlying sqlite3 APIs allow that. #273
…vel, since the sqlite3 API allows either approach #273
…ion method instead of sz.FromString() #273
The use of Span has raised some interesting questions.
In the SQLite API, when a parameter is a string, it is usually just a pointer, with the (documented) expectation that the string ends with a zero terminator byte.
So, at a higher layer, if we use a
ReadOnlySpan<byte>
for that string, the Length or the span is ignored. By the time it gets to SQLite, it's just a pointer. Is that a signal that we shouldn't be using span?Suppose the string is "hello". That's 5 bytes of actual string plus one byte for the zero terminator. We could pass a span of length 5 or a span of length 6. It doesn't matter to SQLite, because it's just going to look for that zero byte anyway.
Heck, we could pass a span of length 1, and it still wouldn't matter. But I mention that just to illustrate the problem. Clearly, if we're going to use a span, we should have the length be correct.
But which is more correct? Should the Length include the zero terminator or not?
Including the zero byte in the length seems wrong because the zero is not part of the string data. If we called strlen() on the pointer, it would return a length which did not include the zero. If we passed this pointer to System.Text methods to convert it from UTF-8 to System.String, the length we provide should not include the zero byte. The length of the string "hello" is 5.
However, it also seems wrong to not include the zero byte in the length. The zero byte may not be part of the string, but it is a documented and required part of the block of bytes the function expects. And if we don't include the zero byte, then we are passing a span to a function which is going to walk beyond the end of that span's length, which seems wrong. The length of the zero-terminated block of memory that represents "hello" is 6.
At the moment, the solution I have chosen is to include the zero byte in the span length. And the provider code checks to make sure the last byte of the span is a zero.
But this is only half the story. We have a similar but slightly different set of problems for cases where SQLite is returning a string or passing one in a callback.
In this case, SQLite itself is responsible for putting that zero byte in place, and it does. But when we construct a
ReadOnlySpan<byte>
for that piece of memory, again, do we include the zero byte in the length or not?In this case, a big difference is that the code that receives that span is likely to use the Length and ignore the presence of any zero terminator For example, using System.Text functions to convert from UTF-8 to a string, the span Length must be correct and must not include the zero terminator.
If we did include the zero terminator in this upward-moving set of strings, then most cases that use the spans would need to subtract one from the Length before using it. And should it check to make sure the last byte actually is a zero before doing so?
So for the moment, the solution I have chosen is to NOT include the zero terminator when constructing a span for a string that is coming from SQLite.
Neither of these decisions seems correct. Each one merely seems to be the least incorrect option.
And then there is the inconsistency. I have two very similar questions, but I came up with two different answers. That too seems wrong. For example, as things stand right now, if you receive a string from SQLite as a span and then pass that span directly back to another SQLite function, it won't work.
The text was updated successfully, but these errors were encountered: