-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement #scan_integer to efficiently parse Integer #114
Conversation
Arf, forgot this gem had a Java version. Looking at it. |
Another thing that in unclear is whether we assume ASCII compatible encoding or not. |
>> "1234".encode(Encoding::UTF_32BE).to_i
(irb):2:in `to_i': ASCII incompatible encoding: UTF-32BE (Encoding::CompatibilityError) So I guess we should just do the same here. |
1d5ebec
to
50ed873
Compare
PR updated with the Java implementation, and now raising when called with encoding that's not ASCII compatible. |
8c63de8
to
f8b7b02
Compare
Hum, TruffleRuby fails with
But I don't get why, doesn't it uses the C or Java implementation? I don't see any third implementation (cc @eregon ) |
f8b7b02
to
2f5a44f
Compare
Apparently, the Truffle implement is in Truffle itself, so I just skip those test like many others. |
TruffleRuby support isn't included in this repository. We can just ignore TruffleRuby. |
|
||
ptr = CURPTR(p); | ||
|
||
if (ptr[len] == '-' || ptr[len] == '+') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is ptr[len]
safe when S_RESTLEN()
is 0
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so, because (for now?) Ruby strings are guaranteed to be NULL terminated. So if we're EOF, ptr[len]
is \0
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that a String
created by rb_str_new_static()
may not be terminated by \0
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically you can but the documentation clearly states:
Identical to rb_str_new(), except it takes a C string literal.
So it assume it's a literal, hence is NUL terminated.
I can add the checks if you feel strongly about it, but it's a lot of busy code for something I think isn't supposed to happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the extra checks in #115 noetheless.
buffer = ALLOCV_N(char, buffer_v, len + 1); | ||
|
||
MEMCPY(buffer, CURPTR(p), char, len); | ||
buffer[len] = '\0'; | ||
integer = rb_cstr2inum(buffer, 10); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, we want to avoid this copy...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, Ruby doesn't provide an API that doesn't require a NULL terminated string. But if you want I can implement a fast path like I did in ruby/json: ruby/json@3a4dc9e
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. We can use rb_cstr2inum()
for now. Could you propose the fast path to Ruby itself? It may be useful for extensions.
Fix: ruby#113 This allows to directly parse an Integer from a String without needing to first allocate a sub string. Notes: The implementation is limited by design, it's meant as a first step, only the most straightforward, based 10 integers are supported.
2f5a44f
to
b41d45a
Compare
ByteList bytes = str.getByteList(); | ||
int curr = this.curr; | ||
|
||
int bite = bytes.get(curr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
byte
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for asking that, I now re-learned that in Java, there are certain reserved words: https://en.wikipedia.org/wiki/List_of_Java_keywords and byte
is one of them.
EDIT: Oh,
Of these 68 keywords, 17 of them are only contextually reserved, and can sometimes be used as an identifier, unlike standard reserved words
(Ah, right, that list of contextually-reserved didn't include byte
.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
!!!
Closing in favor of #115 for logistical reasons. |
Fix: #113
This allows to directly parse an Integer from a String without needing to first allocate a sub string.
Notes:
The implementation is limited by design, it's meant as a first step, only the most straightforward, based 10 integers are supported.
cc @kou