Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-Implementation of reverse string via CFString #2479

Merged
merged 4 commits into from
Apr 20, 2017

Conversation

msft-Jeyaram
Copy link
Contributor

@msft-Jeyaram msft-Jeyaram commented Apr 10, 2017

  • Implement _reverseString via CoreFoundation and _wcsrev
  • Remove _reverseString from Foundation (only CoreText uses it)
  • CTline _getTruncatedStringFromSourceLine should use CF*based reverse string
  • Remove redundant tests

fixes #2478

@DHowett-MSFT
Copy link

DHowett-MSFT commented Apr 10, 2017

nit: reword to be imperative mood #Pending

@msft-Jeyaram
Copy link
Contributor Author

msft-Jeyaram commented Apr 10, 2017

Getting some bench mark for old vs new #Resolved


CFMutableStringRef result = CFStringCreateMutable(kCFAllocatorDefault, 0);

while (index > 0) {
Copy link
Contributor

@rajsesh rajsesh Apr 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to suck when it comes to perf. #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might as well use buffer indexing, given it's somewhat "safe"


In reply to: 110780304 [](ancestors = 110780304)

Copy link

@DHowett-MSFT DHowett-MSFT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the use of unichars and individual character access, because it's safe for multi-byte encodings... but we should be cognizant of allocations!

CFIndex index = CFStringGetLength(string);

if (index < 2) {
return CFStringCreateCopy(kCFAllocatorDefault, string);
Copy link

@DHowett-MSFT DHowett-MSFT Apr 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could also be CFRetain #ByDesign

Copy link
Contributor Author

@msft-Jeyaram msft-Jeyaram Apr 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the input is a mutable string, retain might not be a good idea/ #ByDesign

Copy link

@DHowett-MSFT DHowett-MSFT Apr 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! #ByDesign

return CFStringCreateCopy(kCFAllocatorDefault, string);
}

CFMutableStringRef result = CFStringCreateMutable(kCFAllocatorDefault, 0);
Copy link

@DHowett-MSFT DHowett-MSFT Apr 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you create the new string with a capacity hint from the original string, you can avoid the inevitable reallocations #Resolved


CFMutableStringRef result = CFStringCreateMutable(kCFAllocatorDefault, 0);

while (index > 0) {
Copy link

@DHowett-MSFT DHowett-MSFT Apr 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it may be cheaper to create a unichar buffer and pass that off to CFStringCreateWithCharacters -- you get the added benefit of not creating a mutable string

It should be somewhat "safe" to use utf-16 codepoints or unichars; it's definitely safer than utf-8 #Resolved

Copy link
Contributor Author

@msft-Jeyaram msft-Jeyaram Apr 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's what i was going with first, it might be more sensible given that only CT uses it.
Hmm i might as well move it in there, but this kind of makes it more util based. #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

YEP utf-16 is safer than utf-8 :)


In reply to: 110780284 [](ancestors = 110780284)

@DHowett-MSFT
Copy link

DHowett-MSFT commented Apr 10, 2017

Also, I wonder how it works with composing diacritics. It might be reasonable to normalize the string first... #Resolved

#import <CFInternal.h>
#import "CFFoundationInternal.h"

CFStringRef _CFStringCreateReversedString(CFStringRef string) {
Copy link
Contributor

@rajsesh rajsesh Apr 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that there is exactly one place where this is used (and was used in the past), I don't like that we are doing this. Why not have the helper in CT and just use it there? #ByDesign

Copy link
Contributor Author

@msft-Jeyaram msft-Jeyaram Apr 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what my initial plan was, but since this seems very common and could be utilized by others, it would make a good addition. Also we already have some tests that can be tested independently. #Resolved

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets not do this. the _reverseString was added because some other code was using it. Given that there is only one instance where _reverseString was used, there is no need to go overboard here.


In reply to: 110781327 [](ancestors = 110781327)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, what is wrong with good old fashioned strrev variants?


In reply to: 110782273 [](ancestors = 110782273,110781327)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_strrev + UTF-16 encoding should do the trick then.


In reply to: 110782324 [](ancestors = 110782324,110782273,110781327)

Copy link
Contributor

@rajsesh rajsesh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets not do this at all.

@msft-Jeyaram
Copy link
Contributor Author

msft-Jeyaram commented Apr 10, 2017

@rajsesh-msft still need reverse in CFString backed by strrev, that is contained in CoreText.
Updating the CR #Resolved

@msft-Jeyaram msft-Jeyaram changed the title Re-Implementation of reverse string safely via CFString Re-Implementation of reverse string via CFString Apr 11, 2017
@@ -15,7 +15,10 @@
//******************************************************************************

#import <CoreText/CTUtilities.h>
#import <Starboard/SmartTypes.h>
Copy link
Contributor Author

@msft-Jeyaram msft-Jeyaram Apr 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed #Resolved

@@ -188,8 +187,9 @@ CTLineRef CTLineCreateTruncatedLine(CTLineRef sourceLine, double width, CTLineTr
CFDictionaryRef attribs = CTRunGetAttributes(currentRun);

if (truncationType == kCTLineTruncationStart) {
NSString* reverse = [runString _reverseString];
NSAttributedString* string = [[NSAttributedString alloc] initWithString:reverse attributes:(NSDictionary*)attribs];
auto reverse = woc::MakeStrongCF<CFStringRef>(_CTCreateReversedString(static_cast<CFStringRef>(runString)));
Copy link
Contributor

@ms-jihua ms-jihua Apr 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if there's just the one usage of it, can it be a file-scope function instead? #Resolved

}

// Returns the number (in terms of UTF-16 code pairs) of Unicode characters in a string.
CFIndex length = CFStringGetLength(string);
Copy link
Contributor Author

@msft-Jeyaram msft-Jeyaram Apr 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using CFStringGetBytes to be safer
#Resolved

@msft-Jeyaram
Copy link
Contributor Author

@DHowett-MSFT @ms-jihua please take a look

#import "CoreTextInternal.h"
#import "CGContextInternal.h"
#import "DWriteWrapper_CoreText.h"
#import <CoreText/CTTypesetter.h>

#import <algorithm>
#import <numeric>
#import <wchar.h>
Copy link

@DHowett-MSFT DHowett-MSFT Apr 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: prefer <cwchar> #Resolved

return CFStringCreateCopy(kCFAllocatorDefault, string);
}

std::vector<UniChar> characters((usedBufLen / (sizeof(UniChar))) + 1, '\0');
Copy link

@DHowett-MSFT DHowett-MSFT Apr 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you use unique_iw instead of std::vector, you can use CFStringCreateWithBytesNoCopy and detach it instead of copying the buffer. #Resolved

NSAttributedString* string = [[NSAttributedString alloc] initWithString:reverse attributes:(NSDictionary*)attribs];
auto reverse = woc::MakeStrongCF<CFStringRef>(_CTCreateReversedString(static_cast<CFStringRef>(runString)));
NSAttributedString* string =
[[NSAttributedString alloc] initWithString:static_cast<NSString*>(reverse.get()) attributes:(NSDictionary*)attribs];
Copy link

@DHowett-MSFT DHowett-MSFT Apr 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

??? can't we use CFAttributedStringCreate? #ByDesign

Copy link
Contributor Author

@msft-Jeyaram msft-Jeyaram Apr 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already have changes for all NSAttributed, this is strictly for the reverse. #ByDesign

Copy link

@DHowett-MSFT DHowett-MSFT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I chose the wrong option.


CFIndex length = CFStringGetLength(string);
CFIndex usedBufLen;
CFStringGetBytes(string, CFRangeMake(0, length), kCFStringEncodingUTF16, 0, false, nullptr, length, &usedBufLen);
Copy link
Contributor

@ms-jihua ms-jihua Apr 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems kind of expensive for this purpose (unless CFStringGetBytes has a fastpath here?)

I wonder if CFStringGetMaximumSizeForEncoding() would be faster/would still work? #ByDesign

Copy link
Contributor Author

@msft-Jeyaram msft-Jeyaram Apr 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it does take a faster way than if you provide a buffer.
Further improvements can be addressed by a bug to improve core text. The CR is mainly to remove Foundation dependency and what we have now is better than the earlier Foundation/copy based implementation.
#ByDesign

#import <vector>

static CFStringRef __CTCreateReversedString(CFStringRef string) {
Copy link
Contributor

@aballway aballway Apr 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how is this change being tested? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UnitTests


In reply to: 111007038 [](ancestors = 111007038)


CFIndex bufLen = usedBufLen + 1 * sizeof(UniChar);
woc::unique_iw<UniChar> characters(static_cast<UniChar*>(IwMalloc(bufLen)));
memset(characters.get(), '\0', bufLen);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

c'mon, you only need to set the last character in the buffer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤣 that was coming 😄

bufLen,
kCFStringEncodingUTF16,
false,
nullptr);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you will need to provide an allocator to delete this string; look at the implementation of initWithBytesNoCopy:freeWhenDone: for NSString, expecting freeWhenDone=YES

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already passing in the nullptr for deallocation.
Switched to use characters, which does it for you.

@msft-Jeyaram
Copy link
Contributor Author

@DHowett-MSFT please take a look

- Implemented via _wcsrev
- Reduced string copies
- Implemented using CoreFoundation
- Removed usage of Foundation associated with reverse string.

Fixes microsoft#2478
CFIndex usedBufLen;
CFStringGetBytes(string, CFRangeMake(0, length), kCFStringEncodingUTF16, 0, false, nullptr, length, &usedBufLen);

if (length < 2) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: move this above CFStringGetBytes call

@@ -1,6 +1,6 @@
//******************************************************************************
//
// Copyright (c) 2015 Microsoft Corporation. All rights reserved.
// Copyright (c) Microsoft. All rights reserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These clean-ups, though nice, seem unrelated to this PR. Just revert these files?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

NSString _reverseString should be implemented via CFString
6 participants