Unicode (read: Emoji) support #6

RLovelett · 2016-07-29T13:31:57Z

When auto-completing a line that contains an emoji (or any other multi-byte Unicode character), the suggestions stop working.

For instance, typing let 💯 = Foo( does bring up the suggestion box. However, the suggestions do not correspond to the current cursor position.

Overview

The problem is that FullTextDocument does not account for multi-byte Unicode characters when it computes offsets.

For example, say we have two Swift source documents:

ascii.swift

struct Foo {
    let bar: Int
}

let x = Foo()

unicode.swift

struct Foo {
    let bar: Int
}

let 💯 = Foo()

Both source documents have the same number of code points (46), but they have a different number of bytes when encoded as UTF-8: 46 for ascii.swift and 49 for unicode.swift, because 💯 takes 4 bytes.

Therefore, the byte offset of the closing parenthesis is 45 in ascii.swift but 48 in unicode.swift.
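To make the different counts concrete, here is a small sketch that measures both documents three ways: code points, UTF-8 bytes, and JavaScript string length (UTF-16 code units). The helper names (isSurrogatePairAt, countCodePoints, utf8ByteLength) are made up for illustration and are not part of any library. It also assumes the files have no trailing newline, which is what the counts of 46 and 49 imply.

```typescript
// Hypothetical helpers for illustration; not part of vscode-languageserver.

// True when the UTF-16 unit at `i` is a high surrogate followed by a low surrogate.
function isSurrogatePairAt(text: string, i: number): boolean {
    const hi = text.charCodeAt(i);
    if (hi < 0xd800 || hi > 0xdbff || i + 1 >= text.length) { return false; }
    const lo = text.charCodeAt(i + 1);
    return lo >= 0xdc00 && lo <= 0xdfff;
}

// Number of Unicode code points (a surrogate pair counts once).
function countCodePoints(text: string): number {
    let count = 0;
    for (let i = 0; i < text.length; i++) {
        if (isSurrogatePairAt(text, i)) { i++; }
        count++;
    }
    return count;
}

// Number of bytes the text occupies when encoded as UTF-8.
function utf8ByteLength(text: string): number {
    let bytes = 0;
    for (let i = 0; i < text.length; i++) {
        const unit = text.charCodeAt(i);
        if (isSurrogatePairAt(text, i)) { bytes += 4; i++; }
        else if (unit < 0x80) { bytes += 1; }
        else if (unit < 0x800) { bytes += 2; }
        else { bytes += 3; }
    }
    return bytes;
}

// ASSUMPTION: both files end without a trailing newline.
const asciiSource = 'struct Foo {\n    let bar: Int\n}\n\nlet x = Foo()';
const unicodeSource = 'struct Foo {\n    let bar: Int\n}\n\nlet 💯 = Foo()';

console.log(countCodePoints(asciiSource), utf8ByteLength(asciiSource), asciiSource.length);
console.log(countCodePoints(unicodeSource), utf8ByteLength(unicodeSource), unicodeSource.length);
```

Note that the JavaScript string length of the second document is 47, not 46, because 💯 lies outside the Basic Multilingual Plane and is stored as two UTF-16 code units.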

Example

The following uses the same two documents as above, ascii.swift and unicode.swift.

import * as fs from 'fs';
import { TextDocument, Position } from 'vscode-languageserver';

const ascii = '/path/to/ascii.swift';
const unicode = '/path/to/unicode.swift';

// Load the text documents
let asciiBuffer: Promise<Buffer> = new Promise((resolve, reject) => {
    fs.readFile(ascii, (err, data) => {
        if (err) { reject(err); }
        else { resolve(data); }
    });
});

let unicodeBuffer: Promise<Buffer> = new Promise((resolve, reject) => {
    fs.readFile(unicode, (err, data) => {
        if (err) { reject(err); }
        else { resolve(data); }
    });
});

// REMEMBER Position is zero indexed!
// https://github.com/Microsoft/vscode-languageserver-node/blob/a9f36d43a789e6fd9c16e5e50fc818eb35d097db/types/src/main.ts#L12
let position = Position.create(4, 12);

let asciiByteOffset = asciiBuffer.then((buffer) => TextDocument.create(ascii, 'swift', 1, buffer.toString('utf8')))
  .then((document) => document.offsetAt(position))
  .then(console.log); // logs 45 ✅

let unicodeByteOffset = unicodeBuffer.then((buffer) => TextDocument.create(unicode, 'swift', 1, buffer.toString('utf8')))
  .then((document) => document.offsetAt(position))
  .then(console.log); // logs 45 ❌

Resolution?

One idea that I've been working towards is creating a new class, UnicodeTextDocument, that conforms to the TextDocument interface.

It could serve as a drop-in replacement for TextDocument that transparently provides byte offsets.

Then you could do:

let asciiByteOffset = asciiBuffer.then((buffer) => new UnicodeTextDocument(ascii, 'swift', 1, buffer.toString('utf8')))
  .then((document) => document.offsetAt(position))
  .then(console.log); // logs 45 ✅

let unicodeByteOffset = unicodeBuffer.then((buffer) => new UnicodeTextDocument(unicode, 'swift', 1, buffer.toString('utf8')))
  .then((document) => document.offsetAt(position))
  .then(console.log); // logs 48 ✅
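For illustration, a minimal sketch of what such a class's offsetAt might look like. This is not the actual UnicodeTextDocument from this project: the class name UnicodeTextDocumentSketch and the SketchPosition type are hypothetical, only offsetAt is implemented rather than the full TextDocument interface, lines are assumed to end in "\n", and the character column is assumed to count code points (which is how the example above arrives at 48 for position 4:12).

```typescript
// Illustrative sketch only; names are hypothetical, not an existing API.
interface SketchPosition { line: number; character: number; }

class UnicodeTextDocumentSketch {
    readonly uri: string;
    readonly languageId: string;
    readonly version: number;
    private readonly content: string;

    constructor(uri: string, languageId: string, version: number, content: string) {
        this.uri = uri;
        this.languageId = languageId;
        this.version = version;
        this.content = content;
    }

    // Returns the UTF-8 byte offset of `position`.
    // ASSUMPTIONS: `character` counts code points; lines end with "\n".
    offsetAt(position: SketchPosition): number {
        const text = this.content;
        let bytes = 0;
        let line = 0;
        let column = 0;
        let i = 0;
        while (i < text.length && !(line === position.line && column === position.character)) {
            const unit = text.charCodeAt(i);
            const next = i + 1 < text.length ? text.charCodeAt(i + 1) : 0;
            const isPair = unit >= 0xd800 && unit <= 0xdbff && next >= 0xdc00 && next <= 0xdfff;
            // Clamp columns past the end of the target line to the line break.
            if (line === position.line && unit === 0x0a) { break; }
            bytes += isPair ? 4 : unit < 0x80 ? 1 : unit < 0x800 ? 2 : 3;
            i += isPair ? 2 : 1;
            if (unit === 0x0a) { line++; column = 0; }
            else if (line === position.line) { column++; }
        }
        return bytes;
    }
}

const asciiText = 'struct Foo {\n    let bar: Int\n}\n\nlet x = Foo()';
const unicodeText = 'struct Foo {\n    let bar: Int\n}\n\nlet 💯 = Foo()';
const closingParen: SketchPosition = { line: 4, character: 12 };

const asciiDoc = new UnicodeTextDocumentSketch('ascii.swift', 'swift', 1, asciiText);
const unicodeDoc = new UnicodeTextDocumentSketch('unicode.swift', 'swift', 1, unicodeText);
console.log(asciiDoc.offsetAt(closingParen));   // 45
console.log(unicodeDoc.offsetAt(closingParen)); // 48
```

The walk is linear in the document length on every call. A real implementation would probably cache per-line byte offsets, but that is left out here to keep the sketch short.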


RLovelett · 2016-08-01T18:14:36Z

Sadly, this is probably going to roll into the next (or a subsequent) release. It turns out this is going to be more involved than I hoped.

Basically, it boils down to the fact that TextDocument's offsetAt method does not return a byte-aware offset, which is what SourceKit requires. This means we need to write a byte-aware version of TextDocument and/or offsetAt.
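One possible route, sketched here under assumptions, is to leave offsetAt alone and convert its result afterwards. The sketch assumes the value returned by offsetAt is an index into the JavaScript string (UTF-16 code units) and counts how many UTF-8 bytes precede that index. The function name utf8OffsetFromStringIndex is hypothetical.

```typescript
// Hypothetical helper; the name is an assumption, not an existing API.
// Converts an index into a JavaScript string (UTF-16 code units) into the
// number of UTF-8 bytes that precede it.
function utf8OffsetFromStringIndex(text: string, index: number): number {
    const end = Math.min(Math.max(index, 0), text.length);
    let bytes = 0;
    for (let i = 0; i < end; i++) {
        const unit = text.charCodeAt(i);
        const next = i + 1 < text.length ? text.charCodeAt(i + 1) : 0;
        if (unit >= 0xd800 && unit <= 0xdbff && next >= 0xdc00 && next <= 0xdfff) {
            // Surrogate pair: one code point, 4 bytes. An index that falls
            // in the middle of the pair is rounded up to just past it.
            bytes += 4;
            i++;
        } else if (unit < 0x80) {
            bytes += 1;
        } else if (unit < 0x800) {
            bytes += 2;
        } else {
            bytes += 3;
        }
    }
    return bytes;
}

const swiftSource = 'struct Foo {\n    let bar: Int\n}\n\nlet 💯 = Foo()';

// String index 45 is "(" in this document, because 💯 occupies two UTF-16 units.
console.log(utf8OffsetFromStringIndex(swiftSource, 45)); // 47
// String index 46 is the closing parenthesis.
console.log(utf8OffsetFromStringIndex(swiftSource, 46)); // 48
```

Under Node, Buffer.byteLength(text.slice(0, index), 'utf8') should give the same result for indices that do not split a surrogate pair. Note that with UTF-16 columns the closing parenthesis in unicode.swift sits at character 13 rather than 12, so which column the editor actually sends matters for either approach.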

RLovelett · 2016-08-03T13:34:42Z

I'm going to try to list all the upstream bugs related to this in VS Code:

Cursor only works correctly on Basic Multilingual Plane microsoft/vscode#1273

RLovelett added the bug label Jul 29, 2016

RLovelett added this to the Improve Code Completion milestone Jul 29, 2016

RLovelett removed this from the Improve Code Completion milestone Aug 1, 2016

RLovelett mentioned this issue Aug 3, 2016

[WIP] Implement UnicodeTextDocument #16

Closed

2 tasks
