Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Introduce custom UTF-8 decoding pipeline.
WordPress relies on various extensions, regular expressions, and basic string operations when working with text potentially encoded as UTF-8. In this patch an efficient UTF-8 decoding pipeline is introduced which can remove these dependencies, normalize all decoding behaviors, and open up new kinds of processing opportunities. The decoder was taken from [Björn Höhrmann]. While it may be possible that other methods are more efficient, such as in the multi-byte extension, this decoder provides a streamable interface useful for more flexible kinds of processing: for example, whether or not to replace invalid byte sequences, zero-memory-overhead code point counting, and partially decoding strings. [Björn Höhrmann]: http://bjoern.hoehrmann.de/utf-8/decoder/dfa/
- Loading branch information