JVM | Platform | Status |
---|---|---|
OpenJDK (Temurin) Current | Linux | |
OpenJDK (Temurin) LTS | Linux | |
OpenJDK (Temurin) Current | Windows | |
OpenJDK (Temurin) LTS | Windows | |

The `jeucreader` package provides an interface for reading Unicode codepoints one at a time.

- Unicode codepoint reader interface.
- High coverage test suite.
- Written in pure Java 17 with no dependencies.
- OSGi-ready.
- JPMS-ready.
- ISC license.

Java does not expose any interface for reading individual Unicode codepoints from an I/O stream. It does provide methods to, for example, read text into a `String` and then iterate over the codepoints of that `String`.
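
For comparison, the `String`-based workaround described above can be sketched with only the JDK. The class name `StringCodePoints` and the helper `readAllCodePoints` are hypothetical names chosen for illustration. Note that this approach has to buffer the entire input in memory before the first codepoint becomes available.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public final class StringCodePoints
{
  // Hypothetical helper: drains the reader into memory, then
  // converts the buffered text into an array of codepoints.
  public static int[] readAllCodePoints(final Reader reader)
    throws IOException
  {
    final StringBuilder text = new StringBuilder();
    final char[] buffer = new char[4096];
    int count;
    while ((count = reader.read(buffer)) != -1) {
      text.append(buffer, 0, count);
    }
    // CharSequence.codePoints() combines valid surrogate pairs.
    return text.codePoints().toArray();
  }

  public static void main(final String[] args)
    throws IOException
  {
    // "a", then U+1F600 encoded as a surrogate pair, then "b".
    try (Reader r = new StringReader("a\uD83D\uDE00b")) {
      for (final int cp : readAllCodePoints(r)) {
        System.out.printf("U+%04X%n", cp);
      }
    }
  }
}
```

The three UTF-16 code units in the middle of the input collapse into a single codepoint, so the loop prints three values rather than four.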

The `jeucreader` package attempts to provide this missing functionality.

Given a `java.io.Reader` `r`, instantiate a `UnicodeCharacterReaderType` and use it to read individual codepoints:

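    // r is any open java.io.Reader, obtained elsewhere.
    Reader r;
    try (var u = UnicodeCharacterReader.newReader(r)) {
      // Each call consumes and returns the next codepoint.
      int c0 = u.readCodePoint();
      int c1 = u.readCodePoint();
      int c2 = u.readCodePoint();
      ...
    }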

On consuming malformed text, the reader may raise subtypes of `IOException` such as `InvalidSurrogatePair`, `MissingLowSurrogate`, `OrphanLowSurrogate`, and others.
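
To illustrate the kinds of malformed input that lead to these errors, here is a minimal sketch of surrogate validation written against the JDK alone. It is not the `jeucreader` implementation: the class `SurrogateSketch` is hypothetical, it throws plain `IOException` rather than the package's own subtypes, and returning `-1` at end of stream is an assumption made for this example.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public final class SurrogateSketch
{
  // Hypothetical helper: returns the next codepoint, or -1 at end
  // of stream (an assumed convention, not taken from jeucreader).
  public static int readCodePoint(final Reader reader)
    throws IOException
  {
    final int first = reader.read();
    if (first == -1) {
      return -1;
    }
    final char high = (char) first;
    // A low surrogate must never appear on its own.
    if (Character.isLowSurrogate(high)) {
      throw new IOException("Orphan low surrogate");
    }
    // Anything that is not a surrogate is already a codepoint.
    if (!Character.isHighSurrogate(high)) {
      return first;
    }
    // A high surrogate must be followed by a low surrogate.
    final int second = reader.read();
    if (second == -1) {
      throw new IOException("Missing low surrogate at end of stream");
    }
    final char low = (char) second;
    if (!Character.isLowSurrogate(low)) {
      throw new IOException("High surrogate not followed by low surrogate");
    }
    return Character.toCodePoint(high, low);
  }

  public static void main(final String[] args)
    throws IOException
  {
    try (Reader ok = new StringReader("a\uD83D\uDE00")) {
      System.out.printf("U+%04X%n", readCodePoint(ok));
      System.out.printf("U+%04X%n", readCodePoint(ok));
    }
    try (Reader bad = new StringReader("\uDE00")) {
      readCodePoint(bad);
    } catch (final IOException e) {
      System.out.println("Rejected: " + e.getMessage());
    }
  }
}
```

Reporting each failure case as a distinct exception type, as the package does, lets callers tell a truncated stream apart from corrupted text; the sketch only distinguishes them by message.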