Introduction

This example code exposes a failure in SWIG's default behavior under Python 3. By default, SWIG maps byte strings (char * or std::string) to Python strings by decoding the bytes as UTF-8. When that decoding fails, the caller has no way to recover the original bytes, whether to try a different codec or to treat the data as a raw byte sequence. Each project can override this behavior, but SWIG's default should be as helpful as possible.
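
The failure can be reproduced in pure Python, without SWIG. The sketch below is only an illustration: the sample bytes `b"h\xe9llo"` are an assumption inferred from the output shown under Testing (they are "héllo" encoded as Latin-1), and the strict decode stands in for what the generated wrapper does by default.

```python
# Assumed sample input: "héllo" encoded as Latin-1. The byte 0xe9 starts a
# multi-byte UTF-8 sequence, but the next byte ("l") is not a continuation byte.
raw = b"h\xe9llo"

try:
    raw.decode("utf-8")  # strict error handling is the default
    error = None
except UnicodeDecodeError as exc:
    error = exc  # keep the exception so it can be inspected afterwards

print(error)
```

Once the exception is raised, the caller never receives a Python object at all, so there is nothing from which the original bytes could be recovered.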

This example relates to SWIG issue #165, which changes the default decoding to use the surrogateescape error handler.

Usage

Compilation

Run make.

Testing

Enter the appropriate build directory, run python3, import the SWIG module, and call its test function. For example, with Python 3.4 on Linux:

$ cd build/lib.linux-x86_64-3.4/
$ python3
Python 3.4.0 (default, Apr 11 2014, 13:05:11) 
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import unicodetest
>>> unicodetest.test()

Without the patch for SWIG issue #165, the call raises:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 1: invalid continuation byte

With that patch, the call returns a string in which the undecodable byte is escaped as a lone surrogate:

'h\udce9llo'
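
A string decoded with surrogateescape still carries the original bytes, which is what makes this default more useful. The following is a minimal pure-Python sketch of the round trip, again assuming the underlying bytes are `b"h\xe9llo"` (Latin-1 "héllo"); it does not call the SWIG module itself.

```python
# Assumed original bytes behind the output shown above.
raw = b"h\xe9llo"

# Undecodable bytes become lone surrogates (0xe9 -> U+DCE9) instead of raising.
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the exact original bytes.
recovered = text.encode("utf-8", errors="surrogateescape")

# The caller is now free to apply whichever codec actually fits the data.
latin1 = recovered.decode("latin-1")

print(repr(text), recovered, latin1)
```

The same encode step works on the value returned by a patched wrapper, so code that needs raw bytes or a different codec can obtain them without changes to the SWIG interface file.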

hfalcic/swig-py3-bytes-failure

Folders and files

Latest commit

History

Repository files navigation

Introduction

Usage

Compilation

Testing

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages