Merge pull request #1052 from giampaolo/1040-fix-unicode
#1040 fix unicode
giampaolo authored May 3, 2017
2 parents e6f5f49 + 04111d9 commit 91f26bb
Showing 19 changed files with 486 additions and 359 deletions.
17 changes: 17 additions & 0 deletions HISTORY.rst
@@ -15,6 +15,7 @@
Process.as_dict(): "attrs" and "ad_value". With this you can iterate over all
processes in one shot without needing to catch NoSuchProcess and do list/dict
comprehensions.
- 1040_: implemented full unicode support.

**Bug fixes**

@@ -29,15 +30,31 @@
properly handle unicode paths and may raise UnicodeDecodeError.
- 1033_: [OSX, FreeBSD] memory leak for net_connections() and
Process.connections() when retrieving UNIX sockets (kind='unix').
- 1040_: fixed many unicode related issues such as UnicodeDecodeError on
Python 3 + UNIX and invalid encoded data on Windows.
- 1046_: [Windows] disk_partitions() on Windows overrides user's SetErrorMode.
- 1047_: [Windows] Process username(): memory leak in case exception is thrown.
- 1048_: [Windows] users()'s host field reports an invalid IP address.

**Porting notes**

- 1039_: returned types consolidation:
  - Windows / Process.cpu_times(): fields #3 and #4 were int instead of float
  - Linux / FreeBSD: connections('unix'): raddr is now set to "" instead of
    None
  - OpenBSD: connections('unix'): laddr and raddr are now set to "" instead of
    None
- 1040_: all strings are encoded by using OS fs encoding.
- 1040_: the following Windows APIs returned unicode and now they return str:
  - Process.memory_maps().path
  - WindowsService.bin_path()
  - WindowsService.description()
  - WindowsService.display_name()
  - WindowsService.username()
- 1046_: [Windows] disk_partitions() on Windows overrides user's SetErrorMode.
- 1047_: [Windows] Process username(): memory leak in case exception is thrown.
- 1050_: [Windows] Process.memory_maps memory() leaks memory.

*2017-04-10*

38 changes: 38 additions & 0 deletions docs/index.rst
@@ -2239,6 +2239,44 @@ Constants
>>> if psutil.version_info >= (4, 5):
... pass

----

Unicode
=======

Starting from version 5.3.0 psutil
`fully supports unicode <https://github.com/giampaolo/psutil/issues/1040>`__.
The notes below apply to *any* method returning a string such as
:meth:`Process.exe` or :meth:`Process.cwd`, including non-filesystem related
methods such as :meth:`Process.username`:

* all strings are encoded using the OS filesystem encoding, which varies
  depending on the platform (e.g. UTF-8 on Linux, mbcs on Windows)
* no API call is supposed to crash with ``UnicodeDecodeError``
* instead, in case of badly encoded data returned by the OS, the following
  error handlers are used to replace the bad characters in the string:

  * Python 2: ``"replace"``
  * Python 3: ``"surrogateescape"`` on POSIX and ``"replace"`` on Windows

* on Python 2 all APIs return bytes (``str`` type), never ``unicode``
* on Python 2 you can go back to unicode by doing:

.. code-block:: python

    >>> unicode(p.exe(), sys.getfilesystemencoding(), errors="replace")

Example which filters processes with a funky name, working with both Python 2 and 3::

    # -*- coding: utf-8 -*-
    import psutil, sys

    PY3 = sys.version_info[0] == 3
    LOOKFOR = u"ƒőő"
    for proc in psutil.process_iter(attrs=['name']):
        name = proc.info['name']
        if not PY3:
            # Python 2 returns bytes: decode with the OS filesystem encoding.
            name = unicode(name, sys.getfilesystemencoding(), errors="replace")
        if LOOKFOR == name:
            print("process %s found" % proc)
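
The difference between the two error handlers named above can be seen with plain Python 3 and no psutil at all. This is only an illustrative sketch: the byte string is made up, and UTF-8 is hard-coded as an assumption, whereas psutil is described as using whatever the OS filesystem encoding is.

```python
# Bytes as an OS might hand them back: valid UTF-8 ("café") plus one invalid byte.
# The value is invented for illustration; UTF-8 is an assumed encoding.
raw = b"caf\xc3\xa9\xff"

# "replace" swaps the undecodable byte for U+FFFD; the original byte is lost.
replaced = raw.decode("utf-8", errors="replace")

# "surrogateescape" maps the undecodable byte to a lone surrogate (U+DCFF here),
# so encoding with the same handler restores the original bytes exactly.
escaped = raw.decode("utf-8", errors="surrogateescape")
roundtrip = escaped.encode("utf-8", errors="surrogateescape")

print(repr(replaced))
print(repr(escaped))
print(roundtrip == raw)
```

With ``"surrogateescape"`` a path obtained this way can be passed back to the OS unchanged, which is presumably why it is preferred on POSIX; ``"replace"`` is lossy but always yields a printable string.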

Recipes
=======

83 changes: 60 additions & 23 deletions psutil/_psutil_bsd.c
@@ -215,11 +215,7 @@ psutil_proc_oneshot_info(PyObject *self, PyObject *args) {
#elif defined(PSUTIL_OPENBSD) || defined(PSUTIL_NETBSD)
sprintf(str, "%s", kp.p_comm);
#endif
#if PY_MAJOR_VERSION >= 3
py_name = PyUnicode_DecodeFSDefault(str);
#else
py_name = Py_BuildValue("s", str);
#endif
if (! py_name) {
// Likely a decoding error. We don't want to fail the whole
// operation. The python module may retry with proc_name().
@@ -372,12 +368,7 @@ psutil_proc_name(PyObject *self, PyObject *args) {
#elif defined(PSUTIL_OPENBSD) || defined(PSUTIL_NETBSD)
sprintf(str, "%s", kp.p_comm);
#endif

#if PY_MAJOR_VERSION >= 3
return PyUnicode_DecodeFSDefault(str);
#else
return Py_BuildValue("s", str);
#endif
}


@@ -472,6 +463,7 @@ psutil_proc_open_files(PyObject *self, PyObject *args) {
struct kinfo_file *kif;
kinfo_proc kipp;
PyObject *py_tuple = NULL;
PyObject *py_path = NULL;
PyObject *py_retlist = PyList_New(0);

if (py_retlist == NULL)
@@ -507,12 +499,16 @@ psutil_proc_open_files(PyObject *self, PyObject *args) {
// XXX - it appears path is not exposed in the kinfo_file struct.
path = "";
#endif
py_path = PyUnicode_DecodeFSDefault(path);
if (! py_path)
goto error;
if (regular == 1) {
py_tuple = Py_BuildValue("(si)", path, fd);
py_tuple = Py_BuildValue("(Oi)", py_path, fd);
if (py_tuple == NULL)
goto error;
if (PyList_Append(py_retlist, py_tuple))
goto error;
Py_DECREF(py_path);
Py_DECREF(py_tuple);
}
}
@@ -546,6 +542,8 @@ psutil_disk_partitions(PyObject *self, PyObject *args) {
struct statfs *fs = NULL;
#endif
PyObject *py_retlist = PyList_New(0);
PyObject *py_dev = NULL;
PyObject *py_mountp = NULL;
PyObject *py_tuple = NULL;

if (py_retlist == NULL)
@@ -658,22 +656,32 @@ psutil_disk_partitions(PyObject *self, PyObject *args) {
if (flags & MNT_NODEVMTIME)
strlcat(opts, ",nodevmtime", sizeof(opts));
#endif
py_tuple = Py_BuildValue("(ssss)",
fs[i].f_mntfromname, // device
fs[i].f_mntonname, // mount point
py_dev = PyUnicode_DecodeFSDefault(fs[i].f_mntfromname);
if (! py_dev)
goto error;
py_mountp = PyUnicode_DecodeFSDefault(fs[i].f_mntonname);
if (! py_mountp)
goto error;
py_tuple = Py_BuildValue("(OOss)",
py_dev, // device
py_mountp, // mount point
fs[i].f_fstypename, // fs type
opts); // options
if (!py_tuple)
goto error;
if (PyList_Append(py_retlist, py_tuple))
goto error;
Py_DECREF(py_dev);
Py_DECREF(py_mountp);
Py_DECREF(py_tuple);
}

free(fs);
return py_retlist;

error:
Py_XDECREF(py_dev);
Py_XDECREF(py_mountp);
Py_XDECREF(py_tuple);
Py_DECREF(py_retlist);
if (fs != NULL)
@@ -783,6 +791,9 @@ psutil_net_io_counters(PyObject *self, PyObject *args) {
static PyObject *
psutil_users(PyObject *self, PyObject *args) {
PyObject *py_retlist = PyList_New(0);
PyObject *py_username = NULL;
PyObject *py_tty = NULL;
PyObject *py_hostname = NULL;
PyObject *py_tuple = NULL;

if (py_retlist == NULL)
@@ -801,12 +812,21 @@ psutil_users(PyObject *self, PyObject *args) {
while (fread(&ut, sizeof(ut), 1, fp) == 1) {
if (*ut.ut_name == '\0')
continue;
py_username = PyUnicode_DecodeFSDefault(ut.ut_name);
if (! py_username)
goto error;
py_tty = PyUnicode_DecodeFSDefault(ut.ut_line);
if (! py_tty)
goto error;
py_hostname = PyUnicode_DecodeFSDefault(ut.ut_host);
if (! py_hostname)
goto error;
py_tuple = Py_BuildValue(
"(sssfi)",
ut.ut_name, // username
ut.ut_line, // tty
ut.ut_host, // hostname
(float)ut.ut_time, // start time
"(OOOfi)",
py_username, // username
py_tty, // tty
py_hostname, // hostname
(float)ut.ut_time, // start time
#ifdef PSUTIL_OPENBSD
-1 // process id (set to None later)
#else
@@ -821,22 +841,33 @@ psutil_users(PyObject *self, PyObject *args) {
fclose(fp);
goto error;
}
Py_DECREF(py_username);
Py_DECREF(py_tty);
Py_DECREF(py_hostname);
Py_DECREF(py_tuple);
}

fclose(fp);
#else
struct utmpx *utx;

setutxent();
while ((utx = getutxent()) != NULL) {
if (utx->ut_type != USER_PROCESS)
continue;
py_username = PyUnicode_DecodeFSDefault(utx->ut_user);
if (! py_username)
goto error;
py_tty = PyUnicode_DecodeFSDefault(utx->ut_line);
if (! py_tty)
goto error;
py_hostname = PyUnicode_DecodeFSDefault(utx->ut_host);
if (! py_hostname)
goto error;
py_tuple = Py_BuildValue(
"(sssfi)",
utx->ut_user, // username
utx->ut_line, // tty
utx->ut_host, // hostname
"(OOOfi)",
py_username, // username
py_tty, // tty
py_hostname, // hostname
(float)utx->ut_tv.tv_sec, // start time
#ifdef PSUTIL_OPENBSD
-1 // process id (set to None later)
@@ -853,6 +884,9 @@ psutil_users(PyObject *self, PyObject *args) {
endutxent();
goto error;
}
Py_DECREF(py_username);
Py_DECREF(py_tty);
Py_DECREF(py_hostname);
Py_DECREF(py_tuple);
}

@@ -861,6 +895,9 @@ psutil_users(PyObject *self, PyObject *args) {
return py_retlist;

error:
Py_XDECREF(py_username);
Py_XDECREF(py_tty);
Py_XDECREF(py_hostname);
Py_XDECREF(py_tuple);
Py_DECREF(py_retlist);
return NULL;
19 changes: 19 additions & 0 deletions psutil/_psutil_common.c
@@ -34,3 +34,22 @@ AccessDenied(void) {
Py_XDECREF(exc);
return NULL;
}


/*
* Backport of unicode FS APIs from Python 3.
* On Python 2 we just return a plain byte string
* which is never supposed to raise decoding errors.
* See: https://github.com/giampaolo/psutil/issues/1040
*/
#if PY_MAJOR_VERSION < 3
PyObject *
PyUnicode_DecodeFSDefault(char *s) {
return PyString_FromString(s);
}

PyObject *
PyUnicode_DecodeFSDefaultAndSize(char *s, Py_ssize_t size) {
return PyString_FromStringAndSize(s, size);
}
#endif
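
For readers more at home in Python than C, the effect of this shim can be approximated as below. It is a rough sketch rather than psutil code: `decode_fs_default` is a hypothetical name, the UTF-8 default is an assumption standing in for the OS filesystem encoding, and the Python 3 branch mirrors the POSIX behaviour (``"surrogateescape"``) only.

```python
import sys

PY3 = sys.version_info[0] >= 3


def decode_fs_default(raw, encoding="utf-8"):
    """Hypothetical Python-level analogue of the C backport (not a psutil API)."""
    if PY3:
        # Python 3: decode to str; surrogateescape keeps undecodable bytes
        # recoverable instead of raising UnicodeDecodeError.
        return raw.decode(encoding, "surrogateescape")
    # Python 2: hand the byte string back untouched, so no decoding step
    # exists that could fail.
    return raw


name = decode_fs_default(b"f\xc3\xb6\xc3\xb6")
print(repr(name))
```

Giving both Python versions the same function name lets the call sites in the C extension drop their ``#if PY_MAJOR_VERSION >= 3`` branches, as the ``_psutil_bsd.c`` hunks show.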
4 changes: 4 additions & 0 deletions psutil/_psutil_common.h
@@ -8,3 +8,7 @@

PyObject* AccessDenied(void);
PyObject* NoSuchProcess(void);
#if PY_MAJOR_VERSION < 3
PyObject* PyUnicode_DecodeFSDefault(char *s);
PyObject* PyUnicode_DecodeFSDefaultAndSize(char *s, Py_ssize_t size);
#endif