Python Tips/Shortcuts

Tips and important things that I've learned for the Python programming language.

Cheat Sheets

Online Python Shells/IDEs

Shells

IDEs

Forums/Social Media

Tutorials and Learning

Free Books

Modules and Exports

The __all__ directive

  • Defines the public interface of the module for wildcard imports. In other words, it provides the explicit list of the names (functions, classes, etc.) that are exported when another Python module runs from module import *. Note that __all__ does not make anything truly "private": names that are not listed can still be reached with an explicit import (from module import name) or as attributes (module.name). It is a convention that documents the intended public API.
  • For example, if you have a package named module, with __init__.py containing the following:
# __init__.py

__all__ = ['foo', 'Bar']

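Then, when module is used by your program via a wildcard import:

from module import *

This will import only the symbols foo and Bar (presumably a class, since it is capitalized) into the program. A plain import module is not affected by __all__; every name remains available as module.name. (Wildcard imports are generally discouraged in favor of explicit imports, such as from module import foo, Bar.)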

  • Should be placed at the top level of the module file itself or, for a package, in its __init__.py file.
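To see this behavior without creating files on disk, here is a minimal sketch that builds a throwaway module in memory. The module name demo_module and the helper function are hypothetical and exist only for this illustration.

```python
import sys
import types

# Build a throwaway module in memory; "demo_module" is a hypothetical name.
demo = types.ModuleType("demo_module")
exec(
    "__all__ = ['foo', 'Bar']\n"
    "def foo(): return 'foo'\n"
    "class Bar: pass\n"
    "def helper(): return 'not exported'\n",
    demo.__dict__,
)
sys.modules["demo_module"] = demo

# A wildcard import only copies the names listed in __all__.
namespace = {}
exec("from demo_module import *", namespace)
exported = sorted(name for name in namespace if not name.startswith("__"))
print(exported)  # ['Bar', 'foo']

# __all__ does not make helper private: an explicit import still works.
from demo_module import helper
print(helper())  # not exported
```
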

Reference1
Reference2

Virtual Environments and Dependency Management

Dependency management in Python is a key aspect of successful development, just as with any modern programming language. Python has a somewhat notorious reputation when it comes to dependency management, but things are markedly better with Python 3. Python's solution to dependency management is the virtual environment, often abbreviated venv. Virtual environments provide an isolated run-time environment (a self-contained directory tree) containing a Python installation plus additional packages, at the specific versions required by the given application.

The Python community differs on whether virtual environments should be contained within the project directory for an application or separate from (outside of) it. My preference is to keep the venv in the project directory in a hidden directory named .venv. This helps to avoid confusion about which virtual environment directories correspond to which projects, especially if you rename your project directories.

The basic workflow for virtual environments is:

  1. Create
  2. Activate
  3. Use
  4. Deactivate

Let's look at each of these phases in turn.

Create

To create a virtual environment, in your shell (command prompt), navigate to the desired directory for it and run:

python3 -m venv .venv

Substitute whatever name you prefer for the virtual environment for .venv above. (Again, I use .venv so that it will be a hidden directory under Linux and the name shows that it is for the virtual environment, but you can use whatever name you like, such as the project name itself.)
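If you ever need to script this step, the standard library venv module can also be called from Python. The following is a minimal sketch; the helper name create_venv is my own, and it creates the environment in a temporary directory so that it can be run safely anywhere.

```python
import tempfile
import venv
from pathlib import Path


def create_venv(project_dir, name=".venv"):
    """Create a virtual environment called `name` inside `project_dir`."""
    env_dir = Path(project_dir) / name
    # with_pip=False keeps this sketch fast; use with_pip=True to bootstrap pip.
    venv.create(env_dir, with_pip=False)
    return env_dir


# Demonstrate in a temporary directory that is cleaned up afterwards.
with tempfile.TemporaryDirectory() as tmp:
    env_dir = create_venv(tmp)
    # Every virtual environment contains a pyvenv.cfg file at its root.
    created = (env_dir / "pyvenv.cfg").is_file()
    print(created)
```
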

Activate

Once the virtual environment has been created, it has not yet been activated, meaning that the system Python (or perhaps even another virtual environment!) is still active. To confirm this, on Linux, run which python3 and it is likely to return /usr/bin/python3. We must activate our new virtual environment. The process is slightly different between Linux/MacOS and Windows. On Windows, run:

.venv\Scripts\activate.bat

On Linux/MacOS, run:

source .venv/bin/activate

Again, as above, substitute the name of your virtual environment for .venv in these commands.

Activating the environment will update your command prompt by prepending the name of the virtual environment in parentheses to your original prompt. Likewise, it sets several other environment variables to configure the shell to use the virtual environment.

As above, run which python3 to determine the path of the Python executable for the virtual environment. You should see it in a directory under the virtual environment now! You can further confirm this by opening the Python shell (REPL) and checking the sys.path parameter:

(.venv) $ python3
Python 3.8.5 (default, Jul 28 2020, 12:59:40) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.path
['', '/usr/lib/python38.zip', '/usr/lib/python3.8',  '/home/tim/.venv/lib/python3.8/site-packages']
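Another quick check from inside Python, shown here as a minimal sketch: in an active virtual environment, sys.prefix points at the environment, while sys.base_prefix points at the Python installation it was created from. The helper name in_virtualenv is just an illustrative choice.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a virtual environment."""
    # In a venv, sys.prefix is the environment directory and sys.base_prefix
    # is the underlying installation; outside a venv, the two are equal.
    return sys.prefix != sys.base_prefix


print("Virtual environment active:", in_virtualenv())
print("sys.prefix:", sys.prefix)
```
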

Deactivate

Before we look at using our virtual environment, let's discuss how to deactivate it. When we finish with a specific virtual environment, such as when we want to switch to another project, we want to make sure that we deactivate the virtual environment. This is necessary to avoid accidentally installing a package (or the wrong version of a package!) in that environment.

To deactivate our venv, we simply "undo" the activate process from above. In Windows, run:

.venv\Scripts\deactivate.bat

In Linux/MacOS, from any directory run:

deactivate

As with activating the virtual environment, the command prompt will change to remove the reference to the virtual environment name and the other settings will likewise be removed, setting things back to the system Python environment.

Using Your Virtual Environment

Once you've created and activated the virtual environment, you are ready to use it for your development work.

Typically, the first thing that you want to do is ensure that pip is up to date:

python3 -m pip install --upgrade pip

Now that you have the latest version of pip, you can install the other packages that you need, including any version-specific instances. For example, let's say that you need to install version 2.24 of the requests HTTP library, due to a known issue with the later versions. In this case, you would run:

python3 -m pip install -v 'requests==2.24.0'

Since this is in the virtual environment, it does not interfere with any other projects that depend on requests which require the newer versions of the package.

Finally, you can use pip to create a requirements.txt file for your project with the version-specific instances of the packages, so that you can reproduce the same configuration in your production environment:

$ python3 -m pip freeze > requirements.txt
$ more requirements.txt
certifi==2020.12.5
chardet==3.0.4
idna==2.10
pkg-resources==0.0.0
requests==2.24.0
urllib3==1.25.11
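To get a similar listing from within Python, the standard library importlib.metadata module (Python 3.8+) can enumerate installed distributions. This is only a rough sketch, not how pip implements freeze; the function name installed_requirements is hypothetical, and the output may differ from pip freeze (for example, with editable installs).

```python
from importlib import metadata


def installed_requirements():
    """Return 'name==version' lines for the distributions visible to this interpreter."""
    pins = set()
    for dist in metadata.distributions():
        meta = dist.metadata
        # Skip broken installs that lack a name or version.
        name = meta.get("Name") if meta is not None else None
        version = meta.get("Version") if meta is not None else None
        if name and version:
            pins.add(f"{name}=={version}")
    # Sort case-insensitively, similar to a requirements.txt listing.
    return sorted(pins, key=str.lower)


for line in installed_requirements():
    print(line)
```
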

Tools for Managing Virtual Environments and Dependencies

While the venv module from the Python standard library is the de facto tool for creating and managing virtual environments, the Python community has many options that you may wish to evaluate for your particular situation. For a more comprehensive overview of these tools, see this article.

  • virtualenv - This is the "granddaddy" of virtual environment tools and is actually the forerunner of the venv standard library module. Nevertheless, it has some handy extensions, such as "seeders" that allow you to configure all of your virtual environments with a set of standard packages. If you decide to use virtualenv, be sure to check out virtualenvwrapper, as well, since it provides a nice convenience wrapper around it to simplify use.
  • Poetry - Poetry takes a declarative approach to dependency management. You specify your dependencies in a configuration file (pyproject.toml) or via the Poetry command-line tool (poetry add package). Poetry takes care of setting up your virtual environment, installing the packages, activating/deactivating the virtual environments and so forth. You just interact with Poetry itself.
  • tox - At first glance, tox seems to be a bit more than just a virtual environment tool and that is, in fact, true. It is intended to be used as part of a CI/CD pipeline, specifically for running tests (commonly with pytest) against multiple Python versions. I'm including it here, since it's a widely used tool (you've probably seen the ubiquitous tox.ini file in various GitHub or GitLab repositories). Moreover, tox provides a good framework for gently transitioning from virtual environments to a more rigorous CI/CD environment.
  • Pyflow - Pyflow is a new tool (built in Rust no less!) that also uses pyproject.toml configuration. However, its "superpower" is that you can point it to a Python script itself that contains a __requires__ directive with a list of package names and it will install those packages as dependencies and build out the appropriate pyproject.toml configuration.
  • pipenv - Pipenv was developed by Kenneth Reitz of requests fame, so it's simple, solid and well-designed. It's designed to resolve some of the drawbacks of version pinning with requirements.txt by using hash-based version management in its Pipfile and Pipfile.lock configuration files, similar to other dependency management tools like npm for Node.js and composer for PHP. As explained here, Pipfile manages the logical dependencies, meaning the explicit dependencies that your code has, such as the requests library in the example above. Pipfile.lock acts as a dependency manifest for the transitive dependencies resulting from your explicit logical dependencies, such as urllib3 and certifi in the example above.
  • pyenv - Pyenv is not actually a virtual environment management tool per se, but instead is a Python version management tool. It allows you to install multiple versions of the Python compiler/run-time and switch between them quickly and easily. Each time you add a new Python version, pyenv downloads the source code for that version and compiles it from scratch, so you also have flexibility with optimizations and so forth with the Python installation, as well.

Generate random strings in Python

A common task in Python programming is generating random strings, such as passwords or dictionary keys. Here are some simple techniques for generating random strings of most any length using the Python random and string modules.

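First, you need to determine the population of characters to use for your random strings. Good choices include constants from the string module, such as:

  • string.ascii_lowercase
  • string.ascii_uppercase
  • string.ascii_letters
  • string.digits
  • string.punctuation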

Several of the punctuation characters don't make great choices for such things as passwords, so perhaps you want to define your own subset like:

SPECIAL_CHARS = '!#$%^&+-_=/@~'

You can then use any combination of these by concatenating them together:

RAND_STR_CHARS = "".join([string.ascii_uppercase, string.digits, SPECIAL_CHARS])

If you want some of the characters to be more likely to be selected, just include them more than once in your definition of RAND_STR_CHARS.

Use lambda function for simple random string generator

Now, you can create a simple lambda function that allows you to pass in the length of the string to generate.

import random
import string

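SPECIAL_CHARS = '!#$%^&+-_=/@~'
# str.join() takes a single iterable, so the pieces are passed as a list
RAND_STR_CHARS = "".join([string.ascii_uppercase, string.digits, SPECIAL_CHARS])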

# lambda function to generate random string of length _n_
rand_str = lambda n: "".join(random.SystemRandom().choice(RAND_STR_CHARS) for _ in range(n))

# Generate a random string of 10 characters
rand_str_10 = rand_str(10)
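As a variation on the same idea, the standard library secrets module (Python 3.6+) is intended for security-sensitive values such as passwords and also draws on the operating system's random source. Here is a minimal sketch using a regular function instead of a lambda; the character set is the same one assumed above.

```python
import secrets
import string

SPECIAL_CHARS = "!#$%^&+-_=/@~"
RAND_STR_CHARS = string.ascii_uppercase + string.digits + SPECIAL_CHARS


def rand_str(n: int) -> str:
    """Return a random string of length n drawn from RAND_STR_CHARS."""
    # secrets.choice() picks one element using a cryptographically strong source.
    return "".join(secrets.choice(RAND_STR_CHARS) for _ in range(n))


# Generate a random string of 10 characters
rand_str_10 = rand_str(10)
print(rand_str_10)
```
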

Use UUID module

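Another quick way to generate unique random strings is to use the Python uuid module, which implements the UUID formats defined in RFC 4122. In particular, you can use the uuid4() function, which returns a randomly generated UUID object. Duplicates are not strictly impossible, but they are so improbable that the values can be treated as unique in practice. Converting the result with str() gives a 36-character value, including 4 hyphens, while its hex attribute gives just the 32 lowercase hexadecimal digits (0123456789abcdef).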
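A short sketch to show the two string forms of a UUID side by side (the variable names are just for illustration):

```python
import uuid

value = uuid.uuid4()       # a UUID object, not a string
text = str(value)          # hyphenated form
hex_digits = value.hex     # hexadecimal digits only

print(len(text), text.count("-"))             # 36 4
print(len(hex_digits))                        # 32
print(hex_digits == text.replace("-", ""))    # True
```
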

If you don't need all 32 hexadecimal digits, you can take a slice from the result, although remember that it may not be unique. In the technique below, we randomize the portion of the result returned in the slice to help reduce the likelihood of duplication.

import random
import uuid

# function to generate "random" string _n_ characters long (1 to 32 characters)
def uuid_rand_str(n):
    n = int(n)
    if not 1 <= n <= 32:
        raise ValueError("'uuid_rand_str' only supports strings from 1 to 32 characters in length.")
    # Random start index chosen so that the n-character slice fits within the 32 hex digits
    slice_start = random.SystemRandom().randint(0, 32 - n)
    return uuid.uuid4().hex[slice_start:slice_start + n]
    
# Generate a "random" string of 22 characters
rand_str_22 = uuid_rand_str(22)

Of course, you aren't really limited to 32 characters with this method. You can simply use uuid4() multiple times to get longer strings. For example:

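import random
import uuid

def uuid_rand_str(n):
    n = int(n)
    # Number of uuid4() values needed to cover n characters (n / 32, rounded up)
    base = (n + 31) // 32
    # Concatenate the 32 hex digits of each UUID into one long string
    uuid_str = "".join(uuid.uuid4().hex for _ in range(base))
    # Random start index chosen so that the n-character slice fits within the string
    slice_start = random.SystemRandom().randint(0, (base * 32) - n)
    return uuid_str[slice_start:slice_start + n]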
    
# Generate a "random" string of 100 characters
rand_str_100 = uuid_rand_str(100)

Install Python package to global environment with pip

In most cases, you should use Python virtual environments for your work. However, in some situations, you may need to install a package globally, such as for linters (code-formatting checkers) like pylint or flake8 or the IPython shell.

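If pip is configured to require a virtual environment (for example, via PIP_REQUIRE_VIRTUALENV=true or the require-virtualenv option in the pip configuration file), then running pip against the global environment will produce an error similar to the following.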

$ python3 -m pip install flake8
ERROR: Could not find an activated virtualenv (required).

On Windows, this error will occur even if you are running as Administrator.

To correct the problem, you need to temporarily disable the virtual environment requirement with the PIP_REQUIRE_VIRTUALENV environment variable. In Linux/Unix, at the shell prompt:

$ export PIP_REQUIRE_VIRTUALENV=false

And on Windows, at the Command Prompt:

C:\> set PIP_REQUIRE_VIRTUALENV=false

Or, at the PowerShell prompt:

PS C:\> $env:PIP_REQUIRE_VIRTUALENV = "false"

In the same shell prompt, run your pip install command as normal.
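If you prefer not to change your shell session at all, the override can be applied to a single pip invocation from a script. This is a minimal sketch; the two function names are hypothetical, and nothing is installed unless you call pip_install_globally yourself.

```python
import os
import subprocess
import sys


def pip_env_without_venv_requirement(base_env=None):
    """Return a copy of the environment with the virtualenv requirement disabled."""
    env = dict(os.environ if base_env is None else base_env)
    env["PIP_REQUIRE_VIRTUALENV"] = "false"
    return env


def pip_install_globally(package):
    """Run 'python -m pip install <package>' with the override applied to this call only."""
    cmd = [sys.executable, "-m", "pip", "install", package]
    return subprocess.run(cmd, env=pip_env_without_venv_requirement(), check=False)


# The override lives only in the copied environment, not in os.environ itself.
demo_env = pip_env_without_venv_requirement({"PIP_REQUIRE_VIRTUALENV": "true"})
print(demo_env["PIP_REQUIRE_VIRTUALENV"])  # false
```
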

Difference between append and extend for lists

In most cases, for list data, we use the Python append() method to add new items to an existing list. However, this has the (usually) unwanted behavior that when the item added is itself a list, the list is added as a sub-list. To add the elements of the list to the end of the existing list instead, we can use the extend() method.

One quirk to watch for is when using extend() with a single string: because a string is itself an iterable of characters, Python will add each of the characters as a separate element of the list.

>>> a = [1, 2, 3, 4]
>>> b = ["apple", "banana", "cherry"]
>>> c = [3.14159, 1.4142, 2.71828]
>>> a.append(10)
>>> a
[1, 2, 3, 4, 10]
>>> a.append([123, 456])
>>> a
[1, 2, 3, 4, 10, [123, 456]]
>>> a.extend([789])
>>> a
[1, 2, 3, 4, 10, [123, 456], 789]
>>> b.extend("date")
>>> b
['apple', 'banana', 'cherry', 'd', 'a', 't', 'e']
>>> c.extend([2.2360, 1.6180])
>>> c
[3.14159, 1.4142, 2.71828, 2.236, 1.618]
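To avoid the single-string quirk, wrap the string in a list before passing it to extend(), or just use append(). A small sketch with made-up values:

```python
fruits = ["apple", "banana", "cherry"]

# Wrapping the string in a list adds it as one element, not character by character.
fruits.extend(["date"])
print(fruits)  # ['apple', 'banana', 'cherry', 'date']

# For a single item, append() does the same thing.
fruits.append("elderberry")
print(fruits)  # ['apple', 'banana', 'cherry', 'date', 'elderberry']

# The += operator on a list behaves like extend().
numbers = [1, 2, 3]
numbers += [4, 5]
print(numbers)  # [1, 2, 3, 4, 5]
```
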

Reference1

Install Python cryptography package on Windows 10

The Python cryptography package depends on several external open-source libraries for implementation of the cryptographic algorithms, including the OpenSSL library. Accordingly, installation of the package requires that some of these libraries be compiled from source during the installation process.

Recently, the package has moved to using the Rust compiler for part of its build. However, many libraries, such as OpenSSL, still require a C/C++ compiler, as well, to build them. On Windows, the recommended compiler is the Visual Studio C++ Build Tools. (Note: This application is different/separate from Visual Studio and Visual Studio Code. It is the free compiler suite for C/C++ applications that underpins Visual Studio.)

Install OpenSSL library/SDK

Before installing the Visual Studio C++ Build Tools, you will need to install the OpenSSL library/SDK. Install from the official repository, making sure to choose the full package (not the "light" version) for the appropriate architecture (Win32 or Win64).

Ensure that you choose the version of OpenSSL compatible with the version of the cryptography package that you are installing. For example, the current (as of this writing) version of OpenSSL is 3.0.2, which works fine with the current version of cryptography (37.x.x). However, if you are installing an earlier version of cryptography, you may need to use an earlier version of OpenSSL, such as 1.1.1n. As an example, I found that I had to use OpenSSL 1.1.1n to compile cryptography version 3.4.7.

In general, the default installation settings, including copying the libraries to the Windows directory, should work fine. Make note of the installation directory, as this information will be needed later when compiling the libraries for the cryptography package.

Install Visual Studio C++ Build Tools

Download the installation wrapper for Visual Studio C++ Build Tools from the Microsoft site and run it as an Administrator. This tool will, in turn, install the Visual Studio installation tool and launch it with the "Build Tools" default selection, which is also called the Desktop development with C++ workload. The right pane of the window (Installation details) will list the actual components for installation; ensure that you install at least the following items:

  • Included
    • C++ Build Tools core features
    • C++ 2019 Redistributable Update
    • C++ core desktop features
  • Optional
    • MSVC v142 - VS 2019 C++ x64/x86 build tools
    • Windows 10 SDK (10.0.19041.0)
    • C++ CMake tools for Windows
    • Testing tools core features - Build Tools
    • C++ AddressSanitizer

The Included tools list is fixed and you won't be able to change it. In addition, some of the items in the Optional tools list may have different version numbers; just look for the item with the closest name and highest version number. The total installation size is likely to be approximately 6.5 GB (actual download size is approximately 25% of this total). Continue with the installation after confirming the Installation details.

Install Rust compiler and build tools

The Rust installation is quite straightforward. Download the rustup launcher (rustup-init.exe) from the Rust web site. Open an Administrator Windows Command Prompt and run rustup-init.exe from it. Follow the default prompts, which will configure Rust in your "home" directory (i.e., C:\Users\_user_name_). Close the Windows Command Prompt after successful installation, so that the environment changes are active in the next step.

Install the cryptography package

Open a Windows Command Prompt. (This Command Prompt does NOT require Administrator privileges.) Configure the Command Prompt environment for running the Visual Studio C++ Build Tools C++ compiler by running the vcvarsall.bat script; if you used the default directory for installation, it will be in C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Auxiliary\Build directory. Typically, the command will be:

C:\> "C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Auxiliary\Build\vcvarsall.bat" x86_amd64

Failure to set the environment configuration via the vcvarsall.bat script will typically result in an error similar to:

pyconfig.h(59): fatal error C1083: Cannot open include file: 'io.h': No such file or directory

(If you are missing the vcvarsall.bat script, then you probably did not install the Windows 10 SDK optional component when installing Visual Studio C++ Build Tools.)

We also need to set the INCLUDE and LIB environment variables to reference the appropriate directories for OpenSSL, per the cryptography install documentation. Assuming that you installed OpenSSL to the default directory (C:\Program Files\OpenSSL-win64), the commands will be:

C:\> set INCLUDE=C:\Program Files\OpenSSL-win64\include;%INCLUDE%
C:\> set LIB=C:\Program Files\OpenSSL-win64\lib;%LIB%

Now, we are almost ready to install the cryptography package! Ensure that you are in the desired virtual environment (unless you intend to install globally and have your future virtual environments inherit this configuration); for example, you might run .venv\Scripts\activate.bat from within your project directory. After activating the virtual environment, install the cryptography package using pip as usual:

C:\> python -m pip install cryptography

During the installation, you'll see the usual process of downloading the various wheel packages. Likewise, pip will download the required source files for the binary packages and compile and install them. Ultimately, the installation should complete with a message similar to:

Installing collected packages: cryptography
  Running setup.py install for cryptography ... done
Successfully installed cryptography-37.0.1

Reference1

Various tips for datetime class

The Python datetime class is one of the most important, but least understood, in the Standard Library. Here are a few tips that I've picked up.

Create a timezone-aware datetime object set to "now"

from datetime import datetime, timezone
n = datetime.now(tz=timezone.utc)

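OR

from datetime import datetime, timezone
n = datetime.now()                # naive datetime holding the local time
# replace(tzinfo=...) would only relabel the local time as UTC (and returns a new object),
# so convert instead; astimezone() assumes a naive value is local time
n = n.astimezone(timezone.utc)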

Remove microseconds from datetime object when formatting with isoformat()

from datetime import datetime, timezone
n = datetime.now(tz=timezone.utc).isoformat(timespec="seconds")
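Here is a small sketch with a fixed, made-up timestamp so that the effect of timespec is easy to see. It also shows how to check whether a datetime object is timezone-aware.

```python
from datetime import datetime, timezone

# A fixed example value (hypothetical) so the output is reproducible.
stamp = datetime(2022, 5, 1, 12, 30, 45, 123456, tzinfo=timezone.utc)

print(stamp.isoformat())                    # 2022-05-01T12:30:45.123456+00:00
print(stamp.isoformat(timespec="seconds"))  # 2022-05-01T12:30:45+00:00

# Alternatively, drop the microseconds from the object itself.
trimmed = stamp.replace(microsecond=0)
print(trimmed.isoformat())                  # 2022-05-01T12:30:45+00:00

# A naive datetime has no tzinfo; an aware one does.
print(datetime.now().tzinfo is None)                     # True
print(datetime.now(tz=timezone.utc).tzinfo is not None)  # True
```
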

Use copy.deepcopy to avoid unexpected results with mutable objects

By design, using the assignment operator (=) in Python creates another reference to the same object. In the case of a dictionary, this means that if you create a new reference via assignment and make a change through the second reference, that change will also be visible through the original reference, because both names refer to the same instance. For example:

>>> colors = {"red": 5, "blue": 2, "orange": 3}
>>> id(colors)
2435811006864
>>> more_colors = colors
>>> id(more_colors)
2435811006864
>>> more_colors["purple"] = 7
>>> colors
{'red': 5, 'blue': 2, 'orange': 3, 'purple': 7}

If you want more_colors to have a distinct identity (and colors to not be changed when you change more_colors!), you should make a deepcopy of colors when creating more_colors. The deepcopy function is part of the copy module of the Python Standard Library.

>>> from copy import deepcopy
>>> colors = {"red": 5, "blue": 2, "orange": 3}
>>> id(colors)
2435811478048
>>> more_colors = deepcopy(colors)
>>> id(more_colors)
2435811479200
>>> more_colors["purple"] = 13
>>> colors
{'red': 5, 'blue': 2, 'orange': 3}
>>> more_colors
{'red': 5, 'blue': 2, 'orange': 3, 'purple': 13}
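The difference between a shallow copy and a deep copy only shows up with nested mutable objects. Here is a minimal sketch with a made-up nested dictionary:

```python
from copy import copy, deepcopy

# Hypothetical nested data: the dictionary values are lists (mutable objects).
palette = {"warm": ["red", "orange"], "cool": ["blue"]}

shallow = copy(palette)    # new dictionary, but it shares the same inner lists
deep = deepcopy(palette)   # new dictionary AND new inner lists

shallow["warm"].append("yellow")

print(palette["warm"])  # ['red', 'orange', 'yellow']  (changed via the shallow copy)
print(deep["warm"])     # ['red', 'orange']            (unaffected)
```
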

Reference1

Install and configure PortableApps PyCharm CE (Community Edition)

First, this explanation is entirely non-standard, but it's what works for me and is what I use. It may or may not work for you. :)

I like to run the PortableApps versions of various applications on Windows, because they keep each application isolated and don't clutter up the Registry or cause conflicts. One of the difficulties with PortableApps is with Java-based applications, including PyCharm, which is based on the IntelliJ IDEA. This process explains how to install the PortableApps version of the free, open-source PyCharm CE (Community Edition) using the PortableApps version of OpenJDK.

  1. Download the latest version of PortableApps version of PyCharm CE (Community Edition) from mwayne's Sourceforge site: https://sourceforge.net/projects/mwayne/files/PyCharmPortable/. (All versions reference Dev_Test_1, so just choose the latest PyCharm version.) For more details, see the package source.
  2. Install PortableApps version of PyCharm CE as usual. Installation directory is up to you, but simply make a note of where you installed. For this example, we'll assume that it's installed to C:\PortableApps\PyCharmPortable.
  3. Download the latest version of the 64-bit OpenJDK JRE from PortableApps open-source Sourceforge site: https://sourceforge.net/projects/portableapps/files/OpenJDK%20JRE%20Portable/. Notes:
    • 64-bit JRE is required for PyCharm. These versions are identified with 64 in the file name after OpenJDKJRE.
    • PyCharm requires a minimum Java version of Java 11. (This is why you cannot use the common PortableApps jPortable64, because it ends at Java 8, due to licensing restrictions. OpenJDK is the open-source equivalent of the standard Java platform.)
    • You may use the JDK (Java Development Kit) version of PortableApps OpenJDK instead of the JRE version, if you prefer. However, PyCharm only requires the JRE portion to run the application.
  4. Install the PortableApps version of OpenJDK JRE as usual. Again, note the directory which you installed it to. The default/standard installation directory will include CommonFiles in the path name.
  5. In the PortableApps PyCharm installation folder (in our example, C:\PortableApps\PyCharmPortable), navigate to App\PyCharm\bin sub-folder and open pycharm.bat in a text editor (Windows Notepad works fine). In pycharm.bat, add the following line immediately below the @ECHO OFF statement at the top of the file:
SET JAVA_HOME="C:\PortableApps\CommonFiles\OpenJDKJRE64"

(Remember that your path may vary depending on where you installed the OpenJDK JRE in step #4 above.) Save the file and close the text editor. Run pycharm.bat, either from Windows Command Prompt or by double-clicking in Windows Explorer to ensure that it launches PyCharm properly. If PyCharm does not launch, ensure that the path to OpenJDK JRE is correct.

  6. Open a text editor (again, Windows Notepad is fine) with an empty/blank file, enter the following lines, and save the file as start_pycharm.bat in the App\PyCharm sub-folder of the PortableApps PyCharm installation folder (i.e., one directory level above where pycharm.bat is stored).
@ECHO OFF
CMD.EXE /C "START /MIN /HIGH C:\PortableApps\PyCharmPortable\App\PyCharm\bin\pycharm.bat"

(Again, the path to pycharm.bat may be different for you.) While this step is optional, this batch file gives some flexibility in launching PyCharm by allowing you to adjust the priority (in the example above, we use "HIGH") of the application.

  7. Create a desktop/Start Menu launcher for PyCharm by setting start_pycharm.bat as the target (the application to launch) and choosing pycharm.ico in App\PyCharm\bin as the icon. The launcher will open a minimized Windows Command Prompt when PyCharm is launched. You must leave it running while PyCharm is running; it will close automatically when PyCharm is closed. (There are ways to hide the Command Prompt; see, for example, here and here.)

"Flatten" list/array in Python

A common task in Python is to "flatten" (or reduce the dimension of) a list/array, so that elements of nested lists/arrays are made members of the top-level list/array. For example, if you consider a "list of lists" (a two-dimensional array or perhaps a matrix), you often want to "flatten" it to a single list with the elements of the nested lists as the elements. A common scenario is when extracting items (or keys) from a list/array of dictionaries.

Consider this list of lists:

list_of_lists = [[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]]

Essentially, list_of_lists is a list of three nested lists, let's call them sublists, with three elements each (or you could think of it as a 3x3 matrix). We can think of each of the sublists as "rows". To "flatten" the list, we can use a (relatively) simple Python list comprehension to do this:

flat_list = [item for sublist in list_of_lists for item in sublist]
print(flat_list)	# [1, 2, 3, 4, 5, 6, 7, 8, 9]

At first glance, this list comprehension may seem a bit strange, but if you think about what it is doing in terms of Python for loops, it makes very good sense:

flat_list = list()
for sub_list in list_of_lists:
	for item in sub_list:
		flat_list.append(item)
print(flat_list)	# [1, 2, 3, 4, 5, 6, 7, 8, 9]

From this, we can see that the list comprehension above is simply a double list comprehension where we are first iterating over the sublists and then passing those into the second iteration to handle the individual elements of sublists! Once you've used this syntax a few times, it'll become second nature.
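As an alternative to the comprehension, the standard library itertools.chain.from_iterable function does the same one-level flattening. The sketch below also applies the comprehension to a list of dictionaries; the records data is made up for illustration.

```python
from itertools import chain

list_of_lists = [[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]]

# chain.from_iterable() walks each sublist in turn, one level deep.
flat_list = list(chain.from_iterable(list_of_lists))
print(flat_list)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Hypothetical list of dictionaries, each holding a nested list of tags.
records = [{"tags": ["a", "b"]}, {"tags": ["c"]}, {"tags": []}]
all_tags = [tag for record in records for tag in record["tags"]]
print(all_tags)   # ['a', 'b', 'c']
```
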

Reference1
Reference2
Reference3

Avoid pass in exception handlers with context manager

Sometimes in an exception handler (try-except block), you simply want to ignore the exception and continue without any specific handling, such as raising your own exception with more details. The typical pattern/construct for this is:

try:
  something_that_might_throw_exception()
except Exception:
  pass

However, from this code it's not clear that you intend to "eat" the exception. To make it explicit that you want to ignore the exception, you can use the suppress context manager from the contextlib module of the Standard Library.

from contextlib import suppress

with suppress(Exception):
  something_that_might_throw_exception()

You simply replace Exception with the specific type of exception, if you want more granularity in the handling. For example:

import os
from contextlib import suppress

# Ignore the error if the file does not exist
with suppress(FileNotFoundError):
  os.remove("maybe_file.txt")
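One detail worth knowing is that when the exception is suppressed, execution resumes with the first statement after the with block. Here is a minimal runnable sketch using a temporary file; the helper name remove_if_exists is hypothetical.

```python
import os
import tempfile
from contextlib import suppress


def remove_if_exists(path):
    """Delete path and return True, or return False if it was already gone."""
    with suppress(FileNotFoundError):
        os.remove(path)
        return True
    # Reached only when os.remove() raised FileNotFoundError.
    return False


# Create a real temporary file so the sketch is self-contained.
fd, path = tempfile.mkstemp()
os.close(fd)

first = remove_if_exists(path)   # file exists, so it is removed
second = remove_if_exists(path)  # file is gone, so the error is suppressed
print(first, second)  # True False
```
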

Reference1