Published on: Nov 9, 2022
Super fast Python (Part-2): Good practices
In the earlier post on why Python is slow?, we discussed how Python's slowness stems from the internal design of some essential components, such as the GIL, dynamic typing, and interpretation.
In this blog, we discuss some good practices that can speed up Python code significantly.
This is the second post in the series on Python performance and optimization. The series covers the use of built-in libraries, low-level code conversions, and alternative Python implementations to speed up Python. The other posts included in this series are
- (Part-1): Why Python is slow?
- (Part-2): Good practices to write fast Python code (this post)
- (Part-3): Multi-processing in Python
- (Part-4): Use Cython to get speed as fast as C
- (Part-5): Use Numba to speed up Python Functions and Numeric calculations
The following sections describe various good practices one can use to speed up Python (by 30% or more in some cases) without any external support like PyPy, Cython, NumPy, etc.
Python good practices for super fast code
Use built-in data structures and libraries
As Python's built-in data types are implemented directly in C, using built-in types like list, dict, set, and tuple, rather than custom classes we define, really helps the program run faster.
Also, use built-in functions and libraries for common operations like counting duplicates, summing all list elements, or finding the maximum element, because these are already written in C and compiled, which makes them run faster than equivalent custom functions we write.
```python
from random import randint

rand_nums = [randint(1, 100) for _ in range(100000)]
```
Create 100000 random numbers between 1 and 100.
```python
%%timeit
cc = 0
for i in range(len(rand_nums)):
    cc += rand_nums[i]
```
output
```
6.02 ms ± 94 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
Manually summing all the numbers with a loop takes approximately 6 ms.
```python
%%timeit
cc = sum(rand_nums)
```
output
```
332 µs ± 15 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
Using the built-in sum() takes only about 0.3 ms.
Local Variables vs Global Variables
When we call a function (a routine), the system pauses execution at the call site in the current routine (say, main()) and places the called function on top of the call stack. Now imagine we have defined numerous global variables and made multiple function calls: the interpreter must keep all these global variables available to every routine on the call stack at all times. Python provides a lookup mechanism for both local and global variables, but globals live in a dictionary that is searched at runtime, while locals are resolved with a fast, array-based lookup, so accessing a global variable inside a hot loop takes more time than accessing a local one.
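As a quick illustration (a minimal sketch; exact timings will vary by machine and Python version), we can time the same loop with the variable read as a global versus copied into a local first:

```python
from timeit import timeit

x = 5  # a module-level (global) variable

def add_global():
    total = 0
    for _ in range(100000):
        total += x  # global (dictionary) lookup on every iteration
    return total

def add_local():
    y = x  # bind the global to a local variable once
    total = 0
    for _ in range(100000):
        total += y  # fast local lookup on every iteration
    return total

print("global:", timeit(add_global, number=100))
print("local: ", timeit(add_local, number=100))
```

On most CPython builds the local version comes out measurably faster, which is why copying a frequently used global into a local variable before a tight loop is a common micro-optimization.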
Import the sub-modules and functions directly
When importing a module to use its sub-modules, classes, or functions, import them directly instead of importing just the module. Accessing objects with the . operator triggers a dictionary lookup via __getattribute__. If we access the object through the module many times using ., that adds up and increases the program's running time.
```python
# instead of this
import abc
def_obj = abc.Def()

# do this
from abc import Def
def_obj = Def()
```
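For example (a small sketch using the standard math module; absolute numbers will differ across machines), timing math.sqrt against a directly imported sqrt shows the cost of the repeated attribute lookup:

```python
from timeit import timeit
import math
from math import sqrt

# attribute lookup on the module object happens on every call
t_attr = timeit("math.sqrt(2.0)", globals={"math": math}, number=1_000_000)

# the function reference was resolved once at import time
t_direct = timeit("sqrt(2.0)", globals={"sqrt": sqrt}, number=1_000_000)

print(f"math.sqrt: {t_attr:.3f}s, sqrt: {t_direct:.3f}s")
```

The difference per call is tiny, but inside loops that run millions of times it becomes noticeable.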
Limit the usage of '.'
Speaking of the lookup time for a module's object references, the same applies to referencing the properties and methods of an object (both custom and built-in).
```python
%%timeit
ll = []
for i in range(len(rand_nums)):
    ll.append(rand_nums[i])
```
output
```
6.67 ms ± 84.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
Appending elements with list.append() inside the loop takes more time than the following code, where the method is first assigned to a variable (functions are first-class citizens in Python) and that variable is called inside the loop. This simple practice avoids resolving the function reference with '.' on every iteration and thus limits the dictionary lookups.
```python
%%timeit
ll = []
ll_append = ll.append
for i in range(len(rand_nums)):
    ll_append(rand_nums[i])
```
output
```
5.45 ms ± 116 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
Avoid writing unnecessary functions
It's good to have code separability by using functions for each independent task. But, as function calls in Python are relatively more expensive than in C/C++ (due to boxing and unboxing of dynamic variables, among other factors), limit writing functions for unnecessary cases like one-liners.
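To see the call overhead concretely (the square helper here is hypothetical, used only for illustration), compare computing a one-liner through a function call against inlining the same expression:

```python
from timeit import timeit

def square(x):  # a one-liner wrapped in a function
    return x * x

def with_call(nums):
    return [square(n) for n in nums]  # one function call per element

def inlined(nums):
    return [n * n for n in nums]  # same work, no call overhead

nums = list(range(10_000))
print("with call:", timeit(lambda: with_call(nums), number=100))
print("inlined:  ", timeit(lambda: inlined(nums), number=100))
```

The inlined version typically wins because it skips creating and tearing down a stack frame for every element.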
Don't wrap lambdas around functions
Overuse or misuse of lambdas is not a good practice. It's common to wrap an existing function inside a lambda that does nothing the function couldn't do on its own.
Consider the following two functions for sorting a list based on absolute values.
```python
def fun_sort_with_lambda(l):
    return sorted(l, key=lambda x: abs(x))

def fun_sort_without_lambda(l):
    return sorted(l, key=abs)
```
If we look at the CPython bytecode for the above functions, one with a lambda expression passed as the key and one with the abs function object as the key,
```python
>>> from dis import dis
>>> dis(fun_sort_with_lambda)
  2           0 LOAD_GLOBAL              0 (sorted)
              2 LOAD_FAST                0 (l)
              4 LOAD_CONST               1 (<code object <lambda> at 0x7fc51b3a19d0, file "<ipython-input-62-c4147c242c71>", line 2>)
              6 LOAD_CONST               2 ('fun_sort_with_lambda.<locals>.<lambda>')
              8 MAKE_FUNCTION            0
             10 LOAD_CONST               3 (('key',))
             12 CALL_FUNCTION_KW         2
             14 RETURN_VALUE

Disassembly of <code object <lambda> at 0x7fc51b3a19d0, file "<ipython-input-62-c4147c242c71>", line 2>:
  2           0 LOAD_GLOBAL              0 (abs)
              2 LOAD_FAST                0 (x)
              4 CALL_FUNCTION            1
              6 RETURN_VALUE

>>> dis(fun_sort_without_lambda)
  2           0 LOAD_GLOBAL              0 (sorted)
              2 LOAD_FAST                0 (l)
              4 LOAD_GLOBAL              1 (abs)
              6 LOAD_CONST               1 (('key',))
              8 CALL_FUNCTION_KW         2
             10 RETURN_VALUE
```
we see that for fun_sort_with_lambda, an additional function object is generated for the lambda. We can avoid this extra function creation by passing the function object directly, as in fun_sort_without_lambda.
List comprehension is fast
When building or transforming list-like data structures, a list comprehension is usually faster than traditional approaches like an explicit loop with append() or map()/filter() calls with lambdas.
```python
%%timeit
rand_nums = []
for _ in range(1000):
    rand_nums.append(randint(1, 100))
```
```
603 µs ± 15.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
The list comprehension version of the above code snippet runs faster.
```python
%%timeit
rand_nums = [randint(1, 100) for _ in range(1000)]
```
output
```
565 µs ± 11 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
The optimization practices are not limited to the above approaches. We can also measure how much time each part of the program takes using profiling tools like cProfile and then change the slow spots to run faster. In the next blog, we discuss how to improve Python computing efficiency using multiprocessing.
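As a starting point (a minimal sketch; the function being profiled here is just a stand-in for real application code), cProfile can be run programmatically and its statistics printed sorted by cumulative time:

```python
import cProfile
import io
import pstats
from random import randint

def build_and_sum():
    nums = [randint(1, 100) for _ in range(100_000)]
    return sum(nums)

profiler = cProfile.Profile()
profiler.enable()
result = build_and_sum()
profiler.disable()

# collect and print the five most expensive entries by cumulative time
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

The report shows per-function call counts and times, which quickly points at the hot spots worth rewriting with the practices above.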