Python threading and subprocesses explained

Take advantage of Python’s ability to parallelize workloads using threads for I/O-bound tasks and subprocesses for CPU-bound tasks

By default, Python’s runtime executes in a single thread, traffic-directed by its Global Interpreter Lock (GIL). Most of the time this isn’t a significant bottleneck, but it becomes one when you want to run many jobs in parallel.

Python provides two ways to work around this: threading and multiprocessing. Each allows you to take long-running jobs, break them into parallel batches, and work on them side-by-side.

Depending on the job in question, you can sometimes speed up operations tremendously. At the very least, you can treat tasks in such a way that they don’t block other work while they wait to be completed.

In this article we’ll look at one of the fastest ways to make use of threading and subprocesses in Python: the thread pool and the process pool.

Python threads versus Python processes

Python threads are units of work that run independently of one another. They don’t correspond to hardware threads on the CPU, though—at least not in CPython. Python threads are controlled by the GIL, so they run serially. Because only one Python thread runs at a time, they’re a useful way to organize tasks that involve some waiting. Python can execute thread A or thread C while thread B is waiting for a reply from an external system, for example.

Python processes are whole instances of the Python interpreter that run independently. Each Python process has its own GIL and its own copy of the data to be worked on. That means that multiple Python processes can run in parallel on separate hardware cores. The tradeoff is that a Python process takes longer to spin up than a Python thread.

Here’s how to choose between Python threads and Python processes:

  • If you’re performing long-running I/O-bound operations, tasks that involve waiting on a service outside Python (like multiple parallel web-scraping or file-processing jobs), use threads.
  • If you’re performing long-running CPU-bound operations handled by an external library written in C, such as NumPy, use threads (because here too the work is being done outside Python).
  • If you’re performing long-running CPU-bound operations in Python, use processes.

Python thread pools and Python process pools

The easiest way to work with Python threads and Python processes for many kinds of jobs is by using Python’s Pool object. A Pool lets you define a set of threads or processes (your choice) that you can feed any number of jobs; the results come back in the same order as the inputs you submitted.

By way of example, let’s take a list of numbers from 1 to 100, construct URLs from them, and fetch them in parallel. This example is I/O bound, so there’s likely to be no discernible performance difference between threads and processes, but the basic idea should be clear.

# Python 3.6+
from multiprocessing.dummy import Pool as ThreadPool
from multiprocessing import Pool as ProcessPool
from urllib.request import urlopen

def run_tasks(function, args, pool, chunk_size=None):
    results = pool.map(function, args, chunk_size)
    return results

def work(n):
    # The original URL was lost in formatting; example.com is a placeholder
    with urlopen(f"https://www.example.com/{n}") as f:
        contents = f.read()
    return contents

if __name__ == '__main__':
    numbers = [x for x in range(1, 100)]

    # Run the task using a thread pool
    t_p = ThreadPool()
    result = run_tasks(work, numbers, t_p)
    print(result)
    t_p.close()

    # Run the task using a process pool
    p_p = ProcessPool()
    result = run_tasks(work, numbers, p_p)
    print(result)
    p_p.close()

Python multiprocessing example

Here’s how the above example works:

The multiprocessing module provides pool objects for both threads (multiprocessing.dummy) and processes (multiprocessing). One nice thing about using multiprocessing is having the same API for threading and subprocesses, so you can create functions that work interchangeably with both as shown here.

t_p and p_p are instances of ThreadPool and ProcessPool. Both get passed to run_tasks as the type of pool to use for the task. By default, each pool instance uses a single thread or process per available CPU core. There is a certain amount of overhead associated with creating pools, so don’t overdo it. If you’re going to be processing lots of jobs over a long period of time, create the pool first and don’t dispose of it until you’re done. You dispose of a pool by calling its .close() method.

pool.map is the function we use to subdivide the work. map takes a function along with a list of arguments to apply to each invocation of the function, splits the work into chunks (you can specify the chunk size; the default is generally fine), and feeds each chunk to a worker thread or process.

Normally, map blocks the thread it’s running in, meaning you can’t do anything else until map returns the finished work. If you want to run map asynchronously, use map_async, which returns immediately and accepts a callback function that runs when all the jobs finish.

Finally, this basic example only involves threads and processes that keep their own individual state. If you have a long-running, CPU-bound operation where threads or processes need to share information with one another, look into using multiprocessing with shared memory or a server process.

On the whole, though, the more you can partition both the processing and the data to be processed, the faster everything will run. That’s a cardinal rule of multiprocessing and multithreading no matter what language you’re using.

Copyright © 2018 IDG Communications, Inc.