Get started with async in Python

Learn how to use asynchronous programming in Python to get more done in less time, without waiting


Asynchronous programming, or async for short, is a feature of many modern languages that allows a program to juggle multiple operations without waiting or getting hung up on any one of them. It’s a smart way to efficiently handle tasks like network or file I/O, where most of the program’s time is spent waiting for a task to finish.

Consider a web scraping application that opens 100 network connections. You could open one connection, wait for the results, then open the next and wait for the results, and so on. Most of the time the program runs is spent waiting on a network response, not doing actual work.

Async gives you a more efficient method: Open all 100 connections at once, then switch among each active connection as they return results. If one connection isn’t returning results, switch to the next one, and so on, until all connections have returned their data.
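As a rough sketch of that difference, the script below simulates slow connections with asyncio.sleep() instead of making real network calls. The names fake_fetch, run_sequentially, run_concurrently, and timed are made up for illustration, and the async and await syntax it relies on is explained later in the article.

```python
import asyncio
import time

async def fake_fetch(conn_id, delay=0.05):
    # Stand-in for a network request: asyncio.sleep() waits without blocking.
    await asyncio.sleep(delay)
    return conn_id

async def run_sequentially(count):
    # Open one "connection", wait for it, then move on to the next.
    return [await fake_fetch(i) for i in range(count)]

async def run_concurrently(count):
    # Start every "connection" at once and wait for all of them together.
    return await asyncio.gather(*(fake_fetch(i) for i in range(count)))

def timed(coro):
    # Run a coroutine to completion and report how long it took.
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start

sequential_result, sequential_time = timed(run_sequentially(20))
concurrent_result, concurrent_time = timed(run_concurrently(20))
print(f"sequential: {sequential_time:.2f}s, concurrent: {concurrent_time:.2f}s")
```

Both versions return the same data. The sequential run takes roughly the sum of all the delays, while the concurrent run should take roughly as long as a single delay.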

Async syntax is now a standard feature in Python, but longtime Pythonistas who are used to doing one thing at a time may have trouble wrapping their heads around it. In this article we’ll explore how asynchronous programming works in Python, and how to put it to use.

Note that if you want to use async in Python, it’s best to use Python 3.7 or Python 3.8 (the latest version as of this writing). We’ll be using Python’s async syntax and helper functions as defined in those versions of the language.

When to use asynchronous programming

In general, the best times to use async are when you’re trying to do work that has the following traits:

  • The work takes a long time to complete.
  • The delay involves waiting for I/O (disk or network) operations, not computation.
  • The work involves many I/O operations happening at once, or one or more I/O operations happening when you’re also trying to get other tasks done.

Async lets you set up multiple tasks to run concurrently and switch among them efficiently, without blocking the rest of your application.

Some examples of tasks that work well with async:

  • Web scraping, as described above.
  • Network services (e.g., a web server or framework).
  • Programs that coordinate results from multiple sources that take a long time to return values (for instance, simultaneous database queries).

It’s important to note that asynchronous programming is different from multithreading or multiprocessing. Async operations all run in the same thread, but they yield to one another as needed, making async more efficient than threading or multiprocessing for many kinds of tasks. (More on this below.)

Python async await and asyncio

Python provides two keywords, async and await, for creating async operations (the syntax first appeared in Python 3.5). Consider this script:

def get_server_status(server_addr):
    # A potentially long-running operation ...
    return server_status

def server_ops():
    results = []
    results.append(get_server_status('addr1.server'))
    results.append(get_server_status('addr2.server'))
    return results

An async version of the same script—not functional, just enough to give us an idea of how the syntax works—might look like this.

async def get_server_status(server_addr):
    # A potentially long-running operation ...
    return server_status

async def server_ops():
    results = []
    results.append(await get_server_status('addr1.server'))
    results.append(await get_server_status('addr2.server'))
    return results

Functions prefixed with the async keyword become asynchronous functions, also known as coroutines. Coroutines behave differently from regular functions:

  • Coroutines can use another keyword, await, which allows a coroutine to wait for results from another coroutine without blocking. Until results come back from the awaited coroutine, Python switches freely among other running coroutines.
  • Coroutines can only be awaited from other async functions. If you call server_ops() or get_server_status() as-is from the body of the script, you won’t get their results; you’ll get a Python coroutine object, which can’t be used directly.

So if we can’t await async functions from non-asynchronous code, and calling them directly only hands us a coroutine object, how do we use them? Answer: By using the asyncio library, which bridges async and the rest of Python.
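A minimal sketch of that behavior follows. The body of get_server_status() here is a placeholder that sleeps briefly instead of contacting a real server, and the status string it returns is invented for the example.

```python
import asyncio

async def get_server_status(server_addr):
    # Placeholder for a slow network check; no real server is contacted.
    await asyncio.sleep(0.01)
    return f"{server_addr}: ok"

# Calling the coroutine function does not run its body.
# It only creates a coroutine object.
coro = get_server_status("addr1.server")
is_coroutine = asyncio.iscoroutine(coro)
coro.close()  # Discard it cleanly to avoid a "never awaited" warning.

# asyncio.run() starts an event loop, runs the coroutine, and returns its result.
status = asyncio.run(get_server_status("addr1.server"))
print(is_coroutine, status)
```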

Python async await and asyncio example

Here is an example (again, not functional but illustrative) of how one might write a web scraping application using async and asyncio. This script takes a list of URLs and uses multiple instances of an async function from an external library (read_from_site_async()) to download them and aggregate the results.

import asyncio
from web_scraping_library import read_from_site_async

async def main(url_list):
    return await asyncio.gather(*[read_from_site_async(_) for _ in url_list])

urls = ['http://site1.com','http://othersite.com','http://newsite.com']
results = asyncio.run(main(urls))
print(results)

In the above example, we use two common asyncio functions:

  • asyncio.run() is used to launch an async function from the non-asynchronous part of our code, and thus kick off all of the program’s async activities. (This is how we run main().)
  • asyncio.gather() takes one or more awaitables, typically coroutine objects created by calling async functions (in this case, several calls to read_from_site_async() from our hypothetical web-scraping library), runs them all, and waits for all of the results to come in.

The idea here is that we start the read operation for all of the sites at once, then collect the results once they have all arrived (hence asyncio.gather()). We don’t wait for any one operation to complete before moving on to the next one.
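Because web_scraping_library is hypothetical, here is a runnable variation that swaps in a stub, read_from_site_stub(), which sleeps for a chosen delay instead of touching the network. The delays and the finished list are assumptions added purely to show that asyncio.gather() returns results in the order the awaitables were passed in, not the order in which they completed.

```python
import asyncio

finished = []  # Records the order in which the simulated downloads complete.

async def read_from_site_stub(url, delay):
    # Hypothetical stand-in for read_from_site_async(); no network access.
    await asyncio.sleep(delay)
    finished.append(url)
    return f"contents of {url}"

async def main(jobs):
    # jobs is a list of (url, delay) pairs.
    return await asyncio.gather(
        *(read_from_site_stub(url, delay) for url, delay in jobs)
    )

jobs = [
    ("http://site1.com", 0.3),
    ("http://othersite.com", 0.1),
    ("http://newsite.com", 0.2),
]
results = asyncio.run(main(jobs))
print("completion order:", finished)
print("gather() order:  ", results)
```

The slowest site is listed first, so it finishes last, yet its result still comes first in the list returned by gather().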

Components of Python async apps

We’ve already mentioned how Python async apps use coroutines as their main ingredient, drawing on the asyncio library to run them. A few other elements are also key to asynchronous applications in Python:

Event loops

The asyncio library creates and manages event loops, the mechanisms that run coroutines until they complete. Only one event loop should be running at a time in a Python process, if only to make it easier for the programmer to keep track of what goes into it.

Tasks

When you submit a coroutine to an event loop for processing, you can get back a Task object, which provides a way to control the behavior of the coroutine from outside the event loop. If you need to cancel the running task, for instance, you can do that by calling the task’s .cancel() method.
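A small sketch of cancellation, assuming a made-up slow_job() coroutine that stands in for any long-running operation:

```python
import asyncio

async def slow_job():
    # Stand-in for a long-running operation.
    await asyncio.sleep(10)
    return "finished"

async def main():
    task = asyncio.create_task(slow_job())
    await asyncio.sleep(0.01)  # Give the task a chance to start running.
    task.cancel()              # Request cancellation of the task.
    try:
        await task
    except asyncio.CancelledError:
        # Awaiting a cancelled task raises CancelledError.
        return "cancelled"
    return "finished"

outcome = asyncio.run(main())
print(outcome)
```

The script ends almost immediately instead of waiting ten seconds, because the task is cancelled while it is suspended in asyncio.sleep().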

Here is a slightly different version of the site-scraper script that shows the event loop and tasks at work:

import asyncio
from web_scraping_library import read_from_site_async

tasks = []

async def main(url_list):
    for n in url_list:
        tasks.append(asyncio.create_task(read_from_site_async(n)))
    print(tasks)
    return await asyncio.gather(*tasks)

urls = ['http://site1.com','http://othersite.com','http://newsite.com']
# Fine on Python 3.7 and 3.8; later releases deprecate calling
# get_event_loop() when no event loop is running.
loop = asyncio.get_event_loop()
results = loop.run_until_complete(main(urls))
print(results)

This script uses the event loop and task objects more explicitly.

  • The .get_event_loop() method provides us with an object that lets us control the event loop directly, by submitting async functions to it programmatically via .run_until_complete(). In the previous script, we could only run a single top-level async function, using asyncio.run(). By the way, .run_until_complete() does exactly what it says: It runs the supplied coroutine, along with any tasks that coroutine awaits, until it is done, then returns its result.
  • The .create_task() method takes a coroutine object (that is, a call to an async function, including its arguments) and gives us back a Task object that runs it. Here we submit each URL as a separate Task to the event loop, and store the Task objects in a list. Note that we can only do this inside the event loop, that is, inside an async function.

How much control you need over the event loop and its tasks depends on the complexity of the application you’re building. If you just want to submit a set of fixed jobs to run concurrently, as with our web scraper, you won’t need a whole lot of control, just enough to launch jobs and gather the results.

By contrast, if you’re creating a full-blown web framework, you’ll want far more control over the behavior of the coroutines and the event loop. For instance, you may need to shut down the event loop gracefully in the event of an application crash, or run tasks in a threadsafe manner if you’re calling the event loop from another thread.
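As one illustration of the thread-safety point, the sketch below runs an event loop in a background thread and submits work to it from the main thread with asyncio.run_coroutine_threadsafe(). The double() coroutine and the start_loop() helper are invented for the example; a real framework would manage the loop's lifetime with more care.

```python
import asyncio
import threading

async def double(value):
    # Trivial coroutine that runs inside the event loop's thread.
    await asyncio.sleep(0.01)
    return value * 2

def start_loop(loop):
    # Run the event loop in this (background) thread until it is stopped.
    asyncio.set_event_loop(loop)
    loop.run_forever()

loop = asyncio.new_event_loop()
worker = threading.Thread(target=start_loop, args=(loop,), daemon=True)
worker.start()

# From the main thread, hand a coroutine to the loop in the other thread.
future = asyncio.run_coroutine_threadsafe(double(21), loop)
answer = future.result(timeout=5)  # Blocks this thread until the result is ready.

# Ask the loop to stop from outside its thread, then clean up.
loop.call_soon_threadsafe(loop.stop)
worker.join()
loop.close()
print(answer)
```

Note that run_coroutine_threadsafe() returns a concurrent.futures.Future, not an asyncio Task, which is why the main thread can block on .result() without being inside the event loop.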

Async vs. threading vs. multiprocessing

At this point you may be wondering, why use async instead of threads or multiprocessing, both of which have been long available in Python?

First, there is a key difference between async and threads or multiprocessing, even apart from how those things are implemented in Python. Async is about concurrency, while threads and multiprocessing are about parallelism. Concurrency involves dividing time efficiently among multiple tasks at once—e.g., checking your email while waiting for a register at the grocery store. Parallelism involves multiple agents processing multiple tasks side by side—e.g., having five separate registers open at the grocery store.

Most of the time, async is a good substitute for threading as threading is implemented in Python. This is because threads in the standard CPython interpreter, although they are real OS threads, are constrained by the global interpreter lock (GIL), so only one thread is ever running Python code at a time in the interpreter. In comparison to threads, async provides some key advantages:

  • Async functions are far more lightweight than threads. Tens of thousands of asynchronous operations running at once will have far less overhead than tens of thousands of threads.
  • The structure of async code makes it easier to reason about where tasks pick up and leave off. This means data races and thread safety are less of an issue. Because all tasks in the async event loop run in a single thread, it’s easier for Python (and the developer) to serialize how they access objects in memory.
  • Async operations can be cancelled and manipulated more readily than threads. The Task object we get back from asyncio.create_task() provides us with a handy way to do this.
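To get a feel for the first point, the following sketch launches ten thousand coroutines at once in a single thread. It makes no attempt to measure overhead, and tiny_job() is a made-up placeholder that only sleeps.

```python
import asyncio

async def tiny_job(n):
    # Each coroutine waits briefly, then returns its own number.
    await asyncio.sleep(0.1)
    return n

async def main(count):
    # Schedule every coroutine at once and wait for all of them.
    return await asyncio.gather(*(tiny_job(n) for n in range(count)))

results = asyncio.run(main(10_000))
print(len(results))
```

All of the coroutines wait at the same time, so the whole run takes on the order of one sleep interval plus scheduling time, not ten thousand intervals.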

Multiprocessing in Python, on the other hand, is best for jobs that are heavily CPU-bound rather than I/O-bound. Async actually works hand-in-hand with multiprocessing, as you can use the event loop’s run_in_executor() method to delegate CPU-intensive jobs to a process pool from a central process, without blocking that central process.
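Here is a minimal sketch of that pattern, assuming a deliberately naive count_primes() function as the CPU-bound job. The function name and the limits passed to it are illustrative only.

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

def count_primes(limit):
    # Deliberately CPU-heavy: naive count of the primes below limit.
    return sum(
        1 for n in range(2, limit)
        if all(n % d for d in range(2, int(n ** 0.5) + 1))
    )

async def main(executor, limits):
    loop = asyncio.get_running_loop()
    # Each call runs in the executor, leaving the event loop free meanwhile.
    jobs = [loop.run_in_executor(executor, count_primes, limit) for limit in limits]
    return await asyncio.gather(*jobs)

if __name__ == "__main__":
    # The guard matters on platforms that start worker processes by spawning.
    with ProcessPoolExecutor() as pool:
        print(asyncio.run(main(pool, [10_000, 20_000])))
```

Passing None as the executor makes run_in_executor() fall back to the loop's default thread pool, which can be handy for blocking I/O calls but does not help with CPU-bound work the way a process pool does.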

Next steps with Python async

The best first step is to build a few simple async apps of your own. Good examples abound now that asynchronous programming in Python has gone through a few versions and had a couple of years to settle down and become more widely used. The official documentation for asyncio is worth reading over to see what it offers, even if you don’t plan to make use of all of its functions.

You might also explore the growing number of async-powered libraries and middleware, many of which provide asynchronous, non-blocking versions of database connectors, network protocols, and the like. The aio-libs repository has some key ones, such as the aiohttp library for web access. It is also worth searching the Python Package Index for libraries with the async keyword. With something like asynchronous programming, the best way to learn is to see how others have put it to use.

Copyright © 2019 IDG Communications, Inc.