
December 03, 2020   |   8min read

A Guide to Python asyncio

It is hard to imagine modern programming in Python without the asyncio library. The package, which lets Python programmers write concurrent code, is one of the largest and most ambitious libraries ever added to Python. In this asyncio tutorial, we will examine its biggest advantages. I will begin with a few basic examples that illustrate asyncio’s fundamentals. After that, I will present a bigger example: a script, written with asyncio, that downloads files asynchronously. Before we dive deeper into the topic, let us first establish a good understanding of concurrency and parallelism.

Introduction to Python asyncio

Concurrency is like having multiple threads running on a single CPU core: the core switches between them, so they all make progress even though only one runs at any given instant. There is nothing extraordinary about that. A modern workstation has four or eight CPU cores, yet it runs more than 100 processes at the same time. Even though the CPU itself can’t execute more than four or eight jobs at once, the computer seamlessly deals with 100+ processes.

Parallelism, on the other hand, is like running two threads simultaneously on different cores of a CPU. Note that parallelism implies concurrency, but not the other way around.

So, what’s the role of asyncio in the world of concurrent programming?

We all know that I/O operations are one of the main causes of slowdowns: for instance, accessing files on a hard drive, executing database queries, or waiting for data to arrive over the network. It is crucial to handle those operations efficiently. One solution is to create a thread and wait for I/O in that thread. However, threads don’t come cheap in Python. The threading library is based on OS threads, so it is the operating system that manages the threads and their call stacks. Each thread consumes some amount of memory. There are also context switching costs involved. In the case of a server application, it is probably not a good idea to create a separate thread for each open connection, because resources will run out very quickly.

That’s where asyncio’s single-threaded concurrency might help. Because all coroutines run on a single thread, there are no OS-level context switches between them, and each one needs far less memory than a thread.
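As a rough illustration of the single-thread claim, the sketch below starts many coroutines at once and records which thread each one ran on. The names fake_io and run_many are made up for this example, and asyncio.sleep stands in for a real network or disk wait.

```python
import asyncio
import threading
import time


async def fake_io(delay):
    # asyncio.sleep stands in for a network or disk wait (an assumption
    # made for this sketch; no real I/O happens here).
    await asyncio.sleep(delay)
    return threading.get_ident()


async def run_many(count, delay):
    start = time.perf_counter()
    # Start `count` coroutines concurrently and collect their thread ids.
    thread_ids = await asyncio.gather(*(fake_io(delay) for _ in range(count)))
    elapsed = time.perf_counter() - start
    return set(thread_ids), elapsed


if __name__ == '__main__':
    thread_ids, elapsed = asyncio.run(run_many(1000, 0.5))
    print(f'{len(thread_ids)} thread(s) used, finished in {elapsed:.2f}s')
```

All the waits overlap, so the total time should stay close to a single delay rather than growing with the number of coroutines, and only one thread id should be reported.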

asyncio has been available since Python 3.4 and keeps gaining new features with each minor release of Python. The following asyncio examples use some of its newer features, such as asyncio.run() and asyncio.create_task(). Therefore, the minimum version of Python needed to run all of them is 3.7.

Coroutines and Tasks

Coroutines are a key element of the library. Just like generators, coroutines can produce data, and they can also consume it. A coroutine can suspend its execution when no further progress can be made (because, for instance, it is waiting for a network request to complete) and transfer control to another coroutine that can make better use of the CPU time. The point where the coroutine suspended its execution is saved. Once the network response arrives, execution can resume from that point. Historically, before Python 3.5, coroutines shared syntax with generators. This changed with PEP 492, which introduced the new async/await syntax to Python. Now coroutines are declared with the async def statement.
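To make the async def syntax concrete, here is a minimal sketch. The function names answer and demo are invented for this illustration. It shows that calling an async def function only builds a coroutine object, and that the body runs once the object is handed to the event loop.

```python
import asyncio
import inspect


async def answer():
    # A suspension point: control goes back to the event loop for one turn.
    await asyncio.sleep(0)
    return 42


def demo():
    coro = answer()                      # nothing has run yet
    is_coro = inspect.iscoroutine(coro)  # it is just a coroutine object
    result = asyncio.run(coro)           # the event loop drives it to completion
    return is_coro, result


if __name__ == '__main__':
    print(demo())
```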

The execution of coroutines is similar to that of generators. Calling a coroutine function will not schedule it for execution. It will just return a coroutine object. So, how should we execute a coroutine? One solution is to use await. Let’s start with a simple example that waits for 1 second and prints a string, and then waits for another 2 seconds and prints another string.

import asyncio
from datetime import datetime


def log(msg):
    print(f'{datetime.now():%H:%M:%S.%f} {msg}')


async def say_after(what, delay):
    log(f'"{what}" scheduled for execution')
    await asyncio.sleep(delay)
    log(what)


async def main():
    await say_after('Hello!', 1)
    await say_after('Hi!', 2)
    log('Done.')


if __name__ == '__main__':
    asyncio.run(main())

The output is:

10:53:24.500115 "Hello!" scheduled for execution
10:53:25.504662 Hello!
10:53:25.504785 "Hi!" scheduled for execution
10:53:27.506079 Hi!
10:53:27.506187 Done.

The most significant parts of the code are:

  • import asyncio: imports the asyncio library.
  • asyncio.run(main()): this function executes the coroutine main. It is used as the main entry point for asyncio programs and should ideally be called only once.
  • await say_after('Hello!', 1): this statement suspends the coroutine main and starts running say_after immediately. Control returns to main only when say_after finishes.
  • await asyncio.sleep(delay): this stands in for a slow operation such as a network request, but without blocking the thread. The expression hands control to the event loop, which will resume the coroutine after the delay. Meanwhile, the event loop keeps running and may do something else. Do not use time.sleep(...) in asyncio programs unless you want to freeze the event loop and, as a result, the whole application!

So far so good, but what if we want to run those two say_after coroutines concurrently? Let’s modify the example and create asyncio Tasks.

# the rest of the code is unchanged

async def main():
    task1 = asyncio.create_task(say_after('Hello!', 1))
    task2 = asyncio.create_task(say_after('Hi!', 2))
    await task1
    await task2
    log('Done.')

The output is:

13:19:52.941399 "Hello!" scheduled for execution
13:19:52.941477 "Hi!" scheduled for execution
13:19:53.941723 Hello!
13:19:54.946535 Hi!
13:19:54.946714 Done.

The snippet ran 1 second faster than before! The output shows that both coroutines were scheduled at the same time. asyncio.create_task() wraps the coroutine object in a Task and schedules it on the event loop, but does not pause the caller. This is an important difference between creating a Task via asyncio.create_task() and awaiting a coroutine directly with await.
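Tasks also make it possible to check the earlier warning about time.sleep(...). The sketch below is only an illustration, and the names heartbeat and first_beat_delay are invented for it. A background task tries to record a timestamp every 0.05 seconds while the main coroutine either blocks with time.sleep or yields with asyncio.sleep.

```python
import asyncio
import time


async def heartbeat(beats, interval=0.05, count=3):
    # Record a timestamp every `interval` seconds.
    for _ in range(count):
        await asyncio.sleep(interval)
        beats.append(time.perf_counter())


async def first_beat_delay(blocking):
    beats = []
    start = time.perf_counter()
    task = asyncio.create_task(heartbeat(beats))
    if blocking:
        time.sleep(0.3)           # blocks the thread, so the event loop is frozen
    else:
        await asyncio.sleep(0.3)  # yields, so the heartbeat task can run
    await task
    # How long it took until the heartbeat task got its first turn.
    return beats[0] - start


if __name__ == '__main__':
    print(f'asyncio.sleep: first heartbeat after '
          f'{asyncio.run(first_beat_delay(False)):.2f}s')
    print(f'time.sleep:    first heartbeat after '
          f'{asyncio.run(first_beat_delay(True)):.2f}s')
```

With asyncio.sleep the first heartbeat should arrive after roughly one interval. With time.sleep it cannot arrive until the blocking call has returned, because the task never gets a chance to run.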

Having covered these fundamentals, let’s move on to a more real-world example. If you’ve done any programming with threads, you know that there is no API to terminate a thread from the outside. A special flag object (for example, a threading.Event) must be instantiated, passed as an argument to the thread, and then checked continuously by it. For asyncio tasks, there is the Task.cancel() instance method that can be used instead. The following asyncio example shows how this works.

import asyncio
from datetime import datetime


def log(msg):
    print(f'{datetime.now():%H:%M:%S.%f} {msg}')


async def worker():
    while True:
        await asyncio.sleep(1)
        log('Message from worker')


async def supervisor():
    log('Starting worker...')
    task = asyncio.create_task(worker())

    # Doing other stuff in the meantime...
    await asyncio.sleep(3.5)

    task.cancel()
    try:
        await task  # wait for task cancellation
    except asyncio.CancelledError:
        log('The task has been canceled')

if __name__ == '__main__':
    asyncio.run(supervisor())

The output is:

11:36:30.754006 Starting worker...
11:36:31.755152 Message from worker
11:36:32.756569 Message from worker
11:36:33.757679 Message from worker
11:36:34.255729 The task has been canceled

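The Task object is canceled after approximately 3.5 seconds. Task.cancel() requests cancellation, and the event loop then raises asyncio.CancelledError inside the worker at the await expression where it is suspended. The coroutine may catch the exception to execute some teardown code: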

async def worker():
    while True:
        try:
            await asyncio.sleep(1)
        except asyncio.CancelledError:
            log('Stopping worker...')
            raise
        log('Message from worker')

The coroutine may even suppress the cancellation by not re-raising the exception after catching it.
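The difference between re-raising and swallowing the exception can be sketched as follows. The names polite_worker, stubborn_worker and cancel_and_report are hypothetical and exist only for this illustration.

```python
import asyncio


async def polite_worker():
    try:
        await asyncio.sleep(10)
    except asyncio.CancelledError:
        # Teardown code would go here before re-raising.
        raise


async def stubborn_worker():
    try:
        await asyncio.sleep(10)
    except asyncio.CancelledError:
        # Swallowing the exception suppresses the cancellation.
        return 'ignored cancellation'


async def cancel_and_report(worker):
    task = asyncio.create_task(worker())
    await asyncio.sleep(0)  # let the task start and reach its first await
    task.cancel()
    try:
        result = await task
    except asyncio.CancelledError:
        result = 'cancelled'
    return result, task.cancelled()


if __name__ == '__main__':
    print(asyncio.run(cancel_and_report(polite_worker)))
    print(asyncio.run(cancel_and_report(stubborn_worker)))
```

For the worker that re-raises, awaiting the task raises asyncio.CancelledError and task.cancelled() is True. For the worker that swallows the exception, the task finishes normally with a result and task.cancelled() is False, which is usually not what the caller expects.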

Asynchronous Downloader

Network I/O is a good example of where asynchronous operations handle things more efficiently. Instead of sitting idle while waiting, it is better to do something else until a response comes back from the network. The following example downloads zip archives containing the documentation for the three latest versions of Python 3.8: 3.8.4, 3.8.5 and 3.8.6.

Python’s asyncio does not support HTTP directly. Popular HTTP clients, like urllib.request and requests, are not a good fit either, because their calls block and would stall the event loop. Fortunately, there is aiohttp, an asynchronous HTTP client for asyncio. aiohttp is not in the standard library, so it must be installed:

pip install aiohttp

Now let’s review the script.

import asyncio
import os
import time
from typing import Generator, Iterable, Optional, Tuple

import aiohttp


BASE_URL = 'https://docs.python.org/3.8/archives'
DEST_DIR = 'downloads/'
CHUNK_SIZE = 8192


async def get_file(session: aiohttp.ClientSession, filename: str) -> None:
    url = f'{BASE_URL}/{filename}'
    async with session.get(url) as resp:
        if resp.status == 404:
            raise RuntimeError('404 not found')
        writer = write(filename)
        next(writer)
        while True:
            chunk = await resp.content.read(CHUNK_SIZE)
            try:
                writer.send(chunk)
            except StopIteration:
                break


def write(filename: str) -> Generator[None, Optional[bytes], None]:
    # Create the destination directory if it does not exist yet.
    os.makedirs(DEST_DIR, exist_ok=True)
    with open(os.path.join(DEST_DIR, filename), 'wb') as fd:
        while True:
            chunk = yield
            if not chunk:
                break
            fd.write(chunk)


async def download_one(session: aiohttp.ClientSession, filename: str) -> bool:
    try:
        await get_file(session, filename)
    except RuntimeError:
        return False
    return True


async def download_many(filenames: Iterable[str]) -> int:
    async with aiohttp.ClientSession() as session:
        results = await asyncio.gather(*[download_one(session, f) for f in
                                         filenames])
    return sum(1 if r else 0 for r in results)


def get_files_to_download() -> Iterable[str]:
    versions = ('3.8.4', '3.8.5', '3.8.6')
    return (f'python-{version}-docs-pdf-letter.zip' for version in versions)


def main() -> Tuple[int, float]:
    start = time.time()
    files_to_download = get_files_to_download()
    downloaded = asyncio.run(download_many(files_to_download))
    elapsed = time.time() - start
    return downloaded, elapsed


if __name__ == '__main__':
    downloaded, elapsed = main()
    print(f'{downloaded} files downloaded in {elapsed:.2f}s')

The process is started in download_many.

  • async with is the syntax for using an asynchronous context manager, that is, a context manager able to suspend execution in its enter and exit methods (__aenter__ and __aexit__). aiohttp uses them to manage the lifecycle of its sessions and connections.
  • asyncio.gather is a handy function that runs many awaitable objects (coroutines and tasks) concurrently and returns an aggregated list of results once all awaitables are completed.
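Two properties of asyncio.gather are worth seeing in isolation: results come back in the order the awaitables were passed, not the order in which they finished, and the optional return_exceptions=True argument places exceptions in the result list instead of raising them. The downloader above does not use that flag, since download_one catches the error itself, so treat the following as a side sketch with invented helper names.

```python
import asyncio


async def delayed(value, delay):
    await asyncio.sleep(delay)
    if value < 0:
        raise ValueError(f'negative value: {value}')
    return value


async def gather_in_order():
    # The slowest awaitable is listed first, yet results keep the input order.
    return await asyncio.gather(
        delayed(1, 0.03), delayed(2, 0.02), delayed(3, 0.01))


async def gather_with_errors():
    # With return_exceptions=True, a failure becomes an item in the list.
    return await asyncio.gather(
        delayed(1, 0.01), delayed(-1, 0.01), return_exceptions=True)


if __name__ == '__main__':
    print(asyncio.run(gather_in_order()))
    print(asyncio.run(gather_with_errors()))
```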

get_file is the coroutine that actually requests a URL.

  • If the response status code is not 404, a generator object writer is created. Note that the generator does not start running immediately! It must advance to the first yield before it can start receiving values. This is done with next(writer).
  • By reading from the content attribute in pieces of CHUNK_SIZE bytes, we avoid loading the whole response into memory.
  • StopIteration signals that the generator has exited and the file has been closed. We can break the loop.
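The priming step and the StopIteration handshake can be tried without any network or file access. The sketch below mirrors the shape of write, but collects chunks into a list; the names collect and feed are made up for this illustration.

```python
from typing import Generator, Iterable, List, Optional


def collect(sink: List[bytes]) -> Generator[None, Optional[bytes], None]:
    while True:
        chunk = yield      # receives whatever is passed to send()
        if not chunk:      # an empty chunk means "end of data"
            break
        sink.append(chunk)


def feed(chunks: Iterable[bytes]) -> bytes:
    sink: List[bytes] = []
    writer = collect(sink)
    next(writer)           # prime: run the generator up to its first yield
    for chunk in list(chunks) + [b'']:
        try:
            writer.send(chunk)
        except StopIteration:
            break          # the generator has returned
    return b''.join(sink)


if __name__ == '__main__':
    print(feed([b'ab', b'cd']))
```

Calling send with a non-None value on a generator that has not been primed raises TypeError, which is why the next(writer) call cannot be skipped.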

The write generator function is responsible for opening the file, writing chunks of data, and closing the file. This generator is not asynchronous, because asyncio does not provide an asynchronous filesystem API, so each fd.write call briefly blocks the event loop. Once the generator returns, StopIteration is raised in get_file.
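If blocking file writes ever become a bottleneck, one possible workaround (not what the script above does) is to push them to a worker thread with loop.run_in_executor. The sketch below assumes hypothetical helpers append_chunk and save_chunks and writes to a temporary directory. Reopening the file for every chunk keeps the example short but would be wasteful in real code.

```python
import asyncio
import os
import tempfile


def append_chunk(path, chunk):
    # Ordinary blocking file I/O; it runs in a worker thread.
    with open(path, 'ab') as fd:
        fd.write(chunk)


async def save_chunks(path, chunks):
    loop = asyncio.get_running_loop()
    for chunk in chunks:
        # None selects the event loop's default thread pool executor.
        await loop.run_in_executor(None, append_chunk, path, chunk)


async def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'example.bin')
        await save_chunks(path, [b'abc', b'def'])
        with open(path, 'rb') as fd:
            return fd.read()


if __name__ == '__main__':
    print(asyncio.run(demo()))
```

While a chunk is being written in the worker thread, the event loop stays free to serve other coroutines, at the cost of some thread hand-off overhead per call.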

Summary

asyncio introduces a whole new way of writing concurrent code in Python. Many third-party libraries are introducing support for asyncio, and its popularity is growing fast. Meanwhile, because it is relatively new, asyncio still lacks coverage in books and online tutorials. This is especially true for the new async/await syntax. Let’s hope that despite all the obstacles, the spread of asyncio will continue and result in better and more efficient Python code.

Kamil Wasilewski

Software Engineer
