Understanding Awaitables: Coroutines, Tasks, and Futures in Python

Originally published at kitfucoda.medium.com

Previously, we explored building a chatbot with AsyncIO. Having committed to publishing one article a week, I am constantly on the lookout for new ideas, and I figured this would be a good opportunity to delve deeper into asynchronous programming. The writing plan eventually expanded into a series of articles, and we are starting with the concept of Awaitables today.


An illustration from Copilot on the topic

For many of us, our first exposure to something resembling asynchronous programming was learning to write GUI applications. A graphical application usually responds to user events, which may happen at any time while it is running. We attach handler functions to the events we are interested in, but we have little control over when they are executed.

The JavaScript Prelude: From Callbacks to Promises


Photo by Womanizer Toys on Unsplash

I first learned programming in college. The assignments were designed to train us to write code within a controlled scope. I graduated in the era of Web 2.0, when people started exploring AJAX to add dynamic behaviour to web pages. Suddenly, despite being plagued by inconsistencies among browsers’ JavaScript implementations, webpages started to feel like GUI applications.

AJAX, short for Asynchronous JavaScript and XML, lets us fetch data from a web server through JavaScript. Used properly, it allows a page to fetch additional information and update its content in response to various events defined in the Document Object Model (DOM). Examples of DOM events include clicking on an element in the page or an element finishing loading; timers set with setInterval and setTimeout can trigger code in a similar way.

Although I had some experience with GUI programming, doing it in a real enterprise setting felt very different. I was still too used to structured procedural programming, where code executes synchronously, in a mostly predictable order. In an enterprise setting, however, requirements change constantly and time is a luxury we often cannot afford. Despite having a good mentor, it still took me quite a while to get a basic grasp of asynchronous programming with the delicate language constructs of the time.

Let's start with event handlers. Everything we see on a web page is represented in a Document Object Model (DOM) tree, and each element can respond to events. For instance, the snippet below makes a button pop up an alert dialog when it is clicked.

<!-- In HTML markup -->
<button id="theButton">The button</button>

// In JavaScript
const button = document.getElementById("theButton")
button.addEventListener("click", (e) => window.alert("Button is pressed"))

Then we have callbacks, which were popularized by AJAX calls. Back then, with jQuery, we could define handlers for both the success and failure cases. For instance, to fetch the HTML markup of this blog (skipping the failure callback for brevity), we would write:

$.ajax({
    url: "https://kitfucoda.medium.com/",
    success: (data, status, xhr) => { /* process data here */ }
})

It is worth noting that both the event handler and the success callback run separately from the code that defined them. They can read variables from their enclosing scope, but whatever they compute stays inside them: the code outside has no access to their local variables and receives nothing back from them. Essentially, the callback breaks off from the normal execution flow, so the outside code has no direct way to tell whether it was executed, or when it finished (or failed).

It is also possible to combine the two. For example, merging the two examples above, we can make the page fetch this blog after the button is clicked.

button.addEventListener(
    "click",
    (e) => $.ajax({
        url: 'https://kitfucoda.medium.com',
        success: (data, status, xhr) => { /* maybe populate this somewhere */ }
    })
)

Remember how the code outside cannot tell when these handlers are done executing? That made nesting them the only way to ensure order. If you need to fetch my other blog, https://cslai.coolsilon.com, AFTER finishing fetching this blog, you have to make another $.ajax call within the success handler above. The ever-growing nest of callbacks eventually led to what people call callback hell, or the pyramid of doom.
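For readers coming from the Python side, here is a minimal sketch of the same pyramid written with plain callbacks on an asyncio event loop. fake_fetch is a hypothetical stand-in for $.ajax: it fetches nothing and simply schedules its success callback with loop.call_later. Note how the second fetch has to live inside the first callback, and how main() has no handle on either callback, so it can only wait long enough and hope they are done.

```python
import asyncio
from typing import Callable

pages: list[str] = []


def fake_fetch(url: str, success: Callable[[str], None]) -> None:
    # Hypothetical stand-in for $.ajax: no network, it just "responds" 10 ms later
    loop = asyncio.get_running_loop()
    loop.call_later(0.01, success, f"<html>{url}</html>")


def on_first(page: str) -> None:
    pages.append(page)

    def on_second(page: str) -> None:
        pages.append(page)

    # To fetch the second blog strictly AFTER the first, nest the call here
    fake_fetch("https://cslai.coolsilon.com/", on_second)


async def main() -> None:
    fake_fetch("https://kitfucoda.medium.com/", on_first)
    # We have no handle on the callbacks, so all we can do is wait and hope
    await asyncio.sleep(0.1)


asyncio.run(main())
print(pages)
```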

I didn’t follow JavaScript’s development closely after moving on to Python. However, JavaScript eventually got Promise, and my first encounter with it was again through jQuery. It became possible to write a sequence of AJAX calls as a chain instead of nesting them endlessly. For instance:

$.ajax({url: 'https://kitfucoda.medium.com/'})
    .then((response) => {
        // do something with the response
        // Now start a new ajax
        return $.ajax({url: 'https://cslai.coolsilon.com/'})
    })
    .then((response) => {
        // Do other thing with the response
    })

No more pyramid of doom, but the callbacks and the code outside them still remained separate and independent of each other.

Enter AsyncIO: Awaitables Defined

My exploration of asynchronous programming in Python was just as chaotic. When I was working for a company that offered a CDN service, I spent some time learning about gevent and greenlets. It was practically my first introduction to the asynchronous world in the language. However, I left the company soon after and didn’t get a chance to see it in action. Years later, I hacked together a chatbot that barely worked, and also did a quick exercise building an ASGI websocket chatroom without any frameworks. While I managed to do it, it wasn’t a useful introduction to AsyncIO.

This article, like the one I wrote on the chatbot, is the article I wish I had read back then.

Coroutines: The Foundation of Awaitables


Photo by Kelly Sikkema on Unsplash

As a prologue to next week’s article, this week we cover Awaitables in the context of AsyncIO. Understanding them is core to programming with AsyncIO. Before we start, though, let’s talk about coroutines. A coroutine is a function defined with the async keyword, as async def (strictly speaking, the Python documentation calls this a coroutine function, and the object it returns a coroutine). Note that if a function contains any await, async with, or async for statement, it has to be defined as a coroutine.

# a normal coroutine
async def a_coroutine():
    return 42

# this is invalid: Python rejects it with a SyntaxError ('await' outside async function)
def this_should_be_a_coroutine():
    await some_awaitable()
    return 42
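To see the difference for yourself, here is a small check, a sketch using the standard library’s inspect module and the built-in compile(): the valid coroutine is recognised as a coroutine function, while the invalid definition is rejected at compile time, before it could ever run. The source string is the invalid example above, compiled but never executed, so the undefined some_awaitable does not matter.

```python
import inspect


async def a_coroutine():
    return 42


is_coroutine_function = inspect.iscoroutinefunction(a_coroutine)

invalid_source = """
def this_should_be_a_coroutine():
    await some_awaitable()
    return 42
"""

try:
    compile(invalid_source, "<example>", "exec")
    error_message = None
except SyntaxError as exc:
    error_message = exc.msg  # the message names the misplaced 'await'

print(is_coroutine_function, error_message)
```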

We often write functions doing heavy I/O-bound work as coroutines, so that other work can proceed while they wait for I/O. To execute these coroutines, we need an event loop. Despite the name, I prefer to see it as a timetable-like structure: a place that accepts Awaitables (like coroutines) and schedules them to run.
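To make the timetable analogy a little more concrete, here is a minimal sketch using asyncio.run, which creates an event loop and runs a coroutine on it, together with the loop’s own call_soon and call_later methods to book callbacks into time slots. The slot names are made up for illustration.

```python
import asyncio

order: list[str] = []


async def main() -> None:
    loop = asyncio.get_running_loop()  # the event loop asyncio.run() created

    # Book slots on the "timetable" out of order...
    loop.call_later(0.02, order.append, "second slot")
    loop.call_later(0.01, order.append, "first slot")
    loop.call_soon(order.append, "as soon as possible")

    # ...then yield control so the loop can work through them
    await asyncio.sleep(0.05)


asyncio.run(main())
print(order)  # ['as soon as possible', 'first slot', 'second slot']
```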

Let’s slowly build a fictional project as a tool to explore asyncio throughout this series. We start by defining a coroutine that performs I/O, using httpx to make HTTP requests to PokéAPI. Given a national Pokédex ID, we want to return the name of the corresponding Pokémon.

import httpx


async def dex(id: int) -> str:
    assert isinstance(id, int)

    async with httpx.AsyncClient() as client:
        response = await client.get(f"https://pokeapi.co/api/v2/pokemon/{id}/")
        response.raise_for_status()  # fail loudly, e.g. on a 404 for an unknown ID

        return f"The pokemon with id {id} is {response.json()['name']}"

When a coroutine is called the way a function is called, it returns a coroutine object. That object is still pending scheduling, so none of the code inside has executed at this point. If the coroutine object never gets a chance to run, e.g. when the calling function exits and drops the reference to it, its body never executes, and Python emits a RuntimeWarning saying the coroutine was never awaited.
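A quick way to observe this is a coroutine with a visible side effect. In this sketch, the calls list is purely illustrative: calling the coroutine only produces a coroutine object, and nothing is appended until that object is handed to an event loop via asyncio.run.

```python
import asyncio
import inspect

calls: list[str] = []


async def a_coroutine() -> int:
    calls.append("body ran")  # side effect, so we can tell when the body executes
    return 42


coro = a_coroutine()
got_coroutine_object = inspect.iscoroutine(coro)
calls_before_running = list(calls)  # still empty: nothing has executed yet

result = asyncio.run(coro)  # hand the object to an event loop to actually run it
print(got_coroutine_object, calls_before_running, result, calls)
# True [] 42 ['body ran']
```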

The quickest fix is to prefix the call with await. However, as discussed earlier, a function containing an await has to be defined as a coroutine with async. Going asynchronous therefore usually means an all-in commitment, and porting code to be asynchronous piece by piece takes enormous effort.

Using await is one way to get a coroutine executed by the event loop. The behaviour resembles synchronous programming: the coroutine starts executing right away, as part of the current task, and the next line only runs after it returns a result. But there are times we want more control, and that is achieved by turning it into a task.
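The sequential nature of await shows up clearly in timing. In this sketch, slow_io is a hypothetical coroutine where asyncio.sleep stands in for a network call; awaiting two of them back to back takes roughly the sum of their delays, because the second one does not start until the first returns.

```python
import asyncio
import time


async def slow_io(label: str, delay: float) -> str:
    # Simulated I/O: asyncio.sleep stands in for a network call
    await asyncio.sleep(delay)
    return label


async def main() -> tuple[list[str], float]:
    start = time.perf_counter()
    first = await slow_io("first", 0.2)    # runs to completion before...
    second = await slow_io("second", 0.2)  # ...this one even starts
    return [first, second], time.perf_counter() - start


labels, elapsed = asyncio.run(main())
print(labels, f"{elapsed:.1f}s")  # roughly 0.4s: the two delays add up
```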

Tasks: Scheduling and Managing Coroutines


Photo by airfocus on Unsplash

A task can be created by passing the coroutine object to asyncio.create_task, which returns an asyncio.Task object referencing the scheduled coroutine. The returned Task offers some control: for example, we can check whether it is done (task.done()), cancel its execution (task.cancel()), or register a callback for when it finishes (task.add_done_callback()).

task = asyncio.create_task(dex(6))
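Here is a small offline sketch of that control. fake_dex and the POKEDEX dictionary are hypothetical stand-ins for dex(), so the example runs without httpx or network access; the Task methods it uses (done, add_done_callback, cancel, cancelled) are part of the standard asyncio.Task API.

```python
import asyncio

# Hypothetical offline data standing in for PokéAPI responses
POKEDEX = {6: "charizard", 104: "cubone"}


async def fake_dex(id: int) -> str:
    # Offline stand-in for dex(): a short sleep instead of an HTTP request
    await asyncio.sleep(0.05)
    return f"The pokemon with id {id} is {POKEDEX[id]}"


async def main() -> tuple[bool, list[str], bool]:
    finished: list[str] = []

    task = asyncio.create_task(fake_dex(6))
    # Runs once the task completes, receiving the task itself
    task.add_done_callback(lambda t: finished.append(t.result()))
    pending_at_first = not task.done()  # it has not even started yet

    slow = asyncio.create_task(asyncio.sleep(10))
    slow.cancel()  # request cancellation; raised inside the task when it next runs

    await task
    try:
        await slow
    except asyncio.CancelledError:
        pass  # awaiting a cancelled task re-raises CancelledError

    return pending_at_first, finished, slow.cancelled()


pending_at_first, finished, slow_cancelled = asyncio.run(main())
print(pending_at_first, finished, slow_cancelled)
```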


Photo by Halfcut Pokemon on Unsplash

Unlike the await keyword, asyncio.create_task can be called even from a regular (non-async) function, as long as an event loop is running in the current thread. A Task is itself awaitable, so within a coroutine you can still await it to hold off the subsequent statements until it finishes. Typically, though, a task is created when you want to revisit its state later (or forget about it, in which case keep a reference to it somewhere, as the event loop only keeps weak references to tasks).

async def foo() -> str:
    task = asyncio.create_task(dex(104))

    # proceed with other work

    return await task
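The payoff of the foo() pattern is that the task and the other work overlap. Below is an offline sketch: fake_dex and other_work are hypothetical coroutines that sleep instead of doing real I/O, and the timing is only approximate.

```python
import asyncio
import time


async def fake_dex(id: int) -> str:
    # Hypothetical offline stand-in for dex(): sleeps instead of calling PokéAPI
    await asyncio.sleep(0.2)
    return f"pokemon #{id}"


async def other_work() -> str:
    await asyncio.sleep(0.2)  # simulated unrelated I/O
    return "other work done"


async def foo() -> tuple[str, str, float]:
    start = time.perf_counter()
    task = asyncio.create_task(fake_dex(104))  # starts once foo() yields

    other = await other_work()  # proceed with other work meanwhile

    name = await task  # by now the task has (nearly) finished
    return name, other, time.perf_counter() - start


name, other, elapsed = asyncio.run(foo())
print(name, other, f"{elapsed:.1f}s")  # roughly 0.2s, not 0.4s: the waits overlapped
```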


Photo by Branden Skeli on Unsplash

Futures: Representing Asynchronous Results


Photo by Zulfa Nazer on Unsplash

Next, we have Futures; asyncio.Future is the parent class of the Task we just discussed. I find Futures tricky to explain, but while preparing this article, I was told that a future works much like a callback does (still remember those?). Let’s say I have a very badly written function that calculates the nth Fibonacci number, to simulate a CPU-bound operation:

def fibonacci(nth: int) -> int:
    assert nth > 0

    result = ()

    # Iterate exactly nth times, keeping at most the last two numbers
    for i in range(1, nth + 1):
        match i:
            case 1:
                result = (0,)

            case 2:
                result += (1,)

            case _:
                result = result[1:] + (sum(result),)

    assert len(result) > 0

    return result[-1]

Asynchronous programming is useful in scenarios with a lot of I/O operations. It achieves concurrency by letting other work, including CPU-bound work, proceed while waiting for I/O. However, a CPU-bound operation that takes forever would block the event loop and stall everything else. Hence, it is advisable to offload that work to another thread or process (for pure-Python CPU-bound code, a process sidesteps the GIL, while a thread mainly keeps the event loop responsive). We can use loop.run_in_executor to offload the work to another thread; it returns an asyncio.Future that resolves with the result.

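import asyncio
from functools import partial

# The lines below must run inside a coroutine, since
# get_running_loop() requires a running event loop
future = (
    asyncio.get_running_loop()
        .run_in_executor(None, partial(fibonacci, 1500))
)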

The first argument takes an executor, either a ThreadPoolExecutor or a ProcessPoolExecutor; leave it as None to use the loop’s default ThreadPoolExecutor. The second argument takes a callable. We could have rewritten fibonacci with the __call__ dunder, as discussed in the previous article, but functools.partial works fine (run_in_executor also accepts positional arguments after the callable, so run_in_executor(None, fibonacci, 1500) would work too).
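Here is a runnable sketch of the executor route. It uses a compact iterative fibonacci with the same indexing (fibonacci(1) is 0) rather than the version above, and passes an explicit ThreadPoolExecutor instead of None. I stuck to threads because a ProcessPoolExecutor needs the function to be importable from a module so it can be pickled, which is awkward in a standalone snippet.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def fibonacci(nth: int) -> int:
    # Compact stand-in: fibonacci(1) == 0, fibonacci(2) == 1, fibonacci(3) == 1, ...
    a, b = 0, 1
    for _ in range(nth - 1):
        a, b = b, a + b
    return a


async def main() -> tuple[bool, int]:
    loop = asyncio.get_running_loop()

    # An explicit executor instead of None (the loop's default thread pool)
    with ThreadPoolExecutor(max_workers=2) as pool:
        future = loop.run_in_executor(pool, partial(fibonacci, 1500))
        is_future = isinstance(future, asyncio.Future)
        # The event loop is free to run other coroutines until we await here
        result = await future

    return is_future, result


is_future, result = asyncio.run(main())
print(is_future, len(str(result)), "digits")
```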

Alternatively, asyncio.to_thread (Python 3.9+) is a helpful abstraction that saves us from creating the Future ourselves. However, it returns a coroutine, so it has to be awaited, or scheduled as a Task via asyncio.create_task. Also, unlike loop.run_in_executor, it always runs the function in a thread from the loop’s default executor, so we cannot choose a ProcessPoolExecutor.

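coro = asyncio.to_thread(fibonacci, 1000)

# Either schedule it for execution as a Task (keep the reference around)...
task = asyncio.create_task(coro)

# ...or await it directly inside a coroutine, but not both:
# a coroutine object can only be run once
await coro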
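To see that the event loop stays free while to_thread does its work, here is a sketch with a heartbeat coroutine ticking alongside. So that the effect does not depend on CPU speed, blocking_work uses time.sleep as a hypothetical stand-in for a long blocking call such as fibonacci(1000); requires Python 3.9+ for asyncio.to_thread.

```python
import asyncio
import time


def blocking_work(seconds: float) -> str:
    # Hypothetical stand-in for a long, blocking call such as fibonacci(1000)
    time.sleep(seconds)
    return "blocking work done"


async def heartbeat(ticks: list[float], stop: asyncio.Event) -> None:
    # Keeps ticking as long as the event loop is not blocked
    while not stop.is_set():
        ticks.append(time.perf_counter())
        await asyncio.sleep(0.05)


async def main() -> tuple[str, int]:
    ticks: list[float] = []
    stop = asyncio.Event()
    beat = asyncio.create_task(heartbeat(ticks, stop))

    # to_thread returns a coroutine; awaiting it runs blocking_work in the
    # loop's default thread pool while heartbeat keeps running
    result = await asyncio.to_thread(blocking_work, 0.3)

    stop.set()
    await beat
    return result, len(ticks)


result, tick_count = asyncio.run(main())
print(result, tick_count)  # several ticks: the loop was never blocked
```

Calling blocking_work(0.3) directly inside main() instead would freeze the loop, and the heartbeat would only manage a single tick before it.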

But we haven’t shown how callbacks are possible with Futures. Let’s write a simpler example. Suppose we have a coroutine that runs concurrently in the background, streaming a file’s content over the internet, and we want to find the line number of a specific line in it.

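async def scan_huge_file(future, need_this_line):
    for line_num, line in enumerate(some_huge_file()):  # hypothetical line source
        if line == need_this_line:  # compare by value, not identity
            future.set_result(line_num)
            return  # a future's result can only be set once

    future.set_result(None)  # not found: resolve anyway, so awaiters don't hang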

Then, when we schedule the tasks (again, we will dive deeper into this in the next article), we would do something like this:

loop = asyncio.get_running_loop()
future = loop.create_future()

# Keep a reference, as the event loop only holds weak references to tasks
scanner = asyncio.create_task(scan_huge_file(future, 'very critical line'))

# do other high-priority tasks
...

# when we are finally done with other tasks
number = await future

The callback happens in scan_huge_file, through future.set_result. The function scheduling the coroutine only checks whether a line number was found after it is done with its other tasks. Of course, we could have made scan_huge_file return the line number and awaited the task returned by asyncio.create_task instead, but this is just to show how a callback can be implemented.
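To make the callback idea even more literal, a Future also accepts add_done_callback, just like a Task. In this sketch, the LINES list is a hypothetical in-memory stand-in for some_huge_file(), and await asyncio.sleep(0) pretends each line takes a moment to arrive; the registered callback fires as soon as set_result is called, independently of when main() gets around to awaiting the future.

```python
import asyncio
from typing import Optional

# Hypothetical in-memory stand-in for the streamed file
LINES = ["header", "some data", "very critical line", "footer"]


async def scan_huge_file(future: asyncio.Future, need_this_line: str) -> None:
    for line_num, line in enumerate(LINES):
        await asyncio.sleep(0)  # pretend each line arrives over the network
        if line == need_this_line:
            future.set_result(line_num)
            return
    future.set_result(None)  # not found: resolve anyway so awaiters don't hang


async def main() -> tuple[Optional[int], list[Optional[int]]]:
    loop = asyncio.get_running_loop()
    future = loop.create_future()
    notified: list[Optional[int]] = []

    # A literal callback: runs as soon as the future receives its result
    future.add_done_callback(lambda f: notified.append(f.result()))

    # Keep a reference to the task so it is not garbage collected
    scanner = asyncio.create_task(scan_huge_file(future, "very critical line"))

    # ... other high-priority work would happen here ...

    number = await future
    await scanner
    return number, notified


number, notified = asyncio.run(main())
print(number, notified)  # 2 [2]
```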

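Now that we know the three Awaitables, it is time to revisit what the term means. As you may have guessed, it describes objects we can use await on to get a result: coroutines, Tasks, and Futures are the three kinds we covered, and more precisely, any object implementing the __await__ method qualifies.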
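Here is a small sketch checking that claim with inspect.isawaitable, which tests for the Awaitable protocol. Ready is a hypothetical class of mine: defining __await__ is enough to make its instances awaitable, which is why the precise definition is about the protocol rather than a fixed list of types.

```python
import asyncio
import inspect


class Ready:
    """Hypothetical minimal awaitable: implementing __await__ is all it takes."""

    def __init__(self, value):
        self.value = value

    def __await__(self):
        # Delegate to a coroutine's iterator; awaiting it resolves to self.value
        return asyncio.sleep(0, result=self.value).__await__()


async def main() -> tuple[list[bool], list[str]]:
    loop = asyncio.get_running_loop()

    future = loop.create_future()
    future.set_result("future")

    awaitables = [
        asyncio.sleep(0, result="coroutine"),                  # coroutine
        asyncio.create_task(asyncio.sleep(0, result="task")),  # Task
        future,                                                # Future
        Ready("custom"),                                       # custom class
    ]

    checks = [inspect.isawaitable(a) for a in awaitables]
    results = []
    for awaitable in awaitables:
        results.append(await awaitable)
    return checks, results


checks, results = asyncio.run(main())
print(checks, results, inspect.isawaitable(42))
# [True, True, True, True] ['coroutine', 'task', 'future', 'custom'] False
```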

Python vs. JavaScript: Comparison


Photo by Oleksandr Chumak on Unsplash

Asynchronous programming in JavaScript models the problem very differently, resulting in a completely different design. We stopped at Promise earlier, and we discussed it briefly because it is the root of the implementation.

AsyncIO does a good job of abstracting the event loop away from users. For instance, we only started calling event loop methods when we discussed Futures. The developer experience has improved so much that awaiting a coroutine directly, or a Task (which wraps a coroutine), should be sufficient for most cases.

On the other hand, the event loop in JavaScript, at least in the browser, is almost completely hidden from us: there is no API to reach it directly.

A friend came to me a few days ago, complaining that he was confused about asynchronous programming in JavaScript: why were certain asynchronous functions called with an await, but others without? That prompted me to do a quick check on the topic, and I was quite surprised by what I found. It turns out that calling an async function without await still starts it right away: it runs until its first await, returns a Promise, and the rest carries on in the background.
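Python behaves differently here, which is worth knowing when switching between the two. In this sketch (background_job is a hypothetical coroutine), calling it without await runs none of its body, and even asyncio.create_task, with the default task factory, only starts the job once the current coroutine yields to the event loop. A JavaScript async function, by contrast, would have logged "job started" during the call itself.

```python
import asyncio

log: list[str] = []


async def background_job() -> None:
    log.append("job started")
    await asyncio.sleep(0)
    log.append("job finished")


async def main() -> None:
    # 1. A bare call only creates a coroutine object; none of the body runs
    coro = background_job()
    log.append("after plain call")
    coro.close()  # discard it explicitly, avoiding the "never awaited" warning

    # 2. create_task schedules the job, but it starts only once main() yields
    task = asyncio.create_task(background_job())
    log.append("after create_task")
    await task


asyncio.run(main())
print(log)
# ['after plain call', 'after create_task', 'job started', 'job finished']
```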

That discussion was the main inspiration for this article, minus the Python part. I started by explaining how event handlers led to the creation of callbacks, then Promise, and ultimately the async-await family. Similar to Python, if a function contains an await, it has to be defined as async.

While both share similar syntax on the surface, the underlying implementations are completely different. Asynchronous programming in Python was designed around coroutines, while in JavaScript it was built on Promise. I hope weaving in examples from both languages helps clarify doubts on the matter.

Looking Ahead: More AsyncIO


Photo by Andrew Ly on Unsplash

Originally, I planned to write about scheduling and managing tasks, alongside notes on error handling, through a fictional project. However, after planning out the content, Awaitables alone turned out to be long enough for the week. Hence, we will continue building the fictional project next week and discuss those topics then. Feel free to subscribe to the email newsletter so you don’t miss it, and leave me a message if I have missed anything today.

Thanks for reading, and I shall write again next week.

For transparency, I’d like to acknowledge that while the code and technical explanations presented in this article are entirely my own, I received editorial assistance from a large language model. This collaboration has helped refine the clarity and structure of my writing. If you’re interested in discussing project collaborations or job opportunities, please feel free to reach out to me here on Medium or via my LinkedIn profile. The article was originally published to Medium.
