Asyncio Gather: Bringing All the Plates to the Table
When you look at asynchronous Python code, especially involving asyncio.gather, it can sometimes feel a bit counter-intuitive. You see lines of code executing one after another, and you might think, “Isn’t this just synchronous execution with extra keywords?”
I certainly do.
I was looking at this bit of code earlier today, and for a minute, I thought I did it wrong when I wrote it.
Let’s see if we can demystify this common pattern with a relatable example and reveal the true power of concurrency. The code examples come from my Cloud AutoPkg Runner project if you want to see them in context.
The Scenario: Gathering File Metadata
Imagine you have a file on your computer, and you need to quickly grab three pieces of information from it:
- ETag
- Size
- Last modified date
Each lookup makes a quick trip to the disk and takes a tiny (but measurable) amount of time. In a synchronous world, you would:
- Ask for the ETag, wait.
- Ask for the size, wait.
- Ask for the last modified date, wait.
Three steps, each blocking the next.
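For contrast, a plain synchronous sketch of those three lookups might look something like the following. The file name is made up, but the extended attribute names are the same ones the async code reads later, and each line blocks until the previous one finishes.

from pathlib import Path
import xattr  # third-party package for reading extended attributes

file_path = Path("example.pkg")  # hypothetical file
etag = xattr.getxattr(file_path, "com.github.autopkg.etag").decode()
file_size = file_path.stat().st_size
last_modified = xattr.getxattr(file_path, "com.github.autopkg.last-modified").decode()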
With asyncio, you can kick them all off at once.
etag_task = get_file_metadata(file_path, "com.github.autopkg.etag")
file_size_task = get_file_size(file_path)
last_modified_task = get_file_metadata(file_path, "com.github.autopkg.last-modified")
etag, file_size, last_modified = await asyncio.gather(
    etag_task, file_size_task, last_modified_task
)
The Functions Behind the Scenes
These helper functions are declared as async, and they offload blocking disk operations into background threads with asyncio.to_thread.
import asyncio
from pathlib import Path
from typing import cast
import xattr

async def get_file_metadata(file_path: Path, attr: str) -> str:
    return await asyncio.to_thread(
        lambda: cast("bytes", xattr.getxattr(file_path, attr)).decode()
    )

async def get_file_size(file_path: Path) -> int:
    return await asyncio.to_thread(lambda: file_path.stat().st_size)
So when you call them, you’re not doing the work right away: each call just creates a coroutine object, and asyncio.gather wraps those coroutines in tasks that the event loop schedules.
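To make that concrete, here is a minimal sketch of how the pieces above could be driven end to end. The file name is hypothetical, and the extended attributes would need to exist on it for the lookups to succeed.

import asyncio
from pathlib import Path


async def main() -> None:
    file_path = Path("example.pkg")  # hypothetical file with the xattrs set
    etag, file_size, last_modified = await asyncio.gather(
        get_file_metadata(file_path, "com.github.autopkg.etag"),
        get_file_size(file_path),
        get_file_metadata(file_path, "com.github.autopkg.last-modified"),
    )
    print(etag, file_size, last_modified)


asyncio.run(main())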
If you actually want them to run sequentially, you would do something like this instead.
etag = await get_file_metadata(file_path, "com.github.autopkg.etag")
file_size = await get_file_size(file_path)
last_modified = await get_file_metadata(file_path, "com.github.autopkg.last-modified")
Ordering Food
Let’s look at this a different way: ordering food at a restaurant.
You’re dining with some friends and the restaurant staff comes by and takes each of your orders. You are giving the restaurant tasks to accomplish: preparing your meals.
The kitchen cooks your table’s meals simultaneously, balancing them against other tables’ orders as well. Here’s the key part of the analogy: the staff doesn’t serve each meal one by one as it’s finished. Instead, they wait until everyone’s meal is ready, then bring them all out together.
Since get_file_metadata is an async function, calling etag_task = get_file_metadata(...) doesn’t fetch the ETag immediately; it creates a coroutine, an order waiting to be handed to the kitchen. You then do the same for the file size and the last modified date. At this point, you’ve placed three orders, but nothing has been served yet.
Now comes the key step.
etag, file_size, last_modified = await asyncio.gather(
    etag_task, file_size_task, last_modified_task
)
This is like saying “Here are all the orders for my table. When they are all ready, bring them out together.”
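A toy timing sketch (not from the project, just three fake one-second “orders”) makes the payoff visible: everything is ready after roughly one second instead of three.

import asyncio
import time


async def prepare(dish: str) -> str:
    await asyncio.sleep(1)  # stand-in for a slow I/O-bound step
    return f"{dish} ready"


async def main() -> None:
    start = time.perf_counter()
    meals = await asyncio.gather(prepare("pasta"), prepare("salad"), prepare("soup"))
    print(meals, f"served after {time.perf_counter() - start:.1f}s")


asyncio.run(main())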
Key Takeaway
This pattern of creating awaitables and then awaiting them collectively with asyncio.gather is fundamental to asynchronous programming. It lets your Python program kick off multiple I/O-bound operations (reading from disk, making network requests, querying a database, and so on) and then pause its own execution, yielding control to the asyncio event loop until the results are ready.
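Two details of asyncio.gather are worth keeping in mind, though they aren’t shown in the project snippet above: results come back in the same order as the arguments, regardless of which finishes first, and return_exceptions=True lets a failure come back as a value in the results instead of an exception that interrupts the await. A small illustrative sketch:

import asyncio


async def lookup(value: int) -> int:
    await asyncio.sleep(0)  # yield to the event loop
    if value < 0:
        raise ValueError("negative value")
    return value * 2


async def main() -> None:
    results = await asyncio.gather(
        lookup(1), lookup(-1), lookup(3), return_exceptions=True
    )
    print(results)  # [2, ValueError('negative value'), 6], in argument order


asyncio.run(main())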
So while the code may look sequential, the combination of await and gather is really orchestrating a well-timed dinner service, ensuring your program’s “table” gets everything at once.