Introduction
Python is a popular choice for building applications thanks to its simplicity, readability, and vast ecosystem. Serving Python as a web server in a production environment, however, is a little tricky because of the Global Interpreter Lock (GIL), which renders most CPU-bound multithreading pointless. In this article we look at a widely used deployment option and at when and how it falls short.
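To see this concretely, here is a minimal sketch (exact timings will vary by machine): a CPU-bound loop takes about as long on two threads as it does run twice sequentially, because only one thread can hold the GIL at a time.

import time
from concurrent.futures import ThreadPoolExecutor

def count(n=10_000_000):
    # pure CPU-bound work: the running thread holds the GIL the whole time
    while n > 0:
        n -= 1

start = time.time()
count()
count()
print(f"sequential: {time.time() - start:.2f}s")

start = time.time()
with ThreadPoolExecutor(max_workers=2) as pool:
    pool.submit(count)
    pool.submit(count)  # the context manager waits for both to finish
print(f"two threads: {time.time() - start:.2f}s")  # roughly the same, or worse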
Python on Web
A Python interpreter runs an instance of our Python application. To serve web requests, we need a bridge between the interpreter and the web server. There are a couple of popular standards for this bridge: WSGI and the newer ASGI. ASGI is better suited to Python's recent async capabilities, whereas WSGI fits the traditional synchronous model.
In this article, we are going to focus on how CPU-bound tasks can lead to a terrible user experience with WSGI.
The WSGI standard extends Python to the web, making it possible to build web applications in Python.
One popular implementation of this standard is uWSGI.
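At its core, the WSGI contract (PEP 3333) is just a callable that receives the request environment and a start_response function and returns an iterable of bytes:

def application(environ, start_response):
    # environ is a dict of CGI-style request variables (PATH_INFO, QUERY_STRING, ...)
    start_response('200 OK', [('Content-Type', 'text/plain')])
    # the response body is an iterable of bytes
    return [b'Hello, WSGI']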
How uWSGI works
When we start a uWSGI instance for our Python application, it creates worker processes, each running a Python interpreter with our application code imported. If we configure x worker processes, uWSGI forks until x interpreters are running.
We talk about uWSGI workers as processes, but uWSGI workers can be thread-based as well. In practice, threaded workers rarely work without issues for Python web applications, mainly because many libraries in the ecosystem are NOT threadsafe.
The uWSGI master process accepts incoming HTTP requests and forwards them to one of the available worker processes. Each worker process has its own GIL and runs in its own memory space, ensuring memory isolation and preventing issues like race conditions and data corruption that can occur when state is shared between threads.
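Conceptually, the prefork model looks something like this (a simplified, Unix-only sketch for illustration, not uWSGI's actual implementation):

import os
import socket

# The master binds a listening socket, then forks workers that each
# accept() on it independently; the kernel distributes connections.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(('0.0.0.0', 8000))
server.listen(128)

for _ in range(5):                      # like --processes 5
    if os.fork() == 0:                  # child: own memory space, own GIL
        while True:
            conn, _ = server.accept()
            conn.sendall(b'HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok')
            conn.close()

os.wait()                               # master: wait on the workers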
Experiment with uWSGI
With our concepts clear, let's run a small experiment. We will see how incoming requests go unserved while all worker processes are busy.
Setup
We will create a simple WSGI application to handle incoming requests, served by uWSGI with 5 worker processes. It will take 10s to respond to each request.
We will also create a client app which will make 20 calls to the server in parallel, all at once.
- Write a simple Python application that returns "true". Call it app_main.py
import time

def long_running_task():
    time.sleep(10)
    return "true"

def application(env, start_response):
    query_string = env.get('QUERY_STRING', '')
    query_params = {}
    for param in query_string.split('&'):
        if '=' in param:  # skip empty or malformed pairs
            key, value = param.split('=', 1)
            query_params[key] = value
    # Get the 'param' value from the query parameters
    param_value = query_params.get('param', 'No param value provided')
    print(f"Accept incoming request: {param_value}")
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [long_running_task().encode()]
- Create a Dockerfile to set up uWSGI in a platform- and OS-independent way.
(Assuming you have Docker installed; otherwise you can install it from the Docker website: https://www.docker.com/get-started/)
# Use an official Python runtime as a parent image
FROM python:3.9
# Set the working directory in the container
WORKDIR /app
# Install build tools (needed to compile uWSGI), vim, and uWSGI
RUN apt-get update && apt-get install -y build-essential vim
RUN pip install uwsgi
# Copy app_main.py into the container at /app
COPY app_main.py /app
# Make port 8000 available to the world outside this container
EXPOSE 8000
# Run uWSGI
CMD ["uwsgi", "--http", "0.0.0.0:8000", "--wsgi-file", "app_main.py", "--master", "--processes", "5"]
- Create a client that will make calls in parallel to the web server. Call it client.py
import requests
from concurrent.futures import ThreadPoolExecutor

def send_request(query_param):
    print(f"Sending request: {query_param}")
    url = f'http://localhost:8000?param={query_param}'
    response = requests.get(url)
    print(f"Response: {response.text}")

def send_requests():
    # submit all 20 requests at once; the executor waits for them on exit
    with ThreadPoolExecutor(max_workers=20) as executor:
        futures = [executor.submit(send_request, f'Request_{i}') for i in range(20)]

if __name__ == "__main__":
    send_requests()
Run the tests
- In one terminal, run the following to start our uWSGI server:
docker build -t blog-uwsgi -f Dockerfile .
docker run -p 8000:8000 blog-uwsgi
- In another terminal, run the following to set up the client app:
python3 -m venv venv
source venv/bin/activate
pip install requests
python client.py
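(If you'd rather skip Docker and have uWSGI installed locally, e.g. via pip install uwsgi, the same server can be started directly with a command mirroring the Dockerfile's CMD:)

uwsgi --http 0.0.0.0:8000 --wsgi-file app_main.py --master --processes 5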
Results
This is my output:
*** uWSGI is running in multiple interpreter mode ***
spawned uWSGI master process (pid: 1)
spawned uWSGI worker 1 (pid: 6, cores: 1)
spawned uWSGI worker 2 (pid: 7, cores: 1)
spawned uWSGI worker 3 (pid: 8, cores: 1)
spawned uWSGI worker 4 (pid: 9, cores: 1)
spawned uWSGI worker 5 (pid: 10, cores: 1)
spawned uWSGI http 1 (pid: 11)
Accept incoming request: Request_6
Accept incoming request: Request_4
Accept incoming request: Request_0
Accept incoming request: Request_1
Accept incoming request: Request_7
[pid: 8|app: 0|req: 1/1] 192.168.65.1 () {32 vars in 405 bytes} [Thu Apr 11 05:04:21 2024] GET /?param=Request_0 => generated 4 bytes in 10005 msecs (HTTP/1.1 200) 1 headers in 45 bytes (1 switches on core 0)
Accept incoming request: Request_9
[pid: 6|app: 0|req: 1/2] 192.168.65.1 () {32 vars in 405 bytes} [Thu Apr 11 05:04:21 2024] GET /?param=Request_4 => generated 4 bytes in 10006 msecs (HTTP/1.1 200) 1 headers in 45 bytes (1 switches on core 0)
Accept incoming request: Request_10
[pid: 7|app: 0|req: 1/3] 192.168.65.1 () {32 vars in 405 bytes} [Thu Apr 11 05:04:21 2024] GET /?param=Request_7 => generated 4 bytes in 10005 msecs (HTTP/1.1 200) 1 headers in 45 bytes (1 switches on core 0)
Accept incoming request: Request_8
[pid: 10|app: 0|req: 1/4] 192.168.65.1 () {32 vars in 405 bytes} [Thu Apr 11 05:04:21 2024] GET /?param=Request_6 => generated 4 bytes in 10008 msecs (HTTP/1.1 200) 1 headers in 45 bytes (1 switches on core 0)
Accept incoming request: Request_2
[pid: 9|app: 0|req: 1/5] 192.168.65.1 () {32 vars in 405 bytes} [Thu Apr 11 05:04:21 2024] GET /?param=Request_1 => generated 4 bytes in 10016 msecs (HTTP/1.1 200) 1 headers in 45 bytes (1 switches on core 0)
Accept incoming request: Request_11
(Observe the Accept incoming request: lines in the above uWSGI server output).
Notice how, even though the client initiated 20 requests, the uWSGI server could only handle 5 at once. All 5 worker processes
were tied up, and until those requests were served, no new incoming request was handled. As soon as a worker process became free, it picked up another incoming request.
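A little arithmetic confirms the picture: with 5 workers and a 10 s handler, the 20 requests drain in 4 waves of 5, so the full run takes roughly 4 × 10 s = 40 s, and the unluckiest requests sit in the queue for about 30 s before a worker even accepts them.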
Conclusion
Python's "GIL" limitation is mitigated to some extent by "uWSGI's" ability to spawn multiple worker processes, however it is not enough when dealing with computationally intensive operations.
One frequently used solution is offloading CPU-bound tasks to separate processes or services that run independently in the background (for example, Celery). This frees the uWSGI workers to handle incoming requests more efficiently.
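As a rough sketch of what that offloading can look like with Celery (illustrative only; it assumes a Redis broker running at redis://localhost:6379/0 and pip install celery):

# tasks.py -- a minimal Celery sketch
import time
from celery import Celery

celery_app = Celery('tasks', broker='redis://localhost:6379/0',
                    backend='redis://localhost:6379/0')

@celery_app.task
def long_running_task():
    time.sleep(10)          # stand-in for the CPU-bound work
    return "true"

# In the WSGI handler, enqueue instead of blocking the worker:
#   long_running_task.delay()   # returns an AsyncResult immediately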
The evolution of web frameworks and standards continues to address the limitations observed in traditional WSGI-based applications, with ASGI being the latest trend.
ASGI offers a solution to the synchronous limitations of WSGI, enabling Python applications to handle a large number of concurrent connections efficiently. This is particularly beneficial for I/O-bound, high-concurrency applications, especially when combined with Python's asyncio, where the traditional synchronous processing model of WSGI shows its limitations.