Multithreading and Multiprocessing Guide in Python

Have you ever wondered why your Python application lags when it handles large volumes of data or many user requests at the same time? Or how to run multiple operations in Python without them interfering with each other? The answer is concurrent execution, which lets a program make progress on several tasks at once and scale with increasing demand. In modern computing, concurrency is a key tool for improving an application's efficiency and performance. In Python, we can achieve it through two techniques: multithreading and multiprocessing. Multithreading, provided by the threading module, is best suited to I/O-bound tasks, whereas multiprocessing, provided by the multiprocessing module, achieves true parallelism by running tasks in separate processes, which suits CPU-bound work.
Therefore, in this article, we will learn the fundamentals of concurrency and how to achieve it in Python with multithreading and multiprocessing. So, let's get started.

Understanding Concurrency

1. Definition Of Concurrency And Its Significance In Programming

Concurrency means that an application makes progress on more than one task at seemingly the same time. A single CPU core can execute only one thread at a time, so it cannot truly run several tasks simultaneously; instead, it runs one task for a short time slice, then switches to another, and so on. The operating system switches between these units of execution (threads) so quickly that the user does not notice, which creates the appearance of simultaneous execution.
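The time-slicing described above can be imitated in a few lines. The sketch below is only an analogy, assuming generators as stand-ins for tasks: each `yield` marks a point where a task gives up control, and a hypothetical `run_round_robin` helper plays the role of the scheduler by running each task for one step before switching to the next. A real operating system preempts threads on its own rather than waiting for them to yield.

```python
from collections import deque

def task(name, steps):
    # Each yield marks a point where the task gives up control
    for i in range(steps):
        yield f"{name} step {i}"

def run_round_robin(tasks):
    # Hypothetical scheduler: run each task for one step, then switch to the next
    ready = deque(tasks)
    log = []
    while ready:
        current = ready.popleft()
        try:
            log.append(next(current))
            ready.append(current)  # not finished: put it back at the end of the line
        except StopIteration:
            pass  # task finished: drop it
    return log

log = run_round_robin([task("A", 3), task("B", 2)])
for entry in log:
    print(entry)
```

Both tasks make progress in alternating steps even though only one step ever runs at a time, which is what concurrency without parallelism looks like.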

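2. Difference Between Concurrency And Parallelism

Concurrency is about structure: several tasks are in progress during the same period and their execution is interleaved, which is possible even on a single CPU core. Parallelism is about execution: several tasks literally run at the same instant on different CPU cores. A program can be concurrent without being parallel (for example, threads taking turns on one core), but parallel execution requires multiple cores or processors.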

3. Importance Of Concurrency In Software Performance And Responsiveness

In modern computing, concurrency is a fundamental concept to understand when developing an application, because it improves both efficiency and responsiveness. Its value lies in handling multiple tasks during the same period: instead of sitting idle while waiting for a server response or for a web request to be sent, the application can keep working on other tasks. Overlapping these waits can significantly improve the responsiveness and throughput of the application.
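As a small illustration of overlapping waits, the following sketch uses `concurrent.futures.ThreadPoolExecutor` from the standard library. The `fake_request` function is a hypothetical stand-in for a server call, with `time.sleep(0.2)` playing the role of network latency. With five worker threads the five waits overlap, so the whole batch should finish in roughly the time of one request rather than five.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_request(request_id):
    # Hypothetical stand-in for a network call: sleep blocks without using the CPU
    time.sleep(0.2)
    return f"response {request_id}"

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as executor:
    # map runs fake_request concurrently and returns results in input order
    responses = list(executor.map(fake_request, range(5)))
elapsed = time.perf_counter() - start

print(responses)
print(f"5 requests took {elapsed:.2f} s")  # should be near 0.2 s, not the 1.0 s of a sequential loop
```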

Note: Remember that task, thread, and process are related but distinct terms in the context of concurrency. A task is a unit of work, a thread is the smallest unit of execution within a process, and a process is an instance of a running program with its own memory and resources.

Multithreading in Python

1. Introduction To Multithreading

Multithreading is a technique for achieving concurrency within a single process. It lets different parts of the program run concurrently. Each of these parts, called a thread, is a single unit of execution within the process. Threads share the process's memory and resources but are scheduled independently of each other. When multiple threads work together within a single process, we call it multithreading.
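A minimal sketch of threads sharing one process, assuming a hypothetical `worker` function: every thread writes into the same dictionary, and each entry records `os.getpid()` (the process ID) and `threading.get_ident()` (the thread identifier). All entries should show the same process ID, because the threads live inside one process. Thread identifiers are per thread, although Python may reuse one after an earlier thread has finished.

```python
import os
import threading

shared_results = {}             # one dict, visible to every thread in this process
results_lock = threading.Lock() # guards writes to the shared dict

def worker(name):
    # Each thread records the process it runs in and its own thread identifier
    with results_lock:
        shared_results[name] = (os.getpid(), threading.get_ident())

threads = [threading.Thread(target=worker, args=(f"worker-{i}",)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

for name, (pid, ident) in sorted(shared_results.items()):
    print(f"{name}: process {pid}, thread {ident}")
```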

Note: A thread is a sequence of instructions in a program that can be executed independently of the rest of the program.

2. Global Interpreter Lock (GIL) And Its Impact On Multithreading

Now, before looking at how threads execute, let's understand the Global Interpreter Lock (GIL). The GIL is a mutex (a global lock) in the CPython interpreter that allows only one thread to execute Python bytecode at a time. It exists to protect the interpreter's internal state, such as the reference counts used for memory management, from being corrupted by simultaneous access. As a consequence, even if your processor has multiple cores, threads in a single Python process do not run Python code in parallel. A thread holds the GIL while it runs and releases it when it performs a blocking I/O operation (and periodically, so other threads get a turn), allowing another thread to acquire it and continue. Hence, the GIL limits multithreading: threads still help with I/O-bound tasks, because they can overlap their waiting, but they do not speed up CPU-bound pure-Python code.
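To observe the GIL's effect on CPU-bound work, you could time the same pure-Python loop run twice sequentially and then in two threads, as in this rough sketch (the `count_down` function and the value of `N` are arbitrary choices). On a standard CPython build, expect the two-thread time to be about the same as the sequential time, or slightly worse, because the threads take turns holding the GIL; the exact numbers depend on your machine and Python version, and builds without a GIL behave differently. `sys.getswitchinterval()` reports how often, in seconds, the interpreter asks the running thread to let another thread take the GIL.

```python
import sys
import threading
import time

def count_down(n):
    # Pure-Python CPU-bound loop: the thread holds the GIL while it runs
    while n > 0:
        n -= 1

N = 2_000_000

# Run the loop twice, one after the other
start = time.perf_counter()
count_down(N)
count_down(N)
sequential = time.perf_counter() - start

# Run the same two loops in two threads
start = time.perf_counter()
threads = [threading.Thread(target=count_down, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

print(f"GIL switch interval: {sys.getswitchinterval()} s")
print(f"sequential: {sequential:.2f} s, two threads: {threaded:.2f} s")
```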

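Note: Remember, the GIL is a feature of CPython, the reference Python interpreter, not of the Python language itself. Some other implementations, such as Jython and IronPython, do not have a GIL.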

3. Illustration Of Creating And Managing Threads Using The 'threading' Module

In Python, we can perform multithreading using the built-in threading module. It is part of the standard library, so it can be imported directly without any installation. Threads are represented as objects (instances of threading.Thread). Like other objects, they can be stored in data structures such as lists and passed as arguments to functions, and each one runs a target function with the arguments you give it. Now, let's take an example to understand the threading module.

import threading  # thread-based concurrency (standard library)
import time       # used to simulate slow work and to measure elapsed time

def calc_square(numbers):  # task for the first thread
    print("Calculate square of numbers")
    for n in numbers:
        time.sleep(0.2)  # simulate a slow, I/O-like operation
        print("Square:", n * n)


def calc_cube(numbers):  # task for the second thread
    print("Calculate cube of numbers")
    for n in numbers:
        time.sleep(0.2)  # simulate a slow, I/O-like operation
        print("Cube:", n * n * n)

arr = [2, 4, 7]  # list of numbers
start = time.time()

# Create the threads
t1 = threading.Thread(target=calc_square, args=(arr,))  # thread 1
t2 = threading.Thread(target=calc_cube, args=(arr,))    # thread 2

# Start the threads
t1.start()
t2.start()

# Wait for both threads to finish
t1.join()
t2.join()

print("Both threads completed in:", time.time() - start, "seconds")

Tip: A thread does not start executing until you call its start() method. To make the main program wait until a thread has finished its task, call the thread's join() method; join() blocks the caller until that thread terminates.

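Output: the square and cube lines are typically printed interleaved, and the reported time is close to 0.6 seconds (three 0.2-second sleeps per thread) rather than the roughly 1.2 seconds a sequential run would take.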

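Note: Keep in mind that Python's GIL allows only one thread to execute Python bytecode at a time, even in a multithreaded program. This means threads do not run truly in parallel when executing CPU-bound tasks; the speed-up in the example above comes from overlapping the sleep() calls, during which the GIL is released.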
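The lifecycle that start() and join() control can be observed with `Thread.is_alive()`. In this sketch, a `threading.Event` (the name `release` is just illustrative) keeps the thread blocked until the main thread signals it, so the `is_alive()` result at each step is predictable.

```python
import threading

release = threading.Event()

def wait_for_signal():
    # Block until the main thread sets the event
    release.wait()

t = threading.Thread(target=wait_for_signal)
print("before start():", t.is_alive())  # False: created but not started
t.start()
print("after start():", t.is_alive())   # True: running, blocked on the event
release.set()                           # let the thread finish
t.join()                                # block the main thread until it does
print("after join():", t.is_alive())    # False: the thread has finished
```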

Synchronization and Communication Between Threads

1. Synchronization And Communication Techniques For Coordinating Threads

When multiple threads execute concurrently, race conditions and deadlocks can occur. A race condition happens when threads access and modify shared data at the same time, so that updates overwrite each other or leave the data inconsistent; a deadlock happens when threads wait forever for locks held by each other. Poorly coordinated threads can also suffer from starvation (a thread never gets access to a resource) and livelock (threads keep reacting to each other without making progress). Therefore, managing shared resources and coordinating threads within a process is essential, and the Python threading module provides several synchronization tools for this purpose.
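The threading module itself focuses on synchronization; for communication between threads, the standard library's `queue.Queue` is a common choice because it does its own locking internally. The producer/consumer sketch below assumes a single consumer and uses `None` as a hypothetical end-of-work marker (a sentinel); the function names are illustrative.

```python
import queue
import threading

tasks = queue.Queue()  # thread-safe FIFO queue
SENTINEL = None        # assumption: None is never a real work item
results = []

def producer(items):
    for item in items:
        tasks.put(item)
    tasks.put(SENTINEL)  # tell the consumer there is no more work

def consumer():
    while True:
        item = tasks.get()  # blocks until an item is available
        if item is SENTINEL:
            break
        results.append(item * item)

producer_thread = threading.Thread(target=producer, args=([2, 4, 7],))
consumer_thread = threading.Thread(target=consumer)
producer_thread.start()
consumer_thread.start()
producer_thread.join()
consumer_thread.join()

print(results)  # [4, 16, 49]
```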

2. Thread Synchronization Primitives Like Locks, Semaphores, And Barriers

To address these concerns, the threading module provides synchronization primitives that ensure threads execute in a controlled manner. They allow multiple threads to work together without interfering with each other, especially when they need to share data or resources. The most common primitives are:

1. Lock

It is the most common mechanism for thread synchronization in Python. A lock ensures that only one thread at a time can enter a critical section of code, which prevents data corruption and inconsistencies.

import threading  # thread synchronization primitives
import time       # used to simulate work with sleep()

# Create a lock object shared by all threads
lock = threading.Lock()

# Function that simulates a task using a lock
def task_with_lock(thread_id):
    lock.acquire()  # Acquire the lock before entering the critical section
    try:
        print(f"Thread {thread_id} has entered the critical section")
        time.sleep(1)  # Simulate work
        # Print before releasing, so no other thread can enter before this line appears
        print(f"Thread {thread_id} is leaving the critical section")
    finally:
        lock.release()  # Always release the lock, even if an exception occurs

# Create multiple threads to demonstrate lock usage
threads = [threading.Thread(target=task_with_lock, args=(i,)) for i in range(3)]
# Start the threads
for thread in threads:
    thread.start()
# Wait for the threads to complete
for thread in threads:
    thread.join()

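Output: the threads enter and leave the critical section strictly one at a time, so each "entered" line is followed by the same thread's "leaving" line, and the whole run takes about 3 seconds.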
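The lock example above only simulates work with sleep(), so the lock does not protect any shared data. A more typical use is guarding a shared variable, as in this sketch with a hypothetical `counter`. The statement `counter += 1` is a read-modify-write sequence that another thread can interrupt between steps, so without the lock some updates can be lost (whether you actually observe lost updates depends on the Python version and timing). Here the lock is used through a `with` statement, which acquires it on entry and releases it on exit, like the acquire()/try/finally/release() pattern.

```python
import threading

counter = 0
counter_lock = threading.Lock()

def increment(times):
    global counter
    for _ in range(times):
        # The lock turns the read-modify-write into one indivisible step
        with counter_lock:
            counter += 1

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000
```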

2. Semaphores

A semaphore is a more general synchronization mechanism. It maintains an internal counter that limits how many threads can access a particular resource at the same time. This is useful when a resource can safely serve a fixed number of users; for example, we may want to allow only four threads to access a database at a time. We simply specify how many threads may proceed concurrently, and any additional threads wait until a slot is released.

import threading  # thread synchronization primitives
import time       # used to simulate resource usage with sleep()

# Create a semaphore that lets at most 3 threads hold it at once
semaphore = threading.Semaphore(3)

# Function to access a limited resource
def access_resource(thread_id):
    with semaphore:  # Acquire a slot; blocks while all 3 slots are taken
        # Accessing the shared resource
        print(f"Thread {thread_id} is accessing the resource")
        time.sleep(1)  # Simulate resource usage
        print(f"Thread {thread_id} is releasing the resource")
    # Leaving the with-block releases the slot for a waiting thread

# Create more threads (5) than slots (3) so that some of them have to wait
threads = [threading.Thread(target=access_resource, args=(i,)) for i in range(5)]
# Start the threads
for thread in threads:
    thread.start()
# Wait for the threads to complete
for thread in threads:
    thread.join()

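Output: at most three threads report accessing the resource at the same time; the remaining two wait until a slot is released, so the run takes about 2 seconds.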

Note: Remember, args=(i,) is a tuple containing the arguments passed to the target function. To pass several arguments, separate them with commas, e.g. args=(i, name). The trailing comma in (i,) is required for a single argument, because (i) without it is just i, not a tuple.
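To check that a semaphore really enforces its limit, a sketch like the following can record how many threads hold a slot at the same time. The `active` and `peak` counters are illustrative bookkeeping, protected by a separate lock; with a limit of 3 and 8 threads, the observed peak should never exceed 3.

```python
import threading
import time

MAX_CONCURRENT = 3
slots = threading.Semaphore(MAX_CONCURRENT)
stats_lock = threading.Lock()
active = 0  # threads currently holding a slot
peak = 0    # highest value of active observed so far

def use_resource():
    global active, peak
    with slots:  # blocks while MAX_CONCURRENT threads already hold a slot
        with stats_lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.1)  # simulate work while holding the slot
        with stats_lock:
            active -= 1

threads = [threading.Thread(target=use_resource) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("peak concurrent threads:", peak)
```

If a semaphore should never be released more times than it was acquired, `threading.BoundedSemaphore` raises a `ValueError` in that case, which can help catch bugs.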

3. Barriers

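A barrier is a synchronization primitive that makes a group of threads wait for each other at a certain point of execution. Each thread calls the barrier's wait() method when it reaches that point and blocks there until the specified number of threads have called wait(); then all of them are released together and continue. This guarantees that every thread has finished the work before the barrier before any thread moves past it.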

import threading  # thread synchronization primitives

# Create a barrier for two threads
barrier = threading.Barrier(2)

# Function that reaches a synchronization point
def synchronize_point(thread_id):
    print(f"Thread {thread_id} is working...")
    barrier.wait()  # Block until both threads have reached this point
    print(f"Thread {thread_id} passed the barrier")

# Create two threads to demonstrate barrier usage
threads = [threading.Thread(target=synchronize_point, args=(i,)) for i in range(2)]
# Start the threads
for thread in threads:
    thread.start()
# Wait for the threads to complete
for thread in threads:
    thread.join()

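Output: both threads print their "working" line first; no "passed the barrier" line appears until both threads have reached barrier.wait().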
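Barriers are most useful when work happens in phases. In this sketch (the worker layout and values are illustrative), each of three threads computes a partial result, waits at the barrier, and only then reads all the partial results. Because no thread can pass `barrier.wait()` until all three have arrived, every thread sees the complete set of partial results in the second phase.

```python
import threading

NUM_WORKERS = 3
barrier = threading.Barrier(NUM_WORKERS)
partial_results = [0] * NUM_WORKERS  # each worker writes only its own slot
totals = [0] * NUM_WORKERS

def worker(index):
    # Phase 1: compute a partial result
    partial_results[index] = (index + 1) * 10
    # No thread continues until all workers have finished phase 1
    barrier.wait()
    # Phase 2: every partial result is now available
    totals[index] = sum(partial_results)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(totals)  # [60, 60, 60]
```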

Multiprocessing in Python

1. Introduction To Multiprocessing

Multiprocessing is another technique, one that achieves true parallelism by using multiple processes. On multi-core systems, these processes can run in parallel on different cores. Each process runs independently of the others, which can significantly reduce the time required for CPU-bound work compared to performing it in a single process. Python provides a built-in module named multiprocessing that creates separate processes, each with its own Python interpreter and memory space, and therefore its own GIL.

Tip: A process is an instance of a running program and contains at least one thread, called the main thread.
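For CPU-bound work applied to many inputs, `multiprocessing.Pool` is often more convenient than creating Process objects by hand. The sketch below is an illustration under simple assumptions: the pool size of 2 is arbitrary (omitting `processes` uses the number of CPUs), and `pool.map` returns results in the same order as the inputs. The code that creates the pool is kept under the `if __name__ == "__main__":` guard, which multiprocessing needs on platforms that start child processes by importing the main module.

```python
import multiprocessing

def square(n):
    # Runs in a worker process that has its own interpreter and GIL
    return n * n

def main():
    numbers = [2, 4, 7]
    # The with-block shuts the pool down when done
    with multiprocessing.Pool(processes=2) as pool:
        return pool.map(square, numbers)

if __name__ == "__main__":
    print(main())  # [4, 16, 49]
```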

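2. Difference Between Multithreading And Multiprocessing

Threads run inside a single process and share its memory, which makes them cheap to create and easy to exchange data between, but in CPython the GIL prevents them from running Python code in parallel, so they are best suited to I/O-bound tasks. Processes each have their own memory space and interpreter, so they can run CPU-bound Python code in parallel on multiple cores, at the cost of higher startup overhead and the need to pass data between processes explicitly (for example, through queues or pipes).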
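A quick way to see the memory difference between threads and processes is to let each one modify the same global variable, as in this sketch with a hypothetical `counter`. The thread's change is visible afterwards because it shares the parent's memory, while the child process only changes its own copy, so the parent's value stays the same.

```python
import multiprocessing
import threading

counter = 0

def increment():
    global counter
    counter += 1

def run_in(worker_cls):
    # Run increment() in a new thread or process and wait for it to finish
    worker = worker_cls(target=increment)
    worker.start()
    worker.join()

if __name__ == "__main__":
    run_in(threading.Thread)
    print("after thread:", counter)   # 1: the thread changed this process's variable
    run_in(multiprocessing.Process)
    print("after process:", counter)  # still 1: the child changed only its own copy
```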

3. Illustration Of Creating And Managing Processes Using The 'multiprocessing' Module


Now, let's see how the multiprocessing module works in Python with a simple example: one function calculates the square of each number in a list, and a second function calculates the cube. We will run each function in its own process.

import multiprocessing  # process-based parallelism (standard library)
import time             # used to simulate slow work and to measure elapsed time

def calc_square(numbers):  # task for the first process
    print("Process 1 started working")
    print("Calculate square of numbers")
    for n in numbers:
        time.sleep(0.2)  # simulate a slow operation
        print("Square:", n * n)


def calc_cube(numbers):  # task for the second process
    print("Process 2 started working")
    print("Calculate cube of numbers")
    for n in numbers:
        time.sleep(0.2)  # simulate a slow operation
        print("Cube:", n * n * n)

if __name__ == '__main__':
    arr = [2, 4, 7]  # list of numbers
    start = time.time()

    # Create the processes
    process1 = multiprocessing.Process(target=calc_square, args=(arr,))  # process 1
    process2 = multiprocessing.Process(target=calc_cube, args=(arr,))    # process 2

    # Start the processes
    process1.start()
    process2.start()

    # Wait for the processes to complete
    process1.join()
    process2.join()

    print("Both processes completed in:", time.time() - start, "seconds")

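Output: the two processes print their lines independently, so their order can vary between runs, and the reported time should be close to 0.6 seconds plus the overhead of starting two processes.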

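Caution: Remember to put the code that creates and starts processes under if __name__ == "__main__":. With the spawn and forkserver start methods (spawn is the default on Windows and macOS), each child process imports the main module; the guard prevents the child from re-running the process-creating code, which would otherwise make multiprocessing raise a RuntimeError instead of starting processes recursively.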
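Because each process has its own memory, the square and cube results in the example above can only be printed, not handed back to the parent directly. One way to return them is a `multiprocessing.Queue`, as in this sketch; the `("square", ...)` result tuples and the `main()` wrapper are illustrative choices rather than part of the original example. The parent reads one result per process before calling join(), since the multiprocessing documentation warns that joining a process that still has queued data to flush can deadlock.

```python
import multiprocessing

def calc_square(numbers, results):
    # Send the result back to the parent through the queue
    results.put(("square", [n * n for n in numbers]))

def calc_cube(numbers, results):
    results.put(("cube", [n * n * n for n in numbers]))

def main():
    arr = [2, 4, 7]
    results = multiprocessing.Queue()
    processes = [
        multiprocessing.Process(target=calc_square, args=(arr, results)),
        multiprocessing.Process(target=calc_cube, args=(arr, results)),
    ]
    for p in processes:
        p.start()
    # Collect one result per process before joining
    collected = dict(results.get() for _ in processes)
    for p in processes:
        p.join()
    return collected

if __name__ == "__main__":
    print(main())
```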

Wrapping Up

In conclusion, we discussed how Python programs can become more efficient when they work concurrently. The main point of concurrent execution is to keep an application busy with useful work instead of waiting idly, by letting it make progress on multiple tasks at once. It improves performance by making better use of the CPU (and, with multiprocessing, of multiple cores), and it also enhances the responsiveness of the application. We also learned about the two mechanisms for achieving concurrent execution in Python: multithreading and multiprocessing. Each has its own use cases: multithreading is efficient for I/O-bound tasks, whereas multiprocessing is ideal for CPU-bound tasks. As a developer, don't ignore these mechanisms, as they can significantly affect your application's performance. I hope this guide is helpful for you. Thank you for reading. Happy Coding!

