Mastering Reading and Writing CSV Files in Python

Mastering Reading and Writing CSV Files in Python

posted 13 min read

CSV stands for Comma-separated Values, a popular format for storing tabular data in a spreadsheet or database. Most of the time, people use databases for large amounts of data and spreadsheets for small amounts. But CSVs still have their places. They are simple and convenient. No drivers or special APIs are needed to use them. Python makes them even simpler with the CSV module. So, in this tutorial, I will be talking about the basic understanding of CSV files and how to read, parse, and write CSV files, with some advanced concepts of dealing with CSV efficiently. So, let's get started!

Understanding CSV Files

Overview of CSV File Format

A CSV is a text file that contains the data. Oftentimes, the first row in the file is a header, letting you know what the values represent, and the remaining lines contain the data. Here, each row represents a record in the database. Its plain text format makes it easy and nifty for the website developers to create it. Further, CSV files are also easy to import, and you can import the CSV file from a spreadsheet or a database.

CSV Format: Delimiters, Line Break, and Data Separation

In each row, the pieces of data are separated by commas known as delimiters. Also, you can use some other delimiters too based on the requirement, such as semicolon ';' tabs '\t' and pipe '|'. Alternatively, to separate the row or record, we need breaks or new lines, and this can be done by using the '\n' delimiter. Furthermore, there is no specific datatype, but you may mentally interpret the data as strings, dates, and numbers, and everything is represented as a string. Also, whenever we see two commas in a row, that means a piece of data is simply missing.

Note: Remember, data is separated by commas because this is a text file, and no datatype is defined with the values.

Different Variations of CSV File

CSV is a plain text file that is widely used for storing and exchanging data, but it has multiple variations based on the distinct delimiters used in the data. The popularity of each variation depends on the needs of the software application involved and the specific use case of the user preference. Let's discuss some of them:

1. TSV known as Tab-Separated value as the name defines is like a CSV file, but it uses tabs to separate the data instead of commas. It is mostly preferred to use when the data itself contains commas, therefore tabs are used to reduce conflict.

EMPLOYEE_ID \t FIRST_NAME \t LAST_NAME \t EMAIL \t PHONE_NUMBER
198 \t Donald \t OConnell \t DOCONNEL \t 650.507.9833  

2. DSV known as Delimiter-separated values used to define the custom delimiters other than commas and tabs. It allows the developer to create its own delimiters like pipe '|' or semicolon ';' for separating the data. Furthermore, it is preferred in the case where commas and tabs are not suitable.

EMPLOYEE_ID | FIRST_NAME | LAST_NAME | EMAIL | PHONE_NUMBER 
198 | Donald | OConnell | DOCONNEL | 650.507.9833  

Reading CSV Files

Overview of Python's Built-in 'CSV' Module

We can parse the CSV data using the list comprehension, but sometimes data contains multiple values for example, in books CSV files, there can be multiple authors' names separated by commas, so splitting on commas would split those authors' names. Therefore, handling problems like this is one of the many reasons why you should use the CSV module. Python provides us with the CSV module for performing the file handling operations in CSV format. To perform the operations, you must import the library of CSV.

Reading Data from a CSV File: Using the 'csv.reader()' Function.

1. Import CSV library First, import the CSV module that provides the functionality of reading and writing the CSV file.

import csv  

2. Open the CSV file: Use the open() function to open the file. Then, provide the path of the file with the mode of reading 'r' as we are performing reading operations.

3. Create the CSV Reader Function: After opening the file, use a csv.reader() function to parse the CSV data from the file by passing the file as an argument. Then, save the data into the object name as csv_content.

4. Iterate over Record: Use for loop for performing iteration over all the csv records and then print each record separately.

import csv #importing the csv module
file_path = "Sample-CSV.csv" #provide location of file in the same directory        
with open(file_path, 'r') as file: #here, the file open operation is performed with read mode    
    csv_content = csv.reader(file) #read the file using csvreader function and save it into variable
    for record in csv_content: #iterate through the rows
        print(record) #print the record 

Output

Read CSV Files in Python: Using '.readlines()'

The second way to read the CSV file is using the .readlines() function of list comprehension. By using this way, you must strip the string data by removing all the white spaces and then this string will be split as a list as a substring where commas are used and generate the string of the list.

import csv #importing the csv module 
file_path = "Sample-CSV.csv" #provide location of file in the same directory          
with open(file_path, 'r') as file: #here, the file open operation is performed with read mode    
    rows_csv = file.readlines() 
    for row in rows_csv: #iterate through the lines 
        lines = row.strip().split(',') #this will remove the white space and then convert the strings into list 
        print(lines)  

Output

Read CSV Files in Python: Using Pandas

Another way of reading the CSV file is using the panda's library. It is the most widely used library in Python and provides various functionalities for reading and writing data. The function read_csv() is used for reading the record from the memory and returns it as a pandas data frame. But first, we have to import the pandas library.

import pandas as pd #importing the pandas library 
file_path = "Sample-CSV.csv" #provide location of file in the same directory        
result = pd.read_csv(file_path) #read the file and return as pandas data frame 
print(result) 
Tip: If you are dealing with large CSV file then use Pandas library instead of csv module.

Output

Read CSV Files in Python: Using 'csv.DictReader()'

Now, if you want to access or read the data in the form of a dictionary, then you must import the csv module. The csv module provides various functionalities for reading the data and one of them is the Dictreader() function used to read the data in the form of a dictionary.

import csv #importing the csv module
file_path = "Sample-CSV.csv" #provide location of file in the same directory       
with open(file_path, 'r') as file: #here, the file open operation is performed with read mode   
    record_csv = csv.DictReader(file)
    for row in record_csv:
        print(row)

Output

Customizing CSV Reading Behavior

You can also customize the data of your CSV file by specifying the delimiters, quoting characters, and handling headers. This is done by using the csv module with csv.reader() function. Alternatively, you can also write the delimiter that you want to replace and some quoting characters as well. Also, using the next keyword, you can skip the header while reading the file.

import csv #importing the csv module
file_path = "Sample-CSV.csv" #provide location of file in the same directory        
with open(file_path, 'r', newline = '') as file: #here, the file open operation is performed with read mode   
    record_csv = csv.reader(file, delimiter=';', quotechar='"') #this will create the data with custom settings
    first_row = next(record_csv) #skip the header of the csv file
    for rows in record_csv:
        print(rows)

Output

Writing CSV Files

Writing Data to a CSV file: Using the 'csv.writer()' Function

1. Import CSV library: First, import the CSV module that provides the functionality of reading and writing the CSV file.

import csv  

2. Open the CSV File: Use the open() function to open the file. Then, provide the path of the file with the mode of writing 'w' as we are performing writing operations.

3. Create the CSV Writer Function: After opening the file, use a csv.writer() function to parse the CSV data from the file by passing the file as an argument. Then, save the data into the object name as csv_content.

4. Write data to the file: Use the writerow() function of the record_csv object to write the data of each row. Here, the data is as a list or tuples containing the column values of each row. Always specify the header row first, before the other rows using the same function.

import csv #importing the csv module
file_path = "Sample-CSV.csv" #provide location of file in the same directory             
with open(file_path, 'w', newline='') as file: #here, the file open operation is performed with read mode   
    record_csv = csv.writer(file) #this will create the data with custom settings        record_csv.writerow(['EMPLOYEE_ID','FIRST_NAME','LAST_NAME','EMAIL','PHONE_NUMBER','JOB_ID','SALARY'])#write header        record_csv.writerow(['03','David','Paul,*Emails are not allowed*','88886433333','14','15000$'])    
with open(file_path, 'r', newline = '') as file: #here, the file open operation is performed with read mode   
    record_csv = csv.reader(file)
    for rows in record_csv: #iterate to read the data
        print(rows)
Note: Always consider including the header in your CSV file for understanding the data better.

Output

Appending Data to an Existing CSV File

When you use the ‘w’ write mode, all the current data from the file will be truncated. But if you want to add the rows in the existing CSV file without truncating the data, you can attain it by defining the file opening mode as append ‘a’. This way the indicator will be positioned at the end of the file.

import csv #importing the csv module
file_path = "Sample-CSV.csv" #provide location of file in the same directory             
with open(file_path, 'a', newline='') as file: #here, the file open operation is performed with append mode   
    record_csv = csv.writer(file) #this will use the writer function
    record_csv.writerow(['04','Henry','Ferned','*Emails are not allowed*','88898763333','15','10000$'])   
with open(file_path, 'r', newline = '') as file: #here, the file open operation is performed with read mode   
    record_csv = csv.reader(file)
    for rows in record_csv: #iterate to read the data
        print(rows) 
Caution: Always back up the CSV file when dealing with the operation to avoid accidentally removing the entire file, so use append mode instead of write mode.

Output

Handling Unicode Data

If your CSV file contains a non-ASCII character, use the UTF-8 encoding scheme while opening the file. It will ensure the proper encoding and formatting of the data within the file.

import csv #importing the csv module
file_path = "Sample-CSV.csv" #provide location of file in the same directory       
with open(file_path, 'a', newline='', encoding='utf-8') as file: #here, the file open operation is performed with append mode   
    record_csv = csv.writer(file) #this will use the writer function
    record_csv.writerow(['06','Renée','Jürgen','jü*Emails are not allowed*','88898763333','15','10000$'])  
with open(file_path, 'r', newline = '', encoding='utf-8') as file: #here, the file open operation is performed with read mode   
    record_csv = csv.reader(file)
    for rows in record_csv: #iterate to read the data
        print(rows) 

Output

CSV File Operations and Data Manipulation

Filtering and Transforming CSV data

Filtering involves checking the data based on the specific condition and selecting the rows. Whereas, transforming means changing the values of one or more columns. You can perform both these operations by using the for loop and some conditional statements to access the specific row. Let's say we have a CSV file with employee data, and we want to select only rows where the employee age is above 40 but before that, we want to transform the ages by adding 5 in each employee age.

import csv #importing the csv module
file_path = "Employee_csv.csv" #provide location of file in the same directory       
with open(file_path, 'r', newline='') as file: #here, the file open operation is performed with read mode   
    record_csv = csv.reader(file)
    header = next(record_csv) #skip the header
    age_index = header.index('Age') #get the age column index
    for rows in record_csv: #iterate to read the data
        age = int(rows[age_index]) # convert the string data of age into int
        tform_age = age + 5 # this will transform the age by adding 5 to employee age
        if age > 40: #check the condition for filtering
            print(rows)

Output

Aggregating and Summarizing CSV data

For advanced data manipulation tasks, pandas is the powerful library that provides the leverage to work with structured data like CSV files. To aggregate and summarize the CSV data across multiple rows, such as calculating the average, mean, etc. First, you need to import the pandas library. It has many built-in functions like count(), max(), min(), and mean() for summarizing the data.

import pandas as pd #importing the panda module
file_path = "Employee_csv.csv" #provide location of file in the same directory       
dataframe = pd.read_csv(file_path)
#aggregating techniques
total_age = dataframe['Age'].count()
print("Total Age count:", total_age)
#performing summarizing techniques
age_min = dataframe['Age'].min()
age_max = dataframe['Age'].max()
age_mean = dataframe['Age'].mean()
print("Maximum Age:",age_max)
print("Minimum Age", age_min,)
print("Mean of all Ages:", age_mean)

Output

FAQs Q: How can I create and write data to a file?
A: When you define the mode as 'w' while opening the file, it will create the file if it does not exist and then write the data in it.
Q: How do I write to an existing CSV file in Python?
A: You can write to an existing CSV file by using the append mode while opening the file.
Q: Can I read and write to a CSV at the same time?
A: Yes, You can perform both operations at the same time in Python by using the 'r+' mode while opening the file.

Wrapping Up

In conclusion, we discussed in detail the working of CSV files, how CSV file is a popular format in the world of programming, and data analysis for exchanging data between different applications. Throughout the article, we saw the valuable function of csv.reader() and csv.writer(), by which you can perform reading and writing operations in CSV files. Furthermore, we looked at some other ways of reading the files. However, these functions show how Python is the best choice for handling the csv file, whether you are performing data manipulation or integration with another system. I hope this guide is helpful for you. If you have any concerns or questions, feel free to give feedback. Thank you for reading this guide. Happy Coding!


Reference

CSV File in Python

If you read this far, tweet to the author to show them you care. Tweet a Thanks

More Posts

Opening & Reading Files Handling in Python

Abdul Daim - Apr 12

Writing to Files In Python

Abdul Daim - Apr 12

Python Writing to Files

Abdul Daim - Mar 19

Working With JSON File in Python

Abdul Daim - Mar 18

Python Opening & Reading Files

Abdul Daim - Mar 8
chevron_left