CSV stands for Comma-separated Values, a popular format for storing tabular data in a spreadsheet or database. Most of the time, people use databases for large amounts of data and spreadsheets for small amounts. But CSVs still have their places. They are simple and convenient. No drivers or special APIs are needed to use them. Python makes them even simpler with the CSV module. So, in this tutorial, I will be talking about the basic understanding of CSV files and how to read, parse, and write CSV files, with some advanced concepts of dealing with CSV efficiently. So, let's get started!
Understanding CSV Files
Overview of CSV File Format
A CSV is a text file that contains the data. Oftentimes, the first row in the file is a header, letting you know what the values represent, and the remaining lines contain the data. Here, each row represents a record in the database. Its plain text format makes it easy and nifty for the website developers to create it. Further, CSV files are also easy to import, and you can import the CSV file from a spreadsheet or a database.
CSV Format: Delimiters, Line Break, and Data Separation
In each row, the pieces of data are separated by commas known as delimiters. Also, you can use some other delimiters too based on the requirement, such as semicolon ';' tabs '\t' and pipe '|'. Alternatively, to separate the row or record, we need breaks or new lines, and this can be done by using the '\n' delimiter. Furthermore, there is no specific datatype, but you may mentally interpret the data as strings, dates, and numbers, and everything is represented as a string. Also, whenever we see two commas in a row, that means a piece of data is simply missing.
Remember, data is separated by commas because this is a text file, and no datatype is defined with the values.
Different Variations of CSV File
CSV is a plain text file that is widely used for storing and exchanging data, but it has multiple variations based on the distinct delimiters used in the data. The popularity of each variation depends on the needs of the software application involved and the specific use case of the user preference. Let's discuss some of them:
1. TSV known as Tab-Separated value as the name defines is like a CSV file, but it uses tabs to separate the data instead of commas. It is mostly preferred to use when the data itself contains commas, therefore tabs are used to reduce conflict.
EMPLOYEE_ID \t FIRST_NAME \t LAST_NAME \t EMAIL \t PHONE_NUMBER
198 \t Donald \t OConnell \t DOCONNEL \t 650.507.9833
2. DSV known as Delimiter-separated values used to define the custom delimiters other than commas and tabs. It allows the developer to create its own delimiters like pipe '|' or semicolon ';' for separating the data. Furthermore, it is preferred in the case where commas and tabs are not suitable.
EMPLOYEE_ID | FIRST_NAME | LAST_NAME | EMAIL | PHONE_NUMBER
198 | Donald | OConnell | DOCONNEL | 650.507.9833
Reading CSV Files
Overview of Python's Built-in 'CSV' Module
We can parse the CSV data using the list comprehension, but sometimes data contains multiple values for example, in books CSV files, there can be multiple authors' names separated by commas, so splitting on commas would split those authors' names. Therefore, handling problems like this is one of the many reasons why you should use the CSV module.
Reading Data from a CSV File: Using the 'csv.reader()' Function.
1. Import CSV library First, import the CSV module that provides the functionality of reading and writing the CSV file.
import csv
2. Open the CSV file: Use the open() function to open the file. Then, provide the path of the file with the mode of reading 'r' as we are performing reading operations.
3. Create the CSV Reader Function: After opening the file, use a csv.reader() function to parse the CSV data from the file by passing the file as an argument. Then, save the data into the object name as csv_content.
4. Iterate over Record: Use for loop for performing iteration over all the csv records and then print each record separately.
import csv #importing the csv module
file_path = "Sample-CSV.csv" #provide location of file in the same directory
with open(file_path, 'r') as file: #here, the file open operation is performed with read mode
csv_content = csv.reader(file) #read the file using csvreader function and save it into variable
for record in csv_content: #iterate through the rows
print(record) #print the record
Output
Read CSV Files in Python: Using '.readlines()'
The second way to read the CSV file is using the .readlines() function of list comprehension. By using this way, you must strip the string data by removing all the white spaces and then this string will be split as a list as a substring where commas are used and generate the string of the list.
import csv #importing the csv module
file_path = "Sample-CSV.csv" #provide location of file in the same directory
with open(file_path, 'r') as file: #here, the file open operation is performed with read mode
rows_csv = file.readlines()
for row in rows_csv: #iterate through the lines
lines = row.strip().split(',') #this will remove the white space and then convert the strings into list
print(lines)
Output
Read CSV Files in Python: Using Pandas
Another way of reading the CSV file is using the panda's library. The function read_csv() is used for reading the record from the memory and returns it as a pandas data frame. But first, we have to import the pandas library.
import pandas as pd #importing the pandas library
file_path = "Sample-CSV.csv" #provide location of file in the same directory
result = pd.read_csv(file_path) #read the file and return as pandas data frame
print(result)
If you are dealing with large CSV file then use Pandas library instead of csv module.
Output
Read CSV Files in Python: Using 'csv.DictReader()'
Now, if you want to access or read the data in the form of a dictionary, then you must import the csv module. The csv module provides various functionalities for reading the data and one of them is the Dictreader() function used to read the data in the form of a dictionary.
import csv #importing the csv module
file_path = "Sample-CSV.csv" #provide location of file in the same directory
with open(file_path, 'r') as file: #here, the file open operation is performed with read mode
record_csv = csv.DictReader(file)
for row in record_csv:
print(row)
Output
Customizing CSV Reading Behavior
You can also customize the data of your CSV file by specifying the delimiters, quoting characters, and handling headers. This is done by using the csv module with csv.reader() function. Alternatively, you can also write the delimiter that you want to replace and some quoting characters as well. Also, using the next keyword, you can skip the header while reading the file.
import csv #importing the csv module
file_path = "Sample-CSV.csv" #provide location of file in the same directory
with open(file_path, 'r', newline = '') as file: #here, the file open operation is performed with read mode
record_csv = csv.reader(file, delimiter=';', quotechar='"') #this will create the data with custom settings
first_row = next(record_csv) #skip the header of the csv file
for rows in record_csv:
print(rows)
Output
Writing CSV Files
Writing Data to a CSV file: Using the 'csv.writer()' Function
1. Import CSV library: First, import the CSV module that provides the functionality of reading and writing the CSV file.
import csv
2. Open the CSV File: Use the open() function to open the file. Then, provide the path of the file with the mode of writing 'w' as we are performing writing operations.
3. Create the CSV Writer Function: After opening the file, use a csv.writer() function to parse the CSV data from the file by passing the file as an argument. Then, save the data into the object name as csv_content.
4. Write data to the file: Use the writerow() function of the record_csv object to write the data of each row. Here, the data is as a list or tuples containing the column values of each row. Always specify the header row first, before the other rows using the same function.
import csv #importing the csv module
file_path = "Sample-CSV.csv" #provide location of file in the same directory
with open(file_path, 'w', newline='') as file: #here, the file open operation is performed with read mode
record_csv = csv.writer(file) #this will create the data with custom settings record_csv.writerow(['EMPLOYEE_ID','FIRST_NAME','LAST_NAME','EMAIL','PHONE_NUMBER','JOB_ID','SALARY'])#write header record_csv.writerow(['03','David','Paul,*Emails are not allowed*','88886433333','14','15000$'])
with open(file_path, 'r', newline = '') as file: #here, the file open operation is performed with read mode
record_csv = csv.reader(file)
for rows in record_csv: #iterate to read the data
print(rows)
Always consider including the header in your CSV file for understanding the data better.
Output
Appending Data to an Existing CSV File
When you use the ‘w’ write mode, all the current data from the file will be truncated. But if you want to add the rows in the existing CSV file without truncating the data, you can attain it by defining the file opening mode as append ‘a’. This way the indicator will be positioned at the end of the file.
import csv #importing the csv module
file_path = "Sample-CSV.csv" #provide location of file in the same directory
with open(file_path, 'a', newline='') as file: #here, the file open operation is performed with append mode
record_csv = csv.writer(file) #this will use the writer function
record_csv.writerow(['04','Henry','Ferned','*Emails are not allowed*','88898763333','15','10000$'])
with open(file_path, 'r', newline = '') as file: #here, the file open operation is performed with read mode
record_csv = csv.reader(file)
for rows in record_csv: #iterate to read the data
print(rows)
Always back up the CSV file when dealing with the operation to avoid accidentally removing the entire file, so use append mode instead of write mode.
Output
Handling Unicode Data
If your CSV file contains a non-ASCII character, use the UTF-8 encoding scheme while opening the file. It will ensure the proper encoding and formatting of the data within the file.
import csv #importing the csv module
file_path = "Sample-CSV.csv" #provide location of file in the same directory
with open(file_path, 'a', newline='', encoding='utf-8') as file: #here, the file open operation is performed with append mode
record_csv = csv.writer(file) #this will use the writer function
record_csv.writerow(['06','Renée','Jürgen','jü*Emails are not allowed*','88898763333','15','10000$'])
with open(file_path, 'r', newline = '', encoding='utf-8') as file: #here, the file open operation is performed with read mode
record_csv = csv.reader(file)
for rows in record_csv: #iterate to read the data
print(rows)
Output
CSV File Operations and Data Manipulation
Filtering and Transforming CSV data
Filtering involves checking the data based on the specific condition and selecting the rows. Whereas, transforming means changing the values of one or more columns. You can perform both these operations by using the for loop and some conditional statements to access the specific row. Let's say we have a CSV file with employee data, and we want to select only rows where the employee age is above 40 but before that, we want to transform the ages by adding 5 in each employee age.
import csv #importing the csv module
file_path = "Employee_csv.csv" #provide location of file in the same directory
with open(file_path, 'r', newline='') as file: #here, the file open operation is performed with read mode
record_csv = csv.reader(file)
header = next(record_csv) #skip the header
age_index = header.index('Age') #get the age column index
for rows in record_csv: #iterate to read the data
age = int(rows[age_index]) # convert the string data of age into int
tform_age = age + 5 # this will transform the age by adding 5 to employee age
if age > 40: #check the condition for filtering
print(rows)
Output
Aggregating and Summarizing CSV data
For advanced data manipulation tasks, pandas is the powerful library that provides the leverage to work with structured data like CSV files. To aggregate and summarize the CSV data across multiple rows, such as calculating the average, mean, etc. First, you need to import the pandas library. It has many built-in functions like count(), max(), min(), and mean() for summarizing the data.
import pandas as pd #importing the panda module
file_path = "Employee_csv.csv" #provide location of file in the same directory
dataframe = pd.read_csv(file_path)
#aggregating techniques
total_age = dataframe['Age'].count()
print("Total Age count:", total_age)
#performing summarizing techniques
age_min = dataframe['Age'].min()
age_max = dataframe['Age'].max()
age_mean = dataframe['Age'].mean()
print("Maximum Age:",age_max)
print("Minimum Age", age_min,)
print("Mean of all Ages:", age_mean)
Output
Q: How can I create and write data to a file?
A: When you define the mode as 'w' while opening the file, it will create the file if it does not exist and then write the data in it.
Q: How do I write to an existing CSV file in Python?
A: You can write to an existing CSV file by using the append mode while opening the file.
Q: Can I read and write to a CSV at the same time?
A: Yes, You can perform both operations at the same time in Python by using the 'r+' mode while opening the file.
Wrapping Up
In conclusion, we discussed in detail the working of CSV files, how CSV file is a popular format in the world of programming, and data analysis for exchanging data between different applications. Throughout the article, we saw the valuable function of csv.reader() and csv.writer(), by which you can perform reading and writing operations in CSV files. Furthermore, we looked at some other ways of reading the files. However, these functions show how Python is the best choice for handling the csv file, whether you are performing data manipulation or integration with another system. I hope this guide is helpful for you. If you have any concerns or questions, feel free to give feedback. Thank you for reading this guide. Happy Coding!
Reference
CSV File in Python