Introduction
I’m sure you have heard about CSV datasets. If you are here, it's because you're interested in Artificial Intelligence (AI). AI development is often associated with Machine Learning and huge datasets stored in big databases. Maybe you have a big idea in mind; you search on YouTube or Google, and the first thing you find out is that if your project doesn't require a large amount of data, a .csv dataset is suitable for it. That is true. It is also true that CSV datasets are very useful for data analysis and are used by enterprises to store their data.
I feel that when we talk about CSV datasets, not everything is clear. There is a lot of information, but it is not organized in a single place, in simple words that allow beginners to understand and use CSV datasets effectively. The goal of this post is to understand what CSV datasets are, the differences between Machine Learning and Expert Systems, how traditional Expert Systems allow us to automate many tasks, when and how to use .csv datasets, and what the main uses of CSV datasets are.
What this post will not teach you
In this post, I will not explain how to write code that makes CSV files work. This post is centered on CSV files/datasets. Perhaps next week, in another post, I will explain how to create a basic chatbot, assistant, or enterprise bot.
CSV dataset definition
I could borrow a technical definition of a CSV (Comma-Separated Values) file, also called a CSV dataset, from external sources, but it’s not necessary. Instead, I'll define and explain it in my own words. A CSV dataset is a collection of structured, organized data, often used for creating small artificial intelligence models that do not require much data. The data is organized in variables (columns). For example:
# Variables (first line)
question,answer
# Information (one row per line)
What is a CSV dataset?,A collection of data saved in a well-organized way.
How can I create a CSV dataset?,Add the extension .csv at the end of the file name.
# As you can see in the example, the values are separated with a comma
In the example above, the part before the comma is the question and the part after it is the answer; the order has to match the variable names defined on the first line.
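To make the "first line names the variables" idea concrete, here is a minimal sketch in Python using the standard `csv` module. The sample text mirrors the question/answer example above; the `SAMPLE` constant and the `load_rows` helper are names I made up for illustration.

```python
import csv
import io

# Hypothetical sample data, mirroring the question/answer example above.
SAMPLE = """question,answer
What is a CSV dataset?,A collection of data saved in a well-organized way.
How can I create a CSV dataset?,Add the extension .csv at the end of the file name.
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the names on the first line."""
    return list(csv.DictReader(io.StringIO(text)))


rows = load_rows(SAMPLE)
for row in rows:
    # Each value is looked up by its variable name, not by its position.
    print(row["question"], "->", row["answer"])
```

Because the reader takes the variable names from the first line, every row has to list its values in that same order.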
In the following example, I have a dataset with three variables: image_path refers to the source of the image, image_name is the name of the image, and image_description is, as you can imagine, the description given to the image. In this particular case, I'm working with image recognition. However, if you need to work only with messages and answers, the pattern will be the same but with two variables: question and answer, as shown in the previous example.
image_path,image_name,image_description
static/images/tucan.jpg,Tucán,"Un ave colorida con un gran pico, nativa de las selvas tropicales."
static/images/jaguar.jpeg,Jaguar,Un felino grande y poderoso que habita en América Latina.
static/images/tapir.jpeg,Tapir,Un mamífero grande y herbívoro que se encuentra en las selvas tropicales.
static/images/mango.jpeg,Mango,"Un fruto tropical dulce y jugoso, muy popular en América Latina."
static/images/cafe.jpeg,Café,Granos que se utilizan para hacer una bebida estimulante conocida mundialmente.
static/images/guayaba.jpeg,Guayaba,"Una fruta tropical rica en vitamina C, con un sabor dulce y ácido."
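One detail worth checking in a dataset like this: some descriptions contain a comma of their own. A value with a comma inside needs to be wrapped in double quotes; otherwise a parser treats that comma as a separator and the row ends up with an extra variable. The sketch below, using Python's standard `csv` module, shows the difference on the toucan row. The `parse_line` helper is a hypothetical name used only for this illustration.

```python
import csv
import io

# The same row, first without quotes and then with the description quoted.
unquoted = "static/images/tucan.jpg,Tucán,Un ave colorida con un gran pico, nativa de las selvas tropicales.\n"
quoted = 'static/images/tucan.jpg,Tucán,"Un ave colorida con un gran pico, nativa de las selvas tropicales."\n'


def parse_line(line):
    """Parse a single CSV line and return its values as a list."""
    return next(csv.reader(io.StringIO(line)))


print(len(parse_line(unquoted)))  # 4 values: the description was split in two
print(len(parse_line(quoted)))    # 3 values: path, name, description
```

Spreadsheet programs and CSV libraries usually add these quotes for you when they save a file; the problem mostly appears when rows are typed by hand in a text editor.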
Tools to create a CSV file
The right tool depends on your goals and the type of data you’re working with. CSV files can be created with a text editor like Notepad++, a spreadsheet program like Microsoft Excel, or a code editor like Visual Studio or PyCharm.
Creating a CSV dataset
To create a CSV file using Excel, save the file and select the type CSV (comma delimited). Using a text or code editor, name your file and add the extension .csv, for example, dataset.csv. You can then use the file to create a dataset on external services like Oracle or any other cloud service provider.
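A CSV file can also be written from a program instead of by hand. The following is a minimal sketch with Python's standard `csv` module; the questions, the answers, the `write_dataset` helper, and the choice of the temporary folder are all assumptions made for the example.

```python
import csv
import os
import tempfile

# Hypothetical rows; the first answer contains a comma on purpose.
rows = [
    ("What are your business hours?", "We are open Monday to Friday, 9:00 to 18:00."),
    ("Do you ship abroad?", "Yes we do."),
]


def write_dataset(path, rows):
    """Write a header line followed by one line per (question, answer) pair."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["question", "answer"])
        writer.writerows(rows)


# Save the file as dataset.csv in the system's temporary folder.
path = os.path.join(tempfile.gettempdir(), "dataset.csv")
write_dataset(path, rows)

with open(path, encoding="utf-8") as f:
    print(f.read())
```

The writer quotes any value that contains a comma, so the first answer is saved between double quotes and still reads back as a single value.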
Reading a CSV dataset
If you need to open and edit your CSV dataset, simply go to your files, locate the CSV dataset, right-click on it, and select "Open with." You can open it with Notepad or a code editor like Visual Studio Code. Once opened, you will be able to read, edit, and save your changes.
Differences between Machine Learning and Expert Systems
First of all, why am I talking about Machine Learning here? When somebody talks about artificial intelligence, we often think only of Machine Learning or Natural Language Processing, or we think that these are something separate from AI. But the reality is that Machine Learning, Natural Language Processing, Computer Vision, Expert Systems, and many others are types of Artificial Intelligence.
The traditional Expert System is a method of artificial intelligence that can use a CSV file as its dataset. In this AI model, the developer defines the possible messages or characteristics and provides a specific answer for each one, which is returned when that message or characteristic is present.
In a Machine Learning model, you provide a database or a large collection of data, and the model itself learns patterns from this data. When a message arrives, the model compares its content with what it learned and gives an answer based on the closest patterns, even if the exact message was never in the data.
Both methods have their advantages and disadvantages. Machine Learning models are very useful when working with large language models and other projects with a high level of unknowns; however, the answers provided by these models can sometimes be inaccurate. So, when we need a bot that provides very accurate, precise answers, we use Expert System models built on a CSV dataset.
How do traditional Expert Systems allow us to automate tasks?
Let me use an analogy so that we can understand how to use CSV datasets with a traditional Expert System AI model to automate repetitive tasks. Imagine that you are a small, medium, or large business that sells products; cellphones, let's say. To attract more clients, you provide support via WhatsApp or on your website. Those clients will ask specific questions about the prices of specific models. They will ask about promotions, available models, business hours, and so on. All those tasks are repetitive and time-consuming. A good number of those clients will ask questions and not buy, so you're losing valuable time. What if we create a bot that can answer each question from each client automatically, specifically, and instantly?
Problems like this can be solved using a CSV dataset in which you define the answer for every possible question or message. Of course, you'll need to develop a simple program capable of reading the file, receiving the message sent to the server, and responding with the specific answer. Now, when the client asks something, the bot will respond instantly, saving time and work and providing a good experience for the user.
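The core of such a bot can be sketched in a few lines of Python. This is only an illustration of the idea, not a full chatbot: the dataset contents (including the price), the fallback text, and the names `normalize`, `load_answers`, and `reply` are all invented for the example, and receiving messages from WhatsApp or a website is left out.

```python
import csv
import io

# Hypothetical dataset; in a real project this text would live in a .csv file.
DATASET = """question,answer
what is the price of model x?,Model X costs $299.
what are your business hours?,"Monday to Friday, 9:00 to 18:00."
"""

FALLBACK = "Sorry, I don't have an answer for that yet."


def normalize(message):
    """Lowercase the text and collapse extra spaces so small differences still match."""
    return " ".join(message.lower().split())


def load_answers(text):
    """Build a lookup table that maps each known question to its answer."""
    reader = csv.DictReader(io.StringIO(text))
    return {normalize(row["question"]): row["answer"] for row in reader}


def reply(message, answers):
    """Return the predefined answer, or a fallback when the question is unknown."""
    return answers.get(normalize(message), FALLBACK)


answers = load_answers(DATASET)
print(reply("What are your business hours?", answers))
print(reply("Do you sell laptops?", answers))
```

The bot only answers questions that appear in the dataset, which is exactly why its answers are precise: nothing is guessed. The trade-off is that every wording you want to support has to be added as a row, or the matching step has to be made more flexible.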
When to use .csv datasets?
Use a .csv dataset anytime you need to develop a product (AI model) whose main requirement is very accurate and specific answers, as we discussed earlier. In simple terms, this kind of dataset is used for creating Expert System models, but a CSV dataset can also be used to store data in a structured, organized way. That data can later be used for data analysis and to facilitate decision-making for administrative personnel.
Main uses of CSV datasets explained
Chatbots: CSV datasets can be used to create assistant bots for small, medium, and large businesses or enterprises. These bots can be integrated with Twilio, WhatsApp, Telegram, or a website. Next week, I'll publish a tutorial to explain step-by-step how to create a chatbot.
Data store: Save data in an organized manner. Other tools and programs that work with your data can also write a CSV file or update it automatically.
Data analysis tools: CSV datasets can be easily manipulated with Python libraries like Pandas and Matplotlib for data analysis and visualization.
Image recognition (integrated with other tools): As shown in the earlier image dataset example, you can create a CSV dataset with a specific structure to integrate with other tools for image recognition systems.
Other uses
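For the data store and data analysis uses listed above, here is a small sketch of what "analyzing a CSV" can look like. To keep it self-contained it uses only Python's standard library; Pandas can load the same kind of file with `read_csv` and offers much more. The sales log, its column names, and the `units_per_model` helper are hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical sales log; the column names are invented for this example.
SALES = """date,model,units
2024-01-05,Model X,3
2024-01-05,Model Y,1
2024-01-06,Model X,2
"""


def units_per_model(text):
    """Add up the units sold for each model in the CSV text."""
    totals = Counter()
    for row in csv.DictReader(io.StringIO(text)):
        # Values read from a CSV are text, so convert units to a number first.
        totals[row["model"]] += int(row["units"])
    return totals


totals = units_per_model(SALES)
for model, units in totals.most_common():
    print(model, units)
```

Note that everything read from a CSV file arrives as text; numbers and dates have to be converted before doing calculations with them.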
How to use?
The answer to this question depends on your goal, that is, on what you’re using the dataset for. Let's say you want to create an assistant chatbot: the first step is to create your CSV file with the specific possible questions and the respective answer for each question. But that is not all; you will also need a programming language to build the model (the Expert System I explained earlier). Additionally, this model needs to be integrated into a website or with services like WhatsApp, Twilio, or others. The step-by-step integration is not covered in this post because it needs more space. In this post, as I explained in the introduction, my goal is to understand what CSV datasets are and how we can use them to make our lives easier.
Advantages of CSV datasets
- Accurate and specific answers: When used in assistants or chatbots, you can specify the answer for the possible questions or messages.
- Personalization: You can create or update your data according to your needs.
- Organization: You can save or write your data from other programs, organized in variables according to your needs. This improves readability and allows the data to be integrated with other programs, libraries, or tools to analyze its information.
Conclusion
A CSV dataset is a collection of organized data. These files can be used in combination with the traditional artificial intelligence methodology called Expert System to create automated bots, assistants, data analysis tools, and many other automation tools for small, medium, and large businesses interested in saving time, money, and work, while providing good, personalized, and instant assistance to their clients. They are also used to store data in a well-organized and structured way to be later used in data analysis, facilitating decision-making.