

Overview:
I developed a content-based movie recommender system that suggests movies to users based on their interests and on how similar the movies' content is. The project applies core concepts of Natural Language Processing (NLP) and Machine Learning to the TMDB 5000 Movies dataset to produce personalized recommendations.
How it Works (Technical Workflow):
Data Preprocessing: Cleaned the metadata and extracted relevant features like genres, keywords, cast, and crew.
Text Normalization: Applied stemming with NLTK’s PorterStemmer so that inflected forms of a word (for example "fighting" and "fights") reduce to the same stem in the movie tags.
Vectorization: Used CountVectorizer (Bag of Words) to convert the textual tags into 5,000-dimensional count vectors.
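Mathematical Engine: Implemented Cosine Similarity to measure how closely two movie vectors point in the same direction. The score depends on the angle between the vectors rather than their length, so the system ranks the movies with the highest cosine similarity (the smallest angle) as the most relevant recommendations.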
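The workflow above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the three toy rows, their ids, and the helper names `extract_names` and `build_tags` are hypothetical, and `max_features=5000` is assumed from the 5,000-dimensional space mentioned above. In the TMDB files, fields such as genres and keywords are stored as JSON strings, which is what the toy rows imitate.

```python
import json

from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy rows imitating the JSON-string fields of the dataset.
movies = [
    {"title": "Movie A",
     "genres": '[{"id": 1, "name": "Action"}]',
     "keywords": '[{"id": 10, "name": "alien"}, {"id": 11, "name": "fighting"}]'},
    {"title": "Movie B",
     "genres": '[{"id": 1, "name": "Action"}]',
     "keywords": '[{"id": 12, "name": "aliens"}, {"id": 13, "name": "fights"}]'},
    {"title": "Movie C",
     "genres": '[{"id": 2, "name": "Romance"}]',
     "keywords": '[{"id": 14, "name": "wedding"}, {"id": 15, "name": "love"}]'},
]


def extract_names(json_text):
    """Return the "name" value of every object in a JSON list."""
    return [item["name"] for item in json.loads(json_text)]


stemmer = PorterStemmer()


def build_tags(row):
    """Merge genres and keywords into one lowercase, stemmed tag string."""
    words = " ".join(extract_names(row["genres"]) + extract_names(row["keywords"]))
    return " ".join(stemmer.stem(word) for word in words.lower().split())


tags = [build_tags(row) for row in movies]

# Bag of Words: keep at most 5000 vocabulary terms (assumed setting).
# The toy vocabulary is far smaller, so these vectors have fewer columns.
vectorizer = CountVectorizer(max_features=5000)
vectors = vectorizer.fit_transform(tags)

# similarity[i][j] is the cosine of the angle between movie i and movie j.
similarity = cosine_similarity(vectors)
```

After stemming, Movie A and Movie B end up with identical tags, so their similarity is the maximum possible, while Movie C shares no terms with either and scores zero against them.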
Key Features:
Handles the metadata of the roughly 5,000 movies in the dataset efficiently.
Provides the top 5 personalized suggestions in real time.
Built using a modular approach for easy integration with web frameworks like Streamlit.
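One way the top 5 lookup could be kept modular is a single function that takes a precomputed similarity matrix and returns a plain list of titles. The sketch below is an assumption about how such a function might look, not the project's actual code; the name `recommend`, its parameters, and the toy titles and scores are hypothetical.

```python
import numpy as np


def recommend(title, titles, similarity, k=5):
    """Return the k titles most similar to `title`, excluding the title itself."""
    idx = titles.index(title)
    # Sort the row of scores from highest to lowest similarity.
    order = np.argsort(similarity[idx])[::-1]
    return [titles[i] for i in order if i != idx][:k]


# Hypothetical toy data: four titles and a symmetric similarity matrix.
titles = ["A", "B", "C", "D"]
similarity = np.array([
    [1.0, 0.8, 0.1, 0.5],
    [0.8, 1.0, 0.3, 0.2],
    [0.1, 0.3, 1.0, 0.6],
    [0.5, 0.2, 0.6, 1.0],
])

top_two = recommend("A", titles, similarity, k=2)
print(top_two)  # ['B', 'D']
```

Because the function only needs a title, a list, and a matrix, a web front end such as Streamlit could call it with whatever title the user selects and display the returned list.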
Tools & Technologies:
Language: Python
Libraries: Pandas, NumPy, Scikit-learn, NLTK
Environment: Google Colab / VS Code
Learning Experience:
Having moved from a pre-medical background into Electrical Engineering at NUST, I consider this project a significant milestone in my journey toward mastering Data Science. It helped me understand how linear algebra is applied in practice in recommendation engines.