How to Build a Movie Recommender System AI

Table of Contents

  1. Introduction
  2. Building a Movie Recommender System AI
  3. Getting Data for the Recommender System
  4. Basic Analysis of the MovieLens Dataset
  5. Personalizing the Dataset with Our Own Movie Ratings
  6. User-User Collaborative Filtering
  7. Creating a Jabril-Green-bot Hybrid Dataset
  8. Getting Movie Recommendations for Both of Us
  9. Pros and Cons of Different Neighborhood Sizes
  10. Conclusion
  11. FAQ

Building a Movie Recommender System AI

Hey there! I'm Jabril, and in this article, I'm going to show you how to build a movie recommender system AI. Recommender systems are AIs that use information about items and the ratings people have given them to recommend new things to people. These things can be ads, products, YouTube videos, or pretty much anything like that. Today, I'm going to build a recommender system for movies to hopefully find a new movie that both John-Green-bot and I want to watch for our next movie night.

Getting Data for the Recommender System

Like in previous labs, I'll be writing all of the code in Python, using a tool called Google Colaboratory. As you read this article, you can follow along with the code in your browser using the accompanying Colaboratory notebook. In these Colaboratory files, there's some regular text explaining what we're trying to do, and pieces of code that you can run by pushing the play button. These pieces of code build on each other, so you have to run them in order from top to bottom; otherwise, you might get an error. To actually run the code or make changes to it, you'll have to either click "Open in playground" at the top of the page or open the File menu and click "Save a Copy to Drive". And just an FYI: you'll need a Google account for this.

If I'm going to build a movie-recommending AI, the first thing I know is that AI systems need data. I'll need to find and import a dataset of movies, and ideally, it'll already have ratings given by lots of different people to lots of different movies, so I won't have to go through and rank every single movie by myself. That would take a while.

Thankfully, I can use an existing dataset published by MovieLens, which has about 100,000 user ratings for about 10,000 different movies. MovieLens has bigger datasets available, going up to tens of millions of ratings, but this smaller set should be enough to plan movie nights for John-Green-bot and me. I'm also going to use a library called LensKit, which comes with some nice tools for building recommender systems.

Basic Analysis of the MovieLens Dataset

Now that I have the data, let's do some basic analysis. Let's start by finding some generic recommendations, like the top-rated movies in John-Green-bot's favorite genres and in mine. Maybe we'll get lucky and find a movie on those lists that we both want to watch and haven't seen yet. But I don't have much hope for that, because we like such different movies.
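
To make the "top-rated movies in a genre" idea concrete, here's a minimal sketch in plain Python. It doesn't use the real MovieLens files or LensKit: the toy ratings, the movie titles, and the `top_rated` helper with its `min_ratings` cutoff are all assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical toy data: (user_id, movie_title, genre, rating).
ratings = [
    (1, "Movie A", "Sci-Fi", 5.0),
    (2, "Movie A", "Sci-Fi", 4.0),
    (3, "Movie A", "Sci-Fi", 4.5),
    (1, "Movie B", "Sci-Fi", 5.0),
    (1, "Movie C", "Romance", 3.0),
    (2, "Movie C", "Romance", 4.0),
    (3, "Movie D", "Romance", 2.0),
    (2, "Movie D", "Romance", 3.0),
]


def top_rated(ratings, genre, min_ratings=2, n=10):
    """Return up to n (title, mean rating) pairs for a genre, best first."""
    by_movie = defaultdict(list)
    for _user, title, movie_genre, rating in ratings:
        if movie_genre == genre:
            by_movie[title].append(rating)
    # Skip movies with too few ratings so a single fan cannot top the list.
    means = [
        (title, sum(scores) / len(scores))
        for title, scores in by_movie.items()
        if len(scores) >= min_ratings
    ]
    return sorted(means, key=lambda pair: pair[1], reverse=True)[:n]


print(top_rated(ratings, "Sci-Fi"))
print(top_rated(ratings, "Romance"))
```

Notice that "Movie B" has a perfect score but only one rating, so the cutoff keeps it off the Sci-Fi list. Lists like this are the same for everyone, which is why they don't help much when two people have very different tastes.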

So, John-Green-bot and I will need to personalize this dataset by providing some of our own movie ratings. Then, I'll use a technique known as user-user collaborative filtering to generate a set of recommendations for both me and John-Green-bot. Hopefully, there will be some overlap on those recommendation lists.

Personalizing the Dataset with Our Own Movie Ratings

To personalize our recommender system AI, we need to give it our own movie data. John-Green-bot and I each have a spreadsheet of ratings now, but I don't think they're in the right format for LensKit, so I need to check the LensKit documentation. It looks like I need to import our spreadsheets and store the data as item-rating pairs, just like the original dataset. Thankfully, Python is great for changing data formats.
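
Here's one way that conversion could look, as a rough sketch using only Python's standard library. The spreadsheet contents, the `title` and `rating` column names, the item IDs, and the `load_ratings` helper are all hypothetical; the real notebook's format may differ.

```python
import csv
import io

# Hypothetical spreadsheet export; the column names are assumptions.
jabril_csv = """title,rating
Inception,5
The Notebook,2
Some Unrated Movie,
"""

# Hypothetical title -> item ID lookup standing in for the dataset's movie table.
movie_ids = {"Inception": 101, "The Notebook": 202, "Some Unrated Movie": 303}


def load_ratings(csv_text, movie_ids):
    """Parse spreadsheet text into an {item_id: rating} dictionary."""
    pairs = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        rating = row["rating"].strip()
        if not rating:  # blank cell: this movie was not rated
            continue
        pairs[movie_ids[row["title"]]] = float(rating)
    return pairs


jabril_ratings = load_ratings(jabril_csv, movie_ids)
print(jabril_ratings)
```

Skipping blank cells matters here: an unrated movie should be missing from the data, not stored as a rating of zero.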

User-User Collaborative Filtering

In user-user collaborative filtering, each item is its own dimension. So if we have 10,000 movies in our dataset, that's 10,000 dimensions. We're not even going to try to visualize that, but we can understand the logic behind user-user collaborative filtering with a two-movie example.

To be totally honest, this is going to be a pretty simplified explanation of what the user-user algorithm does. Dealing with thousands of dimensions and lots of missing data requires a lot of clever linear algebra and statistics. But I can use the LensKit library to do this math and still understand what's happening conceptually, without diving under the hood.
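
For the two-movie example, each user is just a point with two coordinates, and "similar users" are points that lie in a similar direction. The sketch below uses cosine similarity, which is one common way to compare rating vectors; it is an assumption chosen for illustration and not necessarily the exact math LensKit performs. The user names and ratings are made up.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Each vector is (Inception rating, The Notebook rating); values are hypothetical.
me = (5.0, 1.0)
others = {
    "user_a": (4.0, 2.0),  # leans the same way I do
    "user_b": (1.0, 5.0),  # opposite taste
    "user_c": (5.0, 1.0),  # identical ratings
}

similarities = {name: cosine_similarity(me, vec) for name, vec in others.items()}
nearest = max(similarities, key=similarities.get)

for name, score in similarities.items():
    print(name, round(score, 3))
print("most similar:", nearest)
```

With 10,000 movies the vectors simply get longer, and the real algorithm also has to cope with all the movies a user never rated, which this sketch ignores.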

Creating a Jabril-Green-bot Hybrid Dataset

This is the beauty of representing the movies we like as lists of numbers! I can create a Jabril-Green-bot hybrid dataset. If both of us have rated a movie, I'll use the average of our ratings. Using the two-axis graph of Inception and The Notebook from before, this would place our Jabril-Green-bot hybrid at the midpoint between our two individual points. And if only one of us has rated a movie, I'll just add that movie rating to the list.
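
The merging rule is short enough to sketch directly. The `merge_ratings` helper and the ratings below are hypothetical, chosen only to show both cases: a movie we've both rated and a movie only one of us has rated.

```python
def merge_ratings(first, second):
    """Combine two {movie: rating} dicts, averaging movies rated by both."""
    merged = {}
    for movie in first.keys() | second.keys():
        if movie in first and movie in second:
            merged[movie] = (first[movie] + second[movie]) / 2
        else:
            # Only one of us rated it, so keep that rating unchanged.
            merged[movie] = first.get(movie, second.get(movie))
    return merged


# Hypothetical ratings for illustration.
jabril = {"Inception": 5.0, "The Notebook": 2.0, "Movie X": 4.0}
green_bot = {"Inception": 3.0, "The Notebook": 5.0, "Movie Y": 1.0}

hybrid = merge_ratings(jabril, green_bot)
for movie in sorted(hybrid):
    print(movie, hybrid[movie])
```

One thing to keep in mind with averaging: a movie one of us loves and the other hates ends up looking like a movie we both find just okay.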

Getting Movie Recommendations for Both of Us

Now that the AI system has run the user-user collaborative filtering algorithm and has clusters, I can give it our personal ratings to get its top 10 recommended movies for both John-Green-bot and me! Remember, for each of us, the user-user algorithm finds a neighborhood of similar users based on how their movie ratings compare to ours. The algorithm looks for movies that people in that neighborhood have seen and rated, but that we haven't seen yet. Then, based on the ratings in our neighborhoods, the algorithm predicts how we might rate each of those movies and prints a list of its "top 10" recommendations for us.
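
The prediction step described above can be sketched as a similarity-weighted average of the neighbors' ratings. This is a simplified stand-in, not LensKit's actual implementation: the `recommend` helper, the similarity scores, and the ratings are all assumptions for illustration.

```python
def recommend(target, neighbors, n=10):
    """Rank movies the target has not rated by predicted rating.

    target: {movie: rating} for the person we are recommending to.
    neighbors: list of (similarity, {movie: rating}) for similar users.
    """
    weighted_sum = {}
    weight_total = {}
    for similarity, ratings in neighbors:
        for movie, rating in ratings.items():
            if movie in target:  # already seen, so nothing to recommend
                continue
            weighted_sum[movie] = weighted_sum.get(movie, 0.0) + similarity * rating
            weight_total[movie] = weight_total.get(movie, 0.0) + similarity
    predictions = {
        movie: weighted_sum[movie] / weight_total[movie]
        for movie in weighted_sum
        if weight_total[movie] > 0
    }
    return sorted(predictions.items(), key=lambda pair: pair[1], reverse=True)[:n]


# Hypothetical neighborhood: (similarity to me, that user's ratings).
my_ratings = {"Inception": 5.0}
neighborhood = [
    (1.0, {"Inception": 5.0, "Movie X": 5.0, "Movie Y": 2.0}),
    (0.5, {"Movie X": 2.0}),
]

print(recommend(my_ratings, neighborhood))
```

The closer neighbor's opinion counts for more, so "Movie X" is predicted at 4.0 instead of the plain average of 3.5.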

Pros and Cons of Different Neighborhood Sizes

There isn't really a "best" minimum and maximum neighborhood size. It really depends on what I want this AI to recommend. Different parameters have different pros and cons. A small neighborhood size would mean the AI considers fewer people who have more similar movie tastes, and it has less data to make predictions. So I'm more likely to run into the "Bill Hicks: Revelations" situation from earlier, which was when recommendations of surprising or obscure movies were based on what a few people like. A big neighborhood size would mean the AI considers more people who have less similar movie tastes, and it has more data to make predictions. So I'm more likely to get movie recommendations that are generally popular and more widely known. Figuring out the best approach to clustering requires a lot of tinkering.
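
To see the trade-off in miniature, here's a hedged sketch with a hypothetical `predict` function that takes a minimum and maximum neighborhood size. The parameter names `min_nbrs` and `max_nbrs`, the similarity scores, and the movie titles are assumptions for illustration and aren't taken from the notebook.

```python
def predict(movie, neighbors, min_nbrs, max_nbrs):
    """Similarity-weighted mean rating, or None if too few neighbors rated it."""
    raters = [(sim, ratings[movie]) for sim, ratings in neighbors if movie in ratings]
    # Keep only the most similar raters, up to the maximum neighborhood size.
    raters.sort(key=lambda pair: pair[0], reverse=True)
    raters = raters[:max_nbrs]
    if len(raters) < min_nbrs:
        return None
    total_weight = sum(sim for sim, _ in raters)
    return sum(sim * rating for sim, rating in raters) / total_weight


# Hypothetical neighbors: (similarity, ratings).
neighbors = [
    (1.0, {"Obscure Special": 5.0, "Popular Hit": 4.0}),
    (0.5, {"Popular Hit": 4.0}),
    (0.5, {"Popular Hit": 3.0}),
]

# A minimum of 1 lets a single fan push an obscure movie to the top.
print(predict("Obscure Special", neighbors, min_nbrs=1, max_nbrs=30))
# Requiring 2 raters drops it, leaving only the widely rated movie.
print(predict("Obscure Special", neighbors, min_nbrs=2, max_nbrs=30))
print(predict("Popular Hit", neighbors, min_nbrs=2, max_nbrs=30))
```

Raising the minimum filters out predictions that rest on one person's opinion, while lowering the maximum keeps only the closest matches. Neither setting is right in general, which is why it takes tinkering.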

Conclusion

And that's it! We've successfully built a movie recommender system AI that can give us personalized movie recommendations based on our ratings. Of course, this is just the beginning. There are many ways to improve this system, such as using more advanced machine learning techniques or incorporating more data sources. But for now, we have a system that can help us find a movie to watch on our next movie night.

FAQ

Q: What is a recommender system? A: A recommender system is an AI that uses information about something and its social ratings to recommend new things to people. These things can be ads, products, YouTube videos, or pretty much anything like that.

Q: What is user-user collaborative filtering? A: User-user collaborative filtering is a technique used in recommender systems that clusters users based on their movie ratings and recommends movies based on the ratings of similar users.

Q: What is the Jabril-Green-bot hybrid dataset? A: The Jabril-Green-bot hybrid dataset is a dataset that combines the movie ratings of both Jabril and John-Green-bot to get personalized movie recommendations for both of them.

Q: What are the pros and cons of different neighborhood sizes? A: A small neighborhood size would mean the AI considers fewer people who have more similar movie tastes, and it has less data to make predictions. So you're more likely to run into the "Bill Hicks: Revelations" situation from earlier, which was when recommendations of surprising or obscure movies were based on what a few people like. A big neighborhood size would mean the AI considers more people who have less similar movie tastes, and it has more data to make predictions. So you're more likely to get movie recommendations that are generally popular and more widely known.
