How to implement a recommender system

Take advantage of matrix factorization and graph algorithms to give the users of your application exactly what they want

How to implement a recommender system
Thinkstock

These days, when it comes to capturing and holding an audience’s attention you must deliver the most relevant content possible. Whether searching for jobs or something more important, like looking at cat memes, your reader expects to see content relevant to them – personalized content. However, getting your users to optimal digital consumption nirvana (coin new term – check!) can be unexpectedly difficult. While it isn’t rocket science, it does involve some brain power…

Companies like Netflix and Pinterest have tackled this issue and pioneered personalization for their audiences. Through strategic integration of content and product recommendations systems, these two digital powerhouses have seen measurable ROI. Netflix says that its personalization platform saves the company more than $1B annually, and Pinterest estimates that more than 40 percent of user engagement is powered by its Related Pins recommender systems.

So what about the rest of us, who don’t have the same kinds of resources? Can we do recommender systems too? Yes, and here’s how. Read on to learn how two commonly used algorithms – matrix factorization and a bipartite graph—can be used to deliver personalization in an application.

Matrix factorization

Perhaps the most common type of recommender system algorithm is matrix factorization. The idea behind matrix factorization is to break a user-item feature matrix into a product of matrices, which end up becoming “user latent” features and “item latent” features. Latent features are hidden features that are derived from observed features using matrix factorization.

recommender systems fig 01 Stream

In the above image, matrix A is a sparse matrix filled with user to item ratings. Ratings can take the form of explicit feedback (thumbs up or thumbs down) or implicit feedback (clicks or view time). One important thing to keep in mind is that it is usually a good idea to add user and item biases to take into account the popularity of items and discretion of users. For instance, some users are more liberal with their ratings and some items are just more popular. Now, we can multiply a user feature vector with the corresponding item feature vector to guess what that user may think of that item.

recommender systems fig 02 Stream

There are many nice features that matrix factorization provides. In addition to allowing us to predict what a user may think of another item, the latent feature vectors of the items and users also allow us to compute similarity between other items and users. Let’s take a look at how Netflix could use something like this.

Matrix factorization example—Netflix

A typical Netflix customer loses interest after perhaps 60 to 90 seconds of browsing. Out of thousands of choices, Netflix needs to show the customer something of interest after viewing between 10 and 20 titles, or risk the customer getting bored and churning. So how does Netflix do that? Netflix has something called the Personalized Video Ranker (PVR). As anyone who uses Netflix knows, there are several different styles of recommendations it shows you (broken out by rows). Let’s take a look at how Netflix could use matrix factorization to populate the Because You Watched and Top Picks rows.

recommender systems fig 03 Netflix

The Because You Watched row is based on video-video similarity. It’s a good thing we just found a way to compute item similarity through item latent feature vectors. 

Let’s pretend that Netflix’s database is made up of the MovieLens 100k dataset, and create a quick code example to show how item similarities could be used using the awesome library LightFM.

The first few lines are taken from the example at the GitHub repo, and give us some interesting information to analyze. Let’s first import the necessary libraries:

from lightfm import LightFM
from lightfm.datasets import fetch_movielens
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

Then download and train a model:

# Load the MovieLens 100k dataset. Only five
# star ratings are treated as positive.
data = fetch_movielens(min_rating=5.0)
# Instantiate and train the model
model = LightFM(loss=’warp’)
model.fit(data[‘train’], epochs=30)

This dataset is so nicely formatted that we can see what the item feature labels are:

print(data[‘item_feature_labels’])

[u’Toy Story (1995)’ u’GoldenEye (1995)’ u’Four Rooms (1995)’ ...,
 u’Sliding Doors (1998)’ u’You So Crazy (1994)’
 u’Scream of Stone (Schrei aus Stein) (1991)’]

Awesome! Now let’s see if we can find the movies most similar to “Toy Story (1995)” and “GoldenEye (1995).”

Movies similar to “Toy Story”:

print(data[‘item_labels’][cosine_similarity(
    model.item_embeddings)[0].argsort()][-10:][::-1])

[u’Toy Story (1995)’ u’Sleepless in Seattle (1993)’ u’Star Wars (1977)’
 u’Independence Day (ID4) (1996)’ u’Grease (1978)’
 u’Better Off Dead... (1985)’ u’Star Trek: Generations (1994)’
 u’Emma (1996)’ u”Jackie Chan’s First Strike (1996)”
 u’Robin Hood: Prince of Thieves (1991)’]

Movies similar to “GoldenEye”:

print(data[‘item_labels’][cosine_similarity(
    model.item_embeddings)[1].argsort()][-10:][::-1])

[u’GoldenEye (1995)’ u’Star Trek IV: The Voyage Home (1986)’
 u’Stargate (1994)’ u’Highlander (1986)’ u’Tombstone (1993)’
 u’Die Hard: With a Vengeance (1995)’ u’Quick and the Dead, The (1995)’
 u’Dragonheart (1996)’ u’True Lies (1994)’ u’Sneakers (1992)’]

You can see that the movie most similar to “Toy Story” is “Toy Story,” and the one most similar to “GoldenEye” is “GoldenEye,” which is a nice sanity check that things are working properly (whew). The rest of the list for these two movies seems pretty relevant. This is good news considering we didn’t have to know anything about the content of these movies, just the interactions that people had with them. Now let’s look into making our Top Picks. 

recommender systems fig 04 Netflix

By computing all predicted ratings for each movie for an individual user, one can easily take the top X movies and present it in a row. LightFM has plenty of good examples showing this. One thing that isn’t abundantly clear is what to do when you have a new user (bummer). In this situation we can’t use standard matrix factorization because we haven’t collected any information about that user’s preferences.

One way to get around this is to suggest the most popular content on the page until we can provide more personalized recommendations. The item biases we calculated can assist in this process. The model we trained above automatically calculated item biases for us. These item biases not only normalize the feature vectors, but also can show us what the most popular movies are.

print(data[‘item_labels’][model.item_biases.argsort()][-10:])

[u’Pulp Fiction (1994)’ u’Toy Story (1995)’
 u’Shawshank Redemption, The (1994)’ u’Return of the Jedi (1983)’
 u”Schindler’s List (1993)” u’English Patient, The (1996)’
 u’Godfather, The (1972)’ u’Fargo (1996)’ u’Titanic (1997)’
 u’Star Wars (1977)’]

I’d say those are some pretty darn good movies!

Graph algorithms

One issue that many larger companies run into is sparsity of the user-item matrix. For example, Pinterest has to power recommendations for 100 billion ideas for 150 million people (yikes), so can expect sparse data for a significant number of users.

One way to tackle this problem is to break the recommendation engine into two parts: candidate generation and personalization. By using graph algorithms to provide 1000 candidates out of billions, it is possible to provide recommendations even when there’s extreme sparsity in the data. 

Graph algorithm example—Pinterest

Pinterest’s goal is to be a visual discovery engine for the Internet. As noted earlier, its Related Pins recommender system drives more than 40 percent of user engagement. However, trying to stuff that into a user-item matrix would cause a whole host of problems. Pinterest has a two-step method to generate relevant content: Use a candidate generation system called Pixie to generate around a 1000 relevant pins, and then further utilize machine learning to do ranking before presenting the pins to the user.

By breaking out pins and boards into two separate sets of nodes and having links represent pins, it is possible to create a bipartite graph of pins and boards:

recommender systems fig 05 Stream

Now when a user pins an item to one of his or her boards it is possible to start a random walk with reset at the specific pin. By taking two steps, one from pin to board, and another from board to pin, and repeating, it is possible to count the number of times each pin was hit and see how relevant it is.

This ends up being a very general model that can be used for lots of recommendation types. Twitter’s Who To Follow algorithm [PDF] works similarly for suggesting follow recommendations. The same algorithm could also be used for movie recommendations, product recommendations, or basically anything that can be broken out into two sets of nodes and made into a bipartite graph. You can read about more applications for graph algorithms in this blogpost.

So while not easy, it is possible to create optimal digital consumption nirvana (see, it’s catching on) for your users through personalization and machine learning. Matrix factorization and graph algorithms are two excellent places to start.

Stream engineer and data scientist Balazs Horanyi has always had a love of data, math, and programming. At Stream, he primarily focuses on personalizing feeds and application engagement using machine learning. 

New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to newtechforum@infoworld.com.