Spotify's Machine Learning Revolution
Table of Contents
- Introduction
- Spotify's Catalog and Playlists
- Reinforcement Learning and Music
- Programming Language for Music
- Scaling the Product for Users
- Machine Learning and Playlists
- Collaborative Filtering and Success
- Universal Embeddings and Semantic Vectors
- Personalization and Mainstream Recommendations
- Conclusion
Spotify's Catalog and Playlists
Spotify is a music streaming platform with more than 50 million tracks and more than 3 billion playlists. That is roughly 60 playlists for every track, which is a staggering statistic. From a machine learning perspective, this presents an interesting challenge. The state space of all the tracks is vast, and there are countless journeys that can be taken through it. The playlists can be thought of as people helping themselves and each other by creating interesting vectors through this space of tracks.
Reinforcement Learning and Music
If we think about this purely in reinforcement learning terms, there are billions of paths that make sense across many tens of millions of atomic units, and we are probably quite far from having found all of them. The job now is to help users navigate this vast space of music. When Spotify started, it was just a search box that was, for its time, pretty powerful. However, it wasn't enough. If you had the catalog and a good search tool, you could create your own sessions and a soundtrack for your entire life, perfectly personalized because you did it yourself. But most people aren't that good at music, and they just can't spend the time. Even if you're very good at music, it's going to be hard to keep up.
Programming Language for Music
To scale the product for people who are good at music, Spotify created a kind of programming language called playlisting. If you were pretty good at music, you knew your new releases, your back catalog, your Stairway to Heaven, and you could create a soundtrack for yourself using playlists. It's like metaprogramming, a language for music: a way of saying what your life sounds like. People who were good at music were able to use it to create really good soundtracks for themselves.
Scaling the Product for Users
The problem was that there were never enough editors to create playlists for everyone personally. So Spotify acquired Tunigo, a company of editors and professional playlisters, and made the most of human intelligence to help build these vectors through the track space for people. This broadened the product. The next step was to use statistics: after creating a playlist, the editors could see how each song performed, and they manually iterated on the playlist to maximize its performance for a large group of people.
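As a rough illustration of that statistical step, the sketch below computes a per-track completion rate from play events and flags the weakest tracks for an editor to review. The metric (share of plays listened to the end), the `threshold` value, and the function names are assumptions made for this example; the text does not say which performance measures the editors actually used.

```python
from collections import Counter

def completion_rates(plays):
    """Return {track_id: share of plays that were completed}.

    `plays` is an iterable of (track_id, completed) pairs, where
    `completed` is True if the listener played the track to the end.
    This metric is a hypothetical stand-in for "performance".
    """
    totals, completed_counts = Counter(), Counter()
    for track_id, completed in plays:
        totals[track_id] += 1
        completed_counts[track_id] += int(completed)
    return {t: completed_counts[t] / totals[t] for t in totals}

def tracks_to_review(plays, threshold=0.5):
    """List tracks whose completion rate falls below `threshold`."""
    rates = completion_rates(plays)
    return sorted(t for t, rate in rates.items() if rate < threshold)

# Toy play log for one playlist (invented data).
plays = [
    ("track_a", True), ("track_a", True),
    ("track_b", False), ("track_b", True), ("track_b", False),
]
print(tracks_to_review(plays))
```

An editor would still decide what to do with a flagged track; the statistics only point at where to look.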
Machine Learning and Playlists
The promise of machine learning was to go from group personalization, done with editors, tools, and statistics, to individualization. The 3 billion playlists on Spotify are fascinating data that is ripe for machine learning. People group tracks that have some semantic meaning to them, and then they label each group with a playlist name. In a sense, people are grouping tracks along semantic dimensions and labeling them. Could you use that information to find the latent embedding? Spotify started experimenting with collaborative filtering and saw tremendous success with it.
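One way to picture "finding the latent embedding" is to factorize a playlist-by-track matrix so that tracks grouped together by many people end up close to each other. The sketch below uses a truncated SVD from NumPy on invented toy playlists; it is an illustrative stand-in, not a description of Spotify's actual models, and the playlist names, track IDs, and the `dim` parameter are all hypothetical.

```python
import numpy as np

def track_embeddings(playlists, dim=2):
    """Embed tracks by factorizing a binary playlist x track matrix.

    `playlists` maps a playlist name to a list of track IDs.
    `dim` must not exceed min(number of playlists, number of tracks).
    """
    tracks = sorted({t for members in playlists.values() for t in members})
    column = {t: i for i, t in enumerate(tracks)}
    matrix = np.zeros((len(playlists), len(tracks)))
    for row, members in enumerate(playlists.values()):
        for t in members:
            matrix[row, column[t]] = 1.0
    # Keep the top `dim` right singular vectors, scaled by singular value.
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    emb = vt[:dim].T * s[:dim]
    return {t: emb[column[t]] for t in tracks}

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Invented toy playlists: the names act as human-written labels.
playlists = {
    "workout": ["rock_1", "rock_2", "edm_1"],
    "gym mix": ["rock_1", "rock_2"],
    "chill evening": ["jazz_1", "jazz_2"],
    "late night jazz": ["jazz_1", "jazz_2", "edm_1"],
}
emb = track_embeddings(playlists)
print(cosine(emb["rock_1"], emb["rock_2"]))  # tracks that share playlists
print(cosine(emb["rock_1"], emb["jazz_1"]))  # tracks that never co-occur
```

In this toy data the two rock tracks appear in exactly the same playlists, so their vectors coincide, while a rock track and a jazz track end up far less similar.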
Collaborative Filtering and Success
Collaborative filtering is a technique used by recommender systems to predict a user's interests by collecting preference or taste information from many users. Spotify found that people who playlisted a lot retained much better and had a great experience. The recommendations performed fantastically for people with very unique taste, probably because they all playlisted. However, they didn't perform as well for mainstream listeners, who found them a bit too particular and unorthodox.
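A minimal sketch of the idea, assuming the simplest item-to-item variant: count how often two tracks appear in the same playlist, then recommend the tracks that co-occur most with what a listener already plays. The function names and the toy playlists are hypothetical, and production systems use far richer signals than raw co-occurrence counts.

```python
from collections import Counter, defaultdict
from itertools import combinations

def build_cooccurrence(playlists):
    """Count, for every pair of tracks, how many playlists contain both."""
    co = defaultdict(Counter)
    for tracks in playlists:
        for a, b in combinations(sorted(set(tracks)), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co

def recommend(co, listened, k=3):
    """Rank unseen tracks by total co-occurrence with `listened` tracks."""
    scores = Counter()
    for track in listened:
        for other, count in co.get(track, {}).items():
            if other not in listened:
                scores[other] += count
    return [track for track, _ in scores.most_common(k)]

# Toy playlists made by many different users (invented data).
playlists = [
    ["a", "b", "c"],
    ["a", "b"],
    ["c", "d"],
]
co = build_cooccurrence(playlists)
print(recommend(co, {"a"}, k=2))
```

Because the counts come only from people who make playlists, the recommendations inherit those people's taste, which is consistent with the results being strongest for listeners who playlist heavily.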
Universal Embeddings and Semantic Vectors
What's interesting about the 3 billion playlists on Spotify is that they represent a universal embedding that holds across people on this earth. The embeddings do ultimately reflect the people who playlisted: if a lot of indie lovers playlist, the embeddings will perform better for listeners like them. Even so, the latent similarities were very powerful, and Spotify had them. People were taking these tens of millions of tracks and grouping them along different semantic vectors.
Personalization and Mainstream Recommendations
It was surprising that the algorithms performed best for music aficionados, who were really into music and often had taste geared toward a certain type of music. The recommendations performed poorly for mainstream listeners, who found them too particular and unorthodox. This was the complete opposite of what Spotify expected: it succeeded at the hardest problem first, and then had to work on scaling to more mainstream recommendations.
Conclusion
Spotify's vast catalog and billions of playlists present an interesting challenge for machine learning. The promise of machine learning is to go from group personalization, done with editors, tools, and statistics, to individualization. Collaborative filtering, which predicts a user's interests from the preferences of many users, was the first approach to show tremendous success. The 3 billion playlists on Spotify represent a universal embedding that holds across people on this earth, although it reflects the people who playlist. As a result, the algorithms performed best for music aficionados with a distinct taste, and extending that success to mainstream listeners was the next problem to solve.