How Many Fists Does It Take to Get a Good Movie Rating? - Using the Apriori Algorithm to Analyse Bud Spencer and Terence Hill Movies
Disclaimer
As usual, I will not go too deep into the mathematics of the algorithm in order to not scare away or bore people without a math background or simply no interest in it. If you are interested in the calculations, feel free to reach out to me directly via mail. If you are a member on Medium, I can also recommend the blog post published by Towards Data Science: Apriori Algorithm for Association Rule Learning — How To Find Clear Links Between Transactions.
The apriori algorithm is a popular method in market basket analysis to identify pairs of items which are bought together frequently. This knowledge enables retailers to create bundles or discounts.
In general, the apriori algorithm identifies associations between data points based on frequency. It can also be used to identify possible drivers for high or low frequencies of pairings.
In this blog post, I would like to show you the possible applications of this simple algorithm outside of retail. Let’s find out whether Bud Spencer and Terence Hill were better off as a team or rather by themselves.
The Troublemakers
Growing up with only one radio and TV in the house, my sisters and I mostly had to listen to or watch what our parents preferred. Thankfully, their taste was usually quite good. They would watch movies combining both action and comedy. A sure source of entertainment in our household was always an action comedy with Bud Spencer and Terence Hill.
I remember watching movies starring only one of the two actors, but I preferred the ones with them as a team. Is it just my subjective memory that those movies were more fun to watch, or do other people agree as well? Let’s see what mathematics has to say about that by analysing the IMDb movie ratings.
If the Metric Doesn’t Fit - Then Make It Fit
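For the analysis, an algorithm is needed which can indicate which actor drives the good ratings. The apriori algorithm can do that for specific data types, although strictly speaking it measures how strongly things are associated rather than proving causality. It is designed for data containing transactions; in other words, the data of interest needs to be countable.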
The movie rating is not a countable metric. However, it can be transformed into a countable data point. Instead of looking at the rating values, I classify them into good and not good ratings.
The threshold for the decision is rather subjective. I decided on setting it at 6. The average rating for all movies starring the 2 actors is 6.3, with a max rating of 8.7. Bud Spencer’s highest rating is 7.4.
When setting the threshold higher than 6, too many movies are excluded, and this analysis would turn out rather meaningless for a blog post. You know the saying “Only trust a statistic you forged yourself”. That basically applies to all statistics: they can never be fully objective. At least now you know my motives.
By applying this transformation to all Bud Spencer and Terence Hill movies, I can now count them and apply the apriori algorithm.
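Here is a minimal sketch of what this transformation could look like in Python. The titles, casts and ratings are invented placeholders, not the real IMDb data, and the names `movies`, `GOOD_THRESHOLD` and `to_transaction` are my own. I also assume that a rating of exactly 6 counts as good.

```python
# Placeholder data, invented for illustration only (not the real IMDb ratings).
movies = [
    {"title": "Movie A", "cast": {"Bud Spencer", "Terence Hill"}, "rating": 7.0},
    {"title": "Movie B", "cast": {"Bud Spencer"}, "rating": 5.4},
    {"title": "Movie C", "cast": {"Terence Hill"}, "rating": 6.0},
    {"title": "Movie D", "cast": {"Bud Spencer"}, "rating": 6.8},
]

GOOD_THRESHOLD = 6.0  # the threshold chosen in this post


def to_transaction(movie, threshold=GOOD_THRESHOLD):
    """Return the actors who get credit for a good rating in this movie.

    Assumption: a rating equal to the threshold already counts as good.
    A movie below the threshold becomes an empty transaction.
    """
    if movie["rating"] >= threshold:
        return frozenset(movie["cast"])
    return frozenset()


# One countable transaction per movie.
transactions = [to_transaction(m) for m in movies]
```

Each movie now is either an empty set (not good) or the set of actors starring in a well-rated movie, which is something that can be counted.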
Which Actor Is The Hitting… I Mean Driving Force?
The algorithm consists of 4 steps, which I’ll explain straight away with our example (no boring math theory here, please read on):
Support
What is the share of movies with a good rating for Bud Spencer and for Terence Hill, respectively? This includes the movies starring both actors.
→ The support for Terence Hill is slightly higher, as he has more movies with a good rating than Bud Spencer.
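For those who like to see it in code, support can be sketched as follows. The ten transactions are a toy example I made up, so the resulting numbers are not the ones from my analysis. I assume that each set holds the actors of one movie if it was rated good and is empty otherwise.

```python
# Toy transactions, invented for illustration (not the real filmography).
# Each set holds the actors of one movie if it was rated good, else it is empty.
transactions = [
    {"Bud", "Terence"}, {"Bud", "Terence"}, {"Bud", "Terence"},
    {"Bud"},
    {"Terence"}, {"Terence"},
    set(), set(), set(), set(),
]


def support(itemset, transactions):
    """Share of all transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


support_bud = support({"Bud"}, transactions)              # 4 of 10 movies
support_terence = support({"Terence"}, transactions)      # 5 of 10 movies
support_both = support({"Bud", "Terence"}, transactions)  # 3 of 10 movies
```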
Confidence
Confidence (Bud → Terence): Given that a movie starring Bud Spencer got a good rating, what is the probability that Terence Hill also starred in it?
Confidence (Terence → Bud): Given that a movie starring Terence Hill got a good rating, what is the probability that Bud Spencer also starred in it?
→ Terence Hill’s confidence is slightly higher than Bud Spencer’s. Of the well-rated movies starring Bud Spencer, 61% also starred Terence Hill. Of the well-rated movies starring Terence Hill, 55% also starred Bud Spencer.
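As a small sketch, confidence is just a ratio of two counts. The transactions below are again a made-up toy example (each set holds the actors of a well-rated movie, an empty set stands for a movie without a good rating), so the results differ from the 61% and 55% above.

```python
# Toy transactions, invented for illustration (not the real filmography).
transactions = [
    {"Bud", "Terence"}, {"Bud", "Terence"}, {"Bud", "Terence"},
    {"Bud"},
    {"Terence"}, {"Terence"},
    set(), set(), set(), set(),
]


def confidence(antecedent, consequent, transactions):
    """Confidence(antecedent -> consequent).

    Of all transactions containing the antecedent, the share that
    also contains the consequent.
    """
    with_antecedent = [t for t in transactions if antecedent in t]
    with_both = [t for t in with_antecedent if consequent in t]
    return len(with_both) / len(with_antecedent)


conf_bud_to_terence = confidence("Bud", "Terence", transactions)  # 3 of 4
conf_terence_to_bud = confidence("Terence", "Bud", transactions)  # 3 of 5
```

Note that the two directions give different values, because the denominator changes with the actor on the left-hand side of the rule.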
Lift
How much does the probability increase that Bud Spencer’s movies get a good rating given that Terence Hill is also starring in it? And the other way round for Terence Hill?
→ The probability of a good rating increases by 21%. The lift is 1.21. Lift is symmetric, so it is always the same for both items, or actors in our example.
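To illustrate why the lift is the same in both directions, here is a minimal sketch with made-up counts (3 well-rated movies with both actors, 4 with Bud, 5 with Terence, 10 movies in total). These counts are an assumption for illustration and do not reproduce the 1.21 from my analysis.

```python
def lift(n_both, n_x, n_y, n_total):
    """Lift of the rule x -> y, computed from raw counts.

    lift = support(x and y) / (support(x) * support(y))
    """
    support_both = n_both / n_total
    support_x = n_x / n_total
    support_y = n_y / n_total
    return support_both / (support_x * support_y)


# Made-up counts, for illustration only.
lift_bud_terence = lift(n_both=3, n_x=4, n_y=5, n_total=10)
lift_terence_bud = lift(n_both=3, n_x=5, n_y=4, n_total=10)

# A lift above 1 reads as a percentage increase over independence.
increase_percent = (lift_bud_terence - 1) * 100
```

Swapping the two actors only swaps the factors in the denominator, which is why both directions give the same lift.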
Conviction
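This is where direction, the closest this method gets to causality, comes in!
How much more often would the rule “a well-rated Bud Spencer movie also stars Terence Hill” fail if the two actors had nothing to do with each other, compared to how often it actually fails? And the other way round?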
→ Bud Spencer’s conviction is 1.21 and Terence Hill’s is 1.27. Both are greater than 1, which means both actors benefit from starring together in a movie. Bud Spencer benefits a little bit more from Terence Hill. If the conviction were equal to 1, there would be no relationship. The higher the conviction (there is no upper limit), the stronger the relationship.
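The conviction values can be re-derived from the rounded figures quoted in this post (confidences of 61% and 55%, lift of 1.21). The sketch below uses the standard definition conviction = (1 - support of the right-hand side) / (1 - confidence) and backs out the support from lift = confidence / support. The function name and the step via the lift are my own choices for illustration.

```python
# Rounded figures as quoted in this post.
LIFT = 1.21
CONF_BUD_TO_TERENCE = 0.61
CONF_TERENCE_TO_BUD = 0.55


def conviction(confidence, lift):
    """Conviction of a rule, given its confidence and lift.

    Because lift = confidence / support(consequent), the support of the
    right-hand side can be recovered as confidence / lift.
    """
    consequent_support = confidence / lift
    return (1 - consequent_support) / (1 - confidence)


conviction_bud_to_terence = conviction(CONF_BUD_TO_TERENCE, LIFT)
conviction_terence_to_bud = conviction(CONF_TERENCE_TO_BUD, LIFT)

print(round(conviction_bud_to_terence, 2))  # 1.27
print(round(conviction_terence_to_bud, 2))  # 1.21
```

Rounded to two decimals, this gives back the 1.27 and 1.21 from above, so the reported numbers are consistent with each other.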
Conclusion
Even though the conviction for both actors is not that much greater than 1, I would still conclude that people prefer seeing both actors together in a movie.
Which Bud Spencer or Terence Hill movie is your favourite? Do you have another favourite team-up in the world of film? Let me know!
Hi, I'm Nadine. I empower people through comprehensive training and coaching in data analysis and mathematical modeling, equipping them with the tools and knowledge to excel professionally. If you’re interested in finding out how I can support you in your learning journey, book a free 30-minutes introduction call with me right here, send an email to nadine@mathemalytics.com, or connect with me on LinkedIn.