Forza Prediction Kit is an experimental project which locally stores the data we already log to AWS, and queries it to do what might be machine learning (I have no idea what machine learning really is). Before we dive into what this means, here are two use-cases for Forza Prediction Kit™:
Based on the recently viewed match tabs we can predict which tab a user will want to open the next time they open a match. This lets the user quickly check all the lineups for upcoming games, or all the highlights for recent Champions League games.
Based on which matches a user has recently viewed we can predict non-favourite matches that are relevant for the user.
Neither of these features stores any extra data. Instead, prediction kit locally stores everything we already log to AWS, queries it, and then lets the results vote on which of the possible alternatives we should show.
It sounds a bit weird, but it’s not that complicated. Let’s dive into how the two above examples work. You might be able to follow along with just some basic JSON knowledge.
When we open a match we have five alternatives for which tab we should show. We could show the event list, statistics, standings, media or lineups. Today we show the event list unless the user opens the match from a notification or a link.
To predict which tab might be the most relevant for the user we’d look at the tabs the user has opened previously, say the tabs they had open for the last 10 matches they closed. We use the last 10 since matches closed before that are probably forgotten by the user, and we only look at the last tab open when the match is closed to avoid the noise from switching around in the same match. view_loaded is something we already log to AWS and it has this information, which means it’s something prediction kit stores and can query.
We now have five alternatives to pick from, and ten events from the user’s history to decide with. The next step is to let each event vote on each alternative. The basic idea is that a viewed-lineup event will vote for the lineup alternative. So if you viewed the lineups 5 times, the event list 3 times and statistics 2 times, they will get 5, 3 and 2 votes respectively.
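As a minimal sketch of that unweighted tally (the tab names and the `tally_votes` helper are made up for illustration and are not taken from the app):

```python
from collections import Counter

# Hypothetical tab identifiers; the real app's names may differ.
TABS = ["events", "statistics", "standings", "media", "lineups"]


def tally_votes(recent_tabs):
    """Give each alternative one vote per matching event in the history."""
    counts = Counter(recent_tabs)
    return {tab: counts.get(tab, 0) for tab in TABS}


# The last tab open in each of the ten most recently closed matches.
history = ["lineups"] * 5 + ["events"] * 3 + ["statistics"] * 2
votes = tally_votes(history)
```

With this history the lineups get 5 votes, the event list 3 and statistics 2, while standings and media get none.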
We are not done yet, though. The most recent event is more important in predicting which alternative to pick than the event that happened before it, and the same goes for the event that happened before the second, and so forth. To solve this we give each event’s vote a weight based on the order (x) in which the events happened.
The specific function is picked as it gives the most recent event a heavy vote, then sharply falls and nicely flattens.
We also want it to be easy to come back to the event list, so votes for the event list are multiplied by 1.5, but that’s it – we have now predicted a tab that the user would likely have switched to.
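Here is a rough sketch of the weighted vote. The weight function used in prediction kit is not reproduced here, so `weight(x) = 1 / x` is purely a stand-in with a similar shape (heavy first vote, sharp drop, then flattening). The tab names and the `predict_tab` helper are also hypothetical; only the 1.5 multiplier comes from the text.

```python
# Hypothetical tab identifiers; the real app's names may differ.
TABS = ["events", "statistics", "standings", "media", "lineups"]
EVENT_LIST_BOOST = 1.5  # from the post: event-list votes are multiplied by 1.5


def weight(x):
    """Stand-in weight for the x-th most recent event (x = 1 is the newest).

    ASSUMPTION: 1/x is not the function prediction kit uses, only one
    with a similar shape.
    """
    return 1.0 / x


def predict_tab(recent_tabs):
    """recent_tabs is ordered newest first; returns (winning tab, all scores)."""
    scores = {tab: 0.0 for tab in TABS}
    for x, tab in enumerate(recent_tabs, start=1):
        if tab in scores:
            scores[tab] += weight(x)
    scores["events"] *= EVENT_LIST_BOOST
    return max(scores, key=scores.get), scores


# Newest first: the user last looked at statistics, but mostly at lineups before.
history = ["statistics", "lineups", "lineups", "events", "lineups"]
best, scores = predict_tab(history)
```

With this stand-in weight the lineups narrowly beat statistics even though statistics was the most recent tab, since the three slightly older lineup events add up to more than the single newest vote.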
There are of course a lot of potential improvements, but I think this already works rather well. Note that we aren’t just restoring the previous state for the next match – we predicted a state the user wants based on their recent history.
If you are curious, the following is the query I’ve used. Hopefully it should look somewhat familiar if you’ve mucked around with BigQuery (t1 is the viewed tab and t2 is what was viewed afterwards; we are looking for “match info” views followed by the loading of a non “match info” view):
SELECT t1.page, t2.page as next_page
FROM (viewloaded t1, viewloaded t2)
WHERE t1.rowid = t2.rowid-1
AND t1.page LIKE 'MATCH INFO%'
AND NOT t2.page LIKE 'MATCH INFO%'
ORDER BY t1.rowid DESC
LIMIT 10
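To try the query outside the app, here is a small sketch using Python's built-in sqlite3 module. It assumes the local store behaves like SQLite (the query relies on rowid), and the one-column table layout and the page names are invented for the example. The join is written with an explicit JOIN, which should be equivalent to the comma form above.

```python
import sqlite3

# ASSUMPTION: a one-column stand-in for the locally stored view_loaded log.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE viewloaded (page TEXT)")

# Made-up page names, inserted oldest first so rowid follows event order.
pages = [
    "MATCH INFO 1", "LINEUPS",
    "MATCH INFO 2", "MATCH INFO 2 DETAILS", "STATISTICS",
    "MATCH INFO 3", "LINEUPS",
]
conn.executemany("INSERT INTO viewloaded (page) VALUES (?)", [(p,) for p in pages])

QUERY = """
SELECT t1.page, t2.page AS next_page
FROM viewloaded t1
JOIN viewloaded t2 ON t1.rowid = t2.rowid - 1
WHERE t1.page LIKE 'MATCH INFO%'
  AND t2.page NOT LIKE 'MATCH INFO%'
ORDER BY t1.rowid DESC
LIMIT 10
"""

# Each row pairs a "match info" view with the non "match info" view loaded right after it.
rows = conn.execute(QUERY).fetchall()
```

The pair where one "match info" view is followed by another is filtered out, so only the views that lead to a different kind of page remain, newest first.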
Suggesting matches for the user might seem like a very different task, but it’s quite similar. To begin with we have a much larger set of alternatives – all matches being played on a certain date that aren’t in the user’s favourites view. To predict which matches are interesting for the user we’ll look at the matches they have viewed before (again, viewed_match is something we already log so prediction kit can query it).
Once we have the matches a user recently viewed we will again let each event (viewed_match) vote on each alternative (a non-favourite match). This time the voting is a bit more complicated as we have to consider the match, the teams and the tournament. My simple implementation could be expanded on a lot. As I’ve implemented it a full vote is given if the viewed_match is the match being voted on, half a vote is given if either the tournament or one of the teams is the same.
So a Barcelona – PSG (Champions League) event would give a full vote to the Barcelona – PSG (Champions League) match, and half a vote to Monaco FC – Man City (Champions League) or Real Madrid – Barcelona (La Liga). This means that matches for the teams and competitions you’ve viewed would be suggested, and that specific matches you’ve viewed would be easy to find again.
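The voting rule for a single event could be sketched roughly like this. The `Match` type and `vote` function are hypothetical, and treating two matches as "the same" when teams and tournament are equal is a simplification (a real implementation would more likely compare match ids). Recency weights are left out here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    home: str
    away: str
    tournament: str


def vote(viewed: Match, candidate: Match) -> float:
    """Vote from one viewed_match event on one candidate match."""
    if viewed == candidate:
        return 1.0  # full vote: the very match that was viewed
    shares_team = bool({viewed.home, viewed.away} & {candidate.home, candidate.away})
    if shares_team or viewed.tournament == candidate.tournament:
        return 0.5  # half vote: same tournament or a team in common
    return 0.0


viewed = Match("Barcelona", "PSG", "Champions League")
candidates = [
    Match("Barcelona", "PSG", "Champions League"),
    Match("Monaco FC", "Man City", "Champions League"),
    Match("Real Madrid", "Barcelona", "La Liga"),
    Match("Arsenal", "Chelsea", "Premier League"),  # made-up unrelated fixture
]
votes = [vote(viewed, c) for c in candidates]
```

The first three candidates are the ones from the example above and get 1, 0.5 and 0.5 votes; the unrelated fixture gets none.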
Again we’d apply a weight based on the event order to make newer events count more than older events. This time with a less sharp initial drop, as the most recently viewed match isn’t as strong a predictor as the most recently viewed tab. And again, that’s it – we have now predicted matches that the user would likely be interested in.
Hopefully the key takeaway from reading this is that it’s not necessarily hard to implement features that use predictions based on the user’s history. The sky’s the limit on what it can be used for. Search suggestions? Suggesting the user removes stale favourites? Smart ads?
I also want to note that the reason I’ve used the same data store as AWS/BigQuery is that I hope it could allow a workflow where you discover something interesting in BigQuery and are then able to implement it quickly in the app.
If you want to test the two features I’ve implemented you’ll find them on the frz-prediction-kit branch. No time has been spent on performance though, and the database will slowly but surely grow and take up all your space. Again, it’s an experimental project.
This project was inspired by rambling to Gustav, and then remembering the essay Magic Ink by Bret Victor. I’ve recommended reading it before and I’ll do it again. To finish up, here’s another video: