Ever since Netflix announced that they would pay a million dollars to anyone who could significantly improve their recommendation engine, I’ve wondered what it would take. Now I think I know: a philosopher.
For those of you who have been wondering, dozens of individuals and teams have taken the challenge. They’ve downloaded the 100 million-record preference dataset from Netflix and crunched the numbers earnestly, with varying results. As of this writing, NIPS Reject is in the lead, with a lift over the Netflix algorithm of 6.13 percent. (Tough luck, WXYZ Consulting – you’ve been in the lead for nearly a month, but your 6.11 percent just got topped.)
With an additional 3.87 percentage points yet to be racked up, the road to victory is long — possibly impassable. If I understand my statistical modeling correctly, every unit of progress to that 10 percent goal will be a far tougher slog than the one before it. There clearly needs to be a breakthrough in how the problem is approached if anyone has a chance of winning. A couple of days ago, it occurred to me that the source of this breakthrough might be a better ontology.
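For the curious, here is roughly how that lift figure can be computed. This is only a sketch: it assumes the score is root-mean-square error (RMSE) between predicted and actual star ratings, with lift as the percentage reduction against a baseline. The function names and the tiny rating lists are my own inventions for illustration, not anything from the contest data.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length lists of ratings."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(actual))


def lift_percent(baseline_rmse, challenger_rmse):
    """Percentage by which the challenger's error undercuts the baseline's."""
    return 100.0 * (baseline_rmse - challenger_rmse) / baseline_rmse


def points_remaining(current_lift, goal=10.0):
    """Percentage points still needed to reach the goal."""
    return goal - current_lift


# Made-up ratings, purely for illustration.
actual = [4, 3, 5, 2, 4]
baseline_guess = [3.5, 3.5, 4.0, 3.0, 3.5]
challenger_guess = [3.8, 3.2, 4.5, 2.5, 3.9]

lift = lift_percent(rmse(baseline_guess, actual), rmse(challenger_guess, actual))
print(f"lift over baseline: {lift:.2f} percent")
print(f"points left for the leader: {points_remaining(6.13):.2f}")
```

Because the lift is a ratio of errors, each extra point means squeezing more out of an error that is already smaller, which is one way to see why the last stretch is the hardest.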
Ontology is the study of logically structured categorical models. It helps us understand a particular domain of reality by looking at its essential elements, and especially, how they are interconnected. Because ontology proposes to explain big complicated things, this discipline was honed first by philosophers. More conventional scientists took a little longer to catch up. And as I learned earlier this week, philosophers seem to still have the upper hand. At least, that’s the case with my friend.
A university professor and doctor of philosophy, my friend was filling me in on his latest, fascinating endeavors, as we chatted over Christmas cookies and good Scotch.
When he isn’t teaching at an East Coast university, my friend is doing lucrative consulting work. The computer science company he works with is tapping into a huge demand among Fortune 100 companies for his brand of categorization. They combine this new way of seeing the data with the data-mining muscle of leading-edge computer modeling.
He explained that these clients are drowning in data, but these data are in silos that imprison them. It’s hard to tease out the stories they have to tell, and impossible to combine them to make a more complete model of that industry’s “reality.”
My friend has an apparent talent for getting to the essential reality of his clients’ domains. And yes, as you can imagine, he’s doing very well for himself.
I won’t disclose the latest industry with which he’s involved, but let’s say it’s water desalination. He described how engineers have fed their databases with terabytes of facts, but given little thought, beyond their initial purpose, to the structure of their databases. He helps remedy that with his brand of philosophy.
In a proof-of-concept meeting with the company, my friend announced to them what he proposed. Ever the showman, he said, “Gentlemen, what we’ll deliver to you is the Metaphysics of Desalination!”
They signed the next day.
Now I wonder if his skills couldn’t be put to this Netflix challenge. I suspect the first question he’d ask is, Why is it so tough? After all, prediction engines for other products, such as books and music, are fairly reliable.
The answer, I suspect, is that films appeal to us on so many more dimensions than songs or written stories. In a cinematic experience, there is just so much information to take in. What’s more, the alchemy of that information — those flickering images projected to give the illusion of movement — seems to take place uniquely in each of our heads.
In order to parse out movies into logical categories, I suspect that the first thing my friend would do is call for more input, perhaps appending data from a rich, relatively impartial source such as the Internet Movie Database. In other words, he’d ask for a second silo to “fuse” with the first.
He would then look at the elements and properties of the films without regard to the reviews of viewers. He would sort out those things that are merely a part of the film, without influence on the viewer, while taking special notice of the items that would likely cause a change in how the other elements are perceived.
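To make the “fusing” idea concrete, here is a minimal sketch of what joining two silos might look like. Everything in it is hypothetical: the viewer and film names, the attributes, and the two helper functions are mine, and a real movie ontology would be far richer than a genre label. The last step, averaging ratings by attribute, is just one possible thing to do with the fused data, not a description of my friend’s method.

```python
from collections import defaultdict

# Silo one: viewer ratings as (viewer, film, stars). Hypothetical data.
ratings = [
    ("viewer_a", "film_1", 5),
    ("viewer_a", "film_2", 2),
    ("viewer_b", "film_1", 4),
    ("viewer_b", "film_3", 3),  # film_3 has no entry in the second silo
]

# Silo two: descriptive attributes of each film. Hypothetical data.
film_attributes = {
    "film_1": {"genre": "noir", "decade": 1940},
    "film_2": {"genre": "musical", "decade": 1950},
}


def fuse(ratings, film_attributes):
    """Attach film attributes to each rating; skip films the second silo lacks."""
    fused = []
    for viewer, film, stars in ratings:
        attrs = film_attributes.get(film)
        if attrs is None:
            continue
        fused.append({"viewer": viewer, "film": film, "stars": stars, **attrs})
    return fused


def mean_stars_by(fused, attribute):
    """Average star rating for each value of the chosen attribute."""
    totals = defaultdict(lambda: [0, 0])  # value -> [sum of stars, count]
    for row in fused:
        bucket = totals[row[attribute]]
        bucket[0] += row["stars"]
        bucket[1] += 1
    return {value: total / count for value, (total, count) in totals.items()}


fused = fuse(ratings, film_attributes)
print(mean_stars_by(fused, "genre"))
```

Once the silos are joined, any attribute in the second one becomes a lens on the first, which is where a carefully chosen set of categories would start to earn its keep.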
It wouldn’t be easy, and it may not be possible. But the reward would be significant. It would also result in a new movie ontology, which is something I and other movie buffs would find endlessly fascinating, the way baseball fans pore over box scores.
As soon as my friend returns with his family to their New England home, I’m going to send this to him as my own million-dollar challenge, although I’m going to have to scale it back a bit. Maybe another bottle of Scotch.