Project Emporia is a recommendation engine for news. Based on the Matchbox technology ( http://research.microsoft.com/apps/pubs/default.aspx?id=79460 ), it uses a Bayesian probabilistic model to learn users' preferences for recent news stories. A person visiting Project Emporia can vote each link up or down according to their taste. The Matchbox model is then updated in real time, so its link recommendations improve immediately. The news stories themselves are mined by crawling various RSS feeds and Twitter. In this way, Project Emporia performs Bayesian inference on more than 100,000,000 data points every day. Another feature of Project Emporia is the automatic classification of links into categories. The classification is based on a recently published classifier ( http://research.microsoft.com/apps/pubs/default.aspx?id=122779 ). More interestingly, we have developed a pipeline that uses active learning to automatically discover links that cannot be classified reliably. These links are then automatically sent to Amazon Mechanical Turk for labelling, after which we filter spam from the results and update the classification model.
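The active-learning step described above, finding links that cannot be classified reliably, can be illustrated with uncertainty sampling. The sketch below is not the authors' pipeline: the entropy criterion, the function names `entropy` and `select_uncertain`, the threshold value, and the example probabilities are all assumptions chosen for illustration. It takes per-link category probabilities from some classifier and returns the links whose predictions are uncertain enough to be sent out for human labelling.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_uncertain(predictions, threshold):
    """Return ids of links whose normalized entropy exceeds `threshold`.

    `predictions` maps a link id to its list of category probabilities.
    Entropy is divided by log(number of categories) so scores lie in [0, 1].
    The result is ordered from most to least uncertain.
    """
    scored = []
    for link, probs in predictions.items():
        max_entropy = math.log(len(probs))
        score = entropy(probs) / max_entropy if max_entropy > 0 else 0.0
        if score > threshold:
            scored.append((score, link))
    scored.sort(reverse=True)
    return [link for _, link in scored]


# Hypothetical classifier outputs over three categories (illustrative only).
predictions = {
    "link-a": [0.96, 0.02, 0.02],  # confident prediction
    "link-b": [0.40, 0.35, 0.25],  # nearly uniform, very uncertain
    "link-c": [0.55, 0.44, 0.01],  # torn between two categories
}

# Links selected here would be the candidates for human labelling.
to_label = select_uncertain(predictions, threshold=0.6)
print(to_label)
```

With these made-up numbers, the confident prediction is left alone and the two ambiguous links are selected. A real system would also need to batch the selected links, collect several labels per link, and filter unreliable answers before retraining.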
Author Information
Jurgen Van Gael (Microsoft)
More from the Same Authors
- 2008 Poster: The Infinite Factorial Hidden Markov Model
  Jurgen Van Gael · Yee Whye Teh · Zoubin Ghahramani
- 2008 Spotlight: The Infinite Factorial Hidden Markov Model
  Jurgen Van Gael · Yee Whye Teh · Zoubin Ghahramani