In my previous posts on Big Data and Machine Learning, we explored the core concepts of a recommendation engine using Apache Mahout. We covered how a collaborative filtering (CF) recommendation engine compares the similarity of users or of items and makes recommendations based on that similarity and on the neighborhood of similar users. We learned about the interfaces and algorithm implementations available in Apache Mahout. We also demonstrated that a collaborative filtering recommendation engine recommends nearly the same items as a human with contextual information would.
In this post, I’ll cover a case study of an application we built at 3Pillar Labs that recommends fitness products to users based on fitness data collected through smart sensors.
The essential solution consisted of collecting fitness data for users through smart sensors, finding similar users based on the fitness data, and recommending fitness products to the users. We needed a broad system that would cater to sensors from different manufacturers, and after a market study, we settled on Runkeeper. Runkeeper is a versatile service that supports sensors from Fitbit, Wahoo, Griffin and more. A registered user is also able to add exercise data directly to Runkeeper. Runkeeper makes this data available via the HealthGraph API. We used the following sensors for our application:
The sensors sent data to a nearby computer over Bluetooth, and a Runkeeper service running on that computer continuously uploaded the data to Runkeeper.
Since the recommendation engine requires some initial data about which users like which products, we needed a way to generate this seed data. So we built an elaborate rule engine that generated keywords from the data obtained from the fitness sensors. Keyword determination for weight data is a good example: the rule engine would look at the user’s weight data and try to determine whether there was an increasing or decreasing trend. If it observed an increasing trend, it would correlate this with activity data and check whether the user had recorded weight training activities; if so, the keyword could be “gain weight.” We used these keywords to suggest products using the Google Product API, and once a few users started liking some of these products, we had our seed data.
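To make the weight rule concrete, here is a minimal sketch in plain Python. It is not our production rule engine: the function names, the least-squares slope used to detect a trend, and the `tolerance` threshold are all assumptions chosen for illustration. Only the “gain weight” rule itself comes from the description above.

```python
from statistics import mean


def weight_trend(weights, tolerance=0.05):
    """Classify chronological weight readings as increasing, decreasing or flat.

    Uses the least-squares slope per reading; `tolerance` is an assumed
    threshold below which the change is treated as noise.
    """
    n = len(weights)
    if n < 2:
        return "flat"
    x_mean = (n - 1) / 2
    y_mean = mean(weights)
    numerator = sum((i - x_mean) * (w - y_mean) for i, w in enumerate(weights))
    denominator = sum((i - x_mean) ** 2 for i in range(n))
    slope = numerator / denominator
    if slope > tolerance:
        return "increasing"
    if slope < -tolerance:
        return "decreasing"
    return "flat"


def weight_keyword(weights, activities):
    """Return a product-search keyword for the weight rule, or None."""
    # Rule from the post: rising weight plus recorded weight training.
    if weight_trend(weights) == "increasing" and "weight training" in activities:
        return "gain weight"
    return None  # other rules are out of scope for this sketch
```

Calling `weight_keyword([70.0, 70.5, 71.2, 71.8], ["running", "weight training"])` yields `"gain weight"`, while the same readings with only `"running"` recorded yield no keyword.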
You may recall that a CF recommendation engine can recommend items based on either user similarity or on item similarity. For our recommendation engine, we chose to use the user similarity model since it allowed us to compare fitness trends of similar users (more on that later).
To simplify the user experience, we chose not to have a rating system in which a user would have to specify a value to express a preference for a product; instead, we deemed a product preferred if the user opted to “like” it. To this end, we used a general Boolean preferences recommendation algorithm. Once a user had expressed a preference for one or more products, our application would generate recommendations from the CF engine.
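The idea behind user-based recommendations on Boolean (“like” only) preferences can be sketched in a few lines of plain Python. This is an illustration of the technique, not Mahout’s API and not our application code: the Tanimoto (Jaccard) coefficient is one common similarity measure for Boolean data and is an assumption here, as are the function names, the neighborhood size, and the sample products.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient between two sets of liked items."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def recommend(likes, user, neighbours=2, how_many=3):
    """Recommend items liked by the users most similar to `user`."""
    mine = likes[user]
    # Rank every other user by similarity and keep the closest few.
    scored = sorted(
        ((tanimoto(mine, items), other)
         for other, items in likes.items() if other != user),
        reverse=True,
    )
    nearest = [(sim, other) for sim, other in scored[:neighbours] if sim > 0]
    # Score each unseen item by the summed similarity of neighbours who like it.
    scores = {}
    for sim, other in nearest:
        for item in likes[other] - mine:
            scores[item] = scores.get(item, 0.0) + sim
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:how_many]]


likes = {
    "alice": {"protein powder", "dumbbells"},
    "bob": {"protein powder", "dumbbells", "lifting gloves"},
    "carol": {"running shoes", "dumbbells", "water bottle"},
    "dave": {"yoga mat"},
}
print(recommend(likes, "alice"))
```

In this made-up data set, bob shares two of alice’s likes and carol shares one, so bob’s extra product is ranked ahead of carol’s, and dave (no overlap) contributes nothing.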
An additional challenge we faced was that the recommendation data set needed to be updated in real time. Luckily for us, Mahout does the heavy lifting in this regard: the recommender interface provides a refresh strategy. When it is invoked, it takes care of refreshing all the components, right down to the data model. To scale the refresh capability, we implemented the data model on MongoDB.
As mentioned above, we wanted to feature the fitness trends of similar users. Mahout’s user similarity model maintains a list of similar users for every user known to the recommendation engine, so it was a simple matter of asking Mahout for similar users and plotting their fitness trends.
For this application, we used a bubble chart to plot the weekly calorie loss and the weight of similar users.
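Preparing data for such a chart amounts to aggregating each similar user’s workouts by week and pairing the result with their weight. The sketch below shows one way to do that in plain Python. The input shapes, the function names, and the choice of similarity as the bubble size are assumptions made for illustration; they are not taken from our application.

```python
from collections import defaultdict
from datetime import date


def weekly_calories(workouts):
    """Sum calories per ISO (year, week) from (date, calories) pairs."""
    totals = defaultdict(float)
    for day, calories in workouts:
        year, week, _ = day.isocalendar()
        totals[(year, week)] += calories
    return dict(totals)


def bubble_points(similar_users, workouts_by_user, weight_by_user):
    """Build one chart point per similar user that has enough data.

    `similar_users` is a list of (user, similarity) pairs, for example
    as returned by a recommender's neighborhood lookup (hypothetical).
    """
    points = []
    for user, similarity in similar_users:
        weeks = weekly_calories(workouts_by_user.get(user, []))
        if not weeks or user not in weight_by_user:
            continue  # skip users without workouts or a weight reading
        average = sum(weeks.values()) / len(weeks)
        points.append({
            "user": user,
            "x": weight_by_user[user],  # weight
            "y": average,               # average calories burned per week
            "size": similarity,         # assumed: bubble size = similarity
        })
    return points
```

The resulting list of points can be handed to whichever charting library renders the bubbles.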
The astute reader will recognize that CF algorithms are not just useful for making recommendations; they can also be used to project data from similar users or items, as we did in our application.
We hope this three-part series on building recommendation engines has given you a general sense of what goes into developing real-world applications. For further reading, you may be interested to know that CF algorithms can also be used for clustering and classification. The other major thread would be applying Hadoop to scale these recommendation algorithms.