In the last two parts of this series, we have been looking at how to design and implement a movie recommendation engine using the MovieLens 20 million ratings dataset. We have looked at some of the most common standard techniques, namely content-based recommendations, collaborative filtering (CF), and latent-factor-based matrix factorization (MF). Clearly, the CF and MF approaches emerged as the winners due to their accuracy and […]
In the last post, we started to design a movie recommendation engine using the 20 million ratings dataset available from MovieLens. We started with a content-based recommendation approach, where we built a classification/regression model for each user based on the tags and genres assigned to each movie they have rated. The assumption behind this approach is that the rating that a user has given to a movie depends […]
In this post, we will design a movie recommendation engine with the MovieLens dataset. We will not be designing the architecture of such a system, but will look at different methods for recommending movies to users that minimize the root mean squared error between the predicted ratings and the actual ratings on a hold-out validation dataset.
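As a rough illustration of the evaluation metric mentioned above, here is a minimal sketch of the root mean squared error computation. The function name `rmse` and the sample ratings are hypothetical and made up for illustration; they are not taken from the posts or from the MovieLens data.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    # Square each prediction error, average the squares, then take the root.
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical hold-out ratings and model predictions (made-up values).
actual_ratings = [4.0, 3.0, 5.0, 2.0]
predicted_ratings = [3.5, 3.0, 4.0, 3.0]
print(rmse(predicted_ratings, actual_ratings))
```

A lower value means the predicted ratings sit closer to the held-out ratings, which is how the different recommendation methods can be compared on the same validation split.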
In the second post of this series, we listed the different vectorization algorithms used in our experiments for representing questions. Representations form the core of our intent clusters: the assumption is that if a representation algorithm captures both the syntactic and the semantic meaning of the questions well, then two questions that actually express the same intent will have representations that are very close to each […]
In this post we will look at designing a social networking site similar to Twitter. Obviously, we will not be designing every feature on the site, only the important ones. The most important feature on Twitter is the feed (home timeline and profile timeline). The feed on Twitter drives user engagement, and thus it needs to be designed in a scalable way such that it can […]
In this series of posts we will design a cab-hailing service similar to Uber or Ola (in India). We will mainly be concerned with the technical design and its challenges, and will not get into logistics such as signup and recruitment of drivers, training drivers for customer satisfaction, the number of cabs on the street, and so on. Even for the technical design, we will omit some of the […]
Continuing my earlier posts on designing an automated question-answering system, in part three of the series we look into how to incorporate feedback into our system. Note that since getting labelled data is expensive in terms of company resources, the amount of feedback from human agents is very low (~2-3% of the total number of questions). So obviously with so little labelled data, […]
In this post we will look at the offline implementation architecture. Assume that there are currently about 100 human agents, each serving somewhere around 60-80 customers (non-unique) a day, i.e. a total of about 8K customer queries each day for our agents. Each customer session has an average of 5 question-answer rounds, including statements, greetings, and contextual and personal questions. Thus on average we generate 40K client-agent response pairs […]
Natural language question answering systems, such as chatbots and AI conversational agents, need to answer customer queries in an intelligent fashion. Many companies employ people to answer customer queries and complaints manually. Apart from the high cost of employing people, many customer queries are repetitive in nature, and most of the time the same intents are expressed in different tones.