In the last post, we had started to design a movie recommendation engine using the 20 million ratings dataset available from MovieLens. We started with a Content Based Recommendation approach, where we built a classification/regression model for each user based on the tags and genres assigned to each movie he has rated. The assumption behind this approach is that, the rating that an user has given to a movie depends […]
Category: PROBLEM SOLVING
In this post, we would be looking to design a movie recommendation engine with the MovieLens dataset. We will not be designing the architecture of such a system, but will be looking at different methods by which one can recommend movies to users that minimizes the root mean squared error of the predicted ratings from the actual ratings on a hold out validation dataset.
Problem Statement Solution : This one looks like a very simple problem at first glance but I found it to be quite tricky during implementation. The straightforward solution is to pre-compute the prefix sums S(i), i.e. the sum of all integers from 0 to i-th index for all possible i, and then compute all possible range sums S(i, j), which is the sum of all integers from index i to […]
Problem Statement Solution : This is an interesting problem, not because it is difficult or tricky to find an efficient solution, but there are multiple approaches and each approach is dependent on the problem definition and problem test cases. Identifying words where the space between them has been mistakenly omitted is a classic problem is text processing because this is one way by which search engines such as Google provide […]
Problem Statement Solution : Let's try to build a 'bad' solution first. By 'bad', I mean the approach may not be the most optimal but will return correct results every time. One such approach is to list down all possible substrings and count the unique letters in each of them and then take their sum. This approach is perfectly reasonable approach but why it is not optimal ?
Problem Statement Solution : One approach that uses O(n) extra space, is to store for each node N, the pointer to the nodes with minimum and maximum values in the sub-tree rooted at N. Let's denote the minimum node rooted at N by N.min and the maximum node by N.max. Then for a sub-tree rooted at N, the sub-tree has a "defect", if : N.val < N.left.max.val and/or N.val > N.right.min.val […]
Problem Statement Solution : Observe that one can switch from one route to another route, if both the routes have at-least one stop in common. Starting with the source stop S, there could be multiple possible bus routes R that includes this stop. Thus for every possible route R that include the stop S, do a Breadth First Search to all possible routes R' reachable from this route. R' can […]
This post is motivated from trying to find better unsupervised vector representations for questions pertaining to the queries from customers to our agents. Earlier, in a series of posts, we have seen how to design and implement a clustering framework for customer questions, so that we can efficiently find the most appropriate answer and at the same time find out most similar questions to recommend to the customer.
Problem Statement Solution : A simple brute force approach would require for each pair of points in the matrix, whether the points form the top left and bottom right corner of a rectangle (all 1's) and compute its area. Since there are O(N2) points, thus there are O(N4) pairs of points. Checking whether there is a rectangle between them and computing its area takes O(N2) comparisons. Thus the total run-time is O(N6), […]
Problem Statement Solution : There were many different approaches that were coming to my mind while beginning to solve the problem, but all of them were non-linear space/time complexities. One thing that I realized was that, if there was some generic O(N) approach that enables us to find difference between any two consecutive numbers in the sorted array, then sorting the numbers itself would be linear. But that is not […]