Stokastik

Machine Learning, AI and Programming

Designing a Cab Hailing Service like Uber - MatchMaker

In this series of posts we will be looking to design a cab hailing service similar to Uber or Ola (in India). We will be mainly concerned about the technical design and challenges and not get into the logistics such as signup and recruitment of drivers, training drivers for customer satisfaction, number of cabs on street and so on. Even for the technical design, we will omit some of the […]

Continue Reading →

LeetCode : Maximal Rectangle

Problem Statement Solution : A simple brute force approach would require for each pair of points in the matrix, whether the points form the top left and bottom right corner of a rectangle (all 1's) and compute its area. Since there are O(N2) points, thus there are O(N4) pairs of points. Checking whether there is a rectangle between them and computing its area takes O(N2) comparisons. Thus the total run-time is O(N6), […]

Continue Reading →

Designing a Question-Question Similarity Framework - Part III

In continuation of my earlier posts on designing a question-question similarity framework, in part three of the series we look into how to incorporate limited amount of supervised feedback into our system. Note that since getting labelled data is an expensive operation from the perspective of our company resources, the amount of supervised feedback from human agents is very low (~ 2-3% of the total number of questions). So obviously […]

Continue Reading →

Using KD-Tree For Nearest Neighbor Search

This post is branched from my earlier posts on designing a question-question similarity system. In the first of those posts, I discussed the importance of speed of retrieval of most similar questions from the training data, given a question asked by a user in an online system. We designed few strategies, such as the HashMap based retrieval mechanism. The HashMap based retrieval assumes that at-least one word between the most […]

Continue Reading →

LeetCode : Maximum Gap

Problem Statement Solution : There were many different approaches that were coming to my mind while beginning to solve the problem, but all of them were non-linear space/time complexities. One thing that I realized was that, if there was some generic O(N) approach that enables us to find difference between any two consecutive numbers in the sorted array, then sorting the numbers itself would be linear. But that is not […]

Continue Reading →

Designing a Question-Question Similarity Framework - Part II

In this post we will look at the offline implementation. In our company, there are currently about a 100 manual agents, each serving somewhere around 60-80 customers (non-unique) a day, i.e. a total of about 8K customer queries each day for our agents. And each customer session has an average of 3 question-answer rounds including statements, greetings, contextual and personal questions. Thus on average we generate 24K QA pairs each […]

Continue Reading →

Designing a Question-Question Similarity Framework - Part I

Natural Language Question Answering system such as chatbots and AI conversational agents requires answering customer queries in an intelligent fashion. Many companies employ manual resources to answer customer queries and complaints. Apart from the high cost factor with employing people, many of the customer queries are repetitive in nature and most of the time same intents are asked in different tones.

Continue Reading →

LeetCode : Cutoff Trees for Golf

Problem Statement Solution : My solution in Python for this problem is not accepted as of yet due to "Time Limit Exceeded". There is no trick that reduces the worst case time complexity for this problem. The straightforward approach is to first sort the tree heights smallest to largest (ignoring the tree heights with height < 1) and then for each successive pair (A, B) of sorted tree heights, do […]

Continue Reading →

LeetCode : Largest Rectangle in Histogram

Problem Statement Solution : At first glance, one can say that a brute force approach would take time complexity. That is correct. Let me begin first with the brute force approach explanation. In order to find the largest rectangle possible, scan each of the unit one rectangle (i.e. the heights given in the input array) and then for each such rectangle i compute the maximum possible width it can be […]

Continue Reading →