Machine Learning, AI and Programming

Tag: Clustering

Fast Nearest Neighbour Search - Product Quantization

In this post, part of an ongoing series on fast nearest neighbor search algorithms, we look at the Product Quantization technique. In the last post, we looked at KD-Trees, which are efficient data structures for low-dimensional embeddings, and also for higher dimensions provided that the nearest neighbor search radius is small enough to prevent backtracking. Product Quantization, or PQ, does not create any tree indexing data structure […]


Designing Movie Recommendation Engines - Part I

In this post, we design a movie recommendation engine using the MovieLens dataset. We will not design the architecture of such a system; instead, we look at different methods for recommending movies to users that minimize the root mean squared error (RMSE) between the predicted and actual ratings on a hold-out validation dataset.
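As a rough illustration of the evaluation metric mentioned here, the sketch below computes RMSE on a handful of ratings. The function name `rmse` and the rating values are assumptions made up for this example; they are not taken from the MovieLens dataset or from the post itself.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical hold-out ratings on a 1-5 scale (illustrative values only).
actual_ratings = [4.0, 3.0, 5.0, 2.0]
predicted_ratings = [3.0, 3.0, 4.0, 4.0]

# Squared errors are 1, 0, 1 and 4, so the mean squared error is 1.5.
score = rmse(predicted_ratings, actual_ratings)
print(f"hold-out RMSE: {score:.4f}")
```

A lower value means the predicted ratings sit closer to the actual ones on average, with large individual errors penalized more heavily than small ones.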


Understanding Variational AutoEncoders

This post is motivated by the search for better unsupervised vector representations of the questions that customers ask our agents. Earlier, in a series of posts, we saw how to design and implement a clustering framework for customer questions, so that we can efficiently find the most appropriate answer and, at the same time, find the most similar questions to recommend to the customer.


Designing an Automated Question-Answering System - Part I

Natural language question answering systems, such as chatbots and AI conversational agents, must answer customer queries intelligently. Many companies employ people to answer customer queries and complaints manually. Apart from the high cost of employing people, many customer queries are repetitive, and most of the time the same intents are expressed in different tones.


Initializing cluster centers with K-Means++

In the K-Means algorithm, we are not guaranteed to reach a global minimum, since the algorithm converges only to a local minimum. Which local minimum is reached, and the number of iterations required to reach it, depend on the selection of the initial set of random centroids. Many methods have been proposed for selecting the initial set of centroids for K-Means clustering, such as the Scatter and Gather methods, […]
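For the K-Means++ method named in the title, here is a minimal sketch of the standard seeding idea: pick the first center uniformly at random, then pick each further center with probability proportional to its squared distance from the nearest center chosen so far. The function names, the `seed` parameter, and the toy points are assumptions for illustration, not code from the post.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_pp_init(points, k, seed=0):
    """Choose k initial centers from points using D^2-weighted sampling."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # First center: uniform random choice among the points.
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Weight of each point: squared distance to its nearest chosen center.
        weights = [min(squared_distance(p, c) for c in centers) for p in points]
        if sum(weights) == 0:
            # Every point coincides with a center; fall back to a uniform pick.
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers

# Toy data: two well-separated groups (illustrative values only).
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers = kmeans_pp_init(points, k=2)
print(centers)
```

Because a point that is already a center has weight zero, it cannot be drawn again while other points remain, and distant points are favored, which tends to spread the initial centroids across the data.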
