Machine Learning, AI and Programming

Selecting the optimum number of clusters

Clustering algorithms come with many challenges. For centroid-based clustering algorithms like K-Means, the primary challenges are: initialising the cluster centroids, choosing the optimum number of clusters, evaluating clustering quality in the absence of labels, and reducing the dimensionality of the data. In this post we will focus on different ways of choosing the optimum number of clusters. The basic idea is to minimize the sum of the within cluster sum […]
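
As a rough illustration of the quantity this excerpt mentions, the sketch below runs a bare-bones K-Means for several values of k and records the within-cluster sum of squared distances for each. The function names (`kmeans`, `wcss`, `elbow_curve`), the random initialisation and the toy points are assumptions made for this example, not the method from the full post.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; initial centroids are k randomly sampled points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j, members in enumerate(clusters):
            if members:
                mean = tuple(sum(c) / len(members) for c in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids

def wcss(points, centroids):
    """Within-cluster sum of squares: each point's squared distance to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)

def elbow_curve(points, max_k):
    """WCSS for k = 1 .. max_k; one would look for the 'elbow' where gains flatten."""
    return [wcss(points, kmeans(points, k)) for k in range(1, max_k + 1)]

# Toy data (made up for illustration): two loose groups of three points each.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
for k, value in enumerate(elbow_curve(toy_points, 4), start=1):
    print(k, round(value, 3))
```

With a single random initialisation the curve is not guaranteed to decrease at every step, so in practice one would probably rerun each k with several seeds and keep the lowest value.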

Text Classification with Adaboost

Boosting is a general technique by which multiple "weak" classifiers are combined to produce a single "strong" classifier. The idea behind boosting is simple: it incrementally builds a final classifier from an ensemble of classifiers, choosing each new classifier so that it performs better on the training instances that the current classifier gets wrong.
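
To make the reweighting idea concrete, here is a minimal AdaBoost sketch on one-dimensional data, using threshold "stumps" as the weak classifiers. The stump learner, the function names and the toy labels are assumptions chosen for illustration; the text classification setup of the full post is not reproduced here.

```python
import math

def stump_predict(x, threshold, polarity):
    """Weak classifier: predicts `polarity` when x >= threshold, else -polarity."""
    return polarity if x >= threshold else -polarity

def best_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the lowest-error stump."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def adaboost(xs, ys, rounds=3):
    """Train an ensemble of (alpha, threshold, polarity) triples; labels are +1/-1."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = best_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)      # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)    # vote weight of this stump
        ensemble.append((alpha, threshold, polarity))
        # Raise the weight of misclassified instances, lower the rest, renormalise.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(x, t, pol) for alpha, t, pol in ensemble)
    return 1 if score >= 0 else -1

# Toy data (made up): no single threshold separates these labels.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
model = adaboost(xs, ys, rounds=3)
print([predict(model, x) for x in xs])
```

Each round fits the stump with the lowest weighted error, so instances the earlier stumps got wrong carry more influence on the next choice.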

Feature Selection with Mutual Information

Given two random variables X and Y, mutual information measures how much knowing one of these variables reduces uncertainty about the other. For example, if X and Y are independent, then knowing X does not give any information about Y and vice versa, so their mutual information is zero. At the other extreme, if X is completely correlated with Y then all information conveyed by X is also conveyed by Y, […]
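
The two extremes described above can be checked numerically. The sketch below estimates mutual information (in bits) from paired samples of two discrete variables using empirical frequencies; the function name and the sample lists are assumptions for illustration only.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) in bits for paired discrete samples."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        p_xy = c / n
        p_x = count_x[x] / n
        p_y = count_y[y] / n
        # Each observed pair contributes p(x, y) * log2( p(x, y) / (p(x) p(y)) ).
        mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi

# Independent case: every (x, y) combination is equally frequent.
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))  # -> 0.0

# Fully dependent case: Y is a copy of X, so I(X; Y) equals the entropy of X (1 bit).
print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))  # -> 1.0
```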

Expectation Maximization with an Example

In the last post, we introduced a technique called Maximum Likelihood Estimation (MLE) to estimate the unknown parameters of a probability distribution given a set of observations. Although it is a very useful technique, it assumes that all information about the observations is available to us. Consider the example of a two-coin toss: "Given two coins A and B, with probability of heads being 'p' and 'q' […]

Maximum Likelihood Estimation

Observations from a probability distribution depend on the parameters of that distribution. For example, given an unbiased coin with equal probability of landing heads or tails, what is the probability of observing the sequence "HHTH"? Our knowledge from probability theory says that since the toss of a coin follows the binomial distribution, the probability of the observation should be 0.5^4 = 0.0625, but what if the coin was biased and the […]
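
The arithmetic in this excerpt, and the question of a biased coin, can be sketched in a few lines. The code below computes the probability of a specific toss sequence for a given heads probability p, then searches a grid of candidate values for the p that makes "HHTH" most likely. The function names and the grid search are assumptions for illustration; the closed-form estimate is simply the fraction of heads.

```python
def sequence_likelihood(seq, p):
    """Probability of observing this exact sequence of independent tosses,
    where p is the probability of heads ('H') on each toss."""
    heads = seq.count("H")
    tails = len(seq) - heads
    return p ** heads * (1 - p) ** tails

def mle_by_grid(seq, steps=100):
    """Candidate p in {0, 1/steps, ..., 1} that maximises the likelihood."""
    candidates = [i / steps for i in range(steps + 1)]
    return max(candidates, key=lambda p: sequence_likelihood(seq, p))

print(sequence_likelihood("HHTH", 0.5))   # -> 0.0625
print(mle_by_grid("HHTH"))                # -> 0.75
print("HHTH".count("H") / len("HHTH"))    # -> 0.75 (closed-form estimate)
```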