In credit card fraud analysis, most datasets are highly skewed, since valid transactions far outnumber fraudulent ones (in most cases, the split between valid and fraudulent transactions could be as skewed as 98% to 2%). Even without fitting a classification model to the training data, if we simply predict every unknown transaction as valid, we would be correct 98% of the time.
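To make the arithmetic concrete, here is a minimal sketch assuming a hypothetical set of 100 transactions split 98 to 2. The label encoding (0 for valid, 1 for fraudulent) and the variable names are illustrative choices, not taken from any real dataset.

```python
# Hypothetical labels: 0 = valid, 1 = fraudulent, with a 98% / 2% split.
labels = [0] * 98 + [1] * 2

# The "no model" baseline: predict every transaction as valid.
predictions = [0] * len(labels)

# Accuracy: fraction of transactions predicted correctly.
correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / len(labels)

# Recall on the fraud class: fraction of fraudulent transactions caught.
frauds_caught = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
fraud_recall = frauds_caught / sum(labels)

print(f"accuracy = {accuracy:.2f}, fraud recall = {fraud_recall:.2f}")
```

The baseline scores 98% accuracy while catching none of the fraudulent transactions, which is why accuracy alone is a misleading measure on skewed data.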
Category: MACHINE LEARNING
There are two aspects to OCR (Optical Character Recognition) correction. The first is that if the OCR error is consistent, i.e. the OCR makes the same mistakes uniformly across multiple documents, and the training documents are similar to what we expect at run time, then there is probably no need for OCR correction, as the OCR will almost certainly make the same mistakes in the […]
With the recent advancements in Deep Learning and Artificial Intelligence, there has been continuous interest among machine learning enthusiasts and data scientists in exploring the frontiers of artificial intelligence on small to medium scale applications that, only a few years ago, were probably the realm of high speed supercomputers owned by a few tech giants. A few such applications are Image and Speech Recognition, Language Translators, Automated Image Descriptions, Detecting Phrases […]
In the K-Means algorithm, we are not guaranteed to reach a global minimum, since the algorithm converges only to a local minimum. The local minimum, and the number of iterations required to reach it, depend on the selection of the initial set of random centroids. In order to select the initial set of centroids for K-Means clustering, there are many proposed methods, such as the Scatter and Gather methods, […]
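The dependence on the initial centroids can be seen on a toy example. The following is a minimal sketch of the standard K-Means iteration (assign each point to its nearest centroid, then recompute the centroids) in plain Python; the four points, the two starting configurations and the helper names `sq_dist`, `mean` and `k_means` are assumptions made up for illustration.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def k_means(points, centroids, max_iter=100):
    """Run K-Means from the given centroids; return (centroids, WCSS)."""
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        updated = [mean(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:  # no change, so we have converged
            break
        centroids = updated
    # Within-cluster sum of squares at convergence.
    wcss = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, wcss


# Four corners of a 4 x 1 rectangle, clustered with k = 2.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

# Start A: one centroid near each short side (left / right split).
_, wcss_good = k_means(points, [(0.0, 0.5), (4.0, 0.5)])
# Start B: one centroid on each long side (top / bottom split).
_, wcss_poor = k_means(points, [(2.0, 0.0), (2.0, 1.0)])

print(wcss_good, wcss_poor)
```

Both runs converge immediately, but to different solutions: start A reaches a within-cluster sum of squares of 1.0, while start B stays stuck at 16.0, a local minimum that the iteration cannot escape.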
One of the main drawbacks of R is the inefficiency of looping operations. Since R is inherently a functional programming language, many looping operations can be converted into map operations by choosing the appropriate functional forms. Although such a mapping operation speeds up the program, sometimes we need still better speedups (compared with similar programs written in C or C++). In such cases, we will see that by […]
Given a supervised classification problem with a set of N training examples along with their class labels, we need to build a model to predict the class label for an unseen example. We have already encountered some algorithms for this task, and we will encounter others in later posts, such as Logistic Regression, Naive Bayes, Adaboost, Gradient Boosting, KNN, Support Vector Machines, Neural Networks etc. In this […]
In an earlier post, we introduced one of the most widely used optimization techniques, gradient descent, and its scalable variant, Stochastic Gradient Descent. Although SGD is an efficient and scalable technique to optimize a function, the drawback of both gradient descent and SGD is that they are susceptible to getting stuck in a local optimum. The gradient descent technique is not suited to finding a local or global optimum with […]
Clustering algorithms come with lots of challenges. For centroid based clustering algorithms like K-Means, the primary challenges are: initialising the cluster centroids, choosing the optimum number of clusters, evaluating clustering quality in the absence of labels, and reducing the dimensionality of the data. In this post we will focus on different ways of choosing the optimum number of clusters. The basic idea is to minimize the sum of the within cluster sum […]
Boosting is a general technique by which multiple "weak" classifiers are combined to produce a single "super strong" classifier. The idea behind the boosting technique is very simple. Boosting consists of incrementally building a final classifier from an ensemble of classifiers, such that each new classifier performs better on the training instances that the current classifier gets wrong.
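One well-known instance of this idea is AdaBoost, where each training instance carries a weight that grows whenever the current weak classifier misclassifies it. The following is a minimal sketch under stated assumptions: one-dimensional inputs, threshold "stumps" as the weak classifiers, labels in {+1, -1}, and a toy dataset invented for illustration. The names `best_stump`, `adaboost` and `predict` are hypothetical helpers, not a library API.

```python
import math


def stump_predict(x, threshold, polarity):
    """Weak classifier: predict `polarity` if x <= threshold, else -polarity."""
    return polarity if x <= threshold else -polarity


def best_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best stump."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def adaboost(xs, ys, rounds):
    """Build a weighted ensemble of stumps, one per boosting round."""
    n = len(xs)
    weights = [1.0 / n] * n  # start with uniform instance weights
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = best_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # this stump's vote
        ensemble.append((alpha, threshold, polarity))
        # Raise the weights of misclassified instances, lower the rest,
        # so the next stump concentrates on the current mistakes.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Sign of the weighted vote of all stumps in the ensemble."""
    score = sum(alpha * stump_predict(x, threshold, polarity)
                for alpha, threshold, polarity in ensemble)
    return 1 if score >= 0 else -1


# Toy data that no single stump can separate.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

for rounds in (1, 3):
    model = adaboost(xs, ys, rounds)
    mistakes = sum(predict(model, x) != y for x, y in zip(xs, ys))
    print(f"{rounds} round(s): {mistakes} training mistakes")
```

On this toy data a single stump misclassifies three instances, while the weighted vote of three stumps classifies all ten training instances correctly.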
Given two random variables X and Y, mutual information measures how much knowing one of these variables reduces uncertainty about the other. For example, if X and Y are independent, then knowing X does not give any information about Y and vice versa, so their mutual information is zero. At the other extreme, if X is completely correlated with Y then all information conveyed by X is also conveyed by Y, […]
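The two extremes can be checked numerically with the standard definition, I(X;Y) = Σ p(x,y) log₂[ p(x,y) / (p(x) p(y)) ]. The following is a minimal sketch for discrete variables; the function name `mutual_information` and the two joint probability tables are assumptions chosen for illustration.

```python
import math


def mutual_information(joint):
    """Mutual information in bits, where joint[i][j] = P(X = i, Y = j)."""
    # Marginals: sum the joint table over rows and over columns.
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p_xy in enumerate(row):
            if p_xy > 0:  # terms with zero probability contribute nothing
                mi += p_xy * math.log2(p_xy / (p_x[i] * p_y[j]))
    return mi


# Two independent fair coin flips: the joint is the product of the marginals.
independent = [[0.25, 0.25],
               [0.25, 0.25]]

# Y is always equal to X: all probability mass sits on the diagonal.
identical = [[0.5, 0.0],
             [0.0, 0.5]]

mi_independent = mutual_information(independent)
mi_identical = mutual_information(identical)
print(mi_independent, mi_identical)
```

For the independent table the mutual information is 0 bits. For the table where Y always equals X it is 1 bit, which is the full entropy of a fair coin.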