Stokastik

Machine Learning, AI and Programming

Monte Carlo Sampling Techniques

In the last post, we saw how to sample random values from a target probability distribution (both discrete and continuous) using techniques like the inverse CDF method, the transformation method and so on. All of the earlier discussed methods fall under the category of Monte Carlo techniques. In this post, we will be discussing some of the other advanced Monte Carlo techniques and their importance in the […]
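As a quick illustration of the inverse CDF method mentioned above (a minimal R sketch, assuming an Exponential(lambda) target, which is not necessarily the example used in the post): if U ~ Uniform(0, 1), then -log(1 - U)/lambda follows the Exponential(lambda) distribution, so passing uniform draws through the inverse CDF yields exponential samples.

# Inverse CDF sampling for Exponential(lambda)
inverse_cdf_exp <- function(n, lambda = 1) {
  u <- runif(n)            # uniform draws on (0, 1)
  -log(1 - u) / lambda     # inverse of F(x) = 1 - exp(-lambda * x)
}

x <- inverse_cdf_exp(10000, lambda = 2)
mean(x)   # should be close to 1/lambda = 0.5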

Continue Reading →

Sampling from Probability Distributions

Often we are required to sample random values from a specified distribution. The specified probability distribution could be either discrete or continuous. With discrete sampling, the generated samples can take only discrete states; for example, a coin toss can result in either heads or tails, a die can only show a result from the set {1, 2, 3, 4, 5, 6}, and so on. Whereas with continuous sampling, the generated samples […]
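A minimal R sketch of discrete sampling via the cumulative distribution, using the die example above (an assumed illustration, not code from the post): draw u ~ Uniform(0, 1) and return the first state whose cumulative probability exceeds u.

# Sample n values from a discrete distribution over 'states'
sample_discrete <- function(n, states, probs) {
  cdf <- cumsum(probs)
  cdf[length(cdf)] <- 1             # guard against floating-point rounding
  u <- runif(n)
  states[findInterval(u, cdf) + 1]  # first state whose CDF value exceeds u
}

rolls <- sample_discrete(6000, 1:6, rep(1/6, 6))
table(rolls)   # roughly 1000 of each face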

Continue Reading →

Dimensionality Reduction using Restricted Boltzmann Machines

A Restricted Boltzmann Machine is an unsupervised machine learning algorithm that is useful for dimensionality reduction, classification (deep neural networks), regression, collaborative filtering (for recommendation engines), topic modeling, etc. The functionality of RBMs is somewhat similar to PCA (SVD), PLSA, LDA, etc., which transform features from the input data into a lower-dimensional space, capturing the dependencies between different features. RBMs have also been used successfully in problems involving missing/unobserved data. For […]
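To make the dimensionality-reduction use concrete, here is a minimal R sketch of a binary RBM trained with one step of contrastive divergence (CD-1), where the hidden-unit activation probabilities serve as the lower-dimensional representation. This is an assumed illustration, not the post's implementation.

sigmoid <- function(z) 1 / (1 + exp(-z))

# X: matrix with binary (0/1) rows; returns learned weights and biases
rbm_cd1 <- function(X, n_hidden = 10, lr = 0.1, epochs = 50) {
  n_visible <- ncol(X)
  W <- matrix(rnorm(n_visible * n_hidden, sd = 0.01), n_visible, n_hidden)
  b_vis <- rep(0, n_visible)
  b_hid <- rep(0, n_hidden)
  for (e in seq_len(epochs)) {
    ph  <- sigmoid(sweep(X %*% W, 2, b_hid, `+`))        # P(h = 1 | v)
    h   <- matrix(rbinom(length(ph), 1, ph), nrow(ph))   # sample hidden units
    pv  <- sigmoid(sweep(h %*% t(W), 2, b_vis, `+`))     # reconstruction
    ph2 <- sigmoid(sweep(pv %*% W, 2, b_hid, `+`))
    W <- W + lr * (t(X) %*% ph - t(pv) %*% ph2) / nrow(X)
    b_vis <- b_vis + lr * colMeans(X - pv)
    b_hid <- b_hid + lr * colMeans(ph - ph2)
  }
  list(W = W, b_vis = b_vis, b_hid = b_hid)
}

# The reduced representation: hidden-unit probabilities for each row of X
reduce_dim <- function(X, model)
  sigmoid(sweep(X %*% model$W, 2, model$b_hid, `+`))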

Continue Reading →

Classification with Imbalanced Data Sets

In credit card fraud analysis, most datasets are highly skewed, since the number of valid transactions far outweighs the number of fraudulent ones (in many cases, the ratio of valid to fraudulent transactions can be as skewed as 98% to 2%). If, without fitting any classification model to the training data, we simply predict every unknown transaction to be valid, we would be correct 98% of the time. Even if we fit a […]
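A small simulated illustration of this accuracy trap (assumed numbers matching the 98/2 split above, not data from the post): the trivial "always predict valid" rule scores about 98% accuracy while catching zero fraudulent transactions.

set.seed(42)
y <- sample(c("valid", "fraud"), 10000, replace = TRUE, prob = c(0.98, 0.02))
pred <- rep("valid", length(y))   # majority-class "classifier"

mean(pred == y)                                          # accuracy, about 0.98
sum(pred == "fraud" & y == "fraud") / sum(y == "fraud")  # recall on fraud: 0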

Continue Reading →

Spelling correction on OCR outputs

There are two aspects to OCR (Optical Character Recognition) correction. The first is that if the OCR errors are consistent, i.e. the OCR makes the same mistakes uniformly across multiple document images, then, assuming the training document images are close to what we expect at run time, there is probably no need for OCR correction, as the OCR will almost certainly make the same mistakes […]
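For the case where correction is needed, here is a minimal R sketch of one common approach (an assumption for illustration, not necessarily the method described in the post): replace an OCR token with the dictionary word at the smallest edit (Levenshtein) distance, using base R's adist().

correct_token <- function(token, dictionary) {
  d <- adist(token, dictionary)   # edit distance to every dictionary word
  dictionary[which.min(d)]
}

dict <- c("transaction", "character", "recognition", "document")
correct_token("recogn1tion", dict)   # returns "recognition"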

Continue Reading →

Understanding Convolution for Deep Learning

With the recent advancements in Deep Learning and Artificial Intelligence, there has been continuous interest among machine learning enthusiasts and data scientists in exploring the frontiers of artificial intelligence in small to medium scale applications that were, only a few years ago, probably the realm of high-speed supercomputers owned by a few tech giants. A few such applications are Image and Speech Recognition, Language Translators, Automated Image Descriptions, Detecting Phrases […]

Continue Reading →

Initializing cluster centers with K-Means++

K-means clustering is a widely used method for unsupervised learning. Given a set of N unlabelled data points in a d-dimensional space, the objective is to group these points into 'k' clusters such that the sum of distances of all data points from their cluster centroids is minimized (intuitively, this means that the data points within one cluster are 'similar' in the d-dimensional space). The following objective is to be […]
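A minimal R sketch of the K-Means++ seeding rule (an illustration, not the post's code): the first center is picked uniformly at random, and each subsequent center is sampled with probability proportional to the squared distance to its nearest already-chosen center.

kmeanspp_init <- function(X, k) {
  n <- nrow(X)
  centers <- integer(k)
  centers[1] <- sample.int(n, 1)   # first center: uniform at random
  d2 <- rowSums((X - matrix(X[centers[1], ], n, ncol(X), byrow = TRUE))^2)
  for (i in seq_len(k)[-1]) {
    centers[i] <- sample.int(n, 1, prob = d2)   # D^2 weighting
    di <- rowSums((X - matrix(X[centers[i], ], n, ncol(X), byrow = TRUE))^2)
    d2 <- pmin(d2, di)   # distance to the nearest chosen center
  }
  X[centers, , drop = FALSE]
}

# Usage: feed the seeds to the standard k-means routine
# km <- kmeans(X, centers = kmeanspp_init(X, 5))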

Continue Reading →

Building an NGramTokenizer with Rcpp

Many supervised and unsupervised text classification and categorization problems require the use of N-grams instead of just "Bag of Words" as features. For example, in sentiment analysis, the word "pleasant" hints at a positive sentiment whereas "not pleasant" hints at a negative sentiment. The advantage of using N-grams over a simple bag of words as features is that N-grams help capture the association between consecutive words, the ordering between words […]
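A pure-R sketch of the feature being computed (the post's version presumably uses Rcpp for speed; this only shows what an N-gram tokenizer produces):

ngrams <- function(text, n = 2) {
  words <- unlist(strsplit(tolower(text), "\\s+"))
  if (length(words) < n) return(character(0))
  sapply(seq_len(length(words) - n + 1),
         function(i) paste(words[i:(i + n - 1)], collapse = " "))
}

ngrams("this is not pleasant")   # "this is" "is not" "not pleasant"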

Continue Reading →

Matrix operations to the rescue in R

One of the main drawbacks of R is the inefficiency of looping operations. Since R is inherently a functional programming language, many looping operations can be converted into map operations by choosing the appropriate functional forms. Although such map operations speed up the program, sometimes we still need better speedups (compared with similar programs written in C or C++). In such cases, we will see that by […]
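A small illustration of the idea (an assumed example, not taken from the post): computing all row-wise dot products with an explicit loop versus a single matrix multiplication, which dispatches to compiled BLAS code.

n <- 1000; d <- 200
X <- matrix(rnorm(n * d), n, d)
w <- rnorm(d)

# Loop version: one dot product per row
slow <- function() {
  out <- numeric(n)
  for (i in 1:n) out[i] <- sum(X[i, ] * w)
  out
}

# Matrix version: a single BLAS-backed call
fast <- function() as.vector(X %*% w)

all.equal(slow(), fast())   # TRUE, but fast() is typically far quicker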

Continue Reading →

Kernels and Support Vector Machines

Given a supervised classification problem with a set of N training examples {x_1, x_2, ..., x_N} along with their class labels y_i, i.e. the pairs {(x_i, y_i) : i = 1, ..., N}, we need to build a model to predict the class label for an unseen example. Some of the relevant algorithms we have already encountered, and some we will encounter in later posts, such as Logistic Regression, Naive Bayes, AdaBoost, Gradient Boosting, KNN, Support Vector Machines, Neural Networks, etc. In this […]
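As a small taste of the kernel idea (a minimal R sketch under the usual definition, not code from the post): the RBF kernel k(x, z) = exp(-||x - z||^2 / (2 * sigma^2)) measures the similarity between two points and corresponds to an inner product in a high-dimensional feature space.

rbf_kernel <- function(x, z, sigma = 1)
  exp(-sum((x - z)^2) / (2 * sigma^2))

rbf_kernel(c(1, 2), c(1.5, 2.5))   # similarity of two nearby points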

Continue Reading →