# Stokastik

## Logistic Regression Analysis with Examples using R

In the last post we had seen how to perform a linear regression on a dataset with R. We had also seen how to interpret the outcome of the linear regression model and also analyze the solution using the R-Squared test for goodness of fit of the model, the t-test for significance of each variable in the model, F-statistic for significance of the overall model, Confidence intervals for the variable […]

## Linear Regression Analysis with Examples using R

In this post we are going to solve linear regression problems using R and analyze the solutions. We will be looking at multiple linear regression examples, in which the output variable is modeled as a function of more than one input variables, i.e. for the i-th example, the output is assumed to be of the following form :

## Building Gradient Boosted Trees from scratch

Gradient Boosted Trees are tree ensemble algorithms similar to Random Forests, but unlike Random Forests where trees are constructed independently from each other, in the gradient boosting algorithm, the tree in each round is boosted based on the errors from the tree in the previous round. Although there are more differences in how GBT reduces the error (bias + variance) compared to RF. In this post, we would be constructing boosted trees using […]

## Monte Carlo Sampling Techniques

In the last post, we saw how to sample random values from a target probability distribution (both with discrete as well as continuous distributions) using techniques like inverse CDF method, the transformation method and so on. All of the earlier discussed methods falls under the category of Monte Carlo techniques. In this post, we will be discussing some of the other advanced Monte Carlo techniques and their importance in the […]

## Sampling from Probability Distributions

Often we are required to sample random values from a specified distribution. The specified probability distribution could be either discrete or continuous. With discrete sampling, the generated samples can take only discrete states, for example, a coin toss experiment can either be a heads or a tails, a dice can only come up with result from the set {1, 2, 3, 4, 5, 6} and so on. Whereas with continuous sampling, the generated samples […]

## Dimensionality Reduction using Restricted Boltzmann Machines

Restricted Boltzmann Machine is an unsupervised machine learning algorithm that is useful for doing dimensionality reduction, classification (deep neural networks), regression, collaborative filtering (for recommendation engines), topic modeling etc. The functionality of RBM's are somewhat similar to PCA (SVD), PLSA, LDA etc., which transforms features from the input data into a lower dimension space, capturing the dependencies between different features. RBM's has also been used successfully in problems involving missing/unobserved data. For […]

## Classification with Imbalanced Data Sets

In credit card fraud analysis, most datasets are highly skewed since the number of valid transactions far outweighs the number of fraudulent transactions (in most cases, the ratio of valid transactions to fraudulent transactions could be as skewed as 98% to 2%). Without fitting a classification model to the training data if we simply predict any unknown transaction as a valid transaction, we would be correct 98% of the time.

## Spelling correction on OCR outputs

There are two aspects to OCR (Optical Character Recognition) correction, first one is that if the OCR error is consistent, i.e. makes the same mistakes uniformly across multiple documents, then assuming that the training documents will be almost similar to what we are going to expect at run time, then there is probably no need for OCR correction as the OCR will almost certainly make the same mistakes in the […]

## Understanding Convolution for Deep Learning

With the recent advancements in Deep Learning and Artificial Intelligence, there has been continuous interest among machine learning enthusiasts and data scientists to explore frontiers in artificial intelligence on small to medium scale applications that was probably the realm of high speed supercomputers owned by a few tech giants only a few years ago. Few of such applications are Image and Speech Recognition, Language Translators, Automated Image Descriptions, Detecting Phrases […]

## Initializing cluster centers with K-Means++

In K-Means algorithm, we are not guaranteed of a global minima since our algorithm converges only to a local minima. The local minima and the number of iterations required to reach the local minima, depends on the selection of the initial set of random centroids. In order to select the initial set of centroids for the K-Means clustering, there are many proposed methods, such as the Scatter and Gather methods, […]