Machine Learning, AI and Programming


Using Word Vectors in Multi-Class Text Classification

Earlier we saw how, instead of representing words in a text document as isolated features (or as N-grams), we can encode them into multidimensional vectors where each dimension of the vector represents some kind of semantic or relational similarity with other words in the corpus. Machine learning problems such as classification or clustering require documents to be represented as a document-feature matrix (with TF or TF-IDF weighting), thus we need some […]

Designing a Contextual Graphical Model for Words

I have been reading about word embedding methods that encode words found in text documents into multi-dimensional vectors. The purpose of encoding into vectors is to give "meaning" to words or phrases in a context. Traditional methods of document classification treat each word in isolation or at most use an N-gram approach, i.e. in vector space, the words are represented as one-hot vectors, which are sparse and do not convey any meaning, whereas […]
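To make the contrast concrete, here is a minimal sketch comparing one-hot vectors with dense vectors. The three-word vocabulary and the dense values are hypothetical, hand-picked for illustration only and not taken from any trained model.

```python
import math

# Hypothetical toy vocabulary, for illustration only
vocab = ["king", "queen", "apple"]

def one_hot(word, vocab):
    """Return an indicator vector with a single 1 at the word's index."""
    return [1.0 if w == word else 0.0 for w in vocab]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Hypothetical dense embeddings (assumed values, not learned)
dense = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.8, 0.9, 0.1],
    "apple": [0.1, 0.1, 0.9],
}

# Any two distinct one-hot vectors are orthogonal, so their similarity is 0
one_hot_sim = cosine(one_hot("king", vocab), one_hot("queen", vocab))

# Dense vectors can place related words closer together than unrelated ones
dense_sim_related = cosine(dense["king"], dense["queen"])
dense_sim_unrelated = cosine(dense["king"], dense["apple"])
```

With one-hot vectors every pair of distinct words looks equally unrelated, whereas dense vectors leave room for graded similarity.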

Understanding Word Vectors and Word2Vec

Quite recently I have been exploring the Word2Vec tool for representing words in text documents as vectors. I got the initial ideas about the word2vec utility from Google's code archive webpage. The idea behind this kind of utility caught my interest, and I later went on to read the following papers by Mikolov et al. to better understand the algorithm and its implementation. Efficient Estimation of Word Representations […]

Learning From Unlabelled Data - EM Approach

Accurately labelled data can be a bottleneck in many machine learning problems, as it is difficult and expensive to obtain, and even if we obtain some labelled data, the labels might not be 100% accurate. Many startups working in the machine learning space resort to crowdsourcing the labelling task. Inspired by this research paper, I am going to try to use lots of unlabelled data in addition to small amounts of labelled data […]

Building a Classification Tree from scratch

In this post I am going to demonstrate an implementation of a classification tree from scratch for multi-label classification. Since most of my work involves text classification, the classification tree that I am going to demonstrate here has been built with text classification problems in mind. The theoretical explanation of the various components and modules of a classification tree can be found in this paper. This implementation is not ready-made for […]

Logistic Regression Analysis with Examples using R

In the last post we saw how to perform a linear regression on a dataset with R. We also saw how to interpret the outcome of the linear regression model and analyze the solution using the R-squared test for goodness of fit of the model, the t-test for significance of each variable in the model, the F-statistic for significance of the overall model, confidence intervals for the variable […]

Building Gradient Boosted Trees from scratch

Gradient Boosted Trees are tree ensemble algorithms similar to Random Forests, but unlike Random Forests, where trees are constructed independently of each other, in the gradient boosting algorithm the tree in each round is built from the errors of the tree in the previous round. There are also further differences in how GBT reduces the error (bias + variance) compared to RF. In this post, we will be constructing boosted trees using […]
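The round-by-round idea can be sketched in a few lines. This is only a toy illustration under my own assumptions, not the implementation from the post: it uses squared-error loss on a 1-D regression problem, depth-1 trees (stumps), and hypothetical names (`fit_stump`, `boost`, `lr`). With squared error, the "errors" each new stump is fitted to are simply the residuals of the ensemble built so far.

```python
def fit_stump(xs, residuals):
    """Fit a depth-1 tree: pick the threshold that minimises squared error.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the largest value so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def boost(xs, ys, rounds=20, lr=0.5):
    """Additively fit stumps to the current residuals (squared-error loss assumed)."""
    base = sum(ys) / len(ys)          # round 0: predict the mean
    stumps = []
    preds = [base] * len(xs)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]   # errors left so far
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + lr * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + lr * sum(s(x) for s in stumps)

# Made-up toy data, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.0]

model = boost(xs, ys)
mean_y = sum(ys) / len(ys)
mse_base = sum((y - mean_y) ** 2 for y in ys) / len(ys)
mse_boost = sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
```

Here `mse_boost` is the training error of the boosted ensemble and `mse_base` is the error of predicting the mean; the sequential dependence between rounds is what distinguishes this from a Random Forest, where each tree could be fitted in parallel.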

Monte Carlo Sampling Techniques

In the last post, we saw how to sample random values from a target probability distribution (both discrete and continuous) using techniques like the inverse CDF method, the transformation method and so on. All of the methods discussed earlier fall under the category of Monte Carlo techniques. In this post, we will discuss some other, more advanced Monte Carlo techniques and their importance in the […]
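As a quick reminder of the inverse CDF method mentioned above, here is a minimal sketch. The choice of the exponential distribution, the rate of 2.0 and the function name `sample_exponential` are my own assumptions for illustration. For Exp(rate), the CDF is F(x) = 1 - exp(-rate * x), so inverting it gives x = -ln(1 - u) / rate for u drawn uniformly from [0, 1).

```python
import math
import random

def sample_exponential(rate, rng):
    """Draw one sample from Exp(rate) by inverting F(x) = 1 - exp(-rate * x)."""
    u = rng.random()                      # u ~ Uniform[0, 1)
    return -math.log(1.0 - u) / rate      # 1 - u lies in (0, 1], so the log is defined

rng = random.Random(0)                    # fixed seed for reproducibility
samples = [sample_exponential(2.0, rng) for _ in range(100_000)]

# The sample mean should be close to the theoretical mean 1 / rate = 0.5
sample_mean = sum(samples) / len(samples)
```

The same recipe works for any distribution whose CDF can be inverted in closed form; when it cannot, other sampling techniques are needed.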

Sampling from Probability Distributions

Often we are required to sample random values from a specified distribution. The specified probability distribution could be either discrete or continuous. With discrete sampling, the generated samples can take only discrete states: for example, a coin toss can come up either heads or tails, a die can only produce a result from the set {1, 2, 3, 4, 5, 6}, and so on. With continuous sampling, on the other hand, the generated samples […]
