Stokastik

Machine Learning, AI and Programming

Using Word Vectors in Multi-Class Text Classification

Earlier we saw how, instead of representing words in a text document as isolated features (or as N-grams), we can encode them into multi-dimensional vectors where each dimension of the vector represents some kind of semantic or relational similarity with other words in the corpus. Machine Learning problems such as classification or clustering require documents to be represented as a document-feature matrix (with TF or TF-IDF weighting), thus we need some […]
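
As a minimal sketch of one such combining step (the dictionary of pre-trained vectors and its 3-dimensional toy entries below are purely illustrative), the word vectors of a document can be averaged into a single fixed-length row of the document-feature matrix:

```python
import numpy as np

# Hypothetical pre-trained word vectors, e.g. loaded from word2vec output.
word_vectors = {
    "machine": np.array([0.2, -0.1, 0.7]),
    "learning": np.array([0.5, 0.3, -0.2]),
    "rocks": np.array([-0.4, 0.6, 0.1]),
}

def document_vector(tokens, word_vectors, dim=3):
    """Average the vectors of in-vocabulary tokens into one document vector."""
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vecs:
        return np.zeros(dim)
    return np.mean(vecs, axis=0)

doc = "machine learning rocks".split()
print(document_vector(doc, word_vectors))  # one row of the document-feature matrix
```

A TF-IDF-weighted average (instead of the plain mean above) is one common refinement of the same idea.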

Continue Reading →

Designing a Contextual Graphical Model for Words

I have been reading about Word Embedding methods that encode words found in text documents into multi-dimensional vectors. The purpose of encoding into vectors is to give "meaning" to words or phrases in a context. Traditional methods of document classification treat each word in isolation or at most use an N-gram approach, i.e. in vector space the words are represented as one-hot vectors, which are sparse and do not convey any meaning, whereas […]
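
A tiny numerical illustration of that contrast (the dense vectors below are made up, not learned):

```python
import numpy as np

# One-hot vectors: every pair of distinct words is orthogonal,
# so no notion of similarity is captured.
cat_onehot = np.array([1.0, 0.0, 0.0])
dog_onehot = np.array([0.0, 1.0, 0.0])
print(np.dot(cat_onehot, dog_onehot))  # 0.0 -- "cat" and "dog" look unrelated

# Dense (illustrative) embeddings can place related words close together.
cat_dense = np.array([0.8, 0.1, 0.3])
dog_dense = np.array([0.7, 0.2, 0.35])

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

print(cosine(cat_dense, dog_dense))  # close to 1 -- similarity is expressible
```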

Continue Reading →

The Cost of my Uber Ride

Quite often, I take the Uber Pool ride to my office during the morning hours of Bangalore's heavy traffic. Although I get a bit disappointed every time a ride request comes to the driver (I prefer to take the seat beside the driver), the fact that pooling is more economical, together with the thought that I am helping Bangalore's traffic, makes me feel better. But I do get frustrated when the […]

Continue Reading →

From EM to Embeddings

Expectation Maximization is quite an old tool/concept in the Machine Learning domain. Although it is an old tool, it took me quite some time to grasp the concept and the intuition behind it, given that most tutorials and articles out there explain it with heavy mathematical equations. But eventually I found out that the maths behind the intuition is pretty simple to understand; only the long equations might […]
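
For reference, the two alternating steps in their standard textbook form, for observed data $X$, latent variables $Z$ and parameters $\theta$:

```latex
% E-step: expected complete-data log-likelihood under the current posterior
Q(\theta \mid \theta^{(t)}) = \mathbb{E}_{Z \mid X,\, \theta^{(t)}}\!\left[ \log p(X, Z \mid \theta) \right]

% M-step: re-estimate the parameters by maximizing that expectation
\theta^{(t+1)} = \arg\max_{\theta}\; Q(\theta \mid \theta^{(t)})
```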

Continue Reading →

Understanding Word Vectors and Word2Vec

Quite recently I have been exploring the Word2Vec tool for representing words in text documents as vectors. I got the initial ideas about the word2vec utility from Google's code archive webpage. The idea behind this kind of utility caught my interest, and I later went on to read the following papers by Mikolov et al. to better understand the algorithm and its implementation: Efficient Estimation of Word Representations […]
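
For readers who want to experiment without the original C tool, a minimal sketch using the gensim library's implementation (the toy corpus is illustrative; parameter names follow gensim >= 4.0):

```python
from gensim.models import Word2Vec

# Toy corpus: a list of tokenised sentences.
sentences = [
    ["machine", "learning", "is", "fun"],
    ["word", "vectors", "capture", "meaning"],
    ["machine", "learning", "uses", "word", "vectors"],
]

# Skip-gram model (sg=1), as in Mikolov et al.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, epochs=100)

print(model.wv["machine"][:5])            # first few dimensions of a word vector
print(model.wv.most_similar("machine"))   # nearest neighbours by cosine similarity
```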

Continue Reading →

Learning From Unlabelled Data - EM Approach

Accurately labelled data can be a bottleneck in many machine learning problems, as such data is difficult and expensive to obtain, and even when we do obtain some labelled data, the labels might not be 100% accurate. Many startups working in the machine learning space resort to crowdsourcing the labelling task. Inspired by this research paper, I am going to try and use lots of unlabelled data in addition to small amounts of labelled data […]
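
A minimal sketch of that idea using scikit-learn's MultinomialNB (the toy corpus and iteration count are illustrative, and this version uses hard pseudo-labels where the paper's EM uses soft, probabilistic ones):

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

labelled_docs = ["good match great game", "terrible awful loss"]
labels = np.array([1, 0])
unlabelled_docs = ["great game good win", "awful terrible defeat"]

vec = CountVectorizer()
X_lab = vec.fit_transform(labelled_docs)
X_unl = vec.transform(unlabelled_docs)

clf = MultinomialNB().fit(X_lab, labels)

# EM-style loop: E-step labels the unlabelled pool, M-step refits on everything.
for _ in range(5):
    pseudo = clf.predict(X_unl)                       # E-step (hard assignments)
    X_all = np.vstack([X_lab.toarray(), X_unl.toarray()])
    y_all = np.concatenate([labels, pseudo])
    clf = MultinomialNB().fit(X_all, y_all)           # M-step

print(clf.predict(vec.transform(["great game"])))     # -> class 1
```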

Continue Reading →

Building a Classification Tree from scratch

In this post I am going to demonstrate an implementation of a classification tree from scratch for multi-label classification. Since most of my work involves text classification, the classification tree that I am going to demonstrate here has been built keeping text classification problems in mind. The theoretical explanation of the various components and modules of a classification tree can be found in this paper. This implementation is not ready-made for […]
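
As a flavour of the core ingredient, here is a minimal single-label Gini split over a binary document-term matrix (the toy data is illustrative; the post's multi-label splitting criteria differ in the details):

```python
import numpy as np

def gini(y):
    """Gini impurity of a label array."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Pick the binary feature whose presence/absence split
    gives the largest impurity reduction."""
    base, n = gini(y), len(y)
    best_feat, best_gain = None, 0.0
    for j in range(X.shape[1]):
        mask = X[:, j] == 1
        if mask.all() or (~mask).all():
            continue  # split puts everything on one side; skip
        child = (mask.sum() * gini(y[mask]) + (~mask).sum() * gini(y[~mask])) / n
        gain = base - child
        if gain > best_gain:
            best_feat, best_gain = j, gain
    return best_feat, best_gain

# Toy document-term matrix (rows = docs, columns = word presence) and labels.
X = np.array([[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 1, 0]])
y = np.array([0, 0, 1, 1])
print(best_split(X, y))  # (0, 0.5): feature 0 separates the classes perfectly
```

Growing the full tree is then a matter of applying this split recursively to each child until a stopping criterion is met.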

Continue Reading →

Logistic Regression Analysis with Examples using R

In the last post we saw how to perform linear regression on a dataset with R. We also saw how to interpret the outcome of the linear regression model and analyze the solution using the R-squared test for goodness of fit of the model, the t-test for the significance of each variable in the model, the F-statistic for the significance of the overall model, and confidence intervals for the variable […]
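
The post works in R, but a rough Python analogue using statsmodels reports the same diagnostics in one summary table, much like R's summary(lm(...)) (the synthetic data below is illustrative):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=100)

X = sm.add_constant(x)          # adds the intercept column
model = sm.OLS(y, X).fit()

# Reports R-squared, per-coefficient t-tests, the overall F-statistic,
# and 95% confidence intervals -- the same diagnostics discussed here.
print(model.summary())
```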

Continue Reading →

Building Gradient Boosted Trees from scratch

Gradient Boosted Trees are tree ensemble algorithms similar to Random Forests, but unlike Random Forests, where trees are constructed independently of each other, in the gradient boosting algorithm the tree in each round is boosted based on the errors of the tree from the previous round. There are, however, more differences in how GBT reduces the error (bias + variance) compared to RF. In this post, we will construct boosted trees using […]
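
A minimal sketch of that boosting loop for squared loss, using scikit-learn regression trees as the weak learners (data, depth and learning rate are illustrative):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

n_rounds, lr = 50, 0.1
pred = np.zeros_like(y)          # start from a zero model
trees = []                       # keep the ensemble for later prediction

for _ in range(n_rounds):
    residual = y - pred                          # negative gradient of squared loss
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    trees.append(tree)
    pred += lr * tree.predict(X)                 # boost: add a shrunken correction

print("training MSE:", np.mean((y - pred) ** 2))
```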

Continue Reading →