Stokastik

Machine Learning, AI and Programming

Tag: C++11

Building a classification pipeline with C++11, Cython and Scikit-Learn

We have seen earlier how Cython can speed up Python code by 50-60x, mostly owing to static typing, as compared to the dynamic typing of pure Python. But we have also seen how one can wrap pure C++ classes and functions with Cython and export them as Python packages with improved speed. The code we have dealt with so far using Cython has mostly consisted of generic modules, such as generating primes using […]
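
To make the idea concrete, here is a minimal sketch of how a Cython-wrapped C++ component might slot into a Scikit-Learn pipeline. The `fast_tokenizer` module name and the `tokenize` stand-in are hypothetical placeholders, not the post's actual code.

```python
# A minimal sketch, assuming a Cython-wrapped C++11 tokenizer has been
# built and exported as a Python module named `fast_tokenizer` (hypothetical).
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# import fast_tokenizer  # hypothetical Cython extension wrapping C++11 code

def tokenize(text):
    # Stand-in for the C++-backed tokenizer; replace with
    # fast_tokenizer.tokenize(text) once the extension is compiled.
    return text.split()

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(tokenizer=tokenize, lowercase=True)),
    ("clf", LogisticRegression()),
])

# Toy usage: the heavy lifting (tokenization) is delegated to the C++ layer,
# while model fitting stays in Scikit-Learn.
docs = ["good product works well", "terrible quality broke fast"]
labels = [1, 0]
pipeline.fit(docs, labels)
print(pipeline.predict(["works really well"]))
```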

Continue Reading →

Learning From Unlabelled Data - EM Approach

Accurately labelled data can be a bottleneck in many machine learning problems, as it is difficult and expensive to obtain, and even if we do obtain some labelled data, the labels might not be 100% accurate. Many startups working in the machine learning space resort to crowdsourcing the labelling task. Inspired by this research paper, I am going to try to use lots of unlabelled data in addition to a small amount of labelled data […]
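
As a rough illustration of the EM idea (not the paper's exact procedure), the sketch below alternates between estimating class posteriors for unlabelled documents (E-step) and refitting a Naive Bayes model on the labelled data plus soft-labelled copies of the unlabelled data (M-step). All data and names here are illustrative.

```python
# A minimal sketch of semi-supervised EM with Naive Bayes on toy data.
import numpy as np
import scipy.sparse as sp
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

labelled = ["spam offer buy now", "meeting agenda tomorrow"]   # toy examples
y = np.array([1, 0])
unlabelled = ["buy cheap offer now", "project meeting tomorrow"]

vec = CountVectorizer()
X_all = vec.fit_transform(labelled + unlabelled)
X_l, X_u = X_all[:len(labelled)], X_all[len(labelled):]

clf = MultinomialNB()
clf.fit(X_l, y)  # initialise from the small labelled set only

for _ in range(10):
    # E-step: class posteriors for the unlabelled documents
    probs = clf.predict_proba(X_u)        # columns follow clf.classes_ = [0, 1]
    # M-step: refit on labelled data plus one weighted copy of each
    # unlabelled document per class (soft labels via sample_weight)
    X_aug = sp.vstack([X_l, X_u, X_u])
    y_aug = np.concatenate([y, np.zeros(X_u.shape[0]), np.ones(X_u.shape[0])])
    w = np.concatenate([np.ones(len(y)), probs[:, 0], probs[:, 1]])
    clf.fit(X_aug, y_aug, sample_weight=w)
```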

Continue Reading →

Building a Classification Tree from scratch

In this post I am going to demonstrate an implementation of a classification tree from scratch for multi-label classification. Since most of my work involves text classification, the classification tree demonstrated here has been built with text classification problems in mind. The theoretical explanation of the various components and modules of a classification tree can be found in this paper. This implementation is not ready-made for […]
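
For a flavour of what "from scratch" involves, here is a minimal sketch of recursive tree construction with entropy-based splits on numeric features; the post's multi-label, text-oriented tree is considerably more involved.

```python
# A minimal sketch: greedy recursive partitioning with information gain.
import numpy as np

def entropy(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_split(X, y):
    # Scan every (feature, threshold) pair and keep the one with maximum gain.
    best = (None, None, 0.0)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left = X[:, j] <= t
            if left.all() or (~left).all():
                continue
            gain = entropy(y) - (left.mean() * entropy(y[left])
                                 + (~left).mean() * entropy(y[~left]))
            if gain > best[2]:
                best = (j, t, gain)
    return best

def build_tree(X, y, depth=0, max_depth=5):
    j, t, gain = best_split(X, y)
    if depth >= max_depth or gain <= 0:
        vals, counts = np.unique(y, return_counts=True)
        return {"leaf": vals[counts.argmax()]}   # majority-class leaf
    left = X[:, j] <= t
    return {"feature": j, "threshold": t,
            "left": build_tree(X[left], y[left], depth + 1, max_depth),
            "right": build_tree(X[~left], y[~left], depth + 1, max_depth)}
```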

Continue Reading →

Building Gradient Boosted Trees from scratch

Gradient Boosted Trees are tree ensemble algorithms similar to Random Forests, but unlike Random Forests, where trees are constructed independently of each other, in the gradient boosting algorithm the tree in each round is boosted based on the errors of the tree from the previous round. There are further differences in how GBT reduces the error (bias + variance) compared to RF. In this post, we will construct boosted trees using […]
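
The round-by-round boosting step can be sketched for squared loss as follows: each new tree is fit to the residuals (the negative gradients) left by the ensemble built so far. This is a simplified sketch, not the post's full implementation.

```python
# A minimal sketch of gradient boosting for regression with squared loss.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gbt_fit(X, y, n_rounds=100, lr=0.1, max_depth=3):
    base = y.mean()                       # initial constant prediction
    pred = np.full(len(y), base)
    trees = []
    for _ in range(n_rounds):
        residuals = y - pred              # negative gradient of squared loss
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)            # boost: fit the current errors
        pred += lr * tree.predict(X)
        trees.append(tree)
    return base, trees

def gbt_predict(X, base, trees, lr=0.1):
    return base + lr * sum(t.predict(X) for t in trees)
```

The learning rate `lr` shrinks each tree's contribution, trading more rounds for better generalization, which is one of the knobs that distinguishes boosting from simply averaging independently grown trees as in RF.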

Continue Reading →

Initializing cluster centers with K-Means++

In the K-Means algorithm, we are not guaranteed a global minimum, since the algorithm converges only to a local minimum. The local minimum reached, and the number of iterations required to reach it, depend on the selection of the initial set of random centroids. Many methods have been proposed for selecting the initial set of centroids for K-Means clustering, such as the Scatter and Gather methods, […]
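
The K-Means++ seeding rule itself is compact: after a uniformly random first centroid, each subsequent centroid is sampled with probability proportional to its squared distance from the nearest centroid chosen so far. A minimal sketch:

```python
# A minimal sketch of K-Means++ initialization (D^2 weighting).
import numpy as np

def kmeans_pp_init(X, k, seed=0):
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]   # first center: uniform at random
    for _ in range(k - 1):
        # squared distance from each point to its nearest chosen center
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        # sample the next center proportionally to those squared distances
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)
```

The D^2 weighting pushes the initial centroids apart, which in practice tends to mean fewer Lloyd iterations and a better local minimum than purely random seeding.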

Continue Reading →