Stokastik

Machine Learning, AI and Programming

LeetCode : Swim in Rising Water

Problem Statement. Solution: One possible solution is to check whether, for each water level, he can swim from (0, 0) to (N-1, N-1) without waiting anywhere in between. To check whether he can swim from (0, 0) to (N-1, N-1), i.e. whether there exists a path, we can use DFS or BFS to traverse the matrix. He can swim from point A to B if the height of B is less […]
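
A minimal sketch of this idea in Python (the function names and the linear scan over water levels are my own illustration, not necessarily the post's final solution; a binary search over levels would work equally well):

```python
from collections import deque

def can_swim(grid, t):
    # BFS from (0, 0): at water level t, a cell is passable if its height is <= t.
    n = len(grid)
    if grid[0][0] > t:
        return False
    seen, queue = {(0, 0)}, deque([(0, 0)])
    while queue:
        r, c = queue.popleft()
        if (r, c) == (n - 1, n - 1):
            return True
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < n and 0 <= nc < n and (nr, nc) not in seen and grid[nr][nc] <= t:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return False

def swim_in_water(grid):
    # Raise the water level until a path from (0, 0) to (N-1, N-1) exists;
    # the first feasible level is the answer.
    t = max(grid[0][0], grid[-1][-1])
    while not can_swim(grid, t):
        t += 1
    return t
```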

Continue Reading →

Neural Networks as a Function Approximator

For the past few days, I have been reading quite a lot of research papers, articles and blogs related to artificial neural networks and their transition towards deep learning. With so many different methods of selecting the best neural network architecture for a problem, the optimal hyper-parameters, the best optimization algorithm and so on, it becomes a little overwhelming to connect all the dots when we ourselves start to […]

Continue Reading →

Building a Neural Network from scratch in Python

In this post I am going to build an artificial neural network from scratch. Although there exist a lot of advanced neural network libraries written in a variety of programming languages, the idea is not to re-invent the wheel but to understand what components are required to make a workable neural network. A full-fledged industrial-scale neural network might require a lot of research and experimentation with the dataset. Building a simple […]
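
As a rough illustration of the kind of components the post is talking about, here is a minimal single-hidden-layer network with sigmoid activations trained on the squared error by gradient descent (the names and the specific architecture are my own, not taken from the post):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(X, y, W1, W2, lr=0.1):
    # Forward pass: input -> hidden -> output, sigmoid activations throughout.
    h = sigmoid(X @ W1)
    out = sigmoid(h @ W2)
    # Backward pass: gradients of the squared error w.r.t. both weight matrices,
    # followed by a plain gradient descent update.
    d_out = (out - y) * out * (1.0 - out)
    d_h = (d_out @ W2.T) * h * (1.0 - h)
    W2 -= lr * (h.T @ d_out)
    W1 -= lr * (X.T @ d_h)
    return W1, W2, float(np.mean((out - y) ** 2))
```

Repeating such a step over the training data until the returned error stops decreasing gives the basic training loop that the post builds up piece by piece.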

Continue Reading →

Building an Incremental Named Entity Recognizer System

In the last post, we saw how to train a system to identify Part-Of-Speech tags for words in sentences. In essence, we found that discriminative models such as Neural Networks and Conditional Random Fields outperform other methods by 5-6% in prediction accuracy. In this post, we will look at another common problem in Natural Language Processing, known as Named Entity Recognition (NER for short). The problem […]
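
For reference, NLTK already ships a pre-trained chunker that can serve as a quick NER baseline to compare an incremental system against (this baseline is my own addition, not the system built in the post):

```python
import nltk

# NLTK's pre-trained chunker (needs the 'punkt', 'averaged_perceptron_tagger',
# 'maxent_ne_chunker' and 'words' resources downloaded via nltk.download()).
sentence = "Barack Obama was born in Hawaii and worked in Washington."
tree = nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(sentence)))

# Collect (entity, label) pairs from the resulting chunk tree.
entities = [(" ".join(word for word, tag in subtree.leaves()), subtree.label())
            for subtree in tree.subtrees() if subtree.label() != "S"]
print(entities)  # e.g. [('Barack Obama', 'PERSON'), ('Hawaii', 'GPE'), ('Washington', 'GPE')]
```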

Continue Reading →

Building a POS Tagger with Python NLTK and Scikit-Learn

In this post we are going to learn about Part-Of-Speech taggers for the English language and look at multiple methods of building a POS tagger with the help of the Python NLTK and scikit-learn libraries. The available methods range from simple regular-expression based taggers to classifier-based taggers (Naive Bayes, Neural Networks and Decision Trees) and then sequence-model based taggers (Hidden Markov Model, Maximum Entropy Markov Model and Conditional Random […]
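
As a taste of the simplest of these methods, here is a regular-expression based tagger built with NLTK (the patterns below are only illustrative):

```python
import nltk

# Each pattern maps a word shape to a POS tag; the last pattern is the fallback.
patterns = [
    (r'.*ing$', 'VBG'),                 # gerunds
    (r'.*ed$', 'VBD'),                  # simple past
    (r'.*es$', 'VBZ'),                  # 3rd person singular present
    (r'^-?[0-9]+(\.[0-9]+)?$', 'CD'),   # cardinal numbers
    (r'.*', 'NN'),                      # default: noun
]
regexp_tagger = nltk.RegexpTagger(patterns)
print(regexp_tagger.tag("The dog is barking loudly".split()))
```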

Continue Reading →

Understanding Conditional Random Fields

Given a sequence of observations, many machine learning tasks require us to label each observation in the sequence with a corresponding class (or named entity) such that the overall likelihood of the labelling is maximized. For example, given an English sentence, i.e. a sequence of words, label each word with a Part-Of-Speech tag such that the combined POS tagging of the sentence is optimal. "Machine Learning is a field of […]
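
For a linear-chain CRF, this "overall likelihood" is the conditional probability of the whole label sequence y given the observation sequence x, which in standard notation (not copied from the post), with feature functions f_k and weights lambda_k, reads:

$$p(y \mid x) \;=\; \frac{1}{Z(x)} \exp\Bigg(\sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, x, t)\Bigg), \qquad Z(x) \;=\; \sum_{y'} \exp\Bigg(\sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y'_{t-1}, y'_t, x, t)\Bigg)$$

Training maximizes this quantity over the labelled training sequences, and decoding (e.g. with the Viterbi algorithm) finds the label sequence y that maximizes it for a new x.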

Continue Reading →

Machine Learning Interview Questions and Answers (Part III)

Which loss function is better for neural network training, the logistic loss or the squared error loss, and why? The loss function depends mostly on the type of problem we are solving and the activation function. In the case of regression, where the values from the output units are normally distributed, the squared error is the preferred loss function, whereas in a classification problem, where the output units follow a multinomial distribution, the […]
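
A tiny numeric illustration of the point (the single sigmoid output unit setup is my own, not from the post): with the squared error, the gradient w.r.t. the pre-activation carries an extra p(1-p) factor that vanishes when the unit saturates, while the logistic loss gradient stays proportional to the error:

```python
import numpy as np

def grads(z, y):
    # Gradients of the two losses w.r.t. the pre-activation z of a sigmoid output unit.
    p = 1.0 / (1.0 + np.exp(-z))
    squared_error_grad = (p - y) * p * (1.0 - p)  # extra p(1-p) factor -> saturates
    logistic_loss_grad = p - y                    # stays large while the prediction is wrong
    return squared_error_grad, logistic_loss_grad

# A confidently wrong prediction (true label 1, strongly negative pre-activation):
print(grads(z=-6.0, y=1.0))  # roughly (-0.0025, -0.998): squared error barely learns
```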

Continue Reading →

Optimization Methods for Deep Learning

In this post I am going to give a brief overview of a few of the common optimization techniques used in training a neural network, from simple classification problems to deep learning. As we know, the critical part of a classification algorithm is to optimize the loss (objective) function in order to learn the correct parameters of the model. The type of the objective function (convex, non-convex, constrained, unconstrained etc.) along with […]
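
As a concrete example of one of the simplest of these techniques, here is plain gradient descent with momentum on a toy convex quadratic (a sketch under my own assumptions, not code from the post):

```python
import numpy as np

def sgd_momentum(grad_fn, w, lr=0.01, beta=0.9, steps=100):
    # SGD with momentum: the velocity is an exponentially decaying sum of past
    # gradients, which damps oscillations and speeds up progress along ravines.
    v = np.zeros_like(w)
    for _ in range(steps):
        v = beta * v - lr * grad_fn(w)
        w = w + v
    return w

# Toy example: minimize the convex quadratic f(w) = ||w||^2, whose gradient is 2w.
print(sgd_momentum(lambda w: 2.0 * w, w=np.array([5.0, -3.0])))  # close to [0, 0]
```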

Continue Reading →

Machine Learning Interview Questions and Answers (Part II)

Why does the negative sampling strategy work during training of word vectors? In word2vec training, the objective is to have semantically and syntactically similar words close to each other in terms of the cosine distance between their word vectors. In the skip-gram architecture, the probability of a word 'c' being predicted as a context word at the output node, given the target word 'w' and the input and output weights […]
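
A minimal sketch of the negative-sampling objective for a single (target, context) pair, with illustrative variable names rather than the post's notation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def negative_sampling_loss(v_w, v_c, negative_vectors):
    # Skip-gram with negative sampling: pull the true context word 'c' towards the
    # target word 'w' and push a handful of randomly sampled words away, instead of
    # normalizing the prediction over the entire vocabulary.
    positive = np.log(sigmoid(np.dot(v_w, v_c)))
    negatives = sum(np.log(sigmoid(-np.dot(v_w, v_n))) for v_n in negative_vectors)
    return -(positive + negatives)
```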

Continue Reading →

Generative vs. Discriminative Spell Corrector

We have earlier seen two approaches to doing spelling correction in text documents. Most of the spelling errors encountered are in either user-generated content or OCR outputs of document images. The presence of spelling errors introduces noise in the data and, as a result, the impact of important features gets diluted. Although the methods explained differ in how they are implemented, theoretically both of them work on the same principle. […]
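
As a point of reference for the generative side, here is a Norvig-style corrector that generates edit-distance-one candidates and scores them by corpus frequency (this sketch is my own and is not necessarily either of the two approaches compared in the post):

```python
from collections import Counter

def edits1(word, alphabet="abcdefghijklmnopqrstuvwxyz"):
    # All strings at edit distance one from 'word': deletes, transposes, replaces, inserts.
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in alphabet]
    inserts = [L + c + R for L, R in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)

def correct(word, word_counts):
    # Score candidates by corpus frequency, a crude language-model prior.
    known = set(word_counts)
    candidates = ({word} & known) or (edits1(word) & known) or {word}
    return max(candidates, key=lambda w: word_counts.get(w, 0))

word_counts = Counter("the quick brown fox jumps over the lazy dog".split())
print(correct("teh", word_counts))  # -> 'the'
```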

Continue Reading →