Stokastik

Machine Learning, AI and Programming

Neural Networks as a Function Approximator

For the past few days, I have been reading quite a lot of research papers, articles and blogs related to artificial neural networks and their transition towards deep learning. With so many different concepts and experiments around selecting the best neural network architecture for a problem, the best learning hyper-parameters, the best optimization algorithm and so on, it becomes a little overwhelming to connect all the dots together […]

Continue Reading →

Building a Neural Network from scratch in Python

In this post I am going to build an artificial neural network from scratch. Although there exist lots of advanced neural network libraries written in a variety of programming languages, the idea is not to re-invent the wheel but to understand what components are required to make a workable neural network. Although a full-fledged, industrial-scale neural network might require a lot of research and experimentation with the datasets, building a simple […]

Continue Reading →
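
To make those components concrete before reading on: a bare-bones forward pass for a one-hidden-layer network needs little more than two weight matrices, an activation and a softmax. The sketch below is illustrative NumPy with made-up layer sizes, not the post's actual code.

```python
import numpy as np

def sigmoid(z):
    # Squash pre-activations into (0, 1); this is the layer's non-linearity.
    return 1.0 / (1.0 + np.exp(-z))

# Toy dimensions: 4 input features, 8 hidden units, 3 output classes.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)

def forward(X):
    H = sigmoid(X @ W1 + b1)                      # hidden layer
    scores = H @ W2 + b2                          # output layer scores
    exp = np.exp(scores - scores.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)   # softmax probabilities

probs = forward(rng.normal(size=(5, 4)))   # 5 examples -> (5, 3) probabilities
print(probs.sum(axis=1))                   # each row sums to 1
```

Training adds the other half of the machinery: backpropagating the gradient of a loss through these same operations to update the weights.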

Building an Incremental Named Entity Recognizer System

In the last post, we saw how to train a system to identify Part-Of-Speech tags for words in sentences. In essence, we found that discriminative models such as Neural Networks and Conditional Random Fields outperform the other methods by 5-6% in prediction accuracy. In this post, we will look at another common problem in Natural Language Processing, known as Named Entity Recognition (NER for short). The problem […]

Continue Reading →
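
The post trains an incremental recognizer of its own; purely to illustrate what the NER task asks for, the snippet below runs NLTK's pre-trained chunker on a sentence. It assumes the standard NLTK resources (punkt, averaged_perceptron_tagger, maxent_ne_chunker, words) have been downloaded.

```python
import nltk

# One-time downloads, if the models are not already present:
# nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')
# nltk.download('maxent_ne_chunker'); nltk.download('words')

sentence = "Barack Obama was born in Hawaii and worked in Washington."
tokens = nltk.word_tokenize(sentence)   # split into words
tagged = nltk.pos_tag(tokens)           # POS tags are the chunker's input features
tree = nltk.ne_chunk(tagged)            # label spans as PERSON, GPE, ...

# Print every labelled entity span in the parse tree.
for subtree in tree.subtrees():
    if subtree.label() != 'S':
        print(subtree.label(), [word for word, tag in subtree.leaves()])
```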

Building a POS Tagger with Python NLTK and Scikit-Learn

In this post we are going to learn about Part-Of-Speech taggers for the English language and look at multiple methods of building a POS tagger with the help of the Python NLTK and scikit-learn libraries. The available methods range from simple regular-expression-based taggers to classifier-based ones (Naive Bayes, Neural Networks and Decision Trees) and then sequence-model-based ones (Hidden Markov Model, Maximum Entropy Markov Model and Conditional Random […]

Continue Reading →
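
As a taste of the simplest end of that spectrum, a regular-expression-based tagger in NLTK is just an ordered list of pattern-to-tag rules, tried top to bottom (a minimal sketch with hand-picked patterns):

```python
import nltk

# Rules are tried in order; the last pattern is a catch-all default.
patterns = [
    (r'.*ing$', 'VBG'),    # gerunds: running, learning
    (r'.*ed$',  'VBD'),    # simple past: walked
    (r'.*ly$',  'RB'),     # adverbs: quickly
    (r'[0-9]+$', 'CD'),    # cardinal numbers
    (r'.*s$',   'NNS'),    # plural nouns (a crude heuristic)
    (r'.*',     'NN'),     # default: singular noun
]

tagger = nltk.RegexpTagger(patterns)
print(tagger.tag(['Machine', 'learning', 'is', 'evolving', 'quickly']))
```

Note how the plural-noun rule mis-tags "is"; rule-based taggers are fast but brittle, which is what motivates the classifier and sequence-model approaches.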

Understanding Conditional Random Fields

Given a sequence of observations, many machine learning tasks require us to label each observation in the sequence with a corresponding class (or named entity) such that the overall likelihood of the labelling is maximized. For example, given an English sentence, i.e. a sequence of words, label each word with a Part-Of-Speech tag such that the combined POS tag sequence of the sentence is optimal. "Machine Learning is a field of […]

Continue Reading →
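
For reference, the linear-chain CRF makes the phrase "overall likelihood of the labelling" precise: the conditional probability of a tag sequence y given the observed sequence x is

```latex
p(y \mid x) = \frac{1}{Z(x)} \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, x, t) \Big),
\qquad
Z(x) = \sum_{y'} \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y'_{t-1}, y'_t, x, t) \Big)
```

where the f_k are feature functions over adjacent labels and the observations, the λ_k are their learned weights, and Z(x) normalizes over all candidate label sequences. Decoding picks the y that maximizes this probability.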

Common and not so common Machine Learning Questions and Answers (Part III)

Which loss function is better for neural network training, the logistic loss or the squared error loss, and why? The loss function depends mostly on the type of problem we are solving and on the activation function. In the case of regression, where the values of the output units are normally distributed, squared error is the preferred loss function, whereas in a classification problem the output units follow a multinomial distribution, and the […]

Continue Reading →
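
The usual way to see this is through maximum likelihood. If the target y is assumed Gaussian around the network output ŷ, minimizing the negative log-likelihood reduces to minimizing squared error; if the output follows a multinomial distribution, it reduces to the cross-entropy (logistic) loss:

```latex
-\log p(y \mid \hat{y}) = \frac{(y - \hat{y})^2}{2\sigma^2} + \text{const}
\qquad \text{(Gaussian / regression)}

-\log p(y \mid \hat{y}) = -\sum_{k} y_k \log \hat{y}_k
\qquad \text{(multinomial / classification, one-hot } y\text{)}
```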

Optimization Methods for Deep Learning

In this post I am going to give a brief overview of a few of the common optimization techniques used in training a neural network, from simple classification problems to deep learning. As we know, the critical part of a classification algorithm is to optimize the loss (objective) function in order to learn the correct parameters of the model. The type of objective function (convex, non-convex, constrained, unconstrained, etc.), along with […]

Continue Reading →
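
Most optimization techniques for neural networks are refinements of the same basic move: step the parameters against the gradient of the loss. A minimal sketch of stochastic gradient descent on a squared-error loss (illustrative toy data and names, not the post's code):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                 # toy inputs
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)   # noisy linear targets

w = np.zeros(3)    # parameters to learn
lr = 0.1           # learning rate (step size)

for epoch in range(50):
    for i in rng.permutation(len(X)):         # visit examples in random order
        err = X[i] @ w - y[i]                 # residual for one example
        w -= lr * err * X[i]                  # gradient of 0.5 * err**2 w.r.t. w

print(w)   # approaches true_w
```

Methods like momentum, RMSProp and Adam differ mainly in how they scale and smooth that per-step update.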

Common and not so common Machine Learning Questions and Answers (Part II)

Why does the negative sampling strategy work during the training of word vectors? In word2vec training, the objective is to have semantically and syntactically similar words close to each other in terms of the cosine distance between their word vectors. In the skip-gram architecture, the probability of a word 'c' being predicted as a context word at the output node, given the target word 'w' and the input and output weights […]

Continue Reading →
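
To fill in the formula the excerpt cuts off at: in the skip-gram model, that probability is a softmax over the entire vocabulary V, which is expensive to normalize; negative sampling replaces it with one positive and k noise-word binary discriminations:

```latex
p(c \mid w) = \frac{\exp(u_c^{\top} v_w)}{\sum_{c' \in V} \exp(u_{c'}^{\top} v_w)}

\log \sigma(u_c^{\top} v_w) + \sum_{i=1}^{k} \mathbb{E}_{n_i \sim P_n}\big[ \log \sigma(-u_{n_i}^{\top} v_w) \big]
```

Here v_w is the input (target) vector, u_c the output (context) vector, σ the sigmoid, and the negative words n_i are drawn from a noise distribution P_n.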

Generative vs. Discriminative Spell Corrector

We have earlier seen two approaches to spelling correction in text documents. Most of the spelling errors encountered are in either user-generated content or the OCR output of document images. The presence of spelling errors introduces noise into the data, and as a result the impact of important features gets diluted. Although the methods explained differ in how they are implemented, theoretically both of them work on the same principle. […]

Continue Reading →
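
A common way to formalize that shared principle (which may be the one the post refers to): both correctors rank candidate corrections c for an observed word w and pick the most probable, differing only in which distribution they model directly:

```latex
\hat{c}_{\text{gen}} = \arg\max_{c} \; P(c) \, P(w \mid c)
\qquad
\hat{c}_{\text{disc}} = \arg\max_{c} \; P(c \mid w)
```

with P(c) a prior over candidate words and P(w | c) an error (noisy-channel) model in the generative case.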

Common and not so common Machine Learning Questions and Answers (Part I)

What is the role of the activation function in Neural Networks? The role of the activation function in a neural network is to produce a non-linear decision boundary via non-linear combinations of the weighted inputs. A neural network classifier without any hidden layers is essentially a logistic regression classifier. The non-linearity in a neural network is added by the hidden layers, using a sigmoid or similar activation function.

Continue Reading →
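
A quick numerical way to see why the activation matters: without it, any stack of layers collapses into a single linear map, so depth buys no extra expressive power. A small sketch with made-up dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 8))
W2 = rng.normal(size=(8, 3))
X = rng.normal(size=(5, 4))

# Two "layers" with no activation in between...
deep_linear = (X @ W1) @ W2
# ...equal one layer with the composed weight matrix.
single = X @ (W1 @ W2)
print(np.allclose(deep_linear, single))   # True: still just a linear map

# Insert a sigmoid between the layers and the equivalence breaks;
# the network can now represent non-linear decision boundaries.
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
nonlinear = sigmoid(X @ W1) @ W2
print(np.allclose(nonlinear, single))     # False
```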