Stokastik

Machine Learning, AI and Programming

Convex Optimization in Machine Learning

In an earlier post, we introduced one of the most widely used optimization techniques, gradient descent, and its scalable variant, Stochastic Gradient Descent (SGD). Although SGD is an efficient and scalable technique for optimizing a function, the drawback with both gradient descent and SGD is that they are susceptible to getting stuck in a local optimum. The gradient descent technique is not suited to finding a local or global optimum with […]
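
To make the local-optimum issue concrete, here is a minimal sketch of plain gradient descent on a one-dimensional convex function; the function, step size and starting point are illustrative choices, not taken from the original post.

# A minimal sketch of gradient descent on the convex function
# f(x) = (x - 3)^2, whose single global minimum is at x = 3.
# With a convex objective the iterates converge to the global
# minimum; on a non-convex objective this same loop can settle
# in a local optimum instead.

def grad(x):
    return 2.0 * (x - 3.0)  # derivative of f(x) = (x - 3)^2

x = 0.0              # illustrative starting point
learning_rate = 0.1  # illustrative step size
for step in range(100):
    x -= learning_rate * grad(x)

print(x)  # converges to approximately 3.0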

Continue Reading →

Selecting the optimum number of clusters

The primary difficulties in clustering algorithms are choosing the optimum number of clusters and initializing the cluster centroids. The problem of centroid initialization is addressed in a different post (K-Means++ is an initialization algorithm that leads to more stable clusters than random initialization). In this post we will focus on different ways of choosing the optimum number of clusters. The basic idea is to minimize the sum of the within […]
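
As a rough sketch of one common criterion (the "elbow" heuristic on the within-cluster sum of squares), the loop below runs k-means for several values of k; the toy data and the scikit-learn usage are illustrative assumptions, not necessarily the exact method from the post.

# Sketch of the elbow heuristic: run k-means for several k and
# watch the within-cluster sum of squares (inertia); the "elbow"
# where it stops dropping sharply suggests a good cluster count.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)
# toy data: three well-separated blobs of 100 points each
data = np.vstack([rng.randn(100, 2) + c for c in ([0, 0], [6, 6], [0, 6])])

for k in range(1, 10):
    km = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=0).fit(data)
    print(k, round(km.inertia_, 1))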

Continue Reading →

Text Classification with Adaboost

Boosting is a general technique by which multiple "weak" classifiers are combined to produce a single "strong" classifier. The idea behind boosting is very simple: the final classifier is built incrementally from an ensemble of classifiers, such that each newly chosen classifier performs better on the training instances that the current classifier does not handle well. In AdaBoost, the […]
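
For a concrete (hedged) illustration of the idea, the snippet below trains an off-the-shelf AdaBoost classifier over decision stumps on a toy text corpus; the corpus and the scikit-learn pipeline are assumptions for illustration, not the implementation derived in the post.

# AdaBoost over decision stumps (scikit-learn's default weak
# learner) on a toy bag-of-words text classification task.
from sklearn.ensemble import AdaBoostClassifier
from sklearn.feature_extraction.text import CountVectorizer

texts = ["good movie", "great film", "bad movie", "awful film"]  # toy corpus
labels = [1, 1, 0, 0]                                            # 1 = positive

vec = CountVectorizer()
X = vec.fit_transform(texts)

clf = AdaBoostClassifier(n_estimators=50)  # 50 incrementally added weak classifiers
clf.fit(X, labels)
print(clf.predict(vec.transform(["good film"])))  # expected: [1]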

Continue Reading →

Feature Selection with Mutual Information

Given two random variables X and Y, mutual information measures how much knowing one of these variables reduces uncertainty about the other. For example, if X and Y are independent, then knowing X does not give any information about Y and vice versa, so their mutual information is zero. At the other extreme, if X is completely correlated with Y, then all information conveyed by X is also conveyed by Y, […]
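
As a small worked sketch (the joint distributions below are made up for illustration), mutual information can be computed directly from a joint probability table as I(X;Y) = sum over x,y of p(x,y) * log2(p(x,y) / (p(x) p(y))):

# Mutual information from a joint probability table.
import numpy as np

def mutual_information(p_xy):
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal of X
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal of Y
    nz = p_xy > 0                           # treat 0 * log(0) as 0
    return np.sum(p_xy[nz] * np.log2(p_xy[nz] / (p_x * p_y)[nz]))

# independent X and Y -> zero mutual information
print(mutual_information(np.array([[0.25, 0.25], [0.25, 0.25]])))  # 0.0
# X fully determines Y -> maximal mutual information (1 bit here)
print(mutual_information(np.array([[0.5, 0.0], [0.0, 0.5]])))      # 1.0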

Continue Reading →

Solving the Bitcoin scalability problem

The blockchain is a gossip protocol whereby all state modifications to the ledger are broadcast to all participants. It is through this “gossip protocol” that consensus on the state, i.e. everyone’s balances, is reached. If each node in the bitcoin network must know about every single transaction that occurs globally, that may create a significant drag on the ability of the network to encompass all global financial transactions. According to […]

Continue Reading →

How does Bitcoin work?

When we do online transactions on our favorite e-commerce website, we pass the credit/debit card information to a payment gateway or the third-party merchant (the credit card issuer) over a secure connection (HTTPS). The gateway or the merchant then validates this information, encrypts the transaction data and passes it on to the issuing bank for clearance, after which the amount is credited to the seller's account. One can quite obviously see that it is an […]

Continue Reading →

The Future lies with Decentralization, or does it?

Although Bitcoin and other cryptocurrencies are hailed as the greatest revolution in financial technology, the concept is still new, and financial and economic regulatory bodies worldwide remain skeptical about it (security issues, fraud, money laundering, funding of criminal activity, etc.), so only a handful of merchants the world over accept them. But the good news is that more people are starting to feel confident about it. It […]

Continue Reading →

How cryptocurrency taught me a better concept of "money"

First of all, let me admit that I do not have a formal economics or finance education, and until some time back, like many others, I used to think that money meant the Rs. 100 or Rs. 500 currency note that we exchange with a business owner to purchase something of value to us. I did not think about why a paper note with some value printed on it could purchase me […]

Continue Reading →

Expectation Maximization with an Example

In the last post, we introduced a technique called Maximum Likelihood Estimation (MLE) to estimate the unknown parameters of a probability distribution given a set of observations. Although it is a very useful technique, it assumes that all information about the observations is available to us. Consider the example of a two-coin toss: "Given two coins A and B, with probability of heads being 'p' and 'q' […]
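
As a hedged sketch of the two-coin setup (the toss counts and starting values below are made-up illustrations, not numbers from the post), EM alternates between inferring which coin produced each trial and re-estimating p and q:

# EM for the two-coin problem: each trial is n flips of an
# unknown coin (A with heads probability p, or B with q).
import numpy as np
from scipy.stats import binom

heads = np.array([5, 9, 8, 4, 7])  # heads observed in each trial of n flips
n = 10
p, q = 0.6, 0.5                    # illustrative initial guesses

for _ in range(50):
    # E-step: posterior probability that each trial came from coin A
    like_a = binom.pmf(heads, n, p)
    like_b = binom.pmf(heads, n, q)
    w = like_a / (like_a + like_b)
    # M-step: re-estimate p and q from the expected heads counts
    p = np.sum(w * heads) / np.sum(w * n)
    q = np.sum((1 - w) * heads) / np.sum((1 - w) * n)

print(p, q)  # estimated heads probabilities of the two coins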

Continue Reading →

Maximum Likelihood Estimation

Observations drawn from a probability distribution depend on the parameters of that distribution. For example, given an unbiased coin with equal probability of landing heads as well as tails, what is the probability of observing the sequence "HHTH"? Our knowledge from probability theory says that, since successive coin tosses are independent, the probability of the observation should be 0.5^4 = 0.0625, but what if the coin was biased and the […]
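
To make the excerpt's arithmetic concrete: under a heads probability theta, the likelihood of "HHTH" is theta^3 * (1 - theta), which equals 0.5^4 = 0.0625 for a fair coin and is maximized at theta = 3/4. The brute-force grid search below is just an illustrative check, not the derivation from the post.

# Likelihood of the sequence "HHTH" (3 heads, 1 tail) as a
# function of the heads probability theta.
import numpy as np

thetas = np.linspace(0.001, 0.999, 999)
likelihood = thetas**3 * (1 - thetas)

print(0.5**4)                         # 0.0625, likelihood under a fair coin
print(thetas[np.argmax(likelihood)])  # ~0.75, the maximum likelihood estimate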

Continue Reading →