Machine Learning, AI and Programming


Constructing Binary Search Trees on the GPU

Binary Search Trees (BST) are one of the very important data structures required for effecient searching (lookup) and sorting problems. In machine learning, BSTs or variants of BSTs are used widely for various tasks such as fast nearest neighbor lookups (KD-Trees), Decision Trees etc. BSTs have an average case time complexity of O(logN) for searching where N is the number of possible values to search, but can go upto O(N) […]

Continue Reading →

Sorting numbers in parallel on GPU - Bitonic Sort

Sorting is one of the most common use cases in programming. There are lots of different effecient algorithms existing for sorting and most built-in libraries for specific programming languages have implemented them in a time effecient manner. Merge Sort, Quick Sort, Heap Sort are generic algorithms for sorting real numbers. They have an average-case run-time complexity of O(N*logN) where we are sorting N real numbers. Each of them has a […]

Continue Reading →

Implementing parallel algorithms using CUDA and C++11

Working with deep learning algorithms on GPUs especially the speedups gained in case of CNN and Tensor multiplications with GPU as compared to CPU got me fascinated about learning more about how GPUs work internally, how serial codes (algorithms) can be parallelized on GPU using the CUDA programming paradigm. I have always been a proponent of faster codes and always try out ways to improve speed of existing codes by […]

Continue Reading →