Machine Learning, AI and Programming


Sorting numbers in parallel on GPU - Bitonic Sort

Sorting is one of the most common tasks in programming. Many efficient sorting algorithms exist, and the built-in libraries of most programming languages implement them in a time-efficient manner. Merge Sort, Quick Sort and Heap Sort are generic algorithms for sorting real numbers. They have an average-case run-time complexity of O(N*logN), where N is the number of real numbers being sorted. Each of them has a […]

Implementing parallel algorithms using CUDA and C++11

Working with deep learning algorithms on GPUs, and especially seeing the speedups that CNNs and tensor multiplications gain on a GPU compared to a CPU, got me fascinated with how GPUs work internally and how serial code (algorithms) can be parallelized on a GPU using the CUDA programming paradigm. I have always been a proponent of faster code and always try out ways to improve the speed of existing code by […]