Stokastik

Machine Learning, AI and Programming

Tag: Shared Memory

Implementing parallel algorithms using CUDA and C++11

Working with deep learning algorithms on GPUs, and especially seeing the speedups GPUs deliver over CPUs for CNNs and tensor multiplications, got me fascinated with how GPUs work internally and how serial code (algorithms) can be parallelized on a GPU using the CUDA programming paradigm. I have always been a proponent of faster code and always try out ways to improve the speed of existing code by […]
