NOC: GPU Architectures and Programming


Lecture 1 - Review of basic COA w.r.t. performance


Lecture 2 - Review of basic COA w.r.t. performance


Lecture 3 - Review of basic COA w.r.t. performance


Lecture 4 - Review of basic COA w.r.t. performance


Lecture 5 - Intro to GPU architectures


Lecture 6 - Intro to GPU architectures


Lecture 7 - Intro to GPU architectures


Lecture 8 - Intro to GPU architectures


Lecture 9 - Intro to CUDA programming


Lecture 10 - Intro to CUDA programming (Continued...)


Lecture 11 - Intro to CUDA programming (Continued...)


Lecture 12 - Intro to CUDA programming (Continued...)


Lecture 13 - Multi-dimensional mapping of dataspace; Synchronization


Lecture 14 - Multi-dimensional mapping of dataspace; Synchronization (Continued...)


Lecture 15 - Multi-dimensional mapping of dataspace; Synchronization (Continued...)


Lecture 16 - Warp Scheduling and Divergence


Lecture 17 - Warp Scheduling and Divergence (Continued...)


Lecture 18 - Warp Scheduling and Divergence (Continued...)


Lecture 19 - Memory Access Coalescing


Lecture 20 - Memory Access Coalescing (Continued...)


Lecture 21 - Memory Access Coalescing (Continued...)


Lecture 22 - Memory Access Coalescing (Continued...)


Lecture 23 - Memory Access Coalescing (Continued...)


Lecture 24 - Memory Access Coalescing (Continued...)


Lecture 25 - Memory Access Coalescing (Continued...)


Lecture 26 - Memory Access Coalescing (Continued...)


Lecture 27 - Memory Access Coalescing (Continued...)


Lecture 28 - Optimizing Reduction Kernels


Lecture 29 - Optimizing Reduction Kernels (Continued...)


Lecture 30 - Optimizing Reduction Kernels (Continued...)


Lecture 31 - Optimizing Reduction Kernels (Continued...)


Lecture 32 - Optimizing Reduction Kernels (Continued...)


Lecture 33 - Optimizing Reduction Kernels (Continued...)


Lecture 34 - Optimizing Reduction Kernels (Continued...)


Lecture 35 - Kernel Fusion, Thread and Block Coarsening


Lecture 36 - Kernel Fusion, Thread and Block Coarsening (Continued...)


Lecture 37 - Kernel Fusion, Thread and Block Coarsening (Continued...)


Lecture 38 - Kernel Fusion, Thread and Block Coarsening (Continued...)


Lecture 39 - Kernel Fusion, Thread and Block Coarsening (Continued...)


Lecture 40 - Kernel Fusion, Thread and Block Coarsening (Continued...)


Lecture 41 - OpenCL - Runtime System


Lecture 42 - OpenCL - Runtime System (Continued...)


Lecture 43 - OpenCL - Runtime System (Continued...)


Lecture 44 - OpenCL - Runtime System (Continued...)


Lecture 45 - OpenCL - Runtime System (Continued...)


Lecture 46 - OpenCL - Runtime System (Continued...)


Lecture 47 - OpenCL - Runtime System (Continued...)


Lecture 48 - OpenCL - Heterogeneous Computing


Lecture 49 - OpenCL - Heterogeneous Computing (Continued...)


Lecture 50 - OpenCL - Heterogeneous Computing (Continued...)


Lecture 51 - OpenCL - Heterogeneous Computing (Continued...)


Lecture 52 - OpenCL - Heterogeneous Computing (Continued...)


Lecture 53 - OpenCL - Heterogeneous Computing (Continued...)


Lecture 54 - Efficient Neural Network Training/Inferencing


Lecture 55 - Efficient Neural Network Training/Inferencing (Continued...)


Lecture 56 - Efficient Neural Network Training/Inferencing (Continued...)


Lecture 57 - Efficient Neural Network Training/Inferencing (Continued...)


Lecture 58 - Efficient Neural Network Training/Inferencing (Continued...)


Lecture 59 - Efficient Neural Network Training/Inferencing (Continued...)


Lecture 60 - Efficient Neural Network Training/Inferencing (Continued...)


Lecture 61 - Efficient Neural Network Training/Inferencing (Continued...)


Lecture 62 - Efficient Neural Network Training/Inferencing (Continued...)


Lecture 63 - Efficient Neural Network Training/Inferencing (Continued...)