Implementing a Fast Attention Fusion Kernel
Writing Fast Attention on TPU — From Naive Kernel to Fused FlashAttention with Pallas Part...
Deep-Dive into TPU Profiling — XProf, HLO, and the Roofline Model Part 2 of the KernelForge series on writing, profiling, and optimizing custom TPU kernels in Python. Google Cloud credits...
Running Containerised Batch Jobs on Google Cloud Platform I needed to OCR tens of thousands...
In today’s data-driven world, video content is everywhere—from security surveillance and industrial monitoring to social...
Large Language Models (LLMs) are powerful predictors. Given a sequence of text, they excel at...
Speech synthesis has become more human-like within the last few years and gained a higher...
In a previous post, we discussed TensorFlow graphs and sessions. Since building a computation...
Customer churn or customer attrition is the loss of existing customers from a service or...