keshan@blog:~$ ./home.sh

The Ratchet Loop — Systematically Optimizing a TPU Kernel to the Hardware Ceiling

Part 3 of the KernelForge series on writing, profiling, and optimizing custom TPU kernels in Python. Part...


~/articles

Explore my thoughts on Machine Learning, AI, and more

Profiling TPU Kernels - XProf, HLO, and the Roofline Model

Deep-Dive into TPU Profiling — XProf, HLO, and the Roofline Model. Part 2 of the...

Keshan Sanjaya Sodimana

Apr 25, 2026 26 min read

Implementing a Fast Attention Fusion Kernel

Writing Fast Attention on TPU — From Naive Kernel to Fused FlashAttention with Pallas. Part...

Keshan Sanjaya Sodimana

Apr 21, 2026 27 min read

Running Containerised Batch Jobs on Google Cloud Platform

I needed to OCR tens of thousands...

Keshan Sanjaya Sodimana

Apr 19, 2026 14 min read

From Video to Structured Insight: A Practical Guide to Gemini-Powered Video Analysis

In today’s data-driven world, video content is everywhere—from security surveillance and industrial monitoring to social...

Keshan Sanjaya Sodimana

Jun 21, 2025 8 min read

The Art of Controlled Randomness: A Deep Dive into Sampling Techniques in LLMs

Large Language Models (LLMs) are powerful predictors. Given a sequence of text, they excel at...

Keshan Sanjaya Sodimana

Apr 18, 2025 14 min read

List of open source tools and resources for building a voice

Speech synthesis has become more humanlike within the last few years and gained a higher...

Keshan Sanjaya Sodimana

Sep 13, 2018 2 min read

Estimators: An easy way to work with TensorFlow

In a previous post, we discussed TensorFlow graphs and sessions. Since building a computation...

Keshan Sanjaya Sodimana

Aug 31, 2018 5 min read
