Tags → #transformer
- Understanding Adaptive Layer Normalization, first introduced in the DiT paper
- A brief explanation of how the attention mechanism works, as well as the quadratic scaling of attention
- Layer normalization in Andrej Karpathy's GPT
- Exploring GPT-3's diverse training datasets for language model pretraining
- How I understand the Decoder Transformer in Generative Text Models
- A brief history of large language models, from bigrams to transformers
- My attempt at NER (named entity recognition) for medical reporting at Technofest; I had to step down before the qualifying stage