Tags → #transformer
- Understanding Adaptive Layer Normalization, first introduced in the DiT paper
- A brief explanation of how the attention mechanism works, as well as the quadratic scaling of attention
- Layer normalization in Andrej Karpathy's GPT
- Exploring GPT-3's diverse training datasets for language model pretraining
- How I understand the Decoder Transformer in Generative Text Models
- A brief history of large language models, from bigrams to transformers
- My attempt at NER (named entity recognition) for medical reporting at Technofest; I had to step down before the qualifying stage