2025-11-10 · NLP · Transformers · Google ML · Stanford
The First Principles of Transformers: A Technical Study Journey
A synthesis of my deep dives into NLP, Transformer architectures, and ML engineering foundations.
Deeply understanding the "why" behind the tools we use is critical. Over the 2024 holiday season, I dedicated myself to a first-principles study of NLP and Transformer architectures.
Curriculum & Resources
My study was guided by some of the most rigorous materials available:
- Stanford CME 295: Transformers & Large Language Models. This covered everything from tokenization and Word2Vec to modern attention variants (MQA/GQA) and rotary positional embeddings (RoPE).
- Google ML Education: Focusing on the "Rules of ML," production readiness, and the Deep Learning Tuning Playbook.
- 3Blue1Brown: Using visual intuition to master the mechanics of attention.
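The attention mechanics these resources build up to reduce to one formula: softmax(QK^T / sqrt(d_k)) V. As a minimal sketch (NumPy, toy shapes chosen for illustration, not from any of the courses above), single-head scaled dot-product attention looks like this:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    # Row-wise softmax (subtract the max for numerical stability)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # each output row is a weighted mix of value rows

# Toy example: 3 tokens, head dimension 4
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 4)
```

Multi-head attention runs several of these in parallel on learned projections of the input; the 1/sqrt(d_k) scaling keeps the softmax from saturating as the head dimension grows.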
Key Technical Takeaways
- Attention Mechanics: Moving beyond the black box to understand Multi-Head Attention and how Multi-Query and Grouped-Query Attention shrink the KV cache by sharing key/value heads across query heads.
- Training & Tuning: Deep dives into SFT (Supervised Fine-Tuning), LoRA (Low-Rank Adaptation), and alignment techniques like RLHF and DPO.
- Agentic Reasoning: Bridging the gap between a model that "talks" and a system that "acts" via tool-use and ReAct loops.
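Of these, LoRA has the simplest core: instead of updating a frozen pretrained weight W, you learn a low-rank correction BA and compute h = xW + (alpha/r)·xAB. A minimal sketch (NumPy, with illustrative shapes and the common zero-init for B; not tied to any particular library's API):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    """Forward pass with a LoRA adapter: h = x W + (alpha/r) x A B.

    W is the frozen pretrained weight (d_in x d_out); only the low-rank
    factors A (d_in x r) and B (r x d_out) are trained.
    """
    r = A.shape[1]
    return x @ W + (alpha / r) * (x @ A @ B)

d_in, d_out, r = 8, 8, 2
rng = np.random.default_rng(0)
W = rng.normal(size=(d_in, d_out))        # frozen base weight
A = rng.normal(size=(d_in, r)) * 0.01     # small random init
B = np.zeros((r, d_out))                  # zero init: adapter starts as a no-op
x = rng.normal(size=(1, d_in))
print(np.allclose(lora_forward(x, W, A, B), x @ W))  # True: base model unchanged at init
```

With rank r much smaller than d_in and d_out, the adapter trains d_in·r + r·d_out parameters instead of d_in·d_out, which is why LoRA makes fine-tuning large models tractable on modest hardware.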
Rigorous study is what allows us to bridge the gap between "it works" and "I know why it works."

