2025-11-10 · NLP · Transformers · Google ML · Stanford

The First Principles of Transformers: A Technical Study Journey

A synthesis of my deep dives into NLP, Transformer architectures, and ML engineering foundations.

Deeply understanding the "why" behind the tools we use is critical. Over the 2024 holiday season, I dedicated myself to a first-principles study of NLP and Transformer architectures.

Curriculum & Resources

My study was guided by some of the most rigorous materials available:

  • Stanford CME 295: Transformers & Large Language Models. This covered everything from tokenization and Word2Vec to modern variants like MQA/GQA and RoPE.
  • Google ML Education: Focusing on the "Rules of ML," production readiness, and the Deep Learning Tuning Playbook.
  • 3Blue1Brown: Using visual intuition to master the mechanics of attention.
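One topic from the Stanford material, RoPE (rotary position embeddings), is compact enough to sketch directly. The idea is to rotate each consecutive pair of channels by an angle proportional to the token's position, so that relative position shows up in query-key dot products. A minimal numpy sketch (illustrative only, not an optimized kernel; the function name and `base` default are my own choices):

```python
import numpy as np

def rope(x, base=10000.0):
    """Apply rotary position embeddings to x of shape (seq_len, dim).

    Each consecutive channel pair is rotated by a position-dependent
    angle; dim must be even. Sketch of the core idea only.
    """
    seq_len, dim = x.shape
    # One inverse frequency per channel pair, geometric in the pair index.
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)      # (dim/2,)
    angles = np.outer(np.arange(seq_len), inv_freq)       # (seq_len, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin                    # 2-D rotation per pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```

Because each pair is only rotated, vector norms are preserved and position 0 is left unchanged, which is a handy sanity check.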

Key Technical Takeaways

  1. Attention Mechanics: Moving beyond the black box to understand Multi-Head, Multi-Query, and Grouped-Query Attention.
  2. Training & Tuning: Deep dives into SFT (Supervised Fine-Tuning), LoRA (Low-Rank Adaptation), and alignment techniques like RLHF and DPO.
  3. Agentic Reasoning: Bridging the gap between a model that "talks" and a system that "acts" via tool-use and ReAct loops.
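The relationship between Multi-Head, Multi-Query, and Grouped-Query Attention comes down to how many K/V heads the query heads share. A rough numpy sketch of that single knob (function and argument names are mine; real implementations fuse and vectorize this):

```python
import numpy as np

def grouped_query_attention(q, k, v, n_kv_heads):
    """Scaled dot-product attention where several query heads share one
    K/V head. n_kv_heads == n_heads recovers standard MHA; n_kv_heads == 1
    recovers MQA. Shapes: q (n_heads, seq, d), k/v (n_kv_heads, seq, d).
    """
    n_heads, seq, d = q.shape
    group = n_heads // n_kv_heads              # query heads per K/V head
    out = np.empty_like(q)
    for h in range(n_heads):
        kv = h // group                        # index of the shared K/V head
        scores = q[h] @ k[kv].T / np.sqrt(d)
        scores -= scores.max(axis=-1, keepdims=True)  # softmax stability
        w = np.exp(scores)
        w /= w.sum(axis=-1, keepdims=True)
        out[h] = w @ v[kv]
    return out
```

The practical payoff is a smaller KV cache at inference time: only `n_kv_heads` K/V tensors are stored instead of `n_heads`.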
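LoRA, from the second takeaway, is similarly small at its core: the frozen weight matrix is augmented by a trainable low-rank update. A hedged sketch of the forward pass, assuming the common `alpha / r` scaling (names are mine, not from any particular library):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    """Linear layer with a LoRA adapter.

    W (d_out, d_in) stays frozen; only A (r, d_in) and B (d_out, r)
    are trained, adding the low-rank update B @ A scaled by alpha / r.
    B is typically initialized to zero so training starts at the base model.
    """
    r = A.shape[0]
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)
```

With `B` at its zero initialization the adapter is a no-op, so fine-tuning starts exactly from the pretrained behavior.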

Rigorous study is what allows us to bridge the gap between "it works" and "I know why it works."