

Ever since the release of ChatGPT in November 2022, the advances in AI and its growth in the public consciousness have been tremendous. At the heart of these epochal changes lies a single machine learning component: the transformer. Like the transistor of the 1960s, the transformer of the 2020s has given us an efficient, composable structure for solving generalized problems, and the transformer can be replicated, scaled up, modularized, and miniaturized however we might need. While the large language models (LLMs) that underpin products like ChatGPT are the most popular, these are but one configuration of the transformer.

This book is written for the engineers, machine learning scientists, data scientists, and technologists who are either working with LLMs or transformers, or who are currently trying to break into the field. This technology is so new that machine learning interviews for such positions have not yet been standardized and commoditized to the level of LeetCode, so a broad familiarity with the core concepts is required. Indeed, it is possible to be an expert on one aspect of the LLM space yet still be blindsided by a comparatively rudimentary question on another.

Table of Contents

I. Architecture Fundamentals
Chapter 1. A ⇒ Attention
Chapter 2. V ⇒ Vanilla Transformer
Chapter 3. E ⇒ Embeddings
Chapter 4. C ⇒ Chinchilla Scaling Laws
Chapter 5. I ⇒ InstructGPT
Chapter 6. R ⇒ RoPE
Chapter 7. M ⇒ Mixture of Experts

II. Lossless Optimizations
Chapter 8. K ⇒ KV Cache
Chapter 9. H ⇒ H100
Chapter 10. F ⇒ FlashAttention
Chapter 11. N ⇒ NCCL
Chapter 12. P ⇒ Pipeline Parallelism
Chapter 13. T ⇒ Tensor Parallelism
Chapter 14. Z ⇒ ZeRO

III. Lossy Optimizations
Chapter 15. Q ⇒ Quantization
Chapter 16. W ⇒ WxAyKVz
Chapter 17. G ⇒ GPTQ
Chapter 18. L ⇒ LoRA
Chapter 19. B ⇒ BitNet
Chapter 20. D ⇒ Distillation
Chapter 21. S ⇒ Structured Sparsity