Learn more: https://bit.ly/4tts8MQ

Large language models can feel opaque, especially when you’re dealing with slow inference, hallucinations, memory bottlenecks, or output you can’t fully explain.

Today, we’re launching Transformers in Practice, a course taught by Sharon Zhou, VP of Engineering & AI at AMD.

The course focuses on understanding what’s actually happening inside transformer-based models so you can reason about their behavior, debug issues more effectively, and make better deployment decisions.

You’ll learn:

  • How transformers generate text one token at a time, and how sampling affects output
  • What attention, positional encoding, and transformer layers are actually doing
  • Why hallucinations happen and how techniques like RAG and constrained generation help
  • How optimizations like quantization, KV caching, flash attention, and speculative decoding improve inference efficiency on GPUs

Throughout the course, interactive visualizations help build intuition for concepts that are often difficult to grasp through theory alone.

This course will give you a practical understanding of transformers from both the model and systems perspectives.

Enroll now: https://bit.ly/4tts8MQ