LLM Self-Attention Visualization
Watch how a transformer's self-attention mechanism connects tokens. Each colored line represents the attention weight from one token to another. Drag to orbit, scroll to zoom.
How Attention Works
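In a transformer, every token attends to every token in the sequence, itself included, and all of these comparisons are computed in parallel. The attention weights determine how much each token draws on the others when computing its updated representation. (Decoder-only LLMs additionally mask out later positions, so a token attends only to itself and earlier tokens.)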
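- Query/Key: The dot product of one token's query with another token's key scores how strongly the first attends to the second
- Value: The information each token passes along, mixed in proportion to the attention weights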
- Computation: Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, where d_k is the dimension of the key vectors
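The Query/Key/Value steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code behind this demo: the function names `softmax` and `attention`, the toy vectors in `X`, and the choice to reuse `X` directly as queries, keys, and values (a real transformer applies learned projections first) are all assumptions made for the example.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention over lists of row vectors.

    Returns (outputs, weights), where weights[i][j] is how much
    token i attends to token j.
    """
    d_k = len(K[0])  # key dimension used for scaling
    weights = []
    for q in Q:
        # Score this query against every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in K
        ]
        weights.append(softmax(scores))
    # Each output row is the attention-weighted sum of the value rows.
    outputs = [
        [sum(w * v[c] for w, v in zip(row, V)) for c in range(len(V[0]))]
        for row in weights
    ]
    return outputs, weights

# Toy input: 5 tokens with made-up 2-dimensional vectors.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5], [-1.0, 0.5]]
outputs, weights = attention(X, X, X)
```

Each row of `weights` sums to 1, and a 5-by-5 matrix of this kind is what the lines between token pairs depict.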
This Demo Shows
- 5 input tokens in a sentence
- Attention connections (lines) between token pairs
- Line thickness and color intensity represent attention weight
- Animated pulse demonstrates the attention flow
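One way to turn an attention weight into line thickness and color intensity is a simple linear mapping. The sketch below is hypothetical: the function name `line_style`, the width range, and the opacity range are illustrative assumptions, not values taken from this demo.

```python
def line_style(weight, min_width=0.5, max_width=6.0):
    """Map an attention weight to a (width, opacity) pair.

    The ranges are illustrative assumptions, not the demo's settings.
    """
    w = min(max(weight, 0.0), 1.0)  # clamp to [0, 1]
    width = min_width + w * (max_width - min_width)
    opacity = 0.15 + 0.85 * w  # keep weak links faintly visible
    return width, opacity
```

Under this mapping, a weight near 0 still draws a thin, faint line, while a weight of 1 draws the thickest, fully opaque line.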