LLM Self-Attention Visualization

Watch how a transformer's self-attention mechanism connects tokens. Each colored line represents the attention weight from one token to another. Drag to orbit, scroll to zoom.

How Attention Works

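In a transformer's self-attention layer, every token attends to every token in the sequence, itself included, in one parallel step. (Decoder-only LLMs typically add a causal mask so that each token attends only to itself and earlier tokens.) Each token's attention weights sum to 1 and determine how much every other token contributes to its updated representation.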

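Query/Key: The dot product of one token's query with another token's key scores how strongly the first should attend to the second
Value: The information each token passes along, weighted by those scores
Computation: Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, where d_k is the dimension of the key vectors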
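
The computation above can be sketched in a few lines of plain Python. This is a minimal illustration of single-head scaled dot-product attention, not the code behind this demo: the function names (softmax, attention) and the toy Q, K, V values are assumptions made up for the example, and it omits masking, multiple heads, and the learned projections that produce Q, K, and V.

```python
import math

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(s - peak) for s in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention over lists of equal-length vectors.

    Returns (outputs, weights), where weights[i][j] is how much
    token i attends to token j.
    """
    d_k = len(K[0])            # key dimension
    scale = math.sqrt(d_k)     # the sqrt(d_k) scaling factor
    weights = []
    for q in Q:
        # Score this query against every key, then normalize to sum to 1.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in K]
        weights.append(softmax(scores))
    # Each output vector is the attention-weighted sum of the value vectors.
    outputs = [
        [sum(w * v[c] for w, v in zip(row, V)) for c in range(len(V[0]))]
        for row in weights
    ]
    return outputs, weights

# Toy example: 3 tokens with 2-dimensional vectors (values are made up).
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
outputs, weights = attention(Q, K, V)
```

Each row of weights is one token's attention distribution, which is the quantity the lines in the visualization encode.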

This Demo Shows

Five input tokens from a single sentence
Attention connections (lines) between token pairs
Line thickness and color intensity represent attention weight
Animated pulse demonstrates the attention flow
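
As a rough sketch of how attention weights could be turned into the line styling listed above, the snippet below maps each weight to a width and an opacity. Everything in it is hypothetical: the tokens, the weight matrix, the helper names (line_style, connections), the width and opacity ranges, and the choice to skip self-connections and very small weights are illustrative assumptions, not the demo's actual values or code.

```python
def line_style(weight, min_width=0.5, max_width=6.0):
    """Map an attention weight in [0, 1] to a (width, opacity) pair."""
    w = min(max(weight, 0.0), 1.0)              # clamp to [0, 1]
    width = min_width + (max_width - min_width) * w
    opacity = 0.15 + 0.85 * w                   # keep faint lines visible
    return width, opacity

def connections(tokens, weights, threshold=0.05):
    """List one styled line per token pair whose weight meets the threshold."""
    lines = []
    for i, source in enumerate(tokens):
        for j, target in enumerate(tokens):
            if i == j or weights[i][j] < threshold:
                continue                        # assumption: skip self and weak links
            width, opacity = line_style(weights[i][j])
            lines.append({"from": source, "to": target,
                          "width": width, "opacity": opacity})
    return lines

# Hypothetical 5-token sentence and made-up attention weights (rows sum to 1).
tokens = ["The", "cat", "sat", "down", "."]
weights = [
    [0.60, 0.10, 0.10, 0.10, 0.10],
    [0.30, 0.40, 0.10, 0.10, 0.10],
    [0.05, 0.55, 0.30, 0.05, 0.05],
    [0.02, 0.08, 0.70, 0.15, 0.05],
    [0.20, 0.20, 0.20, 0.20, 0.20],
]
lines = connections(tokens, weights)
```

A linear mapping is the simplest choice; a renderer might prefer a nonlinear scale so that the few dominant weights do not visually drown out the rest.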