LLM Self-Attention Visualization

Watch how a transformer's self-attention mechanism connects tokens. Each colored line represents the attention weight from one token to another. Drag to orbit, scroll to zoom.

How Attention Works

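In a transformer's self-attention layer, every token attends to every token in the sequence, itself included, and all of these comparisons are computed in parallel. Each token's attention weights are non-negative and sum to 1; they set how much each token contributes when that token's new representation is computed.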

  • Query/Key: Determine which tokens should attend to which
  • Value: The information that gets passed along
  • Formula: Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, where K^T is the transpose of K and d_k is the dimension of the key vectors
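
To make the roles of the query, key, and value vectors concrete, here is a minimal sketch of scaled dot-product attention in plain Python. It is an illustration, not this demo's code: the function names (softmax, attention) and the toy 3-token, 2-dimensional vectors are made up for the example, and a real model would use learned projections and a tensor library.

```python
import math

def softmax(scores):
    """Turn a list of raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention over lists of row vectors.

    Returns (outputs, weights); weights[i][j] is how much
    token i attends to token j.
    """
    d_k = len(K[0])  # dimension of each key vector
    weights = []
    for q in Q:
        # Query-key dot products, scaled by sqrt(d_k), score each pair.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights.append(softmax(scores))
    # Each output row is the attention-weighted sum of the value vectors.
    outputs = [
        [sum(w * v[c] for w, v in zip(row, V)) for c in range(len(V[0]))]
        for row in weights
    ]
    return outputs, weights

# Toy example: 3 tokens with 2-dimensional vectors (made-up numbers).
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
outputs, weights = attention(Q, K, V)
for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row holds one token's attention weights over all three tokens. These per-pair weights are the kind of values a visualization like this one would draw as lines.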

This Demo Shows

  • 5 input tokens in a sentence
  • Attention connections (lines) between token pairs
  • Line thickness and color intensity represent attention weight
  • Animated pulse demonstrates the attention flow
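
One way to turn an attention weight into line thickness and color intensity is a simple linear mapping. The sketch below is a guess at the general idea, not this demo's rendering code: line_style, its width range, the opacity floor, the placeholder tokens, and the weights are all assumptions chosen for illustration.

```python
def line_style(weight, min_width=0.5, max_width=6.0):
    """Map an attention weight in [0, 1] to a (width, opacity) pair."""
    w = min(max(weight, 0.0), 1.0)  # clamp in case of rounding noise
    width = min_width + w * (max_width - min_width)
    opacity = 0.15 + 0.85 * w  # floor keeps weak links faintly visible
    return width, opacity

tokens = ["The", "cat", "sat", "down", "."]  # 5 placeholder tokens
# Made-up weights for one query token; a real row would come from softmax.
row = [0.05, 0.60, 0.20, 0.10, 0.05]
for target, weight in zip(tokens, row):
    width, opacity = line_style(weight)
    print(f"sat -> {target}: width={width:.2f}, opacity={opacity:.2f}")
```

A linear mapping keeps the picture easy to read: a line twice as heavy above the minimum width means roughly twice the attention weight. A nonlinear scale could be swapped in if small weights need to stand out more.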