EXP_03 · LLM / Transformer Deep Architecture Visualization
GoogolMind NeuralFlow
Live
Step 1 / 8
1 — Input Text & Tokenization
Tokens · click any token to inspect it
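A minimal sketch of this step, assuming a toy whitespace tokenizer over a hypothetical six-word vocabulary; production LLMs use learned subword tokenizers such as BPE.

vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4, "<unk>": 5}  # hypothetical toy vocabulary

def tokenize(text):
    # Split on whitespace and map each word to its vocabulary id (unknown words fall back to <unk>).
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]

token_ids = tokenize("The cat sat on the mat")
print(token_ids)  # [0, 1, 2, 3, 0, 4]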
2 — Token Embeddings
E[token_id] = Embedding_Matrix[token_id, :] ∈ ℝ^d_model
Embedding matrix · each row = one token vector | columns = dimensions
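A NumPy sketch of the lookup, continuing from the tokenizer above; random weights stand in for a trained embedding matrix, and d_model = 8 matches the config panel.

import numpy as np

d_model, vocab_size = 8, len(vocab)
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(size=(vocab_size, d_model))   # one row per vocabulary entry

# E[token_id] = Embedding_Matrix[token_id, :]
embeddings = embedding_matrix[token_ids]                    # shape (seq_len, d_model)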
3 — Positional Encoding · sin/cos at different frequencies
PE(pos, 2i) = sin(pos / 10000^(2i/d))    PE(pos, 2i+1) = cos(pos / 10000^(2i/d))
PE signal
+
X = Embed + PE (input to transformer)
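A sketch of the sinusoidal encoding, continuing from above: even dimensions get sin, odd dimensions get cos, and the result is added to the embeddings.

def positional_encoding(seq_len, d_model):
    # PE(pos, 2i) = sin(pos / 10000^(2i/d)), PE(pos, 2i+1) = cos(pos / 10000^(2i/d))
    pos = np.arange(seq_len)[:, None]               # (seq_len, 1)
    two_i = np.arange(0, d_model, 2)[None, :]       # (1, d_model/2), holds the values 2i
    angles = pos / np.power(10000.0, two_i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                    # even dimensions
    pe[:, 1::2] = np.cos(angles)                    # odd dimensions
    return pe

X = embeddings + positional_encoding(len(token_ids), d_model)   # input to the transformer block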
4 — Q, K, V Projections · three learned linear maps per head
Q = X·W_Q    K = X·W_K    V = X·W_V     W ∈ ℝ^(d_model × d_head)
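One head's projections, continuing the sketch; with 2 heads and d_model = 8, d_head = 4, and the weights are random placeholders.

num_heads = 2
d_head = d_model // num_heads
W_Q = rng.normal(size=(d_model, d_head))
W_K = rng.normal(size=(d_model, d_head))
W_V = rng.normal(size=(d_model, d_head))

# A real model repeats this per head (usually as one batched matmul); one head shown here.
Q = X @ W_Q
K = X @ W_K
V = X @ W_V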
5 — Self-Attention Scores · hover any cell to see the relationship
Scores = Q·Kᵀ / √d_k     AttnWeights = softmax(Scores)
Raw scores (Q·Kᵀ / √d_k)
→ softmax →
Attention weights (rows sum to 1.0)
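The scaled dot-product scores and row-wise softmax from the formulas above, continuing the sketch. A decoder-only LLM would also apply a causal mask before the softmax; that is omitted here.

def softmax(x, axis=-1):
    # Numerically stable softmax; each row sums to 1.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

scores = Q @ K.T / np.sqrt(d_head)        # raw scores, shape (seq_len, seq_len)
attn_weights = softmax(scores, axis=-1)   # attention weights, rows sum to 1.0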
6 — Value Aggregation + Residual · context-enriched representations
AttnOut = AttnWeights · V     X' = LayerNorm(X + AttnOut)
Context-aware token vectors · each token now sees all others
Residual stream magnitude · bars show per-dim values after LayerNorm
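Continuing the sketch. The panel's formula adds AttnOut directly to X; because this single head has d_head < d_model, an output projection W_O (assumed here, as in standard multi-head attention) maps it back to d_model before the residual add. LayerNorm is shown without its learned scale and bias.

def layer_norm(x, eps=1e-5):
    # Normalize each token vector to zero mean and unit variance.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

attn_out = attn_weights @ V                 # each row is a weighted mix of value vectors
W_O = rng.normal(size=(d_head, d_model))    # output projection (assumption, not in the panel formula)
X_prime = layer_norm(X + attn_out @ W_O)    # X' = LayerNorm(X + AttnOut·W_O)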
7 — Feed-Forward Network · applied independently to each token position
FFN(x) = GELU(x·W_1 + b_1)·W_2 + b_2     d_ff = 4 × d_model
After FFN + LayerNorm · final per-token representations
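The position-wise feed-forward block, continuing the sketch with the tanh approximation of GELU and random placeholder weights.

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

d_ff = 4 * d_model
W1, b1 = rng.normal(size=(d_model, d_ff)), np.zeros(d_ff)
W2, b2 = rng.normal(size=(d_ff, d_model)), np.zeros(d_model)

def ffn(x):
    # Applied to every token position independently, with the same weights at each position.
    return gelu(x @ W1 + b1) @ W2 + b2

H = layer_norm(X_prime + ffn(X_prime))      # final per-token representations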
8 — Next Token Prediction · logits → softmax → sample
logits = h_last·W_vocab     P(w) = softmax(logits / temperature)
Temperature
0.8
Generated sequence
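Continuing the sketch: the last position's vector is projected to vocabulary logits, scaled by the temperature from the slider above, and sampled. W_vocab is a random placeholder; real models often tie it to the embedding matrix.

temperature = 0.8
W_vocab = rng.normal(size=(d_model, vocab_size))

h_last = H[-1]                              # representation of the last token position
logits = h_last @ W_vocab
probs = softmax(logits / temperature)       # sharper for T < 1, flatter for T > 1

next_id = rng.choice(vocab_size, p=probs)   # sample the next token id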
Model Config
d_model: 8
Num Heads: 2
d_ff multiplier
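The same settings as a plain Python dict, with values from this panel and the d_ff multiplier taken from step 7.

config = {
    "d_model": 8,          # width of embeddings and the residual stream
    "num_heads": 2,        # attention heads per layer
    "d_ff_multiplier": 4,  # d_ff = 4 × d_model (see step 7)
}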
Vocabulary Sample
Step Explanation
Selected Token
Click a token to inspect its embedding, attention patterns, and activations