Posts

Showing posts from September, 2025

Flash Attention

# Example: N = d = 4, d_v = 2, scale = 1/sqrt(d) = 1/2

Q = [ [1, 0, 2, 0],
      [0, 1, 1, 0],
      [1, 1, 0, 1],
      [0, 2, 1, 1] ]

K = [ [2, 1, 0, 0],
      [0, 1, 1, 0],
      [1, 0, 1, 1],
      [0, 0, 2, 1] ]

V = [ [1, 0],
      [0, 1],
      [1, 1],
      [2, 1] ]

========================
1) NORMAL ATTENTION
========================
Scores: S = (Q Kᵀ) / 2  →  S ∈ R^{4×4}

S = [ [1.0, 1.0, 1.5, 2.0],
      [0.5, 1.0, 0.5, 1.0],
      [1.5, 0.5, 1.0, 0.5],
      [1.0, 1.5, 1.0, 1.5] ]

Row-wise softmax P = softmax_row(S)  (approximate decimals)

P ≈ [ [0.1571, 0.1571, 0.2589, 0.4269],
      [0.1888, 0.3112, 0.1888, 0.3112],
      [0.4269, 0.1571, 0.2589, 0.1571],
      [0.1888, 0.3112, 0.1888, 0.3112] ]

Output: O = P V  (approximate)

O ≈ [ [1.2699, 0.8429],
      [1.0000, 0.8112],
      [1.0000, 0.5731],
      [1.0000, 0.8112] ]

========================================
2) FLASHATTENTION TILING (same numbers)
========================================...
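The numbers above can be reproduced with a short NumPy sketch. `normal_attention` and `flash_attention` are names I made up for this illustration; the tiled version follows the online-softmax rescaling idea behind FlashAttention (running row max, running denominator, rescaled accumulator), here with a key/value block size of 2:

```python
import numpy as np

# Matrices from the worked example: N = d = 4, d_v = 2, scale = 1/2.
Q = np.array([[1., 0, 2, 0], [0, 1, 1, 0], [1, 1, 0, 1], [0, 2, 1, 1]])
K = np.array([[2., 1, 0, 0], [0, 1, 1, 0], [1, 0, 1, 1], [0, 0, 2, 1]])
V = np.array([[1., 0], [0, 1], [1, 1], [2, 1]])

def normal_attention(Q, K, V, scale=0.5):
    """Materialize the full N×N score matrix, then softmax row-wise."""
    S = Q @ K.T * scale
    P = np.exp(S - S.max(axis=1, keepdims=True))  # stable softmax
    P /= P.sum(axis=1, keepdims=True)
    return P @ V

def flash_attention(Q, K, V, scale=0.5, block=2):
    """Tiled attention: visit K/V in blocks, never forming the full S.
    Keeps a running row max m, running softmax denominator l, and a
    rescaled output accumulator O (the online-softmax trick)."""
    N, d_v = Q.shape[0], V.shape[1]
    O = np.zeros((N, d_v))
    m = np.full(N, -np.inf)          # running row max
    l = np.zeros(N)                  # running denominator
    for j in range(0, K.shape[0], block):
        S = Q @ K[j:j + block].T * scale        # scores for this tile
        m_new = np.maximum(m, S.max(axis=1))    # updated row max
        alpha = np.exp(m - m_new)               # rescale old stats
        P = np.exp(S - m_new[:, None])          # unnormalized tile probs
        l = alpha * l + P.sum(axis=1)
        O = alpha[:, None] * O + P @ V[j:j + block]
        m = m_new
    return O / l[:, None]

O_ref = normal_attention(Q, K, V)
O_flash = flash_attention(Q, K, V)
```

Both paths agree to floating-point precision, which is the point of the rescaling: the tiled loop computes the exact same softmax-weighted output without ever holding the full score matrix.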

Eigen decomposition, Singular Value Decomposition (SVD), and Principal Component Analysis

1. Eigen decomposition for a square matrix:
   Av = λv:
     an eigenvector v is a direction that A only scales
     (by eigenvalue λ) without rotation.
   To find eigenvalues λ, solve det(A - λI) = 0.
   Then express A = VΛV⁻¹:
     eigen decomposition rewrites A in terms of its eigenvectors V
     and eigenvalues Λ.

2. Singular Value Decomposition (SVD) for any matrix:
   Eigen decomposition of XᵀX:
     eigenvectors = right singular vectors V,
     eigenvalues  = squared singular values σ².
   Eigen decomposition of XXᵀ:
     eigenvectors = left singular vectors U,
     eigenvalues  = squared singular values σ².
   Full factorization:
     X = UΣVᵀ:
       U (m×m) = left singular vectors,
       V (n×n) = right singular vectors,
       ...
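The SVD–eigendecomposition link in item 2 is easy to verify numerically. In this sketch, X is my own illustrative 3×2 matrix (not from the post); the check is that the eigenvalues of XᵀX equal the squared singular values, and its eigenvectors match the right singular vectors up to sign:

```python
import numpy as np

# A small tall matrix: m = 3 rows, n = 2 columns.
X = np.array([[3., 1], [1, 3], [0, 2]])

# Reduced SVD: X = U Σ Vᵀ with U (m×r), s holding σ in descending order.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Eigen decomposition of the n×n Gram matrix XᵀX (symmetric, so eigh).
evals, evecs = np.linalg.eigh(X.T @ X)
# eigh returns eigenvalues in ascending order; flip to match descending σ.
evals, evecs = evals[::-1], evecs[:, ::-1]

# evals should equal s**2, and the columns of evecs should equal the
# right singular vectors (columns of Vt.T) up to an overall sign flip.
reconstruction = (U * s) @ Vt   # column-wise scaling by σ, then Vᵀ
```

The sign ambiguity is expected: if v is a unit eigenvector, so is -v, and different routines may pick either one.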