矩阵分解是机器学习的基石技术,从传统的推荐系统到现代大语言模型的参数高效微调(LoRA),都离不开矩阵分解的思想。
奇异值分解 (SVD) 基本形式 任意矩阵 $A \in \mathbb{R}^{m \times n}$ 可以分解为:
$$ A = U \Sigma V^T $$
其中:
$U \in \mathbb{R}^{m \times m}$:左奇异向量(正交矩阵)
$\Sigma \in \mathbb{R}^{m \times n}$:奇异值对角矩阵
$V \in \mathbb{R}^{n \times n}$:右奇异向量(正交矩阵)
Truncated SVD 保留前 $r$ 个最大奇异值:
$$ A \approx A_r = U_r \Sigma_r V_r^T $$
这是最优 的秩 $r$ 近似(Eckart-Young 定理):
$$ A_r = \arg\min_{\text{rank}(B) = r} |A - B|_F $$
Randomized SVD 当矩阵规模巨大时,精确 SVD 计算代价过高。Randomized SVD 提供了高效的近似方法。
算法实现 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 import numpy as npfrom scipy import linalgdef randomized_svd (A, n_components, n_oversamples=10 , n_iter=4 ): """ Randomized SVD for large matrices. Args: A: Input matrix (m x n) n_components: Number of singular values to compute n_oversamples: Additional random vectors for accuracy n_iter: Number of power iterations Returns: U, s, Vt: Truncated SVD components """ m, n = A.shape n_random = n_components + n_oversamples Q = np.random.randn(n, n_random) for _ in range (n_iter): Q, _ = linalg.lu(A @ Q, permute_l=True ) Q, _ = linalg.lu(A.T @ Q, permute_l=True ) Q, _ = linalg.qr(A @ Q, mode='economic' ) B = Q.T @ A Uhat, s, Vt = linalg.svd(B, full_matrices=False ) U = Q @ Uhat return U[:, :n_components], s[:n_components], Vt[:n_components, :]
复杂度对比
方法
时间复杂度
空间复杂度
精确 SVD
$O(mn \cdot \min(m,n))$
$O(mn)$
Randomized SVD
$O(mn \cdot r)$
$O((m+n) \cdot r)$
Truncated SVD (Lanczos)
$O(mn \cdot r)$
$O((m+n) \cdot r)$
现代应用:LoRA LoRA (Low-Rank Adaptation) 是大语言模型参数高效微调的核心技术,直接利用了低秩分解的思想。
LoRA 原理 预训练权重 $W_0$ 固定,只训练低秩增量:
$$ W = W_0 + \Delta W = W_0 + BA $$
其中 $B \in \mathbb{R}^{d \times r}$,$A \in \mathbb{R}^{r \times k}$,$r \ll \min(d, k)$。
实现示例 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 import torchimport torch.nn as nnclass LoRALayer (nn.Module): def __init__ (self, in_features, out_features, rank=4 , alpha=1.0 ): super ().__init__() self .rank = rank self .alpha = alpha self .W = nn.Linear(in_features, out_features, bias=False ) self .W.weight.requires_grad = False self .A = nn.Linear(in_features, rank, bias=False ) self .B = nn.Linear(rank, out_features, bias=False ) nn.init.kaiming_uniform_(self .A.weight) nn.init.zeros_(self .B.weight) self .scaling = alpha / rank def forward (self, x ): return self .W(x) + self .scaling * self .B(self .A(x))
参数效率 对于 LLaMA-7B:
方法
可训练参数
显存占用
全量微调
7B (100%)
~140GB
LoRA (r=8)
4.7M (0.07%)
~14GB
LoRA (r=16)
9.4M (0.13%)
~16GB
其他应用 1. 推荐系统 矩阵分解用于协同过滤:
$$ R \approx U V^T $$
1 2 3 4 5 6 7 from surprise import SVD, Dataset, Readerreader = Reader(rating_scale=(1 , 5 )) data = Dataset.load_from_df(df[['user' , 'item' , 'rating' ]], reader) model = SVD(n_factors=100 ) model.fit(trainset)
2. 文本表示 (LSA) 潜在语义分析:
1 2 3 4 5 6 7 8 from sklearn.decomposition import TruncatedSVDfrom sklearn.feature_extraction.text import TfidfVectorizervectorizer = TfidfVectorizer(max_features=10000 ) X = vectorizer.fit_transform(documents) svd = TruncatedSVD(n_components=100 ) X_reduced = svd.fit_transform(X)
3. 图像压缩 1 2 3 4 5 6 7 8 9 10 11 from PIL import Imageimport numpy as npdef compress_image (image_path, n_components=50 ): img = np.array(Image.open (image_path).convert('L' )) U, s, Vt = np.linalg.svd(img, full_matrices=False ) compressed = U[:, :n_components] @ np.diag(s[:n_components]) @ Vt[:n_components, :] return compressed.astype(np.uint8)
数值稳定性 条件数 $$ \kappa(A) = \frac{\sigma_{\max}}{\sigma_{\min}} $$
条件数过大会导致数值不稳定。
正则化 SVD 1 2 3 4 5 def regularized_svd (A, lambda_reg=0.01 ): """Add regularization for numerical stability.""" U, s, Vt = np.linalg.svd(A, full_matrices=False ) s_reg = s / (s**2 + lambda_reg) return U, s_reg, Vt
延伸阅读
Halko et al., Finding Structure with Randomness (2011)
Hu et al., LoRA: Low-Rank Adaptation of Large Language Models (2021)
NumPy SVD Documentation
转载请注明出处