Exported Terms

MatrixTensorFactor — Module

Matrix-Tensor Factorization

Nonnegative Matrix-Tensor Factorization

MatrixTensorFactor.nnmtf — Function

nnmtf(Y::AbstractArray, R::Integer; kwargs...)

Computes a nonnegative matrix-tensor factorization of an order-N tensor Y with a given "rank" R.

For an order $N=3$ tensor, this factorizes $Y \approx A B$ where $\displaystyle Y[i,j,k] \approx \sum_{r=1}^R A[i,r]*B[r,j,k]$ and the factors $A, B \geq 0$ are nonnegative.

For higher orders, this becomes $\displaystyle Y[i1,i2,...,iN] \approx \sum_{r=1}^R A[i1,r]*B[r,i2,...,iN].$
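The contraction in these formulas can be illustrated with a small sketch. The helper name `matrix_tensor_product` is hypothetical and is not claimed to be part of MatrixTensorFactor; it only shows one way to evaluate the sum over $r$ by flattening B into a matrix.

```julia
# Hypothetical helper (not claimed to be part of MatrixTensorFactor): contracts
# A (I×R) with B (R×J×K×...) over the shared index r.
function matrix_tensor_product(A::AbstractMatrix, B::AbstractArray)
    R = size(B, 1)
    size(A, 2) == R || throw(DimensionMismatch("size(A, 2) must equal size(B, 1)"))
    # Flatten B to an R×(J*K*...) matrix, multiply, then restore the trailing dimensions.
    Y_flat = A * reshape(B, R, :)
    return reshape(Y_flat, size(A, 1), size(B)[2:end]...)
end

A = [1.0 2.0; 0.0 1.0; 3.0 0.0]          # I = 3, R = 2
B = reshape(collect(1.0:12.0), 2, 3, 2)   # R = 2, J = 3, K = 2
Y = matrix_tensor_product(A, B)           # size (3, 3, 2)
```

Because Julia arrays are column-major, reshaping B to R×(JK) keeps the index r in the first dimension, so an ordinary matrix product performs the sum over r for every (j, k) pair at once.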

Note that the optimal solution may NOT be unique.

Arguments

Y::AbstractArray{T,N}: tensor to factorize
R::Integer: rank to factorize Y (size(A)[2] and size(B)[1])

Keywords

maxiter::Integer=100: maximum number of iterations
tol::Real=1e-3: desired tolerance for the convergence criterion
rescale_AB::Bool=true: scale B at each iteration so that the factors (horizontal slices) have similar 3-fiber sums.
rescale_Y::Bool=true: preprocess the input Y to have normalized 3-fiber sums (on average), and rescale the final B so that Y ≈ A*B.
normalize::Symbol=:fibres: part of B that should be normalized (must be in IMPLIMENTED_NORMALIZATIONS)
projection::Symbol=:nnscale: constraint to use and method for enforcing it (must be in IMPLIMENTED_PROJECTIONS)
criterion::Symbol=:ncone: how to determine if the algorithm has converged (must be in IMPLIMENTED_CRITERIA)
stepsize::Symbol=:lipshitz: step size used for the gradient descent step (must be in IMPLIMENTED_STEPSIZES)
momentum::Bool=false: use momentum updates
delta::Real=0.9999: safeguard for maximum amount of momentum (see eq (3.5) Xu & Yin 2013)
R_max::Integer=size(Y)[1]: maximum rank to try if R is not given
projectionA::Symbol=projection: projection to use on factor A (must be in IMPLIMENTED_PROJECTIONS)
projectionB::Symbol=projection: projection to use on factor B (must be in IMPLIMENTED_PROJECTIONS)
A_init::AbstractMatrix=nothing: initial A for the iterative algorithm. Should be kept as nothing if R is not given.
B_init::AbstractArray=nothing: initial B for the iterative algorithm. Should be kept as nothing if R is not given.

Returns

A::Matrix{Float64}: the matrix A in the factorization Y ≈ A * B
B::Array{Float64, N}: the tensor B in the factorization Y ≈ A * B
rel_errors::Vector{Float64}: relative errors at each iteration
norm_grad::Vector{Float64}: norm of the full gradient at each iteration
dist_Ncone::Vector{Float64}: distance from the negative gradient to the normal cone at each iteration
If R was estimated, also returns the optimal R::Integer

Implementation of block coordinate descent updates

We calculate the partial gradients and the corresponding Lipschitz constants as follows:

\[\begin{align} \boldsymbol{P}^{t}[q,r] &=\textstyle{\sum}_{jk} \boldsymbol{\mathscr{B}}^{t}[q,j,k] \boldsymbol{\mathscr{B}}^{t}[r,j,k]\\ \boldsymbol{Q}^{t}[i,r] &=\textstyle{\sum}_{jk}\boldsymbol{\mathscr{Y}}[i,j,k] \boldsymbol{\mathscr{B}}^{t}[r,j,k] \\ \nabla_{A} f(\boldsymbol{A}^{t},\boldsymbol{\mathscr{B}}^{t}) &= \boldsymbol{A}^{t} \boldsymbol{P}^{t} - \boldsymbol{Q}^{t} \\ L_{A} &= \left\lVert \boldsymbol{P}^{t} \right\rVert_{2}. \end{align}\]
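As a rough illustration of these formulas for an order-3 tensor, the sketch below computes P, Q, the partial gradient in A, and the Lipschitz constant. The function name `gradient_A` is hypothetical, and the final update line assumes a plain projection onto the nonnegative orthant, which may differ from the projection the package actually applies.

```julia
using LinearAlgebra

# Hypothetical helper: partial gradient in A and its Lipschitz constant for an order-3 Y.
function gradient_A(A, B, Y)
    R = size(B, 1)
    B_mat = reshape(B, R, :)             # R×(J*K)
    Y_mat = reshape(Y, size(Y, 1), :)    # I×(J*K)
    P = B_mat * B_mat'                   # P[q,r] = Σ_jk B[q,j,k] * B[r,j,k]
    Q = Y_mat * B_mat'                   # Q[i,r] = Σ_jk Y[i,j,k] * B[r,j,k]
    return A * P - Q, opnorm(P)          # opnorm defaults to the spectral (2-) norm
end

A = [1.0 2.0; 0.0 1.0; 3.0 0.0]
B = reshape(collect(1.0:12.0), 2, 3, 2)
Y = reshape(A * reshape(B, 2, :), 3, 3, 2)   # exact product, so the gradient at (A, B) vanishes

grad, L_A = gradient_A(A, B, Y)
# Assumed update: a step of size 1/L_A followed by projection onto the nonnegative orthant.
A_half = max.(A .- grad ./ L_A, 0.0)
```

Since Y is built as the exact product here, the gradient is zero and the step leaves A unchanged, which is a quick sanity check on the formulas.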

Similarly for $\boldsymbol{\mathscr{B}}$:

\[\begin{align} \boldsymbol{T}^{t+1}&=(\boldsymbol{A}^{t+\frac12})^\top \boldsymbol{A}^{t+\frac12}\\ \boldsymbol{\mathscr{U}}^{t+1}&=(\boldsymbol{A}^{t+\frac12})^\top \boldsymbol{\mathscr{Y}} \\ \nabla_\boldsymbol{\mathscr{B}} f(\boldsymbol{A}^{t+\frac12},\boldsymbol{\mathscr{B}}^{t}) &= \boldsymbol{T}^{t+1} \boldsymbol{\mathscr{B}}^{t} - \boldsymbol{\mathscr{U}}^{t+1} \\ L_B &= \left\lVert \boldsymbol{T}^{t+1} \right\rVert_{2}. \end{align}\]
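The same pattern applies to the B block. The sketch below is a hypothetical helper (`gradient_B` is not a name taken from the package) that evaluates T, U, the partial gradient in B, and L_B for an order-3 tensor by flattening the trailing dimensions.

```julia
using LinearAlgebra

# Hypothetical helper: partial gradient in B and its Lipschitz constant for an order-3 Y.
function gradient_B(A, B, Y)
    R = size(B, 1)
    T = A' * A                               # R×R Gram matrix of A
    U = A' * reshape(Y, size(Y, 1), :)       # R×(J*K), i.e. Aᵀ applied to every 1-fiber of Y
    grad_flat = T * reshape(B, R, :) - U
    return reshape(grad_flat, size(B)), opnorm(T)   # restore the tensor shape of B
end

A = [1.0 2.0; 0.0 1.0; 3.0 0.0]
B = reshape(collect(1.0:12.0), 2, 3, 2)
Y = reshape(A * reshape(B, 2, :), 3, 3, 2)   # exact product

grad, L_B = gradient_B(A, B, Y)              # zero gradient at the exact factors
grad0, _ = gradient_B(A, zero(B), Y)         # at B = 0 the gradient reduces to -U
```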

To ensure the iterates stay "close" to normalized, we introduce a renormalization step after the projected gradient updates:

\[\begin{align} \boldsymbol{C}^{t+1}[r,r]&=\frac{1}{J}\textstyle{\sum}_{jk} \boldsymbol{\mathscr{B}}^{t+\frac12}[r,j,k]\\ \boldsymbol{A}^{t+1}&= \boldsymbol{A}^{t+\frac12} \boldsymbol{C}^{t+1}\\ \boldsymbol{\mathscr{B}}^{t+1}&= (\boldsymbol{C}^{t+1})^{-1}\boldsymbol{\mathscr{B}}^{t+\frac12}. \end{align}\]
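A minimal sketch of this renormalization step follows, assuming every horizontal slice of B has a strictly positive sum so that the diagonal scaling matrix is invertible. The name `renormalize` is hypothetical. The scaling moves mass from B into A without changing the product.

```julia
using LinearAlgebra

# Hypothetical helper: rescale so each horizontal slice B[r, :, :] sums to J,
# compensating in A so that the product A*B is unchanged.
# Assumes every slice sum is strictly positive.
function renormalize(A, B)
    R, J = size(B, 1), size(B, 2)
    c = vec(sum(reshape(B, R, :); dims=2)) ./ J   # diagonal of the scaling matrix C
    A_new = A * Diagonal(c)                       # scale column r of A by c[r]
    B_new = B ./ c                                # broadcasting divides slice r of B by c[r]
    return A_new, B_new
end

A = [1.0 2.0; 0.0 1.0; 3.0 0.0]
B = reshape(collect(1.0:12.0), 2, 3, 2)           # J = 3
A_new, B_new = renormalize(A, B)
```

After the step, each slice of `B_new` sums to J, so its J fibers along the third dimension sum to 1 on average.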

We typically use the following convergence criterion:

\[d(-\nabla \ell(\boldsymbol{A}^{t},\boldsymbol{\mathscr{B}}^{t}), N_{\mathcal{C}}(\boldsymbol{A}^{t},\boldsymbol{\mathscr{B}}^{t}))^2\leq\delta^2 R(I+JK).\]
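To make the criterion concrete, the sketch below evaluates it under a simplifying assumption: the constraint set is taken to be the plain nonnegative orthant, whose normal cone at a point x is {0} in coordinates where x > 0 and (-∞, 0] where x = 0. The normal cone for the package's default projection may differ. The function names are hypothetical, and `delta` here is simply the symbol δ from the displayed inequality.

```julia
# Squared distance from -g to the normal cone of the nonnegative orthant at x
# (assumed constraint set; illustrative only).
function dist_to_normal_cone_sq(x, g)
    total = zero(float(eltype(g)))
    for (xi, gi) in zip(x, g)
        # Where xi > 0 the cone is {0}, so the whole component gi counts.
        # Where xi == 0 the cone is (-∞, 0], so only a negative gi (i.e. -gi > 0) counts.
        total += xi > 0 ? gi^2 : max(-gi, zero(gi))^2
    end
    return total
end

# Hypothetical check of d(-∇ℓ, N_C)^2 ≤ δ^2 R(I + JK) for an order-3 B.
function has_converged(A, B, grad_A, grad_B, delta)
    d2 = dist_to_normal_cone_sq(A, grad_A) + dist_to_normal_cone_sq(B, grad_B)
    # length(A) + length(B) = I*R + R*J*K = R(I + JK), the total number of entries.
    return d2 <= delta^2 * (length(A) + length(B))
end

d2_example = dist_to_normal_cone_sq([0.0, 1.0, 2.0], [-3.0, 0.5, 0.0])
```

The right-hand side scales with the total number of entries in A and B, so the test amounts to requiring a root-mean-square distance per entry of at most δ.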