Rank Estimation
Overview
In some applications, we may have a tenor we believe to be closely approximated by a low rank tenor, but do not know what rank to use. In this setting, use rank_detect_factorize rather than factorize without a rank=R keyword. The estimated rank will be added to kwargs, and the decomposition at this rank returned.
R = 3
T = Tucker1((10, 10, 10), R)
Y = array(T)
options = (model=Tucker1, curvature_method=splines) # default values
decomposition, stats, kwargs, final_rel_errors = rank_detect_factorize(Y; options...)
@assert kwargs[:rank] == RThe rank will be estimated by factorizing the input tensor at every possible rank from 1 up to the maximum rank and recording the relative error. We can pick the rank which balances small error (good fit) and small rank (low complexity). Our criteria is the rank which maximizes the curvature of the error vs. rank function, or in the case of the :breakpoints method, a proxy for the curvature.
Note rank_detect_factorize also returns the stats and final relative errors of each rank tested.
BlockTensorFactorization.Core.rank_detect_factorize — Function
rank_detect_factorize(Y; kwargs...)Wraps factorize() with rank detection.
Selects the rank that maximizes the standard curvature of the Relative Error (as a function of rank).
Keywords
online_rank_estimation:false. Set totrueto stop testing larger ranks after the first peak in curvaturecurvature_method::splines. Can also pick:finite_differences(faster but less accurate) orcircles(fastest and smallest memory but more sensitive to results fromfactorize). Set to:breakpointsto pick the rankRthat minimizes least-squares error in the modelf(r) = a + b(min(r, R) - R) + c(max(r, R) - R)and the errors.model:Tucker1. Only rank detection withTucker1andCPDecompositionis currently implementedmax_rank:max_possible_rank(Y, model). Test ranks from1up tomax_rank. Defaults to largest possible rank under the modelrank:nothing. If a rank is passed, rank detection is ignored andfactorize(Y; kwargs...)is called
Any other keywords from factorize, full list given by default_kwargs.
BlockTensorFactorization.Core.max_possible_rank — Function
max_possible_rank(Y, model)Returns the maximum rank possible Y could have under the model.
For matrices I × J this is 1:min(I, J). This is can be extended to tensors for different type of decompositions.
Tucker-1 rank is ≤ min(I, prod(J1,...,JN)) for tensors I × J1 × … × JN.
The CP-rank is ≤ minimum_{n} (prod(I1,...,IN) / In) for tensors I1 × … × IN in general. Although some shapes have have tighter upper bounds. For example, 2 × I × I tensors over ℝ have a maximum rank of floor(3I/2).
If you run rank_detect_factorize with different methods and get the same rank, you can feel more confident in the detected rank.
Estimating Curvature
This package has implemented a few methods for estimating curvature. These are the following list of methods.
:finite_differences
:circles
:splinesWe can also use the following curvature-proxy methods.
:breakpointsThey can be passed to curvature and standard_curvature as a method keyword, or rank_detect_factorize with the curvature_method keyword.
BlockTensorFactorization.Core.curvature — Function
curvature(y::AbstractVector{<:Real}; method=:finite_differences)Approximates the signed curvature of a function given evenly spaced samples.
Possible methods
:finite_differences: Approximates first and second derivative with 3rd order finite differences. Seed_dxandd2_dx2.:splines: Curvature of a third order spline. Seed_dx_and_d2_dx2_spline.:circles: Inverse radius of a circle through rolling three points. Seecircle_curvature.:breakpoints: WARNING does not compute a value that approximates the curvature of a continuous function. Computes the inverse least-squares error off.(eachindex(y); z)andyfor allz in eachindex(y)wheref(x; z) = a + b(min(x, z) - z) + c(max(x, z) - z). Useful ifylooks like two lines. Seebreakpoint_curvature.
BlockTensorFactorization.Core.standard_curvature — Function
standard_curvature(y::AbstractVector{<:Real}; method=:finite_differences)Approximates the signed curvature of a function, scaled to the unit box $[0,1]^2$.
Assumes the function is 1 at 0 and (after x dimension is scaled) 0 at 1.
See curvature.
Possible methods
:finite_differences: Approximates first and second derivative with 3rd order finite differences. Seed_dxandd2_dx2.:splines: Curvature of a third order spline. Seed_dx_and_d2_dx2_spline.:circles: Inverse radius of a circle through rolling three points. Seecircle_curvature.:breakpoints: WARNING does not compute a value that approximates the curvature of a continuous function. Computes the inverse least-squares error off.(eachindex(y); z)andyfor allz in eachindex(y)wheref(x; z) = a + b(min(x, z) - z) + c(max(x, z) - z). Useful ifylooks like two lines. Seebreakpoint_curvature.
These methods have been specialized to typical relative error vs. rank curves. We assume a relative error of $1$ at rank $0$ since the only rank $0$ tenor is the $0$ tenor. And that the relative error at the maximum possible rank is $0$, and stays zero if we try to use a larger rank.
How do these methods work?
Finite Differences
We approximate the first and second derivatives separately with three point finite differences and calculate the curvature with the formula
\[k(x) = \frac{y''(x)}{(1 + y'(x)^2)^{3/2}}.\]
The derivatives are approximated using centred three point finite differences,[1]
\[y'(x_i) \approx \frac{1}{2\Delta x}(y_{i+1} - y_{i-1})\quad\text{and}\quad y''(x_i)\approx \frac{1}{\Delta x^2}(y_{i+1} - 2y_i + y_{i-1}).\]
The end-points use forward/back-differences. For the first derivative, this is
\[y'(x_1) \approx \frac{1}{2\Delta x}(-3y_{1} + 4y_2 - y_{3})\quad\text{and}\quad y'(x_I) \approx \frac{1}{2\Delta x}(-y_{I-2} + 4y_{I-1} - 3y_{I}),\]
and for the second derivative, this is
\[y''(x_1) \approx \frac{1}{\Delta x^2}(y_{1} - 2y_2 + y_{3})\quad\text{and}\quad y''(x_I) \approx \frac{1}{\Delta x^2}(y_{I-2} - 2y_{I-1} + y_{I}).\]
BlockTensorFactorization.Core.d_dx — Function
d_dx(y::AbstractVector{<:Real})Approximate first derivative with finite elements.
$\frac{d}{dx}y[i] \approx \frac{1}{2\Delta x}(y[i+1] - y[i-1])$
Assumes y[i] = y(x[i]) are samples with unit spaced inputs Δx = x[i+1] - x[i] = 1.
BlockTensorFactorization.Core.d2_dx2 — Function
d2_dx2(y::AbstractVector{<:Real})Approximate second derivative with finite elements.
$\frac{d^2}{dx^2}y[i] \approx \frac{1}{\Delta x^2}(y[i-1] - 2y[i] + y[i+1])$
Assumes y[i] = y(x[i]) are samples with unit spaced inputs Δx = x[i+1] - x[i] = 1.
Splines
We calculate a third order spline
\[g_i(x) = a_i(x-x_i)^3+b_i(x-x_i)^2+c_i(x-x_i)+d_i,\quad x_1 \leq x \leq x_{i+1}\]
and use the coefficients to calculate the curvature of the spline
\[k(x_i) = \frac{2b_i}{(1+c_i^2)^{3/2}}.\]
We assume the following boundary conditions;
- $g_{1}(x_{1}-\Delta x)=1$, $y$-intercept $(0,1)$
- $g_{I}(x_{I}+\Delta x)=y_{I}$, repeated right end-point ($y_{I+1}=y_{I}$)
- $g_{I}''(x_{I}+\Delta x)=0$, flat right end-point,
in addition to the usual continuity and smoothness assumptions of a third order spline:
- $g_{i}(x_{i})=y_{i}$
- $g_{i}(x_{i+1})=g_{i+1}(x_{i+1})$
- $g_{i}'(x_{i+1})=g_{i+1}'(x_{i+1})$
- $g_{i}''(x_{i+1})=g_{i+1}''(x_{i+1})$.
We can solve for the coefficients by first solving the tri-diagonal system $Mb=v$, where
\[M=\begin{bmatrix} 6 & 0 & & & \\ 1 & 4 & 1 & & \\ & \ddots& \ddots& \ddots& \\ & & 1 & 4 & 1& \\ & & & 0 & 1 \end{bmatrix},\quad b =\begin{bmatrix} b_{1} \\ b_{2}\\ \vdots \\ b_{I} \\ b_{I+1} \end{bmatrix},\quad \text{and} \quad v = \frac{3}{\Delta x^2}\begin{bmatrix} 1-2y_{1}+y_{2} \\ y_{3}-y_{1} \\ \vdots \\ y_{I+1}-y_{I-1} \\ 0 \end{bmatrix},\]
and then calculate
\[a_{i} = \frac{1}{3\Delta x}(b_{i+1}-b_{i})\quad \text{and}\quad c_{i}=\frac{{y_{i+1}-y_{i}}}{\Delta x}-\frac{\Delta x}{3}(b_{i+1}-2b_{i}).\]
BlockTensorFactorization.Core.cubic_spline_coefficients — Function
cubic_spline_coefficients(y::AbstractVector{<:Real}; h=1)Calculates the list of coefficients a, b, c, d for an interpolating spline of y[i]=y(x[i]).
The spline is defined as $f(x) = g_i(x)$ on $x[i] \leq x \leq x[i+1]$ where
$g_i(x) = a[i](x-x[i])^3 + b[i](x-x[i])^2 + c[i](x-x[i]) + d[i]$
Uses the following boundary conditions
- $g_1(x[1]-h) = 1$ (i.e. the $y$-intercept is $(0,1)$ for uniform spaced
x=1:I) - $g_I(x[I]+h) = y[I]$ (i.e. repeated right end-point)
- $g_I''(x[I]+h) = 0$ (i.e. flat/no-curvature one spacing after end-point)
BlockTensorFactorization.Core.spline_mat — Function
spline_mat(n)Creates the Tridiagonal matrix to solve for coefficients b. See cubic_spline_coefficients.
BlockTensorFactorization.Core.make_spline — Function
make_spline(y::AbstractVector{<:Real}; h=1)Returns a function f(x) that is an interpolating/extrapolating spline for y, with uniform stepsize h between the x-values of the knots.
BlockTensorFactorization.Core.d_dx_and_d2_dx2_spline — Function
d_dx_and_d2_dx2_spline(y::AbstractVector{<:Real}; h=1)Extracts the first and second derivatives of the splines from y at the knots
Circles
We compute the radius $r_i$ of a circle passing through three neighbouring points $(x_{i-1}, y_{i-1})$, $(x_{i}, y_{i})$, and $(x_{i+1}, y_{i+1})$. The curvature magnitude is approximately the inverse of this radius;
\[\lvert k(x_i)\rvert \approx \frac{1}{r_i}.\]
We assume two additional points so we can estimate the curvature at the boundary. They are
\[(x_{0}, y_{0}) = (0, 1)\quad\text{and}\quad (x_{I+1}, y_{I+1}) = (x_{I}+\Delta x, 0).\]
To obtain a the signed curvature $k(x_i)$, we check if the middle point $(x_{i}, y_{i})$ is above (negative curvature) or below (positive curvature) the line segment connecting $(x_{i-1}, y_{i-1})$ and $(x_{i+1}, y_{i+1})$.
BlockTensorFactorization.Core.three_point_circle — Function
three_point_circle((a,f),(b,g),(c,h))Calculates radius r and center point (p, q) of the circle passing through the three points in the xy-plane.
Example
r, (p, q) = three_point_circle((1,2), (2,1), (5,2))
(r, (p, q)) == (√5, (3, 3))BlockTensorFactorization.Core.circle_curvature — Function
circle_curvature(y::AbstractVector{<:Real}; h=1, estimate_endpoints=true)Inverse radius of a the circle passing through each 3 adjacent points on y, (0,y[i-1]), (h,y[i]), and (2h,y[i+1]).
If estimate_endpoints=true, assumes the function that y comes from is 1 to the left of the given values, and 0 to the right. This is typical of relative error decay as a function of rank. If false, pads the boundary with the adjacent curvature.
See three_point_circle.
BlockTensorFactorization.Core.signed_circle_curvature — Function
signed_circle_curvature((a,f),(b,g),(c,h))Signed inverse radius of the circle passing through the 3 points in the xy-plane.
See three_point_circle.
Two-line Breakpoint
This method does not approximate the curvature of a function passing through the points $\{(x_i, y_i)\}$, but picks the best rank based on the optimal breakpoint. The optimal breakpoint $r$ is the breakpoint of segmented linear model that minimizes the model error.[2] This means
\[r = \argmin_{z \in \{1,\dots, R\}} \sum_{i=1}^I (f_z(x_i) - y_i)^2,\]
where
\[f_z(x; a_z, b_z, c_z) = a_z + b_z(\min(x,z) - z) + c_z(\max(x,z) - z)\]
and the coefficients $(a_z, b_z, c_z)$ are selected to minimize the error between the model and the data,
\[(a_z, b_z, c_z) = \argmin_{a,b,c\in\mathbb{R}} \sum_{i=1}^I (f_z(x_i;a,b,c) - y_i)^2.\]
These coefficients have the closed form solution
\[\begin{bmatrix} a_{z} \\ b_{z} \\ c_{z} \end{bmatrix} =(M^\top M)^{-1}M^\top \begin{bmatrix} y_{1} \\ \vdots \\ y_{I} \end{bmatrix},\quad\text{where}\quad M = \begin{bmatrix} 1 & \min(x_{1},z) - z&\max(x_{1},z) - z \\ \vdots & \vdots & \vdots\\ 1 & \min(x_{I},z) - z&\max(x_{I},z) - z \end{bmatrix}.\]
Geometrically, the breakpoint model is two half-infinite lines on $(-\infty, z]$ and $[z,\infty)$ that meet continuously at $(z,a)$ with slopes $b$ and $c$ respectively.
BlockTensorFactorization.Core.best_breakpoint — Function
best_breakpoint(xs, ys; breakpoints=xs)Breakpoint z in breakpoints that minimizes the breakpoint_error.
BlockTensorFactorization.Core.breakpoint_error — Function
breakpoint_error(xs, ys, z)Squared L2 error between the best breakpoint model (with breakpoint z) evaluated at xs and the data ys.
BlockTensorFactorization.Core.breakpoint_model — Function
breakpoint_model(a, b, c, z)Returns a function x -> a + b*(min(x, z) - z) + c*(max(x, z) - z).
BlockTensorFactorization.Core.breakpoint_model_coefficients — Function
breakpoint_model_coefficients(xs, ys, breakpoint)Least squares fit data $(x_i, y_i)$
$\min_{a,b,c} 0.5\sum_{i} (f(x_i; a,b,c) - y_i)^2$
with the model
$f(x; a,b,c) = a + b(\min(x, z) - x) + c(\max(x, z) - x)$
for some fixed $z$.
BlockTensorFactorization.Core.breakpoint_curvature — Function
breakpoint_curvature(y)This is a hacked way to fit the data y with a breakpoint model, which can be called by k = standard_curvature(...; model=:breakpoints)
This lets us call argmax(k) to get the breakpoint that minimizes the model error.
- 1M. Abramowitz and I. A. Stegun, "Handbook of mathematical functions: with formulas, graphs and mathematical tables", Unabridged, Unaltered and corr. Republ. of the 1964 ed. in Dover books on advanced mathematics. New York: Dover publ, 1972. (Table 25.2, p. 914)
- 2J. E. Saylor, K. E. Sundell, and G. R. Sharman, "Characterizing sediment sources by non-negative matrix factorization of detrital geochronological data," Earth and Planetary Science Letters, vol. 512, pp. 46–58, Apr. 2019, doi: 10.1016/j.epsl.2019.01.044. (Section 3.2)