Rank Estimation

Overview

In some applications, we may have a tenor we believe to be closely approximated by a low rank tenor, but do not know what rank to use. In this setting, use rank_detect_factorize rather than factorize without a rank=R keyword. The estimated rank will be added to kwargs, and the decomposition at this rank returned.

R = 3
T = Tucker1((10, 10, 10), R)
Y = array(T)
options = (model=Tucker1, curvature_method=splines) # default values
decomposition, stats, kwargs, final_rel_errors = rank_detect_factorize(Y; options...)

@assert kwargs[:rank] == R

The rank will be estimated by factorizing the input tensor at every possible rank from 1 up to the maximum rank and recording the relative error. We can pick the rank which balances small error (good fit) and small rank (low complexity). Our criteria is the rank which maximizes the curvature of the error vs. rank function, or in the case of the :breakpoints method, a proxy for the curvature.

Note rank_detect_factorize also returns the stats and final relative errors of each rank tested.

BlockTensorFactorization.Core.rank_detect_factorize — Function

rank_detect_factorize(Y; kwargs...)

Wraps factorize() with rank detection.

Selects the rank that maximizes the standard curvature of the Relative Error (as a function of rank).

Keywords

online_rank_estimation: false. Set to true to stop testing larger ranks after the first peak in curvature
curvature_method: :splines. Can also pick :finite_differences (faster but less accurate) or circles (fastest and smallest memory but more sensitive to results from factorize). Set to :breakpoints to pick the rank R that minimizes least-squares error in the model f(r) = a + b(min(r, R) - R) + c(max(r, R) - R) and the errors.
model: Tucker1. Only rank detection with Tucker1 and CPDecomposition is currently implemented
max_rank: max_possible_rank(Y, model). Test ranks from 1 up to max_rank. Defaults to largest possible rank under the model
rank: nothing. If a rank is passed, rank detection is ignored and factorize(Y; kwargs...) is called

Any other keywords from factorize, full list given by default_kwargs.

BlockTensorFactorization.Core.max_possible_rank — Function

max_possible_rank(Y, model)

Returns the maximum rank possible Y could have under the model.

For matrices I × J this is 1:min(I, J). This is can be extended to tensors for different type of decompositions.

Tucker-1 rank is ≤ min(I, prod(J1,...,JN)) for tensors I × J1 × … × JN.

The CP-rank is ≤ minimum_{n} (prod(I1,...,IN) / In) for tensors I1 × … × IN in general. Although some shapes have have tighter upper bounds. For example, 2 × I × I tensors over ℝ have a maximum rank of floor(3I/2).

Tip

If you run rank_detect_factorize with different methods and get the same rank, you can feel more confident in the detected rank.

Estimating Curvature

This package has implemented a few methods for estimating curvature. These are the following list of methods.

:finite_differences
:circles
:splines

We can also use the following curvature-proxy methods.

:breakpoints

They can be passed to curvature and standard_curvature as a method keyword, or rank_detect_factorize with the curvature_method keyword.

BlockTensorFactorization.Core.curvature — Function

curvature(y::AbstractVector{<:Real}; method=:finite_differences)

Approximates the signed curvature of a function given evenly spaced samples.

Possible methods

:finite_differences: Approximates first and second derivative with 3rd order finite differences. See d_dx and d2_dx2.
:splines: Curvature of a third order spline. See d_dx_and_d2_dx2_spline.
:circles: Inverse radius of a circle through rolling three points. See circle_curvature.
:breakpoints: WARNING does not compute a value that approximates the curvature of a continuous function. Computes the inverse least-squares error of f.(eachindex(y); z) and y for all z in eachindex(y) where f(x; z) = a + b(min(x, z) - z) + c(max(x, z) - z). Useful if y looks like two lines. See breakpoint_curvature.

BlockTensorFactorization.Core.standard_curvature — Function

standard_curvature(y::AbstractVector{<:Real}; method=:finite_differences)

Approximates the signed curvature of a function, scaled to the unit box $[0,1]^2$.

Assumes the function is 1 at 0 and (after x dimension is scaled) 0 at 1.

See curvature.

Possible methods

:finite_differences: Approximates first and second derivative with 3rd order finite differences. See d_dx and d2_dx2.
:splines: Curvature of a third order spline. See d_dx_and_d2_dx2_spline.
:circles: Inverse radius of a circle through rolling three points. See circle_curvature.
:breakpoints: WARNING does not compute a value that approximates the curvature of a continuous function. Computes the inverse least-squares error of f.(eachindex(y); z) and y for all z in eachindex(y) where f(x; z) = a + b(min(x, z) - z) + c(max(x, z) - z). Useful if y looks like two lines. See breakpoint_curvature.

These methods have been specialized to typical relative error vs. rank curves. We assume a relative error of $1$ at rank $0$ since the only rank $0$ tenor is the $0$ tenor. And that the relative error at the maximum possible rank is $0$, and stays zero if we try to use a larger rank.

How do these methods work?

Finite Differences

We approximate the first and second derivatives separately with three point finite differences and calculate the curvature with the formula

\[k(x) = \frac{y''(x)}{(1 + y'(x)^2)^{3/2}}.\]

The derivatives are approximated using centred three point finite differences,^[1]

\[y'(x_i) \approx \frac{1}{2\Delta x}(y_{i+1} - y_{i-1})\quad\text{and}\quad y''(x_i)\approx \frac{1}{\Delta x^2}(y_{i+1} - 2y_i + y_{i-1}).\]

The end-points use forward/back-differences. For the first derivative, this is

\[y'(x_1) \approx \frac{1}{2\Delta x}(-3y_{1} + 4y_2 - y_{3})\quad\text{and}\quad y'(x_I) \approx \frac{1}{2\Delta x}(-y_{I-2} + 4y_{I-1} - 3y_{I}),\]

and for the second derivative, this is

\[y''(x_1) \approx \frac{1}{\Delta x^2}(y_{1} - 2y_2 + y_{3})\quad\text{and}\quad y''(x_I) \approx \frac{1}{\Delta x^2}(y_{I-2} - 2y_{I-1} + y_{I}).\]

BlockTensorFactorization.Core.d_dx — Function

d_dx(y::AbstractVector{<:Real})

Approximate first derivative with finite elements.

$\frac{d}{dx}y[i] \approx \frac{1}{2\Delta x}(y[i+1] - y[i-1])$

Assumes y[i] = y(x[i]) are samples with unit spaced inputs Δx = x[i+1] - x[i] = 1.

BlockTensorFactorization.Core.d2_dx2 — Function

d2_dx2(y::AbstractVector{<:Real})

Approximate second derivative with finite elements.

$\frac{d^2}{dx^2}y[i] \approx \frac{1}{\Delta x^2}(y[i-1] - 2y[i] + y[i+1])$

Assumes y[i] = y(x[i]) are samples with unit spaced inputs Δx = x[i+1] - x[i] = 1.

Splines

We calculate a third order spline

\[g_i(x) = a_i(x-x_i)^3+b_i(x-x_i)^2+c_i(x-x_i)+d_i,\quad x_1 \leq x \leq x_{i+1}\]

and use the coefficients to calculate the curvature of the spline

\[k(x_i) = \frac{2b_i}{(1+c_i^2)^{3/2}}.\]

We assume the following boundary conditions;

$g_{1}(x_{1}-\Delta x)=1$, $y$-intercept $(0,1)$
$g_{I}(x_{I}+\Delta x)=y_{I}$, repeated right end-point ($y_{I+1}=y_{I}$)
$g_{I}''(x_{I}+\Delta x)=0$, flat right end-point,

in addition to the usual continuity and smoothness assumptions of a third order spline:

$g_{i}(x_{i})=y_{i}$
$g_{i}(x_{i+1})=g_{i+1}(x_{i+1})$
$g_{i}'(x_{i+1})=g_{i+1}'(x_{i+1})$
$g_{i}''(x_{i+1})=g_{i+1}''(x_{i+1})$.

We can solve for the coefficients by first solving the tri-diagonal system $Mb=v$, where

\[M=\begin{bmatrix} 6 & 0 & & & \\ 1 & 4 & 1 & & \\ & \ddots& \ddots& \ddots& \\ & & 1 & 4 & 1& \\ & & & 0 & 1 \end{bmatrix},\quad b =\begin{bmatrix} b_{1} \\ b_{2}\\ \vdots \\ b_{I} \\ b_{I+1} \end{bmatrix},\quad \text{and} \quad v = \frac{3}{\Delta x^2}\begin{bmatrix} 1-2y_{1}+y_{2} \\ y_{3}-y_{1} \\ \vdots \\ y_{I+1}-y_{I-1} \\ 0 \end{bmatrix},\]

and then calculate

\[a_{i} = \frac{1}{3\Delta x}(b_{i+1}-b_{i})\quad \text{and}\quad c_{i}=\frac{{y_{i+1}-y_{i}}}{\Delta x}-\frac{\Delta x}{3}(b_{i+1}-2b_{i}).\]

BlockTensorFactorization.Core.cubic_spline_coefficients — Function

cubic_spline_coefficients(y::AbstractVector{<:Real}; h=1)

Calculates the list of coefficients a, b, c, d for an interpolating spline of y[i]=y(x[i]).

The spline is defined as $f(x) = g_i(x)$ on $x[i] \leq x \leq x[i+1]$ where

$g_i(x) = a[i](x-x[i])^3 + b[i](x-x[i])^2 + c[i](x-x[i]) + d[i]$

Uses the following boundary conditions

$g_1(x[1]-h) = 1$ (i.e. the $y$-intercept is $(0,1)$ for uniform spaced x=1:I)
$g_I(x[I]+h) = y[I]$ (i.e. repeated right end-point)
$g_I''(x[I]+h) = 0$ (i.e. flat/no-curvature one spacing after end-point)

BlockTensorFactorization.Core.spline_mat — Function

spline_mat(n)

Creates the Tridiagonal matrix to solve for coefficients b. See cubic_spline_coefficients.

BlockTensorFactorization.Core.make_spline — Function

make_spline(y::AbstractVector{<:Real}; h=1)

Returns a function f(x) that is an interpolating/extrapolating spline for y, with uniform stepsize h between the x-values of the knots.

BlockTensorFactorization.Core.d_dx_and_d2_dx2_spline — Function

d_dx_and_d2_dx2_spline(y::AbstractVector{<:Real}; h=1)

Extracts the first and second derivatives of the splines from y at the knots

Circles

We compute the radius $r_i$ of a circle passing through three neighbouring points $(x_{i-1}, y_{i-1})$, $(x_{i}, y_{i})$, and $(x_{i+1}, y_{i+1})$. The curvature magnitude is approximately the inverse of this radius;

\[\lvert k(x_i)\rvert \approx \frac{1}{r_i}.\]

We assume two additional points so we can estimate the curvature at the boundary. They are

\[(x_{0}, y_{0}) = (0, 1)\quad\text{and}\quad (x_{I+1}, y_{I+1}) = (x_{I}+\Delta x, 0).\]

To obtain a the signed curvature $k(x_i)$, we check if the middle point $(x_{i}, y_{i})$ is above (negative curvature) or below (positive curvature) the line segment connecting $(x_{i-1}, y_{i-1})$ and $(x_{i+1}, y_{i+1})$.

BlockTensorFactorization.Core.three_point_circle — Function

three_point_circle((a,f),(b,g),(c,h))

Calculates radius r and center point (p, q) of the circle passing through the three points in the xy-plane.

Example

r, (p, q) = three_point_circle((1,2), (2,1), (5,2))
(r, (p, q)) == (√5, (3, 3))

BlockTensorFactorization.Core.circle_curvature — Function

circle_curvature(y::AbstractVector{<:Real}; h=1, estimate_endpoints=true)

Inverse radius of a the circle passing through each 3 adjacent points on y, (0,y[i-1]), (h,y[i]), and (2h,y[i+1]).

If estimate_endpoints=true, assumes the function that y comes from is 1 to the left of the given values, and 0 to the right. This is typical of relative error decay as a function of rank. If false, pads the boundary with the adjacent curvature.

See three_point_circle.

BlockTensorFactorization.Core.signed_circle_curvature — Function

signed_circle_curvature((a,f),(b,g),(c,h))

Signed inverse radius of the circle passing through the 3 points in the xy-plane.

See three_point_circle.

Two-line Breakpoint

This method does not approximate the curvature of a function passing through the points $\{(x_i, y_i)\}$, but picks the best rank based on the optimal breakpoint. The optimal breakpoint $r$ is the breakpoint of segmented linear model that minimizes the model error.^[2] This means

\[r = \argmin_{z \in \{1,\dots, R\}} \sum_{i=1}^I (f_z(x_i) - y_i)^2,\]

where

\[f_z(x; a_z, b_z, c_z) = a_z + b_z(\min(x,z) - z) + c_z(\max(x,z) - z)\]

and the coefficients $(a_z, b_z, c_z)$ are selected to minimize the error between the model and the data,

\[(a_z, b_z, c_z) = \argmin_{a,b,c\in\mathbb{R}} \sum_{i=1}^I (f_z(x_i;a,b,c) - y_i)^2.\]

These coefficients have the closed form solution

\[\begin{bmatrix} a_{z} \\ b_{z} \\ c_{z} \end{bmatrix} =(M^\top M)^{-1}M^\top \begin{bmatrix} y_{1} \\ \vdots \\ y_{I} \end{bmatrix},\quad\text{where}\quad M = \begin{bmatrix} 1 & \min(x_{1},z) - z&\max(x_{1},z) - z \\ \vdots & \vdots & \vdots\\ 1 & \min(x_{I},z) - z&\max(x_{I},z) - z \end{bmatrix}.\]

Geometrically, the breakpoint model is two half-infinite lines on $(-\infty, z]$ and $[z,\infty)$ that meet continuously at $(z,a)$ with slopes $b$ and $c$ respectively.

BlockTensorFactorization.Core.best_breakpoint — Function

best_breakpoint(xs, ys; breakpoints=xs)

Breakpoint z in breakpoints that minimizes the breakpoint_error.

BlockTensorFactorization.Core.breakpoint_error — Function

breakpoint_error(xs, ys, z)

Squared L2 error between the best breakpoint model (with breakpoint z) evaluated at xs and the data ys.

See breakpoint_model and breakpoint_model_coefficients.

BlockTensorFactorization.Core.breakpoint_model — Function

breakpoint_model(a, b, c, z)

Returns a function x -> a + b*(min(x, z) - z) + c*(max(x, z) - z).

BlockTensorFactorization.Core.breakpoint_model_coefficients — Function

breakpoint_model_coefficients(xs, ys, breakpoint)

Least squares fit data $(x_i, y_i)$

$\min_{a,b,c} 0.5\sum_{i} (f(x_i; a,b,c) - y_i)^2$

with the model

$f(x; a,b,c) = a + b(\min(x, z) - x) + c(\max(x, z) - x)$

for some fixed $z$.

BlockTensorFactorization.Core.breakpoint_curvature — Function

breakpoint_curvature(y)

This is a hacked way to fit the data y with a breakpoint model, which can be called by k = standard_curvature(...; model=:breakpoints)

This lets us call argmax(k) to get the breakpoint that minimizes the model error.

See breakpoint_model_coefficients.

1M. Abramowitz and I. A. Stegun, "Handbook of mathematical functions: with formulas, graphs and mathematical tables", Unabridged, Unaltered and corr. Republ. of the 1964 ed. in Dover books on advanced mathematics. New York: Dover publ, 1972. (Table 25.2, p. 914)
2J. E. Saylor, K. E. Sundell, and G. R. Sharman, "Characterizing sediment sources by non-negative matrix factorization of detrital geochronological data," Earth and Planetary Science Letters, vol. 512, pp. 46–58, Apr. 2019, doi: 10.1016/j.epsl.2019.01.044. (Section 3.2)