Rank Estimation

Overview

In some applications, we may have a tenor we believe to be closely approximated by a low rank tenor, but do not know what rank to use. In this setting, use rank_detect_factorize rather than factorize without a rank=R keyword. The estimated rank will be added to kwargs, and the decomposition at this rank returned.

R = 3
T = Tucker1((10, 10, 10), R)
Y = array(T)
options = (model=Tucker1, curvature_method=splines) # default values
decomposition, stats, kwargs, final_rel_errors = rank_detect_factorize(Y; options...)

@assert kwargs[:rank] == R

The rank will be estimated by factorizing the input tensor at every possible rank from 1 up to the maximum rank and recording the relative error. We can pick the rank which balances small error (good fit) and small rank (low complexity). Our criteria is the rank which maximizes the curvature of the error vs. rank function, or in the case of the :breakpoints method, a proxy for the curvature.

Note rank_detect_factorize also returns the stats and final relative errors of each rank tested.

BlockTensorFactorization.Core.rank_detect_factorizeFunction
rank_detect_factorize(Y; kwargs...)

Wraps factorize() with rank detection.

Selects the rank that maximizes the standard curvature of the Relative Error (as a function of rank).

Keywords

  • online_rank_estimation: false. Set to true to stop testing larger ranks after the first peak in curvature
  • curvature_method: :splines. Can also pick :finite_differences (faster but less accurate) or circles (fastest and smallest memory but more sensitive to results from factorize). Set to :breakpoints to pick the rank R that minimizes least-squares error in the model f(r) = a + b(min(r, R) - R) + c(max(r, R) - R) and the errors.
  • model: Tucker1. Only rank detection with Tucker1 and CPDecomposition is currently implemented
  • max_rank: max_possible_rank(Y, model). Test ranks from 1 up to max_rank. Defaults to largest possible rank under the model
  • rank: nothing. If a rank is passed, rank detection is ignored and factorize(Y; kwargs...) is called

Any other keywords from factorize, full list given by default_kwargs.

source
BlockTensorFactorization.Core.max_possible_rankFunction
max_possible_rank(Y, model)

Returns the maximum rank possible Y could have under the model.

For matrices I × J this is 1:min(I, J). This is can be extended to tensors for different type of decompositions.

Tucker-1 rank is ≤ min(I, prod(J1,...,JN)) for tensors I × J1 × … × JN.

The CP-rank is ≤ minimum_{n} (prod(I1,...,IN) / In) for tensors I1 × … × IN in general. Although some shapes have have tighter upper bounds. For example, 2 × I × I tensors over ℝ have a maximum rank of floor(3I/2).

source
Tip

If you run rank_detect_factorize with different methods and get the same rank, you can feel more confident in the detected rank.

Estimating Curvature

This package has implemented a few methods for estimating curvature. These are the following list of methods.

:finite_differences
:circles
:splines

We can also use the following curvature-proxy methods.

:breakpoints

They can be passed to curvature and standard_curvature as a method keyword, or rank_detect_factorize with the curvature_method keyword.

BlockTensorFactorization.Core.curvatureFunction
curvature(y::AbstractVector{<:Real}; method=:finite_differences)

Approximates the signed curvature of a function given evenly spaced samples.

Possible methods

  • :finite_differences: Approximates first and second derivative with 3rd order finite differences. See d_dx and d2_dx2.
  • :splines: Curvature of a third order spline. See d_dx_and_d2_dx2_spline.
  • :circles: Inverse radius of a circle through rolling three points. See circle_curvature.
  • :breakpoints: WARNING does not compute a value that approximates the curvature of a continuous function. Computes the inverse least-squares error of f.(eachindex(y); z) and y for all z in eachindex(y) where f(x; z) = a + b(min(x, z) - z) + c(max(x, z) - z). Useful if y looks like two lines. See breakpoint_curvature.
source
BlockTensorFactorization.Core.standard_curvatureFunction
standard_curvature(y::AbstractVector{<:Real}; method=:finite_differences)

Approximates the signed curvature of a function, scaled to the unit box $[0,1]^2$.

Assumes the function is 1 at 0 and (after x dimension is scaled) 0 at 1.

See curvature.

Possible methods

  • :finite_differences: Approximates first and second derivative with 3rd order finite differences. See d_dx and d2_dx2.
  • :splines: Curvature of a third order spline. See d_dx_and_d2_dx2_spline.
  • :circles: Inverse radius of a circle through rolling three points. See circle_curvature.
  • :breakpoints: WARNING does not compute a value that approximates the curvature of a continuous function. Computes the inverse least-squares error of f.(eachindex(y); z) and y for all z in eachindex(y) where f(x; z) = a + b(min(x, z) - z) + c(max(x, z) - z). Useful if y looks like two lines. See breakpoint_curvature.
source

These methods have been specialized to typical relative error vs. rank curves. We assume a relative error of $1$ at rank $0$ since the only rank $0$ tenor is the $0$ tenor. And that the relative error at the maximum possible rank is $0$, and stays zero if we try to use a larger rank.

How do these methods work?

Finite Differences

We approximate the first and second derivatives separately with three point finite differences and calculate the curvature with the formula

\[k(x) = \frac{y''(x)}{(1 + y'(x)^2)^{3/2}}.\]

The derivatives are approximated using centred three point finite differences,[1]

\[y'(x_i) \approx \frac{1}{2\Delta x}(y_{i+1} - y_{i-1})\quad\text{and}\quad y''(x_i)\approx \frac{1}{\Delta x^2}(y_{i+1} - 2y_i + y_{i-1}).\]

The end-points use forward/back-differences. For the first derivative, this is

\[y'(x_1) \approx \frac{1}{2\Delta x}(-3y_{1} + 4y_2 - y_{3})\quad\text{and}\quad y'(x_I) \approx \frac{1}{2\Delta x}(-y_{I-2} + 4y_{I-1} - 3y_{I}),\]

and for the second derivative, this is

\[y''(x_1) \approx \frac{1}{\Delta x^2}(y_{1} - 2y_2 + y_{3})\quad\text{and}\quad y''(x_I) \approx \frac{1}{\Delta x^2}(y_{I-2} - 2y_{I-1} + y_{I}).\]

BlockTensorFactorization.Core.d_dxFunction
d_dx(y::AbstractVector{<:Real})

Approximate first derivative with finite elements.

$\frac{d}{dx}y[i] \approx \frac{1}{2\Delta x}(y[i+1] - y[i-1])$

Assumes y[i] = y(x[i]) are samples with unit spaced inputs Δx = x[i+1] - x[i] = 1.

source
BlockTensorFactorization.Core.d2_dx2Function
d2_dx2(y::AbstractVector{<:Real})

Approximate second derivative with finite elements.

$\frac{d^2}{dx^2}y[i] \approx \frac{1}{\Delta x^2}(y[i-1] - 2y[i] + y[i+1])$

Assumes y[i] = y(x[i]) are samples with unit spaced inputs Δx = x[i+1] - x[i] = 1.

source

Splines

We calculate a third order spline

\[g_i(x) = a_i(x-x_i)^3+b_i(x-x_i)^2+c_i(x-x_i)+d_i,\quad x_1 \leq x \leq x_{i+1}\]

and use the coefficients to calculate the curvature of the spline

\[k(x_i) = \frac{2b_i}{(1+c_i^2)^{3/2}}.\]

We assume the following boundary conditions;

  1. $g_{1}(x_{1}-\Delta x)=1$, $y$-intercept $(0,1)$
  2. $g_{I}(x_{I}+\Delta x)=y_{I}$, repeated right end-point ($y_{I+1}=y_{I}$)
  3. $g_{I}''(x_{I}+\Delta x)=0$, flat right end-point,

in addition to the usual continuity and smoothness assumptions of a third order spline:

  1. $g_{i}(x_{i})=y_{i}$
  2. $g_{i}(x_{i+1})=g_{i+1}(x_{i+1})$
  3. $g_{i}'(x_{i+1})=g_{i+1}'(x_{i+1})$
  4. $g_{i}''(x_{i+1})=g_{i+1}''(x_{i+1})$.

We can solve for the coefficients by first solving the tri-diagonal system $Mb=v$, where

\[M=\begin{bmatrix} 6 & 0 & & & \\ 1 & 4 & 1 & & \\ & \ddots& \ddots& \ddots& \\ & & 1 & 4 & 1& \\ & & & 0 & 1 \end{bmatrix},\quad b =\begin{bmatrix} b_{1} \\ b_{2}\\ \vdots \\ b_{I} \\ b_{I+1} \end{bmatrix},\quad \text{and} \quad v = \frac{3}{\Delta x^2}\begin{bmatrix} 1-2y_{1}+y_{2} \\ y_{3}-y_{1} \\ \vdots \\ y_{I+1}-y_{I-1} \\ 0 \end{bmatrix},\]

and then calculate

\[a_{i} = \frac{1}{3\Delta x}(b_{i+1}-b_{i})\quad \text{and}\quad c_{i}=\frac{{y_{i+1}-y_{i}}}{\Delta x}-\frac{\Delta x}{3}(b_{i+1}-2b_{i}).\]

BlockTensorFactorization.Core.cubic_spline_coefficientsFunction
cubic_spline_coefficients(y::AbstractVector{<:Real}; h=1)

Calculates the list of coefficients a, b, c, d for an interpolating spline of y[i]=y(x[i]).

The spline is defined as $f(x) = g_i(x)$ on $x[i] \leq x \leq x[i+1]$ where

$g_i(x) = a[i](x-x[i])^3 + b[i](x-x[i])^2 + c[i](x-x[i]) + d[i]$

Uses the following boundary conditions

  • $g_1(x[1]-h) = 1$ (i.e. the $y$-intercept is $(0,1)$ for uniform spaced x=1:I)
  • $g_I(x[I]+h) = y[I]$ (i.e. repeated right end-point)
  • $g_I''(x[I]+h) = 0$ (i.e. flat/no-curvature one spacing after end-point)
source
BlockTensorFactorization.Core.make_splineFunction
make_spline(y::AbstractVector{<:Real}; h=1)

Returns a function f(x) that is an interpolating/extrapolating spline for y, with uniform stepsize h between the x-values of the knots.

source

Circles

We compute the radius $r_i$ of a circle passing through three neighbouring points $(x_{i-1}, y_{i-1})$, $(x_{i}, y_{i})$, and $(x_{i+1}, y_{i+1})$. The curvature magnitude is approximately the inverse of this radius;

\[\lvert k(x_i)\rvert \approx \frac{1}{r_i}.\]

We assume two additional points so we can estimate the curvature at the boundary. They are

\[(x_{0}, y_{0}) = (0, 1)\quad\text{and}\quad (x_{I+1}, y_{I+1}) = (x_{I}+\Delta x, 0).\]

To obtain a the signed curvature $k(x_i)$, we check if the middle point $(x_{i}, y_{i})$ is above (negative curvature) or below (positive curvature) the line segment connecting $(x_{i-1}, y_{i-1})$ and $(x_{i+1}, y_{i+1})$.

BlockTensorFactorization.Core.three_point_circleFunction
three_point_circle((a,f),(b,g),(c,h))

Calculates radius r and center point (p, q) of the circle passing through the three points in the xy-plane.

Example

r, (p, q) = three_point_circle((1,2), (2,1), (5,2))
(r, (p, q)) == (√5, (3, 3))
source
BlockTensorFactorization.Core.circle_curvatureFunction
circle_curvature(y::AbstractVector{<:Real}; h=1, estimate_endpoints=true)

Inverse radius of a the circle passing through each 3 adjacent points on y, (0,y[i-1]), (h,y[i]), and (2h,y[i+1]).

If estimate_endpoints=true, assumes the function that y comes from is 1 to the left of the given values, and 0 to the right. This is typical of relative error decay as a function of rank. If false, pads the boundary with the adjacent curvature.

See three_point_circle.

source

Two-line Breakpoint

This method does not approximate the curvature of a function passing through the points $\{(x_i, y_i)\}$, but picks the best rank based on the optimal breakpoint. The optimal breakpoint $r$ is the breakpoint of segmented linear model that minimizes the model error.[2] This means

\[r = \argmin_{z \in \{1,\dots, R\}} \sum_{i=1}^I (f_z(x_i) - y_i)^2,\]

where

\[f_z(x; a_z, b_z, c_z) = a_z + b_z(\min(x,z) - z) + c_z(\max(x,z) - z)\]

and the coefficients $(a_z, b_z, c_z)$ are selected to minimize the error between the model and the data,

\[(a_z, b_z, c_z) = \argmin_{a,b,c\in\mathbb{R}} \sum_{i=1}^I (f_z(x_i;a,b,c) - y_i)^2.\]

These coefficients have the closed form solution

\[\begin{bmatrix} a_{z} \\ b_{z} \\ c_{z} \end{bmatrix} =(M^\top M)^{-1}M^\top \begin{bmatrix} y_{1} \\ \vdots \\ y_{I} \end{bmatrix},\quad\text{where}\quad M = \begin{bmatrix} 1 & \min(x_{1},z) - z&\max(x_{1},z) - z \\ \vdots & \vdots & \vdots\\ 1 & \min(x_{I},z) - z&\max(x_{I},z) - z \end{bmatrix}.\]

Geometrically, the breakpoint model is two half-infinite lines on $(-\infty, z]$ and $[z,\infty)$ that meet continuously at $(z,a)$ with slopes $b$ and $c$ respectively.

  • 1M. Abramowitz and I. A. Stegun, "Handbook of mathematical functions: with formulas, graphs and mathematical tables", Unabridged, Unaltered and corr. Republ. of the 1964 ed. in Dover books on advanced mathematics. New York: Dover publ, 1972. (Table 25.2, p. 914)
  • 2J. E. Saylor, K. E. Sundell, and G. R. Sharman, "Characterizing sediment sources by non-negative matrix factorization of detrital geochronological data," Earth and Planetary Science Letters, vol. 512, pp. 46–58, Apr. 2019, doi: 10.1016/j.epsl.2019.01.044. (Section 3.2)