
Monday, 3 June 2024

Intersections and Proximities of p-planes.

 In \(D\) dimensions, a \(p_1\)-plane and a \(p_2\)-plane generically intersect in a plane of \((p_1+p_2-D)\) dimensions, or more in special configurations. If \(p_1+p_2-D < 0\), they can only have a proximity (closest-approach) object, or an intersection, of dimension \(0\) to \(\mathrm{min}\{p_1,p_2\}\) (i.e. no intersection is guaranteed). The reason is a counting argument: equating the two planes gives \(D\) coordinate equations in the \(p_1+p_2\) plane parameters, so generically \(p_1+p_2-D\) free parameters remain. For example, two 2-planes in \(D=4\) generically meet in a single point, while two lines in \(D=3\) generically do not meet at all.

The two objects may be written as

\[\vec{x}(a) = \vec{v}_0 + \sum_{k=1}^{p_1}a_k \vec{v}_k,\]

\[\vec{y}(b) = \vec{w}_0 + \sum_{k=1}^{p_2}b_k \vec{w}_k.\]


The proximity or intersection may be found from

\[\mathrm{min}\left|\vec{x} - \vec{y}\right|,\]

which leads to the following extremum conditions:

\[\begin{cases}
0=\frac{\partial}{\partial a_k}\left|\vec{x} - \vec{y}\right|^2 = 2\left(\vec{x} - \vec{y}\right)\cdot\frac{\partial}{\partial a_k}\left(\vec{x} - \vec{y}\right)= 2\left(\vec{x} - \vec{y}\right)\cdot\frac{\partial \vec{x}}{\partial a_k}
\\
0=\frac{\partial}{\partial b_k}\left|\vec{x} - \vec{y}\right|^2 = 2\left(\vec{x} - \vec{y}\right)\cdot\frac{\partial}{\partial b_k}\left(\vec{x} - \vec{y}\right) = -2\left(\vec{x} - \vec{y}\right)\cdot\frac{\partial \vec{y}}{\partial b_k}
\end{cases}\]

\[\Rightarrow\begin{cases}
0=\left(\vec{x} - \vec{y}\right)\cdot\frac{\partial \vec{x}}{\partial a_k}=\vec{v}_k\cdot\left(\vec{x} - \vec{y}\right)
\\
0=\left(\vec{x} - \vec{y}\right)\cdot\frac{\partial \vec{y}}{\partial b_k}=\vec{w}_k\cdot\left(\vec{x} - \vec{y}\right)
\end{cases}\]

\[\Rightarrow\begin{cases}
0=\vec{v}_k\cdot\left(\vec{x} - \vec{y}\right) =\vec{v}_k \cdot\left( \vec{v}_0 - \vec{w}_0\right) + \sum_{l=1}^{p_1}\vec{v}_k\cdot\vec{v}_l a_l - \sum_{l=1}^{p_2}\vec{v}_k\cdot\vec{w}_l b_l
\\
0=-\vec{w}_k\cdot\left(\vec{x} - \vec{y}\right) =-\vec{w}_k \cdot\left( \vec{v}_0 - \vec{w}_0\right) - \sum_{l=1}^{p_1}\vec{w}_k\cdot\vec{v}_l a_l+ \sum_{l=1}^{p_2}\vec{w}_k\cdot\vec{w}_l b_l
\end{cases}\]

The equations may be rewritten as

\[\begin{pmatrix}*&*&*\\\vec{v}_k\cdot\left( \vec{v}_0 - \vec{w}_0\right)&\vec{v}_k\cdot\vec{v}_l&-\vec{v}_k\cdot\vec{w}_l \\-\vec{w}_k\cdot\left( \vec{v}_0 - \vec{w}_0\right)&-\vec{w}_k\cdot\vec{v}_l&\vec{w}_k\cdot\vec{w}_l\end{pmatrix}\begin{pmatrix}1 \\ a_l \\ b_l \end{pmatrix}=\begin{pmatrix}* \\ 0 \\ 0 \end{pmatrix}\]

or

\[\begin{pmatrix}\vec{v}_k\cdot\vec{v}_l&-\vec{v}_k\cdot\vec{w}_l \\-\vec{w}_k\cdot\vec{v}_l&\vec{w}_k\cdot\vec{w}_l\end{pmatrix}\begin{pmatrix}a_l \\ b_l \end{pmatrix}=\begin{pmatrix} -\vec{v}_k\cdot\left( \vec{v}_0 - \vec{w}_0\right) \\ \vec{w}_k\cdot\left( \vec{v}_0 - \vec{w}_0\right) \end{pmatrix}\]

With the convention that the \(\vec{v}\)'s are column vectors, and with the definitions

\[{\mathbf{m}_0}_{(D \times 1)} := \vec{v}_0-\vec{w}_0 ,\quad \mathbf{M}_{(D \times (p_1 + p_2))} := \begin{pmatrix} \vec{v}_k & -\vec{w}_k \end{pmatrix},\quad \mathbf{a}_{((p_1 + p_2 )\times 1)}=\begin{pmatrix}a_k \\ b_k \end{pmatrix},\]

the equations can be rewritten as

\[ (\mathbf{M}^T \mathbf{M}) \mathbf{a}= -\mathbf{M}^T \mathbf{m}_0. \]

So the solution for the parameters is

\[\mathbf{a}= -(\mathbf{M}^T \mathbf{M})^{-1} \mathbf{M}^T \mathbf{m}_0\]

unless \(\mathbf{M}^T \mathbf{M}\) is singular.
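As a concrete illustration, here is a minimal NumPy sketch of this non-singular case; the function name `closest_points` and the skew-line check are mine, not part of the derivation above.

```python
# Minimal sketch: solve (M^T M) a = -M^T m0 and recover the closest points.
import numpy as np

def closest_points(v0, V, w0, W):
    """v0, w0: (D,) base points; V: (D, p1), W: (D, p2) spanning vectors."""
    M = np.hstack([V, -W])                     # D x (p1 + p2), columns (v_k, -w_k)
    m0 = v0 - w0
    a = np.linalg.solve(M.T @ M, -M.T @ m0)    # fails if M^T M is singular
    p1 = V.shape[1]
    return v0 + V @ a[:p1], w0 + W @ a[p1:]    # closest point on each plane

# Two skew lines in 3D whose shortest distance is 1:
x, y = closest_points(np.array([0., 0., 0.]), np.array([[1.], [0.], [0.]]),
                      np.array([0., 0., 1.]), np.array([[0.], [1.], [0.]]))
print(np.linalg.norm(x - y))                   # 1.0
```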


If it is singular, we need to diagonalize it to separate the singular (free) components from the non-singular (constrained) ones. We may use the SVD to avoid a numerically catastrophic result.

\[(\mathbf{M}^T \mathbf{M}) =: \mathbf{U}\mathbf{\Sigma} \mathbf{V}^T\]

where \(U\) and \(V\) are orthogonal (unitary in the complex case) and \(\Sigma\) is a nonnegative diagonal matrix. Since \(\mathbf{M}^T \mathbf{M}\) is symmetric and positive semidefinite, one may in fact take \(U = V\).

Rotating the parameters by the orthogonal matrices diagonalizes the system and exposes the free variables.

\[\mathbf{U}\mathbf{\Sigma} \mathbf{V}^T\mathbf{a}= - \mathbf{M}^T \mathbf{m}_0 \;\Rightarrow \;\mathbf{\Sigma} \mathbf{V}^T\mathbf{a}= -\mathbf{U}^T \mathbf{M}^T \mathbf{m}_0=: \begin{pmatrix}m_r \\ 0\end{pmatrix}\]

In the rows where \(\Sigma\) has a zero value, the corresponding component of the right-hand side is also zero, so the solution for that row is indeterminate (zero divided by zero).

\[ \mathbf{V}^T\mathbf{a}=\mathbf{\Sigma}^{-1} \begin{pmatrix}m_r \\ 0\end{pmatrix} = \begin{pmatrix}\Sigma_r^{-1} m_r \\ 0^{-1} \cdot 0\end{pmatrix}.\]

Now, replace the indeterminate entries with free parameters \(z_i\).

\[ \mathbf{V}^T\mathbf{a}= \begin{pmatrix}\Sigma_r^{-1} m_r \\ z_i\end{pmatrix}\;\Rightarrow \;\mathbf{a}= \mathbf{V}\begin{pmatrix}\Sigma_r^{-1} m_r \\ z_i\end{pmatrix}.\]
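A NumPy sketch of this singular-case treatment follows; the helper name `solve_parameters` and the tolerance are my own choices. It returns the particular solution with all \(z_i = 0\) (the same choice `np.linalg.pinv` or `np.linalg.lstsq` would make), together with a basis for the free directions.

```python
# Sketch: SVD of M^T M; rows with zero singular value are left as free parameters.
import numpy as np

def solve_parameters(M, m0, tol=1e-12):
    A = M.T @ M
    U, S, Vt = np.linalg.svd(A)              # A = U diag(S) V^T
    rhs = -(U.T @ (M.T @ m0))
    keep = S > tol * S.max()                 # non-singular rows
    coeffs = np.zeros_like(rhs)
    coeffs[keep] = rhs[keep] / S[keep]
    a_particular = Vt.T @ coeffs             # the choice z_i = 0
    null_basis = Vt.T[:, ~keep]              # directions of the free parameters z_i
    return a_particular, null_basis
```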


We can also find the shortest distance using the equation and its solution. The squared distance is

\[\left|\vec{x} - \vec{y}\right|^2 = \left| \mathbf{m}_0 + \mathbf{M} \mathbf{a}\right|^2 = \mathbf{m}_0^T \mathbf{m}_0 +2 \mathbf{a}^T \mathbf{M}^T \mathbf{m}_0  +\mathbf{a}^T \mathbf{M}^T\mathbf{M}\mathbf{a} \]

\[= \mathbf{m}_0^T \mathbf{m}_0 + \mathbf{a}^T \mathbf{M}^T \mathbf{m}_0  = \mathbf{m}_0^T \left(\mathbf{I} - \mathbf{M}(\mathbf{M}^T\mathbf{M})^{-1} \mathbf{M}^T\right)\mathbf{m}_0,\]

where the first equality uses the normal equation \((\mathbf{M}^T \mathbf{M})\mathbf{a}= -\mathbf{M}^T \mathbf{m}_0\) and the second substitutes the non-singular solution for \(\mathbf{a}\).
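This expression can be checked numerically with a small sketch (again assuming the non-singular case; the function name is mine):

```python
# Sketch: minimum squared distance via the projector formula.
import numpy as np

def min_squared_distance(M, m0):
    G_inv = np.linalg.inv(M.T @ M)             # assumes M^T M is non-singular
    P = np.eye(len(m0)) - M @ G_inv @ M.T      # projector onto the complement of span(M)
    return m0 @ P @ m0
```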

Saturday, 25 May 2024

Point cloud Similarities

 Point to distribution:

\(\vec{p}=(p_x, p_y) \Rightarrow P(x,y)= \frac{1}{2\pi \sigma_x \sigma_y} \exp\left\{-\frac{x^2}{2\sigma_x^2}-\frac{y^2}{2\sigma_y^2}\right\}\)

\(\vec{p}=\sum_{i=1}^D p^i \hat{e}_i  \Rightarrow P(x^i)= (2\pi)^{-D/2}\left(\prod_{i=1}^D \sigma_{x^i}^{-1}\right) \exp\left\{-\sum_{i=1}^D \frac{{(x^i)}^2}{2{(\sigma_{x^i})}^2}\right\}\)

Correlation function between two single-point distributions:

\(P(x^i; x_0^i) := P(x^i - x_0^i)\)

\(\left<P(x^i)P(x^i; x_0^i)\right> = {(4\pi)}^{-D/2} \left(\prod_{i=1}^D \sigma_{x^i}^{-1}\right) \exp\left\{-\sum_{i=1}^D \frac{{(x_0^i)}^2}{4{(\sigma_{x^i})}^2}\right\} \)
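This follows, coordinate by coordinate, from the overlap integral of two equal-width Gaussians (a short check added here):

\[\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma}e^{-\frac{x^2}{2\sigma^2}}\cdot\frac{1}{\sqrt{2\pi}\,\sigma}e^{-\frac{(x-x_0)^2}{2\sigma^2}}\,dx = \frac{1}{\sqrt{4\pi}\,\sigma}\,e^{-\frac{x_0^2}{4\sigma^2}}.\]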

For two point clouds \(\{\vec{p}_j\}\) and \(\{\vec{q}_j\}\),

\(P(x^i;\{\vec{p}_j\}) = \sum_j P(x^i; p_j^i)\)

The correlation of two point clouds is

\(C(\{\vec{p}_j\}, \{\vec{q}_j\}) = \frac{\left<P(x^i;\{\vec{p}_j\})P(x^i;\{\vec{q}_j\}) \right>}{\sqrt{\left<P(x^i;\{\vec{p}_j\}) P(x^i;\{\vec{p}_j\}) \right>\left<P(x^i;\{\vec{q}_j\}) P(x^i;\{\vec{q}_j\}) \right>}}\)

We want to write \(\left<P(x^i;\{\vec{p}_j\}) P(x^i;\{\vec{q}_j\}) \right>\) with \(P(x^i;\{\vec{p}_j\}) = \sum_j P(x^i; p_j^i)\).

\(C'(\{\vec{p}_j\}, \{\vec{q}_j\}):=\left<P(x^i;\{\vec{p}_j\}) P(x^i;\{\vec{q}_j\}) \right>\)
\(\qquad \qquad\qquad = \left< \sum_{j,k} P(x^i; p_j^i) P(x^i; q_k^i) \right>\)
\(\qquad \qquad\qquad \sim\sum_{j,k} \exp\left\{-\sum_{i=1}^D \frac{{(p_j^i - q_k^i)}^2}{4{(\sigma_{x^i})}^2}\right\} \)
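A small NumPy sketch of this unnormalized correlation (the prefactors dropped by the \(\sim\) cancel in the normalized \(C\); the function names are mine):

```python
# Sketch: C'({p_j}, {q_k}) as a double sum of pairwise Gaussian overlaps.
import numpy as np

def corr(P, Q, sigma):
    """P: (n, D), Q: (m, D) point clouds; sigma: (D,) per-axis widths."""
    diff = P[:, None, :] - Q[None, :, :]                    # (n, m, D): p_j - q_k
    return np.exp(-np.sum(diff**2 / (4.0 * sigma**2), axis=-1)).sum()

def normalized_corr(P, Q, sigma):
    return corr(P, Q, sigma) / np.sqrt(corr(P, P, sigma) * corr(Q, Q, sigma))
```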

To maximize \(C'(\{\vec{p}_j\}, \{\vec{q}_j\})\) under a shift of \(q\), we need to compute \(\frac{d}{d\Delta q^i}C'(\{\vec{p}_j\}, \{\vec{q}_j + \Delta\vec{q}\})\) (and, at the optimum, set it to zero).

\[\frac{d}{d\Delta q^i}C'(\{\vec{p}_j\}, \{\vec{q}_j + \Delta\vec{q}\}) \sim \left. \frac{d}{d\Delta q^i}\sum_{j,k} \exp\left\{-\sum_{i=1}^D \frac{{(p_j^i - q_k^i - \Delta q^i)}^2}{4{(\sigma_{x^i})}^2}\right\} \right|_{\Delta q = 0} \]
\[=\left. \frac{1}{2\sigma_{x^i}^2}\sum_{j,k}(p_j^i - q_k^i - \Delta q^i) \exp\left\{-\sum_{i=1}^D \frac{{(p_j^i - q_k^i - \Delta q^i)}^2}{4{(\sigma_{x^i})}^2}\right\}\right|_{\Delta q = 0} \]
\[= \frac{1}{2\sigma_{x^i}^2}\sum_{j,k}(p_j^i - q_k^i) \exp\left\{-\sum_{i=1}^D \frac{{(p_j^i - q_k^i )}^2}{4{(\sigma_{x^i})}^2}\right\} \]

In the same sense,

\(
\left.\frac{d^2}{{(d\Delta q^i)}^2}C'(\{\vec{p}_j\}, \{\vec{q}_j + \Delta\vec{q}\})\right|_{\Delta q = 0} \)
\[\sim
 \sum_{j,k}\left[\frac{(p_j^i - q_k^i)^2}{4\sigma_{x^i}^4} - \frac{1}{2\sigma_{x^i}^2}\right] \exp\left\{-\sum_{i=1}^D \frac{{(p_j^i - q_k^i )}^2}{4{(\sigma_{x^i})}^2}\right\}
\]

We can do Newton's method using these 1st and 2nd derivatives.
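One such Newton step, treating each coordinate independently as above, might look like the following sketch (it uses the \(1/(2\sigma^2)\) factors from the derivatives above; the vectorized form and the function name are my own):

```python
# Sketch: one coordinate-wise Newton update for the shift Δq that maximizes C'.
import numpy as np

def newton_shift_step(P, Q, sigma):
    diff = P[:, None, :] - Q[None, :, :]                       # (n, m, D): p_j - q_k
    w = np.exp(-np.sum(diff**2 / (4.0 * sigma**2), axis=-1))   # (n, m) Gaussian weights
    grad = (w[..., None] * diff).sum(axis=(0, 1)) / (2.0 * sigma**2)
    hess = (w[..., None] * (diff**2 / (4.0 * sigma**4)
                            - 1.0 / (2.0 * sigma**2))).sum(axis=(0, 1))
    return -grad / hess            # Δq for one Newton step; iterate until convergence
```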

Saturday, 18 May 2024

Interpolation function training

The ideal interpolation kernel is the sinc function, which contains all frequencies up to the Nyquist frequency. However, this would require an infinitely long convolution in discrete computation, so we apply a window function. In signal processing the interpolation kernel is therefore a windowed-sinc function [https://www.analog.com/media/en/technical-documentation/dsp-book/dsp_book_Ch16.pdf], and the window function can be varied (1) to narrow the window width and reduce computation, and (2) at the same time to increase accuracy and reduce artifacts. These two goals are contradictory, so we have to compromise and optimise.

    The kernel generally must satisfy f(0) = 1 and f(n) = 0 for every nonzero integer n, and additionally I force it to be a continuous function, to avoid catastrophic artifacts. A generic window function may be written as a Fourier series (a cosine series if symmetric) [https://en.wikipedia.org/wiki/Window_function]

\[ \displaystyle
w(t)=
\begin{cases}
 \sum_{k=0}^{N}a_k \cos k \pi t  + \sum_{k=1}^{N} b_k \sin k\pi t\quad\text{if  \(t\in[-1,1]\)}
\\
0\quad\text{else}
\end{cases} \]

with absolute constraints

\[\textbf{Constraint 1: }\quad w(0) = \sum_{k=0}^{N} a_k = 1,\]

\[\textbf{Constraint 2: }\quad w(1) =\sum_{k=0}^{N}(-1)^k a_k = 0, \]

assuming the window support to be \([-1, 1]\) (to use it in practice, scale in the \(t\)-direction).
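For reference, a small sketch evaluating \(w(t)\) from the coefficient vectors (not part of the original derivation; `window` is a name I chose):

```python
# Sketch: evaluate the windowed Fourier-series w(t), zero outside [-1, 1].
import numpy as np

def window(t, a, b):
    """a: cosine coefficients a_0..a_N; b: sine coefficients b_1..b_N."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    ka = np.arange(len(a))                         # k = 0..N
    kb = np.arange(1, len(b) + 1)                  # k = 1..N
    w = np.cos(np.pi * t[:, None] * ka) @ a + np.sin(np.pi * t[:, None] * kb) @ b
    return np.where(np.abs(t) <= 1.0, w, 0.0)
```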


Projection of the window parameters \(\{a_k, b_k\}\) onto the constraints.

    From the \(\{a_k, b_k\}\) vector space, we have two constraints that must always be satisfied. The situation is analogous to projecting a point onto two planes simultaneously, which places it on the line where the two planes intersect; the projection must move perpendicular to both planes. We can develop a training process that only moves along the constraints, but because of numerical errors we still have to develop a way to project back onto the constraints. Training the kernel on a large set of audio data then corresponds to a loss-optimization process in which this \((2N+1)\)-dimensional point moves towards a minimum or maximum.

    Let us make the projection idea more concrete. The \((2N+1)\)-dimensional point must always satisfy the two constraints above, so the valid space satisfying the constraints has \(2N-1\) dimensions. In the quotient space, the kernel corresponding to a given point is \(2\)-dimensional: it passes through the given point and is parallel to the two normal vectors of the two constraint \(2N\)-planes. The intersection between the valid \((2N-1)\)-dimensional space and this 2D kernel is the projected point we want.

    The two normal vectors are the same as the coefficient vectors of the plane equations, so the 2D kernel through a point \(\{a'_k, b'_k\}\) is written as

\[ \{a_k, b_k\}  = \{a'_k, b'_k\} + \{1, 0\} x + \{(-1)^k, 0\} y \]

To obtain the intersection of this kernel with the two constraints \(w(0)=1\) and \(w(1)=0\), we substitute the \(\{a_k, b_k\}\) expression of the kernel into the equation of each constraint.

\[ w(0) = \sum_{k=0}^{N} (a'_k + x + (-1)^k y) = 1 \]

\[ w(1) = \sum_{k=0}^{N} (-1)^k(a'_k + x + (-1)^k y) = 0 \]

From these equations we obtain \(x\) and \(y\), which tell how far the intersection has moved, within the kernel space, from the original point. Simplified,

\[ w(0) =(N+1) x + \frac{1 + (-1)^N}{2}y + \sum_{k=0}^{N} a'_k  = 1 \]

\[ w(1) = \frac{1 + (-1)^N}{2} x + (N+1)y + \sum_{k=0}^{N} (-1)^k a'_k = 0 \]

In matrix form,

\[
\begin{pmatrix}
(N+1) & \frac{1 + (-1)^N}{2}
\\ \frac{1 + (-1)^N}{2} & (N+1)
\end{pmatrix}
\begin{pmatrix} x\\y \end{pmatrix}
=
\begin{pmatrix}
1-\sum_{k=0}^{N} a'_k 
\\
-\sum_{k=0}^{N} (-1)^k a'_k 
\end{pmatrix}
\]

\[
\Rightarrow
\begin{pmatrix} x\\y \end{pmatrix}
=
\frac{1}{(N+1)^2 - \frac{1 + (-1)^N}{2}}
\begin{pmatrix}
(N+1) & -\frac{1 + (-1)^N}{2}
\\ -\frac{1 + (-1)^N}{2} & (N+1)
\end{pmatrix}
\begin{pmatrix}
1-\sum_{k=0}^{N} a'_k 
\\
-\sum_{k=0}^{N} (-1)^k a'_k 
\end{pmatrix}
\]

\[\Rightarrow
\begin{pmatrix} x+y\\x-y \end{pmatrix}=
\dfrac{
  (N+1) -\frac{1+(-1)^N}{2} \mathrm{diag}\{1, -1\}
}{(N+1)^2 - \frac{1+(-1)^N}{2}}
\begin{pmatrix}
1-\sum_{k=0}^{N} a'_k -\sum_{k=0}^{N} (-1)^k a'_k 
\\
1-\sum_{k=0}^{N} a'_k +\sum_{k=0}^{N} (-1)^k a'_k 
\end{pmatrix}
\]

\[\Rightarrow
\begin{pmatrix} x+y\\x-y \end{pmatrix}=
\dfrac{
  (N+1) -\frac{1+(-1)^N}{2} \mathrm{diag}\{1, -1\}
}{(N+1)^2 - \frac{1+(-1)^N}{2}}
\begin{pmatrix}
1-2\sum_{k=0}^{N} \frac{1 + (-1)^k}{2} a'_k 
\\
1-2\sum_{k=0}^{N} \frac{1 + (-1)^{k+1}}{2} a'_k 
\end{pmatrix}
\]

Thus, the projected point \(\{a_k, b_k\}\) from the original point \(\{a'_k, b'_k\}\) is

\[
\begin{cases}
a_k = a'_k + x + (-1)^k y,\quad k\in[0,N],
\\
b_k = b'_k,\quad k\in[1,N].
\end{cases}
\]

\[
\displaystyle
\Rightarrow
\begin{cases}
a_k = a'_k +
\dfrac{
  (N+1) -\frac{1+(-1)^N}{2} (-1)^k
}{(N+1)^2 - \frac{1+(-1)^N}{2}}
\left(
1 -2\sum_{m=0}^{N}\frac{1+ (-1)^{k+m}}{2} a'_m 
\right)
\\
b_k = b'_k,\quad k\in[1,N].
\end{cases}
\]

    Thus, here we may define the projection operator \(P\) such that

\[
\displaystyle
\begin{cases}
P(a_k) := a_k +
\dfrac{
  (N+1) -\frac{1+(-1)^N}{2} (-1)^k
}{(N+1)^2 - \frac{1+(-1)^N}{2}}
\left(
1 -2\sum_{m=0}^{N}\frac{1+ (-1)^{k+m}}{2} a_m 
\right)
\\
P(b_k) := b_k,\quad k\in[1,N].
\end{cases}
\]


Gradient vector projection onto the constraints

Now, since performing this projection at every training iteration is inefficient, we want instead to project each update step so that it stays on the constraints.

    Starting from a valid point \(\{a^0_k, b^0_k\}\) such that \(P(a^0_k) = a^0_k\), we consider a new training step \(\{a^0_k + \Delta a_k, b^0_k + \Delta b_k\}\), where \(\Delta a_k, \Delta b_k\) are determined only by the gradient of the loss, without regard to the constraints. We must then apply \(P(a^0_k + \Delta a_k)\) to finalize the step.

\[
\small
\begin{aligned}
P(a^0_k+\Delta a_k) &= 
a^0_k +\Delta a_k+
\dfrac{
  (N+1) -\frac{1+(-1)^N}{2} (-1)^k
}{(N+1)^2 - \frac{1+(-1)^N}{2}}
\left(
1 -2\sum_{m=0}^{N}\frac{1+ (-1)^{k+m}}{2} (a^0_m + \Delta a_m)
\right)
\\ &= P(a^0_k) + \Delta a_k
-2
\dfrac{
  (N+1) -\frac{1+(-1)^N}{2} (-1)^k
}{(N+1)^2 - \frac{1+(-1)^N}{2}}
\sum_{m=0}^{N}\frac{1+ (-1)^{k+m}}{2} \Delta a_m
\end{aligned}
\]

    Thus, here we define a new projection operator \(\Delta P\) for the update vector of the coefficients:

\[
\Delta P(\Delta a_k) :=  \Delta a_k
-2
\dfrac{
  (N+1) -\frac{1+(-1)^N}{2} (-1)^k
}{(N+1)^2 - \frac{1+(-1)^N}{2}}
\sum_{m=0}^{N}\frac{1+ (-1)^{k+m}}{2} \Delta a_m
\]


Pseudocodes

\(P\) and \(\Delta P\) can be written in PyTorch code:
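One possible sketch is below (my own, not necessarily the implementation used in practice). It assumes the \(a_k\) are stored as a 1-D tensor of length \(N+1\); the \(b_k\) are untouched by both operators, so they are omitted.

```python
# Sketch of the projection operators P and ΔP in PyTorch.
import torch

def _pieces(N, device=None):
    k = torch.arange(N + 1, device=device)
    sign_k = 1.0 - 2.0 * (k % 2).float()                    # (-1)^k
    c = 1.0 if N % 2 == 0 else 0.0                          # (1 + (-1)^N) / 2
    F = ((N + 1) - c * sign_k) / ((N + 1) ** 2 - c)         # prefactor in P and ΔP
    G = (1.0 + sign_k[:, None] * sign_k[None, :]) / 2.0     # (1 + (-1)^{k+m}) / 2
    return F, G

def project(a):
    """P: map arbitrary a'_k to coefficients satisfying w(0) = 1, w(1) = 0."""
    F, G = _pieces(a.numel() - 1, a.device)
    return a + F * (1.0 - 2.0 * (G @ a))

def project_step(da):
    """ΔP: project an update vector so that a constrained point stays constrained."""
    F, G = _pieces(da.numel() - 1, da.device)
    return da - 2.0 * F * (G @ da)
```

After each optimizer step one can, for example, re-project the coefficients with `a.data.copy_(project(a.data))`, or pre-project the update itself with `project_step` before applying it.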