Signal models
In this chapter, we model a signal as the output of a causal linear shift-invariant filter with a rational system function of the form
$$H(z)=\frac{B_{q}(z)}{A_{p}(z)}=\frac{\sum_{k=0}^{q} b_{q}(k) z^{-k}}{1+\sum_{k=1}^{p} a_{p}(k) z^{-k}}$$
Model identification
For deterministic signals, given observations $x[n],\ n = 0, \cdots, N-1$ and filter orders $p, q$, find the parameters of $H(z)$ such that the modeled output signal $\hat x[n] = h[n]$ best matches the observations.
For stochastic signals, we will try to match the correlation sequences:
$$r_x[k] = r_h[k]$$
Determine models: Least Squares
Minimize the error with the least-squares criterion:
$$\mathcal E_{LS}=\sum_{n=0}^\infty |x[n]-h[n]|^2$$
Using Parseval’s theorem, the least-squares error may be written in the frequency domain:
$$\mathcal E_{LS}=\frac{1}{2\pi}\int_{-\pi}^{\pi}\left|X(e^{j\omega})-H(e^{j\omega})\right|^2 d\omega$$
This representation allows us to express $\mathcal{E}_{LS}$ explicitly in terms of the model parameters $a_p(k)$ and $b_q(k)$:
$$\left\{\begin{array}{ll} \frac{\partial \mathcal{E}_{LS}}{\partial a^{*}[k]}=0, & k=1, \cdots, p \\ \frac{\partial \mathcal{E}_{LS}}{\partial b^{*}[k]}=0, & k=0, \cdots, q \end{array}\right.$$
The resulting equations are nonlinear in the model parameters: not mathematically tractable, and not amenable to real-time signal processing applications.
An alternative that leads to tractable results is to consider a “weighted” error:
Since $X(z)-H(z)=X(z)-B_q(z)/A_p(z)$ contains a fraction, we can instead consider $A_p(z)\left(X(z)-H(z)\right)=A_p(z)X(z)-B_q(z)$, i.e.,
$$E(z)=A_p(z)E'(z)=A_p(z)X(z)-B_q(z)$$
Now the error equation is linear.
Based on this, we can develop techniques such as the Padé approximation, Prony’s method, and Shanks’ method.
Determine models: Padé approximation
We have $p+q+1$ model parameters. They can be solved for by matching $p+q+1$ signal samples exactly:
$$h[n]=x[n], \quad n=0,\cdots,p+q$$
From the definition of $H(z)$,
$$\begin{aligned} H(z)=\frac{B(z)}{A(z)} & \Rightarrow H(z) A(z)=B(z) \Rightarrow h[n] * a[n]=b[n] \\ & \Rightarrow \quad h[n]+\sum_{k=1}^{p} a[k] h[n-k]=b[n] \end{aligned}$$
where $h[n]=0$ for $n<0$, and $b[n]=0$ for $n<0$ or $n>q$.
Since we assume $h[n]=x[n]$,
$$x[n]+\sum_{k=1}^{p} a[k] x[n-k]=\left\{\begin{array}{ll} b[n], & n=0, \cdots, q \\ 0, & n=q+1, \cdots, p+q \end{array}\right.$$
These equations can be rearranged in a matrix form:
$$\left[\begin{array}{cccc} x(0) & 0 & \cdots & 0 \\ x(1) & x(0) & \cdots & 0 \\ x(2) & x(1) & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ x(q) & x(q-1) & \cdots & x(q-p) \\ \hline x(q+1) & x(q) & \cdots & x(q-p+1) \\ \vdots & \vdots & & \vdots \\ x(q+p) & x(q+p-1) & \cdots & x(q) \end{array}\right] \left[\begin{array}{c} 1 \\ a_{p}(1) \\ a_{p}(2) \\ \vdots \\ a_{p}(p) \end{array}\right]= \left[\begin{array}{c} b_{q}(0) \\ b_{q}(1) \\ b_{q}(2) \\ \vdots \\ b_{q}(q) \\ \hline 0 \\ \vdots \\ 0 \end{array}\right]$$
- In the first step of the Padé approximation method, solving for the coefficients $a_p(k)$, we use the last $p$ equations as indicated by the partitioning, i.e.,
$$\left[\begin{array}{cccc} x(q+1) & x(q) & \cdots & x(q-p+1) \\ x(q+2) & x(q+1) & \cdots & x(q-p+2) \\ \vdots & \vdots & & \vdots \\ x(q+p) & x(q+p-1) & \cdots & x(q) \end{array}\right]\left[\begin{array}{c} 1 \\ a_{p}(1) \\ \vdots \\ a_{p}(p) \end{array}\right]=\left[\begin{array}{c} 0 \\ 0 \\ \vdots \\ 0 \end{array}\right]$$
This can be rearranged as
$$\left[\begin{array}{cccc} x(q) & x(q-1) & \cdots & x(q-p+1) \\ x(q+1) & x(q) & \cdots & x(q-p+2) \\ \vdots & \vdots & & \vdots \\ x(q+p-1) & x(q+p-2) & \cdots & x(q) \end{array}\right]\left[\begin{array}{c} a_{p}(1) \\ a_{p}(2) \\ \vdots \\ a_{p}(p) \end{array}\right]=-\left[\begin{array}{c} x(q+1) \\ x(q+2) \\ \vdots \\ x(q+p) \end{array}\right]$$
In matrix notation this may be expressed concisely as
$$\mathbf X_q\,\overline {\mathbf a}_p=-\mathbf x_{q+1}$$
- Plug the solution back to find the $b_q(k)$:
$$\left[\begin{array}{ccccc} x(0) & 0 & 0 & \cdots & 0 \\ x(1) & x(0) & 0 & \cdots & 0 \\ x(2) & x(1) & x(0) & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ x(q) & x(q-1) & x(q-2) & \cdots & x(q-p) \end{array}\right]\left[\begin{array}{c} 1 \\ a_{p}(1) \\ a_{p}(2) \\ \vdots \\ a_{p}(p) \end{array}\right]=\left[\begin{array}{c} b_{q}(0) \\ b_{q}(1) \\ b_{q}(2) \\ \vdots \\ b_{q}(q) \end{array}\right],$$
which can also be written in matrix form:
$$\mathbf X_0\, \mathbf a_p=\mathbf b_q$$
Disadvantages of the Pade approximation
- The resulting model is not guaranteed to be stable.
- Outside the matched interval $[0, p+q]$, the approximation may be very bad.
- If $\mathbf{X}_{q}$ is singular, a solution matching all signal values does not always exist. (Sometimes a lower-order solution for $A(z), B(z)$ exists, i.e., smaller $p, q$, but then there is no guarantee of matching samples beyond $p+q$.)
Determine models: Prony’s Method
As before,
$$E(z)=A_p(z)E'(z)=A_p(z)X(z)-B_q(z),$$
i.e.,
$$e[n]=\left\{\begin{array}{ll} x[n]+\sum_{k=1}^{p} a[k] x[n-k]-b[n], & n=0, \cdots, q \\ x[n]+\sum_{k=1}^{p} a[k] x[n-k], & n>q \end{array}\right.$$
- For Padé, we first solved $e[n]=0,\ n=q+1, \cdots, q+p$ to find $A(z)$.
- For Prony, we solve
$$\min _{\{a[k]\}} \sum_{n=q+1}^{\infty}|e[n]|^{2}=\min _{\{a[k]\}} \sum_{n=q+1}^{\infty}\left|x[n]+\sum_{k=1}^{p} a[k] x[n-k]\right|^{2}$$
There are two ways to derive Prony’s Method.
- The first is in terms of finding the least-squares solution to a set of overdetermined linear equations. Specifically, in Prony’s method we would like to find a set of coefficients $a_p(k)$ such that
$$x(n)+\sum_{k=1}^p a_p(k)x(n-k)=0;\quad n>q$$
i.e., we would like $e(n)$ to be equal to zero for $n > q$. In matrix form, these equations may be expressed as
$$\left[\begin{array}{cccc} x(q) & x(q-1) & \cdots & x(q-p+1) \\ x(q+1) & x(q) & \cdots & x(q-p+2) \\ x(q+2) & x(q+1) & \cdots & x(q-p+3) \\ \vdots & \vdots & & \vdots \end{array}\right]\left[\begin{array}{c} a_{p}(1) \\ a_{p}(2) \\ \vdots \\ a_{p}(p) \end{array}\right]=-\left[\begin{array}{c} x(q+1) \\ x(q+2) \\ x(q+3) \\ \vdots \end{array}\right]$$
or equivalently, as
$$\mathbf X_q\,\overline{\mathbf a}_p=-\mathbf x_{q+1}$$
The least-squares solution is found by solving the normal equations
$$(\mathbf X_q^H \mathbf X_q)\,\overline{\mathbf a}_p=-\mathbf X_q^H\mathbf x_{q+1}$$
- The second is in terms of setting the partial derivatives equal to zero. Define
$$\mathcal{E}_{p, q}=\sum_{n=q+1}^{\infty}|e(n)|^{2}=\sum_{n=q+1}^{\infty}\left|x(n)+\sum_{l=1}^{p} a_{p}(l) x(n-l)\right|^{2}$$
By setting the partial derivatives of $\mathcal E_{p,q}$ with respect to $a_p^*(k)$ equal to zero, we obtain
$$\sum_{n=q+1}^{\infty} e(n)x^*(n-k)=0;\qquad k=1,2,\cdots,p$$
which is known as the orthogonality principle. Substituting $e(n)=x(n)+\sum_{k=1}^p a_p(k)x(n-k)$ and defining
$$r_x(k,l)=\sum_{n=q+1}^\infty x(n-l)x^*(n-k),$$
which is a conjugate symmetric function, $r_x(k,l)=r_x^*(l,k)$, we have
$$\sum_{l=1}^p a_p(l)r_x(k,l)=-r_x(k,0);\qquad k=1,2,\cdots,p.$$
These are referred to as the Prony normal equations. In matrix form,
$$\left[\begin{array}{ccccc} r_{x}(1,1) & r_{x}(1,2) & r_{x}(1,3) & \cdots & r_{x}(1, p) \\ r_{x}(2,1) & r_{x}(2,2) & r_{x}(2,3) & \cdots & r_{x}(2, p) \\ r_{x}(3,1) & r_{x}(3,2) & r_{x}(3,3) & \cdots & r_{x}(3, p) \\ \vdots & \vdots & \vdots & & \vdots \\ r_{x}(p, 1) & r_{x}(p, 2) & r_{x}(p, 3) & \cdots & r_{x}(p, p) \end{array}\right]\left[\begin{array}{c} a_{p}(1) \\ a_{p}(2) \\ a_{p}(3) \\ \vdots \\ a_{p}(p) \end{array}\right]=-\left[\begin{array}{c} r_{x}(1,0) \\ r_{x}(2,0) \\ r_{x}(3,0) \\ \vdots \\ r_{x}(p, 0) \end{array}\right]$$
which may be written concisely as
$$\mathbf R_x\,\overline{\mathbf a}_p=-\mathbf r_x$$
where $\mathbf R_x$ is a $p\times p$ Hermitian matrix.
- In fact, comparing the matrices $\mathbf X_q$ and $\mathbf R_x$, one finds that
$$\mathbf R_x=\mathbf X_q^H \mathbf X_q,$$
and similarly
$$\mathbf r_x=\mathbf X_q^H\mathbf x_{q+1}.$$
Thus, $\mathbf R_x\,\overline{\mathbf a}_p=-\mathbf r_x$ can also be written as $(\mathbf X_q^H \mathbf X_q)\,\overline{\mathbf a}_p=-\mathbf X_q^H\mathbf x_{q+1}$, which is the normal equation of the first approach.
Once the coefficients $a_p(k)$ have been found, the coefficients $b_q(n)$ can be obtained by
$$b_q(n)=x(n)+\sum_{k=1}^p a_p(k)x(n-k),\qquad n=0,\cdots,q.$$
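On a finite data record the infinite sums must be truncated; the sketch below (illustrative, the function name is mine) solves the overdetermined system in the least-squares sense with `numpy.linalg.lstsq` and then reads off $b_q(n)$:

```python
import numpy as np

def prony(x, p, q):
    """Prony's method on a finite record x[0..N-1] (sums truncated at N-1).

    Solves X_q a = -x_{q+1} in the least-squares sense, then sets
    b_q(n) = x(n) + sum_k a_p(k) x(n-k) for n = 0..q.
    """
    x = np.asarray(x, dtype=float)
    N = len(x)

    def xs(n):                       # x(n), with x(n) = 0 for n < 0
        return x[n] if n >= 0 else 0.0

    # Rows n = q+1 .. N-1 of: x(n) + sum_k a(k) x(n-k) ~ 0
    Xq = np.array([[xs(n - k) for k in range(1, p + 1)]
                   for n in range(q + 1, N)])
    a_tail, *_ = np.linalg.lstsq(Xq, -x[q + 1:N], rcond=None)
    a = np.concatenate(([1.0], a_tail))
    b = np.array([sum(a[k] * xs(n - k) for k in range(p + 1))
                  for n in range(q + 1)])
    return a, b
```

For data that is exactly the impulse response of a $(p, q)$ model, the least-squares solution reproduces the Padé result with zero error.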
Alternatively, Shanks’ method switches back to $e'[n]$ and minimizes it over the entire data; see Hayes, §4.4.2.
The minimum error is (due to the orthogonality principle)
$$\epsilon_{p,q}=\|\mathbf e\|^2=\mathbf e^H\mathbf e=(\mathbf X_q\overline {\mathbf a}_p+\mathbf x_{q+1})^H\mathbf e=\mathbf x_{q+1}^H \mathbf e$$
This can further be written as
$$\epsilon_{p, q}=\mathbf{x}_{q+1}^{H}\left(\mathbf{X}_{q} \overline{\mathbf{a}}_p+\mathbf{x}_{q+1}\right)=\left(\mathbf{x}_{q+1}^{H} \mathbf{X}_{q}\right) \overline{\mathbf{a}}_p+\mathbf{x}_{q+1}^{H} \mathbf{x}_{q+1}$$
In terms of the autocorrelation sequence,
$$r_{x}(0,0)=\sum_{n=q+1}^{\infty} x^{*}[n] x[n]=\mathbf{x}_{q+1}^{H} \mathbf{x}_{q+1}, \quad r_{x}(0, k)=\sum_{n=q+1}^{\infty} x^{*}[n] x[n-k]=\left[\mathbf{x}_{q+1}^{H} \mathbf{X}_{q}\right]_{k}$$
this can be written as
$$\epsilon_{p, q}=r_{x}(0,0)+\sum_{k=1}^{p} a[k] r_{x}(0, k)$$
This can be combined with the equation $\mathbf R_x \overline {\mathbf a}_p=-\mathbf r_x$ as
$$\left[\begin{array}{c|cccc} r_{x}(0,0) & r_{x}(0,1) & r_{x}(0,2) & \cdots & r_{x}(0, p) \\ \hline r_{x}(1,0) & r_{x}(1,1) & r_{x}(1,2) & \cdots & r_{x}(1, p) \\ r_{x}(2,0) & r_{x}(2,1) & r_{x}(2,2) & \cdots & r_{x}(2, p) \\ \vdots & \vdots & \vdots & & \vdots \\ r_{x}(p, 0) & r_{x}(p, 1) & r_{x}(p, 2) & \cdots & r_{x}(p, p) \end{array}\right]\left[\begin{array}{c} 1 \\ \hline a[1] \\ a[2] \\ \vdots \\ a[p] \end{array}\right]=\left[\begin{array}{c} \epsilon_{p, q} \\ \hline 0 \\ 0 \\ \vdots \\ 0 \end{array}\right]$$
or, concisely, as
$$\mathbf R_x \mathbf a_p=\epsilon_{p,q}\mathbf u_1$$
where $\mathbf u_1=[1, 0, \cdots, 0]^T$.
These are the augmented normal equations.
Determine models: Special case of all-pole modeling
If $q=0$, we have an all-pole model:
$$H(z)=\frac{b(0)}{1+\sum_{k=1}^{p} a_{p}(k) z^{-k}}$$
Recall that the solution is given by solving the overdetermined system $(q=0)$
$$\left[\begin{array}{ccccc} x[0] & 0 & 0 & \cdots & 0 \\ x[1] & x[0] & 0 & \cdots & 0 \\ x[2] & x[1] & x[0] & \ddots & 0 \\ x[3] & x[2] & x[1] & \ddots & x[0] \\ \vdots & \vdots & \vdots & & \vdots \end{array}\right]\left[\begin{array}{c} a[1] \\ a[2] \\ a[3] \\ \vdots \\ a[p] \end{array}\right]=-\left[\begin{array}{c} x[1] \\ x[2] \\ x[3] \\ \vdots \end{array}\right]$$
The normal equations are obtained by premultiplying with $\mathbf X_0^H$:
$$\left[\begin{array}{cccc} x^{*}[0] & x^{*}[1] & x^{*}[2] & \cdots \\ 0 & \ddots & \ddots & \\ 0 & 0 & x^{*}[0] & \cdots \end{array}\right]\left[\begin{array}{ccc} x[0] & 0 & 0 \\ x[1] & \ddots & 0 \\ x[2] & \ddots & x[0] \\ \vdots & \vdots & \vdots \end{array}\right]\left[\begin{array}{c} a[1] \\ \vdots \\ a[p] \end{array}\right]=-\left[\begin{array}{cccc} x^{*}[0] & x^{*}[1] & x^{*}[2] & \cdots \\ 0 & \ddots & \ddots & \\ 0 & 0 & x^{*}[0] & \cdots \end{array}\right]\left[\begin{array}{c} x[1] \\ x[2] \\ x[3] \\ \vdots \end{array}\right]$$
From this structure it is seen that the normal equations become
$$\left[\begin{array}{cccc} r_{x}(0) & r_{x}^{*}(1) & \cdots & r_{x}^{*}(p-1) \\ r_{x}(1) & r_{x}(0) & \cdots & r_{x}^{*}(p-2) \\ \vdots & \vdots & \ddots & \vdots \\ r_{x}(p-1) & r_{x}(p-2) & \cdots & r_{x}(0) \end{array}\right]\left[\begin{array}{c} a[1] \\ a[2] \\ \vdots \\ a[p] \end{array}\right]=-\left[\begin{array}{c} r_{x}(1) \\ r_{x}(2) \\ \vdots \\ r_{x}(p) \end{array}\right]$$
where the autocorrelation sequence is in fact “stationary”:
$$\begin{aligned} r_x(k,l)&=\sum_{n=0}^\infty x[n-k]x[n-l]^*=x[k-l]x[0]^*+x[k-l+1]x[1]^*+\cdots\\ &=\sum_{n=0}^\infty x[n+k-l]x[n]^*=r_x(k-l) \end{aligned}$$
For an all-pole model, the matrix $\mathbf R_x$ has a Toeplitz structure; it can be efficiently inverted using the Levinson recursion (later, in Ch. 5).
Finally, the minimum modeling error for the all-pole model is given by
$$\{\mathcal E_p\}_{\min}=\epsilon_p=r_x(0)+\sum_{k=1}^p a_p(k)r^*_x(k)$$
Incorporating it into the normal equations,
$$\left[\begin{array}{ccccc} r_{x}(0) & r_{x}^{*}(1) & r_{x}^{*}(2) & \cdots & r_{x}^{*}(p) \\ r_{x}(1) & r_{x}(0) & r_{x}^{*}(1) & \cdots & r_{x}^{*}(p-1) \\ r_{x}(2) & r_{x}(1) & r_{x}(0) & \cdots & r_{x}^{*}(p-2) \\ \vdots & \vdots & \vdots & & \vdots \\ r_{x}(p) & r_{x}(p-1) & r_{x}(p-2) & \cdots & r_{x}(0) \end{array}\right]\left[\begin{array}{c} 1 \\ a_{p}(1) \\ a_{p}(2) \\ \vdots \\ a_{p}(p) \end{array}\right]=\left[\begin{array}{c} \epsilon_{p} \\ 0 \\ 0 \\ \vdots \\ 0 \end{array}\right]$$
or
$$\mathbf R_x \mathbf a_p=\epsilon_p \mathbf u_1$$
The numerator is chosen as $b[0]=\sqrt{\epsilon_p}$. It can then be shown that
$$r_x(k)=r_h(k),\qquad |k|\le p.$$
Meaning: the autocorrelation sequence of the filter matches the first $p$ lags of the autocorrelation sequence of the given data (“moment matching”).
All-pole modeling with finite data
Since Prony’s method assumes that $x(n)$ is known for all $n$, the coefficients $a_p(k)$ are found by minimizing the sum of the squares of $e(n)$ for all $n > q$. What happens, however, when $x(n)$ is only known or measured for values of $n$ over a finite interval such as $[0, N]$?
Autocorrelation method
Assume that $x(n) = 0$ for $n < 0$ and for $n > N$, and then use Prony’s method to find the coefficients $a_p(k)$. In other words, we form a new signal $\tilde x(n)$ by applying a rectangular window to $x(n)$, i.e.,
$$\tilde x(n)=x(n)w_R(n)$$
where
$$w_R(n)=\left\{\begin{array}{ll}1; & n=0,1,\cdots,N\\ 0; & \text{otherwise} \end{array} \right.$$
and then use Prony’s method to find an all-pole model for $\tilde x(n)$:
$$r_x(k)=\sum_{n=0}^\infty \tilde x(n)\tilde x^*(n-k)=\sum_{n=k}^{N}x(n)x^*(n-k);\quad k=0,1,\cdots,p$$
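A minimal numerical sketch of the autocorrelation method (assuming real data; a generic solver stands in for the Levinson recursion, and the function name is mine):

```python
import numpy as np

def acm(x, p):
    """All-pole model by the autocorrelation method (real data).

    Forms r_x(k) = sum_n x(n) x(n-k) from the windowed data and solves
    the Toeplitz normal equations with a generic solver.
    """
    x = np.asarray(x, dtype=float)
    N = len(x)
    # r_x(k) = sum_{n=k}^{N-1} x(n) x(n-k), k = 0..p
    r = np.array([np.dot(x[k:], x[:N - k]) for k in range(p + 1)])
    # Toeplitz normal-equation matrix: R[i, j] = r(|i - j|)
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    a = np.concatenate(([1.0], np.linalg.solve(R, -r[1:p + 1])))
    eps = r[0] + np.dot(a[1:], r[1:p + 1])   # minimum modeling error
    return a, eps
```

The returned `eps` is the minimum modeling error $\epsilon_p=r_x(0)+\sum_k a_p(k)r_x(k)$ for real data.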
It can also be interpreted as solving the overdetermined system
$$\left[\begin{array}{ccccc} x(0) & 0 & 0 & \cdots & 0 \\ x(1) & x(0) & 0 & \cdots & 0 \\ x(2) & x(1) & x(0) & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ \hline x(p-1) & x(p-2) & x(p-3) & \cdots & x(0) \\ x(p) & x(p-1) & x(p-2) & \cdots & x(1) \\ \vdots & \vdots & \vdots & & \vdots \\ x(N-2) & x(N-3) & x(N-4) & \cdots & x(N-p-1) \\ x(N-1) & x(N-2) & x(N-3) & \cdots & x(N-p) \\ \hline x(N) & x(N-1) & x(N-2) & \cdots & x(N-p+1) \\ 0 & x(N) & x(N-1) & \cdots & x(N-p+2) \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & x(N) \end{array}\right]\left[\begin{array}{c} a[1] \\ a[2] \\ a[3] \\ \vdots \\ a[p] \end{array}\right]=-\left[\begin{array}{c} x[1] \\ x[2] \\ x[3] \\ \vdots \\ x[N] \\ 0 \\ \vdots \\ 0 \end{array}\right]$$
Covariance method
If we know that $x[n]$ is not zero for $n < 0$ and $n > N$, it is more accurate to omit those equations that involve an extension with zeros:
$$\left[\begin{array}{ccc} x[p-1] & \cdots & x[0] \\ x[p] & \cdots & x[1] \\ \vdots & & \vdots \\ x[N-1] & \cdots & x[N-p] \end{array}\right]\left[\begin{array}{c} a[1] \\ \vdots \\ a[p] \end{array}\right]=-\left[\begin{array}{c} x[p] \\ x[p+1] \\ \vdots \\ x[N] \end{array}\right]$$
We obtain slightly different normal equations than before, with $r_x(k, l)$ now defined as
$$r_x(k,l)=\sum_{n=p}^N x(n-l)x^*(n-k)$$
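A corresponding sketch for the covariance method (illustrative; the normal equations are formed implicitly via a least-squares solve rather than explicitly):

```python
import numpy as np

def covm(x, p):
    """All-pole model by the covariance method (no zero-padding).

    Uses only the equations for n = p..N, so the data matrix contains
    no implicit zeros; the normal equations are not Toeplitz.
    """
    x = np.asarray(x, dtype=float)
    N = len(x) - 1                   # data known on [0, N]
    X = np.array([[x[n - k] for k in range(1, p + 1)]
                  for n in range(p, N + 1)])
    a_tail, *_ = np.linalg.lstsq(X, -x[p:N + 1], rcond=None)
    return np.concatenate(([1.0], a_tail))
```

For data that is exactly autoregressive, such as $x(n)=\beta^n$, this recovers the pole exactly, as the worked example below confirms.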
Unlike the autocorrelation normal equations, the covariance normal equations are not Toeplitz.
Examples
Pole estimation
Consider the finite data sequence
$$x=\left[1, \beta, \beta^{2}, \cdots, \beta^{N}\right]^{T}.$$
Estimate an $\mathrm{AR}(1)$ model for this signal $(p=1)$:
$$H(z)=\frac{b(0)}{1+a[1] z^{-1}}$$
Autocorrelation method
The normal equations collapse to $r_{x}(0)\, a[1]=-r_{x}(1)$, where
$$r_{x}(0)=\sum_{n=0}^{N}|\beta|^{2 n}=\frac{1-|\beta|^{2 N+2}}{1-|\beta|^{2}}, \qquad r_{x}(1)=\sum_{n=1}^{N} x(n) x^{*}(n-1)=\sum_{n=0}^{N-1} \beta|\beta|^{2 n}=\beta \frac{1-|\beta|^{2 N}}{1-|\beta|^{2}}$$
so that
$$a[1]=-\beta \frac{1-|\beta|^{2 N}}{1-|\beta|^{2 N+2}}$$
Note that $|a[1]|<|\beta|$ for finite $N$ (the estimated pole is biased toward the origin); as $N \rightarrow \infty$, $a[1] \rightarrow -\beta$.
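As a quick numerical check of the closed-form result (the values of $\beta$ and $N$ below are arbitrary example choices):

```python
# Verify a[1] = -beta (1 - beta^{2N}) / (1 - beta^{2N+2}) on a sample case
beta, N = 0.9, 6
x = [beta**n for n in range(N + 1)]
r0 = sum(xn * xn for xn in x)                        # r_x(0)
r1 = sum(x[n] * x[n - 1] for n in range(1, N + 1))   # r_x(1)
a1 = -r1 / r0
closed_form = -beta * (1 - beta**(2 * N)) / (1 - beta**(2 * N + 2))
assert abs(a1 - closed_form) < 1e-12
assert abs(a1) < beta            # pole biased toward the origin
```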
Covariance method
The normal equations collapse to $r_{x}(1,1)\, a[1]=-r_{x}(1,0)$, where
$$r_{x}(1,1)=\sum_{n=1}^{N}\left|x(n-1)\right|^{2}=\sum_{n=0}^{N-1}|\beta|^{2 n}=\frac{1-|\beta|^{2 N}}{1-|\beta|^{2}}, \qquad r_{x}(1,0)=\sum_{n=1}^{N} x(n) x^{*}(n-1)=\sum_{n=0}^{N-1} \beta|\beta|^{2 n}=\beta \frac{1-|\beta|^{2 N}}{1-|\beta|^{2}}$$
Thus, $a[1]=-\beta$. Set $b(0)=1$ so that $x(0)=\hat x(0)$; then
$$H(z)=\frac{1}{1-\beta z^{-1}}$$
The pole location is found exactly.
Channel inversion
Consider a communication channel with a known (estimated) transfer function $G(z)$.
At the receiver, we wish to equalize (invert) the channel using an equalizer $H(z)$:
$$G(z) H(z)=1 \quad \Leftrightarrow \quad g[n] * h[n]=\delta[n]$$
Furthermore, $H(z)$ must be causal and stable, typically FIR.
If $G(z)$ is not minimum-phase (it has zeros outside the unit circle), causal inversion is not possible, and we allow for a delay:
$$G(z) H(z)=z^{-M} \quad \Leftrightarrow \quad g[n] * h[n]=\delta[n-M]=: d[n]$$
Design of an FIR equalizer $H(z)$ of length $N$:
$$e[n]=d[n]-h[n] * g[n]=d[n]-\sum_{\ell=0}^{N-1} h[\ell] g[n-\ell]$$
Minimize
$$\mathcal{E}=\sum_{n=0}^{\infty}|e[n]|^{2}=\sum_{n=0}^{\infty}\left|d[n]-\sum_{\ell=0}^{N-1} h[\ell] g[n-\ell]\right|^{2}:$$
$$\min _{\{h[\ell]\}}\left\|\left[\begin{array}{ccc} g[0] & 0 & 0 \\ g[1] & \ddots & 0 \\ \vdots & \ddots & g[0] \\ g[N] & \ddots & g[1] \\ \vdots & \ddots & \vdots \end{array}\right]\left[\begin{array}{c} h[0] \\ \vdots \\ h[N-1] \end{array}\right]-\left[\begin{array}{c} d[0] \\ \vdots \\ d[N-1] \\ d[N] \\ \vdots \end{array}\right]\right\|^{2}$$
The LS solution of this overdetermined system $\mathbf{G}\mathbf{h}=\mathbf{d}$ is
$$\mathbf{h}=\mathbf{G}^{\dagger} \mathbf{d}=\left(\mathbf{G}^{H} \mathbf{G}\right)^{-1} \mathbf{G}^{H} \mathbf{d}$$
Also, the corresponding normal equations are $\left(\mathbf{G}^{H} \mathbf{G}\right) \mathbf{h}=\mathbf{G}^{H} \mathbf{d}$, or
$$\left[\begin{array}{ccc} r_{g}(0) & \cdots & r_{g}^{*}(N-1) \\ \vdots & & \vdots \\ r_{g}(N-1) & \cdots & r_{g}(0) \end{array}\right]\left[\begin{array}{c} h[0] \\ \vdots \\ h[N-1] \end{array}\right]=\left[\begin{array}{c} r_{d g}(0) \\ \vdots \\ r_{d g}(N-1) \end{array}\right]$$
where $r_{g}(k)=\sum_{n=0}^{\infty} g[n] g^{*}[n-k]$ and $r_{d g}(k)=\sum_{n=0}^{\infty} d[n] g^{*}[n-k]$. Because the matrix $\mathbf{R}_{g}=\mathbf{G}^{H} \mathbf{G}$ is Toeplitz, it can be inverted efficiently.
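The design can be sketched as follows (a hypothetical helper, not from the notes; it forms the convolution matrix $\mathbf G$ explicitly and lets `lstsq` handle the normal equations):

```python
import numpy as np

def fir_equalizer(g, N, M):
    """Length-N FIR least-squares equalizer for channel g, target delay M.

    Builds the convolution matrix G (columns are shifts of g) and solves
    G h ~ d = delta[n - M] in the least-squares sense.
    """
    g = np.asarray(g, dtype=float)
    L = len(g) + N - 1               # length of g * h
    G = np.zeros((L, N))
    for col in range(N):             # column `col` holds g delayed by `col`
        G[col:col + len(g), col] = g
    d = np.zeros(L)
    d[M] = 1.0
    h, *_ = np.linalg.lstsq(G, d, rcond=None)
    return h
```

For a minimum-phase channel such as $g=[1, 0.5]$ the delay $M=0$ already works well; for non-minimum-phase channels a larger $M$ typically reduces the residual substantially.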
Stochastic models
We now consider modeling a stochastic process $x[n]$ via a rational filter $H(z)$.
Instead of minimizing $\sum\left|e^{\prime}[n]\right|^{2}$, it makes more sense to minimize
$$E\left(\left|e^{\prime}[n]\right|^{2}\right)=E\left(|x[n]-\hat x[n]|^{2}\right)$$
As before, minimizing this error is not tractable.
Instead, we can try to match the correlation sequence $r_x(k)$ of $x[n]$ with the correlation sequence of $\hat{x}[n]$.
Autoregressive moving average processes
Suppose we filter white noise $v(n)$ of variance $\sigma_{v}^{2}$ with the filter
$$H(z)=\frac{B(z)}{A(z)}=\frac{\sum_{k=0}^{q} b(k) z^{-k}}{1+\sum_{k=1}^{p} a(k) z^{-k}}$$
Assuming that the filter is stable, the output process $x(n)$ will be wide-sense stationary, and the power spectrum of $x(n)$ can then be written as
P_{x}(z)=\sigma_{v}^{2} \frac{B(z) B^{*}\left(1 / z^{*}\right)}{A(z) A^{*}\left(1 / z^{*}\right)}, \quad P_{x}\left(e^{j \omega}\right)=\sigma_{v}^{2} \frac{\left|B\left(e^{j \omega}\right)\right|^{2}}{\left|A\left(e^{j \omega}\right)\right|^{2}}
- Such a process is known as an autoregressive moving average process of order $(p, q)$, or $\mathrm{ARMA}(p, q)$.
- The power spectrum of an $\mathrm{ARMA}(p, q)$ process has $2p$ poles and $2q$ zeros with conjugate reciprocal symmetry.
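The power spectrum formula above can be evaluated numerically straight from the coefficient vectors; a minimal sketch with hypothetical ARMA(1,1) coefficients:

```python
import numpy as np

def arma_power_spectrum(b, a, sigma_v2, omega):
    """Evaluate P_x(e^{jw}) = sigma_v^2 |B(e^{jw})|^2 / |A(e^{jw})|^2."""
    z = np.exp(1j * np.asarray(omega, dtype=float))
    # B(e^{jw}) = sum_k b(k) e^{-jwk}, and likewise for A with a(0) = 1
    B = sum(bk * z ** (-k) for k, bk in enumerate(b))
    A = sum(ak * z ** (-k) for k, ak in enumerate(a))
    return sigma_v2 * np.abs(B) ** 2 / np.abs(A) ** 2

# Hypothetical coefficients: B(z) = 1 + 0.5 z^-1, A(z) = 1 - 0.9 z^-1
Pw = arma_power_spectrum(b=[1.0, 0.5], a=[1.0, -0.9], sigma_v2=2.0,
                         omega=[0.0, np.pi])
```

At $\omega=0$ this evaluates to $\sigma_v^2 (\sum_k b(k))^2 / (\sum_k a(k))^2$, which is a quick sanity check on the implementation.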
Yule-Walker equations
From the LCCDE between $v(n)$ and $x(n)=h(n) * v(n)$:
x(n)+\sum_{\ell=1}^{p} a(\ell) x(n-\ell)=\sum_{\ell=0}^{q} b(\ell) v(n-\ell)
we can multiply both sides with $x^{*}(n-k)$ and take the expectation:
r_{x}(k)+\sum_{\ell=1}^{p} a(\ell) r_{x}(k-\ell)=\sum_{\ell=0}^{q} b(\ell) E\left\{v(n-\ell) x^{*}(n-k)\right\}=\sum_{\ell=0}^{q} b(\ell) r_{v x}(k-\ell)
The cross-correlation between $v(n)$ and $x(n)$ can further be expressed as
r_{v x}(k-\ell)=E\left\{v(k) x^{*}(\ell)\right\}=\sum_{m=-\infty}^{\infty} E\left\{v(k) v^{*}(\ell-m)\right\} h^{*}(m)=\sigma_{v}^{2} h^{*}(\ell-k)
For $k \geq 0$, this leads to the Yule-Walker equations
r_{x}(k)+\sum_{\ell=1}^{p} a(\ell) r_{x}(k-\ell)=\left\{\begin{array}{ll} \sigma_{v}^{2} \sum_{\ell=0}^{q} b(\ell) h^{*}(\ell-k)=\sigma_{v}^{2} c(k), & 0 \leq k \leq q \\ 0, & k>q \end{array}\right.
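The generating LCCDE translates directly into a simulation loop. Below is a minimal sketch (the coefficients are hypothetical); driving it with a unit impulse instead of white noise exposes the impulse response $h(n)$:

```python
import numpy as np

def arma_filter(a, b, v):
    """Run the LCCDE x(n) = -sum_{l=1}^{p} a(l) x(n-l) + sum_{l=0}^{q} b(l) v(n-l).

    a : [a(1), ..., a(p)]  (the leading coefficient 1 is implicit)
    b : [b(0), ..., b(q)]
    v : driving sequence; white noise of variance sigma_v^2 gives an ARMA process
    """
    x = np.zeros(len(v))
    for n in range(len(v)):
        ar = sum(a[l - 1] * x[n - l] for l in range(1, len(a) + 1) if n >= l)
        ma = sum(b[l] * v[n - l] for l in range(len(b)) if n >= l)
        x[n] = ma - ar
    return x

# Driving with a unit impulse exposes the impulse response h(n)
h = arma_filter(a=[-0.5], b=[1.0], v=[1.0, 0.0, 0.0, 0.0])
```

With $a(1)=-0.5$ the recursion is $x(n)=0.5\,x(n-1)+v(n)$, so the impulse response decays geometrically.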
Autoregressive processes
An $\mathrm{ARMA}(p, 0)$ process is an autoregressive process, or $\mathrm{AR}(p)$:
H(z)=\frac{b(0)}{1+\sum_{k=1}^{p} a(k) z^{-k}}, \quad P_{x}(z)=\sigma_{v}^{2} \frac{|b(0)|^{2}}{A(z) A^{*}\left(1 / z^{*}\right)}, \quad P_{x}\left(e^{j \omega}\right)=\sigma_{v}^{2} \frac{|b(0)|^{2}}{\left|A\left(e^{j \omega}\right)\right|^{2}}
The Yule-Walker equations are given by
r_{x}(k)+\sum_{\ell=1}^{p} a(\ell) r_{x}(k-\ell)=\sigma_{v}^{2}|b(0)|^{2} \delta(k), \quad k \geq 0
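As a quick sanity check (a hypothetical AR(1) toy case, not from the text), the equations for $k=1,\ldots,p$ determine the $a(\ell)$, and the $k=0$ equation then gives $\sigma_{v}^{2}|b(0)|^{2}$:

```python
import numpy as np

def yule_walker_ar(r, p):
    """Solve r_x(k) + sum_l a(l) r_x(k-l) = 0 for k = 1..p (real-valued case).

    r : [r_x(0), ..., r_x(p)]
    Returns (a, s2) with s2 = sigma_v^2 |b(0)|^2 from the k = 0 equation.
    """
    R = np.array([[r[abs(k - l)] for l in range(1, p + 1)]
                  for k in range(1, p + 1)])
    a = np.linalg.solve(R, -np.asarray(r[1:p + 1]))
    s2 = r[0] + np.dot(a, r[1:p + 1])
    return a, s2

# Hypothetical AR(1): a(1) = -0.8 implies r_x(k) proportional to 0.8^|k|
a, s2 = yule_walker_ar([1.0, 0.8, 0.64], p=1)
```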
Moving average processes
An $\mathrm{ARMA}(0, q)$ process is a moving average process, or $\mathrm{MA}(q)$:
H(z)=B(z)=\sum_{k=0}^{q} b(k) z^{-k}, \quad P_{x}(z)=\sigma_{v}^{2} B(z) B^{*}\left(1 / z^{*}\right), \quad P_{x}\left(e^{j \omega}\right)=\sigma_{v}^{2}\left|B\left(e^{j \omega}\right)\right|^{2}
The Yule-Walker equations are given by
r_{x}(k)=\sigma_{v}^{2} \sum_{\ell=0}^{q} b(\ell) b^{*}(k-\ell)=\sigma_{v}^{2}\, b(k) * b^{*}(-k)
The autocorrelation function is zero outside $[-q, q]$. Estimating $b(k)$ from the Yule-Walker equations is not easy, since they are nonlinear in the coefficients $b(k)$.
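In the forward direction, however, the MA autocorrelation is just the scaled deterministic autocorrelation of the coefficient sequence; a quick numeric illustration with hypothetical MA(2) coefficients:

```python
import numpy as np

# Hypothetical MA(2) coefficients and driving-noise variance
b = np.array([1.0, 0.5, -0.25])
sigma_v2 = 2.0
q = len(b) - 1

# r_x(k) = sigma_v^2 * (b(k) * b*(-k)); for real b this is an autocorrelation
r_full = sigma_v2 * np.correlate(b, b, mode="full")   # lags k = -q .. q
r_x = dict(zip(range(-q, q + 1), r_full))
```

All lags outside $[-q, q]$ vanish identically, as stated above, so only the $2q+1$ central values need to be stored.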
Stochastic model: Pade
In matrix form, the Yule-Walker equations become
\left[\begin{array}{cccc} r_{x}(0) & r_{x}(-1) & \cdots & r_{x}(-p) \\ r_{x}(1) & r_{x}(0) & \cdots & r_{x}(-p+1) \\ \vdots & \vdots & & \vdots \\ r_{x}(q) & r_{x}(q-1) & \cdots & r_{x}(q-p) \\ \hline r_{x}(q+1) & r_{x}(q) & \cdots & r_{x}(q-p+1) \\ \vdots & \vdots & & \vdots \\ r_{x}(L) & r_{x}(L-1) & \cdots & r_{x}(L-p)\\ \vdots & \vdots & & \vdots \\ \end{array}\right]\left[\begin{array}{c} 1 \\ a(1) \\ a(2)\\ \vdots \\ a(p) \end{array}\right]=\sigma_{v}^{2} \left[\begin{array}{c} c(0) \\ c(1) \\ \vdots \\ c(q) \\ \hline 0 \\ \vdots \\ 0\\ \vdots \end{array}\right]
Keep the first $p+q+1$ equations, i.e. those for $k=0,1,\cdots,p+q$, because we only have $p+q+1$ unknowns:
\left[\begin{array}{cccc} r_{x}(0) & r_{x}(-1) & \cdots & r_{x}(-p) \\ r_{x}(1) & r_{x}(0) & \cdots & r_{x}(-p+1) \\ \vdots & \vdots & & \vdots \\ r_{x}(q) & r_{x}(q-1) & \cdots & r_{x}(q-p) \\ \hline r_{x}(q+1) & r_{x}(q) & \cdots & r_{x}(q-p+1) \\ \vdots & \vdots & & \vdots \\ r_{x}(q+p) & r_{x}(q+p-1) & \cdots & r_{x}(q) \end{array}\right]\left[\begin{array}{c} 1 \\ a(1) \\ \vdots \\ a(p) \end{array}\right]=\sigma_{v}^{2} \left[\begin{array}{c} c(0) \\ c(1) \\ \vdots \\ c(q) \\ \hline 0 \\ \vdots \\ 0 \end{array}\right]
We can solve for $a(1),\cdots,a(p)$ using the bottom half of the equations,
\left[\begin{array}{cccc} r_{x}(q) & r_{x}(q-1) & \cdots & r_{x}(q-p+1) \\ r_{x}(q+1) & r_{x}(q) & \cdots & r_{x}(q-p+2) \\ \vdots & \vdots & & \vdots \\ r_{x}(q+p-1) & r_{x}(q+p-2) & \cdots & r_{x}(q) \end{array}\right]\left[\begin{array}{c} a(1) \\ a(2) \\ \vdots \\ a(p) \end{array}\right]=-\left[\begin{array}{c} r_{x}(q+1) \\ r_{x}(q+2) \\ \vdots \\ r_{x}(q+p) \end{array}\right]
and plug them back in to obtain $c(0),\cdots,c(q)$:
\left[\begin{array}{cccc} r_{x}(0) & r_{x}(-1) & \cdots & r_{x}(-p) \\ r_{x}(1) & r_{x}(0) & \cdots & r_{x}(-p+1) \\ \vdots & \vdots & & \vdots \\ r_{x}(q) & r_{x}(q-1) & \cdots & r_{x}(q-p) \end{array}\right]\left[\begin{array}{c} 1 \\ a(1) \\ a(2) \\ \vdots \\ a(p) \end{array}\right]=\sigma_{v}^{2}\left[\begin{array}{c} c(0) \\ c(1) \\ \vdots \\ c(q) \end{array}\right]
Since $c(k)=b(k) * h^{*}(-k)$, where $h^{*}(-k)$ depends on $b(k)$ as well, estimating $b(k)$ from the Yule-Walker equations is not easy.
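The two-step procedure for the denominator, solving the bottom $p$ equations for the $a(\ell)$ and substituting into the top $q+1$ rows, can be sketched as follows (the autocorrelation model below is hypothetical, not from the text):

```python
import numpy as np

def modified_yule_walker(r, p, q):
    """Two-step solve: a(1..p) from rows k = q+1..q+p, then sigma_v^2 c(0..q).

    r : callable returning r_x(k) for any integer k
    """
    # Bottom half: r(q+k) + sum_l a(l) r(q+k-l) = 0 for k = 1..p
    R = np.array([[r(q + k - l) for l in range(1, p + 1)]
                  for k in range(1, p + 1)])
    a = np.linalg.solve(R, -np.array([r(q + k) for k in range(1, p + 1)]))
    # Top rows: sigma_v^2 c(k) = r(k) + sum_l a(l) r(k-l) for k = 0..q
    c = np.array([r(k) + sum(a[l - 1] * r(k - l) for l in range(1, p + 1))
                  for k in range(q + 1)])
    return a, c

r = lambda k: 0.8 ** abs(k)     # hypothetical AR(1)-like autocorrelation
a, c = modified_yule_walker(r, p=1, q=0)
```

The returned `c` contains $\sigma_{v}^{2} c(k)$; recovering the $b(k)$ from it remains the hard, nonlinear part of the problem.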
Stochastic model: Prony
Prony follows by taking $L>p+q$; this results in an overdetermined set of equations for estimating $a(k)$:
\left[\begin{array}{cccc} r_{x}(0) & r_{x}(-1) & \cdots & r_{x}(-p) \\ r_{x}(1) & r_{x}(0) & \cdots & r_{x}(-p+1) \\ \vdots & \vdots & & \vdots \\ r_{x}(q) & r_{x}(q-1) & \cdots & r_{x}(q-p) \\ \hline r_{x}(q+1) & r_{x}(q) & \cdots & r_{x}(q-p+1) \\ \vdots & \vdots & & \vdots \\ r_{x}(L) & r_{x}(L-1) & \cdots & r_{x}(L-p) \end{array}\right]\left[\begin{array}{c} 1 \\ a(1) \\ a(2)\\ \vdots \\ a(p) \end{array}\right]=\sigma_{v}^{2} \left[\begin{array}{c} c(0) \\ c(1) \\ \vdots \\ c(q) \\ \hline 0 \\ \vdots \\ 0 \end{array}\right]
From the last $L-q$ equations we have
\left[\begin{array}{cccc} r_{x}(q) & r_{x}(q-1) & \cdots & r_{x}(q-p+1) \\ r_{x}(q+1) & r_{x}(q) & \cdots & r_{x}(q-p+2) \\ \vdots & \vdots & & \vdots \\ r_{x}(L-1) & r_{x}(L-2) & \cdots & r_{x}(L-p) \end{array}\right]\left[\begin{array}{c} a(1) \\ a(2) \\ \vdots \\ a(p) \end{array}\right]=-\left[\begin{array}{c} r_{x}(q+1) \\ r_{x}(q+2) \\ \vdots \\ r_{x}(L) \end{array}\right]
or $\mathbf{R}_{q} \overline{\mathbf{a}}=-\mathbf{r}_{q+1}$.
We may find the least squares solution by solving
\left(\mathbf{R}_{q}^{H} \mathbf{R}_{q}\right) \overline{\mathbf{a}}=-\mathbf{R}_{q}^{H} \mathbf{r}_{q+1}
where $\mathbf{R}_{q}^{H} \mathbf{R}_{q}$ is a $p \times p$ Hermitian matrix (in general no longer Toeplitz).
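In practice the overdetermined system can be handed to a least-squares solver directly; a compact sketch with a hypothetical autocorrelation model (not from the text):

```python
import numpy as np

def prony_stochastic(r, p, q, L):
    """LS estimate of a(1..p) from the last L - q Yule-Walker equations.

    r : callable returning r_x(k)
    """
    # Rows k = q+1..L:  sum_l a(l) r(k-l) = -r(k)
    R = np.array([[r(k - l) for l in range(1, p + 1)]
                  for k in range(q + 1, L + 1)])
    rhs = -np.array([r(k) for k in range(q + 1, L + 1)])
    a, *_ = np.linalg.lstsq(R, rhs, rcond=None)
    return a

r = lambda k: 0.8 ** abs(k)     # hypothetical AR(1)-like autocorrelation
a = prony_stochastic(r, p=1, q=0, L=5)
```

`np.linalg.lstsq` solves the normal equations internally, so the explicit $\mathbf{R}_{q}^{H} \mathbf{R}_{q}$ product never needs to be formed.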
SM-Special case: All-pole modeling
For autoregressive processes, stack the Yule-Walker equations for $k=0,1, \ldots, p$:
\left[\begin{array}{cccc} r_{x}(0) & r_{x}(-1) & \cdots & r_{x}(-p) \\ r_{x}(1) & r_{x}(0) & \cdots & r_{x}(-p+1) \\ \vdots & \vdots & & \vdots \\ r_{x}(p) & r_{x}(p-1) & \cdots & r_{x}(0) \end{array}\right]\left[\begin{array}{c} 1 \\ a(1) \\ \vdots \\ a(p) \end{array}\right]=\sigma_{v}^{2}|b(0)|^{2}\left[\begin{array}{c} 1 \\ 0 \\ \vdots \\ 0 \end{array}\right]
Using the equations for $k=1, \ldots, p$, it can also be written as
\left[\begin{array}{cccc} r_{x}(0) & r_{x}^{*}(1) & \cdots & r_{x}^{*}(p-1) \\ r_{x}(1) & r_{x}(0) & \cdots & r_{x}^{*}(p-2) \\ \vdots & \vdots & & \vdots \\ r_{x}(p-1) & r_{x}(p-2) & \cdots & r_{x}(0) \end{array}\right]\left[\begin{array}{c} a(1) \\ a(2) \\ \vdots \\ a(p) \end{array}\right]=-\left[\begin{array}{c} r_{x}(1) \\ r_{x}(2) \\ \vdots \\ r_{x}(p) \end{array}\right]
This is exactly the same as the normal equations for [all-pole modeling with Prony’s method](# DM-Special case: All-pole modeling) in the deterministic case, except that here $r_{x}(k)$ is a statistical autocorrelation.
If it is estimated from the data as $\hat{r}_{x}(k)=\frac{1}{N} \sum_{n=0}^{N-1} x[n] x^{*}[n-k]$, the methods become identical. Also:
\sigma_{v}^{2}|b(0)|^{2}=r_{x}(0)+\sum_{k=1}^{p} a(k) r_{x}(-k)
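A sketch of the complete pipeline on synthetic data (the AR(1) parameters below are hypothetical): estimate the autocorrelation from samples, then solve the normal equations. With enough samples the estimates approach the true coefficients:

```python
import numpy as np

def estimate_ar(x, p):
    """All-pole model from data: estimate r_x, then solve the normal equations."""
    N = len(x)
    # Biased autocorrelation estimate r_hat(k) = (1/N) sum_n x[n] x*[n-k]
    r = np.array([np.dot(x[k:], np.conj(x[:N - k])).real / N
                  for k in range(p + 1)])
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    a = np.linalg.solve(R, -r[1:])
    b0_sq = r[0] + np.dot(a, r[1:])      # estimates sigma_v^2 |b(0)|^2
    return a, b0_sq

# Hypothetical AR(1) process x(n) = 0.8 x(n-1) + v(n)
rng = np.random.default_rng(0)
v = rng.standard_normal(20000)
x = np.zeros_like(v)
for n in range(1, len(v)):
    x[n] = 0.8 * x[n - 1] + v[n]
a, b0_sq = estimate_ar(x, p=1)           # a[0] should be close to -0.8
```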
Summary
- Deterministic model: Pade
- Deterministic model: Prony
- Deterministic model: Shank
- DM-Special case: All-pole modeling
- DM-Special case: All-pole modeling with finite data