A classification of critical points with the Hessian matrix

Cambridge Maths Academy · 2022. 4. 11.

For a function of two variables $(x,y)$, $$ \textrm f = \textrm f(x,y), $$ we find the critical points by considering the two-dimensional gradient and the second-order derivatives. (In 1D, critical points are usually called stationary points.)

 

(i) Critical points: $$ \begin{align} \nabla \textrm f = \left( \frac{ \partial \textrm f }{ \partial x }, \frac{ \partial \textrm f }{ \partial y } \right) = 0 \end{align} $$
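As a running illustration (an example of our own choosing, not from the original post), take $$ \textrm f(x,y) = x^3 - 3x + y^2 \qquad \Rightarrow \qquad \nabla \textrm f = \left( 3x^2 - 3, \; 2y \right) = 0 \qquad \Rightarrow \qquad (x,y) = (\pm 1, 0) $$ so there are two critical points, $(1,0)$ and $(-1,0)$, which we classify below.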

 

(ii) A classification: We consider a 2-dimensional Taylor expansion $$ \begin{align} \textrm f( \textbf x + \Delta \textbf x) &= \textrm f( \textbf x ) + \left( \frac{ \partial \textrm f }{ \partial x } \Delta x + \frac{ \partial \textrm f }{ \partial y } \Delta y \right) + \frac12 \left[ \frac{ \partial^2 \textrm f }{ \partial x^2 } (\Delta x)^2 + 2 \frac{ \partial^2 \textrm f }{ \partial x \partial y } \Delta x \Delta y + \frac{ \partial^2 \textrm f }{ \partial y^2 } (\Delta y)^2 \right] + \cdots \\ &= \textrm f( \textbf x ) + \underbrace{ \begin{pmatrix} \frac{ \partial \textrm f }{ \partial x } \\ \frac{ \partial \textrm f }{ \partial y } \end{pmatrix} }_{ \nabla \textrm f } \cdot \underbrace{ \begin{pmatrix} \Delta x \\ \Delta y \end{pmatrix} }_{ \Delta \textbf x } + \frac12 \underbrace{ \begin{pmatrix} \Delta x & \Delta y \end{pmatrix} }_{ \Delta \textbf x^\intercal } \underbrace{ \begin{pmatrix} \frac{ \partial^2 \textrm f }{ \partial x^2 } & \frac{ \partial^2 \textrm f }{ \partial x \partial y } \\ \frac{ \partial^2 \textrm f }{ \partial y \partial x } & \frac{ \partial^2 \textrm f }{ \partial y^2 } \end{pmatrix} }_{ H } \underbrace{ \begin{pmatrix} \Delta x \\ \Delta y \end{pmatrix} }_{ \Delta \textbf x } + \cdots \\ &= \textrm f( \textbf x ) + \nabla \textrm f \cdot \Delta \textbf x + \frac12 \Delta \textbf x^\intercal H \Delta \textbf x + \cdots \end{align}$$ where the Hessian matrix $H$ is defined by $$ \begin{align} H = \begin{pmatrix} \frac{ \partial^2 \textrm f }{ \partial x^2 } & \frac{ \partial^2 \textrm f }{ \partial x \partial y } \\ \frac{ \partial^2 \textrm f }{ \partial y \partial x } & \frac{ \partial^2 \textrm f }{ \partial y^2 } \end{pmatrix} \end{align} $$
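A quick numerical sanity check of this second-order expansion, using the example surface above (a sketch of our own; the expansion point and step are arbitrary choices):

```python
import numpy as np

# Example surface from above: f(x, y) = x^3 - 3x + y^2 (our own choice)
f    = lambda x, y: x**3 - 3*x + y**2
grad = lambda x, y: np.array([3*x**2 - 3, 2*y])     # (f_x, f_y)
hess = lambda x, y: np.array([[6*x, 0.0],           # Hessian matrix H
                              [0.0, 2.0]])

x0 = np.array([0.5, -0.2])      # expansion point
dx = np.array([1e-3, 2e-3])     # small displacement

exact  = f(*(x0 + dx))
taylor = f(*x0) + grad(*x0) @ dx + 0.5 * dx @ hess(*x0) @ dx
print(exact - taylor)           # O(|dx|^3): ~1e-9 here
```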

 

Aside. A multi-dimensional Taylor expansion reads $$ \begin{align} \textrm f( \textbf x + \Delta \textbf x) &= \textrm f( \textbf x ) + \sum_{i=1}^n \frac{ \partial \textrm f }{ \partial x_i } \Delta x_i + \frac{1}{2!} \sum_{i,j} \frac{ \partial^2 \textrm f }{ \partial x_i \partial x_j } \Delta x_i \Delta x_j + \frac{1}{3!} \sum_{i,j,k} \frac{ \partial^3 \textrm f }{ \partial x_i \partial x_j \partial x_k } \Delta x_i \Delta x_j \Delta x_k + \cdots \end{align} $$

 

(iii) Diagonalisation: We diagonalise the Hessian matrix using the eigenvalue equation, $$ \begin{align} && H \begin{pmatrix} \textbf e_1 & \textbf e_2 \end{pmatrix} = \underbrace{ \begin{pmatrix} \textbf e_1 & \textbf e_2 \end{pmatrix} }_{ P^{-1} } \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} \\ \\ &\Rightarrow& H = \begin{pmatrix} \textbf e_1 & \textbf e_2 \end{pmatrix} \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} \begin{pmatrix} \textbf e_1 & \textbf e_2 \end{pmatrix}^{-1} = P^{-1} \Lambda P \end{align} $$ Since $H$ is real symmetric, its eigenvectors can be chosen to form an orthonormal basis, so $P$ is an orthogonal matrix, i.e. $$ P^\intercal P = \mathbb I \qquad \Leftrightarrow \qquad P^{-1} = P^\intercal $$ which gives $$ H = P^\intercal \Lambda P $$
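In code, numpy's `eigh` routine (for symmetric matrices) produces exactly this factorisation; a minimal sketch with an arbitrary symmetric matrix of our own choosing:

```python
import numpy as np

H = np.array([[1.0, 2.0],          # any real symmetric matrix
              [2.0, 1.0]])

lam, E = np.linalg.eigh(H)         # columns of E are the eigenvectors e_1, e_2
P = E.T                            # the post's convention: P^{-1} = (e_1 e_2)
Lam = np.diag(lam)

print(np.allclose(P.T @ Lam @ P, H))      # H = P^T Λ P      -> True
print(np.allclose(P @ P.T, np.eye(2)))    # P is orthogonal  -> True
```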

 

The Taylor expansion may be re-written as $$ \begin{align} \textrm f( \textbf x + \Delta \textbf x) &= \textrm f( \textbf x ) + \Delta \textbf x \cdot \nabla \textrm f + \frac12 ( P \Delta \textbf x)^\intercal \Lambda ( P \Delta \textbf x ) + \cdots \end{align} $$ For critical points $\textbf x_0$, where $\nabla \textrm f = 0$, this gives $$ \begin{align} \textrm f( \textbf x_0 + \Delta \textbf x) &= \textrm f( \textbf x_0 ) + \frac12 ( P \Delta \textbf x)^\intercal \Lambda ( P \Delta \textbf x ) + \cdots \\ &= \textrm f( \textbf x_0 ) + \frac12 ( P \Delta \textbf x)^\intercal \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} ( P \Delta \textbf x ) + \cdots \end{align} $$ Writing $\textbf y = P \Delta \textbf x$, the quadratic term becomes $\frac12 \left( \lambda_1 y_1^2 + \lambda_2 y_2^2 \right)$. Thus the eigenvalues of $H$ tell us about the nature of the critical point(s), i.e.

  • If $ \lambda_1 > 0 $ and $ \lambda_2 > 0 $, then $ \textrm f( \textbf x_0 + \Delta \textbf x) > \textrm f ( \textbf x_0 ) $ for all sufficiently small $ \Delta \textbf x \ne 0 $, hence the critical point is a local minimum.
  • If $ \lambda_1 < 0 $ and $ \lambda_2 < 0 $, then $ \textrm f( \textbf x_0 + \Delta \textbf x) < \textrm f ( \textbf x_0 ) $ for all sufficiently small $ \Delta \textbf x \ne 0 $, hence the critical point is a local maximum.
  • If $ \lambda_1 \lambda_2 < 0 $, i.e. the two eigenvalues have opposite signs, $ \textrm f( \textbf x_0 + \Delta \textbf x) > \textrm f ( \textbf x_0 ) $ in one direction while $ \textrm f( \textbf x_0 + \Delta \textbf x) < \textrm f ( \textbf x_0 ) $ in the orthogonal direction, hence the critical point is a saddle point.
  • If $ \lambda_1 \lambda_2 = 0 $, i.e. at least one of them is zero, then the critical point is degenerate: the surface is flat (to second order) in the relevant direction, namely the direction of the eigenvector whose eigenvalue is zero. (A sketch automating this classification follows below.)
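A minimal sympy sketch of this classification, applied to the example surface introduced above (the function is our own choice; `sp.hessian` builds the matrix of second derivatives):

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
f = x**3 - 3*x + y**2                       # example surface from above

grad = [sp.diff(f, v) for v in (x, y)]      # (i)  critical points: grad f = 0
H = sp.hessian(f, (x, y))                   # (ii) the Hessian matrix

for point in sp.solve(grad, (x, y), dict=True):
    eigs = list(H.subs(point).eigenvals())  # eigenvalues of H at the point
    if all(ev > 0 for ev in eigs):
        kind = "local minimum"
    elif all(ev < 0 for ev in eigs):
        kind = "local maximum"
    elif any(ev > 0 for ev in eigs) and any(ev < 0 for ev in eigs):
        kind = "saddle point"
    else:
        kind = "degenerate"                 # some eigenvalue is zero
    print(point, eigs, kind)
# {x: -1, y: 0} [-6, 2] saddle point
# {x: 1, y: 0} [6, 2] local minimum
```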

 

(iv) Eigenvalues: The eigenvalues of the Hessian matrix are given by $$ \begin{align} && \begin{vmatrix} \textrm f_{xx} - \lambda & \textrm f_{xy} \\ \textrm f_{xy} & \textrm f_{yy} - \lambda \end{vmatrix} = 0 \\ \\ & \Rightarrow & ( \textrm f_{xx} - \lambda )( \textrm f_{yy} - \lambda ) - \textrm f_{xy}^2 = 0 \\ & \Rightarrow & \lambda^2 - ( \textrm f_{xx} + \textrm f_{yy} ) \lambda + \left( \textrm f_{xx} \textrm f_{yy} - \textrm f_{xy}^2 \right) = 0 \\ & \Rightarrow & \lambda^2 - ( \textrm{tr} \, H ) \lambda + \det H = 0 \\ \\ & \Rightarrow & \lambda = \frac{ \textrm{tr} \, H \pm \sqrt{ ( \textrm{tr} \, H )^2 - 4 \det H } }{ 2 } \end{align} $$ which gives $$ \begin{align} & \Rightarrow & \lambda &= \frac{ ( \textrm f_{xx} + \textrm f_{yy} ) \pm \sqrt{ ( \textrm f_{xx} + \textrm f_{yy} )^2 - 4 \left( \textrm f_{xx} \textrm f_{yy} - \textrm f_{xy}^2 \right) } }{ 2 } \\ &&&= \frac{ ( \textrm f_{xx} + \textrm f_{yy} ) \pm \sqrt{ \left( \textrm f_{xx}^2 + 2 \textrm f_{xx} \textrm f_{yy} + \textrm f_{yy}^2 \right) - 4 \left( \textrm f_{xx} \textrm f_{yy} - \textrm f_{xy}^2 \right) } }{ 2 } \\ &&&= \frac{ ( \textrm f_{xx} + \textrm f_{yy} ) \pm \sqrt{ \left( \textrm f_{xx}^2 - 2 \textrm f_{xx} \textrm f_{yy} + \textrm f_{yy}^2 \right) + 4 \textrm f_{xy}^2 } }{ 2 } \\ &&&= \frac{ ( \textrm f_{xx} + \textrm f_{yy} ) \pm \sqrt{ ( \textrm f_{xx} - \textrm f_{yy} )^2 + 4 \textrm f_{xy}^2 } }{ 2 } \end{align} $$ So we see that the discriminant is always non-negative, i.e. $$ \Delta = ( \textrm{tr} \, H )^2 - 4 \det H = ( \textrm f_{xx} - \textrm f_{yy} )^2 + 4 \textrm f_{xy}^2 \ge 0 $$
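As a spot check (our own numbers: the Hessian entries of the running example at $(-1,0)$), the quadratic formula above reproduces numpy's eigenvalues:

```python
import numpy as np

fxx, fyy, fxy = -6.0, 2.0, 0.0       # Hessian entries at (-1, 0) above
tr, det = fxx + fyy, fxx*fyy - fxy**2
disc = tr**2 - 4*det                 # = (fxx - fyy)**2 + 4*fxy**2 >= 0
roots = (tr + np.array([-1.0, 1.0]) * np.sqrt(disc)) / 2

H = np.array([[fxx, fxy], [fxy, fyy]])
print(roots, np.linalg.eigvalsh(H))  # both give [-6.  2.]
```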

 

(1) For $\Delta = 0$, i.e. $$ ( \textrm{tr} \, H )^2 = 4 \det H \qquad \Leftrightarrow \qquad \textrm f_{xx} = \textrm f_{yy} \quad \textrm{and} \quad \textrm f_{xy} = 0 $$ the characteristic equation has a repeated root, $$ \lambda = \frac{ \textrm{tr} \, H }{ 2 } = \frac{ \textrm f_{xx} + \textrm f_{yy} }{ 2 } = \textrm f_{xx} $$ and, since $ \textrm f_{xx} = \textrm f_{yy} $ and $ \textrm f_{xy} = 0 $, we have $ \det H = \textrm f_{xx}^2 \ge 0 $.

  • For $ \textrm{tr} \, H > 0$, it gives a local minimum.
  • For $ \textrm{tr} \, H < 0$, it gives a local maximum.
  • For $ \textrm{tr} \, H = 0$, we also find $ \det H = 0 $ and the critical point is degenerate.
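For instance (our own example), $ \textrm f(x,y) = x^2 + y^2 $ has a critical point at the origin with $ \textrm f_{xx} = \textrm f_{yy} = 2 $ and $ \textrm f_{xy} = 0 $, so $ \Delta = 0 $, $ \lambda = 2 $ (repeated) and $ \textrm{tr} \, H = 4 > 0 $: a local minimum, as expected.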

 

(2) For $\Delta > 0$, there are two distinct eigenvalues.

  • If $\det H = \textrm f_{xx} \textrm f_{yy} - \textrm f_{xy}^2 < 0$, the two eigenvalues have opposite signs and the critical point is a saddle point.
  • If $\det H = \textrm f_{xx} \textrm f_{yy} - \textrm f_{xy}^2 = 0$, one of the eigenvalues is zero: $$ \lambda_1 = 0 \quad \textrm{and} \quad \lambda_2 = \textrm{tr} \, H $$ The critical point is degenerate as the surface is flat (to second order) in the relevant direction, namely the direction of the eigenvector whose eigenvalue is zero.
  • If $\det H = \textrm f_{xx} \textrm f_{yy} - \textrm f_{xy}^2 > 0$, the two eigenvalues have the same sign, namely that of $ \textrm{tr} \, H $:

    (i) if $ \textrm{tr} \, H = \textrm f_{xx} + \textrm f_{yy} > 0 $, the critical point is a local minimum;

    (ii) if $ \textrm{tr} \, H = \textrm f_{xx} + \textrm f_{yy} < 0 $, the critical point is a local maximum.

  • If $ \textrm{tr} \, H = \textrm f_{xx} + \textrm f_{yy} = 0 $, then $$ \lambda = \pm \sqrt{ - \det H } $$ and, since $ \textrm f_{yy} = - \textrm f_{xx} $, $$ \begin{align} \det H &= \textrm f_{xx} \textrm f_{yy} - \textrm f_{xy}^2 \\ &= - \textrm f_{xx}^2 - \textrm f_{xy}^2 \le 0 \end{align} $$

    (i) if $ \det H = 0 $, i.e. $ \textrm f_{xx} = \textrm f_{yy} = \textrm f_{xy} = 0 $, the two eigenvalues are both zero, so the critical point is degenerate (in fact $ \Delta = 0 $ here, so this overlaps with case (1));

    (ii) if $ \det H < 0 $, the two eigenvalues have opposite signs, so the critical point is a saddle point (see the example below).
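For example (again our own choice), $ \textrm f(x,y) = x^2 - y^2 $ has $ \textrm f_{xx} = 2 $, $ \textrm f_{yy} = -2 $, $ \textrm f_{xy} = 0 $ at the origin, so $ \textrm{tr} \, H = 0 $, $ \det H = -4 < 0 $ and $ \lambda = \pm 2 $: the classic saddle.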

The results can be summarised: $$ \begin{align} \begin{array}{|c|c|c|c|} \hline & \det H = \textrm f_{xx} \textrm f_{yy} - \textrm f_{xy}^2 > 0 & \det H = \textrm f_{xx} \textrm f_{yy} - \textrm f_{xy}^2 = 0 & \det H = \textrm f_{xx} \textrm f_{yy} - \textrm f_{xy}^2 < 0 \\\hline \textrm{tr} \, H = \textrm f_{xx} + \textrm f_{yy} > 0 & \textrm{A local minimum} & \textrm{Degenerate} & \textrm{A saddle point} \\\hline \textrm{tr} \, H = \textrm f_{xx} + \textrm f_{yy} < 0 & \textrm{A local maximum} & \textrm{Degenerate} & \textrm{A saddle point} \\\hline \textrm{tr} \, H = \textrm f_{xx} + \textrm f_{yy} = 0 & \textrm{Not possible} & \textrm{Degenerate} & \textrm{A saddle point} \\\hline \end{array} \end{align} $$

 
