A classification of critical points with the Hessian matrix

Cambridge Maths Academy 2022. 4. 11. 00:25

For a function of two variables, f = f(x, y), we find and classify the critical points by considering the two-dimensional gradient and the second-order derivatives. (In one dimension, critical points are usually called stationary points.)

 

(i) Critical points: \nabla \textrm f = ( \textrm f_x , \textrm f_y ) = \textbf 0
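As a quick sanity check (an illustration, not part of the original post), \nabla \textrm f = 0 can be solved symbolically; the function f(x, y) = x^3 - 3x + y^2 below is an arbitrary choice:

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
f = x**3 - 3*x + y**2   # sample function: an assumption, not from the post

# Critical points: solve grad f = (f_x, f_y) = 0
grad = [sp.diff(f, v) for v in (x, y)]
print(sp.solve(grad, (x, y), dict=True))   # [{x: -1, y: 0}, {x: 1, y: 0}]
```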

 

(ii) A classification: We consider the two-dimensional Taylor expansion \begin{align} \textrm f( \textbf x + \Delta \textbf x ) &= \textrm f( \textbf x ) + \left( \textrm f_x \Delta x + \textrm f_y \Delta y \right) + \frac12 \left[ \frac{ \partial^2 \textrm f }{ \partial x^2 } ( \Delta x )^2 + 2 \frac{ \partial^2 \textrm f }{ \partial x \partial y } \Delta x \Delta y + \frac{ \partial^2 \textrm f }{ \partial y^2 } ( \Delta y )^2 \right] + \cdots \\ &= \textrm f( \textbf x ) + \underbrace{ \begin{pmatrix} \textrm f_x & \textrm f_y \end{pmatrix} }_{ \nabla \textrm f } \begin{pmatrix} \Delta x \\ \Delta y \end{pmatrix} + \frac12 \begin{pmatrix} \Delta x & \Delta y \end{pmatrix} H \begin{pmatrix} \Delta x \\ \Delta y \end{pmatrix} + \cdots \end{align} where the Hessian matrix H is defined by \begin{align} H = \begin{pmatrix} \frac{ \partial^2 \textrm f }{ \partial x^2 } & \frac{ \partial^2 \textrm f }{ \partial x \partial y } \\ \frac{ \partial^2 \textrm f }{ \partial y \partial x } & \frac{ \partial^2 \textrm f }{ \partial y^2 } \end{pmatrix} \end{align}
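The Hessian is straightforward to compute symbolically; a minimal sketch with sympy, reusing the same illustrative function as above:

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
f = x**3 - 3*x + y**2          # same illustrative function as above (an assumption)

H = sp.hessian(f, (x, y))      # [[f_xx, f_xy], [f_yx, f_yy]]
print(H)                       # Matrix([[6*x, 0], [0, 2]])
print(H.subs({x: 1, y: 0}))    # Hessian at the critical point (1, 0)
```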

 

Aside. A multi-dimensional Taylor expansion reads \begin{align} \textrm f( \textbf x + \Delta \textbf x) &= \textrm f( \textbf x ) + \sum_{i=1}^n \frac{ \partial \textrm f }{ \partial x_i } \Delta x_i + \frac{1}{2!} \sum_{i,j} \frac{ \partial^2 \textrm f }{ \partial x_i \partial x_j } \Delta x_i \Delta x_j + \frac{1}{3!} \sum_{i,j,k} \frac{ \partial^3 \textrm f }{ \partial x_i \partial x_j \partial x_k } \Delta x_i \Delta x_j \Delta x_k + \cdots \end{align}
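As a numerical sanity check of the truncation (again illustrative, not from the post), the error of the second-order approximation should shrink like |\Delta \textbf x|^3:

```python
import numpy as np

def f(v):
    x, y = v
    return x**3 - 3*x + y**2                   # illustrative function, as above

x0 = np.array([0.5, -0.3])
grad = np.array([3*x0[0]**2 - 3, 2*x0[1]])     # (f_x, f_y) at x0
H = np.array([[6*x0[0], 0.0],                  # Hessian at x0
              [0.0,     2.0]])

for h in (1e-1, 1e-2, 1e-3):
    dx = np.array([h, -2*h])
    taylor = f(x0) + grad @ dx + 0.5 * dx @ H @ dx
    print(h, abs(f(x0 + dx) - taylor))         # error shrinks like h**3
```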

 

(iii) Diagonalisation: We diagonalise the Hessian matrix using the eigenvalue equation, \begin{align} && H \begin{pmatrix} \textbf e_1 & \textbf e_2 \end{pmatrix} = \underbrace{ \begin{pmatrix} \textbf e_1 & \textbf e_2 \end{pmatrix} }_{ P^{-1} } \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} \\ \\ &\Rightarrow& H = \begin{pmatrix} \textbf e_1 & \textbf e_2 \end{pmatrix} \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} \begin{pmatrix} \textbf e_1 & \textbf e_2 \end{pmatrix}^{-1} = P^{-1} \Lambda P \end{align} Since H is real and symmetric, the eigenvectors can be chosen to form an orthonormal basis, so P is an orthogonal matrix, i.e. P^\intercal P = \mathbb I \qquad \Leftrightarrow \qquad P^{-1} = P^\intercal which gives H = P^\intercal \Lambda P
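This decomposition is easy to verify numerically; a minimal sketch with numpy.linalg.eigh (the sample matrix is an arbitrary symmetric choice, not from the post):

```python
import numpy as np

H = np.array([[2.0, 1.0],
              [1.0, 3.0]])               # arbitrary real symmetric 'Hessian'

eigvals, eigvecs = np.linalg.eigh(H)     # columns of eigvecs are orthonormal e_1, e_2
P = eigvecs.T                            # (e_1 e_2) = P^{-1}, and P^{-1} = P^T
Lam = np.diag(eigvals)

print(np.allclose(P.T @ P, np.eye(2)))   # True: P is orthogonal
print(np.allclose(H, P.T @ Lam @ P))     # True: H = P^T Λ P
```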

 

The Taylor expansion may be re-written as \begin{align} \textrm f( \textbf x + \Delta \textbf x) &= \textrm f( \textbf x ) + \Delta \textbf x \cdot \nabla \textrm f + \frac12 ( P \Delta \textbf x)^\intercal \Lambda ( P \Delta \textbf x ) + \cdots \end{align} For critical points \textbf x_0, where \nabla \textrm f = 0, this gives \begin{align} \textrm f( \textbf x_0 + \Delta \textbf x) &= \textrm f( \textbf x_0 ) + \frac12 ( P \Delta \textbf x)^\intercal \Lambda ( P \Delta \textbf x ) + \cdots \\ &= \textrm f( \textbf x_0 ) + \frac12 ( P \Delta \textbf x)^\intercal \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} ( P \Delta \textbf x ) + \cdots \end{align} Writing P \Delta \textbf x = ( \Delta \tilde x , \Delta \tilde y )^\intercal , the quadratic term becomes \frac12 \left( \lambda_1 \Delta \tilde x^2 + \lambda_2 \Delta \tilde y^2 \right) . Thus, the eigenvalues of H tell us the nature of the critical point(s) (a code sketch follows the list below):

  • If \lambda_1 > 0 and \lambda_2 > 0 , \textrm f( \textbf x_0 + \Delta \textbf x) > \textrm f ( \textbf x_0 ) for all sufficiently small \Delta \textbf x \ne \textbf 0 , hence the critical point is a local minimum.
  • If \lambda_1 < 0 and \lambda_2 < 0 , \textrm f( \textbf x_0 + \Delta \textbf x) < \textrm f ( \textbf x_0 ) for all sufficiently small \Delta \textbf x \ne \textbf 0 , hence the critical point is a local maximum.
  • If \lambda_1 \lambda_2 < 0 , i.e. the two eigenvalues have opposite signs, \textrm f( \textbf x_0 + \Delta \textbf x) > \textrm f ( \textbf x_0 ) along one eigenvector direction while \textrm f( \textbf x_0 + \Delta \textbf x) < \textrm f ( \textbf x_0 ) along the orthogonal one, hence the critical point is a saddle point.
  • If \lambda_1 \lambda_2 = 0 , i.e. at least one of them is zero, then the critical point is degenerate: the surface is flat to second order in the relevant direction, namely the direction of the eigenvector whose eigenvalue is zero, and higher-order terms are needed to classify it.
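These rules translate directly into code. A minimal sketch (the function name and tolerance are my own, not from the post); note that the degenerate branch genuinely cannot be decided from the Hessian alone:

```python
import numpy as np

def classify_critical_point(H, tol=1e-12):
    """Classify a critical point from the eigenvalues of its 2x2 Hessian."""
    l1, l2 = np.linalg.eigvalsh(H)                # ascending eigenvalues
    if l1 > tol and l2 > tol:
        return "local minimum"
    if l1 < -tol and l2 < -tol:
        return "local maximum"
    if l1 < -tol and l2 > tol:
        return "saddle point"
    return "degenerate: higher-order terms needed"

print(classify_critical_point(np.array([[6.0, 0.0], [0.0, 2.0]])))    # local minimum
print(classify_critical_point(np.array([[-6.0, 0.0], [0.0, 2.0]])))   # saddle point
print(classify_critical_point(np.array([[0.0, 0.0], [0.0, 2.0]])))    # degenerate
```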

 

(iv) Eigenvalues: The eigenvalues of the Hessian matrix are given by \begin{align} && \begin{vmatrix} \textrm f_{xx} - \lambda & \textrm f_{xy} \\ \textrm f_{xy} & \textrm f_{yy} - \lambda \end{vmatrix} = 0 \\ \\ & \Rightarrow & ( \textrm f_{xx} - \lambda )( \textrm f_{yy} - \lambda ) - \textrm f_{xy}^2 = 0 \\ & \Rightarrow & \lambda^2 - ( \textrm f_{xx} + \textrm f_{yy} ) \lambda + \left( \textrm f_{xx} \textrm f_{yy} - \textrm f_{xy}^2 \right) = 0 \\ & \Rightarrow & \lambda^2 - ( \textrm{tr} \, H ) \lambda + \det H = 0 \\ \\ & \Rightarrow & \lambda = \frac{ \textrm{tr} \, H \pm \sqrt{ ( \textrm{tr} \, H )^2 - 4 \det H } }{ 2 } \end{align} which gives \begin{align} \lambda &= \frac{ ( \textrm f_{xx} + \textrm f_{yy} ) \pm \sqrt{ ( \textrm f_{xx} + \textrm f_{yy} )^2 - 4 \left( \textrm f_{xx} \textrm f_{yy} - \textrm f_{xy}^2 \right) } }{ 2 } \\ &= \frac{ ( \textrm f_{xx} + \textrm f_{yy} ) \pm \sqrt{ \left( \textrm f_{xx}^2 + 2 \textrm f_{xx} \textrm f_{yy} + \textrm f_{yy}^2 \right) - 4 \left( \textrm f_{xx} \textrm f_{yy} - \textrm f_{xy}^2 \right) } }{ 2 } \\ &= \frac{ ( \textrm f_{xx} + \textrm f_{yy} ) \pm \sqrt{ \left( \textrm f_{xx}^2 - 2 \textrm f_{xx} \textrm f_{yy} + \textrm f_{yy}^2 \right) + 4 \textrm f_{xy}^2 } }{ 2 } \\ &= \frac{ ( \textrm f_{xx} + \textrm f_{yy} ) \pm \sqrt{ ( \textrm f_{xx} - \textrm f_{yy} )^2 + 4 \textrm f_{xy}^2 } }{ 2 } \end{align} So we see that the discriminant is always non-negative, i.e. \Delta = ( \textrm{tr} \, H )^2 - 4 \det H = ( \textrm f_{xx} - \textrm f_{yy} )^2 + 4 \textrm f_{xy}^2 \ge 0
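A quick numerical cross-check of this closed form against a library eigensolver (the sample matrix is again an arbitrary assumption):

```python
import numpy as np

H = np.array([[1.0, 2.0],
              [2.0, -3.0]])                      # arbitrary symmetric sample

tr, det = np.trace(H), np.linalg.det(H)
disc = tr**2 - 4*det                             # = (f_xx - f_yy)^2 + 4 f_xy^2
roots = np.sort([(tr - np.sqrt(disc)) / 2, (tr + np.sqrt(disc)) / 2])

print(disc >= 0)                                 # True: discriminant non-negative
print(np.allclose(roots, np.linalg.eigvalsh(H))) # True: matches the eigensolver
```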

 

(1) For \Delta = 0, i.e. ( \textrm{tr} \, H )^2 = 4 \det H \qquad \Leftrightarrow \qquad \textrm f_{xx} = \textrm f_{yy} \quad \textrm{and} \quad \textrm f_{xy} = 0 , the eigenvalue is a double root, \lambda = \frac{ \textrm{tr} \, H }{ 2 } = \frac{ \textrm f_{xx} + \textrm f_{yy} }{ 2 } , and \det H = ( \textrm{tr} \, H )^2 / 4 \ge 0. (A concrete example follows the list below.)

  • For \textrm{tr} \, H > 0, it gives a local minimum.
  • For \textrm{tr} \, H < 0, it gives a local maximum.
  • For \textrm{tr} \, H = 0, we also find \det H = 0 and the critical point is degenerate.
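For instance (an illustration not in the original post), \textrm f( x, y ) = x^2 + y^2 has \textrm f_{xx} = \textrm f_{yy} = 2 and \textrm f_{xy} = 0 , so \Delta = 0 and \lambda = 2 is a double root; since \textrm{tr} \, H = 4 > 0 , the origin is a local minimum.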

 

(2) For \Delta > 0, there are two distinct eigenvalues. (A worked example follows the list.)

  • If \det H = \textrm f_{xx} \textrm f_{yy} - \textrm f_{xy}^2 < 0, the two eigenvalues have opposite signs. The critical point is a saddle point.
  • If \det H = \textrm f_{xx} \textrm f_{yy} - \textrm f_{xy}^2 = 0, one of the eigenvalues is zero: \lambda_1 = 0 \quad \textrm{and} \quad \lambda_2 = \textrm{tr} \, H . The critical point is degenerate as the surface is flat to second order in the relevant direction, namely the direction of the eigenvector whose eigenvalue is zero.
  • If \det H = \textrm f_{xx} \textrm f_{yy} - \textrm f_{xy}^2 >0,

    (i) if \textrm{tr} \, H = \textrm f_{xx} + \textrm f_{yy} > 0 , the critical point is a local minimum;

    (ii) if \textrm{tr} \, H = \textrm f_{xx} + \textrm f_{yy} < 0 , the critical point is a local maximum.

  • If \textrm{tr} \, H = \textrm f_{xx} + \textrm f_{yy} = 0 , then \lambda = \pm \sqrt{ - \det H } and, since \textrm f_{yy} = - \textrm f_{xx} , \begin{align} \det H &= \textrm f_{xx} \textrm f_{yy} - \textrm f_{xy}^2 \\ &= - \textrm f_{xx}^2 - \textrm f_{xy}^2 \le 0 \end{align} so \det H > 0 is not possible in this case.

    (i) if \det H = 0 , i.e. \textrm f_{xx} = \textrm f_{yy} = \textrm f_{xy} = 0 , the two eigenvalues are both zero so the critical point is degenerate;

    (ii) if \det H < 0 , the two eigenvalues have opposite signs so the critical point is a saddle point.
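For instance (again an illustration not in the original post), \textrm f( x, y ) = x y has \textrm f_{xx} = \textrm f_{yy} = 0 and \textrm f_{xy} = 1 , so \textrm{tr} \, H = 0 and \det H = -1 < 0 : the eigenvalues are \lambda = \pm 1 and the origin is a saddle point.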

The results can be summarised: \begin{align} \begin{array}{|c|c|c|c|} \hline & \det H = \textrm f_{xx} \textrm f_{yy} - \textrm f_{xy}^2 > 0 & \det H = \textrm f_{xx} \textrm f_{yy} - \textrm f_{xy}^2 = 0 & \det H = \textrm f_{xx} \textrm f_{yy} - \textrm f_{xy}^2 < 0 \\\hline \textrm{tr} \, H = \textrm f_{xx} + \textrm f_{yy} > 0 & \textrm{A local minimum} & \textrm{Degenerate} & \textrm{A saddle point} \\\hline \textrm{tr} \, H = \textrm f_{xx} + \textrm f_{yy} < 0 & \textrm{A local maximum} & \textrm{Degenerate} & \textrm{A saddle point} \\\hline \textrm{tr} \, H = \textrm f_{xx} + \textrm f_{yy} = 0 & \textrm{Not possible} & \textrm{Degenerate} & \textrm{A saddle point} \\\hline \end{array} \end{align}

 

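Putting the pieces together, a short end-to-end sketch of the table's trace/determinant test (the function is the same illustrative choice used throughout, not from the post):

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
f = x**3 - 3*x + y**2                  # illustrative function (an assumption)

H = sp.hessian(f, (x, y))
for pt in sp.solve([sp.diff(f, x), sp.diff(f, y)], (x, y), dict=True):
    Hp = H.subs(pt)
    tr, det = Hp.trace(), Hp.det()
    if det > 0:
        kind = "local minimum" if tr > 0 else "local maximum"
    elif det < 0:
        kind = "saddle point"
    else:
        kind = "degenerate"
    print(pt, kind)
# {x: -1, y: 0} saddle point    (det H = -12 < 0)
# {x: 1, y: 0} local minimum    (det H = 12 > 0, tr H = 8 > 0)
```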