A classification of critical points with the Hessian matrix
Cambridge Maths Academy 2022. 4. 11. 00:25
For a function of two variables $(x,y)$, $$ \textrm f = \textrm f(x,y), $$ we find the critical points by considering the two-dimensional gradient and the second-order derivatives. (In 1D, critical points are usually called stationary points.)
(i) Critical points: $$ \begin{align} \nabla \textrm f = \left( \frac{ \partial \textrm f }{ \partial x }, \frac{ \partial \textrm f }{ \partial y } \right) = 0 \end{align} $$
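As a concrete illustration (my example, not part of the original argument), the critical points of a sample function such as $\textrm f(x,y) = x^3 - 3x + y^2$ can be found by solving $\nabla \textrm f = 0$ symbolically, e.g. with sympy:

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
f = x**3 - 3*x + y**2                    # sample function, chosen for illustration

# Critical points: solve df/dx = 0 and df/dy = 0 simultaneously
grad = [sp.diff(f, x), sp.diff(f, y)]    # [3*x**2 - 3, 2*y]
critical_points = sp.solve(grad, [x, y], dict=True)
print(critical_points)                   # [{x: -1, y: 0}, {x: 1, y: 0}]
```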
(ii) A classification: We consider the 2-dimensional Taylor expansion $$ \begin{align} \textrm f( \textbf x + \Delta \textbf x) &= \textrm f( \textbf x ) + \left( \frac{ \partial \textrm f }{ \partial x } \Delta x + \frac{ \partial \textrm f }{ \partial y } \Delta y \right) + \frac12 \left[ \frac{ \partial^2 \textrm f }{ \partial x^2 } (\Delta x)^2 + 2 \frac{ \partial^2 \textrm f }{ \partial x \partial y } \Delta x \Delta y + \frac{ \partial^2 \textrm f }{ \partial y^2 } (\Delta y)^2 \right] + \cdots \\ &= \textrm f( \textbf x ) + \underbrace{ \begin{pmatrix} \frac{ \partial \textrm f }{ \partial x } \\ \frac{ \partial \textrm f }{ \partial y } \end{pmatrix} }_{ \nabla \textrm f } \cdot \underbrace{ \begin{pmatrix} \Delta x \\ \Delta y \end{pmatrix} }_{ \Delta \textbf x } + \frac12 \underbrace{ \begin{pmatrix} \Delta x & \Delta y \end{pmatrix} }_{ \Delta \textbf x^\intercal } \underbrace{ \begin{pmatrix} \frac{ \partial^2 \textrm f }{ \partial x^2 } & \frac{ \partial^2 \textrm f }{ \partial x \partial y } \\ \frac{ \partial^2 \textrm f }{ \partial y \partial x } & \frac{ \partial^2 \textrm f }{ \partial y^2 } \end{pmatrix} }_{ H } \underbrace{ \begin{pmatrix} \Delta x \\ \Delta y \end{pmatrix} }_{ \Delta \textbf x } + \cdots \\ &= \textrm f( \textbf x ) + \nabla \textrm f \cdot \Delta \textbf x + \frac12 \Delta \textbf x^\intercal H \Delta \textbf x + \cdots \end{align}$$ where the Hessian matrix $H$ is defined by $$ \begin{align} H = \begin{pmatrix} \frac{ \partial^2 \textrm f }{ \partial x^2 } & \frac{ \partial^2 \textrm f }{ \partial x \partial y } \\ \frac{ \partial^2 \textrm f }{ \partial y \partial x } & \frac{ \partial^2 \textrm f }{ \partial y^2 } \end{pmatrix} \end{align} $$
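Continuing the sample function above, sympy can also generate the Hessian directly via its built-in hessian helper and evaluate it at a critical point; a minimal sketch:

```python
# Continuing the sympy example above
H = sp.hessian(f, (x, y))
print(H)                     # Matrix([[6*x, 0], [0, 2]])
print(H.subs({x: 1, y: 0}))  # Hessian at the critical point (1, 0): diag(6, 2)
```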
Aside. A multi-dimensional Taylor expansion reads $$ \begin{align} \textrm f( \textbf x + \Delta \textbf x) &= \textrm f( \textbf x ) + \sum_{i=1}^n \frac{ \partial \textrm f }{ \partial x_i } \Delta x_i + \frac{1}{2!} \sum_{i,j} \frac{ \partial^2 \textrm f }{ \partial x_i \partial x_j } \Delta x_i \Delta x_j + \frac{1}{3!} \sum_{i,j,k} \frac{ \partial^3 \textrm f }{ \partial x_i \partial x_j \partial x_k } \Delta x_i \Delta x_j \Delta x_k + \cdots \end{align} $$
(iii) Diagonalisation: We diagonalise the Hessian matrix using the eigenvalue equation, $$ \begin{align} && H \begin{pmatrix} \textbf e_1 & \textbf e_2 \end{pmatrix} = \underbrace{ \begin{pmatrix} \textbf e_1 & \textbf e_2 \end{pmatrix} }_{ P^{-1} } \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} \\ \\ &\Rightarrow& H = \begin{pmatrix} \textbf e_1 & \textbf e_2 \end{pmatrix} \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} \begin{pmatrix} \textbf e_1 & \textbf e_2 \end{pmatrix}^{-1} = P^{-1} \Lambda P \end{align} $$ Since $H$ is real and symmetric, its eigenvectors can be chosen to form an orthonormal basis, so $P$ is an orthogonal matrix, i.e. $$ P^\intercal P = \mathbb I \qquad \Leftrightarrow \qquad P^{-1} = P^\intercal $$ which gives $$ H = P^\intercal \Lambda P $$
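A quick numerical check of this factorisation, with an illustrative symmetric matrix (not one taken from the derivation): numpy's eigh routine returns orthonormal eigenvectors for symmetric input, so $P$ is orthogonal and $H = P^\intercal \Lambda P$ can be verified directly.

```python
import numpy as np

# An illustrative symmetric matrix standing in for a Hessian
H = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# eigh is the symmetric eigensolver: the columns of V are orthonormal eigenvectors
eigvals, V = np.linalg.eigh(H)
Lam = np.diag(eigvals)

# In the notation above, P^{-1} = (e_1 e_2) = V, hence P = V.T
print(np.allclose(V.T @ V, np.eye(2)))  # True: P is orthogonal
print(np.allclose(H, V @ Lam @ V.T))    # True: H = P^T Λ P
```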
The Taylor expansion may then be rewritten as $$ \begin{align} \textrm f( \textbf x + \Delta \textbf x) &= \textrm f( \textbf x ) + \Delta \textbf x \cdot \nabla \textrm f + \frac12 ( P \Delta \textbf x)^\intercal \Lambda ( P \Delta \textbf x ) + \cdots \end{align} $$ For critical points $\textbf x_0$, where $\nabla \textrm f = 0$, this gives $$ \begin{align} \textrm f( \textbf x_0 + \Delta \textbf x) &= \textrm f( \textbf x_0 ) + \frac12 ( P \Delta \textbf x)^\intercal \Lambda ( P \Delta \textbf x ) + \cdots \\ &= \textrm f( \textbf x_0 ) + \frac12 ( P \Delta \textbf x)^\intercal \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} ( P \Delta \textbf x ) + \cdots \end{align} $$ Thus, the eigenvalues of $H$ tell us about the nature of the critical point(s), i.e.
- If $ \lambda_1 > 0 $ and $ \lambda_2 > 0 $, then $ \textrm f( \textbf x_0 + \Delta \textbf x) > \textrm f ( \textbf x_0 ) $ for all sufficiently small $ \Delta \textbf x \ne 0 $, hence the critical point is a local minimum.
- If $ \lambda_1 < 0 $ and $ \lambda_2 < 0 $, then $ \textrm f( \textbf x_0 + \Delta \textbf x) < \textrm f ( \textbf x_0 ) $ for all sufficiently small $ \Delta \textbf x \ne 0 $, hence the critical point is a local maximum.
- If $ \lambda_1 \lambda_2 < 0 $, i.e. the two eigenvalues have opposite signs, then $ \textrm f( \textbf x_0 + \Delta \textbf x) > \textrm f ( \textbf x_0 ) $ in one direction while $ \textrm f( \textbf x_0 + \Delta \textbf x) < \textrm f ( \textbf x_0 ) $ in the orthogonal direction, hence the critical point is a saddle point.
- If $ \lambda_1 \lambda_2 = 0 $, i.e. at least one eigenvalue is zero, then the critical point is degenerate: the surface is flat to second order in the relevant direction, namely the direction of the eigenvector whose eigenvalue is zero. (A numerical version of these four cases is sketched below.)
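A minimal numerical version of these four cases (the tolerance is my assumption, added to absorb floating-point noise around zero):

```python
import numpy as np

def classify_by_eigenvalues(H, tol=1e-12):
    """Classify a critical point from the eigenvalues of its 2x2 Hessian."""
    l1, l2 = np.linalg.eigvalsh(H)       # eigenvalues in ascending order
    if l1 > tol and l2 > tol:
        return "local minimum"
    if l1 < -tol and l2 < -tol:
        return "local maximum"
    if l1 * l2 < -tol:
        return "saddle point"
    return "degenerate"                  # at least one eigenvalue is (numerically) zero

print(classify_by_eigenvalues(np.array([[2.0, 0.0], [0.0,  3.0]])))  # local minimum
print(classify_by_eigenvalues(np.array([[2.0, 0.0], [0.0, -3.0]])))  # saddle point
```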
(iv) Eigenvalues: The eigenvalues of the Hessian matrix are given by $$ \begin{align} && \begin{vmatrix} \textrm f_{xx} - \lambda & \textrm f_{xy} \\ \textrm f_{xy} & \textrm f_{yy} - \lambda \end{vmatrix} = 0 \\ \\ & \Rightarrow & ( \textrm f_{xx} - \lambda )( \textrm f_{yy} - \lambda ) - \textrm f_{xy}^2 = 0 \\ & \Rightarrow & \lambda^2 - ( \textrm f_{xx} + \textrm f_{yy} ) \lambda + \left( \textrm f_{xx} \textrm f_{yy} - \textrm f_{xy}^2 \right) = 0 \\ & \Rightarrow & \lambda^2 - ( \textrm{tr} \, H ) \lambda + \det H = 0 \\ \\ & \Rightarrow & \lambda = \frac{ \textrm{tr} \, H \pm \sqrt{ ( \textrm{tr} \, H )^2 - 4 \det H } }{ 2 } \end{align} $$ which gives $$ \begin{align} & \Rightarrow & \lambda &= \frac{ ( \textrm f_{xx} + \textrm f_{yy} ) \pm \sqrt{ ( \textrm f_{xx} + \textrm f_{yy} )^2 - 4 \left( \textrm f_{xx} \textrm f_{yy} - \textrm f_{xy}^2 \right) } }{ 2 } \\ &&&= \frac{ ( \textrm f_{xx} + \textrm f_{yy} ) \pm \sqrt{ \left( \textrm f_{xx}^2 + 2 \textrm f_{xx} \textrm f_{yy} + \textrm f_{yy}^2 \right) - 4 \left( \textrm f_{xx} \textrm f_{yy} - \textrm f_{xy}^2 \right) } }{ 2 } \\ &&&= \frac{ ( \textrm f_{xx} + \textrm f_{yy} ) \pm \sqrt{ \left( \textrm f_{xx}^2 - 2 \textrm f_{xx} \textrm f_{yy} + \textrm f_{yy}^2 \right) + 4 \textrm f_{xy}^2 } }{ 2 } \\ &&&= \frac{ ( \textrm f_{xx} + \textrm f_{yy} ) \pm \sqrt{ ( \textrm f_{xx} - \textrm f_{yy} )^2 + 4 \textrm f_{xy}^2 } }{ 2 } \end{align} $$ So we see that the discriminant is always non-negative, i.e. $$ \Delta = ( \textrm{tr} \, H )^2 - 4 \det H = ( \textrm f_{xx} - \textrm f_{yy} )^2 + 4 \textrm f_{xy}^2 \ge 0 $$
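The closed-form eigenvalues can be checked against a direct eigensolver; a short sketch with hypothetical second derivatives (values chosen only to exercise the formula):

```python
import numpy as np

fxx, fyy, fxy = 4.0, 1.0, 2.0       # hypothetical second derivatives
H = np.array([[fxx, fxy],
              [fxy, fyy]])

tr, det = np.trace(H), np.linalg.det(H)
disc = tr**2 - 4*det                # equals (fxx - fyy)**2 + 4*fxy**2 >= 0
lam = np.array([(tr - np.sqrt(disc)) / 2, (tr + np.sqrt(disc)) / 2])

print(lam)                          # approx. [0, 5]
print(np.linalg.eigvalsh(H))        # matches, up to floating-point error
```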
(1) For $\Delta = 0$, i.e. $$ ( \textrm{tr} \, H )^2 = 4 \det H \qquad \Leftrightarrow \qquad \textrm f_{xx} = \textrm f_{yy} \quad \textrm{and} \quad \textrm f_{xy} = 0, $$ the eigenvalue is a repeated root, $$ \lambda = \frac{ \textrm{tr} \, H }{ 2 } = \frac{ \textrm f_{xx} + \textrm f_{yy} }{ 2 }, $$ and $\det H = \textrm f_{xx}^2 \ge 0$. (A short numerical check of this case follows the list below.)
- For $ \textrm{tr} \, H > 0$, it gives a local minimum.
- For $ \textrm{tr} \, H < 0$, it gives a local maximum.
- For $ \textrm{tr} \, H = 0$, we also find $ \det H = 0 $ and the critical point is degenerate.
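A short numerical check of this repeated-root case (the matrix is illustrative): with $\textrm f_{xx} = \textrm f_{yy} = 2$ and $\textrm f_{xy} = 0$, the discriminant vanishes and the repeated eigenvalue equals $\textrm{tr}\,H/2$.

```python
import numpy as np

H = np.array([[2.0, 0.0],
              [0.0, 2.0]])          # fxx = fyy = 2, fxy = 0, so the discriminant is 0
tr = np.trace(H)
print(np.linalg.eigvalsh(H))        # [2. 2.]: repeated root tr/2
print("local minimum" if tr > 0 else "local maximum or degenerate")
```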
(2) For $\Delta > 0$, there are two distinct eigenvalues.
- If $\det H = \textrm f_{xx} \textrm f_{yy} - \textrm f_{xy}^2 < 0$, the two eigenvalues have opposite signs, so the critical point is a saddle point.
- If $\det H = \textrm f_{xx} \textrm f_{yy} - \textrm f_{xy}^2 = 0$, one of the eigenvalues is zero: $$ \lambda_1 = 0 \quad \textrm{and} \quad \lambda_2 = \textrm{tr} \, H $$ The critical point is degenerate as the surface is flat in the relevant direction, namely the direction of the eigenvector whose eigenvalue is zero.
- If $\det H = \textrm f_{xx} \textrm f_{yy} - \textrm f_{xy}^2 > 0$, the two eigenvalues have the same sign (since $\lambda_1 \lambda_2 = \det H$), which is the sign of $\textrm{tr} \, H = \lambda_1 + \lambda_2$:
(i) if $ \textrm{tr} \, H = \textrm f_{xx} + \textrm f_{yy} > 0 $, the critical point is a local minimum;
(ii) if $ \textrm{tr} \, H = \textrm f_{xx} + \textrm f_{yy} < 0 $, the critical point is a local maximum.
- If $ \textrm{tr} \, H = \textrm f_{xx} + \textrm f_{yy} = 0 $, the eigenvalues are $$ \lambda = \pm \sqrt{ - \det H } $$ and, since $ \textrm f_{yy} = - \textrm f_{xx} $, $$ \begin{align} \det H &= \textrm f_{xx} \textrm f_{yy} - \textrm f_{xy}^2 \\ &= - \textrm f_{xx}^2 - \textrm f_{xy}^2 \le 0 \end{align} $$ so the case $\det H > 0$ cannot arise here:
(i) if $ \det H = 0 $, i.e. $ \textrm f_{xx} = \textrm f_{yy} = \textrm f_{xy} = 0 $, the two eigenvalues are both zero so the critical point is degenerate;
(ii) if $ \det H < 0 $, the two eigenvalues have opposite signs, so the critical point is a saddle point.
The results can be summarised as follows: $$ \begin{align} \begin{array}{|c|c|c|c|} \hline & \det H = \textrm f_{xx} \textrm f_{yy} - \textrm f_{xy}^2 > 0 & \det H = \textrm f_{xx} \textrm f_{yy} - \textrm f_{xy}^2 = 0 & \det H = \textrm f_{xx} \textrm f_{yy} - \textrm f_{xy}^2 < 0 \\\hline \textrm{tr} \, H = \textrm f_{xx} + \textrm f_{yy} > 0 & \textrm{A local minimum} & \textrm{Degenerate} & \textrm{A saddle point} \\\hline \textrm{tr} \, H = \textrm f_{xx} + \textrm f_{yy} < 0 & \textrm{A local maximum} & \textrm{Degenerate} & \textrm{A saddle point} \\\hline \textrm{tr} \, H = \textrm f_{xx} + \textrm f_{yy} = 0 & \textrm{Not possible} & \textrm{Degenerate} & \textrm{A saddle point} \\\hline \end{array} \end{align} $$
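The summary table translates directly into a trace/determinant test; a minimal sketch (the tolerance is again my assumption), reusing the sample function's Hessians at its two critical points:

```python
import numpy as np

def classify_by_trace_det(H, tol=1e-12):
    """Classify a 2x2 symmetric Hessian by tr(H) and det(H),
    following the summary table above."""
    tr, det = np.trace(H), np.linalg.det(H)
    if det < -tol:
        return "saddle point"
    if det <= tol:
        return "degenerate"
    # det H > 0: tr H = 0 is impossible here, so the sign of tr H decides
    return "local minimum" if tr > 0 else "local maximum"

# Hessians of the sample f = x**3 - 3*x + y**2 at its critical points
print(classify_by_trace_det(np.array([[ 6.0, 0.0], [0.0, 2.0]])))  # (1, 0): local minimum
print(classify_by_trace_det(np.array([[-6.0, 0.0], [0.0, 2.0]])))  # (-1, 0): saddle point
```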