We consider gradient descent on functions of the form $L_1 = |f|$ and $L_2 = f^2$, where $f: \mathbb{R}^n \rightarrow \mathbb{R}$ is any smooth function with 0 as a regular value. We show that gradient descent implemented with a discrete step size $\tau$ behaves qualitatively differently from continuous gradient descent. Over long time scales, continuous and discrete gradient descent on $L_1$ find different minima of $L_1$, and we can characterize the difference: the minima that tend to be found by discrete gradient descent lie in a secondary critical submanifold $M' \subset M$, where $M = f^{-1}(0)$ is the set of minima of $L_1$ and $M'$ is the locus within $M$ where the function $K=|\nabla f|^2 \big|_M$ is minimized. In this paper, we explain this behavior. We also study the more subtle behavior of discrete gradient descent on $L_2$.
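The drift toward the minimum of $K$ can be seen in a small numerical sketch. The example below is an illustration only and is not taken from the paper: it assumes the hypothetical test function $f(x, y) = y(1 + x^2)$, whose zero set is the line $y = 0$ and for which $K = (1 + x^2)^2$ is minimized at $x = 0$. The helper names (`discrete_gd_on_abs_f`, `x_hit`) and the choices of $\tau$, starting point, and step count are likewise assumptions. The value `x_hit`, the position at the first sign change of $f$, is used as a rough stand-in for where continuous gradient descent would come to rest on the zero set.

```python
# Illustrative sketch: discrete gradient descent on L_1 = |f|
# for the hypothetical choice f(x, y) = y * (1 + x^2).
# Zero set of f: the line y = 0.  On it, K = |grad f|^2 = (1 + x^2)^2.

def f(x, y):
    return y * (1.0 + x * x)

def grad_f(x, y):
    # Partial derivatives of f with respect to x and y.
    return 2.0 * x * y, 1.0 + x * x

def sign(v):
    return (v > 0) - (v < 0)

def K(x):
    # |grad f|^2 restricted to the zero set y = 0.
    return (1.0 + x * x) ** 2

def discrete_gd_on_abs_f(x, y, tau, steps):
    """Iterate p <- p - tau * sign(f(p)) * grad f(p).

    Returns the final point and the x-coordinate at the first sign
    change of f (a crude proxy for where gradient flow would stop).
    """
    first_crossing_x = None
    for _ in range(steps):
        s = sign(f(x, y))
        gx, gy = grad_f(x, y)
        x_new, y_new = x - tau * s * gx, y - tau * s * gy
        if first_crossing_x is None and sign(f(x_new, y_new)) != s:
            first_crossing_x = x_new
        x, y = x_new, y_new
    return x, y, first_crossing_x

tau = 0.1
x_end, y_end, x_hit = discrete_gd_on_abs_f(1.0, 0.5, tau, 5000)

print("x at first crossing of the zero set:", x_hit)
print("x after many discrete steps:        ", x_end)
print("K at first crossing:", K(x_hit), " K at the end:", K(x_end))
```

In this toy setting the iterate keeps overshooting the zero set by an amount of order $\tau$, and each overshoot nudges $x$ toward $0$, so the long-run position sits near the minimizer of $K$ rather than near the point where the zero set was first reached.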
from cs updates on arXiv.org https://ift.tt/2MPCDXe