Understanding Neural Networks as Splines
How does a neural network approximate a given function? What kinds of functions can it approximate well or poorly? How does the optimization algorithm bias learning? What is the structure of the loss surface and Hessian, and how does that impact generalization? Deep learning has revolutionized many fields, and yet answers to fundamental questions like these remain elusive. Here we present an emerging viewpoint -- the function space or spline perspective -- that has the power to answer these questions. We find that understanding neural nets is most easily done in function space, via a novel spline parametrization. This change of coordinates sheds light on many perplexing phenomena, providing simple explanations for the necessity of overparameterization, the structure of the loss surface and Hessian, the consequent difficulty of training, and, perhaps most importantly, the phenomenon and mechanism underlying implicit regularization.
Understanding the representation, learning dynamics, and inductive bias of neural networks (NNs) is hindered by the opacity of the relationship between NN parameters and the function represented. As such, we propose reparametrizing ReLU NNs as continuous piecewise linear splines. Using this spline lens, we study learning dynamics in shallow univariate ReLU NNs, uncovering unexpected insights and explanations for several perplexing phenomena. We develop a surprisingly simple and transparent view of the structure of the loss surface, including its critical and fixed points, its Hessian, and the Hessian spectrum. We also show that standard weight initializations yield very flat functions at the start of training, and that this flatness, together with overparametrization and the initial weight scale, determines the strength and type of implicit regularization. Our spline-based approach thus reproduces key implicit regularization results from recent work, but in a far more intuitive and transparent manner. Beyond improving understanding, the spline lens suggests new kinds of data-dependent initializations and learning algorithms that combine gradient descent with other, more global optimization algorithms.
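The core reparametrization can be illustrated concretely. The sketch below (with hypothetical parameter names; the talk's own parametrization may differ) shows that a shallow univariate ReLU net f(x) = Σᵢ vᵢ · relu(wᵢx + bᵢ) + c is a continuous piecewise linear spline whose knots (breakpoints) sit at xᵢ = -bᵢ/wᵢ, one per hidden unit:

```python
import numpy as np

rng = np.random.default_rng(0)
n_hidden = 5
w = rng.normal(size=n_hidden)   # input weights (hypothetical example values)
b = rng.normal(size=n_hidden)   # biases
v = rng.normal(size=n_hidden)   # output weights
c = 0.0                         # output bias


def f(x):
    """Shallow univariate ReLU net, evaluated at an array of inputs x."""
    return np.maximum(w * x[:, None] + b, 0.0) @ v + c


# Spline view: each hidden unit contributes one knot where its ReLU
# switches on/off, i.e. where w_i * x + b_i = 0.
knots = np.sort(-b / w)

# Sanity check: strictly between consecutive knots, f is exactly linear,
# so its second differences vanish on any interior points.
ok = True
bounds = np.concatenate(([knots[0] - 1.0], knots, [knots[-1] + 1.0]))
for lo, hi in zip(bounds[:-1], bounds[1:]):
    x3 = np.linspace(lo, hi, 5)[1:-1]          # 3 points strictly inside
    ok &= np.allclose(np.diff(f(x3), n=2), 0.0)
print(ok)  # True: f is piecewise linear with knots at -b_i / w_i
```

In this coordinate system the function is described directly by its knot locations and per-interval slopes, rather than by the opaque (w, b, v) parameters, which is the change of coordinates the abstract refers to.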
We briefly discuss future work applying the spline lens to neuronally consistent networks (with saturating activation functions, E/I balance, and cell types) and to developing new experimental protocols that can test for and characterize implicit regularization in the brain. Going forward, we believe the spline lens will play a foundational role in efforts to understand and design both artificial and real neural networks.
BIO: Ankit B. Patel is currently an Assistant Professor at the Baylor College of Medicine in the Dept. of Neuroscience and at Rice University in the Dept. of Electrical and Computer Engineering. Ankit is broadly interested in the intersection between (deep) machine learning and computational neuroscience, two areas essential for understanding and building truly intelligent systems, with a focus on the low-level mechanisms by which learned representations work. He works with neuroscientists to build a bridge between artificial and real neuronal networks, using theories and experiments about artificial nets to help understand and make testable predictions about real brain circuits. Ankit returned to academia after spending 6 years in industry, building real-time inference systems trained on large-scale data for ballistic missile defense (MIT Lincoln Laboratory) and for high-frequency trading. He received his graduate and undergraduate degrees in Computer Science and Applied Mathematics from Harvard University.