Events

DMS Colloquium: Yuesheng Xu

Time: Nov 03, 2023 (04:00 PM)
Location: 010 ACLC

Details:

Refreshments 3:30 p.m., Parker 244

 


Speaker: Yuesheng Xu

Title: Multi-Grade Deep Learning and its Application in Solutions of Nonlinear Differential Equations
 
 
Abstract: The great success of deep learning has been widely recognized. From a mathematical perspective, this success is largely due to the powerful expressiveness of deep neural networks in representing functions. Deep learning requires solving a large nonconvex optimization problem to learn a deep neural network. The current deep learning model is single-grade: it learns a deep neural network by solving one single nonconvex optimization problem. When the number of layers of the neural network is large, it is computationally challenging to carry out such a task efficiently, since all weight matrices and bias vectors must be learned from a single nonconvex optimization problem of a large size.

Inspired by the human education process, which arranges learning in grades, we propose a multi-grade learning model: instead of solving one single optimization problem of a large size, we successively solve a number of optimization problems of small sizes, organized in grades, learning a shallow neural network for each grade. Specifically, each grade learns the leftover from the previous grade. In each grade, we learn a shallow neural network stacked on top of the network learned in the previous grades, which remains unchanged during training of the current and future grades. By dividing the task of learning a deep neural network into learning several shallow neural networks, one can alleviate the severity of the nonconvexity of the original large optimization problem. When all grades of learning are completed, the final neural network is a stair-shaped neural network, the superposition of the networks learned in all grades. Such a model enables us to learn a deep neural network much more effectively and efficiently.
We provide several numerical examples of solutions of the Burgers equation (1D-3D), which demonstrate that the proposed multi-grade model significantly outperforms the traditional single-grade model.
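The grade-by-grade procedure described in the abstract can be sketched roughly as follows. This is a minimal illustrative toy, not the speaker's implementation: the target function, network sizes, and training hyperparameters are all assumptions chosen for the example. Each grade trains a one-hidden-layer network by plain gradient descent on the residual ("leftover") left by the previous grades, taking the previous grade's frozen hidden features as its input, and the final prediction is the superposition of all grades' outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (assumed example): learn y = sin(2*pi*x) on [0, 1].
X = np.linspace(0, 1, 200).reshape(-1, 1)
y = np.sin(2 * np.pi * X)

def train_shallow(inp, target, hidden=16, lr=0.1, steps=3000):
    """Train a one-hidden-layer tanh network on (inp, target) by full-batch
    gradient descent; return its frozen hidden features and its output."""
    d = inp.shape[1]
    W1 = rng.normal(0.0, 1.0, (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, 1))
    b2 = np.zeros(1)
    n = inp.shape[0]
    for _ in range(steps):
        H = np.tanh(inp @ W1 + b1)        # hidden features
        out = H @ W2 + b2
        err = out - target                # gradient of 0.5 * MSE w.r.t. out
        gW2 = H.T @ err / n
        gb2 = err.mean(0)
        dH = (err @ W2.T) * (1.0 - H ** 2)
        gW1 = inp.T @ dH / n
        gb1 = dH.mean(0)
        W2 -= lr * gW2; b2 -= lr * gb2
        W1 -= lr * gW1; b1 -= lr * gb1
    H = np.tanh(inp @ W1 + b1)
    return H, H @ W2 + b2

# Multi-grade loop: grade k fits the residual left by grades 1..k-1, with the
# previous grade's hidden features (frozen after training) as its input.
features = X
residual = y
prediction = np.zeros_like(y)
grade_mse = []
for grade in range(3):
    features, out = train_shallow(features, residual)
    prediction += out              # superposition of all grades' outputs
    residual = y - prediction      # leftover passed to the next grade
    grade_mse.append(float(np.mean((y - prediction) ** 2)))

print(grade_mse)  # the fit error should shrink from grade to grade
```

Because each grade only trains a shallow network on the previous grades' frozen features, every optimization problem stays small, which is the point of the multi-grade decomposition.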