This joint machine learning seminar program aims to bring together high-quality teaching and research resources from home and abroad for outstanding domestic students and young faculty, creating a world-class environment and platform for learning and research, and helping participants become leading young talents in machine learning and related fields. Please see here for the earlier event webpage.
At this stage, we mainly invite the most active domestic and international researchers in machine learning to give systematic introductions to the latest developments and the most pressing open scientific questions in the field. The main topics are: basic research in machine learning, applications of machine learning to scientific problems, and applications of machine learning in industry. Seminars are held every two weeks, with two invited speakers at each session.
Organizing Committee：Weinan E, Bin Dong, Weiguo Gao, Zhongyi Huang, Han Wang, Zhiqin John Xu, Zhouwang Yang, Linfeng Zhang.
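Abstract：Cryo-electron microscopy (cryo-EM) is a very important experimental technique in structural biology and has seen breakthrough progress over the past decade. It has provided important help in elucidating many biomedical questions, including the mechanism of action of the novel coronavirus. Its three founders were awarded the Nobel Prize in Chemistry in 2017. Compared with traditional structural biology techniques, cryo-EM imaging is applicable to a wider range of biological samples, and multiple structures can be resolved from the same batch of samples, which helps us understand the functional mechanisms of biological macromolecules more comprehensively. This talk focuses on the fastest-developing branch of cryo-EM: single-particle analysis. I will introduce the principles and applications of this technique, along with the relevant traditional algorithms and machine-learning-based algorithms. Although cryo-EM experimental techniques are gradually maturing, the accuracy and robustness of the associated data analysis algorithms still have much room for improvement. I hope this talk will spark interest in cryo-EM so that together we can advance the development of cryo-EM algorithms.
Abstract：Hydrocarbon molecules are the main components of fuels, and elucidating their combustion mechanisms is one of the key problems to be solved in order to simulate engine combustion and thereby advance engine design. Limited by force-field accuracy, the widely used molecular mechanics methods still leave considerable room for improvement in the reliability of combustion reaction simulations. Because quantum chemistry methods require large amounts of computational resources, using them directly to simulate the combustion mechanisms of hydrocarbon fuels is infeasible. In earlier work, we developed MFCC-combustion, a fragment-based quantum chemistry method that computes the energies and forces of the simulated system efficiently and accurately. Combined with dynamics simulation software, and with appropriate temperature and pressure control, this method enables ab initio molecular dynamics (AIMD) simulation of fuel combustion. Recently, using the Deep Potential model, we further improved the efficiency of AIMD simulation by about three orders of magnitude, enabling nanosecond-scale reaction dynamics simulations of hydrocarbon fuels. This method is expected to provide an efficient and accurate research tool for studying hydrocarbon fuel combustion mechanisms and for building and improving fundamental combustion databases.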
Tencent Meeting Link： https://meeting.tencent.com/dm/p1q2HxwW3x4r
Tencent Meeting download link： https://meeting.tencent.com/download-center.html
Live Stream： bilibili live room
Alternate meeting ID (no password)： 553 2421 7498
Title：Deep Network Approximation: Achieving Arbitrary Error with Fixed Size
Affiliations： National University of Singapore
Abstract：This talk discusses a new type of simple feed-forward neural network that achieves the universal approximation property for all continuous functions with a fixed network size. This new type of neural network is simple because it is designed with a simple and computable continuous activation function σ leveraging a triangular-wave function and a softsign function. First, we prove that σ-activated networks with width 36d(2d + 1) and depth 11 can approximate any continuous function on a d-dimensional hypercube with an arbitrarily small error. Next, we show that classification functions arising from image and signal classification can be exactly represented by σ-activated networks with width 36d(2d + 1) and depth 12, when there exist pairwise disjoint closed bounded subsets of R^d such that the samples of the same class are located in the same subset.
Title：Deep Network Approximation: Error Characterization in Terms of Width and Depth
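The abstract does not spell out σ, so the following is only a sketch of one plausible construction: a period-2 triangular wave on the non-negative half-line glued to a softsign on the negative half-line. The exact formulas and the names triangular_wave, softsign and sigma are assumptions made for illustration, not necessarily the speaker's definition.

```python
import math


def triangular_wave(x: float) -> float:
    """Period-2 triangular wave with values in [0, 1]."""
    return abs(x - 2.0 * math.floor((x + 1.0) / 2.0))


def softsign(x: float) -> float:
    """Softsign function x / (|x| + 1), with values in (-1, 1)."""
    return x / (abs(x) + 1.0)


def sigma(x: float) -> float:
    """Assumed gluing: triangular wave for x >= 0, softsign for x < 0.

    Both pieces equal 0 at x = 0, so the glued function is continuous.
    """
    return triangular_wave(x) if x >= 0.0 else softsign(x)
```

Under this assumed form, σ is periodic on the positive side and monotone and bounded on the negative side, which matches the "triangular-wave plus softsign" description in the abstract.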
Affiliations： Purdue University
Abstract：Deep neural networks are a powerful tool in many applications in sciences, engineering, technology, and industries, especially for large-scale and high-dimensional learning problems. This talk focuses on the mathematical understanding of deep neural networks. In particular, a relation of the approximation properties of deep neural networks and function compositions is characterized. The approximation error of ReLU networks in terms of the width and depth is given for various function spaces, such as space of polynomials, continuous functions, or smooth functions on a hypercube. Finally, to achieve a better approximation error, we introduce a new type of network called Floor-ReLU networks, built with each neuron activated by either Floor or ReLU.
Title：Embedding Principle of Loss Landscape of Deep Neural Networks
Affiliations： Shanghai Jiao Tong University
Abstract：Understanding the structure of loss landscape of deep neural networks (DNNs) is obviously important. In this talk, we present an embedding principle that the loss landscape of a DNN "contains" all the critical points of all the narrower DNNs. More precisely, we propose a critical embedding such that any critical point, e.g., local or global minima, of a narrower DNN can be embedded to a critical point/affine subspace of the target DNN with higher degeneracy and preserving the DNN output function. Note that, given any training data, differentiable loss function and differentiable activation function, this embedding structure of critical points holds. This general structure of DNNs is starkly different from other nonconvex problems such as protein-folding. Empirically, we find that a wide DNN is often attracted by highly-degenerate critical points that are embedded from narrow DNNs. The embedding principle provides a new perspective to study the general easy optimization of wide DNNs and unravels a potential implicit low-complexity regularization during the training. Overall, this work provides a skeleton for the study of loss landscape of DNNs and its implication, by which a more exact and comprehensive understanding can be anticipated in the near future.
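One simple way to see how a narrower network can sit inside a wider one is neuron splitting: duplicate a hidden neuron and share its output weight between the two copies. The sketch below assumes a scalar-input two-layer tanh network and uses hypothetical helper names. It only checks that the output function is preserved; it does not verify the criticality claim made in the abstract, and it is not claimed to be the exact embedding used in the talk.

```python
import math


def two_layer(x, in_weights, biases, out_weights):
    """f(x) = sum_k a_k * tanh(w_k * x + b_k) for a scalar input x."""
    return sum(
        a * math.tanh(w * x + b)
        for w, b, a in zip(in_weights, biases, out_weights)
    )


def split_neuron(in_weights, biases, out_weights, k, alpha):
    """Return a network one neuron wider that computes the same function.

    Hidden neuron k is duplicated; its output weight a_k is shared as
    alpha * a_k (original slot) and (1 - alpha) * a_k (new slot).
    """
    w = list(in_weights) + [in_weights[k]]
    b = list(biases) + [biases[k]]
    a = list(out_weights) + [(1.0 - alpha) * out_weights[k]]
    a[k] = alpha * out_weights[k]
    return w, b, a


# Illustrative parameters of a width-2 network (values are arbitrary).
narrow = ([0.7, -1.3], [0.1, 0.4], [2.0, -0.5])
wide = split_neuron(*narrow, k=0, alpha=0.3)
```

Evaluating two_layer with narrow and with wide at any input gives the same value up to rounding, even though wide has three hidden neurons.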
Title：Global Loss Landscape of Neural Networks: What do We Know and What do We Not Know?
Affiliations： University of Illinois at Urbana-Champaign
Abstract：The recent success of neural networks suggests that their loss landscape is not too bad, but what concrete results do we know and not know about the landscape? In this talk, we present a few recent results on the landscape. First, non-linear neural nets can have sub-optimal local minima under mild assumptions (for arbitrary width, for generic input data and for most activation functions). Second, wide networks do not have sub-optimal "basins" for any continuous activation, while narrow networks can have sub-optimal basins. We will present a simple 2D geometrical object that is a basic component of the neural net landscape, which can visually explain the above two results. Third, we show that for ReQU and ReLU networks, adding a proper regularizer can eliminate sub-optimal local minima and decreasing paths to infinity. Together, these results demonstrate that wide neural nets have a "nice landscape", but the meaning of "nice landscape" is more subtle than we expected. We will also mention the limitations of existing results, e.g., in what settings we do not know whether sub-optimal local minima exist. We will briefly discuss the relevance of the landscape results in two aspects: (1) these results can help understand the training difficulty of narrow networks; (2) these results can potentially help convergence analysis and implicit regularization.
Title：Some theoretical results on model-based reinforcement learning
Affiliations： Princeton University
Abstract：We discuss some recent results on model-based methods for reinforcement learning (RL) in both online and offline problems. For the online RL problem, we discuss several model-based RL methods that adaptively explore an unknown environment and learn to act with provable regret bounds. In particular, we focus on finite-horizon episodic RL where the unknown transition law belongs to a generic family of models. We propose a model-based 'value-targeted regression' RL algorithm that is based on the optimism principle: in each episode, the set of models that are 'consistent' with the data collected is constructed. The criterion of consistency is based on the total squared error that the model incurs on the task of predicting values, as determined by the last value estimate along the transitions. The next value function is then chosen by solving the optimistic planning problem with the constructed set of models. We derive a bound on the regret, for an arbitrary family of transition models, using the notion of the so-called Eluder dimension proposed by Russo & Van Roy (2014). Next we discuss batch data (offline) reinforcement learning, where the goal is to predict the value of a new policy using data generated by some behavior policy (which may be unknown). We show that the fitted Q-iteration method with linear function approximation is equivalent to a model-based plugin estimator. We establish that this model-based estimator is minimax optimal and its statistical limit is determined by a form of restricted chi-square divergence between the two policies.
Title：Policy Cover Guided Exploration in Model-free and Model-based Reinforcement Learning
Affiliations： Cornell University
Abstract：Existing RL methods that leverage random exploration techniques often fail to learn efficiently in environments where strategic exploration is needed. In this talk, we introduce a new concept called Policy Cover, which is an ensemble of learned policies. The policy cover encodes information about which part of the state space has been explored, and such information can be used to further guide the exploration process. We show that this idea can be used in both model-free and model-based algorithmic frameworks. In particular, for model-free learning, we present the first policy gradient algorithm, Policy Cover Policy Gradient (PC-PG), that can explore and learn with polynomial sample complexity. For the model-based setting, we present an algorithm, Policy-Cover Model Learning and Planning (PC-MLP), that also learns and explores with polynomial sample complexity. For both approaches, we show that they are flexible enough to be used together with deep neural networks, and their deep versions achieve state-of-the-art performance on common benchmarks, including exploration-challenging tasks such as reward-free Maze exploration.
Title：In Search of Effective and Reproducible Clinical Imaging Biomarkers for Population Health and Oncology Applications of Screening, Diagnosis and Prognosis.
Reporter：Le Lu (吕乐)
Affiliations： PhD, FIEEE, MICCAI Board Member, AE for TPAMI
Abstract：This talk will first give an overview of our work on employing deep learning to permit novel clinical workflows in two population health tasks, namely using conventional ultrasound for liver steatosis screening and quantitative reporting, and osteoporosis screening via conventional X-ray imaging and "AI readers". These two tasks were generally considered infeasible for human readers, but as shown by our scientific and clinical studies and peer-reviewed publications, they are suitable for AI readers. AI can be a useful supplementary tool to assist physicians in cheaper, more convenient, and more precise patient management. Next, the main part of this talk describes a roadmap for three key problems in pancreatic cancer imaging: early screening, precision differential diagnosis, and deep prognosis for patient survival prediction. (1) Based on a new self-learning framework, we train the pancreatic ductal adenocarcinoma (PDAC) segmentation model using a larger quantity of patients, with a mix of annotated/unannotated venous or multi-phase CT images. Pseudo annotations are generated by combining two teacher models with different PDAC segmentation specialties on unannotated images, and can be further refined by a teaching assistant model that identifies associated vessels around the pancreas. Our approach makes it technically feasible to perform robust large-scale PDAC screening from multi-institutional multi-phase partially-annotated CT scans. (2) We propose a holistic segmentation-mesh classification network (SMCN) to provide patient-level diagnosis, by fully utilizing the geometry and location information. SMCN learns the pancreas and mass segmentation task and builds an anatomical correspondence-aware organ mesh model by progressively deforming a pancreas prototype on the raw segmentation mask. Our results are comparable to a multimodality clinical test that combines clinical, imaging, and molecular testing for clinical management of patients with cysts.
(3) Accurate preoperative prognosis of resectable PDACs for personalized treatment is highly desired in clinical practice. We present a novel deep neural network for the survival prediction of resectable PDAC patients, 3D Contrast-Enhanced Convolutional Long Short-Term Memory network (CE-ConvLSTM), to derive the tumor attenuation signatures from CE-CT imaging studies. Our framework can significantly improve the prediction performances upon existing state-of-the-art survival analysis methods. This deep tumor signature has evidently added values (as a predictive biomarker) to be combined with the existing clinical staging system.
Title：Deep Learning for inverse problems
Affiliations： Stanford University
Abstract：This talk is about some recent progress on solving inverse problems using deep learning. Compared to traditional machine learning problems, inverse problems are often limited by the size of the training data set. We show how to overcome this issue by incorporating mathematical analysis and physics into the design of neural network architectures. We first describe neural network representations of pseudodifferential operators and Fourier integral operators. We then continue to discuss applications including electric impedance tomography, optical tomography, inverse acoustic/EM scattering, seismic imaging, and travel-time tomography.
Title：Self-supervised Deep Learning for Solving Inverse Problems in Imaging
Affiliations： National University of Singapore
Abstract：Deep learning has become a prominent tool for solving many inverse problems in imaging sciences. Most existing SOTA solutions are built on supervised learning, which requires a large-scale dataset of many degraded/truth image pairs. In recent years, driven by practical need, there has been increasing interest in studying deep learning methods under limited data resources, which has particular significance for imaging in science and medicine. This talk will focus on self-supervised deep learning for solving inverse imaging problems, which assumes no training sample is available. By examining deep learning from the perspective of Bayesian inference, we will present several results and techniques on self-supervised learning for the MMSE (minimum mean squared error) estimator. Built on these techniques, we will show that, in several applications, the resulting dataset-free deep learning methods provide very competitive performance in comparison to their SOTA supervised counterparts. While the demonstrations only cover image denoising, compressed sensing, and phase retrieval, the presented techniques and methods are quite general and can be used for solving many other inverse imaging problems.
Affiliations： Pennsylvania State University
Abstract：In this talk, we will present some recent results related to the Barron space studied by E et al. in 2019. For the ReLU activation function, we give an equivalent characterization of the corresponding Barron space in terms of the convex hull of an appropriate dictionary. This characterization enables a generalization of the notion of Barron space to neural networks with general activation functions, and even to general dictionaries of functions. We provide an explicit representation of some Barron norms which are of particular interest to the theory of neural networks, specifically corresponding to a dictionary of decaying Fourier modes and the dictionary corresponding to shallow ReLU^k networks. Next, we present optimal estimates of approximation rates, metric entropy, and Kolmogorov and Gelfand n-widths of the Barron space unit ball with respect to L^2 for ReLU^k activation functions and the dictionary of decaying Fourier modes. These results provide a solution to several open problems concerning the precise approximation properties of these spaces. If time allows, we will also give recent results on the approximation rates and metric entropies for sigmoidal and ReLU Barron spaces with respect to L^p for p > 2. This talk is based on joint work with Jonathan Siegel.
Title：Convergence analysis for the gradient descent optimization method in the training of artificial neural networks with ReLU activation for piecewise linear target functions
Applied Mathematics: Institute for Analysis and Numerics, Faculty of Mathematics and Computer Science, University of Muenster, Germany;
School of Data Science and Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong, Shenzhen, China
Abstract：Gradient descent (GD) type optimization methods are the standard instrument to train artificial neural networks (ANNs) with rectified linear unit (ReLU) activation. Despite the great success of GD type optimization methods in numerical simulations for the training of ANNs with ReLU activation, it remains -- even in the simplest situation of the plain vanilla GD optimization method with random initializations and ANNs with one hidden layer -- an open problem to prove (or disprove) the conjecture that the risk of the GD optimization method converges in the training of ANNs with ReLU activation to zero as the width of the ANNs, the number of independent random initializations, and the number of GD steps increase to infinity. In this talk we prove this conjecture in the special situation where the probability distribution of the input data is absolutely continuous with respect to the continuous uniform distribution on a compact interval and where the target function under consideration is piecewise linear.
Title：Learning and Learning to Solve PDEs
Abstract：Deep learning continues to dominate machine learning and has been successful in computer vision, natural language processing, etc. Its impact has now expanded to many research areas in science and engineering. In this talk, I will mainly focus on the recent impact of deep learning on computational mathematics. I will present our recent work on bridging deep neural networks with numerical differential equations. On the one hand, I will show how to design transparent deep convolutional networks to uncover hidden PDE models from observed dynamical data. On the other hand, I will present our recent preliminary attempts to combine insights from numerical PDEs and machine learning to design data-driven solvers for PDEs.
Title：Neural Operator: Learning Maps Between Function Spaces
Abstract：The classical development of neural networks has primarily focused on learning mappings between finite dimensional Euclidean spaces or finite sets. We propose a generalization of neural networks tailored to learn operators mapping between infinite dimensional function spaces. We formulate the approximation of operators by composition of a class of linear integral operators and nonlinear activation functions, so that the composed operator can approximate complex nonlinear operators. We prove a universal approximation theorem for our construction. The proposed neural operators are resolution-invariant: they share the same network parameters between different discretizations of the underlying function spaces and can be used for zero-shot super-resolution. Numerically, the proposed models show superior performance compared to existing machine learning based methodologies on Burgers' equation, Darcy flow, and the Navier-Stokes equation, while being several orders of magnitude faster than conventional PDE solvers.
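The composition of a linear integral operator with a pointwise nonlinearity can be sketched as follows. This is a minimal illustration, assuming a scalar function on [0, 1], a midpoint quadrature rule, and a fixed closed-form kernel standing in for the learned kernel. The names kernel, integral_operator_layer, w and theta are hypothetical and not taken from the talk.

```python
import math


def kernel(x, y, theta):
    """Hypothetical stand-in for a learned kernel kappa_theta(x, y)."""
    amplitude, frequency = theta
    return amplitude * math.cos(frequency * (x - y))


def integral_operator_layer(v, x, n, w, theta):
    """Evaluate relu(w * v(x) + integral_0^1 kappa(x, y) v(y) dy) at x.

    The integral is approximated by the midpoint rule on n cells, so the
    parameters (w, theta) do not depend on the resolution n.
    """
    h = 1.0 / n
    integral = h * sum(
        kernel(x, (j + 0.5) * h, theta) * v((j + 0.5) * h) for j in range(n)
    )
    return max(0.0, w * v(x) + integral)


def input_function(y):
    """Example input function sampled by the layer."""
    return math.sin(2.0 * math.pi * y)


# Same parameters, two different discretizations of the input function.
coarse = integral_operator_layer(input_function, 0.25, 64, 0.5, (1.0, 3.0))
fine = integral_operator_layer(input_function, 0.25, 256, 0.5, (1.0, 3.0))
```

Because the parameters are attached to the kernel rather than to grid points, coarse and fine agree up to quadrature error, which is the sense of resolution invariance described in the abstract.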
Reporter：Zhi-Qin John Xu (许志钦), Shanghai Jiao Tong University
Abstract：In this talk, I will introduce the frequency principle (F-Principle) in detail, including experiments and theory. I will also connect the F-Principle with traditional iterative methods, such as the Jacobi method, to understand the training of neural networks from the perspective of numerical analysis. Then, I will use some examples to show how the F-Principle benefits the design of neural networks. Finally, I will talk about some open questions about the F-Principle.
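For comparison with neural network training, the frequency behaviour of a classical iterative solver can be checked directly. The sketch below uses weighted (damped) Jacobi with weight 2/3 on the 1D Poisson matrix tridiag(-1, 2, -1), which is a swapped-in textbook variant rather than necessarily the method discussed in the talk. The grid size, weight and function names are illustrative assumptions.

```python
import math


def weighted_jacobi_sweep(e, omega):
    """One weighted-Jacobi sweep on the error of tridiag(-1, 2, -1) u = b.

    With zero boundary values, the error obeys
    e_j <- (1 - omega) * e_j + omega * (e_{j-1} + e_{j+1}) / 2.
    """
    n = len(e)
    new = []
    for j in range(n):
        left = e[j - 1] if j > 0 else 0.0
        right = e[j + 1] if j < n - 1 else 0.0
        new.append((1.0 - omega) * e[j] + omega * 0.5 * (left + right))
    return new


def remaining_error(k, n=31, sweeps=20, omega=2.0 / 3.0):
    """Fraction of a pure sine error mode of frequency k left after sweeps."""
    e = [math.sin(k * math.pi * (j + 1) / (n + 1)) for j in range(n)]
    start = max(abs(value) for value in e)
    for _ in range(sweeps):
        e = weighted_jacobi_sweep(e, omega)
    return max(abs(value) for value in e) / start


low_frequency_left = remaining_error(k=1)
high_frequency_left = remaining_error(k=31)
```

In this setup the lowest-frequency error mode is barely reduced after 20 sweeps while the highest-frequency mode is almost eliminated, the reverse of the low-frequency-first behaviour that the F-Principle describes for neural network training.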
Title：DeePKS: a machine learning assisted electronic structure model
Abstract：We introduce a general machine learning-based framework for building an accurate and widely-applicable energy functional within the framework of generalized Kohn-Sham density functional theory. In particular, we develop a way of training self-consistent models that are capable of taking large datasets from different systems and different kinds of labels. We demonstrate that the functional that results from this training procedure, with the efficiency of cheap density functional models, gives chemically accurate predictions on energy, force, dipole, and electron density for a large class of molecules.
Title：Modelling Temporal Data: from RNNs to CNNs
Reporter：Zhong Li, Qianxiao Li
Abstract：There are several competing models in deep learning for modelling input-output relationships in temporal data: recurrent neural networks (RNNs), convolutional neural networks (CNNs), transformers, etc. In recent work, we study the approximation properties and optimization dynamics of RNNs when applied to learn temporal dynamics. We consider the simple but representative setting of using continuous-time linear RNNs to learn from data generated by linear relationships. Mathematically, the latter can be understood as a sequence of linear functionals. We prove a universal approximation theorem of such linear functionals and characterize the approximation rate. Moreover, we perform a fine-grained dynamical analysis of training linear RNNs by gradient methods. A unifying theme uncovered is the non-trivial effect of memory, a notion that can be made precise in our framework, on both approximation and optimization: when there is long-term memory in the target, it takes a large number of neurons to approximate it. Moreover, the training process will suffer from severe slowdowns. In particular, both of these effects become more pronounced with increasing memory - a phenomenon we call the "curse of memory". These analyses represent a basic step towards a concrete mathematical understanding of new phenomena that may arise in learning temporal relationships using recurrent architectures.
We also study the approximation properties of convolutional architectures applied to time series modelling. Similar to the recurrent setting, parallel results for convolutional architectures are derived regarding the approximation efficiency, with WaveNet being a prime example. Our results reveal that under this new setting, the approximation efficiency is characterized not only by memory, but also by additional fine structures in the target relationship. This leads to a novel definition of spectrum-based regularity that measures the complexity of temporal relationships under the convolutional approximation scheme. These analyses provide a foundation to understand the differences between architectural choices for temporal modelling, with theoretically grounded guidance for practical applications.
Topic：Implicit biases of SGD for neural network models
Abstract：Understanding the implicit biases of optimization algorithms is one of the core problems in theoretical machine learning. This refers to the fact that even without any explicit regularizations to avoid overfitting, the dynamics of an optimizer itself is biased to pick solutions that generalize well. This talk introduces the recent progress in understanding the implicit bias of stochastic gradient descent (SGD) for neural network models. First, we consider the gradient descent flow, i.e., SGD with an infinitesimal learning rate, for two-layer neural networks. In particular, we will see how the implicit bias is affected by the extent of over-parameterization. Then, we turn to SGD with a finite learning rate. The influence of learning rate as well as the batch size will be studied from the perspective of dynamical stability. The concept of uniformity is introduced, which, together, with flatness characterizes the accessibility of a particular SGD to a global minimum. This analysis shows that learning rate and batch size play different roles in selecting global minima. Extensive empirical results correlate well with the theoretical findings.
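The dynamical-stability viewpoint is easiest to see in the deterministic special case. For full-batch gradient descent on a one-dimensional quadratic f(x) = a x^2 / 2, the update is x <- (1 - eta * a) x, so the minimum is stable exactly when eta * a <= 2. The sketch below checks this. It does not model batch size or the uniformity notion from the talk, and the function names are illustrative.

```python
def gd_on_quadratic(sharpness, learning_rate, steps=100, x0=1e-3):
    """Run gradient descent on f(x) = sharpness * x**2 / 2 from a small x0.

    Returns the final distance |x| from the minimum at 0.
    """
    x = x0
    for _ in range(steps):
        x -= learning_rate * sharpness * x  # gradient of f is sharpness * x
    return abs(x)


def is_linearly_stable(sharpness, learning_rate):
    """Stability condition |1 - learning_rate * sharpness| <= 1."""
    return learning_rate * sharpness <= 2.0


learning_rate = 0.1
flat_result = gd_on_quadratic(sharpness=5.0, learning_rate=learning_rate)
sharp_result = gd_on_quadratic(sharpness=30.0, learning_rate=learning_rate)
```

With a learning rate of 0.1, the flat minimum (sharpness 5) attracts the iterate while the sharp minimum (sharpness 30) repels it, which illustrates how a fixed learning rate can select among minima by their curvature.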
Topic：The landscape-dependent annealing strategy in machine learning: How Stochastic-Gradient-Descent finds flat minima
Reporter：Yuhai Tu (IBM T. J. Watson Research Center)
Abstract：Despite the tremendous success of the Stochastic Gradient Descent (SGD) algorithm in deep learning, little is known about how SGD finds "good" solutions (low generalization error) in the high-dimensional weight space. In this talk, we discuss our recent work [1,2] on establishing a theoretical framework based on nonequilibrium statistical physics to understand the SGD learning dynamics, the loss function landscape, and their relation. Our study shows that SGD dynamics follows a low-dimensional drift-diffusion motion in the weight space and the loss function is flat with large values of flatness (inverse of curvature) in most directions. Furthermore, our study reveals a robust inverse relation between the weight variance in SGD and the landscape flatness, opposite to the fluctuation-response relation in equilibrium systems. We develop a statistical theory of SGD based on properties of the ensemble of minibatch loss functions and show that the noise strength in SGD depends inversely on the landscape flatness, which explains the inverse variance-flatness relation. Our study suggests that SGD serves as an "intelligent" annealing strategy where the effective temperature self-adjusts according to the loss landscape in order to find the flat minimum regions that contain generalizable solutions. Finally, we discuss an application of these insights for efficiently reducing catastrophic forgetting in sequential multi-task learning.
Reference： 1. “The inverse variance-flatness relation in Stochastic-Gradient-Descent is critical for finding flat minima”, Y. Feng and Y. Tu, PNAS, 118 (9), 2021. 2. “Phases of learning dynamics in artificial neural networks: in the absence and presence of mislabeled data”, Y. Feng and Y. Tu, Machine Learning: Science and Technology (MLST), April 7, 2021. https://iopscience.iop.org/article/10.1088/2632-2153/abf5b9/pdf
Topic：Stochastic gradient descent for noise with ML-type scaling
Reporter：Stephan Wojtowytsch (Princeton University)
Abstract：In the literature on stochastic gradient descent, there are two types of convergence results: (1) SGD finds minimizers of convex objective functions and (2) SGD finds critical points of smooth objective functions. Classical results are obtained under the assumption that the stochastic noise is L^2-bounded and that the learning rate decays to zero at a suitable speed. We show that, if the objective landscape and noise possess certain properties which are reminiscent of deep learning problems, then we can obtain global convergence guarantees of first type under second type assumptions for a fixed (small, but positive) learning rate. The convergence is exponential, but with a large random coefficient. If the learning rate exceeds a certain threshold, we discuss minimum selection by studying the invariant distribution of a continuous time SGD model. We show that at a critical threshold, SGD prefers minimizers where the objective function is 'flat' in a precise sense.
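A toy model shows why noise that shrinks with the objective allows convergence at a fixed learning rate. The sketch assumes f(x) = x^2 / 2 and a stochastic gradient x * (1 + noise_level * xi) with xi = +1 or -1, so the noise variance is proportional to f(x). This scaling, the parameter values and the function name are assumptions for illustration, not the precise conditions of the talk.

```python
import random


def sgd_multiplicative_noise(
    x0=1.0, learning_rate=0.1, noise_level=0.5, steps=200, seed=0
):
    """SGD on f(x) = x**2 / 2 with gradient noise proportional to |x|.

    Each step multiplies x by 1 - learning_rate * (1 + noise_level * xi),
    which here is either 0.85 or 0.95, so |x| shrinks geometrically.
    """
    rng = random.Random(seed)
    x = x0
    for _ in range(steps):
        xi = rng.choice((-1.0, 1.0))
        stochastic_gradient = x * (1.0 + noise_level * xi)
        x -= learning_rate * stochastic_gradient
    return x


final_x = sgd_multiplicative_noise()
```

In this toy setting the iterate converges to the minimizer exponentially fast without decaying the learning rate, with a rate that depends on the random draws. Under additive noise of fixed size the same scheme would only reach a neighbourhood of the minimizer.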
Topic：Learn like a child: super-large-scale multi-modal pre-training model
Abstract：The experience revolution in cognitive science has given us a new perspective on how meaning is understood from language: the ability to think and use language is the result of cooperation between body and mind. The physical body involves various modalities such as vision, hearing, smell, touch, and motor nerves. Human children learn language in a multi-modal environment, which is something current AI lacks. This lecture will introduce some of our work on cross-modal understanding of language and images. Starting from the relationship between vision and language, we use tens of millions, or even hundreds of millions, of image-text pairs from the Internet to train one of the largest general-purpose Chinese image-text pre-training models with self-supervised tasks, as an initial exploration of whether AI can learn language in a multi-modal environment. By analyzing the changes in language from monomodal to multimodal learning, we found some phenomena closely related to human cognition.
Topic：Knowledge-guided pre-training language model
Abstract：In recent years, deep learning has become a key technology for natural language processing. In particular, pre-trained language models have significantly improved the overall performance of natural language processing since 2018. As a typical data-driven method, deep learning represented by pre-trained language models still faces problems such as poor interpretability and poor robustness. How to introduce a large amount of human linguistic knowledge and world knowledge into the model to improve the performance of deep learning is an important direction, but it also faces many challenges. This report will systematically introduce the latest developments and trends in knowledge-guided pre-trained language models.
Topic：Introduction to Reinforced Dynamics
Abstract：This report will introduce the basic concepts of molecular dynamics simulation, as well as its fundamental problem: the sampling problem. We briefly introduce the limitations and challenges of two types of enhanced sampling methods. In particular, the report will describe in detail our solution to the enhanced sampling problem: reinforced dynamics, an enhanced sampling method based on deep learning. Finally, we show the effect of reinforced dynamics on protein structure prediction.
Topic：Flow Model: A Computational Physics Perspective
Abstract：This report will combine some personal research experiences to introduce scientific issues and scientific applications related to flow-based generative models. From the perspective of computational physics, we will see the relationship between the flow model and the optimal transport theory, fluid mechanics, symplectic geometry algorithm, renormalization group, and Monte Carlo calculations.
Topic：Understanding the training process of neural networks
Reporter：Zhiqin John Xu
Abstract：From the perspective of approximation theory alone, an over-parameterized neural network can have infinitely many solutions that minimize the training error. In practice, however, training always seems to find a solution that generalizes well. To understand how a neural network selects a class of well-generalizing solutions from infinitely many possibilities, it is necessary to understand the training process by which the solution is found. In this report, I will introduce some progress on the training process of neural networks, such as how the complexity of a neural network changes during training, frequency behavior, and how initialization affects training. Finally, I will discuss some open issues and explore what the training process tells us about the development trend of neural networks.
Topic：Machine Learning and Dynamical Systems
Abstract：In this talk, we discuss some recent work on the connections between machine learning and dynamical systems. These come broadly in three categories, namely machine learning via, for and of dynamical systems, and here we will focus on the first two. In the direction of machine learning via dynamical systems, we introduce a dynamical approach to deep learning theory with particular emphasis on its connections with control theory. In the reverse direction of machine learning for dynamical systems, we discuss the approximation and optimization theory of learning input-output temporal relationships using recurrent neural networks, with the goal of highlighting key new phenomena that arise in learning in dynamic settings. If time permits, we will also discuss some applications of dynamical systems on the analysis of optimization algorithms commonly applied in machine learning.
Topic：Neural network and high-dimensional function approximation
Abstract：In recent years, deep learning methods based on neural network models have achieved unprecedented success in different fields, such as computer vision and scientific computing. From the perspective of approximation theory, these successes depend on the powerful ability of neural networks to approximate high-dimensional functions. And we know that traditional methods will inevitably suffer the curse of dimensionality when approximating high-dimensional functions. Does this indicate that neural networks can avoid the curse of dimensionality in a sense? If so, what is the mechanism behind it? We will discuss these issues around the three models of the kernel method, two-layer neural network and deep residual network. In particular, we will characterize the high-dimensional function space approximated by each model. Finally, we will list some open questions to help everyone have an overall understanding of this field.
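The curse of dimensionality mentioned above can be made concrete with a simple count: a tensor-product grid with a fixed number of points per axis needs exponentially many points as the dimension grows. The helper below is a minimal illustration. The function name and the choice of 10 points per axis are arbitrary.

```python
def grid_points(dimension, points_per_axis=10):
    """Number of nodes in a tensor-product grid on a d-dimensional cube."""
    return points_per_axis ** dimension


# Grid sizes for a few dimensions at 10 points per axis.
sizes = {d: grid_points(d) for d in (1, 2, 10, 100)}
```

At 10 points per axis, two dimensions need 100 nodes, ten dimensions need 10^10, and one hundred dimensions need 10^100, which is why grid-based approximation is hopeless in high dimensions.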
Topic：Using deep learning to solve high-dimensional control problems
Abstract：The rapid development of deep learning in recent years has provided us with powerful new tools for high-dimensional computation. Among them, back-propagation and stochastic gradient descent provide efficient algorithms for finding the optimal neural network. This report will first analyze neural networks from the perspective of control theory, discuss the similarities between finding the optimal neural network and solving an optimal control problem, and consider what these algorithms suggest for solving high-dimensional control problems. Building on this, we will present two works that use deep learning to solve high-dimensional control problems: (1) solving model-based high-dimensional stochastic control problems; (2) combining the variational form of backward stochastic differential equations to solve parabolic partial differential equations. Compared with traditional algorithms that are limited by the curse of dimensionality, these algorithms show huge computational advantages and greatly improve our ability to handle a large class of high-dimensional problems. Finally, we will discuss some unresolved issues in related directions to give everyone a better understanding of this field.
Time：2020-11-15 10:00 - 12:00
Time：2020-11-08 10:00 - 12:00
Time：2020-11-01 10:00 - 12:00