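Machine Learning and Scientific Applications

This joint seminar program aims to bring together high-quality teaching and research resources from China and abroad to create a world-class learning and research environment and platform for outstanding students and young faculty in China, helping them become leading young researchers in machine learning and related fields. The program's earlier web page is available here.

At this stage, the program mainly invites the most active researchers in machine learning, from China and abroad, to give systematic introductions to the latest progress and the frontier scientific questions in machine learning research. Topics include foundational research in machine learning, applications of machine learning to scientific problems, and applications of machine learning in industry. The seminar is held every two weeks, with two invited speakers per session.

Sessions are held online via Tencent Meeting; click here to download Tencent Meeting.

Organizing committee: Weinan E (鄂维南), Bin Dong (董彬), Weiguo Gao (高卫国), Zhongyi Huang (黄忠亿), Han Wang (王涵), Zhi-Qin John Xu (许志钦), Zhouwang Yang (杨周旺), Linfeng Zhang (张林峰).

Past talks: see Previous Events.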

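Time: 2021-11-27 9:30-11:30

Speaker: Yuhang Wang (王宇航)

Affiliations: DP Technology, Beijing

Title: Principles of Cryo-EM and Related Algorithms

Abstract: Cryo-electron microscopy (cryo-EM) is a very important experimental technique in structural biology and has made breakthrough progress over the past decade. It has greatly helped elucidate many biomedical problems, including the mechanism of action of the novel coronavirus (SARS-CoV-2). Its three founders were awarded the 2017 Nobel Prize in Chemistry. Compared with traditional structural biology techniques, cryo-EM can image a wider range of biological samples, and multiple structures can be resolved from the same batch of samples, giving a more complete understanding of how biological macromolecules function. This talk focuses on the fastest-growing branch of cryo-EM: single-particle analysis. I will introduce the principles and applications of this technique, as well as the related traditional and machine-learning-based algorithms. Although cryo-EM experimental techniques are gradually maturing, the accuracy and robustness of the associated data-analysis algorithms still leave much room for improvement. I hope this talk will spark your interest in cryo-EM so that together we can advance cryo-EM algorithms.

Bio:
Yuhang Wang (cryo-EM algorithm researcher at DP Technology) received his PhD in computational biophysics from the University of Illinois at Urbana-Champaign. His doctoral work used molecular dynamics methods to study the mechanisms underlying the functions of biological macromolecules, including membrane proteins such as ion channels and T-cell receptor proteins. A focus of his PhD research was combining cryo-EM measurement data with molecular dynamics simulation to determine macromolecular structures. As a postdoc at Caltech, he used cryo-electron tomography to study the structure and function of macromolecules in the cellular environment. His current work focuses on developing new cryo-EM data-analysis algorithms.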

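Speaker: Tong Zhu (朱通)

Affiliations: East China Normal University

Title: Neural-Network-Based Combustion Simulation of Hydrocarbon Fuels

Abstract: Hydrocarbon molecules are the main components of fuels, and understanding their combustion mechanisms is one of the key problems for simulating engine combustion and, in turn, advancing engine design. Limited by force-field accuracy, the widely used molecular mechanics methods still leave considerable room for improvement in the reliability of combustion-reaction simulations. Because quantum chemistry methods require large computational resources, using them directly to simulate hydrocarbon combustion is infeasible. In previous work, we developed a fragment-based quantum chemistry method, MFCC-combustion, that computes the energies and forces of the simulated system efficiently and accurately. Combined with molecular dynamics software and appropriate temperature and pressure control, this method enables ab initio molecular dynamics (AIMD) simulations of fuel combustion. Recently, using the Deep Potential model, we further improved the simulation efficiency over AIMD by about three orders of magnitude, enabling nanosecond-scale reactive dynamics simulations of hydrocarbon fuels. This method is expected to provide an efficient and accurate tool for studying hydrocarbon combustion mechanisms and for building and refining fundamental combustion databases.

Bio: Tong Zhu received his PhD in 2013 from the State Key Laboratory of Precision Spectroscopy at East China Normal University, was a visiting scholar at Academia Sinica in Taiwan from 2016 to 2018, and is currently an associate research professor at the School of Chemistry and Molecular Engineering, East China Normal University. His research uses quantum chemistry calculations and molecular dynamics simulations to study the structure and properties of complex chemical systems, including the interactions of metal ions with proteins and nucleic acids and the combustion reaction mechanisms of hydrocarbon fuels.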

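Meeting link: https://meeting.tencent.com/dm/p1q2HxwW3x4r

Tencent Meeting download link: https://meeting.tencent.com/download-center.html

Meeting ID: 329744341

Meeting password: 974114

Live stream: Bilibili live room of the Machine Learning Joint Training Program

Backup meeting ID (no password): 553 2421 7498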

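Time: 2021-11-13 9:30-11:30

Title: Machine-Learning-Assisted Scientific Computing: Engineering and Deployment

Speaker: Linfeng Zhang (张林峰)

Affiliations: DP Technology; AI for Science Institute, Beijing

Abstract: In recent years, machine learning, as a tool for representing high-dimensional complex functions, has provided new solutions to many difficult scientific computing problems, greatly expanding our ability to use computation for simulation, control, and design. This not only opens new possibilities for the underlying theory and algorithms, but also raises many new questions about engineering and real-world deployment. I will discuss these questions in three directions: 1. what engineering involves: scale engineering, data engineering, and performance engineering; 2. the collaboration practices of the DeepModeling open-source community and the DP Technology algorithm engineering team; 3. DP Technology's deployment practice in computation-oriented cloud infrastructure, drug design, and materials design.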

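Time: 2021-11-13 9:30-11:30

Title: From Next-Generation Biometrics to an AI Knowledge Database

Speakers: Cheng Tai (邰骋), Linpeng Tang (汤林鹏)

Affiliations: Moqi Technology (墨奇科技)

Abstract: Traditional databases and big-data systems mainly handle SQL structured data and semi-structured data, and on this basis have driven the revolution of the information industry. In recent years, however, complex data such as images, video, audio, text, and time series has kept growing; AI algorithms, represented by deep neural networks, can perform preliminary processing of such data and extract value from it, fueling the wave of AI development over the past decade. Yet we still lack a database that can handle such complex data in a unified way, which greatly increases the cost of taking AI from research to deployment. In this talk, we propose a new type of AI knowledge database that supports extracting intermediate representations, such as graphs and vectors, from complex data, and supports efficient storage and retrieval of these representations. We will discuss some key design points of this new database, as well as related open problems. As an industrial use case, using multi-scale image feature representations, a few-shot unsupervised learning framework, and efficient heterogeneous matching algorithms, we built the world's first billion-scale, accurate, reliable, annotation-free fingerprint database. We also combine image recognition with cryptography to build privacy-preserving biometric algorithms. With this new AI database, we hope to speed up the path from AI research to deployment across industries and to promote further development of AI algorithms.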
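The retrieval step mentioned above (finding stored vectors that are close to a query vector) can be illustrated with a minimal brute-force sketch. This is not the speakers' system: the cosine-similarity metric, the toy vectors, and the function names `cosine` and `top_k` are our own assumptions, and a production vector database would use an index rather than a linear scan.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def top_k(query, database, k):
    """Linear scan: return the k (id, score) pairs most similar to query, best first."""
    scored = [(key, cosine(query, vec)) for key, vec in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy "embeddings" standing in for features extracted from complex data.
db = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.9, 0.1, 0.0],
    "c": [0.0, 1.0, 0.0],
}
print(top_k([1.0, 0.0, 0.0], db, 2))
```

The linear scan costs one similarity evaluation per stored vector, which is exactly what approximate nearest-neighbor indexes are designed to avoid at billion scale.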

Time: 2021-10-30 9:30-11:30

Title: Deep Network Approximation: Achieving Arbitrary Error with Fixed Size

Speaker: Shijun Zhang (张仕俊)

Affiliations: National University of Singapore

Slides: download link

Homepage: https://shijunzhang.top/

Replay: https://www.bilibili.com/video/BV1pf4y1u79M

Abstract: This talk discusses a new type of simple feed-forward neural network that achieves the universal approximation property for all continuous functions with a fixed network size. The network is simple because it uses a simple, computable, continuous activation function σ built from a triangular-wave function and a softsign function. First, we prove that σ-activated networks with width 36d(2d + 1) and depth 11 can approximate any continuous function on a d-dimensional hypercube within an arbitrarily small error. Next, we show that classification functions arising from image and signal classification can be exactly represented by σ-activated networks with width 36d(2d + 1) and depth 12, provided there exist pairwise disjoint, closed, bounded subsets of R^d such that samples of the same class lie in the same subset.
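A minimal sketch of an activation of the kind described above, assuming the piecewise form σ(x) = |x − 2⌊(x + 1)/2⌋| for x ≥ 0 (a period-2 triangular wave with values in [0, 1]) and σ(x) = x/(|x| + 1) for x < 0 (softsign). This is our reading of the construction, not code from the talk; check the paper for the exact definition.

```python
import math


def sigma(x: float) -> float:
    """Assumed activation: triangular wave for x >= 0, softsign for x < 0."""
    if x >= 0:
        # Period-2 triangular wave: 0 at even integers, 1 at odd integers.
        return abs(x - 2 * math.floor((x + 1) / 2))
    # Softsign branch maps (-inf, 0) onto (-1, 0); both branches equal 0 at x = 0.
    return x / (abs(x) + 1)


for x in [-3.0, -1.0, 0.0, 0.5, 1.0, 1.5, 2.0]:
    print(x, sigma(x))
```

The periodic branch lets a single neuron take infinitely many oscillating values with bounded weights, while the softsign branch keeps σ continuous and strictly monotone on the negative axis.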

Time: 2021-10-30 9:30-11:30

Title: Deep Network Approximation: Error Characterization in Terms of Width and Depth

Speaker: Haizhao Yang (杨海钊)

Affiliations: Purdue University

Slides: download link

Replay: https://www.bilibili.com/video/BV1TL4y1q76Z

Homepage: https://haizhaoyang.github.io/

Abstract: Deep neural networks are a powerful tool in many applications in science, engineering, technology, and industry, especially for large-scale and high-dimensional learning problems. This talk focuses on the mathematical understanding of deep neural networks. In particular, we characterize a relation between the approximation properties of deep neural networks and function compositions. The approximation error of ReLU networks is given in terms of width and depth for various function spaces, such as spaces of polynomials, continuous functions, and smooth functions on a hypercube. Finally, to achieve a better approximation error, we introduce a new type of network called Floor-ReLU networks, in which each neuron is activated by either Floor or ReLU.
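To make "each neuron activated by either Floor or ReLU" concrete, here is a minimal sketch of one fully connected layer with a per-neuron choice of activation. The weights, biases, and function names are hypothetical, and the talk's approximation constructions are not reproduced.

```python
import math


def relu(x):
    return max(0.0, x)


def floor_act(x):
    return float(math.floor(x))


def floor_relu_layer(x, weights, biases, activations):
    """One fully connected layer; neuron j applies activations[j] (Floor or ReLU)."""
    out = []
    for w_row, b, act in zip(weights, biases, activations):
        pre = sum(wi * xi for wi, xi in zip(w_row, x)) + b  # affine pre-activation
        out.append(act(pre))
    return out


# Hypothetical 2-neuron layer: the first neuron uses Floor, the second uses ReLU.
W = [[1.0, 2.0], [-1.0, 0.5]]
b = [0.25, 0.0]
print(floor_relu_layer([0.5, 0.6], W, b, [floor_act, relu]))
```

Here the first pre-activation is 1.95 (floored to 1.0) and the second is -0.2 (clipped to 0.0). The Floor neurons are discontinuous, so such networks are not trained by plain gradient descent; they matter as approximation-theoretic objects.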

Time: 2021-10-17 9:30-11:30

Title: Embedding Principle of Loss Landscape of Deep Neural Networks

Speaker: Yaoyu Zhang (张耀宇)

Affiliations: Shanghai Jiao Tong University

Replay: https://www.bilibili.com/video/BV1x44y1x7Qg

Homepage: http://old.ins.sjtu.edu.cn/p/zhangyaoyu

Abstract: Understanding the structure of the loss landscape of deep neural networks (DNNs) is important. In this talk, we present an embedding principle: the loss landscape of a DNN "contains" all the critical points of all narrower DNNs. More precisely, we propose a critical embedding such that any critical point of a narrower DNN, e.g., a local or global minimum, can be embedded into a critical point/affine subspace of the target DNN with higher degeneracy while preserving the DNN output function. Note that this embedding structure of critical points holds for any training data, any differentiable loss function, and any differentiable activation function. This general structure of DNNs is starkly different from other nonconvex problems such as protein folding. Empirically, we find that a wide DNN is often attracted by highly degenerate critical points that are embedded from narrow DNNs. The embedding principle provides a new perspective for studying why wide DNNs are generally easy to optimize and unravels a potential implicit low-complexity regularization during training. Overall, this work provides a skeleton for studying the loss landscape of DNNs and its implications, through which a more exact and comprehensive understanding can be anticipated in the near future.
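One simple way to see how a narrower network can sit inside a wider one without changing its output is to split a hidden neuron: duplicate its input weight and divide its output weight between the two copies. The sketch below (a two-layer tanh network with scalar input; all values hypothetical) checks only that the output function is preserved; the talk's stronger statement, that critical points are mapped to critical points, is not verified here.

```python
import math


def two_layer(x, w, a):
    """f(x) = sum_k a_k * tanh(w_k * x) for a scalar input x."""
    return sum(ak * math.tanh(wk * x) for wk, ak in zip(w, a))


def split_neuron(w, a, k, alpha):
    """Embed a width-m network into width m + 1: copy neuron k's input weight and
    split its output weight into alpha * a_k and (1 - alpha) * a_k."""
    w_new = w + [w[k]]
    a_new = list(a)
    a_new[k] = alpha * a[k]
    a_new.append((1 - alpha) * a[k])
    return w_new, a_new


w, a = [0.7, -1.3], [2.0, 0.5]
w_wide, a_wide = split_neuron(w, a, k=0, alpha=0.3)
for x in [-1.0, 0.0, 2.0]:
    print(two_layer(x, w, a), two_layer(x, w_wide, a_wide))
```

Every choice of alpha gives the same output function, so a single narrow network corresponds to a whole one-parameter family of wide networks, which is one source of the degeneracy mentioned in the abstract.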

Time: 2021-10-17 9:30-11:30

Title: Global Loss Landscape of Neural Networks: What do We Know and What do We Not Know?

Speaker: Ruoyu Sun (孙若愚)

Affiliations: University of Illinois at Urbana-Champaign

Slides: download link

Replay: https://www.bilibili.com/video/BV1cq4y1G7Zn

Homepage: https://ruoyus.github.io/

Abstract: The recent success of neural networks suggests that their loss landscape is not too bad, but what concrete results do we know and not know about the landscape? In this talk, we present a few recent results on the landscape. First, non-linear neural nets can have sub-optimal local minima under mild assumptions (for arbitrary width, generic input data, and most activation functions). Second, wide networks do not have sub-optimal "basins" for any continuous activation, while narrow networks can have sub-optimal basins. We will present a simple 2D geometric object that is a basic component of the neural net landscape and that visually explains the above two results. Third, we show that for ReQU and ReLU networks, adding a proper regularizer can eliminate sub-optimal local minima and decreasing paths to infinity. Together, these results demonstrate that wide neural nets have a "nice landscape", but the meaning of "nice landscape" is more subtle than we expected. We will also mention the limitations of existing results, e.g., settings in which we do not know whether sub-optimal local minima exist. We will briefly discuss the relevance of the landscape results in two aspects: (1) they can help explain the training difficulty of narrow networks; (2) they can potentially help convergence analysis and the study of implicit regularization.

Time: 2021-9-25 9:30-11:30

Title: Some theoretical results on model-based reinforcement learning

Speaker: Mengdi Wang

Affiliations: Princeton University

Homepage: https://mwang.princeton.edu/

Abstract: We discuss some recent results on model-based methods for reinforcement learning (RL) in both online and offline problems. For the online RL problem, we discuss several model-based RL methods that adaptively explore an unknown environment and learn to act with provable regret bounds. In particular, we focus on finite-horizon episodic RL where the unknown transition law belongs to a generic family of models. We propose a model-based "value-targeted regression" RL algorithm based on the optimism principle: in each episode, we construct the set of models that are "consistent" with the data collected. The consistency criterion is the total squared error that a model incurs on the task of predicting values, as determined by the last value estimate along the transitions. The next value function is then chosen by solving the optimistic planning problem with the constructed set of models. We derive a regret bound for an arbitrary family of transition models, using the notion of the so-called Eluder dimension proposed by Russo & Van Roy (2014). Next we discuss batch-data (offline) reinforcement learning, where the goal is to predict the value of a new policy using data generated by some behavior policy (which may be unknown). We show that the fitted Q-iteration method with linear function approximation is equivalent to a model-based plug-in estimator. We establish that this model-based estimator is minimax optimal and that its statistical limit is determined by a form of restricted chi-square divergence between the two policies.

Time: 2021-9-25 9:30-11:30

Title: Policy Cover Guided Exploration in Model-free and Model-based Reinforcement Learning

Speaker: Wen Sun

Affiliations: Cornell University

Replay: https://www.bilibili.com/video/BV1hL4y1z7GN

Homepage: https://wensun.github.io/

Abstract: Existing RL methods that rely on random exploration techniques often fail to learn efficiently in environments where strategic exploration is needed. In this talk, we introduce a new concept called Policy Cover, which is an ensemble of learned policies. The policy cover encodes information about which parts of the state space have been explored, and this information can be used to further guide the exploration process. We show that this idea can be used in both model-free and model-based algorithmic frameworks. In particular, for model-free learning, we present the first policy gradient algorithm, Policy Cover Policy Gradient (PC-PG), that can explore and learn with polynomial sample complexity. For the model-based setting, we present an algorithm, Policy-Cover Model Learning and Planning (PC-MLP), that also learns and explores with polynomial sample complexity. For both approaches, we show that they are flexible enough to be used together with deep neural networks, and their deep versions achieve state-of-the-art performance on common benchmarks, including challenging exploration tasks such as reward-free Maze exploration.

Time: 2021-9-11 9:30-11:30

Title: In Search of Effective and Reproducible Clinical Imaging Biomarkers for Population Health and Oncology Applications of Screening, Diagnosis and Prognosis

Speaker: Le Lu (吕乐)

Affiliations: PhD, FIEEE, MICCAI Board Member, AE for TPAMI

Homepage: https://www.cs.jhu.edu/~lelu/

Abstract: This talk will first give an overview of our work on using deep learning to enable novel clinical workflows in two population health tasks: liver steatosis screening and quantitative reporting using conventional ultrasound, and osteoporosis screening via conventional X-ray imaging and "AI readers". These two tasks were generally considered infeasible for human readers, but, as shown by our scientific and clinical studies and peer-reviewed publications, they are suitable for AI readers. AI can be a supplementary and useful tool that assists physicians in cheaper, more convenient, and more precise patient management. Next, the main part of this talk describes a roadmap for three key problems in pancreatic cancer imaging: early screening, precise differential diagnosis, and deep prognosis for patient survival prediction. (1) Based on a new self-learning framework, we train the pancreatic ductal adenocarcinoma (PDAC) segmentation model using a larger number of patients, with a mix of annotated and unannotated venous or multi-phase CT images. Pseudo annotations are generated on unannotated images by combining two teacher models with different PDAC segmentation specialties, and can be further refined by a teaching assistant model that identifies associated vessels around the pancreas. Our approach makes robust large-scale PDAC screening from multi-institutional, multi-phase, partially annotated CT scans technically feasible. (2) We propose a holistic segmentation-mesh classification network (SMCN) that provides patient-level diagnosis by fully utilizing geometry and location information. SMCN learns the pancreas and mass segmentation task and builds an anatomical-correspondence-aware organ mesh model by progressively deforming a pancreas prototype on the raw segmentation mask. Our results are comparable to a multimodality clinical test that combines clinical, imaging, and molecular testing for the clinical management of patients with cysts.
(3) Accurate preoperative prognosis of resectable PDACs for personalized treatment is highly desired in clinical practice. We present a novel deep neural network for survival prediction of resectable PDAC patients, the 3D Contrast-Enhanced Convolutional Long Short-Term Memory network (CE-ConvLSTM), which derives tumor attenuation signatures from CE-CT imaging studies. Our framework can significantly improve prediction performance over existing state-of-the-art survival analysis methods. This deep tumor signature evidently adds value as a predictive biomarker when combined with the existing clinical staging system.

Time: 2021-8-28 9:30-11:30

Title: Deep Learning for Inverse Problems

Speaker: Lexing Ying (应乐兴)

Affiliations: Stanford University

Replay: https://www.bilibili.com/video/BV1Qq4y1K71Q

Homepage: https://web.stanford.edu/~lexing/

Abstract: This talk is about some recent progress on solving inverse problems using deep learning. Compared with traditional machine learning problems, inverse problems are often limited by the size of the training dataset. We show how to overcome this issue by incorporating mathematical analysis and physics into the design of neural network architectures. We first describe neural network representations of pseudodifferential operators and Fourier integral operators. We then discuss applications including electrical impedance tomography, optical tomography, inverse acoustic/EM scattering, seismic imaging, and travel-time tomography.

Time: 2021-8-28 9:30-11:30

Title: Self-supervised Deep Learning for Solving Inverse Problems in Imaging

Speaker: Hui Ji (纪辉)

Affiliations: National University of Singapore

Replay: https://www.bilibili.com/video/BV1uU4y177VF

Homepage: https://blog.nus.edu.sg/matjh/

Abstract: Deep learning has become a prominent tool for solving many inverse problems in imaging science. Most existing state-of-the-art (SOTA) solutions are built on supervised learning, which requires a large-scale dataset of degraded/ground-truth image pairs. In recent years, driven by practical needs, there has been increasing interest in studying deep learning methods with limited data resources, which is particularly significant for imaging in science and medicine. This talk focuses on self-supervised deep learning for solving inverse imaging problems, which assumes that no training samples are available. By examining deep learning from the perspective of Bayesian inference, we will present several results and techniques on self-supervised learning for the MMSE (minimum mean squared error) estimator. Building on these techniques, we will show that in several applications the resulting dataset-free deep learning methods perform very competitively compared with their SOTA supervised counterparts. While the demonstrations only cover image denoising, compressed sensing, and phase retrieval, the presented techniques and methods are quite general and can be used to solve many other inverse imaging problems.

Time: 2021-8-14 9:30-11:30

Title: Barron Spaces

Speaker: Jinchao Xu

Affiliations: Pennsylvania State University

Replay: https://www.bilibili.com/video/BV1Yq4y1Q7n2

Homepage: http://www.personal.psu.edu/jxx1/

Abstract: In this talk, we will present some recent results related to the Barron space studied by E et al. in 2019. For the ReLU activation function, we give an equivalent characterization of the corresponding Barron space in terms of the convex hull of an appropriate dictionary. This characterization enables a generalization of the notion of Barron space to neural networks with general activation functions, and even to general dictionaries of functions. We provide an explicit representation of some Barron norms that are of particular interest to the theory of neural networks, specifically those corresponding to a dictionary of decaying Fourier modes and to the dictionary of shallow ReLU^k networks. Next, we present optimal estimates of approximation rates, metric entropy, and Kolmogorov and Gelfand n-widths of the Barron space unit ball with respect to L^2 for ReLU^k activation functions and the dictionary of decaying Fourier modes. These results solve several open problems concerning the precise approximation properties of these spaces. If time allows, we will also present recent results on approximation rates and metric entropies for sigmoidal and ReLU Barron spaces with respect to L^p for p > 2. This talk is based on joint work with Jonathan Siegel.

Time: 2021-8-14 9:30-11:30

Title: Convergence analysis for the gradient descent optimization method in the training of artificial neural networks with ReLU activation for piecewise linear target functions

Speaker: Arnulf Jentzen

Affiliations:
Applied Mathematics: Institute for Analysis and Numerics, Faculty of Mathematics and Computer Science, University of Muenster, Germany;
School of Data Science and Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong, Shenzhen, China

Slides: download link

Replay: https://www.bilibili.com/video/BV1DU4y177A1

Homepage: http://www.ajentzen.de/

Abstract: Gradient descent (GD) type optimization methods are the standard instrument for training artificial neural networks (ANNs) with rectified linear unit (ReLU) activation. Despite the great success of GD type optimization methods in numerical simulations for training ANNs with ReLU activation, it remains an open problem, even in the simplest situation of the plain vanilla GD optimization method with random initializations and ANNs with one hidden layer, to prove (or disprove) the conjecture that the risk of the GD optimization method converges to zero in the training of ANNs with ReLU activation as the width of the ANNs, the number of independent random initializations, and the number of GD steps increase to infinity. In this talk we prove this conjecture in the special situation where the probability distribution of the input data is absolutely continuous with respect to the continuous uniform distribution on a compact interval and where the target function under consideration is piecewise linear.

Time: 2021-7-31 9:30-11:30

Title: Learning and Learning to Solve PDEs

Speaker: Bin Dong (董彬), Peking University

Slides: download link

Replay: https://www.bilibili.com/video/BV1dh411z7JA

Homepage: https://bicmr.pku.edu.cn/~dongbin/

Abstract: Deep learning continues to dominate machine learning and has been successful in computer vision, natural language processing, and other areas. Its impact has now expanded to many research areas in science and engineering. In this talk, I will mainly focus on some recent impacts of deep learning on computational mathematics. I will present our recent work on bridging deep neural networks with numerical differential equations. On the one hand, I will show how to design transparent deep convolutional networks that uncover hidden PDE models from observed dynamical data. On the other hand, I will present our recent preliminary attempts to combine insights from numerical PDEs and machine learning to design data-driven solvers for PDEs.
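The connection between convolution filters and differential operators can be illustrated with a classical fact: correlating a grid function with the stencil [1, -2, 1]/h^2 approximates its second derivative (exactly for cubic polynomials). This is our own minimal illustration of that connection, not the speaker's network; the grid, the test function, and the helper name `conv1d_valid` are assumptions.

```python
def conv1d_valid(signal, kernel):
    """'Valid' 1D correlation: output[i] = sum_j kernel[j] * signal[i + j]."""
    n, m = len(signal), len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(m)) for i in range(n - m + 1)]


h = 0.1
xs = [i * h for i in range(11)]
u = [x ** 3 for x in xs]                   # u(x) = x^3, so u''(x) = 6x
stencil = [1 / h**2, -2 / h**2, 1 / h**2]  # central second-difference filter
d2u = conv1d_valid(u, stencil)             # approximates u'' at the interior points
exact = [6 * x for x in xs[1:-1]]
print(max(abs(p - q) for p, q in zip(d2u, exact)))
```

A learnable convolution whose filter is constrained to behave like such a stencil can be read off as a differential operator, which is what makes a convolutional model of dynamics interpretable as a PDE.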

Time: 2021-7-31 9:30-11:30

Title: Neural Operator: Learning Maps Between Function Spaces

Speaker: Zongyi Li (李宗宜), California Institute of Technology

Slides: download link

Replay: https://www.bilibili.com/video/BV1MA411P7HU

Homepage: https://zongyi-li.github.io/

Abstract: The classical development of neural networks has primarily focused on learning mappings between finite-dimensional Euclidean spaces or finite sets. We propose a generalization of neural networks tailored to learn operators that map between infinite-dimensional function spaces. We formulate the approximation of operators as a composition of a class of linear integral operators and nonlinear activation functions, so that the composed operator can approximate complex nonlinear operators. We prove a universal approximation theorem for our construction. The proposed neural operators are resolution-invariant: they share the same network parameters across different discretizations of the underlying function spaces and can be used for zero-shot super-resolution. Numerically, the proposed models outperform existing machine-learning-based methodologies on Burgers' equation, Darcy flow, and the Navier-Stokes equation, while being several orders of magnitude faster than conventional PDE solvers.
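Below is a minimal 1D sketch in the spirit of a Fourier-type neural operator layer: the linear integral operator is taken to be a convolution parameterized by a truncated set of Fourier modes, followed by a pointwise linear term and a ReLU. The naive DFT, the weight values, and the helper names are our assumptions for illustration, not the speaker's code. Because the weights act on Fourier modes rather than grid points, the same weights can be applied on a coarse and a fine grid.

```python
import cmath
import math


def dft(u):
    """Naive discrete Fourier transform, O(n^2); fine for a demo."""
    n = len(u)
    return [sum(u[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n)) for k in range(n)]


def idft(c):
    """Inverse of dft."""
    n = len(c)
    return [sum(c[k] * cmath.exp(2j * math.pi * k * m / n) for k in range(n)) / n for m in range(n)]


def spectral_conv(u, weights):
    """Multiply Fourier mode k by weights[k] (and its mirror n - k by the conjugate);
    drop all higher modes. Works on any grid with len(weights) <= (n + 1) // 2.
    Returns the real part, which is the full result for real u and real weights[0]."""
    n = len(u)
    assert len(weights) <= (n + 1) // 2, "grid too coarse for the number of modes"
    coeffs = dft(u)
    out = [0j] * n
    for k, w in enumerate(weights):
        out[k] = w * coeffs[k]
        if k > 0:
            out[n - k] = w.conjugate() * coeffs[n - k]
    return [z.real for z in idft(out)]


def fourier_layer(u, weights, local_weight):
    """One layer: v(x) = relu(local_weight * u(x) + (K u)(x))."""
    ku = spectral_conv(u, weights)
    return [max(0.0, local_weight * a + b) for a, b in zip(u, ku)]


# The same (hypothetical) weights applied on a 16-point and a 32-point grid.
weights = [0.0, 2.0 + 0.0j, 0.5j]
coarse = [math.sin(2 * math.pi * m / 16) for m in range(16)]
fine = [math.sin(2 * math.pi * m / 32) for m in range(32)]
out_coarse = spectral_conv(coarse, weights)
out_fine = spectral_conv(fine, weights)
print(max(abs(out_coarse[m] - out_fine[2 * m]) for m in range(16)))
```

The printed discrepancy between the two grids at their shared points is at rounding level. A practical implementation would use an FFT and multi-channel weight matrices per mode.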

Time: 2021-7-17 9:30-11:30

Title: Frequency Principle

Speaker: Zhi-Qin John Xu (许志钦), Shanghai Jiao Tong University

Slides: download link

Replay: https://www.bilibili.com/video/BV1Yy4y1T7M8

Homepage: https://ins.sjtu.edu.cn/people/xuzhiqin/

Abstract: In this talk, I will introduce the frequency principle (F-Principle) in detail, including experiments and theory. I will also connect the F-Principle with traditional iterative methods, such as the Jacobi method, to understand the training of neural networks from the perspective of numerical analysis. Then, I will use some examples to show how the F-Principle benefits the design of neural networks. Finally, I will talk about some open questions about the F-Principle.

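Time: 2021-7-3 9:30-11:30

Title: Deep-Learning-Based Molecular Dynamics Simulation

Speaker: Han Wang (王涵), Institute of Applied Physics and Computational Mathematics, Beijing

Slides: download link

Abstract: Molecular dynamics simulation requires an accurate description of interatomic interactions (the potential energy function), yet one faces a dilemma: first-principles methods are accurate but expensive, while empirical potentials are fast but of limited accuracy. In this talk we discuss solutions from two angles: constructing the potential and generating the data. For potential construction, we introduce the Deep Potential method, an accurate representation of first-principles potential energy functions. For data generation, we introduce DP-GEN, a concurrent learning scheme that automatically generates a minimal training dataset satisfying a prescribed accuracy requirement. Compared with empirical potentials, DP-GEN opens up the possibility of continually improving Deep Potential models by exploring configuration space and chemical space. In the last part of the talk, we present an optimized implementation of the Deep Potential method for heterogeneous CPU+GPU supercomputers. On the Summit supercomputer, this implementation reached a peak performance of 91 PFLOPS in double precision and can complete nanosecond-scale molecular dynamics simulations with first-principles accuracy within one day, more than 1000 times faster than the previous baseline. In our work, the combination of physical models, deep learning, and high-performance computing provides a powerful simulation tool for major scientific discoveries.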
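The data-selection step of a concurrent learning loop like DP-GEN is often described as follows: train an ensemble of models, measure how much they disagree on the forces of each explored configuration, and send for first-principles labeling only configurations whose disagreement falls between two trust levels. Below is a minimal sketch under that assumption; the function names, the thresholds `lo`/`hi`, and the category names are illustrative, not DP-GEN's actual API.

```python
import math


def max_force_deviation(force_predictions):
    """force_predictions[m][i] = (fx, fy, fz) predicted by model m for atom i.
    Returns the maximum over atoms of the ensemble standard deviation of the
    force vector, sqrt(mean_m |F_m - mean(F)|^2)."""
    n_models = len(force_predictions)
    n_atoms = len(force_predictions[0])
    worst = 0.0
    for i in range(n_atoms):
        mean = [sum(force_predictions[m][i][d] for m in range(n_models)) / n_models for d in range(3)]
        var = sum(
            sum((force_predictions[m][i][d] - mean[d]) ** 2 for d in range(3))
            for m in range(n_models)
        ) / n_models
        worst = max(worst, math.sqrt(var))
    return worst


def classify(deviation, lo, hi):
    """Below lo: models agree, no new label needed. Between lo and hi: worth
    labeling. At or above hi: likely unphysical, discard."""
    if deviation < lo:
        return "accurate"
    if deviation < hi:
        return "candidate"
    return "failed"


# Two hypothetical models, two atoms: the models disagree only on atom 0.
preds = [[(1.0, 0.0, 0.0), (0.0, 0.5, 0.0)], [(1.2, 0.0, 0.0), (0.0, 0.5, 0.0)]]
dev = max_force_deviation(preds)
print(dev, classify(dev, lo=0.05, hi=0.15))
```

Labeling only the "candidate" configurations is what keeps the training set small: configurations the ensemble already agrees on add little information.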

Time: 2021-7-3 9:30-11:30

Title: DeePKS: a machine learning assisted electronic structure model

Speaker: Linfeng Zhang (张林峰), Beijing Institute of Big Data Research and DP Technology

Slides: download link

Homepage: https://cn.linkedin.com/in/linfeng-zhang-%E5%BC%A0%E6%9E%97%E5%B3%B0-312242a8

Abstract: We introduce a general machine-learning-based framework for building an accurate and widely applicable energy functional within the framework of generalized Kohn-Sham density functional theory. In particular, we develop a way of training self-consistent models that can take large datasets from different systems and with different kinds of labels. We demonstrate that the functional resulting from this training procedure, with the efficiency of cheap density functional models, gives chemically accurate predictions of energy, force, dipole, and electron density for a large class of molecules.

Time: 2021-6-19 9:30-11:30

Title: Modelling Temporal Data: from RNNs to CNNs

Speakers: Zhong Li, Qianxiao Li

Replay: https://www.bilibili.com/video/BV1Wq4y1L7Cg

Homepage: https://blog.nus.edu.sg/qianxiaoli/

Abstract: There are several competing models in deep learning for modelling input-output relationships in temporal data: recurrent neural networks (RNNs), convolutional neural networks (CNNs), transformers, etc. In recent work, we study the approximation properties and optimization dynamics of RNNs when applied to learn temporal dynamics. We consider the simple but representative setting of using continuous-time linear RNNs to learn from data generated by linear relationships. Mathematically, the latter can be understood as a sequence of linear functionals. We prove a universal approximation theorem for such linear functionals and characterize the approximation rate. Moreover, we perform a fine-grained dynamical analysis of training linear RNNs by gradient methods. A unifying theme uncovered is the non-trivial effect of memory, a notion that can be made precise in our framework, on both approximation and optimization: when there is long-term memory in the target, it takes a large number of neurons to approximate it, and the training process suffers from severe slowdowns. In particular, both of these effects become exponentially more pronounced with increasing memory, a phenomenon we call the "curse of memory". These analyses represent a basic step towards a concrete mathematical understanding of new phenomena arising in learning temporal relationships using recurrent architectures.
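The link between linear RNNs and linear functionals can be checked numerically in discrete time (the talk works in continuous time). With h_t = W h_{t-1} + U x_t and y_t = c·h_t, the output is a convolution of the input with the memory kernel ρ(s) = cᵀ W^s U. The matrices below are hypothetical; the sketch only verifies this identity.

```python
def matvec(M, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]


def run_linear_rnn(W, U, c, xs):
    """h_t = W h_{t-1} + U x_t with h_{-1} = 0; y_t = c . h_t; scalar inputs x_t."""
    h = [0.0] * len(U)
    ys = []
    for x in xs:
        h = [a + u * x for a, u in zip(matvec(W, h), U)]
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))
    return ys


def memory_kernel(W, U, c, length):
    """rho(s) = c^T W^s U for s = 0, ..., length - 1."""
    rho, v = [], list(U)
    for _ in range(length):
        rho.append(sum(ci * vi for ci, vi in zip(c, v)))
        v = matvec(W, v)
    return rho


def convolve_with_kernel(rho, xs):
    """y_t = sum_{s=0}^{t} rho(s) * x_{t-s}."""
    return [sum(rho[s] * xs[t - s] for s in range(t + 1)) for t in range(len(xs))]


W = [[0.9, 0.0], [0.1, 0.5]]
U = [1.0, 0.5]
c = [1.0, -1.0]
xs = [1.0, 0.0, 2.0, -1.0, 0.5]
print(run_linear_rnn(W, U, c, xs))
print(convolve_with_kernel(memory_kernel(W, U, c, len(xs)), xs))
```

Since ρ(s) involves powers of W, how long the RNN can remember is governed by the eigenvalues of W; a target whose memory decays slowly forces those eigenvalues toward modulus 1, which gives some intuition for the approximation and optimization difficulties described above.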

We also study the approximation properties of convolutional architectures applied to time series modelling. Similar to the recurrent setting, parallel results on approximation efficiency are derived for convolutional architectures, with WaveNet being a prime example. Our results reveal that in this new setting, the approximation efficiency is characterized not only by memory but also by additional fine structures in the target relationship. This leads to a novel definition of spectrum-based regularity that measures the complexity of temporal relationships under the convolutional approximation scheme. These analyses provide a foundation for understanding the differences between architectural choices for temporal modelling, with theoretically grounded guidance for practical applications.
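WaveNet-style models stack causal convolutions with dilations 1, 2, 4, ..., so the number of past steps each output can see grows exponentially with depth (linearly without dilation). A minimal sketch, assuming kernel size 2 and all-one weights for the impulse check; the helper names are our own.

```python
def dilated_causal_conv(x, weights, dilation):
    """y[t] = sum_j weights[j] * x[t - j * dilation], treating x[t] = 0 for t < 0."""
    return [
        sum(w * x[t - j * dilation] for j, w in enumerate(weights) if t - j * dilation >= 0)
        for t in range(len(x))
    ]


def receptive_field(kernel_size, dilations):
    """Number of input steps (including the current one) that one output can see."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def measured_receptive_field(dilations, length=64):
    """Push a unit impulse through the stack (all weights 1) and count affected outputs."""
    x = [1.0] + [0.0] * (length - 1)
    for d in dilations:
        x = dilated_causal_conv(x, [1.0, 1.0], d)
    return sum(1 for v in x if v != 0.0)


for depth in range(1, 6):
    dilations = [2 ** level for level in range(depth)]
    print(depth, receptive_field(2, dilations), measured_receptive_field(dilations))
```

With depth L the receptive field is 2^L, compared with L + 1 for the same stack without dilation.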

Time: 2021-6-5 9:30-11:30

Title: Implicit biases of SGD for neural network models

Speaker: Lei Wu (Princeton University)

Replay: https://www.bilibili.com/video/BV1xv411V7v3

Slides: download link

Homepage: https://leiwu0.github.io/index.html

Abstract: Understanding the implicit biases of optimization algorithms is one of the core problems in theoretical machine learning. This refers to the fact that, even without any explicit regularization to avoid overfitting, the dynamics of an optimizer is itself biased toward solutions that generalize well. This talk introduces recent progress in understanding the implicit bias of stochastic gradient descent (SGD) for neural network models. First, we consider gradient descent flow, i.e., SGD with an infinitesimal learning rate, for two-layer neural networks. In particular, we will see how the implicit bias is affected by the extent of over-parameterization. Then, we turn to SGD with a finite learning rate. The influence of the learning rate and the batch size will be studied from the perspective of dynamical stability. The concept of uniformity is introduced, which, together with flatness, characterizes whether a global minimum is accessible to a particular SGD. This analysis shows that learning rate and batch size play different roles in selecting global minima. Extensive empirical results correlate well with the theoretical findings.
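The dynamical-stability argument has a one-line core for plain GD: on a quadratic with curvature (sharpness) a, the GD map is x ↦ (1 − lr·a)x, which converges only if lr·a < 2, so GD with learning rate lr can only stay at minima whose sharpness is at most 2/lr. The sketch below checks this on a 1D quadratic; it covers only the GD part of the argument, not the SGD-specific role of batch size, and the numbers are our own.

```python
def gd_on_quadratic(sharpness, lr, x0=1.0, steps=100):
    """Run plain GD on f(x) = sharpness * x**2 / 2 and return the final iterate."""
    x = x0
    for _ in range(steps):
        x -= lr * sharpness * x  # f'(x) = sharpness * x
    return x


lr = 0.1  # stability threshold: sharpness <= 2 / lr = 20
for sharpness in [5.0, 19.0, 21.0]:
    print(sharpness, abs(gd_on_quadratic(sharpness, lr)))
```

Sharpness 5 and 19 converge (the latter slowly, with an oscillating sign), while sharpness 21 diverges; a larger learning rate therefore filters out sharper minima.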

Time: 2021-6-5 9:30-11:30

Title: The landscape-dependent annealing strategy in machine learning: How Stochastic-Gradient-Descent finds flat minima

Speaker: Yuhai Tu (IBM T. J. Watson Research Center)

Replay: https://www.bilibili.com/video/BV1v64y1R7Xx

Slides: download link

Homepage: https://researcher.watson.ibm.com/researcher/view.php?person=us-yuhai

Abstract: Despite the tremendous success of the Stochastic Gradient Descent (SGD) algorithm in deep learning, little is known about how SGD finds "good" solutions (low generalization error) in the high-dimensional weight space. In this talk, we discuss our recent work [1,2] on establishing a theoretical framework based on nonequilibrium statistical physics to understand the SGD learning dynamics, the loss function landscape, and their relation. Our study shows that SGD dynamics follows a low-dimensional drift-diffusion motion in the weight space and that the loss function is flat, with large values of flatness (inverse of curvature), in most directions. Furthermore, our study reveals a robust inverse relation between the weight variance in SGD and the landscape flatness, opposite to the fluctuation-response relation in equilibrium systems. We develop a statistical theory of SGD based on properties of the ensemble of minibatch loss functions and show that the noise strength in SGD depends inversely on the landscape flatness, which explains the inverse variance-flatness relation. Our study suggests that SGD serves as an "intelligent" annealing strategy where the effective temperature self-adjusts according to the loss landscape in order to find the flat minimum regions that contain generalizable solutions. Finally, we discuss an application of these insights to efficiently reducing catastrophic forgetting in sequential multi-task learning.

References:
1. Y. Feng and Y. Tu, "The inverse variance-flatness relation in Stochastic-Gradient-Descent is critical for finding flat minima", PNAS 118 (9), 2021.
2. Y. Feng and Y. Tu, "Phases of learning dynamics in artificial neural networks: in the absence and presence of mislabeled data", Machine Learning: Science and Technology (MLST), April 7, 2021. https://iopscience.iop.org/article/10.1088/2632-2153/abf5b9/pdf

Time: 2021-6-5 9:30-11:30

Title: Stochastic gradient descent for noise with ML-type scaling

Speaker: Stephan Wojtowytsch (Princeton University)

Replay: https://www.bilibili.com/video/BV1ky4y137ou

Slides: download link

Homepage: https://www.swojtowytsch.com/

Abstract: In the literature on stochastic gradient descent (SGD), there are two types of convergence results: (1) SGD finds minimizers of convex objective functions, and (2) SGD finds critical points of smooth objective functions. Classical results are obtained under the assumptions that the stochastic noise is L^2-bounded and that the learning rate decays to zero at a suitable speed. We show that, if the objective landscape and noise possess certain properties reminiscent of deep learning problems, then we can obtain global convergence guarantees of the first type under assumptions of the second type for a fixed (small, but positive) learning rate. The convergence is exponential, but with a large random coefficient. If the learning rate exceeds a certain threshold, we discuss minimum selection by studying the invariant distribution of a continuous-time SGD model. We show that at a critical threshold, SGD prefers minimizers where the objective function is "flat" in a precise sense.
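A toy illustration of why noise that vanishes at the minimizer permits a fixed learning rate: in realizable least squares every per-sample gradient is zero at w*, so SGD with a constant step size contracts toward w* exponentially. This is our own example of the general phenomenon, not the talk's assumptions or proof; the data and step size are hypothetical.

```python
import random


def sgd_interpolating_least_squares(a, w_star, lr, steps, seed=0):
    """SGD with a fixed learning rate on f(w) = mean_i (a_i * w - b_i)**2, where
    b_i = a_i * w_star, so every sample is fit exactly at w_star and the
    stochastic gradient noise vanishes there."""
    rng = random.Random(seed)
    b = [ai * w_star for ai in a]
    w = 0.0
    for _ in range(steps):
        i = rng.randrange(len(a))
        grad = 2 * a[i] * (a[i] * w - b[i])  # gradient of the i-th sample loss
        w -= lr * grad
    return w


a = [0.5, 0.8, 1.0, 1.5]
print(sgd_interpolating_least_squares(a, w_star=3.0, lr=0.1, steps=500))
```

Each step multiplies the error w − w* by 1 − 2·lr·a_i², which lies in (0, 1) for every sample here, so the iterate converges whichever samples are drawn. With noise that does not vanish at w*, a fixed learning rate would instead leave SGD fluctuating around the minimizer.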

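Time: 2021-5-22 9:30-11:30

Title: Learning Like a Child: Very Large-Scale Multimodal Pre-trained Models

Speaker: Ji-Rong Wen (文继荣)

Replay: https://www.bilibili.com/video/BV1Z54y1G7du

Abstract: The embodiment revolution in cognitive science has brought a new view of how meaning is understood from language: the ability to think and to use language is the product of cooperation between our body and our brain. The body involves many modalities, including vision, hearing, smell, touch, and motor nerves. Human children learn language in a multimodal environment, and this is exactly what current AI lacks. This talk presents some of our work on cross-modal understanding of language and images. Starting from the relationship between vision and language, we used tens of millions to hundreds of millions of image-text pairs from the Internet and self-supervised tasks to build WuDao·WenLan (悟道·文澜), currently the largest general-purpose Chinese image-text pre-trained model, as a first exploration of the possibility of AI learning language in a multimodal environment. By analyzing how language learning changes from the unimodal to the multimodal setting, we find some phenomena closely related to human cognition.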

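Time: 2021-5-22 9:30-11:30

Title: Knowledge-Guided Pre-trained Language Models

Speaker: Zhiyuan Liu (刘知远)

Replay: https://www.bilibili.com/video/BV1MK4y1G7Eb

Abstract: In recent years, deep learning has become a key technology in natural language processing; in particular, pre-trained language models since 2018 have significantly improved the overall performance of natural language processing. As a typical data-driven approach, deep learning, represented by pre-trained language models, still faces problems such as weak interpretability and poor robustness. Incorporating the vast linguistic and world knowledge accumulated by humans into models is an important direction for improving deep learning, but it also faces many challenges. This talk systematically presents recent progress and trends in knowledge-guided pre-trained language models.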

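Time: 2021-5-9 9:30-11:30

Title: An Introduction to Reinforced Dynamics

Speaker: Han Wang (王涵)

Slides: download link

Abstract: This talk introduces the basic concepts of molecular dynamics simulation and its fundamental difficulty: the sampling problem. We briefly introduce two classes of enhanced sampling methods and their limitations and challenges. In particular, the talk presents in detail our solution to the enhanced sampling problem: reinforced dynamics, a deep-learning-based enhanced sampling method. Finally, we show the performance of reinforced dynamics in protein structure prediction.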

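Time: 2021-5-9 9:30-11:30

Title: Flow Models: A Computational Physics Perspective

Speaker: Lei Wang (王磊)

Bilibili: https://www.bilibili.com/video/BV1gp4y1t7ND

Slides: download link

Abstract: Drawing on some personal research experience, this talk introduces scientific questions and scientific applications related to flow-based generative models. From the perspective of computational physics, we will see connections between flow models and optimal transport theory, fluid mechanics, symplectic algorithms, the renormalization group, and Monte Carlo computation.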
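The core of a flow-based generative model is the change-of-variables formula: if x = f(z) with f invertible and z drawn from a simple base density, then log p_x(x) = log p_z(f⁻¹(x)) − Σ_l log|det J_l|, summed over the stacked layers. A minimal sketch with 1D affine layers (our choice; practical flows use richer invertible layers such as coupling layers), checked against the closed-form Gaussian density the stack produces.

```python
import math


def std_normal_logpdf(z):
    return -0.5 * z * z - 0.5 * math.log(2 * math.pi)


class AffineFlow:
    """Invertible 1D map x = shift + scale * z (scale != 0)."""

    def __init__(self, shift, scale):
        self.shift, self.scale = shift, scale

    def inverse(self, x):
        return (x - self.shift) / self.scale

    def log_abs_det_jacobian(self):
        return math.log(abs(self.scale))


def flow_logpdf(x, layers):
    """Layers are applied to z in order layers[0], layers[1], ...; to evaluate the
    density, invert from the last layer back to z, subtracting each log|det J|."""
    total = 0.0
    for layer in reversed(layers):
        x = layer.inverse(x)
        total -= layer.log_abs_det_jacobian()
    return std_normal_logpdf(x) + total


# x = -3 + 0.5 * (1 + 2 z) = z - 2.5, so this stack should give N(-2.5, 1).
stacked = [AffineFlow(1.0, 2.0), AffineFlow(-3.0, 0.5)]
print(flow_logpdf(0.7, stacked))
```

Exact log-likelihoods of this kind are what connect flow models to sampling and free-energy computations in statistical physics.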

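Time: 2021-4-17 10:30-11:30

Title: Understanding the Training Process of Neural Networks

Speaker: Zhi-Qin John Xu (许志钦), Shanghai Jiao Tong University

Bilibili: https://www.bilibili.com/video/BV1Rh411U7xH

Slides: download link

Abstract: From the approximation-theory perspective alone, an over-parameterized neural network can have infinitely many solutions that minimize the training error. In actual training, however, neural networks seem to always find solutions that generalize well. To understand how a neural network learns a class of well-generalizing solutions out of infinitely many possibilities, we need to understand the training process by which it finds a solution. In this talk, I will present some progress on the training process of neural networks, such as how complexity changes during training, frequency behavior, and how initialization affects training. Finally, I will discuss some open problems and explore trends in understanding neural networks through their training process.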

Time: 2021-04-17 9:30-10:30

Title: Machine Learning and Dynamical Systems

Speaker: Li Qianxiao (李千骁), National University of Singapore

Bilibili: https://www.bilibili.com/video/BV1jh411S7ds

Slides: download link

Abstract: In this talk, we discuss some recent work on the connections between machine learning and dynamical systems. These fall broadly into three categories, namely machine learning via, for, and of dynamical systems; here we focus on the first two. In the direction of machine learning via dynamical systems, we introduce a dynamical approach to deep learning theory, with particular emphasis on its connections with control theory. In the reverse direction, machine learning for dynamical systems, we discuss the approximation and optimization theory of learning input-output temporal relationships using recurrent neural networks, with the goal of highlighting key new phenomena that arise when learning in dynamic settings. If time permits, we will also discuss some applications of dynamical systems to the analysis of optimization algorithms commonly used in machine learning.

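Time: 2021-04-03 10:30-11:30

Title: Neural Networks and High-Dimensional Function Approximation

Speaker: Wu Lei (吴磊), Princeton University

Bilibili: https://www.bilibili.com/video/BV1R64y1m7fJ

Homepage: https://leiwu0.github.io/index.html

Abstract: In recent years, deep learning methods based on neural network models have achieved unprecedented success in many fields, such as computer vision and scientific computing. From the perspective of approximation theory, these successes rely on the powerful ability of neural networks to approximate high-dimensional functions. Yet traditional methods inevitably suffer from the curse of dimensionality when approximating high-dimensional functions. Does this mean that neural networks can, in some sense, avoid the curse of dimensionality? If so, what is the underlying mechanism? We will discuss these questions through three models: kernel methods, two-layer neural networks, and deep residual networks. In particular, we will characterize the space of high-dimensional functions that each model can approximate. Finally, we will list some open problems to give the audience an overall picture of the field.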
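A back-of-the-envelope illustration of the curse of dimensionality, added here and not part of the talk: a tensor-product grid that keeps a fixed number of nodes per axis, say 10 (a spacing of about 0.1 on the unit cube), needs 10^d nodes in dimension d, so the cost of grid-based approximation grows exponentially with d. The function name is a hypothetical choice.

```python
def tensor_grid_size(points_per_axis: int, dim: int) -> int:
    """Number of nodes in a tensor-product grid on [0, 1]^dim
    with `points_per_axis` nodes along every axis."""
    return points_per_axis ** dim

# Keep 10 nodes per axis fixed while the dimension grows.
for dim in (1, 2, 5, 10, 20):
    print(f"dim={dim:2d}: {tensor_grid_size(10, dim):,} nodes")
```

Already at dim = 20 the grid has 10^20 nodes, far beyond what can be stored or evaluated; this is the kind of exponential growth that the talk asks whether neural networks can avoid.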

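Time: 2021-04-03 9:30-10:30

Title: Solving High-Dimensional Control Problems with Deep Learning

Speaker: Han Jiequn (韩劼群), Princeton University

Bilibili: https://www.bilibili.com/video/BV1uK411c7Cz

Abstract: The rapid development of deep learning in recent years has given us powerful new tools for high-dimensional computation; in particular, backpropagation and stochastic gradient descent provide efficient algorithms for finding optimal neural networks. This talk will first analyze neural networks from the viewpoint of control theory, discussing the similarities between finding optimal neural networks and solving optimal control problems, and what these algorithms suggest for solving high-dimensional control problems. Building on this insight, we will present two works that use deep learning to solve high-dimensional control problems: (1) solving model-based high-dimensional stochastic control problems; and (2) solving parabolic partial differential equations via the variational formulation of backward stochastic differential equations. Compared with traditional algorithms, which are limited by the curse of dimensionality, these algorithms show large computational advantages and greatly extend our ability to handle a broad class of high-dimensional problems. Finally, we will discuss some open problems in related directions to help the audience better understand this field.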

2020 Course 3: Fundamentals and Practice of Deep Learning (Part 3)

Lecturer: Zhang Linfeng (张林峰)

Time: 2020-11-15 10:00-12:00

Bilibili: https://www.bilibili.com/video/BV1i64y1y7nN?p=2

2020 Course 2: Fundamentals and Practice of Deep Learning (Part 2)

Lecturer: Wu Lei (吴磊)

Time: 2020-11-08 10:00-12:00

Bilibili: https://www.bilibili.com/video/BV1i64y1y7nN?p=1

2020 Course 1: Fundamentals and Practice of Deep Learning (Part 1)

Lecturer: E Weinan (鄂维南)

Time: 2020-11-01 10:00-12:00

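Launch Ceremony of the Joint Seminar Program

On the morning of October 25, 2020, the launch ceremony of the "Machine Learning Joint Seminar Program" was successfully held both online and in person.

Led by Academician E Weinan, this joint seminar program aims to select outstanding students with solid theoretical foundations and strong academic ability from China's top universities, integrate high-quality teaching and research resources from China and abroad, and create a world-class learning and research environment and platform, so as to develop these students as quickly as possible into the best young talent in machine learning and related fields, and to build a talent pool for the basic AI research teams of the participating institutions.

The in-person part of the ceremony was held in Room 5307 of the Fifth Teaching Building on the East Campus, and the online part was held over Zoom. The ceremony was hosted by Professor Yang Zhouwang (杨周旺), professor at the USTC School of Mathematical Sciences and vice dean of the School of Data Science. More than 70 faculty and students attended in person and more than 100 online.

The participating students came from five universities: Peking University, Tsinghua University, the University of Science and Technology of China (USTC), Fudan University, and Shanghai Jiao Tong University. At the opening of the ceremony, Yang Zhouwang (USTC), Dong Bin (董彬, Peking University), Huang Zhongyi (黄忠亿, Tsinghua University), Gao Weiguo (高卫国, Fudan University), and Xu Zhiqin (许志钦, Shanghai Jiao Tong University) each described the enrollment at their university and shared their requirements and expectations for the students. Academician E Weinan then gave a keynote talk titled "Mathematics, Science, and Artificial Intelligence". Starting from the basic goals and methods of scientific research, he identified the common root of the problems encountered across research fields: the curse of dimensionality, that is, the amount of computation grows exponentially as the dimension increases. He then used three examples to show that deep learning provides an effective way to approximate high-dimensional functions, introduced applications of machine learning in science and scientific computing, and concluded that in this new era of research, the paradigm, objects, and environment of research will all change. Finally, Academician E expressed the hope that everyone would join this great endeavor.

In an engaging talk delivered in rigorous scientific language, Academician E Weinan explained the mathematical foundations of machine learning to the audience and encouraged young people to seize the scientific opportunities brought by "AI for Science", prompting active questions and lively discussion among the faculty and students.

Liaison Group of the Joint Seminar Program

October 25, 2020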

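Academician E Weinan Visits

"Machine Learning Joint Seminar Program Symposium"

On the afternoon of October 6, 2020, at the invitation of the School of Mathematical Sciences, Academician E Weinan, an alumnus of USTC Class 781, attended the first on-campus symposium of the "Machine Learning Joint Seminar Program" at USTC.

The symposium was held in Room 5106 of the Fifth Teaching Building on the East Campus and was hosted by Professor Yang Zhouwang, professor at the School of Mathematical Sciences and vice dean of the School of Data Science. More than forty faculty and students attended.

The participating students came from mathematics, statistics, physics, chemistry, and other majors, and raised many questions worth discussing from the perspectives of their own fields. Academician E first discussed the foundational theory of supervised machine learning with students in mathematics and statistics. From the viewpoint of approximation theory, he introduced a new mathematical framework for high-dimensional and over-parameterized settings and applied it to kernel methods and to shallow and deep neural network models; for each of these examples, the framework identifies the right function space for the target functions, proves direct and inverse approximation theorems together with optimal a priori and a posteriori error estimates, and provides a theoretical analysis of gradient methods. The discussion also covered connections between deep learning models and random matrix theory, the reliability and practicality of deep learning for regression problems that demand high accuracy, and practical applications of deep learning frameworks to big-data problems in economics. Academician E then discussed with students in physics and chemistry how to combine machine learning with traditional scientific models to build reliable and effective multiscale models for some of the most difficult problems in science and engineering, especially in physics and chemistry, and offered advice and guidance to students who hope to pursue research in this area.

In an engaging talk delivered in rigorous scientific language, Academician E Weinan discussed the mathematical foundations and practical applications of machine learning with the faculty and students, encouraged young people to seize the scientific opportunities brought by "AI for Science", and engaged in lively discussion with the audience.

School of Mathematical Sciences, University of Science and Technology of China

Anhui Center for Applied Mathematics

October 6, 2020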

Bilibili homepage: https://space.bilibili.com/1550111353
