Machine Learning and Scientific Applications

Machine Learning Joint Seminar Program

This joint seminar program aims to serve outstanding students and young faculty in China by integrating high-quality teaching and research resources at home and abroad, creating a world-class environment and platform for learning and research, and helping them grow into the most outstanding young talent in machine learning and related fields. For the program's earlier webpage, see here.

At the current stage, the program mainly invites the most active machine learning researchers in China and abroad to give systematic introductions to the latest advances and frontier scientific problems in machine learning research. The main topics include: fundamental research in machine learning, applications of machine learning to scientific problems, and applications of machine learning in industry. The seminar meets every two weeks, with two invited talks per session.

The seminar is held online via Tencent Meeting; click here to download Tencent Meeting.

Organizing committee: 鄂维南, 董彬, 高卫国, 黄忠亿, 王涵, 许志钦, 杨周旺, 张林峰.

For talks that have concluded, see: past events.

 

Time: 2021-10-30, 9:30-11:30

 

Speaker: 杨海钊 (Haizhao Yang)

Affiliations: Purdue University

Title: Deep Network Approximation: Error Characterization in terms of Width and Depth

Abstract: Deep neural networks are a powerful tool in many applications in science, engineering, technology, and industry, especially for large-scale and high-dimensional learning problems. This talk focuses on the mathematical understanding of deep neural networks. In particular, a relation between the approximation power of deep neural networks and function composition is characterized. The approximation error of ReLU networks in terms of width and depth is given for various function spaces, such as spaces of polynomials, continuous functions, and smooth functions on a hypercube. Finally, to achieve a better approximation error, we introduce a new type of network called Floor-ReLU networks, built with each neuron activated by either Floor or ReLU.
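
For flavor, a representative bound of this type, quoted from memory of this line of work and stated here only as background: for a continuous function $f$ on $[0,1]^d$ with modulus of continuity $\omega_f$, there exist ReLU networks $\phi$ of width $O(N)$ and depth $O(L)$ such that

```latex
\| f - \phi \|_{L^\infty([0,1]^d)} \le O\!\big( \sqrt{d}\; \omega_f\big( N^{-2/d} L^{-2/d} \big) \big),
```

and bounds of this form are known to be nearly optimal in $N$ and $L$, which is why the error is naturally characterized jointly in width and depth rather than in the number of parameters alone.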

Bio: He received his B.S. from the School of Mathematical Sciences at Shanghai Jiao Tong University in 2010, an M.S. in mathematics from the University of Texas at Austin in 2012, and a Ph.D. in mathematics from Stanford University in 2015. He was a visiting assistant professor in the Department of Mathematics at Duke University from 2015 to 2017, and an assistant professor in the Department of Mathematics and the Institute of Data Science at the National University of Singapore from 2017 to 2019. His main research interests include machine learning, data science, and the theoretical foundations and efficient algorithms of applied and computational mathematics.

 

 

Speaker: 张仕俊 (Shijun Zhang)

Affiliations: National University of Singapore

Title: Deep Network Approximation: Achieving Arbitrary Error with Fixed Size

Abstract: This talk discusses a new type of simple feed-forward neural network that achieves the universal approximation property for all continuous functions with a fixed network size. This new type of neural network is simple because it is designed with a simple and computable continuous activation function σ leveraging a triangular-wave function and a softsign function. First, we prove that σ-activated networks with width 36d(2d + 1) and depth 11 can approximate any continuous function on a d-dimensional hypercube with an arbitrarily small error. Next, we show that classification functions arising from image and signal classification can be exactly represented by σ-activated networks with width 36d(2d + 1) and depth 12, when there exist pairwise disjoint closed bounded subsets of R^d such that the samples of the same class are located in the same subset.
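
The key ingredient is the activation σ built from a triangular wave and a softsign function. The exact construction appears in the speaker's paper; the sketch below is only a plausible illustration of such a composite activation (the function names and the choice of gluing the two pieces at 0 are my assumptions):

```python
import numpy as np

def triangle_wave(x, period=2.0):
    # A simple periodic, piecewise-linear oscillation with values in [0, 1].
    t = np.mod(x, period) / period
    return 2.0 * np.minimum(t, 1.0 - t)

def softsign(x):
    # Smooth, bounded, monotone: x / (1 + |x|).
    return x / (1.0 + np.abs(x))

def sigma(x):
    # Illustrative composite activation: oscillatory on one half-line,
    # monotone-saturating on the other. Both pieces are cheap to compute,
    # which is the point of the construction.
    x = np.asarray(x, dtype=float)
    return np.where(x < 0, triangle_wave(x), softsign(x))
```

Roughly, the oscillatory piece is what lets a fixed-size network encode arbitrarily fine detail, while the softsign piece keeps the activation continuous and computable.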

Bio: 张仕俊 (Shijun Zhang) is a postdoctoral researcher at the National University of Singapore. He received his Ph.D. from the Department of Mathematics at NUS in 2020 and his B.S. from the School of Mathematics and Statistics at Wuhan University in 2016. His research focuses on theoretical deep learning.

 

 

Meeting link: https://meeting.tencent.com/dm/nFuqZlyiVkRJ

Tencent Meeting download link: https://meeting.tencent.com/download-center.html

Meeting ID: 124702512

Meeting password: 123456

Live stream: the program's live room on Bilibili

Backup meeting ID (no password): 625 2803 3607

Time: 2021-10-17, 9:30-11:30

Title: Embedding Principle of Loss Landscape of Deep Neural Networks

Speaker: 张耀宇 (Yaoyu Zhang)

Affiliations: Shanghai Jiao Tong University

Replay: https://www.bilibili.com/video/BV1x44y1x7Qg

Abstract: Understanding the structure of the loss landscape of deep neural networks (DNNs) is of fundamental importance. In this talk, we present an embedding principle: the loss landscape of a DNN "contains" all the critical points of all narrower DNNs. More precisely, we propose a critical embedding such that any critical point, e.g., a local or global minimum, of a narrower DNN can be embedded into a critical point/affine subspace of the target DNN with higher degeneracy, while preserving the DNN output function. This embedding structure of critical points holds for any training data, any differentiable loss function, and any differentiable activation function. This general structure of DNNs is starkly different from other nonconvex problems such as protein folding. Empirically, we find that a wide DNN is often attracted by highly degenerate critical points that are embedded from narrow DNNs. The embedding principle provides a new perspective on the general ease of optimizing wide DNNs and unravels a potential implicit low-complexity regularization during training. Overall, this work provides a skeleton for the study of the loss landscape of DNNs and its implications, from which a more exact and comprehensive understanding can be anticipated in the near future.
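
A minimal instance of such an embedding is the classical neuron-splitting construction, stated here as an illustration rather than the talk's full definition: for a two-layer network $f_\theta(x) = \sum_{k=1}^{m} a_k\,\sigma(w_k^\top x)$, duplicate neuron $j$ and split its output weight,

```latex
(a_j, w_j) \;\longmapsto\; (\alpha a_j,\, w_j)\ \text{and}\ ((1-\alpha)a_j,\, w_j), \qquad \alpha \in [0,1],
```

which yields a width-$(m+1)$ network with exactly the same output function. One can check that a critical point of the narrow network maps to a critical point of the wide one for every $\alpha$, so a single narrow critical point lifts to a one-parameter family, consistent with the higher degeneracy described above.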

Time: 2021-10-17, 9:30-11:30

Title: Global Loss Landscape of Neural Networks: What do We Know and What do We Not Know?

Speaker: 孙若愚 (Ruoyu Sun)

Affiliations: University of Illinois at Urbana-Champaign

Abstract: The recent success of neural networks suggests that their loss landscape is not too bad, but what concrete results do we know and not know about the landscape? In this talk, we present a few recent results on the landscape. First, non-linear neural nets can have sub-optimal local minima under mild assumptions (for arbitrary width, for generic input data, and for most activation functions). Second, wide networks do not have a sub-optimal "basin" for any continuous activation, while narrow networks can have sub-optimal basins. We will present a simple 2D geometrical object that is a basic component of the neural net landscape and can visually explain the above two results. Third, we show that for ReQU and ReLU networks, adding a proper regularizer can eliminate sub-optimal local minima and decreasing paths to infinity. Together, these results demonstrate that wide neural nets have a "nice landscape", but the meaning of "nice landscape" is more subtle than we expected. We will also mention the limitations of existing results, e.g., in what settings we do not know whether sub-optimal local minima exist. We will briefly discuss the relevance of the landscape results in two aspects: (1) these results can help understand the training difficulty of narrow networks; (2) these results can potentially help convergence analysis and implicit regularization.

Time: 2021-9-25, 9:30-11:30

Title: Some theoretical results on model-based reinforcement learning

Speaker: Mengdi Wang

Affiliations: Princeton University

Abstract: We discuss some recent results on model-based methods for reinforcement learning (RL) in both online and offline problems. For the online RL problem, we discuss several model-based RL methods that adaptively explore an unknown environment and learn to act with provable regret bounds. In particular, we focus on finite-horizon episodic RL where the unknown transition law belongs to a generic family of models. We propose a model-based 'value-targeted regression' RL algorithm based on the optimism principle: in each episode, the set of models that are 'consistent' with the data collected is constructed. The criterion of consistency is based on the total squared error that the model incurs on the task of predicting values, as determined by the last value estimate along the transitions. The next value function is then chosen by solving the optimistic planning problem with the constructed set of models. We derive a bound on the regret, for an arbitrary family of transition models, using the notion of the so-called Eluder dimension proposed by Russo & Van Roy (2014). Next we discuss batch data (offline) reinforcement learning, where the goal is to predict the value of a new policy using data generated by some behavior policy (which may be unknown). We show that the fitted Q-iteration method with linear function approximation is equivalent to a model-based plug-in estimator. We establish that this model-based estimator is minimax optimal and that its statistical limit is determined by a form of restricted chi-square divergence between the two policies.
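
The value-targeted regression step can be written compactly (my paraphrase of the criterion described above, so treat the exact form as an assumption): given past transitions $(s_t, a_t, s_{t+1})$ and the current value estimate $\hat{V}$, the model is fit by

```latex
\hat{P} \in \arg\min_{P \in \mathcal{M}} \; \sum_{t} \Big( \mathbb{E}_{s' \sim P(\cdot \mid s_t, a_t)} \big[ \hat{V}(s') \big] \;-\; \hat{V}(s_{t+1}) \Big)^{2},
```

i.e., the model is asked to predict values rather than full next-state distributions, and the 'consistent' set in each episode is a confidence set around this regression fit.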

Time: 2021-9-25, 9:30-11:30

Title: Policy Cover Guided Exploration in Model-free and Model-based Reinforcement Learning

Speaker: Wen Sun

Affiliations: Cornell University

Replay: https://www.bilibili.com/video/BV1hL4y1z7GN

Abstract: Existing RL methods that leverage random exploration techniques often fail to learn efficiently in environments where strategic exploration is needed. In this talk, we introduce a new concept called a Policy Cover, which is an ensemble of learned policies. The policy cover encodes information about which parts of the state space have been explored, and this information can be used to further guide the exploration process. We show that this idea can be used in both model-free and model-based algorithmic frameworks. In particular, for model-free learning, we present the first policy gradient algorithm, Policy Cover Policy Gradient (PC-PG), that can explore and learn with polynomial sample complexity. For the model-based setting, we present an algorithm, Policy-Cover Model Learning and Planning (PC-MLP), that also learns and explores with polynomial sample complexity. For both approaches, we show that they are flexible enough to be used together with deep neural networks, and their deep versions achieve state-of-the-art performance on common benchmarks, including exploration-challenging tasks such as reward-free maze exploration.
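
One way to picture how a policy cover guides exploration is through a visitation-based bonus. The sketch below is a hedged illustration under linear features (the helper names, the rollout interface, and the exact elliptical bonus form are assumptions, not the talk's code):

```python
import numpy as np

def cover_covariance(policy_cover, featurize, dim, n_rollouts=100, reg=1e-3):
    """Regularized feature covariance of state-actions visited by the
    ensemble of previously learned policies (the 'policy cover')."""
    sigma = reg * np.eye(dim)
    for policy in policy_cover:
        for s, a in policy.rollout(n_rollouts):   # hypothetical interface
            phi = featurize(s, a)
            sigma += np.outer(phi, phi)
    return sigma

def exploration_bonus(s, a, sigma_inv, featurize):
    # Elliptical bonus: large exactly where the cover has rarely been,
    # so optimizing reward + bonus pushes the next policy into new regions.
    phi = featurize(s, a)
    return float(np.sqrt(phi @ sigma_inv @ phi))
```

Each iteration would train a new policy against reward plus this bonus and then add it to the cover, gradually expanding coverage of the state space.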

Time: 2021-9-11, 9:30-11:30

Title: In Search of Effective and Reproducible Clinical Imaging Biomarkers for Population Health and Oncology Applications of Screening, Diagnosis and Prognosis

Speaker: Le Lu (吕乐)

Affiliations: PhD, FIEEE, MICCAI Board Member, Associate Editor for TPAMI

Abstract: This talk will first give an overview of work employing deep learning to enable novel clinical workflows in two population health tasks: liver steatosis screening and quantitative reporting with conventional ultrasound, and osteoporosis screening via conventional X-ray imaging and "AI readers". These two tasks were generally considered infeasible for human readers, but as our scientific and clinical studies and peer-reviewed publications show, they are well suited to AI readers. AI can be a useful supplementary tool to assist physicians in cheaper, more convenient, and more precise patient management. Next, the main part of this talk describes a roadmap for three key problems in a pancreatic cancer imaging solution: early screening, precision differential diagnosis, and deep prognosis for patient survival prediction. (1) Based on a new self-learning framework, we train the pancreatic ductal adenocarcinoma (PDAC) segmentation model using a larger quantity of patients, with a mix of annotated/unannotated venous or multi-phase CT images. Pseudo-annotations are generated by combining two teacher models with different PDAC segmentation specialties on unannotated images, and can be further refined by a teaching-assistant model that identifies associated vessels around the pancreas. Our approach makes robust large-scale PDAC screening from multi-institutional, multi-phase, partially annotated CT scans technically feasible. (2) We propose a holistic segmentation-mesh classification network (SMCN) that provides patient-level diagnosis by fully utilizing geometry and location information. SMCN learns the pancreas and mass segmentation task and builds an anatomical correspondence-aware organ mesh model by progressively deforming a pancreas prototype on the raw segmentation mask. Our results are comparable to a multimodality clinical test that combines clinical, imaging, and molecular testing for the clinical management of patients with cysts. (3) Accurate preoperative prognosis of resectable PDACs for personalized treatment is highly desired in clinical practice. We present a novel deep neural network for survival prediction of resectable PDAC patients, the 3D Contrast-Enhanced Convolutional Long Short-Term Memory network (CE-ConvLSTM), to derive tumor attenuation signatures from CE-CT imaging studies. Our framework significantly improves prediction performance over existing state-of-the-art survival analysis methods. This deep tumor signature evidently adds value (as a predictive biomarker) when combined with the existing clinical staging system.

Time: 2021-8-28, 9:30-11:30

Title: Deep Learning for inverse problems

Speaker: 应乐兴 (Lexing Ying)

Affiliations: Stanford University

Replay: https://www.bilibili.com/video/BV1Qq4y1K71Q

摘要:This talk is about some recent progress on solving inverse problems using deep learning. Compared to traditional machine learning problems, inverse problems are often limited by the size of the training data set. We show how to overcome this issue by incorporating mathematical analysis and physics into the design of neural network architectures. We first describe neural network representations of pseudodifferential operators and Fourier integral operators. We then continue to discuss applications including electric impedance tomography, optical tomography, inverse acoustic/EM scattering, seismic imaging, and travel-time tomography.

Time: 2021-8-28, 9:30-11:30

Title: Self-supervised Deep Learning for Solving Inverse Problems in Imaging

Speaker: 纪辉 (Hui Ji)

Affiliations: National University of Singapore

Replay: https://www.bilibili.com/video/BV1uU4y177VF

Abstract: Deep learning has become a prominent tool for solving many inverse problems in imaging science. Most existing SOTA solutions are built on supervised learning, with the prerequisite of a large-scale dataset of degraded/ground-truth image pairs. In recent years, driven by practical needs, there has been increasing interest in studying deep learning methods under limited data resources, which is of particular significance for imaging in science and medicine. This talk focuses on self-supervised deep learning for solving inverse imaging problems, which assumes no training samples are available. By examining deep learning from the perspective of Bayesian inference, we will present several results and techniques on self-supervised learning of the MMSE (minimum mean squared error) estimator. Built on these techniques, we will show that, in several applications, the resulting dataset-free deep learning methods are very competitive with their SOTA supervised counterparts. While the demonstrations only cover image denoising, compressed sensing, and phase retrieval, the presented techniques and methods are quite general and can be used to solve many other inverse imaging problems.
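
To make "dataset-free" concrete, here is a minimal sketch of one representative self-supervised technique from this family, blind-spot masking in the style of Noise2Self; it illustrates the genre, not the specific estimator presented in the talk:

```python
import numpy as np

def masked_self_supervised_loss(denoiser, noisy, mask_frac=0.05, seed=0):
    """Train a denoiser from a single noisy image: hide a random subset of
    pixels, predict them from their surroundings, and score only on the
    hidden pixels. No clean ground-truth image is ever used."""
    rng = np.random.default_rng(seed)
    mask = rng.random(noisy.shape) < mask_frac
    corrupted = noisy.copy()
    corrupted[mask] = noisy.mean()     # blind out the held-out pixels
    pred = denoiser(corrupted)
    return float(np.mean((pred[mask] - noisy[mask]) ** 2))
```

Under mild independence assumptions on the noise, minimizing such a masked loss agrees, up to a constant, with minimizing the supervised MSE, which is why losses of this kind can target the MMSE estimator without clean data.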

Time: 2021-8-14, 9:30-11:30

Title: Barron Spaces

Speaker: Jinchao Xu

Affiliations: Pennsylvania State University

Replay: https://www.bilibili.com/video/BV1Yq4y1Q7n2

Abstract: In this talk, we will present some recent results related to the Barron space studied by E et al. in 2019. For the ReLU activation function, we give an equivalent characterization of the corresponding Barron space in terms of the convex hull of an appropriate dictionary. This characterization enables a generalization of the notion of Barron space to neural networks with general activation functions, and even to general dictionaries of functions. We provide an explicit representation of some Barron norms which are of particular interest to the theory of neural networks, specifically those corresponding to a dictionary of decaying Fourier modes and the dictionary corresponding to shallow ReLU^k networks. Next, we present optimal estimates of the approximation rates, metric entropy, and Kolmogorov and Gelfand n-widths of the Barron space unit ball with respect to L^2 for ReLU^k activation functions and the dictionary of decaying Fourier modes. These results provide a solution to several open problems concerning the precise approximation properties of these spaces. If time allows, we will also give recent results on the approximation rates and metric entropies for sigmoidal and ReLU Barron spaces with respect to L^p for p > 2. This talk is based on joint work with Jonathan Siegel.
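
For orientation, the dictionary-based picture can be stated roughly as follows (a standard formulation, given as background rather than the talk's exact definitions): for a dictionary $\mathbb{D}$ of unit-norm functions in a Hilbert space, the variation space is the scaled closed convex hull

```latex
\mathcal{K}_1(\mathbb{D}) = \overline{\mathrm{conv}}(\pm\mathbb{D}), \qquad
\|f\|_{\mathcal{K}_1(\mathbb{D})} = \inf\{\, t > 0 : f/t \in \mathcal{K}_1(\mathbb{D}) \,\},
```

and taking $\mathbb{D}$ to be the set of neurons $x \mapsto \sigma(w \cdot x + b)$ recovers a Barron-type space. A classical consequence of this structure is the dimension-independent rate $\inf_{f_n} \|f - f_n\|_{L^2} \le \|f\|_{\mathcal{K}_1(\mathbb{D})}\, n^{-1/2}$ over $n$-term convex combinations of dictionary elements, i.e., networks with $n$ neurons.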

Time: 2021-8-14, 9:30-11:30

Title: Convergence analysis for the gradient descent optimization method in the training of artificial neural networks with ReLU activation for piecewise linear target functions

Speaker: Arnulf Jentzen

Affiliations:
Applied Mathematics: Institute for Analysis and Numerics, Faculty of Mathematics and Computer Science, University of Muenster, Germany;
School of Data Science and Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong, Shenzhen, China

Slides: download link

Replay: https://www.bilibili.com/video/BV1DU4y177A1

Abstract: Gradient descent (GD) type optimization methods are the standard instrument to train artificial neural networks (ANNs) with rectified linear unit (ReLU) activation. Despite the great success of GD type optimization methods in numerical simulations for the training of ANNs with ReLU activation, it remains an open problem -- even in the simplest situation of the plain vanilla GD optimization method with random initializations and ANNs with one hidden layer -- to prove (or disprove) the conjecture that the risk of the GD optimization method converges to zero in the training of ANNs with ReLU activation as the width of the ANNs, the number of independent random initializations, and the number of GD steps increase to infinity. In this talk we prove this conjecture in the special situation where the probability distribution of the input data is absolutely continuous with respect to the continuous uniform distribution on a compact interval and where the target function under consideration is piecewise linear.

Time: 2021-7-31, 9:30-11:30

Title: Learning and Learning to Solve PDEs

Speaker: 董彬 (Bin Dong), Peking University

Slides: download link

Replay: https://www.bilibili.com/video/BV1dh411z7JA

Abstract: Deep learning continues to dominate machine learning and has been successful in computer vision, natural language processing, etc. Its impact has now expanded to many research areas in science and engineering. In this talk, I will mainly focus on some recent impact of deep learning on computational mathematics. I will present our recent work on bridging deep neural networks with numerical differential equations. On the one hand, I will show how to design transparent deep convolutional networks to uncover hidden PDE models from observed dynamical data. On the other hand, I will present our recent preliminary attempts to combine the wisdom of numerical PDEs and machine learning to design data-driven solvers for PDEs.
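
The word "transparent" can be made concrete with a toy version of the constrained-convolution idea (a sketch in the spirit of this line of work, e.g. PDE-Net, not the talk's exact construction): moment conditions pin a learnable stencil to a particular differential operator, so each trained filter can be read off as a term of the hidden PDE.

```python
import numpy as np

def first_derivative_stencil(theta, h):
    """A learnable 3-point stencil q = (q_-1, q_0, q_+1) constrained by the
    moment conditions sum(q) = 0 and h*(q_+1 - q_-1) = 1, so it can only
    represent an approximation of d/dx; theta is the remaining free
    parameter that training would adjust."""
    q_m1 = theta
    q_p1 = theta + 1.0 / h
    q_0 = -(q_m1 + q_p1)
    return np.array([q_m1, q_0, q_p1])

# For theta = -1/(2h) the constraints yield the centered difference.
h = 0.01
x = np.arange(0.0, 1.0, h)
u = np.sin(2 * np.pi * x)
q = first_derivative_stencil(theta=-1.0 / (2 * h), h=h)
du = np.convolve(u, q[::-1], mode="same")   # interior points approximate u'
```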

Time: 2021-7-31, 9:30-11:30

Title: Neural Operator: Learning Maps Between Function Spaces

Speaker: 李宗宜 (Zongyi Li), California Institute of Technology

Slides: download link

Replay: https://www.bilibili.com/video/BV1MA411P7HU

Abstract: The classical development of neural networks has primarily focused on learning mappings between finite-dimensional Euclidean spaces or finite sets. We propose a generalization of neural networks tailored to learn operators mapping between infinite-dimensional function spaces. We formulate the approximation of operators by compositions of a class of linear integral operators and nonlinear activation functions, so that the composed operator can approximate complex nonlinear operators. We prove a universal approximation theorem for our construction. The proposed neural operators are resolution-invariant: they share the same network parameters between different discretizations of the underlying function spaces and can be used for zero-shot super-resolution. Numerically, the proposed models show superior performance compared to existing machine-learning-based methodologies on Burgers' equation, Darcy flow, and the Navier-Stokes equation, while being several orders of magnitude faster than conventional PDE solvers.
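
A minimal 1D sketch of the Fourier-parameterized integral operator behind one prominent instance of this framework (the Fourier neural operator); this is an illustration under simplifying assumptions, one channel and random "learned" weights, rather than the authors' released implementation:

```python
import numpy as np

def fourier_layer(v, mode_weights, w_local):
    """One operator layer: transform to Fourier space, keep the lowest
    modes, multiply them by learned complex weights, transform back, add
    a local linear term, and apply a pointwise nonlinearity."""
    n = v.shape[0]
    v_hat = np.fft.rfft(v)
    k = mode_weights.shape[0]
    out_hat = np.zeros_like(v_hat)
    out_hat[:k] = mode_weights * v_hat[:k]         # truncation + learned multiplier
    spectral = np.fft.irfft(out_hat, n=n)
    return np.maximum(spectral + w_local * v, 0.0)  # ReLU(K v + W v)

# Usage sketch: the parameters do not depend on the grid size n, which is
# what makes the layer resolution-invariant.
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 2.0 * np.pi, 128, endpoint=False)
v = np.sin(grid)
R = rng.standard_normal(16) + 1j * rng.standard_normal(16)
out = fourier_layer(v, mode_weights=R, w_local=0.5)
```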

Time: 2021-7-17, 9:30-11:30

Title: Frequency Principle

Speaker: 许志钦 (Zhi-Qin John Xu), Shanghai Jiao Tong University

Slides: download link

Replay: https://www.bilibili.com/video/BV1Yy4y1T7M8

Abstract: In this talk, I will introduce the frequency principle (F-Principle) in detail, including experiments and theory. I will also connect the F-Principle with traditional iterative methods, such as the Jacobi method, understanding the training of neural networks from the perspective of numerical analysis. Then, I will use some examples to show how the F-Principle benefits the design of neural networks. Finally, I will talk about some open questions about the F-Principle.
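
The phenomenon is easy to reproduce. Below is a self-contained toy experiment (an illustration I constructed, not the speaker's code): fit a 1D target containing one low and one high frequency with a small two-layer tanh network, and track the residual's Fourier magnitude at the two frequencies; the low-frequency component is fitted first.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, lr = 256, 200, 0.05
x = np.linspace(-1.0, 1.0, n)
y = np.sin(np.pi * x) + 0.5 * np.sin(10 * np.pi * x)  # 1 and 10 cycles on [-1, 1]

w = rng.standard_normal(m)
b = rng.standard_normal(m)
a = rng.standard_normal(m) / np.sqrt(m)

for step in range(20001):
    h = np.tanh(np.outer(x, w) + b)         # hidden activations, shape (n, m)
    err = h @ a - y                         # residual of the network fit
    gz = (err[:, None] * a) * (1.0 - h**2)  # backprop through tanh
    a -= lr * (h.T @ err) / n
    w -= lr * (x @ gz) / n
    b -= lr * gz.sum(axis=0) / n
    if step % 5000 == 0:
        res_hat = np.abs(np.fft.rfft(err))
        # bin 1 ~ sin(pi x), bin 10 ~ sin(10 pi x) on this grid
        print(f"step {step:6d}  low-freq residual {res_hat[1]:.3f}  "
              f"high-freq residual {res_hat[10]:.3f}")
```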

Time: 2021-7-3, 9:30-11:30

Title: Deep-learning-based molecular dynamics simulation

Speaker: 王涵 (Han Wang), Institute of Applied Physics and Computational Mathematics, Beijing

Slides: download link

Abstract: Molecular dynamics simulation requires an accurate description of interatomic interactions (the potential energy function), but one faces a dilemma: first-principles methods are accurate but expensive, while empirical potentials are fast but of limited accuracy. In this talk we discuss solutions from two aspects: constructing the potential and generating the data. For potential construction, we introduce the Deep Potential method, an accurate representation of first-principles potentials. For data generation, we introduce the concurrent-learning scheme DP-GEN, which automatically generates a minimal training dataset that meets a prescribed accuracy requirement. Compared with empirical potentials, DP-GEN opens the possibility of continuously improving Deep Potential models by exploring configuration and chemical space. In the last part of the talk, we present an optimized implementation of the Deep Potential method for heterogeneous CPU+GPU supercomputers. This implementation reaches a peak performance of 91 PFLOPS in double precision on the Summit supercomputer and completes nanosecond-scale molecular dynamics simulations with first-principles accuracy within one day, more than 1000 times faster than the previous baseline. In our work, the combination of physical models, deep learning, and high-performance computing provides a powerful simulation tool for major scientific discoveries.
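
The concurrent-learning loop described above can be summarized in a few lines of schematic Python (the helper names and thresholds are hypothetical placeholders for illustration, not the DP-GEN API):

```python
# Schematic DP-GEN-style concurrent-learning loop.
# Hypothetical helpers: train_potential, run_md_exploration,
# force_deviation, label_with_dft.
def concurrent_learning(dataset, n_models=4, lo=0.05, hi=0.25, max_iters=20):
    for _ in range(max_iters):
        # 1. Train an ensemble of potentials from the current dataset.
        models = [train_potential(dataset, seed=k) for k in range(n_models)]
        # 2. Explore configuration space by running MD with one model.
        candidates = run_md_exploration(models[0])
        # 3. Keep configurations where the ensemble disagrees: accurate
        #    enough to trust the trajectory, uncertain enough to be useful.
        selected = [c for c in candidates
                    if lo < force_deviation(models, c) < hi]
        if not selected:   # converged: models agree everywhere visited
            break
        # 4. Label only the selected configurations with first-principles data.
        dataset += label_with_dft(selected)
    return dataset
```

Because only configurations with intermediate model deviation are sent to the expensive first-principles labeler, the loop tends toward a small dataset that still meets the accuracy target.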

Time: 2021-7-3, 9:30-11:30

Title: DeePKS: a machine learning assisted electronic structure model

Speaker: 张林峰 (Linfeng Zhang), Beijing Institute of Big Data Research and DP Technology

Slides: download link

Abstract: We introduce a general machine-learning-based framework for building an accurate and widely applicable energy functional within generalized Kohn-Sham density functional theory. In particular, we develop a way of training self-consistent models that can take large datasets from different systems and with different kinds of labels. We demonstrate that the functional resulting from this training procedure gives, with the efficiency of cheap density functional models, chemically accurate predictions of energy, force, dipole, and electron density for a large class of molecules.

Time: 2021-6-19, 9:30-11:30

Title: Modelling Temporal Data: from RNNs to CNNs

Speaker: Zhong Li, Qianxiao Li

Replay: https://www.bilibili.com/video/BV1Wq4y1L7Cg

Abstract: There are several competing models in deep learning for modelling input-output relationships in temporal data: recurrent neural networks (RNNs), convolutional neural networks (CNNs), transformers, etc. In recent work, we study the approximation properties and optimization dynamics of RNNs when applied to learning temporal dynamics. We consider the simple but representative setting of using continuous-time linear RNNs to learn from data generated by linear relationships. Mathematically, the latter can be understood as a sequence of linear functionals. We prove a universal approximation theorem for such linear functionals and characterize the approximation rate. Moreover, we perform a fine-grained dynamical analysis of training linear RNNs by gradient methods. A unifying theme uncovered is the non-trivial effect of memory, a notion that can be made precise in our framework, on both approximation and optimization: when there is long-term memory in the target, it takes a large number of neurons to approximate it. Moreover, the training process suffers from severe slowdowns. In particular, both of these effects become exponentially more pronounced with increasing memory - a phenomenon we call the "curse of memory". These analyses represent a basic step towards a concrete mathematical understanding of the new phenomena arising in learning temporal relationships using recurrent architectures.

We also study the approximation properties of convolutional architectures applied to time-series modelling. Parallel to the recurrent setting, results are derived for convolutional architectures regarding approximation efficiency, with WaveNet being a prime example. Our results reveal that in this new setting, the approximation efficiency is characterized not only by memory, but also by additional fine structures in the target relationship. This leads to a novel definition of spectrum-based regularity that measures the complexity of temporal relationships under the convolutional approximation scheme. These analyses provide a foundation for understanding the differences between architectural choices for temporal modelling, with theoretically grounded guidance for practical applications.
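
As background on the linear setting (a standard formulation, stated here for orientation rather than quoted from the talk): a continuous, time-invariant linear functional of an input signal $x$ admits an integral representation

```latex
y_t = H_t(x) = \int_0^{\infty} \rho(s)\, x(t-s)\, ds,
```

so learning the input-output relationship amounts to approximating the memory kernel $\rho$. A width-$m$ linear RNN effectively approximates $\rho$ by a sum of decaying exponentials $\sum_{k=1}^{m} c_k e^{\lambda_k s}$ with $\lambda_k < 0$, which is efficient when $\rho$ decays quickly and requires many more neurons when the target retains long-term memory - one way to read the "curse of memory".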

Time: 2021-6-5, 9:30-11:30

Title: Implicit biases of SGD for neural network models

Speaker: Lei Wu (Princeton University)

Replay: https://www.bilibili.com/video/BV1xv411V7v3

Slides: download link

Abstract: Understanding the implicit biases of optimization algorithms is one of the core problems in theoretical machine learning. This refers to the fact that, even without any explicit regularization to avoid overfitting, the dynamics of an optimizer is itself biased to pick solutions that generalize well. This talk introduces recent progress in understanding the implicit bias of stochastic gradient descent (SGD) for neural network models. First, we consider the gradient descent flow, i.e., SGD with an infinitesimal learning rate, for two-layer neural networks. In particular, we will see how the implicit bias is affected by the extent of over-parameterization. Then, we turn to SGD with a finite learning rate. The influence of the learning rate as well as the batch size is studied from the perspective of dynamical stability. The concept of uniformity is introduced, which, together with flatness, characterizes the accessibility of a particular global minimum to SGD. This analysis shows that the learning rate and batch size play different roles in selecting global minima. Extensive empirical results correlate well with the theoretical findings.
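
For flavor, here is the kind of condition the dynamical-stability analysis produces (paraphrased from memory of the related work "How SGD selects the global minima in over-parameterized learning", so treat the exact form as an assumption): at a global minimum with sharpness $a$ (largest Hessian eigenvalue of the full loss) and non-uniformity $s$ (variability of the per-sample sharpness), SGD with learning rate $\eta$ and batch size $B$ out of $n$ samples is linearly stable only if

```latex
(1 - \eta a)^2 + \frac{\eta^2 s^2 (n - B)}{B(n - 1)} \le 1,
```

so increasing $\eta$ screens out sharp minima while decreasing $B$ screens out non-uniform ones - the two hyperparameters select minima through different terms.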

Time: 2021-6-5, 9:30-11:30

Title: The landscape-dependent annealing strategy in machine learning: How Stochastic-Gradient-Descent finds flat minima

Speaker: Yuhai Tu (IBM T. J. Watson Research Center)

Replay: https://www.bilibili.com/video/BV1v64y1R7Xx

Slides: download link

Abstract: Despite the tremendous success of the Stochastic Gradient Descent (SGD) algorithm in deep learning, little is known about how SGD finds "good" solutions (with low generalization error) in the high-dimensional weight space. In this talk, we discuss our recent work [1,2] on establishing a theoretical framework based on nonequilibrium statistical physics for understanding the SGD learning dynamics, the loss function landscape, and their relation. Our study shows that SGD dynamics follows a low-dimensional drift-diffusion motion in the weight space, and that the loss function is flat, with large values of flatness (inverse of curvature), in most directions. Furthermore, our study reveals a robust inverse relation between the weight variance in SGD and the landscape flatness, opposite to the fluctuation-response relation in equilibrium systems. We develop a statistical theory of SGD based on properties of the ensemble of minibatch loss functions and show that the noise strength in SGD depends inversely on the landscape flatness, which explains the inverse variance-flatness relation. Our study suggests that SGD serves as an "intelligent" annealing strategy where the effective temperature self-adjusts according to the loss landscape in order to find the flat minimum regions that contain generalizable solutions. Finally, we discuss an application of these insights to reducing catastrophic forgetting efficiently when learning multiple tasks sequentially.

References:
1. "The inverse variance-flatness relation in Stochastic-Gradient-Descent is critical for finding flat minima", Y. Feng and Y. Tu, PNAS, 118 (9), 2021.
2. "Phases of learning dynamics in artificial neural networks: in the absence and presence of mislabeled data", Y. Feng and Y. Tu, Machine Learning: Science and Technology (MLST), April 7, 2021. https://iopscience.iop.org/article/10.1088/2632-2153/abf5b9/pdf

Time: 2021-6-5, 9:30-11:30

Title: Stochastic gradient descent for noise with ML-type scaling

Speaker: Stephan Wojtowytsch (Princeton University)

Replay: https://www.bilibili.com/video/BV1ky4y137ou

Slides: download link

Abstract: In the literature on stochastic gradient descent, there are two types of convergence results: (1) SGD finds minimizers of convex objective functions, and (2) SGD finds critical points of smooth objective functions. Classical results are obtained under the assumption that the stochastic noise is L^2-bounded and that the learning rate decays to zero at a suitable speed. We show that, if the objective landscape and the noise possess certain properties which are reminiscent of deep learning problems, then we can obtain global convergence guarantees of the first type under assumptions of the second type, for a fixed (small, but positive) learning rate. The convergence is exponential, but with a large random coefficient. If the learning rate exceeds a certain threshold, we discuss minimum selection by studying the invariant distribution of a continuous-time SGD model. We show that at a critical threshold, SGD prefers minimizers where the objective function is 'flat' in a precise sense.
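
My reading of "ML-type scaling", stated as an assumption rather than the paper's exact hypothesis: in over-parameterized interpolation problems the objective can reach zero at minimizers, and the stochastic gradient noise scales with the objective value itself, roughly

```latex
\mathbb{E}\, \big\| \nabla f_{\xi}(x) - \nabla f(x) \big\|^{2} \;\lesssim\; f(x),
```

so the noise vanishes at global minimizers instead of being uniformly $L^2$-bounded; this is what makes convergence with a fixed positive learning rate plausible.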

Time: 2021-5-22, 9:30-11:30

Title: Learning like a child: very large-scale multimodal pre-trained models

Speaker: 文继荣 (Ji-Rong Wen)

Replay: https://www.bilibili.com/video/BV1Z54y1G7du

Abstract: The embodiment revolution in cognitive science has brought a new view of how meaning is understood from language: the ability to think and to use language is an achievement of the cooperation between our bodies and brains. The body encompasses many modalities, such as vision, hearing, smell, touch, and motor control. Human children learn language in a multimodal environment, which is exactly what current AI lacks. This talk introduces our work on cross-modal understanding of language and images. Starting from the relationship between vision and language, we use tens of millions, even hundreds of millions, of image-text pairs generated on the Internet and self-supervised tasks to build 悟道∙文澜, currently the largest Chinese general-purpose image-text pre-trained model, as a first exploration of whether AI can learn language in a multimodal environment. By analyzing how language learning changes from the unimodal to the multimodal setting, we find phenomena closely related to human cognition.

Time: 2021-5-22, 9:30-11:30

Title: Knowledge-guided pre-trained language models

Speaker: 刘知远 (Zhiyuan Liu)

Replay: https://www.bilibili.com/video/BV1MK4y1G7Eb

Abstract: In recent years, deep learning has become the key technology of natural language processing (NLP); in particular, the pre-trained language models introduced since 2018 have significantly improved overall NLP performance. As a typical data-driven approach, deep learning as represented by pre-trained language models still faces problems such as limited interpretability and poor robustness. How to inject the vast linguistic and world knowledge accumulated by humans into these models is an important direction for improving deep learning, and one that poses many challenges. This talk systematically surveys recent advances and trends in knowledge-guided pre-trained language models.

Time: 2021-5-9, 9:30-11:30

Title: An introduction to reinforced dynamics

Speaker: 王涵 (Han Wang)

Slides: download link

Abstract: This talk introduces the basic concepts of molecular dynamics simulation and its fundamental difficulty: the sampling problem. We briefly review the limitations and challenges of two classes of enhanced sampling methods. In particular, the talk presents our solution to the enhanced sampling problem in detail: reinforced dynamics, a deep-learning-based enhanced sampling method. Finally, we show the performance of reinforced dynamics on protein structure prediction.

Time: 2021-5-9, 9:30-11:30

Title: Flow models: a computational physics perspective

Speaker: 王磊 (Lei Wang)

Bilibili: https://www.bilibili.com/video/BV1gp4y1t7ND

Slides: download link

Abstract: Drawing on some personal research experience, this talk introduces scientific problems and applications related to flow-based generative models. From the perspective of computational physics, we will see how flow models connect to optimal transport theory, fluid dynamics, symplectic algorithms, the renormalization group, and Monte Carlo computation.
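
The identity underlying these connections is the change-of-variables formula for an invertible map $f$ sending data $x$ to a base variable $z = f(x)$ (a standard fact, included for orientation):

```latex
\log p_X(x) \;=\; \log p_Z\big(f(x)\big) \;+\; \log \left| \det \frac{\partial f(x)}{\partial x} \right|,
```

so training maximizes an exact log-likelihood; viewing $f$ as the flow map of a vector field in (pseudo-)time is precisely what ties these models to optimal transport and fluid dynamics.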

Time: 2021-4-17, 10:30-11:30

Title: Understanding the training process of neural networks

Speaker: 许志钦 (Zhi-Qin John Xu), Shanghai Jiao Tong University

Bilibili: https://www.bilibili.com/video/BV1Rh411U7xH

Slides: download link

Abstract: From the viewpoint of approximation theory alone, an over-parameterized neural network has infinitely many solutions that minimize the training error. Yet in practice, training seems to always find solutions that generalize well. To understand how a neural network learns, out of infinitely many possibilities, a class of solutions that generalize well, we need to understand the training process that finds them. In this talk, I will review some progress on the training process of neural networks, such as how the complexity of the network evolves during training, its frequency behavior, and how initialization affects training. Finally, I will discuss some open problems and the outlook for understanding neural networks through their training process.

Time: 2021-4-17, 9:30-10:30

Title: Machine Learning and Dynamical Systems

Speaker: 李千骁 (Qianxiao Li), National University of Singapore

Bilibili: https://www.bilibili.com/video/BV1jh411S7ds

Slides: download link

Abstract: In this talk, we discuss some recent work on the connections between machine learning and dynamical systems. These come broadly in three categories, namely machine learning via, for, and of dynamical systems, and here we will focus on the first two. In the direction of machine learning via dynamical systems, we introduce a dynamical approach to deep learning theory with particular emphasis on its connections with control theory. In the reverse direction of machine learning for dynamical systems, we discuss the approximation and optimization theory of learning input-output temporal relationships using recurrent neural networks, with the goal of highlighting key new phenomena that arise in learning in dynamic settings. If time permits, we will also discuss some applications of dynamical systems to the analysis of optimization algorithms commonly applied in machine learning.

Time: 2021-4-3, 10:30-11:30

Title: Neural networks and high-dimensional function approximation

Speaker: 吴磊 (Lei Wu), Princeton University

Bilibili: https://www.bilibili.com/video/BV1R64y1m7fJ

Abstract: In recent years, deep learning methods built on neural network models have achieved unprecedented success in many fields, such as computer vision and scientific computing. From the viewpoint of approximation theory, this success relies on the strong ability of neural networks to approximate high-dimensional functions, whereas traditional methods inevitably suffer from the curse of dimensionality when approximating high-dimensional functions. Does this mean that neural networks can, in some sense, avoid the curse of dimensionality? If so, what is the underlying mechanism? We will discuss these questions through three models: kernel methods, two-layer neural networks, and deep residual networks. In particular, we will characterize the high-dimensional function space that each model approximates. Finally, we will list some open problems to give an overall picture of this area.
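
One concrete sense in which two-layer networks escape the curse of dimensionality (a classical fact in this line of work, stated as background): if $f$ lies in the Barron space $\mathcal{B}$, then there is a two-layer network $f_m$ with $m$ neurons such that

```latex
\| f - f_m \|_{L^2}^2 \;\lesssim\; \frac{\|f\|_{\mathcal{B}}^2}{m},
```

a Monte-Carlo-type rate whose exponent does not depend on the input dimension $d$, whereas approximation in classical smoothness spaces degrades like $m^{-s/d}$ as $d$ grows.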

Time: 2021-4-3, 9:30-10:30

Title: Solving high-dimensional control problems with deep learning

Speaker: 韩劼群 (Jiequn Han), Princeton University

Bilibili: https://www.bilibili.com/video/BV1uK411c7Cz

Abstract: In recent years, the rapid development of deep learning has provided powerful new tools for high-dimensional computation; in particular, backpropagation and stochastic gradient descent give us efficient algorithms for finding optimal neural networks. This talk first analyzes neural networks from the perspective of control theory, discussing the similarity between finding optimal neural networks and solving optimal control problems, and what the above algorithms suggest for solving high-dimensional control problems. Guided by this viewpoint, we present two lines of work that use deep learning to solve high-dimensional control problems: (1) solving model-based high-dimensional stochastic control problems; (2) solving parabolic partial differential equations via a variational formulation based on backward stochastic differential equations. Compared with traditional algorithms, which are limited by the curse of dimensionality, these algorithms show enormous computational advantages and greatly expand our ability to handle a large class of high-dimensional problems. Finally, we will discuss some open problems in related directions to give a better picture of this area.
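
As a hint of how item (2) works (the standard backward-SDE reformulation, stated here as background): a semilinear parabolic PDE

```latex
\partial_t u + \tfrac{1}{2}\,\mathrm{Tr}\big(\sigma\sigma^{\top}\nabla^2 u\big) + \mu \cdot \nabla u + f\big(t, x, u, \sigma^{\top}\nabla u\big) = 0, \qquad u(T, \cdot) = g,
```

is equivalent, along diffusion paths $dX_t = \mu\,dt + \sigma\,dW_t$, to the backward SDE $dY_t = -f(t, X_t, Y_t, Z_t)\,dt + Z_t^{\top} dW_t$ with terminal condition $Y_T = g(X_T)$, where $Y_t = u(t, X_t)$ and $Z_t = \sigma^{\top}\nabla u(t, X_t)$. A neural network parameterizes $Z_t$ (and the initial value $Y_0$), and training minimizes the terminal mismatch $\mathbb{E}\,|Y_T - g(X_T)|^2$, turning the PDE into a learning problem whose cost scales mildly with dimension.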

2020 Course, Session 3: "Fundamentals and Practice of Deep Learning", Part 3

Lecturer: 张林峰 (Linfeng Zhang)

Time: 2020-11-15, 10:00 - 12:00

Bilibili: https://www.bilibili.com/video/BV1i64y1y7nN?p=2

2020 Course, Session 2: "Fundamentals and Practice of Deep Learning", Part 2

Lecturer: 吴磊 (Lei Wu)

Time: 2020-11-08, 10:00 - 12:00

Bilibili: https://www.bilibili.com/video/BV1i64y1y7nN?p=1

2020 Course, Session 1: "Fundamentals and Practice of Deep Learning", Part 1

Lecturer: 鄂维南 (Weinan E)

Time: 2020-11-01, 10:00 - 12:00

Subscribe to event announcements

To subscribe to event announcements, enter your email address below, obtain a verification code, and click the Subscribe button. Before each event, the meeting link and a brief introduction will be sent to your inbox.
