November 14, 2023
Kyunghyun Cho, New York University
Already in 2015, Leon Bottou discussed the prevalence and end of the training/test experimental paradigm in machine learning. The machine learning community has nevertheless continued to stick to this paradigm until now (2023), relying almost exclusively on test-set accuracy, which is only a rough proxy for the true quality of a machine learning system that we want to measure. There are, however, many aspects of building a machine learning system that require more attention. Specifically, I will discuss three such aspects in this talk: (1) model assumption and construction, (2) optimization, and (3) inference. For model assumption and construction, I will discuss our recent work on generative multitask learning and incidental correlation in multimodal learning. For optimization, I will talk about how we can systematically study and investigate learning trajectories. Finally, for inference, I will lay out two consistencies that must be satisfied by a large-scale language model and demonstrate that most language models do not fully satisfy such consistencies.
November 7, 2023
Berfin Şimşek, New York University
In this talk, I will present an average-case analysis of finite-width neural networks through permutation symmetry. First, I will give a new scaling law for the critical manifolds of finite-width neural networks, derived by counting all partitions due to neuron splitting from an initial set of neurons. Considering the invariance of zero neuron addition, we derive the scaling law of the zero-loss manifolds that is exact for the population loss. In a simplified setting, a factor of 2 log 2 of overparameterization guarantees that the zero-loss manifolds are the most numerous. Our complexity calculations show that the loss landscape of neural networks exhibits extreme non-convexity at the onset of overparameterization, which is tamed gradually with increasing overparameterization and effectively vanishes for infinitely wide networks. Finally, based on the theory, we will propose an ‘Expand-Cluster’ algorithm for model identification in practice.
October 31, 2023
Edward Johns, Robot Learning Lab at Imperial College London
Most of the major recent breakthroughs in AI have relied on training huge neural networks on huge amounts of data. But what about a breakthrough in real-world robotics? One of the challenges is that physical robotics data is very scarce, and very expensive to collect. To address this, my team and I have been developing very data-efficient methods for robots to learn new tasks through human demonstrations. Using these methods, we are now able to quickly teach robots a range of everyday tasks, such as hammering in a nail, inserting a plug into a socket, and scooping up an object with a spatula. However, even with these efficient methods, providing human demonstrations can be laborious. Therefore, we have also been exploring the use of off-the-shelf neural networks trained on web-scale data, such as OpenAI’s DALL-E and GPT, to act as a robot’s “imagination” or its “internal monologue” when solving new tasks. Through this talk, we will explore the importance of image, language, and action data in robotics, as the three ingredients for scalable robot learning.
October 17, 2023
Petar Velickovic, Google DeepMind
When deploying graph neural networks, we often make a seemingly innocent assumption: that the input graph we are given is the ground truth. However, as my talk will unpack, this is often not the case: even when the graphs are perfectly correct, they may be severely suboptimal for completing the task at hand. This will introduce us to the rich and vibrant area of graph rewiring, which has recently been experiencing a renaissance. I will discuss some of the most representative works, including two of our own contributions (https://arxiv.org/abs/2210.02997, https://arxiv.org/abs/2306.03589), one of which won the Best Paper Award at the Graph Learning Frontiers Workshop at NeurIPS’22.
October 10, 2023
Eunsol Choi, University of Texas at Austin
Modern language models have the capacity to store and use immense amounts of knowledge about the real world. Yet their knowledge about the world is often incorrect or outdated, motivating ways to augment it. In this talk, I will present two complementary avenues for knowledge augmentation: (1) a modular, retrieval-based approach that brings in new information at inference time, and (2) a parameter-updating approach that aims to enable models to internalize new information and make inferences based on it.