Yann LeCun

New York University

H-index: 145

North America-United States

Yann LeCun Information

University

New York University

Position

Chief AI Scientist at Facebook & Silver Professor at the Courant Institute

Citations (all)

338733

Citations (since 2020)

228006

Cited By

196357

h-index (all)

145

h-index (since 2020)

113

i10-index (all)

364

i10-index (since 2020)

286

University Profile Page

New York University

Yann LeCun Skills & Research Interests

AI

machine learning

computer vision

robotics

image compression

Top articles of Yann LeCun

Learning and Leveraging World Models in Visual Representation Learning

Authors

Quentin Garrido,Mahmoud Assran,Nicolas Ballas,Adrien Bardes,Laurent Najman,Yann LeCun

Journal

arXiv preprint arXiv:2403.00504

Published Date

2024/3/1

The Joint-Embedding Predictive Architecture (JEPA) has emerged as a promising self-supervised approach that learns by leveraging a world model. While previously limited to predicting missing parts of an input, we explore how to generalize the JEPA prediction task to a broader set of corruptions. We introduce Image World Models (IWM), an approach that goes beyond masked image modeling and learns to predict the effect of global photometric transformations in latent space. We study the recipe for learning performant IWMs and show that it relies on three key aspects: conditioning, prediction difficulty, and capacity. Additionally, we show that the predictive world model learned by IWM can be adapted through finetuning to solve diverse tasks; a finetuned IWM world model matches or surpasses the performance of previous self-supervised methods. Finally, we show that learning with an IWM allows one to control the abstraction level of the learned representations, learning invariant representations, as contrastive methods do, or equivariant representations, as in masked image modeling.
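As a rough illustration of the IWM idea, the sketch below conditions a predictor on the parameters of a photometric transformation and trains it to map the latent of a clean view to the latent of the transformed view. All names, dimensions, and the toy "augmentation" are invented for illustration; this is a minimal sketch, not the authors' implementation.

```python
# Minimal sketch of the Image World Model idea (not the authors' code): a predictor,
# conditioned on the parameters of a photometric transformation, maps the latent of a
# clean view to the latent of the transformed view.
import torch
import torch.nn as nn

class ToyIWM(nn.Module):
    def __init__(self, dim=128, cond_dim=3):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, dim),
                                     nn.ReLU(), nn.Linear(dim, dim))
        # the target encoder would typically be an EMA copy; here we reuse the encoder
        self.predictor = nn.Sequential(nn.Linear(dim + cond_dim, dim),
                                       nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x_clean, x_aug, aug_params):
        z_clean = self.encoder(x_clean)
        with torch.no_grad():                      # stop-gradient on the target branch
            z_target = self.encoder(x_aug)
        z_pred = self.predictor(torch.cat([z_clean, aug_params], dim=1))
        return nn.functional.mse_loss(z_pred, z_target)

# toy batch: 8 RGB 32x32 images, "transformed" by a hypothetical photometric shift
x = torch.randn(8, 3, 32, 32)
aug_params = torch.rand(8, 3)                      # hypothetical transformation parameters
x_aug = x * (1 + aug_params[:, :1, None, None])    # stand-in for a real photometric transform
loss = ToyIWM()(x, x_aug, aug_params)
print(loss.item())
```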

EgoPet: Egomotion and Interaction Data from an Animal's Perspective

Authors

Amir Bar,Arya Bakhtiar,Danny Tran,Antonio Loquercio,Jathushan Rajasegaran,Yann LeCun,Amir Globerson,Trevor Darrell

Journal

arXiv preprint arXiv:2404.09991

Published Date

2024/4/15

Animals perceive the world to plan their actions and interact with other agents to accomplish complex tasks, demonstrating capabilities that are still unmatched by AI systems. To advance our understanding and reduce the gap between the capabilities of animals and AI systems, we introduce EgoPet, a dataset of pet egomotion imagery with diverse examples of simultaneous egomotion and multi-agent interaction. Current video datasets separately contain egomotion and interaction examples, but rarely both at the same time. In addition, EgoPet offers a radically distinct perspective from existing egocentric datasets of humans or vehicles. We define two in-domain benchmark tasks that capture animal behavior, and a third benchmark to assess the utility of EgoPet as a pretraining resource for robotic quadruped locomotion, showing that models trained on EgoPet outperform those trained on prior datasets.

Fast and exact enumeration of deep networks partitions regions

Authors

Randall Balestriero,Yann LeCun

Published Date

2023/6/4

One fruitful formulation of Deep Networks (DNs), enabling their theoretical study and providing practical guidelines to practitioners, relies on piecewise affine splines. In that realm, a DN's input-output mapping is expressed as a per-region affine mapping, where the regions are implicitly determined by the model's architecture and form a partition of its input space. That partition, which is involved in all the results stemming from this line of research, has so far only been computed on 2/3-dimensional slices of the DN's input space or estimated by random sampling. In this paper, we provide the first parallel algorithm that performs exact enumeration of the DN's partition regions. The proposed algorithm enables one to finally assess the closeness of the commonly employed approximation methods, e.g., those based on random sampling of the DN input space. One of our key findings is that if one is only interested in regions with "large" …
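The sampling-based approximation that the abstract contrasts with (and that the exact algorithm is meant to replace) can be sketched in a few lines: each distinct ReLU activation pattern corresponds to one affine region, so counting unique patterns over a dense grid gives a lower bound on the number of regions on that slice. This is an illustrative baseline, not the paper's parallel enumeration algorithm.

```python
# Sketch of the sampling-based approximation mentioned in the abstract (not the paper's
# exact enumeration algorithm): count distinct ReLU activation patterns of a small MLP
# over a dense 2D grid of inputs; each pattern corresponds to one affine region.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 2)), rng.normal(size=16)
W2, b2 = rng.normal(size=(16, 16)), rng.normal(size=16)

xs = np.linspace(-3, 3, 400)
grid = np.stack(np.meshgrid(xs, xs), axis=-1).reshape(-1, 2)   # 160k points on a 2D slice

h1 = grid @ W1.T + b1
a1 = h1 > 0                                                    # layer-1 activation pattern
h2 = np.maximum(h1, 0) @ W2.T + b2
a2 = h2 > 0                                                    # layer-2 activation pattern

patterns = np.concatenate([a1, a2], axis=1)
n_regions = len(np.unique(patterns, axis=0))
print(f"~{n_regions} regions found by sampling (a lower bound on the true count)")
```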

An Information Theory Perspective on Variance-Invariance-Covariance Regularization

Authors

Ravid Shwartz-Ziv,Randall Balestriero,Kenji Kawaguchi,Tim GJ Rudner,Yann LeCun

Journal

Advances in Neural Information Processing Systems

Published Date

2024/2/13

Variance-Invariance-Covariance Regularization (VICReg) is a self-supervised learning (SSL) method that has shown promising results on a variety of tasks. However, the fundamental mechanisms underlying VICReg remain unexplored. In this paper, we present an information-theoretic perspective on the VICReg objective. We begin by deriving information-theoretic quantities for deterministic networks as an alternative to unrealistic stochastic network assumptions. We then relate the optimization of the VICReg objective to mutual information optimization, highlighting underlying assumptions and facilitating a constructive comparison with other SSL algorithms, and we derive a generalization bound for VICReg that reveals its inherent advantages for downstream tasks. Building on these results, we introduce a family of SSL methods derived from information-theoretic principles that outperform existing SSL techniques.
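For reference, the VICReg objective analyzed in this paper combines an invariance term between two views with a variance hinge and a covariance (decorrelation) penalty on each view's embeddings. The sketch below is a simplified version; the coefficients and toy embeddings are illustrative, not the reference implementation.

```python
# Minimal sketch of the VICReg objective (simplified; coefficients are illustrative).
import torch

def vicreg_loss(z1, z2, lam=25.0, mu=25.0, nu=1.0, eps=1e-4):
    n, d = z1.shape
    inv = torch.nn.functional.mse_loss(z1, z2)                 # invariance term
    def var_term(z):
        std = torch.sqrt(z.var(dim=0) + eps)
        return torch.relu(1.0 - std).mean()                    # hinge on per-dimension std
    def cov_term(z):
        zc = z - z.mean(dim=0)
        cov = (zc.T @ zc) / (n - 1)
        off_diag = cov - torch.diag(torch.diag(cov))
        return (off_diag ** 2).sum() / d                       # decorrelation term
    return lam * inv + mu * (var_term(z1) + var_term(z2)) + nu * (cov_term(z1) + cov_term(z2))

z1, z2 = torch.randn(256, 64), torch.randn(256, 64)            # embeddings of two views
print(vicreg_loss(z1, z2).item())
```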

Eyes wide shut? exploring the visual shortcomings of multimodal llms

Authors

Shengbang Tong,Zhuang Liu,Yuexiang Zhai,Yi Ma,Yann LeCun,Saining Xie

Journal

arXiv preprint arXiv:2401.06209

Published Date

2024/1/11

Is vision good enough for language? Recent advancements in multimodal models primarily stem from the powerful reasoning abilities of large language models (LLMs). However, the visual component typically depends only on the instance-level contrastive language-image pre-training (CLIP). Our research reveals that the visual capabilities in recent multimodal LLMs (MLLMs) still exhibit systematic shortcomings. To understand the roots of these errors, we explore the gap between the visual embedding space of CLIP and vision-only self-supervised learning. We identify "CLIP-blind pairs": images that CLIP perceives as similar despite their clear visual differences. With these pairs, we construct the Multimodal Visual Patterns (MMVP) benchmark. MMVP exposes areas where state-of-the-art systems, including GPT-4V, struggle with straightforward questions across nine basic visual patterns, often providing incorrect answers and hallucinated explanations. We further evaluate various CLIP-based vision-and-language models and find a notable correlation between visual patterns that challenge CLIP models and those problematic for multimodal LLMs. As an initial effort to address these issues, we propose a Mixture of Features (MoF) approach, demonstrating that integrating vision self-supervised learning features with MLLMs can significantly enhance their visual grounding capabilities. Together, our research suggests visual representation learning remains an open challenge, and accurate visual grounding is crucial for future successful multimodal systems.
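The "CLIP-blind pair" mining step can be sketched as follows: look for image pairs that are nearly identical under CLIP's embedding but clearly separated under a vision-only SSL embedding such as DINO. The embeddings below are random placeholders; in practice they would come from pretrained CLIP and DINO image encoders, and the thresholds are illustrative.

```python
# Illustrative sketch of mining "CLIP-blind pairs": image pairs that are very similar
# under CLIP yet clearly different under a vision-only SSL embedding (e.g. DINO).
# Embeddings here are random placeholders standing in for real encoder outputs.
import torch
import torch.nn.functional as F

n = 1000
clip_emb = F.normalize(torch.randn(n, 512), dim=1)   # placeholder CLIP image embeddings
dino_emb = F.normalize(torch.randn(n, 768), dim=1)   # placeholder DINO image embeddings

clip_sim = clip_emb @ clip_emb.T
dino_sim = dino_emb @ dino_emb.T
mask = ~torch.eye(n, dtype=torch.bool)

# candidate pairs: very similar under CLIP, clearly different under DINO
candidates = (clip_sim > 0.95) & (dino_sim < 0.6) & mask
pairs = candidates.nonzero()
print(f"{len(pairs)} candidate CLIP-blind pairs (with random embeddings, usually 0)")
```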

G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering

Authors

Xiaoxin He,Yijun Tian,Yifei Sun,Nitesh V Chawla,Thomas Laurent,Yann LeCun,Xavier Bresson,Bryan Hooi

Journal

arXiv preprint arXiv:2402.07630

Published Date

2024/2/12

Given a graph with textual attributes, we enable users to `chat with their graph': that is, to ask questions about the graph using a conversational interface. In response to a user's questions, our method provides textual replies and highlights the relevant parts of the graph. While existing works integrate large language models (LLMs) and graph neural networks (GNNs) in various ways, they mostly focus on either conventional graph tasks (such as node, edge, and graph classification), or on answering simple graph queries on small or synthetic graphs. In contrast, we develop a flexible question-answering framework targeting real-world textual graphs, applicable to multiple applications including scene graph understanding, common sense reasoning, and knowledge graph reasoning. Toward this goal, we first develop our Graph Question Answering (GraphQA) benchmark with data collected from different tasks. Then, we propose our G-Retriever approach, which integrates the strengths of GNNs, LLMs, and Retrieval-Augmented Generation (RAG), and can be fine-tuned to enhance graph understanding via soft prompting. To resist hallucination and to allow for textual graphs that greatly exceed the LLM's context window size, G-Retriever performs RAG over a graph by formulating this task as a Prize-Collecting Steiner Tree optimization problem. Empirical evaluations show that our method outperforms baselines on textual graph tasks from multiple domains, scales well with larger graph sizes, and resists hallucination. (Our codes and datasets are available at: https://github.com/XiaoxinHe/G-Retriever.)
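A much-simplified view of the retrieval step is sketched below: embed the question and the node texts, retrieve the most similar nodes, and textualize the induced subgraph as context for the LLM. The paper instead formulates retrieval as a Prize-Collecting Steiner Tree optimization; the graph, embeddings, and top-k heuristic here are invented placeholders.

```python
# Much-simplified sketch of graph retrieval for RAG: retrieve the top-k nodes most
# similar to the question and textualize their induced subgraph as prompt context.
# The paper uses Prize-Collecting Steiner Tree optimization; this is only a toy stand-in.
import networkx as nx
import numpy as np

G = nx.Graph()
G.add_edges_from([("paper_A", "paper_B"), ("paper_B", "paper_C"), ("paper_C", "paper_D")])

rng = np.random.default_rng(0)
node_emb = {n: rng.normal(size=64) for n in G.nodes}     # placeholder node-text embeddings
q_emb = rng.normal(size=64)                              # placeholder question embedding

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

top_nodes = sorted(G.nodes, key=lambda n: cos(q_emb, node_emb[n]), reverse=True)[:2]
subgraph = G.subgraph(top_nodes)
prompt_context = "; ".join(f"{u} -- {v}" for u, v in subgraph.edges) or ", ".join(top_nodes)
print("retrieved context:", prompt_context)
```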

Learning by Reconstruction Produces Uninformative Features For Perception

Authors

Randall Balestriero,Yann LeCun

Journal

arXiv preprint arXiv:2402.11337

Published Date

2024/2/17

Input space reconstruction is an attractive representation learning paradigm. Despite the interpretability of reconstruction and generation, we identify a misalignment between learning by reconstruction and learning for perception. We show that the former allocates a model's capacity towards a subspace of the data explaining the observed variance, a subspace whose features are uninformative for the latter. For example, the supervised TinyImageNet task with images projected onto the top subspace explaining 90% of the pixel variance can be solved with 45% test accuracy. Using the bottom subspace instead, accounting for only 20% of the pixel variance, reaches 55% test accuracy. Because the features useful for perception are learned last, long training times are needed, e.g., with Masked Autoencoders. Learning by denoising is a popular strategy to alleviate that misalignment. We prove that while some noise strategies, such as masking, are indeed beneficial, others, such as additive Gaussian noise, are not. Yet, even in the case of masking, we find that the benefits vary as a function of the mask's shape, ratio, and the considered dataset. While tuning the noise strategy without knowledge of the perception task seems challenging, we provide initial clues on how to detect whether a noise strategy is never beneficial regardless of the perception task.
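The top-versus-bottom-subspace experiment can be reproduced in miniature: project images onto the leading PCA components (most pixel variance) versus trailing components, then fit a linear classifier on each projection. The sketch below uses the small scikit-learn digits dataset rather than TinyImageNet, so the numbers will differ from the paper; the point is only that the highest-variance directions need not be the most discriminative.

```python
# Toy version of the abstract's experiment: compare a linear classifier trained on the
# top PCA subspace of the pixels against one trained on the bottom subspace.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

pca = PCA(n_components=64).fit(Xtr)
Z_tr, Z_te = pca.transform(Xtr), pca.transform(Xte)

for name, sl in [("top 8 components", slice(0, 8)), ("bottom 32 components", slice(32, 64))]:
    clf = LogisticRegression(max_iter=2000).fit(Z_tr[:, sl], ytr)
    print(name, "test accuracy:", round(clf.score(Z_te[:, sl], yte), 3))
```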

To Compress or Not to Compress—Self-Supervised Learning and Information Theory: A Review

Authors

Ravid Shwartz Ziv,Yann LeCun

Published Date

2024/3/12

Deep neural networks excel in supervised learning tasks but are constrained by the need for extensive labeled data. Self-supervised learning emerges as a promising alternative, allowing models to learn without explicit labels. Information theory, and in particular the information bottleneck principle, has shaped the development of deep neural networks. This principle optimizes the trade-off between compression and preserving relevant information, providing a foundation for efficient network design in supervised contexts. However, its precise role and adaptation in self-supervised learning remain unclear. In this work, we scrutinize various self-supervised learning approaches from an information-theoretic perspective, introducing a unified framework that encapsulates the self-supervised information-theoretic learning problem. This framework includes multiple encoders and decoders, suggesting that all existing self-supervised learning methods can be seen as specific instances of it. We aim to unify these approaches to better understand their underlying principles and address the main challenge: many works present different frameworks with differing theories that may seem contradictory. By weaving existing research into a cohesive narrative, we delve into contemporary self-supervised methodologies, spotlight potential research areas, and highlight inherent challenges. Moreover, we discuss how to estimate information-theoretic quantities and their associated empirical problems. Overall, this paper provides a comprehensive review of the intersection of information theory, self-supervised learning, and deep neural networks, aiming for a better understanding through our …

Learning With Fewer Labels in Computer Vision

Authors

Li Liu,Timothy Hospedales,Yann LeCun,Mingsheng Long,Jiebo Luo,Wanli Ouyang,Matti Pietikäinen,Tinne Tuytelaars

Journal

IEEE Transactions on Pattern Analysis and Machine Intelligence

Published Date

2024/2/6

Undoubtedly, Deep Neural Networks (DNNs), from AlexNet to ResNet to Transformer, have sparked revolutionary advancements in diverse computer vision tasks. The scale of DNNs has grown exponentially due to the rapid development of computational resources. Despite the tremendous success, DNNs typically depend on massive amounts of training data (especially the recent various foundation models) to achieve high performance and are brittle in that their performance can degrade severely with small changes in their operating environment. Generally, collecting massive-scale training datasets is costly or even infeasible, as for certain fields, only very limited or no examples at all can be gathered. Nevertheless, collecting, labeling, and vetting massive amounts of practical training data is certainly difficult and expensive, as it requires the painstaking efforts of experienced human annotators or experts, and in …

Revisiting feature prediction for learning visual representations from video

Authors

Adrien Bardes,Quentin Garrido,Jean Ponce,Xinlei Chen,Michael Rabbat,Yann LeCun,Mahmoud Assran,Nicolas Ballas

Journal

arXiv preprint arXiv:2404.08471

Published Date

2024/2/15

This paper explores feature prediction as a stand-alone objective for unsupervised learning from video and introduces V-JEPA, a collection of vision models trained solely using a feature prediction objective, without the use of pretrained image encoders, text, negative examples, reconstruction, or other sources of supervision. The models are trained on 2 million videos collected from public datasets and are evaluated on downstream image and video tasks. Our results show that learning by predicting video features leads to versatile visual representations that perform well on both motion- and appearance-based tasks, without adaptation of the model's parameters, e.g., using a frozen backbone. Our largest model, a ViT-H/16 trained only on videos, obtains 81.9% on Kinetics-400, 72.2% on Something-Something-v2, and 77.9% on ImageNet1K.

Predicting masked tokens in stochastic locations improves masked image modeling

Authors

Amir Bar,Florian Bordes,Assaf Shocher,Mahmoud Assran,Pascal Vincent,Nicolas Ballas,Trevor Darrell,Amir Globerson,Yann LeCun

Journal

arXiv preprint arXiv:2308.00566

Published Date

2023/7/31

Self-supervised learning is a promising paradigm in deep learning that enables learning from unlabeled data by constructing pretext tasks that require learning useful representations. In natural language processing, the dominant pretext task has been masked language modeling (MLM), while in computer vision there exists an equivalent called Masked Image Modeling (MIM). However, MIM is challenging because it requires predicting semantic content at accurate locations. E.g., given an incomplete picture of a dog, we can guess that there is a tail, but we cannot determine its exact location. In this work, we propose FlexPredict, a stochastic model that addresses this challenge by incorporating location uncertainty into the model. Specifically, we condition the model on stochastic masked token positions to guide the model toward learning features that are more robust to location uncertainties. Our approach improves downstream performance on a range of tasks; e.g., compared to MIM baselines, FlexPredict boosts ImageNet linear probing by 1.6% with ViT-B and by 2.5% for semi-supervised video segmentation using ViT-L.

URLOST: Unsupervised Representation Learning without Stationarity or Topology

Authors

Zeyu Yun,Juexiao Zhang,Bruno Olshausen,Yann LeCun,Yubei Chen

Journal

arXiv preprint arXiv:2310.04496

Published Date

2023/10/6

Unsupervised representation learning has seen tremendous progress but is constrained by its reliance on data modality-specific stationarity and topology, a limitation not found in biological intelligence systems. For instance, human vision processes visual signals derived from irregular and non-stationary sampling lattices yet accurately perceives the geometry of the world. We introduce a novel framework that learns from high-dimensional data lacking stationarity and topology. Our model combines a learnable self-organizing layer, density adjusted spectral clustering, and masked autoencoders. We evaluate its effectiveness on simulated biological vision data, neural recordings from the primary visual cortex, and gene expression datasets. Compared to state-of-the-art unsupervised learning methods like SimCLR and MAE, our model excels at learning meaningful representations across diverse modalities without depending on stationarity or topology. It also outperforms other methods not dependent on these factors, setting a new benchmark in the field. This work represents a step toward unsupervised learning methods that can generalize across diverse high-dimensional data modalities.

Variance-Covariance Regularization Improves Representation Learning

Authors

Jiachen Zhu,Katrina Evtimova,Yubei Chen,Ravid Shwartz-Ziv,Yann LeCun

Journal

arXiv preprint arXiv:2306.13292

Published Date

2024

Transfer learning has emerged as a key approach in the machine learning domain, enabling the application of knowledge derived from one domain to improve performance on subsequent tasks. Given the often limited information about these subsequent tasks, a strong transfer learning approach calls for the model to capture a diverse range of features during the initial pretraining stage. However, recent research suggests that, without sufficient regularization, the network tends to concentrate on features that primarily reduce the pretraining loss function. This tendency can result in inadequate feature learning and impaired generalization capability for target tasks. To address this issue, we propose Variance-Covariance Regularization (VCR), a regularization technique aimed at fostering diversity in the learned network features. Drawing inspiration from recent advancements in the self-supervised learning approach, our approach promotes learned representations that exhibit high variance and minimal covariance, thus preventing the network from focusing solely on loss-reducing features. We empirically validate the efficacy of our method through comprehensive experiments coupled with in-depth analytical studies on the learned representations. In addition, we develop an efficient implementation strategy that assures minimal computational overhead associated with our method. Our results indicate that VCR is a powerful and efficient method for enhancing transfer learning performance for both supervised learning and self-supervised learning, opening new possibilities for future research in this domain.

Compact and optimal deep learning with recurrent parameter generators

Authors

Jiayun Wang,Yubei Chen,Stella X Yu,Brian Cheung,Yann LeCun

Published Date

2023

Deep learning has achieved tremendous success by training increasingly large models, which are then compressed for practical deployment. We propose a drastically different approach to compact and optimal deep learning: we decouple the degrees of freedom (DoF) from the actual number of parameters of a model and optimize a small DoF, under predefined random linear constraints, for a large model of arbitrary architecture, in one-stage end-to-end learning. Specifically, we create a recurrent parameter generator (RPG), which repeatedly fetches parameters from a ring and unpacks them onto a large model with random permutation and sign flipping to promote parameter decorrelation. We show that gradient descent can automatically find the best model under the constraints, in fact with faster convergence. Our extensive experimentation reveals a log-linear relationship between model DoF and accuracy. Our RPG demonstrates remarkable DoF reduction and can be further pruned and quantized for additional run-time performance gain. For example, in terms of top-1 accuracy on ImageNet, RPG achieves 96% of ResNet18's performance with only 18% of the DoF (the equivalent of one convolutional layer) and 52% of ResNet34's performance with only 0.25% of the DoF! Our work shows the significant potential of constrained neural optimization in compact and optimal deep learning.
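The ring-sharing mechanism can be illustrated with a toy stack of linear layers that all draw their weights from one shared parameter vector through fixed random index maps and sign flips, so the trainable parameter count is set by the ring size rather than the model size. Names and dimensions below are invented; this is a sketch of the idea, not the authors' RPG implementation.

```python
# Minimal sketch of the recurrent-parameter-generator idea: every layer's weight matrix is
# assembled from one shared parameter "ring" via fixed random indexing and sign flips.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RPGLinearStack(nn.Module):
    def __init__(self, dim=64, n_layers=6, ring_size=2000):
        super().__init__()
        self.ring = nn.Parameter(torch.randn(ring_size) * 0.05)   # the only trainable tensor
        self.dim, self.n_layers = dim, n_layers
        g = torch.Generator().manual_seed(0)
        for i in range(n_layers):
            # fixed random mapping into the ring, plus random sign flips, per layer
            self.register_buffer(f"idx_{i}", torch.randint(ring_size, (dim * dim,), generator=g))
            self.register_buffer(f"sign_{i}", torch.randint(0, 2, (dim * dim,), generator=g).float() * 2 - 1)

    def forward(self, x):
        for i in range(self.n_layers):
            w = (self.ring[getattr(self, f"idx_{i}")] * getattr(self, f"sign_{i}")).view(self.dim, self.dim)
            x = F.relu(F.linear(x, w))
        return x

model = RPGLinearStack()
print("trainable params:", sum(p.numel() for p in model.parameters()))  # 2000, not 6*64*64
print(model(torch.randn(8, 64)).shape)
```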

Blockwise self-supervised learning at scale

Authors

Shoaib Ahmed Siddiqui,David Krueger,Yann LeCun,Stéphane Deny

Journal

arXiv preprint arXiv:2302.01647

Published Date

2023/2/3

Current state-of-the-art deep networks are all powered by backpropagation. In this paper, we explore alternatives to full backpropagation in the form of blockwise learning rules, leveraging the latest developments in self-supervised learning. We show that a blockwise pretraining procedure consisting of training independently the 4 main blocks of layers of a ResNet-50 with Barlow Twins' loss function at each block performs almost as well as end-to-end backpropagation on ImageNet: a linear probe trained on top of our blockwise pretrained model obtains a top-1 classification accuracy of 70.48%, only 1.1% below the accuracy of an end-to-end pretrained network (71.57% accuracy). We perform extensive experiments to understand the impact of different components within our method and explore a variety of adaptations of self-supervised learning to the blockwise paradigm, building an exhaustive understanding of the critical avenues for scaling local learning rules to large networks, with implications ranging from hardware design to neuroscience.
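The blockwise recipe can be sketched schematically: each block gets its own projection head and local SSL loss, and gradients are stopped between blocks with detach(), so no error signal propagates end to end. The blocks, views, and simplified Barlow Twins-style loss below are toy stand-ins, not the ResNet-50 setup used in the paper.

```python
# Schematic sketch of blockwise training with a local SSL loss per block and
# stop-gradients between blocks (illustrative toy features, not the paper's setup).
import torch
import torch.nn as nn

def barlow_loss(z1, z2, lam=5e-3):
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-6)
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-6)
    c = (z1.T @ z2) / z1.shape[0]                        # cross-correlation matrix
    on_diag = ((torch.diagonal(c) - 1) ** 2).sum()
    off_diag = (c ** 2).sum() - (torch.diagonal(c) ** 2).sum()
    return on_diag + lam * off_diag

blocks = nn.ModuleList([nn.Sequential(nn.Linear(128, 128), nn.ReLU()) for _ in range(4)])
heads = nn.ModuleList([nn.Linear(128, 64) for _ in range(4)])
opt = torch.optim.SGD(list(blocks.parameters()) + list(heads.parameters()), lr=0.1)

v1, v2 = torch.randn(256, 128), torch.randn(256, 128)   # two augmented views (toy features)
total = 0.0
for block, head in zip(blocks, heads):
    v1, v2 = block(v1), block(v2)
    total = total + barlow_loss(head(v1), head(v2))      # local loss for this block only
    v1, v2 = v1.detach(), v2.detach()                    # no gradient flows to earlier blocks
opt.zero_grad(); total.backward(); opt.step()
print("sum of local losses:", total.item())
```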

The ssl interplay: Augmentations, inductive bias, and generalization

Authors

Vivien Cabannes,Bobak T Kiani,Randall Balestriero,Yann LeCun,Alberto Bietti

Published Date

2023/2/6

Self-supervised learning (SSL) has emerged as a powerful framework to learn representations from raw data without supervision. Yet in practice, engineers face issues such as instability in tuning optimizers and collapse of representations during training. Such challenges motivate the need for a theory to shed light on the complex interplay between the choice of data augmentation, network architecture, and training algorithm, and the resulting performance on downstream tasks. We study such an interplay with a precise analysis of generalization performance on both pretraining and downstream tasks in kernel regimes, and highlight several insights for SSL practitioners that arise from our theory.

Just How Flexible are Neural Networks in Practice?

Authors

Ravid Shwartz-Ziv,Micah Goldblum,Arpit Bansal,C Bayan Bruss,Yann LeCun,Andrew Gordon Wilson

Published Date

2023/10/13

It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters, underpinning notions of overparameterized and underparameterized models. In practice, however, we only find solutions accessible via our training procedure, including the optimizer and regularizers, limiting flexibility. Moreover, the exact parameterization of the function class, built into an architecture, shapes its loss surface and impacts the minima we find. In this work, we examine the ability of neural networks to fit data in practice. Our findings indicate that: (1) standard optimizers find minima where the model can only fit training sets with significantly fewer samples than it has parameters; (2) convolutional networks are more parameter-efficient than MLPs and ViTs, even on randomly labeled data; (3) whereas stochastic training is thought to have a regularizing effect, SGD actually finds minima that fit more training data than full-batch gradient descent; (4) the difference in capacity to fit correctly labeled and incorrectly labeled samples predicts generalization; (5) ReLU activation functions enable fitting more data despite being designed to avoid vanishing and exploding gradients in deep architectures.

POLICE: Provably optimal linear constraint enforcement for deep neural networks

Authors

Randall Balestriero,Yann LeCun

Published Date

2023/6/4

Deep Neural Networks (DNNs) outshine alternative function approximators in many settings thanks to their modularity in composing any desired differentiable operator. The formed parametrized functional is then tuned to solve a task at hand from simple gradient descent. This modularity comes at the cost of making strict enforcement of constraints on DNNs, e.g., from a priori knowledge of the task or from desired physical properties, an open challenge. In this paper we propose the first provable affine constraint enforcement method for DNNs that requires only minimal changes to a given DNN's forward pass, that is computationally friendly, and that leaves the optimization of the DNN's parameters unconstrained, i.e., standard gradient-based methods can be employed. Our method does not require any sampling and provably ensures that the DNN fulfills the affine constraint on a given input space's region at any …

Active self-supervised learning: A few low-cost relationships are all you need

Authors

Vivien Cabannes,Leon Bottou,Yann Lecun,Randall Balestriero

Published Date

2023

Self-Supervised Learning (SSL) has emerged as the solution of choice to learn transferable representations from unlabeled data. However, SSL requires building samples that are known to be semantically akin, i.e., positive views. Requiring such knowledge is the main limitation of SSL and is often tackled by ad-hoc strategies, e.g., applying known data augmentations to the same input. In this work, we generalize and formalize this principle through Positive Active Learning (PAL), where an oracle queries semantic relationships between samples. PAL achieves three main objectives. First, it is a theoretically grounded learning framework that encapsulates standard SSL but also supervised and semi-supervised learning, depending on the employed oracle. Second, it provides a consistent algorithm to embed a priori knowledge, e.g., some observed labels, into any SSL loss without any change in the training pipeline. Third, it provides a proper active learning framework yielding low-cost solutions to annotate datasets, arguably bridging the gap between the theory and practice of active learning, based on queries of semantic relationships between inputs that are simple for non-experts to answer.

Reverse engineering self-supervised learning

Authors

Ido Ben-Shaul,Ravid Shwartz-Ziv,Tomer Galanti,Shai Dekel,Yann LeCun

Journal

Advances in Neural Information Processing Systems

Published Date

2023/12/15

Understanding the learned representation and underlying mechanisms of Self-Supervised Learning (SSL) often poses a challenge. In this paper, we 'reverse engineer' SSL, conducting an in-depth empirical analysis of its learned internal representations, encompassing diverse models, architectures, and hyperparameters. Our study reveals an intriguing process within the SSL training: an inherent facilitation of semantic label-based clustering, which is surprisingly driven by the regularization component of the SSL objective. This clustering not only enhances downstream classification, but also compresses the information. We further illustrate that the alignment of the SSL-trained representation is more pronounced with semantic classes than with random functions. Remarkably, the learned representations align with semantic classes across various hierarchical levels, with this alignment intensifying when going deeper into the network. This 'reverse engineering' approach provides valuable insights into the inner mechanisms of SSL and their influence on performance across different class sets.

Augmented language models: a survey

Authors

Grégoire Mialon,Roberto Dessì,Maria Lomeli,Christoforos Nalmpantis,Ram Pasunuru,Roberta Raileanu,Baptiste Rozière,Timo Schick,Jane Dwivedi-Yu,Asli Celikyilmaz,Edouard Grave,Yann LeCun,Thomas Scialom

Journal

arXiv preprint arXiv:2302.07842

Published Date

2023/2/15

This survey reviews works in which language models (LMs) are augmented with reasoning skills and the ability to use tools. The former is defined as decomposing a potentially complex task into simpler subtasks, while the latter consists of calling external modules such as a code interpreter. LMs can leverage these augmentations separately or in combination via heuristics, or learn to do so from demonstrations. While adhering to a standard missing-token prediction objective, such augmented LMs can use various, possibly non-parametric, external modules to expand their context-processing ability, thus departing from the pure language modeling paradigm. We therefore refer to them as Augmented Language Models (ALMs). The missing-token objective allows ALMs to learn to reason, use tools, and even act, while still performing standard natural language tasks and even outperforming most regular LMs on several benchmarks. In this work, after reviewing current advances in ALMs, we conclude that this new research direction has the potential to address common limitations of traditional LMs such as interpretability, consistency, and scalability issues.

Language, common sense, and the Winograd schema challenge

Authors

Jacob Browning,Yann LeCun

Published Date

2023/10/6

Since the 1950s, philosophers and AI researchers have held that disambiguating natural language sentences depended on common sense. In 2011, the Winograd Schema Challenge was established to evaluate the common-sense reasoning abilities of a machine by testing its ability to disambiguate sentences. The designers argued that only a system capable of "thinking in the full-bodied sense" would be able to pass the test. However, by 2021, the original authors conceded that the test had been soundly defeated by large language models, which still seem to lack common sense or full-bodied thinking. In this paper, we argue that disambiguating sentences only seemed like a good test of common sense based on a certain picture of the relationship between linguistic comprehension and semantic knowledge—one typically associated with the early computational theory of mind and Symbolic AI. If this picture is rejected, as …

An information-theoretic perspective on variance-invariance-covariance regularization

Authors

Ravid Shwartz-Ziv,Randall Balestriero,Kenji Kawaguchi,Tim GJ Rudner,Yann LeCun

Published Date

2023/12

In this paper, we provide an information-theoretic perspective on Variance-Invariance-Covariance Regularization (VICReg) for self-supervised learning. To do so, we first demonstrate how information-theoretic quantities can be obtained for deterministic networks as an alternative to the commonly used unrealistic stochastic networks assumption. Next, we relate the VICReg objective to mutual information maximization and use it to highlight the underlying assumptions of the objective. Based on this relationship, we derive a generalization bound for VICReg, providing generalization guarantees for downstream supervised learning tasks and present new self-supervised learning methods, derived from a mutual information maximization objective, that outperform existing methods in terms of performance. This work provides a new information-theoretic perspective on self-supervised learning and Variance-Invariance-Covariance Regularization in particular and guides the way for improved transfer learning via information-theoretic self-supervised learning objectives.

Emp-ssl: Towards self-supervised learning in one training epoch

Authors

Shengbang Tong,Yubei Chen,Yi Ma,Yann Lecun

Journal

arXiv preprint arXiv:2304.03977

Published Date

2023/4/8

Recently, self-supervised learning (SSL) has achieved tremendous success in learning image representations. Despite the empirical success, most self-supervised learning methods are rather "inefficient" learners, typically taking hundreds of training epochs to fully converge. In this work, we show that the key to efficient self-supervised learning is to increase the number of crops taken from each image instance. Leveraging one of the state-of-the-art SSL methods, we introduce a simple self-supervised learning method called Extreme-Multi-Patch Self-Supervised Learning (EMP-SSL) that does not rely on many of the heuristic techniques used in SSL, such as weight sharing between the branches, feature-wise normalization, output quantization, and stop-gradient, and reduces the training epochs by two orders of magnitude. We show that the proposed method is able to converge to 85.1% on CIFAR-10, 58.5% on CIFAR-100, 38.1% on Tiny ImageNet and 58.5% on ImageNet-100 in just one epoch. Furthermore, the proposed method achieves 91.5% on CIFAR-10, 70.1% on CIFAR-100, 51.5% on Tiny ImageNet and 78.9% on ImageNet-100 with linear probing in less than ten training epochs. In addition, we show that EMP-SSL exhibits significantly better transferability to out-of-domain datasets than baseline SSL methods. We will release the code at https://github.com/tsb0601/EMP-SSL.

Rankme: Assessing the downstream performance of pretrained self-supervised representations by their rank

Authors

Quentin Garrido,Randall Balestriero,Laurent Najman,Yann Lecun

Published Date

2023/7/3

Joint-Embedding Self-Supervised Learning (JE-SSL) has seen rapid development, with the emergence of many method variations but only a few principled guidelines that would help practitioners deploy them successfully. The main reason for that pitfall is JE-SSL's core principle of not employing any input reconstruction, which removes the visual cues of unsuccessful training. Add to that uninformative loss values, and it becomes difficult to deploy SSL on a new dataset for which no labels can help to judge the quality of the learned representation. In this study, we develop a simple unsupervised criterion that is indicative of the quality of the learned JE-SSL representations: their effective rank. Albeit simple and computationally friendly, this method—coined RankMe—allows one to assess the performance of JE-SSL representations, even on different downstream datasets, without requiring any labels. A further benefit of RankMe is that it does not have any training or hyper-parameters to tune. Through thorough empirical experiments involving hundreds of training episodes, we demonstrate how RankMe can be used for hyperparameter selection with nearly no reduction in final performance compared to the current selection methods that involve a dataset's labels. We hope that RankMe will facilitate the deployment of JE-SSL in domains that do not have the opportunity to rely on labels to assess representation quality.
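The effective-rank criterion itself is compact: compute the singular values of the embedding matrix, normalize them into a distribution, and take the exponential of its entropy. The sketch below follows this common formulation; the epsilon handling and toy embeddings are illustrative and may differ in detail from the paper.

```python
# Minimal sketch of an effective (smooth) rank criterion in the spirit of RankMe:
# exponential of the entropy of the normalized singular values of the embedding matrix.
import torch

def effective_rank(z, eps=1e-7):
    s = torch.linalg.svdvals(z)              # singular values of the (N x d) embedding matrix
    p = s / (s.sum() + eps) + eps
    return torch.exp(-(p * p.log()).sum())   # value in [1, min(N, d)]

healthy = torch.randn(2048, 256)                          # spread-out embeddings
collapsed = torch.randn(2048, 4) @ torch.randn(4, 256)    # embeddings collapsed to rank 4
print("effective rank (healthy):", effective_rank(healthy).item())
print("effective rank (collapsed):", effective_rank(collapsed).item())
```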

An Information-Theoretic Understanding of Maximum Manifold Capacity Representations

Authors

Berivan Isik,Victor Lecomte,Rylan Schaeffer,Yann LeCun,Mikail Khona,Ravid Shwartz-Ziv,Sanmi Koyejo,Andrey Gromov

Published Date

2023/12/18

Maximum Manifold Capacity Representations (MMCR) is a recent multi-view self-supervised learning (MVSSL) method that matches or surpasses other leading MVSSL methods. MMCR is interesting for at least two reasons. Firstly, MMCR is an oddity in the zoo of MVSSL methods: it is not (explicitly) contrastive, applies no masking, performs no clustering, leverages no distillation, and does not (explicitly) reduce redundancy. Secondly, while many self-supervised learning (SSL) methods originate in information theory, MMCR distinguishes itself by claiming a different origin: a statistical mechanical characterization of the geometry of linear separability of data manifolds. However, given the rich connections between statistical mechanics and information theory, and given recent work showing how many SSL methods can be understood from an information-theoretic perspective, we conjecture that MMCR can be similarly understood from an information-theoretic perspective. In this paper, we leverage tools from high dimensional probability and information theory to demonstrate that an optimal solution to MMCR's nuclear norm-based objective function is the same optimal solution that maximizes a well-known lower bound on mutual information.

Mc-jepa: A joint-embedding predictive architecture for self-supervised learning of motion and content features

Authors

Adrien Bardes,Jean Ponce,Yann LeCun

Journal

arXiv preprint arXiv:2307.12698

Published Date

2023/7/24

Self-supervised learning of visual representations has been focusing on learning content features, which do not capture object motion or location and instead focus on identifying and differentiating objects in images and videos. On the other hand, optical flow estimation is a task that does not involve understanding the content of the images on which it is estimated. We unify the two approaches and introduce MC-JEPA, a joint-embedding predictive architecture and self-supervised learning approach that jointly learns optical flow and content features within a shared encoder, demonstrating that the two associated objectives, the optical flow estimation objective and the self-supervised learning objective, benefit from each other and thus learn content features that incorporate motion information. The proposed approach achieves performance on par with existing unsupervised optical flow benchmarks, as well as with common self-supervised learning approaches on downstream tasks such as semantic segmentation of images and videos.

Catalyzing next-generation artificial intelligence through neuroai

Authors

Anthony Zador,Sean Escola,Blake Richards,Bence Ölveczky,Yoshua Bengio,Kwabena Boahen,Matthew Botvinick,Dmitri Chklovskii,Anne Churchland,Claudia Clopath,James DiCarlo,Surya Ganguli,Jeff Hawkins,Konrad Körding,Alexei Koulakov,Yann LeCun,Timothy Lillicrap,Adam Marblestone,Bruno Olshausen,Alexandre Pouget,Cristina Savin,Terrence Sejnowski,Eero Simoncelli,Sara Solla,David Sussillo,Andreas S Tolias,Doris Tsao

Published Date

2023/3/22

Neuroscience has long been an essential driver of progress in artificial intelligence (AI). We propose that to accelerate progress in AI, we must invest in fundamental research in NeuroAI. A core component of this is the embodied Turing test, which challenges AI animal models to interact with the sensorimotor world at skill levels akin to their living counterparts. The embodied Turing test shifts the focus from those capabilities like game playing and language that are especially well-developed or uniquely human to those capabilities – inherited from over 500 million years of evolution – that are shared with all animals. Building models that can pass the embodied Turing test will provide a roadmap for the next generation of AI.

Harnessing explanations: Llm-to-lm interpreter for enhanced text-attributed graph representation learning

Authors

Xiaoxin He,Xavier Bresson,Thomas Laurent,Adam Perold,Yann LeCun,Bryan Hooi

Published Date

2023/10/13

Representation learning on text-attributed graphs (TAGs) has become a critical research problem in recent years. A typical example of a TAG is a paper citation graph, where the text of each paper serves as node attributes. Initial graph neural network (GNN) pipelines handled these text attributes by transforming them into shallow or hand-crafted features, such as skip-gram or bag-of-words features. Recent efforts have focused on enhancing these pipelines with language models (LMs), which typically demand intricate designs and substantial computational resources. With the advent of powerful large language models (LLMs) such as GPT or Llama2, which demonstrate an ability to reason and to utilize general knowledge, there is a growing need for techniques which combine the textual modelling abilities of LLMs with the structural learning capabilities of GNNs. Hence, in this work, we focus on leveraging LLMs to capture textual information as features, which can be used to boost GNN performance on downstream tasks. A key innovation is our use of \emph{explanations as features}: we prompt an LLM to perform zero-shot classification, request textual explanations for its decision-making process, and design an \emph{LLM-to-LM interpreter} to translate these explanations into informative features that enhance downstream GNNs. Our experiments demonstrate that our method achieves state-of-the-art results on well-established TAG datasets, including \texttt{Cora}, \texttt{PubMed}, \texttt{ogbn-arxiv}, as well as our newly introduced dataset, \texttt{arXiv-2023}. Furthermore, our method significantly speeds up training, achieving a 2.88 times …

V-JEPA: Latent Video Prediction for Visual Representation Learning

Authors

Adrien Bardes,Quentin Garrido,Jean Ponce,Xinlei Chen,Michael Rabbat,Yann LeCun,Mido Assran,Nicolas Ballas

Published Date

2023/10/13

This paper shows that the masked-modelling principle driving the success of large foundational language models can be effectively applied to video by making predictions in latent space. We introduce V-JEPA, a method for self-supervised learning from video that predicts masked spatio-temporal regions in a learned representation space. Our latent video prediction strategy produces visual features that can be applied to various downstream image and video tasks without adaption of the model's parameters (using only frozen evaluation), achieving 82.1% on Kinetics-400 and 71.2% on Something-Something-v2, surpassing the previous best video models by +4 and +10 points respectively. We also demonstrate the benefit of video pretraining compared to image pretraining for tasks involving motion understanding, where V-JEPA outperforms the largest state-of-the-art image models, DINOv2 and OpenCLIP. Finally, V-JEPA trained only on video achieves 77.9% on ImageNet classification without any image fine-tuning, surpassing the previous best video model by +6 points top-1.

A cookbook of self-supervised learning

Authors

Randall Balestriero,Mark Ibrahim,Vlad Sobal,Ari Morcos,Shashank Shekhar,Tom Goldstein,Florian Bordes,Adrien Bardes,Gregoire Mialon,Yuandong Tian,Avi Schwarzschild,Andrew Gordon Wilson,Jonas Geiping,Quentin Garrido,Pierre Fernandez,Amir Bar,Hamed Pirsiavash,Yann LeCun,Micah Goldblum

Journal

arXiv preprint arXiv:2304.12210

Published Date

2023/4/24

Self-supervised learning, dubbed the dark matter of intelligence, is a promising path to advance machine learning. Yet, much like cooking, training SSL methods is a delicate art with a high barrier to entry. While many components are familiar, successfully training a SSL method involves a dizzying set of choices from the pretext tasks to training hyper-parameters. Our goal is to lower the barrier to entry into SSL research by laying the foundations and latest SSL recipes in the style of a cookbook. We hope to empower the curious researcher to navigate the terrain of methods, understand the role of the various knobs, and gain the know-how required to explore how delicious SSL can be.

Self-supervised learning with lie symmetries for partial differential equations

Authors

Grégoire Mialon*,Quentin Garrido*,Hannah Lawrence,Danyal Rehman,Yann LeCun,Bobak Kiani

Published Date

2024/2/14

Machine learning for differential equations paves the way for computationally efficient alternatives to numerical solvers, with potentially broad impacts in science and engineering. Though current algorithms typically require simulated training data tailored to a given setting, one may instead wish to learn useful information from heterogeneous sources, or from real dynamical systems observations that are messy or incomplete. In this work, we learn general-purpose representations of PDEs from heterogeneous data by implementing joint embedding methods for self-supervised learning (SSL), a framework for unsupervised representation learning that has had notable success in computer vision. Our representation outperforms baseline approaches to invariant tasks, such as regressing the coefficients of a PDE, while also improving the time-stepping performance of neural solvers. We hope that our proposed methodology will prove useful in the eventual development of general-purpose foundation models for PDEs.

Gaia: a benchmark for general ai assistants

Authors

Grégoire Mialon,Clémentine Fourrier,Craig Swift,Thomas Wolf,Yann LeCun,Thomas Scialom

Journal

arXiv preprint arXiv:2311.12983

Published Date

2023/11/21

We introduce GAIA, a benchmark for General AI Assistants that, if solved, would represent a milestone in AI research. GAIA proposes real-world questions that require a set of fundamental abilities such as reasoning, multi-modality handling, web browsing, and general tool-use proficiency. GAIA questions are conceptually simple for humans yet challenging for most advanced AIs: we show that human respondents obtain 92% vs. 15% for GPT-4 equipped with plugins. This notable performance disparity contrasts with the recent trend of LLMs outperforming humans on tasks requiring professional skills in, e.g., law or chemistry. GAIA's philosophy departs from the current trend in AI benchmarks, which suggests targeting tasks that are ever more difficult for humans. We posit that the advent of Artificial General Intelligence (AGI) hinges on a system's capability to exhibit robustness similar to that of the average human on such questions. Using GAIA's methodology, we devise 466 questions and their answers. We release our questions while retaining the answers to 300 of them to power a leaderboard available at https://huggingface.co/gaia-benchmark.

Adapting Grounded Visual Question Answering Models to Low Resource Languages

Authors

Ying Wang,Jonas Pfeiffer,Nicolas Carion,Yann LeCun,Aishwarya Kamath

Published Date

2023

While huge progress has been made on a variety of vision and language tasks in recent years, most major advances have been restricted to the English language due to the scarcity of relevant training and evaluation datasets in other languages. A popular approach to address this gap has been to utilize machine-translated multi-modal datasets or multi-lingual text-only datasets for pre-training. This approach not only fails to exploit existing pre-trained state-of-the-art English multi-modal models, but is also not a viable solution for low-resource languages where translation quality is not reliable. Therefore, we propose xMDETR, a multi-lingual grounded vision-language model based on the state-of-the-art model MDETR, adapted to new languages without machine-translated data while keeping most of the pre-trained weights frozen. xMDETR leverages mono-lingual pre-trained MDETR to achieve results competitive with the state of the art on xGQA, a standard multilingual VQA benchmark. It is also interpretable, providing bounding boxes for key phrases in the multi-lingual questions. Our method utilizes several architectural as well as data-driven techniques, such as training a new embedding space with a Masked Language Modeling (MLM) objective, code-switching, and adapters for efficient and modular training. We also explore contrastive losses to enforce the bridging of multi-modal and multi-lingual representations on multi-lingual multi-modal data, when available. We evaluate xMDETR on xGQA in both zero-shot and few-shot settings, improving results on Portuguese, Indonesian and Bengali, while remaining competitive on other …

Self-supervised learning of split invariant equivariant representations

Authors

Quentin Garrido,Laurent Najman,Yann Lecun

Journal

arXiv preprint arXiv:2302.10283

Published Date

2023/2/14

Recent progress has been made towards learning invariant or equivariant representations with self-supervised learning. While invariant methods are evaluated on large scale datasets, equivariant ones are evaluated in smaller, more controlled, settings. We aim at bridging the gap between the two in order to learn more diverse representations that are suitable for a wide range of tasks. We start by introducing a dataset called 3DIEBench, consisting of renderings from 3D models over 55 classes and more than 2.5 million images where we have full control on the transformations applied to the objects. We further introduce a predictor architecture based on hypernetworks to learn equivariant representations with no possible collapse to invariance. We introduce SIE (Split Invariant-Equivariant) which combines the hypernetwork-based predictor with representations split in two parts, one invariant, the other equivariant, to learn richer representations. We demonstrate significant performance gains over existing methods on equivariance related tasks from both a qualitative and quantitative point of view. We further analyze our introduced predictor and show how it steers the learned latent space. We hope that both our introduced dataset and approach will enable learning richer representations without supervision in more complex scenarios. Code and data are available at https://github.com/facebookresearch/SIE.

Self-supervised learning from images with a joint-embedding predictive architecture

Authors

Mahmoud Assran,Quentin Duval,Ishan Misra,Piotr Bojanowski,Pascal Vincent,Michael Rabbat,Yann LeCun,Nicolas Ballas

Published Date

2023

This paper demonstrates an approach for learning highly semantic image representations without relying on hand-crafted data-augmentations. We introduce the Image-based Joint-Embedding Predictive Architecture (I-JEPA), a non-generative approach for self-supervised learning from images. The idea behind I-JEPA is simple: from a single context block, predict the representations of various target blocks in the same image. A core design choice to guide I-JEPA towards producing semantic representations is the masking strategy; specifically, it is crucial to (a) sample target blocks with sufficiently large scale (semantic), and to (b) use a sufficiently informative (spatially distributed) context block. Empirically, when combined with Vision Transformers, we find I-JEPA to be highly scalable. For instance, we train a ViT-Huge/14 on ImageNet using 16 A100 GPUs in under 72 hours to achieve strong downstream performance across a wide range of tasks, from linear classification to object counting and depth prediction.
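The core I-JEPA prediction task can be caricatured in a few lines: encode a context block, encode a disjoint target block with a stop-gradient branch, and train a predictor to regress the target representations given the context and the targets' positions. The sketch below uses toy patch embeddings, a pooled context, and a shared encoder in place of the paper's Vision Transformers, EMA target encoder, and multi-block masking; all names and dimensions are illustrative.

```python
# Highly simplified I-JEPA-style sketch: predict target-patch representations from a
# disjoint context block, in representation space (not the authors' architecture).
import torch
import torch.nn as nn

n_patches, dim = 64, 128
encoder = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
predictor = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))
pos_emb = nn.Parameter(torch.randn(n_patches, dim) * 0.02)

tokens = torch.randn(8, n_patches, dim)                  # toy patch embeddings, batch of 8
target_idx = torch.arange(40, 48)                        # one contiguous "target block"
context_idx = torch.arange(0, 32)                        # context block, disjoint from targets

ctx = encoder(tokens[:, context_idx] + pos_emb[context_idx]).mean(dim=1)  # pooled context
with torch.no_grad():                                    # stop-gradient target branch
    tgt = encoder(tokens[:, target_idx] + pos_emb[target_idx])

# predict each target token from the pooled context plus the target's position embedding
query = pos_emb[target_idx].expand(8, -1, -1)
pred = predictor(torch.cat([ctx.unsqueeze(1).expand(-1, len(target_idx), -1), query], dim=-1))
loss = nn.functional.mse_loss(pred, tgt)
print(loss.item())
```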

Self-Supervised Learning for Understanding Text, Imaging, Equations and Everything Else

Authors

Yann LeCun

Journal

Video Memorie della Societa Astronomica Italiana

Published Date

2023/12

No abstract available; NASA/ADS record 2023VMSAI...4....1L (keyword: Astroinformatics).

Introduction to latent variable energy-based models: A path towards autonomous machine intelligence

Authors

Anna Dawid,Yann LeCun

Journal

arXiv preprint arXiv:2306.02572

Published Date

2023/6/5

Current automated systems have crucial limitations that need to be addressed before artificial intelligence can reach human-like levels and bring new technological revolutions. Among others, our societies still lack Level 5 self-driving cars, domestic robots, and virtual assistants that learn reliable world models, reason, and plan complex action sequences. In these notes, we summarize the main ideas behind the architecture of autonomous intelligence of the future proposed by Yann LeCun. In particular, we introduce energy-based and latent variable models and combine their advantages in the building block of LeCun's proposal, that is, in the hierarchical joint embedding predictive architecture (H-JEPA).
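The latent-variable energy-based view summarized in these notes can be illustrated with a toy model: define an energy over an input, an output, and a latent, and perform inference by minimizing the energy over the latent with a few gradient steps. The decoder, dimensions, and optimizer settings below are invented for illustration; this is a sketch of the general EBM recipe, not the H-JEPA architecture itself.

```python
# Toy latent-variable energy-based model: E(x, y, z) = ||y - Dec(x, z)||^2, with
# inference done by gradient descent over the latent z (illustrative only).
import torch
import torch.nn as nn

decoder = nn.Sequential(nn.Linear(4 + 2, 32), nn.Tanh(), nn.Linear(32, 4))  # Dec(x, z)

def energy(x, y, z):
    return ((y - decoder(torch.cat([x, z], dim=-1))) ** 2).sum(dim=-1)

def infer_z(x, y, steps=50, lr=0.1):
    z = torch.zeros(x.shape[0], 2, requires_grad=True)
    opt = torch.optim.SGD([z], lr=lr)
    for _ in range(steps):                     # gradient-based inference over the latent
        opt.zero_grad()
        energy(x, y, z).sum().backward()
        opt.step()
    return z.detach()

x, y = torch.randn(16, 4), torch.randn(16, 4)
z_star = infer_z(x, y)
print("free energy F(x, y) = min_z E(x, y, z):", energy(x, y, z_star).mean().item())
```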

A generalization of vit/mlp-mixer to graphs

Authors

Xiaoxin He,Bryan Hooi,Thomas Laurent,Adam Perold,Yann LeCun,Xavier Bresson

Published Date

2023/7/3

Graph Neural Networks (GNNs) have shown great potential in the field of graph representation learning. Standard GNNs define a local message-passing mechanism which propagates information over the whole graph domain by stacking multiple layers. This paradigm suffers from two major limitations, over-squashing and poor long-range dependencies, that can be solved using global attention but significantly increases the computational cost to quadratic complexity. In this work, we propose an alternative approach to overcome these structural limitations by leveraging the ViT/MLP-Mixer architectures introduced in computer vision. We introduce a new class of GNNs, called Graph ViT/MLP-Mixer, that holds three key properties. First, they capture long-range dependency and mitigate the issue of over-squashing as demonstrated on Long Range Graph Benchmark and TreeNeighbourMatch datasets. Second, they offer better speed and memory efficiency with a complexity linear to the number of nodes and edges, surpassing the related Graph Transformer and expressive GNN models. Third, they show high expressivity in terms of graph isomorphism as they can distinguish at least 3-WL non-isomorphic graphs. We test our architecture on 4 simulated datasets and 7 real-world benchmarks, and show highly competitive results on all of them. The source code is available for reproducibility at: https://github.com/XiaoxinHe/Graph-ViT-MLPMixer.

Gradient-based Planning with World Models

Authors

Jyothir SV,Siddhartha Jalagam,Yann LeCun,Vlad Sobal

Journal

arXiv preprint arXiv:2312.17227

Published Date

2023/12/28

The enduring challenge in the field of artificial intelligence has been the control of systems to achieve desired behaviours. While for systems governed by straightforward dynamics equations, methods like Linear Quadratic Regulation (LQR) have historically proven highly effective, most real-world tasks, which require a general problem-solver, demand world models with dynamics that cannot be easily described by simple equations. Consequently, these models must be learned from data using neural networks. Most model predictive control (MPC) algorithms designed for visual world models have traditionally explored gradient-free population-based optimisation methods, such as Cross Entropy and Model Predictive Path Integral (MPPI), for planning. However, we present an exploration of a gradient-based alternative that fully leverages the differentiability of the world model. In our study, we conduct a comparative analysis between our method and other MPC-based alternatives, as well as policy-based algorithms. In a sample-efficient setting, our method achieves performance on par with or superior to the alternative approaches in most tasks. Additionally, we introduce a hybrid model that combines policy networks and gradient-based MPC, which outperforms pure policy-based methods, thereby holding promise for gradient-based planning with world models in complex real-world tasks.
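The gradient-based planning idea is simple to demonstrate: roll a differentiable dynamics model out over a candidate action sequence, accumulate a planning cost, and optimize the actions directly by backpropagation through the rollout. The dynamics network, goal, horizon, and cost below are toy placeholders (a randomly initialized model rather than a learned one); this is a sketch of the mechanism, not the paper's method.

```python
# Minimal sketch of gradient-based planning through a differentiable world model:
# optimize an action sequence by backpropagating a rollout cost through the dynamics.
import torch
import torch.nn as nn

dynamics = nn.Sequential(nn.Linear(4 + 2, 64), nn.Tanh(), nn.Linear(64, 4))  # s' = f(s, a)
goal = torch.tensor([1.0, 0.0, 0.0, 0.0])

actions = torch.zeros(10, 2, requires_grad=True)          # planning horizon of 10 actions
opt = torch.optim.Adam([actions], lr=0.05)

for step in range(200):
    s = torch.zeros(4)                                     # initial state
    cost = 0.0
    for a in actions:                                      # differentiable rollout
        s = dynamics(torch.cat([s, a]))
        cost = cost + ((s - goal) ** 2).sum() + 1e-2 * (a ** 2).sum()
    opt.zero_grad()
    cost.backward()
    opt.step()

print("final planning cost:", cost.item())
```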

Methods, systems, and media for detecting spoofing in mobile authentication

Published Date

2022/8/23

Provided herein are devices, systems, and methods for detecting spoofing of a 3D object, using a 2D representation, in a mobile object authentication process, comprising capturing image data of the 3D object by a front-facing camera, to record a current spatial characteristic of the 3D object, while a front-facing screen displays an authentication pattern comprising a plurality of regions, wherein at least one of the regions varies in at least one of: brightness, position, size, shape, and color over time causing a variance of lighting effects which create highlights and shadows on the 3D object over time. The devices, systems, and methods thereby provide an efficient and secure process for determining if spoofing of the 3D object, using a 2D representation, is attempted in a mobile authentication process, by comparing the current spatial characteristic of the 3D object with a stored reference spatial characteristic of the 3D …

Pre-train your loss: Easy bayesian transfer learning with informative priors

Authors

Ravid Shwartz-Ziv,Micah Goldblum,Hossein Souri,Sanyam Kapoor,Chen Zhu,Yann LeCun,Andrew G Wilson

Journal

Advances in Neural Information Processing Systems

Published Date

2022/12/6

Deep learning is increasingly moving towards a transfer learning paradigm whereby large foundation models are fine-tuned on downstream tasks, starting from an initialization learned on the source task. But an initialization contains relatively little information about the source task, and does not reflect the belief that our knowledge of the source task should affect the locations and shape of optima on the downstream task. Instead, we show that we can learn highly informative posteriors from the source task, through supervised or self-supervised approaches, which then serve as the basis for priors that modify the whole loss surface on the downstream task. This simple modular approach enables significant performance gains and more data-efficient learning on a variety of downstream classification and segmentation tasks, serving as a drop-in replacement for standard pre-training strategies. These highly informative priors also can be saved for future use, similar to pre-trained weights, and stand in contrast to the zero-mean isotropic uninformative priors that are typically used in Bayesian deep learning.
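
A minimal sketch of the loss structure this implies: a source-task posterior, simplified here to a diagonal Gaussian, acts as a prior during fine-tuning by penalizing deviation from the source mean weighted by the source precision. The paper learns richer posteriors than this diagonal simplification; all names and variances below are placeholders.

# Simplified sketch: a (diagonal) Gaussian posterior learned on the source task
# reused as a prior when fine-tuning downstream, i.e. a quadratic penalty toward
# the source mean scaled by the source precision. Illustrative only.
import torch

def prior_penalty(model, source_mean, source_var, scale=1.0):
    penalty = 0.0
    for name, p in model.named_parameters():
        penalty = penalty + 0.5 * ((p - source_mean[name]) ** 2 / source_var[name]).sum()
    return scale * penalty

# Toy usage with a tiny linear model standing in for a pre-trained network.
model = torch.nn.Linear(4, 2)
source_mean = {n: p.detach().clone() for n, p in model.named_parameters()}
source_var = {n: torch.full_like(p, 0.1) for n, p in model.named_parameters()}
x, y = torch.randn(8, 4), torch.randint(0, 2, (8,))
loss = torch.nn.functional.cross_entropy(model(x), y) + prior_penalty(model, source_mean, source_var)
loss.backward()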

Vicregl: Self-supervised learning of local visual features

Authors

Adrien Bardes,Jean Ponce,Yann LeCun

Journal

Advances in Neural Information Processing Systems

Published Date

2022/12/6

Most recent self-supervised methods for learning image representations focus on either producing a global feature with invariance properties, or producing a set of local features. The former works best for classification tasks while the latter is best for detection and segmentation tasks. This paper explores the fundamental trade-off between learning local and global features. A new method called VICRegL is proposed that learns good global and local features simultaneously, yielding excellent performance on detection and segmentation tasks while maintaining good performance on classification tasks. Concretely, two identical branches of a standard convolutional net architecture are fed two differently distorted versions of the same image. The VICReg criterion is applied to pairs of global feature vectors. Simultaneously, the VICReg criterion is applied to pairs of local feature vectors occurring before the last pooling layer. Two local feature vectors are attracted to each other if their l2-distance is below a threshold or if their relative locations are consistent with a known geometric transformation between the two input images. We demonstrate strong performance on linear classification and segmentation transfer tasks. Code and pretrained models are publicly available at: https://github.com/facebookresearch/VICRegL

What Do We Maximize in Self-Supervised Learning And Why Does Generalization Emerge?

Authors

Ravid Shwartz-Ziv,Randall Balestriero,Kenji Kawaguchi,Yann LeCun

Published Date

2022/9/29

In this paper, we provide an information-theoretic (IT) understanding of self-supervised learning methods, their construction, and optimality. As a first step, we demonstrate how IT quantities can be obtained for deterministic networks, as an alternative to the commonly used unrealistic stochastic networks assumption. Secondly, we demonstrate how different SSL models can be (re)discovered based on first principles and highlight what the underlying assumptions of different SSL variants are. Third, we derive a novel generalization bound based on our IT understanding of SSL methods, providing generalization guarantees for the downstream supervised learning task. As a result of this bound, along with our unified view of SSL, we can compare the different approaches and provide general guidelines to practitioners. Consequently, our derivation and insights can contribute to a better understanding of SSL and transfer learning from a theoretical and practical perspective.

Minimalistic unsupervised learning with the sparse manifold transform

Authors

Yubei Chen,Zeyu Yun,Yi Ma,Bruno Olshausen,Yann LeCun

Journal

arXiv preprint arXiv:2209.15261

Published Date

2022/9/30

We describe a minimalistic and interpretable method for unsupervised learning, without resorting to data augmentation, hyperparameter tuning, or other engineering designs, that achieves performance close to the SOTA SSL methods. Our approach leverages the sparse manifold transform, which unifies sparse coding, manifold learning, and slow feature analysis. With a one-layer deterministic sparse manifold transform, one can achieve 99.3% KNN top-1 accuracy on MNIST, 81.1% KNN top-1 accuracy on CIFAR-10 and 53.2% on CIFAR-100. With a simple gray-scale augmentation, the model gets 83.2% KNN top-1 accuracy on CIFAR-10 and 57% on CIFAR-100. These results significantly close the gap between simplistic "white-box" methods and the SOTA methods. Additionally, we provide visualization to explain how an unsupervised representation transform is formed. The proposed method is closely connected to latent-embedding self-supervised methods and can be treated as the simplest form of VICReg. Though there remains a small performance gap between our simple constructive model and SOTA methods, the evidence points to this as a promising direction for achieving a principled and white-box approach to unsupervised learning.

Joint embedding predictive architectures focus on slow features

Authors

Vlad Sobal,Jyothir SV,Siddhartha Jalagam,Nicolas Carion,Kyunghyun Cho,Yann LeCun

Journal

arXiv preprint arXiv:2211.10831

Published Date

2022/11/20

Many common methods for learning a world model for pixel-based environments use generative architectures trained with pixel-level reconstruction objectives. Recently proposed Joint Embedding Predictive Architectures (JEPA) offer a reconstruction-free alternative. In this work, we analyze performance of JEPA trained with VICReg and SimCLR objectives in the fully offline setting without access to rewards, and compare the results to the performance of the generative architecture. We test the methods in a simple environment with a moving dot with various background distractors, and probe learned representations for the dot's location. We find that JEPA methods perform on par or better than reconstruction when distractor noise changes every time step, but fail when the noise is fixed. Furthermore, we provide a theoretical explanation for the poor performance of JEPA-based methods with fixed noise, highlighting an important limitation.

VICReg: Variance-invariance-covariance regularization for self-supervised learning

Authors

Adrien Bardes,Jean Ponce,Yann LeCun

Published Date

2022

Recent self-supervised methods for image representation learning are based on maximizing the agreement between embedding vectors from different views of the same image. A trivial solution is obtained when the encoder outputs constant vectors. This collapse problem is often avoided through implicit biases in the learning architecture, that often lack a clear justification or interpretation. In this paper, we introduce VICReg (Variance-Invariance-Covariance Regularization), a method that explicitly avoids the collapse problem with a simple regularization term on the variance of the embeddings along each dimension individually. VICReg combines the variance term with a decorrelation mechanism based on redundancy reduction and covariance regularization, and achieves results on par with the state of the art on several downstream tasks. In addition, we show that incorporating our new variance term into other methods helps stabilize the training and leads to performance improvements.
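
The three terms of the VICReg objective translate directly into a short loss function. The sketch below uses commonly reported default coefficients and omits the encoder, projector, and augmentations; it is an illustration of the criterion, not the authors' released implementation.

# Minimal VICReg loss: invariance (MSE) + variance hinge + covariance penalty.
# Coefficients and the hinge threshold follow common defaults; the encoder,
# projector, and augmentation pipeline are omitted.
import torch
import torch.nn.functional as F

def vicreg_loss(z_a, z_b, sim_coef=25.0, std_coef=25.0, cov_coef=1.0, eps=1e-4):
    n, d = z_a.shape
    inv = F.mse_loss(z_a, z_b)                               # invariance term
    std_a = torch.sqrt(z_a.var(dim=0) + eps)                 # variance term: keep each
    std_b = torch.sqrt(z_b.var(dim=0) + eps)                 # dimension's std above 1
    var = torch.relu(1.0 - std_a).mean() + torch.relu(1.0 - std_b).mean()
    z_a = z_a - z_a.mean(dim=0)                              # covariance term: decorrelate
    z_b = z_b - z_b.mean(dim=0)                              # off-diagonal entries
    cov_a = (z_a.T @ z_a) / (n - 1)
    cov_b = (z_b.T @ z_b) / (n - 1)
    off_diag = lambda m: m - torch.diag(torch.diag(m))
    cov = off_diag(cov_a).pow(2).sum() / d + off_diag(cov_b).pow(2).sum() / d
    return sim_coef * inv + std_coef * var + cov_coef * cov

print(vicreg_loss(torch.randn(256, 128), torch.randn(256, 128)))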

Masked siamese convnets

Authors

Li Jing,Jiachen Zhu,Yann LeCun

Journal

arXiv preprint arXiv:2206.07700

Published Date

2022/6/15

Self-supervised learning has shown superior performances over supervised methods on various vision benchmarks. The siamese network, which encourages embeddings to be invariant to distortions, is one of the most successful self-supervised visual representation learning approaches. Among all the augmentation methods, masking is the most general and straightforward method that has the potential to be applied to all kinds of input and requires the least amount of domain knowledge. However, masked siamese networks require particular inductive bias and practically only work well with Vision Transformers. This work empirically studies the problems behind masked siamese networks with ConvNets. We propose several empirical designs to overcome these problems gradually. Our method performs competitively on low-shot image classification and outperforms previous methods on object detection benchmarks. We discuss several remaining issues and hope this work can provide useful data points for future general-purpose self-supervised learning.

Deep learning, reinforcement learning, and world models

Authors

Yutaka Matsuo,Yann LeCun,Maneesh Sahani,Doina Precup,David Silver,Masashi Sugiyama,Eiji Uchibe,Jun Morimoto

Published Date

2022/8/1

Deep learning (DL) and reinforcement learning (RL) methods seem to be a part of indispensable factors to achieve human-level or super-human AI systems. On the other hand, both DL and RL have strong connections with our brain functions and with neuroscientific findings. In this review, we summarize talks and discussions in the “Deep Learning and Reinforcement Learning” session of the symposium, International Symposium on Artificial Intelligence and Brain Science. In this session, we discussed whether we can achieve comprehensive understanding of human intelligence based on the recent advances of deep learning and reinforcement learning algorithms. Speakers contributed to provide talks about their recent studies that can be key technologies to achieve human-level intelligence.

Intra-Instance VICReg: Bag of Self-Supervised Image Patch Embedding Explains the Performance

Authors

Yubei Chen,Adrien Bardes,Zengyi Li,Yann LeCun

Published Date

2022/9/29

Recently, self-supervised learning (SSL) has achieved tremendous empirical advancements in learning image representation. However, our understanding and knowledge of the representation are still limited. This work shows that the success of the SOTA Siamese-network-based SSL approaches is primarily based on learning a distributed representation of image patches. In particular, we show that when we learn a representation only for fixed-scale image patches and aggregate different patch representations for an image (instance), it can achieve on par or even better results than the baseline methods on several benchmarks. Further, we show that the patch representation aggregation can also improve various SOTA baseline methods by a large margin. We also establish a formal connection between the Siamese-network-based SSL objective and the image patches co-occurrence statistics modeling, which supplements the prevailing invariance perspective. By visualizing the nearest neighbors of different image patches in the embedding space and projection space, we show that while the projection has more invariance, the embedding space tends to preserve more equivariance and locality. While it is important to push the SOTA engineering frontier, we show that it is also a promising direction to simplify the SOTA methods to build better understanding.

Unsupervised learning of structured representations via closed-loop transcription

Authors

Shengbang Tong,Xili Dai,Yubei Chen,Mingyang Li,Zengyi Li,Brent Yi,Yann LeCun,Yi Ma

Journal

arXiv preprint arXiv:2210.16782

Published Date

2022/10/30

This paper proposes an unsupervised method for learning a unified representation that serves both discriminative and generative purposes. While most existing unsupervised learning approaches focus on a representation for only one of these two goals, we show that a unified representation can enjoy the mutual benefits of having both. Such a representation is attainable by generalizing the recently proposed closed-loop transcription framework, known as CTRL, to the unsupervised setting. This entails solving a constrained maximin game over a rate reduction objective that expands features of all samples while compressing features of augmentations of each sample. Through this process, we see discriminative low-dimensional structures emerge in the resulting representations. Under comparable experimental conditions and network complexities, we demonstrate that these structured representations enable classification performance close to state-of-the-art unsupervised discriminative representations, and conditionally generated image quality significantly higher than that of state-of-the-art unsupervised generative models. Source code can be found at https://github.com/Delay-Xili/uCTRL.

Blockwise self-supervised learning with Barlow Twins

Authors

Shoaib Ahmed Siddiqui,David Krueger,Yann LeCun,Stephane Deny

Published Date

2022/9/29

Current state-of-the-art deep networks are all powered by backpropagation. In this paper, we explore alternatives to full backpropagation in the form of blockwise learning rules, leveraging the latest developments in self-supervised learning. Notably, we show that a blockwise pretraining procedure consisting of training independently the 4 main blocks of layers of a ResNet-50 with Barlow Twins loss function at each block performs almost as well as end-to-end backpropagation on ImageNet: a linear probe trained on top of our blockwise pretrained model obtains a top-1 classification accuracy of 70.48%, only 1.1% below the accuracy of an end-to-end pretrained network (71.57% accuracy). We perform extensive experiments to understand the impact of different components within our method and explore a variety of adaptations of self-supervised learning to the blockwise paradigm, building an exhaustive understanding of the critical avenues for scaling local learning rules to large networks, with implications ranging from hardware design to neuroscience.

On the duality between contrastive and non-contrastive self-supervised learning

Authors

Quentin Garrido,Yubei Chen,Adrien Bardes,Laurent Najman,Yann LeCun

Journal

arXiv preprint arXiv:2206.02574

Published Date

2022/6/3

Recent approaches in self-supervised learning of image representations can be categorized into different families of methods and, in particular, can be divided into contrastive and non-contrastive approaches. While differences between the two families have been thoroughly discussed to motivate new approaches, we focus more on the theoretical similarities between them. By designing contrastive and covariance based non-contrastive criteria that can be related algebraically and shown to be equivalent under limited assumptions, we show how close those families can be. We further study popular methods and introduce variations of them, allowing us to relate this theoretical result to current practices and show the influence (or lack thereof) of design choices on downstream performance. Motivated by our equivalence result, we investigate the low performance of SimCLR and show how it can match VICReg's with careful hyperparameter tuning, improving significantly over known baselines. We also challenge the popular assumption that non-contrastive methods need large output dimensions. Our theoretical and quantitative results suggest that the numerical gaps between contrastive and non-contrastive methods in certain regimes can be closed given better network design choices and hyperparameter tuning. The evidence shows that unifying different SOTA methods is an important direction to build a better understanding of self-supervised learning.

What do we maximize in self-supervised learning?

Authors

Ravid Shwartz-Ziv,Randall Balestriero,Yann LeCun

Published Date

2022/7/20

In this paper, we examine self-supervised learning methods, particularly VICReg, to provide an information-theoretical understanding of their construction. As a first step, we demonstrate how information-theoretic quantities can be obtained for a deterministic network, offering a possible alternative to prior work that relies on stochastic models. This enables us to demonstrate how VICReg can be (re)discovered from first principles and its assumptions about data distribution. Furthermore, we empirically demonstrate the validity of our assumptions, confirming our novel understanding of VICReg. Finally, we believe that the derivation and insights we obtain can be generalized to many other SSL methods, opening new avenues for theoretical and practical understanding of SSL and transfer learning.

Coarse-to-fine vision-language pre-training with fusion in the backbone

Authors

Zi-Yi Dou,Aishwarya Kamath,Zhe Gan,Pengchuan Zhang,Jianfeng Wang,Linjie Li,Zicheng Liu,Ce Liu,Yann LeCun,Nanyun Peng,Jianfeng Gao,Lijuan Wang

Journal

arXiv preprint arXiv:2206.07643

Published Date

2022/6/15

Vision-language (VL) pre-training has recently received considerable attention. However, most existing end-to-end pre-training approaches either only aim to tackle VL tasks such as image-text retrieval, visual question answering (VQA) and image captioning that test high-level understanding of images, or only target region-level understanding for tasks such as phrase grounding and object detection. We present FIBER (Fusion-In-the-Backbone-based transformER), a new VL model architecture that can seamlessly handle both these types of tasks. Instead of having dedicated transformer layers for fusion after the uni-modal backbones, FIBER pushes multimodal fusion deep into the model by inserting cross-attention into the image and text backbones to better capture multimodal interactions. In addition, unlike previous work that is either only pre-trained on image-text data or on fine-grained data with box-level annotations, we present a two-stage pre-training strategy that uses both these kinds of data efficiently: (i) coarse-grained pre-training based on image-text data; followed by (ii) fine-grained pre-training based on image-text-box data. We conduct comprehensive experiments on a wide range of VL tasks, ranging from VQA, image captioning, and retrieval, to phrase grounding, referring expression comprehension, and object detection. Using deep multimodal fusion coupled with the two-stage pre-training, FIBER provides consistent performance improvements over strong baselines across all tasks, often outperforming methods using magnitudes more data. Code is released at https://github.com/microsoft/FIBER.

Contrastive and non-contrastive self-supervised learning recover global and local spectral embedding methods

Authors

Randall Balestriero,Yann LeCun

Journal

Advances in Neural Information Processing Systems

Published Date

2022/12/6

Self-Supervised Learning (SSL) surmises that inputs and pairwise positive relationships are enough to learn meaningful representations. Although SSL has recently reached a milestone, outperforming supervised methods in many modalities, the theoretical foundations are limited, method-specific, and fail to provide principled design guidelines to practitioners. In this paper, we propose a unifying framework under the helm of spectral manifold learning. Through the course of this study, we will demonstrate that VICReg, SimCLR, BarlowTwins et al. correspond to eponymous spectral methods such as Laplacian Eigenmaps, ISOMAP et al. From this unified viewpoint, we obtain (i) the closed-form optimal representation, (ii) the closed-form optimal network parameters in the linear regime, (iii) the impact of the pairwise relations used during training on each of those quantities and on downstream task performances, and, most importantly, (iv) the first theoretical bridge between contrastive and non-contrastive methods to global and local spectral methods respectively, hinting at the benefits and limitations of each. For example, if the pairwise relation is aligned with the downstream task, all SSL methods produce optimal representations for that downstream task.

Variance covariance regularization enforces pairwise independence in self-supervised representations

Authors

Grégoire Mialon,Randall Balestriero,Yann LeCun

Published Date

2022/9/29

Self-Supervised Learning (SSL) methods such as VICReg, Barlow Twins or W-MSE avoid collapse of their joint embedding architectures by constraining or regularizing the covariance matrix of their projector's output. This study highlights important properties of such a strategy, which we coin Variance-Covariance regularization (VCReg). More precisely, we show that VCReg enforces pairwise independence between the features of the learned representation. This result emerges by bridging VCReg applied on the projector's output to kernel independence criteria applied on the projector's input. This provides the first theoretical motivations and explanations of VCReg. We empirically validate our findings where (i) we put in evidence which projector characteristics favor pairwise independence, (ii) we use these findings to obtain nontrivial performance gains for VICReg, (iii) we demonstrate that the scope of VCReg goes beyond SSL by using it to solve Independent Component Analysis. We hope that our findings will support the adoption of VCReg in SSL and beyond.

Decoupled contrastive learning

Authors

Chun-Hsiao Yeh,Cheng-Yao Hong,Yen-Chi Hsu,Tyng-Luh Liu,Yubei Chen,Yann LeCun

Published Date

2022/10/23

Contrastive learning (CL) is one of the most successful paradigms for self-supervised learning (SSL). In a principled way, it considers two augmented “views” of the same image as positive to be pulled closer, and all other images as negative to be pushed further apart. However, behind the impressive success of CL-based techniques, their formulation often relies on heavy-computation settings, including large sample batches, extensive training epochs, etc. We are thus motivated to tackle these issues and establish a simple, efficient, yet competitive baseline of contrastive learning. Specifically, we identify, from theoretical and empirical studies, a noticeable negative-positive-coupling (NPC) effect in the widely used InfoNCE loss, leading to unsuitable learning efficiency concerning the batch size. By removing the NPC effect, we propose decoupled contrastive learning (DCL) loss, which removes the positive term from the …
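
The change DCL makes to the InfoNCE objective is small enough to show inline: the positive pair is removed from the denominator (the log-sum-exp), which decouples the positive and negative terms. The sketch below is a simplified, single-direction version that only contrasts one view against the other.

# Sketch of the decoupled contrastive (DCL) loss: unlike InfoNCE, the positive
# pair is excluded from the log-sum-exp denominator. Simplified to one direction
# (view 1 against view 2); temperature and batch handling are illustrative.
import torch
import torch.nn.functional as F

def dcl_loss(z1, z2, temperature=0.1):
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.T / temperature                 # (N, N) scaled cosine similarities
    pos = logits.diag()                              # positives sit on the diagonal
    diag_mask = torch.eye(len(z1), dtype=torch.bool)
    neg = torch.logsumexp(logits.masked_fill(diag_mask, float("-inf")), dim=1)
    return (-pos + neg).mean()                       # InfoNCE would keep pos inside the logsumexp

print(dcl_loss(torch.randn(64, 128), torch.randn(64, 128)))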

Minimalistic unsupervised representation learning with the sparse manifold transform

Authors

Yubei Chen,Zeyu Yun,Yi Ma,Bruno Olshausen,Yann LeCun

Published Date

2022/9/29

We describe a minimalistic and interpretable method for unsupervised representation learning that does not require data augmentation, hyperparameter tuning, or other engineering designs, but nonetheless achieves performance close to the state-of-the-art (SOTA) SSL methods. Our approach leverages the sparse manifold transform, which unifies sparse coding, manifold learning, and slow feature analysis. With a one-layer deterministic (one training epoch) sparse manifold transform, it is possible to achieve 99.3% KNN top-1 accuracy on MNIST, 81.1% KNN top-1 accuracy on CIFAR-10, and 53.2% on CIFAR-100. With simple gray-scale augmentation, the model achieves 83.2% KNN top-1 accuracy on CIFAR-10 and 57% on CIFAR-100. These results significantly close the gap between simplistic ``white-box'' methods and SOTA methods. We also provide visualization to illustrate how an unsupervised representation transform is formed. The proposed method is closely connected to latent-embedding self-supervised methods and can be treated as the simplest form of VICReg. Though a small performance gap remains between our simple constructive model and SOTA methods, the evidence points to this as a promising direction for achieving a principled and white-box approach to unsupervised representation learning, which has potential to significantly improve learning efficiency.

Separating the world and ego models for self-driving

Authors

Vlad Sobal,Alfredo Canziani,Nicolas Carion,Kyunghyun Cho,Yann LeCun

Journal

arXiv preprint arXiv:2204.07184

Published Date

2022/4/14

Training self-driving systems to be robust to the long-tail of driving scenarios is a critical problem. Model-based approaches leverage simulation to emulate a wide range of scenarios without putting users at risk in the real world. One promising path to faithful simulation is to train a forward model of the world to predict the future states of both the environment and the ego-vehicle given past states and a sequence of actions. In this paper, we argue that it is beneficial to model the state of the ego-vehicle, which often has simple, predictable and deterministic behavior, separately from the rest of the environment, which is much more complex and highly multimodal. We propose to model the ego-vehicle using a simple and differentiable kinematic model, while training a stochastic convolutional forward model on raster representations of the state to predict the behavior of the rest of the environment. We explore several configurations of such decoupled models, and evaluate their performance both with Model Predictive Control (MPC) and direct policy learning. We test our methods on the task of highway driving and demonstrate lower crash rates and better stability. The code is available at https://github.com/vladisai/pytorch-PPUU/tree/ICLR2022.
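
The ego-vehicle model the paper argues for can be as small as a closed-form kinematic update, which stays fully differentiable so it can be used inside MPC or direct policy learning. The unicycle-style update below is an illustrative simplification, not the exact parameterization used in the paper.

# Minimal differentiable kinematic ego model (unicycle-style), illustrating why
# the ego-vehicle can be described by a simple closed-form update while the rest
# of the environment needs a learned stochastic model. Names are illustrative.
import torch

def ego_step(state, action, dt=0.1):
    # state = (x, y, heading, speed); action = (acceleration, steering rate)
    x, y, theta, v = state.unbind(-1)
    accel, steer = action.unbind(-1)
    x = x + v * torch.cos(theta) * dt
    y = y + v * torch.sin(theta) * dt
    theta = theta + steer * dt
    v = v + accel * dt
    return torch.stack([x, y, theta, v], dim=-1)

state = torch.tensor([0.0, 0.0, 0.0, 10.0], requires_grad=True)
next_state = ego_step(state, torch.tensor([1.0, 0.05]))
next_state.sum().backward()          # gradients flow through the kinematic model
print(next_state, state.grad)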

Sistema de prevenção de mistificação de identidade (Identity spoofing prevention system)

Published Date

2022/7/12

Patent classified under CPC H04L9/32 and H04L9/3226: cryptographic mechanisms and network security protocols including means for verifying the identity or authority of a user of the system or for message authentication (e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials), including authentication using a predetermined code such as a password, passphrase or PIN.

Joint embedding self-supervised learning in the kernel regime

Authors

Bobak T Kiani,Randall Balestriero,Yubei Chen,Seth Lloyd,Yann LeCun

Journal

arXiv preprint arXiv:2209.14884

Published Date

2022/9/29

The fundamental goal of self-supervised learning (SSL) is to produce useful representations of data without access to any labels for classifying the data. Modern methods in SSL, which form representations based on known or constructed relationships between samples, have been particularly effective at this task. Here, we aim to extend this framework to incorporate algorithms based on kernel methods where embeddings are constructed by linear maps acting on the feature space of a kernel. In this kernel regime, we derive methods to find the optimal form of the output representations for contrastive and non-contrastive loss functions. This procedure produces a new representation space with an inner product denoted as the induced kernel which generally correlates points which are related by an augmentation in kernel space and de-correlates points otherwise. We analyze our kernel model on small datasets to identify common features of self-supervised learning algorithms and gain theoretical insights into their performance on downstream tasks.

Toward next-generation artificial intelligence: Catalyzing the neuroai revolution

Authors

Anthony Zador,Sean Escola,Blake Richards,Bence Ölveczky,Yoshua Bengio,Kwabena Boahen,Matthew Botvinick,Dmitri Chklovskii,Anne Churchland,Claudia Clopath,James DiCarlo,Surya Ganguli,Jeff Hawkins,Konrad Koerding,Alexei Koulakov,Yann LeCun,Timothy Lillicrap,Adam Marblestone,Bruno Olshausen,Alexandre Pouget,Cristina Savin,Terrence Sejnowski,Eero Simoncelli,Sara Solla,David Sussillo,Andreas S Tolias,Doris Tsao

Journal

arXiv preprint arXiv:2210.08340

Published Date

2022/10/15

Neuroscience has long been an essential driver of progress in artificial intelligence (AI). We propose that to accelerate progress in AI, we must invest in fundamental research in NeuroAI. A core component of this is the embodied Turing test, which challenges AI animal models to interact with the sensorimotor world at skill levels akin to their living counterparts. The embodied Turing test shifts the focus from those capabilities like game playing and language that are especially well-developed or uniquely human to those capabilities, inherited from over 500 million years of evolution, that are shared with all animals. Building models that can pass the embodied Turing test will provide a roadmap for the next generation of AI.

A data-augmentation is worth a thousand samples: Analytical moments and sampling-free training

Authors

Randall Balestriero,Ishan Misra,Yann LeCun

Journal

Advances in Neural Information Processing Systems

Published Date

2022/12/6

Data-Augmentation (DA) is known to improve performance across tasks and datasets. We propose a method to theoretically analyze the effect of DA and study questions such as: how many augmented samples are needed to correctly estimate the information encoded by that DA? How does the augmentation policy impact the final parameters of a model? We derive several quantities in closed form, such as the expectation and variance of an image, loss, and model's output under a given DA distribution. To our knowledge, we obtain the first explicit regularizer that corresponds to using DA during training for non-trivial transformations such as affine transformations, color jittering, or Gaussian blur. Those derivations open new avenues to quantify the benefits and limitations of DA. For example, given a loss at hand, we find that common DAs require tens of thousands of samples for the loss to be correctly estimated and for the model training to converge. We then show that for a training loss to have reduced variance under DA sampling, the model's saliency map (gradient of the loss with respect to the model's input) must align with the smallest eigenvector of the sample's covariance matrix under the considered DA augmentation; this is exactly the quantity estimated and regularized by TangentProp. Those findings also hint at a possible explanation on why models tend to shift their focus from edges to textures when specific DAs are employed.
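
For intuition on the quantities involved, the sketch below estimates the mean and variance of a loss under a random-crop augmentation distribution by brute-force sampling, which is exactly the Monte Carlo procedure the closed-form derivations make unnecessary. The model, augmentation, and sample count are arbitrary placeholders, and torchvision is assumed for the augmentation.

# Monte Carlo estimate of the mean and variance of a model's loss under a
# data-augmentation distribution -- the kind of quantity the paper derives in
# closed form. Model, augmentation, and sample counts are illustrative.
import torch
import torchvision.transforms as T

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
augment = T.RandomResizedCrop(32, scale=(0.3, 1.0))
image, label = torch.rand(3, 32, 32), torch.tensor([3])

losses = []
for _ in range(1000):                                  # closed-form moments avoid this sampling loop
    x = augment(image).unsqueeze(0)
    losses.append(torch.nn.functional.cross_entropy(model(x), label))
losses = torch.stack(losses)
print(f"E[loss] ~ {losses.mean():.3f}, Var[loss] ~ {losses.var():.4f}")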

Masked Siamese ConvNets: Towards an Effective Masking Strategy for General-purpose Siamese Networks

Authors

Li Jing,Jiachen Zhu,Yann LeCun

Published Date

2022/9/29

Siamese Networks are a popular self-supervised learning framework that learns useful representation without human supervision by encouraging representations to be invariant to distortions. Existing methods heavily rely on hand-crafted augmentations, which are not easily adapted to new domains. To explore a general-purpose or domain-agnostic siamese network, we investigate using masking as augmentations in siamese networks. Recently, masking for siamese networks has only been shown useful with transformer architectures, e.g. MSN and data2vec. In this work, we identify the underlying problems of masking for siamese networks with arbitrary backbones, including ConvNets. We propose an effective and general-purpose masking strategy and demonstrate its effectiveness on various siamese network frameworks. Our method generally improves siamese networks' performances in the few-shot image classification, and object detection tasks.

projUNN: efficient method for training deep networks with unitary matrices

Authors

Bobak Kiani,Randall Balestriero,Yann LeCun,Seth Lloyd

Journal

Advances in Neural Information Processing Systems

Published Date

2022/12/6

In learning with recurrent or very deep feed-forward networks, employing unitary matrices in each layer can be very effective at maintaining long-range stability. However, restricting network parameters to be unitary typically comes at the cost of expensive parameterizations or increased training runtime. We propose instead an efficient method based on rank-k updates, or their rank-k approximation, that maintains performance at a nearly optimal training runtime. We introduce two variants of this method, named Direct (projUNN-D) and Tangent (projUNN-T) projected Unitary Neural Networks, that can parameterize full N-dimensional unitary or orthogonal matrices with a training runtime scaling as O(kN^2). Our method either projects low-rank gradients onto the closest unitary matrix (projUNN-T) or transports unitary matrices in the direction of the low-rank gradient (projUNN-D). Even in the fastest setting (k = 1), projUNN is able to train a model's unitary parameters to reach comparable performances against baseline implementations. In recurrent neural network settings, projUNN closely matches or exceeds benchmarked results from prior unitary neural networks. Finally, we preliminarily explore projUNN in training orthogonal convolutional neural networks, which are currently unable to outperform state-of-the-art models but can potentially enhance stability and robustness at large depth.
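
The "project onto the closest unitary matrix" operation at the heart of projUNN-T can be illustrated with a naive full-SVD (polar decomposition) version. This sketch deliberately ignores the paper's efficient rank-k machinery, which is what makes the projection affordable; it only shows the projection step itself.

# Naive illustration of projecting a weight matrix back onto the orthogonal
# manifold after a gradient step, via a full SVD. projUNN replaces this O(N^3)
# projection with efficient rank-k updates; this only shows the operation.
import torch

def project_to_orthogonal(w):
    u, _, vh = torch.linalg.svd(w)
    return u @ vh                           # polar factor: nearest orthogonal matrix in Frobenius norm

w = torch.linalg.qr(torch.randn(64, 64)).Q  # start from an orthogonal matrix
grad = torch.randn(64, 64) * 0.01
w = project_to_orthogonal(w - 0.1 * grad)   # gradient step followed by projection
print(torch.dist(w.T @ w, torch.eye(64)))   # ~0: w is orthogonal again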

A data-augmentation is worth a thousand samples: Exact quantification from analytical augmented sample moments

Authors

Randall Balestriero,Ishan Misra,Yann LeCun

Journal

arXiv preprint arXiv:2202.08325

Published Date

2022/2/16

Data-Augmentation (DA) is known to improve performance across tasks and datasets. We propose a method to theoretically analyze the effect of DA and study questions such as: how many augmented samples are needed to correctly estimate the information encoded by that DA? How does the augmentation policy impact the final parameters of a model? We derive several quantities in closed form, such as the expectation and variance of an image, loss, and model's output under a given DA distribution. Those derivations open new avenues to quantify the benefits and limitations of DA. For example, we show that common DAs require tens of thousands of samples for the loss at hand to be correctly estimated and for the model training to converge. We show that for a training loss to be stable under DA sampling, the model's saliency map (gradient of the loss with respect to the model's input) must align with the smallest eigenvector of the sample variance under the considered DA augmentation, hinting at a possible explanation on why models tend to shift their focus from edges to textures.

A path towards autonomous machine intelligence version 0.9.2, 2022-06-27

Authors

Yann LeCun

Journal

Open Review

Published Date

2022/6/27

How could machines learn as efficiently as humans and animals? How could machines learn to reason and plan? How could machines learn representations of percepts and action plans at multiple levels of abstraction, enabling them to reason, predict, and plan at multiple time horizons? This position paper proposes an architecture and training paradigms with which to construct autonomous intelligent agents. It combines concepts such as configurable predictive world model, behavior driven through intrinsic motivation, and hierarchical joint embedding architectures trained with self-supervised learning.

Light-weight probing of unsupervised representations for reinforcement learning

Authors

Wancong Zhang,Anthony GX-Chen,Vlad Sobal,Yann LeCun,Nicolas Carion

Journal

arXiv preprint arXiv:2208.12345

Published Date

2022/8/25

Unsupervised visual representation learning offers the opportunity to leverage large corpora of unlabeled trajectories to form useful visual representations, which can benefit the training of reinforcement learning (RL) algorithms. However, evaluating the fitness of such representations requires training RL algorithms which is computationally intensive and has high variance outcomes. To alleviate this issue, we design an evaluation protocol for unsupervised RL representations with lower variance and up to 600x lower computational cost. Inspired by the vision community, we propose two linear probing tasks: predicting the reward observed in a given state, and predicting the action of an expert in a given state. These two tasks are generally applicable to many RL domains, and we show through rigorous experimentation that they correlate strongly with the actual downstream control performance on the Atari100k Benchmark. This provides a better method for exploring the space of pretraining algorithms without the need of running RL evaluations for every setting. Leveraging this framework, we further improve existing self-supervised learning (SSL) recipes for RL, highlighting the importance of the forward model, the size of the visual backbone, and the precise formulation of the unsupervised objective.
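
The reward-prediction probe is simple to reproduce in outline: freeze the pretrained encoder and train only a linear head to regress the observed reward from its features. Everything below (encoder architecture, data, dimensions) is a placeholder rather than the benchmark setup.

# Minimal version of the reward-prediction linear probe: the pretrained encoder
# stays frozen and only a linear head is fit. Encoder, data, and sizes are
# placeholders, not the Atari100k setup.
import torch

encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(84 * 84, 256)).eval()
for p in encoder.parameters():
    p.requires_grad_(False)                          # representation stays frozen

probe = torch.nn.Linear(256, 1)                      # the only trained component
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)

frames = torch.rand(512, 1, 84, 84)                  # stand-in for observations
rewards = torch.rand(512, 1)                         # stand-in for observed rewards
for _ in range(100):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(probe(encoder(frames)), rewards)
    loss.backward()
    opt.step()
print(f"probe loss: {loss.item():.4f}")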

Volta: Vision-language transformer with weakly-supervised local-feature alignment

Authors

Shraman Pramanick,Li Jing,Sayan Nag,Jiachen Zhu,Hardik Shah,Yann LeCun,Rama Chellappa

Journal

Transactions on Machine Learning Research (TMLR)

Published Date

2023

Vision-language pre-training (VLP) has recently proven highly effective for various uni- and multi-modal downstream applications. However, most existing end-to-end VLP methods use high-resolution image-text box data to perform well on fine-grained region-level tasks, such as object detection, segmentation, and referring expression comprehension. Unfortunately, such high-resolution images with accurate bounding box annotations are expensive to collect and use for supervision at scale. In this work, we propose VoLTA (Vision-Language Transformer with weakly-supervised local-feature Alignment), a new VLP paradigm that only utilizes image-caption data but achieves fine-grained region-level image understanding, eliminating the use of expensive box annotations. VoLTA adopts graph optimal transport-based weakly-supervised alignment on local image patches and text tokens to germinate an explicit, self-normalized, and interpretable low-level matching criterion. In addition, VoLTA pushes multi-modal fusion deep into the uni-modal backbones during pre-training and removes fusion-specific transformer layers, further reducing memory requirements. Extensive experiments on a wide range of vision- and vision-language downstream tasks demonstrate the effectiveness of VoLTA on fine-grained applications without compromising the coarse-grained downstream performance, often outperforming methods using significantly more caption and box annotations.

Graph MLP-Mixer

Authors

Xiaoxin He,Bryan Hooi,Thomas Laurent,Adam Perold,Yann LeCun,Xavier Bresson

Published Date

2022/9/29

Graph Neural Networks (GNNs) have shown great potential in the field of graph representation learning. Standard GNNs define a local message-passing mechanism which propagates information over the whole graph domain by stacking multiple layers. This paradigm suffers from two major limitations, over-squashing and poor long-range dependencies, that can be solved using global attention but significantly increases the computational cost to quadratic complexity. In this work, we consider an alternative approach to overcome these structural limitations while keeping a low complexity cost. Motivated by the recent MLP-Mixer architecture introduced in computer vision, we propose to generalize this network to graphs. This GNN model, namely Graph MLP-Mixer, can make long-range connections without over-squashing or high complexity due to the mixer layer applied to the graph patches extracted from the original graph. As a result, this architecture exhibits promising results when comparing standard GNNs vs. Graph MLP-Mixers on benchmark graph datasets.

The effects of regularization and data augmentation are class dependent

Authors

Randall Balestriero,Leon Bottou,Yann LeCun

Journal

Advances in Neural Information Processing Systems

Published Date

2022/12/6

Regularization is a fundamental technique to prevent over-fitting and to improve generalization performance by constraining a model's complexity. Current Deep Networks heavily rely on regularizers such as Data-Augmentation (DA) or weight-decay, and employ structural risk minimization, i.e., cross-validation, to select the optimal regularization hyper-parameters. In this study, we demonstrate that techniques such as DA or weight decay produce a model with a reduced complexity that is unfair across classes. The optimal amount of DA or weight decay found from cross-validation over all classes leads to disastrous model performance on some classes, e.g., on ImageNet with a ResNet-50, the "barn spider" classification test accuracy drops sharply merely by introducing random crop DA during training. Even more surprising, such a performance drop also appears when introducing uninformative regularization techniques such as weight decay. Those results demonstrate that our search for ever-increasing generalization performance, averaged over all classes and samples, has left us with models and regularizers that silently sacrifice performance on some classes. This scenario can become dangerous when deploying a model on downstream tasks, e.g., an ImageNet pre-trained ResNet-50 deployed on iNaturalist sees its performance on class #8889 fall substantially when random crop DA is introduced during the ImageNet pre-training phase. Those results demonstrate that finding a correct measure of a model's complexity without class-dependent preference remains an open research question.
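
The practical takeaway, that accuracy must be inspected per class rather than averaged, needs nothing more than a per-class accuracy helper such as the one sketched below; the predictions and labels here are synthetic.

# Per-class accuracy, so that regularization choices can be compared class by
# class instead of through a single averaged number. Data is synthetic.
import torch

def per_class_accuracy(preds, labels, num_classes):
    acc = {}
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            acc[c] = (preds[mask] == c).float().mean().item()
    return acc

labels = torch.randint(0, 5, (1000,))
preds = torch.where(torch.rand(1000) < 0.8, labels, torch.randint(0, 5, (1000,)))
print(per_class_accuracy(preds, labels, num_classes=5))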

Neural manifold clustering and embedding

Authors

Zengyi Li,Yubei Chen,Yann LeCun,Friedrich T Sommer

Journal

arXiv preprint arXiv:2201.10000

Published Date

2022/1/24

Given a union of non-linear manifolds, non-linear subspace clustering or manifold clustering aims to cluster data points based on manifold structures and also learn to parameterize each manifold as a linear subspace in a feature space. Deep neural networks have the potential to achieve this goal under highly non-linear settings given their large capacity and flexibility. We argue that achieving manifold clustering with neural networks requires two essential ingredients: a domain-specific constraint that ensures the identification of the manifolds, and a learning algorithm for embedding each manifold to a linear subspace in the feature space. This work shows that many constraints can be implemented by data augmentation. For subspace feature learning, the Maximum Coding Rate Reduction (MCR²) objective can be used. Putting them together yields Neural Manifold Clustering and Embedding (NMCE), a novel method for general-purpose manifold clustering, which significantly outperforms autoencoder-based deep subspace clustering. Further, on more challenging natural image datasets, NMCE can also outperform other algorithms specifically designed for clustering. Qualitatively, we demonstrate that NMCE learns a meaningful and interpretable feature space. As the formulation of NMCE is closely related to several important self-supervised learning (SSL) methods, we believe this work can help us build a deeper understanding of SSL representation learning.

Tico: Transformation invariance and covariance contrast for self-supervised visual representation learning

Authors

Jiachen Zhu,Rafael M Moraes,Serkan Karakulak,Vlad Sobol,Alfredo Canziani,Yann LeCun

Journal

arXiv preprint arXiv:2206.10698

Published Date

2022/6/21

We present Transformation Invariance and Covariance Contrast (TiCo) for self-supervised visual representation learning. Similar to other recent self-supervised learning methods, our method is based on maximizing the agreement among embeddings of different distorted versions of the same image, which pushes the encoder to produce transformation invariant representations. To avoid the trivial solution where the encoder generates constant vectors, we regularize the covariance matrix of the embeddings from different images by penalizing low rank solutions. By jointly minimizing the transformation invariance loss and covariance contrast loss, we get an encoder that is able to produce useful representations for downstream tasks. We analyze our method and show that it can be viewed as a variant of MoCo with an implicit memory bank of unlimited size at no extra memory cost. This makes our method perform better than alternative methods when using small batch sizes. TiCo can also be seen as a modification of Barlow Twins. By connecting the contrastive and redundancy-reduction methods together, TiCo gives us new insights into how joint embedding methods work.

Deep generative models create new and diverse protein structures

Authors

Zeming Lin,Tom Sercu,Yann LeCun,Alexander Rives

Journal

Machine Learning for Structural Biology Workshop, NeurIPS

Published Date

2021

We explore the use of modern variational autoencoders for generating protein structures. Models are trained across a diverse set of natural protein domains. Three-dimensional structures are encoded implicitly in the form of an energy function that expresses constraints on pairwise distances and angles. Atomic coordinates are recovered by optimizing the parameters of a rigid body representation of the protein chain to fit the constraints. The model generates diverse structures across a variety of folds, and exhibits local coherence at the level of secondary structure, generating alpha helices and beta sheets, as well as globally coherent tertiary structure. A number of generated protein sequences have high confidence predictions by AlphaFold that agree with their designs. The majority of these have no significant sequence homology to natural proteins. Most designed proteins are variations on existing proteins. It is of great interest to create de novo proteins that go beyond what has been invented by nature. A line of recent work has explored generative models for protein structures [1, 2, 3, 4, 5, 6]. The main challenge for a generative model is to propose stable structures that can be realized as the minimum energy state for a protein sequence, i.e., the endpoint of folding. The space of possible three-dimensional conformations of a protein sequence is exponentially large [7], but out of this set of possible conformations, most do not correspond to stable realizable structures.

Deep learning for AI

Authors

Yoshua Bengio,Yann Lecun,Geoffrey Hinton

Journal

Communications of the ACM

Published Date

2021/6/21

How can neural networks learn the rich internal representations required for difficult tasks such as recognizing objects or understanding language?

Mdetr-modulated detection for end-to-end multi-modal understanding

Authors

Aishwarya Kamath,Mannat Singh,Yann LeCun,Gabriel Synnaeve,Ishan Misra,Nicolas Carion

Published Date

2021

Multi-modal reasoning systems rely on a pre-trained object detector to extract regions of interest from the image. However, this crucial module is typically used as a black box, trained independently of the downstream task and on a fixed vocabulary of objects and attributes. This makes it challenging for such systems to capture the long tail of visual concepts expressed in free form text. In this paper we propose MDETR, an end-to-end modulated detector that detects objects in an image conditioned on a raw text query, like a caption or a question. We use a transformer-based architecture to reason jointly over text and image by fusing the two modalities at an early stage of the model. We pre-train the network on 1.3M text-image pairs, mined from pre-existing multi-modal datasets having explicit alignment between phrases in text and objects in the image. We then fine-tune on several downstream tasks such as phrase grounding, referring expression comprehension and segmentation, achieving state-of-the-art results on popular benchmarks. We also investigate the utility of our model as an object detector on a given label set when fine-tuned in a few-shot setting. We show that our pre-training approach provides a way to handle the long tail of object categories which have very few labelled instances. Our approach can be easily extended for visual question answering, achieving competitive performance on GQA and CLEVR. The code and models are available at https://github.com/ashkamath/mdetr.

Neural potts model

Authors

Tom Sercu,Robert Verkuil,Joshua Meier,Brandon Amos,Zeming Lin,Caroline Chen,Jason Liu,Yann LeCun,Alexander Rives

Journal

bioRxiv

Published Date

2021/4/11

We propose the Neural Potts Model objective as an amortized optimization problem. The objective enables training a single model with shared parameters to explicitly model energy landscapes across multiple protein families. Given a protein sequence as input, the model is trained to predict a pairwise coupling matrix for a Potts model energy function describing the local evolutionary landscape of the sequence. Couplings can be predicted for novel sequences. A controlled ablation experiment assessing unsupervised contact prediction on sets of related protein families finds a gain from amortization for low-depth multiple sequence alignments; the result is then confirmed on a database with broad coverage of protein sequences.

Sparse coding with multi-layer decoders using variance regularization

Authors

Katrina Evtimova,Yann LeCun

Journal

arXiv preprint arXiv:2112.09214

Published Date

2021/12/16

Sparse representations of images are useful in many computer vision applications. Sparse coding with an l1 penalty and a learned linear dictionary requires regularization of the dictionary to prevent a collapse in the norms of the codes. Typically, this regularization entails bounding the Euclidean norms of the dictionary's elements. In this work, we propose a novel sparse coding protocol which prevents a collapse in the codes without the need to regularize the decoder. Our method regularizes the codes directly so that each latent code component has variance greater than a fixed threshold over a set of sparse representations for a given set of inputs. Furthermore, we explore ways to effectively train sparse coding systems with multi-layer decoders since they can model more complex relationships than linear dictionaries. In our experiments with MNIST and natural image patches, we show that decoders learned with our approach have interpretable features both in the linear and multi-layer case. Moreover, we show that sparse autoencoders with multi-layer decoders trained using our variance regularization method produce higher quality reconstructions with sparser representations when compared to autoencoders with linear dictionaries. Additionally, sparse representations obtained with our variance regularization approach are useful in the downstream tasks of denoising and classification in the low-data regime.

Transformer visualization via dictionary learning: contextualized embedding as a linear superposition of transformer factors

Authors

Zeyu Yun,Yubei Chen,Bruno A Olshausen,Yann LeCun

Journal

arXiv preprint arXiv:2103.15949

Published Date

2021/3/29

Transformer networks have revolutionized NLP representation learning since they were introduced. Though a great effort has been made to explain the representation in transformers, it is widely recognized that our understanding is not sufficient. One important reason is that there are not enough visualization tools for detailed analysis. In this paper, we propose to use dictionary learning to open up these "black boxes" as linear superpositions of transformer factors. Through visualization, we demonstrate the hierarchical semantic structures captured by the transformer factors, e.g., word-level polysemy disambiguation, sentence-level pattern formation, and long-range dependency. While some of these patterns confirm the conventional prior linguistic knowledge, the rest are relatively unexpected, which may provide new insights. We hope this visualization tool can bring further knowledge and a better understanding of how transformer networks work. The code is available at https://github.com/zeyuyun1/TransformerVis

Understanding dimensional collapse in contrastive self-supervised learning

Authors

Li Jing,Pascal Vincent,Yann LeCun,Yuandong Tian

Journal

arXiv preprint arXiv:2110.09348

Published Date

2021/10/18

Self-supervised visual representation learning aims to learn useful representations without relying on human annotations. The joint embedding approach is based on maximizing the agreement between embedding vectors from different views of the same image. Various methods have been proposed to solve the collapsing problem where all embedding vectors collapse to a trivial constant solution. Among these methods, contrastive learning prevents collapse via negative sample pairs. It has been shown that non-contrastive methods suffer from a lesser collapse problem of a different nature: dimensional collapse, whereby the embedding vectors end up spanning a lower-dimensional subspace instead of the entire available embedding space. Here, we show that dimensional collapse also happens in contrastive learning. In this paper, we shed light on the dynamics at play in contrastive learning that leads to dimensional collapse. Inspired by our theory, we propose a novel contrastive learning method, called DirectCLR, which directly optimizes the representation space without relying on an explicit trainable projector. Experiments show that DirectCLR outperforms SimCLR with a trainable linear projector on ImageNet.
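
Dimensional collapse can be diagnosed directly from the spectrum of the embedding covariance matrix: a collapsed representation has many near-zero singular values. The sketch below computes that spectrum on synthetic embeddings, one full-rank and one deliberately rank-deficient.

# Diagnostic for dimensional collapse: the singular value spectrum of the
# embedding covariance matrix. Embeddings here are synthetic.
import torch

def embedding_spectrum(z):
    z = z - z.mean(dim=0)
    cov = z.T @ z / (len(z) - 1)
    return torch.linalg.svdvals(cov)

full_rank = torch.randn(2048, 128)
collapsed = torch.randn(2048, 16) @ torch.randn(16, 128)   # embeddings span only 16 dims
print(embedding_spectrum(full_rank)[-5:])                   # tail stays well above zero
print(embedding_spectrum(collapsed)[-5:])                   # tail is (numerically) zero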

Inspirational adversarial image generation

Authors

Baptiste Rozière,Morgane Riviere,Olivier Teytaud,Jérémy Rapin,Yann LeCun,Camille Couprie

Journal

IEEE Transactions on Image Processing

Published Date

2021/3/18

The task of image generation started receiving some attention from artists and designers, providing inspiration for new creations. However, exploiting the results of deep generative models such as Generative Adversarial Networks can be long and tedious given the lack of existing tools. In this work, we propose a simple strategy to inspire creators with new generations learned from a dataset of their choice, while providing some control over the output. We design a simple optimization method to find the optimal latent parameters corresponding to the closest generation to any input inspirational image. Specifically, we allow the generation given an inspirational image of the user’s choosing by performing several optimization steps to recover optimal parameters from the model’s latent space. We tested several exploration methods from classical gradient descents to gradient-free optimizers. Many gradient-free …
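
The latent-recovery step described here, finding the latent vector whose generation is closest to an inspirational image, reduces to plain gradient descent in latent space when a gradient-based optimizer is chosen. The sketch below uses an untrained placeholder generator and a simple MSE objective; the paper also explores gradient-free optimizers and richer losses.

# Sketch of latent recovery: optimize z so that G(z) approaches a target image.
# The generator here is an untrained placeholder, not a pretrained GAN.
import torch

generator = torch.nn.Sequential(torch.nn.Linear(64, 256), torch.nn.ReLU(),
                                torch.nn.Linear(256, 3 * 32 * 32), torch.nn.Tanh())
target = torch.rand(3 * 32 * 32) * 2 - 1             # stand-in for an inspirational image

z = torch.randn(64, requires_grad=True)
opt = torch.optim.Adam([z], lr=0.05)
for step in range(200):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(generator(z), target)
    loss.backward()
    opt.step()
print(f"final reconstruction error: {loss.item():.4f}")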

Learning in high dimension always amounts to extrapolation

Authors

Randall Balestriero,Jerome Pesenti,Yann LeCun

Journal

arXiv preprint arXiv:2110.09485

Published Date

2021/10/18

The notion of interpolation and extrapolation is fundamental in various fields from deep learning to function approximation. Interpolation occurs for a sample whenever this sample falls inside or on the boundary of the given dataset's convex hull. Extrapolation occurs when the sample falls outside of that convex hull. One fundamental (mis)conception is that state-of-the-art algorithms work so well because of their ability to correctly interpolate training data. A second (mis)conception is that interpolation happens throughout tasks and datasets, in fact, many intuitions and theories rely on that assumption. We empirically and theoretically argue against those two points and demonstrate that on any high-dimensional (>100) dataset, interpolation almost surely never happens. Those results challenge the validity of our current interpolation/extrapolation definition as an indicator of generalization performances.
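
The paper's definition of interpolation, a sample lying inside the convex hull of the dataset, can be tested exactly with a small feasibility linear program, which makes the dimensionality effect easy to reproduce on synthetic data. SciPy is assumed; sample counts and dimensions below are arbitrary.

# Convex hull membership (the paper's definition of interpolation) checked via
# a feasibility LP: does a convex combination of the data reproduce the point?
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(point, data):
    n = len(data)
    # find lambda >= 0 with sum(lambda) = 1 and data.T @ lambda = point
    a_eq = np.vstack([data.T, np.ones(n)])
    b_eq = np.append(point, 1.0)
    res = linprog(c=np.zeros(n), A_eq=a_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return res.success

rng = np.random.default_rng(0)
for dim in (2, 10, 50):
    data = rng.normal(size=(100, dim))
    hits = sum(in_convex_hull(rng.normal(size=dim), data) for _ in range(50))
    print(f"dim={dim}: {hits}/50 test points interpolate")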

Barlow twins: Self-supervised learning via redundancy reduction

Authors

Jure Zbontar,Li Jing,Ishan Misra,Yann LeCun,Stéphane Deny

Published Date

2021/3/4

Self-supervised learning (SSL) is rapidly closing the gap with supervised methods on large computer vision benchmarks. A successful approach to SSL is to learn embeddings which are invariant to distortions of the input sample. However, a recurring issue with this approach is the existence of trivial constant solutions. Most current methods avoid such solutions by careful implementation details. We propose an objective function that naturally avoids collapse by measuring the cross-correlation matrix between the outputs of two identical networks fed with distorted versions of a sample, and making it as close to the identity matrix as possible. This causes the embedding vectors of distorted versions of a sample to be similar, while minimizing the redundancy between the components of these vectors. The method is called Barlow Twins, owing to neuroscientist H. Barlow’s redundancy-reduction principle applied to a pair of identical networks. Barlow Twins does not require large batches nor asymmetry between the network twins such as a predictor network, gradient stopping, or a moving average on the weight updates. Intriguingly it benefits from very high-dimensional output vectors. Barlow Twins outperforms previous methods on ImageNet for semi-supervised classification in the low-data regime, and is on par with current state of the art for ImageNet classification with a linear classifier head, and for transfer tasks of classification and object detection.
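
The Barlow Twins criterion fits in a few lines: batch-normalize the two views' embeddings, form their cross-correlation matrix, and push it toward the identity. The sketch below omits the backbone, projector, and augmentations, and the trade-off coefficient only follows the order of magnitude reported in the paper.

# Minimal Barlow Twins objective: drive the cross-correlation matrix of the two
# views' batch-normalized embeddings toward the identity. Illustrative only.
import torch

def barlow_twins_loss(z_a, z_b, lambd=5e-3):
    n = len(z_a)
    z_a = (z_a - z_a.mean(0)) / z_a.std(0)         # normalize each dimension over the batch
    z_b = (z_b - z_b.mean(0)) / z_b.std(0)
    c = z_a.T @ z_b / n                            # cross-correlation matrix (D x D)
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()                 # invariance: diagonal -> 1
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()    # redundancy reduction
    return on_diag + lambd * off_diag

print(barlow_twins_loss(torch.randn(256, 128), torch.randn(256, 128)))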

Recurrent parameter generators

Authors

Jiayun Wang,Yubei Chen,Stella Yu,Brian Cheung,Yann LeCun

Published Date

2021/10/6

We present a generic method for recurrently using the same parameters across many different convolution layers to build a deep network. Specifically, for a network, we create a recurrent parameter generator (RPG) from which the parameters of each convolution layer are generated. Though using recurrent models to build a deep convolutional neural network (CNN) is not entirely new, our method achieves significant performance gains compared to existing work. We demonstrate how to build a one-layer-size neural network that achieves performance similar to traditional CNN models on various applications and datasets. We use the RPG to build a ResNet18 network with the number of weights equivalent to one convolutional layer of a conventional ResNet and show this model can achieve ImageNet top-1 accuracy. Additionally, such a method allows us to build an arbitrarily complex neural network with any number of parameters. For example, we build a ResNet34 with model parameters reduced by more than times, which still achieves ImageNet top-1 accuracy. Furthermore, the RPG can be further pruned and quantized for better run-time performance in addition to the model-size reduction. We provide a new perspective on model compression: rather than shrinking parameters from a large model, RPG sets a certain parameter-size constraint and uses gradient descent to automatically find the best model under that constraint. Extensive experimental results demonstrate the power of the proposed recurrent parameter generator.
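The sketch below illustrates the core idea of generating every convolution layer's weights from one shared parameter bank; the fixed-stride slicing used here is an illustrative assumption and not the paper's exact generation scheme.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RecurrentParamGenerator(nn.Module):
    """Toy parameter generator: all layers draw their weights from one shared bank."""

    def __init__(self, bank_size, layer_shapes):
        super().__init__()
        self.bank = nn.Parameter(torch.randn(bank_size) * 0.02)  # shared parameters
        self.layer_shapes = layer_shapes

    def weight_for(self, layer_idx):
        shape = self.layer_shapes[layer_idx]
        numel = int(torch.tensor(shape).prod())
        # Wrap around the bank with a layer-dependent offset so different layers
        # reuse the same underlying parameters (and share their gradients).
        idx = (torch.arange(numel) + layer_idx * 9973) % self.bank.numel()
        return self.bank[idx].view(shape)

gen = RecurrentParamGenerator(bank_size=100_000,
                              layer_shapes=[(64, 3, 3, 3), (64, 64, 3, 3)])
x = torch.randn(1, 3, 32, 32)
h = F.conv2d(x, gen.weight_for(0), padding=1)
h = F.conv2d(h, gen.weight_for(1), padding=1)
```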

Implicit Rank-Minimizing Autoencoder

Authors

Li Jing,Jure Zbontar,Yann LeCun

Published Date

2020/10/1

An important component of autoencoder methods is the mechanism by which the information capacity of the latent representation is minimized or limited. In this work, the rank of the covariance matrix of the codes is implicitly minimized by relying on the fact that gradient descent learning in multi-layer linear networks leads to minimum-rank solutions. By inserting a number of extra linear layers between the encoder and the decoder, the system spontaneously learns representations with a low effective dimension. The model, dubbed Implicit Rank-Minimizing Autoencoder (IRMAE), is simple, deterministic, and learns a continuous latent space. We demonstrate the validity of the method on several image generation and representation learning tasks.
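Since the construction is just a stack of extra linear layers between encoder and decoder, a minimal sketch is short; `encoder` and `decoder` below are placeholders for any autoencoder pair with a matching code dimension, and the number of inserted layers is an arbitrary example.

```python
import torch.nn as nn

def implicit_rank_minimizing_ae(encoder, decoder, code_dim, num_linear=4):
    """Insert extra linear layers (no nonlinearity) between encoder and decoder.

    Gradient descent through this linear stack implicitly biases the code's
    covariance toward low rank, per the mechanism described above.
    """
    linear_stack = nn.Sequential(*[nn.Linear(code_dim, code_dim) for _ in range(num_linear)])
    return nn.Sequential(encoder, linear_stack, decoder)
```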

Methods, systems and media for detecting spoofing in mobile authentication

Published Date

2020/9/24

The mind of a mouse

Authors

Larry F Abbott,Davi D Bock,Edward M Callaway,Winfried Denk,Catherine Dulac,Adrienne L Fairhall,Ila Fiete,Kristen M Harris,Moritz Helmstaedter,Viren Jain,Narayanan Kasthuri,Yann LeCun,Jeff W Lichtman,Peter B Littlewood,Liqun Luo,John HR Maunsell,R Clay Reid,Bruce R Rosen,Gerald M Rubin,Terrence J Sejnowski,H Sebastian Seung,Karel Svoboda,David W Tank,Doris Tsao,David C Van Essen

Journal

Cell

Published Date

2020/9/17

Large scientific projects in genomics and astronomy are influential not because they answer any single question but because they enable investigation of continuously arising new questions from the same data-rich sources. Advances in automated mapping of the brain's synaptic connections (connectomics) suggest that the complicated circuits underlying brain function are ripe for analysis. We discuss benefits of mapping a mouse brain at the level of synapses.

System and method for biometric authentication in connection with camera-equipped devices

Published Date

2020/7/28

The present invention relates generally to the use of biometric technology for authentication and identification, and more particularly to non-contact based solutions for authenticating and identifying users, via computers, such as mobile devices, to selectively permit or deny access to various resources. In the present invention, authentication and/or identification is performed using an image or a set of images of an individual's palm through a process involving the following key steps: (1) detecting the palm area using local classifiers; (2) extracting features from the region(s) of interest; and (3) computing the matching score against user models stored in a database, which can be augmented dynamically through a learning process.
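As a purely illustrative outline of the three-step structure described above (detect, extract, match), the toy sketch below uses stand-in components (a fixed crop, an intensity histogram, cosine similarity); none of the function names or choices come from the patent.

```python
import numpy as np

def detect_palm_region(image):
    # Stand-in for the local classifiers: take a fixed central crop.
    h, w = image.shape[:2]
    return image[h // 4: 3 * h // 4, w // 4: 3 * w // 4]

def extract_features(region):
    # Stand-in feature extractor: a normalized intensity histogram.
    hist, _ = np.histogram(region, bins=64, range=(0, 255))
    return hist / (hist.sum() + 1e-8)

def match_score(features, user_template):
    # Cosine similarity against the enrolled user model.
    denom = np.linalg.norm(features) * np.linalg.norm(user_template) + 1e-8
    return float(features @ user_template / denom)

def authenticate(image, user_template, threshold=0.9):
    return match_score(extract_features(detect_palm_region(image)), user_template) >= threshold
```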

Yann LeCun FAQs

What is Yann LeCun's h-index at New York University?

Yann LeCun's h-index is 145 in total and 113 since 2020.

What are Yann LeCun's top articles?

Yann LeCun's top articles at New York University include:

Learning and Leveraging World Models in Visual Representation Learning

EgoPet: Egomotion and Interaction Data from an Animal's Perspective

Fast and exact enumeration of deep networks partitions regions

An Information Theory Perspective on Variance-Invariance-Covariance Regularization

Eyes wide shut? exploring the visual shortcomings of multimodal llms

G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering

Learning by Reconstruction Produces Uninformative Features For Perception

To Compress or Not to Compress—Self-Supervised Learning and Information Theory: A Review

...

What are Yann LeCun's research interests?

The research interests of Yann LeCun are AI, machine learning, computer vision, robotics, and image compression.

What is Yann LeCun's total number of citations?

Yann LeCun has 338,733 citations in total.

What are the co-authors of Yann LeCun?

The co-authors of Yann LeCun include Yoshua Bengio, Rob Fergus, Bernhard Boser, Richard E. Howard, Joan Bruna, and Clement Farabet.

Co-Authors

Yoshua Bengio, Université de Montréal (H-index: 227)

Rob Fergus, New York University (H-index: 83)

Bernhard Boser, University of California, Berkeley (H-index: 66)

Richard E. Howard, Rutgers, The State University of New Jersey (H-index: 60)

Joan Bruna, New York University (H-index: 45)

Clement Farabet, New York University (H-index: 21)
