Eric Xing

Carnegie Mellon University

H-index: 114

North America-United States

Eric Xing Information

University

Carnegie Mellon University

Position

President, Mohamed bin Zayed University of Artificial Intelligence; Professor of Computer Science, Carnegie Mellon University

Citations(all)

57613

Citations(since 2020)

33465

Cited By

37934

hIndex(all)

114

hIndex(since 2020)

87

i10Index(all)

435

i10Index(since 2020)

340

Eric Xing Skills & Research Interests

Machine Learning

ML Systems

Optimization

Statistics

Network Analysis

Top articles of Eric Xing

Learning to Prompt Segment Anything Models

Authors

Jiaxing Huang,Kai Jiang,Jingyi Zhang,Han Qiu,Lewei Lu,Shijian Lu,Eric Xing

Journal

arXiv preprint arXiv:2401.04651

Published Date

2024/1/9

Segment Anything Models (SAMs) like SEEM and SAM have demonstrated great potential in learning to segment anything. The core design of SAMs lies in Promptable Segmentation, which takes a handcrafted prompt as input and returns the expected segmentation mask. SAMs work with two types of prompts, including spatial prompts (e.g., points) and semantic prompts (e.g., texts), which work together to prompt SAMs to segment anything on downstream datasets. Despite the important role of prompts, how to acquire suitable prompts for SAMs is largely under-explored. In this work, we examine the architecture of SAMs and identify two challenges for learning effective prompts for SAMs. To this end, we propose spatial-semantic prompt learning (SSPrompt) that learns effective semantic and spatial prompts for better SAMs. Specifically, SSPrompt introduces spatial prompt learning and semantic prompt learning, which optimize spatial prompts and semantic prompts directly over the embedding space and selectively leverage the knowledge encoded in pre-trained prompt encoders. Extensive experiments show that SSPrompt achieves superior image segmentation performance consistently across multiple widely adopted datasets.

Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena

Authors

Lianmin Zheng,Wei-Lin Chiang,Ying Sheng,Siyuan Zhuang,Zhanghao Wu,Yonghao Zhuang,Zi Lin,Zhuohan Li,Dacheng Li,Eric Xing,Hao Zhang,Joseph E Gonzalez,Ion Stoica

Journal

Advances in Neural Information Processing Systems

Published Date

2024/2/13

Evaluating large language model (LLM) based chat assistants is challenging due to their broad capabilities and the inadequacy of existing benchmarks in measuring human preferences. To address this, we explore using strong LLMs as judges to evaluate these models on more open-ended questions. We examine the usage and limitations of LLM-as-a-judge, including position, verbosity, and self-enhancement biases, as well as limited reasoning ability, and propose solutions to mitigate some of them. We then verify the agreement between LLM judges and human preferences by introducing two benchmarks: MT-bench, a multi-turn question set; and Chatbot Arena, a crowdsourced battle platform. Our results reveal that strong LLM judges like GPT-4 can match both controlled and crowdsourced human preferences well, achieving over 80% agreement, the same level of agreement as between humans. Hence, LLM-as-a-judge is a scalable and explainable way to approximate human preferences, which are otherwise very expensive to obtain. Additionally, we show our benchmark and traditional benchmarks complement each other by evaluating several variants of LLaMA and Vicuna. The MT-bench questions, 3K expert votes, and 30K conversations with human preferences are publicly available at https://github.com/lm-sys/FastChat/tree/main/fastchat/llm_judge.
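
The pairwise judging protocol described above is easy to prototype. Below is a minimal sketch, assuming a generic query_llm chat-completion callable and an illustrative prompt (not the exact MT-bench judge prompt); it swaps answer positions and accepts a winner only when both orderings agree, mitigating the position bias the paper discusses.

```python
# Minimal sketch of pairwise LLM-as-a-judge with position-bias mitigation.
# `query_llm` is a placeholder for any chat-completion client; the prompt
# wording is illustrative, not the exact MT-bench judge prompt.

JUDGE_TEMPLATE = """[Question]
{question}

[Assistant A's Answer]
{answer_a}

[Assistant B's Answer]
{answer_b}

Which answer is better? Reply with exactly "A", "B", or "tie"."""


def judge_pair(query_llm, question, answer_a, answer_b):
    """Judge a pair twice with swapped positions to counter position bias."""
    verdict_1 = query_llm(JUDGE_TEMPLATE.format(
        question=question, answer_a=answer_a, answer_b=answer_b)).strip()
    verdict_2 = query_llm(JUDGE_TEMPLATE.format(
        question=question, answer_a=answer_b, answer_b=answer_a)).strip()
    # Map the second verdict back into the original (unswapped) frame.
    swapped = {"A": "B", "B": "A", "tie": "tie"}.get(verdict_2, "tie")
    # Only accept a winner if both orderings agree; otherwise call it a tie.
    return verdict_1 if verdict_1 == swapped else "tie"
```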

MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT

Authors

Omkar Thawakar,Ashmal Vayani,Salman Khan,Hisham Cholakal,Rao M Anwer,Michael Felsberg,Tim Baldwin,Eric P Xing,Fahad Shahbaz Khan

Journal

arXiv preprint arXiv:2402.16840

Published Date

2024/2/26

"Bigger the better" has been the predominant trend in recent Large Language Models (LLMs) development. However, LLMs do not suit well for scenarios that require on-device processing, energy efficiency, low memory footprint, and response efficiency. These requisites are crucial for privacy, security, and sustainable deployment. This paper explores the "less is more" paradigm by addressing the challenge of designing accurate yet efficient Small Language Models (SLMs) for resource constrained devices. Our primary contribution is the introduction of an accurate and fully transparent open-source 0.5 billion (0.5B) parameter SLM, named MobiLlama, catering to the specific needs of resource-constrained computing with an emphasis on enhanced performance with reduced resource demands. MobiLlama is a SLM design that initiates from a larger model and applies a careful parameter sharing scheme to reduce both the pre-training and the deployment cost. Our work strives to not only bridge the gap in open-source SLMs but also ensures full transparency, where complete training data pipeline, training code, model weights, and over 300 checkpoints along with evaluation codes is available at : https://github.com/mbzuai-oryx/MobiLlama.

Temporally Disentangled Representation Learning under Unknown Nonstationarity

Authors

Xiangchen Song,Weiran Yao,Yewen Fan,Xinshuai Dong,Guangyi Chen,Juan Carlos Niebles,Eric Xing,Kun Zhang

Journal

NeurIPS 2023

Published Date

2023/10/28

In unsupervised causal representation learning for sequential data with time-delayed latent causal influences, strong identifiability results for the disentanglement of causally related latent variables have been established in stationary settings by leveraging temporal structure. However, in nonstationary settings, existing work has only partially addressed the problem, either utilizing observed auxiliary variables (e.g., class labels and/or domain indexes) as side information or assuming simplified latent causal dynamics. Both constrain the method to a limited range of scenarios. In this study, we further explore the Markov assumption under time-delayed causally related processes in nonstationary settings and show that, under mild conditions, the independent latent components can be recovered from their nonlinear mixture up to a permutation and a component-wise transformation, without observation of auxiliary variables. We then introduce NCTRL, a principled estimation framework, to reconstruct time-delayed latent causal variables and identify their relations from measured sequential data only. Empirical evaluations demonstrate the reliable identification of time-delayed latent causal influences, with our methodology substantially outperforming existing baselines that fail to exploit the nonstationarity adequately and consequently cannot distinguish distribution shifts.

Cappy: Outperforming and boosting large multi-task LMs with a small scorer

Authors

Bowen Tan,Yun Zhu,Lijuan Liu,Eric Xing,Zhiting Hu,Jindong Chen

Journal

Advances in Neural Information Processing Systems

Published Date

2024/2/13

Large language models (LLMs) such as T0, FLAN, and OPT-IML excel in multi-tasking under a unified instruction-following paradigm, where they also exhibit remarkable generalization abilities to unseen tasks. Despite their impressive performance, these LLMs, with sizes ranging from several billion to hundreds of billions of parameters, demand substantial computational resources, making their training and inference expensive and inefficient. Furthermore, adapting these models to downstream applications, particularly complex tasks, is often unfeasible due to the extensive hardware requirements for finetuning, even when utilizing parameter-efficient approaches such as prompt tuning. Additionally, the most powerful multi-task LLMs, such as OPT-IML-175B and FLAN-PaLM-540B, are not publicly accessible, severely limiting their customization potential. To address these challenges, we introduce a pretrained small scorer, Cappy, designed to enhance the performance and efficiency of multi-task LLMs. With merely 360 million parameters, Cappy functions either independently on classification tasks or serves as an auxiliary component for LLMs, boosting their performance. Moreover, Cappy enables efficient integration of downstream supervision without requiring LLM finetuning or access to their parameters. Our experiments demonstrate that, when working independently on 11 language understanding tasks from PromptSource, Cappy outperforms LLMs that are several orders of magnitude larger. Besides, on 45 complex tasks from BIG-Bench, Cappy boosts the performance of the advanced multi-task LLM, FLAN-T5, by a large margin …
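
As a rough illustration of using a small scorer alongside a frozen LLM, the sketch below assumes a score(instruction, response) callable standing in for a pretrained scorer such as Cappy; candidate generation and scorer training are out of scope here.

```python
def rerank_with_scorer(score, instruction, candidates):
    """Return the candidate response the small scorer rates highest.

    `score(instruction, response) -> float in [0, 1]` is a stand-in for a
    pretrained scorer like Cappy; this illustrates boosting a frozen LLM by
    reranking its sampled outputs, without finetuning or parameter access.
    """
    return max(candidates, key=lambda response: score(instruction, response))
```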

AttentionPert: Accurately Modeling Multiplexed Genetic Perturbations with Multi-scale Effects

Authors

Ding Bai,Caleb Ellington,Shentong Mo,Le Song,Eric Xing

Journal

bioRxiv

Published Date

2024/2/7

Genetic perturbations (i.e., knockouts, variants) have laid the foundation for our understanding of many diseases, implicating pathogenic mechanisms and indicating therapeutic targets. However, experimental assays are fundamentally limited in the number of perturbation conditions they can measure. Computational methods can fill this gap by predicting perturbation effects under unseen conditions, but accurately predicting the transcriptional responses of cells to unseen perturbations remains a significant challenge. We address this by developing a novel attention-based neural network, AttentionPert, which accurately predicts gene expression under multiplexed perturbations and generalizes to unseen conditions. AttentionPert integrates global and local effects in a multi-scale model, representing both the non-uniform system-wide impact of the genetic perturbation and the localized disturbance in a network of gene-gene similarities, enhancing its ability to predict nuanced transcriptional responses to both single and multi-gene perturbations. In comprehensive experiments, AttentionPert demonstrates superior performance across multiple datasets, outperforming the state-of-the-art method in predicting differential gene expressions and revealing novel gene regulations. AttentionPert marks a significant improvement over current methods, particularly in handling the diversity of gene perturbations and in predicting out-of-distribution scenarios.

Squeeze, recover and relabel: Dataset condensation at ImageNet scale from a new perspective

Authors

Zeyuan Yin*,Eric Xing,Zhiqiang Shen*

Journal

Advances in Neural Information Processing Systems, Spotlight

Published Date

2024/2/13

We present a new dataset condensation framework termed Squeeze, Recover and Relabel (SRe²L) that decouples the bilevel optimization of model and synthetic data during training, to handle varying scales of datasets, model architectures and image resolutions for efficient dataset condensation. The proposed method demonstrates flexibility across diverse dataset scales and exhibits multiple advantages in terms of arbitrary resolutions of synthesized images, low training cost and memory consumption with high-resolution synthesis, and the ability to scale up to arbitrary evaluation network architectures. Extensive experiments are conducted on Tiny-ImageNet and full ImageNet-1K datasets. Under 50 IPC, our approach achieves the highest 42.5% and 60.8% validation accuracy on Tiny-ImageNet and ImageNet-1K, outperforming all previous state-of-the-art methods by margins of 14.5% and 32.9%, respectively. Our approach also surpasses MTT in speed by approximately 52x (ConvNet-4) and 16x (ResNet-18), with 11.6x and 6.4x less memory consumption during data synthesis. Our code and condensed datasets of 50, 200 IPC with 4K recovery budget are available at https://github.com/VILA-Lab/SRe2L.

Generating, Reconstructing, and Representing Discrete and Continuous Data: Generalized Diffusion with Learnable Encoding-Decoding

Authors

Guangyi Liu,Yu Wang,Zeyu Feng,Qiyu Wu,Liping Tang,Yuan Gao,Zhen Li,Shuguang Cui,Julian McAuley,Eric P Xing,Zichao Yang,Zhiting Hu

Journal

arXiv preprint arXiv:2402.19009

Published Date

2024/2/29

The vast applications of deep generative models are anchored in three core capabilities -- generating new instances, reconstructing inputs, and learning compact representations -- across various data types, such as discrete text/protein sequences and continuous images. Existing model families, like Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), autoregressive models, and diffusion models, generally excel in specific capabilities and data types but fall short in others. We introduce generalized diffusion with learnable encoder-decoder (DiLED), which seamlessly integrates the core capabilities for broad applicability and enhanced performance. DiLED generalizes the Gaussian noising-denoising in standard diffusion by introducing parameterized encoding-decoding. Crucially, DiLED is compatible with the well-established diffusion model objective and training recipes, allowing effective learning of the encoder-decoder parameters jointly with diffusion. By choosing an appropriate encoder/decoder (e.g., large language models), DiLED naturally applies to different data types. Extensive experiments on text, proteins, and images demonstrate DiLED's flexibility to handle diverse data and tasks and its strong improvement over various existing models.

Toward Inference-optimal Mixture-of-Expert Large Language Models

Authors

Longfei Yun,Yonghao Zhuang,Yao Fu,Eric P Xing,Hao Zhang

Journal

arXiv preprint arXiv:2404.02852

Published Date

2024/4/3

Mixture-of-Expert (MoE) based large language models (LLMs), such as the recent Mixtral and DeepSeek-MoE, have shown great promise in scaling model size without suffering from the quadratic growth of training cost of dense transformers. Like dense models, training MoEs requires answering the same question: given a training budget, what is the optimal allocation between model size and number of tokens? We study the scaling law of MoE-based LLMs regarding the relations between model performance, model size, dataset size, and the expert degree. Echoing previous research on MoE in different contexts, we observe diminishing returns from increasing the number of experts; yet because the training cost stays roughly constant as experts are added, this would seem to suggest scaling the number of experts until saturation, which is problematic at inference time. We propose to amend the scaling law of MoE by introducing inference efficiency as another metric besides the validation loss. We find that MoEs with a few (4/8) experts are the most serving-efficient solution at the same performance level, but cost 2.5-3.5x more to train. On the other hand, training a (16/32)-expert MoE that is much smaller (70-85%) than the loss-optimal solution, but on a larger training dataset, is a promising setup under a fixed training budget.

Semantic-aligned matching for enhanced DETR convergence and multi-scale feature fusion

Authors

Gongjie Zhang,Zhipeng Luo,Jiaxing Huang,Shijian Lu,Eric P Xing

Journal

International Journal of Computer Vision (IJCV)

Published Date

2024

The recently proposed DEtection TRansformer (DETR) has established a fully end-to-end paradigm for object detection. However, DETR suffers from slow training convergence, which hinders its applicability to various detection tasks. We observe that DETR’s slow convergence is largely attributed to the difficulty in matching object queries to relevant regions due to the unaligned semantics between object queries and encoded image features. With this observation, we design Semantic-Aligned-Matching DETR++ (SAM-DETR++) to accelerate DETR’s convergence and improve detection performance. The core of SAM-DETR++ is a plug-and-play module that projects object queries and encoded image features into the same feature embedding space, where each object query can be easily matched to relevant regions with similar semantics. Besides, SAM-DETR++ searches for multiple representative keypoints and …

Making scalable meta learning practical

Authors

Sang Choe,Sanket Vaibhav Mehta,Hwijeen Ahn,Willie Neiswanger,Pengtao Xie,Emma Strubell,Eric Xing

Journal

Advances in neural information processing systems

Published Date

2024/2/13

Despite its flexibility to learn diverse inductive biases in machine learning programs, meta learning (i.e., learning to learn) has long been recognized to suffer from poor scalability due to its tremendous compute/memory costs, training instability, and a lack of efficient distributed training support. In this work, we focus on making scalable meta learning practical by introducing SAMA, which combines advances in both implicit differentiation algorithms and systems. Specifically, SAMA is designed to flexibly support a broad range of adaptive optimizers in the base level of meta learning programs, while reducing computational burden by avoiding explicit computation of second-order gradient information, and exploiting efficient distributed training techniques implemented for first-order gradients. Evaluated on multiple large-scale meta learning benchmarks, SAMA showcases up to a 1.7x/4.8x increase in throughput and a 2.0x/3.8x decrease in memory consumption on single-/multi-GPU setups compared to other baseline meta learning algorithms. Furthermore, we show that SAMA-based data optimization leads to consistent improvements in text classification accuracy with BERT and RoBERTa large language models, and achieves state-of-the-art results in both small- and large-scale data pruning on image classification tasks, demonstrating the practical applicability of scalable meta learning across language and vision domains.

Efficient Test-Time Adaptation of Vision-Language Models

Authors

Adilbek Karmanov,Dayan Guan,Shijian Lu,Abdulmotaleb El Saddik,Eric Xing

Journal

arXiv preprint arXiv:2403.18293

Published Date

2024/3/27

Test-time adaptation with pre-trained vision-language models has attracted increasing attention for tackling distribution shifts during the test time. Though prior studies have achieved very promising performance, they involve intensive computation which is severely unaligned with test-time adaptation. We design TDA, a training-free dynamic adapter that enables effective and efficient test-time adaptation with vision-language models. TDA works with a lightweight key-value cache that maintains a dynamic queue with few-shot pseudo labels as values and the corresponding test-sample features as keys. Leveraging the key-value cache, TDA allows adapting to test data gradually via progressive pseudo label refinement which is super-efficient without incurring any backpropagation. In addition, we introduce negative pseudo labeling that alleviates the adverse impact of pseudo label noises by assigning pseudo labels to certain negative classes when the model is uncertain about its pseudo label predictions. Extensive experiments over two benchmarks demonstrate TDA's superior effectiveness and efficiency as compared with the state-of-the-art. The code has been released at https://kdiaaa.github.io/tda/.
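
A simplified sketch of such a training-free key-value cache is given below, assuming precomputed image features and zero-shot logits; the hyperparameter names are illustrative, and details such as TDA's entropy-based queue maintenance and negative pseudo labeling are omitted.

```python
import numpy as np
from collections import deque


class KeyValueCache:
    """Training-free cache in the spirit of TDA (a simplified sketch).

    Keys are test-sample features, values are one-hot pseudo labels taken
    from the model's own predictions; classification blends the zero-shot
    logits with a similarity-weighted vote over cached entries.
    """

    def __init__(self, num_classes, capacity=64, beta=5.0, alpha=1.0):
        self.entries = deque(maxlen=capacity)  # (feature, one_hot) pairs
        self.num_classes = num_classes
        self.beta, self.alpha = beta, alpha

    def update(self, feature, pseudo_label):
        one_hot = np.eye(self.num_classes)[pseudo_label]
        self.entries.append((feature / np.linalg.norm(feature), one_hot))

    def adapt_logits(self, feature, zero_shot_logits):
        if not self.entries:
            return zero_shot_logits
        feature = feature / np.linalg.norm(feature)
        keys = np.stack([k for k, _ in self.entries])      # (n, d)
        values = np.stack([v for _, v in self.entries])    # (n, C)
        # Affinity decays with cosine distance between query and cached keys.
        affinity = np.exp(-self.beta * (1.0 - keys @ feature))  # (n,)
        return zero_shot_logits + self.alpha * affinity @ values
```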

TrustLLM: Trustworthiness in large language models

Authors

Lichao Sun,Yue Huang,Haoran Wang,Siyuan Wu,Qihui Zhang,Chujie Gao,Yixin Huang,Wenhan Lyu,Yixuan Zhang,Xiner Li,Zhengliang Liu,Yixin Liu,Yijue Wang,Zhikun Zhang,Bhavya Kailkhura,Caiming Xiong,Chao Zhang,Chaowei Xiao,Chunyuan Li,Eric Xing,Furong Huang,Hao Liu,Heng Ji,Hongyi Wang,Huan Zhang,Huaxiu Yao,Manolis Kellis,Marinka Zitnik,Meng Jiang,Mohit Bansal,James Zou,Jian Pei,Jian Liu,Jianfeng Gao,Jiawei Han,Jieyu Zhao,Jiliang Tang,Jindong Wang,John Mitchell,Kai Shu,Kaidi Xu,Kai-Wei Chang,Lifang He,Lifu Huang,Michael Backes,Neil Zhenqiang Gong,Philip S Yu,Pin-Yu Chen,Quanquan Gu,Ran Xu,Rex Ying,Shuiwang Ji,Suman Jana,Tianlong Chen,Tianming Liu,Tianyi Zhou,Willian Wang,Xiang Li,Xiangliang Zhang,Xiao Wang,Xing Xie,Xun Chen,Xuyu Wang,Yan Liu,Yanfang Ye,Yinzhi Cao,Yue Zhao

Journal

arXiv preprint arXiv:2401.05561

Published Date

2024/1/10

Large language models (LLMs), exemplified by ChatGPT, have gained considerable attention for their excellent natural language processing capabilities. Nonetheless, these LLMs present many challenges, particularly in the realm of trustworthiness. Therefore, ensuring the trustworthiness of LLMs emerges as an important topic. This paper introduces TrustLLM, a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, an established benchmark, an evaluation and analysis of trustworthiness for mainstream LLMs, and a discussion of open challenges and future directions. Specifically, we first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, we further establish a benchmark across six dimensions including truthfulness, safety, fairness, robustness, privacy, and machine ethics. We then present a study evaluating 16 mainstream LLMs in TrustLLM, covering over 30 datasets. Our findings first show that, in general, trustworthiness and utility (i.e., functional effectiveness) are positively related. Second, our observations reveal that proprietary LLMs generally outperform most open-source counterparts in terms of trustworthiness, raising concerns about the potential risks of widely accessible open-source LLMs. However, a few open-source LLMs come very close to proprietary ones. Third, it is important to note that some LLMs may be overly calibrated towards exhibiting trustworthiness, to the extent that they compromise their utility by mistakenly treating benign prompts as harmful and consequently not responding. Finally, we emphasize …

Identification of nonlinear latent hierarchical models

Authors

Lingjing Kong,Biwei Huang,Feng Xie,Eric Xing,Yuejie Chi,Kun Zhang

Published Date

2023/12

Identifying latent variables and causal structures from observational data is essential to many real-world applications involving biological data, medical data, and unstructured data such as images and languages. However, this task can be highly challenging, especially when observed variables are generated by causally related latent variables and the relationships are nonlinear. In this work, we investigate the identification problem for nonlinear latent hierarchical causal models in which observed variables are generated by a set of causally related latent variables, and some latent variables may not have observed children. We show that the identifiability of causal structures and latent variables (up to invertible transformations) can be achieved under mild assumptions: on causal structures, we allow for multiple paths between any pair of variables in the graph, which relaxes latent tree assumptions in prior work; on structural functions, we permit general nonlinearity and multi-dimensional continuous variables, alleviating existing work's parametric assumptions. Specifically, we first develop an identification criterion in the form of novel identifiability guarantees for an elementary latent variable model. Leveraging this criterion, we show that both causal structures and latent variables of the hierarchical model can be identified asymptotically by explicitly constructing an estimation procedure. To the best of our knowledge, our work is the first to establish identifiability guarantees for both causal structures and latent variables in nonlinear latent hierarchical models.

Counterfactual generation with identifiability guarantees

Authors

Hanqi Yan,Lingjing Kong,Lin Gui,Yuejie Chi,Eric Xing,Yulan He,Kun Zhang

Journal

Advances in Neural Information Processing Systems

Published Date

2023/12

Counterfactual generation lies at the core of various machine learning tasks, including image translation and controllable text generation. This generation process usually requires the identification of the disentangled latent representations, such as content and style, that underlie the observed data. However, it becomes more challenging when faced with a scarcity of paired data and labelling information. Existing disentangled methods crucially rely on oversimplified assumptions, such as assuming independent content and style variables, to identify the latent variables, even though such assumptions may not hold for complex data distributions. For instance, food reviews tend to involve words like “tasty”, whereas movie reviews commonly contain words such as “thrilling” for the same positive sentiment. This problem is exacerbated when data are sampled from multiple domains since the dependence between content and style may vary significantly over domains. In this work, we tackle the domain-varying dependence between the content and the style variables inherent in the counterfactual generation task. We provide identification guarantees for such latent-variable models by leveraging the relative sparsity of the influences from different latent variables. Our theoretical insights enable the development of a doMain AdapTive counTerfactual gEneration model, called MATTE. Our theoretically grounded framework achieves state-of-the-art performance in unsupervised style transfer tasks, where neither paired data nor style labels are utilized, across four large-scale datasets.

Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems

Authors

Zhenting Qi,Hanlin Zhang,Eric Xing,Sham Kakade,Himabindu Lakkaraju

Journal

arXiv preprint arXiv:2402.17840

Published Date

2024/2/27

Retrieval-Augmented Generation (RAG) improves pre-trained models by incorporating external knowledge at test time to enable customized adaptation. We study the risk of datastore leakage in Retrieval-In-Context RAG Language Models (LMs). We show that an adversary can exploit LMs' instruction-following capabilities to easily extract text data verbatim from the datastore of RAG systems built with instruction-tuned LMs via prompt injection. The vulnerability exists for a wide range of modern LMs spanning Llama2, Mistral/Mixtral, Vicuna, SOLAR, WizardLM, Qwen1.5, and Platypus2, and exploitability worsens as the model size scales up. Extending our study to GPTs, a production RAG system, we design an attack that can cause datastore leakage with a 100% success rate on 25 randomly selected customized GPTs with at most 2 queries, and we extract text data verbatim at a rate of 41% from a book of 77,000 words and 3% from a corpus of 1,569,000 words by prompting the GPTs with only 100 queries generated by themselves.

Defending Against Poisoning Attacks in Federated Learning with Blockchain

Authors

Nanqing Dong,Zhipeng Wang,Jiahao Sun,Michael Kampffmeyer,William Knottenbelt,Eric Xing

Journal

IEEE Transactions on Artificial Intelligence

Published Date

2024/3/18

In the era of deep learning, federated learning (FL) presents a promising approach that allows multi-institutional data owners, or clients, to collaboratively train machine learning models without compromising data privacy. However, most existing FL approaches rely on a centralized server for global model aggregation, leading to a single point of failure. This makes the system vulnerable to malicious attacks when dealing with dishonest clients. In this work, we address this problem by proposing a secure and reliable FL system based on blockchain and distributed ledger technology. Our system incorporates a peer-to-peer voting mechanism and a reward-and-slash mechanism, which are powered by on-chain smart contracts, to detect and deter malicious behaviors. Both theoretical and empirical analyses are presented to demonstrate the effectiveness of the proposed approach, showing that our framework is robust …

Confidence Matters: Revisiting Intrinsic Self-Correction Capabilities of Large Language Models

Authors

Loka Li,Guangyi Chen,Yusheng Su,Zhenhao Chen,Yixuan Zhang,Eric Xing,Kun Zhang

Journal

arXiv preprint arXiv:2402.12563

Published Date

2024/2/19

The recent success of Large Language Models (LLMs) has catalyzed an increasing interest in their self-correction capabilities. This paper presents a comprehensive investigation into the intrinsic self-correction of LLMs, attempting to address the ongoing debate about its feasibility. Our research has identified an important latent factor, the "confidence" of LLMs, during the self-correction process. Overlooking this factor may cause the models to over-criticize themselves, resulting in unreliable conclusions regarding the efficacy of self-correction. We have experimentally observed that LLMs possess the capability to understand the "confidence" in their own responses. This motivates us to develop an "If-or-Else" (IoE) prompting framework, designed to guide LLMs in assessing their own "confidence" and facilitating intrinsic self-corrections. We conduct extensive experiments and demonstrate that our IoE-based prompt achieves a consistent improvement in the accuracy of self-corrected responses over the initial answers. Our study not only sheds light on the underlying factors affecting self-correction in LLMs, but also introduces a practical framework that utilizes the IoE prompting principle to efficiently improve self-correction capabilities with "confidence". The code is available at https://github.com/MBZUAI-CLeaR/IoE-Prompting.git.
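
The two-step flow can be sketched as below; query_llm is an assumed chat callable, and the prompt text paraphrases the idea rather than reproducing the paper's exact IoE prompt.

```python
# Illustrative two-step "If-or-Else" self-correction flow; the prompt text
# is a paraphrase of the idea, not the exact IoE prompt from the paper.

IOE_PROMPT = (
    "Review your previous answer. If you are confident it is correct, "
    "repeat it unchanged. Else, revise it and give your best answer."
)


def ioe_self_correct(query_llm, question):
    """Ask once, then let the model keep or revise based on its confidence."""
    initial = query_llm(question)
    revised = query_llm(
        f"Question: {question}\nYour answer: {initial}\n{IOE_PROMPT}"
    )
    return revised
```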

FedNAR: Federated Optimization with Normalized Annealing Regularization

Authors

Junbo Li,Ang Li,Chong Tian,Qirong Ho,Eric Xing,Hongyi Wang

Journal

Advances in Neural Information Processing Systems

Published Date

2024/2/13

Weight decay is a standard technique to improve generalization performance in modern deep neural network optimization, and is also widely adopted in federated learning (FL) to prevent overfitting in local clients. In this paper, we first explore the choices of weight decay and identify that the weight decay value appreciably influences the convergence of existing FL algorithms. While preventing overfitting is crucial, weight decay can introduce a different optimization goal towards the global objective, which is further amplified in FL due to multiple local updates and heterogeneous data distributions. To address this challenge, we develop Federated optimization with Normalized Annealing Regularization (FedNAR), a simple yet effective and versatile algorithmic plug-in that can be seamlessly integrated into any existing FL algorithms. Essentially, we regulate the magnitude of each update by performing co-clipping of the gradient and weight decay. We provide a comprehensive theoretical analysis of FedNAR's convergence rate and conduct extensive experiments on both vision and language datasets with different backbone federated optimization algorithms. Our experimental results consistently demonstrate that incorporating FedNAR into existing FL algorithms leads to accelerated convergence and heightened model accuracy. Moreover, FedNAR exhibits resilience in the face of various hyperparameter configurations. Specifically, FedNAR has the ability to self-adjust the weight decay when the initial specification is not optimal, while the accuracy of traditional FL algorithms would markedly decline. Our codes are released at https://anonymous …
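
A minimal sketch of the co-clipping idea, under assumed hyperparameter names: the gradient and the weight-decay term are combined first and clipped jointly, so neither can dominate the update magnitude.

```python
import numpy as np


def fednar_update(weights, grad, lr=0.1, weight_decay=1e-4, max_norm=1.0):
    """One local step with co-clipped gradient and weight decay.

    A sketch of the idea described above: the gradient and the weight-decay
    term are clipped jointly so their combined magnitude is bounded, rather
    than clipping the gradient alone. Hyperparameter values are illustrative.
    """
    update = grad + weight_decay * weights          # combined direction
    norm = np.linalg.norm(update)
    if norm > max_norm:                             # co-clipping step
        update = update * (max_norm / norm)
    return weights - lr * update
```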

FreGS: 3D Gaussian Splatting with Progressive Frequency Regularization

Authors

Jiahui Zhang,Fangneng Zhan,Muyu Xu,Shijian Lu,Eric Xing

Journal

arXiv preprint arXiv:2403.06908

Published Date

2024/3/11

3D Gaussian splatting has achieved very impressive performance in real-time novel view synthesis. However, it often suffers from over-reconstruction during Gaussian densification where high-variance image regions are covered by a few large Gaussians only, leading to blur and artifacts in the rendered images. We design a progressive frequency regularization (FreGS) technique to tackle the over-reconstruction issue within the frequency space. Specifically, FreGS performs coarse-to-fine Gaussian densification by exploiting low-to-high frequency components that can be easily extracted with low-pass and high-pass filters in the Fourier space. By minimizing the discrepancy between the frequency spectrum of the rendered image and the corresponding ground truth, it achieves high-quality Gaussian densification and alleviates the over-reconstruction of Gaussian splatting effectively. Experiments over multiple widely adopted benchmarks (e.g., Mip-NeRF360, Tanks-and-Temples and Deep Blending) show that FreGS achieves superior novel view synthesis and outperforms the state-of-the-art consistently.
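
The frequency-space discrepancy at the heart of this regularization can be sketched with a plain FFT, as below; the fixed radial cutoff stands in for FreGS's progressive low-to-high schedule, and single-channel images are assumed.

```python
import numpy as np


def frequency_loss(rendered, target, cutoff=0.25, low_pass=True):
    """Discrepancy between low- (or high-) frequency bands of two images.

    A sketch of the frequency-space regularization idea: compare FFT
    amplitudes inside (low-pass) or outside (high-pass) a radial cutoff.
    FreGS's progressive cutoff schedule is simplified to a fixed fraction.
    Inputs are 2D float arrays of the same shape (grayscale assumed).
    """
    f_r = np.fft.fftshift(np.fft.fft2(rendered))
    f_t = np.fft.fftshift(np.fft.fft2(target))
    h, w = rendered.shape
    yy, xx = np.mgrid[-h // 2:h - h // 2, -w // 2:w - w // 2]
    radius = np.sqrt((yy / h) ** 2 + (xx / w) ** 2)  # normalized frequency
    mask = radius <= cutoff if low_pass else radius > cutoff
    return np.mean(np.abs(np.abs(f_r) - np.abs(f_t))[mask])
```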

GET: a foundation model of transcription across human cell types

Authors

Xi Fu,Shentong Mo,Anqi Shao,Anouchka Laurent,Alejandro Buendia,Adolfo A Ferrando,Alberto Ciccia,Yanyan Lan,Teresa Palomero,David M Owens,Eric P Xing,Raul Rabadan

Journal

bioRxiv

Published Date

2023

Transcriptional regulation, involving the complex interplay between regulatory sequences and proteins, directs all biological processes. Computational models of transcription lack the generalizability to accurately extrapolate to unseen cell types and conditions. Here, we introduce GET, an interpretable foundation model designed to uncover regulatory grammars across 213 human fetal and adult cell types. Relying exclusively on chromatin accessibility data and sequence information, GET achieves experimental-level accuracy in predicting gene expression even in previously unseen cell types. GET showcases remarkable adaptability across new sequencing platforms and assays, enabling regulatory inference across a broad range of cell types and conditions, and uncovering universal and cell-type-specific transcription factor interaction networks. We evaluated its performance on prediction of regulatory activity, inference of regulatory elements and regulators, and identification of physical interactions between transcription factors. Specifically, we show GET outperforms current models in predicting lentivirus-based massively parallel reporter assay readout with reduced input data. In fetal erythroblasts, we identify distal (>1Mbp) regulatory regions that were missed by previous models. In B cells, we identify a lymphocyte-specific transcription factor-transcription factor interaction that explains the functional significance of a lymphoma-risk-predisposing germline mutation. In sum, we provide a generalizable and accurate model for transcription together with catalogs of gene regulation and transcription factor interactions, all with cell type specificity. A …

On optimizing the communication of model parallelism

Authors

Yonghao Zhuang,Lianmin Zheng,Zhuohan Li,Eric Xing,Qirong Ho,Joseph Gonzalez,Ion Stoica,Hao Zhang,Hexu Zhao

Journal

Proceedings of Machine Learning and Systems

Published Date

2023/3/18

We study a novel and important communication pattern in large-scale model-parallel deep learning (DL), which we call cross-mesh resharding. This pattern emerges when the two paradigms of model parallelism, intra-operator and inter-operator parallelism, are combined to support large models on large clusters. In cross-mesh resharding, a sharded tensor needs to be sent from a source device mesh to a destination device mesh, on which the tensor may be distributed with the same or different layouts. We formalize this as a many-to-many multicast communication problem, and show that existing approaches either are sub-optimal or do not generalize to different network topologies or tensor layouts, which result from different model architectures and parallelism strategies. We then propose two contributions to address cross-mesh resharding: an efficient broadcast-based communication system, and an "overlapping-friendly" pipeline schedule. On microbenchmarks, our overall system outperforms existing ones by up to 10x across various tensor and mesh layouts. On end-to-end training of two large models, GPT-3 and U-Transformer, we improve throughput by 10% and 50%, respectively.

SlimPajama-DC: Understanding data combinations for LLM training

Authors

Zhiqiang Shen,Tianhua Tao,Liqun Ma,Willie Neiswanger,Joel Hestness,Natalia Vassilieva,Daria Soboleva,Eric Xing

Journal

arXiv preprint arXiv:2309.10818

Published Date

2023/9/19

This paper aims to understand the impacts of various data combinations (e.g., web text, Wikipedia, GitHub, books) on the training of large language models using SlimPajama. SlimPajama is a rigorously deduplicated, multi-source dataset, which has been refined and further deduplicated to 627B tokens from the extensive 1.2T-token RedPajama dataset contributed by Together. We term our research SlimPajama-DC, an empirical analysis designed to uncover fundamental characteristics and best practices associated with employing SlimPajama in the training of large language models. During our research with SlimPajama, two pivotal observations emerged: (1) Global deduplication vs. local deduplication. We analyze and discuss how global (across different sources of datasets) and local (within a single source of dataset) deduplication affect the performance of trained models. (2) Proportions of high-quality/highly-deduplicated multi-source datasets in the combination. To study this, we construct six configurations of the SlimPajama dataset and train individual models using the 1.3B Cerebras-GPT architecture with Alibi and SwiGLU. Our best configuration outperforms the 1.3B model trained on RedPajama using the same number of training tokens by a significant margin. All our 1.3B models are trained on a Cerebras 16x CS-2 cluster with a total of 80 PFLOP/s in bf16 mixed precision. We further extend our discoveries (such as that increasing data diversity is crucial after global deduplication) to a 7B model with large-batch-size training. Our models and the separate SlimPajama-DC datasets are available at: https://huggingface.co/MBZUAI-LLM and https …

SegMix: A Simple Structure-Aware Data Augmentation Method

Authors

Yuxin Pei,Pushkar Bhuse,Zhengzhong Liu,Eric Xing

Journal

arXiv preprint arXiv:2311.09505

Published Date

2023/11/16

Interpolation-based Data Augmentation (DA) methods (e.g., Mixup) linearly interpolate the inputs and labels of two or more training examples. Mixup has more recently been adapted to the field of Natural Language Processing (NLP), mainly for sequence labeling tasks. However, such a simple adoption yields mixed or unstable improvements over the baseline models. We argue that the direct-adoption methods do not account for structures in NLP tasks. To this end, we propose SegMix, a collection of interpolation-based DA algorithms that can adapt to task-specific structures. SegMix poses fewer constraints on data structures, is robust to various hyperparameter settings, applies to more task settings, and adds little computational overhead. At the algorithm's core, we apply interpolation methods on task-specific meaningful segments, in contrast to applying them on sequences as in prior work. We find SegMix to be a flexible framework that combines rule-based DA methods with interpolation-based methods, creating interesting mixtures of DA techniques. We show that SegMix consistently improves performance over strong baseline models in Named Entity Recognition (NER) and Relation Extraction (RE) tasks, especially under data-scarce settings. Furthermore, this method is easy to implement and adds negligible training overhead.
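
The underlying interpolation mechanism is standard Mixup applied to segment representations; a generic sketch follows, with the caveat that SegMix's actual contribution is choosing task-specific segments (e.g., entity spans) rather than the interpolation itself.

```python
import numpy as np


def mixup_segments(emb_a, emb_b, label_a, label_b, alpha=0.2, rng=None):
    """Interpolate two segment embeddings and their label distributions.

    A generic Mixup-style interpolation at the segment level, illustrating
    the mechanism SegMix builds on; labels are soft (probability) vectors.
    """
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)                 # mixing coefficient
    mixed_emb = lam * emb_a + (1.0 - lam) * emb_b
    mixed_label = lam * label_a + (1.0 - lam) * label_b
    return mixed_emb, mixed_label
```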

Memoization-Aware Bayesian Optimization for AI Pipelines with Unknown Costs

Authors

Abdelmajid Essofi,Ridwan Salahuddeen,Munachiso S Nwadike,Navish Kumar,Kun Zhang,Eric Xing,Willie Neiswanger,Qirong Ho

Published Date

2023/10/13

Bayesian optimization (BO) is an effective approach for optimizing expensive black-box functions via potentially noisy function evaluations. However, few BO techniques address the cost-aware setting, in which different samples impose different costs on the optimizer, particularly when costs are initially unknown. This cost-aware BO setting is of special interest in tuning multi-stage AI pipelines, in which we could apply caching techniques to store and reuse early-stage outputs in favor of optimizing later stages, without incurring the costs of re-running the full pipeline. In this paper, we propose the Expected-Expected Improvement Per Unit Cost (EEIPU), a novel extension to the Expected Improvement (EI) acquisition function that adapts to unknown costs in multi-stage pipelines. EEIPU fits individual Gaussian Process (GP) models for each stage's cost data and manages the different cost regions of the search space, while balancing exploration-exploitation trade-offs. Additionally, EEIPU incorporates early-stage memoization, reducing redundant computations and costs by reusing the results of earlier stages, allowing for more iterations than existing approaches within the specified budget. In the cost-aware setting, EEIPU significantly outperforms comparable methods when tested on both synthetic and real pipelines, returning higher objective function values at lower total execution costs. This offers a significant advancement in cost-aware BO for optimizing multi-stage machine learning pipelines.
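
To make the acquisition concrete, here is a simplified cost-aware variant in the spirit of EEIPU: standard Expected Improvement divided by the expected cost of the stages that still need to run, with memoized stages contributing zero cost. The function and argument names are assumptions, not the paper's exact formulation.

```python
import numpy as np
from scipy.stats import norm


def expected_improvement(mu, sigma, best_f):
    """Standard EI for minimization, given GP posterior mean and std."""
    sigma = np.maximum(sigma, 1e-9)
    z = (best_f - mu) / sigma
    return (best_f - mu) * norm.cdf(z) + sigma * norm.pdf(z)


def ei_per_unit_cost(mu, sigma, best_f, stage_cost_means, memoized_mask):
    """Cost-aware acquisition in the spirit of EEIPU (simplified sketch).

    Divides EI by the total expected cost of the un-memoized pipeline
    stages; `stage_cost_means` are posterior mean costs from per-stage
    cost models (numpy array), and `memoized_mask` (boolean array) flags
    stages whose cached outputs can be reused at zero cost.
    """
    effective_cost = np.sum(stage_cost_means[~memoized_mask])
    return expected_improvement(mu, sigma, best_f) / max(effective_cost, 1e-9)
```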

A Study on the Calibration of In-context Learning

Authors

Hanlin Zhang,Yi-Fan Zhang,Yaodong Yu,Dhruv Madeka,Dean Foster,Eric Xing,Hima Lakkaraju,Sham Kakade

Journal

arXiv preprint arXiv:2312.04021

Published Date

2023/12/7

Modern auto-regressive language models are trained to minimize log loss on broad data by predicting the next token, so they are expected to produce calibrated answers when a problem is framed as a next-token prediction task. We study this for in-context learning (ICL), a widely used way to adapt frozen large language models (LLMs) via crafting prompts, and investigate the trade-offs between performance and calibration on a wide range of natural language understanding and reasoning tasks. We conduct extensive experiments to show that such trade-offs may get worse as we increase model size, incorporate more ICL examples, and fine-tune models using instruction, dialog, or reinforcement learning from human feedback (RLHF) on carefully curated datasets. Furthermore, we find that common recalibration techniques that are widely effective, such as temperature scaling, provide limited gains in calibration errors, suggesting that new methods may be required for settings where models are expected to be reliable.
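
For reference, the temperature-scaling baseline mentioned above is simple to implement; the sketch below fits a single temperature on held-out logits by grid search over the negative log-likelihood (a grid is used here for brevity instead of the usual gradient-based fit).

```python
import numpy as np


def fit_temperature(logits, labels, grid=np.linspace(0.5, 5.0, 91)):
    """Fit a single temperature by grid search on validation NLL.

    The standard recalibration baseline: divide logits by T before the
    softmax and pick the T that minimizes negative log-likelihood.
    `logits` is (n, C) float array, `labels` is (n,) int array.
    """
    def nll(T):
        z = logits / T
        z = z - z.max(axis=1, keepdims=True)          # stable log-softmax
        log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(labels)), labels].mean()

    return min(grid, key=nll)
```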

3D semantic segmentation in the wild: Learning generalized models for adverse-condition point clouds

Authors

Aoran Xiao,Jiaxing Huang,Weihao Xuan,Ruijie Ren,Kangcheng Liu,Dayan Guan,Abdulmotaleb El Saddik,Shijian Lu,Eric P Xing

Published Date

2023

Robust point cloud parsing under all-weather conditions is crucial to level-5 autonomy in autonomous driving. However, how to learn a universal 3D semantic segmentation (3DSS) model is largely neglected, as most existing benchmarks are dominated by point clouds captured under normal weather. We introduce SemanticSTF, an adverse-weather point cloud dataset that provides dense point-level annotations and allows the study of 3DSS under various adverse weather conditions. We investigate universal 3DSS modeling with two tasks: 1) domain adaptive 3DSS that adapts from normal-weather data to adverse-weather data; 2) domain generalized 3DSS that learns a generalizable model from normal-weather data. Our studies reveal the challenges existing 3DSS methods encounter with adverse-weather data, showing the great value of SemanticSTF in steering future endeavors along this meaningful research direction. In addition, we design a domain randomization technique that alternately randomizes the geometry styles of point clouds and aggregates their encoded embeddings, ultimately leading to a generalizable model that effectively improves 3DSS under various adverse weather. The SemanticSTF dataset and related codes are available at https://github.com/xiaoaoran/SemanticSTF.

One-for-All: Generalized LoRA for parameter-efficient fine-tuning

Authors

Arnav Chavan,Zhuang Liu,Deepak Gupta,Eric Xing,Zhiqiang Shen

Journal

arXiv preprint arXiv:2306.07967

Published Date

2023/6/13

We present Generalized LoRA (GLoRA), an advanced approach for universal parameter-efficient fine-tuning tasks. Enhancing Low-Rank Adaptation (LoRA), GLoRA employs a generalized prompt module to optimize pre-trained model weights and adjust intermediate activations, providing more flexibility and capability across diverse tasks and datasets. Moreover, GLoRA facilitates efficient parameter adaptation by employing a scalable, modular, layer-wise structure search that learns an individual adapter for each layer. Originating from a unified mathematical formulation, GLoRA exhibits strong transfer learning, few-shot learning, and domain generalization abilities, as it adapts to new tasks through not only weights but also additional dimensions like activations. Comprehensive experiments demonstrate that GLoRA outperforms all previous methods on natural, specialized, and structured vision benchmarks, achieving superior accuracy with fewer parameters and computations. The proposed method also shows considerable enhancements over the original LoRA in the language domain when applied to LLaMA-1 and LLaMA-2. Furthermore, our structural re-parameterization design ensures that GLoRA incurs no extra inference cost, rendering it a practical solution for resource-limited applications. Code and models are available at: https://github.com/Arnav0400/ViT-Slim/tree/master/GLoRA.
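
As background for what GLoRA generalizes, a minimal sketch of the plain LoRA forward pass is shown below; the shapes and the alpha scaling convention follow common LoRA usage, while GLoRA's additional per-layer scale/shift terms and structure search are not modeled here.

```python
import numpy as np


def lora_forward(x, W, A, B, alpha=16.0):
    """Forward pass with a frozen weight plus a trainable low-rank update.

    W is the frozen pre-trained weight (d_out, d_in); only the rank-r
    factors A (r, d_in) and B (d_out, r) are trained. This is plain LoRA,
    the building block that GLoRA extends with learned scaling/shift terms.
    """
    r = A.shape[0]
    delta_W = (alpha / r) * (B @ A)      # low-rank weight update
    return x @ (W + delta_W).T


# Toy usage: rank-4 adaptation of a 64 -> 32 linear layer.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 64))
W = rng.normal(size=(32, 64))
A = rng.normal(size=(4, 64)) * 0.01
B = np.zeros((32, 4))                    # B starts at zero, as in LoRA
y = lora_forward(x, W, A, B)             # (8, 32)
```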

Memory-adaptive depth-wise heterogeneous federated learning

Authors

Kai Zhang,Yutong Dai,Hongyi Wang,Eric Xing,Xun Chen,Lichao Sun

Journal

arXiv preprint arXiv:2303.04887

Published Date

2023/3/8

Federated learning is a promising paradigm that allows multiple clients to collaboratively train a model without sharing the local data. However, the presence of heterogeneous devices in federated learning, such as mobile phones and IoT devices with varying memory capabilities, limits the scale, and hence the performance, of the models that can be trained. The mainstream approaches to address memory limitations focus on width-slimming techniques, where different clients train subnetworks with reduced widths locally and then the server aggregates the subnetworks. The global model produced from these methods suffers from performance degradation due to the negative impact of the actions taken to handle the varying subnetwork widths in the aggregation phase. In this paper, we introduce a memory-adaptive depth-wise learning solution in FL called FeDepth, which adaptively decomposes the full model into blocks according to the memory budget of each client and trains blocks sequentially to obtain a full inference model. Our method outperforms state-of-the-art approaches, achieving 5% and more than 10% improvements in top-1 accuracy on CIFAR-10 and CIFAR-100, respectively. We also demonstrate the effectiveness of depth-wise fine-tuning on ViT. Our findings highlight the importance of memory-aware techniques for federated learning with heterogeneous devices and the success of the depth-wise training strategy in improving the global model's performance.
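
A toy sketch of the depth-wise assignment idea: each client greedily takes as many consecutive blocks as its memory budget allows. This shows only the partitioning step, under assumed cost units; FeDepth's sequential training and aggregation logic are not shown.

```python
def assign_blocks(block_costs, memory_budget):
    """Greedily select a prefix of model blocks that fits a client's memory.

    `block_costs` lists the per-block memory cost in the same units as
    `memory_budget`; the client trains the returned consecutive blocks.
    A simplified sketch of depth-wise decomposition, not FeDepth itself.
    """
    selected, used = [], 0.0
    for i, cost in enumerate(block_costs):
        if used + cost > memory_budget:
            break
        selected.append(i)
        used += cost
    return selected


# Example: a 6-block model on a client with a budget of 3.5 units.
print(assign_blocks([1.0, 1.0, 1.0, 1.0, 1.0, 1.0], 3.5))  # [0, 1, 2]
```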

Jais and Jais-chat: Arabic-centric foundation and instruction-tuned open generative large language models

Authors

Neha Sengupta,Sunil Kumar Sahu,Bokang Jia,Satheesh Katipomu,Haonan Li,Fajri Koto,Osama Mohammed Afzal,Samta Kamboj,Onkar Pandit,Rahul Pal,Lalit Pradhan,Zain Muhammad Mujahid,Massa Baali,Alham Fikri Aji,Zhengzhong Liu,Andy Hock,Andrew Feldman,Jonathan Lee,Andrew Jackson,Preslav Nakov,Timothy Baldwin,Eric Xing

Journal

arXiv preprint arXiv:2308.16149

Published Date

2023/8/30

We introduce Jais and Jais-chat, new state-of-the-art Arabic-centric foundation and instruction-tuned open generative large language models (LLMs). The models are based on the GPT-3 decoder-only architecture and are pretrained on a mixture of Arabic and English texts, including source code in various programming languages. With 13 billion parameters, they demonstrate better knowledge and reasoning capabilities in Arabic than any existing open Arabic and multilingual models by a sizable margin, based on extensive evaluation. Moreover, the models are competitive in English compared to English-centric open models of similar size, despite being trained on much less English data. We provide a detailed description of the training, the tuning, the safety alignment, and the evaluation of the models. We release two open versions of the model -- the foundation Jais model, and an instruction-tuned Jais-chat variant -- with the aim of promoting research on Arabic LLMs. Available at https://huggingface.co/inception-mbzuai/jais-13b-chat

PromptAgent: Strategic planning with language models enables expert-level prompt optimization

Authors

Xinyuan Wang,Chenxi Li,Zhen Wang,Fan Bai,Haotian Luo,Jiayou Zhang,Nebojsa Jojic,Eric P Xing,Zhiting Hu

Journal

arXiv preprint arXiv:2310.16427

Published Date

2023/10/25

Highly effective, task-specific prompts are often heavily engineered by experts to integrate detailed instructions and domain insights based on a deep understanding of both the instincts of large language models (LLMs) and the intricacies of the target task. However, automating the generation of such expert-level prompts remains elusive. Existing prompt optimization methods tend to overlook the depth of domain knowledge and struggle to efficiently explore the vast space of expert-level prompts. Addressing this, we present PromptAgent, an optimization method that autonomously crafts prompts equivalent in quality to those handcrafted by experts. At its core, PromptAgent views prompt optimization as a strategic planning problem and employs a principled planning algorithm, rooted in Monte Carlo tree search, to strategically navigate the expert-level prompt space. Inspired by human-like trial-and-error exploration, PromptAgent induces precise expert-level insights and in-depth instructions by reflecting on model errors and generating constructive error feedback. Such a novel framework allows the agent to iteratively examine intermediate prompts (states), refine them based on error feedback (actions), simulate future rewards, and search for high-reward paths leading to expert prompts. We apply PromptAgent to 12 tasks spanning three practical domains: BIG-Bench Hard (BBH), as well as domain-specific and general NLP tasks, showing it significantly outperforms strong Chain-of-Thought and recent prompt optimization baselines. Extensive analyses emphasize its capability to craft expert-level, detailed, and domain-insightful prompts with great …

Liteformer: Lightweight Evoformer for Protein Structure Prediction

Authors

Ning Sun,Xingyi Cheng,Shentong Mo,Chiming Liu,Hui Li,Eric Xing,Le Song

Published Date

2023/10/13

AlphaFold2 has achieved seminal success in predicting structures from amino acid sequences with remarkable atomic accuracy. However, its Evoformer module faces a critical challenge in terms of high memory consumption, stemming from the computational complexity associated with the sequence length and the number of Multiple Sequence Alignments (MSA). This challenge arises from the attention mechanism involving third-order MSA and pair-wise tensors. This memory bottleneck poses difficulties when working with lengthy protein sequences. To tackle this problem, we introduce a novel and lightweight variant of Evoformer named Liteformer. Liteformer employs an innovative attention linearization mechanism that reduces complexity through a bias-aware flow attention mechanism, which seamlessly integrates MSA sequences and pair-wise information. Our extensive experiments, conducted on both monomeric and multimeric benchmark datasets, showcase the efficiency gains of our framework. Specifically, compared with Evoformer, Liteformer achieves up to a 44% reduction in memory usage and a 23% acceleration in training speed, all while maintaining competitive accuracy in protein structure prediction.

Weakly supervised 3D open-vocabulary segmentation

Authors

Kunhao Liu,Fangneng Zhan,Jiahui Zhang,Muyu Xu,Yingchen Yu,Abdulmotaleb El Saddik,Christian Theobalt,Eric Xing,Shijian Lu

Journal

Advances in Neural Information Processing Systems

Published Date

2023/12/15

Open-vocabulary segmentation of 3D scenes is a fundamental function of human perception and thus a crucial objective in computer vision research. However, this task is heavily impeded by the lack of large-scale and diverse 3D open-vocabulary segmentation datasets for training robust and generalizable models. Distilling knowledge from pre-trained 2D open-vocabulary segmentation models helps but it compromises the open-vocabulary feature as the 2D models are mostly finetuned with close-vocabulary datasets. We tackle the challenges in 3D open-vocabulary segmentation by exploiting pre-trained foundation models CLIP and DINO in a weakly supervised manner. Specifically, given only the open-vocabulary text descriptions of the objects in a scene, we distill the open-vocabulary multimodal knowledge and object reasoning capability of CLIP and DINO into a neural radiance field (NeRF), which effectively lifts 2D features into view-consistent 3D segmentation. A notable aspect of our approach is that it does not require any manual segmentation annotations for either the foundation models or the distillation process. Extensive experiments show that our method even outperforms fully supervised models trained with segmentation annotations in certain scenes, suggesting that 3D open-vocabulary segmentation can be effectively learned from 2D images and text-image pairs. Code is available at https://github.com/Kunhao-Liu/3D-OVS.

KD-DLGAN: Data-limited image generation via knowledge distillation

Authors

Kaiwen Cui,Yingchen Yu,Fangneng Zhan,Shengcai Liao,Shijian Lu,Eric Xing

Published Date

2023/3/30

Generative Adversarial Networks (GANs) rely heavily on large-scale training data for training high-quality image generation models. With limited training data, the GAN discriminator often suffers from severe overfitting, which directly leads to degraded generation, especially in generation diversity. Inspired by recent advances in knowledge distillation (KD), we propose KD-DLGAN, a knowledge-distillation based generation framework that introduces pre-trained vision-language models for training effective data-limited image generation models. KD-DLGAN consists of two innovative designs. The first is aggregated generative KD that mitigates discriminator overfitting by challenging the discriminator with harder learning tasks and distilling more generalizable knowledge from the pre-trained models. The second is correlated generative KD that improves generation diversity by distilling and preserving the diverse image-text correlation within the pre-trained models. Extensive experiments over multiple benchmarks show that KD-DLGAN achieves superior image generation with limited training data. In addition, KD-DLGAN complements the state-of-the-art with consistent and substantial performance gains. The code will be released.

Federated learning as variational inference: A scalable expectation propagation approach

Authors

Han Guo,Philip Greengard,Hongyi Wang,Andrew Gelman,Yoon Kim,Eric P Xing

Journal

arXiv preprint arXiv:2302.04228

Published Date

2023/2/8

The canonical formulation of federated learning treats it as a distributed optimization problem where the model parameters are optimized against a global loss function that decomposes across client loss functions. A recent alternative formulation instead treats federated learning as a distributed inference problem, where the goal is to infer a global posterior from partitioned client data (Al-Shedivat et al., 2021). This paper extends the inference view and describes a variational inference formulation of federated learning where the goal is to find a global variational posterior that well-approximates the true posterior. This naturally motivates an expectation propagation approach to federated learning (FedEP), where approximations to the global posterior are iteratively refined through probabilistic message-passing between the central server and the clients. We conduct an extensive empirical study across various algorithmic considerations and describe practical strategies for scaling up expectation propagation to the modern federated setting. We apply FedEP on standard federated learning benchmarks and find that it outperforms strong baselines in terms of both convergence speed and accuracy.

3d open-vocabulary segmentation with foundation models

Authors

Kunhao Liu,Fangneng Zhan,Jiahui Zhang,Muyu Xu,Yingchen Yu,Abdulmotaleb El Saddik,Christian Theobalt,Eric Xing,Shijian Lu

Journal

arXiv preprint arXiv:2305.14093

Published Date

2023/5/23

Open-vocabulary segmentation of 3D scenes is a fundamental function of human perception and thus a crucial objective in computer vision research. However, this task is heavily impeded by the lack of large-scale and diverse 3D open-vocabulary segmentation datasets for training robust and generalizable models. Distilling knowledge from pre-trained 2D open-vocabulary segmentation models helps but it compromises the open-vocabulary feature significantly as the 2D models are mostly finetuned with close-vocabulary datasets. We tackle the challenges in 3D open-vocabulary segmentation by exploiting the open-vocabulary multimodal knowledge and object reasoning capability of pre-trained foundation models CLIP and DINO, without necessitating any fine-tuning. Specifically, we distill open-vocabulary visual and textual knowledge from CLIP into a neural radiance field (NeRF) which effectively lifts 2D features into view-consistent 3D segmentation. Furthermore, we introduce the Relevancy-Distribution Alignment loss and Feature-Distribution Alignment loss to respectively mitigate the ambiguities of CLIP features and distill precise object boundaries from DINO features, eliminating the need for segmentation annotations during training. Extensive experiments show that our method even outperforms fully supervised models trained with segmentation annotations, suggesting that 3D open-vocabulary segmentation can be effectively learned from 2D images and text-image pairs.

Iterative graph self-distillation

Authors

Hanlin Zhang,Shuai Lin,Weiyang Liu,Pan Zhou,Jian Tang,Xiaodan Liang,Eric P Xing

Journal

IEEE Transactions on Knowledge and Data Engineering (arXiv preprint arXiv:2010.12609)

Published Date

2023

Recently, there has been increasing interest in the challenge of how to discriminatively vectorize graphs. To address this, we propose a method called Iterative Graph Self-Distillation (IGSD) which learns graph-level representation in an unsupervised manner through instance discrimination using a self-supervised contrastive learning approach. IGSD involves a teacher-student distillation process that uses graph diffusion augmentations and constructs the teacher model using an exponential moving average of the student model. The intuition behind IGSD is to predict the teacher network representation of the graph pairs under different augmented views. As a natural extension, we also apply IGSD to semi-supervised scenarios by jointly regularizing the network with both supervised and self-supervised contrastive loss. Finally, we show that fine-tuning the IGSD-trained models with self-training can further improve …
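
A hedged sketch of the teacher-student core described above: the teacher is an exponential moving average (EMA) of the student, and the student is trained to predict the teacher's representation of a second augmented view. The encoder, augmentations, and loss below are simplified placeholders, not IGSD's graph encoder or diffusion augmentations.

```python
# Minimal EMA teacher-student distillation step, assuming generic encoders.
import copy
import torch
import torch.nn.functional as F

student = torch.nn.Sequential(torch.nn.Linear(32, 64), torch.nn.ReLU(),
                              torch.nn.Linear(64, 16))
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)

opt, ema = torch.optim.Adam(student.parameters(), lr=1e-3), 0.99

def step(view_a, view_b):
    z_s = F.normalize(student(view_a), dim=-1)
    with torch.no_grad():
        z_t = F.normalize(teacher(view_b), dim=-1)
    loss = 2 - 2 * (z_s * z_t).sum(dim=-1).mean()   # cosine prediction loss
    opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():                           # EMA teacher update
        for pt, ps in zip(teacher.parameters(), student.parameters()):
            pt.mul_(ema).add_(ps, alpha=1 - ema)
    return loss.item()

x = torch.randn(8, 32)                              # stand-in graph features
print(step(x + 0.1 * torch.randn_like(x), x + 0.1 * torch.randn_like(x)))
```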

Contextualized Policy Recovery: Modeling and Interpreting Medical Decisions with Adaptive Imitation Learning

Authors

Caleb Ellington,Jannik Deuschel,Ben Lengerich,Yingtao Luo,Pascal Friederich,Eric Xing

Published Date

2023/10/13

Interpretable policy learning seeks to estimate intelligible decision policies from observed actions; however, existing models fall short by forcing a tradeoff between accuracy and interpretability. This tradeoff limits data-driven interpretations of human decision-making processes; for example, to audit medical decisions for biases and suboptimal practices, we require models of decision processes which provide concise descriptions of complex behaviors. Fundamentally, existing approaches are burdened by this tradeoff because they represent the underlying decision process as a universal policy, when in fact human decisions are dynamic and can change drastically with contextual information. Thus, we propose Contextualized Policy Recovery (CPR), which re-frames the problem of modeling complex decision processes as a multi-task learning problem in which complex decision policies are composed of context-specific policies. CPR models each context-specific policy as a linear observation-to-action mapping, and generates new decision models on-demand as contexts are updated with new observations. CPR is compatible with fully offline and partially observable decision environments, and can be tailored to incorporate any recurrent black-box model or interpretable decision model. We assess CPR through studies on simulated and real data, achieving state-of-the-art performance on the canonical tasks of predicting antibiotic prescription in intensive care units (% AUROC vs. previous SOTA) and predicting MRI prescription for Alzheimer's patients (% AUROC vs. previous SOTA). With this improvement in predictive performance, CPR …

Redco: A Lightweight Tool to Automate Distributed Training of LLMs on Any GPU/TPUs

Authors

Bowen Tan,Yun Zhu,Lijuan Liu,Hongyi Wang,Yonghao Zhuang,Jindong Chen,Eric Xing,Zhiting Hu

Journal

arXiv preprint arXiv:2310.16355

Published Date

2023/10/25

The recent progress of AI can be largely attributed to large language models (LLMs). However, their escalating memory requirements introduce challenges for machine learning (ML) researchers and engineers. Addressing this requires developers to partition a large model to distribute it across multiple GPUs or TPUs. This necessitates considerable coding and intricate configuration efforts with existing model parallel tools, such as Megatron-LM, DeepSpeed, and Alpa. These tools require users' expertise in machine learning systems (MLSys), creating a bottleneck in LLM development, particularly for developers without an MLSys background. In this work, we present Redco, a lightweight and user-friendly tool crafted to automate distributed training and inference for LLMs, as well as to simplify ML pipeline development. The design of Redco emphasizes two key aspects. Firstly, to automate model parallelism, our study identifies two straightforward rules to generate tensor parallel strategies for any given LLM. Integrating these rules into Redco facilitates effortless distributed LLM training and inference, eliminating the need for additional coding or complex configurations. We demonstrate the effectiveness by applying Redco to a set of LLM architectures, such as GPT-J, LLaMA, T5, and OPT, up to the size of 66B. Secondly, we propose a mechanism that allows for the customization of diverse ML pipelines through the definition of merely three functions, eliminating redundant and formulaic code like multi-host-related processing. This mechanism proves adaptable across a spectrum of ML algorithms, from foundational language modeling to complex …

Linker-Tuning: Optimizing Continuous Prompts for Heterodimeric Protein Prediction

Authors

Shuxian Zou,Hui Li,Shentong Mo,Xingyi Cheng,Eric Xing,Le Song

Journal

arXiv preprint arXiv:2312.01186

Published Date

2023/12/2

Predicting the structure of interacting chains is crucial for understanding biological systems and developing new drugs. Large-scale pre-trained Protein Language Models (PLMs), such as ESM2, have shown impressive abilities in extracting biologically meaningful representations for protein structure prediction. In this paper, we show that ESMFold, which has been successful in computing accurate atomic structures for single-chain proteins, can be adapted to predict the heterodimer structures in a lightweight manner. We propose Linker-tuning, which learns a continuous prompt to connect the two chains in a dimer before running it as a single sequence in ESMFold. Experiment results show that our method successfully predicts 56.98% of interfaces on the i.i.d. heterodimer test set, with an absolute improvement of +12.79% over the ESMFold-Linker baseline. Furthermore, our model can generalize well to the out-of-distribution (OOD) test set HeteroTest2 and two antibody test sets Fab and Fv while being faster than AF-Multimer.
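
The mechanism lends itself to a short illustration: a block of learnable "linker" embeddings joins the two chains so a frozen single-chain model can process the dimer as one sequence. The sketch below is an assumption-laden stand-in; `frozen_trunk` is a hypothetical placeholder for a frozen model such as ESMFold, and only the linker parameters would be trained.

```python
# Illustrative linker-tuning sketch (not the paper's code).
import torch

emb_dim, linker_len = 320, 25
linker = torch.nn.Parameter(torch.randn(linker_len, emb_dim) * 0.02)

frozen_trunk = torch.nn.Linear(emb_dim, emb_dim)   # stand-in for ESMFold
for p in frozen_trunk.parameters():
    p.requires_grad_(False)

def fold_dimer(chain_a_emb, chain_b_emb):
    # Concatenate [chain A | learned linker | chain B] into one "sequence".
    joined = torch.cat([chain_a_emb, linker, chain_b_emb], dim=0)
    return frozen_trunk(joined)

a, b = torch.randn(100, emb_dim), torch.randn(80, emb_dim)
out = fold_dimer(a, b)            # only `linker` receives gradients
print(out.shape)                  # torch.Size([205, 320])
```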

Understanding masked autoencoders via hierarchical latent variable models

Authors

Lingjing Kong,Martin Q Ma,Guangyi Chen,Eric P Xing,Yuejie Chi,Louis-Philippe Morency,Kun Zhang

Published Date

2023

Masked autoencoder (MAE), a simple and effective self-supervised learning framework based on the reconstruction of masked image regions, has recently achieved prominent success in a variety of vision tasks. Despite the emergence of intriguing empirical observations on MAE, a theoretically principled understanding is still lacking. In this work, we formally characterize and justify existing empirical insights and provide theoretical guarantees of MAE. We formulate the underlying data-generating process as a hierarchical latent variable model, and show that under reasonable assumptions, MAE provably identifies a set of latent variables in the hierarchical model, explaining why MAE can extract high-level information from pixels. Further, we show how key hyperparameters in MAE (the masking ratio and the patch size) determine which true latent variables are recovered, therefore influencing the level of semantic information in the representation. Specifically, extremely large or small masking ratios inevitably lead to low-level representations. Our theory offers coherent explanations of existing empirical observations and provides insights for potential empirical improvements and fundamental limitations of the masked-reconstruction paradigm. We conduct extensive experiments to validate our theoretical insights.

Does compressing activations help model parallel training?

Authors

Song Bian,Dacheng Li,Hongyi Wang,Eric P Xing,Shivaram Venkataraman

Journal

arXiv preprint arXiv:2301.02654

Published Date

2023/1/6

Large-scale Transformer models are known for their exceptional performance in a range of tasks, but training them can be difficult due to the requirement for communication-intensive model parallelism. One way to improve training speed is to compress the message size in communication. Previous approaches have primarily focused on compressing gradients in a data parallelism setting, but compression in a model-parallel setting is an understudied area. We have discovered that model parallelism has fundamentally different characteristics from data parallelism. In this work, we present the first empirical study on the effectiveness of compression methods for model parallelism. We implement and evaluate three common classes of compression algorithms - pruning-based, learning-based, and quantization-based - using a popular Transformer training framework. We evaluate these methods across more than 160 settings and 8 popular datasets, taking into account different hyperparameters, hardware, and both fine-tuning and pre-training stages. We also provide analysis of how compression behaves as the model is scaled up. Finally, we provide insights for future development of model parallelism compression algorithms.
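
As a concrete instance of one of the three classes evaluated, the sketch below applies simple uniform int8 quantization to activations at a model-parallel boundary before "communication" and dequantizes them afterward. This is a generic example of quantization-based compression, not the paper's specific implementation.

```python
# Minimal quantization-based activation compression sketch.
import torch

def quantize_int8(x):
    scale = x.abs().amax() / 127.0 + 1e-12
    return (x / scale).round().clamp(-127, 127).to(torch.int8), scale

def dequantize(q, scale):
    return q.to(torch.float32) * scale

acts = torch.randn(16, 1024)                 # activations at a parallel cut
q, s = quantize_int8(acts)                   # 4x smaller message than fp32
recovered = dequantize(q, s)
print("max abs error:", (acts - recovered).abs().max().item())
```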

Improved logical reasoning of language models via differentiable symbolic programming

Authors

Hanlin Zhang,Jiani Huang,Ziyang Li,Mayur Naik,Eric Xing

Journal

arXiv preprint arXiv:2305.03742

Published Date

2023/5/5

Pre-trained large language models (LMs) struggle to perform logical reasoning reliably despite advances in scale and compositionality. In this work, we tackle this challenge through the lens of symbolic programming. We propose DSR-LM, a Differentiable Symbolic Reasoning framework where pre-trained LMs govern the perception of factual knowledge, and a symbolic module performs deductive reasoning. In contrast to works that rely on hand-crafted logic rules, our differentiable symbolic reasoning framework efficiently learns weighted rules and applies semantic loss to further improve LMs. DSR-LM is scalable, interpretable, and allows easy integration of prior knowledge, thereby supporting extensive symbolic programming to robustly derive a logical conclusion. The results of our experiments suggest that DSR-LM improves the logical reasoning abilities of pre-trained language models, resulting in a significant increase in accuracy of over 20% on deductive reasoning benchmarks. Furthermore, DSR-LM outperforms a variety of competitive baselines when faced with systematic changes in sequence length.

Autonomous industrial process control system and method that provides autonomous retraining of forecast model

Published Date

2023/8/8

The current disclosure is directed towards a system and method for controlling an industrial process. In one example, a method comprises deploying a forecast model for controlling an industrial process with training configurations that can be used as a single point of truth for guiding training and retraining of versions of the forecast model using a model training algorithm without human input. Retraining and redeployment of the forecast model may be triggered when the performance of the forecast model degrades.

Lightseq: Sequence level parallelism for distributed training of long context transformers

Authors

Dacheng Li,Rulin Shao,Anze Xie,Eric P Xing,Joseph E Gonzalez,Ion Stoica,Xuezhe Ma,Hao Zhang

Journal

arXiv preprint arXiv:2310.03294

Published Date

2023/10/5

Increasing the context length of large language models (LLMs) unlocks fundamentally new capabilities, but also significantly increases the memory footprints of training. Previous model-parallel systems such as Megatron-LM partition and compute different attention heads in parallel, resulting in large communication volumes, so they cannot scale beyond the number of attention heads, thereby hindering their adoption. In this paper, we introduce a new approach, LightSeq, for long-context LLMs training. LightSeq has many notable advantages. First, LightSeq partitions over the sequence dimension, hence is agnostic to model architectures and readily applicable for models with varying numbers of attention heads, such as Multi-Head, Multi-Query and Grouped-Query attention. Second, LightSeq not only requires up to 4.7x less communication than Megatron-LM on popular LLMs but also overlaps the communication with computation. To further reduce the training time, LightSeq features a novel gradient checkpointing scheme to bypass a forward computation for memory-efficient attention. We evaluate LightSeq on Llama-7B and its variants with sequence lengths from 32K to 512K. Through comprehensive experiments on single and cross-node training, we show that LightSeq achieves up to 1.24-2.01x end-to-end speedup, and a 2-8x longer sequence length on models with fewer heads, compared to Megatron-LM. Codes will be available at https://github.com/RulinShao/LightSeq.
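
The sequence-dimension partitioning can be illustrated in a single process: each simulated worker owns a contiguous chunk of queries and attends over gathered keys/values, which is why the scheme is agnostic to the number of attention heads. The loop below is a sketch under that assumption; a real system would move k and v via collective or point-to-point communication overlapped with compute.

```python
# Single-process illustration of sequence-level partitioning of attention.
import torch, torch.nn.functional as F

seq_len, d, workers = 64, 32, 4
q, k, v = (torch.randn(seq_len, d) for _ in range(3))

chunks = torch.chunk(torch.arange(seq_len), workers)   # sequence partition
outputs = []
for idx in chunks:                     # one loop iteration = one "worker"
    local_q = q[idx]                   # worker-local queries
    # In a distributed run, k and v would arrive via communication here.
    attn = F.softmax(local_q @ k.T / d ** 0.5, dim=-1)
    outputs.append(attn @ v)

full = torch.cat(outputs)              # matches unpartitioned attention
ref = F.softmax(q @ k.T / d ** 0.5, dim=-1) @ v
print(torch.allclose(full, ref, atol=1e-5))
```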

Contextualized machine learning

Authors

Benjamin Lengerich,Caleb N Ellington,Andrea Rubbi,Manolis Kellis,Eric P Xing

Journal

arXiv preprint arXiv:2310.11340

Published Date

2023/10/17

We examine Contextualized Machine Learning (ML), a paradigm for learning heterogeneous and context-dependent effects. Contextualized ML estimates heterogeneous functions by applying deep learning to the meta-relationship between contextual information and context-specific parametric models. This is a form of varying-coefficient modeling that unifies existing frameworks including cluster analysis and cohort modeling by introducing two reusable concepts: a context encoder, which translates sample context into model parameters, and a sample-specific model, which operates on sample predictors. We review the process of developing contextualized models, nonparametric inference from contextualized models, and identifiability conditions of contextualized models. Finally, we present the open-source PyTorch package ContextualizedML.
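
The varying-coefficient idea admits a compact sketch: a context encoder maps each sample's context to the parameters of a sample-specific linear model applied to that sample's predictors. This is illustrative only; the ContextualizedML package's actual API may differ.

```python
# Minimal contextualized (varying-coefficient) model sketch.
import torch

ctx_dim, x_dim = 5, 3
encoder = torch.nn.Sequential(                 # context -> model parameters
    torch.nn.Linear(ctx_dim, 32), torch.nn.ReLU(),
    torch.nn.Linear(32, x_dim + 1))            # per-sample weights + bias

def predict(context, x):
    params = encoder(context)                  # (batch, x_dim + 1)
    w, b = params[:, :x_dim], params[:, x_dim]
    return (w * x).sum(dim=-1) + b             # sample-specific linear model

c, x, y = torch.randn(64, ctx_dim), torch.randn(64, x_dim), torch.randn(64)
loss = torch.nn.functional.mse_loss(predict(c, x), y)
loss.backward()                                # trains the shared encoder
print(loss.item())
```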

Llm360: Towards fully transparent open-source llms

Authors

Zhengzhong Liu,Aurick Qiao,Willie Neiswanger,Hongyi Wang,Bowen Tan,Tianhua Tao,Junbo Li,Yuqi Wang,Suqi Sun,Omkar Pangarkar,Richard Fan,Yi Gu,Victor Miller,Yonghao Zhuang,Guowei He,Haonan Li,Fajri Koto,Liping Tang,Nikhil Ranjan,Zhiqiang Shen,Xuguang Ren,Roberto Iriondo,Cun Mu,Zhiting Hu,Mark Schulze,Preslav Nakov,Tim Baldwin,Eric P Xing

Journal

arXiv preprint arXiv:2312.06550

Published Date

2023/12/11

The recent surge in open-source Large Language Models (LLMs), such as LLaMA, Falcon, and Mistral, provides diverse options for AI practitioners and researchers. However, most LLMs have only released partial artifacts, such as the final model weights or inference code, and technical reports increasingly limit their scope to high-level design choices and surface statistics. These choices hinder progress in the field by degrading transparency into the training of LLMs and forcing teams to rediscover many details in the training process. We present LLM360, an initiative to fully open-source LLMs, which advocates for all training code and data, model checkpoints, and intermediate results to be made available to the community. The goal of LLM360 is to support open and collaborative AI research by making the end-to-end LLM training process transparent and reproducible by everyone. As a first step of LLM360, we release two 7B parameter LLMs pre-trained from scratch, Amber and CrystalCoder, including their training code, data, intermediate checkpoints, and analyses (at https://www.llm360.ai). We are committed to continually pushing the boundaries of LLMs through this open-source effort. More large-scale and stronger models are underway and will be released in the future.

Stylerf: Zero-shot 3d style transfer of neural radiance fields

Authors

Kunhao Liu,Fangneng Zhan,Yiwen Chen,Jiahui Zhang,Yingchen Yu,Shijian Lu

Published Date

2023/3

3D style transfer aims to render stylized novel views of a 3D scene with multi-view consistency. However, most existing work suffers from a three-way dilemma over accurate geometry reconstruction, high-quality stylization, and being generalizable to arbitrary new styles. We propose StyleRF (Style Radiance Fields), an innovative 3D style transfer technique that resolves the three-way dilemma by performing style transformation within the feature space of a radiance field. StyleRF employs an explicit grid of high-level features to represent 3D scenes, with which high-fidelity geometry can be reliably restored via volume rendering. In addition, it transforms the grid features according to the reference style which directly leads to high-quality zero-shot style transfer. StyleRF consists of two innovative designs. The first is sampling-invariant content transformation that makes the transformation invariant to the holistic statistics of the sampled 3D points and accordingly ensures multi-view consistency. The second is deferred style transformation of 2D feature maps which is equivalent to the transformation of 3D points but greatly reduces memory footprint without degrading multi-view consistency. Extensive experiments show that StyleRF achieves superior 3D stylization quality with precise geometry reconstruction and it can generalize to various new styles in a zero-shot manner. Project website: https://kunhao-liu.github.io/StyleRF/

Contextualized Networks Reveal Heterogeneous Transcriptomic Regulation in Tumors at Sample-Specific Resolution

Authors

Caleb N Ellington,Benjamin J Lengerich,Thomas BK Watkins,Jiekun Yang,Hanxi Xiao,Manolis Kellis,Eric P Xing

Journal

bioRxiv

Published Date

2023

Cancers are shaped by somatic mutations, microenvironment, and patient background, each altering gene expression and regulation in complex ways, resulting in heterogeneous cellular states and dynamics. Inferring gene regulatory network (GRN) models from expression data can help characterize this regulation-driven heterogeneity, but network inference requires many statistical samples, traditionally limiting GRNs to cluster-level analyses that ignore intra-cluster heterogeneity. We propose to move beyond cluster-based analyses by using contextualized learning, a multi-task learning paradigm which allows us to infer sample-specific models using phenotypic, molecular, and environmental information pertinent to the model, encoded as the model's "context" to be conditioned on. We unify three network model classes (Correlation, Markov, Neighborhood) and estimate context-specific GRNs for 7997 tumors across 25 tumor types, with each network contextualized by copy number and driver mutation profiles, tumor microenvironment, and patient demographics. Contextualized GRNs provide a structured view of expression dynamics at sample-specific resolution, which reveal co-expression modules in correlation networks (CNs), as well as cliques and independent regulatory elements in Markov Networks (MNs) and Neighborhood Regression Networks (NNs). Our generative modeling approach allows us to predict GRNs for unseen tumor types based on a pan-cancer model of how somatic mutations affect gene regulation. Finally, contextualized networks enable GRN-based precision oncology, explaining known biomarkers in terms of …

Cuttlefish: Low-rank model training without all the tuning

Authors

Hongyi Wang,Saurabh Agarwal,Yoshiki Tanaka,Eric Xing,Dimitris Papailiopoulos

Journal

Proceedings of Machine Learning and Systems

Published Date

2023/3/18

Recent research has shown that training low-rank neural networks can effectively reduce the total number of trainable parameters without sacrificing predictive accuracy, resulting in end-to-end speedups. However, low-rank model training necessitates adjusting several additional factorization hyperparameters, such as the rank of the factorization at each layer. In this paper, we tackle this challenge by introducing Cuttlefish, an automated low-rank training approach that eliminates the need for tuning factorization hyperparameters. Cuttlefish leverages the observation that after a few epochs of full-rank training, the stable rank (i.e., an approximation of the true rank) of each layer stabilizes at a constant value. Cuttlefish switches from full-rank to low-rank training once the stable ranks of all layers have converged, setting the dimension of each factorization to its corresponding stable rank. Our results show that Cuttlefish generates models up to 5.6 times smaller than full-rank models, and attains up to a 1.2 times faster end-to-end training process while preserving comparable accuracy. Moreover, Cuttlefish outperforms state-of-the-art low-rank model training methods and other prominent baselines. The source code for our implementation can be found at: https://github.com/hwang595/Cuttlefish.
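
The quantity Cuttlefish monitors has a simple closed form: the stable rank ||W||_F^2 / ||W||_2^2 of each layer's weight matrix. Below is a minimal sketch of computing it and replacing a layer with a factorization of that rank; the convergence criterion and factorized-training details are simplified assumptions, not the paper's implementation.

```python
# Stable rank and a rank-r SVD replacement for one weight matrix.
import torch

def stable_rank(w):
    fro2 = (w ** 2).sum()
    spec2 = torch.linalg.matrix_norm(w, ord=2) ** 2   # spectral norm squared
    return (fro2 / spec2).item()

w = torch.randn(512, 512)
r = max(1, round(stable_rank(w)))
u, s, vh = torch.linalg.svd(w, full_matrices=False)
low_rank = (u[:, :r] * s[:r]) @ vh[:r]                # rank-r factors
print("stable rank:", r,
      "relative error:", (torch.norm(w - low_rank) / torch.norm(w)).item())
```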

Neural-symbolic interaction and co-evolving

Authors

Bowen Tan,Shibo Hao,Eric Xing,Zhiting Hu

Journal

Compendium of Neurosymbolic Artificial Intelligence

Published Date

2023/8/4

Deep neural networks provide a powerful mechanism for learning patterns from massive data, achieving new levels of performance on image classification [24], speech recognition [25], machine translation [26], playing strategic board games [27], and so forth. Despite the impressive advances, the widely-used DNN methods still have limitations. The high predictive accuracy has heavily relied on large amounts of labeled data, and the purely data-driven learning can lead to uninterpretable and sometimes counterintuitive results [28, 29]. It is also difficult to encode human intention to guide the models to capture desired patterns, without expensive direct supervision or ad-hoc initialization. On the other hand, the cognitive processes of human beings indicate that people learn not only from concrete examples (as DNNs do) but also from different forms of general knowledge and rich experiences [30, 31]. Logic rules provide a flexible declarative language for communicating high-level cognition and expressing structured knowledge. It is therefore desirable to integrate logic rules into DNNs, to transfer human intention and domain knowledge to neural models, and regulate the learning process. In this section, we present a framework capable of enhancing general types of neural networks, such as convolutional networks (CNNs) and recurrent networks (RNNs), on various tasks, with logic rule knowledge. Combining symbolic representations with neural methods has been considered in different contexts. Neural-symbolic systems [32] construct a network from a given rule set to execute reasoning. To exploit a priori knowledge in general neural …

Fusing Models with Complementary Expertise

Authors

Hongyi Wang,Felipe Maia Polo,Yuekai Sun,Souvik Kundu,Eric Xing,Mikhail Yurochkin

Journal

arXiv preprint arXiv:2310.01542

Published Date

2023/10/2

Training AI models that generalize across tasks and domains has long been among the open problems driving AI research. The emergence of Foundation Models made it easier to obtain expert models for a given task, but the heterogeneity of data that may be encountered at test time often means that any single expert is insufficient. We consider the Fusion of Experts (FoE) problem of fusing outputs of expert models with complementary knowledge of the data distribution and formulate it as an instance of supervised learning. Our method is applicable to both discriminative and generative tasks and leads to significant performance improvements in image and text classification, text summarization, multiple-choice QA, and automatic evaluation of generated text. We also extend our method to the "frugal" setting where it is desired to reduce the number of expert model evaluations at test time.

Lq-lora: Low-rank plus quantized matrix decomposition for efficient language model finetuning

Authors

Han Guo,Philip Greengard,Eric P Xing,Yoon Kim

Journal

arXiv preprint arXiv:2311.12023

Published Date

2023/11/20

We propose a simple approach for memory-efficient adaptation of pretrained language models. Our approach uses an iterative algorithm to decompose each pretrained matrix into a high-precision low-rank component and a memory-efficient quantized component. During finetuning, the quantized component remains fixed and only the low-rank component is updated. We present an integer linear programming formulation of the quantization component which enables dynamic configuration of quantization parameters (e.g., bit-width, block size) for each matrix given an overall target memory budget. We further explore a data-aware version of the algorithm which uses an approximation of the Fisher information matrix to weight the reconstruction objective during matrix decomposition. Experiments on adapting RoBERTa and LLaMA-2 (7B and 70B) demonstrate that our low-rank plus quantized matrix decomposition approach (LQ-LoRA) outperforms strong QLoRA and GPTQ-LoRA baselines and moreover enables more aggressive quantization. For example, on the OpenAssistant benchmark LQ-LoRA is able to learn a 2.5-bit LLaMA-2 model that is competitive with a model finetuned with 4-bit QLoRA. When finetuned on a language modeling calibration dataset, LQ-LoRA can also be used for model compression; in this setting our 2.75-bit LLaMA-2-70B model (which has 2.85 bits on average when including the low-rank components and requires 27GB of GPU memory) is competitive with the original model in full precision.
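
The iterative decomposition can be sketched numerically: alternate between quantizing the residual W - L and refitting a rank-r component L to W - Q via truncated SVD. The uniform quantizer below is a hedged stand-in for the paper's quantization scheme, and the names are illustrative.

```python
# Numerical sketch of a low-rank-plus-quantized decomposition loop.
import torch

def quantize(x, bits=2):
    levels = 2 ** bits - 1
    scale = (x.max() - x.min()) / levels + 1e-12
    return ((x - x.min()) / scale).round() * scale + x.min()

def lq_decompose(w, rank=16, iters=10, bits=2):
    L = torch.zeros_like(w)
    for _ in range(iters):
        Q = quantize(w - L, bits)                      # memory-efficient part
        u, s, vh = torch.linalg.svd(w - Q, full_matrices=False)
        L = (u[:, :rank] * s[:rank]) @ vh[:rank]       # trainable low-rank part
    return Q, L

w = torch.randn(256, 256)
Q, L = lq_decompose(w)
print("reconstruction error:", (torch.norm(w - Q - L) / torch.norm(w)).item())
```

During finetuning, only L (stored as its two factors) would be updated while Q stays fixed, which is what makes the scheme memory-efficient.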

RealChat-1M: A Large-Scale Real-World LLM Conversation Dataset

Authors

Lianmin Zheng,Wei-Lin Chiang,Ying Sheng,Tianle Li,Siyuan Zhuang,Zhanghao Wu,Yonghao Zhuang,Zhuohan Li,Zi Lin,Eric Xing,Joseph E Gonzalez,Ion Stoica,Hao Zhang

Published Date

2023/10/13

Studying how people interact with large language models (LLMs) in real-world scenarios is increasingly important due to their widespread use in various applications. In this paper, we introduce RealChat-1M, a large-scale dataset containing one million real-world conversations with 25 state-of-the-art LLMs. This dataset is collected from 210K unique IP addresses in the wild on our chat demo website. We offer an overview of the dataset's content, including its curation process, basic statistics, and topic distribution, highlighting its diversity, originality, and scale. We demonstrate its versatility through four use cases: developing content moderation models that perform similarly to GPT-4, building a safety benchmark, training instruction-following models that perform similarly to Vicuna, and creating challenging benchmark questions. We believe that this dataset will serve as a valuable resource for understanding and advancing LLM capabilities. The dataset will be publicly available.

Sliced recursive transformer

Authors

Zhiqiang Shen,Zechun Liu,Eric Xing

Journal

arXiv preprint arXiv:2111.05297 (ECCV 2022)

Published Date

2021/11/9

We present a neat yet effective recursive operation on vision transformers that can improve parameter utilization without involving additional parameters. This is achieved by sharing weights across the depth of transformer networks. The proposed method can obtain a substantial gain (2%) simply using a naïve recursive operation, requires no special or sophisticated knowledge for designing principles of networks, and introduces minimal computational overhead to the training procedure. To reduce the additional computation caused by the recursive operation while maintaining superior accuracy, we propose an approximating method through multiple sliced group self-attentions across recursive layers, which can reduce the computational cost by 10–30% without sacrificing performance. We call our model Sliced Recursive Transformer (SReT), a novel and parameter-efficient vision transformer design that is compatible …
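
The recursive operation itself is a few lines: apply the same transformer block repeatedly across depth, so the parameter count stays fixed while the effective depth grows. The sketch below is illustrative and omits SReT's sliced group self-attention approximation.

```python
# Weight sharing across depth via a recursive block (illustrative sketch).
import torch

block = torch.nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)

def recursive_forward(x, recursions=3):
    for _ in range(recursions):      # same weights reused at each "depth"
        x = block(x)
    return x

x = torch.randn(2, 10, 64)           # (batch, tokens, dim)
print(recursive_forward(x).shape)    # torch.Size([2, 10, 64])
```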

Amp: Automatically finding model parallel strategies with heterogeneity awareness

Authors

Dacheng Li,Hongyi Wang,Eric Xing,Hao Zhang

Journal

NeurIPS 2022

Published Date

2022/10/13

Scaling up model sizes can lead to fundamentally new capabilities in many machine learning (ML) tasks. However, training big models requires strong distributed system expertise to carefully design model-parallel execution strategies that suit the model architectures and cluster setups. In this paper, we develop AMP, a framework that automatically derives such strategies. AMP identifies a valid space of model parallelism strategies and efficiently searches the space for high-performing strategies, by leveraging a cost model designed to capture the heterogeneity of the model and cluster specifications. Unlike existing methods, AMP is specifically tailored to support complex models composed of uneven layers and cluster setups with more heterogeneous accelerators and bandwidth. We evaluate AMP on popular models and cluster setups from public clouds and show that AMP returns parallel strategies that match the expert-tuned strategies on typical cluster setups. On heterogeneous clusters or models with heterogeneous architectures, AMP finds strategies with 1.54x and 1.77x higher throughput than state-of-the-art model-parallel systems, respectively.

Beware the Black-Box of Medical Image Generation: an Uncertainty Analysis by the Learned Feature Space

Authors

Yunni Qu,David Yan,Eric Xing,Fengbo Zheng,Jie Zhang,Liangliang Liu,Gongbo Liang

Published Date

2022/7/11

Deep neural networks (DNNs) are the primary driving force for the current development of medical imaging analysis tools and often provide exciting performance on various tasks. However, such results are usually reported as the overall performance of DNNs, such as the peak signal-to-noise ratio (PSNR) or mean square error (MSE) for image generation tasks. As black boxes, DNNs usually produce a relatively stable performance on the same task across multiple training trials, while the learned feature spaces could be significantly different. We believe additional insightful analysis, such as uncertainty analysis of the learned feature space, is equally important, if not more so. Through this work, we evaluate the learned feature space of multiple U-Net architectures for image generation tasks using computational analysis and clustering analysis methods. We demonstrate that the learned feature spaces are easily …

Federated partially supervised learning with limited decentralized medical images

Authors

Nanqing Dong,Michael Kampffmeyer,Irina Voiculescu,Eric Xing

Journal

IEEE Transactions on Medical Imaging

Published Date

2022/12

Data governance has played an instrumental role in securing the privacy-critical infrastructure in the medical domain and has led to an increased need for federated learning (FL). While decentralization can limit the effectiveness of standard supervised learning, the impact of decentralization on partially supervised learning remains unclear. Besides, due to data scarcity, each client may have access to only limited partially labeled data. As a remedy, this work formulates and discusses a new learning problem, federated partially supervised learning (FPSL), for limited decentralized medical images with partial labels. We study the impact of decentralized partially labeled data on deep learning-based models via an exemplar of FPSL, namely federated partially supervised multi-label classification. By dissecting FedAVG, a seminal FL framework, we formulate and analyze two major challenges of FPSL and …

Un-mix: Rethinking image mixtures for unsupervised visual representation learning

Authors

Zhiqiang Shen,Zechun Liu,Zhuang Liu,Marios Savvides,Trevor Darrell,Eric Xing

Journal

Proceedings of the AAAI Conference on Artificial Intelligence

Published Date

2022/6/28

The recently advanced unsupervised learning approaches use a siamese-like framework to compare two "views" from the same image for learning representations. Making the two views distinctive is core to guaranteeing that unsupervised methods can learn meaningful information. However, such frameworks are sometimes fragile to overfitting if the augmentations used for generating the two views are not strong enough, causing over-confidence on the training data. This drawback hinders the model from learning subtle variance and fine-grained information. To address this, in this work we aim to involve the soft distance concept on label space in the contrastive-based unsupervised learning task and let the model be aware of the soft degree of similarity between positive or negative pairs through mixing the input data space, to further work collaboratively for the input and loss spaces. Despite its conceptual simplicity, we show empirically that with the solution--Unsupervised image mixtures (Un-Mix), we can learn subtler, more robust and generalized representations from the transformed input and corresponding new label space. Extensive experiments are conducted on CIFAR-10, CIFAR-100, STL-10, Tiny ImageNet and standard ImageNet-1K with popular unsupervised methods SimCLR, BYOL, MoCo V1&V2, SwAV, etc. Our proposed image mixture and label assignment strategy can obtain consistent improvement of 1-3% following exactly the same hyperparameters and training procedures of the base methods. Code is publicly available at https://github.com/szq0214/Un-Mix.
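
A minimal sketch of the image-mixture step: mix a batch with a permuted copy of itself and keep the mixing coefficient as a soft similarity target for the contrastive objective. Loss-space details are omitted here, and the names are illustrative rather than the paper's code.

```python
# Input-space mixture with a soft similarity label (illustrative sketch).
import torch

def unmix_batch(batch, alpha=1.0):
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(batch.size(0))
    mixed = lam * batch + (1 - lam) * batch[perm]
    return mixed, perm, lam          # lam defines the soft similarity target

images = torch.randn(8, 3, 32, 32)
mixed, perm, lam = unmix_batch(images)
print(mixed.shape, lam)
```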

Structure correcting adversarial network for chest x-rays organ segmentation

Published Date

2020/6/30

Organ segmentation in chest X-rays using convolutional neural networks is disclosed. One embodiment provides a method to train a convolutional segmentation network with chest X-ray images to generate pixel-level predictions of target classes. Another embodiment also trains a critic network with an input mask, wherein the input mask is one of a segmentation network mask and a ground-truth annotation, and outputs a probability that the input mask is the ground-truth annotation rather than the prediction by the segmentation network; this probability is provided back to the segmentation network to guide it toward generating masks more consistent with learned higher-order structures.

Oracle-oriented Robustness: Robust Image Model Evaluation with Pretrained Models as Surrogate Oracle

Authors

Peiyan Zhang,Sunghun Kim,Eric Xing,Haohan Wang

Published Date

2022/9/29

Machine learning has demonstrated remarkable performance over finite datasets, yet whether the scores over fixed benchmarks can sufficiently indicate a model's performance in the real world is still under discussion. In reality, an ideal robust model will probably behave similarly to the oracle (e.g., the human users), thus a good evaluation protocol is probably to evaluate the models' behaviors in comparison to the oracle. In this paper, we introduce a new robustness measurement that directly measures the image classification model's performance compared with a surrogate oracle. Besides, we design a simple method that can accomplish the evaluation beyond the scope of the benchmarks. Our method extends the image datasets with new samples that are sufficiently perturbed to be distinct from the ones in the original sets, but are still bounded within the same causal structure the original test image represents, constrained by a surrogate oracle model pretrained with a large number of samples. As a result, our new method offers a new way to evaluate the models' robustness performances, free of the limitations of fixed benchmarks or constrained perturbations, although scoped by the power of the oracle. In addition to the evaluation results, we also leverage our generated data to understand the behaviors of the model and our new evaluation strategies.

Toward learning robust and invariant representations with alignment regularization and data augmentation

Authors

Haohan Wang,Zeyi Huang,Xindi Wu,Eric Xing

Published Date

2022/8/14

Data augmentation has been proven to be an effective technique for developing machine learning models that are robust to known classes of distributional shifts (e.g., rotations of images), and alignment regularization is a technique often used together with data augmentation to further help the model learn representations invariant to the shifts used to augment the data. In this paper, motivated by a proliferation of options for alignment regularization, we seek to evaluate the performance of several popular design choices along the dimensions of robustness and invariance, for which we introduce a new test procedure. Our synthetic experiment results speak to the benefits of squared ℓ2 norm regularization. Further, we also formally analyze the behavior of alignment regularization to complement our empirical study under assumptions we consider realistic. Finally, we test this simple technique we identify (worst-case …
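
The squared ℓ2 regularizer the study favors is straightforward to write down: alongside the task loss, penalize the distance between the model's representations of an example and its augmented copy. The model and augmentation below are hypothetical placeholders, not the paper's setup.

```python
# Task loss plus squared l2 alignment regularization (illustrative sketch).
import torch
import torch.nn.functional as F

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))

def loss_fn(x, y, lam=0.1):
    x_aug = x + 0.05 * torch.randn_like(x)         # stand-in augmentation
    logits, logits_aug = model(x), model(x_aug)
    task = F.cross_entropy(logits, y)
    align = (logits - logits_aug).pow(2).sum(dim=-1).mean()  # squared l2
    return task + lam * align

x, y = torch.randn(16, 3, 32, 32), torch.randint(0, 10, (16,))
print(loss_fn(x, y).item())
```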

Stochastic neural networks with infinite width are deterministic

Authors

Liu Ziyin,Hanlin Zhang,Xiangming Meng,Yuting Lu,Eric Xing,Masahito Ueda

Journal

arXiv preprint arXiv:2201.12724

Published Date

2022/1/30

This work theoretically studies stochastic neural networks, a main type of neural network in use. We prove that as the width of an optimized stochastic neural network tends to infinity, its predictive variance on the training set decreases to zero. Our theory justifies the common intuition that adding stochasticity to the model can help regularize the model by introducing an averaging effect. Two common examples that our theory can be relevant to are neural networks with dropout and Bayesian latent variable models in a special limit. Our result thus helps better understand how stochasticity affects the learning of neural networks and potentially design better architectures for practical problems.

MixMask: Revisiting Masking Strategy for Siamese ConvNets

Authors

Kirill Vishniakov,Eric Xing,Zhiqiang Shen

Journal

arXiv preprint arXiv:2210.11456

Published Date

2022/10/20

Recent advances in self-supervised learning have integrated Masked Image Modeling (MIM) and Siamese Networks into a unified framework that leverages the benefits of both techniques. However, several issues remain unaddressed when applying conventional erase-based masking with Siamese ConvNets. These include (I) the inability to drop uninformative masked regions in ConvNets as they process data continuously, resulting in low training efficiency compared to ViT models; and (II) the mismatch between erase-based masking and the contrastive-based objective in Siamese ConvNets, which differs from the MIM approach. In this paper, we propose a filling-based masking strategy called MixMask to prevent information incompleteness caused by the randomly erased regions in an image in the vanilla masking method. Furthermore, we introduce a flexible loss function design that considers the semantic distance change between two different mixed views to adapt the integrated architecture and prevent mismatches between the transformed input and objective in Masked Siamese ConvNets (MSCN). We conducted extensive experiments on various datasets, including CIFAR-100, Tiny-ImageNet, and ImageNet-1K. The results demonstrate that our proposed framework achieves superior accuracy on linear probing, semi-supervised, and supervised finetuning, outperforming the state-of-the-art MSCN by a significant margin. Additionally, we demonstrate the superiority of our approach in object detection and segmentation tasks. Our source code is available at https://github.com/LightnessOfBeing/MixMask.

Kernel Mixed Model for Transcriptome Association Study

Authors

Haohan Wang,Oscar Lopez,Eric P Xing,Wei Wu

Journal

Journal of Computational Biology

Published Date

2022/12/1

We introduce the Python software package Kernel Mixed Model (KMM), which allows users to incorporate network structure into transcriptome-wide association studies (TWAS). Our software is based on the KMM association algorithm, which enables the incorporation of network structure as the kernels of a linear mixed model for TWAS. The implementation of the algorithm aims to offer users simple access through a one-line command. Furthermore, to improve computing efficiency when the interaction network is sparse, we also provide the flexibility of computing with the sparse counterparts of the matrices offered in Python, which reduces both the computation operations and the memory required.

Betty: An automatic differentiation library for multilevel optimization

Authors

Sang Keun Choe,Willie Neiswanger,Pengtao Xie,Eric Xing

Journal

arXiv preprint arXiv:2207.02849

Published Date

2022/7/5

Gradient-based multilevel optimization (MLO) has gained attention as a framework for studying numerous problems, ranging from hyperparameter optimization and meta-learning to neural architecture search and reinforcement learning. However, gradients in MLO, which are obtained by composing best-response Jacobians via the chain rule, are notoriously difficult to implement and memory/compute intensive. We take an initial step towards closing this gap by introducing Betty, a software library for large-scale MLO. At its core, we devise a novel dataflow graph for MLO, which allows us to (1) develop efficient automatic differentiation for MLO that reduces the computational complexity from O(d^3) to O(d^2), (2) incorporate systems support such as mixed-precision and data-parallel training for scalability, and (3) facilitate implementation of MLO programs of arbitrary complexity while allowing a modular interface for diverse algorithmic and systems design choices. We empirically demonstrate that Betty can be used to implement an array of MLO programs, while also observing up to 11% increase in test accuracy, 14% decrease in GPU memory usage, and 20% decrease in training wall time over existing implementations on multiple benchmarks. We also showcase that Betty enables scaling MLO to models with hundreds of millions of parameters. We open-source the code at https://github.com/leopard-ai/betty.

The impact of symbolic representations on in-context learning for few-shot reasoning

Authors

Hanlin Zhang,Yi-Fan Zhang,Li Erran Li,Eric Xing

Journal

arXiv preprint arXiv:2212.08686

Published Date

2022/12/16

Pre-trained language models (LMs) have shown remarkable reasoning performance using explanations (or ``chain-of-thought'' (CoT)) for in-context learning. On the other hand, these reasoning tasks are usually presumed to be more approachable for symbolic programming. To make progress towards understanding in-context learning, we curate synthetic datasets containing equivalent (natural, symbolic) data pairs, where symbolic examples contain first-order logic rules and predicates from knowledge bases (KBs). Then we revisit neuro-symbolic approaches and use Language Models as Logic Programmer (LMLP) that learns from demonstrations containing logic rules and corresponding examples to iteratively reason over KBs, recovering Prolog's backward chaining algorithm. Comprehensive experiments are included to systematically compare LMLP with CoT in deductive reasoning settings, showing that LMLP enjoys more than 25% higher accuracy than CoT on length generalization benchmarks even with fewer parameters.

Alpa: Automating Inter-and Intra-Operator Parallelism for Distributed Deep Learning

Authors

Lianmin Zheng,Zhuohan Li,Hao Zhang,Yonghao Zhuang,Zhifeng Chen,Yanping Huang,Yida Wang,Yuanzhong Xu,Danyang Zhuo,Eric P Xing,Joseph E Gonzalez,Ion Stoica

Journal

arXiv preprint arXiv:2201.12023

Published Date

2022/1/28

Alpa automates model-parallel training of large deep learning (DL) models by generating execution plans that unify data, operator, and pipeline parallelism. Existing model-parallel training systems either require users to manually create a parallelization plan or automatically generate one from a limited space of model parallelism configurations. They do not suffice to scale out complex DL models on distributed compute devices. Alpa distributes the training of large DL models by viewing parallelisms as two hierarchical levels: inter-operator and intra-operator parallelisms. Based on it, Alpa constructs a new hierarchical space for massive model-parallel execution plans. Alpa designs a number of compilation passes to automatically derive efficient parallel execution plans at each parallelism level. Alpa implements an efficient runtime to orchestrate the two-level parallel execution on distributed compute devices. Our evaluation shows Alpa generates parallelization plans that match or outperform hand-tuned model-parallel training systems even on models they are designed for. Unlike specialized systems, Alpa also generalizes to models with heterogeneous architectures and models without manually-designed plans. Alpa's source code is publicly available at https://github.com/alpa-projects/alpa.

Negational symmetry of quantum neural networks for binary pattern classification

Authors

Nanqing Dong,Michael Kampffmeyer,Irina Voiculescu,Eric Xing

Journal

Pattern Recognition

Published Date

2022/9

Although quantum neural networks (QNNs) have shown promising results in solving simple machine learning tasks recently, the behavior of QNNs in binary pattern classification is still underexplored. In this work, we find that QNNs have an Achilles’ heel in binary pattern classification. To illustrate this point, we provide a theoretical insight into the properties of QNNs by presenting and analyzing a new form of symmetry embedded in a family of QNNs with full entanglement, which we term negational symmetry. Due to negational symmetry, QNNs can not differentiate between a quantum binary signal and its negational counterpart. We empirically evaluate the negational symmetry of QNNs in binary pattern classification tasks using Google’s quantum computing framework. Both theoretical and experimental results suggest that negational symmetry is a fundamental property of QNNs, which is not shared by classical …

Meta-DETR: Image-level few-shot detection with inter-class correlation exploitation

Authors

Gongjie Zhang,Zhipeng Luo,Kaiwen Cui,Shijian Lu,Eric P Xing

Journal

IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI).

Published Date

2022/8/2

Few-shot object detection has been extensively investigated by incorporating meta-learning into region-based detection frameworks. Despite its success, the said paradigm is still constrained by several factors, such as (i) low-quality region proposals for novel classes and (ii) negligence of the inter-class correlation among different classes. Such limitations hinder the generalization of base-class knowledge for the detection of novel-class objects. In this work, we design Meta-DETR, which (i) is the first image-level few-shot detector, and (ii) introduces a novel inter-class correlational meta-learning strategy to capture and leverage the correlation among different classes for robust and accurate few-shot object detection. Meta-DETR works entirely at image level without any region proposals, which circumvents the constraint of inaccurate proposals in prevalent few-shot detection frameworks. In addition, the introduced …

A Toolkit for Assessments in Introductory Programming Courses

Authors

Eric Xing,Guangming Xing

Published Date

2022/3/1

Traditional paper-based exams and LMS-provided online exams for introductory programming courses are not aligned with learning objectives that emphasize problem-solving and coding skills. In this poster, we present a cloud-based assessment solution for introductory programming courses. First, we discuss the requirements and challenges of conducting frequent assessments. We then outline the functions in our online exam toolkit that allow instructors to administer versatile assessments. Instead of relying on a traditional lockdown browser, the plagiarism and cheating detection in our toolkit allows instructors to administer exams in any modern browser for face-to-face classes.

Rlprompt: Optimizing discrete text prompts with reinforcement learning

Authors

Mingkai Deng,Jianyu Wang,Cheng-Ping Hsieh,Yihan Wang,Han Guo,Tianmin Shu,Meng Song,Eric P Xing,Zhiting Hu

Journal

arXiv preprint arXiv:2205.12548

Published Date

2022/5/25

Prompting has shown impressive success in enabling large pretrained language models (LMs) to perform diverse NLP tasks, especially when only few downstream data are available. Automatically finding the optimal prompt for each task, however, is challenging. Most existing work resorts to tuning soft prompt (e.g., embeddings) which falls short of interpretability, reusability across LMs, and applicability when gradients are not accessible. Discrete prompt, on the other hand, is difficult to optimize, and is often created by "enumeration (e.g., paraphrasing)-then-selection" heuristics that do not explore the prompt space systematically. This paper proposes RLPrompt, an efficient discrete prompt optimization approach with reinforcement learning (RL). RLPrompt formulates a parameter-efficient policy network that generates the desired discrete prompt after training with reward. To overcome the complexity and stochasticity of reward signals by the large LM environment, we incorporate effective reward stabilization that substantially enhances the training efficiency. RLPrompt is flexibly applicable to different types of LMs, such as masked (e.g., BERT) and left-to-right models (e.g., GPTs), for both classification and generation tasks. Experiments on few-shot classification and unsupervised text style transfer show superior performance over a wide range of existing finetuning or prompting methods. Interestingly, the resulting optimized prompts are often ungrammatical gibberish text; and surprisingly, those gibberish prompts are transferrable between different LMs to retain significant performance, indicating LM prompting may not follow human language patterns.

MPCFormer: fast, performant and private Transformer inference with MPC

Authors

Dacheng Li,Rulin Shao,Hongyi Wang,Han Guo,Eric P Xing,Hao Zhang

Journal

arXiv preprint arXiv:2211.01452

Published Date

2022/11/2

Enabling private inference is crucial for many cloud inference services that are based on Transformer models. However, existing private inference solutions can increase the inference latency by more than 60x or significantly compromise the inference quality. In this paper, we design the framework MPCFORMER as a practical solution, using Secure Multi-Party Computation (MPC) and Knowledge Distillation (KD). Through extensive evaluations, we show that MPCFORMER significantly speeds up Transformer inference in MPC settings while achieving similar ML performance to the input model. On the IMDb dataset, it achieves similar performance to BERTBASE, while being 5.3x faster. On the GLUE benchmark, it achieves 97% performance of BERTBASE with a 2.2x speedup. MPCFORMER remains effective with different trained Transformer weights such as ROBERTABASE and larger models including BERTLarge. Code is available at https://github.com/MccRee177/MPCFormer.

Expeditious Saliency-guided Mix-up through Random Gradient Thresholding

Authors

Minh-Long Luu,Zeyi Huang,Eric P Xing,Yong Jae Lee,Haohan Wang

Journal

arXiv preprint arXiv:2212.04875

Published Date

2022/12/9

Mix-up training approaches have proven to be effective in improving the generalization ability of Deep Neural Networks. Over the years, the research community expands mix-up methods into two directions, with extensive efforts to improve saliency-guided procedures but minimal focus on the arbitrary path, leaving the randomization domain unexplored. In this paper, inspired by the superior qualities of each direction over one another, we introduce a novel method that lies at the junction of the two routes. By combining the best elements of randomness and saliency utilization, our method balances speed, simplicity, and accuracy. We name our method R-Mix following the concept of "Random Mix-up". We demonstrate its effectiveness in generalization, weakly supervised object localization, calibration, and robustness to adversarial attacks. Finally, in order to address the question of whether there exists a better decision protocol, we train a Reinforcement Learning agent that decides the mix-up policies based on the classifier's performance, reducing dependency on human-designed objectives and hyperparameter tuning. Extensive experiments further show that the agent is capable of performing at the cutting-edge level, laying the foundation for a fully automatic mix-up. Our code is released at [https://github.com/minhlong94/Random-Mixup].

Technology readiness levels for machine learning systems

Authors

Alexander Lavin,Ciarán M Gilligan-Lee,Alessya Visnjic,Siddha Ganju,Dava Newman,Sujoy Ganguly,Danny Lange,Atílím Güneş Baydin,Amit Sharma,Adam Gibson,Stephan Zheng,Eric P Xing,Chris Mattmann,James Parr,Yarin Gal

Journal

Nature Communications

Published Date

2022/10/20

The development and deployment of machine learning systems can be executed easily with modern tools, but the process is typically rushed and means-to-an-end. Lack of diligence can lead to technical debt, scope creep and misaligned objectives, model misuse and failures, and expensive consequences. Engineering systems, on the other hand, follow well-defined processes and testing standards to streamline development for high-quality, reliable results. The extreme is spacecraft systems, with mission critical measures and robustness throughout the process. Drawing on experience in both spacecraft engineering and machine learning (research through product across domain areas), we’ve developed a proven systems engineering approach for machine learning and artificial intelligence: the Machine Learning Technology Readiness Levels framework defines a principled process to ensure robust, reliable …

BertNet: Harvesting knowledge graphs with arbitrary relations from pretrained language models

Authors

Shibo Hao,Bowen Tan,Kaiwen Tang,Bin Ni,Xiyan Shao,Hengzhe Zhang,Eric P Xing,Zhiting Hu

Journal

arXiv preprint arXiv:2206.14268

Published Date

2022/6/28

It is crucial to automatically construct knowledge graphs (KGs) of diverse new relations to support knowledge discovery and broad applications. Previous KG construction methods, based on either crowdsourcing or text mining, are often limited to a small predefined set of relations due to manual cost or restrictions in text corpus. Recent research proposed to use pretrained language models (LMs) as implicit knowledge bases that accept knowledge queries with prompts. Yet, the implicit knowledge lacks many desirable properties of a full-scale symbolic KG, such as easy access, navigation, editing, and quality assurance. In this paper, we propose a new approach of harvesting massive KGs of arbitrary relations from pretrained LMs. With minimal input of a relation definition (a prompt and a few shot of example entity pairs), the approach efficiently searches in the vast entity pair space to extract diverse accurate knowledge of the desired relation. We develop an effective search-and-rescore mechanism for improved efficiency and accuracy. We deploy the approach to harvest KGs of over 400 new relations from different LMs. Extensive human and automatic evaluations show our approach manages to extract diverse accurate knowledge, including tuples of complex relations (e.g., "A is capable of but not good at B"). The resulting KGs as a symbolic interpretation of the source LMs also reveal new insights into the LMs' knowledge capacities.

Prototypical graph contrastive learning

Authors

Shuai Lin,Chen Liu,Pan Zhou,Zi-Yuan Hu,Shuojia Wang,Ruihui Zhao,Yefeng Zheng,Liang Lin,Eric Xing,Xiaodan Liang

Journal

IEEE transactions on neural networks and learning systems

Published Date

2022/7/27

Graph-level representations are critical in various real-world applications, such as predicting the properties of molecules. However, in practice, precise graph annotations are generally very expensive and time-consuming. To address this issue, graph contrastive learning constructs an instance discrimination task, which pulls together positive pairs (augmentation pairs of the same graph) and pushes away negative pairs (augmentation pairs of different graphs) for unsupervised representation learning. However, since the negatives for a query are uniformly sampled from all graphs, existing methods suffer from a critical sampling bias issue, i.e., the negatives are likely to have the same semantic structure as the query, leading to performance degradation. To mitigate this sampling bias issue, in this article, we propose a prototypical graph contrastive learning (PGCL) approach. Specifically, PGCL models the underlying …
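As a reference point for the instance-discrimination objective the paper starts from, here is a minimal InfoNCE-style contrastive loss in PyTorch; this is the generic baseline, not PGCL's prototype-based variant:

    import torch
    import torch.nn.functional as F

    def infonce_loss(z1, z2, temperature=0.5):
        """z1, z2: [batch, dim] embeddings of two augmentations of the same
        graphs; row i of z1 and z2 form a positive pair, all other rows act
        as negatives (the uniform sampling that causes the bias discussed)."""
        z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
        logits = z1 @ z2.t() / temperature      # pairwise similarities
        labels = torch.arange(z1.size(0))       # positives on the diagonal
        return F.cross_entropy(logits, labels)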

Trade-offs of linear mixed models in genome-wide Association studies

Authors

Haohan Wang,Bryon Aragam,Eric P Xing

Journal

Journal of Computational Biology

Published Date

2022/3/1

Motivated by empirical arguments that are well known from the genome-wide association studies (GWAS) literature, we study the statistical properties of linear mixed models (LMMs) applied to GWAS. First, we study the sensitivity of LMMs to the inclusion of a candidate single nucleotide polymorphism (SNP) in the kinship matrix, which is often done in practice to speed up computations. Our results shed light on the size of the error incurred by including a candidate SNP, providing a justification for this technique as a trade-off of velocity against veracity. Second, we investigate how mixed models can correct for confounders in GWAS, which is widely accepted as an advantage of LMMs over traditional methods. We consider two sources of confounding factors, population stratification and environmental confounding, and study how different methods that are commonly used in practice trade off these two confounding …
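For readers outside the GWAS literature, the LMM under study has the standard form below, where K is the kinship matrix estimated from the SNPs (the inclusion of the candidate SNP in K is the practice the paper analyzes); the notation is the textbook convention, not quoted from the paper:

    \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{u} + \boldsymbol{\epsilon},
    \qquad \mathbf{u} \sim \mathcal{N}(\mathbf{0}, \sigma_g^2 \mathbf{K}),
    \qquad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \sigma_e^2 \mathbf{I})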

Dropout as a regularizer of interaction effects

Authors

Benjamin J Lengerich,Eric Xing,Rich Caruana

Published Date

2022/5/3

We examine Dropout through the perspective of interactions. This view provides a symmetry that explains Dropout: given N variables, there are N choose k possible sets of k variables that can form an interaction (i.e., O(N^k)); conversely, the probability that an interaction of k variables survives Dropout at rate p is (1-p)^k (decaying with k). These rates effectively cancel, and so Dropout regularizes against higher-order interactions. We demonstrate this perspective analytically and empirically. This perspective of Dropout as a regularizer against interaction effects has several practical implications: (1) higher Dropout rates should be used when we need stronger regularization against spurious high-order interactions, (2) caution should be exercised when interpreting Dropout-based explanations and uncertainty measures, and (3) networks trained with Input Dropout are biased estimators. We also compare Dropout to other regularizers and find that it is difficult to obtain the same selective pressure against high-order interactions with these methods.
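A quick numerical illustration of the cancellation argument (a toy check of the counting, not an experiment from the paper): the number of k-way interactions grows combinatorially while their survival probability under Dropout decays geometrically:

    from math import comb

    N, p = 100, 0.5  # number of variables, Dropout rate
    for k in (1, 2, 3, 4):
        n_interactions = comb(N, k)   # O(N^k) candidate k-way interactions
        survival = (1 - p) ** k       # chance all k inputs survive Dropout
        print(k, n_interactions, survival, n_interactions * survival)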

Neural network decision-making criteria consistency analysis via inputs sensitivity

Authors

Eric Xing,Xin Xing,Liangliang Liu,Nathan Jacobs,Yunni Qu,Gongbo Liang

Published Date

2022/8/24

Neural networks (NNs) have demonstrated exciting results on various tasks within the last decade. For example, performance on image classification tasks has improved dramatically. However, performance evaluations are often based on black-box metrics, such as accuracy, while insightful analysis of the black box, such as the prediction-formation mechanism, is often missing. Empirically, an NN usually produces a stable overall performance on the same task across multiple training trials when treated as a black box. However, when the black box is unveiled, the performance is usually volatile. The decision-making criteria learned in different training trials are often significantly different, which is problematic in many ways. We believe achieving consistent criteria between different training trials is equally important to achieving high performance, if not more so. This work, firstly, evaluates the decision …
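One simple way to compare decision criteria across training trials is to correlate input-gradient sensitivity maps from independently trained models; a hedged sketch of such a comparison (generic saliency correlation, not necessarily the paper's exact protocol):

    import torch

    def input_saliency(model, x, target):
        """Magnitude of d(logit_target)/d(input) as a sensitivity map."""
        x = x.clone().requires_grad_(True)
        model(x)[0, target].backward()
        return x.grad.abs().flatten()

    def criteria_consistency(model_a, model_b, x, target):
        """Correlation of two models' sensitivity maps on the same input."""
        s = torch.stack([input_saliency(model_a, x, target),
                         input_saliency(model_b, x, target)])
        return torch.corrcoef(s)[0, 1].item()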

Data-free neural architecture search via recursive label calibration

Authors

Zechun Liu,Zhiqiang Shen,Yun Long,Eric Xing,Kwang-Ting Cheng,Chas Leichner

Journal

arXiv preprint arXiv:2112.02086 (ECCV 2022)

Published Date

2021/12/3

This paper aims to explore the feasibility of neural architecture search (NAS) given only a pre-trained model, without using any original training data. This is an important circumstance for privacy protection, bias avoidance, etc., in real-world scenarios. To achieve this, we start by synthesizing usable data through recovering the knowledge from a pre-trained deep neural network. Then we use the synthesized data and their predicted soft labels to guide NAS. We identify that the quality of the synthesized data substantially affects the NAS results. In particular, we find NAS requires the synthesized images to possess enough semantics and diversity, and a minimal domain gap from natural images. To meet these requirements, we propose recursive label calibration to encode more relative semantics into the images, as well as a regional update strategy to enhance the diversity. Further, we use input and feature-level …
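The data-synthesis step can be understood as optimizing noise images so the frozen pre-trained network assigns them confident labels; a minimal, generic sketch of that recovery step (the paper's recursive label calibration and regional updates are omitted):

    import torch
    import torch.nn.functional as F

    def synthesize(model, target_labels, steps=200, lr=0.1, shape=(3, 224, 224)):
        """Recover images from a frozen pre-trained classifier by gradient
        descent on the inputs toward the desired labels."""
        model.eval()
        for p in model.parameters():
            p.requires_grad_(False)
        x = torch.randn(len(target_labels), *shape, requires_grad=True)
        opt = torch.optim.Adam([x], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            loss = F.cross_entropy(model(x), target_labels)
            loss.backward()
            opt.step()
        return x.detach()  # synthesized images usable as NAS proxy data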

Rare gems: Finding lottery tickets at initialization

Authors

Kartik Sreenivasan,Jy-yong Sohn,Liu Yang,Matthew Grinde,Alliot Nagle,Hongyi Wang,Kangwook Lee,Dimitris Papailiopoulos

Journal

Advances in neural information processing systems

Published Date

2022/11

Large neural networks can be pruned to a small fraction of their original size, with little loss in accuracy, by following a time-consuming "train, prune, re-train" approach. Frankle & Carbin conjecture that we can avoid this by training lottery tickets, i.e., special sparse subnetworks found at initialization, that can be trained to high accuracy. However, a subsequent line of work presents concrete evidence that current algorithms for finding trainable networks at initialization fail simple baseline comparisons, e.g., against training random sparse subnetworks. Finding lottery tickets that train to better accuracy compared to simple baselines remains an open problem. In this work, we resolve this open problem by proposing Gem-Miner, which finds lottery tickets at initialization that beat current baselines. Gem-Miner finds lottery tickets trainable to accuracy competitive with or better than Iterative Magnitude Pruning (IMP), and does so faster.
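A common mechanism in this line of work, and a useful mental model here, is to freeze the random initialization and learn a score per weight, keeping the top-scoring fraction as the ticket; a hedged sketch of that generic pattern (edge-popup-style scoring with a straight-through estimator; the details are my assumptions, not Gem-Miner's exact algorithm):

    import torch

    def top_k_mask(scores, sparsity):
        """Binary mask keeping the (1 - sparsity) fraction of highest scores."""
        k = int(scores.numel() * (1 - sparsity))
        threshold = scores.flatten().topk(k).values.min()
        return (scores >= threshold).float()

    class MaskedLinear(torch.nn.Module):
        def __init__(self, in_f, out_f, sparsity=0.9):
            super().__init__()
            w = torch.empty(out_f, in_f)
            torch.nn.init.kaiming_normal_(w)
            self.weight = torch.nn.Parameter(w, requires_grad=False)  # frozen init
            self.scores = torch.nn.Parameter(torch.rand(out_f, in_f)) # learned
            self.sparsity = sparsity

        def forward(self, x):
            mask = top_k_mask(self.scores, self.sparsity)
            # Straight-through: forward uses the hard mask, gradients reach scores.
            mask = mask + self.scores - self.scores.detach()
            return torch.nn.functional.linear(x, self.weight * mask)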

ASDOT: Any-shot data-to-text generation with pretrained language models

Authors

Jiannan Xiang,Zhengzhong Liu,Yucheng Zhou,Eric P Xing,Zhiting Hu

Journal

arXiv preprint arXiv:2210.04325

Published Date

2022/10/9

Data-to-text generation is challenging due to the great variety of the input data in terms of domains (e.g., finance vs. sports) or schemata (e.g., diverse predicates). Recent end-to-end neural methods thus require substantial training examples to learn to disambiguate and describe the data. Yet, real-world data-to-text problems often suffer from various data-scarcity issues: one may have access to only a handful of, or no, training examples, and/or may have to rely on examples in a different domain or schema. To fill this gap, we propose Any-Shot Data-to-Text (ASDOT), a new approach flexibly applicable to diverse settings by making efficient use of any given (or no) examples. ASDOT consists of two steps, data disambiguation and sentence fusion, both of which can be solved with off-the-shelf pretrained language models (LMs), with optional finetuning. In the data disambiguation stage, we employ the prompted GPT-3 model to understand possibly ambiguous triples from the input data and convert each into a short sentence with reduced ambiguity. The sentence fusion stage then uses an LM like T5 to fuse all the resulting sentences into a coherent paragraph as the final description. We evaluate extensively on various datasets in different scenarios, including the zero-/few-/full-shot settings, and generalization to unseen predicates and out-of-domain data. Experimental results show that ASDOT consistently achieves significant improvements over baselines, e.g., a 30.81 BLEU gain on the DART dataset under the zero-shot setting.
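The two-stage pipeline is easy to picture in code; generate below stands in for any prompted LM call (e.g., a GPT-3 or T5 endpoint) and is a hypothetical placeholder, as are the prompt wordings:

    def asdot_style_describe(triples, generate):
        """Two stages: disambiguate each triple into a short sentence,
        then fuse the sentences into one paragraph."""
        # Stage 1: data disambiguation, one prompted call per triple.
        sentences = [
            generate(f"Express the triple ({h}; {r}; {t}) as a short, "
                     f"unambiguous sentence.")
            for h, r, t in triples
        ]
        # Stage 2: sentence fusion into a coherent description.
        return generate("Fuse these sentences into one coherent paragraph:\n"
                        + "\n".join(sentences))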

Sdq: Stochastic differentiable quantization with mixed precision

Authors

Xijie Huang,Zhiqiang Shen,Shichao Li,Zechun Liu,Xianghong Hu,Jeffry Wicaksana,Eric Xing,Kwang-Ting Cheng

Published Date

2022/6/9

In order to deploy deep models in a computationally efficient manner, model quantization approaches have been frequently used. In addition, as new hardware supports various-bit arithmetic operations, recent research on mixed-precision quantization (MPQ) has begun to fully leverage the capacity of representation by searching various bitwidths for different layers and modules in a network. However, previous studies mainly search the MPQ strategy in a costly scheme using reinforcement learning, neural architecture search, etc., or simply utilize partial prior knowledge for the bitwidth distribution, which may be biased and sub-optimal. In this work, we present a novel Stochastic Differentiable Quantization (SDQ) method that can automatically learn the MPQ strategy in a more flexible and globally-optimized space with a smoother gradient approximation. In particular, Differentiable Bitwidth Parameters (DBPs) are employed as the probability factors in stochastic quantization between adjacent bitwidths. After the optimal MPQ strategy is acquired, we further train our network with entropy-aware bin regularization and knowledge distillation. We extensively evaluate our method on different networks, hardware platforms (GPUs and FPGAs), and datasets. SDQ outperforms all other state-of-the-art mixed or single precision quantization methods with fewer bits, and is even better than the original full-precision counterparts across various ResNet and MobileNet families, demonstrating the effectiveness and superiority of our method. Code will be publicly available.
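The core trick, learnable probabilities over candidate bitwidths combined with a differentiable relaxation, can be sketched as follows; this is a generic soft-mixture relaxation with illustrative bitwidths, not the paper's exact stochastic formulation:

    import torch

    def quantize(w, bits):
        """Uniform symmetric fake-quantization with a straight-through gradient."""
        qmax = 2 ** (bits - 1) - 1
        scale = w.abs().max().detach() / qmax
        q = torch.round(w / scale).clamp(-qmax, qmax) * scale
        return w + (q - w).detach()   # forward: quantized, backward: identity

    class MixedPrecisionWeight(torch.nn.Module):
        def __init__(self, weight, bit_choices=(2, 4, 8)):
            super().__init__()
            self.weight = torch.nn.Parameter(weight)
            self.bit_choices = bit_choices
            # Differentiable bitwidth parameters: one logit per candidate.
            self.logits = torch.nn.Parameter(torch.zeros(len(bit_choices)))

        def forward(self):
            probs = torch.softmax(self.logits, dim=0)
            # Soft mixture over quantized copies; gradients reach the logits,
            # so the bitwidth distribution is learned jointly with the weights.
            return sum(p * quantize(self.weight, b)
                       for p, b in zip(probs, self.bit_choices))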

Robustar: Interactive Toolbox Supporting Precise Data Annotation for Robust Vision Learning

Authors

Chonghan Chen,Haohan Wang,Leyang Hu,Yuhao Zhang,Shuguang Lyu,Jingcheng Wu,Xinnuo Li,Linjing Sun,Eric P Xing

Journal

arXiv preprint arXiv:2207.08944

Published Date

2022/7/18

We introduce the initial release of our software Robustar, which aims to improve the robustness of vision classification machine learning models through a data-driven perspective. Building upon the recent understanding that a machine learning model's lack of robustness stems from its tendency to learn spurious features, we aim to solve this problem at its root, from the data perspective, by removing the spurious features from the data before training. In particular, we introduce software that helps users better prepare the data for training image classification models by allowing them to annotate the spurious features at the pixel level of images. To facilitate this process, our software also leverages recent advances to help identify potential images and pixels worthy of attention and to continue the training with newly annotated data. Our software is hosted at the GitHub repository https://github.com/HaohanWang/Robustar.

Efficient peer-to-peer architecture for distributed machine learning

Published Date

2022/2/15

A computer in a distributed peer-to-peer system is disclosed. The distributed system includes a plurality of computers configured to run a distributed machine learning (ML) program represented as an expression of a target loss function with a model parameter matrix. The computer includes: a parser module configured to convert a loss function in the distributed program into an expression graph and then one or more multiplication trees; a parameter replica module in communication with the parser module, the parameter replica module configured to maintain the model parameter matrix of the ML program; a compressor module in communication with the parameter replica module, the compressor module configured to extract sufficient factors from the expression graph for updating the model matrix; and a communication module in communication with the compressor module, the communication module configured …
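The compressor module's "sufficient factors" idea can be illustrated simply: for many losses, the gradient of a weight matrix on one example is a rank-1 outer product, so peers can exchange the two factor vectors instead of the full matrix. A minimal numpy sketch of that principle (illustrative only, not the patented implementation):

    import numpy as np

    def sufficient_factors(x, grad_logits):
        """For a layer computing logits = W @ x, the per-example gradient is
        dL/dW = u v^T with u = dL/dlogits and v = x; send (u, v) instead."""
        return grad_logits, x

    def apply_update(W, u, v, lr=0.01):
        # Reconstruct the rank-1 gradient on the receiving peer.
        W -= lr * np.outer(u, v)
        return W

    # Communication drops from O(rows * cols) to O(rows + cols) per example.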

Gene set prioritization guided by regulatory networks with p-values through kernel mixed model

Authors

Haohan Wang,Oscar L Lopez,Wei Wu,Eric P Xing

Published Date

2022/4/29

Transcriptome association studies have helped prioritize many causal genes for detailed study and have thus furthered the development of therapeutic strategies for multiple diseases. However, prioritizing the causal gene alone does not always offer sufficient guidance for the downstream analysis. Thus, in this paper, we propose to perform the association study from another perspective: we aim to prioritize genes with a tradeoff between the pursuit of causality evidence and the interest of the genes in the pathway. We introduce a new method for transcriptome association studies by incorporating the information of gene regulatory networks. In addition to building the regularization directly into variable selection methods, we also expect the method to report p-values for the associated genes, since such p-values have been empirically proved trustworthy by geneticists. Thus, we introduce a …

System and Methods for Distributed Machine Learning with Multiple Data Sources, Multiple Programming Languages or Frameworks, and Multiple Devices or Infrastructures

Published Date

2022/8/18

Methods and systems are presented for consuming different data sources, and deploying artificial intelligence and machine learning programs on different target devices or infrastructures. Many data types can be transformed into machine learning data shards (MLDS) while many machine learning programs written in various programming languages or frameworks are transformed to common operator representations. Operator representations are transformed into execution graphs (EG) for a chosen target device or infrastructure. The MLDS and EG are input to the targeted devices and infrastructures, which then execute the machine learning programs (now transformed to EGs) on the MLDS to produce trained models or predictions with trained models.

A fast knowledge distillation framework for visual recognition

Authors

Zhiqiang Shen,Eric Xing

Journal

arXiv preprint arXiv:2112.01528 (ECCV 2022)

Published Date

2021/12/2

While Knowledge Distillation (KD) has been recognized as a useful tool in many visual tasks, such as supervised classification and self-supervised representation learning, the main drawback of a vanilla KD framework is its mechanism that consumes the majority of the computational overhead on forwarding through the giant teacher networks, making the entire learning procedure inefficient and costly. The recently proposed solution ReLabel suggests creating a label map for the entire image. During training, it receives the cropped region-level label by RoI aligning on a pre-generated entire label map, which allows for efficient supervision generation without having to pass through the teachers repeatedly. However, as the pre-trained teacher employed in ReLabel is from the conventional multi-crop scheme, there are various mismatches between the global label-map and region-level labels in this technique …

Masked generative adversarial networks are data-efficient generation learners

Authors

Jiaxing Huang,Kaiwen Cui,Dayan Guan,Aoran Xiao,Fangneng Zhan,Shijian Lu,Shengcai Liao,Eric Xing

Published Date

2022/11

This paper shows that masked generative adversarial networks (MaskedGAN) are robust image generation learners with limited training data. The idea of MaskedGAN is simple: it randomly masks out certain image information for effective GAN training with limited data. We develop two masking strategies that work along orthogonal dimensions of training images: a shifted spatial masking that masks the images in the spatial dimensions with random shifts, and a balanced spectral masking that masks certain image spectral bands with self-adaptive probabilities. The two masking strategies complement each other and together encourage more challenging holistic learning from limited training data, ultimately suppressing trivial solutions and failures in GAN training. Albeit simple, extensive experiments show that MaskedGAN achieves superior performance consistently across different network architectures (e.g., CNNs including BigGAN and StyleGAN-v2, and Transformers including TransGAN and GANformer) and datasets (e.g., CIFAR-10, CIFAR-100, ImageNet, 100-shot, AFHQ, FFHQ and Cityscapes).
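The masking strategies are cheap tensor operations; a hedged sketch of the spatial one (a random grid mask with a random shift; the grid size and mask ratio are illustrative choices, not the paper's settings):

    import torch

    def shifted_spatial_mask(images, grid=8, ratio=0.5):
        """Mask out random grid cells of a batch, with a random spatial
        shift so masked regions vary across training iterations."""
        n, c, h, w = images.shape
        keep = (torch.rand(n, 1, grid, grid) > ratio).float()
        # Upsample the coarse grid mask to image resolution.
        mask = torch.nn.functional.interpolate(keep, size=(h, w), mode="nearest")
        # Random shift along both spatial dimensions.
        dx = torch.randint(0, h, (1,)).item()
        dy = torch.randint(0, w, (1,)).item()
        mask = torch.roll(mask, shifts=(dx, dy), dims=(2, 3))
        return images * mask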

Does Dataset Lottery Ticket Hypothesis Exist?

Authors

Zhiqiang Shen,Eric Xing

Published Date

2022/9/29

Tuning hyperparameters and exploring suitable training schemes for self-supervised models is usually expensive and resource-consuming, especially on large-scale datasets like ImageNet-1K. Critically, this means only a few establishments (e.g., Google, Meta, etc.) can afford the heavy experiments required, which seriously hinders broader engagement and development of the area. An ideal situation would be a subset of the full large-scale dataset that correctly reflects the performance distinctions among different training frameworks, hyperparameters, etc. This new training manner would substantially decrease resource requirements and improve the computational efficiency of ablations without compromising accuracy on the full dataset. We formulate this problem as the dataset lottery ticket hypothesis and the target subsets as the winning tickets. In this work, we analyze the problem by finding partial empirical data along the class dimension that has a consistent Empirical Risk Trend with the full observed dataset. We also examine multiple solutions, including (i) a uniform selection scheme that has been widely used in the literature; and (ii) subsets built with prior knowledge, for instance, using the sorted per-class performance of a strong supervised model, or the WordNet tree of hierarchical semantic classes, to generate the target winning tickets. We verify this hypothesis on the self-supervised learning task across a variety of recent mainstream methods, such as MAE, DINO, MoCo-V1/V2, etc., with different backbones like ResNet …
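A winning ticket in this sense can be checked empirically by verifying that methods rank the same on the subset as on the full dataset; a small sketch of such a consistency check (Spearman rank correlation is my illustrative criterion, and the scores are made-up numbers):

    from scipy.stats import spearmanr

    def is_winning_ticket(subset_scores, full_scores, threshold=0.9):
        """subset_scores / full_scores: accuracy of each candidate method
        (e.g., MAE, DINO, MoCo) on the subset vs. the full dataset. The
        subset 'wins' if it preserves the methods' relative ranking."""
        rho, _ = spearmanr(subset_scores, full_scores)
        return rho >= threshold

    # Example: three methods evaluated on a class subset and the full dataset.
    print(is_winning_ticket([71.2, 74.5, 69.8], [76.0, 79.1, 74.3]))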

Learning from mistakes–a framework for neural architecture search

Authors

Bhanu Garg,Li Zhang,Pradyumna Sridhara,Ramtin Hosseini,Eric Xing,Pengtao Xie

Journal

Proceedings of the AAAI Conference on Artificial Intelligence

Published Date

2022/6/28

Learning from one's mistakes is an effective human learning technique, in which learners focus more on the topics where mistakes were made so as to deepen their understanding. In this paper, we investigate whether this human learning strategy can be applied in machine learning. We propose a novel machine learning method called Learning From Mistakes (LFM), wherein the learner improves its ability to learn by focusing more on the mistakes during revision. We formulate LFM as a three-stage optimization problem: 1) the learner learns; 2) the learner re-learns, focusing on the mistakes; and 3) the learner validates its learning. We develop an efficient algorithm to solve the LFM problem. We apply the LFM framework to neural architecture search on CIFAR-10, CIFAR-100, and ImageNet. Experimental results strongly demonstrate the effectiveness of our model.
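The three-stage structure can be pictured as nested training passes; a schematic sketch only (the paper formulates this as a multi-stage optimization problem and solves it with an efficient algorithm, which this plain loop does not reproduce):

    def learn_from_mistakes(model, train_set, val_set, train_fn, eval_fn):
        """Schematic LFM loop: learn, re-learn on mistakes, validate."""
        # Stage 1: the learner learns on the full training set.
        train_fn(model, train_set)
        # Stage 2: the learner re-learns, focusing on examples it got wrong.
        mistakes = [ex for ex in train_set if not eval_fn(model, ex)]
        train_fn(model, mistakes)
        # Stage 3: the learner validates its learning on held-out data.
        return sum(eval_fn(model, ex) for ex in val_set) / len(val_set)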

MRCLens: an MRC Dataset Bias Detection Toolkit

Authors

Yifan Zhong,Haohan Wang,Eric P Xing

Journal

arXiv preprint arXiv:2207.08943

Published Date

2022/7/18

Many recent neural models have shown remarkable empirical results in Machine Reading Comprehension, but evidence suggests that the models sometimes take advantage of dataset biases to predict, and fail to generalize on out-of-sample data. While many other approaches have been proposed to address this issue from the computational perspective, such as new architectures or training procedures, we believe a method that allows researchers to discover biases and adjust the data or the models at an earlier stage will be beneficial. Thus, we introduce MRCLens, a toolkit that detects whether biases exist before users train the full model. For the convenience of introducing the toolkit, we also provide a categorization of common biases in MRC.

System for automated data engineering for large scale machine learning

Published Date

2022/4/12

Accordingly, a data engineering system for machine learning at scale is disclosed. In one embodiment, the data engineering system includes an ingest processing module having a schema update submodule and a feature statistics update submodule, wherein the schema update submodule is configured to discover new features and add them to a schema, and wherein the feature statistics update submodule collects statistics for each feature to be used in an online transformation, a record store to store data from a data source, and a transformation module, to receive a low dimensional data instance from the record store and to receive the schema and feature statistics from the ingest processing module, and to transform the low dimensional data instance into a high dimensional representation. One embodiment provides a method for data engineering for machine learning at scale, the method including calling a …

Toward learning human-aligned cross-domain robust models by countering misaligned features

Authors

Haohan Wang,Zeyi Huang,Hanlin Zhang,Yong Jae Lee,Eric P Xing

Published Date

2022/8/17

Machine learning has demonstrated remarkable prediction accuracy over i.i.d. data, but the accuracy often drops when tested with data from another distribution. In this paper, we offer another view of this problem, attributing the accuracy drop to the models' reliance on features that are not aligned with how a data annotator considers similarity across the two datasets. We refer to these features as misaligned features. We extend the conventional generalization error bound to a new one for this setup, using knowledge of how the misaligned features are associated with the label. Our analysis yields a set of techniques for this problem, and these techniques are naturally linked to many previous methods in the robust machine learning literature. We also compare the empirical strength of these methods and demonstrate the performance when these techniques are combined; an implementation is available.

Exploring transformer backbones for heterogeneous treatment effect estimation

Authors

Yi-Fan Zhang,Hanlin Zhang,Zachary C Lipton,Li Erran Li,Eric P Xing

Journal

arXiv preprint arXiv:2202.01336

Published Date

2022/2/2

Previous works on Treatment Effect Estimation (TEE) are not in widespread use because they are predominantly theoretical, making strong parametric assumptions that are intractable in practical applications. Recent work uses multilayer perceptrons (MLPs) for modeling causal relationships; however, MLPs lag far behind recent advances in ML methodology, which limits their applicability and generalizability. To extend beyond the single-domain formulation and towards more realistic learning scenarios, we explore model design spaces beyond MLPs, i.e., transformer backbones, which provide flexibility where attention layers govern interactions among treatments and covariates to exploit the structural similarities of potential outcomes for confounding control. Through careful model design, Transformers as Treatment Effect Estimators (TransTEE) is proposed. We show empirically that TransTEE can: (1) serve as a general-purpose treatment effect estimator that significantly outperforms competitive baselines in a variety of challenging TEE problems (e.g., discrete, continuous, structured, or dosage-associated treatments) and is applicable both when covariates are tabular and when they consist of structured data (e.g., texts, graphs); and (2) yield multiple advantages: compatibility with propensity score modeling, parameter efficiency, robustness to continuous treatment value distribution shifts, explainability in covariate adjustment, and real-world utility in auditing pre-trained language models.
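The design idea, treatment and covariates as tokens whose interactions are governed by attention, can be sketched in a few lines; this is a schematic module with illustrative dimensions, not the TransTEE architecture:

    import torch

    class AttentionTEE(torch.nn.Module):
        def __init__(self, dim=64, heads=4):
            super().__init__()
            self.embed_x = torch.nn.Linear(1, dim)   # one token per covariate
            self.embed_t = torch.nn.Linear(1, dim)   # treatment as its own token
            layer = torch.nn.TransformerEncoderLayer(dim, heads, batch_first=True)
            self.encoder = torch.nn.TransformerEncoder(layer, num_layers=2)
            self.head = torch.nn.Linear(dim, 1)

        def forward(self, covariates, treatment):
            # covariates: [batch, n_covariates]; treatment: [batch, 1]
            x_tok = self.embed_x(covariates.unsqueeze(-1))   # [B, n, dim]
            t_tok = self.embed_t(treatment).unsqueeze(1)     # [B, 1, dim]
            h = self.encoder(torch.cat([t_tok, x_tok], dim=1))
            # Predict the potential outcome from the treatment token,
            # which has attended to every covariate token.
            return self.head(h[:, 0])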

Eric Xing FAQs

What is Eric Xing's h-index at Carnegie Mellon University?

Eric Xing's h-index is 114 overall and 87 counting only citations since 2020.

What are Eric Xing's top articles?

Eric Xing's top articles at Carnegie Mellon University include:

Learning to Prompt Segment Anything Models

Judging llm-as-a-judge with mt-bench and chatbot arena

MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT

Temporally Disentangled Representation Learning under Unknown Nonstationarity

Cappy: Outperforming and boosting large multi-task lms with a small scorer

AttentionPert: Accurately Modeling Multiplexed Genetic Perturbations with Multi-scale Effects

Squeeze, recover and relabel: Dataset condensation at imagenet scale from a new perspective

Generating, Reconstructing, and Representing Discrete and Continuous Data: Generalized Diffusion with Learnable Encoding-Decoding

...


What are Eric Xing's research interests?

The research interests of Eric Xing are: Machine Learning, ML Systems, Optimization, Statistics, Network Analysis

What is Eric Xing's total number of citations?

Eric Xing has 57,613 citations in total.

What are the co-authors of Eric Xing?

The co-authors of Eric Xing are Michael I. Jordan, Li Fei-Fei, David Blei, Noah A. Smith, Jun Zhu, Edoardo M Airoldi.

    Co-Authors

    Michael I. Jordan, University of California, Berkeley (H-index: 203)
    Li Fei-Fei, Stanford University (H-index: 144)
    David Blei, Columbia University in the City of New York (H-index: 106)
    Noah A. Smith, University of Washington (H-index: 104)
    Jun Zhu, Tsinghua University (H-index: 75)
    Edoardo M Airoldi, Harvard University (H-index: 51)