Sparsity-aware reconfigurable compute-in-memory (CIM) static random access memory (SRAM)

Published On 2024/3/12

Sparsity-aware reconfiguration compute-in-memory (CIM) static random access memory (SRAM) systems are disclosed. In one aspect, a reconfigurable precision succession approximation register (SAR) analog-to-digital converter (ADC) that has the ability to form (n+ m) bit precision using n-bit and m-bit sub-ADCs is provided. By controlling which sub-ADCs are used based on data sparsity, precision may be maintained as needed while providing a more energy efficient design.

Authors

Kaushik Roy

Kaushik Roy

Purdue University

H-Index

123

Research Interests

Neuromorphic computing

Machine Learning

ML hardware

Neuro-mimetic devices

Energy-efficient computing

University Profile Page

Amogh Agrawal

Amogh Agrawal

Purdue University

H-Index

17

Research Interests

SRAM

Compute-in-Memory

STT-MRAM

University Profile Page

Mustafa Ali

Mustafa Ali

Purdue University

H-Index

11

Research Interests

Deep Learning Hardware

ASIC

VLSI

DL Accelerators

University Profile Page

Utkarsh Saxena

Utkarsh Saxena

Purdue University

H-Index

6

Research Interests

Machine Learning

In-Memory Computing

Non Volatile Memories

University Profile Page

Other Articles from authors

Kaushik Roy

Kaushik Roy

Purdue University

IEEE Robotics and Automation Letters

EV-Planner: Energy-Efficient Robot Navigation via Event-Based Physics-Guided Neuromorphic Planner

Vision-based object tracking is an essential precursor to performing autonomous aerial navigation in order to avoid obstacles. Biologically inspired neuromorphic event cameras are emerging as a powerful alternative to frame-based cameras, due to their ability to asynchronously detect varying intensities (even in poor lighting conditions), high dynamic range, and robustness to motion blur. Spiking neural networks (SNNs) have gained traction for processing events asynchronously in an energy-efficient manner. On the other hand, physics-based artificial intelligence (AI) has gained prominence recently, as they enable embedding system knowledge via physical modeling inside traditional analog neural networks (ANNs). In this letter, we present an event-based physics-guided neuromorphic planner (EV-Planner) to perform obstacle avoidance using neuromorphic event cameras and physics-based AI. We consider …

Kaushik Roy

Kaushik Roy

Purdue University

arXiv preprint arXiv:2402.10353

Prompt-Based Bias Calibration for Better Zero/Few-Shot Learning of Language Models

Prompt learning is susceptible to intrinsic bias present in pre-trained language models (LMs), resulting in sub-optimal performance of prompt-based zero/few-shot learning. In this work, we propose a null-input prompting method to calibrate intrinsic bias encoded in pre-trained LMs. Different from prior efforts that address intrinsic bias primarily for social fairness and often involve excessive computational cost, our objective is to explore enhancing LMs' performance in downstream zero/few-shot learning while emphasizing the efficiency of intrinsic bias calibration. Specifically, we leverage a diverse set of auto-selected null-meaning inputs generated from GPT-4 to prompt pre-trained LMs for intrinsic bias probing. Utilizing the bias-reflected probability distribution, we formulate a distribution disparity loss for bias calibration, where we exclusively update bias parameters ( of total parameters) of LMs towards equal probability distribution. Experimental results show that the calibration promotes an equitable starting point for LMs while preserving language modeling abilities. Across a wide range of datasets, including sentiment analysis and topic classification, our method significantly improves zero/few-shot learning performance of LMs for both in-context learning and prompt-based fine-tuning (on average and , respectively).

Kaushik Roy

Kaushik Roy

Purdue University

arXiv e-prints

AdaGossip: Adaptive Consensus Step-size for Decentralized Deep Learning with Communication Compression

Decentralized learning is crucial in supporting on-device learning over large distributed datasets, eliminating the need for a central server. However, the communication overhead remains a major bottleneck for the practical realization of such decentralized setups. To tackle this issue, several algorithms for decentralized training with compressed communication have been proposed in the literature. Most of these algorithms introduce an additional hyper-parameter referred to as consensus step-size which is tuned based on the compression ratio at the beginning of the training. In this work, we propose AdaGossip, a novel technique that adaptively adjusts the consensus step-size based on the compressed model differences between neighboring agents. We demonstrate the effectiveness of the proposed method through an exhaustive set of experiments on various Computer Vision datasets (CIFAR-10, CIFAR-100 …

Kaushik Roy

Kaushik Roy

Purdue University

arXiv preprint arXiv:2403.08618

Verifix: Post-Training Correction to Improve Label Noise Robustness with Verified Samples

Label corruption, where training samples have incorrect labels, can significantly degrade the performance of machine learning models. This corruption often arises from non-expert labeling or adversarial attacks. Acquiring large, perfectly labeled datasets is costly, and retraining large models from scratch when a clean dataset becomes available is computationally expensive. To address this challenge, we propose Post-Training Correction, a new paradigm that adjusts model parameters after initial training to mitigate label noise, eliminating the need for retraining. We introduce Verifix, a novel Singular Value Decomposition (SVD) based algorithm that leverages a small, verified dataset to correct the model weights using a single update. Verifix uses SVD to estimate a Clean Activation Space and then projects the model's weights onto this space to suppress activations corresponding to corrupted data. We demonstrate Verifix's effectiveness on both synthetic and real-world label noise. Experiments on the CIFAR dataset with 25% synthetic corruption show 7.36% generalization improvements on average. Additionally, we observe generalization improvements of up to 2.63% on naturally corrupted datasets like WebVision1.0 and Clothing1M.

Utkarsh Saxena

Utkarsh Saxena

Purdue University

Scientific Reports

Ferroelectric capacitors and field-effect transistors as in-memory computing elements for machine learning workloads

This study discusses the feasibility of Ferroelectric Capacitors (FeCaps) and Ferroelectric Field-Effect Transistors (FeFETs) as In-Memory Computing (IMC) elements to accelerate machine learning (ML) workloads. We conducted an exploration of device fabrication and proposed system-algorithm co-design to boost performance. A novel FeCap device, incorporating an interfacial layer (IL) and (HZO), ensures a reduction in operating voltage and enhances HZO scaling while being compatible with CMOS circuits. The IL also enriches ferroelectricity and retention properties. When integrated into crossbar arrays, FeCaps and FeFETs demonstrate their effectiveness as IMC components, eliminating sneak paths and enabling selector-less operation, leading to notable improvements in energy efficiency and area utilization. However, it is worth noting that limited capacitance ratios in FeCaps introduced errors in …

Kaushik Roy

Kaushik Roy

Purdue University

arXiv preprint arXiv:2211.10754

HALSIE: Hybrid Approach to Learning Segmentation by Simultaneously Exploiting Image and Event Modalities

Event cameras detect changes in per-pixel intensity to generate asynchronous `event streams'. They offer great potential for accurate semantic map retrieval in real-time autonomous systems owing to their much higher temporal resolution and high dynamic range (HDR) compared to conventional cameras. However, existing implementations for event-based segmentation suffer from sub-optimal performance since these temporally dense events only measure the varying component of a visual signal, limiting their ability to encode dense spatial context compared to frames. To address this issue, we propose a hybrid end-to-end learning framework HALSIE, utilizing three key concepts to reduce inference cost by up to versus prior art while retaining similar performance: First, a simple and efficient cross-domain learning scheme to extract complementary spatio-temporal embeddings from both frames and events. Second, a specially designed dual-encoder scheme with Spiking Neural Network (SNN) and Artificial Neural Network (ANN) branches to minimize latency while retaining cross-domain feature aggregation. Third, a multi-scale cue mixer to model rich representations of the fused embeddings. These qualities of HALSIE allow for a very lightweight architecture achieving state-of-the-art segmentation performance on DDD-17, MVSEC, and DSEC-Semantic datasets with up to higher parameter efficiency and favorable inference cost (17.9mJ per cycle). Our ablation study also brings new insights into effective design choices that can prove beneficial for research across other vision tasks.

2022/11/19

Article Details
Kaushik Roy

Kaushik Roy

Purdue University

Advances in Neural Information Processing Systems

Global update tracking: A decentralized learning algorithm for heterogeneous data

Decentralized learning enables the training of deep learning models over large distributed datasets generated at different locations, without the need for a central server. However, in practical scenarios, the data distribution across these devices can be significantly different, leading to a degradation in model performance. In this paper, we focus on designing a decentralized learning algorithm that is less susceptible to variations in data distribution across devices. We propose Global Update Tracking (GUT), a novel tracking-based method that aims to mitigate the impact of heterogeneous data in decentralized learning without introducing any communication overhead. We demonstrate the effectiveness of the proposed technique through an exhaustive set of experiments on various Computer Vision datasets (CIFAR-10, CIFAR-100, Fashion MNIST, and ImageNette), model architectures, and network topologies. Our experiments show that the proposed method achieves state-of-the-art performance for decentralized learning on heterogeneous data via a 1-6% improvement in test accuracy compared to other existing techniques.

Kaushik Roy

Kaushik Roy

Purdue University

Vehicle battery current sensing system

A method of sensing a current in a conductor includes controlling a digital to analog converter output to cancel residual offset voltage in a magnetic tunnel junction device prior to sensing the current with the magnetic tunnel junction device. The method includes switching input to the magnetic tunnel junction device between a fixed voltage and an output of a digital to analog converter while switching input to a band pass filter between a lower and an upper voltage output of the magnetic tunnel junction device. The output of the digital to analog converter is modified to provide a low-amplitude unsaturated sine-wave at an output of the band pass filter, at which point changes in the output of the band pass filter are associated with the amount of current in a sensed conductor.

Utkarsh Saxena

Utkarsh Saxena

Purdue University

arXiv preprint arXiv:2403.13577

HCiM: ADC-Less Hybrid Analog-Digital Compute in Memory Accelerator for Deep Learning Workloads

Analog Compute-in-Memory (CiM) accelerators are increasingly recognized for their efficiency in accelerating Deep Neural Networks (DNN). However, their dependence on Analog-to-Digital Converters (ADCs) for accumulating partial sums from crossbars leads to substantial power and area overhead. Moreover, the high area overhead of ADCs constrains the throughput due to the limited number of ADCs that can be integrated per crossbar. An approach to mitigate this issue involves the adoption of extreme low-precision quantization (binary or ternary) for partial sums. Training based on such an approach eliminates the need for ADCs. While this strategy effectively reduces ADC costs, it introduces the challenge of managing numerous floating-point scale factors, which are trainable parameters like DNN weights. These scale factors must be multiplied with the binary or ternary outputs at the columns of the crossbar to ensure system accuracy. To that effect, we propose an algorithm-hardware co-design approach, where DNNs are first trained with quantization-aware training. Subsequently, we introduce HCiM, an ADC-Less Hybrid Analog-Digital CiM accelerator. HCiM uses analog CiM crossbars for performing Matrix-Vector Multiplication operations coupled with a digital CiM array dedicated to processing scale factors. This digital CiM array can execute both addition and subtraction operations within the memory array, thus enhancing processing speed. Additionally, it exploits the inherent sparsity in ternary quantization to achieve further energy savings. Compared to an analog CiM baseline architecture using 7 and 4-bit ADC, HCiM achieves …

Kaushik Roy

Kaushik Roy

Purdue University

Cross-feature Contrastive Loss for Decentralized Deep Learning on Heterogeneous Data

The current state-of-the-art decentralized learning algorithms mostly assume the data distribution to be Independent and Identically Distributed (IID). However, in practical scenarios, the distributed datasets can have significantly heterogeneous data distributions across the agents. In this work, we present a novel approach for decentralized learning on heterogeneous data, where data-free knowledge distillation through contrastive loss on cross-features is utilized to improve performance. Cross-features for a pair of neighboring agents are the features (ie, last hidden layer activations) obtained from the data of an agent with respect to the model parameters of the other agent. We demonstrate the effectiveness of the proposed technique through an exhaustive set of experiments on various Computer Vision datasets (CIFAR-10, CIFAR-100, Fashion MNIST, ImageNette, and ImageNet), model architectures, and network topologies. Our experiments show that the proposed method achieves superior performance (0.2-4% improvement in test accuracy) compared to other existing techniques for decentralized learning on heterogeneous data.

Kaushik Roy

Kaushik Roy

Purdue University

arXiv preprint arXiv:2401.17515

Towards Image Semantics and Syntax Sequence Learning

Convolutional neural networks and vision transformers have achieved outstanding performance in machine perception, particularly for image classification. Although these image classifiers excel at predicting image-level class labels, they may not discriminate missing or shifted parts within an object. As a result, they may fail to detect corrupted images that involve missing or disarrayed semantic information in the object composition. On the contrary, human perception easily distinguishes such corruptions. To mitigate this gap, we introduce the concept of "image grammar", consisting of "image semantics" and "image syntax", to denote the semantics of parts or patches of an image and the order in which these parts are arranged to create a meaningful object. To learn the image grammar relative to a class of visual objects/scenes, we propose a weakly supervised two-stage approach. In the first stage, we use a deep clustering framework that relies on iterative clustering and feature refinement to produce part-semantic segmentation. In the second stage, we incorporate a recurrent bi-LSTM module to process a sequence of semantic segmentation patches to capture the image syntax. Our framework is trained to reason over patch semantics and detect faulty syntax. We benchmark the performance of several grammar learning models in detecting patch corruptions. Finally, we verify the capabilities of our framework in Celeb and SUNRGBD datasets and demonstrate that it can achieve a grammar validation accuracy of 70 to 90% in a wide variety of semantic and syntactical corruption scenarios.

Kaushik Roy

Kaushik Roy

Purdue University

arXiv preprint arXiv:2403.03292

Averaging Rate Scheduler for Decentralized Learning on Heterogeneous Data

State-of-the-art decentralized learning algorithms typically require the data distribution to be Independent and Identically Distributed (IID). However, in practical scenarios, the data distribution across the agents can have significant heterogeneity. In this work, we propose averaging rate scheduling as a simple yet effective way to reduce the impact of heterogeneity in decentralized learning. Our experiments illustrate the superiority of the proposed method (~3% improvement in test accuracy) compared to the conventional approach of employing a constant averaging rate.

Kaushik Roy

Kaushik Roy

Purdue University

arXiv preprint arXiv:2404.00185

On Inherent Adversarial Robustness of Active Vision Systems

Current Deep Neural Networks are vulnerable to adversarial examples, which alter their predictions by adding carefully crafted noise. Since human eyes are robust to such inputs, it is possible that the vulnerability stems from the standard way of processing inputs in one shot by processing every pixel with the same importance. In contrast, neuroscience suggests that the human vision system can differentiate salient features by (1) switching between multiple fixation points (saccades) and (2) processing the surrounding with a non-uniform external resolution (foveation). In this work, we advocate that the integration of such active vision mechanisms into current deep learning systems can offer robustness benefits. Specifically, we empirically demonstrate the inherent robustness of two active vision methods - GFNet and FALcon - under a black box threat model. By learning and inferencing based on downsampled glimpses obtained from multiple distinct fixation points within an input, we show that these active methods achieve (2-3) times greater robustness compared to a standard passive convolutional network under state-of-the-art adversarial attacks. More importantly, we provide illustrative and interpretable visualization analysis that demonstrates how performing inference from distinct fixation points makes active vision methods less vulnerable to malicious inputs.

Kaushik Roy

Kaushik Roy

Purdue University

Scientific Reports

Ferroelectric capacitors and field-effect transistors as in-memory computing elements for machine learning workloads

This study discusses the feasibility of Ferroelectric Capacitors (FeCaps) and Ferroelectric Field-Effect Transistors (FeFETs) as In-Memory Computing (IMC) elements to accelerate machine learning (ML) workloads. We conducted an exploration of device fabrication and proposed system-algorithm co-design to boost performance. A novel FeCap device, incorporating an interfacial layer (IL) and (HZO), ensures a reduction in operating voltage and enhances HZO scaling while being compatible with CMOS circuits. The IL also enriches ferroelectricity and retention properties. When integrated into crossbar arrays, FeCaps and FeFETs demonstrate their effectiveness as IMC components, eliminating sneak paths and enabling selector-less operation, leading to notable improvements in energy efficiency and area utilization. However, it is worth noting that limited capacitance ratios in FeCaps introduced errors in …

Kaushik Roy

Kaushik Roy

Purdue University

arXiv preprint arXiv:2401.17497

Towards Visual Syntactical Understanding

Syntax is usually studied in the realm of linguistics and refers to the arrangement of words in a sentence. Similarly, an image can be considered as a visual 'sentence', with the semantic parts of the image acting as 'words'. While visual syntactic understanding occurs naturally to humans, it is interesting to explore whether deep neural networks (DNNs) are equipped with such reasoning. To that end, we alter the syntax of natural images (e.g. swapping the eye and nose of a face), referred to as 'incorrect' images, to investigate the sensitivity of DNNs to such syntactic anomaly. Through our experiments, we discover an intriguing property of DNNs where we observe that state-of-the-art convolutional neural networks, as well as vision transformers, fail to discriminate between syntactically correct and incorrect images when trained on only correct ones. To counter this issue and enable visual syntactic understanding with DNNs, we propose a three-stage framework- (i) the 'words' (or the sub-features) in the image are detected, (ii) the detected words are sequentially masked and reconstructed using an autoencoder, (iii) the original and reconstructed parts are compared at each location to determine syntactic correctness. The reconstruction module is trained with BERT-like masked autoencoding for images, with the motivation to leverage language model inspired training to better capture the syntax. Note, our proposed approach is unsupervised in the sense that the incorrect images are only used during testing and the correct versus incorrect labels are never used for training. We perform experiments on CelebA, and AFHQ datasets and obtain …

Kaushik Roy

Kaushik Roy

Purdue University

Nature Machine Intelligence

A collective AI via lifelong learning and sharing at the edge

One vision of a future artificial intelligence (AI) is where many separate units can learn independently over a lifetime and share their knowledge with each other. The synergy between lifelong learning and sharing has the potential to create a society of AI systems, as each individual unit can contribute to and benefit from the collective knowledge. Essential to this vision are the abilities to learn multiple skills incrementally during a lifetime, to exchange knowledge among units via a common language, to use both local data and communication to learn, and to rely on edge devices to host the necessary decentralized computation and data. The result is a network of agents that can quickly respond to and learn new tasks, that collectively hold more knowledge than a single agent and that can extend current knowledge in more diverse ways than a single agent. Open research questions include when and what knowledge …

Kaushik Roy

Kaushik Roy

Purdue University

IEEE Access

TOFU: Towards Obfuscated Federated Updates by Encoding Weight Updates into Gradients from Proxy Data

Advances in Federated Learning and an abundance of user data have enabled rich collaborative learning between multiple clients, without sharing user data. This is done via a central server that aggregates learning in the form of weight updates. However, this comes at the cost of repeated expensive communication between the clients and the server, and concerns about compromised user privacy. The inversion of gradients into the data that generated them is termed data leakage. Encryption techniques can be used to counter this leakage but at added expense. To address these challenges of communication efficiency and privacy, we propose TOFU, a novel algorithm that generates proxy data that encodes the weight updates for each client in its gradients. Instead of weight updates, this proxy data is now shared. Since input data is far lower in dimensional complexity than weights, this encoding allows us to send …