Winston Hsu

National Taiwan University

H-index: 46

About Winston Hsu

Winston Hsu is a distinguished researcher at National Taiwan University with an h-index of 46 overall and 30 since 2020. He specializes in large-scale image/video retrieval/mining, visual recognition, and machine intelligence.

His recent articles reflect a diverse array of research interests and contributions to the field:

AED: Adaptable Error Detection for Few-shot Imitation Policy

TelTrans: Applying Multi-Type Telecom Data to Transportation Evaluation and Prediction via Multifaceted Graph Modeling

Tracking-Assisted Object Detection with Event Cameras

Tel2Veh: Fusion of Telecom Data and Vehicle Flow to Predict Camera-Free Traffic via a Spatio-Temporal Framework

Fair Robust Active Learning by Joint Inconsistency

Self-Training with High-Dimensional Markers for Cell Instance Segmentation

Coarse-to-fine point cloud registration with SE(3)-equivariant representations

CTCam: Enhancing Transportation Evaluation through Fusion of Cellular Traffic and Camera-Based Vehicle Flows

Winston Hsu Information

University: National Taiwan University

Position: Professor, Dept. of Computer Science and Information Engineering

Citations (all): 10,195

Citations (since 2020): 5,212

Cited by: 6,868

h-index (all): 46

h-index (since 2020): 30

i10-index (all): 113

i10-index (since 2020): 69

University profile page: National Taiwan University

Winston Hsu Skills & Research Interests

large-scale image/video retrieval/mining

visual recognition

machine intelligence

Top articles of Winston Hsu

AED: Adaptable Error Detection for Few-shot Imitation Policy

Authors

Jia-Fong Yeh, Kuo-Han Hung, Pang-Chi Lo, Chi-Ming Chung, Tsung-Han Wu, Hung-Ting Su, Yi-Ting Chen, Winston H Hsu

Journal

arXiv preprint arXiv:2402.03860

Published Date

2024/2/6

We study how to report few-shot imitation (FSI) policies' behavior errors in novel environments, a novel task named adaptable error detection (AED). The potential to cause serious damage to surrounding areas limits the application of FSI policies in real-world scenarios. Thus, a robust system is necessary to notify operators when FSI policies are inconsistent with the intent of demonstrations. We develop a cross-domain benchmark for the challenging AED task, consisting of 329 base and 158 novel environments. This task introduces three challenges, including (1) detecting behavior errors in novel environments, (2) behavior errors occurring without revealing notable changes, and (3) lacking complete temporal information of the rollout due to the necessity of online detection. To address these challenges, we propose Pattern Observer (PrObe) to parse discernible patterns in the policy feature representations of normal or error states, whose effectiveness is verified in the proposed benchmark. Through our comprehensive evaluation, PrObe consistently surpasses strong baselines and demonstrates a robust capability to identify errors arising from a wide range of FSI policies. Moreover, we conduct comprehensive ablations and experiments (error correction, demonstration quality, etc.) to validate the practicality of our proposed task and methodology.
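
For intuition about the online-detection setting, here is a minimal sketch (not the authors' code; the class name, feature dimensions, and architecture are hypothetical) of how a recurrent observer could scan per-step policy features and emit an error probability at each timestep:

    import torch
    import torch.nn as nn

    class PatternObserverSketch(nn.Module):
        """Scans per-step policy feature embeddings with a causal LSTM and
        emits a per-step error probability, so detection can run online
        without the full rollout (hypothetical dimensions)."""
        def __init__(self, feat_dim=256, hidden_dim=128):
            super().__init__()
            self.rnn = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
            self.head = nn.Linear(hidden_dim, 1)

        def forward(self, policy_feats):          # (B, T, feat_dim)
            h, _ = self.rnn(policy_feats)         # step t only sees steps <= t
            return torch.sigmoid(self.head(h))    # (B, T, 1) error probability

    # e.g.: PatternObserverSketch()(torch.randn(2, 50, 256)) -> shape (2, 50, 1)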

TelTrans: Applying Multi-Type Telecom Data to Transportation Evaluation and Prediction via Multifaceted Graph Modeling

Authors

ChungYi Lin, Shen-Lung Tung, Hung-Ting Su, Winston H Hsu

Journal

arXiv preprint arXiv:2401.03138

Published Date

2024/1/6

To address the limitations of traffic prediction from location-bound detectors, we present Geographical Cellular Traffic (GCT) flow, a novel data source that leverages the extensive coverage of cellular traffic to capture mobility patterns. Our extensive analysis validates its potential for transportation. Focusing on vehicle-related GCT flow prediction, we propose a graph neural network that integrates multivariate, temporal, and spatial facets for improved accuracy. Experiments reveal our model's superiority over baselines, especially in long-term predictions. We also highlight the potential for GCT flow integration into transportation systems.
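
To make the "multivariate, temporal, and spatial facets" concrete, the sketch below shows a generic spatio-temporal block of the kind commonly used in such graph neural networks; it is an assumed illustration, not the TelTrans architecture:

    import torch
    import torch.nn as nn

    class STBlockSketch(nn.Module):
        """A graph convolution mixes multivariate GCT signals across
        road-segment nodes, then a 1-D convolution mixes them over time."""
        def __init__(self, in_ch, out_ch):
            super().__init__()
            self.theta = nn.Linear(in_ch, out_ch)              # graph-conv weights
            self.tconv = nn.Conv2d(out_ch, out_ch, (1, 3), padding=(0, 1))

        def forward(self, x, a_hat):
            # x: (B, T, N, C) node features; a_hat: (N, N) normalized adjacency
            x = torch.einsum("nm,btmc->btnc", a_hat, x)        # spatial mixing
            x = torch.relu(self.theta(x))                      # channel mixing
            x = torch.relu(self.tconv(x.permute(0, 3, 2, 1)))  # temporal mixing
            return x.permute(0, 3, 2, 1)                       # back to (B, T, N, C)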

Tracking-Assisted Object Detection with Event Cameras

Authors

Ting-Kang Yen, Igor Morawski, Shusil Dangi, Kai He, Chung-Yi Lin, Jia-Fong Yeh, Hung-Ting Su, Winston Hsu

Journal

arXiv preprint arXiv:2403.18330

Published Date

2024/3/27

Event-based object detection has recently garnered attention in the computer vision community due to the exceptional properties of event cameras, such as high dynamic range and no motion blur. However, feature asynchronism and sparsity cause invisible objects due to no relative motion to the camera, posing a significant challenge in the task. Prior works have studied various memory mechanisms to preserve as many features as possible at the current time, guided by temporal clues. While these implicit-learned memories retain some short-term information, they still struggle to preserve long-term features effectively. In this paper, we consider those invisible objects as pseudo-occluded objects and aim to reveal their features. Firstly, we introduce a visibility attribute of objects and contribute an auto-labeling algorithm to append additional visibility labels to an existing event camera dataset. Secondly, we exploit tracking strategies for pseudo-occluded objects to maintain their permanence and retain their bounding boxes, even when features have not been available for a very long time. These strategies can be treated as an explicit-learned memory guided by the tracking objective to record the displacements of objects across frames. Lastly, we propose a spatio-temporal feature aggregation module to enrich the latent features and a consistency loss to increase the robustness of the overall pipeline. We conduct comprehensive experiments to verify our method's effectiveness where still objects are retained but real occluded objects are discarded. The results demonstrate that (1) the additional visibility labels can assist in supervised training, and (2 …

Tel2Veh: Fusion of Telecom Data and Vehicle Flow to Predict Camera-Free Traffic via a Spatio-Temporal Framework

Authors

ChungYi Lin, Shen-Lung Tung, Hung-Ting Su, Winston H Hsu

Journal

arXiv preprint arXiv:2403.12991

Published Date

2024/3/5

Vehicle flow, a crucial indicator for transportation, is often limited by detector coverage. With the advent of extensive mobile network coverage, we can leverage mobile user activities, or cellular traffic, on roadways as a proxy for vehicle flow. However, as counts of cellular traffic may not directly align with vehicle flow due to data from various user types, we present a new task: predicting vehicle flow in camera-free areas using cellular traffic. To uncover correlations within multi-source data, we deployed cameras on selected roadways to establish the Tel2Veh dataset, consisting of extensive cellular traffic and sparse vehicle flows. Addressing this challenge, we propose a framework that independently extracts features and integrates them with a graph neural network (GNN)-based fusion to discern disparities, thereby enabling the prediction of unseen vehicle flows using cellular traffic. This work advances the use of telecom data in transportation and pioneers the fusion of telecom and vision-based data, offering solutions for traffic management.

Fair Robust Active Learning by Joint Inconsistency

Authors

Tsung-Han Wu, Hung-Ting Su, Shang-Tse Chen, Winston H Hsu

Published Date

2023

We introduce a new learning framework, Fair Robust Active Learning (FRAL), generalizing conventional active learning to fair and adversarial robust scenarios. This framework enables us to achieve fair-performance and fair-robustness with limited labeled data, which is essential for various annotation-expensive visual applications with safety-critical needs. However, existing fairness-aware data selection strategies face two challenges when applied to the FRAL framework: they are either ineffective under severe data imbalance or inefficient due to huge computations of adversarial training. To address these issues, we develop a novel Joint INconsistency (JIN) method that exploits prediction inconsistencies between benign and adversarial inputs and between standard and robust models. By leveraging these two types of easy-to-compute inconsistencies simultaneously, JIN can identify valuable samples that contribute more to fairness gains and class imbalance mitigation in both standard and adversarial robust settings. Extensive experiments on diverse datasets and sensitive groups demonstrate that our approach outperforms existing active data selection baselines, achieving fair-performance and fair-robustness under white-box PGD attacks.
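
A minimal sketch of the joint-inconsistency idea, under the assumption that both inconsistencies are measured as KL divergences between softmax predictions (the paper's exact formulation may differ):

    import torch.nn.functional as F

    def jin_score_sketch(std_model, robust_model, x_benign, x_adv):
        """Scores unlabeled samples by (1) the standard model's prediction
        shift between benign and adversarial views and (2) the disagreement
        between the standard and robust models on the benign view."""
        p_std = F.softmax(std_model(x_benign), dim=-1)
        p_adv = F.softmax(std_model(x_adv), dim=-1)
        p_rob = F.softmax(robust_model(x_benign), dim=-1)
        adv_incons = F.kl_div(p_adv.log(), p_std, reduction="none").sum(-1)
        model_incons = F.kl_div(p_rob.log(), p_std, reduction="none").sum(-1)
        return adv_incons + model_incons   # higher = more informative to label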

Self-Training with High-Dimensional Markers for Cell Instance Segmentation

Authors

Kuang-Cheng Lo, Cheng-Wei Lin, Hsin-Ying Lee, Hao Hsu, Winston H Hsu, Tung-Hung Su, Shih-Yu Chen, Yung-Ming Jeng

Published Date

2023/4/18

Cellular segmentation is a fundamental prerequisite to many biological analyses. With the development of multiplexed imaging technologies, the need for accurately segmenting individual cells has significantly increased in recent years. However, current deep learning methods cannot deal with staining markers in an arbitrary order or different numbers. Moreover, acquiring pixel-level annotation is incredibly time-consuming in high-dimensional images. To tackle these issues, we incorporate pathology knowledge into our model and present a novel self-training framework. Concretely, we apply a serial attention mechanism and pooling operation to compress the multi-channel image during the training process. Afterward, the nuclei information guides the self-training in the pseudo-label stage. Experiments demonstrate our method is superior to the existing methods in both qualitative and quantitative results.

Coarse-to-fine point cloud registration with SE(3)-equivariant representations

Authors

Cheng-Wei Lin, Tung-I Chen, Hsin-Ying Lee, Wen-Chin Chen, Winston H Hsu

Published Date

2023/5/29

Point cloud registration is a crucial problem in computer vision and robotics. Existing methods either rely on matching local geometric features, which are sensitive to the pose differences, or leverage global shapes, which leads to inconsistency when facing distribution variances such as partial overlapping. Combining the advantages of both types of methods, we adopt a coarse-to-fine pipeline that concurrently handles both issues. We first reduce the pose differences between input point clouds by aligning global features; then we match the local features to further refine the inaccurate alignments resulting from distribution variances. As global feature alignment requires the features to preserve the poses of input point clouds and local feature matching expects the features to be invariant to these poses, we propose an SE(3)-equivariant feature extractor to simultaneously generate two types of features. In this feature …
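
For intuition, the fine stage of a coarse-to-fine pipeline can be sketched with generic nearest-neighbour matching plus the closed-form Kabsch/SVD rigid transform; this stands in for, and is not, the paper's learned local feature matching:

    import numpy as np

    def refine_alignment_sketch(src, tgt, iters=10):
        """Fine stage only: assumes a coarse transform from global feature
        alignment has already been applied to `src`, then iterates
        nearest-neighbour matching and Kabsch to polish the alignment."""
        for _ in range(iters):
            d = ((src[:, None, :] - tgt[None, :, :]) ** 2).sum(-1)
            corr = tgt[d.argmin(axis=1)]                  # nearest-neighbour matches
            mu_s, mu_t = src.mean(0), corr.mean(0)
            H = (src - mu_s).T @ (corr - mu_t)
            U, _, Vt = np.linalg.svd(H)
            S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
            R = Vt.T @ S @ U.T                            # reflection-safe rotation
            t = mu_t - R @ mu_s
            src = src @ R.T + t
        return src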

CTCam: Enhancing Transportation Evaluation through Fusion of Cellular Traffic and Camera-Based Vehicle Flows

Authors

ChungYi Lin, Shen-Lung Tung, Hung-Ting Su, Winston H Hsu

Published Date

2023/10/21

Traffic prediction utility often faces infrastructural limitations, which restrict its coverage. To overcome this challenge, we present Geographical Cellular Traffic (GCT) flow that leverages cellular network data as a new source for transportation evaluation. The broad coverage of cellular networks allows GCT flow to capture various mobile user activities across regions, aiding city authorities in resource management through precise predictions. Acknowledging the complexity arising from the diversity of mobile users in GCT flow, we supplement it with camera-based vehicle flow data from limited deployments and verify their spatio-temporal attributes and correlations through extensive data analysis. Our two-stage fusion approach integrates these multi-source data, addressing their coverage and magnitude discrepancies, thereby enhancing the prediction of GCT flow for accurate transportation evaluation. Overall, we …

CrossDTR: Cross-view and depth-guided transformers for 3D object detection

Authors

Ching-Yu Tseng, Yi-Rong Chen, Hsin-Ying Lee, Tsung-Han Wu, Wen-Chin Chen, Winston H Hsu

Published Date

2023/5/29

To achieve accurate 3D object detection at a low cost for autonomous driving, many multi-camera methods have been proposed and solved the occlusion problem of monocular approaches. However, due to the lack of accurate estimated depth, existing multi-camera methods often generate multiple bounding boxes along a ray of depth direction for difficult small objects such as pedestrians, resulting in an extremely low recall. Furthermore, directly applying depth prediction modules to existing multi-camera methods, generally composed of large network architectures, cannot meet the real-time requirements of self-driving applications. To address these issues, we propose Cross-view and Depth-guided Transformers for 3D Object Detection, CrossDTR. First, our lightweight depth predictor is designed to produce precise object-wise sparse depth maps and low-dimensional depth embeddings without extra depth …

MuRAL: Multi-Scale Region-based Active Learning for Object Detection

Authors

Yi-Syuan Liou, Tsung-Han Wu, Jia-Fong Yeh, Wen-Chin Chen, Winston H Hsu

Journal

arXiv preprint arXiv:2303.16637

Published Date

2023/3/29

Obtaining a large-scale labeled object detection dataset can be costly and time-consuming, as it involves annotating images with bounding boxes and class labels. Thus, some specialized active learning methods have been proposed to reduce the cost by selecting either coarse-grained samples or fine-grained instances from unlabeled data for labeling. However, the former approaches suffer from redundant labeling, while the latter methods generally lead to training instability and sampling bias. To address these challenges, we propose a novel approach called Multi-scale Region-based Active Learning (MuRAL) for object detection. MuRAL identifies informative regions of various scales to reduce annotation costs for well-learned objects and improve training performance. The informative region score is designed to consider both the predicted confidence of instances and the distribution of each object category, enabling our method to focus more on difficult-to-detect classes. Moreover, MuRAL employs a scale-aware selection strategy that ensures diverse regions are selected from different scales for labeling and downstream finetuning, which enhances training stability. Our proposed method surpasses all existing coarse-grained and fine-grained baselines on the Cityscapes and MS COCO datasets, and demonstrates significant improvement in difficult-category performance.
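
A hypothetical rendering of an informative region score that combines instance confidence with class distribution, as the abstract describes (the exact weighting in MuRAL may differ):

    import numpy as np

    def region_score_sketch(confidences, classes, class_freq):
        """Averages instance-level uncertainty inside a candidate region,
        up-weighting instances of rare (difficult-to-detect) categories."""
        weights = np.array([1.0 / class_freq[c] for c in classes])
        uncertainty = 1.0 - np.asarray(confidences)
        return float((weights * uncertainty).mean())

    # e.g. region_score_sketch([0.4, 0.5], ["rider", "rider"], {"rider": 0.02})
    # scores far higher than a region full of confident, common-class detections.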

Multi-Task Reinforcement Learning with Shared-Unique Features and Task-Aware Prioritized Experience Replay

Authors

Po-Shao Lin, Jia-Fong Yeh, Yi-Ting Chen, Winston H Hsu

Published Date

2023/10/13

Multi-task reinforcement learning (MTRL) has emerged as a challenging problem that aims to reduce the computational cost of reinforcement learning and leverage shared features among tasks to improve the performance of individual tasks. However, a key challenge lies in determining which features should be shared across tasks and how to preserve the unique features that differentiate each task. This challenge often leads to the problem of task performance imbalance, where certain tasks may dominate the learning process while others are neglected. In this paper, we propose a novel approach called shared-unique features along with task-aware prioritized experience replay to improve training stability and leverage shared and unique features effectively. We incorporate simple yet effective task-specific embeddings to preserve the unique features of each task and mitigate the potential problem of task performance imbalance. Additionally, we introduce task-aware settings to the prioritized experience replay (PER) algorithm to accommodate multi-task training and enhance training stability. Our approach achieves state-of-the-art average success rates on the Meta-World benchmark while maintaining stable performance across all tasks, avoiding task performance imbalance issues. The results demonstrate the effectiveness of our method in addressing the challenges of MTRL.
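
One plausible reading of task-aware prioritized experience replay is a per-task prioritized buffer with uniform task sampling, sketched below; the buffer layout and hyperparameters are assumptions, not the paper's implementation:

    import random

    class TaskAwarePERSketch:
        """One prioritized buffer per task; tasks are sampled uniformly so
        a single task's large TD errors cannot dominate the batch."""
        def __init__(self, num_tasks, alpha=0.6):
            self.alpha = alpha
            self.buffers = {t: [] for t in range(num_tasks)}  # (priority, transition)

        def add(self, task_id, transition, td_error):
            priority = (abs(td_error) + 1e-6) ** self.alpha
            self.buffers[task_id].append((priority, transition))

        def sample(self, batch_size):
            tasks = [t for t, buf in self.buffers.items() if buf]
            batch = []
            for _ in range(batch_size):
                pri, trans = zip(*self.buffers[random.choice(tasks)])
                batch.append(random.choices(trans, weights=pri, k=1)[0])
            return batch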

Pay Attention to Multi-Channel for Improving Graph Neural Networks

Authors

ChungYi Lin, Shen-Lung Tung, Winston H Hsu

Published Date

2023/3/1

We propose Multi-channel Graph Attention (MGAT) to efficiently handle channel-specific representations encoded by convolutional kernels, enhancing the incorporation of attention with graph convolutional network (GCN)-based architectures. Our experiments demonstrate the effectiveness of integrating our proposed MGAT with various spatial-temporal GCN models for improving prediction performance.
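
A minimal sketch of channel-wise attention over GCN outputs, illustrating the general idea of re-weighting channel-specific representations; MGAT's actual design may differ:

    import torch
    import torch.nn as nn

    class MultiChannelAttentionSketch(nn.Module):
        """Learns a weight per convolutional channel and re-weights the
        GCN's channel-specific node representations before aggregation."""
        def __init__(self, channels):
            super().__init__()
            self.score = nn.Sequential(nn.Linear(channels, channels), nn.Tanh(),
                                       nn.Linear(channels, channels))

        def forward(self, x):                 # x: (B, N, C) node features from a GCN
            ctx = x.mean(dim=1)               # (B, C) per-graph channel summary
            w = torch.softmax(self.score(ctx), dim=-1)
            return x * w.unsqueeze(1)         # emphasize informative channels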

Orbeez-SLAM: A real-time monocular visual SLAM with ORB features and NeRF-realized mapping

Authors

Chi-Ming Chung, Yang-Che Tseng, Ya-Ching Hsu, Xiang-Qian Shi, Yun-Hung Hua, Jia-Fong Yeh, Wen-Chin Chen, Yi-Ting Chen, Winston H Hsu

Published Date

2023/5/29

A spatial AI that can perform complex tasks through visual signals and cooperate with humans is highly anticipated. To achieve this, we need a visual SLAM that easily adapts to new scenes without pre-training and generates dense maps for downstream tasks in real-time. None of the previous learning-based and non-learning-based visual SLAMs satisfy all needs due to the intrinsic limitations of their components. In this work, we develop a visual SLAM named Orbeez-SLAM, which successfully collaborates with implicit neural representation and visual odometry to achieve our goals. Moreover, Orbeez-SLAM can work with the monocular camera since it only needs RGB inputs, making it widely applicable to the real world. Results show that our SLAM is up to 800x faster than the strong baseline with superior rendering outcomes. Code link: https://github.com/MarvinChung/Orbeez-SLAM.

MiniSUPERB: Lightweight benchmark for self-supervised speech models

Authors

Yu-Hsiang Wang, Huang-Yu Chen, Kai-Wei Chang, Winston Hsu, Hung-yi Lee

Published Date

2023/12/16

SUPERB was proposed to evaluate the generalizability of self-supervised learning (SSL) speech models across various tasks. However, it incurs high computational costs due to the large datasets and diverse tasks. In this paper, we introduce MiniSUPERB, a lightweight benchmark that efficiently evaluates SSL speech models, achieving results comparable to SUPERB at significantly lower computational cost. We carefully select representative tasks, sample datasets, and extract model representations offline. Our approach achieves a Spearman’s rank correlation of 0.954 and 0.982 with SUPERB Paper and SUPERB Challenge, respectively. Additionally, we reduce the computational cost by 97% in terms of Multiply-ACcumulate operations (MACs). Furthermore, we evaluate SSL speech models in few-shot scenarios and observe significant variations in their performance. To our knowledge, this is the first study to …

WLST: Weak Labels Guided Self-training for Weakly-supervised Domain Adaptation on 3D Object Detection

Authors

Tsung-Lin Tsou, Tsung-Han Wu, Winston H Hsu

Journal

arXiv preprint arXiv:2310.03821

Published Date

2023/10/5

In the field of domain adaptation (DA) for 3D object detection, most of the work is dedicated to unsupervised domain adaptation (UDA). Yet, without any target annotations, the performance gap between UDA approaches and the fully-supervised approach is still noticeable, which is impractical for real-world applications. On the other hand, weakly-supervised domain adaptation (WDA) is an underexplored yet practical task that requires only a small labeling effort on the target domain. To improve DA performance in a cost-effective way, we propose a general weak labels guided self-training framework, WLST, designed for WDA on 3D object detection. By incorporating an autolabeler, which can generate 3D pseudo labels from 2D bounding boxes, into the existing self-training pipeline, our method is able to generate more robust and consistent pseudo labels that benefit the training process on the target domain. Extensive experiments demonstrate the effectiveness, robustness, and detector-agnosticism of our WLST framework. Notably, it outperforms previous state-of-the-art methods on all evaluation tasks.

Geographical Cellular Traffic Prediction with Multivariate Spatio-Temporal Modeling

Authors

ChungYi Lin, Shen-Lung Tung, Winston H Hsu

Published Date

2023

This paper presents a novel approach for evaluating road traffic usage using multi-type Geographical Cellular Traffic (GCT). Working with a major telecom company, we propose a new prediction task for transportation traffic using GCT data. To accurately tackle this task, we propose a model that effectively integrates multivariate relation exploration and spatio-temporal modeling across multiple regions. Furthermore, we develop a new core as the foundation of each modeling component, efficiently improving the incorporation of attention mechanisms in the CNN-based architecture. Extensive experiments demonstrate the superior performance of our model in successfully handling the prediction task and reveal the influence of various GCT combinations. It is worth noting that our proposed data and model can pave a new path for intelligent transportation systems and urban planning.

Revisiting Depth-guided Methods for Monocular 3D Object Detection by Hierarchical Balanced Depth

Authors

Yi-Rong Chen, Ching-Yu Tseng, Yi-Syuan Liou, Tsung-Han Wu, Winston H Hsu

Published Date

2023/12/2

Monocular 3D object detection has seen significant advancements with the incorporation of depth information. However, there remains a considerable performance gap compared to LiDAR-based methods, largely due to inaccurate depth estimation. We argue that this issue stems from the commonly used pixel-wise depth map loss, which inherently creates an imbalance in loss weighting between near and distant objects. To address these challenges, we propose MonoHBD (Monocular Hierarchical Balanced Depth), a comprehensive solution with a hierarchical mechanism. We introduce the Hierarchical Depth Map (HDM) structure, which incorporates depth bins and depth offsets to enhance the localization accuracy of objects. Leveraging RoIAlign, our Balanced Depth Extractor (BDE) module captures both scene-level depth relationships and object-specific depth characteristics while considering geometric properties through the inclusion of camera calibration parameters. Furthermore, we propose a novel depth map loss that regularizes object-level depth features to mitigate imbalanced loss propagation. Our model reaches state-of-the-art results on the KITTI 3D object detection benchmark while supporting real-time detection. Extensive ablation studies are also conducted to prove the efficacy of our proposed modules.
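
The bin-plus-offset depth parameterization can be illustrated as follows; the bin layout and offset scaling here are assumptions, not MonoHBD's exact Hierarchical Depth Map:

    import torch

    def decode_depth_sketch(bin_logits, offsets, bin_edges):
        """Picks the most likely depth bin per object, then refines the
        bin centre with a residual offset bounded to half the bin width."""
        idx = bin_logits.argmax(dim=-1)               # (N,) chosen bin per object
        lo, hi = bin_edges[idx], bin_edges[idx + 1]   # bin boundaries
        off = offsets.gather(-1, idx.unsqueeze(-1)).squeeze(-1)
        return 0.5 * (lo + hi) + torch.tanh(off) * 0.5 * (hi - lo)

    # e.g. bin_edges = torch.linspace(0, 80, 11) gives ten 8 m bins for a
    # KITTI-like depth range; coarse bins localize, offsets refine.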

CFVS: Coarse-to-fine visual servoing for 6-DoF object-agnostic peg-in-hole assembly

Authors

Bo-Siang Lu, Tung-I Chen, Hsin-Ying Lee, Winston H Hsu

Published Date

2023/5/29

Robotic peg-in-hole assembly remains a challenging task due to its high accuracy demand. Previous work tends to simplify the problem by restricting the degree of freedom of the end-effector, or limiting the distance between the target and the initial pose position, which prevents them from being deployed in real-world manufacturing. Thus, we present a Coarse-to-Fine Visual Servoing (CFVS) peg-in-hole method, achieving 6-DoF end-effector motion control based on 3D visual feedback. CFVS can handle arbitrary tilt angles and large initial alignment errors through a fast pose estimation before refinement. Furthermore, by introducing a confidence map to ignore the irrelevant contour of objects, CFVS is robust against noise and can deal with various targets beyond training data. Extensive experiments show CFVS outperforms state-of-the-art methods and obtains 100%, 91%, and 82% average success rates in 3 …

Unsupervised Adversarial Detection without Extra Model: Training Loss Should Change

Authors

Chien Cheng Chyou, Hung-Ting Su, Winston H Hsu

Journal

arXiv preprint arXiv:2308.03243

Published Date

2023/8/7

Adversarial robustness poses a critical challenge in the deployment of deep learning models for real-world applications. Traditional approaches to adversarial training and supervised detection rely on prior knowledge of attack types and access to labeled training data, which is often impractical. Existing unsupervised adversarial detection methods identify whether the target model works properly, but they suffer from poor accuracy owing to the use of the common cross-entropy training loss, which relies on unnecessary features and strengthens adversarial attacks. We propose new training losses to reduce useless features, together with a corresponding detection method that requires no prior knowledge of adversarial attacks. The detection rate (true positive rate) against all given white-box attacks is above 93.9% except for attacks without limits (DF(∞)), while the false positive rate is barely 2.5%. The proposed method works well on all tested attack types, and its false positive rates are even better than those of methods specialized for certain types.

Language Models are Causal Knowledge Extractors for Zero-shot Video Question Answering

Authors

Hung-Ting Su, Yulei Niu, Xudong Lin, Winston H Hsu, Shih-Fu Chang

Published Date

2023

Causal Video Question Answering (CVidQA) queries not only association or temporal relations but also causal relations in a video. Existing question synthesis methods pre-trained question generation (QG) systems on reading comprehension datasets with text descriptions as inputs. However, QG models only learn to ask association questions (e.g., "what is someone doing...") and result in inferior performance due to the poor transfer of association knowledge to CVidQA, which focuses on causal questions like "why is someone doing...". Observing this, we propose to exploit causal knowledge to generate question-answer pairs, and introduce a novel framework, Causal Knowledge Extraction from Language Models (CaKE-LM), which leverages causal commonsense knowledge from language models to tackle CVidQA. To extract knowledge from LMs, CaKE-LM generates causal questions containing two events with one triggering the other (e.g., "score a goal" triggers "soccer player kicking ball") by prompting the LM with the action (soccer player kicking ball) to retrieve the intention (to score a goal). CaKE-LM significantly outperforms conventional methods by 4% to 6% of zero-shot CVidQA accuracy on the NExT-QA and Causal-VidQA datasets. We also conduct comprehensive analyses and provide key findings for future research.
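
The two-step extraction can be sketched as simple prompt templates; lm_generate is a placeholder for any text-completion call, and the wording is illustrative rather than the paper's exact prompts:

    def cake_lm_qa_sketch(action, lm_generate):
        """Step 1: prompt the LM with the observed action to surface an
        intention; step 2: turn the (intention, action) pair into a causal
        question-answer pair for zero-shot CVidQA."""
        intention = lm_generate(f"A {action}. The intention is to")
        question = f"Why is the {action}?"
        answer = f"To {intention.strip()}."
        return question, answer

    # e.g. action = "soccer player kicking ball" might yield
    # ("Why is the soccer player kicking ball?", "To score a goal.")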

Winston Hsu FAQs

What is Winston Hsu's h-index at National Taiwan University?

Winston Hsu's h-index is 46 overall and 30 counting only citations since 2020.

What are Winston Hsu's top articles?

The top articles of Winston Hsu at National Taiwan University include:

AED: Adaptable Error Detection for Few-shot Imitation Policy

TelTrans: Applying Multi-Type Telecom Data to Transportation Evaluation and Prediction via Multifaceted Graph Modeling

Tracking-Assisted Object Detection with Event Cameras

Tel2Veh: Fusion of Telecom Data and Vehicle Flow to Predict Camera-Free Traffic via a Spatio-Temporal Framework

Fair Robust Active Learning by Joint Inconsistency

Self-Training with High-Dimensional Markers for Cell Instance Segmentation

Coarse-to-fine point cloud registration with SE(3)-equivariant representations

CTCam: Enhancing Transportation Evaluation through Fusion of Cellular Traffic and Camera-Based Vehicle Flows

...

What are Winston Hsu's research interests?

The research interests of Winston Hsu are: large-scale image/video retrieval/mining, visual recognition, and machine intelligence

What is Winston Hsu's total number of citations?

Winston Hsu has 10,195 citations in total.

Who are the co-authors of Winston Hsu?

The co-authors of Winston Hsu are Shih-Fu Chang, Lexing Xie, Wen-Huang Cheng, Min-Chun Hu, Hung-Ting Su.

Co-Authors

Shih-Fu Chang, Columbia University in the City of New York (H-index: 134)

Lexing Xie, Australian National University (H-index: 45)

Wen-Huang Cheng, National Chiao Tung University (H-index: 35)

Min-Chun Hu, National Tsing Hua University (H-index: 21)

Hung-Ting Su, National Taiwan University (H-index: 10)
