Lexing Xie

Australian National University

H-index: 45

Oceania-Australia

About Lexing Xie

Lexing Xie is a researcher at the Australian National University with an h-index of 45 overall and 32 since 2020, specializing in machine learning, social media, the web, and multimedia.

Recent articles include:

Measuring Moral Dimensions in Social Media with Mformer

The Shapes of the Fourth Estate During the Pandemic: Profiling COVID-19 News Consumption in Eight Countries

Stability and Efficiency of Personalised Cultural Markets

Determinantal Point Process Likelihoods for Sequential Recommendation

Method and System for Visualizing Data Differentiation

Smallset Timelines: A Visual Representation of Data Preprocessing Decisions

Roslingifier: Semi-Automated Storytelling for Animated Scatterplots

A longitudinal study of topic classification on Twitter

Lexing Xie Information

University	Australian National University
Citations(all)	10640
Citations(since 2020)	5282
Cited By	7331
hIndex(all)	45
hIndex(since 2020)	32
i10Index(all)	112
i10Index(since 2020)	65
University Profile Page	Australian National University
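
The h-index and i10-index figures in the table follow the standard bibliometric definitions: the h-index is the largest h such that h papers have at least h citations each, and the i10-index is the number of papers with at least 10 citations. A minimal sketch of both computations is given here; the function names and the sample citation counts are illustrative assumptions, not data from this profile.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def i10_index(citations):
    """Number of papers with at least 10 citations."""
    return sum(1 for count in citations if count >= 10)


# Hypothetical per-paper citation counts, for illustration only.
sample = [25, 12, 10, 8, 3, 0]
print(h_index(sample))    # 4
print(i10_index(sample))  # 3
```

Restricting the input to citations received since 2020 would give the "since 2020" variants of both indices.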

Lexing Xie Skills & Research Interests

machine learning

social media

web

multimedia

Top articles of Lexing Xie

Measuring Moral Dimensions in Social Media with Mformer

Authors

Tuan Dung Nguyen,Ziyu Chen,Nicholas George Carroll,Alasdair Tran,Colin Klein,Lexing Xie

Journal

arXiv preprint arXiv:2311.10219

Published Date

2023/11/16

The ever-growing textual records of contemporary social issues, often discussed online with moral rhetoric, present both an opportunity and a challenge for studying how moral concerns are debated in real life. Moral foundations theory is a taxonomy of intuitions widely used in data-driven analyses of online content, but current computational tools to detect moral foundations suffer from the incompleteness and fragility of their lexicons and from poor generalization across data domains. In this paper, we fine-tune a large language model to measure moral foundations in text based on datasets covering news media and long- and short-form online discussions. The resulting model, called Mformer, outperforms existing approaches on the same domains by 4–12% in AUC and further generalizes well to four commonly used moral text datasets, improving by up to 17% in AUC. We present case studies using Mformer to analyze everyday moral dilemmas on Reddit and controversies on Twitter, showing that moral foundations can meaningfully describe people's stance on social issues and such variations are topic-dependent. Pre-trained model and datasets are released publicly. We posit that Mformer will help the research community quantify moral dimensions for a range of tasks and data domains, and eventually contribute to the understanding of moral situations faced by humans and machines.

The Shapes of the Fourth Estate During the Pandemic: Profiling COVID-19 News Consumption in Eight Countries

Authors

Cai Yang,Lexing Xie,Siqi Wu

Journal

Proceedings of the ACM on Human-Computer Interaction

Published Date

2023/10/4

News media is often referred to as the Fourth Estate, a recognition of its political power. New understandings of how media shape political beliefs and influence collective behaviors are urgently needed in an era when public opinion polls do not necessarily reflect election results and users influence each other in real-time under algorithm-mediated content personalization. In this work, we measure not only the average but also the distribution of audience political leanings for different media across different countries. The methodological components of these new measurements include a high-fidelity COVID-19 tweet dataset; high-precision user geolocation extraction; and user political leaning estimated from the within-country retweet networks involving local politicians. We focus on geolocated users from eight countries, profile user leaning distribution for each country, and analyze bridging users who have …

Stability and Efficiency of Personalised Cultural Markets

Authors

Haiqing Zhu,Yun Kuen Cheung,Lexing Xie

Journal

arXiv preprint arXiv:2302.06226

Published Date

2023/2/13

This work is concerned with the dynamics of online cultural markets, namely, attention allocation of many users on a set of digital goods with infinite supply. Such dynamics are important in shaping processes and outcomes in society, from trending items in entertainment, collective knowledge creation, to election outcomes. The outcomes of online cultural markets are susceptible to intricate social influence dynamics, particularly so when the community comprises consumers with heterogeneous interests. This has made formal analysis of these markets improbable. In this paper, we remedy this by establishing robust connections between influence dynamics and optimization processes, in trial-offer markets where the consumer preferences are modelled by multinomial logit. Among other results, we show that the proportional-response-esque influence dynamic is equivalent to stochastic mirror descent on a convex …

Determinantal Point Process Likelihoods for Sequential Recommendation

Authors

Yuli Liu,Christian Walder,Lexing Xie

Published Date

2022/7/6

Sequential recommendation is a popular task in academic research and close to real-world application scenarios, where the goal is to predict the next action(s) of the user based on their previous sequence of actions. In the training process of recommender systems, the loss function plays an essential role in guiding the optimization of recommendation models to generate accurate suggestions for users. However, most existing sequential recommendation techniques focus on designing algorithms or neural network architectures, and few efforts have been made to tailor loss functions that fit naturally into the practical application scenario of sequential recommender systems. Ranking-based losses, such as cross-entropy and Bayesian Personalized Ranking (BPR), are widely used in the sequential recommendation area. We argue that such objective functions suffer from two inherent drawbacks: i) the …

Method and System for Visualizing Data Differentiation

Published Date

2022/3/3

A computer-implemented method and data display system for identifying and visualizing differences between a first set comprising one or more text items and a second set comprising one or more text items is disclosed. The method includes extracting a collection of named entities from the first and second sets of one or more text items, generating a composite graph structure from the collection of named entities, the composite graph structure configured to display differences between the first and second set of text items, and then displaying spatially the composite graph structure.

Smallset Timelines: A Visual Representation of Data Preprocessing Decisions

Authors

Lydia R Lucchesi,Petra M Kuhnert,Jenny L Davis,Lexing Xie

Published Date

2022/6/21

Data preprocessing is a crucial stage in the data analysis pipeline, with both technical and social aspects to consider. Yet, the attention it receives is often lacking in research practice and dissemination. We present the Smallset Timeline, a visualisation to help reflect on and communicate data preprocessing decisions. A “Smallset” is a small selection of rows from the original dataset containing instances of dataset alterations. The Timeline is comprised of Smallset snapshots representing different points in the preprocessing stage and captions to describe the alterations visualised at each point. Edits, additions, and deletions to the dataset are highlighted with colour. We develop the R software package, smallsets, that can create Smallset Timelines from R and Python data preprocessing scripts. Constructing the figure asks practitioners to reflect on and revise decisions as necessary, while sharing it aims to make the …

Roslingifier: Semi-Automated Storytelling for Animated Scatterplots

Authors

Minjeong Shin,Joohee Kim,Yunha Han,Lexing Xie,Mitchell Whitelaw,Bum Chul Kwon,Sungahn Ko,Niklas Elmqvist

Journal

IEEE Transactions on Visualization and Computer Graphics

Published Date

2022/1/27

We present Roslingifier, a data-driven storytelling method for animated scatterplots. Like its namesake, Hans Rosling (1948–2017), a professor of public health and a spellbinding public speaker, Roslingifier turns a sequence of entities changing over time (such as countries and continents with their demographic data) into an engaging narrative telling the story of the data. This data-driven storytelling method with an in-person presenter is a new genre of storytelling technique and has never been studied before. In this article, we aim to define a design space for this new genre (data presentation) and provide a semi-automated authoring tool for helping presenters create quality presentations. From an in-depth analysis of video clips of presentations using interactive visualizations, we derive three specific techniques to achieve this: natural language narratives, visual effects that highlight events, and temporal …

A longitudinal study of topic classification on Twitter

Authors

Mohamed Reda Bouadjenek,Scott Sanner,Zahra Iman,Lexing Xie,Daniel Xiaoliang Shi

Journal

PeerJ Computer Science

Published Date

2022

Twitter represents a massively distributed information source over topics ranging from social and political events to entertainment and sports news. While recent work has suggested this content can be narrowed down to the personalized interests of individual users by training topic filters using standard classifiers, there remain many open questions about the efficacy of such classification-based filtering approaches. For example, over a year or more after training, how well do such classifiers generalize to future novel topical content, and are such results stable across a range of topics? In addition, how robust is a topic classifier over the time horizon, e.g., can a model trained in one year be used for making predictions in the subsequent year? Furthermore, what features, feature classes, and feature attributes are most critical for long-term classifier performance? To answer these questions, we collected a corpus of over 800 million English Tweets via the Twitter streaming API during 2013 and 2014 and learned topic classifiers for 10 diverse themes ranging from social issues to celebrity deaths to the “Iran nuclear deal”. The results of this long-term study of topic classifier performance provide a number of important insights, among them that: (i) such classifiers can indeed generalize to novel topical content with high precision over a year or more after training, though performance degrades with time, (ii) the classes of hashtags and simple terms contain the most informative feature instances, (iii) removing tweets containing training hashtags from the validation set allows better generalization, and (iv) the simple volume of tweets by a user correlates more with …

Mapping Topics in 100,000 Real-Life Moral Dilemmas

Authors

Tuan Dung Nguyen,Georgiana Lyall,Alasdair Tran,Minjeong Shin,Nicholas George Carroll,Colin Klein,Lexing Xie

Journal

Proceedings of the International AAAI Conference on Web and Social Media

Published Date

2022/5/31

Moral dilemmas play an important role in theorizing both about ethical norms and moral psychology. Yet thought experiments borrowed from the philosophical literature often lack the nuances and complexity of real life. We leverage 100,000 threads (the largest collection to date) from Reddit’s r/AmItheAsshole to examine the features of everyday moral dilemmas. Combining topic modeling with evaluation from both expert and crowd-sourced workers, we discover 47 fine-grained, meaningful topics and group them into five meta-categories. We show that most dilemmas combine at least two topics, such as family and money. We also observe that the pattern of topic co-occurrence carries interesting information about the structure of everyday moral concerns: for example, the generation of moral dilemmas from nominally neutral topics, and interaction effects in which final verdicts do not line up with the moral concerns in the original stories in any simple way. Our analysis demonstrates the utility of a fine-grained data-driven approach to online moral dilemmas, and provides a valuable resource for researchers aiming to explore the intersection of practical and theoretical ethics.

Fair Wrapping for Black-box Predictions

Authors

Alexander Soen,Ibrahim M Alabdulmohsin,Sanmi Koyejo,Yishay Mansour,Nyalleng Moorosi,Richard Nock,Ke Sun,Lexing Xie

Journal

Advances in Neural Information Processing Systems

Published Date

2022/12/6

We introduce a new family of techniques to post-process ("wrap") a black-box classifier in order to reduce its bias. Our technique builds on the recent analysis of improper loss functions whose optimization can correct any twist in prediction, unfairness being treated as a twist. In the post-processing, we learn a wrapper function which we define as an α-tree, which modifies the prediction. We provide two generic boosting algorithms to learn α-trees. We show that our modification has appealing properties in terms of composition of α-trees, generalization, interpretability, and KL divergence between modified and original predictions. We exemplify the use of our technique in three fairness notions: conditional value-at-risk, equality of opportunity, and statistical parity; and provide experiments on several readily available datasets.

Whose Advantage? Measuring Attention Dynamics across YouTube and Twitter on Controversial Topics

Authors

JooYoung Lee,Siqi Wu,Ali Mert Ertugrul,Yu-Ru Lin,Lexing Xie

Journal

Proceedings of the International AAAI Conference on Web and Social Media

Published Date

2022/5/31

Ideological asymmetries have recently been observed in contested online spaces, where conservative voices seem to be relatively more pronounced even though liberals are known to have the population advantage on digital platforms. Most prior research, however, focused on either one single platform or one single political topic. Whether an ideological group garners more attention across platforms and/or topics, and how the attention dynamics evolve over time, have not been explored. In this work, we present a quantitative study that links collective attention across two social platforms, YouTube and Twitter, centered on online activities surrounding popular videos of three controversial political topics including Abortion, Gun control, and Black Lives Matter over 16 months. We propose several sets of video-centric metrics to characterize how online attention is accumulated for different ideological groups. We find that neither side is on a winning streak: left-leaning videos are overall more viewed, more engaging, but less tweeted than right-leaning videos. The attention time series unfold quicker for left-leaning videos, but span a longer time for right-leaning videos. Network analysis on the early adopters and tweet cascades shows that the information diffusion for left-leaning videos tends to involve centralized actors, while that for right-leaning videos starts earlier in the attention lifecycle. In sum, our findings go beyond the static picture of ideological asymmetries in digital spaces and provide a set of methods to quantify attention dynamics across different social platforms.

Interval-censored Hawkes processes

Authors

Marian-Andrei Rizoiu,Alexander Soen,Shidi Li,Pio Calderon,Leanne J Dong,Aditya Krishna Menon,Lexing Xie

Journal

Journal of Machine Learning Research

Published Date

2022

Interval-censored data solely records the aggregated counts of events during specific time intervals - such as the number of patients admitted to the hospital or the volume of vehicles passing traffic loop detectors - and not the exact occurrence time of the events. It is currently not understood how to fit the Hawkes point processes to this kind of data. Its typical loss function (the point process log-likelihood) cannot be computed without exact event times. Furthermore, it does not have the independent increments property to use the Poisson likelihood. This work builds a novel point process, a set of tools, and approximations for fitting Hawkes processes within interval-censored data scenarios. First, we define the Mean Behavior Poisson process (MBPP), a novel Poisson process with a direct parameter correspondence to the popular self-exciting Hawkes process. We fit MBPP in the interval-censored setting using an interval-censored Poisson log-likelihood (IC-LL). We use the parameter equivalence to uncover the parameters of the associated Hawkes process. Second, we introduce two novel exogenous functions to distinguish the exogenous from the endogenous events. We propose the multi-impulse exogenous function - for when the exogenous events are observed as event time - and the latent homogeneous Poisson process exogenous function - for when the exogenous events are presented as interval-censored volumes. Third, we provide several approximation methods to estimate the intensity and compensator function of MBPP when no analytical solution exists. Fourth and finally, we connect the interval-censored loss of MBPP to a broader …

AttentionFlow: Visualising Influence in Networks of Time Series

Authors

Minjeong Shin,Alasdair Tran,Siqi Wu,Alexander Mathews,Rong Wang,Georgiana Lyall,Lexing Xie

Published Date

2021/3/8

The collective attention on online items such as web pages, search terms, and videos reflects trends that are of social, cultural, and economic interest. Moreover, attention trends of different items exhibit mutual influence via mechanisms such as hyperlinks or recommendations. Many visualisation tools exist for time series, network evolution, or network influence; however, few systems connect all three. In this work, we present AttentionFlow, a new system to visualise networks of time series and the dynamic influence they have on one another. Centred around an ego node, our system simultaneously presents the time series on each node using two visual encodings: a tree ring for an overview and a line chart for details. AttentionFlow supports interactions such as overlaying time series of influence, and filtering neighbours by time or flux. We demonstrate AttentionFlow using two real-world datasets, VevoMusic and …

Factorized Fourier Neural Operators

Authors

Alasdair Tran,Alexander Mathews,Lexing Xie,Cheng Soon Ong

Journal

arXiv preprint arXiv:2111.13802

Published Date

2021/11/27

We propose the Factorized Fourier Neural Operator (F-FNO), a learning-based approach for simulating partial differential equations (PDEs). Starting from a recently proposed Fourier representation of flow fields, the F-FNO bridges the performance gap between pure machine learning approaches and the best numerical or hybrid solvers. This is achieved with new representations - separable spectral layers and improved residual connections - and a combination of training strategies such as the Markov assumption, Gaussian noise, and cosine learning rate decay. On several challenging benchmark PDEs on regular grids, structured meshes, and point clouds, the F-FNO can scale to deeper networks and outperform both the FNO and the geo-FNO, reducing the error by 83% on the Navier-Stokes problem, 31% on the elasticity problem, 57% on the airfoil flow problem, and 60% on the plastic forging problem. Compared to the state-of-the-art pseudo-spectral method, the F-FNO can take a step size that is an order of magnitude larger in time and achieve an order of magnitude speedup to produce the same solution quality.

UNIPoint: Universally Approximating Point Processes Intensities

Authors

Alexander Soen,Alexander Mathews,Daniel Grixti-Cheng,Lexing Xie

Journal

Proceedings of the AAAI Conference on Artificial Intelligence

Published Date

2021/5/18

Point processes are a useful mathematical tool for describing events over time, and so there are many recent approaches for representing and learning them. One notable open question is how to precisely describe the flexibility of point process models and whether there exists a general model that can represent all point processes. Our work bridges this gap. Focusing on the widely used event intensity function representation of point processes, we provide a proof that a class of learnable functions can universally approximate any valid intensity function. The proof connects the well known Stone-Weierstrass Theorem for function approximation, the uniform density of non-negative continuous functions using a transfer function, the formulation of the parameters of piece-wise continuous functions as a dynamic system, and a recurrent neural network implementation for capturing the dynamics. Using these insights, we design and implement UNIPoint, a novel neural point process model, using recurrent neural networks to parameterise sums of basis functions upon each event. Evaluations on synthetic and real world datasets show that this simpler representation performs better than Hawkes process variants and more complex neural network-based approaches. We expect this result will provide a practical basis for selecting and tuning models, as well as furthering theoretical work on representational complexity and learnability.

Radflow: A recurrent, aggregated, and decomposable model for networks of time series

Authors

Alasdair Tran,Alexander Mathews,Cheng Soon Ong,Lexing Xie

Published Date

2021/4/19

We propose a new model for networks of time series that influence each other. Graph structures among time series are found in diverse domains, such as web traffic influenced by hyperlinks, product sales influenced by recommendation, or urban transport volume influenced by road networks and weather. There has been recent progress in graph modeling and in time series forecasting, respectively, but an expressive and scalable approach for a network of series does not yet exist. We introduce Radflow, a novel model that embodies three key ideas: a recurrent neural network to obtain node embeddings that depend on time, the aggregation of the flow of influence from neighboring nodes with multi-head attention, and the multi-layer decomposition of time series. Radflow naturally takes into account dynamic networks where nodes and edges change over time, and it can be used for prediction and data imputation …

Quantile Propagation for Wasserstein-Approximate Gaussian Processes

Authors

Rui Zhang,Christian Walder,Edwin V Bonilla,Marian-Andrei Rizoiu,Lexing Xie

Journal

Advances in Neural Information Processing Systems

Published Date

2020

Approximate inference techniques are the cornerstone of probabilistic methods based on Gaussian process priors. Despite this, most work approximately optimizes standard divergence measures such as the Kullback-Leibler (KL) divergence, which lack the basic desiderata for the task at hand, while chiefly offering merely technical convenience. We develop a new approximate inference method for Gaussian process models which overcomes the technical challenges arising from abandoning these convenient divergences. Our method, dubbed Quantile Propagation (QP), is similar to expectation propagation (EP) but minimizes the Wasserstein distance (WD) instead of the KL divergence. The WD exhibits all the required properties of a distance metric, while respecting the geometry of the underlying sample space. We show that QP matches quantile functions rather than moments as in EP and has the same mean update but a smaller variance update than EP, thereby alleviating EP's tendency to over-estimate posterior variances. Crucially, despite the significant complexity of dealing with the WD, QP has the same favorable locality property as EP, and thereby admits an efficient algorithm. Experiments on classification and Poisson regression show that QP outperforms both EP and variational Bayes.

Variation across Scales: Measurement Fidelity under Twitter Data Sampling

Authors

Siqi Wu,Marian-Andrei Rizoiu,Lexing Xie

Journal

Proceedings of the International AAAI Conference on Web and Social Media

Published Date

2020/5/26

A comprehensive understanding of data quality is the cornerstone of measurement studies in social media research. This paper presents in-depth measurements on the effects of Twitter data sampling across different timescales and different subjects (entities, networks, and cascades). By constructing complete tweet streams, we show that the Twitter rate limit message is an accurate indicator for the volume of missing tweets. Sampling also differs significantly across timescales. While the hourly sampling rate is influenced by the diurnal rhythm in different time zones, the millisecond level sampling is heavily affected by the implementation choices. For Twitter entities such as users, we find the Bernoulli process with a uniform rate approximates the empirical distributions well. It also allows us to estimate the true ranking with the observed sample data. For networks on Twitter, their structures are altered significantly and some components are more likely to be preserved. For retweet cascades, we observe changes in distributions of tweet inter-arrival time and user influence, which will affect models that rely on these features. This work calls attention to noise and potential biases in social data, and provides a few tools to measure Twitter sampling effects.

ASNets: Deep Learning for Generalised Planning

Authors

Sam Toyer,Sylvie Thiébaux,Felipe Trevizan,Lexing Xie

Journal

Journal of Artificial Intelligence Research

Published Date

2020/5/4

In this paper, we discuss the learning of generalised policies for probabilistic and classical planning problems using Action Schema Networks (ASNets). The ASNet is a neural network architecture that exploits the relational structure of (P)PDDL planning problems to learn a common set of weights that can be applied to any problem in a domain. By mimicking the actions chosen by a traditional, non-learning planner on a handful of small problems in a domain, ASNets are able to learn a generalised reactive policy that can quickly solve much larger instances from the domain. This work extends the ASNet architecture to make it more expressive, while still remaining invariant to a range of symmetries that exist in PPDDL problems. We also present a thorough experimental evaluation of ASNets, including a comparison with heuristic search planners on seven probabilistic and deterministic domains, an extended evaluation on over 18,000 Blocksworld instances, and an ablation study. Finally, we show that sparsity-inducing regularisation can produce ASNets that are compact enough for humans to understand, yielding insights into how the structure of ASNets allows them to generalise across a domain.

Exploiting Uncertainty in Popularity Prediction of Information Diffusion Cascades Using Self-exciting Point Processes

Authors

Quyu Kong,Marian-Andrei Rizoiu,Lexing Xie

Journal

arXiv preprint arXiv:2001.11132

Published Date

2020/1/29

Hawkes processes have been successfully applied to understand online information diffusion and popularity of online items. Most prior work concentrates on individually modeling successful diffusion cascades, while discarding smaller cascades which, however, account for the majority of the available data. In this work, we propose a set of tools to leverage information in the small cascades: a joint fitting procedure that accounts for cascade size bias in the sample, a Borel mixture model and a clustering algorithm to uncover latent groups within these cascades, and the posterior final size distribution of Hawkes processes. On a dataset of Twitter cascades, we show that, compared to the state-of-the-art models, the proposed method improves the generalization performance on unseen data, delivers better prediction for final popularity, and provides means to characterize online content from the way Twitter users discuss it.


Lexing Xie FAQs

What is Lexing Xie's h-index at Australian National University?

The h-index of Lexing Xie has been 32 since 2020 and 45 in total.

What are Lexing Xie's top articles?

The articles with the titles of

Measuring Moral Dimensions in Social Media with Mformer

The Shapes of the Fourth Estate During the Pandemic: Profiling COVID-19 News Consumption in Eight Countries

Stability and Efficiency of Personalised Cultural Markets

Determinantal Point Process Likelihoods for Sequential Recommendation

Method and System for Visualizing Data Differentiation

Smallset Timelines: A Visual Representation of Data Preprocessing Decisions

Roslingifier: Semi-Automated Storytelling for Animated Scatterplots

A longitudinal study of topic classification on Twitter

...

are the top articles of Lexing Xie at Australian National University.

What are Lexing Xie's research interests?

The research interests of Lexing Xie are machine learning, social media, the web, and multimedia.

What is Lexing Xie's total number of citations?

Lexing Xie has 10,640 citations in total.
