Jon Kleinberg

Cornell University

H-index: 122

North America-United States

About Jon Kleinberg

Jon Kleinberg is a distinguished researcher at Cornell University, with an h-index of 122 overall and a recent h-index of 76 (since 2020). He specializes in algorithms, data mining, information networks, social networks, and Web mining.

Jon Kleinberg Information

University

Cornell University

Position

Professor of Computer Science

Citations(all)

123425

Citations(since 2020)

40183

Cited By

99840

hIndex(all)

122

hIndex(since 2020)

76

i10Index(all)

283

i10Index(since 2020)

214
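
The h-index and i10-index figures above follow standard bibliometric definitions: the h-index is the largest h such that h papers have at least h citations each, and the i10-index counts papers with at least 10 citations. A minimal sketch of both, assuming a plain list of per-paper citation counts; the function names and the `toy` numbers are hypothetical and are not taken from this profile:

```python
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank      # the top `rank` papers all have >= rank citations
        else:
            break
    return h


def i10_index(citations):
    """Number of papers with at least 10 citations."""
    return sum(1 for count in citations if count >= 10)


# Hypothetical citation counts, for illustration only.
toy = [25, 12, 10, 8, 5, 3, 0]
print(h_index(toy), i10_index(toy))
```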

Jon Kleinberg Skills & Research Interests

algorithms

data mining

information networks

social networks

Web mining

Top articles of Jon Kleinberg

From Graphs to Hypergraphs: Hypergraph Projection and its Remediation

Authors

Yanbang Wang, Jon Kleinberg

Published Date

2024

We study the implications of the modeling choice to use a graph, instead of a hypergraph, to represent real-world interconnected systems whose constituent relationships are of higher order by nature. Such a modeling choice typically involves an underlying projection process that maps the original hypergraph onto a graph, and is common in graph-based analysis. While hypergraph projection can potentially lead to loss of higher-order relations, there exist very few studies on the consequences of doing so, or on its remediation. This work fills this gap by doing two things: (1) we develop analysis based on graph and set theory, showing two ubiquitous patterns of hyperedges that are the root of structural information loss in all hypergraph projections; we also quantify the combinatorial impossibility of recovering the lost higher-order structures if no extra help is provided; (2) we still seek to recover the lost higher-order structures in hypergraph projection, and in light of (1)'s findings we propose to relax the problem into a learning-based setting. Under this setting, we develop a learning-based hypergraph reconstruction method based on an important statistic of hyperedge distributions that we find. Our reconstruction method is evaluated on 8 real-world datasets under different settings, and exhibits consistently good performance. We also demonstrate benefits of the reconstructed hypergraphs via use cases of protein rankings and link predictions.
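
The information loss described in this abstract can be seen on a very small example. The sketch below assumes the projection is a clique expansion (every pair of nodes that shares a hyperedge becomes a graph edge), which is one common choice and may differ from the paper's exact definition; the function name `project` and the two toy hypergraphs are hypothetical.

```python
from itertools import combinations


def project(hyperedges):
    """Clique expansion: each pair of nodes sharing a hyperedge becomes an edge."""
    edges = set()
    for hyperedge in hyperedges:
        for u, v in combinations(sorted(hyperedge), 2):
            edges.add((u, v))
    return edges


# One three-way relation versus three pairwise relations (toy data).
one_triple = [{"a", "b", "c"}]
three_pairs = [{"a", "b"}, {"b", "c"}, {"a", "c"}]

# Both hypergraphs collapse to the same triangle, so the projected graph
# alone cannot tell them apart.
print(project(one_triple) == project(three_pairs))
```

Because two different hypergraphs map to the same graph here, any reconstruction from the graph alone needs extra information, which is consistent with the learning-based relaxation the abstract proposes.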

Modeling reputation-based behavioral biases in school choice

Authors

Jon Kleinberg, Sigal Oren, Emily Ryu, Éva Tardos

Journal

arXiv preprint arXiv:2403.04616

Published Date

2024/3/7

A fundamental component in the theoretical school choice literature is the problem a student faces in deciding which schools to apply to. Recent models have considered a set of schools of different selectiveness and a student who is unsure of their strength and can apply to at most schools. Such models assume that the student cares solely about maximizing the quality of the school that they attend, but experience suggests that students' decisions are also influenced by a set of behavioral biases based on reputational effects: a subjective reputational benefit when admitted to a selective school, whether or not they attend; and a subjective loss based on disappointment when rejected. Guided by these observations, and inspired by recent behavioral economics work on loss aversion relative to expectations, we propose a behavioral model by which a student chooses schools to balance these behavioral effects with the quality of the school they attend. Our main results show that a student's choices change in dramatic ways when these reputation-based behavioral biases are taken into account. In particular, where a rational applicant spreads their applications evenly, a biased student applies very sparsely to highly selective schools, such that above a certain threshold they apply to only an absolute constant number of schools even as their budget of applications grows to infinity. Consequently, a biased student underperforms a rational student even when the rational student is restricted to a sufficiently large upper bound on applications and the biased student can apply to arbitrarily many. Our analysis shows that the reputation-based model is rich …

Hypergraph patterns and collaboration structure

Authors

Jonas L Juul, Austin R Benson, Jon Kleinberg

Journal

arXiv preprint arXiv:2210.02163

Published Date

2022/10/5

Humans collaborate in different contexts such as in creative or scientific projects, in workplaces and in sports. Depending on the project and external circumstances, a newly formed collaboration may include people that have collaborated before, and people with no collaboration history. Such existing relationships between team members have been reported to influence the performance of teams. However, it is not clear how existing relationships between team members should be quantified, and whether some relationships are more likely to occur in new collaborations than others. Here we introduce a new family of structural patterns, m-patterns, which formalize relationships between collaborators and we study the prevalence of such structures in data and a simple random-hypergraph null model. We analyze the frequency with which different collaboration structures appear in our null model and show how such frequencies depend on size and hyperedge density in the hypergraphs. Comparing the null model to data of human and non-human collaborations, we find that some collaboration structures are vastly under- and overrepresented in empirical datasets. Finally, we find that structures of scientific collaborations on COVID-19 papers in some cases are statistically significantly different from those of non-COVID-19 papers. Examining citation counts for 4 different scientific fields, we also find indications that repeat collaborations are more successful for 2-author scientific publications and less successful for 3-author scientific publications as compared to other collaboration structures.

Replicating Electoral Success

Authors

Kiran Tomlinson, Tanvi Namjoshi, Johan Ugander, Jon Kleinberg

Journal

arXiv preprint arXiv:2402.17109

Published Date

2024/2/27

A core tension in the study of plurality elections is the clash between the classic Hotelling-Downs model, which predicts that two office-seeking candidates should position themselves at the median voter's policy, and the empirical observation that real-world democracies often have two major parties with divergent policies. Motivated by this tension and drawing from bounded rationality, we introduce a dynamic model of candidate positioning based on a simple behavioral heuristic: candidates imitate the policy of previous winners. The resulting model is closely connected to evolutionary replicator dynamics and exhibits complex behavior, despite its simplicity. For uniformly-distributed voters, we prove that when there are , , or candidates per election, any symmetric candidate distribution converges over time to a concentration of candidates at the center. With , however, we prove that the candidate distribution does not converge to the center. For initial distributions without any extreme candidates, we prove a stronger statement than non-convergence, showing that the density in an interval around the center goes to zero when . As a matter of robustness, our conclusions are qualitatively unchanged if a small fraction of candidates are not winner-copiers and are instead positioned uniformly at random. Beyond our theoretical analysis, we illustrate our results in simulation; for five or more candidates, we find a tendency towards the emergence of two clusters, a mechanism suggestive of Duverger's Law, the empirical finding that plurality leads to two-party systems. Our simulations also explore several variations of the model, including non …

Language Generation in the Limit

Authors

Jon Kleinberg, Sendhil Mullainathan

Journal

arXiv preprint arXiv:2404.06757

Published Date

2024/4/10

Although current large language models are complex, the most basic specifications of the underlying language generation problem itself are simple to state: given a finite set of training samples from an unknown language, produce valid new strings from the language that don't already appear in the training data. Here we ask what we can conclude about language generation using only this specification, without further assumptions. In particular, suppose that an adversary enumerates the strings of an unknown target language L that is known only to come from one of a possibly infinite list of candidates. A computational agent is trying to learn to generate from this language; we say that the agent generates from L in the limit if after some finite point in the enumeration of L, the agent is able to produce new elements that come exclusively from L and that have not yet been presented by the adversary. Our main result is that there is an agent that is able to generate in the limit for every countable list of candidate languages. This contrasts dramatically with negative results due to Gold and Angluin in a well-studied model of language learning where the goal is to identify an unknown language from samples; the difference between these results suggests that identifying a language is a fundamentally different problem than generating from it.

Equilibria, Efficiency, and Inequality in Network Formation for Hiring and Opportunity

Authors

Cynthia Dwork, Chris Hays, Jon Kleinberg, Manish Raghavan

Journal

arXiv preprint arXiv:2402.13841

Published Date

2024/2/21

Professional networks -- the social networks among people in a given line of work -- can serve as a conduit for job prospects and other opportunities. Here we propose a model for the formation of such networks and the transfer of opportunities within them. In our theoretical model, individuals strategically connect with others to maximize the probability that they receive opportunities from them. We explore how professional networks balance connectivity, where connections facilitate opportunity transfers to those who did not get them from outside sources, and congestion, where some individuals receive too many opportunities from their connections and waste some of them. We show that strategic individuals are over-connected at equilibrium relative to a social optimum, leading to a price of anarchy for which we derive nearly tight asymptotic bounds. We also show that, at equilibrium, individuals form connections to those who provide similar benefit to them as they provide to others. Thus, our model provides a microfoundation in professional networking contexts for the fundamental sociological principle of homophily, that "similarity breeds connection," which in our setting is realized as a form of status homophily based on alignment in individual benefit. We further explore how, even if individuals are a priori equally likely to receive opportunities from outside sources, equilibria can be unequal, and we provide nearly tight bounds on how unequal they can be. Finally, we explore the ability for online platforms to intervene to improve social welfare and show that natural heuristics may result in adverse effects at equilibrium. Our simple model allows for a …

The Moderating Effect of Instant Runoff Voting

Authors

Kiran Tomlinson, Johan Ugander, Jon Kleinberg

Journal

Proceedings of the AAAI Conference on Artificial Intelligence

Published Date

2024/3/24

Instant runoff voting (IRV) has recently gained popularity as an alternative to plurality voting for political elections, with advocates claiming a range of advantages, including that it produces more moderate winners than plurality and could thus help address polarization. However, there is little theoretical backing for this claim, with existing evidence focused on case studies and simulations. In this work, we prove that IRV has a moderating effect relative to plurality voting in a precise sense, developed in a 1-dimensional Euclidean model of voter preferences. We develop a theory of exclusion zones, derived from properties of the voter distribution, which serve to show how moderate and extreme candidates interact during IRV vote tabulation. The theory allows us to prove that if voters are symmetrically distributed and not too concentrated at the extremes, IRV cannot elect an extreme candidate over a moderate. In contrast, we show plurality can and validate our results computationally. Our methods provide new frameworks for the analysis of voting systems, deriving exact winner distributions geometrically and establishing a connection between plurality voting and stick-breaking processes.
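
To make the comparison in this abstract concrete, here is a toy simulation in a 1-dimensional Euclidean setting: each voter ranks candidates by distance, plurality picks the candidate with the most first-choice votes, and IRV repeatedly eliminates the candidate with the fewest. It is only a sketch under stated assumptions, not the paper's analysis: the voter grid, the candidate positions, the function names, and the tie-breaking (first candidate in list order) are all hypothetical choices made for illustration.

```python
def tally(voters, candidates):
    """Count first-choice votes: each voter backs the nearest remaining candidate."""
    counts = {c: 0 for c in candidates}
    for v in voters:
        nearest = min(candidates, key=lambda c: abs(v - c))
        counts[nearest] += 1
    return counts


def plurality_winner(voters, candidates):
    counts = tally(voters, candidates)
    return max(candidates, key=lambda c: counts[c])


def irv_winner(voters, candidates):
    remaining = list(candidates)
    while len(remaining) > 1:
        counts = tally(voters, remaining)
        # Eliminate the candidate with the fewest votes and recount.
        loser = min(remaining, key=lambda c: counts[c])
        remaining.remove(loser)
    return remaining[0]


# 1000 voters spread evenly over [0, 1]; candidate positions are hypothetical.
voters = [(i + 0.5) / 1000 for i in range(1000)]
candidates = [0.1, 0.5, 0.62, 0.92]

print("plurality:", plurality_winner(voters, candidates))
print("IRV:      ", irv_winner(voters, candidates))
```

In this particular configuration the three candidates on the right split the vote, so plurality selects the candidate at 0.1, while IRV transfers votes as candidates are eliminated and ends with the candidate at 0.5. A single example like this shows only that the two rules can disagree; the general statement is what the paper proves.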

Microstructures and Accuracy of Graph Recall by Large Language Models

Authors

Yanbang Wang, Hejie Cui, Jon Kleinberg

Journal

arXiv preprint arXiv:2402.11821

Published Date

2024/2/19

Graph data is crucial for many applications, and much of it exists in the relations described in textual format. As a result, being able to accurately recall and encode a graph described in earlier text is a basic yet pivotal ability that LLMs need to demonstrate if they are to perform reasoning tasks that involve graph-structured information. Human performance at graph recall has been studied by cognitive scientists for decades, and has been found to often exhibit certain structural patterns of bias that align with human handling of social relationships. To date, however, we know little about how LLMs behave in analogous graph recall tasks: do their recalled graphs also exhibit certain biased patterns, and if so, how do they compare with humans and affect other graph reasoning tasks? In this work, we perform the first systematic study of graph recall by LLMs, investigating the accuracy and biased microstructures (local structural patterns) in their recall. We find that LLMs not only underperform often in graph recall, but also tend to favor more triangles and alternating 2-paths. Moreover, we find that more advanced LLMs have a striking dependence on the domain that a real-world graph comes from -- by yielding the best recall accuracy when the graph is narrated in a language style consistent with its original domain.

Arbitrariness and Social Prediction: The Confounding Role of Variance in Fair Classification

Authors

A Feder Cooper, Katherine Lee, Madiha Zahrah Choksi, Solon Barocas, Christopher De Sa, James Grimmelmann, Jon Kleinberg, Siddhartha Sen, Baobao Zhang

Published Date

2024

Variance in predictions across different trained models is a significant, under-explored source of error in fair binary classification. In practice, the variance on some data examples is so large that decisions can be effectively arbitrary. To investigate this problem, we take an experimental approach and make four overarching contributions. We: 1) Define a metric called self-consistency, derived from variance, which we use as a proxy for measuring and reducing arbitrariness; 2) Develop an ensembling algorithm that abstains from classification when a prediction would be arbitrary; 3) Conduct the largest to-date empirical study of the role of variance (vis-a-vis self-consistency and arbitrariness) in fair binary classification; and, 4) Release a toolkit that makes the US Home Mortgage Disclosure Act (HMDA) datasets easily usable for future research. Altogether, our experiments reveal shocking insights about the reliability of conclusions on benchmark datasets. Most fair binary classification benchmarks are close-to-fair when taking into account the amount of arbitrariness present in predictions -- before we even try to apply any fairness interventions. This finding calls into question the practical utility of common algorithmic fairness methods, and in turn suggests that we should reconsider how we choose to measure fairness in binary classification.

On the Relationship Between Relevance and Conflict in Online Social Link Recommendations

Authors

Yanbang Wang, Jon Kleinberg

Published Date

2023

In an online social network, link recommendations are a way for users to discover relevant links to people they may know, thereby potentially increasing their engagement on the platform. However, the addition of links to a social network can also have an effect on the level of conflict in the network---expressed in terms of polarization and disagreement. To date, however, we have very little understanding of how these two implications of link formation relate to each other: are the goals of high relevance and conflict reduction aligned, or are the links that users are most likely to accept fundamentally different from the ones with the greatest potential for reducing conflict? Here we provide the first analysis of this question, using the recently popular Friedkin-Johnsen model of opinion dynamics. We first present a surprising result on how link additions shift the level of opinion conflict, followed by explanation work that relates the amount of shift to structural features of the added links. We then characterize the gap in conflict reduction between the set of links achieving the largest reduction and the set of links achieving the highest relevance. The gap is measured on real-world data, based on instantiations of relevance defined by 13 link recommendation algorithms. We find that some, but not all, of the more accurate algorithms actually lead to better reduction of conflict. Our work suggests that social links recommended for increasing user engagement may not be as conflict-provoking as people might have thought.
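
The Friedkin-Johnsen model named in this abstract can be sketched in a few lines. In its standard form, each node holds a fixed innate opinion and repeatedly sets its expressed opinion to the average of its innate opinion and its neighbors' expressed opinions. The code below assumes unit edge weights and measures conflict as polarization plus disagreement, a common convention that may not match the paper's exact definition; the function names, the toy path graph, and the opinions are hypothetical.

```python
def fj_equilibrium(innate, edges, iters=10_000, tol=1e-12):
    """Iterate the Friedkin-Johnsen update until expressed opinions stop changing."""
    n = len(innate)
    nbrs = [[] for _ in range(n)]
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    z = list(innate)
    for _ in range(iters):
        # Each node averages its innate opinion with its neighbors' expressed ones.
        new = [
            (innate[i] + sum(z[j] for j in nbrs[i])) / (1 + len(nbrs[i]))
            for i in range(n)
        ]
        converged = max(abs(a - b) for a, b in zip(new, z)) < tol
        z = new
        if converged:
            break
    return z


def conflict(z, edges):
    """Polarization (spread around the mean) plus disagreement (spread across edges)."""
    mean = sum(z) / len(z)
    polarization = sum((x - mean) ** 2 for x in z)
    disagreement = sum((z[u] - z[v]) ** 2 for u, v in edges)
    return polarization + disagreement


# Toy path graph 0-1-2-3 with hypothetical innate opinions.
innate = [0.0, 0.2, 0.8, 1.0]
path = [(0, 1), (1, 2), (2, 3)]
with_link = path + [(0, 3)]  # a candidate recommended link

print(conflict(fj_equilibrium(innate, path), path))
print(conflict(fj_equilibrium(innate, with_link), with_link))
```

Comparing the two printed values shows how a single added link shifts the conflict measure, which is the quantity a recommender would trade off against relevance in the setting the abstract describes.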

On the actionability of outcome prediction

Authors

Lydia T Liu, Solon Barocas, Jon Kleinberg, Karen Levy

Published Date

2024

Predicting future outcomes is a prevalent application of machine learning in social impact domains. Examples range from predicting student success in education to predicting disease risk in healthcare. Practitioners often recognize that the ultimate goal is not just to predict but to act effectively, and increasing empirical evidence suggests that relying on outcome predictions for downstream interventions may not lead to desired results. In most domains there exists a multitude of possible interventions for each individual, making the challenge of taking effective action more acute. Even when the causal mechanism connecting the individual’s latent states to outcomes is well understood, in any given instance (a specific student, or patient), practitioners still need to infer—from budgeted measurements of latent states—which of many possible interventions will be most effective for this individual. With this in mind, we ask: when are accurate predictors of outcomes helpful for identifying the most suitable intervention? Through a simple model encompassing actions, latent states, and measurements, we demonstrate that pure outcome prediction rarely results in the most effective policy for taking actions, even when combined with other measurements. We find that except in cases where there is a single decisive action for improving the outcome, outcome prediction never maximizes “action value”, the utility of taking actions. Making measurements of actionable latent states, where specific actions lead to desired outcomes, considerably enhances the action value compared to outcome prediction, and the degree of improvement depends on action costs and the …

Node-based generalized friendship paradox fails

Authors

Anna Evtushenko, Jon Kleinberg

Journal

Scientific Reports

Published Date

2023/2/6

The Friendship Paradox—the principle that “your friends have more friends than you do”—is a combinatorial fact about degrees in a graph; but given that many web-based social activities are correlated with a user’s degree, this fact has been taken more broadly to suggest the empirical principle that “your friends are also more active than you are.” This Generalized Friendship Paradox, the notion that any attribute positively correlated with degree obeys the Friendship Paradox, has been established mathematically in a network-level version that essentially aggregates uniformly over all the edges of a network. Here we show, however, that the natural node-based version of the Generalized Friendship Paradox—which aggregates over nodes, not edges—may fail, even for degree-attribute correlations approaching 1. Whether this version holds depends not only on degree-attribute correlations, but also on the …
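
The distinction between aggregating over edges and aggregating over nodes can be illustrated on a small hand-built graph. The sketch below uses one plausible reading of the two versions: the edge-based gap compares the attribute of a random edge endpoint with that of a random node, and the node-based gap averages, over nodes, the difference between the mean attribute of a node's friends and the node's own attribute. The function names, the graph (a 4-clique plus a disjoint 3-node path), and the attribute values are hypothetical and not from the paper.

```python
def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def edge_based_gap(adj, attr):
    """Mean attribute of a random edge endpoint minus mean attribute of a random node."""
    total_degree = sum(len(nbrs) for nbrs in adj.values())
    friend_avg = sum(len(adj[v]) * attr[v] for v in adj) / total_degree
    node_avg = sum(attr[v] for v in adj) / len(adj)
    return friend_avg - node_avg


def node_based_gap(adj, attr):
    """Average over nodes of (mean attribute of my friends) minus (my own attribute)."""
    gaps = [sum(attr[u] for u in adj[v]) / len(adj[v]) - attr[v] for v in adj]
    return sum(gaps) / len(gaps)


# Toy graph: a 4-clique (nodes 0-3) and a separate path 5 - 4 - 6.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (4, 5), (4, 6)]
adj = build_adjacency(edges)
# Hypothetical attribute, positively correlated with degree across the graph.
attr = {0: 10, 1: 10, 2: 10, 3: 10, 4: 0, 5: 1, 6: 1}

print("edge-based gap:", edge_based_gap(adj, attr))
print("node-based gap:", node_based_gap(adj, attr))
```

On this toy input the edge-based gap is positive while the node-based gap is negative, even though the attribute and degree are positively correlated, which matches the kind of failure the abstract describes.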

Dynamic Interventions for Networked Contagions

Authors

Marios Papachristou, Siddhartha Banerjee, Jon Kleinberg

Published Date

2023/4/30

We study the problem of designing dynamic intervention policies for minimizing cascading failures in online financial networks, as well as more general demand-supply networks. Formally, we consider a dynamic version of the celebrated Eisenberg-Noe model of financial network liabilities, and use this to study the design of external intervention policies. Our controller has a fixed resource budget in each round, and can use this to minimize the effect of demand/supply shocks in the network. We formulate the optimal intervention problem as a Markov Decision Process, and show how we can leverage the problem structure to efficiently compute optimal intervention policies with continuous interventions, and give approximation algorithms in the case of discrete interventions. Going beyond financial networks, we argue that our model captures dynamic network intervention in a much broader class of dynamic demand …
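
For readers unfamiliar with the Eisenberg-Noe model mentioned here, the static version can be sketched as a fixed-point computation: each institution pays the smaller of what it owes and what it has, where what it has is its external assets plus the payments it receives, and payments are split among creditors in proportion to liabilities. The code below covers only this static clearing step, not the dynamic model or the intervention policies from the paper; the function name, the three-node chain, and the asset values are hypothetical.

```python
def clearing_payments(liabilities, external, iters=1000, tol=1e-12):
    """Fixed-point iteration for total payments in a static Eisenberg-Noe network.

    liabilities[i][j] is the amount node i owes node j;
    external[i] is node i's assets from outside the network.
    """
    n = len(liabilities)
    owed = [sum(row) for row in liabilities]  # total obligations of each node
    payments = list(owed)                     # start by assuming everyone pays in full
    for _ in range(iters):
        new = []
        for i in range(n):
            # Node j pays its creditors in proportion to what it owes each of them.
            inflow = sum(
                liabilities[j][i] / owed[j] * payments[j]
                for j in range(n)
                if owed[j] > 0
            )
            new.append(min(owed[i], max(0.0, external[i] + inflow)))
        if max(abs(a - b) for a, b in zip(new, payments)) < tol:
            return new
        payments = new
    return payments


# Toy chain: node 0 owes 10 to node 1, node 1 owes 10 to node 2.
liabilities = [
    [0, 10, 0],
    [0, 0, 10],
    [0, 0, 0],
]
external = [4, 2, 0]  # hypothetical external assets

print(clearing_payments(liabilities, external))
```

In this chain the shortfall at node 0 propagates to node 1, which is the cascading-failure effect that an external intervention budget would aim to limit.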

Foundations of data science

Authors

Avrim Blum, John Hopcroft, Ravindran Kannan

Published Date

2020/1/23

This book provides an introduction to the mathematical and algorithmic foundations of data science, including machine learning, high-dimensional geometry, and analysis of large networks. Topics include the counterintuitive nature of data in high dimensions, important linear algebraic techniques such as singular value decomposition, the theory of random walks and Markov chains, the fundamentals of and important algorithms for machine learning, algorithms and analysis for clustering, probabilistic models for large networks, representation learning including topic modelling and non-negative matrix factorization, wavelets and compressed sensing. Important probabilistic techniques are developed including the law of large numbers, tail inequalities, analysis of random projections, generalization guarantees in machine learning, and moment methods for analysis of phase transitions in large random graphs. Additionally, important structural and complexity measures are discussed such as matrix norms and VC-dimension. This book is suitable for both undergraduate and graduate courses in the design and analysis of algorithms for data.

Strategic Evaluation

Authors

Joseph Sarkis, RP Sundarraj

Published Date

2001

Enterprise information technologies (EITs), which are strategic systems seeking to integrate the processes and databases of the entire organization and beyond, require a significant investment of money and human resources in return for the promise of a global business model and its associated far-reaching benefits. Their evaluation/justification must be completed with organizational goals and requirements included in the decision, or the organization could lose financially and competitively. Besides traditional financial models, eg, ROI (return on investment), that are primarily meant for short-term financial justification purposes, there is a paucity of methods for the evaluation of the strategic and intangible costs and benefits that EITs afford organizations as a whole. This article introduces the use of a robust quantitative technique called the analytical hierarchy process (AHP) that can integrate a diverse range of …

Arbitrariness and Prediction: The Confounding Role of Variance in Fair Classification

Authors

A Feder Cooper, Katherine Lee, Madiha Choksi, Solon Barocas, Christopher De Sa, James Grimmelmann, Jon Kleinberg, Siddhartha Sen, Baobao Zhang

Journal

Proceedings of the 38th AAAI Conference on Artificial Intelligence

Published Date

2023/1/31

Variance in predictions across different trained models is a significant, under-explored source of error in fair binary classification. In practice, the variance on some data examples is so large that decisions can be effectively arbitrary. To investigate this problem, we take an experimental approach and make four overarching contributions. We: 1) Define a metric called self-consistency, derived from variance, which we use as a proxy for measuring and reducing arbitrariness; 2) Develop an ensembling algorithm that abstains from classification when a prediction would be arbitrary; 3) Conduct the largest to-date empirical study of the role of variance (vis-a-vis self-consistency and arbitrariness) in fair binary classification; and, 4) Release a toolkit that makes the US Home Mortgage Disclosure Act (HMDA) datasets easily usable for future research. Altogether, our experiments reveal shocking insights about the reliability of conclusions on benchmark datasets. Most fair binary classification benchmarks are close-to-fair when taking into account the amount of arbitrariness present in predictions -- before we even try to apply any fairness interventions. This finding calls into question the practical utility of common algorithmic fairness methods, and in turn suggests that we should reconsider how we choose to measure fairness in binary classification.

Moderation in instant runoff voting

Authors

Kiran Tomlinson, Johan Ugander, Jon Kleinberg

Journal

arXiv preprint arXiv:2303.09734

Published Date

2023/3/17

Instant runoff voting (IRV) has gained popularity in recent years as an alternative to traditional plurality voting. Advocates of IRV claim that one of its benefits relative to plurality voting is its tendency toward moderation: that it produces more moderate winners than plurality and could therefore be a useful tool for addressing polarization. However, there is little theoretical backing for this claim, and existing evidence has focused on simulations and case studies. In this work, we prove that IRV has a moderating effect relative to traditional plurality voting in a specific sense, developed in a 1-dimensional Euclidean model of voter preferences. Our results show that as long as voters are symmetrically distributed and not too concentrated at the extremes, IRV will not elect a candidate that is beyond a certain threshold in the tails of the distribution, while plurality can. For the uniform distribution, we provide an approach for deriving the exact distributions of the plurality and IRV winner positions, enabling further analysis. We also extend a classical analysis of so-called stick-breaking processes to derive the asymptotic winning plurality vote share, which we use to prove that plurality can elect arbitrarily extreme candidates even when there are many moderate options.

Fine-Tuning Games: Bargaining and Adaptation for General-Purpose Models

Authors

Benjamin Laufer, Jon Kleinberg, Hoda Heidari

Journal

arXiv preprint arXiv:2308.04399

Published Date

2023/8/8

Major advances in Machine Learning (ML) and Artificial Intelligence (AI) increasingly take the form of developing and releasing general-purpose models. These models are designed to be adapted by other businesses and agencies to perform a particular, domain-specific function. This process has become known as adaptation or fine-tuning. This paper offers a model of the fine-tuning process where a Generalist brings the technological product (here an ML model) to a certain level of performance, and one or more Domain-specialist(s) adapts it for use in a particular domain. Both entities are profit-seeking and incur costs when they invest in the technology, and they must reach a bargaining agreement on how to share the revenue for the technology to reach the market. For a relatively general class of cost and revenue functions, we characterize the conditions under which the fine-tuning game yields a profit-sharing solution. We observe that any potential domain-specialization will either contribute, free-ride, or abstain in their uptake of the technology, and we provide conditions yielding these different strategies. We show how methods based on bargaining solutions and sub-game perfect equilibria provide insights into the strategic behavior of firms in these types of interactions, and we find that profit-sharing can still arise even when one firm has significantly higher costs than another. We also provide methods for identifying Pareto-optimal bargaining arrangements for a general set of utility functions.

Informational Diversity and Affinity Bias in Team Growth Dynamics

Authors

Hoda Heidari, Solon Barocas, Jon Kleinberg, Karen Levy

Published Date

2023

Prior work has provided strong evidence that, within organizational settings, teams that bring a diversity of information and perspectives to a task are more effective than teams that do not. If this form of informational diversity confers performance advantages, why do we often see largely homogeneous teams in practice? One canonical argument is that the benefits of informational diversity are in tension with affinity bias. To better understand the impact of this tension on the makeup of teams, we analyze a sequential model of team formation in which individuals care about their team’s performance (captured in terms of accurately predicting some future outcome based on a set of features) but experience a cost as a result of interacting with teammates who use different approaches to the prediction task. Our analysis of this simple model reveals a set of subtle behaviors that team-growth dynamics can exhibit: (i) from …

Combinatorial characterizations and impossibilities for higher-order homophily

Authors

Nate Veldt, Austin R Benson, Jon Kleinberg

Journal

Science Advances

Published Date

2023/1/6

Homophily is the seemingly ubiquitous tendency for people to connect and interact with other individuals who are similar to them. This is a well-documented principle and is fundamental for how society organizes. Although many social interactions occur in groups, homophily has traditionally been measured using a graph model, which only accounts for pairwise interactions involving two individuals. Here, we develop a framework using hypergraphs to quantify homophily from group interactions. This reveals natural patterns of group homophily that appear with gender in scientific collaboration and political affiliation in legislative bill cosponsorship and also reveals distinctive gender distributions in group photographs, all of which cannot be fully captured by pairwise measures. At the same time, we show that seemingly natural ways to define group homophily are combinatorially impossible. This reveals important …

Optimizing the order of actions in a model of contact tracing

Authors

Michela Meister, Jon Kleinberg

Journal

PNAS Nexus

Published Date

2023/3

Contact tracing is a key tool for managing epidemic diseases like HIV, tuberculosis, COVID-19, and monkeypox. Manual investigations by human-contact tracers remain a dominant way in which this is carried out. This process is limited by the number of contact tracers available, who are often overburdened during an outbreak or epidemic. As a result, a crucial decision in any contact tracing strategy is, given a set of contacts, which person should a tracer trace next? In this work, we develop a formal model that articulates these questions and provides a framework for comparing contact tracing strategies. Through analyzing our model, we give provably optimal prioritization policies via a clean connection to a tool from operations research called a “branching bandit”. Examining these policies gives qualitative insight into trade-offs in contact tracing applications.

Use large language models to promote equity

Authors

Emma Pierson, Divya Shanmugam, Rajiv Movva, Jon Kleinberg, Monica Agrawal, Mark Dredze, Kadija Ferryman, Judy Wawira Gichoya, Dan Jurafsky, Pang Wei Koh, Karen Levy, Sendhil Mullainathan, Ziad Obermeyer, Harini Suresh, Keyon Vafa

Journal

arXiv preprint arXiv:2312.14804

Published Date

2023/12/22

Advances in large language models (LLMs) have driven an explosion of interest about their societal impacts. Much of the discourse around how they will impact social equity has been cautionary or negative, focusing on questions like "how might LLMs be biased and how would we mitigate those biases?" This is a vital discussion: the ways in which AI generally, and LLMs specifically, can entrench biases have been well-documented. But equally vital, and much less discussed, is the more opportunity-focused counterpoint: "what promising applications do LLMs enable that could promote equity?" If LLMs are to enable a more equitable world, it is not enough just to play defense against their biases and failure modes. We must also go on offense, applying them positively to equity-enhancing use cases to increase opportunities for underserved groups and reduce societal discrimination. There are many choices which determine the impact of AI, and a fundamental choice very early in the pipeline is the problems we choose to apply it to. If we focus only later in the pipeline -- making LLMs marginally more fair as they facilitate use cases which intrinsically entrench power -- we will miss an important opportunity to guide them to equitable impacts. Here, we highlight the emerging potential of LLMs to promote equity by presenting four newly possible, promising research directions, while keeping risks and cautionary points in clear view.

Reconciling the accuracy-diversity trade-off in recommendations

Authors

Kenny Peng,Manish Raghavan,Emma Pierson,Jon Kleinberg,Nikhil Garg

Journal

arXiv preprint arXiv:2307.15142

Published Date

2023/7/27

In recommendation settings, there is an apparent trade-off between the goals of accuracy (to recommend items a user is most likely to want) and diversity (to recommend items representing a range of categories). As such, real-world recommender systems often explicitly incorporate diversity separately from accuracy. This approach, however, leaves a basic question unanswered: Why is there a trade-off in the first place? We show how the trade-off can be explained via a user's consumption constraints -- users typically only consume a few of the items they are recommended. In a stylized model we introduce, objectives that account for this constraint induce diverse recommendations, while objectives that do not account for this constraint induce homogeneous recommendations. This suggests that accuracy and diversity appear misaligned because standard accuracy metrics do not consider consumption constraints. Our model yields precise and interpretable characterizations of diversity in different settings, giving practical insights into the design of diverse recommendations.
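
As a rough illustration of the consumption-constraint argument, the toy model below is an assumption made for this example and not the paper's stylized model: a user has exactly one favourite category (the probabilities in `P_LIKES` are hypothetical), a slate holds two items, and we compare an objective that counts every liked item with one that only asks whether the user finds something to consume.

```python
from itertools import combinations_with_replacement

# Hypothetical probabilities that the user's single favourite category is A or B.
P_LIKES = {"A": 0.7, "B": 0.3}

def expected_liked(slate):
    """Objective ignoring the consumption constraint: expected number of liked items."""
    return sum(P_LIKES[category] for category in slate)

def prob_some_liked(slate):
    """Objective for a user who consumes one item: chance the slate has a liked item."""
    return sum(p for category, p in P_LIKES.items() if category in slate)

# All unordered slates of two items drawn from the two categories.
slates = list(combinations_with_replacement("AB", 2))
best_unconstrained = max(slates, key=expected_liked)
best_constrained = max(slates, key=prob_some_liked)
```

In this toy setting the unconstrained objective picks the homogeneous slate `("A", "A")`, while the consume-one objective picks the mixed slate `("A", "B")`.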

Content Moderation and the Formation of Online Communities: A Theoretical Framework

Authors

Cynthia Dwork,Chris Hays,Jon Kleinberg,Manish Raghavan

Journal

arXiv preprint arXiv:2310.10573

Published Date

2023/10/16

We study the impact of content moderation policies in online communities. In our theoretical model, a platform chooses a content moderation policy and individuals choose whether or not to participate in the community according to the fraction of user content that aligns with their preferences. The effects of content moderation, at first blush, might seem obvious: it restricts speech on a platform. However, when user participation decisions are taken into account, its effects can be more subtle – and counter-intuitive. For example, our model can straightforwardly demonstrate how moderation policies may increase participation and diversify content available on the platform. In our analysis, we explore a rich set of interconnected phenomena related to content moderation in online communities. We first characterize the effectiveness of a natural class of moderation policies for creating and sustaining stable communities. Building on this, we explore how resource-limited or ideological platforms might set policies, how communities are affected by differing levels of personalization, and competition between platforms. Our model provides a vocabulary and mathematically tractable framework for analyzing platform decisions about content moderation.

Augmented Sparsifiers for Generalized Hypergraph Cuts

Authors

Nate Veldt,Austin R Benson,Jon Kleinberg

Journal

Journal of Machine Learning Research

Published Date

2023

Hypergraph generalizations of many graph cut problems and algorithms have recently been introduced to better model data and systems characterized by multiway relationships. Recent work in machine learning and theoretical computer science uses a generalized cut function for a hypergraph that associates each hyperedge $e$ with a splitting function ${\bf w}_e$, which assigns a penalty to each way of separating the nodes of $e$. When each ${\bf w}_e$ satisfies ${\bf w}_e(S) = g(\lvert S\rvert)$ for some concave function $g$, previous work has shown how to reduce the generalized hypergraph cut problem to a directed graph cut problem, although the resulting graph may be very dense. We introduce a framework for sparsifying hypergraph-to-graph reductions, where the hypergraph cut function is $(1+\varepsilon)$-approximated by a cut on a directed graph. For $\varepsilon > 0$ we need at most $O(\varepsilon^{-1}\lvert e\rvert \log \lvert e\rvert)$ edges to reduce any hyperedge $e$, while only $O(\lvert e\rvert \varepsilon^{-1/2} \log\log \frac{1}{\varepsilon})$ edges are needed to approximate the clique expansion, a widely used heuristic in hypergraph clustering. Our framework leads to improved results for solving cut problems in co-occurrence graphs, decomposable submodular function minimization problems, and localized hypergraph clustering problems.
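
To make the notion of a cardinality-based splitting function concrete, here is a minimal sketch that evaluates a generalized hypergraph cut as a sum of per-hyperedge penalties. The function names, the toy hypergraph, and the convention of passing the hyperedge size to the penalty function are assumptions made for illustration; the two penalties shown (all-or-nothing and clique expansion) are common choices, not the paper's sparsification method.

```python
def all_or_nothing(k, size):
    """Penalty 1 whenever the hyperedge is split at all, 0 otherwise."""
    return 1 if 0 < k < size else 0

def clique_expansion(k, size):
    """Number of pairs separated when the hyperedge is replaced by a clique."""
    return k * (size - k)

def generalized_cut(hyperedges, S, penalty):
    """Sum the penalty of each hyperedge, based on how many of its nodes lie in S."""
    total = 0
    for e in hyperedges:
        k = len(e & S)  # nodes of e on the S side of the cut
        total += penalty(k, len(e))
    return total

# Toy hypergraph (hypothetical) and a candidate node set S.
H = [{1, 2, 3, 4}, {3, 4, 5}]
S = {1, 2, 3}
aon_cut = generalized_cut(H, S, all_or_nothing)
clique_cut = generalized_cut(H, S, clique_expansion)
```

Here the first hyperedge is split 3 versus 1 and the second 1 versus 2, so the all-or-nothing cut is 2 and the clique-expansion cut is 3 + 2 = 5.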

Private Blotto: Viewpoint Competition with Polarized Agents

Authors

Kate Donahue,Jon Kleinberg

Journal

arXiv preprint arXiv:2302.14123

Published Date

2023/2/27

Colonel Blotto games are one of the oldest settings in game theory, originally proposed over a century ago in Borel 1921. However, they were originally designed to model two centrally-controlled armies competing over zero-sum "fronts", a specific scenario with limited modern-day application. In this work, we propose and study Private Blotto games, a variant connected to crowdsourcing and social media. One key difference in Private Blotto is that individual agents act independently, without being coordinated by a central "Colonel". This model naturally arises from scenarios such as activist groups competing over multiple issues, partisan fund-raisers competing over elections in multiple states, or politically-biased social media users labeling news articles as misinformation. In this work, we completely characterize the Nash Stability of the Private Blotto game. Specifically, we show that the outcome function has a critical impact on the outcome of the game: we study whether a front is won by majority rule (median outcome) or a smoother outcome taking into account all agents (mean outcome). We study how this impacts the amount of "misallocated effort", or agents whose choices do not influence the final outcome. In general, mean outcome ensures that, if a stable arrangement exists, agents are close to evenly spaced across fronts, minimizing misallocated effort. However, mean outcome functions also have chaotic patterns as to when stable arrangements do and do not exist. For median outcome, we exactly characterize when a stable arrangement exists, but show that this outcome function frequently results in extremely unbalanced allocation of …

Ballot length in instant runoff voting

Authors

Kiran Tomlinson,Johan Ugander,Jon Kleinberg

Journal

Proceedings of the AAAI Conference on Artificial Intelligence

Published Date

2023/6/26

Instant runoff voting (IRV) is an increasingly-popular alternative to traditional plurality voting in which voters submit rankings over the candidates rather than single votes. In practice, elections using IRV often restrict the ballot length, the number of candidates a voter is allowed to rank on their ballot. We theoretically and empirically analyze how ballot length can influence the outcome of an election, given fixed voter preferences. We show that there exist preference profiles over k candidates such that up to k-1 different candidates win at different ballot lengths. We derive exact lower bounds on the number of voters required for such profiles and provide a construction matching the lower bound for unrestricted voter preferences. Additionally, we characterize which sequences of winners are possible over ballot lengths and provide explicit profile constructions achieving any feasible winner sequence. We also examine how classic preference restrictions influence our results; for instance, single-peakedness makes k-1 different winners impossible but still allows at least Ω(√k). Finally, we analyze a collection of 168 real-world elections, where we truncate rankings to simulate shorter ballots. We find that shorter ballots could have changed the outcome in one quarter of these elections. Our results highlight ballot length as a consequential degree of freedom in the design of IRV elections.
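
The effect of ballot length can be seen in a small sketch of IRV with truncated rankings. The profile, the function name `irv_winner`, and the alphabetical tie-breaking rule are assumptions chosen for illustration and are not taken from the paper.

```python
from collections import Counter

def irv_winner(ballots, ballot_length=None):
    """Run instant runoff voting after truncating each ranking to ballot_length."""
    candidates = {c for ballot in ballots for c in ballot}
    if ballot_length is not None:
        ballots = [ballot[:ballot_length] for ballot in ballots]
    remaining = set(candidates)
    while len(remaining) > 1:
        # Each ballot counts for its highest-ranked candidate still in the race;
        # ballots with no remaining candidate are exhausted and ignored.
        tally = Counter({c: 0 for c in remaining})
        for ballot in ballots:
            top = next((c for c in ballot if c in remaining), None)
            if top is not None:
                tally[top] += 1
        active = sum(tally.values())
        leader = max(sorted(remaining), key=lambda c: tally[c])
        if tally[leader] * 2 > active:  # strict majority of active ballots
            return leader
        # Eliminate the candidate with the fewest votes (ties broken alphabetically).
        loser = min(sorted(remaining), key=lambda c: tally[c])
        remaining.remove(loser)
    return next(iter(remaining))

# Hypothetical nine-voter profile over candidates A, B, C.
ballots = [["A", "B", "C"]] * 4 + [["B", "C", "A"]] * 3 + [["C", "B", "A"]] * 2
winner_len1 = irv_winner(ballots, ballot_length=1)
winner_len2 = irv_winner(ballots, ballot_length=2)
```

With ballots of length 1 the two C-first ballots are exhausted once C is eliminated, so A wins 4 to 3; with length 2 those ballots transfer to B, who wins 5 to 4.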

The challenge of understanding what users want: Inconsistent preferences and engagement optimization

Authors

Jon Kleinberg,Sendhil Mullainathan,Manish Raghavan

Journal

Management Science

Published Date

2023/11/7

Online platforms have a wealth of data, run countless experiments, and use industrial-scale algorithms to optimize user experience. Despite this, many users seem to regret the time they spend on these platforms. One possible explanation is that incentives are misaligned: platforms are not optimizing for user happiness. We suggest the problem runs deeper, transcending the specific incentives of any particular platform, and instead stems from a mistaken foundational assumption. To understand what users want, platforms look at what users do. This is a kind of revealed-preference assumption that is ubiquitous in the way user models are built. Yet research has demonstrated, and personal experience affirms, that we often make choices in the moment that are inconsistent with what we actually want. The behavioral economics and psychology literatures suggest, for example, that we can choose mindlessly or that we …

Designing Skill-Compatible AI: Methodologies and Frameworks in Chess

Authors

Karim Hamade,Reid McIlroy-Young,Siddhartha Sen,Jon Kleinberg,Ashton Anderson

Published Date

2023/10/13

Powerful artificial intelligence systems are often used in settings where they must interact with agents that are computationally much weaker, for example when they work alongside humans or operate in complex environments where some tasks are handled by algorithms, heuristics, or other entities of varying computational power. For AI agents to successfully interact in these settings, however, achieving superhuman performance alone is not sufficient; they also need to account for suboptimal actions or idiosyncratic style from their less-skilled counterparts. We propose a formal evaluation framework for assessing the compatibility of near-optimal AI with interaction partners who may have much lower levels of skill; we use popular collaborative chess variants as model systems to study and develop AI agents that can successfully interact with lower-skill entities. Traditional chess engines designed to output near-optimal moves prove to be inadequate partners when paired with engines of various lower skill levels in this domain, as they are not designed to consider the presence of other agents. We contribute three methodologies to explicitly create skill-compatible AI agents in complex decision-making settings, and two chess game frameworks designed to foster collaboration between powerful AI agents and less-skilled partners. On these frameworks, our agents outperform state-of-the-art chess AI (based on AlphaZero) despite being weaker in conventional chess, demonstrating that skill-compatibility is a tangible trait that is qualitatively and measurably distinct from raw performance. Our evaluations further explore and clarify the mechanisms by …

Calibrated recommendations for users with decaying attention

Authors

Jon Kleinberg,Emily Ryu,Éva Tardos

Journal

arXiv preprint arXiv:2302.03239

Published Date

2023/2/7

Recommendation systems capable of providing diverse sets of results are a focus of increasing importance, with motivations ranging from fairness to novelty and other aspects of optimizing user experience. One form of diversity of recent interest is calibration, the notion that personalized recommendations should reflect the full distribution of a user's interests, rather than a single predominant category -- for instance, a user who mainly reads entertainment news but also wants to keep up with news on the environment and the economy would prefer to see a mixture of these genres, not solely entertainment news. Existing work has formulated calibration as a subset selection problem; this line of work observes that the formulation requires the unrealistic assumption that all recommended items receive equal consideration from the user, but leaves as an open question the more realistic setting in which user attention decays as they move down the list of results. In this paper, we consider calibration with decaying user attention under two different models. In both models, there is a set of underlying genres that items can belong to. In the first setting, where items are represented by fine-grained mixtures of genre percentages, we provide a $(1-1/e)$-approximation algorithm by extending techniques for constrained submodular optimization. In the second setting, where items are coarsely binned into a single genre each, we surpass the $(1-1/e)$ barrier imposed by submodular maximization and give a $2/3$-approximate greedy algorithm. Our work thus addresses the problem of capturing ordering effects due to decaying attention, allowing for the extension of …

Human bias in algorithm design

Authors

Carey K Morewedge,Sendhil Mullainathan,Haaya F Naushan,Cass R Sunstein,Jon Kleinberg,Manish Raghavan,Jens O Ludwig

Journal

Nature Human Behaviour

Published Date

2023/11

Algorithms are designed to learn user preferences by observing user behaviour. This causes algorithms to fail to reflect user preferences when psychological biases affect user decision making. For algorithms to enhance social welfare, algorithm design needs to be psychologically informed.

Fairness in model-sharing games

Authors

Kate Donahue,Jon Kleinberg

Published Date

2023/4/30

In many real-world situations, data is distributed across multiple self-interested agents. These agents can collaborate to build a machine learning model based on data from multiple agents, potentially reducing the error each experiences. However, sharing models in this way raises questions of fairness: to what extent can the error experienced by one agent be significantly lower than the error experienced by another agent in the same coalition? In this work, we consider two notions of fairness that each may be appropriate in different circumstances: egalitarian fairness (which aims to bound how dissimilar error rates can be) and proportional fairness (which aims to reward players for contributing more data). We similarly consider two common methods of model aggregation, one where a single model is created for all agents (uniform), and one where an individualized model is created for each agent. For egalitarian …

The inversion problem: Why algorithms should infer mental state and not just predict behavior

Authors

Jon Kleinberg,Jens Ludwig,Sendhil Mullainathan,Manish Raghavan

Journal

Perspectives on Psychological Science

Published Date

2023/10/6

More and more machine learning is applied to human behavior. Increasingly these algorithms suffer from a hidden—but serious—problem. It arises because they often predict one thing while hoping for another. Take a recommender system: It predicts clicks but hopes to identify preferences. Or take an algorithm that automates a radiologist: It predicts in-the-moment diagnoses while hoping to identify their reflective judgments. Psychology shows us the gaps between the objectives of such prediction tasks and the goals we hope to achieve: People can click mindlessly; experts can get tired and make systematic errors. We argue such situations are ubiquitous and call them “inversion problems”: The real goal requires understanding a mental state that is not directly measured in behavioral data but must instead be inverted from the behavior. Identifying and solving these problems require new tools that draw on both …

Mimetic models: Ethical implications of AI that acts like you

Authors

Reid McIlroy-Young,Jon Kleinberg,Siddhartha Sen,Solon Barocas,Ashton Anderson

Published Date

2022/7/26

An emerging theme in artificial intelligence research is the creation of models to simulate the decisions and behavior of specific people, in domains including game-playing, text generation, and artistic expression. These models go beyond earlier approaches in the way they are tailored to individuals, and the way they are designed for interaction rather than simply the reproduction of fixed, pre-computed behaviors. We refer to these as mimetic models, and in this paper we develop a framework for characterizing the ethical and social issues raised by their growing availability. Our framework includes a number of distinct scenarios for the use of such models, and considers the impacts on a range of different participants, including the target being modeled, the operator who deploys the model, and the entities that interact with it.

Supervised hypergraph reconstruction

Authors

Yanbang Wang,Jon Kleinberg

Journal

arXiv preprint arXiv:2211.13343

Published Date

2022/11/23

We study an issue commonly seen with graph data analysis: many real-world complex systems involving high-order interactions are best encoded by hypergraphs; however, their datasets often end up being published or studied only in the form of their projections (with dyadic edges). To understand this issue, we first establish a theoretical framework to characterize this issue's implications and worst-case scenarios. The analysis motivates our formulation of the new task, supervised hypergraph reconstruction: reconstructing a real-world hypergraph from its projected graph, with the help of some existing knowledge of the application domain. To reconstruct hypergraph data, we start by analyzing hyperedge distributions in the projection, based on which we create a framework containing two modules: (1) to handle the enormous search space of potential hyperedges, we design a sampling strategy with efficacy guarantees that significantly narrows the space to a smaller set of candidates; (2) to identify hyperedges from the candidates, we further design a hyperedge classifier in two well-working variants that capture structural features in the projection. Extensive experiments validate our claims, approach, and extensions. Remarkably, our approach outperforms all baselines by an order of magnitude in accuracy on hard datasets. Our code and data can be downloaded from bit.ly/SHyRe.
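
A short sketch of why projection loses information: the function below replaces each hyperedge with all pairwise (dyadic) edges among its nodes, and two different hypergraphs end up with the same projected graph. The function name and the toy inputs are assumptions for illustration only and are unrelated to the SHyRe code.

```python
from itertools import combinations

def project(hyperedges):
    """Project a hypergraph onto a graph by connecting every pair within a hyperedge."""
    edges = set()
    for hyperedge in hyperedges:
        for u, v in combinations(sorted(hyperedge), 2):
            edges.add((u, v))
    return edges

# One three-way interaction versus three separate pairwise interactions.
one_triple = [{1, 2, 3}]
three_pairs = [{1, 2}, {2, 3}, {1, 3}]
same_projection = project(one_triple) == project(three_pairs)
```

Both inputs project to the same triangle, so the projected graph alone cannot tell them apart.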

Hypergraph cuts with general splitting functions

Authors

Nate Veldt,Austin R Benson,Jon Kleinberg

Journal

SIAM Review

Published Date

2022

The minimum $s$-$t$ cut problem in graphs is one of the most fundamental problems in combinatorial optimization, and graph cuts underlie algorithms throughout discrete mathematics, theoretical computer science, operations research, and data science. While graphs are a standard model for pairwise relationships, hypergraphs provide the flexibility to model multiway relationships and are now a standard model for complex data and systems. However, when generalizing from graphs to hypergraphs, the notion of a “cut hyperedge” is less clear, as a hyperedge's nodes can be split in several ways. Here, we develop a framework for hypergraph cuts by considering the problem of separating two terminal nodes in a hypergraph in a way that minimizes a sum of penalties at split hyperedges. In our setup, different ways of splitting the same hyperedge have different penalties, and the penalty is encoded by what we call a splitting function …

On the effect of triadic closure on network segregation

Authors

Rediet Abebe,Nicole Immorlica,Jon Kleinberg,Brendan Lucier,Ali Shirali

Published Date

2022/7/12

The tendency for individuals to form social ties with others who are similar to themselves, known as homophily, is one of the most robust sociological principles. Since this phenomenon can lead to patterns of interactions that segregate people along different demographic dimensions, it can also lead to inequalities in access to information, resources, and opportunities. As we consider potential interventions that might alleviate the effects of segregation, we face the challenge that homophily constitutes a pervasive and organic force that is difficult to push back against. Designing effective interventions can therefore benefit from identifying counterbalancing social processes that might be harnessed to work in opposition to segregation. In this work, we show that triadic closure---another common phenomenon that posits that individuals with a mutual connection are more likely to be connected to one another---can be one …

Containing the spread of a contagion on a tree

Authors

Michela Meister,Jon Kleinberg

Journal

arXiv preprint arXiv:2210.13247

Published Date

2022/10/24

Contact tracing can be thought of as a race between two processes: an infection process and a tracing process. In this paper, we study a simple model of infection spreading on a tree, and a tracer who stabilizes one node at a time. We focus on the question, how should the tracer choose nodes to stabilize so as to prevent the infection from spreading further? We study simple policies, which prioritize nodes based on time, infectiousness, or probability of generating new contacts.

Four years of FAccT: A reflexive, mixed-methods analysis of research contributions, shortcomings, and future prospects

Authors

Benjamin Laufer,Sameer Jain,A Feder Cooper,Jon Kleinberg,Hoda Heidari

Published Date

2022/6/21

Fairness, Accountability, and Transparency (FAccT) for socio-technical systems has been a thriving area of research in recent years. An ACM conference bearing the same name has been the central venue for scholars in this area to come together, provide peer feedback to one another, and publish their work. This reflexive study aims to shed light on FAccT’s activities to date and identify major gaps and opportunities for translating contributions into broader positive impact. To this end, we utilize a mixed-methods research design. On the qualitative front, we develop a protocol for reviewing and coding prior FAccT papers, tracing their distribution of topics, methods, datasets, and disciplinary roots. We also design and administer a questionnaire to reflect the voices of FAccT community members and affiliates on a wide range of topics. On the quantitative front, we use the full text and citation network associated with …

Exporting Geography Into A Virtual Landscape: A Global Pandemic Locally Discussed

Authors

Katherine Van Koevering,Yiquan Hong,Jon Kleinberg

Journal

arXiv preprint arXiv:2210.07187

Published Date

2022/10/13

The COVID-19 pandemic has been a global health crisis playing out in the age of social media. Even though the virtual environment makes interaction possible regardless of physical location, many of the most pressing issues during the pandemic -- case counts, lockdown policies, vaccine availability -- have played out in an intensely local fashion. Reflecting this locality, many of the online COVID communities that formed have been closely tied to physical location, at different spatial scales ranging from cities to countries to entire global platforms. This provides an opportunity to study how the real-world geography of the pandemic translates into a virtual landscape. By analyzing almost 300 geographically-linked COVID discussion communities on Reddit, we show how these discussions were organized geographically and temporally in three aspects: what were people talking about, who were they talking about it with, and how did they self-organize these conversations?

Allocating stimulus checks in times of crisis

Authors

Marios Papachristou,Jon Kleinberg

Published Date

2022/4/25

We study the problem of financial assistance (bailouts, stimulus payments, or subsidy allocations) in a network where individuals experience income shocks. These questions are pervasive both in policy domains and in the design of new Web-enabled forms of financial interaction. We build on the financial clearing framework of Eisenberg and Noe that allows the incorporation of a bailout policy that is based on discrete bailouts motivated by stimulus programs in both off-line and on-line settings. We show that optimally allocating such bailouts on a financial network in order to maximize a variety of social welfare objectives of this form is a computationally intractable problem. We develop approximation algorithms to optimize these objectives and establish guarantees for their approximation ratios. Then, we incorporate multiple fairness constraints in the optimization problems and study their boundedness. Finally, we …

Learning models of individual behavior in chess

Authors

Reid McIlroy-Young,Russell Wang,Siddhartha Sen,Jon Kleinberg,Ashton Anderson

Published Date

2022/8/14

AI systems that can capture human-like behavior are becoming increasingly useful in situations where humans may want to learn from these systems, collaborate with them, or engage with them as partners for an extended duration. In order to develop human-oriented AI systems, the problem of predicting human actions---as opposed to predicting optimal actions---has received considerable attention. Existing work has focused on capturing human behavior in an aggregate sense, which potentially limits the benefit any particular individual could gain from interaction with these systems. We extend this line of work by developing highly accurate predictive models of individual human behavior in chess. Chess is a rich domain for exploring human-AI interaction because it combines a unique set of properties: AI systems achieved superhuman performance many years ago, and yet humans still interact with them closely …

Measuring the completeness of economic models

Authors

Drew Fudenberg,Jon Kleinberg,Annie Liang,Sendhil Mullainathan

Journal

Journal of Political Economy

Published Date

2021/1/18

Economic models are evaluated by testing the correctness of their predictions. We suggest an additional measure, “completeness”: the fraction of the predictable variation in the data that the model captures. We calculate the completeness of prominent models in three problems from experimental economics: assigning certainty equivalents to lotteries, predicting initial play in games, and predicting human generation of random sequences. The completeness measure reveals new insights about these models, including how much room there is for improving their predictions.
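
One way to read the completeness measure is as a ratio of error reductions: how much the model improves on a naive baseline, relative to how much the best achievable predictor improves on that baseline. The sketch below assumes this reading; the function name, argument names, and error values are hypothetical.

```python
def completeness(err_naive, err_model, err_best):
    """Fraction of the achievable error reduction that the model attains.

    err_naive: prediction error of a naive baseline
    err_model: prediction error of the economic model
    err_best:  (estimated) lowest achievable prediction error
    """
    achievable = err_naive - err_best
    if achievable <= 0:
        raise ValueError("the baseline must be worse than the best achievable error")
    return (err_naive - err_model) / achievable

# Hypothetical errors: the model closes 6 of the 8 units of predictable error.
example = completeness(err_naive=10.0, err_model=4.0, err_best=2.0)
```

With these made-up numbers the model captures 0.75 of the predictable variation, leaving a quarter of the achievable improvement on the table.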

Core-periphery models for hypergraphs

Authors

Marios Papachristou,Jon Kleinberg

Published Date

2022/8/14

We introduce a random hypergraph model for core-periphery structure. By leveraging our model's sufficient statistics, we develop a novel statistical inference algorithm that is able to scale to large hypergraphs with runtime that is practically linear with respect to the number of nodes in the graph after a preprocessing step that is almost linear in the number of hyperedges, as well as a scalable sampling algorithm. Our inference algorithm is capable of learning embeddings that correspond to the reputation (rank) of a node within the hypergraph. We also give theoretical bounds on the size of the core of hypergraphs generated by our model. We experiment with hypergraph data that range to $\sim 10^5$ hyperedges mined from the Microsoft Academic Graph, Stack Exchange, and GitHub and show that our model outperforms baselines with respect to producing good fits.

Learning to reason with neural networks: Generalization, unseen data and boolean measures

Authors

Emmanuel Abbe,Samy Bengio,Elisabetta Cornacchia,Jon Kleinberg,Aryo Lotfi,Maithra Raghu,Chiyuan Zhang

Journal

Advances in Neural Information Processing Systems

Published Date

2022/12/6

This paper considers the Pointer Value Retrieval (PVR) benchmark introduced in [ZRKB21], where a 'reasoning' function acts on a string of digits to produce the label. More generally, the paper considers the learning of logical functions with gradient descent (GD) on neural networks. It is first shown that in order to learn logical functions with gradient descent on symmetric neural networks, the generalization error can be lower-bounded in terms of the noise-stability of the target function, supporting a conjecture made in [ZRKB21]. It is then shown that in the distribution shift setting, when the data withholding corresponds to freezing a single feature (referred to as canonical holdout), the generalization error of gradient descent admits a tight characterization in terms of the Boolean influence for several relevant architectures. This is shown on linear models and supported experimentally on other models such as MLPs and Transformers. In particular, this puts forward the hypothesis that for such architectures and for learning logical functions such as PVR functions, GD tends to have an implicit bias towards low-degree representations, which in turn gives the Boolean influence for the generalization error under quadratic loss.

Ordered submodularity and its applications to diversifying recommendations

Authors

Jon Kleinberg,Emily Ryu,Éva Tardos

Journal

arXiv preprint arXiv:2203.00233

Published Date

2022/3/1

A fundamental task underlying many important optimization problems, from influence maximization to sensor placement to content recommendation, is to select the optimal group of items from a larger set. Submodularity has been very effective in allowing approximation algorithms for such subset selection problems. However, in several applications, we are interested not only in the elements of a set, but also the order in which they appear, breaking the assumption that all selected items receive equal consideration. One such category of applications involves the presentation of search results, product recommendations, news articles, and other content, due to the well-documented phenomenon that humans pay greater attention to higher-ranked items. As a result, optimization in content presentation for diversity, user coverage, calibration, or other objectives more accurately represents a sequence selection problem, to which traditional submodularity approximation results no longer apply. Although extensions of submodularity to sequences have been proposed, none is designed to model settings where items contribute based on their position in a ranked list, and hence they are not able to express these types of optimization problems. In this paper, we aim to address this modeling gap. Here, we propose a new formalism of ordered submodularity that captures these ordering problems in content presentation, and more generally a category of optimization problems over ranked sequences in which different list positions contribute differently to the objective function. We analyze the natural ordered analogue of the greedy algorithm and show that it …

Understanding and Measuring Income Shocks as Precursors to Poverty

Authors

Rediet Abebe,Jon Kleinberg,Andrew Wang

Published Date

2021

Poverty and economic hardship are multi-faceted and dynamic phenomena impacting over 50 million people in the United States and billions of people world-wide. Despite the prevalence of poverty, there remains much to be understood about what makes families susceptible to experiencing economic distress and what interventions may be effective and for which families. An important set of questions is related to the role of income shocks. Shocks may constitute unexpected expenses such as a medical bill or a parking ticket or interruptions to one’s income flow, such as a delayed paycheck or loss of public benefits. Recently these phenomena have garnered increased attention, with a growing body of empirical and computational work showing their impact on various measures of socioeconomic welfare. We present a computational study of a large survey-based longitudinal data-set to understand the role of …

Algorithmic monoculture and social welfare

Authors

Jon Kleinberg,Manish Raghavan

Journal

Proceedings of the National Academy of Sciences

Published Date

2021/6/1

As algorithms are increasingly applied to screen applicants for high-stakes decisions in employment, lending, and other domains, concerns have been raised about the effects of algorithmic monoculture, in which many decision-makers all rely on the same algorithm. This concern invokes analogies to agriculture, where a monocultural system runs the risk of severe harm from unexpected shocks. Here, we show that the dangers of algorithmic monoculture run much deeper, in that monocultural convergence on a single algorithm by a group of decision-making agents, even when the algorithm is more accurate for any one agent in isolation, can reduce the overall quality of the decisions being made by the full collection of agents. Unexpected shocks are therefore not needed to expose the risks of monoculture; it can hurt accuracy even under “normal” operations and even for algorithms that are more accurate when …

Polarization in geometric opinion dynamics

Authors

Jason Gaitonde,Jon Kleinberg,Éva Tardos

Published Date

2021/7/18

In light of increasing recent attention to political polarization, understanding how polarization can arise poses an important theoretical question. While more classical models of opinion dynamics seem poorly equipped to study this phenomenon, a recent novel approach by Hązła, Jin, Mossel, and Ramnarayan (HJMR) proposes a simple geometric model of opinion evolution that provably exhibits strong polarization in specialized cases. Moreover, polarization arises quite organically in their model: in each time step, each agent updates opinions according to their correlation/response with an issue drawn at random. However, their techniques do not seem to extend beyond a set of special cases they identify, which benefit from fragile symmetry or contractiveness assumptions, leaving open how general this phenomenon really is. In this paper, we further the study of polarization in related geometric models. We show …

Stochastic model for sunk cost bias

Authors

Jon Kleinberg,Sigal Oren,Manish Raghavan,Nadav Sklar

Published Date

2021/12/1

We present a novel model for capturing the behavior of an agent exhibiting sunk-cost bias in a stochastic environment. Agents exhibiting sunk-cost bias take into account the effort they have already spent on an endeavor when they evaluate whether to continue or abandon it. We model planning tasks in which an agent with this type of bias tries to reach a designated goal. Our model structures this problem as a type of Markov decision process: loosely speaking, the agent traverses a directed acyclic graph with probabilistic transitions, paying costs for its actions as it tries to reach a target node containing a specified reward. The agent’s sunk cost bias is modeled by a cost that it incurs for abandoning the traversal: if the agent decides to stop traversing the graph, it incurs a cost of $\lambda \cdot C_{sunk}$, where $\lambda \geq 0$ is a parameter that captures the extent of the bias and $C_{sunk}$ is the sum of costs already invested. We analyze the behavior of two types of agents: naive agents that are unaware of their bias, and sophisticated agents that are aware of it. Since optimal (bias-free) behavior in this problem can involve abandoning the traversal before reaching the goal, the bias exhibited by these types of agents can result in sub-optimal behavior by shifting their decisions about abandonment. We show that in contrast to optimal agents, it is computationally hard to compute the optimal policy for a sophisticated agent. Our main results quantify the loss exhibited by these two types of agents with respect to an optimal agent. We present both general and topology-specific bounds.

Model-sharing games: Analyzing federated learning under voluntary participation

Authors

Kate Donahue,Jon Kleinberg

Journal

AAAI 2021

Published Date

2020/10/2

Federated learning is a setting where agents, each with access to their own data source, combine models learned from local data to create a global model. If agents are drawing their data from different distributions, though, federated learning might produce a biased global model that is not optimal for each agent. This means that agents face a fundamental question: should they join the global model or stay with their local model? In this work, we show how this situation can be naturally analyzed through the framework of coalitional game theory. Motivated by these considerations, we propose the following game: there are heterogeneous players with different model parameters governing their data distribution and different amounts of data they have noisily drawn from their own distribution. Each player's goal is to obtain a model with minimal expected mean squared error (MSE) on their own distribution. They have a choice of fitting a model based solely on their own data, or combining their learned parameters with those of some subset of the other players. Combining models reduces the variance component of their error through access to more data, but increases the bias because of the heterogeneity of distributions. In this work, we derive exact expected MSE values for problems in linear regression and mean estimation. We use these values to analyze the resulting game in the framework of hedonic game theory; we study how players might divide into coalitions, where each set of players within a coalition jointly constructs a single model. In a case with arbitrarily many players that each have either a "small" or "large" amount of data, we …

The generalized mean densest subgraph problem

Authors

Nate Veldt,Austin R Benson,Jon Kleinberg

Published Date

2021/8/14

Finding dense subgraphs of a large graph is a standard problem in graph mining that has been studied extensively both for its theoretical richness and its many practical applications. In this paper we introduce a new family of dense subgraph objectives, parameterized by a single parameter p, based on computing generalized means of degree sequences of a subgraph. Our objective captures both the standard densest subgraph problem and the maximum k-core as special cases, and provides a way to interpolate between and extrapolate beyond these two objectives when searching for other notions of dense subgraphs. In terms of algorithmic contributions, we first show that our objective can be minimized in polynomial time for all p ≥ 1 using repeated submodular minimization. A major contribution of our work is analyzing the performance of different types of peeling algorithms for dense subgraphs both in theory …
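To make the idea of a generalized-mean density concrete, here is a minimal sketch that scores a node set by the p-mean of its induced degrees and runs a simple min-degree peeling heuristic. The formula used (the power mean of the induced degree sequence), the restriction to p ≥ 1, and the function names are assumptions made for illustration; the peeling rule is the standard min-degree heuristic, not necessarily any of the variants analyzed in the paper.

```python
def induced_degrees(adj, nodes):
    """Degree of each node counting only neighbors inside `nodes`."""
    return {v: sum(1 for u in adj[v] if u in nodes) for v in nodes}

def p_mean_density(adj, nodes, p):
    """Power mean (exponent p >= 1) of the induced degree sequence."""
    degs = list(induced_degrees(adj, nodes).values())
    return (sum(d ** p for d in degs) / len(degs)) ** (1.0 / p)

def greedy_peel(adj, p):
    """Repeatedly delete a minimum-degree node; keep the best set seen."""
    nodes = set(adj)
    best_set, best_val = set(nodes), p_mean_density(adj, nodes, p)
    while len(nodes) > 1:
        degs = induced_degrees(adj, nodes)
        # Break ties by node id so the run is deterministic.
        nodes.remove(min(degs, key=lambda v: (degs[v], v)))
        val = p_mean_density(adj, nodes, p)
        if val > best_val:
            best_set, best_val = set(nodes), val
    return best_set, best_val

# Hypothetical graph: a 4-clique on {0, 1, 2, 3} with a path 3-4-5 attached.
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3},
       3: {0, 1, 2, 4}, 4: {3, 5}, 5: {4}}
best_set, best_val = greedy_peel(adj, p=1)
print(best_set, best_val)  # {0, 1, 2, 3} 3.0
```

With p = 1 the score is the average induced degree, so the sketch reduces to the familiar peeling heuristic for the standard densest subgraph objective; larger p weights high-degree nodes more heavily.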

Hypergraph ego-networks and their temporal evolution

Authors

Cazamere Comrie,Jon Kleinberg

Published Date

2021/12/7

Interactions involving multiple objects simultaneously are ubiquitous across many domains. The systems these interactions inhabit can be modelled using hypergraphs, a generalization of traditional graphs in which each edge can connect any number of nodes. Analyzing the global and static properties of these hypergraphs has led to a plethora of novel findings regarding how these modelled systems are structured. However, less is known about the localized structure of these systems and how they evolve over time. In this paper, we propose the study of hypergraph ego-networks, a structure that can be used to model higher-order interactions involving a single node. We also propose the temporal reconstruction of hypergraph ego-networks as a benchmark problem for models that aim to predict the local temporal structure of hypergraphs. By combining a deep learning binary classifier with a hill-climbing algorithm …

Optimal stopping with behaviorally biased agents: The role of loss aversion and changing reference points

Authors

Jon Kleinberg,Robert Kleinberg,Sigal Oren

Published Date

2021/7/18

One of the central human biases studied in behavioral economics is reference dependence - people's tendency to evaluate an outcome not in absolute terms but instead relative to a reference point that reflects some notion of the status quo [4]. Reference dependence interacts closely with a related behavioral bias, loss aversion, in which people weigh losses more strongly than gains of comparable absolute values. Taken together, these two effects produce a fundamental behavioral regularity in human choices: once a reference point has been established, people tend to avoid outcomes in which they experience a loss relative to the reference point. A well-known instance of the effect is the empirical evidence that individual investors will tend to avoid selling a stock unless it has exceeded the price at which they purchased it. In more complex examples, the reference may shift while an agent is making a decision …

Planted hitting set recovery in hypergraphs

Authors

Ilya Amburg,Jon Kleinberg,Austin R Benson

Journal

Journal of Physics: Complexity

Published Date

2021/5/5

In various application areas, networked data is collected by measuring interactions involving some specific set of core nodes. This results in a network dataset containing the core nodes along with a potentially much larger set of fringe nodes that all have at least one interaction with a core node. In many settings, this type of data arises for structures that are richer than graphs, because they involve the interactions of larger sets; for example, the core nodes might be a set of individuals under surveillance, where we observe the attendees of meetings involving at least one of the core individuals. We model such scenarios using hypergraphs, and we study the problem of core recovery: if we observe the hypergraph but not the labels of core and fringe nodes, can we recover the 'planted' set of core nodes in the hypergraph? We provide a theoretical framework for analyzing the recovery of such a set of core nodes and use …

Pointer value retrieval: A new benchmark for understanding the limits of neural network generalization

Authors

Chiyuan Zhang,Maithra Raghu,Jon Kleinberg,Samy Bengio

Journal

arXiv preprint arXiv:2107.12580

Published Date

2021/7/27

Central to the success of artificial neural networks is their ability to generalize. But does neural network generalization primarily rely on seeing highly similar training examples (memorization)? Or are neural networks capable of human-intelligence styled reasoning, and if so, to what extent? These remain fundamental open questions on artificial neural networks. In this paper, as steps towards answering these questions, we introduce a new benchmark, Pointer Value Retrieval (PVR) to study the limits of neural network reasoning. The PVR suite of tasks is based on reasoning about indirection, a hallmark of human intelligence, where a first stage (task) contains instructions for solving a second stage (task). In PVR, this is done by having one part of the task input act as a pointer, giving instructions on a different input location, which forms the output. We show this simple rule can be applied to create a diverse set of tasks across different input modalities and configurations. Importantly, this use of indirection enables systematically varying task difficulty through distribution shifts and increasing functional complexity. We conduct a detailed empirical study of different PVR tasks, discovering large variations in performance across dataset sizes, neural network architectures and task complexity. Further, by incorporating distribution shift and increased functional complexity, we develop nuanced tests for reasoning, revealing subtle failures and surprising successes, suggesting many promising directions of exploration on this benchmark.
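The pointer-then-value rule described in this abstract can be sketched on a toy digit sequence. The encoding below (first digit is the pointer, the remaining digits are the values it indexes) and the helper names `pvr_label` and `make_example` are assumptions chosen for illustration; the benchmark's actual task definitions may differ in detail.

```python
import random

def pvr_label(digits):
    """Treat digits[0] as a pointer into the remaining digits and return that value."""
    pointer, values = digits[0], digits[1:]
    return values[pointer]

def make_example(rng, n_values=10):
    """Sample one (input, label) pair with a random pointer and random digit values."""
    pointer = rng.randrange(n_values)
    values = [rng.randrange(10) for _ in range(n_values)]
    digits = [pointer] + values
    return digits, pvr_label(digits)

print(pvr_label([2, 7, 1, 9, 4]))  # 9: the pointer 2 selects the third value

rng = random.Random(0)
digits, label = make_example(rng)
```

A distribution shift of the kind the abstract mentions could be simulated in this toy setup by, for example, sampling pointers from a restricted range at training time and the full range at test time.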

On modeling human perceptions of allocation policies with uncertain outcomes

Authors

Hoda Heidari,Solon Barocas,Jon Kleinberg,Karen Levy

Published Date

2021

Many policies allocate harms or benefits that are uncertain in nature: they produce distributions over the population in which individuals have different probabilities of incurring harm or benefit. Comparing different policies thus involves a comparison of their corresponding probability distributions, and we observe that in many instances the policies selected in practice are hard to explain by preferences based only on the expected value of the total harm or benefit they produce. In cases where the expected value analysis is not a sufficient explanatory framework, what would be a reasonable model for societal preferences over these distributions? Here we investigate explanations based on the framework of probability weighting from the behavioral sciences, which over several decades has identified systematic biases in how people perceive probabilities. We show that probability weighting can be used to make …

Detecting individual decision-making style: Exploring behavioral stylometry in chess

Authors

Reid McIlroy-Young,Yu Wang,Siddhartha Sen,Jon Kleinberg,Ashton Anderson

Journal

Advances in Neural Information Processing Systems

Published Date

2021/12/6

The advent of machine learning models that surpass human decision-making ability in complex domains has initiated a movement towards building AI systems that interact with humans. Many building blocks are essential for this activity, with a central one being the algorithmic characterization of human behavior. While much of the existing work focuses on aggregate human behavior, an important long-range goal is to develop behavioral models that specialize to individual people and can differentiate among them. To formalize this process, we study the problem of behavioral stylometry, in which the task is to identify a decision-maker from their decisions alone. We present a transformer-based approach to behavioral stylometry in the context of chess, where one attempts to identify the player who played a set of games. Our method operates in a few-shot classification framework, and can correctly identify a player from among thousands of candidate players with 98% accuracy given only 100 labeled games. Even when trained on amateur play, our method generalises to out-of-distribution samples of Grandmaster players, despite the dramatic differences between amateur and world-class players. Finally, we consider more broadly what our resulting embeddings reveal about human style in chess, as well as the potential ethical implications of powerful methods for identifying individuals from behavioral data.

Random graphs with prescribed k-core sequences: A new null model for network analysis

Authors

Katherine Van Koevering,Austin Benson,Jon Kleinberg

Published Date

2021/4/19

In the analysis of large-scale network data, a fundamental operation is the comparison of observed phenomena to the predictions provided by null models: when we find an interesting structure in a family of real networks, it is important to ask whether this structure is also likely to arise in random networks with similar characteristics to the real ones. A long-standing challenge in network analysis has been the relative scarcity of reasonable null models for networks; arguably the most common such model has been the configuration model, which starts with a graph G and produces a random graph with the same node degrees as G. This leads to a very weak form of null model, since fixing the node degrees does not preserve many of the crucial properties of the network, including the structure of its subgraphs. Guided by this challenge, we establish a new family of network null models that operate on the k-core …

Using a cross-task grid of linear probes to interpret CNN model predictions on retinal images

Authors

Katy Blumer,Subhashini Venugopalan,Michael P Brenner,Jon Kleinberg

Journal

arXiv preprint arXiv:2107.11468

Published Date

2021/7/23

We analyze a dataset of retinal images using linear probes: linear regression models trained on some "target" task, using embeddings from a deep convolutional (CNN) model trained on some "source" task as input. We use this method across all possible pairings of 93 tasks in the UK Biobank dataset of retinal images, leading to ~164k different models. We analyze the performance of these linear probes by source and target task and by layer depth. We observe that representations from the middle layers of the network are more generalizable. We find that some target tasks are easily predicted irrespective of the source task, and that some other target tasks are more accurately predicted from correlated source tasks than from embeddings trained on the same task.
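A linear probe in the sense described above is just a linear regression fitted on frozen embeddings. The sketch below shows the mechanics on synthetic data with NumPy; the embedding dimensions, the synthetic target, and the helper names `fit_linear_probe` and `probe_r2` are all assumptions for illustration and do not reflect the paper's data or pipeline.

```python
import numpy as np

def _with_bias(embeddings):
    """Append a constant column so the probe can learn an intercept."""
    return np.hstack([embeddings, np.ones((embeddings.shape[0], 1))])

def fit_linear_probe(embeddings, targets):
    """Least-squares fit of targets on embeddings; the embeddings stay fixed."""
    weights, *_ = np.linalg.lstsq(_with_bias(embeddings), targets, rcond=None)
    return weights

def probe_r2(weights, embeddings, targets):
    """Coefficient of determination of the probe's predictions."""
    residual = targets - _with_bias(embeddings) @ weights
    return 1.0 - residual.var() / targets.var()

# Synthetic stand-in for "embeddings from a source-task model".
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 8))
true_w = np.arange(1.0, 9.0)
targets = embeddings @ true_w + 0.5   # hypothetical target that is exactly linear

weights = fit_linear_probe(embeddings, targets)
score = probe_r2(weights, embeddings, targets)
```

In a cross-task grid, one such probe would be fitted for each (source task, target task, layer) combination and the scores compared across the grid; in practice the score would be computed on held-out data rather than the training set used here.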

Integrating explanation and prediction in computational social science

Authors

Jake M Hofman,Duncan J Watts,Susan Athey,Filiz Garip,Thomas L Griffiths,Jon Kleinberg,Helen Margetts,Sendhil Mullainathan,Matthew J Salganik,Simine Vazire,Alessandro Vespignani,Tal Yarkoni

Published Date

2021/7/8

Computational social science is more than just large repositories of digital data and the computational methods needed to construct and analyse them. It also represents a convergence of different fields with different ways of thinking about and doing science. The goal of this Perspective is to provide some clarity around how these approaches differ from one another and to propose how they might be productively integrated. Towards this end we make two contributions. The first is a schema for thinking about research activities along two dimensions—the extent to which work is explanatory, focusing on identifying and estimating causal effects, and the degree of consideration given to testing predictions of outcomes—and how these two priorities can complement, rather than compete with, one another. Our second contribution is to advocate that computational social scientists devote more attention to combining …

Approximate decomposable submodular function minimization for cardinality-based components

Authors

Nate Veldt,Austin R Benson,Jon Kleinberg

Journal

Advances in Neural Information Processing Systems

Published Date

2021/12/6

Minimizing a sum of simple submodular functions of limited support is a special case of general submodular function minimization that has seen numerous applications in machine learning. We develop faster techniques for instances where components in the sum are cardinality-based, meaning they depend only on the size of the input set. This variant is one of the most widely applied in practice, encompassing, e.g., common energy functions arising in image segmentation and recent generalized hypergraph cut functions. We develop the first approximation algorithms for this problem, where the approximations can be quickly computed via reduction to a sparse graph cut problem, with graph sparsity controlled by the desired approximation factor. Our method relies on a new connection between sparse graph reduction techniques and piecewise linear approximations to concave functions. Our sparse reduction technique leads to significant improvements in theoretical runtimes, as well as substantial practical gains in problems ranging from benchmark image segmentation tasks to hypergraph clustering problems.

Allocating opportunities in a dynamic model of intergenerational mobility

Authors

Hoda Heidari,Jon Kleinberg

Published Date

2021/3/3

Opportunities such as higher education can promote intergenerational mobility, leading individuals to achieve levels of socioeconomic status above that of their parents. We develop a dynamic model for allocating such opportunities in a society that exhibits bottlenecks in mobility; the problem of optimal allocation reflects a trade-off between the benefits conferred by the opportunities in the current generation and the potential to elevate the socioeconomic status of recipients, shaping the composition of future generations in ways that can benefit further from the opportunities. We show how optimal allocations in our model arise as solutions to continuous optimization problems over multiple generations, and we find in general that these optimal solutions can favor recipients of low socioeconomic status over slightly higher-performing individuals of high socioeconomic status --- a form of socioeconomic affirmative action …

The paradox of second-order homophily in networks

Authors

Anna Evtushenko,Jon Kleinberg

Journal

Scientific Reports

Published Date

2021/6/25

Homophily—the tendency of nodes to connect to others of the same type—is a central issue in the study of networks. Here we take a local view of homophily, defining notions of first-order homophily of a node (its individual tendency to link to similar others) and second-order homophily of a node (the aggregate first-order homophily of its neighbors). Through this view, we find a surprising result for homophily values that applies with only minimal assumptions on the graph topology. It can be phrased most simply as “in a graph of red and blue nodes, red friends of red nodes are on average more homophilous than red friends of blue nodes”. This gap in averages defies simple intuitive explanations, applies to globally heterophilous and homophilous networks and is reminiscent of but structurally distinct from the Friendship Paradox. The existence of this gap suggests intrinsic biases in homophily measurements between …
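The two local measures named in this abstract can be sketched on a small colored graph. The concrete definitions below (first-order homophily as the fraction of same-colored neighbors, second-order homophily as the mean of the neighbors' first-order values) are one plausible reading of the abstract and are assumptions; the graph and the function names are hypothetical.

```python
def first_order(adj, color, v):
    """Fraction of v's neighbors that share v's color (assumes v has neighbors)."""
    nbrs = adj[v]
    return sum(color[u] == color[v] for u in nbrs) / len(nbrs)

def second_order(adj, color, v):
    """Mean first-order homophily over v's neighbors."""
    nbrs = adj[v]
    return sum(first_order(adj, color, u) for u in nbrs) / len(nbrs)

# Hypothetical path graph: r1 - r2 - b1, with two red nodes and one blue node.
adj = {"r1": ["r2"], "r2": ["r1", "b1"], "b1": ["r2"]}
color = {"r1": "red", "r2": "red", "b1": "blue"}

print(first_order(adj, color, "r1"))   # 1.0: its only neighbor is red
print(first_order(adj, color, "r2"))   # 0.5: one red and one blue neighbor
print(second_order(adj, color, "r2"))  # 0.5: mean of 1.0 (r1) and 0.0 (b1)
```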

Opinion dynamics optimization by varying susceptibility to persuasion via non-convex local search

Authors

Rediet Abebe,T-H HUBERT Chan,Jon Kleinberg,Zhibin Liang,David Parkes,Mauro Sozio,Charalampos E Tsourakakis

Journal

ACM Transactions on Knowledge Discovery from Data (TKDD)

Published Date

2021/7/21

A long line of work in social psychology has studied variations in people’s susceptibility to persuasion—the extent to which they are willing to modify their opinions on a topic. This body of literature suggests an interesting perspective on theoretical models of opinion formation by interacting parties in a network: in addition to considering interventions that directly modify people’s intrinsic opinions, it is also natural to consider interventions that modify people’s susceptibility to persuasion. In this work, motivated by this fact, we propose an influence optimization problem. Specifically, we adopt a popular model for social opinion dynamics, where each agent has some fixed innate opinion, and a resistance that measures the importance it places on its innate opinion; agents influence one another’s opinions through an iterative process. Under certain conditions, this iterative process converges to some equilibrium opinion …
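The kind of iterative opinion process described here can be sketched as follows: each agent repeatedly mixes its fixed innate opinion with the average of its neighbors' current opinions, with the resistance value controlling the mix. The specific update rule (a Friedkin-Johnsen-style weighted average), the fixed iteration count, and the name `equilibrium_opinions` are assumptions for illustration rather than the paper's exact model.

```python
def equilibrium_opinions(innate, resistance, neighbors, iters=200):
    """Iterate the averaging dynamics; returns approximate equilibrium opinions."""
    z = list(innate)
    for _ in range(iters):
        # Synchronous update: every agent reads the previous round's opinions.
        z = [
            resistance[i] * innate[i]
            + (1 - resistance[i]) * sum(z[j] for j in neighbors[i]) / len(neighbors[i])
            for i in range(len(z))
        ]
    return z

# Hypothetical two-agent network in which each agent listens to the other.
innate = [0.0, 1.0]
neighbors = [[1], [0]]

baseline = equilibrium_opinions(innate, [0.5, 0.5], neighbors)
# Intervening on susceptibility: make agent 1 fully resistant to persuasion.
stubborn = equilibrium_opinions(innate, [0.5, 1.0], neighbors)
print(sum(baseline), sum(stubborn))
```

In this toy instance the baseline equilibrium is (1/3, 2/3), while making agent 1 fully resistant moves it to (1/2, 1), so changing resistance alone raises the total opinion; an optimization problem of the kind the abstract proposes would search over such resistance settings.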

Optimality and stability in federated learning: A game-theoretic approach

Authors

Kate Donahue,Jon Kleinberg

Journal

NeurIPS 2021

Published Date

2021/6/17

Federated learning is a distributed learning paradigm where multiple agents, each only with access to local data, jointly learn a global model. There has recently been an explosion of research aiming not only to improve the accuracy rates of federated learning, but also provide certain guarantees around social good properties such as total error. One branch of this research has taken a game-theoretic approach, and in particular, prior work has viewed federated learning as a hedonic game, where error-minimizing players arrange themselves into federating coalitions. This past work proves the existence of stable coalition partitions, but leaves open a wide range of questions, including how far from optimal these stable solutions are. In this work, we motivate and define a notion of optimality given by the average error rates among federating agents (players). First, we provide and prove the correctness of an efficient algorithm to calculate an optimal (error minimizing) arrangement of players. Next, we analyze the relationship between the stability and optimality of an arrangement. First, we show that for some regions of parameter space, all stable arrangements are optimal (Price of Anarchy equal to 1). However, we show this is not true for all settings: there exist examples of stable arrangements with higher cost than optimal (Price of Anarchy greater than 1). Finally, we give the first constant-factor bound on the performance gap between stability and optimality, proving that the total error of the worst stable solution can be no higher than 9 times the total error of an optimal solution (Price of Anarchy bound of 9).

Roles for computing in social change

Authors

Rediet Abebe,Solon Barocas,Jon Kleinberg,Karen Levy,Manish Raghavan,David G Robinson

Published Date

2020/1/27

A recent normative turn in computer science has brought concerns about fairness, bias, and accountability to the core of the field. Yet recent scholarship has warned that much of this technical work treats problematic features of the status quo as fixed, and fails to address deeper patterns of injustice and inequality. While acknowledging these critiques, we posit that computational research has valuable roles to play in addressing social problems --- roles whose value can be recognized even from a perspective that aspires toward fundamental social change. In this paper, we articulate four such roles, through an analysis that considers the opportunities as well as the significant risks inherent in such work. Computing research can serve as a diagnostic, helping us to understand and measure social problems with precision and clarity. As a formalizer, computing shapes how social problems are explicitly defined …

Minimizing localized ratio cut objectives in hypergraphs

Authors

Nate Veldt,Austin R Benson,Jon Kleinberg

Published Date

2020/8/23

Hypergraphs are a useful abstraction for modeling multiway relationships in data, and hypergraph clustering is the task of detecting groups of closely related nodes in such data. Graph clustering has been studied extensively, and there are numerous methods for detecting small, localized clusters without having to explore an entire input graph. However, there are only a few specialized approaches for localized clustering in hypergraphs. Here we present a framework for local hypergraph clustering based on minimizing localized ratio cut objectives. Our framework takes an input set of reference nodes in a hypergraph and solves a sequence of hypergraph minimum s-t cut problems in order to identify a nearby well-connected cluster of nodes that overlaps substantially with the input set. Our methods extend graph-based techniques but are significantly more general and have new output quality guarantees. First, our …

An economic perspective on algorithmic fairness

Authors

Ashesh Rambachan,Jon Kleinberg,Jens Ludwig,Sendhil Mullainathan

Journal

AEA Papers and Proceedings

Published Date

2020/5/1

There are widespread concerns that the growing use of machine learning algorithms in important decisions may reproduce and reinforce existing discrimination against legally protected groups. Most of the attention to date on issues of “algorithmic bias” or “algorithmic fairness” has come from computer scientists and machine learning researchers. We argue that concerns about algorithmic fairness are at least as much about questions of how discrimination manifests itself in data, decision-making under uncertainty, and optimal regulation. To fully answer these questions, an economic framework is necessary--and as a result, economists have much to contribute.

Fairness and utilization in allocating resources with uncertain demand

Authors

Kate Donahue,Jon Kleinberg

Published Date

2020/1/27

Resource allocation problems are a fundamental domain in which to evaluate the fairness properties of algorithms. The trade-offs between fairness and utilization have a long history in this domain. A recent line of work has considered fairness questions for resource allocation when the demands for the resource are distributed across multiple groups and drawn from probability distributions. In such cases, a natural fairness requirement is that individuals from different groups should have (approximately) equal probabilities of receiving the resource. A largely open question in this area has been to bound the gap between the maximum possible utilization of the resource and the maximum possible utilization subject to this fairness condition. Here, we obtain some of the first provable upper bounds on this gap. We obtain an upper bound for arbitrary distributions, as well as much stronger upper bounds for specific …

Aligning superhuman AI with human behavior: Chess as a model system

Authors

Reid McIlroy-Young,Siddhartha Sen,Jon Kleinberg,Ashton Anderson

Published Date

2020/8/23

As artificial intelligence becomes increasingly intelligent---in some cases, achieving superhuman performance---there is growing potential for humans to learn from and collaborate with algorithms. However, the ways in which AI systems approach problems are often different from the ways people do, and thus may be uninterpretable and hard to learn from. A crucial step in bridging this gap between human and artificial intelligence is modeling the granular actions that constitute human behavior, rather than simply matching aggregate human performance. We pursue this goal in a model system with a long history in artificial intelligence: chess. The aggregate performance of a chess player unfolds as they make decisions over the course of a game. The hundreds of millions of games played online by players at every skill level form a rich source of data in which these decisions, and their exact context, are recorded in …

Frozen binomials on the web: Word ordering and language conventions in online text

Authors

Katherine Van Koevering,Austin R Benson,Jon Kleinberg

Published Date

2020/4/20

There is inherent information captured in the order in which we write words in a list. The ordering of binomials — lists of two words separated by ‘and’ or ‘or’ — has been studied for more than a century. These binomials are common across many areas of speech, in both formal and informal text. In the last century, numerous explanations have been given to describe what order people use for these binomials, from differences in semantics to differences in phonology. These rules describe primarily ‘frozen’ binomials that exist in exactly one ordering and have lacked large-scale trials to determine efficacy. Text in online social media such as Reddit provides a unique opportunity to study these lists in the context of informal text at a very large scale. In this work, we expand the view of binomials to include a large-scale analysis of both frozen and non-frozen binomials in a quantitative way. Using this data, we then …

Algorithmic classification and strategic effort

Authors

Jon Kleinberg,Manish Raghavan

Journal

ACM SIGecom Exchanges

Published Date

2020/12/2

In this letter, we summarize our recent work examining the incentives produced by algorithmic decision-making. Drawing upon principal-agent models in the mechanism design literature, we construct and analyze a model of strategic behavior under algorithmic evaluation. We characterize which behaviors can be incentivized by any reasonable mechanism, showing that simple linear mechanisms are sufficient to incentivize desired behavior. However, we find that it is computationally hard to optimize even simple objectives subject to the constraint that the resulting linear mechanism induces the desired incentives.

Network-aware product rollout in online social networks

Published Date

2018/4/3

In one embodiment, a method includes accessing a social graph comprising a plurality of nodes and a plurality of edges connecting the nodes, each node corresponding to a user of an online social network, identifying a plurality of clusters in the social graph using graph clustering, providing a treatment to a first set of users based on the clusters, and determining a treatment effect of the treatment for the users in the first set based on a network exposure to the treatment for each user.

Designing Evaluation Rules That Are Robust to Strategic Behavior

Authors

Jon Kleinberg,Manish Raghavan

Journal

Proceedings of the AAAI Conference on Artificial Intelligence

Published Date

2020/4/3

Machine learning is often used to produce decision-making rules that classify or evaluate individuals. When these individuals have incentives to be classified a certain way, they may behave strategically to influence their outcomes. We develop a model for how strategic agents can invest effort to change the outcomes they receive, and we give a tight characterization of when such agents can be incentivized to invest specified forms of effort into improving their outcomes as opposed to “gaming” the classifier. We show that whenever any “reasonable” mechanism can do so, a simple linear mechanism suffices. This work is based on “How Do Classifiers Induce Agents To Invest Effort Strategically?” published in Economics and Computation 2019 (Kleinberg and Raghavan 2019).

Algorithms as discrimination detectors

Authors

Jon Kleinberg,Jens Ludwig,Sendhil Mullainathan,Cass R Sunstein

Journal

Proceedings of the National Academy of Sciences

Published Date

2020/12/1

Preventing discrimination requires that we have means of detecting it, and this can be enormously difficult when human beings are making the underlying decisions. As applied today, algorithms can increase the risk of discrimination. But as we argue here, algorithms by their nature require a far greater level of specificity than is usually possible with human decision making, and this specificity makes it possible to probe aspects of the decision in additional ways. With the right changes to legal and regulatory systems, algorithms can thus potentially make it easier to detect—and hence to help prevent—discrimination.

Learning personalized models of human behavior in chess

Authors

Reid McIlroy-Young,Russell Wang,Siddhartha Sen,Jon Kleinberg,Ashton Anderson

Journal

arXiv preprint arXiv:2008.10086

Published Date

2020/8

Even when machine learning systems surpass human ability in a domain, there are many reasons why AI systems that capture human-like behavior would be desirable: humans may want to learn from them, they may need to collaborate with them, or they may expect them to serve as partners in an extended interaction. Motivated by this goal of human-like AI systems, the problem of predicting human actions—as opposed to predicting optimal actions—has become an increasingly useful task. We extend this line of work by developing highly accurate personalized models of human behavior in the context of chess. Chess is a rich domain for exploring these questions, since it combines a set of appealing features: AI systems have achieved superhuman performance but still interact closely with human chess players both as opponents and preparation tools, and there is an enormous amount of recorded data on individual players. Starting with an open-source version of AlphaZero trained on a population of human players, we demonstrate that we can significantly improve prediction of a particular player’s moves by applying a series of fine-tuning adjustments. Furthermore, we can accurately perform stylometry—predicting who made a given set of moves—indicating that our personalized models capture human decision-making at an individual level.

LES ÉCHOS DU POUVOIR

Authors

Cristian Danescu-Niculescu-Mizil,Lillian Lee,Bo Pang,Jon Kleinberg

Published Date

2020

The text that follows will seem unusual to readers familiar with Réseaux, both because it was written by computer science researchers and because it contains mathematical equations, which are uncommon in this journal. Nevertheless, we chose to translate this article because we believe it makes an important contribution to social science inquiry based on textual traces from the web. It is emblematic of the growing interest that some computer science researchers, particularly those who specialize in network theory, are taking in objects commonly studied by the social sciences. The article deals with an unquestionably sociological object: power relations in social interactions. Adopting a sociolinguistic perspective, the authors seek to demonstrate that participants in an interaction emit linguistic signals that express the power relationship established between themselves and their interlocutors. Put briefly, an individual talking with someone of higher social status will tend to systematically reuse some of the terms that the other person uses. The authors argue that such signals can be captured whatever the topic of discussion (discussions among editors on Wikipedia or exchanges between lawyers and justices of the United States Supreme Court), and that they can be quantified at large scale.

Subsidy allocations in the presence of income shocks

Authors

Rediet Abebe,Jon Kleinberg,S Matthew Weinberg

Journal

Proceedings of the AAAI Conference on Artificial Intelligence

Published Date

2020/4/3

Poverty and economic hardship are understood to be highly complex and dynamic phenomena. Due to the multi-faceted nature of welfare, assistance programs targeted at alleviating hardship can face challenges, as they often rely on simpler welfare measurements, such as income or wealth, that fail to capture the full complexity of each family's state. Here, we explore one important dimension: susceptibility to income shocks. We introduce a model of welfare that incorporates income, wealth, and income shocks and analyze this model to show that it can vary, at times substantially, from measures of welfare that only use income or wealth. We then study the algorithmic problem of optimally allocating subsidies in the presence of income shocks. We consider two well-studied objectives: the first aims to minimize the expected number of agents that fall below a given welfare threshold (a min-sum objective) and the second aims to minimize the likelihood that the most vulnerable agent falls below this threshold (a min-max objective). We present optimal and near-optimal algorithms for various general settings. We close with a discussion on future directions on allocating societal resources and ethical implications of related approaches.

Opinion dynamics with varying susceptibility to persuasion via non-convex local search

Authors

Rediet Abebe,Jon Kleinberg,David Parkes,Charalampos E Tsourakakis

Published Date

2018/7/19

A long line of work in social psychology has studied variations in people's susceptibility to persuasion -- the extent to which they are willing to modify their opinions on a topic. This body of literature suggests an interesting perspective on theoretical models of opinion formation on social networks: in addition to considering interventions that directly modify people's intrinsic opinions, it is also natural to consider those that modify people's susceptibility to persuasion. Here, we adopt a popular model for social opinion dynamics, and formalize the opinion maximization and minimization problems where interventions happen at the level of susceptibility. We show that modeling interventions at the level of susceptibility leads to an interesting family of new questions in network opinion dynamics. We find that the questions are quite different depending on whether there is an overall budget constraining the number of agents we …
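As an illustration of what an intervention "at the level of susceptibility" can mean, the sketch below assumes a Friedkin-Johnsen-style update in which each agent mixes its innate opinion with the average opinion of its neighbors. The abstract does not name its model, so the update rule, the `resistance` parameter (lower means more susceptible), and the toy numbers are all assumptions.

```python
def equilibrium_opinions(adj, innate, resistance, iters=500):
    """Iterate z_i <- r_i * s_i + (1 - r_i) * mean(z_j for neighbors j).

    adj:        node -> list of neighbors (every node needs at least one)
    innate:     node -> innate opinion s_i in [0, 1]
    resistance: node -> r_i in (0, 1]; lower r_i means more susceptible
    """
    z = dict(innate)
    for _ in range(iters):
        z = {
            i: resistance[i] * innate[i]
               + (1 - resistance[i]) * sum(z[j] for j in adj[i]) / len(adj[i])
            for i in adj
        }
    return z

# Hypothetical path graph 0 - 1 - 2 with innate opinions rising left to right.
adj = {0: [1], 1: [0, 2], 2: [1]}
innate = {0: 0.0, 1: 0.5, 2: 1.0}

base = equilibrium_opinions(adj, innate, {0: 0.5, 1: 0.5, 2: 0.5})

# Intervention: make the agent with the lowest innate opinion more susceptible.
changed = equilibrium_opinions(adj, innate, {0: 0.1, 1: 0.5, 2: 0.5})

total_before = sum(base.values())
total_after = sum(changed.values())
```

In this toy instance, making agent 0 more susceptible raises the total equilibrium opinion, which is the kind of effect an opinion maximization problem over susceptibilities would try to exploit.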

Adversarial perturbations of opinion dynamics in networks

Authors

Jason Gaitonde,Jon Kleinberg,Eva Tardos

Published Date

2020/7/13

In this paper, we study the connections between network structure, opinion dynamics, and an adversary's power to artificially induce disagreements. We approach these questions by extending models of opinion formation in the mathematical social sciences to represent scenarios, familiar from recent events, in which external actors have sought to destabilize communities through sophisticated information warfare tactics via fake news and bots. In many instances, the intrinsic goals of these efforts are not necessarily to shift the overall sentiment of the network towards a particular policy, but rather to induce discord. These perturbations will diffuse via opinion dynamics on the underlying network, through mechanisms that have been analyzed and abstracted through work in computer science and the social sciences. Here we investigate the properties of such attacks, considering optimal strategies both for the adversary …

Mitigating bias in algorithmic hiring: Evaluating claims and practices

Authors

Manish Raghavan,Solon Barocas,Jon Kleinberg,Karen Levy

Published Date

2020/1/27

There has been rapidly growing interest in the use of algorithms in hiring, especially as a means to address or mitigate bias. Yet, to date, little is known about how these methods are used in practice. How are algorithmic assessments built, validated, and examined for bias? In this work, we document and analyze the claims and practices of companies offering algorithms for employment assessment. In particular, we identify vendors of algorithmic pre-employment assessments (i.e., algorithms to screen candidates), document what they have disclosed about their development and validation procedures, and evaluate their practices, focusing particularly on efforts to detect and mitigate bias. Our analysis considers both technical and legal perspectives. Technically, we consider the various choices vendors make regarding data collection and prediction targets, and explore the risks and trade-offs that these choices pose …

How do classifiers induce agents to invest effort strategically?

Authors

Jon Kleinberg,Manish Raghavan

Journal

ACM Transactions on Economics and Computation (TEAC)

Published Date

2020/10/16

Algorithms are often used to produce decision-making rules that classify or evaluate individuals. When these individuals have incentives to be classified a certain way, they may behave strategically to influence their outcomes. We develop a model for how strategic agents can invest effort in order to change the outcomes they receive, and we give a tight characterization of when such agents can be incentivized to invest specified forms of effort into improving their outcomes as opposed to “gaming” the classifier. We show that whenever any “reasonable” mechanism can do so, a simple linear mechanism suffices.
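A toy example may help show how the weights of a linear mechanism shape where an agent invests effort. The sketch assumes that effort converts to features at constant rates, which is a deliberate simplification and not the paper's model, and every action name, feature name, and number here is hypothetical.

```python
def best_response(conversion, weights):
    """Return the effort action with the highest score gain per unit of effort.

    conversion: action -> {feature: units of feature gained per unit of effort}
    weights:    feature -> weight the linear mechanism puts on that feature
    With constant conversion rates, an agent with a fixed effort budget does
    best by spending all of it on the single highest-payoff action.
    """
    payoff = {
        action: sum(weights[f] * rate for f, rate in rates.items())
        for action, rates in conversion.items()
    }
    return max(payoff, key=payoff.get), payoff

# Hypothetical classroom setting: studying improves both observed features,
# while each form of gaming inflates only one of them.
conversion = {
    "study": {"test": 0.6, "homework": 0.6},
    "cheat_on_test": {"test": 1.0},
    "copy_homework": {"homework": 1.0},
}

test_only = {"test": 1.0, "homework": 0.0}
balanced = {"test": 0.5, "homework": 0.5}

choice_test_only, _ = best_response(conversion, test_only)
choice_balanced, _ = best_response(conversion, balanced)
```

Under these assumed numbers, scoring only the test rewards cheating, while weighting both features equally makes studying the best response.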

An economic approach to regulating algorithms

Authors

Ashesh Rambachan,Jon Kleinberg,Sendhil Mullainathan,Jens Ludwig

Published Date

2020/5/11

There is growing concern about "algorithmic bias" -- that predictive algorithms used in decisionmaking might bake in or exacerbate discrimination in society. When will these "biases" arise? What should be done about them? We argue that such questions are naturally answered using the tools of welfare economics: a social welfare function for the policymaker, a private objective function for the algorithm designer and a model of their information sets and interaction. We build such a model that allows the training data to exhibit a wide range of "biases." Prevailing wisdom is that biased data change how the algorithm is trained and whether an algorithm should be used at all. In contrast, we find two striking irrelevance results. First, when the social planner builds the algorithm, her equity preference has no effect on the training procedure. So long as the data, however biased, contain signal, they will be used and the algorithm built on top will be the same. Any characteristic that is predictive of the outcome of interest, including group membership, will be used. Second, we study how the social planner regulates private (possibly discriminatory) actors building algorithms. Optimal regulation depends crucially on the disclosure regime. Absent disclosure, algorithms are regulated much like human decision-makers: disparate impact and disparate treatment rules dictate what is allowed. In contrast, under stringent disclosure of all underlying algorithmic inputs (data, training procedure and decision rule), once again we find an irrelevance result: private actors can use any predictive characteristic. Additionally, now algorithms strictly reduce the extent of …


Jon Kleinberg FAQs

What is Jon Kleinberg's h-index at Cornell University?

The h-index of Jon Kleinberg has been 76 since 2020 and 122 in total.
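For readers unfamiliar with these metrics, the following sketch shows how an h-index and an i10-index are computed from per-paper citation counts, using the standard definitions. The citation list is a made-up toy example and is not Jon Kleinberg's data.

```python
def h_index(citations):
    """Largest h such that at least h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

def i10_index(citations):
    """Number of papers with at least 10 citations."""
    return sum(c >= 10 for c in citations)

# Hypothetical citation counts for six papers.
toy_citations = [25, 12, 10, 4, 3, 0]
toy_h = h_index(toy_citations)
toy_i10 = i10_index(toy_citations)
```

A "since 2020" variant uses the same computation but counts only citations received from 2020 onward.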

What are Jon Kleinberg's top articles?

The top articles of Jon Kleinberg at Cornell University include:

From Graphs to Hypergraphs: Hypergraph Projection and its Remediation

Modeling reputation-based behavioral biases in school choice

Hypergraph patterns and collaboration structure

Replicating Electoral Success

Language Generation in the Limit

Equilibria, Efficiency, and Inequality in Network Formation for Hiring and Opportunity

The Moderating Effect of Instant Runoff Voting

Microstructures and Accuracy of Graph Recall by Large Language Models

...

What are Jon Kleinberg's research interests?

The research interests of Jon Kleinberg are: algorithms, data mining, information networks, social networks, Web mining

What is Jon Kleinberg's total number of citations?

Jon Kleinberg has 123,425 citations in total.

Who are the co-authors of Jon Kleinberg?

The co-authors of Jon Kleinberg are Christos Faloutsos, Jure Leskovec, Christos H Papadimitriou, Sendhil Mullainathan, Eva Tardos, Robert Kleinberg.

    Co-Authors

    H-index: 151
    Christos Faloutsos

    Carnegie Mellon University

    H-index: 147
    Jure Leskovec

    Stanford University

    H-index: 131
    Christos H Papadimitriou

    Columbia University in the City of New York

    H-index: 87
    Sendhil Mullainathan

    University of Chicago

    H-index: 71
    Eva Tardos

    Cornell University

    H-index: 64
    Robert Kleinberg

    Cornell University
