Synopsis:

The 2023 Fall Junior Theorists Workshop is being held jointly by Northwestern University and Toyota Technological Institute at Chicago on November 30th – December 1st, 2023. The focus of this workshop will be on junior researchers in all areas of theoretical computer science.

Speakers:

  • Max Hopkins (UCSD)
  • Allen Liu (MIT)
  • Zihan Tan (Rutgers University)
  • Bailey Flanigan (Carnegie Mellon)
  • Georgy Noarov (UPenn)
  • Lunjia Hu (Stanford University)
  • Arun Jambulapati (Simons Institute)
  • Tegan Wilson (Cornell University)
  • He Jia (GA Tech)
  • Shyan Akmal (MIT)
  • Thuy-Duong Vuong (Stanford University)
  • Yinzhan Xu (MIT)
  • Linda Cai (Princeton University)
  • Arsen Vasilyan (MIT)
  • Kate Donahue (Cornell University)

Logistics

Tentative Schedule (Central Time)

Thursday at NU:

  • 9:00-9:25 – Breakfast
  • 9:25-9:35 – Welcome Remarks
  • 9:35-10:05 – Max Hopkins.  Realizable Learning is All You Need
  • 10:05-10:35 – Allen Liu.  Learning quantum Hamiltonians at any temperature in polynomial time
  • 10:35-11:00 – Coffee Break
  • 11:00-11:30 – Zihan Tan.  On (1+ε)-Approximate Flow Sparsifiers
  • 11:30-12:00 – Bailey Flanigan.  The power of public spirit in voting
  • 12:00-1:00 – Lunch
  • 1:00-1:30 – Georgy Noarov.  High-Dimensional Prediction for Sequential Decision Making
  • 1:30-2:00 – Lunjia Hu.  A Unifying Theory of Distance from Calibration
  • 2:00-2:30 – Coffee Break
  • 2:30-3:00 – Arun Jambulapati.  New Techniques in Accelerated Methods
  • 3:00-3:30 – Tegan Wilson.  Optimal Oblivious Reconfigurable Networks
  • 3:30-4:30 – Coffee Break
  • 4:30-5:00 – (Local Session) Dmitrii Avdiukhin.  Optimal Sample Complexity of Contrastive Learning

Friday at TTIC:

  • 9:00-9:25 – Breakfast
  • 9:25-9:35 – Welcome Remarks
  • 9:35-10:05 – He Jia.  Robustly Learning Affine Transformations with Asymptotically Optimal Error
  • 10:05-10:35 – Shyan Akmal.  MAJORITY-3SAT (and Related Problems) in Polynomial Time
  • 10:35-11:00 – Coffee Break
  • 11:00-11:30 – Thuy-Duong Vuong.  Optimal sampling algorithms using entropic independence
  • 11:30-12:00 – Yinzhan Xu.  Fredman’s Trick Meets Dominance Product: Fine-Grained Complexity of Unweighted APSP, 3SUM Counting, and More
  • 12:00-12:30 – Lunch
  • 12:30-1:30 – Research@TTIC talk
  • 1:30-2:00 – Coffee Break
  • 2:00-2:30 – Linda Cai.  Pandora’s Problem with Nonobligatory Inspection: Optimal Structure and a PTAS
  • 2:30-3:00 – Arsen Vasilyan. Testing Distributional Assumptions of Learning Algorithms
  • 3:00-3:30 – Coffee Break
  • 3:30-4:00 – Kate Donahue.  Stability, Optimality, and Fairness in Federated Learning
  • 4:00-4:20 – (Local Session) Han Shao. Trustworthy Machine Learning under Social and Adversarial Data Sources
  • 4:20-4:40 – (Local Session) Shashank Srivastava. List Decoding of Tanner Codes from Distance Certificates 

Titles and Abstracts

Speaker: Max Hopkins
Title: Realizable Learning is All You Need

Abstract: The equivalence of realizable and agnostic learning in Valiant’s Probably Approximately Correct (PAC)-Learning model is among the most classical results in learning theory, dating all the way back to Vapnik and Chervonenkis’ introduction of statistical learning. Roughly speaking, this surprising equivalence states:

Given a set $X$ and family of binary classifiers $H$, the ability to learn $h \in H$ from examples of the form $(x, h(x))$ is equivalent to a (seemingly) harder task: given samples from any distribution $D$ over $X \times \{0,1\}$, find the best approximation to $D$ in $H$.

Traditionally, the proof of this fact is brittle and somewhat complex, relying on strong third-party properties of the class (e.g. VC-dimension, uniform convergence). In this talk, we give a new ‘model-independent’ proof of their equivalence via an elementary blackbox reduction. Time willing, we’ll discuss the inherent connections between learning and combinatorics that underlie this reduction, as well as new implications beyond the PAC model.

Based on joint work with Daniel Kane, Shachar Lovett, and Gaurav Mahajan.

 

Speaker: Allen Liu
Title: Learning quantum Hamiltonians at any temperature in polynomial time

Abstract: We study the problem of learning a local quantum Hamiltonian $H$ given copies of its Gibbs state $\rho = e^{-\beta H}/Tr(e^{-\beta H})$ at a known inverse temperature $\beta>0$.
Anshu, Arunachalam, Kuwahara, and Soleimanifar [AAKS20] gave an algorithm to learn a Hamiltonian on $n$ qubits to precision $\epsilon$ with only polynomially many copies of the Gibbs state, but which takes exponential time. Obtaining a computationally efficient algorithm has been a major open problem [Alhambra22, AA23], with prior work only resolving this in the limited cases of high temperature [HKT21] or commuting terms [AAKS21]. We fully resolve this problem, giving a polynomial time algorithm for learning $H$ to precision $\epsilon$ from polynomially many copies of the Gibbs state at any constant $\beta > 0$.

Our main technical contribution is a new flat polynomial approximation to the exponential function, and a translation between multi-variate scalar polynomials and nested commutators. This enables us to formulate Hamiltonian learning as a polynomial system. We then show that solving a low-degree sum-of-squares relaxation of this polynomial system suffices to accurately learn the Hamiltonian.
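As a concrete reference point for the object being learned, the Gibbs state $\rho = e^{-\beta H}/\mathrm{Tr}(e^{-\beta H})$ can be computed directly for a toy Hamiltonian. The sketch below (an illustration of the definition, not the talk's algorithm, which precisely avoids this exponential-size computation) builds a 2-qubit local Hamiltonian and its Gibbs state via an eigendecomposition:

```python
import numpy as np

# Pauli matrices
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

# A toy 2-qubit local Hamiltonian: H = Z(x)Z + 0.5 * (X(x)I + I(x)X)
H = np.kron(Z, Z) + 0.5 * (np.kron(X, I2) + np.kron(I2, X))

def gibbs_state(H, beta):
    """Return rho = exp(-beta*H) / Tr(exp(-beta*H)) via eigendecomposition."""
    evals, evecs = np.linalg.eigh(H)          # H is Hermitian
    w = np.exp(-beta * evals)                 # Boltzmann weights
    rho = (evecs * w) @ evecs.conj().T        # V diag(w) V^dagger
    return rho / np.trace(rho)

rho = gibbs_state(H, beta=1.0)
print(np.isclose(np.trace(rho).real, 1.0))  # True: a density matrix has unit trace
```

The learning problem in the abstract is the inverse task: recover the coefficients of $H$ from copies of $\rho$, without ever forming these $2^n \times 2^n$ matrices.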

 

Speaker: Zihan Tan
Title: On $(1+\epsilon)$-Approximate Flow Sparsifiers

Abstract: Given a large graph $G$ with a subset $T$ of $k$ of its vertices called terminals, a quality-$q$ flow sparsifier is a small graph $H$ that contains $T$ and preserves, to within factor $q$, all multicommodity flows that can be routed between the terminals in $T$. The problem of constructing flow sparsifiers with good (small) quality and small size has been a central problem in graph compression for decades.

A natural approach to constructing $O(1)$-quality flow sparsifiers, adopted in most previous constructions, is contraction. Andoni, Krauthgamer, and Gupta constructed a sketch of size $f(k,\epsilon)$ that stores all feasible multicommodity flows up to factor $(1+\epsilon)$, raised the question of constructing quality-$(1+\epsilon)$ flow sparsifiers whose size depends only on $k$ and $\epsilon$ (but not on the number of vertices of the input graph $G$), and proposed a contraction-based framework toward this goal using their sketch result. In this paper, we settle their question for contraction-based flow sparsifiers, by showing that quality-$(1+\epsilon)$ contraction-based flow sparsifiers with size $f(\epsilon)$ exist for all 5-terminal graphs, but not for all 6-terminal graphs. Our hardness result on 6-terminal graphs improves upon a recent hardness result by Krauthgamer and Mosenzon on exact (quality-$1$) flow sparsifiers, for contraction-based constructions. Our construction and proof utilize the notion of a tight span in metric geometry.

This talk is based on joint work with Yu Chen.

 

Speaker: Bailey Flanigan
Title: The power of public spirit in voting

Abstract: A key promise of voting is that, by accounting for all constituents’ preferences, it produces decisions that benefit society overall. It is alarming, then, that all deterministic voting rules have unbounded distortion — i.e., arbitrarily suboptimal social welfare — even under realistic conditions. In our paper Distortion Under Public-Spirited Voting, we show that this impossibility is circumvented when voters are public-spirited: that is, when deciding how to rank alternatives, voters weigh the common good in addition to their own interests. We first generalize the standard voting model to capture public-spirited voting behavior. We then show that public spirit can dramatically — and in some senses, monotonically — reduce the distortion; for some rules, it even drops the distortion from unbounded to constant. We further find that these benefits are robust to practically-motivated imperfect conditions. Taken together, our results suggest an implementable approach to improving the welfare outcomes of elections: democratic deliberation, an already-mainstream practice shown to increase voters’ public spirit.
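To make the central quantity concrete: the distortion of a voting outcome is the ratio between the best achievable social welfare and the welfare of the chosen alternative (a voting rule's distortion then takes the worst case over all utility profiles consistent with the submitted rankings). The sketch below, with a hypothetical `distortion_of_choice` helper, computes this ratio for one known utility profile:

```python
def distortion_of_choice(utilities, winner):
    """Welfare ratio of picking `winner`: (max over alternatives of total
    utility) / (total utility of winner).  `utilities` maps each alternative
    to the list of per-voter utilities for it."""
    welfare = {alt: sum(u) for alt, u in utilities.items()}
    return max(welfare.values()) / welfare[winner]

# Two alternatives, three voters; 'a' has total welfare 2, 'b' has 1,
# so choosing 'b' incurs distortion 2.
utils = {'a': [1.0, 1.0, 0.0], 'b': [0.0, 0.0, 1.0]}
print(distortion_of_choice(utils, 'b'))  # 2.0
```

The impossibility in the abstract says that, for purely self-interested voters, no deterministic rule can bound this ratio across all consistent utility profiles; public spirit changes which profiles are consistent with the rankings, which is what makes bounded distortion possible.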

 

Speaker: Georgy Noarov
Title: High-Dimensional Prediction for Sequential Decision Making

Abstract: We study the problem of making predictions of an adversarially chosen high-dimensional state that are unbiased subject to an arbitrary collection of conditioning events, with the goal of tailoring these events to downstream decision makers. We give efficient algorithms for solving this problem, as well as a number of applications that stem from choosing an appropriate set of conditioning events.

For example, we can efficiently make predictions targeted at polynomially many decision makers, giving each of them optimal swap regret if they best-respond to our predictions. We generalize this to online combinatorial optimization, where the decision makers have a very large action space, to give the first algorithms offering polynomially many decision makers no regret on polynomially many subsequences that may depend on their actions and the context. We apply these results to get efficient no-subsequence-regret algorithms in extensive-form games (EFGs), yielding a new family of regret guarantees for EFGs that generalizes some existing EFG regret notions, e.g. regret to informed causal deviations, and is generally incomparable to other known such notions.

Next, we develop a novel transparent alternative to conformal prediction for building valid online adversarial multiclass prediction sets. We produce class scores that downstream algorithms can use for producing valid-coverage prediction sets, as if these scores were the true conditional class probabilities. We show this implies strong conditional validity guarantees, including set-size-conditional and multigroup-fair coverage, for polynomially many downstream prediction sets. Moreover, our class scores can be guaranteed to have improved L2 loss, cross-entropy loss, and generally any separable Bregman loss, compared to any collection of benchmark models, yielding a high-dimensional real-valued version of omniprediction.

 

Speaker: Lunjia Hu
Title: A Unifying Theory of Distance from Calibration

Abstract: We all love calibrated predictors for their reliable and interpretable predictions: among the days predicted to have 30% chance of rain, the true frequency of rain is also 30%. Understanding when and which machine learning models are calibrated has become a popular research topic. Conducting such research requires answering a basic question: how should we measure calibration? A dizzying array of calibration measures has been used in the literature (ECE, binned ECE, smooth calibration, kernel calibration), but do they really measure calibration? We establish a rigorous framework that answers these questions and allows us to meaningfully identify which measures, when used in which fashion, are consistent with a ground-truth distance to calibration. Our results include the first tight bounds on the quantitative relationship between calibration measures. We show that the default measure ECE (expected calibration error) is only an upper bound on the distance to calibration and can lead to serious overestimation. We also heal the common frustration of choosing the bin size when using binned variants of the ECE.

Joint work with Jarosław Błasiok (Columbia University), Parikshit Gopalan (Apple), Preetum Nakkiran (Apple).
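For readers unfamiliar with the measures being compared, here is a minimal sketch of binned ECE, the default measure the abstract critiques. It partitions predictions into equal-width bins and averages the gap between the mean prediction and the empirical frequency in each bin, weighted by bin mass (the function name and binning convention are illustrative choices, not the paper's):

```python
import numpy as np

def binned_ece(preds, labels, n_bins=10):
    """Binned expected calibration error over equal-width bins on [0, 1]."""
    preds = np.asarray(preds, dtype=float)
    labels = np.asarray(labels, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # last bin is closed on the right so that a prediction of 1.0 is counted
        in_bin = (preds >= lo) & ((preds < hi) if hi < 1.0 else (preds <= hi))
        if in_bin.any():
            gap = abs(preds[in_bin].mean() - labels[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece

# A perfectly calibrated toy predictor: among days predicted 0.3 it rains
# 30% of the time, and among days predicted 0.8 it rains 80% of the time.
preds = np.array([0.3] * 10 + [0.8] * 10)
labels = np.array([1] * 3 + [0] * 7 + [1] * 8 + [0] * 2)
print(binned_ece(preds, labels))  # ~0 for a perfectly calibrated predictor
```

The abstract's point is that this quantity can badly overestimate the true distance to calibration, and that its value is sensitive to the `n_bins` choice.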

 

Speaker: Arun Jambulapati
Title: New Techniques in Accelerated Methods

Abstract: Accelerated optimization methods are at the heart of many algorithms in modern computer science. However, many important problems possess additional structure, which traditional accelerated methods cannot utilize.

In this talk I will discuss a new optimization primitive, called the ball optimization oracle, and give a near-optimal accelerated algorithm which uses it. I will then discuss several applications of this framework to give improved algorithms for several problems in modern theoretical computer science, such as regression, differentially private optimization, and minimizing the maximum loss function.

This talk is based on joint work with Yair Carmon, Qijia Jiang, Yujia Jin, Yin Tat Lee, Daogao Liu, Yang P. Liu, Aaron Sidford, and Kevin Tian.

 

Speaker: Tegan Wilson
Title: Optimal Oblivious Reconfigurable Networks

Abstract: Oblivious routing has a long history in both the theory and practice of networking. In this talk I will initiate the formal study of oblivious routing in the context of reconfigurable networks, a new architecture that has recently come to the fore in datacenter networking. I focus on the tradeoffs between maximizing throughput and minimizing latency. For every constant guaranteed throughput rate, I characterize the minimum latency (up to a constant factor) achievable by an oblivious reconfigurable network design. The tradeoff curve turns out to be surprisingly subtle: it has an unexpected scalloped shape, reflecting the fact that load balancing, which I show is necessary for these oblivious designs, becomes more costly when average path length is not an integer, since equalizing the load balanced path lengths is not achievable.

This talk is based on joint work with Daniel Amir, Nitika Saran, Robert Kleinberg, Hakim Weatherspoon, Vishal Shrivastav, and Rachit Agarwal.

 

Speaker: He Jia
Title: Robustly Learning Affine Transformations with Asymptotically Optimal Error

Abstract: We present a polynomial-time algorithm for robustly learning an unknown affine transformation of the standard hypercube from samples, an important and well-studied setting for independent component analysis (ICA). Total variation distance is the information-theoretically strongest possible notion of distance in our setting and our recovery guarantees in this distance are optimal up to the absolute constant factor multiplying the fraction of corruption. Our key innovation is a new approach to ICA (even to outlier-free ICA) that circumvents the difficulties in the classical method of moments and instead relies on a new geometric certificate of correctness of an affine transformation. Our algorithm is based on a new method that iteratively improves an estimate of the unknown affine transformation whenever the requirements of the certificate are not met.

 

Speaker: Shyan Akmal
Title: MAJORITY-3SAT (and Related Problems) in Polynomial Time

Abstract: Majority-SAT is the problem of determining whether a given $n$-variable formula has at least $2^{n-1}$ satisfying assignments. Majority-SAT and related problems have been studied extensively in AI communities interested in the complexity of probabilistic planning and inference. Although this problem has been known to be PP-complete for over 40 years, the complexity of a natural variant has remained open: Majority-$k$SAT, where the input is a formula in conjunctive normal form with clause width at most $k$ (i.e., the input is a $k$-CNF formula).

In this talk, I discuss work done by Ryan Williams and myself, proving that for every fixed positive integer $k$, Majority-$k$SAT is in P. More generally, for any fixed positive integer $k$ and rational $\rho \in (0,1)$ with bounded denominator, we present an algorithm which determines whether a given $k$-CNF has at least $\rho \cdot 2^n$ satisfying assignments, in deterministic linear time (whereas the previous fastest algorithm was exhaustive search, running in exponential time).

The talk will focus on presenting the main ideas behind these algorithms, and highlight the implications for some additional problems related to counting satisfying assignments.
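To fix the problem definition, here is a brute-force sketch of Majority-SAT (exhaustive search over all $2^n$ assignments, i.e. the exponential-time baseline the talk improves on for $k$-CNFs; the clause encoding below is an illustrative choice):

```python
from itertools import product

def count_satisfying(clauses, n):
    """Count satisfying assignments of a CNF over variables 1..n.
    A clause is a tuple of nonzero ints; a negative literal -i means "not x_i"."""
    count = 0
    for bits in product([False, True], repeat=n):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            count += 1
    return count

def majority_sat(clauses, n):
    """True iff at least half of the 2^n assignments satisfy the formula."""
    return count_satisfying(clauses, n) >= 2 ** (n - 1)

# (x1 or x2): 3 of the 4 assignments satisfy it, so the majority holds.
print(majority_sat([(1, 2)], n=2))  # True
```

The result described above replaces this $2^n$-time enumeration with a deterministic linear-time algorithm whenever the clause width $k$ is a fixed constant.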

 

Speaker: Thuy-Duong Vuong
Title: Optimal sampling algorithms using entropic independence

Abstract: Sampling is a fundamental task with many applications in algorithm design and in machine learning. Sampling from high-dimensional discrete distributions, such as Ising models, has long been studied for its applications in statistical physics, image modeling, and neural networks, as well as its connection to complexity theory. While Markov chain Monte Carlo (MCMC) is a widely used method for sampling, many gaps remain in understanding its performance.

In this talk, I will show tight analysis for Markov chains to sample from various discrete distributions using discrete notions of log-concavity such as entropic and spectral independence. Specifically, I will discuss average-case versions of entropic and spectral independence and their applications in bounding the mixing time of Markov chains.

 

Speaker: Yinzhan Xu
Title: Fredman’s Trick Meets Dominance Product: Fine-Grained Complexity of Unweighted APSP, 3SUM Counting, and More

Abstract: We carefully combine Fredman’s trick [SICOMP’76] and Matoušek’s approach for dominance product [IPL’91] to obtain powerful results in fine-grained complexity:

  • Under the hypothesis that APSP for undirected graphs with edge weights in $\{1, 2, \ldots, n\}$ requires $n^{3-o(1)}$ time (when $\omega=2$), we show a variety of conditional lower bounds, including an $n^{7/3-o(1)}$ lower bound for unweighted directed APSP and an $n^{2.2-o(1)}$ lower bound for computing the Minimum Witness Product between two $n \times n$ Boolean matrices, even if $\omega=2$, improving upon their trivial $n^2$ lower bounds. Our techniques can also be used to reduce the unweighted directed APSP problem to other problems. In particular, we show that (when $\omega = 2$), if unweighted directed APSP requires $n^{2.5-o(1)}$ time, then Minimum Witness Product requires $n^{7/3-o(1)}$ time.
  • We show that, surprisingly, many central problems in fine-grained complexity are equivalent to their natural counting versions. In particular, we show that Min-Plus Product and Exact Triangle are subcubically equivalent to their counting versions, and 3SUM is subquadratically equivalent to its counting version.
  • We obtain new algorithms using new variants of the Balog-Szemeredi-Gowers theorem from additive combinatorics. For example, we get an $O(n^{3.83})$ time deterministic algorithm for exactly counting the number of shortest paths in an arbitrary weighted graph, improving the textbook $\widetilde{O}(n^4)$ time algorithm. We also get faster algorithms for 3SUM in preprocessed universes, and deterministic algorithms for 3SUM on monotone sets in $\{1, 2, \ldots, n\}^d$.

Joint work with Timothy M. Chan and Virginia Vassilevska Williams.
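For orientation, the counting version of 3SUM mentioned in the second bullet asks for the number of zero-sum triples rather than the existence of one. A straightforward $O(n^2)$ sketch (illustrative, not the paper's reduction): for each middle index $j$, hash the elements to its left and look up the complement of each element to its right.

```python
from collections import Counter

def count_3sum(A):
    """Count triples (i < j < k) with A[i] + A[j] + A[k] == 0 in O(n^2) time."""
    n = len(A)
    total = 0
    for j in range(n):
        seen = Counter(A[i] for i in range(j))   # multiset of elements left of j
        for k in range(j + 1, n):
            total += seen[-A[j] - A[k]]          # how many valid i complete the triple
    return total

# Zero-sum triples of [-2, -1, 0, 1, 2] are (-2, 0, 2) and (-1, 0, 1).
print(count_3sum([-2, -1, 0, 1, 2]))  # 2
```

The equivalence result says that beating quadratic time for this counting problem is exactly as hard as beating it for the decision problem, up to subquadratic-time reductions.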

 

Speaker: Linda Cai
Title: Pandora’s Problem with Nonobligatory Inspection: Optimal Structure and a PTAS

Abstract: Weitzman introduced Pandora’s box problem as a mathematical model of sequential search with inspection costs, in which a searcher is allowed to select a prize from one of $n$ alternatives. Several decades later, Doval introduced a closely related version of the problem, where the searcher need not incur the inspection cost of an alternative and can select it uninspected. Unlike in the original problem, the optimal solution to the nonobligatory inspection variant provably requires adaptivity, and by recent work of [FLL22], finding the optimal solution is NP-hard.

Our first main result is a structural characterization of the optimal policy: we show there exists an optimal policy that follows only two different pre-determined orders of inspection, and transitions from one to the other at most once. Our second main result is a polynomial time approximation scheme (PTAS). Our proof involves a novel reduction to a framework developed by [FLX18], utilizing our optimal two-phase structure. Furthermore, we show that Pandora’s problem with nonobligatory inspection belongs to the class NP, which, combined with the hardness result of [FLL22], settles the computational complexity of the problem.
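As background, Weitzman's optimal policy for the classical (obligatory-inspection) problem is governed by a reservation value per box: the value $\sigma$ solving $c = \mathbb{E}[\max(v - \sigma, 0)]$, where $c$ is the inspection cost and $v$ the box's value. A sketch for discrete distributions, using bisection (the function name and solver are illustrative; the nonobligatory variant in the talk is precisely where this simple index policy no longer suffices):

```python
def weitzman_index(values, probs, cost):
    """Reservation value sigma solving cost = E[max(v - sigma, 0)] for a
    discrete value distribution.  Uses bisection; assumes the cost is small
    enough that a solution exists in [min(values), max(values)]."""
    def excess(sigma):
        return sum(p * max(v - sigma, 0.0) for v, p in zip(values, probs))
    lo, hi = min(values), max(values)
    for _ in range(100):               # excess() is decreasing in sigma
        mid = (lo + hi) / 2
        if excess(mid) > cost:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Box worth 0 or 10 with equal probability, inspection cost 1:
# 1 = 0.5 * (10 - sigma)  =>  sigma = 8.
print(round(weitzman_index([0, 10], [0.5, 0.5], cost=1.0), 6))  # 8.0
```

Weitzman's policy inspects boxes in decreasing order of these indices; the two-phase structure described above is the analogue the paper establishes when uninspected selection is also allowed.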

 

Speaker: Arsen Vasilyan
Title: Testing Distributional Assumptions of Learning Algorithms

Abstract: There are many important high dimensional function classes that have fast agnostic learning algorithms when strong assumptions on the distribution of examples can be made, such as Gaussianity or uniformity over the domain. But how can one be sufficiently confident that the data indeed satisfies the distributional assumption, so that one can trust in the output quality of the agnostic learning algorithm? We propose a model by which to systematically study the design of tester-learner pairs (A,T), such that if the distribution on examples in the data passes the tester T then one can safely trust the output of the agnostic learner A on the data.

To demonstrate the power of the model, we apply it to many classical problems in the theory of supervised learning.

This talk is based on joint papers with Ronitt Rubinfeld, Adam Klivans, Aravind Gollakota, and Konstantinos Stavropoulos.

 

Speaker: Kate Donahue
Title: Stability, Optimality, and Fairness in Federated Learning

Abstract: Federated learning is a distributed learning paradigm where multiple agents, each only with access to local data, jointly learn a global model. There has recently been an explosion of research aiming not only to improve the accuracy rates of federated learning, but also provide certain guarantees around social good properties such as total error or fairness. In this talk, I describe three papers analyzing federated learning through the lens of cooperative game theory, all joint with Jon Kleinberg.

In the first paper, we discuss fairness in federated learning, which relates to how error rates differ between federating agents. In this work, we consider two notions of fairness: egalitarian fairness (which aims to bound how dissimilar error rates can be) and proportional fairness (which aims to reward players for contributing more data). For egalitarian fairness, we obtain a tight multiplicative bound on how widely error rates can diverge between agents federating together. For proportional fairness, we show that sub-proportional error (relative to the number of data points contributed) is guaranteed for any individually rational federating coalition. The second paper explores optimality in federated learning with respect to an objective of minimizing the average error rate among federating agents. In this work, we provide and prove the correctness of an efficient algorithm to calculate an optimal (error minimizing) arrangement of players. This paper builds on our prior work on stability in federated learning, and allows us to give the first constant-factor bound on the performance gap between stability and optimality, proving that the total error of the worst stable solution can be no higher than 9 times the total error of an optimal solution (Price of Anarchy bound of 9).

Local Sessions

Speaker: Dmitrii Avdiukhin (postdoc, Northwestern University)
Title: Optimal Sample Complexity of Contrastive Learning

Abstract: Contrastive learning is a highly successful technique for learning representations of data from labeled tuples specifying the distance relations within the tuple. In this talk, I will discuss the sample complexity of contrastive learning, i.e., the minimum number of labeled tuples sufficient for achieving high generalization accuracy. In our work, we give tight bounds on the sample complexity in a variety of settings, covering arbitrary distance functions, general $\ell_p$-distances, and tree metrics. Our main result is an (almost) optimal bound on the sample complexity of learning $\ell_p$-distances for integer $p$. For any $p \ge 1$, we show that $\tilde \Theta(\min(nd, n^2))$ labeled tuples are necessary and sufficient for learning $d$-dimensional representations of $n$-point datasets. Our results hold for an arbitrary distribution of the input samples and are based on giving the corresponding bounds on the Vapnik-Chervonenkis/Natarajan dimension of the associated problems. We further show that the theoretical bounds on sample complexity obtained via VC/Natarajan dimension can have strong predictive power for experimental results, in contrast with the folklore belief about a substantial gap between statistical learning theory and the practice of deep learning.

This talk is based on joint work with Noga Alon, Dor Elboim, Orr Fischer, and Grigory Yaroslavtsev.

 

Speaker: Han Shao (final-year PhD student, TTIC)
Title: Trustworthy Machine Learning under Social and Adversarial Data Sources

Abstract: Machine learning has witnessed remarkable breakthroughs in recent years. Most machine learning techniques assume that the training and test data are sampled from an underlying distribution and aim to find a predictor with low population loss. However, data may be generated by strategic individuals, collected by self-interested agents, possibly poisoned by adversarial attackers, and used to create predictors, models, and policies satisfying multiple objectives. As a result, predictors may underperform. To ensure the success of machine learning, it is crucial to develop trustworthy algorithms capable of handling these factors. In this talk, I will mainly focus on the factors of self-interested data collectors and adversarial attackers.

 

Speaker: Shashank Srivastava (final-year PhD student, TTIC)
Title: List Decoding of Tanner Codes from Distance Certificates

Abstract: In the theory of error-correcting codes, list decoding is a relaxation of unique decoding useful for tolerating higher levels of noise. The design of list decoding algorithms for algebraic codes, such as Reed-Solomon, has found numerous applications in error correction, as well as in complexity theory and pseudorandomness. However, we know of very few techniques for designing list decoding algorithms when the code lacks such algebraic structure, as with Tanner codes, which are defined using sparse expander graphs. In this talk, I will describe how continuous relaxations based on the Sum-of-Squares hierarchy can be used to design the first list decoding algorithm for the Tanner codes of Sipser-Spielman [IEEE Trans. Inf. Theory 1996]. The techniques include a novel proof of the Johnson bound for arbitrary codes, distance proofs for pseudocodewords, and correlation rounding for convex hierarchies. I will also discuss extensions to a distance amplification scheme of Alon-Edmonds-Luby [FOCS 1995].

Based on joint work with Fernando Granha Jeronimo and Madhur Tulsiani.