Junior Theorists Workshop 2024
Synopsis:
The 2024 Fall Junior Theorists Workshop is being held jointly by Northwestern University and the Toyota Technological Institute at Chicago on December 5th – December 6th, 2024. The workshop showcases junior researchers in all areas of theoretical computer science.
This year, the workshop is being held in conjunction with the NSF TRIPODS workshop (Friday, December 6 – Saturday, December 7 at TTIC), with Friday, December 6 combining both events.
Logistics
- Date: Thursday, December 5th & Friday, December 6th
- Organizers: Dmitrii Avdiukhin, He Jia, and Liren Shan
- Location: Mudd 3514 Northwestern University (Thursday); Toyota Technological Institute at Chicago (Friday)
- Registration: click here to register
Speakers:
- Rahul Ilango (MIT)
- Siddharth Prasad (CMU)
- Max Springer (UMD)
- Aniket Murhekar (UIUC)
- Mahdi Haghifam (Northeastern)
- Renato Ferreira Pinto Jr. (Waterloo)
- Calum Macrury (Toronto -> Columbia)
- Chen Wang (Rice)
- Vaishali Surianarayanan (UCSB)
- Ainesh Bakshi (MIT)
- Rojin Rezvan (UT Austin)
- Mo Zhou (UW)
- Natalie Collina (UPenn)
- Nikos Zarifis (Madison)
- Geelon So (UCSD)
- Da Wei Zheng (UIUC)
Tentative Schedule (Central Time)
Thursday at NU:
- 9:00-9:20 – Breakfast
- 9:20-9:30 – Welcome Remarks
- 9:30-10:00 – Rahul Ilango. Tales of Obfuscation: Spaghetti Code Meets Derandomization, Differential Privacy, Logic, and Metacomplexity
- 10:00-10:30 – Siddharth Prasad. Structural Analysis of Branch-and-Cut and the Learnability of Gomory Mixed Integer Cuts
- 10:30-11:00 – Coffee Break
- 11:00-11:30 – Max Springer. Fair Polylog-Approximate Low-Cost Hierarchical Clustering
- 11:30-12:00 – Aniket Murhekar. Constant-Factor EFX Exists for Chores
- 12:00-1:00 – Lunch
- 1:00-2:00 – Local talks
- 1:00-1:20 – Saba Ahmadi – Replicable Online Learning
- 1:20-1:40 – Dmitrii Avdiukhin – Embedding Dimension of Contrastive Learning and $k$-Nearest Neighbors
- 1:40-2:00 – Yifan Wu – Calibration Error for Decision Making
- 2:00-2:30 – Coffee Break
- 2:30-3:00 – Calum Macrury. Random-Order Contention Resolution via Continuous Induction: Tightness for Bipartite Matching under Vertex Arrivals
- 3:00-3:30 – Chen Wang. The Best Arm Evades: Near-optimal Multi-pass Streaming Lower Bounds for Pure Exploration in Multi-armed Bandits
- 3:30-4:00 – Coffee Break
- 4:00-4:30 – Vaishali Surianarayanan. Efficient Approximation of Fractional Hypertree Width
- 4:30-5:00 – Mahdi Haghifam. Connections Between Learning and Memorization in Stochastic Convex Optimization
- 5:00-5:30 – Renato Ferreira Pinto Jr. Directed Isoperimetry and Monotonicity Testing: A Dynamical Approach
Friday at TTIC:
- 9:00-9:20 – Breakfast
- 9:20-9:30 – Welcome Remarks: Avrim Blum
- 9:30-10:00 – Ainesh Bakshi. On the sudden death of thermal entanglement
- 10:00-10:30 – Rojin Rezvan. Prophet Secretary Against the Online Optimal
- 10:30-10:45 – Coffee Break
- 10:45-11:15 – Mo Zhou. How Does Gradient Descent Learn Features – A Local Analysis for Regularized Two-Layer Neural Networks
- 11:15-11:45 – Natalie Collina. Tractable Agreement Protocols
- 11:45-12:30 – Lunch
- 12:30-1:30 – Research@TTIC talk: Ali Vakilian
- 1:30-2:00 – Coffee Break
- 2:00-2:30 – Nikos Zarifis. Agnostically Learning Multi-index Models with Queries
- 2:30-3:00 – Geelon So. Online Consistency of the Nearest Neighbor Rule
- 3:00-3:30 – Da Wei Zheng. Subquadratic algorithms in minor-free digraphs: (weighted) distance oracles, decremental reachability, and more
- 3:30-4:00 – Coffee Break
- 4:00-5:00 – Panel Discussion [career issues]: Piotr Indyk (MIT), Shuchi Chawla (UT Austin), Jason Hartline (NU), Avrim Blum (TTIC)
- 5:00 onwards – TGIF
Titles and Abstracts
Speaker: Rahul Ilango
Title: Tales of Obfuscation: Spaghetti Code Meets Derandomization, Differential Privacy, Logic, and Metacomplexity
Abstract: A recent breakthrough in cryptography is the construction of “program obfuscators,” which “scramble” code so that the behavior of an algorithm is unintelligible in some formal sense. In this talk, I will discuss this notion and give an overview of three recent results that use obfuscation to answer open questions in various areas of theoretical computer science, including mathematical logic, derandomization, average-case complexity, and differential privacy. This talk will focus on giving a brief description of these results and connections, emphasizing intuition and ideas. No background in cryptography or complexity is required.
This talk is based on joint works with Badih Ghazi, Yizhi Huang, Pritish Kamath, Ravi Kumar, Jiatu Li, Pasin Manurangsi, Hanlin Ren, and Ryan Williams.
Speaker: Siddharth Prasad
Title: Structural Analysis of Branch-and-Cut and the Learnability of Gomory Mixed Integer Cuts
Abstract: The incorporation of cutting planes within the branch-and-bound algorithm, known as branch-and-cut, forms the backbone of modern integer programming solvers. These solvers are the foremost method for solving discrete optimization problems and have a vast array of applications in machine learning, operations research, and many other fields. Choosing cutting planes effectively is a major research topic in the theory and practice of integer programming. We conduct a novel structural analysis of branch-and-cut that pins down how every step of the algorithm is affected by changes in the parameters defining the cutting planes added to an integer program. Our main application of this analysis is to derive sample complexity guarantees for using machine learning to determine which cutting planes to apply during branch-and-cut. These guarantees apply to infinite families of cutting planes, such as the family of Gomory mixed integer cuts, which are responsible for the main breakthrough speedups of integer programming solvers. We exploit geometric and combinatorial structure of branch-and-cut in our analysis, which provides a key missing piece for the recent generalization theory of branch-and-cut.
Published in NeurIPS 2022 as an Oral presentation.
Joint work with Nina Balcan, Tuomas Sandholm, and Ellen Vitercik.
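For context, here is the textbook form of this cut family in the pure-integer case (a standard statement, not taken from the talk): given a simplex tableau row $x_{B(i)} + \sum_{j \in N} \bar{a}_j x_j = \bar{b}$ over nonnegative integer variables, with fractional parts $f_j = \bar{a}_j - \lfloor \bar{a}_j \rfloor$ and $f_0 = \bar{b} - \lfloor \bar{b} \rfloor \in (0,1)$, the Gomory mixed integer cut reads
$$\sum_{j : f_j \le f_0} \frac{f_j}{f_0}\, x_j \;+\; \sum_{j : f_j > f_0} \frac{1-f_j}{1-f_0}\, x_j \;\ge\; 1.$$
The learning question above is how to choose, from such infinite parameterized families, the cuts that make branch-and-cut fast.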
Speaker: Max Springer
Title: Fair Polylog-Approximate Low-Cost Hierarchical Clustering
Abstract: Research in fair machine learning, and particularly clustering, has been crucial in recent years given the many ethical controversies that modern intelligent systems have posed. Ahmadian et al. [2020] initiated the study of fairness in hierarchical clustering, a stronger, more structured variant of its well-known flat counterpart, though their proposed algorithm, which optimizes Dasgupta’s [2016] famous cost function, was highly theoretical. Knittel et al. [2023] then proposed the first practical fair approximation for cost; however, they were unable to break the polynomial-approximation barrier they posed as a hurdle of interest. We break this barrier, proposing the first truly polylogarithmic-approximate low-cost fair hierarchical clustering, thus greatly narrowing the gap between the best fair and vanilla hierarchical clustering approximations.
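For reference, Dasgupta’s [2016] cost function, which the algorithm above approximates, assigns to a hierarchical clustering tree $T$ of a similarity graph $G = (V, E, w)$ the objective
$$\mathrm{cost}(T) \;=\; \sum_{\{i,j\} \in E} w_{ij} \cdot \big|\mathrm{leaves}\big(T[i \vee j]\big)\big|,$$
where $T[i \vee j]$ is the subtree rooted at the least common ancestor of $i$ and $j$; good trees cut heavy (similar) edges inside small subtrees, separating dissimilar points near the root.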
Speaker: Aniket Murhekar
Title: Constant-Factor EFX Exists for Chores
Abstract: Fair division is an age-old problem that deals with the allocation of items among agents with diverse preferences in a fair and efficient way. It naturally arises in various real-life situations, from interpersonal to international conflicts. In the discrete setting, envy-freeness up to any item (EFX) has emerged as a compelling fairness criterion, though its existence remains one of the most important open problems in fair division. In this talk, I will present recent advances in the fair allocation of indivisible chores, focusing on the first constant-factor approximation of EFX, achieved through the novel concept of earning-restricted competitive equilibrium.
This talk is based on joint work with Jugal Garg and John Qin.
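To fix notation (standard in this literature, not specific to the talk): with cost functions $c_i$ over chores, an allocation $X = (X_1, \ldots, X_n)$ is $\alpha$-EFX if for all agents $i, j$ and every chore $e \in X_i$,
$$c_i(X_i \setminus \{e\}) \;\le\; \alpha \cdot c_i(X_j),$$
so exact EFX is the case $\alpha = 1$: no agent envies another after removing any single chore from their own bundle.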
Speaker: Mahdi Haghifam
Title: Connections Between Learning and Memorization in Stochastic Convex Optimization
Abstract: Classical wisdom in machine learning suggests that ideal learning algorithms extract only the information relevant to their task from training data, avoiding the memorization of irrelevant details. However, this intuition is challenged by the success of modern overparameterized deep neural networks, which achieve high test accuracy while memorizing a substantial portion of their training data. To better understand the role of memorization in learning, we examine the interplay between memorization and generalization in the context of stochastic convex optimization—a theoretical framework well-suited for this exploration. I will show that, in general stochastic convex geometries, achieving small excess error requires any learner to memorize a constant fraction of its training data. Finally, I will discuss how these findings connect to differential privacy.
Speaker: Renato Ferreira Pinto Jr.
Title: Directed Isoperimetry and Monotonicity Testing: A Dynamical Approach
Abstract: This work explores the connection between classical isoperimetric inequalities, their directed analogues, and monotonicity testing. We study the setting of real-valued functions $f : [0,1]^d \to \mathbb{R}$ on the solid unit cube, where the goal is to test with respect to the $L^p$ distance. Our goals are twofold: to further understand the relationship between classical and directed isoperimetry, and to give a monotonicity tester with sublinear query complexity in this setting.
Our main results are 1) an $L^2$ monotonicity tester for $M$-Lipschitz functions with query complexity $\widetilde O(\sqrt{d}\, M^2 / \epsilon^2)$ and, behind this result, 2) the directed Poincaré inequality $\mathsf{dist}^{\mathsf{mono}}_2(f)^2 \le C\, \mathbb{E}\big[|\nabla^- f|^2\big]$, where the “directed gradient” operator $\nabla^-$ measures the local violations of monotonicity of $f$.
To prove the second result, we introduce a partial differential equation (PDE), the directed heat equation, which takes a one-dimensional function $f$ into a monotone function $f^*$ over time and enjoys many desirable analytic properties. We obtain the directed Poincaré inequality by combining convergence aspects of this PDE with the theory of optimal transport. Crucially for our conceptual motivation, this proof is in complete analogy with the mathematical physics perspective on the classical Poincaré inequality, namely as characterizing the convergence of the standard heat equation toward equilibrium.
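For readers parsing the inequality above, one natural reading of the notation (consistent with the abstract; consult the paper for the precise definitions) is
$$\nabla^- f \;=\; \big(\min(\partial_1 f, 0), \ldots, \min(\partial_d f, 0)\big), \qquad \mathsf{dist}^{\mathsf{mono}}_2(f) \;=\; \inf_{g\ \mathrm{monotone}} \|f - g\|_2,$$
so $\nabla^- f$ vanishes wherever no coordinate direction locally decreases $f$, and the inequality says that small local violations of monotonicity force small $L^2$ distance to the monotone functions.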
Speaker: Calum Macrury
Title: Random-Order Contention Resolution via Continuous Induction: Tightness for Bipartite Matching under Vertex Arrivals
Abstract: We introduce a new approach for designing Random-order Contention Resolution Schemes (RCRS) via exact solution in continuous time. Given a function $c(y):[0,1] \rightarrow [0,1]$, we show how to select each element which arrives at time $y \in [0,1]$ with probability exactly $c(y)$. We provide a rigorous algorithmic framework for achieving this, which discretizes the time interval and also needs to sample its past execution to ensure these exact selection probabilities. We showcase our framework in the context of online contention resolution schemes for matching with random-order vertex arrivals. For bipartite graphs with two-sided arrivals, we design a $(1+ e^{-2})/2 \approx 0.567$-selectable RCRS, which we also show to be tight. Next, we show that the presence of short odd-length cycles is the only barrier to attaining a (tight) $(1+ e^{-2})/2$-selectable RCRS on general graphs. By generalizing our bipartite RCRS, we design an RCRS for graphs with odd-length girth $g$ which is $(1+ e^{-2})/2$-selectable as $g \rightarrow \infty$. This convergence happens very rapidly: for triangle-free graphs (i.e., $g \ge 5$), we attain a $121/240 + 7/(16 e^2) \approx 0.563$-selectable RCRS. Finally, for general graphs we improve on the $8/15 \approx 0.533$-selectable RCRS of Fu et al. (2021) and design an RCRS which is at least $0.535$-selectable. Due to the reduction of Ezra et al. (2020), our bounds yield a $0.535$-competitive (respectively, $(1+ e^{-2})/2$-competitive) algorithm for prophet secretary matching on general (respectively, bipartite) graphs under vertex arrivals.
Speaker: Chen Wang
Title: The Best Arm Evades: Near-optimal Multi-pass Streaming Lower Bounds for Pure Exploration in Multi-armed Bandits
Abstract: Streaming multi-armed bandits (MABs) are an important model for studying online learning with uncertainty and limited memory. For pure exploration, Assadi and Wang [STOC’20] gave an algorithm that, assuming knowledge of the optimality gap $\Delta$, finds the best arm with the optimal sample complexity while storing only a single arm. Later, Jin et al. [ICML’21] showed that when $\Delta$ is not known, the same sample and memory efficiency can be achieved with $O(\log(1/\Delta))$ passes. These results posed an intriguing open problem: what are the sample-pass trade-offs for multi-pass streaming MABs?
In this talk, we will discuss a near-optimal lower bound for pure exploration in multi-pass streaming MABs. Our main result shows that any streaming algorithm with sublinear memory and the optimal sample complexity requires $\Omega(\log{(1/\Delta)}/\log\log{(1/\Delta)})$ passes. Our bound nearly matches the upper bound of Jin et al. [ICML’21] and demonstrates a sharp trade-off between memory and passes.
Based on a recent paper in COLT’24 with Sepehr Assadi.
Speaker: Vaishali Surianarayanan
Title: Efficient Approximation of Fractional Hypertree Width
Abstract: Hypergraphs are essential for effectively representing inputs in various algorithmic problems. Algorithms that leverage tree decompositions of hypergraphs having low width are used in applications such as database query optimization and constraint satisfaction problem (CSP) solving. Commonly used width measures for these decompositions include hypertree width, generalized hypertree width, and fractional hypertree width. In this talk, we present our recent work on new approximation algorithms for computing fractional hypertree width in both theory and practice.
On the theoretical side, we have two main contributions. The first is a polynomial-time approximation algorithm that, given an $n$-vertex, $m$-edge hypergraph with fractional hypertree width at most $\omega$, produces a decomposition with fractional hypertree width $O(\omega \log n \log \omega)$. The second is an $O(\omega \log^2 \omega)$-approximation algorithm running in $n^{\omega} \cdot m^{O(1)}$ time, improving upon both the runtime and approximation quality of Marx [ACM TALG 2010]. A key component of our approach is a novel variant of Menger’s Theorem for clique separators in graphs.
On the practical side, we explore the efficient implementation and real-world applications of these algorithms. Using Hyperbench, a benchmark suite of hypergraphs derived from practical use cases, we present empirical results that demonstrate their performance and scalability.
This talk is based on two separate works, which together involve collaborations with Daniel Lokshtanov, Saket Saurabh, Jie Xue, Vika Korchemna, Anikait Mundhra, and Ajaykrishnan E. S.
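For reference, the width measure being approximated is defined as follows (standard definitions, not specific to the talk): for a hypergraph $H = (V, E)$ and a set $S \subseteq V$, let $\rho^*_H(S)$ denote the optimum of the fractional edge cover LP, $\min \sum_{e \in E} \lambda_e$ subject to $\sum_{e \ni v} \lambda_e \ge 1$ for all $v \in S$ and $\lambda \ge 0$. Then
$$\mathrm{fhtw}(H) \;=\; \min_{(T, \chi)} \; \max_{t \in V(T)} \; \rho^*_H(\chi(t)),$$
the minimum over tree decompositions $(T, \chi)$ of the largest fractional edge cover number of a bag.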
Speaker: Ainesh Bakshi
Title: On the sudden death of thermal entanglement
Abstract: Entanglement is the crucial phenomenon that distinguishes quantum systems from classical ones. Specifically, we consider entanglement in many-body systems at thermal equilibrium as a function of the temperature. Systems at thermal equilibrium at sufficiently low temperatures are demonstrably entangled. We investigate whether these systems remain entangled as the temperature increases.
We prove that above a fixed constant temperature, the system does not exhibit any entanglement and behaves entirely classically. This proof of the sudden death of thermal entanglement cuts against a large body of prior work, both rigorous and computational, which only gives mild limits on entanglement at constant temperatures. Our proof falls out of a new connection between lack of entanglement and an algorithm for preparing thermal states on a quantum computer.
Speaker: Rojin Rezvan
Title: Prophet Secretary Against the Online Optimal
Abstract: We study the prophet secretary problem, a well-studied variant of the classic prophet inequality, where values are drawn from independent known distributions but arrive in uniformly random order. Upon seeing a value at each step, the decision-maker must either select it and stop or irrevocably discard it. Traditionally, the chosen benchmark is the expected reward of the prophet, who knows all the values in advance and can always select the maximum one. In this work, we study the prophet secretary problem against a less pessimistic but equally well-motivated benchmark: the online optimal. Here, the main goal is to find polynomial-time algorithms that guarantee near-optimal expected reward. As a warm-up, we present a quasi-polynomial time approximation scheme (QPTAS) through careful discretization and non-trivial bundling processes. Using the toolbox developed for the QPTAS, coupled with a novel front-loading technique that enables us to reduce the number of decisions we need to make, we are able to remove the dependence on $n$ (the number of values) in the exponent and obtain a polynomial time approximation scheme (PTAS) for this problem.
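To make the benchmark concrete, here is a minimal sketch (our own illustration, not the paper's algorithm) of the online optimal for discrete value distributions: with a uniformly random arrival order, the optimal policy's value satisfies $V(S) = \frac{1}{|S|} \sum_{i \in S} \mathbb{E}[\max(X_i, V(S \setminus \{i\}))]$, which the code evaluates by dynamic programming over subsets. This is exponential in the number of values, so it is for intuition only; the paper's point is achieving near-optimality in polynomial time.

```python
from functools import lru_cache

def online_opt_random_order(dists):
    """Expected reward of the optimal online policy under a uniformly
    random arrival order. dists: one discrete distribution per value,
    each given as a list of (value, probability) pairs."""
    n = len(dists)

    @lru_cache(maxsize=None)
    def value(mask):
        remaining = [i for i in range(n) if mask & (1 << i)]
        if not remaining:
            return 0.0
        total = 0.0
        for i in remaining:
            cont = value(mask & ~(1 << i))          # value of discarding X_i
            total += sum(p * max(v, cont) for v, p in dists[i])
        return total / len(remaining)               # next arrival is uniform

    return value((1 << n) - 1)

# Example: X_1 is 1 or 0 with equal probability, X_2 is always 0.8.
print(online_opt_random_order([[(1.0, 0.5), (0.0, 0.5)], [(0.8, 1.0)]]))  # 0.85
```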
Speaker: Mo Zhou
Title: How Does Gradient Descent Learn Features – A Local Analysis for Regularized Two-Layer Neural Networks
Abstract: The ability to learn useful features is one of the major advantages of neural networks. Although recent works show that neural networks can operate in a neural tangent kernel (NTK) regime that does not allow feature learning, many works also demonstrate the potential for neural networks to go beyond the NTK regime and perform feature learning. Recently, a line of work highlighted the feature learning capabilities of the early stages of gradient-based training. In this paper, we consider another mechanism for feature learning via gradient descent, through a local convergence analysis. We show that once the loss is below a certain threshold, gradient descent with a carefully regularized objective will capture ground-truth directions. We further strengthen this local convergence analysis by incorporating early-stage feature learning analysis. Our results demonstrate that feature learning not only happens at the initial gradient steps, but can also occur towards the end of training.
Based on joint work with Rong Ge.
Speaker: Natalie Collina
Title: Tractable Agreement Protocols
Abstract: We present an efficient method to transform any machine learning model into an interactive system that collaborates with another party, such as a human, to improve prediction accuracy and reach a consensus through iterative feedback. The requirements for each party are computationally and statistically tractable generalizations of Bayesian rationality. These conditions apply even in settings without prior knowledge, significantly generalizing Aumann’s foundational “agreement theorem.” In our protocol, the machine learning model begins by making a prediction. The human then responds either by agreeing or by providing feedback, prompting the model to update its prediction. This back-and-forth continues until both parties reach a consensus. In this presentation we focus on the one-dimensional setting, recovering the state-of-the-art convergence results (by Aaronson ’05) under weaker assumptions. Our approach also extends to multi-dimensional predictions and multi-party settings with minimal added complexity. Our protocols are based on simple, efficiently maintainable conditions and result in predictions that are more accurate than any single party’s alone.
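As a toy illustration of the interaction pattern (our own sketch: a classic expectation-exchange loop over a small enumerated prior, in the Geanakoplos-Polemarchakis / Aaronson style, not the paper's tractable protocol, whose point is precisely to avoid this kind of explicit Bayesian conditioning):

```python
import itertools
import random

# Illustrative setup: a small enumerated prior over "worlds", where the
# model sees signal a, the human sees signal b, and both estimate y.
random.seed(0)
WORLDS = [(a, b, (a + 2 * b + random.random()) / 10.0)
          for a, b in itertools.product(range(4), range(4))]

def cond_exp(worlds, idx, sig):
    """E[y] over the worlds whose idx-th signal equals sig."""
    ys = [w[2] for w in worlds if w[idx] == sig]
    return sum(ys) / len(ys)

def agreement_protocol(a_sig, b_sig, eps=1e-3, max_rounds=50):
    common = list(WORLDS)  # worlds consistent with the transcript so far
    for rnd in range(max_rounds):
        # The model announces its estimate; the announcement is informative,
        # so both parties prune worlds in which it would have said otherwise.
        m = cond_exp(common, 0, a_sig)
        common = [w for w in common
                  if abs(cond_exp(common, 0, w[0]) - m) < 1e-12]
        h = cond_exp(common, 1, b_sig)
        if abs(m - h) <= eps:
            return m, h, rnd          # consensus reached
        # Otherwise the human responds with feedback (their own estimate).
        common = [w for w in common
                  if abs(cond_exp(common, 1, w[1]) - h) < 1e-12]
    return m, h, max_rounds

print(agreement_protocol(a_sig=2, b_sig=1))
```

Once a round passes with no pruning, both estimates equal the mean of $y$ over the common set, so the loop terminates with agreement; the enumeration over worlds is exactly the intractable step the paper's conditions replace.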
Speaker: Nikos Zarifis
Title: Agnostically Learning Multi-index Models with Queries
Abstract: We study the power of query access for the task of agnostic learning under the Gaussian distribution. In the agnostic model, no assumptions are made on the labels and the goal is to compute a hypothesis that is competitive with the best-fit function in a known class, i.e., it achieves error $\mathrm{opt}+\epsilon$, where $\mathrm{opt}$ is the error of the best function in the class. We focus on a general family of Multi-Index Models (MIMs), which are $d$-variate functions that depend only on a few relevant directions, i.e., have the form $g(\mathbf{W} \mathbf{x})$ for an unknown link function $g$ and a $k \times d$ matrix $\mathbf{W}$. Multi-index models cover a wide range of commonly studied function classes, including constant-depth neural networks with ReLU activations, and intersections of halfspaces.
Our main result shows that query access gives significant runtime improvements over random examples for agnostically learning MIMs. Under standard regularity assumptions for the link function (namely, bounded variation or surface area), we give an agnostic query learner for MIMs with complexity $O(k)^{\mathrm{poly}(1/\epsilon)} \cdot \mathrm{poly}(d)$. In contrast, algorithms that rely only on random examples inherently require $d^{\mathrm{poly}(1/\epsilon)}$ samples and runtime, even for the basic problem of agnostically learning a single ReLU or a halfspace.
Our algorithmic result establishes a strong computational separation between the agnostic PAC and the agnostic PAC+Query models under the Gaussian distribution. Prior to our work, no such separation was known — even for the special case of agnostically learning a single halfspace, for which it was an open problem first posed by Feldman. Our results are enabled by a general dimension-reduction technique that leverages query access to estimate gradients of (a smoothed version of) the underlying label function.
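To illustrate the flavor of gradient-based dimension reduction (a simplified sketch under smoothness assumptions of our own choosing; the actual algorithm handles agnostic noise and non-smooth links via smoothing): with query access one can estimate $\nabla f$ by finite differences, average the gradient outer product $\mathbb{E}[\nabla f \nabla f^\top]$ over Gaussian samples, and read off the relevant subspace from its top eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 10, 2
W = rng.standard_normal((k, d))     # hidden relevant directions

def f(x):                            # unknown MIM f(x) = g(Wx); smooth toy link
    z = W @ x
    return np.tanh(z[0]) + np.sin(z[1])

def query_gradient(x, h=1e-4):
    """Finite-difference gradient of f, using 2d queries."""
    g = np.zeros(d)
    for i in range(d):
        e = np.zeros(d)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

# Average gradient outer product over Gaussian samples.
M = np.zeros((d, d))
n_samples = 2000
for _ in range(n_samples):
    g = query_gradient(rng.standard_normal(d))
    M += np.outer(g, g)
M /= n_samples

# The top-k eigenspace approximates the row span of W.
eigvals, eigvecs = np.linalg.eigh(M)
U = eigvecs[:, -k:]
proj = W @ U @ U.T                   # projecting W onto the estimate
print(np.linalg.norm(proj) / np.linalg.norm(W))  # close to 1
```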
Speaker: Geelon So
Title: Online Consistency of the Nearest Neighbor Rule
Abstract: We study the 1-nearest neighbor learner for online classification in the realizable setting ($R^* = 0$). We show that it is consistent (i.e., its average mistake rate eventually vanishes) under conditions far broader than previously known.
Stated in the language of non-worst-case online learning, sequences on which 1-NN fails to learn are extremely rare—under suitable classes of measures, they have measure zero. The “suitable” non-worst-case classes we introduce bridge two recent, independent lines of research in smoothed online learning and optimistic universal learning.
Joint work with Sanjoy Dasgupta.
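For concreteness, the learner under study is simple enough to state in a few lines; below is a minimal simulation (illustrative setup of our own: uniform points in the square with a linear-threshold target, which is realizable) that tracks the average mistake rate of the online 1-NN rule.

```python
import math
import random

random.seed(1)

def target(x):                       # realizable stream: labels from a fixed rule
    return 1 if x[0] + x[1] > 1 else 0

memory, mistakes = [], 0
T = 2000
for t in range(T):
    x = (random.random(), random.random())
    if memory:
        # predict the label of the nearest previously seen point
        _, pred = min(memory, key=lambda p: math.dist(p[0], x))
    else:
        pred = 0                     # arbitrary first prediction
    y = target(x)
    mistakes += (pred != y)
    memory.append((x, y))            # then observe the label and store

print(f"average mistake rate after {T} rounds: {mistakes / T:.4f}")
```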
Speaker: Da Wei Zheng
Title: Subquadratic algorithms in minor-free digraphs: (weighted) distance oracles, decremental reachability, and more
Abstract: Le and Wulff-Nilsen [SODA ’24] initiated a systematic study of VC set systems in unweighted $K_h$-minor-free directed graphs. We extend their results in the following ways:
- We present the first application of VC set systems to real-weighted minor-free digraphs, building the first exact subquadratic-space distance oracle with $O(\log n)$ query time. Prior work using VC set systems applied only to unweighted and integer-weighted digraphs.
- We describe a unified framework for analyzing the VC dimension of balls and of the LP set system (based on Li-Parter [STOC ’19]) of Le-Wulff-Nilsen [SODA ’24] using pseudodimension. This is a major conceptual contribution that both improves our understanding of set systems in digraphs and improves the bound for the LP set system in directed graphs to $h-1$.
- We present the first application of these set systems in a dynamic setting. Specifically, we construct decremental reachability oracles with subquadratic total update time and constant query time. Prior to this work, it was not known whether oracles with subquadratic total update time and polylogarithmic query time were possible, even in planar digraphs.
Local Talks
Speaker: Saba Ahmadi
Title: Replicable Online Learning
Abstract: In this talk, I discuss the concept of algorithmic replicability, introduced by Impagliazzo et al. ’22, in an online setting. In our model, the input sequence received by the online learner is generated from time-varying distributions chosen (obliviously) by an adversary. Our objective is to design low-regret online algorithms that, with high probability, produce the exact same sequence of actions when run on two independently sampled input sequences generated as described above. We refer to such algorithms as adversarially replicable. Our model generalizes to capture both adversarial and iid input sequences, as well as their mixtures, which can be modeled by setting certain distributions as point-masses. I discuss adversarially replicable online learning algorithms for online linear optimization and the experts problem that achieve sub-linear regret. Finally, I briefly discuss some further extensions and lower bounds on the regret that any replicable online algorithm must incur.
Based on joint work with Siddharth Bhandari and Avrim Blum.
Speaker: Dmitrii Avdiukhin
Title: Embedding Dimension of Contrastive Learning and $k$-Nearest Neighbors
Abstract: We study the embedding dimension of distance comparison data in two settings: contrastive learning and $k$-nearest neighbors ($k$-NN). In both cases, the goal is to find the smallest dimension $d$ of an $\ell_p$-space in which a given dataset can be represented. We show that the arboricity of the associated graphs plays a key role in designing embeddings. Using this approach, for the most frequently used $\ell_2$-distance, we get matching upper and lower bounds in both settings.
In contrastive learning, we are given $m$ labeled samples of the form $(x_i, y_i^+, z_i^-)$, representing the fact that the positive example $y_i^+$ is closer to the anchor $x_i$ than the negative example $z_i^-$. We show that, for representing such a dataset in $\ell_2$, dimension $d = \Theta(\sqrt{m})$ is necessary and sufficient.
In the $k$-NN setting, for each data point we are given an ordered list of its $k$ closest points. We show that, for preserving $k$-NN in $\ell_p$ for any $p \ge 1$, dimension $d = \mathrm{poly}(k)$ is necessary and sufficient.
Speaker: Yifan Wu
Title: Calibration Error for Decision Making
Abstract: Calibration allows predictions to be reliably interpreted as probabilities by decision makers. We propose a decision-theoretic calibration error, the Calibration Decision Loss (CDL), defined as the maximum improvement in decision payoff obtained by calibrating the predictions, where the maximum is over all payoff-bounded decision tasks. Vanishing CDL guarantees the payoff loss from miscalibration vanishes simultaneously for all downstream decision tasks. We show separations between CDL and existing calibration error metrics, including the most well-studied metric Expected Calibration Error (ECE). Our main technical contribution is a new efficient algorithm for online calibration that achieves near-optimal $O(\log T \cdot T^{-1/2})$ expected CDL, bypassing the $\Omega(T^{-0.472})$ lower bound for ECE by Qiao and Valiant (2021).
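In symbols, the definition above can be written roughly as follows (our paraphrase of the prose, not the paper's exact notation): for predictions $p_t \in [0,1]$ of binary outcomes $y_t$,
$$\mathrm{CDL} \;=\; \max_{u} \; \frac{1}{T} \sum_{t=1}^{T} \Big( u\big(a_u(\hat{p}_t), y_t\big) - u\big(a_u(p_t), y_t\big) \Big),$$
where the maximum ranges over decision tasks with payoffs $u$ bounded in $[0,1]$, $a_u(p)$ denotes the action maximizing expected payoff under the belief $y \sim \mathrm{Bernoulli}(p)$, and $\hat{p}_t$ is the calibrated version of $p_t$ (the conditional frequency of $y_t = 1$ given the prediction $p_t$).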