Synopsis

The 2020 Junior Theorists Workshop is part of the Northwestern CS Quarterly Theory Workshop Series. The focus of this workshop will be on junior researchers in all areas of theoretical computer science. The speakers for this workshop are:

  • Yu Chen (UPenn)
  • Sitan Chen (MIT)
  • Albert Cheu (Northeastern)
  • Mahsa Derakhshan (UMD)
  • Fernando Granha Jeronimo (UChicago)
  • Christopher Jung (UPenn)
  • Frederic Koehler (MIT)
  • Weihao Kong (Stanford/UW)
  • Jason Li (CMU)
  • Kuikui Liu (UW)
  • Nicole Wein (MIT)
  • Fang-Yi Yu (Harvard)
  • Peng Zhang (Yale)
  • Jeroen Zuiddam (NYU)

Logistics

  • Organizers: Hedyeh Beyhaghi and Jinshuo Dong
  • Date: Thursday-Friday, Dec 17-18, 2020.
  • Location: Virtual (on Gather.Town and Zoom). Further details to come.
  • Registration: Registration is free but required, and remains open even during the event. Login information will be sent automatically to the email address you provide.

Tentative Schedule

All times are Central (CST), UTC -6

Thursday:

  • 10:00-10:05: Opening Remarks
  • 10:05-10:35: Peng Zhang (Yale)
    Balancing covariates to improve the design of randomized controlled trials (Video)
  • 10:35-11:05: Jeroen Zuiddam (NYU)
    Tensor Tools for Problems in Combinatorics and Complexity (Video)
  • 11:05-11:30: Q&A / Coffee
  • 11:30-12:00: Fernando Jeronimo (UChicago)
    Decoding Ta-Shma’s Binary Codes (Video)
  • 12:00-12:30: Kuikui Liu (UW)
    New Tools for Analysis of Markov Chains via High-Dimensional Expansion (Video)
  • 12:30-2:00: Q&A / Lunch (on your own)
  • 2:00-2:30: Frederic Koehler (MIT)
    Learning Some Ill-Conditioned Gaussian Graphical Models (Video)
  • 2:30-3:00: Weihao Kong (Stanford/UW)
    Meta-learning for mixed linear regression (Video)
  • 3:00-3:30: Q&A / Coffee
  • 3:30-4:00: Albert Cheu (Northeastern)
    Differential Privacy in the Shuffle Model (Video)
  • 4:00-4:30: Sitan Chen (MIT)
    Learning Deep ReLU Networks when Gradient Descent Fails (Video)
  • 4:30-5:00: Q&A / Coffee

Friday:

  • 10:00-10:30: Nicole Wein (MIT)
    Approximating the Diameter of a Graph (Video)
  • 10:30-11:00: Yu Chen (UPenn)
    Near-linear Size Hypergraph Cut Sparsifiers (Video)
  • 11:00-11:30: Q&A / Coffee
  • 11:30-12:00: Jason Li (CMU)
    Deterministic Min-cut in Poly-logarithmic Max-flows (Video)
  • 12:00-12:30: Christopher Jung (UPenn)
    Moment Multicalibration for Uncertainty Estimation (Video)
  • 12:30-2:00: Q&A / Lunch (on your own)
  • 2:00-2:30: Mahsa Derakhshan (UMD)
    Optimizing Personalized Reserve Prices (Video)
  • 2:30-3:00: Fang-Yi Yu (Harvard)
    Escaping Saddle Points in Constant Dimensional Spaces: an Agent-based Modeling Perspective (Video)
  • 3:00-3:30: Q&A / Coffee
  • 3:30-4:30: Northwestern spotlight talks:
    Shravas Kajana Rao
    Degree vs. Approximate Degree (Video)
    Yiding Feng
    Batching and Optimal Multi-stage Bipartite Allocations (Video)
    Saba Ahmadi
    An Algorithm for Multi-Attribute Diverse Matching (Video)
    Sheng Yang
    Scheduling on Spot Instances (Video)
  • 4:30-5:00: Q&A / Coffee

Titles and Abstracts

Speaker: Jeroen Zuiddam (NYU)
Title: Tensor Tools for Problems in Combinatorics and Complexity
Abstract: Understanding “independence” is a driving force behind progress on problems in algebraic complexity theory (the complexity of matrix multiplication), extremal combinatorics (the cap set problem and the sunflower problem), quantum information theory (the resource theory of entanglement), property testing, and communication complexity. We will talk about tensors as a tool to understand independence, and see how a recent geometric approach called Geometric Rank (Kopparty–Moshkovitz–Zuiddam, 2020) solves a problem of Strassen about matrix multiplication. Finally, we will touch on an informal exploration of a totally different potential approach to understanding independence, which is in line with the celebrated Lovász theta function.
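
For reference, Geometric Rank views a 3-tensor $T$ over a field $\mathbb{F}$ as a bilinear map and measures the codimension of its kernel variety over the algebraic closure. In symbols (our transcription of the Kopparty–Moshkovitz–Zuiddam definition):

    \[
      \operatorname{GR}(T) \;=\; \operatorname{codim}
      \{\, (x, y) \in \bar{\mathbb{F}}^{n_1} \times \bar{\mathbb{F}}^{n_2}
          \;:\; T(x, y, \cdot) = 0 \,\}
    \]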

Speaker: Frederic Koehler (MIT)
Title: Learning Some Ill-Conditioned Gaussian Graphical Models
Abstract: Gaussian Graphical models have wide-ranging applications in machine learning and the natural and social sciences where they are one of the most popular ways to model statistical relationships between observed variables. In most of the settings in which they are applied, the number of observed samples is much smaller than the dimension and the goal is to learn the model assuming the underlying model is sparse. While there are a variety of algorithms (e.g. Graphical Lasso, CLIME) that provably recover the graph structure with a logarithmic number of samples, they all require that the precision matrix is in some way well-conditioned. Here we give the first fixed polynomial-time algorithms for learning attractive GGMs and walk-summable GGMs with a logarithmic number of samples and without any such assumptions. In particular, our algorithms can tolerate strong dependencies among the variables. We complement our results with experiments showing that many existing algorithms fail even in some simple settings where there are long dependency chains. Joint work with Jon Kelner, Raghu Meka, and Ankur Moitra.
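
To make the "long dependency chains" setting concrete, here is a minimal Python sketch (ours, not the paper's experiments) of a sparse but ill-conditioned chain GGM fed to scikit-learn's Graphical Lasso; the coupling strength rho and regularization alpha are illustrative choices:

    # Hypothetical illustration: a chain GGM whose tridiagonal precision
    # matrix is sparse but ill-conditioned, the regime the abstract targets.
    import numpy as np
    from sklearn.covariance import GraphicalLasso

    d, n = 50, 200
    rho = 0.49  # strong chain coupling makes the precision matrix ill-conditioned
    K = np.eye(d) + rho * (np.eye(d, k=1) + np.eye(d, k=-1))
    Sigma = np.linalg.inv(K)

    rng = np.random.default_rng(0)
    X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)

    K_hat = GraphicalLasso(alpha=0.05).fit(X).precision_
    support_err = np.mean((np.abs(K_hat) > 1e-3) != (np.abs(K) > 1e-12))
    print("cond(K) =", np.linalg.cond(K), " support error =", support_err)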

Speaker: Kuikui Liu (UW)
Title: New Tools for Analysis of Markov Chains via High-Dimensional Expansion
Abstract: The Markov Chain Monte Carlo paradigm is one of the most widely used methods for sampling from challenging distributions, both practically and theoretically. We discuss a recent line of work that leverages the perspective of high-dimensional expanders and their local-to-global properties to analyze Markov chains in the discrete setting. We will highlight numerous connections with other areas including geometry of polynomials, statistical physics, and more. Specific applications include sampling from bases of matroids, and spin systems in statistical physics. Based on several joint works with Nima Anari, Zongchen Chen, Shayan Oveis Gharan, Eric Vigoda, and Cynthia Vinzant.
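
For a concrete member of this family of chains, here is a minimal sketch (ours) of Glauber dynamics for the hardcore model, one of the spin systems this line of work covers; the stationary distribution weights each independent set S by lam**|S|:

    import random

    def glauber_hardcore(adj, lam, steps, seed=0):
        # Each step: pick a uniformly random vertex and resample its occupancy
        # conditioned on its neighbors (occupied w.p. lam/(1+lam) if allowed).
        rng = random.Random(seed)
        occupied = set()
        vertices = list(adj)
        for _ in range(steps):
            v = rng.choice(vertices)
            occupied.discard(v)
            if all(u not in occupied for u in adj[v]):
                if rng.random() < lam / (1 + lam):
                    occupied.add(v)
        return occupied

    # 4-cycle example: returns an (approximate) sample after many steps
    adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
    print(glauber_hardcore(adj, lam=1.0, steps=10_000))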

Speaker: Peng Zhang (Yale)
Title: Balancing covariates to improve the design of randomized controlled trials
Abstract: Randomized controlled trials (RCTs) are widely used to test the effectiveness of new drugs, interventions, policies, and features of a system or app. We study how to randomly assign experimental subjects to different treatment groups (for example, to receive a drug or a placebo in a clinical trial) so as to balance covariates — characteristics of the subjects collected before the experiment is run. Balancing covariates improves the precision of treatment-effect estimates in RCTs when covariates are predictive of treatment outcomes; randomization is necessary for statistical inference and guarantees that the assignment is never too bad when covariates are not predictive of outcomes. By exploiting recent advances in algorithmic discrepancy theory, we obtain random assignments with a nearly optimal tradeoff between the gain we enjoy if covariates are predictive of outcomes and the loss we suffer if they are not. We will discuss the experimental design problem we address, its connection with discrepancy theory, our results, and future directions. This is joint work with Chris Harshaw, Fredrik Sävje and Dan Spielman.
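
The paper's algorithm builds on the Gram-Schmidt walk from discrepancy theory; as a much simpler illustration of the balance objective only (ours, not the paper's method), one can compare pure randomization to naive rerandomization on synthetic covariates:

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 100, 5
    X = rng.standard_normal((n, d))  # covariates: one row per subject

    def imbalance(z):
        # z in {-1,+1}^n is a treatment/control assignment;
        # smaller ||X^T z|| means better covariate balance
        return np.linalg.norm(X.T @ z)

    z_random = rng.choice([-1, 1], size=n)
    z_rerand = min((rng.choice([-1, 1], size=n) for _ in range(1000)),
                   key=imbalance)
    print("pure randomization:", imbalance(z_random))
    print("rerandomization   :", imbalance(z_rerand))

Rerandomization improves balance but distorts the assignment distribution; the point of the paper is a principled, nearly optimal version of this trade-off.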

Speaker: Jason Li (CMU)
Title: Deterministic Min-cut in Poly-logarithmic Max-flows
Abstract: We give a deterministic (global) min-cut algorithm for weighted undirected graphs that runs in time $m^{1+\epsilon}$ plus polylog$(n)$ max-flow computations. Using the current best max-flow algorithms, this results in an overall running time of $\tilde{O}(m \cdot \min(\sqrt{m}, n^{2/3}))$ for weighted graphs, and $m^{4/3+o(1)}$ for unweighted (multi)-graphs. This is the first improvement in the running time of deterministic algorithms for the min-cut problem on weighted graphs and unweighted multi-graphs since the early 1990s, when a running time bound of $\tilde{O}(mn)$ was established for this problem. A key component in our algorithm is the use of deterministic expander decompositions. Conceptually, this allows us to build on, and generalize, the recent breakthrough for this problem by Kawarabayashi and Thorup (STOC 2015), who obtained a running time of $\tilde{O}(m)$ for simple graphs. Although our running time is larger, our algorithm is more general in that it can handle edge weights and/or parallel edges. Our result makes progress toward closing the gap between the randomized and deterministic runtimes for the min-cut problem on general weighted graphs that has existed for over 20 years. Joint work with Debmalya Panigrahi (Duke).
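
For context, a minimal sketch (ours, not the paper's algorithm) of the classical reduction that the result dramatically improves: a global min-cut can be found with n-1 s-t max-flow computations, whereas the talk's algorithm needs only polylog(n) of them:

    import networkx as nx

    def global_min_cut_via_flows(G, capacity="weight"):
        # Fix any s; the global min cut separates s from some other vertex t,
        # so the minimum over all s-t max-flow values is the global min cut.
        nodes = list(G)
        s = nodes[0]
        return min(nx.minimum_cut(G, s, t, capacity=capacity)[0]
                   for t in nodes[1:])

    G = nx.cycle_graph(6)
    nx.set_edge_attributes(G, 1, "weight")
    print(global_min_cut_via_flows(G))   # a cycle has global min cut 2
    print(nx.stoer_wagner(G)[0])         # cross-check with a direct method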

Speaker: Mahsa Derakhshan (UMD)
Title: Optimizing Personalized Reserve Prices
Abstract: Multi-unit VCG auctions with reserve prices are prevalent in many marketplaces, such as online advertising markets. While there is empirical and theoretical evidence highlighting the significant revenue impact of setting personalized reserve prices for unit-demand buyers, we do not yet have a full understanding of how to optimize them. The problem had previously been solved only under the assumption that buyers’ valuation distributions are i.i.d. and regular, assumptions that often fail to hold in practice.

In my talk, I discuss the problem of computing data-driven personalized reserve prices in multi-unit VCG auctions without any assumption on valuation distributions. The input to this problem is a dataset containing the submitted bids of n buyers in a set of auctions, and the goal is to find a vector of reserve prices, one for each buyer, that maximizes the total revenue across all auctions. Roughgarden and Wang show that this problem is APX-hard even for a single item and give a greedy algorithm with a tight approximation factor of 0.5. My talk is based on joint work with Negin Golrezaei and Renato Paes Leme (EC 2019) and joint work with Alex Slivkins and David Pennock (SODA 2021). In these works, we respectively present a 0.684-approximation algorithm for the single-item version and a 0.63-approximation algorithm for the general case. Both algorithms are based on novel LP formulations and rounding procedures, which may be of independent interest.
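
To pin down the objective being maximized, here is a minimal sketch (ours, not the papers' algorithms) that evaluates a vector of personalized reserves on bid data for the single-item case, using lazy reserves (the highest bidder wins only if she clears her own reserve):

    def revenue(reserves, auctions):
        # auctions: list of dicts mapping bidder -> bid;
        # reserves: dict mapping bidder -> personalized reserve price
        total = 0.0
        for bids in auctions:
            winner = max(bids, key=bids.get)
            second = max((b for i, b in bids.items() if i != winner), default=0.0)
            if bids[winner] >= reserves[winner]:
                total += max(reserves[winner], second)
        return total

    auctions = [{0: 5.0, 1: 3.0}, {0: 1.0, 1: 4.0}, {0: 2.5, 1: 2.0}]
    print(revenue({0: 2.0, 1: 3.5}, auctions))  # 3.0 + 3.5 + 2.0 = 8.5

The optimization problem is to choose `reserves` maximizing this objective given the bid data, which is what the approximation algorithms above address.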

Speaker: Weihao Kong (Stanford/UW)
Title: Meta-learning for mixed linear regression
Abstract: A common challenge in practical supervised learning is that there are plenty of tasks, but each task has too few labeled examples to be learned in isolation. However, by exploiting the similarities across those tasks, one can hope to overcome such data scarcity. Under a canonical scenario where each task is drawn from a mixture of k linear regressions, we study a fundamental question: can abundant small-data tasks compensate for the lack of big-data tasks? Our recent works [1,2] provide a smooth trade-off between the number of tasks and the size of the dataset per task under which efficient and accurate learning is possible. In addition, our algorithm is robust even when a small fraction of tasks provide arbitrarily corrupted data.

[1] Meta-learning for mixed linear regression. Weihao Kong, Raghav Somani, Zhao Song, Sham Kakade, Sewoong Oh. ICML 2020.
[2] Robust Meta-learning for Mixed Linear Regression with Small Batches. Weihao Kong, Raghav Somani, Sham Kakade, Sewoong Oh. NeurIPS 2020.
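
A small sketch (ours) of the data model in question, with illustrative sizes: many tasks, each drawn from a mixture of k linear regressions and each providing far fewer examples than the dimension:

    import numpy as np

    rng = np.random.default_rng(0)
    d, k, n_tasks, per_task = 20, 3, 500, 5   # per_task << d: small-data tasks
    betas = rng.standard_normal((k, d))       # the k unknown regressors

    def sample_task():
        beta = betas[rng.integers(k)]         # mixture component for this task
        X = rng.standard_normal((per_task, d))
        y = X @ beta + 0.1 * rng.standard_normal(per_task)
        return X, y

    data = [sample_task() for _ in range(n_tasks)]
    # No single task is solvable in isolation; the works above show how pooling
    # many such tasks still permits accurate recovery of all k regressors.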

Speaker: Albert Cheu (Northeastern)
Title: Differential Privacy in the Shuffle Model
Abstract: When run on personal data, differentially private (DP) algorithms prevent third parties from recovering much information about any individual’s data. Classical work in differential privacy operates at two extremes of trust: either all users give their data to one entity who executes a DP algorithm (central privacy), or users trust no such entity and execute DP algorithms individually (local privacy). Solving statistical problems under local privacy demands many more samples than under central privacy. In this talk, I will present shuffle privacy: under a slightly stronger trust assumption than local privacy, we can perform key statistical tasks with utility approaching central privacy. On the other hand, there are problems that remain difficult to solve in the shuffle model.
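
A minimal sketch (ours) of the shuffle model for one concrete task, binary sums: each user runs a local randomizer, a trusted shuffler permutes the reports so their order carries no information, and the analyzer debiases the total; the truth probability q is an illustrative parameter:

    import random

    def local_randomizer(x, q):
        # report the true bit w.p. q, otherwise a uniformly random bit
        return x if random.random() < q else random.randint(0, 1)

    def shuffled_sum_estimate(xs, q=0.75):
        reports = [local_randomizer(x, q) for x in xs]
        random.shuffle(reports)               # the shuffler
        n, s = len(xs), sum(reports)
        return (s - n * (1 - q) / 2) / q      # unbiased estimate of sum(xs)

    xs = [random.randint(0, 1) for _ in range(10_000)]
    print(sum(xs), round(shuffled_sum_estimate(xs)))

Anonymizing the reports amplifies the local randomizer's privacy guarantee, which is how shuffle protocols approach central-model accuracy on tasks like this.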

Speaker: Sitan Chen (MIT)
Title: Learning Deep ReLU Networks when Gradient Descent Fails
Abstract: We consider the problem of learning an unknown ReLU network with an arbitrary number of layers under Gaussian inputs and obtain the first nontrivial results for networks of depth more than two. We give an algorithm whose running time is a fixed polynomial in the ambient dimension and some (exponentially large) function of only the network’s parameters. These results provably cannot be obtained using gradient-based methods and give the first example of a class of efficiently learnable neural networks that gradient descent will fail to learn. In contrast, prior work for learning networks of depth three or higher requires exponential time in the ambient dimension, while prior work for the depth-two case requires well-conditioned weights and/or positive coefficients to obtain efficient run-times. Our algorithm does not require these assumptions. Our main technical tool is a type of filtered PCA that can be used to iteratively recover an approximate basis for the subspace spanned by the hidden units in the first layer. Our analysis leverages new structural results on lattice polynomials from tropical geometry. Based on joint work with Adam Klivans and Raghu Meka.
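
To see what "recovering the span of the hidden units" means in the simplest previously-handled regime (depth two, positive coefficients, Gaussian inputs), here is a standard moment-method warm-up (ours; the talk's filtered PCA is what goes beyond this setting):

    import numpy as np

    rng = np.random.default_rng(0)
    d, k, n = 10, 3, 200_000
    W = rng.standard_normal((k, d))
    W /= np.linalg.norm(W, axis=1, keepdims=True)        # unit first-layer weights
    net = lambda X: np.maximum(X @ W.T, 0).sum(axis=1)   # sum of ReLUs, coeffs +1

    X = rng.standard_normal((n, d))
    y = net(X)
    # E[y (xx^T - I)] = c * sum_i w_i w_i^T for a constant c > 0 (Hermite expansion)
    M = (X * y[:, None]).T @ X / n - y.mean() * np.eye(d)
    U = np.linalg.eigh(M)[1][:, -k:]          # top-k eigenspace ~ span of the w_i
    print(np.linalg.norm(W - W @ U @ U.T))    # small: weights lie in recovered span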

Speaker: Nicole Wein (MIT)
Title: Approximating the Diameter of a Graph
Abstract: I will talk about algorithms and conditional lower bounds for approximating the diameter of a graph (the largest distance between two vertices). The best known algorithm for exactly computing the diameter of a graph is the naive algorithm of computing all-pairs shortest paths (APSP) and taking the largest distance. Furthermore, no significantly better algorithm exists under the Strong Exponential Time Hypothesis. Running APSP can be prohibitively slow for very large graphs, so we seek much faster algorithms that produce an approximation of the diameter. I will discuss the landscape of time vs. accuracy trade-off algorithms and hardness for diameter.
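
One endpoint of that trade-off is easy to state, and sketched below (ours): a single BFS from any vertex v gives a 2-approximation, since the triangle inequality yields ecc(v) <= diameter <= 2*ecc(v):

    from collections import deque

    def eccentricity(adj, v):
        # BFS from v; returns the maximum distance from v
        dist = {v: 0}
        queue = deque([v])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        return max(dist.values())

    # path on 5 vertices: true diameter is 4
    adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    ecc = eccentricity(adj, 0)
    print(ecc, "<= diameter <=", 2 * ecc)

The interesting question, addressed in the talk, is what lies strictly between this linear-time factor-2 estimate and exact APSP.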

Speaker: Yu Chen (UPenn)
Title: Near-linear Size Hypergraph Cut Sparsifiers
Abstract: Cuts in graphs are a fundamental object of study and play a central role in the study of graph algorithms. The problem of sparsifying a graph while approximately preserving its cut structure has been extensively studied and has many applications. In a seminal work, Benczúr and Karger (1996) showed that given any n-vertex undirected weighted graph G and a parameter 0<ε<1, there is a weighted subgraph G’ of G with Õ(n/ε^2) edges such that the weight of every cut in G is preserved to within a (1 ± ε)-factor in G’.

A natural question is if such cut-preserving sparsifiers also exist for hypergraphs. Kogan and Krauthgamer (2015) initiated a study of this question. They showed that there exist hypergraph cut sparsifiers with O(n^2/ε^2) hyperedges, which is a factor $n$ larger than the Benczúr-Karger bound for graphs. In this talk, we will show a polynomial-time algorithm for creating hypergraph sparsifiers with Õ(n/ε^2) hyperedges, matching the Benczúr-Karger bound for graphs.
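
The basic reweighting idea behind cut sparsification can be seen in a toy form (ours; Benczúr-Karger and the hypergraph construction use carefully chosen non-uniform sampling probabilities, not the uniform sampling below): keep each hyperedge with probability p and give survivors weight 1/p, so every cut's weight is preserved in expectation.

    import random

    def uniform_sparsify(hyperedges, p, seed=0):
        # keep each hyperedge w.p. p, with weight 1/p; every cut is
        # preserved in expectation (concentration is the hard part)
        rng = random.Random(seed)
        return [(e, 1.0 / p) for e in hyperedges if rng.random() < p]

    def cut_weight(weighted_edges, side):
        # a hyperedge is cut iff it touches both sides of the bipartition
        return sum(w for e, w in weighted_edges
                   if any(v in side for v in e) and any(v not in side for v in e))

    H = [frozenset(e) for e in [(0, 1, 2), (1, 2), (2, 3), (0, 3)]]
    S = {0, 1}
    print(cut_weight([(e, 1.0) for e in H], S),
          cut_weight(uniform_sparsify(H, p=0.5), S))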

This is joint work with Sanjeev Khanna (Penn) and Ansh Nagda (Univ. of Washington).

Speaker: Fang-Yi Yu (Harvard)
Title: Escaping Saddle Points in Constant Dimensional Spaces: an Agent-based Modeling Perspective
Abstract: We study a large family of stochastic processes that update a limited amount in each step. One family of examples is agent-based modeling, where one agent updates at a time, so the state changes only slightly in each step. A key question is how well this family of stochastic processes is approximated by its mean-field approximations. Prior work shows that such stochastic processes escape repelling fixed points and saddle points in polynomial time. We provide a tight analysis of this question. For any non-attracting hyperbolic fixed point and any sufficiently small constant $\epsilon > 0$, the process will be $\epsilon$-far away from the fixed point in $O(n \log n)$ time with high probability. We also show that it takes time $\Omega(n \log n)$ to escape such a fixed point with constant probability, so our result is optimal up to a multiplicative constant. We leverage this result to show that such stochastic processes can reach the neighborhood of some attracting fixed point in $O(n \log n)$ time with high probability. Finally, we demonstrate the power of our results by applying them to several settings: evolutionary game theory, opinion formation dynamics, and stochastic gradient descent in a finite-dimensional space. Joint work with Grant Schoenebeck.
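
A toy instance of such a process (ours, for illustration): n agents each hold a bit, and each step one uniformly random agent resamples its bit to 1 with probability f(x), where x is the current fraction of ones, so the state changes by at most 1/n per step and tracks the mean-field ODE dx/dt = f(x) - x:

    import random

    def simulate(n, f, steps, x0=0.5, seed=0):
        rng = random.Random(seed)
        ones = int(x0 * n)
        for _ in range(steps):
            old_bit = rng.random() < ones / n      # the chosen agent's old bit
            new_bit = rng.random() < f(ones / n)   # resampled bit
            ones += int(new_bit) - int(old_bit)
        return ones / n

    # f(x) = 3x^2 - 2x^3 has fixed points 0, 1/2, 1; x = 1/2 is repelling
    # for the ODE, and the process starts exactly there.
    f = lambda x: 3 * x**2 - 2 * x**3
    print(simulate(n=1000, f=f, steps=200_000))  # typically ends near 0 or 1

The O(n log n) escape bound in the abstract is exactly about how long such a process lingers near the repelling point x = 1/2.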

Speaker: Fernando Granha Jeronimo (UChicago)
Title: Decoding Ta-Shma’s Binary Codes
Abstract: Since the seminal work of Gilbert and Varshamov in the 1950s, binary codes of distance $1/2-\epsilon$ and rate $\Omega(\epsilon^2)$ have been known to exist (where $O(\epsilon^2 \log(1/\epsilon))$ is a rate upper bound). However, it was only recently that the first explicit construction of nearly optimal binary codes was found, by Ta-Shma (2017). Ta-Shma’s codes have distance $1/2-\epsilon$ and rate $\Omega(\epsilon^{2+o(1)})$, and they were not known to admit efficient decoding at the time of their discovery. In this talk, we give an account of efficient decoding algorithms for Ta-Shma’s codes. We explain how constraint satisfaction problems (CSPs), a notion of high-dimensional expansion, the Sum-of-Squares hierarchy, and techniques from pseudorandomness can be combined to obtain efficient decoding (even in near-linear time).

Speaker: Christopher Jung (UPenn)
Title: Moment Multicalibration for Uncertainty Estimation
Abstract: We show how to achieve multi-calibrated estimators not just for means, but also for variances and other higher moments. Informally, this means that we can find regression functions which, given a data point, can make point predictions not just for the expectation of its label, but for higher moments of its label distribution as well — and those predictions match the true distribution quantities when averaged not just over the population as a whole — but also when averaged over an enormous number of finely defined population subgroups. It yields a principled way to estimate the uncertainty of predictions on many different subgroups — and to diagnose potential sources of unfairness in the predictive power of features across subgroups. As an application, we show that our moment estimates can be used to derive marginal prediction intervals that are simultaneously valid as averaged over all of the (sufficiently large) subgroups for which moment multi-calibration has been obtained. We also show how to obtain tight marginal prediction intervals in an online adversarial prediction setting — solving the same problem as conformal prediction, but without any distributional assumptions.
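
A minimal sketch (ours) of the first-moment version of the property being generalized: checking whether mean predictions match mean outcomes within each of many overlapping subgroups; the features, groups, and threshold below are illustrative:

    import numpy as np

    def calibration_errors(pred, y, groups, min_size=30):
        # |average prediction - average outcome| on each large-enough subgroup;
        # moment multicalibration extends this to variances and higher moments
        return {name: abs(pred[idx].mean() - y[idx].mean())
                for name, idx in groups.items() if idx.sum() >= min_size}

    rng = np.random.default_rng(0)
    a = rng.integers(2, size=1000)
    b = rng.integers(2, size=1000)
    y = 0.3 + 0.4 * a + 0.05 * rng.standard_normal(1000)
    pred = np.full(1000, y.mean())   # calibrated overall, but not on subgroups
    groups = {"a=1": a == 1, "b=1": b == 1, "a=1,b=1": (a == 1) & (b == 1)}
    print(calibration_errors(pred, y, groups))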

Speaker: Yiding Feng (Northwestern)
Title: Batching and Optimal Multi-stage Bipartite Allocations
Abstract: In several applications of real-time matching of demand to supply in online marketplaces, the platform can allow for some latency to batch the demand and improve the matching’s efficiency. Motivated by these scenarios, we study the optimal trade-off between batching and inefficiency in online allocations. In particular, we consider K-stage variants of the classic vertex weighted bipartite b-matching and AdWords problems, where online vertices arrive in K batches. Our main result for both problems is an optimal (1-(1-1/K)^K)-competitive (fractional) matching algorithm, improving the classic (1-1/e) competitive ratios known for the online variant of these problems (Mehta et al., 2007; Aggarwal et al., 2011). Our main technique is using a family of convex-programming based matchings that distribute the demand in a particularly balanced way among supply in different stages. More precisely, we identify a sequence of “polynomials with decreasing degrees” that can be used as strictly concave regularizers of the optimal matching linear program to form this family. By providing structural decompositions of the underlying graph using the optimal solutions of these convex programs, we develop a new multi-stage primal-dual framework to analyze the fractional multi-stage algorithm that returns the corresponding regularized optimal matching in each stage (by solving the stage’s convex program). We further show a matching upper-bound by providing an unweighted instance of the problem in which no online algorithm obtains a competitive ratio better than (1-(1-1/K)^K). We extend our results to integral allocations in the vertex weighted b-matching problem with large budgets, and in the AdWords problem with small bid over budget ratios.
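
The guarantee interpolates between the offline and fully online extremes, as a quick computation (ours) shows:

    import math

    # 1 - (1 - 1/K)^K: K = 1 is offline matching (ratio 1);
    # K -> infinity recovers the classic online ratio 1 - 1/e
    for K in [1, 2, 3, 5, 10, 100]:
        print(K, round(1 - (1 - 1 / K) ** K, 4))
    print("1 - 1/e =", round(1 - 1 / math.e, 4))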

Speaker: Shravas Kajana Rao (Northwestern)
Title: Degree vs. Approximate Degree
Abstract: Based on the recent breakthrough of Huang (2019), we show that for any total Boolean function $f$, $\deg(f) = O(\widetilde{\deg}(f)^2)$: the degree of $f$ is at most quadratic in the approximate degree of $f$. This is optimal, as witnessed by the OR function. We apply this result to show that the approximate degree of any read-once formula on $n$ variables is $\Theta(\sqrt{n})$.
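
In symbols, with the witnessing example spelled out (our restatement; the bounds for OR are the classical ones of Nisan and Szegedy):

    \[
      \deg(f) = O\bigl(\widetilde{\deg}(f)^{2}\bigr), \qquad
      \deg(\mathrm{OR}_n) = n, \quad \widetilde{\deg}(\mathrm{OR}_n) = \Theta(\sqrt{n}).
    \]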

Speaker: Saba Ahmadi (Northwestern)
Title: An Algorithm for Multi-Attribute Diverse Matching
Abstract: Bipartite b-matching, where agents on one side of a market are matched to one or more agents or items on the other, is a classical model that is used in myriad application areas such as healthcare, advertising, education, and general resource allocation. Traditionally, the primary goal of such models is to maximize a linear function of the constituent matches (e.g., linear social welfare maximization) subject to some constraints. Recent work has studied a new goal of balancing whole-match diversity and economic efficiency, where the objective is instead a monotone submodular function over the matching. Basic versions of this problem are solvable in polynomial time. In this work, we prove that the problem of simultaneously maximizing diversity along several features (e.g., country of citizenship, gender, skills) is NP-hard. To address this problem, we develop the first combinatorial algorithm that constructs provably-optimal diverse b-matchings in pseudo-polynomial time. We also provide a Mixed-Integer Quadratic formulation for the same problem and show that our method guarantees optimal solutions and takes less computation time for a reviewer assignment application. Joint work with Faez Ahmed, John P. Dickerson, Mark Fuge, and Samir Khuller.

Speaker: Sheng Yang (Northwestern)
Title: Scheduling on Spot Instances
Abstract: Cloud providers rent out surplus computational resources as spot instances at a deep discount. However, these cheap spot instances are revocable: when demand surges for higher-priced on-demand instances, cloud providers can interrupt spot instances after a brief alert. Such unreliability makes it challenging to use spot instances for long-running, stateful jobs, but with checkpoints and restoration, machine-learning (ML) training jobs are good candidates for overcoming this difficulty. We face a trade-off between low cost and uninterrupted computation. We model interruptions as known stochastic processes and the diminishing returns in utility for ML training as a monotone submodular function, and we study the problem of scheduling in the presence of spot instances so as to maximize the total utility obtained under a given budget. The problem reduces to a variant of the stochastic correlated knapsack problem with a submodular target function. Using the stochastic continuous greedy algorithm and a contention resolution scheme, we obtain a \((1 - 1/\sqrt{e})/2 \simeq 0.1967\) approximation algorithm, improving on the \((1 - 1/\sqrt[4]{e})/2\) ratio from previous work.
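
For scale, the improvement in the approximation ratio (a quick check, ours):

    import math

    old = (1 - math.exp(-1 / 4)) / 2   # (1 - 1/e^(1/4))/2, prior work
    new = (1 - math.exp(-1 / 2)) / 2   # (1 - 1/sqrt(e))/2, this work
    print(round(old, 4), "->", round(new, 4))   # ~0.1106 -> ~0.1967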

About the Quarterly Theory Workshop Series

The Quarterly Theory Workshop brings together three or four experts in theoretical computer science to present their perspectives and research on a common theme. Chicago and Midwest area researchers with an interest in theoretical computer science are invited to attend. The technical program is in the morning and includes coffee and lunch. The afternoon of the workshop allows for continued discussion between attendees and the speakers.