# Quarterly Theory Workshop: 2019 Junior Theorists Workshop

**About the Series**

**Synopsis**

The focus of this workshop will be on junior researchers in all areas of theoretical computer science. The talks will be on the afternoon of the first day and morning of the second day. There will be time for open discussion after lunch on the second day. The speakers for this workshop are:

- Soheil Behnezad (UMD)
- Greg Bodwin (Georgia Tech)
- Sumegha Garg (Princeton)
- Andrea Lincoln (MIT)
- Thatchapol Saranuik (TTIC)
- Vatsal Sharan (Stanford)
- Sahil Singla (Princeton and IAS)
- Manolis Zampetakis (MIT)
- Jiapeng Zhang (Harvard)

### Logistics

**Date:**Thursday-Friday, Nov 14-15, 2019.**Location:**Seeley Mudd 3514, (map), Northwestern U, 2211 Campus Dr, Evanston, IL 60208.**Transit:**Noyes St. Purple Line (map).**Parking:**Validation for North Campus Parking Garage (map) available at workshop.**Recommended Hotel:**Hilton Orrington.**Registration:**Please register (not compulsory).*Please bring your own name badge from past conference*.

### Tentative Schedule

Thursday:

**11:30-12:30**: Lunch**12:30-12:35**: Opening Remarks**12:35-1:15**: Sumegha Garg (Princeton)

*Extractor-based Time-Space Lower Bounds for Learning**(Video)***1:20-2:00**: Jiapeng Zhang (Harvard)

*An improved sunflower bound**(Video)***2:05-2:35**: Coffee Break**2:35-3:15**: Thatchaphol Saranurak (TTIC)

*Expander decomposition: applications and how to use it**(Video)***3:20-4:00**: Andrea Lincoln (MIT)

*Tight Hardness for Shortest Cycles and Paths in Sparse Graphs**(Video)***4:05-4:45**: Soheil Behnezad (UMD)

*Improved Massively Parallel Algorithms for Maximal Matching and Graph Connectivity**(Video)***5:00-6:30**: Cocktail Reception

Friday:

**8:30-9:00**: Continental Breakfast**9:00-9:05**: Opening Remarks**9:05-9:45**: Sahil Singla (Princeton and IAS)

*Improved Truthful Mechanisms for Combinatorial Auctions with Submodular Bidders**(**Video)***9:50-10:30**: Greg Bodwin (Georgia Tech)

*Regularity Decompositions for Sparse Pseudorandom Graphs**(**Video)***10:35-11:05**: Coffee Break**11:05-11:45**: Vatsal Sharan (Stanford)

*New Problems and Perspectives on Learning, Sampling, and Memory**(**Video)***11:50-12:30**: Manolis Zampetakis (MIT)

*Computationally and Statistically Efficient Truncated Statistics**(**Video)***12:35-1:30**: Lunch

### Titles and Abstracts

**Speaker:** Sumegha Garg (Princeton)

**Title:** Extractor-based Time-Space Lower Bounds for Learning

A recent line of works has focused on the following question: Can one prove strong unconditional lower bounds on the number of samples needed for learning under memory constraints? We study an extractor-based approach to proving such bounds for a large class of learning problems as follows.

A matrix M: A x X -> {-1,1} corresponds to the following learning problem: An unknown element x in X is chosen uniformly at random. A learner tries to learn x from a stream of samples, (a_1, b_1), (a_2, b_2) …, where for every i, a_i in A is chosen uniformly at random and b_i = M(a_i,x).

Assume that k, l, r are such that any submatrix of M, with at least 2^{-k}|A| rows and at least 2^{-l}|X| columns, has a bias of at most 2^{-r}. We show that any learning algorithm for the learning problem corresponding to M requires either a memory of size at least \Omega(k l), or at least 2^{\Omega(r)} samples.

In particular, this shows that to learn d-degree multi-linear polynomial from its evaluation on random points from {0,1}^n, we either need \tilde{\Omega}(n^{d+1}) memory or 2^{\Omega(n)} samples (~ tight).

We also extend the lower bounds to a learner that is allowed two passes over the stream of samples.

Joint works with Ran Raz and Avishay Tal.

**Speaker:** Jiapeng Zhang (Harvard)

**Title:** An improved sunflower bound.

A sunflower with $r$ petals is a collection of $r$ sets so that the intersection of each pair is equal to the intersection of all. Erdos and Rado in 1960 proved the sunflower lemma: for any fixed $r$, any family of sets of size $w$, with at least about $w^w$ sets, must contain a sunflower. The famous sunflower conjecture is that the bound on the number of sets can be improved to $c^w$ for some constant $c$. Despite much research, the best bounds until recently were all of the order of $w^{cw}$ for some constant c. In this work, we improve the bounds to about $(log w)^w$.

Joint work with Ryan Alweiss, Shachar Lovett and Kewen Wu.

**Speaker:** Thatchaphol Saranurak (TTIC)

**Title:** Expander decomposition: applications and how to use it

Expander decomposition has been a central tool in designing graph algorithms in many fields (including fast centralized algorithms, approximation algorithms, and property testing) for decades. Recently, we found that it also gives many impressive applications in dynamic graph algorithms and distributed graph algorithms. In this talk, I will explain the key tools that enable this development, show how to apply these tools, and survey the recent results based on expander decomposition.

**Speaker:** Andrea Lincoln (MIT)

**Title:** Tight Hardness for Shortest Cycles and Paths in Sparse Graphs

Fine-grained reductions have established equivalences between many core problems with \tilde{O}(n^3)-time algorithms on n-node weighted graphs, such as Shortest Cycle, All-Pairs Shortest Paths (APSP), Radius, Replacement Paths, Second Shortest Paths, and so on. These problems also have \tilde{O}(mn)-time algorithms on m-edge n-node weighted graphs, and such algorithms have wider applicability. Are these mn bounds optimal when m << n^2? Starting from the hypothesis that the minimum weight (2ℓ+1)-Clique problem in edge weighted graphs requires n^{2ℓ +1-o(1)} time, we prove that for all sparsities of the form m = \Theta(n^{1+1/ℓ}), there is no O(n^2 + mn^{1- ϵ}) time algorithm for ϵ >0 for any of the below problems

- Minimum Weight (2ℓ+1)-Cycle in a directed weighted graph,
- Shortest Cycle in a directed weighted graph,
- APSP in a directed or undirected weighted graph,
- Radius (or Eccentricities) in a directed or undirected weighted graph,
- Wiener index of a directed or undirected weighted graph,
- Replacement Paths in a directed weighted graph,
- Second Shortest Path in a directed weighted graph,
- Betweenness Centrality of a given node in a directed weighted graph.

That is, we prove hardness for a variety of sparse graph problems from the hardness of a dense graph problem. Our results also lead to new conditional lower bounds from several related hypothesis for unweighted sparse graph problems including k-cycle, shortest cycle, Radius, Wiener index and APSP.

Joint work with Virginia Vassilevska Williams and Ryan Williams.

**Speaker:** Soheil Behnezad (UMD)

**Title:** Improved Massively Parallel Algorithms for Maximal Matching and Graph Connectivity

In this talk we will discuss the recent algorithmic progress made on the Massively Parallel Computations (MPC) model. The MPC model provides a clean theoretical abstraction of modern parallel computation frameworks such as MapReduce, Hadoop, Spark, etc., which have been extremely successful in processing large-scale data-sets.

Our main focus in the talk will be on the maximal matching problem. We provide a novel analysis of an extremely simple algorithm and show that it leads to exponentially faster MPC algorithms for maximal matching. Our analysis is based on a novel method of proving concentration bounds for algorithms satisfying a certain “locality” property.

We will also overview an algorithm for graph connectivity in the MPC model that takes O(log D) rounds for graphs with diameter D > log n and takes O(loglog n) rounds for all other graphs. This algorithm is near-optimal since Ω(log D) is a (conditional) lower-bound for the problem.

Based on joint works with Mohammad Hajiaghayi and David Harris (FOCS 2019), and Laxman Dhulipala, Hossein Esfandiari, Jakub Łącki, and Vahab Mirrokni (FOCS 2019).

**Speaker:** Sahil Singla (Princeton and IAS)

**Title:** Improved Truthful Mechanisms for Combinatorial Auctions with Submodular Bidders

A longstanding open problem in Algorithmic Mechanism Design is to design computationally efficient truthful mechanisms for (approximately) maximizing welfare in combinatorial auctions with submodular bidders. The first such mechanism was obtained by Dobzinski, Nisan, and Schapira [STOC’06] who gave an O(log^2 m)-approximation where m is number of items. This problem has been studied extensively since, culminating in an O(\sqrt{log m})-approximation mechanism by Dobzinski [STOC’16]. We present a computationally-efficient truthful mechanism with approximation ratio that improves upon the state-of-the-art by an exponential factor. In particular, our mechanism achieves an O((loglog m)^3)-approximation in expectation, uses only O(n) demand queries, and has universal truthfulness guarantee. This settles an open question of Dobzinski on whether Θ(\sqrt{log m}) is the best approximation ratio in this setting in negative.

This is joint work with Sepehr Assadi and appears in FOCS 2019.

**Speaker:** Greg Bodwin (Georgia Tech)

**Title:** Regularity Decompositions for Sparse Pseudorandom Graphs

The “regularity method” is a powerful tool in graph theory where one analyzes a graph G by first approximating it using a union of random graphs H, and then one analyzes H using easy probabilistic processing. This method lets one solve otherwise-hard or intractable problems on G, with some extra error depending on how well it is approximated by H. How big can this error possibly be? For dense graphs, the answer is well known. But for slightly sparser graphs — say, on “only” n^{1.99} edges — it is suddenly less clear which types of graphs can be well approximated, how to measure approximation error, etc. We will survey this research area and discuss some new progress in classifying the types of graphs for which a “sparse regularity method” can be effectively applied.

Based on joint work with Santosh Vempala.

**Speaker:** Vatsal Sharan (Stanford)

**Title:** New Problems and Perspectives on Learning, Sampling, and Memory

What is the role of memory in continuous optimization and learning? Are there inherent trade-offs between the available memory and the data requirement? Is it possible to achieve the sample complexity of second-order optimization methods with significantly less memory? I will discuss these questions in this talk, and a recent result along these lines. For a basic continuous optimization and learning problem—linear regression—we show that there is a gap between the sample complexity of algorithms with quadratic memory and that of any algorithm with sub-quadratic memory. I will also discuss several promising future directions to show strong memory/sample tradeoffs for continuous optimization. This is based on joint work with Aaron Sidford and Greg Valiant.

If there is time, I will introduce the problem of data “amplification”: given n independent draws from a distribution, D, to what extent is it possible to output a set of m > n datapoints that are indistinguishable from m i.i.d. draws from D? Curiously, we show that nontrivial amplification is often possible in the regime where n is too small to learn D to any nontrivial accuracy. We also discuss connections between this setting and the challenge of interpreting the behavior of GANs and other ML/AI systems. This is based on joint work with Brian Axelrod, Shivam Garg and Greg Valiant.

**Speaker:** Manolis Zampetakis (MIT)

**Title:** Computationally and Statistically Efficient Truncated Statistics

Censoring and truncation occur when data falling outside of a subset of the population are not observable. In practice, it often arises as a result of saturation of measurement devices, experimental design, and legal or privacy constraints preventing the use of some of the data. Such phenomena have been known to affect experimental results in a counterintuitive way, as per

Berkson’s paradox.

In our recent work, we provide the first provably computationally and statistically efficient methods accomplishing the fundamental task of statistical estimation for the entire population out of exclusively censored data. Our first result [w/ Daskalakis, Gouleakis, Tzamos FOCS’18] assumes that

the population follows a multi-dimensional normal distribution and the survival set is known. In follow-up works, we have extended our result to the case of censored linear [w/ Daskalakis, Gouleakis, Tzamos COLT’19], logistic and probit regression [w/ Daskalakis, Ilyas, Rao ’19] and we have also explored the case of unknown survival set [w/ Kontonis, Tzamos FOCS’19].

## Recent Comments