CS+Math Workshop on Metric Embeddings and Dimensionality Reduction
- Location: Mudd Library Building, Room 3514 (subject to change).
- Transit: Noyes St. Purple Line (map).
- Parking: Validation for North Campus Parking Garage (map) available at workshop.
- Recommended Hotel: Hilton Orrington.
- Registration (free): Please, register here.
- Moses Charikar (Stanford)
- Anupam Gupta (CMU)
- Sepideh Mahabadi (Toyota Technological Institute at Chicago)
- Assaf Naor (Princeton)
- Jelani Nelson (Harvard)
- Ilya Razenshteyn (Microsoft Research — Redmond)
- David P. Woodruff (CMU)
Friday, May 24 (Afternoon)
- 12:00 Lunch
- 1:15pm Opening remarks
- 1:30pm Assaf Naor, An average John theorem.
- 2:30pm Jelani Nelson, Sampling from sketching, an optimal lower bound.
- 3:30pm Anupam Gupta, Metric Embeddings via Shortest-path Decompositions.
- 4:30pm Moses Charikar, Importance Sampling in High Dimensions via Hashing.
Saturday, May 25 (Morning)
- 9:00am Breakfast
- 9:30am Sepideh Mahabadi, Nonlinear Dimension Reduction via Outer Bi-Lipschitz Extension.
- 10:30am David P. Woodruff, Tight Bounds for L1 Oblivious Subspace Embeddings.
- 11:30am Ilya Razenshteyn, Holder Homeomorphisms and Approximate Nearest Neighbors.
- 12:30pm Lunch
Title: An average John theorem
Speaker: Assaf Naor (Princeton)
Abstract: We will prove a sharp average-case variant of a classical embedding theorem of John through the theory of nonlinear spectral gaps. We will use this theorem to provide a new answer to questions of Johnson and Lindenstrauss (1983) and Bourgain (1985) on metric dimension reduction, and explain how it leads to algorithms for approximate nearest neighbor search.
Title: Sampling from sketching, an optimal lower bound
Speaker: Jelani Nelson (Harvard)
Abstract: The work of Jowhari et al. 2011 showed how to use sketches for sparse recovery, popular in the compressed sensing literature, to develop low-memory schemes for sampling from a set being dynamically updated in a data stream. Ahn, Guha, and McGregor in a sequence of two papers then showed how to use this sampling primitive to solve a number of dynamic graph problems using low memory, such as connectivity, and computing a spanning forest, max matching, minimum spanning tree, and other such graph problems. Much follow-up work has extended the AGM approach to several other graph problems as well. In this talk, we show optimality of the sampling primitive developed by Jowhari et al.
This talk is based on joint work with Michael Kapralov, Jakub Pachocki, Zhengyu Wang, David P. Woodruff, and Mobin Yahyazadeh.
Title: Metric Embeddings via Shortest-path Decompositions
Speaker: Anupam Gupta (CMU)
Abstract: We consider the question of bi-Lipschitz embeddings of graph metrics into ℓp spaces, with the goal of understanding how the structure of the graph influences its embeddability. While some special cases are currently understood, this area still offers many open questions. In this talk we focus on embedding weighted graphs with pathwidth k into ℓp spaces. We discuss a new (existentially tight) bound on this embeddability problem. Our result is obtained via low-depth decompositions via shortest paths: the core new idea is that given a geodesic shortest path P, we can probabilistically embed all points into 2 dimensions with respect to P.
This is based on work with Ittai Abraham, Arnold Filtser, and Ofer Neiman.
Title: Importance Sampling in High Dimensions via Hashing
Speaker: Moses Charikar (Stanford)
Abstract: Locality sensitive hashing (LSH) is a popular technique for nearest neighbor search in high dimensional data sets. Recently, a new view at LSH as a biased sampling technique has been fruitful for density estimation problems in high dimensions. Given a set of points and a query point, the goal (roughly) is to estimate the density of the data set around the query. One way to formalize this is by kernel density estimation: Given a function that decays with distance and represents the “influence” of a data point at the query, sum up this influence function over the data set. Yet another way to formalize this problem is by counting the number of data points within a certain radius of the query. While these problems can easily be solved by making a linear pass over the data, this can be prohibitive for large data sets and multiple queries. Can we preprocess the data so as to answer queries efficiently? This talk will survey several recent papers that use locality sensitive hashing to design unbiased estimators for such density estimation problems and their extensions.
This talk will survey joint works with Arturs Backurs, Piotr Indyk, Vishnu Natchu, Paris Syminelakis and Xian (Carrie) Wu.
Title: Nonlinear Dimension Reduction via Outer Bi-Lipschitz Extension
Speaker: Sepideh Mahabadi (TTIC)
Abstract: We introduce and study the notion of an outer bi-Lipschitz extension of a map between Euclidean spaces. We show that for every map f there exists an outer bi-Lipschitz extension f’ whose distortion is greater than that of f by at most a constant factor. This result can be seen as a counterpart of the classic Kirszbraun theorem but for outer bi-Lipschitz extensions. We present applications of our results to prioritized and terminal dimension reduction problems.
This is a joint work with Konstantin Makarychev, Yury Makarychev and Ilya Razenshteyn.
Title:Tight Bounds for L1 Oblivious Subspace Embeddings
Speaker: David P. Woodruff (CMU)
Abstract: Oblivious subspace embeddings have proven to be an essential ingredient for approximately solving numerical linear algebra problems, such as regression and low-rank approximation.
While for p = 2 there are nearly optimal tradeoffs in terms of the dimension, distortion, and sparsity, for the important case of p = 1, much less was known. In this talk I will present our results on l1 oblivious subspace embeddings, including (i) nearly optimal lower bounds and (ii) new constructions for sparse l1 oblivious subspace embeddings.
Oblivious subspace embeddings are crucial for distributed and streaming environments, as well as entrywise lp low rank approximation. Our results give improved algorithms for these applications.
Based on joint work with Ruosong Wang.
Title: Holder Homeomorphisms and Approximate Nearest Neighbors
Speaker: Ilya Razenshteyn (Microsoft Research)
Abstract: I will show the first approximate nearest neighbor search data structure for a general d-dimensional normed space with sub-polynomial in d approximation.
The main tool is a finite-dimensional quantitative version of a theorem of Daher, which yields a Holder homeomorphism between small perturbations of a normed space of interest and a Euclidean space. To make Daher’s theorem algorithmic, we employ convex programming to compute the norm of a vector in a space, which is the result of complex interpolation between two given normed spaces.
Based on a joint work (FOCS 2018) with Alex Andoni, Assaf Naor, Sasho Nikolov and Erik Waingarten.