# Quarterly Theory Workshop: Computational Challenges in Machine Learning

**About the Series**

**Synopsis**

**Logistics**

**Location:** ITW Lecture Hall (1st floor), Ford Motor Company Engineering Design Center (map), Northwestern U, 2133 Sheridan Rd, Evanston, IL 60208.

**Transit:** Noyes St. Purple Line (map).

**Parking:** Validation for North Campus Parking Garage (map) available at workshop.

**Registration:** None necessary.

**Talk Schedule on Wednesday, Feb 22nd**

8:45 am – 9:15 am: Breakfast

9:15 am – 10:00 am: Sanjoy Dasgupta: *Embedding and projection approaches to fast nearest neighbor search*

10:05 am – 10:50 am: Moritz Hardt: *Gradient Descent Learns Linear Dynamical Systems*

10:50 am – 11:15 am: Break

11:15 am – Noon: Santosh Vempala: *Robust Estimation of Mean and Covariance*

12:05 pm – 12:50 pm: Nathan Srebro: *Non-Convex Local Search: Beyond Global Optimality*

12:50 pm – 2:00 pm: Lunch

**Titles and Abstracts**

**Speaker:** *Sanjoy Dasgupta (UCSD)*

**Title:** *Embedding and projection approaches to fast nearest neighbor search*

**Abstract:** Nearest neighbor (NN) search is one of the simplest and most enduring methods of statistical estimation. Its efficacy depends upon (1) finding nearest neighbors rapidly and (2) using a distance function that is well-suited to the domain. We present three algorithmic results that address these issues.

1. Randomized data structures for exact NN search in Euclidean space

Two of the most popular approaches to fast NN search are locality-sensitive hashing and randomized forests. They are both effective in practice, but only the former has been analyzed carefully. We focus on randomized forests, which subsume hashing schemes but are applied in a somewhat different, data-dependent manner. We show that the probability that they fail to find the nearest neighbor in Euclidean space, for any data set and any query point, is directly related to a simple potential function that captures the difficulty of the point configuration. We then bound this potential function in several situations of interest, for instance when the data lie in a space of bounded doubling dimension.
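For concreteness, here is a minimal sketch of one standard randomized-tree approach: a random projection tree queried defeatist-style (descend to a single leaf), combined into a small forest. It illustrates the general idea only, not the specific forests or the potential-function analysis from the talk; all parameters (leaf size, number of trees) are arbitrary choices.

```python
import numpy as np

def build_rp_tree(points, indices, leaf_size=10, rng=None):
    """Recursively split the data with random hyperplanes (a random projection tree)."""
    rng = np.random.default_rng() if rng is None else rng
    if len(indices) <= leaf_size:
        return {"leaf": indices}
    direction = rng.normal(size=points.shape[1])
    direction /= np.linalg.norm(direction)
    proj = points[indices] @ direction
    threshold = np.median(proj)                     # split at the median projection
    left, right = indices[proj <= threshold], indices[proj > threshold]
    if len(left) == 0 or len(right) == 0:           # degenerate split; stop here
        return {"leaf": indices}
    return {"direction": direction, "threshold": threshold,
            "left": build_rp_tree(points, left, leaf_size, rng),
            "right": build_rp_tree(points, right, leaf_size, rng)}

def defeatist_query(tree, points, q):
    """Descend to a single leaf and return the closest point stored there."""
    node = tree
    while "leaf" not in node:
        node = node["left"] if q @ node["direction"] <= node["threshold"] else node["right"]
    leaf = node["leaf"]
    return leaf[np.argmin(np.linalg.norm(points[leaf] - q, axis=1))]

# A small forest: each tree can miss the true nearest neighbor, but the chance
# that every tree misses it drops quickly with the number of trees.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 20))
q = rng.normal(size=20)
forest = [build_rp_tree(X, np.arange(len(X)), rng=rng) for _ in range(10)]
candidates = {defeatist_query(t, X, q) for t in forest}
answer = min(candidates, key=lambda i: np.linalg.norm(X[i] - q))
print("forest answer:", answer, "| true NN:", int(np.argmin(np.linalg.norm(X - q, axis=1))))
```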

2. Embeddings into Hilbert space that monotonically transform distances

Most work on NN search has focused on Euclidean distance, while other important distance functions, like L1 distance or KL divergence, have received much less attention. We observe that although many of these distances cannot be embedded into Euclidean space, monotonic transformations of them can. For instance, L1 is not embeddable in L2, but the square root of L1 distance is, and this is good enough for NN purposes. We show how this can be exploited algorithmically.
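The reason a monotonic transformation is "good enough" is that a strictly increasing map preserves the ordering of distances, so it leaves the identity of the nearest neighbor unchanged. The toy snippet below (illustrative only) checks this for L1 versus the square root of L1.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.random((500, 8))                 # data points
q = rng.random(8)                        # query point

l1 = np.abs(X - q).sum(axis=1)           # L1 distances from the query to every point
sqrt_l1 = np.sqrt(l1)                    # a monotone transform of the same distances

# A strictly increasing transform preserves the ordering of distances, so the
# nearest neighbor is unchanged; an index built for the transformed
# (Euclidean-embeddable) distance therefore answers L1 NN queries correctly.
assert np.argmin(l1) == np.argmin(sqrt_l1)
print("L1 nearest neighbor index:", int(np.argmin(l1)))
```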

3. A randomized tree structure for metric spaces

Finally, we introduce the “two-vantage” tree, a randomized tree data structure for metric spaces. Forests of such trees are seen to be extremely effective for exact nearest neighbor search.
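The two-vantage tree itself is not specified in the abstract; as background only, here is a sketch of its classical single-vantage relative, the vantage-point tree, which also supports exact NN search in arbitrary metric spaces via triangle-inequality pruning. Bucket size and split rule below are illustrative choices, not the construction from the talk.

```python
import numpy as np

def build_vp_tree(items, dist, rng):
    """Classical vantage-point tree: pick a random vantage point and split the
    remaining points by their median distance to it."""
    if not items:
        return None
    if len(items) <= 3:
        return {"bucket": items}
    vp = items[rng.integers(len(items))]
    rest = [x for x in items if x is not vp]
    d = [dist(vp, x) for x in rest]
    mu = float(np.median(d))
    inner = [x for x, dx in zip(rest, d) if dx <= mu]
    outer = [x for x, dx in zip(rest, d) if dx > mu]
    return {"vp": vp, "mu": mu,
            "inner": build_vp_tree(inner, dist, rng),
            "outer": build_vp_tree(outer, dist, rng)}

def vp_nn(node, q, dist, best=None):
    """Exact nearest neighbor search, pruning subtrees via the triangle inequality."""
    if node is None:
        return best
    if "bucket" in node:
        for x in node["bucket"]:
            dx = dist(q, x)
            if best is None or dx < best[0]:
                best = (dx, x)
        return best
    d = dist(q, node["vp"])
    if best is None or d < best[0]:
        best = (d, node["vp"])
    near, far = ("inner", "outer") if d <= node["mu"] else ("outer", "inner")
    best = vp_nn(node[near], q, dist, best)
    if abs(d - node["mu"]) < best[0]:      # the far side may still hold a closer point
        best = vp_nn(node[far], q, dist, best)
    return best

# Works for any metric; here, L1 distance on random vectors.
rng = np.random.default_rng(2)
pts = [rng.normal(size=5) for _ in range(300)]
l1 = lambda a, b: float(np.abs(a - b).sum())
tree = build_vp_tree(pts, l1, rng)
q = rng.normal(size=5)
print("distance to nearest neighbor:", round(vp_nn(tree, q, l1)[0], 4))
```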

This is joint work with Kaushik Sinha, Xinan Wang, and Zhen Zhai.

Bio: Sanjoy Dasgupta is a Professor in the Department of Computer Science and Engineering at UC San Diego. He received his PhD from Berkeley in 2000, and spent two years at AT&T Research Labs before joining UCSD. His area of research is algorithmic statistics, with a focus on unsupervised and minimally supervised learning. He is the author of a textbook, “Algorithms” (with Christos Papadimitriou and Umesh Vazirani), that appeared in 2006.

**Speaker:** *Moritz Hardt (Google)*

**Title:** *Gradient Descent Learns Linear Dynamical Systems*

**Abstract:** We prove that gradient descent efficiently converges to the global optimizer of the maximum likelihood objective of an unknown linear time-invariant dynamical system from a sequence of noisy observations generated by the system. Even though the objective function is non-convex, we provide polynomial running time and sample complexity bounds under strong but natural assumptions. Linear system identification has been studied for many decades, yet, to the best of our knowledge, these are the first polynomial guarantees for the problem we consider.

Joint work with Tengyu Ma and Ben Recht.

Short bio: Moritz Hardt is a senior research scientist at Google Brain. After obtaining a PhD in computer science from Princeton University in 2011, he worked at IBM Research Almaden on algorithmic principles of machine learning. He then moved to Google to join the Foundations of Applied Machine Learning group, where his mission is to build guiding theory and scalable algorithms that make the practice of machine learning more reliable, transparent, and effective.
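To make the problem setup concrete, here is a toy sketch (not the paper's algorithm or guarantees): a scalar linear time-invariant system is simulated, and the single transition parameter is fit by gradient descent on the average squared prediction error, with the derivative of the state recursion tracked by a forward sensitivity recursion. All constants (noise level, learning rate, horizon) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 2000
a_true = 0.8                               # unknown system parameter to recover

# Simulate a scalar linear time-invariant system driven by known inputs u and
# observed through noise:  x_{t+1} = a*x_t + u_t,   y_t = x_{t+1} + noise.
u = rng.normal(size=T)
x = np.zeros(T + 1)
for t in range(T):
    x[t + 1] = a_true * x[t] + u[t]
y = x[1:] + 0.1 * rng.normal(size=T)

# Fit a_hat by gradient descent on the average squared prediction error of the
# parameterized system; s tracks d(x_hat)/d(a_hat) along the recursion.
a_hat, lr = 0.0, 0.01
for step in range(300):
    x_hat, s, grad = 0.0, 0.0, 0.0
    for t in range(T):
        s = x_hat + a_hat * s              # differentiate the recursion before updating x_hat
        x_hat = a_hat * x_hat + u[t]
        grad += 2.0 * (x_hat - y[t]) * s / T
    a_hat -= lr * grad
print(f"recovered a_hat = {a_hat:.3f}  (true a = {a_true})")
```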

**Speaker:** *Santosh Vempala (Georgia Tech)*

**Title:** *Robust Estimation of Mean and Covariance*

**Abstract:** We consider the fundamental statistical problem of robustly estimating the mean and covariance of a distribution from i.i.d. samples in n-dimensional space, i.e., in the presence of arbitrary (malicious) noise, with no assumption on the noise. Classical solutions (Tukey point, geometric median) are either intractable to compute efficiently or produce estimates whose error scales as a power of the dimension. In this talk, we present efficient and practical algorithms to compute estimates for the mean, covariance, and operator norm of the covariance matrix, for a wide class of distributions. We show that the estimation error of the algorithm exceeds the information-theoretic lower bound by a factor of at most the square root of the logarithm of the dimension. This gives polynomial-time solutions to some basic questions studied in robust statistics. One of the applications is agnostic SVD.

Joint work with Kevin Lai and Anup B. Rao.
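As a back-of-the-envelope illustration of why robustness is hard in high dimension (not the algorithm from the talk): the snippet below corrupts a small fraction of Gaussian samples and compares the error of the empirical mean against that of the coordinate-wise median. The corruption pattern and parameters are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, eps = 100, 5000, 0.05            # dimension, sample size, corrupted fraction

samples = rng.normal(size=(m, n))      # clean data: N(0, I), true mean = 0
k = int(eps * m)
samples[:k] = 5.0                      # adversary moves a few samples far away, all in one direction

err_mean = np.linalg.norm(samples.mean(axis=0))
err_cw_median = np.linalg.norm(np.median(samples, axis=0))

# The empirical mean is pulled away by roughly eps * 5 * sqrt(n), i.e. its error
# grows with the dimension; the coordinate-wise median is less affected here, but
# its error also grows with sqrt(n), which is the kind of dimension-dependent gap
# the talk's algorithms close up to a sqrt(log n) factor.
print(f"empirical mean error:         {err_mean:.2f}")
print(f"coordinate-wise median error: {err_cw_median:.2f}")
```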

**Speaker:** *Nathan Srebro (TTI Chicago)*

**Title:** *Non-Convex Local Search: Beyond Global Optimality*

**Abstract:** Gradient descent, and other local search variants, are proving successful for solving many non-convex problems, most prominently training multi-layer networks. Not only do they seem to succeed in converging to a global (or near-global) optimum, but in underdetermined problems they are also biased toward "simpler" global optima and thus enable generalization. To understand this better, we study the non-convex problem of searching over matrix factorizations, and discuss both global optimality and the bias toward simpler models.
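As a small, self-contained example of the kind of non-convex local search being discussed (illustrative only, not the talk's results): plain gradient descent on the factorized objective ||U V^T - M||_F^2, which is non-convex in (U, V), run from a small random initialization on a synthetic low-rank target. All sizes and step sizes are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 50, 3
M = rng.normal(size=(n, r)) @ rng.normal(size=(r, n))   # ground-truth low-rank matrix

# Non-convex objective f(U, V) = ||U V^T - M||_F^2, minimized by plain gradient
# descent from a small random initialization.
U = 0.01 * rng.normal(size=(n, r))
V = 0.01 * rng.normal(size=(n, r))
lr = 0.002
for step in range(2000):
    R = U @ V.T - M                     # residual
    U, V = U - lr * 2 * R @ V, V - lr * 2 * R.T @ U
print("relative error:", float(np.linalg.norm(U @ V.T - M) / np.linalg.norm(M)))
```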
