## GlobalSIP 2013 Symposium on:

# Optimization in Machine Learning and Signal Processing

[Download the PDF Call for Papers]

This workshop seeks to explore the fertile intersection of signal processing, machine learning and large-scale optimization. Many recent and fundamental advances in drawing inference from data have involved formulating statistical objectives - both inferring a model, and predicting thereof - as optimization problems.A key factor complicating matters is that modern signal processing applications often demand solving such optimization problems at very large scales. It thus becomes important to leverage any structure present in the statistical estimation problem, both classical structures such as strict/strong convexity, and smoothness, but also other statistical-estimation-specific structures such as sparsity, graphical model structure, low-rank structure and so on.

What are the operational and fundamental relationships between computation and statistical efficiency? Which facets of optimization methods are inherently parallelizable (e.g. message-passing algorithms)? What are the limits and bottlenecks faced by the state of the art optimization methods when faced with large-scale problems? It is the aim of this workshop to answer questions in this vein by bringing together researchers from different communities --- signal processing, machine learning, statistics and mathematical programming --- and identify common intuitions underlying successful methods.

Submissions of at most 4 pages in two-column IEEE format are welcome on topics including:

- Models and estimation
- Sparsity, Low-rank and other methods in high-dimensional statistics
- Large-scale convex optimization: algorithms and applications
- Graphical models: inference, structure learning etc.
- Optimization for clustering, classification, regression etc.
- Non-convex and iterative methods

### Keynote Speakers

Francis Bach, *Ecole Normale Superieure*, **Beyond Stochastic Gradient Descent for Large-scale Machine Learning**

Many machine learning and signal processing problems are traditionally cast as convex optimization problems. A common difficulty in solving these problems is the size of the data, where there are many observations ("large n") and each of these is large ("large p"). In this setting, online algorithms such as stochastic gradient descent which pass over the data only once, are usually preferred over batch algorithms, which require multiple passes over the data. Given n observations/iterations, the optimal convergence rates of these algorithms are O(1/\sqrt{n}) for general convex functions and reaches O(1/n) for strongly-convex functions. In this talk, I will show how the smoothness of loss functions may be used to design novel algorithms with improved behavior, both in theory and practice: in the ideal infinite-data setting, an efficient novel Newton-based stochastic approximation algorithm leads to a convergence rate of O(1/n) without strong convexity assumptions, while in the practical finite-data setting, an appropriate combination of batch and online algorithms leads to unexpected behaviors, such as a linear convergence rate for strongly convex problems, with an iteration cost similar to stochastic gradient descent. (Joint work with Nicolas Le Roux, Eric Moulines and Mark Schmidt.)

**Francis Bach** is a researcher in the Sierra INRIA project-team, in the Computer Science Department of the Ecole Normale Superieure, Paris, France. He graduated from the Ecole Polytechnique, Palaiseau, France, in 1997, and earned his PhD in 2005 from the Computer Science division at the University of California, Berkeley. His research interests include machine learning, statistics, optimization, graphical models, kernel methods, sparse methods and statistical signal processing. He has been awarded a starting investigator grant from the European Research Council in 2009.

Inderjit Dhillon, *University of Texas at Austin*, **Sparse Inverse Covariance Estimation for a Million Variables**

The L1-regularized Gaussian maximum likelihood estimator has been shown to have strong statistical guarantees in recovering a sparse inverse covariance matrix even under high-dimensional settings. However, it requires solving a difficult non-smooth log-determinant program with number of parameters that scale quadratically with the number of Gaussian variables. State-of-the-art methods thus do not scale to problems with more than 20,000 variables. In this talk, I will describe a quadratic approximation method that can solve 1-million dimensional L1-regularized log determinant problems (which would thus have a trillion parameters) on a single computer. In order to do so, we carefully exploit the underlying structure of the problem. Our innovations include (i) a second-order Newton-like method, (ii) division of the variables into free and fixed sets, (iii) a block co-ordinate descent method, and (iv) a memory efficient scheme that approximates key quantities within the algorithm. Even with the latter approximations, the proposed BIGQUIC algorithm can achieve a quadratic convergence rate. Experimental results using synthetic and real application data demonstrate the improvements in performance over otherstate-of-the-art methods.

This is joint work with Cho-Jui Hsieh, Matyas Sustik and Pradeep Ravikumar.

**Inderjit Dhillon** is a Professor of Computer Science and Mathematics at The University of Texas at Austin. He is closely affiliated with the Institute for Computational Engineering and Sciences(ICES), and also with the Division of Statistics and Scientific Computation(SSC), Dept of Electrical and Computer Engineering(ECE), and the Center for Computational Biology and Bioinformaics(CCBB). Inderjit received his B.Tech. degree from the Indian Institute of Technology at Bombay, and Ph.D. from the University of California at Berkeley. At Berkeley, Inderjit studied computer science and mathematics with Beresford Parlett and Jim Demmel. His thesis work led to the fastest known numerically stable algorithm for the symmetric tridiagonal eigenvalue/eigenvector problem. Software based on this work is now part of all state-of-the-art numerical software libraries. Inderjit's current research interests are in big data, machine learning, network analysis, numerical optimization and scientific computing.Inderjit received an NSF Career Award in 2001, a University Research Excellence Award in 2005, the SIAG Linear Algebra Prize in 2006, the Moncrief Grand Challenge Award in 2010, the SIAM Outstanding Paper Prize in 2011, and the ICES Distinguished Research Award in 2013. Along with his students, he has received several best paper awards at leading data mining and machine learning conferences. Inderjit has published over 100 journal and conference papers, and has served on the Editorial Board of the Journal of Machine Learning Research, the IEEE Transactions of Pattern Analysis and Machine Intelligence, Foundations and Trends in Machine Learning and the SIAM Journal for Matrix Analysis and Applications. He has served on several panels, including the Committee of Visitors, at the National Science Foundation. He is a Senior Member of the Institute of Electrical and Electronics Engineers(IEEE), a member of the Association for Computing Machinery(ACM), the Society for Industrial and Applied Mathematics(SIAM) and the American Association for the Advancement of Science(AAAS).

Babak Hassibi, *California Institute of Technology*

We consider the problem of recovering a structured signal (sparse, low-rank, block-sparse, etc.) from noisy compressed measurements. A general algorithm for such problems, commonly referred to as generalized LASSO, attempts to solve this problem by minimizing a least-squares cost with an added "structure-inducing" regularization term (l_1 norm, nuclear norm, mixed l_2/l_1 norm, etc.). When the measurement matrix consists of iid Gaussian entries we provide a full performance analysis of the generalized LASSO algorithm and compute, in closed form, the normalized square error of the reconstructed signal. We will highlight some of the mathematical vignettes necessary for the analysis, including Gordon's comparison lemma for Gaussian processes, and projections onto the sub-differential cone of the structure-inducing function. We will also make connections to noiseless compressed sensing and the proximity operator in denoising, and will emphasize the central role of the "statistical dimension", i.e., the minimum number of measurements needed to reconstruct a structured signal by minimizing an appropriate convex function. We provide extensive numerical examples to validate the results and mention several open problems.

**Babak Hassibi** is the Gordon M Binder/Amgen Professor and Executive Officer of Electrical Engineering at Caltech. He obtained his PhD from Stanford University and prior to Caltech was with the Mathematical Sciences Research Center at Bell Laboratories. His research interests span communications, control and signal processing. Among other awards, he is the recipient of the Presidential Early Career Award for Scientists and Engineers and the David and Lucille Packard Fellowship in Science and Engineering.

### Paper Submission

Submit papers of at most 4 pages in two-column IEEE format through the GlobalSIP website at http://www.ieeeglobalsip.org/Papers.asp. All papers (contributed and invited) will be presented as posters.

### Important Dates

Paper Submission Deadline | June 15, 2013 |

Review Results Announce | July 30, 2013 |

Camera-Ready Papers Due | September 7, 2013 |