A low-rank matrix can be written as the product of two tall, thin factor matrices, so the total number of variables is much smaller. One could exploit this observation in optimization: e.g., consider the minimization of a convex function f over low-rank matrices, where the low-rank set is modeled via such a factorization. Such heuristics have long been used in practice. The reasons for this choice are three-fold: (i) it might better model an underlying task (e.g., f may have arisen from a relaxation of a rank constraint in the first place), (ii) it might lead to computational gains, since a smaller rank means fewer variables to maintain and optimize, and (iii) it might lead to statistical “gains”, as it can prevent over-fitting in machine learning or inference problems.
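As a minimal sketch of the idea (not the speaker's code), the following runs gradient descent directly on the factors (U, V) of X = UV^T for the simple convex objective f(X) = ½‖X − M‖²_F; the target M, rank r, step size, and spectral initialization are all assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical illustration: minimize f(X) = 0.5 * ||X - M||_F^2 over
# rank-r matrices via the factorization X = U V^T, taking gradient steps
# on the factors (U, V) rather than on X itself.

rng = np.random.default_rng(0)
n, m, r = 10, 8, 2

# A rank-r target matrix M, built as a product of two Gaussian factors.
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))

# Spectral initialization: scale down the top-r SVD factors of M so the
# iterates start inside a basin of attraction rather than at the optimum.
Uf, s, Vt = np.linalg.svd(M, full_matrices=False)
U = Uf[:, :r] * np.sqrt(0.5 * s[:r])
V = Vt[:r, :].T * np.sqrt(0.5 * s[:r])

eta = 0.02  # step size (an assumption; must be small relative to ||M||)
for _ in range(500):
    R = U @ V.T - M  # gradient of f evaluated at X = U V^T
    U, V = U - eta * (R @ V), V - eta * (R.T @ U)

rel_err = np.linalg.norm(U @ V.T - M) / np.linalg.norm(M)
print(rel_err)
```

Note the trade-off the abstract describes: each iterate touches only (n + m)·r numbers instead of n·m, but the objective is now bilinear in (U, V) and hence non-convex in the factors.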
However, such a parameterization comes at a “cost”: the objective is now a bilinear function of the variables, and thus non-convex with respect to the factors. Such cases are known to be much harder to analyze. In this talk, we will discuss and address some of the issues raised by working directly with such a parameterization: How does the geometry of the problem change after this transformation? What can we say about global/local minima? Does this parameterization introduce spurious local minima? Does initialization over the factors play a key role, and how can we initialize in practice? Can we claim any convergence guarantees under mild assumptions on the objective f? And if so, at what rate?
Discovery Building, Orchard View Room