Research

My research interests include machine learning, big and high-dimensional data, and computational statistics. I am particularly drawn to the possibilities of big data and the questions it can answer.

I am very fortunate to have had Hua Zhou and Eric Chi as my co-advisers. For the first research project in my dissertation, I derived and compared different algorithms for estimating the constrained lasso (James et al., 2013), including a novel derivation of an efficient solution path algorithm. As its name suggests, the constrained lasso augments the standard lasso (Tibshirani, 1996) with linear equality and inequality constraints, providing an additional vehicle for incorporating prior knowledge into the solution (such as monotonicity or non-negativity of the lasso coefficients).
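For readers who prefer to see the problem written out, the constrained lasso can be sketched as the optimization problem below. The notation is an assumption made here for illustration and does not come from the text above: y is the response vector, X the design matrix, β the coefficient vector, ρ ≥ 0 the tuning parameter, and (A, b) and (C, d) encode the linear equality and inequality constraints.

```latex
\begin{aligned}
\underset{\beta}{\text{minimize}} \quad
  & \tfrac{1}{2}\,\lVert y - X\beta \rVert_2^2 + \rho\,\lVert \beta \rVert_1 \\
\text{subject to} \quad
  & A\beta = b, \qquad C\beta \le d .
\end{aligned}
```

Under this notation, non-negativity of the coefficients corresponds to choosing C = -I and d = 0, and dropping both sets of constraints recovers the standard lasso.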

My other research projects focused on a convex formulation of clustering that has been studied recently (Chen et al., 2015; Chi and Lange, 2015). Clustering is a fundamental unsupervised learning technique that aims to discover groups of objects in a dataset. Biclustering extends clustering to two dimensions, where both observations and variables are grouped concurrently, such as simultaneously clustering cancerous tumors and genes, or documents and words. Triclustering is then the natural extension of clustering to three dimensions, where the data are organized in a three-dimensional array, or tensor. We develop and study a convex formulation of triclustering that generates an entire solution path of triclustering results as a function of only one tuning parameter, which removes the need to specify the number of clusters a priori. Additionally, convex triclustering has several nice properties: it always returns the unique global minimum regardless of the initialization, and its solutions are stable with respect to small changes in the data.
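To make the role of the single tuning parameter concrete, here is the usual way the convex clustering problem is written for ordinary (one-dimensional) clustering in the cited literature. The symbols are assumptions introduced for illustration: x_i is the i-th observation, u_i its estimated cluster centroid, w_ij ≥ 0 are fixed weights, and γ ≥ 0 is the tuning parameter. The triclustering formulation is not reproduced here; roughly speaking, it applies fusion penalties of this kind along each of the three modes of the tensor.

```latex
\underset{u_1, \dots, u_n}{\text{minimize}} \quad
  \frac{1}{2} \sum_{i=1}^{n} \lVert x_i - u_i \rVert_2^2
  \;+\; \gamma \sum_{i < j} w_{ij}\, \lVert u_i - u_j \rVert_2
```

At γ = 0 each centroid equals its own observation, so every point forms its own cluster; as γ grows, the penalty pulls centroids together until they coincide, and observations whose centroids coincide are assigned to the same cluster. Sweeping γ therefore traces out the solution path without fixing the number of clusters in advance.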

Publications