
Matrix linear models for high-throughput chemical genetic screens

Jane W Liang, Robert J Nichols, Saunak Sen

We develop a flexible and computationally efficient approach for analyzing high-throughput chemical genetic screens. In such screens, a library of genetic mutants is phenotyped under a large number of stresses, and the goal is to detect interactions between genes and stresses. Typically, this is achieved by grouping the mutants and stresses into categories and performing modified t-tests for each combination. This approach has no natural extension when mutants or stresses have quantitative or overlapping annotations (e.g., when conditions have doses, or when a mutant falls into more than one category simultaneously). We develop a matrix linear model framework that models relationships between mutants and conditions in a simple yet flexible multivariate setting. It encodes both categorical and continuous relationships to improve the detection of associations. To handle large datasets, we develop a fast estimation approach that takes advantage of the structure of matrix linear models. We evaluate the method's performance in simulations and in an E. coli chemical genetic screen, comparing it with an existing univariate approach based on modified t-tests. We show that matrix linear models perform slightly better than the univariate approach when mutants and conditions are classified into non-overlapping categories, and substantially better when conditions can be ordered in dosage categories. Our approach is also much faster computationally and scales to larger datasets. It is an attractive alternative to current methods and provides a natural framework that extends to larger and more complex chemical genetic screens. A Julia implementation of matrix linear models and the code used for the analysis in this paper can be found at https://bitbucket.org/jwliang/mlm_packages and https://bitbucket.org/jwliang/mlm_gs_supplement, respectively.
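As an illustration only, the sketch below assumes a matrix linear model of the bilinear form Y = XBZ^T + E. Here Y is an n×m phenotype matrix, X is an n×p matrix of condition annotations, Z is an m×q matrix of mutant annotations, and B is a p×q matrix of interaction coefficients. The row/column orientation, the toy annotations, and the helper names `mlm_fit` and `mlm_fit_kron` are hypothetical. Ordinary least squares stands in for whatever estimator the authors' Julia package implements.

The sketch shows two things. First, a dose column and overlapping mutant categories can sit in the same design. Second, the matrix structure helps computationally: the closed form B̂ = (X^T X)^{-1} X^T Y Z (Z^T Z)^{-1} needs only p×p and q×q solves, whereas the equivalent vectorized regression vec(Y) = (Z ⊗ X) vec(B) + vec(E) builds an (nm)×(pq) design matrix.

```python
import numpy as np


def mlm_fit(Y, X, Z):
    """Least-squares B in Y = X B Z^T + E (assumed model form, OLS stand-in).

    Closed form: B_hat = (X^T X)^{-1} X^T Y Z (Z^T Z)^{-1}.
    Only the small p x p and q x q Gram matrices are solved against.
    """
    # Left factor: (X^T X)^{-1} X^T Y, a p x m matrix
    left = np.linalg.solve(X.T @ X, X.T @ Y)
    # Right factor: M (Z^T Z)^{-1} with M = left @ Z; since Z^T Z is symmetric,
    # this equals ((Z^T Z)^{-1} M^T)^T
    M = left @ Z
    return np.linalg.solve(Z.T @ Z, M.T).T


def mlm_fit_kron(Y, X, Z):
    """Same estimate via the vectorized model vec(Y) = (Z kron X) vec(B) + vec(E)."""
    D = np.kron(Z, X)                    # (n*m) x (p*q) design matrix
    y = Y.reshape(-1, order="F")         # column-major vec(Y)
    b, *_ = np.linalg.lstsq(D, y, rcond=None)
    return b.reshape(X.shape[1], Z.shape[1], order="F")


# Hypothetical condition annotations (rows): two drugs, each at doses 1, 2, 3.
# Columns: drug A indicator, drug B indicator, continuous dose.
X = np.array([
    [1, 0, 1],
    [1, 0, 2],
    [1, 0, 3],
    [0, 1, 1],
    [0, 1, 2],
    [0, 1, 3],
], dtype=float)

# Hypothetical mutant annotations (columns of Y): intercept plus two
# functional categories; mutant 1 belongs to both categories (overlap).
Z = np.array([
    [1, 1, 0],
    [1, 1, 1],
    [1, 0, 1],
    [1, 0, 0],
    [1, 1, 0],
], dtype=float)

rng = np.random.default_rng(0)
B_true = rng.normal(size=(X.shape[1], Z.shape[1]))
# Simulated 6 x 5 phenotype matrix with small Gaussian noise
Y = X @ B_true @ Z.T + 0.1 * rng.normal(size=(X.shape[0], Z.shape[0]))

B_fast = mlm_fit(Y, X, Z)
B_kron = mlm_fit_kron(Y, X, Z)
print(np.allclose(B_fast, B_kron))
```

With X and Z of full column rank, both functions return the same estimate up to floating-point error. Only `mlm_fit` avoids forming the Kronecker product, whose size grows with the product of the numbers of conditions and mutants. This is the kind of structural shortcut the abstract alludes to, not a reproduction of the published code.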