Event Detail

Event Type: Mathematical Biology Seminar
Tuesday, April 30, 2013 - 06:00
GBAD 103

Speaker Info

Oregon State University

RNA sequencing (RNA-Seq) has become the technology of choice for mapping and
quantifying transcriptomes and for studying gene expression. The negative binomial
(NB) distribution has been shown to be a useful model for frequencies of mapped
RNA-Seq reads. The NB model uses a dispersion parameter to capture the extra-Poisson
variation commonly observed in RNA-Seq read frequencies. An extension to NB
regression is needed to permit the modeling of gene expression as a function of
explanatory variables and to compare groups after accounting for other factors. A
considerable obstacle in the development of NB regression is the lack of accurate
small-sample inference for the NB regression coefficients. The exact test available
for two-group comparisons does not extend to the regression setting. Asymptotic
inferences, through the Wald test and the likelihood ratio test, are mathematically
justified only for large sample
sizes. Because of the labor associated with RNA-Seq experiments, sample sizes are
almost always small. There is an obvious concern that the large-sample tests may be
inappropriate for such small sample sizes. In this paper we address that issue by
showing that likelihood ratio tests for regression coefficients in NB regression
models, possibly with a higher-order asymptotic (HOA) adjustment, are nearly exact,
even for very small sample sizes. In particular, we demonstrate 1) that the
HOA-adjusted likelihood ratio test p-values are, for practical purposes,
indistinguishable from exact test p-values in situations where the exact test is
available and 2) via simulation, that the behavior of the test matches its nominal
specifications more generally. This work helps clarify the accuracy of the
unadjusted likelihood ratio test and the degree of improvement available with the
HOA adjustment. Furthermore, this important application to the analysis of biological
data will draw attention to HOA, a somewhat neglected yet extremely useful
development of modern statistical theory.
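
As a rough illustration of two ideas in the abstract, the sketch below shows how a dispersion parameter inflates the NB variance beyond the Poisson variance, and how an unadjusted likelihood ratio test for a two-group comparison could be computed. It is not the speaker's method: the function names are hypothetical, the dispersion `phi` is treated as known, library sizes are assumed equal, every group is assumed to have a positive mean count, and the p-value uses the large-sample chi-square approximation with one degree of freedom. The HOA adjustment itself is not implemented here.

```python
import math


def nb_variance(mu, phi):
    """NB variance under the mean/dispersion form: Var(Y) = mu + phi * mu**2.

    With phi = 0 this reduces to the Poisson variance mu.
    """
    return mu + phi * mu ** 2


def nb_logpmf(y, mu, phi):
    """Log-probability of count y for an NB with mean mu > 0, dispersion phi > 0."""
    r = 1.0 / phi  # NB "size" parameter
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu))
            + y * math.log(mu / (r + mu)))


def nb_loglik(counts, mu, phi):
    """Log-likelihood of independent counts sharing one mean."""
    return sum(nb_logpmf(y, mu, phi) for y in counts)


def lrt_two_group(group_a, group_b, phi):
    """Unadjusted likelihood ratio test of equal means in two groups.

    Assumption: phi is known, so each mean's MLE is simply the sample mean.
    Returns (statistic, approximate p-value).
    """
    pooled = list(group_a) + list(group_b)
    mu_a = sum(group_a) / len(group_a)
    mu_b = sum(group_b) / len(group_b)
    mu_0 = sum(pooled) / len(pooled)

    ll_alt = nb_loglik(group_a, mu_a, phi) + nb_loglik(group_b, mu_b, phi)
    ll_null = nb_loglik(pooled, mu_0, phi)
    stat = max(2.0 * (ll_alt - ll_null), 0.0)  # guard against rounding below 0

    # Upper tail of a chi-square with 1 degree of freedom: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # Made-up counts for a single gene, three samples per group.
    stat, p = lrt_two_group([12, 18, 15], [40, 55, 31], phi=0.1)
    print(f"LRT statistic = {stat:.3f}, approximate p-value = {p:.4f}")
    print("Poisson variance at mu=100:", nb_variance(100, 0.0))
    print("NB variance at mu=100, phi=0.1:", nb_variance(100, 0.1))
```

With only three samples per group, the chi-square approximation used above is exactly the kind of large-sample inference whose small-sample accuracy the talk examines.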