Event Type:

Mathematical Biology Seminar

Date/Time:

Tuesday, April 30, 2013 - 06:00

Location:

GBAD 103

Event Link:

Guest Speaker:

Institution:

Oregon State University

Abstract:

RNA sequencing (RNA-Seq) has become the technology of choice for mapping and

quantifying transcriptomes and for studying gene expression. The negative binomial

(NB) distribution has been shown to be a useful model for frequencies of mapped

RNA-Seq reads. The NB model uses a dispersion parameter to capture the extra-Poisson

variation commonly observed in RNA-Seq read frequencies. An extension to NB

regression is needed to permit the modeling of gene expression as a function of

explanatory variables and to compare groups after accounting for other factors. A

considerable obstacle in the development of NB regression is the lack of accurate

small-sample inference for the NB regression coefficients. The exact test available

for two-group comparisons does not extend to regression. Asymptotic inferences, through the Wald test

and the likelihood ratio test, are mathematically justified only for large sample

sizes. Because of the labor associated with RNA-Seq experiments, sample sizes are

almost always small. There is an obvious concern that the large-sample tests may be

inappropriate for such small sample sizes. In this paper we address that issue by

showing that likelihood ratio tests for regression coefficients in NB regression

models, possibly with a higher-order asymptotic (HOA) adjustment, are nearly exact,

even for very small sample sizes. In particular, we demonstrate that 1) the

HOA-adjusted likelihood ratio test p-values are, for practical purposes,

indistinguishable from exact test p-values in situations where the exact test is

available and 2) via simulation, that the behavior of the test matches the nominal

specifications more generally. This work helps clarify the accuracy of the

unadjusted likelihood ratio test and the degree of improvement available with the

HOA adjustment. Furthermore, this important application to analysis of biological

data will draw attention to HOA, a somewhat neglected yet extremely useful

development of modern statistical theory.
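
Illustration:

To make the quantities in the abstract concrete, the following is a minimal sketch of an unadjusted likelihood ratio test for a two-group comparison under an NB model. It is not the speaker's method and includes no HOA adjustment. It assumes the common parameterization Var(Y) = mu + phi * mu^2, a dispersion phi that is known in advance, and equal library sizes, so that each group's maximum likelihood mean is simply its sample mean. The function names and the example counts are hypothetical.

```python
import math

def nb_logpmf(y, mu, phi):
    """Log-probability of count y under an NB model with mean mu and
    dispersion phi, so that Var(Y) = mu + phi * mu**2 (assumed parameterization)."""
    r = 1.0 / phi  # NB "size" parameter
    out = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
           + r * math.log(r / (r + mu)))
    if y > 0:  # skip the 0 * log(0) term when both y and mu are zero
        out += y * math.log(mu / (r + mu))
    return out

def nb_loglik(counts, mu, phi):
    """Log-likelihood of independent counts sharing one mean."""
    return sum(nb_logpmf(y, mu, phi) for y in counts)

def lrt_two_group(group_a, group_b, phi):
    """Unadjusted likelihood ratio test of equal means in two groups.

    Assumes phi is known and library sizes are equal; with phi fixed,
    the maximum likelihood estimate of each mean is the sample mean.
    Returns the test statistic and its large-sample chi-square(1) p-value.
    """
    def mean(xs):
        return sum(xs) / len(xs)

    pooled = list(group_a) + list(group_b)
    ll_null = nb_loglik(pooled, mean(pooled), phi)
    ll_alt = (nb_loglik(group_a, mean(group_a), phi)
              + nb_loglik(group_b, mean(group_b), phi))
    stat = max(0.0, 2.0 * (ll_alt - ll_null))
    # Survival function of chi-square with 1 df: P(Z**2 > stat)
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

if __name__ == "__main__":
    # Hypothetical read counts for one gene, three replicates per group
    stat, p = lrt_two_group([5, 7, 6], [50, 60, 55], phi=0.1)
    print(f"LRT statistic = {stat:.3f}, chi-square(1) p-value = {p:.3g}")
```

The p-value here relies on the large-sample chi-square approximation, which is exactly the approximation whose small-sample accuracy the talk examines.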