Although modern sequencing technologies such as ribonucleic acid sequencing and next-generation sequencing have been developed, microarrays remain a widely used high-throughput technology for gathering large amounts of genomic data [1, 2]. Because individual microarray studies often have small sample sizes, they are commonly combined using meta-analytic techniques to increase the statistical power and generalizability of the results [1, 3].
Common meta-analysis techniques applied in gene expression studies combine p-values, rank values, or effect sizes. The p-value based methods include Fisher’s method, Stouffer’s method, the minimum p-value method, the maximum p-value method, and the adaptively weighted Fisher’s method. The rank-based methods include the rth ordered p-value method, the naïve sum of ranks, the naïve product of ranks, and the rank product and rank sum methods. The effect-size based methods include fixed-effects (FE) and random-effects (RE) models.
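As an illustration of the p-value based approach, the following minimal Python sketch combines the per-study p-values of a single gene using Fisher’s and Stouffer’s methods. The function names are hypothetical, the studies are assumed to be independent with one-sided p-values strictly between 0 and 1, and the unweighted form of Stouffer’s method is used; it is a sketch under these assumptions rather than the procedure of any particular package.

```python
import math
from statistics import NormalDist


def fisher_combined_p(p_values):
    """Fisher's method: X = -2 * sum(ln p_i) is chi-square with 2k df under H0."""
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # X / 2
    # Closed-form chi-square survival function for even degrees of freedom (2k):
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


def stouffer_combined_p(p_values):
    """Unweighted Stouffer's method: Z = sum(z_i) / sqrt(k), z_i = Phi^{-1}(1 - p_i)."""
    std_normal = NormalDist()
    z_scores = [std_normal.inv_cdf(1.0 - p) for p in p_values]
    z = sum(z_scores) / math.sqrt(len(z_scores))
    return 1.0 - std_normal.cdf(z)


# Hypothetical p-values for one gene across three studies.
example = [0.01, 0.04, 0.20]
print(fisher_combined_p(example), stouffer_combined_p(example))
```

Both functions return only a combined p-value; neither yields a summary effect size or a measure of between-study heterogeneity.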
The appropriateness of a meta-analysis technique for gene expression data depends on the type of hypothesis being tested: HSA, HSB, or HSC, as described in [4–6]. The maximum p-value and naïve sum of ranks methods were appropriate for the HSA hypothesis, which targets genes that are differentially expressed (DE) across all studies. The rth ordered p-value method and two-step DerSimonian and Laird estimated RE models were appropriate for the HSB hypothesis, which targets DE genes in one or more studies. DerSimonian and Laird (DSL) and empirical Bayes estimated RE models, including our two-step estimated RE model using the DSL and random coefficient of determination (R²) methods, were appropriate for the HSC hypothesis, which targets DE genes in a majority of the combined studies [4–6].
Some of these methods may be limited in their application. The p-value based methods are limited in reporting summary effects and addressing study heterogeneity [3, 7–9]. The rank-based methods are robust to outliers and can be applied without assuming a known distribution [8, 10]; however, their results depend on the other genes included in the microarrays. The FE model assumes that the total variation arises from a true effect size and measurement error; however, the effect may vary across studies in real-world applications. In contrast, although the RE model can address study-specific effects and accounts for both within-study and between-study variation, the between-study variation, that is, the heterogeneity in effect sizes, is unknown. Many frequentist methods have been developed to estimate the between-study variation. More details can be found in [6, 9, 11, 12].
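To make the distinction between the FE and RE models concrete, the sketch below computes an inverse-variance FE summary effect and an RE summary effect in which the between-study variance is estimated by the DerSimonian and Laird (DSL) method of moments. The function names and the example numbers are hypothetical, the per-study effect sizes and their within-study variances are assumed to be given for a single gene, and this is a generic textbook form rather than the exact procedure used in this study.

```python
def fixed_effect(effects, variances):
    """Inverse-variance weighted mean and its variance (FE model)."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    mean = sum(w * y for w, y in zip(weights, effects)) / total
    return mean, 1.0 / total


def dsl_tau2(effects, variances):
    """DerSimonian-Laird moment estimate of the between-study variance."""
    k = len(effects)
    if k < 2:
        return 0.0  # heterogeneity cannot be estimated from a single study
    weights = [1.0 / v for v in variances]
    fe_mean, _ = fixed_effect(effects, variances)
    # Cochran's Q statistic: weighted squared deviations from the FE mean.
    q = sum(w * (y - fe_mean) ** 2 for w, y in zip(weights, effects))
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    return max(0.0, (q - (k - 1)) / c)  # truncated at zero


def random_effect(effects, variances):
    """RE summary: FE formulas applied with variances inflated by tau^2."""
    tau2 = dsl_tau2(effects, variances)
    mean, var = fixed_effect(effects, [v + tau2 for v in variances])
    return mean, var, tau2


# Hypothetical effect sizes and within-study variances for one gene.
mean, var, tau2 = random_effect([0.0, 1.0], [0.1, 0.1])
print(mean, var, tau2)
```

When the estimated between-study variance is zero, the RE summary reduces to the FE summary; otherwise the RE weights are more nearly equal across studies and the variance of the summary effect is larger.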
RE models are commonly applied in gene expression meta-analysis. Classical RE models assume that studies are independently and identically sampled from a population of studies. However, an infinite population of studies may not exist, and studies may be designed based on the results of previous studies, thus potentially violating the independence assumption. Bayesian random-effects (BRE) models have been used to allow for uncertainty in the parameters. This uncertainty is expressed through a prior distribution, and the evidence provided by the data is summarized by the likelihood of the model. Multiplying the prior distribution by the likelihood function yields, up to a normalizing constant, the posterior distribution of the parameters [13, 14].
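The relationship between prior, likelihood, and posterior described above can be written out for a generic normal-normal hierarchical model. The notation here is an assumption introduced for illustration and is not taken from the source: $y_i$ is the observed effect size in study $i$ with known within-study variance $\sigma_i^2$, $\theta_i$ is the study-specific true effect, $\mu$ is the overall effect, $\tau^2$ is the between-study variance, and $p(\mu, \tau^2)$ is an unspecified prior.

```latex
\begin{align}
  y_i \mid \theta_i &\sim N(\theta_i, \sigma_i^2), \qquad i = 1, \dots, k, \\
  \theta_i \mid \mu, \tau^2 &\sim N(\mu, \tau^2), \\
  p(\theta_1, \dots, \theta_k, \mu, \tau^2 \mid y_1, \dots, y_k)
    &\propto \left[ \prod_{i=1}^{k} N(y_i \mid \theta_i, \sigma_i^2)\,
       N(\theta_i \mid \mu, \tau^2) \right] p(\mu, \tau^2).
\end{align}
```

In this form the between-study variance $\tau^2$ is treated as a random quantity with its own posterior distribution rather than as a fixed value replaced by a point estimate.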
Sample quality has a substantial influence on the results of gene expression studies [15, 16]. The degree of heterogeneity may differ because of inconsistencies in sample quality. Low heterogeneity can be found in meta-analyses containing good quality samples, while high heterogeneity arises in meta-analyses containing poor quality samples. In our recent study, we evaluated the relationships between DE and heterogeneous genes in meta-analyses of Alzheimer’s gene expression data. We detected some overlapping DE and heterogeneous genes in meta-analyses containing borderline quality samples, while no heterogeneous genes were detected in meta-analyses containing good quality samples. Thus, data obtained from borderline (poor) quality samples can increase study heterogeneity and reduce the efficiency of meta-analyses in detecting DE genes [17, 18].
In this study, we implemented a meta-analytic approach that includes sample-quality weights to take study heterogeneity into account in RE and BRE models. The gene expression data would therefore consist of up-weighted good quality samples and down-weighted borderline quality samples. In the Methods section, we first review quality assessments of microarray samples, sample-quality weights, RE models, BRE models, weighted RE models, and weighted BRE models. We then describe our simulation studies and application data. Our results are then presented, followed by the discussion and conclusions.