A Practitioner's Guide to Multiple Testing Error Rates

J. D. Rosenblatt
Last revised 24 Jun 2013

abstract

It is quite common in modern research for a researcher to test many hypotheses. The statistical (frequentist) hypothesis testing framework does not scale with the number of hypotheses, in the sense that naively performing many hypothesis tests will probably yield many false findings. Indeed, statistical "significance" is evidence for the presence of a signal within the noise expected in a single test, not in a multitude. In order to protect against an uncontrolled number of erroneous findings, a researcher has to consider the type of errors to be avoided and select a procedure adequate for that particular error type and data structure.
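The claim that naive testing of many hypotheses yields false findings can be illustrated with a small simulation. The sketch below is not from the paper; the numbers (10,000 hypotheses, a 0.05 per-test level) are assumptions chosen for illustration. Under a true null hypothesis a p-value is uniform on [0, 1], so testing each hypothesis at level alpha flags roughly alpha * m of them by chance alone.

```python
# Hypothetical illustration: m independent, truly null hypotheses,
# each tested at the conventional per-test level alpha = 0.05.
import random

random.seed(0)
m = 10_000      # number of hypotheses, all truly null
alpha = 0.05    # per-test significance level

# Under the null, each p-value is uniform on [0, 1].
p_values = [random.random() for _ in range(m)]

# Count how many tests come out "significant" purely by chance.
false_findings = sum(p < alpha for p in p_values)

print(false_findings)  # close to alpha * m = 500 false findings
```

Even though no signal exists anywhere in these data, on the order of 500 hypotheses are declared "significant" — which is precisely why a multiplicity-aware error rate and procedure must be chosen.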
A quick search of the tag [multiple-comparisons] on the statistics Questions & Answers web site Cross Validated (this http URL) demonstrates the amount of confusion this task can actually cause. This was also a point made at the 2009 Multiple Comparisons conference in Tokyo. In an attempt to offer guidance, we review possible error types for multiple testing and demonstrate them with some practical examples, which clarify the formalism. Finally, we include some notes on the software implementations of the methods discussed.
The emphasis of this manuscript is on the error rates, not on the procedures themselves, although we do name several procedures where appropriate. P-value adjustment will not be discussed, as it is procedure specific: it is the choice of a procedure that defines the p-value adjustment, not the error rate itself. Simultaneous confidence intervals will also not be discussed.

 

Deciding whether follow-up studies have replicated findings in a preliminary large-scale “omics” study

R. Heller, M. Bogomolov, Y. Benjamini
Submitted October 2, 2013

abstract

In "omics" research primary studies are often followed by follow-up studies on promising findings. Informally, reporting the p-values of promising findings in the primary and the follow-up studies gives a sense of the replicability of the findings. We offer a formal statistical approach to measure the evidence of replicability of findings from the primary study to the follow-up study, that exploits the fact that most hypotheses examined in the primary study are null. For each of the findings in the follow-up study we compute the lowest false discovery rate at which the finding is called replicated, which we term the r-value.
 

Assessing replicability of single-laboratory discoveries in phenotyping experiments

Supporting software and data (zip). 

The user is kindly asked to read the directions and Warnings files first, and to follow the instructions carefully.

N. Kafkafi, T. Sarig, I. Jaljule, I. Golani, H. Würbel, Y. Benjamini
Submitted August 28, 2014