Background: Measuring the behavior of model animals such as mice, rats, and flies is an important tool in behavioral phenotyping, brain research, pre-clinical psychopharmacological research, and drug screening, the last of which costs over $40 billion per year worldwide (see our recent paper in Nature Methods for more details).

The problem: Results of these studies are almost always established in a single laboratory. When the studies are repeated in other laboratories, the results often fail to replicate. This lack of replicability across laboratories was largely ignored until it was brought to light by Crabbe, Wahlsten and Dudek in their influential 1999 report. Their experiment compared several inbred strains of mice in standard assays across three different laboratories. In spite of the strict standardization of the experimental protocol, laboratory effects were evident. Even worse, the interaction between laboratory and genotype did not disappear, meaning that differences between strains varied from one laboratory to another.

Standard solutions: The traditional remedy advocated for the problem of replicability across laboratories is strict standardization of test protocols, housing procedures, and laboratory environment. Such standardization, however, would require a vast coordinated community effort, because each laboratory currently has its own housing conditions, protocols, hardware, and technical limitations. Moreover, in a multi-lab study in 2009, Richter demonstrated that standardization might be the cause of poor replicability rather than the cure. Richter deliberately introduced “within-laboratory heterogenization” so as to better represent the range of environmental variation between laboratories.

Our approach recognizes that mere standardization is not a sufficient solution. Clearly, the replicability of a discovery across laboratories should be demonstrated, not assumed. Taking this requirement at face value means conducting each experiment in at least two labs. However, we offer a practical approach that would enable replicable discoveries to be established even in single-lab experiments.

Our vision is a community effort to create a database from which one can directly obtain estimates of the interaction variability relevant to the experiment being conducted. Phenotype databases such as EuroPhenome and the Mouse Phenome Database already exist and rely on the willingness of researchers to share their data after publication. These or similar databases can be mined for the relevant variability terms. In practice, this will enable users of behavioral measures to ensure the replicability of their discoveries according to the multiple-lab criterion, even when the discoveries are made in single-lab studies.
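
As a rough illustration of how a database-derived variability term might be used, the sketch below inflates the denominator of an ordinary two-sample t statistic with an estimate of the genotype-by-laboratory interaction variance. This is a minimal sketch under stated assumptions, not a definitive implementation: the function name gxl_adjusted_t, the parameter interaction_var, and the choice to add twice the interaction variance (one independent interaction effect per genotype) are illustrative assumptions, and the numbers are made up for the example.

```python
import math
from statistics import mean, variance


def gxl_adjusted_t(group_a, group_b, interaction_var):
    """Two-sample t statistic with an added interaction-variance term.

    group_a, group_b : measurements of one behavioral measure for two
                       genotypes, collected in a single laboratory.
    interaction_var  : hypothetical estimate of the genotype-by-laboratory
                       interaction variance, e.g. mined from a phenotype
                       database (assumed to be supplied by the caller).
    """
    n_a, n_b = len(group_a), len(group_b)

    # Pooled within-group variance, as in the ordinary two-sample t-test.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    within_term = pooled_var * (1 / n_a + 1 / n_b)

    # Assumption: each genotype mean carries its own independent
    # interaction effect, so the difference picks up twice the variance.
    interaction_term = 2 * interaction_var

    diff = mean(group_a) - mean(group_b)
    return diff / math.sqrt(within_term + interaction_term)


# Made-up data for two genotypes measured in one laboratory.
strain_a = [10, 12, 14]
strain_b = [4, 6, 8]

naive_t = gxl_adjusted_t(strain_a, strain_b, interaction_var=0.0)
adjusted_t = gxl_adjusted_t(strain_a, strain_b, interaction_var=2 / 3)
print(naive_t, adjusted_t)
```

With the interaction variance set to zero the function reduces to the usual single-lab statistic; any positive estimate makes the statistic smaller, so a difference has to be larger relative to between-laboratory variability before it counts as a discovery. Turning the statistic into a p-value would also require a choice of degrees of freedom, which this sketch leaves out.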

If you share our vision and would like to contribute ‘used’ data, please let us know.