A major challenge in the post-genomic era is deciphering the genetic basis of behavior. Central to this effort are behavioral phenotyping assays in model animals such as the mouse, the zebrafish, the fruit fly, and C. elegans. The poor replicability of phenotyping results was first pointed out in mouse studies using standard assays such as the “open field” and the “plus maze”, but it probably afflicts phenotyping research in other model animals as well. A related field that probably suffers from the same problem is preclinical psychopharmacological research and drug screening in rats and mice, which employs the same standard behavioral assays. This might be one cause of the declining number of newly approved psychiatric drugs in recent years.

These assays are used to measure the behavior of mouse and rat genotypes, such as inbred strains, crosses, and knockouts, in order to associate behavioral differences with particular gene loci. The resulting genotype differences are almost always established in single-laboratory experiments, and the question of whether they replicate in other laboratories was largely ignored until Crabbe, Wahlsten, and Dudek (CWD) brought it to light in their influential 1999 report in Science.

CWD ran the same experiment concurrently in three laboratories, comparing 8 genotypes on 7 standard behavioral measures (“endpoints”) under an identical protocol. While they found large genotype differences, these differences frequently did not remain consistent across laboratories: genotype A might score higher than genotype B in two laboratories but lower in the third. Such a genotype × laboratory interaction (G×L) arises when genotypes react differently from one another, for no identifiable reason, to the peculiarities of a specific laboratory. CWD thus concluded: “experiments characterizing mutants may yield results that are idiosyncratic to a particular laboratory”. This lack of cross-laboratory replicability might be interpreted as a serious shortcoming of behavior genetics at large.
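To make the notion of G×L concrete, the following minimal Python sketch uses hypothetical cell means (not CWD data) for two genotypes in three laboratories, reproducing the kind of reversal described above. It computes the classical two-way interaction residuals: cell mean minus genotype mean minus laboratory mean plus grand mean. Nonzero residuals are the part of the table that neither a pure genotype effect nor a pure laboratory effect can explain. The function name `interaction_residuals` and all numbers are illustrative assumptions.

```python
def interaction_residuals(means):
    """Two-way decomposition of a genotype x laboratory table of cell means.

    means[g][l] is the (hypothetical) mean endpoint value of genotype g in
    laboratory l. Returns the G×L residual of each cell:
    cell mean - genotype mean - laboratory mean + grand mean.
    """
    n_g = len(means)     # number of genotypes (rows)
    n_l = len(means[0])  # number of laboratories (columns)
    row = [sum(r) / n_l for r in means]  # genotype means
    col = [sum(means[g][l] for g in range(n_g)) / n_g for l in range(n_l)]  # lab means
    grand = sum(row) / n_g  # grand mean
    return [[means[g][l] - row[g] - col[l] + grand for l in range(n_l)]
            for g in range(n_g)]


# Hypothetical example: genotype A beats B in labs 1 and 2 but not in lab 3.
means = [
    [10.0, 12.0, 8.0],  # genotype A in labs 1-3
    [7.0, 9.0, 11.0],   # genotype B in labs 1-3
]
print([a - b for a, b in zip(means[0], means[1])])  # [3.0, 3.0, -3.0]
print(interaction_residuals(means))  # [[1.0, 1.0, -2.0], [-1.0, -1.0, 2.0]]
```

If the genotype difference were identical in every laboratory, all residuals would be zero; here the reversal in lab 3 shows up as the largest residuals.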

The traditional remedy advocated for the G×L problem is strict standardization of test protocols, housing procedures, and the laboratory environment. Such standardization, however, would require a vast coordinated community effort, because each laboratory currently has its own housing conditions, protocols, and hardware, and works under its own technical constraints. Moreover, in a multi-lab study in 2009, Richter, Garner, and Würbel (RGW) demonstrated that standardization might be the cause, rather than the cure, of poor replicability. RGW instead deliberately introduced “within-laboratory heterogenization”, so as to better represent the range of environmental variation between laboratories.

Our approach to the problem, proposed and demonstrated in a multi-lab study published in PNAS in 2005, also recognizes that standardization alone is not a likely solution. Instead, we propose a community effort to directly estimate G×L across multiple laboratories, capitalizing on phenotype databases such as Europhenome (http://www.europhenome.org/) and the Mouse Phenome Database (http://phenome.jax.org/). The main practical advantage of this approach is that it enables phenotypers to ensure the replicability of their discoveries even when these are made in single-lab studies, by judging a genotype difference against the G×L variability estimated from the multi-lab data.
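One way such an estimate could be used in a single-lab study is sketched below in Python, under the assumption of a random-lab model in which each genotype's mean in a given laboratory is shifted by an independent G×L term with variance σ²(G×L). The difference between two genotypes measured in one laboratory then carries an extra 2σ²(G×L) of variance, which can be added to the usual within-lab error of the difference. The function `gxl_adjusted_t` and all numbers are hypothetical; this illustrates the idea and is not necessarily the exact procedure of the 2005 study. Converting the statistic to a p-value would also require choosing degrees of freedom, which the sketch leaves out.

```python
import math


def gxl_adjusted_t(mean_a, mean_b, sd_pooled, n_a, n_b, var_gxl):
    """Hypothetical t-like statistic for a single-lab genotype difference.

    Assumes a random-lab model: each genotype's lab mean is shifted by an
    independent G×L term with variance var_gxl (estimated elsewhere, e.g.
    from multi-lab data), so the A - B difference gains 2 * var_gxl of
    variance on top of the ordinary within-lab sampling variance.
    """
    within_var = sd_pooled ** 2 * (1 / n_a + 1 / n_b)  # usual variance of the difference
    se_adjusted = math.sqrt(within_var + 2 * var_gxl)   # inflated by G×L variability
    return (mean_a - mean_b) / se_adjusted


# Hypothetical single-lab result: 10 animals per genotype, pooled SD of 2.
print(round(gxl_adjusted_t(10.0, 7.0, 2.0, 10, 10, 0.0), 2))  # 3.35 (no G×L correction)
print(round(gxl_adjusted_t(10.0, 7.0, 2.0, 10, 10, 2.0), 2))  # 1.37 (assumed var_gxl = 2)
```

With these hypothetical numbers, a difference that looks clearly significant against the within-lab error alone becomes much weaker once the assumed G×L variability is taken into account; with var_gxl set to zero the function reduces to the ordinary two-sample t statistic.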


Crabbe JC, Wahlsten D, Dudek BC (1999) Genetics of mouse behavior: interactions with laboratory environment. Science 284:1670–1672.

Kafkafi N, Benjamini Y, Sakov A, Elmer GI, Golani I (2005) Genotype-environment interactions in mouse behavior: A way out of the problem. Proc Natl Acad Sci USA 102:4619–4624.

Richter SH, Garner JP, Würbel H (2009) Environmental standardization: cure or cause of poor reproducibility in animal experiments? Nat Methods 6:257–261.