Supplemental website for:

Quantitative Models of the Mechanisms that Control Genome-wide Patterns
of Transcription Factor Binding During Early Drosophila Development

Tommy Kaplan1, Xiao-Yong Li2, Peter J. Sabo3, Sean Thomas3, John A. Stamatoyannopoulos3, Mark D. Biggin4,*, and Michael B. Eisen1,2,4,*

1. Department of Molecular and Cell Biology, California Institute of Quantitative Biosciences, University of California, Berkeley, California, USA
2. Howard Hughes Medical Institute, University of California, Berkeley, California, USA
3. Department of Genome Sciences, University of Washington, Seattle, Washington, USA
4. Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
* To whom correspondence should be addressed.

Abstract

There is increasing evidence that the animal transcription factors that drive complex spatial and temporal patterns of gene expression during development bind in a quantitative continuum to a wide array of genomic regions. While we now have tools to characterize the DNA affinities of these proteins and to precisely measure their genome-wide distribution in vivo, our understanding of the forces that determine where, when and the extent to which they bind remains primitive.

Here we use a thermodynamic model of transcription factor binding to evaluate the contribution of different biophysical forces to the binding of five regulators of anterior-posterior patterning in the early Drosophila melanogaster embryo. Predictions from the simplest form of the model, based on DNA sequence and in vitro protein-DNA binding affinities, achieve a correlation of ~0.4 with experimental measurements of in vivo binding to a set of test loci. Incorporating cooperativity and competition between the five factors and the predicted locations of nucleosomes, and accounting for the spatial patterning of each factor by modeling binding in every nucleus independently, had little effect on prediction accuracy. A major source of error was the prediction of binding events that do not occur in vivo, which we hypothesized reflected reduced accessibility of chromatin at these regions. To test this idea, we incorporated direct experimental measurements of genome-wide DNA accessibility into our model, effectively restricting predicted binding to regions of open chromatin. This improved our predictions twofold, to a correlation of 0.6-0.9 for various factors across known target genes, enabling us to predict the landscape of in vivo binding with significant precision.
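
As a rough illustration of the idea described above (sequence-based binding weights restricted by chromatin accessibility), the following Python sketch computes a per-site occupancy for a single factor. It is not the model used in the paper: it treats every site independently, ignores cooperativity, competition and nucleosomes, and every name and number in it (PWM, site_scores, occupancy, the concentration and accessibility parameters, the score values) is a hypothetical placeholder chosen for illustration.

```python
import math

# Hypothetical 4-bp position weight matrix of log-odds scores (natural log).
# One dict per motif position. The values are illustrative, not measured data.
PWM = [
    {"A": 1.2, "C": -1.0, "G": -1.0, "T": -0.5},
    {"A": -1.0, "C": 1.5, "G": -1.2, "T": -0.8},
    {"A": -0.7, "C": -1.1, "G": 1.4, "T": -1.0},
    {"A": -0.9, "C": -1.0, "G": -0.6, "T": 1.3},
]


def site_scores(seq, pwm):
    """Sum the PWM scores for every window of motif length along seq."""
    k = len(pwm)
    return [
        sum(pwm[j][seq[i + j]] for j in range(k))
        for i in range(len(seq) - k + 1)
    ]


def occupancy(seq, pwm, concentration=1.0, accessibility=None):
    """Independent-site occupancy w / (1 + w) for each candidate site.

    The statistical weight is w = concentration * accessibility * exp(score).
    accessibility is an assumed per-site value in [0, 1]; a value of 0 stands
    for closed chromatin and removes the site from the prediction.
    """
    scores = site_scores(seq, pwm)
    if accessibility is None:
        accessibility = [1.0] * len(scores)  # treat all sites as fully open
    result = []
    for score, access in zip(scores, accessibility):
        weight = concentration * access * math.exp(score)
        result.append(weight / (1.0 + weight))
    return result


if __name__ == "__main__":
    seq = "TTACGTTT"  # contains the best-scoring site "ACGT" at offset 2
    print(occupancy(seq, PWM))
    # Same sequence, but with every site marked as closed chromatin.
    print(occupancy(seq, PWM, accessibility=[0.0] * 5))
```

In this simplified form, a strong sequence match in closed chromatin receives zero predicted occupancy, which mirrors the effect of restricting predicted binding to regions of open chromatin.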

Finally, we used our model to quantify the roles of DNA sequence, DNA accessibility, and the inferred binding competition and cooperativity between the factors. Our results show that in regions of open chromatin, binding is predicted almost exclusively by the sequence specificity of individual factors, with only a minimal role for interactions among the proteins. We suggest that a combination of experimentally determined chromatin accessibility data and a simple computational model of transcription factor binding may be used to predict the binding landscape of any animal transcription factor.

Data
