In Vivo DNA Binding

Summary

We are using formaldehyde crosslinking ChIP/chip and a whole-genome tiling array from Affymetrix to identify regions bound by transcription factors in Drosophila embryos. Our current data release contains information on 6 of the 37 principal regulators of the early embryo network, as well as Zeste and RNA polymerase II (see Browse Data).

These data are produced as follows. One-hour collections of wild-type embryos are aged to the appropriate developmental stage and then crosslinked. The chromatin is purified by CsCl buoyant density centrifugation and sonicated to an average length of around 600 bp. DNA fragments bound by specific factors are then immunoprecipitated using affinity-purified antibodies. Where possible, separate immunoprecipitations are performed for each factor using distinct antibodies that recognize non-overlapping epitopes of the protein. Control immunoprecipitations of crosslinked chromatin in the presence of normal IgG are also performed. Duplicate immunoprecipitations are performed for each anti-factor antibody and IgG control. Each immunoprecipitated chromatin sample, as well as duplicate input genomic DNA samples, is then separately amplified, labeled, and hybridized to a tiling array. The arrays used contain over 3 million 5 µm features representing 25 bp sequences spaced on average 35 bp apart across the unique sequences of the D. melanogaster genome. Our protocols are available on the Protocols page, and our methods are further described in Li et al. 2008.
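As a rough back-of-the-envelope illustration of what the array design implies, the short Python sketch below works out the approximate genomic span tiled by the array and the number of probes a typical sonicated fragment overlaps. It uses only the figures quoted above; the variable names are illustrative, and because the array has "over" 3 million features the span is a lower bound.

```python
# Figures taken from the description above; names are illustrative only.
N_FEATURES = 3_000_000      # "over 3 million" features, so a lower bound
PROBE_LENGTH_BP = 25        # length of the sequence each feature represents
PROBE_SPACING_BP = 35       # average start-to-start spacing between probes
FRAGMENT_LENGTH_BP = 600    # average sonicated chromatin fragment length

# Approximate amount of unique sequence tiled by the array.
approx_span_bp = N_FEATURES * PROBE_SPACING_BP

# Average untiled gap between the end of one probe and the start of the next.
gap_between_probes_bp = PROBE_SPACING_BP - PROBE_LENGTH_BP

# Roughly how many probes a single average-length fragment overlaps.
probes_per_fragment = FRAGMENT_LENGTH_BP / PROBE_SPACING_BP

print(f"approximate tiled span: {approx_span_bp / 1e6:.0f} Mb")
print(f"average gap between probes: {gap_between_probes_bp} bp")
print(f"probes per average fragment: {probes_per_fragment:.1f}")
```

Because each average-length fragment spans many probes, a genuine binding event should raise the signal on a run of neighboring probes, not on a single isolated probe.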

The array data are first analyzed using TiMAT, a comprehensive package of command-line Java applications for mapping, normalizing, testing, filtering, printing, plotting, comparing, and annotating results. TiMAT identifies regions of the genome that are significantly enriched in the immunoprecipitation (called intervals or bound regions) and provides false discovery rate (FDR) estimates for each region using two methods: the symmetric null test and the experimental IgG IP test. TiMAT also identifies the locations of peaks of local maximum array hybridization; the most intense peak in each interval is designated the primary peak. TiMAT is freely available as open source software.
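The ideas in this paragraph can be sketched in a few lines of Python. This is not TiMAT's code or its actual algorithm. It is a minimal illustration that assumes each probe already has a normalized enrichment score (for example, a log ratio of IP to input), that null scores are symmetric about zero, and that probes are sorted by position on one chromosome. The function names, the `threshold` and `max_gap` parameters, and the example numbers are all hypothetical.

```python
from bisect import bisect_left, bisect_right


def symmetric_null_fdr(scores, threshold):
    """Estimate the FDR at `threshold`, assuming null scores are symmetric about 0.

    Under that assumption, the count of scores at or below -threshold stands
    in for the number of false positives expected at or above +threshold.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    ordered = sorted(scores)
    n_positive = len(ordered) - bisect_left(ordered, threshold)  # scores >= t
    n_negative = bisect_right(ordered, -threshold)               # scores <= -t
    if n_positive == 0:
        return 0.0
    return min(1.0, n_negative / n_positive)


def call_intervals(positions, scores, threshold, max_gap):
    """Merge probes scoring >= threshold into intervals.

    Passing probes no more than `max_gap` bp apart are joined. Returns a list
    of (start, end, primary_peak_position, primary_peak_score) tuples, where
    the primary peak is the highest-scoring probe in the interval.
    """
    intervals = []
    current = None  # [start, end, peak_position, peak_score]
    for pos, score in zip(positions, scores):
        if score < threshold:
            continue
        if current is not None and pos - current[1] <= max_gap:
            current[1] = pos
            if score > current[3]:
                current[2], current[3] = pos, score
        else:
            if current is not None:
                intervals.append(tuple(current))
            current = [pos, pos, pos, score]
    if current is not None:
        intervals.append(tuple(current))
    return intervals


# Made-up scores for ten probes spaced 35 bp apart.
positions = [0, 35, 70, 105, 140, 175, 210, 245, 280, 315]
scores = [0.1, -0.3, 1.2, 2.5, 1.8, -1.1, 0.2, 1.5, -0.4, 0.0]

print(symmetric_null_fdr(scores, threshold=1.0))
# 0.25
print(call_intervals(positions, scores, threshold=1.0, max_gap=70))
# [(70, 140, 105, 2.5), (245, 245, 245, 1.5)]
```

In the toy data, four probes score at or above 1.0 and one scores at or below -1.0, giving an estimated FDR of 1/4. The experimental IgG IP test mentioned above would instead use the control immunoprecipitations to estimate the false-positive count.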

In addition, secondary analysis scripts take the bound regions identified by TiMAT and characterize them further, for example to discover the locations of factor recognition sites and determine whether these are functionally conserved across Drosophila species, or to determine the genomic features and gene classes associated with each primary peak. These analyses were designed to take into account, and to illustrate, the fact that animal sequence-specific transcription factors bind a quantitative continuum of heterogeneous genomic regions, from highly bound functional target elements to poorly bound regions, many of which may have no direct transcriptional function (see Li et al. 2008).
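One of the secondary analyses mentioned above, associating each primary peak with a genomic feature, can be illustrated with a nearest-feature lookup. The Python sketch below is not taken from the project's scripts. It assumes a single chromosome, represents each feature by one coordinate (such as a transcription start site), and uses hypothetical gene names and positions.

```python
from bisect import bisect_left


def nearest_feature(peak_pos, features):
    """Return (name, distance) for the feature closest to `peak_pos`.

    `features` is a list of (position, name) tuples sorted by position.
    If two features are equally close, the upstream (left) one is returned.
    """
    if not features:
        raise ValueError("features must not be empty")
    coords = [pos for pos, _ in features]
    i = bisect_left(coords, peak_pos)
    # Only the features immediately left and right of the peak can be nearest.
    candidates = features[max(i - 1, 0): i + 1]
    pos, name = min(candidates, key=lambda f: abs(f[0] - peak_pos))
    return name, abs(pos - peak_pos)


# Hypothetical transcription start sites on one chromosome.
features = [(1000, "geneA"), (5000, "geneB"), (9000, "geneC")]

for peak in (100, 1200, 7500):
    print(peak, nearest_feature(peak, features))
# 100 ('geneA', 900)
# 1200 ('geneA', 200)
# 7500 ('geneC', 1500)
```

Carrying each peak's score through such a lookup, without discarding weakly bound regions up front, would keep the analysis consistent with the quantitative continuum of binding described above.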