In Vivo DNA Binding

We are using formaldehyde crosslinking and either ChIP/chip or ChIP/seq to identify regions bound by transcription factors in Drosophila embryos. Our current data release contains ChIP/chip data on 21 of the 37 principal regulators of the early embryo network, as well as on Zeste, TFIIB and RNA polymerase II (see Browse ChIP/chip Data). All ChIP/chip data are for D. melanogaster. In addition, ChIP/seq data are available for 6 of the principal regulators in both D. melanogaster and D. yakuba (see Browse ChIP/seq Data).

These data are produced as follows. One-hour collections of wild-type embryos are aged to the appropriate developmental stage and then crosslinked. The chromatin is purified by CsCl buoyant density centrifugation and sonicated to an average length of around 600 bp. DNA fragments bound by specific factors are then immunoprecipitated using affinity-purified antibodies. Where possible, separate immunoprecipitations are performed for each factor using distinct antibodies that recognize non-overlapping epitopes of the protein. Control immunoprecipitations of crosslinked chromatin in the presence of normal IgG are also performed. Duplicate immunoprecipitations are performed for each anti-factor antibody and for the IgG control.

For ChIP/chip, the DNA samples from chromatin immunoprecipitation and the input chromatin are then amplified, labeled and hybridized to Affymetrix tiling arrays that contain over 3 million 5 µm features representing 25 bp sequences spaced on average 35 bp apart across the unique sequences of the D. melanogaster genome. For detailed protocols see here, and also Li et al 2008. The array data are first analyzed using TiMAT, a comprehensive package of command-line Java applications for mapping, normalizing, testing, filtering, printing, plotting, comparing, and annotating results. TiMAT identifies regions of the genome significantly enriched in the immunoprecipitation (called intervals or bound regions) and provides False Discovery Rate estimates for each region using two methods (the symmetric null test and the experimental IgG IP test). TiMAT also identifies the locations of the peaks of local maximum array hybridization; the most intense peak in each interval is designated the primary peak. TiMAT is freely available as open source software.
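The primary peak rule described above can be illustrated with a minimal Python sketch. This is not TiMAT code (TiMAT is a Java package); the `Peak` record, its field names, the interval identifiers and the intensity values are all hypothetical and chosen only for illustration. The sketch assumes each peak has already been assigned to a bound interval and simply keeps the most intense peak per interval.

```python
from typing import Dict, Iterable, NamedTuple


class Peak(NamedTuple):
    """Hypothetical record for one local hybridization maximum."""
    interval_id: str   # identifier of the bound interval containing the peak
    position: int      # genomic coordinate of the peak (bp)
    intensity: float   # hybridization signal at the peak


def primary_peaks(peaks: Iterable[Peak]) -> Dict[str, Peak]:
    """Return the most intense peak in each interval.

    If two peaks in one interval have equal intensity, the one seen
    first is kept (an arbitrary choice made for this sketch).
    """
    best: Dict[str, Peak] = {}
    for peak in peaks:
        current = best.get(peak.interval_id)
        if current is None or peak.intensity > current.intensity:
            best[peak.interval_id] = peak
    return best


# Made-up example data: two peaks in one interval, one peak in another.
example_peaks = [
    Peak("interval_1", 350, 4.2),
    Peak("interval_1", 620, 7.9),
    Peak("interval_2", 5300, 3.1),
]

primary = primary_peaks(example_peaks)
for interval_id, peak in primary.items():
    print(interval_id, peak.position, peak.intensity)
```

Any remaining peaks in an interval would then be treated as secondary peaks of that bound region.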

In addition, secondary analysis scripts take the bound regions identified by TiMAT and characterize them further, for example to discover the locations of factor recognition sites and determine whether these are functionally conserved across Drosophila species, or to determine the genomic features and gene classes associated with each primary peak. These analyses were designed to take into account, and to illustrate, the fact that animal sequence-specific transcription factors bind a quantitative continuum of heterogeneous genomic regions, from highly bound functional target elements to poorly bound regions, many of which may have no direct transcriptional function (see Li et al 2008 and MacArthur et al 2009).
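As a rough illustration of associating a primary peak with a genomic feature, the sketch below assigns a peak to the gene with the closest transcription start site (TSS). This is an assumption made for illustration only: the actual analysis scripts may use different rules, and the function name, gene names and coordinates here are hypothetical.

```python
import bisect
from typing import List, Tuple


def nearest_gene(peak_pos: int,
                 tss_positions: List[int],
                 gene_names: List[str]) -> Tuple[str, int]:
    """Return (gene name, distance in bp) for the TSS closest to a peak.

    tss_positions must be sorted in ascending order and aligned with
    gene_names. All positions are assumed to lie on the same chromosome.
    """
    if not tss_positions:
        raise ValueError("no TSS positions supplied")
    # Index of the first TSS at or to the right of the peak.
    i = bisect.bisect_left(tss_positions, peak_pos)
    # Only the neighbours on either side of the peak can be the closest.
    candidates = []
    if i < len(tss_positions):
        candidates.append(i)
    if i > 0:
        candidates.append(i - 1)
    j = min(candidates, key=lambda k: abs(tss_positions[k] - peak_pos))
    return gene_names[j], abs(tss_positions[j] - peak_pos)


# Made-up annotation: three genes on one chromosome.
tss = [1000, 5000, 9000]
names = ["geneA", "geneB", "geneC"]

print(nearest_gene(4200, tss, names))
```

Keeping the distance alongside the gene name makes it possible to examine how the association varies between highly bound and poorly bound regions, rather than treating every peak as an equally confident gene assignment.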

For ChIP/seq, the DNA libraries for sequencing were prepared from immunoprecipitated DNA or from input DNA samples using the genomic-DNA or ChIP-DNA sample preparation kits from Illumina. The libraries were quantified by Q-PCR and sequenced on the Solexa/Illumina platform. The details of DNA library preparation and quantification are provided here. The data were used to show that there are pervasive quantitative changes in transcription factor binding between closely related Drosophila species (see Bradley et al 2010).