TiMAT ChIP-Chip Usage (Versions < 3.2)
Here is an outline of how to use TiMAT to process your ChIP-chip Affymetrix tiling microarray data.
1) For ChIP-chip experiments, use:
Two biological replicas for the anti-transcription factor antibody (i.e., hybridize two independent IPs to two chips).
Two biological replicas for a mock IgG IP.
Two biological replicas for just input chromatin.
The Affymetrix chips are very consistent. Technical repeats are unnecessary and can emphasize false positives in certain testing situations.
2) Skip the mismatch transformation for ChIP-chip experiments. Just median scale the intensity values to 100 and, if deemed necessary, quantile normalize across all chips.
3) Use the ScatterPlot app to check the correlation coefficient between your biological replicas. For IPs, the r value should be >0.8; for input chromatin, >0.9.
If the values are lower than these, your IP/PCR/labeling protocols must be optimized. Do not proceed until these
values have been achieved! Otherwise, considerable time and money will be wasted. Garbage in = garbage out.
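The median scaling and quantile normalization mentioned in step 2 can be sketched as follows. This is a minimal illustration in Python, not TiMAT's own code: the function names are hypothetical, each chip is assumed to be a plain list of intensities, and ties are broken by position rather than averaged.

```python
from statistics import median


def median_scale(values, target=100.0):
    """Multiply every intensity so that the chip median equals `target`."""
    factor = target / median(values)
    return [v * factor for v in values]


def quantile_normalize(chips):
    """Give every chip the same intensity distribution.

    `chips` is a list of equal-length intensity lists. Each value is replaced
    by the mean, across chips, of the values that share its rank.
    """
    n = len(chips[0])
    # For each chip, the indexes of its values in ascending order.
    orders = [sorted(range(n), key=chip.__getitem__) for chip in chips]
    # Mean of the r-th smallest value across all chips.
    rank_means = [
        sum(chip[order[r]] for chip, order in zip(chips, orders)) / len(chips)
        for r in range(n)
    ]
    normalized = [[0.0] * n for _ in chips]
    for out, order in zip(normalized, orders):
        for rank, index in enumerate(order):
            out[index] = rank_means[rank]
    return normalized
```

Scaling each chip first and then quantile normalizing the scaled chips matches the order given in step 2.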
Required input files:
1) A text version bpmap file (tpmap) or 1lq file.
2) Text version cel files. Use Affymetrix' http://www.affymetrix.com/support/developer/downloads/Tools/CelFileConversion.ZIP tool to convert from binary files.
3) Chromosome FASTA formatted sequence files (e.g., for dmel: chr2L.fasta, chr3R.fasta, ...). The names and FASTA headers of the chromosome files must correspond to the names given in the tpmap file. Be absolutely certain you are using the same release used in constructing the tpmap file. Alternatively, follow the instructions below to rebuild the tpmap file.
4) (Optional) A multi-FASTA file containing aligned, trimmed examples of binding sites for your IPed transcription factor.
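Aligned binding site examples such as those in item 4 are the usual starting point for a position specific probability matrix. The sketch below shows one common way to build such a matrix and scan a sequence with it. It is an assumption-laden illustration, not the method used by TiMAT's scoring applications: the function names, the pseudocount of 1, and the uniform 0.25 background are all hypothetical choices, and sequences are assumed to contain only A, C, G, and T.

```python
from math import log2


def build_pspm(sites, pseudocount=1.0):
    """Build per-position base probabilities from equal-length aligned sites."""
    matrix = []
    for i in range(len(sites[0])):
        counts = {base: pseudocount for base in "ACGT"}
        for site in sites:
            counts[site[i].upper()] += 1
        total = sum(counts.values())
        matrix.append({base: count / total for base, count in counts.items()})
    return matrix


def log_odds_score(matrix, seq, background=0.25):
    """Sum of log2(probability / background) over the positions of `seq`."""
    return sum(log2(col[base] / background) for col, base in zip(matrix, seq.upper()))


def best_hit(matrix, sequence):
    """Return (score, offset) of the highest scoring window in `sequence`."""
    width = len(matrix)
    return max(
        (log_odds_score(matrix, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )
```

For example, `best_hit(build_pspm(["ACGT", "ACGT", "ACGA"]), "TTACGTTT")` reports its best match at offset 2, where the sequence reads ACGT.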
System requirements:
1) Java version 1.4 or greater.
2) These command line applications have been tested on MacOSX and Linux. They have not been tested on a Windows machine. All the source code, with extensive documentation, is included in the TiMAT package.
3) A computer with > 2 gigabytes of RAM. One can use a machine with less memory, but some programs will run rather slowly (hours instead of minutes).
A) (Optional) Rebuild the tpmap file
1) Split the text version bpmap file (tpmap) into many files, one for each node in your cluster.
2) Run "BPMapOligoBlastFilter" on each of the files simultaneously.
3) Combine the filtered files using "FileJoiner".
4) Sort the combined, filtered bpmap file with "BPMapSort".
If you are uninterested in 1bp mismatches, use "MummerMapper" to rebuild your tpmap file. It is 100x faster and does not require a cluster.
B) TPMap processing
1) Run the "TPMapProcessor" on the tpmap to window the data. (For Drosophila ChIP-chip data, try a maximum window length of 675 bp and a minimum of 10 oligos.)
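To make the two windowing parameters concrete, here is a minimal sketch of grouping oligos into windows. It is not the TPMapProcessor algorithm, whose details are not described here. It assumes, hypothetically, that each oligo is represented by its start coordinate on one chromosome, that window length is measured between the first and last oligo start, and that one candidate window is anchored at every oligo.

```python
def make_windows(positions, max_length=675, min_oligos=10):
    """Return (first_start, last_start, oligo_count) for each passing window."""
    positions = sorted(positions)
    windows = []
    end = 0
    for start in range(len(positions)):
        if end < start:
            end = start
        # Extend the window while the next oligo is still within max_length.
        while end + 1 < len(positions) and positions[end + 1] - positions[start] <= max_length:
            end += 1
        count = end - start + 1
        # Keep only windows that contain enough oligos.
        if count >= min_oligos:
            windows.append((positions[start], positions[end], count))
    return windows
```

With oligos spaced every 50 bp, a 675 bp window holds 14 oligos, so the suggested minimum of 10 discards only sparsely tiled stretches.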
For every experiment...
C) Cel processing
1) Convert your text cel files into serialized float files with the "ConvertCelFiles" app.
2) Run the "VirtualCel" application on a directory of converted cel files (xxx.cela) to create image
files for visual inspection. Any large spots should be masked using the "CelMasker" app.
3) Run the "CelProcessor" on sets of xxx.cela files to scale, transform, and normalize the data.
4) Use "ScatterPlot" to draw a simple scatter plot and calculate a Pearson correlation
coefficient on processed cel files. For example, compare the different treatment chips to
one another. There should be a good internal correlation (>=0.8).
5) Alternatively, run the "CelFileQualityControl" to QC large numbers of files.
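For reference, the Pearson correlation coefficient used in the quality check above can be computed as in this minimal sketch. The function names are hypothetical and this is not the ScatterPlot source; the cutoffs are the ones quoted in this outline (r > 0.8 for IPs, r > 0.9 for input chromatin).

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length intensity lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def passes_qc(r, is_input=False):
    """Apply the cutoffs from this outline: >0.9 for input chromatin, >0.8 for IPs."""
    return r > (0.9 if is_input else 0.8)
```

A pair of IP replicas with r = 0.85 would pass, while the same value for two input chromatin chips would not.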
D) Test and process results
1) Run the "ScanChip" app to score overlapping windows of oligo intensities along each chromosome.
2) If you have performed mock IPs, use the "FDRWindowConverter" app to calculate an empirical FDR for each window.
3) Merge high scoring, overlapping Windows into Intervals with the "IntervalMaker" program. The biggest difficulty is deciding where to set the
threshold for merging windows. Two FDR estimations are provided by TiMAT: an empirical FDR based on a mock IP and a statistical
FDR based on Richard Bourgon's non-parametric symmetric p-test. Set these thresholds generously and filter manually. If you have multiple replicas,
or have used different antibodies, you can merge the different Window arrays using the "MultiWindowIntervalMaker".
4) Load the Intervals with oligo information using "LoadIntervalOligoInfo".
5) Find the best average intensity difference sub window within each Interval, as well as enrichment peaks
6) (Optional) Score Intervals for the presence of transcription factor binding sites using
"ScoreIntervals". You can also score whole chromosomes or a set of FASTA sequences using
"ScoreChromosomes" and "ScoreSequences", respectively.
After Loading intervals and finding the best sub window...
7) Filter Intervals with a variety of parameters using "IntervalFilter". It sorts the intervals
into pass and fail sets.
8) Compare and split different Interval sets based on their overlap with the
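The thresholding and merging idea behind steps 2 and 3 can be illustrated with the sketch below. It rests on stated assumptions and is not the FDRWindowConverter or IntervalMaker implementation: windows are hypothetical (start, end, score) tuples on one chromosome, and the empirical FDR is taken to be the common ratio of mock IP windows to treatment windows at or above a score threshold.

```python
def empirical_fdr(treatment_scores, mock_scores, threshold):
    """Estimate FDR as (# mock windows >= threshold) / (# treatment windows >= threshold)."""
    n_treatment = sum(score >= threshold for score in treatment_scores)
    n_mock = sum(score >= threshold for score in mock_scores)
    if n_treatment == 0:
        return 0.0
    return min(1.0, n_mock / n_treatment)


def merge_windows(windows, min_score):
    """Merge overlapping windows that pass `min_score` into intervals.

    Returns (start, end, best_score) tuples sorted by start coordinate.
    """
    passing = sorted(w for w in windows if w[2] >= min_score)
    intervals = []
    for start, end, score in passing:
        if intervals and start <= intervals[-1][1]:
            # Overlaps the previous interval: extend it and keep the best score.
            last = intervals[-1]
            intervals[-1] = (last[0], max(last[1], end), max(last[2], score))
        else:
            intervals.append((start, end, score))
    return intervals
```

Lowering `min_score` yields more and longer intervals at a higher estimated FDR, which is why a generous threshold followed by manual filtering is suggested above.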
E) Print results
1) Print interval plots for processed intervals using "IntervalPlotter". These are graphic representations of the individual oligo intensities for each data set, the averaged treatment chip, the averaged control chip, the intensity difference, the intensity ratio, and smoothed (trimmed mean) ratios, as well as hits to a position specific probability matrix, the number of matches, the best window, and the best sub window. These plots can be saved to disk as PNG files or manipulated directly to print chromosomal coordinates for a point or region (including the genomic sequence).
2) Print interval reports in a spreadsheet or page format using "IntervalReportPrinter".
3) Print Intervals as .sgr files for import into Affymetrix's IGB browser with
4) Print all the oligo values as .sgr files for import into IGB using "OligoIntensityPrinter".
5) Print any serialized int values (i.e., processed cel files) associated with each oligo in the
tpmap file as .sgr files for import into IGB using "IntensityPrinter".
6) Print Binding Regions from the "IntervalPlotter" as .sgr files for import into Affymetrix's
IGB browser with "BindingRegionGraphPrinter".
7) Print the Intervals in GFF3 format with "IntervalGFFPrinter."
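As a rough picture of the two text formats mentioned above, the sketch below writes .sgr lines (tab-separated sequence name, position, and value) and GFF3 lines (nine tab-separated columns after a version header). It is only an illustration: the function names, the "TiMAT_sketch" source label, the "binding_site" feature type, and the ID scheme are hypothetical, and the actual TiMAT printers may emit different columns or attributes.

```python
def sgr_lines(chrom, positions, values):
    """One 'chromosome<TAB>position<TAB>value' line per oligo."""
    return [f"{chrom}\t{pos}\t{val:.3f}" for pos, val in zip(positions, values)]


def gff3_lines(intervals, source="TiMAT_sketch", feature="binding_site"):
    """GFF3 lines for (chrom, start, end, score) intervals with 1-based coordinates."""
    lines = ["##gff-version 3"]
    for number, (chrom, start, end, score) in enumerate(intervals, 1):
        columns = [
            chrom, source, feature, str(start), str(end),
            f"{score:.3f}", ".", ".", f"ID=interval{number}",
        ]
        lines.append("\t".join(columns))
    return lines
```

Writing the returned lines to a file, one per line, gives something that can be opened in a text editor for inspection before loading it into a genome browser.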
1) Visually pick binding regions with a mouse using the "IntervalPlotter". The "FindSubBindingRegions"
program will automatically identify binding peaks with some reliability but cannot be trusted to correctly
pick the flanks of the peak.
2) Use the "AnnotateRegions" application to fetch information about genes surrounding
the binding region picks from the "IntervalPlotter". Given the gross inconsistencies between GFF
file formats, this is currently only configured to work with the fly chips.
3) Compare lists of regions with the "RankedSetAnalysis" application. It creates a visual representation of region intersection/overlap using box-line-boxes.
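The basic overlap test underlying any comparison of region lists can be sketched as follows. This is not the RankedSetAnalysis implementation, which also takes rank into account and draws the result. The sketch assumes hypothetical (chromosome, start, end) tuples with inclusive coordinates and simply splits one list into regions that do and do not overlap the other.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) regions share at least one base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def split_by_overlap(regions_a, regions_b):
    """Split regions_a into (shared, unique) relative to regions_b."""
    shared, unique = [], []
    for region in regions_a:
        if any(overlaps(region, other) for other in regions_b):
            shared.append(region)
        else:
            unique.append(region)
    return shared, unique
```

This all-against-all comparison is quadratic in the number of regions, which is acceptable for hand-picked region lists but would call for sorted or indexed input on genome-wide sets.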