Here is an outline of how to use TiMAT to process your ChIP-chip Affymetrix tiling microarray data.
The applications are designed to be user friendly and require only moderate familiarity
with command line programs. When in doubt, type
'java program_name_minus_the_.class_extension' from the TiMAT directory to pull up the parameters for a particular
program. For example, type 'java IntervalPlotter', not 'java IntervalPlotter.class' or 'java IntervalPlotter.java'.
Recommendations:
For ChIP-chip experiments:
1) Perform:
Two biological replicas for the anti-transcription factor antibody (i.e., hybridize two independent IPs to two chips).
Two biological replicas for a mock IgG IP.
Two biological replicas for just input chromatin.
The Affymetrix chips are very consistent. Technical repeats are unnecessary and can emphasize false positives in certain testing situations.
2) Skip mismatch transformation for ChIP-Chip experiments. Just median scale intensity values to 50.
3) Use the ScatterPlot application to check the correlation coefficient between your biological replicas. The IPs should correlate at >0.8, the input chromatin at >0.9.
If they fall below these values, you should work on optimizing your IP/PCR/labeling protocols and repeat the ChIP-chip experiment.
4) Skip quantile normalization unless you are forced to work with poor data, i.e., correlation coefficients < 0.8, where it might help.
5) Use the SkepticalSumIntensityTest to score the windows. Use the trimmed mean ratio value to build intervals.
Rank intervals by the median ratio of the best sub window. Examine the interval plots and spreadsheet statistics to get a sense of
where the data becomes questionable. A baseline cutoff can be estimated in some experiments when you have a set of known true positives.
These should cluster near the top of the list.
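As an illustration of recommendation 2, median scaling can be sketched as multiplying every intensity on a chip by 50 divided by that chip's median. This is a minimal sketch, not TiMAT's own code: the class and method names are hypothetical, the intensities are made up, and the real "CelProcessor" may handle details (masked spots, control probes) differently.

```java
import java.util.Arrays;

/** Illustrative sketch only; not part of the TiMAT package. */
class MedianScaleSketch {

    /** Returns the median of the values without modifying the input array. */
    static double median(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return (sorted.length % 2 == 1)
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    /** Multiplies every intensity by target/median so the new median equals target. */
    static double[] scaleToMedian(double[] intensities, double target) {
        double factor = target / median(intensities);
        double[] scaled = new double[intensities.length];
        for (int i = 0; i < intensities.length; i++) {
            scaled[i] = intensities[i] * factor;
        }
        return scaled;
    }

    public static void main(String[] args) {
        double[] chip = {120.0, 80.0, 100.0, 400.0, 90.0}; // made-up intensities
        double[] scaled = scaleToMedian(chip, 50.0);
        System.out.println(Arrays.toString(scaled));
        System.out.println("median after scaling: " + median(scaled));
    }
}
```

Scaling each chip to the same median puts the replicas on a common intensity scale without changing the rank order of the oligos within a chip.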
Required Files:
1) A text version bpmap file
2) Text version cel files
3) (optional) Chromosome FASTA formatted sequence files (i.e., for dmel, chr2L.fasta,
chr3R.fasta....). The names of the chromosome files must correspond to the names given in
the bpmap file. Be absolutely certain you are using the same release used in constructing
the bpmap file. Alternatively follow the instructions below to rebuild the bpmap file
(recommended).
4) (optional) A multi-FASTA file containing aligned trimmed examples of binding sites for
your favorite transcription factor
Required Resources:
1) Java version 1.4 or greater.
2) These command line applications have been tested on MacOSX
and Linux. Since the programs are written in Java, they should run anywhere, with slight
tweaking to accommodate file system differences. They have not been tested on a Windows machine.
All the source code, with extensive
documentation, is included in the TiMAT package.
3) A computer with > 2 gigabytes of RAM. One can use a machine with less memory, but some
programs will run rather slowly, taking hours instead of minutes.
Processing Protocol:
Do once...
A) (Optional) Rebuild the bpmap file - These files are available for the latest dmel release.
1) Split the text version bpmap file into many files, one for each node in your cluster,
using "FileSplitter".
2) Run "BPMapOligoBlastFilter" on each of the files simultaneously using a cluster.
3) Combine the filtered files (e.g., 40 files if you split across 40 nodes) using "FileJoiner".
4) Sort the combined, filtered bpmap file with "BPMapSort".
If you are uninterested in 1bp mismatches, use "MummerFilter" to rebuild your bpmap file. It is 100x faster.
B) BPMap processing - This is available for the latest dmel release.
1) Run the "BPMapProcessor" on the bpmap file to window the data and average unique
duplicates. (Try a maximum window length of 675bp and a minimum of 15 oligos per window.)
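To make the two windowing parameters concrete, here is one way such windows could be formed from sorted oligo start positions on a single chromosome. This is only a sketch under stated assumptions: the class and method names are hypothetical, the 25bp oligo length is assumed, and the actual "BPMapProcessor" algorithm may differ. It is written for a modern JDK rather than Java 1.4.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; not the BPMapProcessor algorithm. */
class WindowSketch {

    /**
     * Takes each oligo in turn as a window start, extends the window to the last
     * oligo that still ends within maxLength bp of that start, and keeps the
     * window if it holds at least minOligos oligos.
     * Returns {firstIndex, lastIndex} pairs into the starts array.
     * starts must be sorted ascending and come from one chromosome.
     */
    static List<int[]> makeWindows(int[] starts, int oligoLength, int maxLength, int minOligos) {
        List<int[]> windows = new ArrayList<>();
        int last = 0;
        for (int first = 0; first < starts.length; first++) {
            if (last < first) {
                last = first;
            }
            // advance while the next oligo still ends inside the window
            while (last + 1 < starts.length
                    && starts[last + 1] + oligoLength - starts[first] <= maxLength) {
                last++;
            }
            if (last - first + 1 >= minOligos) {
                windows.add(new int[] {first, last});
            }
        }
        return windows;
    }

    public static void main(String[] args) {
        int[] starts = new int[20];
        for (int i = 0; i < starts.length; i++) {
            starts[i] = i * 35; // made-up tiling: one oligo every 35bp
        }
        // suggested settings from the text: 675bp maximum length, 15 oligos minimum
        for (int[] w : makeWindows(starts, 25, 675, 15)) {
            System.out.println("oligos " + w[0] + " to " + w[1]);
        }
    }
}
```

A larger maximum length or a smaller minimum oligo count yields more windows, at the cost of windows that are either less localized or supported by fewer measurements.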
For every experiment...
C) Cel processing
1) Run the "VirtualCel" application on a directory of .cel files (text version) to create image
files for visual inspection. Any large spots should be masked using the "CelMasker" application.
2) Run the "CelProcessor" on sets of .cel files (text versions) to process the data with
various options (quantile normalization, averaging of duplicated spotted oligos, mismatch
transformation, etc.).
3) Use the "ScatterPlot" to draw a simple scatter plot and calculate a Pearson correlation
coefficient on processed cel files. For example, compare the different treatment chips to
one another. There should be a good internal correlation. Same for the control chips.
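For reference, the Pearson correlation coefficient reported in step 3 can be sketched as follows. This is a minimal illustration with a hypothetical class name and made-up intensities; whether "ScatterPlot" works on raw or transformed intensities is not stated here, so treat the input scale as an assumption.

```java
/** Illustrative sketch only; not the ScatterPlot implementation. */
class PearsonSketch {

    /**
     * Pearson correlation between two equally long arrays of oligo intensities.
     * Returns NaN if either array has zero variance.
     */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two arrays of equal length >= 2");
        }
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0.0;
        double sxx = 0.0;
        double syy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        double[] replicaA = {48.0, 52.0, 210.0, 47.0, 95.0}; // made-up intensities
        double[] replicaB = {50.0, 49.0, 190.0, 51.0, 110.0};
        System.out.println("r = " + pearson(replicaA, replicaB));
    }
}
```

Compare the resulting value against the thresholds given in the recommendations (>0.8 for IPs, >0.9 for input chromatin).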
D) Test and process results
1) Run the "SumIntensityTest" or the "SkepticalSumIntensityTest" (recommended) to
score the processed cel files and bpmap files for significance.
2) Where applicable, run the "WindowScanner" to count the number of windows that pass a range of cutoff scores.
For ChIP-chip experiments, we use this information to estimate a false positive rate for each cutoff. For example,
compare the number of windows returned from the IgG vs input chromatin comparison to the number returned from the real
antibody vs input chromatin comparison. At each cutoff, divide the number of IgG/Input windows by the total number of
windows returned at that cutoff (IgG/Input + real/Input). This ratio is a rough proxy for the false positive rate at a given cutoff.
We typically choose a cutoff that allows a 1% false positive rate for windows. When intervals are made from these
windows, the false positive rate increases to 5%.
3) Merge high scoring, overlapping Windows into Intervals with the "IntervalMaker" program.
4) Load the Intervals with oligo information using "LoadIntervalOligoInfo".
5) (Optional) Score Intervals for the presence of transcription factor binding sites using
"ScoreIntervals". You can also score whole chromosomes or a set of FASTA sequences using
"ScoreChromosomes" and "ScoreSequences" respectively.
6) (Optional) Find the best average intensity difference sub window within each Interval
using "FindSubBindingRegions".
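The false positive estimate described in step 2 above reduces to a small calculation once you have the window counts at each cutoff. The sketch below assumes hypothetical class and method names, made-up counts, and that a higher cutoff is more stringent; it is not the "WindowScanner" code itself.

```java
/** Illustrative sketch only; window counts would come from WindowScanner output. */
class FalsePositiveSketch {

    /**
     * Rough false positive proxy at one cutoff:
     * IgG/Input windows divided by (IgG/Input + real/Input windows).
     * Returns NaN when no windows pass the cutoff at all.
     */
    static double falsePositiveProxy(int mockWindows, int realWindows) {
        int total = mockWindows + realWindows;
        return total == 0 ? Double.NaN : (double) mockWindows / total;
    }

    /**
     * Returns the lowest cutoff whose proxy is at or below maxRate, or NaN if
     * none qualifies. cutoffs must be sorted ascending, with mock[i] and real[i]
     * holding the window counts that pass cutoffs[i].
     */
    static double chooseCutoff(double[] cutoffs, int[] mock, int[] real, double maxRate) {
        for (int i = 0; i < cutoffs.length; i++) {
            double proxy = falsePositiveProxy(mock[i], real[i]);
            if (proxy <= maxRate) { // false for NaN, so empty cutoffs are skipped
                return cutoffs[i];
            }
        }
        return Double.NaN;
    }

    public static void main(String[] args) {
        double[] cutoffs = {1.0, 2.0, 3.0, 4.0}; // made-up score cutoffs
        int[] mock = {500, 60, 1, 0};            // IgG vs input windows
        int[] real = {1500, 540, 199, 90};       // real antibody vs input windows
        System.out.println("cutoff for 1% proxy: " + chooseCutoff(cutoffs, mock, real, 0.01));
    }
}
```

Picking the lowest qualifying cutoff keeps as many windows as possible while staying within the chosen false positive proxy.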
After loading Intervals and finding the best sub window...
7) Filter Intervals with a variety of parameters using "IntervalFilter", which sorts Intervals
into pass and fail.
8) Compare and split different Interval sets based on their overlap with the
"OverlapCounter".
E) Print results
1) Print interval plots for processed intervals using "IntervalPlotter". These are graphic
representations of the individual oligo intensities for each data set, the averaged treatment
chip, the averaged control chip, the intensity difference, the intensity ratio, smoothed
(trimmed mean) ratios, as well as
hits to a position specific probability matrix, number of 1bp mismatches, and the best statistical
scoring window, intensity difference window, and sub window. These plots can be saved to
disk as PNG files or manipulated directly to print chromosomal coordinates for a point or
region (including the genomic sequence).
2) Print interval reports in a spreadsheet or page format using "IntervalReportPrinter".
3) Print Intervals as .sgr files for import into Affymetrix's IGB browser with
"IntervalGraphPrinter".
4) Print all the oligo values as .sgr files for import into IGB using "OligoIntensityPrinter".
5) Print any serialized int[] (i.e., processed cel files) associated with each oligo in the
bpmap file as .sgr files for import into IGB using "IntensityPrinter".
6) Print Binding Regions from the "IntervalPlotter" as .sgr files for import into Affymetrix's
IGB browser with "BindingRegionGraphPrinter".
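If you want to produce or inspect .sgr files yourself, the sketch below shows the layout as we understand it: one line per data point with three tab-separated columns (sequence name, position, value). Treat that layout as an assumption and check it against the IGB documentation. The class name, chromosome name, and values are hypothetical, and this is not the code used by the TiMAT printers.

```java
/** Illustrative sketch only; assumes .sgr lines are "name<TAB>position<TAB>value". */
class SgrSketch {

    /** Formats parallel arrays of positions and values as .sgr text for one chromosome. */
    static String toSgr(String chromosome, int[] positions, double[] values) {
        if (positions.length != values.length) {
            throw new IllegalArgumentException("positions and values must have equal length");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < positions.length; i++) {
            sb.append(chromosome).append('\t')
              .append(positions[i]).append('\t')
              .append(values[i]).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // made-up oligo positions and ratios on chr2L
        System.out.print(toSgr("chr2L", new int[] {1000, 1035}, new double[] {1.5, 2.25}));
    }
}
```

The chromosome names written here would need to match the names used by the genome loaded in IGB.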
F) Interpretation
1) Visually pick binding regions with a mouse using the "IntervalPlotter". The "FindSubBindingRegions"
program will automatically identify binding peaks with some reliability but cannot be trusted to correctly
pick the flanks of the peak.
2) Use the "AnnotateRegions" application to fetch information about genes surrounding
the binding region picks from the "IntervalPlotter". Given the gross inconsistencies between GFF
file formats, this is currently only configured to work with the fly chips.