Berkeley Drosophila Transcription Network Project


Berkeley Quantitative Genome Browser User Manual

Graph Types

Overview

When viewing any graph or set of graph tracks, BQGB addresses three primary organizational questions: does this track contain scored data or annotations only, what feature is represented in this track, and what sequence is associated with the current stack of tracks?

BQGB divides representations into two general categories: score displays and annotation displays. In both cases, data may be bound to a single position or shown over a range. Score data may also have annotations, meta-data, etc. associated with a record, but data whose only numeric attribute is position, i.e. data lacking range information, will be shown as an annotation display. The screenshot below shows two annotation tracks and a score track.


A screenshot showing default graphs for score data and annotation data; glyphs are kept simple to speed browsing (shorter render time) and avoid distracting complexity. Default colors may be changed, but have been chosen to fall in ranges to which the human eye is sensitive.


Generally, when a file is read into memory, it is partitioned into features, i.e. data bound to some similar morphology, ontology or other classification, such as "genes" and "exons". How this is done depends on the file type. In most cases, each feature is displayed in its own track. In the screenshot above, the annotations, though coming from the same file, have been broken into two tracks: one showing genes and one showing coding regions. BQGB considers a "track" to be a group of data sharing an ordinate (i.e. drawn on the same y-axis, whether or not there is score data, and whether the score is shown along a z-axis or as a heat map). It is, of course, possible to write graph types that display multiple features within a single track, such as the often-seen bar and line annotations. BQGB also offers bar and line annotations, for instance using thicker and thinner lines to denote coding and non-coding regions of a gene. To display such a graph, however, the software must have additional knowledge of the relationships within the data. Currently, such relationship definitions are supported via GFF attributes, with the parsing rules provided by the user in the annotation preferences dialog (choose File->Preferences from the menu).
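
As an illustration of partitioning a file into per-feature tracks, here is a minimal Python sketch that groups GFF records by the feature type in column 3. It is not BQGB's own code (BQGB is written in C++); the function name partition_by_feature and the example records are hypothetical.

```python
from collections import defaultdict


def partition_by_feature(gff_lines):
    """Group GFF records by feature type (column 3), one group per track."""
    tracks = defaultdict(list)
    for line in gff_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # skip rows that do not have the nine GFF columns
        seqid, _source, feature, start, end = fields[:5]
        strand = fields[6]
        tracks[feature].append((seqid, int(start), int(end), strand))
    return dict(tracks)


# Hypothetical records: two genes and one coding region from the same file.
example = [
    "2L\tdemo\tgene\t100\t900\t.\t+\t.\tID=g1",
    "2L\tdemo\tCDS\t150\t400\t.\t+\t0\tPARENT=g1",
    "2L\tdemo\tgene\t1200\t2000\t.\t-\t.\tID=g2",
]
tracks = partition_by_feature(example)
for feature, records in tracks.items():
    print(feature, len(records))
```

Each key of the returned dictionary corresponds to what would be drawn as a separate track.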

Only a single sequence definition may be viewed among a stack of tracks, i.e. if there are three tracks open and the currently selected sequence is Dmel 2L, all three tracks should be showing 2L data. BQGB offers a synonym tool; please see the tools page for more information on the synonym setter. It arose from the need to compare two data sets that used different naming conventions, such as "2L" and "chr2L". Once the synonym for "2L" has been set to "chr2L", sequences bearing either "2L" or "chr2L" may be viewed in the same track stack. This can be helpful in a pinch, but dangerous! Be careful: it is possible to set biologically meaningless synonyms!
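
The idea behind sequence synonyms can be sketched in a few lines of Python. This is only an assumed model of the behavior described above, not the synonym setter's actual implementation; the class SequenceSynonyms and its methods are hypothetical names.

```python
class SequenceSynonyms:
    """Map alternative sequence names onto one canonical name."""

    def __init__(self):
        self._canonical = {}

    def resolve(self, name):
        """Return the canonical name, or the name itself if no synonym is set."""
        return self._canonical.get(name, name)

    def set_synonym(self, name, synonym):
        """Declare that `synonym` names the same sequence as `name`."""
        root = self.resolve(name)
        other = self.resolve(synonym)
        # Repoint every name that previously resolved to the synonym's root.
        for key, value in list(self._canonical.items()):
            if value == other:
                self._canonical[key] = root
        self._canonical[synonym] = root
        self._canonical[name] = root

    def same_sequence(self, a, b):
        """True if both names may be shown in the same track stack."""
        return self.resolve(a) == self.resolve(b)


synonyms = SequenceSynonyms()
synonyms.set_synonym("2L", "chr2L")
print(synonyms.same_sequence("2L", "chr2L"))
print(synonyms.same_sequence("2L", "chr2R"))
```

Nothing in such a mapping checks biological meaning: calling set_synonym("2L", "X") would be accepted just as readily, which is the danger noted above.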

In order to improve navigation fluidity, i.e. smoothness of scrolling and zooming, each display type generally considers data density and pixel availability when rendering. Two separate strategies are used when data density exceeds pixel availability. First, data is subsampled, i.e. when data becomes so dense that many data points would be drawn on the same pixel, some data is dropped. This may be done with or without regard to which data is dropped; generally, when these thresholds are exceeded, BQGB simply subsamples at some fixed interval, for instance every third record. Second, displays may choose to draw more or less information depending on zoom level. For instance, when there is no space to draw a gene's name, it may not be drawn. As with many maps, particularly digital maps, detail increases with zoom.
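
Both strategies can be sketched in Python as follows. The threshold (records outnumbering pixels), the interval calculation and the 7-pixel character width are assumptions made for illustration; BQGB's actual thresholds are not documented here, and the function names are hypothetical.

```python
import math


def subsample(records, pixels_available):
    """Keep every k-th record once records outnumber the available pixels."""
    if pixels_available <= 0:
        return []
    if len(records) <= pixels_available:
        return list(records)  # sparse enough: draw everything
    step = math.ceil(len(records) / pixels_available)
    return list(records[::step])  # fixed interval, blind to what is dropped


def should_draw_label(name, glyph_width_px, char_width_px=7):
    """Draw a feature's name only if it fits inside the glyph."""
    return len(name) * char_width_px <= glyph_width_px


scores = list(range(3000))
drawn = subsample(scores, 1000)
print(len(drawn), drawn[:3])
print(should_draw_label("eve", 30), should_draw_label("eve", 10))
```

With 3000 records and 1000 pixels the interval works out to every third record, matching the example in the text.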


Default annotation display

The image below shows a default annotation display with strands broken into separate tracks. Since, of course, features such as genes may overlap, coloring is set to a moderate opacity with overlapping regions showing more darkly.

A left mouse click within a feature will cause associated meta-data to be displayed in the Info tool box. A right mouse click provides a mechanism for highlighting the feature across the track stack. Highlights can subsequently be exported in GFF format; follow File->Export->Highlights to find this functionality.


The default annotation display with strands split into separate tracks: the opacity of the feature glyphs is set low to allow the viewing of overlapping features.
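
The darkening of overlapping regions follows from how translucent layers combine. The Python sketch below assumes standard "over" alpha compositing of identically colored glyphs, which this manual does not confirm as BQGB's rendering method; the function names and the example opacity of 0.3 are illustrative only.

```python
def stacked_opacity(alpha, layers):
    """Effective opacity of `layers` identical translucent glyphs drawn on top of each other."""
    return 1.0 - (1.0 - alpha) ** layers


def overlap_depth(features, position):
    """Count the features, given as inclusive (start, end) pairs, covering a position."""
    return sum(1 for start, end in features if start <= position <= end)


# Two hypothetical genes that overlap between base pairs 400 and 500.
genes = [(100, 500), (400, 900)]
for position in (200, 450):
    depth = overlap_depth(genes, position)
    print(position, depth, round(stacked_opacity(0.3, depth), 2))
```

Under these assumptions, a region covered by two features is drawn noticeably darker than a region covered by one, without ever becoming fully opaque.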


Default score display

The image below shows two score tracks. The data in the top track has a score associated with a single base pair position. In the lower track, scores are associated with overlapping windows. The opacity of a window is set low to facilitate viewing of overlaps.

A left mouse click will show position information on the info bar at the bottom of the window including base pair position and the ordinate location of the mouse click on the current score axis's scale.


The default score track for single base pair data (top) and score data that covers a window of base pairs (bottom). In this case the top track is from a ChIP-chip experiment and the lower track shows the results of a linkage disequilibrium analysis with 10k bp overlapping windows.


Bar and line annotation display

Bar and line displays allow rendering of related features in a single glyph such as gene and exon or gene and coding sequence plus alternate transcripts.

Of course, these relationships must be defined in the source data file. BQGB uses ID/PARENT keys defined in a GFF's attribute field. To set these relationships, open the preferences dialog via the File menu (File->Preferences). Since the relationships are established when the file is read into memory, begin by setting the source file. Next, select "Bar and Line Relation Notation" from the "Default Graph Type" drop-down widget. Under "Relations", choose whether the source file's attributes contain two-tier relations (no alternate transcripts) or three-tier relations (alternate transcripts). Finally, provide the feature names and the attribute names exactly as they appear in the source file.
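
To make the ID/PARENT idea concrete, the following Python sketch parses GFF attribute fields and collects children under their parents. It is an illustration under stated assumptions, not BQGB's parser: the function names are hypothetical, the attribute field is assumed to use key=value pairs separated by semicolons, and the key names are parameters because they must match the source file exactly.

```python
def parse_attributes(attribute_field):
    """Split a 'key=value;key=value' attribute field into a dictionary."""
    attributes = {}
    for pair in attribute_field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[key.strip()] = value.strip()
    return attributes


def build_relations(records, id_key="ID", parent_key="PARENT"):
    """Map each parent ID to a list of (feature, child ID) pairs."""
    children = {}
    for feature, attribute_field in records:
        attributes = parse_attributes(attribute_field)
        parent = attributes.get(parent_key)
        if parent is not None:
            children.setdefault(parent, []).append((feature, attributes.get(id_key)))
    return children


# Hypothetical three-tier data: gene -> alternate transcripts -> coding sequence.
records = [
    ("gene", "ID=g1"),
    ("mRNA", "ID=t1;PARENT=g1"),
    ("mRNA", "ID=t2;PARENT=g1"),
    ("CDS", "ID=c1;PARENT=t1"),
]
relations = build_relations(records)
print(relations)
```

In a two-tier file the coding sequences would name the gene directly as their parent, and the middle level would be absent.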

In the bar and line notation screenshot example below, the glyph is read as follows. The thin line shows the entire length of a gene. The blue boxes below show each alternate coding sequence transcript. The black boxes hanging off the thin line are the sum ("anding") of all alternate coding sequence transcripts. As with other annotations, when resolution and space allow, gene name and strand are shown.


The Bar and Line graph allows drawing of multiple related features onto a single track, assuming such relations are explicit in the source data file (e.g. via GFF attributes). Here, the long thin black lines show genes. The blue rectangles below show alternate transcripts, while the thick black bars are the logical and of all alternate transcripts.


Data density display

When zoomed out, there is often more data to show than pixel space on which to show it. BQGB uses a variety of mechanisms to handle this problem while striving to maintain smooth navigation. First, many graph types use different glyphs for high resolution and low resolution, with lower resolution glyphs being simpler and faster to render. Second, at low resolution, BQGB may subsample when data density is above the Nyquist threshold. In general this shouldn't cause trouble, since only coarse impressions can be gleaned at low resolution, such as the general trend of scores in a region hundreds of thousands of base pairs in length. Still, subsampling does throw out data. Another strategy for viewing data at distant zoom levels is a histogram, i.e. how many data points are there within a set bin window?

The data density display presents this histogram as a heat map, associating opacity with the amount of data in a bin window. The darker the bar, the more data found within that region.
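
A minimal Python sketch of such a density histogram is shown below. Scaling opacity linearly to the fullest bin is an assumption, since the manual does not state how BQGB maps counts to opacity, and the function name is hypothetical.

```python
def density_opacities(positions, start, end, bin_width):
    """Count data points per bin and scale the counts to opacities in [0, 1]."""
    n_bins = -(-(end - start) // bin_width)  # ceiling division
    counts = [0] * n_bins
    for position in positions:
        if start <= position < end:
            counts[(position - start) // bin_width] += 1
    peak = max(counts, default=0)
    # Assumed mapping: the fullest bin is fully opaque, empty bins are transparent.
    opacities = [count / peak if peak else 0.0 for count in counts]
    return counts, opacities


positions = [5, 7, 12, 95, 96, 97, 98]
counts, opacities = density_opacities(positions, start=0, end=100, bin_width=10)
print(counts)
print(opacities)
```

The bin width plays the role of the bin size control mentioned for data density plots: wider bins give a smoother but coarser picture.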

In the screenshot below, the lower track uses the data density graph type, providing the density histogram of the score track above it.


The data density plot is an alternate view of any other graph type; it is a heat map histogram that shows the number of data points per bin as darker bands, i.e. where the data-rich regions are.


Sequence display

Viewing sequence data usually means one has set the zoom level to a fairly high resolution, such as no more than a few thousand base pairs in width. Even at a few thousand base pairs, it is still not possible to present base pair composition alphabetically. At these resolution levels, BQGB bins base pair composition and presents the sequence as a box showing either GC content or percent purines. The bin length and the toggle between GC content and purine composition are controlled in the "Track Appearance" dialog tool. Remember to double click on the graph one wishes to modify to activate its controls.

In the screenshot below, the top track shows GC content by 8-mer windows (the default). At even higher resolutions, nucleotides are drawn alphabetically as well.
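
The binned composition can be sketched in Python as below. Non-overlapping windows are an assumption, as the manual does not say whether BQGB's windows overlap; the function name and the example sequence are hypothetical. Purines are the bases A and G.

```python
def window_composition(sequence, window=8, mode="gc"):
    """Fraction of G/C bases (mode 'gc') or A/G bases (mode 'purine') per window."""
    targets = {"gc": "GC", "purine": "AG"}[mode]
    sequence = sequence.upper()
    fractions = []
    for offset in range(0, len(sequence), window):
        chunk = sequence[offset:offset + window]  # the last window may be shorter
        hits = sum(1 for base in chunk if base in targets)
        fractions.append(hits / len(chunk))
    return fractions


sequence = "GGCCAATTATATATGC"
gc = window_composition(sequence, window=8, mode="gc")
purines = window_composition(sequence, window=8, mode="purine")
print(gc)
print(purines)
```

The window argument corresponds to the bin length control, and mode to the toggle between GC content and purine composition.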


The top track shows sequence. At high zoom levels, base pair composition is given. At intermediate zoom levels, the track can be set to show either GC content or % purines in an n base pair length window.


Switching between graph types

To switch a graph's type, begin by double clicking on the graph. Next, go to the "Track Appearance" tool and select the "Graph" tab. The "Graph type" dropdown menu lists the options, and the type of the selected graphs should change automatically when this setting is changed. Not all graph types are appropriate for all types of data. Score data can be shown as an annotation graph, minus the score information, but annotations without scores will not show in a score track (though the graph type will change). Data without inter-feature relations set via the "Preferences" dialog will not display as Bar and Line plots.

Many of the graph types have custom controls. Once a graph has been selected by double clicking, its controls can also be manipulated within the "Track Appearance" tool dialog. For instance, sequence plots may be set to display GC content or purine composition, bin size can be changed for data density plots, and any graph type may be forced to specific scales and locations via the "Axis" controls.

Adding new graph types

Similar to adding support for new file formats, adding new graph types requires writing C++ code. New classes will generally begin by extending the DataTrackWidget class or its children, the ScoreTrackWidget or AnnotationTrackWidget classes. Please see the programming notes and the source code itself.
