3D Gene Expression

PointCloud File Format

The information on the blastoderm nuclei of a single embryo is written to a text file with a '.pce' extension. The file is compatible with the comma-separated values (CSV) format and should be readable in applications such as Excel. Meta-data is written to a header, in lines that start with a '#' character and follow the syntax described below. Lines that start with '##' are comments. The comma-separated values form a table in which each row holds the data for a single nucleus and each column holds a specific measurement, such as coordinates, volume, average expression intensity, or local density of nuclei. The meta-data indicates what each column means and supplies some additional information. Some columns have a fixed meaning, such as x, y and z for the coordinates.
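The three kinds of lines described above can be told apart with a few prefix checks. The following is a minimal sketch, not an official reader: the function name classify_line and the sample lines are hypothetical, and it assumes that leading whitespace before '#' is not significant.

```python
def classify_line(line):
    """Return 'blank', 'comment', 'meta' or 'data' for one line of a .pce file."""
    stripped = line.strip()
    if not stripped:
        return "blank"
    # Test '##' before '#', because a comment line also starts with '#'.
    if stripped.startswith("##"):
        return "comment"
    if stripped.startswith("#"):
        return "meta"
    return "data"


# Hypothetical sample lines, invented for illustration only.
sample = [
    "## this line is a comment",
    "# nuclear_count = 3",
    "1,10.5,20.25,3.0",
]
for text in sample:
    print(classify_line(text), "<-", text)
```

Checking for the double '##' first matters: with the order reversed, every comment would be reported as meta-data.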

Each row contains one value for each column, followed by a variable number of values that define the neighborhood mesh. The first value past the expected number of columns is the number of neighbors, and that many values follow it. These values are the IDs of the nuclei that are considered direct neighbors.
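A row can therefore be split into its column values and its neighbor list once the number of columns is known (for example, from the length of the column array in the header). The sketch below makes some assumptions that the format description does not confirm: the helper name split_row is hypothetical, the neighbor count and IDs are assumed to be written as plain integers, and the column values are left as strings for the caller to convert.

```python
def split_row(line, n_columns):
    """Split one data row into (column values, neighbor IDs).

    n_columns is the number of named columns declared in the header.
    """
    fields = [field.strip() for field in line.split(",")]
    values = fields[:n_columns]
    rest = fields[n_columns:]
    if not rest:
        # No neighborhood information on this row.
        return values, []
    n_neighbors = int(rest[0])  # first extra value: number of neighbors
    neighbors = [int(v) for v in rest[1:1 + n_neighbors]]
    if len(neighbors) != n_neighbors:
        raise ValueError("row ends before all neighbor IDs were read")
    return values, neighbors


# Hypothetical row: 4 columns (an ID and x, y, z), then 3 neighbors.
values, neighbors = split_row("7,1.0,2.0,3.0,3,4,8,9", n_columns=4)
print(values)     # ['7', '1.0', '2.0', '3.0']
print(neighbors)  # [4, 8, 9]
```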

Meta-data lines are of the form:

   # key = value

where key is the name of a property, and value can be a number, a string, or an array of numbers or strings. A value can also be empty. A string is distinguished from a number by enclosing it in double quotes ("). An array is enclosed in square brackets ([]), with the elements in a row separated by commas and the rows separated by semicolons (see the sample excerpts).
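The syntax above is simple enough to parse by hand. The following sketch is one possible reading of it, not a reference parser: all function names are hypothetical, an empty value is returned as None, every array is returned as a list of rows (even when it has a single row), and quoted strings are assumed not to contain escaped double quotes.

```python
def parse_scalar(token):
    """Convert one token to None (empty), str (quoted), int or float."""
    token = token.strip()
    if token == "":
        return None
    if len(token) >= 2 and token.startswith('"') and token.endswith('"'):
        return token[1:-1]
    try:
        return int(token)
    except ValueError:
        return float(token)


def split_outside_quotes(text, separator):
    """Split text on separator, ignoring separators inside double quotes."""
    parts, current, in_quotes = [], [], False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == separator and not in_quotes:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_meta_line(line):
    """Parse a '# key = value' line into a (key, value) pair."""
    body = line.strip()
    if not body.startswith("#") or body.startswith("##"):
        raise ValueError("not a meta-data line")
    key, sep, value = body[1:].partition("=")
    if not sep:
        raise ValueError("meta-data line without '='")
    key, value = key.strip(), value.strip()
    if value.startswith("[") and value.endswith("]"):
        # Rows are separated by ';', elements within a row by ','.
        rows = [r for r in split_outside_quotes(value[1:-1], ";") if r.strip()]
        return key, [[parse_scalar(c) for c in split_outside_quotes(r, ",")]
                     for r in rows]
    return key, parse_scalar(value)


print(parse_meta_line("# nuclear_count = 6078"))
print(parse_meta_line('# translate = [1, 0; 0, 1]'))
```

Splitting only outside quotes keeps a string such as "a, b" intact as a single element. The values 6078 and the 2-by-2 array are invented for the demonstration.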

These are the property names currently used:

name (string)
Name of the image from which the data is derived.
pointcloud_quality (integer)
Quality score as given in the database.
stage (string)
stage_percent (number)
Subdivision of the stage, after optional correction.
original_percent (number)
The subdivision of the stage as entered.
phenotype_string (string)
The genotype and phenotype in human-readable form.
genotype_id (integer)
Database ID for the genotype.
phenotype_number (integer)
Database ID for the phenotype.
phenotype_string (string)
Human-readable string that combines genotype and phenotype.
column (string array)
Gives a name for each column.
column_info (string array)
Array in which each row gives info on a group of columns whose names start with the same prefix.
column_info_bid (string array)
Array that gets concatenated to column_info.
attenuation_offset (numeric array)
An offset to add to the nuclear stain intensity when performing attenuation correction on each of the gene expression channels.
attenuation_correct (string array)
Name of the dye to which each number in attenuation_offset applies.
nuclear_stain (string)
Name of the dye used to stain nuclei. This should match the name of one of the columns.
column_factors (numeric array)
Gives the multiplication factor used for each column to convert from image units to PointCloud units.
nuclear_count (integer)
Number of nuclei in the file (i.e. number of rows in the comma-separated values file).
translate (numeric array)
4-by-4 transformation matrix for translation to standard pose (moves center of mass to origin).
rotate1 (numeric array)
4-by-4 transformation matrix for rotation to standard pose (aligns a/p axis with x-axis).
scale (numeric array)
4-by-4 transformation matrix for scaling to standard pose (scales the a/p axis to unit length).
rotate2 (numeric array)
4-by-4 transformation matrix for rotation to standard pose (rotates around the a/p axis).
DVrotation (number)
Orientation of the embryo with respect to the optical axis. This is the angle used to generate the rotate2 matrix.
release (numeric array)
Releases this PointCloud belongs to.
segmentation_stats (numeric array)
Various numbers used for debugging the segmentation algorithm.
image_rotation (number)
Angle, in radians, by which the image was rotated before segmentation.
image_boundingbox (numeric array)
The bounding box of the crop applied to the image before segmentation.
intensity_correction (string array)
Operation performed on each of the channels before segmentation and measurement. Shows the estimated background values and bleedthrough values.
channel_offset (numeric array)
Amplifier offset as noted in the input image file.
channel_gain (numeric array)
PMT gain as noted in the input image file.
automated_quality (integer)
An automatically computed segmentation quality score. The value pointcloud_quality can differ from this if changed manually in BID.
automated_attenuation_offsets (numeric array)
Intermediate values from which the numbers in attenuation_offset are later computed.
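The four standard-pose matrices (translate, rotate1, scale, rotate2) can be combined into a single 4-by-4 transform. The sketch below rests on two assumptions that the property descriptions do not state: that the matrices are applied in the order in which they are listed, and that they act on column vectors (x, y, z, 1) with the translation in the last column. The helper names and the example matrices are hypothetical.

```python
IDENTITY = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]


def matmul4(a, b):
    """Product of two 4-by-4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def compose(*matrices):
    """Combine matrices so that the first one given is applied first."""
    result = IDENTITY
    for m in matrices:
        result = matmul4(m, result)  # later matrices multiply from the left
    return result


def apply(m, point):
    """Apply a 4-by-4 transform to an (x, y, z) point."""
    v = [point[0], point[1], point[2], 1.0]
    out = [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]
    return (out[0], out[1], out[2])


# Made-up matrices: shift x by -1, then halve all coordinates.
translate = [[1.0, 0.0, 0.0, -1.0],
             [0.0, 1.0, 0.0, 0.0],
             [0.0, 0.0, 1.0, 0.0],
             [0.0, 0.0, 0.0, 1.0]]
scale = [[0.5, 0.0, 0.0, 0.0],
         [0.0, 0.5, 0.0, 0.0],
         [0.0, 0.0, 0.5, 0.0],
         [0.0, 0.0, 0.0, 1.0]]
rotate1 = rotate2 = IDENTITY

pose = compose(translate, rotate1, scale, rotate2)
print(apply(pose, (3.0, 0.0, 0.0)))  # (1.0, 0.0, 0.0)
```

If the files turn out to use the row-vector convention instead, the matrices would need to be transposed or multiplied in the opposite order.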

column name

Each column has a name that specifies its meaning. There is a standard set of columns whose meaning the user is expected to know, and a few groups of additional columns that are explained in the header. These additional columns include gene expression level measurements and local nuclear density measurements.

The standard set of columns is as follows:

Nucleus ID.
x, y, z
Coordinates of the center of mass of the nucleus, in microns.
Nx, Ny, Nz
Direction of the surface normal at the nucleus.
Volume of the nucleus (in cubic microns).
Volume of the cytoplasmic region estimated to belong to the nucleus, including the nucleus itself (in cubic microns).
Volume of the apical cytoplasmic region (in cubic microns).
Volume of the basal cytoplasmic region (in cubic microns).

info on a group of columns

The column_info string array gives information about a group of related data columns. Each channel yields four different measurements: apical, basal, nuclear and cellular expression levels. Thus, there are four columns for each channel. These columns are named as follows: the dye name, followed by an underscore ('_'), followed by a string (e.g. 'apical'). One row of the column_info array contains this dye name and some additional information:

["Coumarin", "mRNA", "ftz", "Fushi-tarazu", "Gene Expression", "apical"]

"Coumarin" is the dye name. "mRNA" specifies that this is an mRNA stain; the other options are "Protein", "Chemical Dye", or the empty string "" if it is not a stain. The next two values are the gene name, first in shorthand, then in full. "Gene Expression" indicates that these columns contain gene expression data; other options include "Subcellular Feature: Nuclei" and "Derived Morphology". Finally, the sixth value gives the default column to use: of all columns that start with Coumarin, Coumarin_apical is the preferred one.

All PointCloud files also contain local density measurements. These have names that start with density, and their column_info row is as follows:

["density","","density","Nuclear Density","Derived Morphology","15"]

meaning that the default column to use is density_15.
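The default column for a group follows directly from its column_info row: the first value is the name prefix and the sixth is the suffix of the preferred column. The sketch below uses the two rows quoted above. The function names are hypothetical, and the column names other than Coumarin_apical and density_15 are assumed for illustration, since only the 'apical' suffix and density_15 are spelled out in the text.

```python
def default_column(info_row):
    """Name of the preferred column for one column_info row."""
    prefix = info_row[0]          # dye or group name, e.g. "Coumarin"
    default_suffix = info_row[5]  # sixth value, e.g. "apical"
    return prefix + "_" + default_suffix


def columns_in_group(info_row, column_names):
    """All column names that belong to the group described by info_row."""
    prefix = info_row[0] + "_"
    return [name for name in column_names if name.startswith(prefix)]


coumarin = ["Coumarin", "mRNA", "ftz", "Fushi-tarazu", "Gene Expression", "apical"]
density = ["density", "", "density", "Nuclear Density", "Derived Morphology", "15"]

print(default_column(coumarin))  # Coumarin_apical
print(default_column(density))   # density_15

# Assumed column names, for illustration only.
names = ["x", "y", "z", "Coumarin_apical", "Coumarin_basal", "density_15"]
print(columns_in_group(coumarin, names))  # ['Coumarin_apical', 'Coumarin_basal']
```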

Sample excerpts from a PointCloud File