PointCloud File Format
The information on the blastoderm nuclei for a single embryo
is written to a text file with a '.pce' extension. This file is
compatible with the 'comma-separated values' format, and should
be readable in applications such as Excel. Meta-data is written
to a header in lines starting with a '#' character. These lines
follow the 'key = value' syntax described below. Lines starting
with '##' are comments. The comma-separated values form a table
in which each row contains data for a single nucleus, and each
column contains a specific measurement, such as coordinates,
volume, average expression intensity, local density of nuclei,
etc. What each column means is indicated by the meta-data,
together with some additional information. Some columns have a
fixed meaning, such as x, y and z for the coordinates.
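As a minimal sketch of the three kinds of lines described above, the following Python function sorts a line into comment, meta-data or data. The function name classify_line and the sample lines are made up for illustration and are not part of the format.

```python
def classify_line(line: str) -> str:
    """Return 'blank', 'comment', 'meta' or 'data' for one line of a .pce file."""
    stripped = line.strip()
    if not stripped:
        return "blank"
    # '##' must be tested before '#', since every comment also starts with '#'.
    if stripped.startswith("##"):
        return "comment"
    if stripped.startswith("#"):
        return "meta"
    return "data"


# Made-up sample lines, not taken from a real PointCloud file.
sample = [
    "## this is a comment",
    "# nuclear_count = 2",
    "1,10.5,20.25,30.0",
    "2,11.5,21.25,31.0",
]
kinds = [classify_line(s) for s in sample]
```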
Each row contains one value for each column, followed by a variable
number of values that define the neighborhood mesh. The first
value past the last column gives the number of neighbors, and
exactly that many values follow it. These values are the IDs of
the nuclei that are considered direct neighbors.
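The row layout can be illustrated with a short Python sketch. It assumes that the number of columns is already known (for example from the length of the column array in the header) and that all column values are numeric. The name split_row and the sample row are hypothetical.

```python
def split_row(line: str, n_columns: int):
    """Split one data row into its column values and its neighbor IDs."""
    fields = [f.strip() for f in line.split(",")]
    values = [float(f) for f in fields[:n_columns]]
    rest = fields[n_columns:]
    if not rest or rest[0] == "":
        # No neighbor information on this row.
        return values, []
    n_neighbors = int(rest[0])
    neighbors = [int(f) for f in rest[1:1 + n_neighbors]]
    if len(neighbors) != n_neighbors:
        raise ValueError("row ends before all neighbor IDs were read")
    return values, neighbors


# Made-up row: 4 columns (ID, x, y, z), then 3 neighbors with IDs 4, 8 and 9.
values, neighbors = split_row("7,1.0,2.0,3.0,3,4,8,9", n_columns=4)
```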
Meta-data lines are of the form:
# key = value
where key is the name of a property, and value can be a number,
a string, or an array of numbers or strings. A value can also be
empty. A string is distinguished from a number by enclosing it in
double quotes ("). An array is enclosed in square brackets ([]),
with the elements in a row separated by commas and the rows
separated by semicolons (see the sample excerpts).
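A rough Python sketch of a reader for such meta-data lines follows. It is an illustration under simplifying assumptions, not a reference parser: numbers are returned as floats, an empty value becomes None, every array is returned as a list of rows, and strings that themselves contain commas or semicolons are not handled. The function names and the sample lines are made up.

```python
def parse_scalar(token: str):
    """Parse a single number, a double-quoted string, or an empty value."""
    token = token.strip()
    if token == "":
        return None
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        return token[1:-1]
    return float(token)


def parse_value(text: str):
    """Parse a scalar, or an array written as [a, b; c, d]."""
    text = text.strip()
    if text.startswith("[") and text.endswith("]"):
        body = text[1:-1]
        if body.strip() == "":
            return []
        # Rows are separated by ';', elements within a row by ','.
        return [[parse_scalar(t) for t in row.split(",")]
                for row in body.split(";")]
    return parse_scalar(text)


def parse_meta_line(line: str):
    """Turn '# key = value' into a (key, value) pair."""
    body = line.lstrip()[1:]          # drop the leading '#'
    key, _, value = body.partition("=")
    return key.strip(), parse_value(value)


# Made-up sample lines, not taken from a real PointCloud file.
name = parse_meta_line('# name = "example"')
count = parse_meta_line("# nuclear_count = 6078")
matrix = parse_meta_line("# scale = [1, 0; 0, 1]")
empty = parse_meta_line("# release =")
```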
These are the property names currently used:
- name (string)
- Name of the image from which the data is derived.
- pointcloud_quality (integer)
- Quality score as given in the database.
- stage (string)
- stage_percent (number)
- Subdivision of the stage, after optional correction.
- original_percent (number)
- The subdivision of the stage as entered.
- phenotype_string (string)
- The genotype and phenotype in human-readable form.
- genotype_id (integer)
- Database ID for the genotype.
- phenotype_number (integer)
- Database ID for the phenotype.
- phenotype_string (string)
- Human-readable string that combines genotype and phenotype.
- column (string array)
- Gives a name for each column.
- column_info (string array)
- Array in which each row gives info on a group of columns whose names start with the same prefix.
- column_info_bid (string array)
- Array that gets concatenated to column_info.
- attenuation_offset (numeric array)
- An offset to add to the nuclear stain intensity when performing attenuation correction on each of the gene expression channels.
- attenuation_correct (string array)
- Name of the dye to which each number in attenuation_offset applies.
- nuclear_stain (string)
- Name of the dye used to stain nuclei. This should match the name of one of the columns.
- column_factors (numeric array)
- Gives the multiplication factor used for each column to convert from image units to PointCloud units.
- nuclear_count (integer)
- Number of nuclei in the file (i.e. number of rows in the comma-separated values file).
- translate (numeric array)
- 4-by-4 transformation matrix for translation to standard pose (moves center of mass to origin).
- rotate1 (numeric array)
- 4-by-4 transformation matrix for rotation to standard pose (aligns a/p axis with x-axis).
- scale (numeric array)
- 4-by-4 transformation matrix for scaling to standard pose (scales the a/p axis to unit length).
- rotate2 (numeric array)
- 4-by-4 transformation matrix for rotation to standard pose (rotates around the a/p axis).
- DVrotation (number)
- Orientation of the embryo with respect to the optical axis. This is the angle used to generate the rotate2 matrix.
- release (numeric array)
- Releases this PointCloud belongs to.
- segmentation_stats (numeric array)
- Various numbers used for debugging the segmentation algorithm.
- image_rotation (number)
- Amount in radians the image was rotated before segmentation.
- image_boundingbox (numeric array)
- The bounding box of the crop applied to the image before segmentation.
- intensity_correction (string array)
- Operation performed on each of the channels before segmentation and measurement. Shows the estimated background values and bleedthrough values.
- channel_offset (numeric array)
- Amplifier offset as noted in the input image file.
- channel_gain (numeric array)
- PMT gain as noted in the input image file.
- automated_quality (integer)
- An automatically computed segmentation quality score. The value pointcloud_quality can differ from this if changed manually in BID.
- automated_attenuation_offsets (numeric array)
- Some computed numbers from which the numbers in attenuation_offset are later computed.
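The translate, rotate1, scale and rotate2 properties listed above can be combined into a single transformation to standard pose. The sketch below is only an illustration: the text does not state in which order the matrices are applied or whether they act on column or row vectors, so it assumes the listing order (translate, rotate1, scale, rotate2) and homogeneous column vectors. The function names and the example matrices are made up.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def to_standard_pose(point, translate, rotate1, scale, rotate2):
    """Apply the four 4-by-4 matrices to an (x, y, z) point.

    Assumption: column-vector convention, so the transform applied first
    (translate) is the rightmost factor of the product.
    """
    m = matmul(rotate2, matmul(scale, matmul(rotate1, translate)))
    x, y, z = point
    out = matmul(m, [[x], [y], [z], [1.0]])
    return (out[0][0], out[1][0], out[2][0])


# Made-up example matrices, not taken from a real PointCloud file.
identity = [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]
shift = [[1.0, 0.0, 0.0, -1.0],
         [0.0, 1.0, 0.0, -2.0],
         [0.0, 0.0, 1.0, -3.0],
         [0.0, 0.0, 0.0, 1.0]]
halve = [[0.5, 0.0, 0.0, 0.0],
         [0.0, 0.5, 0.0, 0.0],
         [0.0, 0.0, 0.5, 0.0],
         [0.0, 0.0, 0.0, 1.0]]

# (3, 2, 3) is first shifted to (2, 0, 0), then scaled to (1, 0, 0).
posed = to_standard_pose((3.0, 2.0, 3.0), shift, identity, halve, identity)
```

If the matrices in a given file turn out to follow the row-vector convention instead, the product order and the placement of the translation terms would need to be transposed.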
Each column has a name that specifies its meaning. There is a standard
set of columns whose meaning the user is expected to know, and a
few groups of additional columns that are explained in the header.
These additional columns include gene expression level
measurements and local nuclear density measurements.
The standard set of columns is as follows:
- Nucleus ID.
- x, y, z
- Coordinates of the center of mass of the nucleus, in microns.
- Nx, Ny, Nz
- Direction of the surface normal at the nucleus.
- Volume of the nucleus (in cubic microns).
- Volume of the cytoplasmic region estimated to belong to the nucleus, including the nucleus itself (in cubic microns).
- Volume of the apical cytoplasmic region (in cubic microns).
- Volume of the basal cytoplasmic region (in cubic microns).
info on a group of columns
The column_info string array gives information about a group of related data
columns. Each channel yields four different measurements: apical, basal, nuclear and cellular
expression levels. Thus, there are four different columns for each channel. These columns
are named as follows: the dye name, followed by an underscore ('_'), followed by a string
(e.g. 'apical'). One row of the column_info array will contain this dye name, and
some additional information:
["Coumarin", "mRNA", "ftz", "Fushi-tarazu", "Gene Expression", "apical"]
"Coumarin" is the dye name. "mRNA" specifies that this is an mRNA stain;
the other options are "Protein", "Chemical Dye", or the empty string
"" if it is not a stain. The next two values are the gene name, first in
shorthand, then in full. "Gene Expression" indicates that these columns
contain gene expression data. Other options include "Subcellular Feature: Nuclei"
and "Derived Morphology". Finally, the sixth value gives the default column to
use. So of all columns that start with Coumarin, the one called Coumarin_apical
is the one to use by preference.
All PointCloud files also contain local density measurements. These have names that
start with density, and their column_info row is as follows:
["density","","density","Nuclear Density","Derived Morphology","15"]
meaning that the default column to use is density_15.
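The rule for finding the default column of a group can be sketched in Python as follows. The two column_info rows are the ones quoted above. The helper names and the column list are hypothetical, and the suffixes 'basal', 'nuclear' and 'cellular' are assumed by analogy with 'apical'.

```python
def default_columns(column_info):
    """Map each group prefix to the name of its default column."""
    result = {}
    for row in column_info:
        prefix, default_suffix = row[0], row[5]
        # Column names are the prefix, an underscore, then a suffix.
        result[prefix] = prefix + "_" + default_suffix
    return result


def columns_in_group(columns, prefix):
    """List all column names that belong to the group with this prefix."""
    return [c for c in columns if c.startswith(prefix + "_")]


column_info = [
    ["Coumarin", "mRNA", "ftz", "Fushi-tarazu", "Gene Expression", "apical"],
    ["density", "", "density", "Nuclear Density", "Derived Morphology", "15"],
]
# Hypothetical column list for illustration only.
columns = ["x", "y", "z",
           "Coumarin_apical", "Coumarin_basal",
           "Coumarin_nuclear", "Coumarin_cellular",
           "density_15"]

defaults = default_columns(column_info)
coumarin_columns = columns_in_group(columns, "Coumarin")
```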
Sample excerpts from a PointCloud File