3D Gene Expression
VirtualEmbryo File Format
VirtualEmbryo files are text files with a '.vpc' extension that contain composited average expression data for a set of embryos in a Gene Expression Atlas. The syntax extends that of individual PointCloud files so that it can also describe temporal data for a set of discrete time points, as well as additional metadata describing the collection of individual PointClouds that went into building an atlas.
This file format is compatible with the 'comma-separated values' (.csv) format, and should be readable in applications such as Excel. Meta-data is written to a header in lines starting with a '#' character. These lines conform to the syntax specified below. Lines starting with '##' are comments. The actual comma-separated values form a table in which each row contains data for a single nucleus, and each column contains a specific measurement, such as coordinates, average expression level for a given gene and time, local density of nuclei, etc. The meaning of each column can be derived from the meta-data contained in the header.
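The three kinds of lines described above can be told apart by their leading characters. The following is a minimal sketch, not part of any official reader; the function name `classify_line` and the "blank" category are assumptions made for illustration.

```python
def classify_line(line):
    """Classify one line of a .vpc file by its leading characters.

    Returns 'comment' for lines starting with '##', 'meta' for lines
    starting with a single '#', 'blank' for empty lines, and 'data'
    for everything else (assumed to be a comma-separated table row).
    """
    stripped = line.strip()
    if not stripped:
        return "blank"
    # Test '##' before '#', since a comment line also starts with '#'.
    if stripped.startswith("##"):
        return "comment"
    if stripped.startswith("#"):
        return "meta"
    return "data"
```

The order of the checks matters: testing for a single '#' first would classify every comment as a meta-data line.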
The row for a given 'virtual nucleus' contains one value for each data column, followed by a variable number of columns that specify the list of neighbors for that nucleus. The first number after the set of data columns is the number of neighbors, nn, which is followed by nn more entries giving the IDs of the nuclei that are direct neighbors. This data is used in rendering the embryo and may be useful for modeling propagation of factors between nuclei.
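As an illustration of the row layout, the sketch below splits one row into its data values and its neighbor list. It assumes the number of data columns is already known (for example, from the header meta-data); the function name `split_row`, the parameter `n_data_cols`, and the sample values are hypothetical.

```python
def split_row(line, n_data_cols):
    """Split one table row into (data_values, neighbor_ids).

    line        -- one comma-separated row of the table
    n_data_cols -- number of data columns (assumed known from the header)
    """
    fields = [f.strip() for f in line.split(",")]
    if len(fields) <= n_data_cols:
        raise ValueError("row has no neighbor count after the data columns")

    data_values = fields[:n_data_cols]

    # The first entry after the data columns is the neighbor count nn.
    nn = int(fields[n_data_cols])
    start = n_data_cols + 1
    neighbor_fields = fields[start:start + nn]
    if len(neighbor_fields) != nn:
        raise ValueError("row lists fewer neighbors than its count nn")

    neighbor_ids = [int(f) for f in neighbor_fields]
    return data_values, neighbor_ids


# Hypothetical row: 5 data columns, then nn = 3, then 3 neighbor IDs.
data, neighbors = split_row("1, 10.5, 20.1, 3.2, 0.75, 3, 2, 5, 8", 5)
```

The data values are left as strings here because their interpretation depends on the column they belong to.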
Meta-data lines are of the form:
# key = value
where key is the name of a property, and value can be either a number, a string, or an array of numbers or strings. A value can also be empty. A string is distinguished from a number by enclosing it in double quotes ("). An array is enclosed in square brackets ([ ]), with the elements in a row separated by commas and the rows separated by semicolons (see examples below).
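The value syntax described above can be parsed with a few string operations. This is a simplified sketch under stated assumptions: the function names are hypothetical, numbers are read as floats, an empty value becomes `None`, every array is returned as a list of rows, and quoted strings are assumed not to contain commas or semicolons.

```python
def parse_scalar(token):
    """Parse a single value: empty -> None, "quoted" -> str, else float."""
    token = token.strip()
    if token == "":
        return None
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        return token[1:-1]
    return float(token)


def parse_value(text):
    """Parse a scalar or a bracketed array (rows split on ';', items on ',')."""
    text = text.strip()
    if text.startswith("[") and text.endswith("]"):
        inner = text[1:-1]
        # Naive split: assumes no ',' or ';' inside quoted strings.
        return [
            [parse_scalar(item) for item in row.split(",")]
            for row in inner.split(";")
            if row.strip()
        ]
    return parse_scalar(text)


def parse_meta_line(line):
    """Parse a '# key = value' line into a (key, value) pair."""
    body = line.strip()
    if not body.startswith("#") or body.startswith("##"):
        raise ValueError("not a meta-data line")
    # Split on the first '=' only, so '=' may appear inside the value.
    key, sep, value = body[1:].partition("=")
    if not sep:
        raise ValueError("meta-data line has no '='")
    return key.strip(), parse_value(value)
```

For example, `parse_meta_line('# cohort_names = ["t1", "t2"]')` yields the key `cohort_names` and a single-row array; the entries `"t1"` and `"t2"` are made-up placeholders, not real cohort names.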
These are the property names currently used in the VirtualEmbryo file:
Each column has a name that specifies its meaning. There is a standard set of columns that the user is expected to know the meaning of and additional columns that are described in column_info. These additional columns include gene expression level measurements and local nuclear density measurements.
The standard set of columns used in VirtualEmbryo files is as follows:
The column_info string array gives information about a group of related data columns corresponding to a particular gene expression level over time. Each gene listed in column_info has one or more corresponding columns in the data array. A gene may have measurements associated with up to K time points where K is the length of the cohort_names array. For a gene, each time point for which data was collected has a corresponding column of the form columnname_subcolumn_timepoint. Currently the subcolumn isn't used in the VirtualEmbryo (only in PointClouds) so column names typically appear with two underscores, e.g. "bcd__6" would be bcd data for the sixth time point. Note that a gene may not have been measured at all time points, in which case those columns will not appear in the file.
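The columnname_subcolumn_timepoint naming scheme can be decoded by splitting from the right on the last two underscores. The sketch below assumes the time point is written as an integer and that the function name `split_column_name` is only illustrative; names that do not follow the three-part pattern (such as the standard columns) raise an error.

```python
def split_column_name(name):
    """Split 'columnname_subcolumn_timepoint' into its three parts.

    Splitting from the right keeps any underscores inside the column
    name itself intact. The subcolumn is an empty string when unused,
    as in 'bcd__6'.
    """
    parts = name.rsplit("_", 2)
    if len(parts) != 3 or parts[0] == "":
        raise ValueError(f"{name!r} is not of the form name_subcolumn_timepoint")
    column, subcolumn, timepoint = parts
    # int() raises ValueError if the last part is not an integer.
    return column, subcolumn, int(timepoint)


gene, sub, t = split_column_name("bcd__6")
```

Here `gene` is "bcd", `sub` is the empty string, and `t` is 6, matching the example in the text.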
The entries in each row of column_info specify in order,