Help

User's guide

Browsing

WholeCellSimDB is very easy to use. Use the "Browse" menu at the top-left to browse the organisms and simulation batches available in WholeCellSimDB.

Searching

Three methods are available to search WholeCellSimDB:

  • Basic full-text search: use the search box at the top-right of each page to search the organism, simulation batch, and investigator metadata.
  • Google full-text search: use the search box at the top-right of each page to search the entire WholeCellSimDB site.
  • Advanced structured search: use the "Search" menu at the top-left to access the advanced search form, which allows users to search simulation batches by metadata, option and parameter values, and by modeled processes and states.

Visualizing simulations

Three methods are available to visualize predicted phenotypes:

  • The home page provides a graphical interface to browse all of the predicted phenotypes across all available simulations, and select up to ten to plot.
  • The simulation pages (e.g. simulation #1) provide a graphical interface to browse all of the predicted phenotypes for a given simulation, and select up to ten to plot.
  • The batch-state pages (e.g. Wild-type set #1 growth) plot the predicted phenotype across all simulations within a batch.

Exporting simulations

Three methods are available to export simulations from WholeCellSimDB:

  • Users can use the download form to download simulations for selected simulation batches in HDF5 format.
  • Users can use the HDF5 icon at the bottom-right of most pages to export the simulation data described on each page. This allows users to export entire organisms, simulation batches, and simulations. This also allows users to export states and properties across all simulations in WholeCellSimDB.
  • Users can use the get_data_series web service to export the predicted phenotypes of particular rows/columns of particular states of particular simulations in HDF5, JSON, BSON, or MessagePack format (see the example below). The web service requires two query parameters:
    • format: string indicating desired export format: "hdf5", "json", "bson", "msgpack", or "numl"
    • data_series: JSON-formatted array indicating the desired simulations/states. Each element of the array must be an object containing four fields:
      • simulation: integer indicating id of desired simulation
      • property: integer indicating id of desired state property
      • row: integer indicating id of desired state property row
      • col: integer indicating id of desired state property column
    • Simulation, property, row, and column ids can be obtained using the list_data_series web service.
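
For example, the following Python snippet sketches a call to the get_data_series web service using only the standard library. The server URL, endpoint path, and numeric ids below are hypothetical; the actual ids should be obtained from the list_data_series web service.

import json
import urllib
import urllib2

# Hypothetical server root URL and endpoint path; replace with your server's values
base_url = 'http://domain/url/to/WholeCellSimDB'

# Request one row/column of one property from two simulations, in JSON format
query = urllib.urlencode({
    'format': 'json',
    'data_series': json.dumps([
        {'simulation': 1, 'property': 1, 'row': 1, 'col': 1},
        {'simulation': 2, 'property': 1, 'row': 1, 'col': 1},
    ]),
})
response = urllib2.urlopen(base_url + '/get_data_series?' + query)
data = json.loads(response.read())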

See below for more information about the HDF5 schema used to export simulations.

Developer's guide

Installing your own WholeCellSimDB server and storing simulations

Follow the steps below to install WholeCellSimDB on your own server and save simulations to it. In addition, a fully configured version of WholeCellSimDB is available on the whole-cell virtual machine.

  1. Obtain WholeCellSimDB from GitHub, and save to /path/to/WholeCellSimDB/.

  2. Install the required software packages:

    • apache

    • gcc

    • libhdf5-dev

    • mod_wsgi

    • mysql

    • python 2.6

    • python-numpy

    • python-scipy

    • python-matplotlib

    • xapian

  3. Install the required python packages:

    • bson

    • django

    • django-extensions

    • django-haystack

    • h5py

    • numpy

    • pytz

    • scipy

    • u-msgpack-python

  4. Create a new database and a new user, and grant the new user privileges to the new database:

    mysql >> create database wholecellsimdb;
    mysql >> create user 'wholecellsimdb'@'localhost' identified by '9KNYdUnQaSFSUDCa';
    mysql >> grant all privileges on wholecellsimdb.* to 'wholecellsimdb'@'localhost';
    mysql >> flush privileges;
    

  5. Create a directory to store the HDF5 simulation data, and grant permissions to the webserver daemon:

    mkdir /path/to/WholeCellSimDB/wcdb/data
    sudo chown :<apache-user> /path/to/WholeCellSimDB/wcdb/data
    sudo chmod ug+rw /path/to/WholeCellSimDB/wcdb/data
    

  6. Create a temporary directory, and grant permissions to the webserver daemon:

    mkdir /path/to/WholeCellSimDB/tmp
    sudo chown :<apache-user> /path/to/WholeCellSimDB/tmp
    sudo chmod ug+rw /path/to/WholeCellSimDB/tmp
    

  7. Create a directory to store error logs, and grant permissions to the webserver daemon:

    mkdir /path/to/WholeCellSimDB/log
    sudo chown :<apache-user> /path/to/WholeCellSimDB/log
    sudo chmod ug+rw /path/to/WholeCellSimDB/log
    

  8. Edit the settings file /path/to/WholeCellSimDB/WholeCellSimDB/settings.py

    • Edit the root URL settings: ROOT_URL = 'http://domain/url/to/WholeCellSimDB'

    • Edit the ALLOWED_HOSTS settings: ALLOWED_HOSTS = ['domain']

    • Edit the HDF5 data directory setting: HDF5_ROOT = '/path/to/WholeCellSimDB/wcdb/data'

    • Edit the temporary directory setting: TMP_DIR = '/path/to/WholeCellSimDB/tmp'

    • Edit the database settings. See the Django documentation for more information.

      DATABASES.default.NAME = 'wholecellsimdb'
      DATABASES.default.USER = 'wholecellsimdb'
      DATABASES.default.PASSWORD = '9KNYdUnQaSFSUDCa'
      DATABASES.default.HOST = ''
      

  9. Configure your apache webserver. Add the following to your apache configuration (e.g. /etc/apache2/sites-enabled/WholeCellSimDB):

    WSGIDaemonProcess default processes=2 threads=25
    WSGIDaemonProcess wholecellsimdb:1 threads=1
    WSGIDaemonProcess wholecellsimdb:2 threads=1
    SetEnv PROCESS_GROUP default
    WSGIProcessGroup %{ENV:PROCESS_GROUP}
    WSGISocketPrefix /var/run/wsgi
    
    Alias /url/to/WholeCellSimDB/static /path/to/WholeCellSimDB/wcdbweb/static
    <Directory /path/to/WholeCellSimDB/wcdbweb/static>
        Order allow,deny
        Allow from all
    </Directory>
    
    WSGIScriptAlias /url/to/WholeCellSimDB /path/to/WholeCellSimDB/WholeCellSimDB/wsgi.py
    <Directory /path/to/WholeCellSimDB/WholeCellSimDB>
        WSGIApplicationGroup %{RESOURCE}
        WSGIRestrictProcess wholecellsimdb:1 wholecellsimdb:2
        SetEnv PROCESS_GROUP wholecellsimdb:1
        AddHandler wsgi-script .py
    
        Options ExecCGI
        Order allow,deny
        Allow from all
    </Directory>
    

  10. Restart your webserver (e.g. sudo /etc/init.d/apache2 restart)

  11. Create database tables

    cd /path/to/WholeCellSimDB
    python manage.py syncdb
    

  12. Build search indices and grant permissions to webserver daemon:

    python manage.py rebuild_index
    sudo chown -R :<apache-user> /path/to/WholeCellSimDB/wcdbsearch/indexes
    sudo chmod -R ug+rw /path/to/WholeCellSimDB/wcdbsearch/indexes
    

  13. Navigate to the WholeCellSimDB web frontend: http://domain/url/to/WholeCellSimDB

Storing simulations

Simulations can be imported into WholeCellSimDB via either the command line or programmatic (Python) interface. Simulations must be saved using the HDF5 and SED-ML formats described below.

This code snippet illustrates how to load simulations via the command line interface:

cd /path/to/WholeCellSimDB
python wcdbcli/save_simulation_batch.py -d /path/to/simulation-batch-1 -c /path/to/simulation-batch-1/changes.xml
python wcdbcli/save_simulation_batch.py -d /path/to/simulation-batch-2 -c /path/to/simulation-batch-2/changes.xml
python wcdbcli/save_simulation_batch.py -d /path/to/simulation-batch-3 -c /path/to/simulation-batch-3/changes.xml

This code snippet shows how to load simulations via the programmatic (Python) interface:

import os
import sys

# Add the project to your path.
sys.path.append('/path/to/WholeCellSimDB')

# Import the models
from wcdb import models

# Options
path_to_simulation_batch = '/path/to/simulation-batch'
n_simulations = 128

# Load the simulation batch
models.SimulationBatch.objects.create_simulation_batch(
    os.path.join(path_to_simulation_batch, '1.h5'),
    os.path.join(path_to_simulation_batch, 'changes.xml'))

# Load the individual simulations
for idx in range(1, n_simulations + 1):
    models.Simulation.objects.create_simulation(os.path.join(path_to_simulation_batch, '%d.h5' % idx))

Simulation data file format

WholeCellSimDB imports and exports simulation data using the following format.

First, simulations must be organized into batches. A batch is a set of simulations run using the same code, the same options, and the same parameter values; individual simulations within a batch should differ only in their random number generator seeds. Each batch must be stored as a folder containing one HDF5 file per simulation, numbered sequentially starting from 1, where each file contains the predicted phenotypes of that simulation.
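
For example, a small batch of three simulations saved to /path/to/simulation-batch would be laid out as follows, where changes.xml is the optional SED-ML file described below:

/path/to/simulation-batch/
    1.h5
    2.h5
    3.h5
    changes.xml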

Second, each simulation (including its metadata and predicted phenotypes) must be stored in HDF5 format using the following schema (red indicates HDF groups, blue indicates HDF datasets, cyan indicates HDF dataset values, green indicates HDF dataset attribute containers, grey indicates individual HDF dataset attributes). An example simulation is available here.

  • attrs:
    • batch__investigator__user__first_name: First name of researcher who ran simulation
    • batch__investigator__user__last_name: Last name of researcher who ran simulation
    • batch__investigator__user__email: Email address of researcher who ran simulation
    • batch__investigator__affiliation: Affiliation of researcher who ran simulation
    • batch__organism__name: Name of simulated organism
    • batch__organism_version: Version of simulated organism (e.g. revision of code)
    • batch__name: Name of simulation batch
    • batch__description: Description of simulation batch
    • batch__ip: IP address of machine which ran simulation
    • batch__date: Date when simulation was run (YYYY-MM-DD HH:MM:SS)
  • options: group containing datasets for each global option, plus states and processes sub-groups containing sub-sub-groups with datasets for each state and process option
    • global-option-1-name: dataset representing the first global option
      • value: Value of option
      • attrs
        • units: Units of option
    • global-option-2-name: dataset representing the second global option
      • value: Value of option
      • attrs
        • units: Units of option
    • states: group containing sub-groups containing datasets for each state option
      • state-1-name: group containing datasets for each state option
        • state-1-option-1: dataset representing the first state's first option
          • value: Value of option
          • attrs
            • units: Units of option
        • state-1-option-2: dataset representing the first state's second option
          • value: Value of option
          • attrs
            • units: Units of option
      • state-2-name: group containing datasets for each state option
        • state-2-option-1: dataset representing the second state's first option
          • value: Value of option
          • attrs
            • units: Units of option
        • state-2-option-2: dataset representing the second state's second option
          • value: Value of option
          • attrs
            • units: Units of option
    • processes: group containing sub-groups containing datasets for each process option
      • process-1-name: group containing datasets for each process option
        • process-1-option-1: dataset representing the first process' first option
          • value: Value of option
          • attrs
            • units: Units of option
        • process-1-option-2: dataset representing the first process' second option
          • value: Value of option
          • attrs
            • units: Units of option
      • process-2-name: group containing datasets for each process option
        • process-2-option-1: dataset representing the second process' first option
          • value: Value of option
          • attrs
            • units: Units of option
        • process-2-option-2: dataset representing the second process' second option
          • value: Value of option
          • attrs
            • units: Units of option
  • parameters: group containing datasets for each global parameter, plus states and processes sub-groups containing sub-sub-groups with datasets for each state and process parameter
    • global-parameter-1-name: dataset representing the first global parameter
      • value: Value of parameter
      • attrs
        • units: Units of parameter
    • global-parameter-2-name: dataset representing the second global parameter
      • value: Value of parameter
      • attrs
        • units: Units of parameter
    • states: group containing sub-groups containing datasets for each state parameter
      • state-1-name: group containing datasets for each state parameter
        • state-1-parameter-1: dataset representing the first state's first parameter
          • value: Value of parameter
          • attrs
            • units: Units of parameter
        • state-1-parameter-2: dataset representing the first state's second parameter
          • value: Value of parameter
          • attrs
            • units: Units of parameter
      • state-2-name: group containing datasets for each state parameter
        • state-2-parameter-1: dataset representing the second state's first parameter
          • value: Value of parameter
          • attrs
            • units: Units of parameter
        • state-2-parameter-2: dataset representing the second state's second parameter
          • value: Value of parameter
          • attrs
            • units: Units of parameter
    • processes: group containing sub-groups containing datasets for each process parameter
      • process-1-name: group containing datasets for each process parameter
        • process-1-parameter-1: dataset representing the first process' first parameter
          • value: Value of parameter
          • attrs
            • units: Units of parameter
        • process-1-parameter-2: dataset representing the first process' second parameter
          • value: Value of parameter
          • attrs
            • units: Units of parameter
      • process-2-name: group containing datasets for each process parameter
        • process-2-parameter-1: dataset representing the second process' first parameter
          • value: Value of parameter
          • attrs
            • units: Units of parameter
        • process-2-parameter-2: dataset representing the second process' second parameter
          • value: Value of parameter
          • attrs
            • units: Units of parameter
  • processes: group containing sub-groups for each process
    • process-1-name: group representing process 1
    • process-2-name: group representing process 2
  • states: group containing sub-groups for each state containing datasets for each predicted phenotype
    • state-1-name: group containing datasets for each predicted phenotype
      • property-1-name: group representing the first state's first predicted phenotype
        • data: dataset representing the phenotype's predicted value
          • value: NumPy ndarray containing phenotype's predicted value
        • units: dataset representing the phenotype's units
          • value: string indicating phenotype's units
        • labels: group containing a sub-group for each dimension's labels
          • 0
            • value: list of labels for first dimension
          • 1
            • value: list of labels for second dimension
      • property-2-name: group representing the first state's second predicted phenotype
        • data: dataset representing the phenotype's predicted value
          • value: NumPy ndarray containing phenotype's predicted value
        • units: dataset representing the phenotype's units
          • value: string indicating phenotype's units
        • labels: group containing a sub-group for each dimension's labels
          • 0
            • value: list of labels for first dimension
          • 1
            • value: list of labels for second dimension
    • state-2-name: group containing datasets for each predicted phenotype
      • property-1-name: group representing the second state's first predicted phenotype
        • data: dataset representing the phenotype's predicted value
          • value: NumPy ndarray containing phenotype's predicted value
        • units: dataset representing the phenotype's units
          • value: string indicating phenotype's units
        • labels: group containing a sub-group for each dimension's labels
          • 0
            • value: list of labels for first dimension
          • 1
            • value: list of labels for second dimension
      • property-2-name: group representing the second state's second predicted phenotype
        • data: dataset representing the phenotype's predicted value
          • value: NumPy ndarray containing phenotype's predicted value
        • units: dataset representing the phenotype's units
          • value: string indicating phenotype's units
        • labels: group containing a sub-group for each dimension's labels
          • 0
            • value: list of labels for first dimension
          • 1
            • value: list of labels for second dimension
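
The Python sketch below illustrates how a minimal simulation file following this schema could be written with h5py. The metadata values and the option, parameter, process, state, and property names are placeholders, the data array shape is arbitrary, and the exact representation of the dimension labels may differ from the example file; in practice these files are written by the simulation code itself.

import h5py
import numpy

sim = h5py.File('1.h5', 'w')

# Simulation metadata (placeholder values)
sim.attrs['batch__investigator__user__first_name'] = 'Jane'
sim.attrs['batch__investigator__user__last_name'] = 'Doe'
sim.attrs['batch__investigator__user__email'] = 'jane.doe@university.edu'
sim.attrs['batch__investigator__affiliation'] = 'University'
sim.attrs['batch__organism__name'] = 'Mycoplasma genitalium'
sim.attrs['batch__organism_version'] = '1'
sim.attrs['batch__name'] = 'Wild-type set #1'
sim.attrs['batch__description'] = 'Example simulation batch'
sim.attrs['batch__ip'] = '127.0.0.1'
sim.attrs['batch__date'] = '2014-01-01 00:00:00'

# Options: one global option stored as a dataset with a 'units' attribute,
# plus 'states' and 'processes' sub-groups (placeholder option name)
options = sim.create_group('options')
option = options.create_dataset('lengthSec', data=65000)
option.attrs['units'] = 's'
options.create_group('states')
options.create_group('processes')

# Parameters are organized the same way as options (placeholder parameter name)
parameters = sim.create_group('parameters')
parameter = parameters.create_dataset('stepSizeSec', data=1)
parameter.attrs['units'] = 's'
parameters.create_group('states')
parameters.create_group('processes')

# One group per modeled process (placeholder process name)
processes = sim.create_group('processes')
processes.create_group('Metabolism')

# One state with one predicted phenotype (placeholder state/property names)
states = sim.create_group('states')
prop = states.create_group('Mass').create_group('total')
prop.create_dataset('data', data=numpy.ones((1, 100)))                # predicted values (arbitrary shape)
prop.create_dataset('units', data='g')                                # units stored as a string dataset
labels = prop.create_group('labels')
labels.create_dataset('0', data=numpy.array(['total'], dtype='S32'))  # labels for the first dimension
labels.create_dataset('1', data=numpy.arange(100))                    # labels for the second dimension

sim.close()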

Optionally, changes to the default simulation options and parameters can be described using the SED-ML format described below.

Simulation change SED-ML format

Changes to the default simulation options and parameter values can be recorded by passing a SED-ML file to wcdbcli/save_simulation_batch.py. The XML snippet below illustrates how to use SED-ML to encode the modified option and parameter values.

<sedML
    xmlns="http://sed-ml.org/sed-ml/level1/version2" level="1" version="2"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://sed-ml.org/sed-ml-L1-V2.xsd"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    xmlns:math="http://www.w3.org/1998/Math/MathML"
    >   
    <listOfModels>
        <model ... >
            <listOfChanges>
                <!-- global options -->
                <changeAttribute target="option-name/@value[index]" newValue="new-value" />
                ...
                <!-- global parameters -->
                <changeAttribute target="parameter-name/@value[index]" newValue="new-value" />
                ...
                <!-- process options -->
                <changeAttribute target="processes.process-name.option-name/@value[index]" newValue="new-value" />
                ...
                <!-- process parameters -->
                <changeAttribute target="processes.process-name.parameter-name/@value[index]" newValue="new-value" />
                ...
                <!-- state options -->
                <changeAttribute target="processes.state-name.option-name/@value[index]" newValue="new-value" />
                ...
                <!-- state parameters -->
                <changeAttribute target="processes.state-name.parameter-name/@value[v]" newValue="new-value" />
                ...
            </listOfChanges>
        </model>
    </listOfModels>
</sedML>

Converting simulations from MAT-File to HDF5 format

Simulations can be converted from the original MAT-File format used by the M. genitalium whole-cell model (see User Guide section 2.2 and DiskLogger.m) using the following script:

cd /path/to/WholeCellSimDB

organism="Mycoplasma genitalium"
simulation_batch_name="Wild-type set #1"
path_to_simulation_batch=/path/to/simulation-batch
n_simulations=128

for i in $(seq 1 $n_simulations)
do
    python wcdbcli/convert_mat_to_hdf5.py \
        -o "${organism}" \
        -n "${simulation_batch_name}" \
        -d "${path_to_simulation_batch}/${i}" \
        -i ${i}
done

The following script can be used to convert simulations using a cluster. The Perl script uses the job script template wcdbcli/convert_mat_to_hdf5.sh.tmpl to submit jobs to a cluster scheduler. The job scripts in turn call wcdbcli/convert_mat_to_hdf5.py.

cd /path/to/WholeCellSimDB

organism="Mycoplasma genitalium"
simulation_batch_name="Wild-type set #1"
path_to_simulation_batch=/path/to/simulation-batch
n_simulations=128

for i in $(seq 1 $n_simulations)
do
    wcdbcli/convert_mat_to_hdf5.pl \
        -o "${organism}" \
        -n "${simulation_batch_name}" \
        -d "${path_to_simulation_batch}/${i}" \
        -i ${i}
done

Constructing advanced visualizations using the Python API

See the Python API gallery for examples of how to use the Python API to construct more advanced visualizations. Note: the Python API is not publicly accessible; researchers must install WholeCellSimDB on their own machines to use it.
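
Alternatively, because exported simulations follow the HDF5 schema described above, they can also be read directly with h5py. The sketch below (the file path, state, and property names are placeholders) reads one predicted phenotype from an exported simulation file and plots it with matplotlib:

import h5py
from matplotlib import pyplot

# Open an exported simulation file (placeholder path)
sim = h5py.File('/path/to/simulation-batch/1.h5', 'r')

# Read one predicted phenotype (placeholder state and property names)
prop = sim['states']['Mass']['total']
values = prop['data'][...]   # NumPy ndarray of predicted values
units = prop['units'][()]    # string indicating the units

# Plot the first row of the predicted time course
pyplot.plot(values[0, :])
pyplot.xlabel('Time')
pyplot.ylabel('Mass (%s)' % units)
pyplot.show()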

Need help?

Please contact the development team at wholecell@lists.stanford.edu.