Help
User's guide
Browsing
WholeCellSimDB is very easy to use. Use the "Browse" menu at the top-left to browse the organisms and simulation batches available in WholeCellSimDB.
Searching
Three methods are available to search WholeCellSimDB:
- Basic full-text search: use the search box at the top-right of each page to search the organism, simulation batch, and investigator metadata.
- Google full-text search: use the search box at the top-right of each page to search the entire WholeCellSimDB site.
- Advanced structured search: use the "Search" menu at the top-left to access the advanced search form, which allows users to search simulation batches by metadata, option and parameter values, and by modeled processes and states.
Visualizing simulations
Three methods are available to visualize predicted phenotypes:
- The home page provides a graphical interface to browse all of the predicted phenotypes across all available simulations, and select up to ten to plot.
- The simulation pages (e.g. simulation #1) provide a graphical interface to browse all of the predicted phenotypes for a given simulation, and select up to ten to plot.
- The batch-state pages (e.g. Wild-type set #1 growth) plot the predicted phenotype across all simulations within a batch.
Exporting simulations
Three methods are available to export simulations from WholeCellSimDB:
- Users can use the download form to download simulations for selected simulation batches in HDF5 format.
- Users can use the HDF5 icon at the bottom-right of most pages to export the simulation data described on each page. This allows users to export entire organisms, simulation batches, and simulations. This also allows users to export states and properties across all simulations in WholeCellSimDB.
- Users can use the get_data_series web service to export the predicted phenotypes of particular rows/columns of particular states of particular simulations in HDF5, JSON, BSON, or MessagePack format (see the example after this list). The web service requires two query parameters:
- format: string indicating desired export format: "hdf5", "json", "bson", "msgpack", or "numl"
- data_series: JSON-formatted array indicating the desired simulations/states. Each element of the array must be an object containing four fields:
- simulation: integer indicating id of desired simulation
- property: integer indicating id of desired state property
- row: integer indicating id of desired state property row
- col: integer indicating id of desired state property column
Simulation, property, row, and column ids can be obtained using the list_data_series web service.
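For example, the following is a minimal sketch of calling the get_data_series web service with the Python requests package. The host, the endpoint path, and the specific ids are assumptions/placeholders; substitute the ROOT_URL of the WholeCellSimDB installation and ids obtained from the list_data_series web service.

# Minimal sketch of calling the get_data_series web service (assumed to be
# exposed at <ROOT_URL>/get_data_series); the ids below are placeholders
import json
import requests

base_url = 'http://domain/url/to/WholeCellSimDB'  # assumed ROOT_URL
data_series = [
    {'simulation': 1, 'property': 2, 'row': 3, 'col': 4},  # placeholder ids
]

response = requests.get(base_url + '/get_data_series', params={
    'format': 'json',
    'data_series': json.dumps(data_series),
})
response.raise_for_status()
print(response.json())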
See below for more information about the HDF5 schema used to export simulations.
Developer's guide
Installing your own WholeCellSimDB server and storing simulations
Follow the steps below to install WholeCellSimDB on your own server and save simulations to it. In addition, a fully configured version of WholeCellSimDB is available on the whole-cell virtual machine.
Obtain WholeCellSimDB from GitHub, and save to /path/to/WholeCellSimDB/.
Install the required software packages:
apache
gcc
libhdf5-dev
mod_wsgi
mysql
python 2.6
python-numpy
python-scipy
python-matplotlib
xapian
Install the required python packages:
bson
django
django-extensions
django-haystack
h5py
numpy
pytz
scipy
u-msgpack-python
Create a new database and a new user, and grant the new user privileges to the new database:
mysql >> create database wholecellsimdb;
mysql >> create user 'wholecellsimdb'@'localhost' identified by '9KNYdUnQaSFSUDCa';
mysql >> grant all privileges on wholecellsimdb.* to 'wholecellsimdb'@'localhost';
mysql >> flush privileges;
Create a directory to store the HDF5 simulation data, and grant permissions to the webserver daemon:
mkdir /path/to/WholeCellSimDB/wcdb/data
sudo chown :<apache-user> /path/to/WholeCellSimDB/wcdb/data
sudo chmod ug+rw /path/to/WholeCellSimDB/wcdb/data
Create a temporary directory, and grant permissions to the webserver daemon:
mkdir /path/to/WholeCellSimDB/tmp
sudo chown :<apache-user> /path/to/WholeCellSimDB/tmp
sudo chmod ug+rw /path/to/WholeCellSimDB/tmp
Create a directory to store error logs, and grant permissions to the webserver daemon:
mkdir /path/to/WholeCellSimDB/log
sudo chown :<apache-user> /path/to/WholeCellSimDB/log
sudo chmod ug+rw /path/to/WholeCellSimDB/log
Edit the settings file /path/to/WholeCellSimDB/WholeCellSimDB/settings.py
Edit the root URL settings: ROOT_URL = 'http://domain/url/to/WholeCellSimDB'
Edit the ALLOWED_HOSTS settings: ALLOWED_HOSTS = ['domain']
Edit the HDF5 data directory setting: HDF5_ROOT = '/path/to/WholeCellSimDB/wcdb/data'
Edit the temporary directory setting: TMP_DIR = '/path/to/WholeCellSimDB/tmp'
Edit the database settings. See the Django documentation for more information.
DATABASES.default.NAME = 'wholecellsimdb'
DATABASES.default.USER = 'wholecellsimdb'
DATABASES.default.PASSWORD = '9KNYdUnQaSFSUDCa'
DATABASES.default.HOST = ''
Configure your apache webserver. Add the following to your apache configuration (e.g. /etc/apache2/sites-enabled/WholeCellSimDB):
WSGIDaemonProcess default processes=2 threads=25
WSGIDaemonProcess wholecellsimdb:1 threads=1
WSGIDaemonProcess wholecellsimdb:2 threads=1
SetEnv PROCESS_GROUP default
WSGIProcessGroup %{ENV:PROCESS_GROUP}
WSGISocketPrefix /var/run/wsgi

Alias /url/to/WholeCellSimDB/static /path/to/WholeCellSimDB/wcdbweb/static
<Location "/path/to/WholeCellSimDB/wcdbweb/static">
    Order allow,deny
    Allow from all
</Location>

WSGIScriptAlias /url/to/WholeCellSimDB /path/to/WholeCellSimDB/WholeCellSimDB/wsgi.py
<Directory /path/to/WholeCellSimDB/WholeCellSimDB>
    WSGIApplicationGroup %{RESOURCE}
    WSGIRestrictProcess wholecellsimdb:1 wholecellsimdb:2
    SetEnv PROCESS_GROUP wholecellsimdb:1
    AddHandler wsgi-script .py
    Options ExecCGI
    Order allow,deny
    Allow from all
</Directory>
Restart your webserver (e.g. sudo /etc/init.d/apache2 restart)
Create the database tables:
cd /path/to/WholeCellSimDB
python manage.py syncdb
Build the search indices, and grant permissions to the webserver daemon:
python manage.py rebuild_index
sudo chown -R :<apache-user> /path/to/WholeCellSimDB/wcdbsearch/indexes
sudo chmod -R ug+rw /path/to/WholeCellSimDB/wcdbsearch/indexes
Navigate to the WholeCellSimDB web frontend: http://domain/url/to/WholeCellSimDB
Storing simulations
Simulations can be imported into WholeCellSimDB via either the command line or programmatic (Python) interface. Simulations must be saved using the HDF5 and SED-ML formats described below.
This code snippet illustrates how to load simulations via the command line interface:
cd /path/to/WholeCellSimDB
python wcdbcli/save_simulation_batch.py -d /path/to/simulation-batch-1 -c /path/to/simulation-batch-1/changes.xml
python wcdbcli/save_simulation_batch.py -d /path/to/simulation-batch-2 -c /path/to/simulation-batch-2/changes.xml
python wcdbcli/save_simulation_batch.py -d /path/to/simulation-batch-3 -c /path/to/simulation-batch-3/changes.xml
This code snippet shows how to load simulations via the programmatic (Python) interface:
import os
import sys

# Add the project to your path
sys.path.append('/path/to/WholeCellSimDB')

# Import the models
from wcdb import models

# Options
path_to_simulation_batch = '/path/to/simulation-batch'
n_simulations = 128

# Load the simulation batch
models.SimulationBatch.objects.create_simulation_batch(
    os.path.join(path_to_simulation_batch, '1.h5'),
    os.path.join(path_to_simulation_batch, 'changes.xml'))

# Load the individual simulations
for idx in range(1, n_simulations + 1):
    models.Simulation.objects.create_simulation(
        os.path.join(path_to_simulation_batch, '%d.h5' % idx))
Simulation data file format
WholeCellSimDB imports and exports simulation data using the following format.
First, simulations must be organized into batches. A batch is a set of simulations run using the same code, the same options, and the same parameter values; individual simulations within a batch should differ only in their random number generator seeds. Each batch must be stored as a folder containing one HDF5 file per simulation, numbered sequentially starting from 1, where each file contains the predicted phenotypes of that simulation.
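For example, a batch of 128 simulations might be organized as follows (the folder name is illustrative; the optional changes.xml file is described below):

simulation-batch-1/
    1.h5
    2.h5
    ...
    128.h5
    changes.xml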
Second, each simulation (including its metadata and predicted phenotypes) must be stored as an HDF5 file using the following schema of HDF5 groups, datasets, dataset values, attribute containers, and individual attributes. An example simulation is available here. A minimal sketch of writing such a file is shown after the schema.
- attrs:
- batch__investigator__user__first_name: First name of researcher who ran simulation
- batch__investigator__user__last_name: Last name of researcher who ran simulation
- batch__investigator__user__email: Email address of researcher who ran simulation
- batch__investigator__affiliation: Affiliation of researcher who ran simulation
- batch__organism__name: Name of simulated organism
- batch__organism_version: Version of simulated organism (e.g. revision of code)
- batch__name: Name of simulation batch
- batch__description: Description of simulation batch
- batch__ip: IP address of machine which ran simulation
- batch__date: Date when simulation was run (YYYY-MM-DD HH:MM:SS)
- options: group containing datasets for each global option, and sub-groups containing sub-sub-groups containing datasets for each state and process option
- global-option-1-name: dataset representing the first global option
- value: Value of option
- attrs
- units: Units of option
- global-option-2-name: dataset representing the second global option
- value: Value of option
- attrs
- units: Units of option
- …
- states: group containing sub-groups containing datasets for each state option
- state-1-name: group containing datasets for each state option
- state-1-option-1: dataset representing the first state's first option
- value: Value of option
- attrs
- units: Units of option
- state-1-option-2: dataset representing the first state's second option
- value: Value of option
- attrs
- units: Units of option
- …
- state-2-name: group containing datasets for each state option
- state-2-option-1: dataset representing the second state's first option
- value: Value of option
- attrs
- units: Units of option
- state-2-option-2: dataset representing the second state's second option
- value: Value of option
- attrs
- units: Units of option
- …
- …
- processes: group containing sub-groups containing datasets for each process option
- process-1-name: group containing datasets for each process option
- process-1-option-1: dataset representing the first process' first option
- value: Value of option
- attrs
- units: Units of option
- process-1-option-2: dataset representing the first process' second option
- value: Value of option
- attrs
- units: Units of option
- …
- process-2-name: group containing datasets for each process option
- process-2-option-1: dataset representing the second process' first option
- value: Value of option
- attrs
- units: Units of option
- process-2-option-2: dataset representing the second process' second option
- value: Value of option
- attrs
- units: Units of option
- …
- …
- parameters: group containing datasets for each global parameter, and sub-groups containing sub-sub-groups containing datasets for each state and process parameter
- global-parameter-1-name: dataset representing the first global parameter
- value: Value of parameter
- attrs
- units: Units of parameter
- global-parameter-2-name: dataset representing the second global parameter
- value: Value of parameter
- attrs
- units: Units of parameter
- …
- states: group containing sub-groups containing datasets for each state parameter
- state-1-name: group containing datasets for each state parameter
- state-1-parameter-1: dataset representing the first state's first parameter
- value: Value of parameter
- attrs
- units: Units of parameter
- state-1-parameter-2: dataset representing the first state's second parameter
- value: Value of parameter
- attrs
- units: Units of parameter
- …
- state-2-name: group containing datasets for each state parameter
- state-2-parameter-1: dataset representing the second state's first parameter
- value: Value of parameter
- attrs
- units: Units of parameter
- state-2-parameter-2: dataset representing the second state's second parameter
- value: Value of parameter
- attrs
- units: Units of parameter
- …
- …
- processes: group containing sub-groups containing datasets for each process parameter
- process-1-name: group containing datasets for each process parameter
- process-1-parameter-1: dataset representing the first process' first parameter
- value: Value of parameter
- attrs
- units: Units of parameter
- process-1-parameter-2: dataset representing the first process' second parameter
- value: Value of parameter
- attrs
- units: Units of parameter
- …
- process-2-name: group containing datasets for each process parameter
- process-2-parameter-1: dataset representing the second process' first parameter
- value: Value of parameter
- attrs
- units: Units of parameter
- process-2-parameter-2: dataset representing the second process' second parameter
- value: Value of parameter
- attrs
- units: Units of parameter
- …
- …
- processes: group containing sub-groups for each process
- process-1-name: group representing process 1
- process-2-name: group representing process 2
- …
- states: group containing sub-groups for each state containing datasets for each predicted phenotype
- state-1-name: group containing datasets for each predicted phenotype
- property-1-name: group representing the first state's first predicted phenotype
- data: dataset representing the phenotype's predicted value
- value: NumPy ndarray containing phenotype's predicted value
- units: dataset representing the phenotype's units
- value: string indicating phenotype's units
- labels: group containing a sub-group for each dimension's labels
- 0
- value: list of labels for first dimension
- 1
- value: list of labels for second dimension
- …
- property-2-name: group representing the first state's second predicted phenotype
- data: dataset representing the phenotype's predicted value
- value: NumPy ndarray containing phenotype's predicted value
- units: dataset representing the phenotype's units
- value: string indicating phenotype's units
- labels: group containing a sub-group for each dimension's labels
- 0
- value: list of labels for first dimension
- 1
- value: list of labels for second dimension
- …
- …
- state-2-name: group containing datasets for each predicted phenotype
- property-1-name: group representing the second state's first predicted phenotype
- data: dataset representing the phenotype's predicted value
- value: NumPy ndarray containing phenotype's predicted value
- units: dataset representing the phenotype's units
- value: string indicating phenotype's units
- labels: group containing a sub-group for each dimension's labels
- 0
- value: list of labels for first dimension
- 1
- value: list of labels for second dimension
- …
- property-2-name: group representing the second state's second predicted phenotype
- data: dataset representing the phenotype's predicted value
- value: NumPy ndarray containing phenotype's predicted value
- units: dataset representing the phenotype's units
- value: string indicating phenotype's units
- labels: group containing a sub-group for each dimension's labels
- 0
- value: list of labels for first dimension
- 1
- value: list of labels for second dimension
- …
- …
- …
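The snippet below is a minimal sketch, using h5py, of writing a simulation file that follows this schema. Only a few illustrative entries are included, and the metadata values and the option, state, and property names ('seed', 'Mass', 'total') are hypothetical placeholders.

# Minimal sketch of writing a simulation HDF5 file following the schema above;
# the metadata values and the option/state/property names are placeholders
import h5py
import numpy as np

with h5py.File('1.h5', 'w') as f:
    # simulation metadata stored as attributes of the root group
    f.attrs['batch__name'] = 'Wild-type set #1'
    f.attrs['batch__organism__name'] = 'Mycoplasma genitalium'
    f.attrs['batch__date'] = '2014-01-01 00:00:00'

    # a global option: a dataset whose value is the option value,
    # with a 'units' attribute
    seed = f.create_group('options').create_dataset('seed', data=1)
    seed.attrs['units'] = 'dimensionless'

    # a predicted phenotype: states/<state-name>/<property-name>
    prop = f.create_group('states/Mass/total')
    prop.create_dataset('data', data=np.zeros((1, 1, 100)))  # predicted values (assumed rows x cols x time layout)
    prop.create_dataset('units', data=b'g')                  # units string
    labels = prop.create_group('labels')
    labels.create_dataset('0', data=np.array([b'mass']))     # labels for the first dimension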
Optionally, changes to the default simulation options and parameters can be described using the SED-ML format described below.
Simulation change SED-ML format
Changes to the default simulation options and parameter values can be recorded by passing a SED-ML file to wcdbcli/save_simulation_batch.py. The XML snippet below illustrates how to use SED-ML to encode the modified option and parameter values.
<sedML xmlns="http://sed-ml.org/sed-ml/level1/version2" level="1" version="2"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://sed-ml.org/sed-ml-L1-V2.xsd"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    xmlns:math="http://www.w3.org/1998/Math/MathML">
  <listOfModels>
    <model ... >
      <listOfChanges>
        <!-- global options -->
        <changeAttribute target="option-name/@value[index]" newValue="new-value" />
        ...
        <!-- global parameters -->
        <changeAttribute target="parameter-name/@value[index]" newValue="new-value" />
        ...
        <!-- process options -->
        <changeAttribute target="processes.process-name.option-name/@value[index]" newValue="new-value" />
        ...
        <!-- process parameters -->
        <changeAttribute target="processes.process-name.parameter-name/@value[index]" newValue="new-value" />
        ...
        <!-- state options -->
        <changeAttribute target="states.state-name.option-name/@value[index]" newValue="new-value" />
        ...
        <!-- state parameters -->
        <changeAttribute target="states.state-name.parameter-name/@value[index]" newValue="new-value" />
        ...
      </listOfChanges>
    </model>
  </listOfModels>
</sedML>
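The following is a minimal sketch of generating such a change file with Python's standard xml.etree.ElementTree module; the option name ('seed'), index, and value are hypothetical placeholders.

# Minimal sketch of writing a SED-ML change file with xml.etree.ElementTree;
# the option name, index, and value are placeholders
import xml.etree.ElementTree as ET

SEDML_NS = 'http://sed-ml.org/sed-ml/level1/version2'
ET.register_namespace('', SEDML_NS)

sedml = ET.Element('{%s}sedML' % SEDML_NS, {'level': '1', 'version': '2'})
models = ET.SubElement(sedml, '{%s}listOfModels' % SEDML_NS)
model = ET.SubElement(models, '{%s}model' % SEDML_NS, {'id': 'model'})
changes = ET.SubElement(model, '{%s}listOfChanges' % SEDML_NS)

# set a global option named 'seed' (placeholder) to a new value
ET.SubElement(changes, '{%s}changeAttribute' % SEDML_NS, {
    'target': 'seed/@value[0]',
    'newValue': '100',
})

ET.ElementTree(sedml).write('changes.xml', xml_declaration=True, encoding='utf-8')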
Converting simulations from MAT-File to HDF5 format
Simulations can be converted from the original MAT-File format used by the M. genitalium whole-cell model (see User Guide section 2.2 and DiskLogger.m) using the following script:
cd /path/to/WholeCellSimDB

organism="Mycoplasma genitalium"
simulation_batch_name="Wild-type set #1"
path_to_simulation_batch=/path/to/simulation-batch
n_simulations=128

for i in $(seq 1 ${n_simulations})
do
    python wcdbcli/convert_mat_to_hdf5.py \
        -o "${organism}" \
        -n "${simulation_batch_name}" \
        -d ${path_to_simulation_batch}/${i} \
        -i ${i}
done
The following script can be used to convert simulations on a cluster. The Perl script wcdbcli/convert_mat_to_hdf5.pl uses the job script template wcdbcli/convert_mat_to_hdf5.sh.tmpl to submit jobs to a cluster scheduler. The job scripts in turn call wcdbcli/convert_mat_to_hdf5.py.
cd /path/to/WholeCellSimDB

organism="Mycoplasma genitalium"
simulation_batch_name="Wild-type set #1"
path_to_simulation_batch=/path/to/simulation-batch
n_simulations=128

for i in $(seq 1 ${n_simulations})
do
    wcdbcli/convert_mat_to_hdf5.pl \
        -o "${organism}" \
        -n "${simulation_batch_name}" \
        -d ${path_to_simulation_batch}/${i} \
        -i ${i}
done
Constructing advanced visualizations using the Python API
See the Python API gallery for examples of how to use the Python API to construct more advanced visualizations. Note: the Python API is not publicly accessible; researchers must install WholeCellSimDB on their own machines to use the Python API.
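As a starting point, the snippet below is a minimal sketch of a custom visualization built directly on an exported simulation HDF5 file, using h5py and matplotlib. The file path and the state and property names ('Mass', 'total') are hypothetical placeholders, and the property data is assumed to be stored as a rows x cols x time array.

# Minimal sketch of a custom visualization built on an exported HDF5 file;
# the file path and the state/property names are placeholders
import h5py
from matplotlib import pyplot

with h5py.File('/path/to/simulation-batch/1.h5', 'r') as f:
    prop = f['states/Mass/total']
    values = prop['data'][...]   # NumPy ndarray of predicted values
    units = prop['units'][()]    # units string
    if isinstance(units, bytes):
        units = units.decode()

# plot the first row/column of the property over time
pyplot.plot(values[0, 0, :])
pyplot.xlabel('Time point')
pyplot.ylabel('Predicted value (%s)' % units)
pyplot.show()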