This documentation provides detailed guides for setting up a new copy of CDIbase, modifying the code behind the software, or doing other rather technical tasks. That said, most researchers can simply use the less technical tutorials to accomplish their work.
CDIbase, Python software written with Flask and (by default) SQLite3, can run in almost any modern web application environment. The highly configurable software also follows strict testing and documentation standards to help those who need to add new code. This project is currently in beta.
CDIbase runs almost anywhere Python web applications go. However, deploying a web application requires some technical sophistication. Not comfortable setting up CDIbase yourself? Contact us and we may be able to help!
In practice, the hardest parts of getting CDIbase set up are often regulatory, not technical. So, please check the security, privacy, ethical, and related standards of the appropriate regulatory bodies, as CDIbase cannot guarantee compliance with IRB and other institutional standards given the specifics of individual studies. Still, this guide offers best practices and initial guidance drawn from past installations of CDIbase.
While many institutions will not provide guidance or policies for most or all of these points, this guide suggests checking if your research institution has:
Even if using a cloud service, download a recent copy of Python and pip. It makes the rest of this easier.
Due to regulatory, privacy, and ethical obligations, many institutions and regulatory bodies require self-hosting of research data. However, for lucky labs whose policies allow for safe use of cloud services, this guide advises against self-hosting in favor of the increased security, redundancy, convenience, and cost-effectiveness of cloud hosting. For those electing to use cloud services:
Note that the application does need to accept file uploads. If you are looking to use S3 or blobstore, let us know and we can provide an adapter. We hope to have that adapter rolled into the regular CDIbase release in early 2015.
Previous successful deployments have favored the nginx web server on a *nix system (just use Mac or Linux) with Gunicorn and Supervisor.
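As a starting point for the Gunicorn piece of that stack, Gunicorn reads its configuration from a plain Python file. The sketch below is not an official CDIbase file: the port, timeout, and the `your_module:app` entry point are assumptions to adapt, and nginx would proxy public traffic to the local address it binds.

```python
# gunicorn.conf.py: a minimal sketch, not an official CDIbase file.
# Hypothetical launch command (replace your_module:app with CDIbase's app):
#   gunicorn -c gunicorn.conf.py your_module:app
import multiprocessing

# Listen on localhost only; nginx accepts public traffic and proxies here.
bind = "127.0.0.1:8000"

# Starting point suggested by the Gunicorn docs: (2 x CPU cores) + 1 workers.
workers = multiprocessing.cpu_count() * 2 + 1

# "-" sends logs to stdout/stderr, where Supervisor can capture them.
accesslog = "-"
errorlog = "-"

# Allow slow requests, such as file uploads, up to 60 seconds.
timeout = 60
```

Supervisor would then run that `gunicorn -c gunicorn.conf.py ...` command and restart it if it exits.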
Note that the application does need to accept file uploads containing non-sensitive configuration data. Please be sure to set the permissions on the uploads directory accordingly, or let us know if you would like to target S3 or blobstore and we can provide an adapter. We hope to have that adapter rolled into the regular CDIbase release in early 2015.
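One way to set those permissions is sketched below: it creates the uploads directory and restricts it to the user that owns it. The path is hypothetical, and the script should run as the same user that runs the application (for example, the Gunicorn user) so the app can still read and write uploaded files.

```python
import os
import stat
from pathlib import Path


def prepare_upload_dir(path):
    """Create the uploads directory and restrict it to its owner.

    Run as the user that runs the application so it keeps access.
    """
    upload_dir = Path(path)
    upload_dir.mkdir(parents=True, exist_ok=True)
    # mkdir's mode argument is filtered by the umask, so set permissions
    # explicitly: rwx for the owner, nothing for group or others (0o700).
    os.chmod(upload_dir, stat.S_IRWXU)
    return upload_dir
```

For example, `prepare_upload_dir("/srv/cdibase/uploads")`, where `/srv/cdibase/uploads` stands in for wherever your deployment stores uploads.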
Some institutions and regulatory bodies may require encrypting data in motion (data moving between the server and users' computers). Regardless, as an ethical and security obligation, this guide very strongly suggests installing SSL certificates and encrypting data moving between server and users using TLS. If labs have encryption requirements for data in motion, this will likely satisfy them. However, this guide / project cannot guarantee compliance.
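TLS itself is usually configured in nginx, but it can help for the application to refuse plain HTTP as well. The sketch below is generic WSGI middleware, not part of CDIbase: it assumes nginx terminates TLS and sets an `X-Forwarded-Proto` header, redirects any non-HTTPS request, and adds a Strict-Transport-Security header to HTTPS responses. The class name and the one-year HSTS default are illustrative choices.

```python
from urllib.parse import quote


class RequireHTTPS:
    """WSGI middleware: redirect plain HTTP to HTTPS and add an HSTS header.

    Assumes nginx terminates TLS and sets X-Forwarded-Proto; only trust that
    header when the app is reachable solely through the proxy.
    """

    def __init__(self, app, hsts_max_age=31536000):
        self.app = app
        self.hsts = f"max-age={hsts_max_age}; includeSubDomains"

    def __call__(self, environ, start_response):
        scheme = environ.get(
            "HTTP_X_FORWARDED_PROTO", environ.get("wsgi.url_scheme", "http")
        )
        if scheme != "https":
            # Rebuild the requested URL with an https:// scheme.
            host = environ.get("HTTP_HOST") or environ.get("SERVER_NAME", "")
            path = quote(environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", ""))
            query = environ.get("QUERY_STRING", "")
            location = f"https://{host}{path or '/'}" + (f"?{query}" if query else "")
            start_response(
                "301 Moved Permanently",
                [("Location", location), ("Content-Length", "0")],
            )
            return [b""]

        def start_response_with_hsts(status, headers, exc_info=None):
            # Tell browsers to use HTTPS for this host on future visits.
            headers = list(headers) + [("Strict-Transport-Security", self.hsts)]
            return start_response(status, headers, exc_info)

        return self.app(environ, start_response_with_hsts)
```

With a Flask application object named `app`, the usual way to wrap it is `app.wsgi_app = RequireHTTPS(app.wsgi_app)`.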
Some institutions and regulatory bodies may require encrypting data at rest (data stored on the server or cached on the client). Labs may need additional security measures depending on the specifics of the corresponding policies, but this guide suggests SQLCipher as a possible transparent method for encrypting data server-side. If using SQLite, consider the DB API 2.0-compliant pysqlcipher. Regardless, labs can integrate pysqlcipher or alternative DB API 2.0-compliant database drivers by modifying the DB adapter in prog_code/util/db_util.py.
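The contents of prog_code/util/db_util.py are not shown here, so the following is only a hypothetical sketch of the kind of connection factory such an adapter could expose. Without a key it uses the standard-library `sqlite3` driver; with a key it assumes the third-party `pysqlcipher3` package (a Python 3 SQLCipher binding) and unlocks the database with SQLCipher's `PRAGMA key`. The function name `connect` is illustrative.

```python
import sqlite3


def connect(db_path, key=None):
    """Open a DB API 2.0 connection to the CDIbase database (hypothetical).

    Without a key this uses the standard-library sqlite3 driver. With a key
    it assumes the third-party pysqlcipher3 package is installed; swap in
    whichever SQLCipher binding your lab uses.
    """
    if key is None:
        return sqlite3.connect(db_path)

    from pysqlcipher3 import dbapi2 as sqlcipher  # optional dependency

    conn = sqlcipher.connect(db_path)
    # PRAGMA statements cannot use bound parameters, so escape single quotes.
    escaped_key = key.replace("'", "''")
    conn.cursor().execute(f"PRAGMA key = '{escaped_key}'")
    return conn
```

Because both branches return DB API 2.0 connections, code that only uses cursors, `execute`, and `commit` should not need to know which driver is underneath. Keep the key itself out of source control (for example, in an environment variable).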
Privacy and security standards exist to protect institutions, researchers, and study participants. These important standards and regulations take time to manage but, historically, CDIbase deployments have made for excellent case studies of labs increasing productivity / efficiency and decreasing costs after adoption. So, in our admittedly biased opinion, the benefits justify overcoming those upfront regulatory hurdles. :) Again, in case of trouble, contact us and we might be able to help.
With that, now for our obligatory disclaimer... This guide and CDIbase's larger community cannot provide legal advice and labs must ultimately make their own technical decisions for which this community / guide / project cannot be held responsible. As modern application hosting and research standards just involve too many variables, this purely informational documentation serves only as an initial reference guide for the community. Moreover, in linking to multiple external resources, this guide / community / project cannot guarantee or be held responsible for that external content in its current or future iterations. Finally, this guide emphasizes that individual labs must decide hosting strategies for themselves within their regulatory obligations and this guide only offers initial direction.
CDIbase uses a SQLite database by default, though labs can use any DB API 2.0-compliant driver, like psycopg for PostgreSQL.
Familiarity with Python or a similar language can obviously help clean up and transform data. However, we also suggest...
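One practical difference when switching drivers: PEP 249 (DB API 2.0) lets each driver choose its placeholder syntax, exposed as the module attribute `paramstyle`. `sqlite3` uses `qmark` (`?`), while psycopg2 uses `pyformat` (`%s`). The helper functions below are hypothetical, not CDIbase code; they show one way to build queries that work with either.

```python
import sqlite3


def placeholders(driver, count):
    """Build a positional placeholder list for a DB API 2.0 driver module.

    sqlite3 reports "qmark" (?); psycopg2 reports "pyformat" (accepts %s).
    """
    style = driver.paramstyle
    if style == "qmark":
        marks = ["?"] * count
    elif style in ("format", "pyformat"):
        marks = ["%s"] * count
    elif style == "numeric":
        marks = [f":{i}" for i in range(1, count + 1)]
    else:
        raise ValueError(f"unsupported paramstyle: {style}")
    return ", ".join(marks)


def build_insert(driver, table, columns):
    """Return an INSERT statement using the driver's placeholder style.

    table and columns must be trusted names, never user input.
    """
    return (
        f"INSERT INTO {table} ({', '.join(columns)}) "
        f"VALUES ({placeholders(driver, len(columns))})"
    )
```

For example, `build_insert(sqlite3, "t", ["a", "b"])` yields `INSERT INTO t (a, b) VALUES (?, ?)`; passing the `psycopg2` module instead would yield `%s` placeholders.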
Disclaimer: This project and its developers are not responsible for these third party tools.
If you have not done so already, create a new database for CDIbase. Using the default engine (SQLite) but haven't created the database yet?
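With SQLite, connecting to a file that does not exist creates it, so creating the database amounts to running the schema script against a new file. The sketch below uses only the standard library; the schema path is a hypothetical placeholder for the schema file that ships with CDIbase. From a shell, `sqlite3 cdibase.db < schema.sql` does the same thing.

```python
import sqlite3
from pathlib import Path


def create_database(db_path, schema_path):
    """Create a SQLite database file (if missing) and run a schema script."""
    schema_sql = Path(schema_path).read_text(encoding="utf-8")
    conn = sqlite3.connect(db_path)  # creates the file if it does not exist
    try:
        # executescript runs every statement in the file, in order.
        conn.executescript(schema_sql)
        conn.commit()
    finally:
        conn.close()
```

For example, `create_database("cdibase.db", "path/to/schema.sql")`.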
In addition to whether a word was said / signed / known (depending on study) or not, each CDI response should have the following:
By default, CDIbase uses the following values for CDI responses. Remember, individual researchers can use presentation formats to convert these encodings. For example, CDIbase stores true as 1, but a researcher could ask CDIbase to generate a CSV with "t" for true instead.
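The idea behind that conversion can be sketched in a few lines: the stored value stays 1, and only the text written to the download changes. This is an illustration rather than CDIbase's export code; the column names and the mapping dictionary are hypothetical, and only the listed columns are re-encoded so that IDs and counts are left alone.

```python
import csv
import io

# Hypothetical presentation mapping: stored CDIbase value -> reported value.
TRUE_FALSE_AS_LETTERS = {1: "t", 0: "f"}


def export_csv(rows, fieldnames, mapping, encoded_fields):
    """Write rows (dicts) as CSV, re-encoding only the listed fields.

    The database keeps its stored values; only the download text changes.
    """
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        writer.writerow({
            name: mapping.get(value, value) if name in encoded_fields else value
            for name, value in row.items()
        })
    return out.getvalue()
```

For example, a row `{"child_id": 1, "word": "ball", "said": 1}` exported with `encoded_fields={"said"}` becomes `1,ball,t`, while `child_id` stays `1`.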
Most labs will not need to use any additional or alternate encodings but, for integrating with other software, the other available encoding values are listed in the Understanding encoding values section.
Don't want to use 1 for male, 2 for female, and 3 for other gender? CDIbase can do that.
In transitioning to CDIbase, a very small number of labs may have a more complex data migration process and will want CDIbase to use non-standard values for encoding said, not said, true, false, male, female, etc. Unless labs know they require non-standard encoding values (because of a shared SQL database, dataset inconsistencies, etc.), this guide advises against configuring them. That said, labs electing for non-standard encoding values should read the Understanding encoding values section. Still, remember that individual researchers can use presentation formats to convert these encodings. So, even with the default CDIbase encodings where true needs to be 1, a researcher could still ask CDIbase to generate a CSV with "t" for true instead to suit their own software or preferences.
After transforming existing lab data to use the above values or the custom-configured non-standard encodings, either write a script to read old lab data into the new database or, if using a SQLite database without programming experience, prepare the lab data as a series of CSV files and use SQLite's built-in CSV import functionality. Refer to the database schema below for additional guidance.
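For labs comfortable with a little Python, the same CSV import can be scripted with the standard library instead of the sqlite3 shell's `.import` command. The sketch below assumes each CSV file has a header row whose names match the columns of an existing table; the function names are hypothetical and the table must already exist in the CDIbase database.

```python
import csv
import sqlite3


def quote_identifier(name):
    """Quote a table/column name for SQLite (identifiers can't be bound)."""
    return '"' + name.replace('"', '""') + '"'


def import_csv(conn, table, csv_path):
    """Insert every row of a headered CSV file into an existing table.

    The CSV header must match the table's column names.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        columns = ", ".join(quote_identifier(c) for c in header)
        marks = ", ".join(["?"] * len(header))
        sql = f"INSERT INTO {quote_identifier(table)} ({columns}) VALUES ({marks})"
        with conn:  # commits on success, rolls back if any row fails
            conn.executemany(sql, reader)
```

CSV values arrive as text; SQLite's column type affinity converts values such as `"42"` to integers in `INTEGER` columns.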
Enter the CDIs in use by your lab. See the non-technical tutorials for more information.
Enter at least a "standard" presentation format. See the non-technical tutorials for more information.
Complete information about database tables and relationships. Note that data migration only needs to manually fill the snapshot_content and snapshots tables.
To disambiguate some concepts, the database uses the following vocabulary:
Information about a snapshot (completed CDI). Columns include:
Information about a response to an individual word / item in a snapshot. If a CDI form contains word1 and word2, a snapshot for that CDI type will include 1 snapshot row and 2 snapshot_content rows. A many-to-one relationship exists between snapshot_content and snapshots.
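The word1 / word2 example above can be made concrete with an in-memory SQLite database. The columns here are simplified and hypothetical (the real snapshots and snapshot_content tables have more columns); the point is only the many-to-one link from snapshot_content back to snapshots.

```python
import sqlite3

# Simplified, hypothetical columns; see the column lists for the real schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE snapshots (
        id INTEGER PRIMARY KEY,
        child_id INTEGER,
        cdi_type TEXT
    );
    CREATE TABLE snapshot_content (
        snapshot_id INTEGER REFERENCES snapshots (id),
        word TEXT,
        value INTEGER
    );
""")

# One completed CDI (snapshot) for a form containing word1 and word2...
cursor = conn.execute(
    "INSERT INTO snapshots (child_id, cdi_type) VALUES (?, ?)", (42, "example_cdi")
)
snapshot_id = cursor.lastrowid

# ...has one snapshot_content row per word, each pointing back to the snapshot.
conn.executemany(
    "INSERT INTO snapshot_content (snapshot_id, word, value) VALUES (?, ?, ?)",
    [(snapshot_id, "word1", 1), (snapshot_id, "word2", 0)],
)

rows = conn.execute("""
    SELECT s.id, COUNT(c.word)
    FROM snapshots AS s
    JOIN snapshot_content AS c ON c.snapshot_id = s.id
    GROUP BY s.id
""").fetchall()
print(rows)  # [(1, 2)]
```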
(This table does not need to be filled during data migration.) The application provides a programmatically accessible interface for outside software to automate tasks in CDIbase. To use the API, users must have an API key. Each of this table's rows contains information about a user's API key. Columns:
(This table does not need to be filled manually during data migration.) This table contains all of the types of CDIs available to the application. This includes all the CDIs that parents can complete through CDIbase and all of the CDIs for which responses are available through the application. Each of this table's rows contains information about a single CDI type. Columns:
(This table does not need to be filled during data migration.) CDIbase can send CDI forms to parents and older participants to complete online. This table holds information about each form waiting to be completed by a parent / participant. Note that CDIbase removes records from this table as parents / participants complete their electronic CDIs. Columns:
(This table does not need to be filled manually during data migration.) This table contains information for all of the CDI percentiles, allowing CDIbase to automatically calculate percentiles for snapshots given the snapshot CDI, the participant's age, the gender of the participant, and the number of words reported as spoken / signed / known. Columns:
(This table does not need to be filled manually during data migration.) Presentation formats allow CDIbase to report the same data with different encodings for common values like said, not said, true, false, male, female, etc. These formats help support legacy software and applications / code in use by individual researchers without needing to keep multiple copies of the CDI database. Columns:
Users can define CDI formats through YAML (intro to YAML) specification files (example simple CDI specification, example of complex / multilingual CDI specification). Its components:
Presentation formats provide alternate encoding schemes. Does some lab software use "boy" while the rest uses "male" for gender? CDIbase can not only store multiple types of CDIs across many studies but its presentation formats switch between encodings, allowing individual researchers to choose between values like "true" or "1" without duplicating data.
Examples of presentation format YAML files:
Meaning of common fields:
A presentation format then has mappings where keys (attributes) are word values from the CDI format to convert and the corresponding values indicate what should be reported in downloads for those word values.
Meaning of uncommon fields (mostly used for legacy data and values from before migration).
The API is limited to sending CDIs to parents by email. However, we are more than happy to add additional API endpoints as needed. Just contact us and let us know what you want to see!
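A request to the endpoints below might be built with the standard library as sketched here. The host is hypothetical, and because the endpoint's required and optional query parameters are documented separately, the sketch takes them as a plain dictionary rather than guessing their names. It builds the request without sending it.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_send_parent_form_request(base_url, api_key, params):
    """Build, but do not send, a POST to /api/v0/send_parent_form.

    params: dict of the additional query parameters documented for the
    endpoint; this sketch does not know or validate their names.
    """
    query = urlencode({"api_key": api_key, **params})
    url = f"{base_url.rstrip('/')}/api/v0/send_parent_form?{query}"
    return Request(url, method="POST")
```

To send it, pass the request to `urllib.request.urlopen`, which raises `urllib.error.HTTPError` for error status codes. Since the API key travels in the query string, only call the API over HTTPS.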
Send single CDI form to parent by email
POST /api/v0/send_parent_form?api_key={{key}}
All of the following additional query parameters are required:
At least one of the two is required:
Any of the following can be optionally specified:
Send a CDI form to many parents by email
POST /api/v0/send_parent_form?api_key={{key}}
All of the following additional query parameters are required:
At least one of the two is required:
Any of the following can be optionally specified:
Great! First, we suggest familiarity with the following:
See the project repository for more details on how to get started. Don't forget about our issue tracker. Pull requests welcome!
Most labs can use the default settings for CDIbase and will not require non-standard encoding values because presentation formats can provide alternate encodings to support individual researcher preferences and legacy software. For example, a researcher's software relying on "t" for true and "f" for false can use a presentation format providing "t" for true and "f" for false instead of CDIbase's defaults of "1" and "0". However, some labs may need to change the encoding values stored in the underlying database. Reasonable use cases include:
Those that do need non-standard encodings can edit /prog_code/util/constants.py to set the following values:
Website / documentation licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.