
CDIbase Docs

Open source CDI management software for language acquisition research.

Technical stuff behind CDIbase

This documentation provides detailed guides for setting up a new copy of CDIbase, modifying the code behind the software, or doing something else rather technical. That being said, most researchers can simply use the less technical tutorials to accomplish their work.

CDIbase, Python software written with Flask and (by default) SQLite3, can go into almost any modern web application environment. The highly configurable solution also follows strict testing and documentation standards to help those who need to add new code. This project is currently in beta.

Environment setup and regulation (getting started)

CDIbase runs almost anywhere Python web applications go. However, deploying a web application requires some technical sophistication. Not comfortable setting up CDIbase yourself? Contact us and we may be able to help!

Anyway, the hardest parts of getting CDIbase set up are often regulatory, not technical. So, please check the security, privacy, ethical, and related standards of the appropriate regulatory bodies, as CDIbase cannot guarantee compliance with IRB and other institutional standards given the specifics of individual studies. Still, this guide can offer best practices and initial guidance based on lessons from past installations of CDIbase.

Data migration

CDIbase uses a SQLite database by default, though labs can use any DB API 2.0 compliant adapter, like psycopg for PostgreSQL.
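The DB API 2.0 interface is what makes the adapter swappable: code that sticks to connections, cursors, execute, and fetch calls does not care much which database sits underneath. Here is a minimal sketch of that idea using the sqlite3 module from the Python standard library. The table demo_words and the helpers count_rows and build_demo_connection are hypothetical names made up for illustration and are not part of CDIbase.

```python
import sqlite3


def count_rows(connection, table_name):
    """Count rows using only DB API 2.0 calls (cursor, execute, fetchone).

    table_name is interpolated directly into the query, so pass only
    trusted, hard-coded names.
    """
    cursor = connection.cursor()
    cursor.execute("SELECT COUNT(*) FROM " + table_name)
    (total,) = cursor.fetchone()
    cursor.close()
    return total


def build_demo_connection():
    """Create an in-memory SQLite database with a hypothetical table."""
    connection = sqlite3.connect(":memory:")
    cursor = connection.cursor()
    # demo_words is an illustrative table, not a CDIbase table.
    cursor.execute("CREATE TABLE demo_words (word TEXT, value INTEGER)")
    cursor.executemany(
        "INSERT INTO demo_words VALUES (?, ?)",
        [("dog", 1), ("cat", 0), ("ball", 1)],
    )
    connection.commit()
    return connection


demo_connection = build_demo_connection()
demo_total = count_rows(demo_connection, "demo_words")
```

One caveat when switching adapters: the placeholder style for query parameters is adapter specific (sqlite3 uses ?, psycopg uses %s), so parameterized queries may need adjusting.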

Need to migrate a small to medium sized dataset?
Just import your old dataset as CSV files!

Need to migrate lots of data?
The guide below is for you. Data migration will be easy for most labs but large datasets may involve some computer / data science, programming, database administration, and time on the terminal. Don't have access to that expertise? Contact us and we might be able to help.

Database schema / structure

Complete information about database tables and relationships. Note that data migration only needs to manually fill the snapshot_content and snapshots tables.

CDI formats

Users can define CDI formats through YAML (intro to YAML) specification files (example simple CDI specification, example of complex / multilingual CDI specification). Its components:

Presentation formats

Presentation formats provide alternate encoding schemes. Does some lab software use "boy" while the rest uses "male" for gender? CDIbase can not only store multiple types of CDIs across many studies but its presentation formats switch between encodings, allowing individual researchers to choose between values like "true" or "1" without duplicating data.

Examples of presentation format YAML files:

Meaning of common fields:

A presentation format then has mappings where keys (attributes) are word values from the CDI format to convert and the corresponding values indicate what should be reported in downloads for those word values.
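To make the mapping idea concrete, here is a minimal Python sketch of how such a lookup could behave. It is not CDIbase's actual implementation: the function apply_presentation_format, the example mapping, and the choice to pass unmapped values through unchanged are all assumptions made for illustration.

```python
def apply_presentation_format(word_values, mapping):
    """Translate stored word values into the values reported in downloads.

    Values without an entry in the mapping pass through unchanged
    (an assumption of this sketch, not confirmed CDIbase behavior).
    """
    return [mapping.get(value, value) for value in word_values]


# Hypothetical mapping: stored word value -> value reported in downloads.
example_mapping = {1: "t", 0: "f"}

stored_values = [1, 0, 1]
reported_values = apply_presentation_format(stored_values, example_mapping)
```

Because the translation happens when data is reported, the stored values stay the same and each researcher can pick a different mapping.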

Meaning of uncommon fields (mostly used for legacy data and values from before migration).

Percentile tables

Percentile tables allow CDIbase to automatically calculate the percentiles of participants given the number of words said / signed / known at the time of a CDI. Typically labs will have one percentile table per gender per CDI type. The first row has participant ages in months. The first column has percentiles. The remaining cells show the number of words known by the prototypical child at that age / percentile. The 0,0 cell is ignored (filled with a single % sign by convention). Please see an example of a percentile table. If your lab hopes to use something other than percentile tables, please contact us for help or send patches, which are very welcome.
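The table layout described above can be read with a few lines of Python. The sketch below is illustrative only: the numbers in EXAMPLE_TABLE are invented, the function names are hypothetical, and the lookup rule (report the highest percentile whose word count the child has reached, or 0 if none) is an assumption rather than a description of how CDIbase itself computes percentiles.

```python
import csv
import io

# Invented numbers for illustration; real tables come from published norms.
EXAMPLE_TABLE = """%,16,17,18
90,150,180,220
50,40,60,90
10,5,10,20
"""


def load_percentile_table(csv_text):
    """Parse a table into {(age_months, percentile): words_known}."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    # First row holds ages in months; the 0,0 cell ("%") is skipped.
    ages = [int(cell) for cell in rows[0][1:]]
    table = {}
    for row in rows[1:]:
        percentile = int(row[0])  # first column holds the percentile
        for age, cell in zip(ages, row[1:]):
            table[(age, percentile)] = int(cell)
    return table


def find_percentile(table, age_months, words_known):
    """Return the highest percentile whose word count is reached.

    Returns 0 when the child is below every listed percentile or the
    age is not in the table (assumed rule for this sketch).
    """
    candidates = [
        percentile
        for (age, percentile), threshold in table.items()
        if age == age_months and threshold <= words_known
    ]
    return max(candidates, default=0)


example_table = load_percentile_table(EXAMPLE_TABLE)
example_percentile = find_percentile(example_table, 17, 60)
```

A real implementation might interpolate between rows instead of stepping down to the nearest listed percentile.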

Using the API

The API is limited to sending CDIs to parents by email. However, we are more than happy to add additional API endpoints as needed. Just contact us and let us know what you want to see!

Send single CDI form to parent by email

POST /api/v0/send_parent_form?api_key={{key}}

All of the following additional query parameters are required:

At least one of the two is required:

Any of the following can be optionally specified:

If any of these fields are not provided, missing values will be loaded from the most recent MCDI available for the child or, if there are no prior MCDIs available within the DB, the parent will be asked for this information within the form GUI.
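As a rough sketch, the request could be prepared from Python with the standard library as shown below. Only the endpoint path and the api_key parameter come from this page. The base URL, the function name build_send_form_request, and the extra_params argument are hypothetical, and the names of the additional query parameters are deliberately left for the caller to supply.

```python
import urllib.parse
import urllib.request


def build_send_form_request(base_url, api_key, extra_params=None):
    """Prepare (but do not send) the POST request for one parent form.

    extra_params is a dict holding the additional query parameters the
    endpoint expects; their names are not filled in by this sketch.
    """
    params = {"api_key": api_key}
    params.update(extra_params or {})
    query = urllib.parse.urlencode(params)
    url = base_url.rstrip("/") + "/api/v0/send_parent_form?" + query
    return urllib.request.Request(url, method="POST")


# Hypothetical host and key, for illustration only.
example_request = build_send_form_request("https://cdi.example.edu/", "secret")

# Sending would then be: urllib.request.urlopen(example_request)
```

Building the query string with urlencode keeps values such as email addresses correctly escaped.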


Send a CDI form to many parents by email

POST /api/v0/send_parent_form?api_key={{key}}

All of the following additional query parameters are required:

At least one of the two is required:

Any of the following can be optionally specified:

If any of these fields are not provided, missing values will be loaded from the most recent MCDI available for the child or, if there are no prior MCDIs available within the DB, the parent will be asked for this information within the form GUI.

Local development / patching it yourself

Great! First, we suggest familiarity with the following:

See the project repository for more details on how to get started. Don't forget about our issue tracker. Pull requests welcome!

Understanding encoding values

Most labs can use the default settings for CDIbase and will not require non-standard encoding values because presentation formats can provide alternate encodings to support individual researcher preferences and legacy software. For example, a researcher's software relying on "t" for true and "f" for false can use presentation formats providing "t" for true and "f" for false instead of CDIbase's defaults of "0" and "1". However, some labs may need to change the encoding values stored in the underlying database. Reasonable use cases include

Those that do need non-standard encodings can edit /prog_code/util/constants.py to set the following values:

Creative Commons License Website / documentation licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.