Re-writing export_features_CTAP to more complete database structure? #7

Open
janbrogger opened this issue Oct 2, 2017 · 1 comment

Comments

@janbrogger
Contributor

I am wondering whether it might be helpful to rewrite export_features_CTAP. There is no table for segments or patients or studies, which means that there is a lot of duplication and the results table becomes one "megatable". What do you think?

Feature request to export_features_CTAP:

  1. Add table creation statements to the SQLite database for the new tables: patients, studies, segments, measurements.
  2. For each exported feature/value:
    2a. Have the export code check whether a matching patient, study or segment already exists, and create it if it does not. Store patientId, studyId and segmentId for use in the next step.
    2b. Create a new row in the measurements table with keys to patientId, studyId and segmentId.
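
To make the request concrete, here is a rough sketch of steps 1, 2a and 2b using Python's built-in sqlite3 module. Everything in it is an assumption for illustration: the column names, the uniqueness rules, and the helpers get_or_create and export_value are hypothetical and are not part of CTAP, which would need an equivalent written in Matlab.

```python
import sqlite3

# Step 1: hypothetical normalized schema (all table/column names are illustrative).
SCHEMA = """
CREATE TABLE IF NOT EXISTS patients (
    patientId INTEGER PRIMARY KEY,
    patientstr TEXT UNIQUE
);
CREATE TABLE IF NOT EXISTS studies (
    studyId INTEGER PRIMARY KEY,
    patientId INTEGER REFERENCES patients(patientId),
    studystr TEXT,
    UNIQUE (patientId, studystr)
);
CREATE TABLE IF NOT EXISTS segments (
    segmentId INTEGER PRIMARY KEY,
    studyId INTEGER REFERENCES studies(studyId),
    timestamp TEXT, duration REAL, latency REAL,
    UNIQUE (studyId, timestamp, duration, latency)
);
CREATE TABLE IF NOT EXISTS measurements (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    patientId INTEGER REFERENCES patients(patientId),
    studyId INTEGER REFERENCES studies(studyId),
    segmentId INTEGER REFERENCES segments(segmentId),
    channel TEXT, variable TEXT, value REAL
);
"""


def get_or_create(con, table, id_col, **fields):
    """Step 2a: return the id of the matching row, inserting the row if absent."""
    values = tuple(fields.values())
    # "IS" is SQLite's NULL-safe equality, so missing fields still match.
    where = " AND ".join(f"{name} IS ?" for name in fields)
    row = con.execute(
        f"SELECT {id_col} FROM {table} WHERE {where}", values
    ).fetchone()
    if row is not None:
        return row[0]
    cols = ", ".join(fields)
    marks = ", ".join("?" for _ in fields)
    cur = con.execute(f"INSERT INTO {table} ({cols}) VALUES ({marks})", values)
    return cur.lastrowid


def export_value(con, patient, study, segment, channel, variable, value):
    """Steps 2a + 2b: resolve the three ids, then store one measurement row."""
    timestamp, duration, latency = segment
    pid = get_or_create(con, "patients", "patientId", patientstr=patient)
    sid = get_or_create(con, "studies", "studyId", patientId=pid, studystr=study)
    gid = get_or_create(con, "segments", "segmentId", studyId=sid,
                        timestamp=timestamp, duration=duration, latency=latency)
    con.execute(
        "INSERT INTO measurements (patientId, studyId, segmentId, channel, variable, value) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (pid, sid, gid, channel, variable, value),
    )
    return pid, sid, gid


con = sqlite3.connect(":memory:")
con.executescript(SCHEMA)

# Made-up example data: two feature values from the same patient, study and segment.
seg = ("2017-10-02T10:00:00", 2.0, 0.0)
ids_a = export_value(con, "subj01", "sess1", seg, "Fz", "alpha_power", 1.5)
ids_b = export_value(con, "subj01", "sess1", seg, "Cz", "alpha_power", 1.7)
con.commit()
```

With this layout the patient, study and segment details are stored once, and each measurement row carries only the three ids plus channel, variable and value.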
@jutako
Collaborator

jutako commented Oct 12, 2017

The current implementation of database export has identical tables in ctap/master and ctap/dev. The ctap/dev branch produces a separate database file for each feature group, whereas ctap/master produces just a single file.

The structure is:
sqlite> .schema
CREATE TABLE subject (subjectnr INTEGER PRIMARY KEY, subjectstr TEXT, sex TEXT, age REAL);
CREATE TABLE measurement (measurement TEXT PRIMARY KEY, session TEXT, description TEXT);
CREATE TABLE results (id INTEGER PRIMARY KEY AUTOINCREMENT, subjectnr INTEGER, measurement TEXT, channel TEXT, variable TEXT, value REAL, timestamp TEXT, duration REAL, latency REAL);

This already contains tables for subjects, measurements and results (feature values). Linking these tables is left to the user.
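
As a rough illustration of what that user-side linking could look like, the sketch below rebuilds the three tables in an in-memory database using Python's sqlite3 module and joins them. It assumes that results.subjectnr matches subject.subjectnr and that results.measurement holds the same string as measurement.measurement; the inserted rows are made-up example data, not CTAP output.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Schema copied from the .schema output above.
con.executescript("""
CREATE TABLE subject (subjectnr INTEGER PRIMARY KEY, subjectstr TEXT, sex TEXT, age REAL);
CREATE TABLE measurement (measurement TEXT PRIMARY KEY, session TEXT, description TEXT);
CREATE TABLE results (id INTEGER PRIMARY KEY AUTOINCREMENT, subjectnr INTEGER, measurement TEXT, channel TEXT, variable TEXT, value REAL, timestamp TEXT, duration REAL, latency REAL);
""")

# Made-up rows for illustration only.
con.execute("INSERT INTO subject VALUES (1, 'subj01', 'F', 30.0)")
con.execute("INSERT INTO measurement VALUES ('subj01_sess1', 'sess1', 'example recording')")
con.execute(
    "INSERT INTO results (subjectnr, measurement, channel, variable, value, timestamp, duration, latency) "
    "VALUES (1, 'subj01_sess1', 'Fz', 'alpha_power', 1.5, '2017-10-02T10:00:00', 2.0, 0.0)"
)

# Link results to subject and measurement details (the join keys are assumptions).
rows = con.execute("""
    SELECT s.subjectstr, m.session, r.channel, r.variable, r.value
    FROM results AS r
    JOIN subject AS s ON s.subjectnr = r.subjectnr
    JOIN measurement AS m ON m.measurement = r.measurement
""").fetchall()
```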

To me, the main issues relate to the 'results' table:

  1. The results table does not contain a unique measurement id for each row, e.g., casename. The "session" information is also missing.
  2. The calculation segment is documented rather verbosely ("timestamp", "duration", "latency"): it might be possible to store just an id and keep the segments in a separate table. Each unique measurement needs its own segments, so linking requires both segment_id and casename.
  3. The results table is in long format, which is convenient to work with (e.g., in R) but wastes storage space.
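
For point 2, a minimal sketch of a separate segments table keyed by both casename and segment_id could look as follows (Python's sqlite3 module; the table layout and the example rows are assumptions for illustration, not the current CTAP export):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default

# Hypothetical layout: segment details are stored once per (casename, segment_id).
con.executescript("""
CREATE TABLE segments (
    casename TEXT,
    segment_id INTEGER,
    timestamp TEXT, duration REAL, latency REAL,
    PRIMARY KEY (casename, segment_id)
);
CREATE TABLE results (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    casename TEXT,
    segment_id INTEGER,
    channel TEXT, variable TEXT, value REAL,
    FOREIGN KEY (casename, segment_id) REFERENCES segments (casename, segment_id)
);
""")

# Made-up data: two measurements that both have a segment numbered 1.
con.executemany("INSERT INTO segments VALUES (?, ?, ?, ?, ?)", [
    ("caseA", 1, "2017-10-02T10:00:00", 2.0, 0.0),
    ("caseB", 1, "2017-10-03T09:30:00", 2.0, 0.0),
])
con.executemany(
    "INSERT INTO results (casename, segment_id, channel, variable, value) VALUES (?, ?, ?, ?, ?)",
    [
        ("caseA", 1, "Fz", "alpha_power", 1.5),
        ("caseA", 1, "Cz", "alpha_power", 1.7),
        ("caseB", 1, "Fz", "alpha_power", 0.9),
    ],
)

# Joining on both keys recovers the segment timing for each result row.
joined = con.execute("""
    SELECT r.casename, r.channel, r.value, g.timestamp
    FROM results AS r
    JOIN segments AS g
      ON g.casename = r.casename AND g.segment_id = r.segment_id
    ORDER BY r.id
""").fetchall()
```

Joining on segment_id alone would pair caseA results with caseB segments, which is why both keys are needed.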

Issue 1 needs to be resolved, but the other two seem more like fine-tuning to me.

Can anyone give examples of how large the SQLite files produced with the current setup are?

Note also that base Matlab lacks tools for working with databases. To do anything more elaborate, we would need decent database tooling.
