Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add script to extract and transform uploaded Timelapse data #71

Open
rudokemper opened this issue Jan 30, 2025 · 3 comments
Open

Add script to extract and transform uploaded Timelapse data #71

rudokemper opened this issue Jan 30, 2025 · 3 comments
Assignees
Labels
feature New specs for new behavior

Comments

@rudokemper
Copy link
Member

Feature Request

Add a script that takes a path for Timelapse data (in ZIP file format), transforms the data to write to a database, and copies attachments to the datalake.

The script should have an optional "delete?" boolean, similar to the Locus Map script.

One key question is what kind of tabular data we want to ingest: a CSV export, or the raw contents of the Timelapse SQLite database(s).

Timelapse data structure

Note

Sample camera trap assets for a CMI partner (requires authorization to access) here: https://drive.google.com/drive/u/2/folders/1dWk2g3tK5tE5jM7hUfPuLBH7YHdtlHuy

In Timelapse, the .tdb database stores the project template and metadata, and the .ddb database stores the annotations as applied to camera trap media. They are both SQLite dbs.

Lastly, it is possible to export Timelapse annotation data as a CSV.

Let's dig into all three.

.tdb

Within the .tdb file, there are three tables: TemplateInfo, TemplateTable, and sqlite_sequence. Only the TemplateTable is of interest for ingestion; the other two contain non-relevant file metadata.

TemplateTable

The columns in the TemplateTable are standardized:

  • Id (int): primary key
  • ControlOrder (int): not 100% sure but I'm guessing the order in which the fields appear in Timelapse.
  • SpreadsheetOrder (int): not 100% sure but I'm guessing the order in which the fields appear in a CSV export.
  • Type (text): the type of template field e.g. File, RelativePath, DateTime, DeleteFlag, FixedChoice, Counter, Note, Flag
  • DefaultValue (text): default value for the field
  • Label (text): the user-facing label e.g. "Tënon_Soort" (type of animal in Wayana)
  • DataLabel (text): the system name for the field e.g. "Tenon_Soort"
  • Tooltip (text): user-facing tip about the data field e.g. "welke soort dier/tenon mekpale"
  • TXTBOXWIDTH: the width of the data field in Timelapse
  • Copyable (text): true/false value if the field can be copied or not.
  • Visible (text): true/false value if the field is hidden or not.
  • List (text): If it's a FixedChoice field, then the list of choices will be stored here in the following way:
    {"IncludeEmptyChoice":true,"ChoiceListNonEmpty": ["value1", "value2", ...]}
    
  • ExportToCSV (text): true/false value if the field will be included in a CSV export

.ddb

Within the .ddb file, there are five tables (I'm not sure if these are standard or specific to the CMI partner asset): DataTable, ImageSetTable, MarkersTable, TemplateTable, and sqlite_sequence.

TemplateTable

This table is actually a duplicate of the homonymous table in the .tdb! So we may not need to work with the .tdb database at all.

ImageSetTable

This table seems to store metadata about the root folder of the project, and the number of media files annotated:

  • Id (int): primary key
  • RootFolder (text): the name of the folder where the template and media files are stored. For example, this might be "Recordings per month"
  • Log (text): unsure, it's empty in the partner asset
  • Row (text): number value, but it doesn't seem to match the number of rows in DataTable. So not sure what this refers to.
  • VersionCompatibily (text) (not a typo): the version of Timelapse used, for example, "2.3.1.0"
  • SortTerms (text): an array composed of terms used... for sorting.
    [
     {
       "DataLabel": "RelativePath",
       "DisplayLabel": "RelativePath",
       "ControlType": "RelativePath",
       "IsAscending": "true"
     },
     {
       "DataLabel": "DateTime",
       "DisplayLabel": "DateTime",
       "ControlType": "DateTime",
       "IsAscending": "true"
     }
    ]
    
  • SearchTerms (text): object composed of a SearchTerms array specifying search capabilities for each term (I provide just one example below), and then a bunch of key/value pairs of which I'm not sure what they do within Timelapse, or if they are useful to store. @abfleishman might know.
    {
      "SearchTerms": [
        ...,
        {
          "ControlType": "Counter",
          "DatabaseValue": "0",
          "DataLabel": "Tapsik_aantal",
          "Label": "Tapsik_aantal",
          "List": [],
          "Operator": ">",
          "UseForSearching": false
        },
      ],
      "RandomSample": 0,
      "UseTimeInsteadOfDate": false,
      "TermCombiningOperator": 0,
      "ShowMissingDetections": false,
      "DetectionSelections": {
        "Enabled": false,
        "UseRecognition": false,
        "InterpretAllDetectionsAsEmpty": false,
        "AllDetections": false,
        "RankByConfidence": false,
        "RecognitionType": 2,
        "DetectionCategory": "1",
        "ClassificationCategory": "1",
        "ConfidenceThreshold1ForUI": 0.800000011920929,
        "ConfidenceThreshold2ForUI": 1.0,
        "ConfidenceThresholdForSelect": {
          "Item1": 0.800000011920929,
          "Item2": 1.0
        },
        "CurrentDetectionThreshold": 0.800000011920929,
        "CurrentClassificationThreshold": 0.75
      },
      "EpisodeShowAllIfAnyMatch": false,
      "EpisodeNoteField": "",
      "DuplicatesRelativePathAndFileTuple": null,
      "GetRelativePathFolder": "02-24"
    }
    

MarkersTable

Not sure if this table has any value for ingestion, or what it does for that matter. For the partner asset, there are two columns, and empty rows:

  • id (integer): primary key
  • Tapsik_aantal (text); this is for a Counter field type, so perhaps this table is added by default for counter fields, and the user did not leverage whatever functionality it offers in Timelapse.

DataTable

This table stores all of the annotations.

There is an id (integer) primary key column.

The rest of the columns are the values of the DataLabel column for each row in TemplateTable.

For example, the linked asset has the following rows. (The third column here represents the Timelapse field type in TemplateTable

Column Description Type
id (integer) primary key
File (text) filename File
RelativePath (text) the relative path within the project root directory RelativePath
DateTime (datetime) timestamp of the media file DateTime
DeleteFlag (text) true/false boolean DeleteFlag
Tenon_Soort (text) type of species FixedChoice
Tapsik_aantal (text) number of animals observed Counter
Takahe_eitop (text) unsure, but it's a text / note field Note
Ipokan_foto (text) Is this a "voucher" / top pick scene Flag
Enijika_tenei (text) unsure, but it's a text / note field Note

Reviewing the Timelapse Template Guide, the following columns are mandatory and the DataLabel cannot be changed:

  • File
  • RelativePath
  • DateTime
  • DeleteFlag

CSV export

In addition to accessing the raw .tdb and .ddb files, it is also possible to have the user export their data to a CSV file from within Timelapse. The menu option is called Export all data (folder data, image/ video data) to a CSV file.

This file yields only the annotations, and the root folder. In other words, the same data as the DataTable from the .ddb, supplemented by the RootFolder column from the ImageSetTable from the .ddb. For example (some names changed for privacy)...

RootFolder,File,RelativePath,DateTime,Tenon_Soort,Tapsik_aantal,Takahe_eitop,Ipokan_foto,Enikija_tenei,
Recordings per month,Cam1_1_9_2_2024_01am.mp4,02-24\My Location\Cam 1\Lokatie 1, 2024-02-13 08:45:28,Dog,1,,false,,
Recordings per month,Cam1_1_5_2_2024_03am.mp4,02-24\My Location\Cam 1\Lokatie 1, 2024-02-13 08:45:34,Cat,1,,false,,
Recordings per month,Cam1_1_7_2_2024_12am.mp4,02-24\My Location\Cam 1\Lokatie 1, 2024-02-13 08:45:42,Dog,1,,false,,
Recordings per month,Cam1_1_7_2_2024_12am1.mp4,02-24\My Location\Cam 1\Lokatie 1, 2024-02-13 08:45:48,Dog,1,,false,,
Recordings per month,Cam1_1_6_2_2024_10pm.mp4,02-24\My Location\Cam 1\Lokatie 1, 2024-02-13 08:45:56,Fox,1,,false,,
Recordings per month,Cam1_1_8_2_2024_12am1.mp4,02-24\My Location\Cam 1\Lokatie 1, 2024-02-13 08:46:10,Dog,1,,false,,

What kind of data do we want to store for the partner?

The CSV seems to really capture the bulk of what a user might want out of their Timelapse annotations. But having dug into the contents of the .ddb above, the key question is: is there anything in the Timelapse SQLite databases that we might want to store for the partner, that isn't already provided in the CSV export?

For example, I can imagine that the contents of ImageSetTable -> SearchTerms might be useful for analysis purposes. But maybe not. (Again, @abfleishman curious to your take on this.)

Furthermore, for reasons of provenance it might be good to store more of the Timelapse metadata as high-fidelity as possible. One really useful piece of metadata that is not included in this CSV is ImageSetTable -> VersionCompatibily (sp) which stores the actual version of Timelapse that was used for annotating. And perhaps even the field properties (like default values, or text box width?) might be useful someday... or they are just noise.

One non-trivial advantage of a CSV Export is that it is an easier thing to ask for. Many users are familiar with a workflow of exporting a file from a software, and then finding that file (e.g., on their Desktop or Downloads folder) to send or upload it somewhere. Getting the SQLite database files will require the partner to use Windows Explorer to find their Timelapse root folder. Or... do they need to do that anyway to get the images/videos?

@rudokemper rudokemper added the feature New specs for new behavior label Jan 30, 2025
@rudokemper rudokemper added this to the 11th Hour Project milestone Jan 30, 2025
@rudokemper rudokemper self-assigned this Jan 30, 2025
@rudokemper
Copy link
Member Author

rudokemper commented Jan 30, 2025

There is one additional type of metadata missing from what we can get out of Timelapse (at least, for this partner), which is metadata about the camera traps / locations themselves: e.g. a camera ID, GPS coordinates...

In the data above, the only place to glean this is from the File and RelativePath fields, where mention is made of a "Cam x" and "Lokatie x". That is, there are no dedicated fields that store the camera ID and location ID.

So as it stands, we will need to join this Timelapse data with an external dataset. I'm not sure how the partner is currently storing this information. I will ask. The partner indicates that they maintain a spreadsheet with village, camera numbers, location, and GPS coordinates.

While working with this partner, to streamline this process, @mmckown set up a KoboToolbox form with the following questions for them:

  1. Date
  2. GPS location
  3. Village
  4. Who deployed?
  5. Picture of camera site
  6. Site name
  7. Camera
  8. Notes

As part of a gc-scripts-hub camera trap flow, we could request this kind of form submissions data from the Kobo API, and join it with incoming Timelapse data.

Maybe we should come up with a generic template of this form, and recommend it as part of our GuardianConnector-supported workflow for working with camera trap data and Timelapse? (@abfleishman note that this mirrors what you were thinking about in the discussions about how to get bioacoustic sensor metadata from the client)

@abfleishman
Copy link
Member

For example, I can imagine that the contents of ImageSetTable -> SearchTerms might be useful for analysis purposes. But maybe not. (Again, @abfleishman curious to your take on this.)

I think this "SearchTerms" and the entire ImageSetTable are specific to TimeLapse and may not be useful for capture separately.

Furthermore, for reasons of provenance it might be good to store more of the Timelapse metadata as high-fidelity as possible. One really useful piece of metadata that is not included in this CSV is ImageSetTable -> VersionCompatibily (sp) which stores the actual version of Timelapse that was used for annotating. And perhaps even the field properties (like default values, or text box width?) might be useful someday... or they are just noise.

One thought is around why we are ingesting these data: to store long term, to allow analysis/visualization, etc. If one of the things we want to support is allowing folks to retrieve something from GC that would allow them to open a "project" in TimeLapse again, then it seems valuable to know what is needed to "import" a project into timelapse. (All the important info might be in the template file + the csv export. IDK.

Maybe we should come up with a generic template of this form, and recommend it as part of our GuardianConnector-supported workflow for working with camera trap data and Timelapse? (@abfleishman note that this mirrors what you were thinking about in the discussions about how to get bioacoustic sensor metadata from the client)

Yes that does seem interesting. I think there is probably other info that we want to capture beyond Matthews short form, but that is from a science standpoint. And I bet there is a bunch of info that community members might want that I would not think of.

@rudokemper
Copy link
Member Author

If one of the things we want to support is allowing folks to retrieve something from GC that would allow them to open a "project" in TimeLapse again, then it seems valuable to know what is needed to "import" a project into timelapse.

Good question. I think the answer is likely to be "yes", if we ask. The same question comes up about CoMapeo data all the time (where the answer is decisively "no").

Per page 91 of the Timelapse Reference Guide, it is possible to import a CSV, but only when already working with a .tdb template file which tells Timelapse what kind of data type each column represents.

So, if we decide to store the .tdb file for the purpose of allowing a user to re-setup their Timelapse project, then exporting the CSV file is superfluous: we can just ask the user to ZIP their entire root folder (which they need to access anyway, to get the images/videos? If I'm not mistaken) and extract the annotated data from the .ddb file instead of the CSV.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New specs for new behavior
Projects
None yet
Development

No branches or pull requests

2 participants