xinl60/TICTAC
forked from unmtransinfo/TICTAC

TICTAC - Target illumination clinical trials analytics with cheminformatics

Mining ClinicalTrials.gov via AACT-CTTI-db for target hypotheses, with strong cheminformatics and medical terms text mining, powered by NextMove LeadMine and JensenLab Tagger.

Dependencies

About AACT:

  • AACT-CTTI database from Duke.
  • According to the AACT website (accessed June 2022), the data are refreshed daily.
  • The AACT structure changed in November 2021, reflecting the newer ClinicalTrials.gov API.
  • Drugs are identified by intervention ID, since a single trial (NCT_ID) may involve multiple drugs.
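
The point about intervention IDs can be illustrated with a small sketch. The rows below are made-up toy data, and the field names (id, nct_id, intervention_type, name) are assumed to mirror the AACT interventions table; check them against the current AACT schema before relying on them. The comparison on intervention_type is case-insensitive because the exact spelling of the value is also an assumption.

```python
from collections import defaultdict

# Toy rows (hypothetical data); field names are assumed to match
# the AACT "interventions" table.
rows = [
    {"id": 101, "nct_id": "NCT00000001", "intervention_type": "Drug", "name": "aspirin"},
    {"id": 102, "nct_id": "NCT00000001", "intervention_type": "Drug", "name": "clopidogrel"},
    {"id": 103, "nct_id": "NCT00000001", "intervention_type": "Device", "name": "stent"},
    {"id": 104, "nct_id": "NCT00000002", "intervention_type": "Drug", "name": "metformin"},
]


def drugs_by_intervention_id(rows):
    """Keep drug interventions only, keyed by intervention ID."""
    return {
        r["id"]: (r["nct_id"], r["name"])
        for r in rows
        if r["intervention_type"].lower() == "drug"
    }


def drug_ids_per_trial(drugs):
    """Group drug intervention IDs by trial (NCT_ID)."""
    per_trial = defaultdict(list)
    for intervention_id, (nct_id, _name) in drugs.items():
        per_trial[nct_id].append(intervention_id)
    return dict(per_trial)


drugs = drugs_by_intervention_id(rows)
per_trial = drug_ids_per_trial(drugs)
```

Keying on NCT_ID alone would merge aspirin and clopidogrel into a single record for the first trial; keying on the intervention ID keeps each drug separately addressable.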

Text mining, aka Named Entity Recognition (NER)

Purpose:

  • Associate drugs with diseases/phenotypes.
  • Associate drugs with protein targets.
  • Associate protein targets with diseases/phenotypes (via drugs).
  • Predict and score disease-target associations.

Drugs may be experimental candidates.
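
As a minimal sketch of how target-disease associations can be derived via drugs, the example below joins a drug-to-disease map with a drug-to-target map. All names and data are hypothetical, and the score (the number of distinct drugs supporting each pair) is only an illustrative assumption, not the scoring used by TICTAC.

```python
from collections import defaultdict

# Hypothetical toy associations, e.g. as produced by NER and ID mapping.
drug_diseases = {
    "drug_A": {"disease_X", "disease_Y"},
    "drug_B": {"disease_X"},
}
drug_targets = {
    "drug_A": {"target_1"},
    "drug_B": {"target_1", "target_2"},
}


def target_disease_evidence(drug_diseases, drug_targets):
    """Map each (target, disease) pair to the set of drugs linking them."""
    evidence = defaultdict(set)
    for drug, diseases in drug_diseases.items():
        for target in drug_targets.get(drug, ()):
            for disease in diseases:
                evidence[(target, disease)].add(drug)
    return dict(evidence)


evidence = target_disease_evidence(drug_diseases, drug_targets)

# Illustrative score (an assumption): count of distinct supporting drugs.
scores = {pair: len(drugs) for pair, drugs in evidence.items()}
```

Keeping the supporting drug set alongside the score preserves provenance, so each association can be traced back to the trials and interventions behind it.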

AACT tables of interest:

Table                      Notes
studies                    titles
keywords                   Reported; multiple vocabularies.
brief_summaries            (max 5000 chars)
detailed_descriptions      (max 32000 chars)
conditions                 diseases/phenotypes
browse_conditions          MeSH links
interventions              Our focus is drugs only among several types.
browse_interventions       MeSH links
intervention_other_names   synonyms
study_references           PubMed links
reported_events            including adverse events

Overall workflow:

See the top-level script Go_tictac_Workflow.sh.

  1. Data:
       • Go_aact_GetData.sh - Fetch data from AACT db.
       • Go_pubmed_GetData.sh - Fetch PubMed data from IDG TCRD.
       • Go_jensenlab_GetData.sh - Fetch dictionary data from JensenLab.
  2. LeadMine:
       • Go_aact_NER_leadmine_chem.sh - LeadMine NER, CT descriptions.
       • Go_pubmed_NER_leadmine_chem.sh - LeadMine NER, referenced PMIDs.
  3. Tagger:
       • Go_aact_NER_tagger_disease.sh - Tagger NER, CT descriptions.
  4. Cross-references:
       • Go_pubchem_GetXrefs.sh - PubChem IDs via APIs.
       • Go_chembl_GetXrefs.sh - ChEMBL IDs via APIs.
  5. Results, analysis:
       • tictac.Rmd - Results described and analyzed.

Association semantics:

  • keywords, conditions, studies and summaries: reported terms and free text which may be text mined for intended associations.
  • descriptions: may be text mined for both the intended and other conditions, symptoms and phenotypic traits, which may be non-obvious from the study design.
  • study_references: via PubMed, text mining of titles and abstracts can associate diseases/phenotypes, protein targets, chemical entities and more. The "results_reference" type may include findings not anticipated in the design/protocol.
  • interventions include drug names which can be recognized and mapped to standard IDs, a task for which NextMove LeadMine is particularly suited.
  • LeadMine chemical NER also resolves entities to structures via SMILES, enabling downstream cheminformatics such as aggregation by chemical substructure and similarity.
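
To illustrate what similarity-based aggregation might look like once entities are resolved to structures, here is a minimal sketch of the Tanimoto coefficient on fingerprint bit sets. It assumes the fingerprints have already been computed from SMILES by a cheminformatics toolkit; the bit sets, drug names and threshold below are hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    union = len(bits_a | bits_b)
    if union == 0:
        return 0.0  # Convention chosen here for two empty fingerprints.
    return len(bits_a & bits_b) / union


# Hypothetical fingerprints (sets of on-bit indices), assumed to have been
# derived elsewhere from the SMILES resolved by chemical NER.
fingerprints = {
    "drug_A": {1, 2, 3, 8},
    "drug_B": {2, 3, 8, 9},
    "drug_C": {20, 21},
}


def similar_pairs(fingerprints, threshold):
    """Return sorted name pairs whose similarity meets the threshold."""
    names = sorted(fingerprints)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if tanimoto(fingerprints[a], fingerprints[b]) >= threshold:
                pairs.append((a, b))
    return pairs


pairs = similar_pairs(fingerprints, threshold=0.5)
```

The all-pairs loop is quadratic in the number of compounds, which is fine for a sketch but would need indexing or clustering for large compound sets.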

NextMove

Running NextMove LeadMine NER via nextmove-tools.

$ java -jar ${LIBDIR}/unm_biocomp_nextmove-0.0.1-SNAPSHOT-jar-with-dependencies.jar
usage: LeadMine_Utils [-config <CFILE>] [-h] -i <IFILE> [-idcol <IDCOL>]
       [-lbd <LBD>] [-max_corr_dist <MAX_CORR_DIST>] [-min_corr_entity_len
       <MIN_CE_LEN>] [-min_entity_len <MIN_E_LEN>] [-o <OFILE>]
       [-spellcorrect] [-textcol <TEXTCOL>] [-unquote] [-v]
LeadMine_Utils: NextMove LeadMine chemical entity recognition
 -config <CFILE>                     Input configuration file
 -h,--help                           Show this help.
 -i <IFILE>                          Input file
 -idcol <IDCOL>                      # of ID input column
 -lbd <LBD>                          LeadMine look-behind depth
 -max_corr_dist <MAX_CORR_DIST>      LeadMine Max correction (Levenshtein)
                                     distance
 -min_corr_entity_len <MIN_CE_LEN>   LeadMine Min corrected entity length
 -min_entity_len <MIN_E_LEN>         LeadMine Min entity length
 -o <OFILE>                          Output file
 -spellcorrect                       LeadMine spelling correction
 -textcol <TEXTCOL>                  # of text/document input column
 -unquote                            unquote quoted column
 -v,--verbose                        Verbose.
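
As a hedged illustration of the options listed above, the sketch below assembles a LeadMine_Utils command line from Python without running it. Only flags shown in the usage text are used; the input and output file names and the column numbers are hypothetical, and whether column numbering starts at 0 or 1 is not stated in the usage text, so verify that before use.

```python
import os
import shlex

# Jar file name as shown in the usage example; LIBDIR falls back to "."
# here, which is an assumption for illustration.
JAR = os.path.join(
    os.environ.get("LIBDIR", "."),
    "unm_biocomp_nextmove-0.0.1-SNAPSHOT-jar-with-dependencies.jar",
)


def build_leadmine_cmd(jar, ifile, ofile, idcol, textcol,
                       spellcorrect=False, verbose=False):
    """Build the argument list for a LeadMine_Utils run (does not execute it)."""
    cmd = ["java", "-jar", jar,
           "-i", ifile, "-o", ofile,
           "-idcol", str(idcol), "-textcol", str(textcol)]
    if spellcorrect:
        cmd.append("-spellcorrect")
    if verbose:
        cmd.append("-v")
    return cmd


# Hypothetical file names and column numbers.
cmd = build_leadmine_cmd(JAR, "descriptions.tsv", "descriptions_leadmine.tsv",
                         idcol=1, textcol=2, spellcorrect=True)
print(shlex.join(cmd))  # Dry run: show the command instead of executing it.
```

To actually run the command, the list could be passed to subprocess.run(cmd, check=True) once the jar and a LeadMine license are in place.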
