Proposal: Web-Based Platform for Integrated Gene Set Analysis with DBRetina and Interactive Visualization
DBRetina is a high-performance bioinformatics command-line tool (CLI) designed to compute pairwise distances among large collections of gene sets, enabling the generation of comprehensive similarity networks. Its computational efficiency eliminates the need to rely on outdated gene sets, allowing users to seamlessly integrate and curate the latest versions of various molecular databases. In addition, DBRetina allows users to incorporate their own gene sets into the network, offering unprecedented resolution for enrichment analysis and unlocking new insights.
Challenge: As a CLI tool, users still have to download the gene sets from suitable molecular databases, transform into a Gene Set Database Format to be suitable for processing by DBRetina, and explore the outputs in suitable visulization tools. Therefor, we propose a unified web interface to streamline gene set analysis by integrating DBRetina’s computational power, biomedical database interoperability, and Cytoscape-based network visualization. This platform will democratize access to advanced genomic comparisons while ensuring scalability and user-friendliness.
- Curated Database Library:
- Provide a searchable list of pre-configured biomedical databases (e.g., KEGG pathways, DisGeNET, MSigDB, DrugBank).
- Supported data types: molecular pathways, variant sets, drug targets, expression signatures.
- Automated Format Conversion:
- Direct Download: If a database provides gene sets in
.gmt
,.gmx
, or.grp
formats, enable one-click downloads. - On-Demand Transformation: For non-standard formats (e.g., JSON, CSV), deploy serverless scripts (Python/Wasm) to convert data into Gene Set Database Formats.
- Direct Download: If a database provides gene sets in
- Drag-and-Drop Interface:
- Allow users to upload custom gene sets (max 10 GB) in
.gmt
,.gmx
, or.grp
formats.
- Allow users to upload custom gene sets (max 10 GB) in
- Real-Time Validation:
- Client-side validation using JavaScript to check file structure (e.g., correct delimiter usage, gene ID consistency).
- Immediate feedback with error highlighting (e.g., "Line 5: Invalid Ensembl ID
ENSG0000XYZ
").
- Local Processing Engine:
- Compile DBRetina to WebAssembly (Wasm) for client-side execution, avoiding server bottlenecks.
- Enable users to:
- Combine their uploaded gene sets with selected public databases.
- Run pairwise similarity calculations (Jaccard/Tanimoto indices) directly in the browser.
- Export results as a network edge list (CSV) or Neo4j-compatible files.
- Resource Management:
- Warn users about computational limits (e.g., "Comparing 50 gene sets may take 8 minutes").
- Cytoscape.js Integration:
- Render similarity networks using Cytoscape.js with plugins for:
- Layouts: Force-directed, hierarchical, or circular layouts.
- Enrichment Analysis: Highlight gene sets enriched for specific pathways (link to EnrichmentMap plugin).
- Custom Styling: Adjust node/edge colors/sizes based on similarity scores.
- Render similarity networks using Cytoscape.js with plugins for:
- Export Options:
- Download networks as SVG/PNG or export to Cytoscape Desktop for advanced analysis.
Component | Technology Stack |
---|---|
Frontend | React.js, Cytoscape.js, WebAssembly |
Database Integration | REST APIs (public databases), Node.js (ETL) |
File Validation | JavaScript/FileReader API, BioJS parsers |
DBRetina Execution | Emscripten-compiled Wasm, Web Workers |
- Accessibility: No CLI/Cypher expertise required.
- Reproducibility: Shareable workflows (JSON configuration files).
- Privacy: Sensitive user data remains client-side.
- Extensibility: Modular design for adding new databases/plugins.
A pharmacologist wants to compare their custom drug-target gene sets with KEGG pathways:
- Upload & Validate: Uploads
drug_targets.gmt
; the platform validates gene IDs. - Database Selection: Selects KEGG and Reactome.
- Run DBRetina: Computes pairwise similarities in-browser (5 mins).
- Visualize: Filters the network (similarity > 0.7) and identifies clusters linked to cancer pathways.
- Collaborative Features: Shared workspaces with real-time co-editing.
- Cloud Integration: Optional AWS/GCP backend for terabyte-scale datasets.
- AI-Powered Insights: LLM-driven hypothesis generation (e.g., "Cluster 3 suggests a link between IFN-γ and Drug X").
- DBRetina: https://dbretina.github.io/DBRetina/
- Cytoscape.js: https://js.cytoscape.org/
- WebAssembly: https://webassembly.org/
- Gene Set Formats: https://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats
Tags: #Bioinformatics
#WebAssembly
#Cytoscape
#DBRetina