This project performs a molecular docking simulation between the NPM-ALK fusion protein and DualStrike, a novel (hypothetical) dual-site inhibitor, using AlphaFold 3 for protein structure prediction and SwissDock for docking. It was created for Siraj Raval's YouTube video, and the HPC-AI compute cloud was used to keep compute costs down.
- Prerequisites
- Setup Instructions
- Running the Script
- Detailed Steps
- Understanding the Outputs
- Troubleshooting
- References
Prerequisites
Before you begin, ensure you have the following installed:
- Python 3.7 or higher
- pip (Python package installer)
- The following Python packages:
  - Biopython
  - RDKit
  - NumPy
  - pandas
  - matplotlib
  - requests
Note: RDKit installation can be non-trivial on some systems. If you run into problems, follow the official RDKit installation guide.
Setup Instructions
- Clone or Download the Repository:
git clone https://github.com/yourusername/dualstrike_docking.git
cd dualstrike_docking
- Create a Virtual Environment (Optional but Recommended):
python3 -m venv venv
source venv/bin/activate # On Windows use: venv\Scripts\activate
- Install the Required Packages:
pip install biopython rdkit pandas numpy matplotlib requests
Note: If you encounter issues installing RDKit via pip, refer to the official installation guide.
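To confirm the packages installed correctly, a small standard-library check like the one below could be run with the same interpreter. It is a sketch, not part of the repository: `REQUIRED_MODULES` and `missing_modules` are hypothetical names, and the check relies on Biopython being imported as `Bio` and RDKit as `rdkit`.

```python
import importlib.util
import sys
from typing import Dict, List

# pip package name -> module name used in `import` statements.
# Biopython is imported as "Bio" and RDKit as "rdkit".
REQUIRED_MODULES = {
    "biopython": "Bio",
    "rdkit": "rdkit",
    "numpy": "numpy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "requests": "requests",
}


def missing_modules(modules: Dict[str, str]) -> List[str]:
    """Return the pip package names whose module cannot be found on this Python."""
    return [
        package
        for package, module in modules.items()
        if importlib.util.find_spec(module) is None  # None means "not importable"
    ]


if __name__ == "__main__":
    if sys.version_info < (3, 7):
        sys.exit("Python 3.7 or higher is required.")
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages; install them with:")
        print("pip install " + " ".join(missing))
    else:
        print("All required packages are installed.")
```

`find_spec` only locates a module without importing it, so the check stays fast even for heavy packages such as RDKit.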
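Running the Script
Run the script from the repository directory:
# Standard execution
python dualstrike_docking.py
# Optional on macOS/Linux: make the script executable and run it directly
# (requires a shebang line such as #!/usr/bin/env python3 at the top of the script)
chmod +x dualstrike_docking.py
./dualstrike_docking.py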
Detailed Steps
Step 1: Generate the protein sequence
- The script automatically writes the NPM-ALK fusion sequence to npm_alk_fusion.fasta
- No manual action required
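The FASTA format written in this step is simple enough to sketch with the standard library. The helpers below (`format_fasta`, `parse_fasta`, `write_fasta`) are hypothetical illustrations, not the script's own code, which may use Biopython instead; the 60-residue line width is a common convention, not a requirement of the format.

```python
from pathlib import Path
from typing import Tuple


def format_fasta(header: str, sequence: str, width: int = 60) -> str:
    """Return a single-record FASTA string with residues wrapped at `width` chars."""
    residues = "".join(sequence.split()).upper()  # drop spaces/newlines
    lines = [">" + header]
    lines += [residues[i:i + width] for i in range(0, len(residues), width)]
    return "\n".join(lines) + "\n"


def parse_fasta(text: str) -> Tuple[str, str]:
    """Parse a single-record FASTA string back into (header, sequence)."""
    header, chunks = "", []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            header = line[1:]
        elif line:
            chunks.append(line)
    return header, "".join(chunks)


def write_fasta(path: str, header: str, sequence: str) -> None:
    """Write one sequence to `path` in FASTA format."""
    Path(path).write_text(format_fasta(header, sequence))
```

For example, `write_fasta("npm_alk_fusion.fasta", "NPM_ALK_fusion", sequence)` with `sequence` holding the fusion sequence; the header text is an assumption.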
Step 2: Predict the protein structure with AlphaFold 3
Note: This step requires manual submission because AlphaFold Protein Server has no public API.
- Visit the AlphaFold Protein Server website
- Upload the generated npm_alk_fusion.fasta
- Submit the job and wait for it to complete
- Download the predicted structure (CIF format)
- Save it as npm_alk_fusion.cif in the script directory
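If the prediction downloads as a zip archive, pulling out the model file can be scripted. This sketch assumes the top-ranked model is the archive member whose name ends in `_model_0.cif`; that naming is an assumption, so list the archive contents first and change `suffix` if your download differs. `extract_top_model` is a hypothetical helper.

```python
import shutil
import zipfile


def extract_top_model(zip_path: str, dest: str = "npm_alk_fusion.cif",
                      suffix: str = "_model_0.cif") -> str:
    """Copy the first archive member ending in `suffix` to `dest`.

    The default suffix is an assumption about how the downloaded archive
    names its models. Returns the name of the member that was copied.
    """
    with zipfile.ZipFile(zip_path) as archive:
        matches = sorted(n for n in archive.namelist() if n.endswith(suffix))
        if not matches:
            raise FileNotFoundError(f"no member ending in {suffix!r} in {zip_path}")
        # Stream the member to disk without extracting the whole archive.
        with archive.open(matches[0]) as src, open(dest, "wb") as out:
            shutil.copyfileobj(src, out)
    return matches[0]
```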
Step 3: Convert CIF to PDB
- Ensure npm_alk_fusion.cif is in the script directory
- The script converts the CIF file to PDB format and writes npm_alk_fusion.pdb
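A quick way to check that the conversion produced a usable receptor file is to count its coordinate records. This standard-library sketch (`summarize_pdb` is a hypothetical helper) reads the fixed PDB column layout; a result of zero atoms suggests the conversion failed.

```python
from typing import Dict, Set, Tuple


def summarize_pdb(text: str) -> Dict[str, object]:
    """Count coordinate records, chains and residues in PDB-format text.

    Relies on the fixed-column PDB layout: record name in columns 1-6,
    chain ID in column 22, residue number in columns 23-26 and the
    insertion code in column 27 (Python slices below are 0-based).
    """
    atoms = hetatms = 0
    chains: Set[str] = set()
    residues: Set[Tuple[str, str]] = set()
    for line in text.splitlines():
        record = line[:6].strip()
        if record == "ATOM":
            atoms += 1
        elif record == "HETATM":
            hetatms += 1
        else:
            continue  # skip HEADER, TER, END and other records
        chain = line[21:22]
        chains.add(chain)
        residues.add((chain, line[22:27]))  # residue number + insertion code
    return {
        "atoms": atoms,
        "hetatms": hetatms,
        "chains": sorted(chains),
        "residues": len(residues),
    }
```

For example, `summarize_pdb(open("npm_alk_fusion.pdb").read())` after the script has run.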
Step 4: Prepare the ligand
- The script uses RDKit to generate a 3D structure of DualStrike from its SMILES notation
- Output: dualstrike.sdf
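Before uploading dualstrike.sdf, its header can be sanity-checked. In the V2000 molfile format used inside SDF files, the fourth line of each record is the counts line, whose first two 3-character fields hold the atom and bond counts, and records are separated by `$$$$`. The `sdf_counts` helper below is a hypothetical standard-library sketch that reads those fields; V3000 records report their version tag, but their counts read as 0 because V3000 stores them elsewhere.

```python
from typing import List, Tuple


def sdf_counts(text: str) -> List[Tuple[str, int, int, str]]:
    """Return (name, atom_count, bond_count, version) for each SDF record.

    Reads the molfile counts line (4th line of each record): atoms in
    columns 1-3, bonds in columns 4-6, version tag in columns 34-39.
    """
    records: List[List[str]] = []
    current: List[str] = []
    for line in text.splitlines():
        if line.strip() == "$$$$":  # end-of-record delimiter
            records.append(current)
            current = []
        else:
            current.append(line)
    if any(line.strip() for line in current):  # final record without "$$$$"
        records.append(current)

    results = []
    for record in records:
        if len(record) < 4:
            raise ValueError("record too short to contain a counts line")
        counts = record[3]
        results.append(
            (record[0].strip(), int(counts[0:3]), int(counts[3:6]), counts[33:39].strip())
        )
    return results
```

A ligand record with zero atoms, or no records at all, points to a failed 3D generation step.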
Step 5: Dock with SwissDock
Note: Manual submission required.
- Visit the SwissDock web interface
- Upload:
  - Receptor: npm_alk_fusion.pdb
  - Ligand: dualstrike.sdf
- Configure parameters (defaults are acceptable)
- Submit and note the Job ID
- Wait for completion
Step 6: Analyze the results
- Download the results archive from SwissDock
- Extract it to a folder (e.g., SwissDock_results)
- Provide the results folder path when prompted
- The script generates statistics and visualizations
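As a hedged sketch of how binding energies could be gathered from the extracted folder, the function below scans every matching file under it. The `ENERGY_PATTERN` regex (a `deltaG:` label followed by a number) and the `*.pdb` file filter are assumptions about the SwissDock output, not a documented format; inspect your downloaded files and adjust both. `collect_energies` is a hypothetical name.

```python
import re
from pathlib import Path
from typing import List

# Hypothetical pattern: assumes lines such as "REMARK deltaG: -7.85".
# Adjust it to whatever your SwissDock result files actually contain.
ENERGY_PATTERN = re.compile(r"deltaG:?\s*(-?\d+(?:\.\d+)?)", re.IGNORECASE)


def collect_energies(results_dir: str, glob: str = "*.pdb") -> List[float]:
    """Return every energy matched by ENERGY_PATTERN in files under results_dir."""
    energies: List[float] = []
    # rglob searches subfolders too; sorting keeps the order reproducible.
    for path in sorted(Path(results_dir).rglob(glob)):
        text = path.read_text(errors="replace")
        energies.extend(float(m.group(1)) for m in ENERGY_PATTERN.finditer(text))
    return energies
```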
Understanding the Outputs
Binding energy analysis:
- Descriptive statistics:
  - Mean
  - Standard deviation
  - Minimum/maximum values
- Histogram of the binding energy distribution
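The statistics listed above can be reproduced with the standard library. This is a sketch, not the script's implementation: `describe` uses the sample standard deviation (the same default as pandas `Series.std`), and `histogram` counts values in equal-width bins over [min, max], placing the maximum in the last bin as `numpy.histogram` does.

```python
import statistics
from typing import Dict, List


def describe(energies: List[float]) -> Dict[str, float]:
    """Count, mean, sample standard deviation, min and max of binding energies."""
    return {
        "count": len(energies),
        "mean": statistics.mean(energies),
        "std": statistics.stdev(energies) if len(energies) > 1 else 0.0,
        "min": min(energies),
        "max": max(energies),
    }


def histogram(energies: List[float], bins: int = 10) -> List[int]:
    """Count values in `bins` equal-width bins spanning [min, max]."""
    lo, hi = min(energies), max(energies)
    width = (hi - lo) / bins or 1.0  # avoid zero width when all values are equal
    counts = [0] * bins
    for e in energies:
        index = min(int((e - lo) / width), bins - 1)  # max value goes in last bin
        counts[index] += 1
    return counts
```

For example, `histogram([-8.0, -7.0, -6.0, -5.0], bins=3)` gives `[1, 1, 2]`.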
Output files:
- npm_alk_fusion.fasta: Fusion protein sequence
- npm_alk_fusion.pdb: Protein structure
- dualstrike.sdf: Inhibitor structure
- Results folder: Contains all SwissDock output files
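Troubleshooting
- RDKit Installation Issues: if pip fails, install RDKit from conda-forge instead, then install the remaining packages inside that environment:
# Alternative installation via conda
conda create -c conda-forge -n my-rdkit-env rdkit
conda activate my-rdkit-env
pip install biopython pandas numpy matplotlib requests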
- Missing Files/Incorrect Paths:
  - Verify file names and locations
  - Check paths when prompted
- Script Errors:
  - Check for missing dependencies
  - Verify file formats
- Docking Issues:
  - Review SwissDock error messages
  - Verify file formatting
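For missing files or incorrect paths, a pre-flight check can list which expected files are absent before you rerun the script. The file names come from the steps in this README; the `missing_files` helper itself is a hypothetical sketch. Empty files are also reported, since an interrupted download or conversion can leave a zero-byte file behind.

```python
from pathlib import Path
from typing import List, Sequence

# File names taken from the workflow steps in this README.
EXPECTED_FILES = (
    "npm_alk_fusion.fasta",
    "npm_alk_fusion.cif",
    "npm_alk_fusion.pdb",
    "dualstrike.sdf",
)


def missing_files(directory: str = ".", names: Sequence[str] = EXPECTED_FILES) -> List[str]:
    """Return expected file names that are absent or empty in `directory`."""
    base = Path(directory)
    problems = []
    for name in names:
        path = base / name
        if not path.is_file() or path.stat().st_size == 0:
            problems.append(name)
    return problems


if __name__ == "__main__":
    for name in missing_files():
        print(f"Missing or empty: {name}")
```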
References
- AlphaFold Protein Structure Database
- SwissDock Docking Server
- RDKit Documentation
- Biopython Documentation
- Matplotlib Documentation
- Authors: Siraj Raval, o1-preview, Claude-3.5-Sonnet
- Email: [email protected]
This script and associated instructions are provided for educational and research purposes. The DualStrike inhibitor is a hypothetical molecule used for demonstration. Always ensure compliance with all relevant laws, regulations, and ethical guidelines when conducting research.
MIT LICENSE