WeilabMSU/PMH_Bio

PMH_Bio

License: MIT

Title - Persistent Mayer Homology-Based Machine Learning Models for Protein-Ligand Binding Affinity Prediction.

Authors - Hongsong Feng, Li Shen, Jian Liu, and Guo-Wei Wei


Introduction

Artificial intelligence-assisted drug design is revolutionizing the pharmaceutical industry. Effective molecular features are crucial for accurate machine learning predictions, and advanced mathematics plays a key role in designing these features. Persistent homology theory, which equips topological invariants with persistence, provides valuable insights into molecular structures. The calculation of Betti numbers is based on a differential that typically satisfies $d^2 = 0$. Our recent work has extended this concept by employing Mayer homology with a generalized differential that satisfies $d^N = 0$ for $N \geq 2$, leading to the development of Persistent Mayer Homology (PMH) theory. This theory offers richer Betti number information across various scales. In this study, we utilize PMH to create a novel multiscale topological featurization approach for molecular representation. These PMH-based molecular features serve as valuable tools for descriptive and predictive analysis in molecular data and machine learning. By integrating these features with machine learning algorithms, we build highly accurate predictive models. Benchmark tests on established protein-ligand datasets, including PDBbind-2007, PDBbind-2013, and PDBbind-2016, demonstrate the superior performance of our models in predicting protein-ligand binding affinities.
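
For readers unfamiliar with the generalized differential, the standard definition of Mayer homology can be written out as follows. This is a sketch of the usual textbook form, not a statement of the exact conventions used in the paper; the index $q$ and the notation $H_{n,q}$, $\beta_{n,q}$ are assumptions introduced here for illustration.

```latex
% An N-chain complex (C_*, d) has a differential d : C_n -> C_{n-1} with d^N = 0.
% For each 1 <= q <= N-1 the composite d^q \circ d^{N-q} = d^N vanishes,
% so the image of d^{N-q} lies inside the kernel of d^q and the quotient is defined:
\[
  H_{n,q}(C_*) \;=\;
  \frac{\ker\!\left(d^{\,q} : C_n \to C_{n-q}\right)}
       {\operatorname{im}\!\left(d^{\,N-q} : C_{n+N-q} \to C_n\right)},
  \qquad 1 \le q \le N-1 .
\]
% The associated Mayer Betti numbers are the ranks of these groups:
\[
  \beta_{n,q} \;=\; \operatorname{rank} H_{n,q}(C_*) .
\]
% For N = 2 the only choice is q = 1, which recovers ordinary homology
% H_n = \ker d / \operatorname{im} d and the usual Betti numbers.
```

For $N > 2$ each dimension $n$ carries $N-1$ Betti numbers instead of one, which is the sense in which the theory offers richer information than the $d^2 = 0$ case.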

Keywords: Persistent homology, Persistent Mayer homology, Protein-ligand binding affinity.


Model Architecture

A schematic illustration of the overall PMH-based machine learning platform is shown below.

[Figure: Model Architecture]

The paper describes the architecture and its components in more detail.


Prerequisites

  • numpy 1.21.0
  • scipy 1.7.3
  • scikit-learn 1.0.2
  • python 3.10.12
  • biopandas 0.4.1
  • Biopython 1.75
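
Before running the code, it can help to confirm that the installed packages match the versions listed above. The following is a minimal sketch, not part of the repository: the helper name `check_versions` is hypothetical, and the dictionary keys are assumed to be the PyPI distribution names of the listed packages.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the Prerequisites list.
# Keys are assumed to be the PyPI distribution names.
PINNED = {
    "numpy": "1.21.0",
    "scipy": "1.7.3",
    "scikit-learn": "1.0.2",
    "biopandas": "0.4.1",
    "biopython": "1.75",
}


def check_versions(pinned):
    """Return {name: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_versions(PINNED).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

An empty result means every listed package is present at the pinned version. The Python version itself is not covered by this check.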

Datasets

| Datasets | Total | Training Set | Test Set |
|---|---|---|---|
| PDBbind-v2007 | 1300 | 1105 (Label) | 195 (Label) |
| PDBbind-v2013 | 2959 | 2764 (Label) | 195 (Label) |
| PDBbind-v2016 | 4057 | 3767 (Label) | 290 (Label) |

  • PDBbind Raw Data: Protein-ligand complex structures. Download from the PDBbind database.
  • Label: The .csv file containing the protein ID and corresponding binding affinity for PDBbind data.

Modeling with PMH-Based Features

| Datasets | Training Set | Test Set | PCC | RMSE (kcal/mol) |
|---|---|---|---|---|
| PDBbind-v2007 result | 1105 | 195 | 0.824 | 1.95 |
| PDBbind-v2013 result | 2764 | 195 | 0.787 | 2.036 |
| PDBbind-v2016 result | 3767 | 290 | 0.834 | 1.755 |

Note: For each dataset, twenty gradient boosting regression tree (GBRT) models were trained on the PMH-based features with distinct random seeds to reduce the effect of random initialization. Transformer-based sequence features were also generated and paired with GBRT to build machine learning models. All predictions can be found in the Results folder.
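
The multi-seed procedure in the note can be sketched as follows. This is an illustration under stated assumptions, not the repository's training script: the function names `fit_seeded_gbrt` and `score` are hypothetical, the data are synthetic, the hyperparameters are placeholders, and averaging the per-seed predictions is one common way to combine them that the note itself does not specify.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.ensemble import GradientBoostingRegressor


def fit_seeded_gbrt(X_train, y_train, X_test, n_models=20, **gbrt_params):
    """Train one GBRT per seed; return an (n_models, n_test) prediction array."""
    preds = []
    for seed in range(n_models):
        model = GradientBoostingRegressor(random_state=seed, **gbrt_params)
        model.fit(X_train, y_train)
        preds.append(model.predict(X_test))
    return np.vstack(preds)


def score(y_true, y_pred):
    """Return (Pearson correlation coefficient, root-mean-square error)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    pcc = float(pearsonr(y_true, y_pred)[0])
    rmse = float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
    return pcc, rmse


if __name__ == "__main__":
    # Synthetic stand-in for a feature matrix and binding-affinity labels.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 10))
    y = 2.0 * X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=300)

    # Placeholder hyperparameters (assumptions). subsample < 1 and
    # max_features make training stochastic, so the seed matters.
    preds = fit_seeded_gbrt(
        X[:240], y[:240], X[240:],
        n_models=20, n_estimators=50, subsample=0.7, max_features="sqrt",
    )

    # Assumption: combine the twenty models by averaging their predictions.
    pcc, rmse = score(y[240:], preds.mean(axis=0))
    print(f"PCC = {pcc:.3f}, RMSE = {rmse:.3f}")
```

The RMSE here is in the units of the synthetic labels; reporting it in kcal/mol, as in the table, requires labels expressed in (or converted to) that unit.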


Generation of PMH-Based Features for Protein-Ligand Complex

# Example: generate the PMH features for PDB entry 2p7z. The PDB file is located in the PDB/2p7z folder, and the generated features are saved in features/2p7z.
python codes/PMH.py

License

This project is licensed under the MIT License - see the LICENSE file for details.


Citation

  • Hongsong Feng, Li Shen, Jian Liu, and Guo-Wei Wei, "Persistent Mayer homology-based machine learning models for protein-ligand binding affinity prediction"
