PMH_Bio

Title - Persistent Mayer Homology-Based Machine Learning Models for Protein-Ligand Binding Affinity Prediction.

Authors - Hongsong Feng, Li Shen, Jian Liu, and Guo-Wei Wei

Introduction

Artificial intelligence-assisted drug design is revolutionizing the pharmaceutical industry. Effective molecular features are crucial for accurate machine learning predictions, and advanced mathematics plays a key role in designing these features. Persistent homology theory, which equips topological invariants with persistence, provides valuable insights into molecular structures. In standard homology, the calculation of Betti numbers is based on a differential $d$ that typically satisfies $d^2 = 0$. Our recent work has extended this concept by employing Mayer homology with a generalized differential that satisfies $d^N = 0$ for $N \geq 2$, leading to the development of Persistent Mayer Homology (PMH) theory. This theory offers richer Betti number information across various scales. In this study, we utilize PMH to create a novel multiscale topological featurization approach for molecular representation. These PMH-based molecular features serve as tools for descriptive and predictive analysis of molecular data. By integrating them with machine learning algorithms, we build predictive models for protein-ligand binding affinity. Benchmark tests on three established protein-ligand datasets, PDBbind-2007, PDBbind-2013, and PDBbind-2016, demonstrate the strong performance of our models, with Pearson correlation coefficients of 0.824, 0.787, and 0.834, respectively.

Keywords: Persistent homology, Persistent Mayer homology, Protein-ligand binding affinity.
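
To make the condition $d^N = 0$ from the introduction concrete, here is a minimal numerical sketch. It is not the code used in this repository. It assumes the root-of-unity form of the Mayer differential, in which removing the vertex at position $j$ of a simplex contributes the weight $\xi^j$, where $\xi$ is a primitive $N$-th root of unity; this README does not spell out that formula, and the function name `mayer_boundary` is hypothetical. For $N = 3$ on the full simplex with five vertices, applying the differential twice leaves a nonzero map, while applying it three times gives zero.

```python
import itertools

import numpy as np


def mayer_boundary(n_vertices, k, N):
    """Matrix of the assumed Mayer differential from k-chains to (k-1)-chains.

    The complex is the full simplex on `n_vertices` vertices, and k >= 1.
    """
    xi = np.exp(2j * np.pi / N)  # primitive N-th root of unity
    vertices = range(n_vertices)
    faces = list(itertools.combinations(vertices, k))          # (k-1)-simplices
    simplices = list(itertools.combinations(vertices, k + 1))  # k-simplices
    row = {face: r for r, face in enumerate(faces)}
    d = np.zeros((len(faces), len(simplices)), dtype=complex)
    for col, simplex in enumerate(simplices):
        for j in range(k + 1):
            face = simplex[:j] + simplex[j + 1:]  # drop the vertex at position j
            d[row[face], col] += xi ** j          # weight xi^j replaces the sign (-1)^j
    return d


N = 3
d1, d2, d3 = (mayer_boundary(5, k, N) for k in (1, 2, 3))

d_squared = d1 @ d2     # 2-chains -> 0-chains
d_cubed = d1 @ d2 @ d3  # 3-chains -> 0-chains

print("d^2 = 0 ?", np.allclose(d_squared, 0))  # False
print("d^3 = 0 ?", np.allclose(d_cubed, 0))    # True
```

Setting `N = 2` gives $\xi = -1$, which recovers the usual alternating-sign boundary operator with $d^2 = 0$.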

Model Architecture

A schematic illustration of the overall PMH-based knot data analysis (KDA) platform is shown below.

Further details are provided in the paper, offering context and additional information about the architecture and its components.

Prerequisites

numpy 1.21.0
scipy 1.7.3
scikit-learn 1.0.2
python 3.10.12
biopandas 0.4.1
Biopython 1.75
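
One way to compare an existing environment against the versions listed above is sketched below. It uses only the standard-library `importlib.metadata` module. The distribution names passed to it (for example `biopython` for Biopython) are assumed to match the names used on PyPI, and the helper `check_versions` is hypothetical rather than part of this repository.

```python
import platform
from importlib import metadata

# Versions copied from the Prerequisites list; the keys are assumed PyPI names.
PINNED = {
    "numpy": "1.21.0",
    "scipy": "1.7.3",
    "scikit-learn": "1.0.2",
    "biopandas": "0.4.1",
    "biopython": "1.75",
}


def check_versions(pinned):
    """Return {name: (wanted, installed_or_None, matches)} for each package."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed, installed == wanted)
    return report


if __name__ == "__main__":
    print(f"python: running {platform.python_version()}, README lists 3.10.12")
    for name, (wanted, installed, ok) in check_versions(PINNED).items():
        status = "ok" if ok else "differs"
        print(f"{name}: wanted {wanted}, installed {installed} ({status})")
```

A mismatch does not necessarily mean the code will fail; the script only reports where the environment differs from the listed versions.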

Datasets

| Datasets | Total | Training Set | Test Set |
| --- | --- | --- | --- |
| PDBbind-v2007 | 1300 | 1105 (Label) | 195 (Label) |
| PDBbind-v2013 | 2959 | 2764 (Label) | 195 (Label) |
| PDBbind-v2016 | 4057 | 3767 (Label) | 290 (Label) |

PDBbind Raw Data: Protein-ligand complex structures. Download from the PDBbind database.
Label: The .csv file containing the protein ID and corresponding binding affinity for PDBbind data.

Modeling with PMH-Based Features

| Datasets | Training Set | Test Set | PCC | RMSE (kcal/mol) |
| --- | --- | --- | --- | --- |
| PDBbind-v2007 result | 1105 | 195 | 0.824 | 1.95 |
| PDBbind-v2013 result | 2764 | 195 | 0.787 | 2.036 |
| PDBbind-v2016 result | 3767 | 290 | 0.834 | 1.755 |

Note: For each dataset, twenty gradient boosting regressor tree (GBRT) models were built with distinct random seeds to reduce the effect of random initialization, using the PMH-based features as input. Transformer-based sequence features were also generated and paired with GBRT to build machine learning models. All predictions can be found in the results folder.
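
The multi-seed protocol described in the note can be sketched as follows. This is an illustration on synthetic data, not the repository's training script: the hyperparameters, the array shapes, and the choice to average the per-seed predictions before scoring are all assumptions, and `train_seeded_gbrt` is a hypothetical helper. It uses scikit-learn's `GradientBoostingRegressor` and reports the Pearson correlation coefficient (PCC) and the RMSE.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.ensemble import GradientBoostingRegressor


def train_seeded_gbrt(X_train, y_train, X_test, n_models=20):
    """Fit one GBRT per random seed and return the averaged test predictions."""
    predictions = []
    for seed in range(n_models):
        model = GradientBoostingRegressor(
            n_estimators=50,    # assumed values, chosen so the sketch runs quickly
            learning_rate=0.1,
            max_depth=3,
            subsample=0.8,      # row subsampling makes the fit depend on the seed
            random_state=seed,
        )
        model.fit(X_train, y_train)
        predictions.append(model.predict(X_test))
    return np.mean(predictions, axis=0)


# Synthetic stand-in for PMH feature vectors and binding-affinity labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
y = X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=300)
X_train, X_test, y_train, y_test = X[:240], X[240:], y[:240], y[240:]

y_pred = train_seeded_gbrt(X_train, y_train, X_test)
pcc, _ = pearsonr(y_test, y_pred)
rmse = float(np.sqrt(np.mean((y_test - y_pred) ** 2)))
print(f"PCC = {pcc:.3f}, RMSE = {rmse:.3f}")
```

To apply the same pattern to real data, the synthetic arrays would be replaced by the generated PMH feature matrices and the binding affinities from the label files.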

Generation of PMH-Based Features for Protein-Ligand Complex

# Example: generate the PMH features for PDB entry 2p7z. The PDB file is located in the PDB/2p7z folder, and the generated features are saved in features/2p7z.
python codes/PMH.py

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

Hongsong Feng, Li Shen, Jian Liu, and Guo-Wei Wei, "Persistent Mayer homology-based machine learning models for protein-ligand binding affinity prediction"
