English | 简体中文
This package provides an implementation of the inference pipeline of tFold, including tFold-Ab, tFold-Ag and tFold-TCR.
We also provide:
- A pre-trained protein language model named ESM-PPI, which extracts both intra-chain and inter-chain information from a protein complex to generate features for downstream tasks.
- The test set we constructed for our paper.
- A human germline antibody frameworks library to guide antibody generation using tFold-Ag.
Any publication that discloses findings arising from using this source code or the model parameters should cite the tFold paper.
Please also refer to the Supplementary Information for a detailed description of the method.
If you have any questions, please contact the tFold team at [email protected].
For business partnership opportunities, please contact [email protected].
Shorthand | Dataset | Description |
---|---|---|
ESM-PPI | UniRef50, PDB, PPI, Antibody | General-purpose protein language model, further pre-trained from ESM2 (650M parameters). Can be used to predict multimer structures directly from individual sequences |
ESM-PPI-tcr | UniRef50, PDB, PPI, Antibody, TCR, peptide | General-purpose protein language model, further pre-trained from ESM2 (650M parameters). Can be used to predict multimer structures directly from individual sequences |
tFold-Ab | SAbDab (before 31 December 2021) | SOTA antibody structure prediction model. MSA-free prediction with ESM-PPI |
tFold-Ag | SAbDab (before 31 December 2021) | SOTA antibody-antigen complex structure prediction model. Can be used for virtual screening of binding antibodies and antibody design |
tFold-TCR | STCRDab (before 31 December 2021) | SOTA TCR-complex structure prediction model. MSA-free prediction with ESM-PPI. Can be used for TCR design |
Model | RMSD-CDR-H3 | DockQ |
---|---|---|
AlphaFold-Multimer | 3.07 | 0.773 |
Chai-1 | 3.25 | 0.772 |
IgFold | 3.37 | 0.715 |
DeepAb | 3.73 | 0.721 |
ImmuneBuilder | 3.46 | 0.749 |
tFold-Ab | 3.01 | 0.770 |
Model | RMSD-CDR-H3 |
---|---|
AlphaFold | 3.96 |
Chai-1 | 3.57 |
IgFold | 4.64 |
ImmuneBuilder | 3.79 |
ESMFold | 3.80 |
OmegaFold | 3.63 |
tFold-Ab | 3.57 |
Model | DockQ | Success Rate |
---|---|---|
AlphaFold-Multimer | 0.158 | 18.2 |
AlphaFold-3 | 0.257 | 32.3 |
tFold-Ag | 0.217 | 28.3 |
Model | RMSD-CDR-A3 | RMSD-CDR-B3 | DockQ |
---|---|---|---|
AlphaFold-Multimer | 1.89 | 1.62 | 0.785 |
AlphaFold-3 | 1.80 | 1.50 | 0.769 |
TCRModel2 | 1.77 | 1.52 | 0.795 |
tFold-TCR | 1.66 | 1.35 | 0.795 |
Model | DockQ |
---|---|
AlphaFold-Multimer | 0.927 |
AlphaFold-3 | 0.926 |
tFold-TCR | 0.908 |
Model | DockQ | RMSD | Success Rate |
---|---|---|---|
AlphaFold-Multimer | 0.490 | 3.601 | 83.3 |
AlphaFold-3 | 0.496 | 3.094 | 72.2 |
tFold-TCR | 0.496 | 2.413 | 94.4 |
- Clone the package
git clone https://github.com/TencentAI4S/tfold.git
cd tfold
- Prepare the environment
- Please follow the instructions in INSTALL.md to set up the environment
- Download pre-trained weights under params directory (Optional)
Note:
If you download the weights into the folder ./checkpoints, you can proceed directly with the following steps.
If you don't download the weights, they will be downloaded automatically when you run the code.
- Download sequence databases for MSA searching (only needed for tFold-Ag)
sh scripts/setup_database.sh
- Test set we constructed for our paper
- Human germline antibody frameworks library to guide antibody generation
Our repository supports two modes of use: running the scripts directly from the cloned repository, or installing the package with pip.
You can use a fasta file (--fasta) or a json file (--json) as input.
# antibody
python projects/tfold_ab/predict.py --fasta examples/fasta.files/7ox3_A_B.fasta --output examples/predictions/7ox3_A_B.pdb
# nanobody
python projects/tfold_ab/predict.py --fasta examples/fasta.files/7ocj_B.fasta --output examples/predictions/7ocj_B.pdb
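The commands above read their sequences from a FASTA file. As a minimal sketch of preparing such a file for your own antibody, the helper below writes one record per chain. The header names `H` and `L`, the file name, and the short toy sequences are placeholders chosen for illustration; the headers that tFold expects are best checked against the files in examples/fasta.files/.

```python
import tempfile
from pathlib import Path


def to_fasta(chains):
    """Render (header, sequence) pairs as FASTA text, one sequence per line."""
    lines = []
    for header, sequence in chains:
        if not header or not sequence:
            raise ValueError("each chain needs a non-empty header and sequence")
        lines.append(f">{header}")
        lines.append(sequence.upper())
    return "\n".join(lines) + "\n"


# Placeholder chain ids and toy sequences (assumptions, not real antibodies).
chains = [
    ("H", "QVQLVQSGAEVKKPGASVKVSCKAS"),
    ("L", "EIVLTQSPGTLSLSPGERATLSCRAS"),
]
fasta_text = to_fasta(chains)

# Write to a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    fasta_path = Path(tmp) / "my_antibody.fasta"
    fasta_path.write_text(fasta_text)
    written = fasta_path.read_text()
```

The resulting path can then be passed to `--fasta` in place of the example file.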
Example 1: predicting the structure of an antibody-antigen complex & nanobody-antigen complex with pre-computed MSA
# antibody-antigen complex
python projects/tfold_ag/predict.py --fasta examples/fasta.files/8df5_A_B_R.fasta --msa examples/msa.files/8df5_R.a3m --output examples/predictions/8df5_A_B_R.pdb
# nanobody-antigen complex
python projects/tfold_ag/predict.py --fasta examples/fasta.files/7sai_C_NA_A.fasta --msa examples/msa.files/7sai_A.a3m --output examples/predictions/7sai_C_NA_A.pdb
# generate an MSA for the antigen
python projects/tfold_ag/gen_msa.py --fasta_file=examples/fasta.files/PD-1.fasta --output_dir=examples/PD-1
Example 3: predicting the structure of an antibody-antigen complex & nanobody-antigen complex with inter-chain features
# generate inter-chain feature (ppi)
python projects/tfold_ag/gen_icf_feat.py --pid_fpath=examples/fasta.files/8df5_A_B_R.fasta --fas_dpath=examples/fasta.files/ --pdb_dpath=examples/pdb.files.native/ --icf_dpath=examples/icf.files.ppi --icf_type=ppi
# antibody-antigen complex prediction with inter-chain feature
python projects/tfold_ag/predict.py --fasta examples/fasta.files/8df5_A_B_R.fasta --msa examples/msa.files/8df5_R.a3m --icf examples/icf.files.ppi/8df5_A_B_R.pt --output examples/predictions/8df5_A_B_R.pdb --model_version ppi
python projects/tfold_ag/predict.py --fasta examples/fasta.files/7urf_O_P_A.cdrh3.fasta --msa examples/msa.files/7urf_A.a3m --output examples/predictions/7urf_O_P_A.pdb
# TCR
python projects/tfold_tcr/predict.py --json examples/tcr_example.json --output examples/predictions/ --model_version TCR
# pMHC complex
python projects/tfold_tcr/predict.py --json examples/pmhc_example.json --output examples/predictions/ --model_version pMHC
# Complex
python projects/tfold_tcr/predict.py --json examples/tcr_pmhc_example.json --output examples/predictions/ --model_version Complex
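When several inputs need to be processed, the three commands above can be assembled programmatically. The sketch below only builds the argument lists, using the script path and the flags shown above. The helper name `tcr_command` is hypothetical, and treating `TCR`, `pMHC`, and `Complex` as the full set of accepted `--model_version` values is an assumption based on these three examples.

```python
import shlex

# Values of --model_version that appear in the examples (assumed complete).
MODEL_VERSIONS = ("TCR", "pMHC", "Complex")


def tcr_command(json_path, output_dir, model_version):
    """Build the argument list for one tFold-TCR prediction run."""
    if model_version not in MODEL_VERSIONS:
        raise ValueError(f"unknown model version: {model_version!r}")
    return [
        "python", "projects/tfold_tcr/predict.py",
        "--json", json_path,
        "--output", output_dir,
        "--model_version", model_version,
    ]


jobs = [
    ("examples/tcr_example.json", "TCR"),
    ("examples/pmhc_example.json", "pMHC"),
    ("examples/tcr_pmhc_example.json", "Complex"),
]
commands = [tcr_command(path, "examples/predictions/", version)
            for path, version in jobs]

# Shell-safe strings, useful for logging what will be run.
printable = [shlex.join(command) for command in commands]
```

Each list can be handed to `subprocess.run(command, check=True)` from the repository root to launch the prediction.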
Direct installation from pypi:
pip install tfold
or install from source code:
cd tfold
pip install .
After pip install, you can load and use a pretrained model as follows:
import torch
import tfold
# Download the pre-trained model
model_path = tfold.model.esm_ppi_650m_ab()
# Load the model
model = tfold.model.PPIModel.restore(model_path)
# Prepare antibody sequences (can be single or multiple sequences)
data = [
'QVQLVQSGAEVKKPGASVKVSCKASGYPFTSYGISWVRQAPGQGLEWMGWISTYNGNTNYAQKFQGRVTMTTDTSTTTGYMELRRLRSDDTAVYYCARDYTRGAWFGESLIGGFDNWGQGTLVTVSS', # Heavy chain
'EIVLTQSPGTLSLSPGERATLSCRASQTVSSTSLAWYQQKPGQAPRLLIYGASSRATGIPDRFSGSGSGTDFTLTISRLEPEDFAVYYCQQHDTSLTFGGGTKVEIK' # Light chain
]
ppi_output = model(data)
The output keys are (['labl', 'mask', 'pred', 'sfea', 'pfea']).
Each key in the outputs dictionary represents a different component of the model's output:
- 'labl': Original token indices from tokn_mat_orig that represent the unmasked (original) amino acid sequences. These serve as ground truth labels for training.
- 'mask': Binary mask tensor indicating which positions were masked during inference. It's used to identify which positions should be predicted in the masked language modeling task.
- 'pred': The model's logits (raw prediction scores before softmax) for each position in the sequence. These are the actual predictions made by the model.
- 'sfea': Single-residue features/embeddings extracted from the final layer representations. These are residue-level embeddings with dimension self.c_s for each amino acid position.
- 'pfea': Pair features representing interactions between residues, derived from attention weights. These capture the relationships between each pair of residues in the sequence with dimension self.c_z.
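To illustrate how 'pred', 'labl', and 'mask' relate to each other, the sketch below turns logits into probabilities and scores only the masked positions. It uses plain Python lists as stand-ins for the model's tensors. The toy sizes (3 positions, 4 tokens) and the helper names are assumptions for illustration; the real outputs carry additional dimensions and a much larger vocabulary.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def masked_accuracy(pred, labl, mask):
    """Fraction of masked positions whose top-scoring token equals the label."""
    hits, count = 0, 0
    for logits, label, is_masked in zip(pred, labl, mask):
        if not is_masked:
            continue  # unmasked positions are not scored
        count += 1
        best = max(range(len(logits)), key=lambda i: logits[i])
        hits += int(best == label)
    return hits / count if count else float("nan")


# Toy stand-ins for the output entries (hypothetical values).
pred = [[2.0, 0.1, 0.1, 0.1], [0.0, 3.0, 0.0, 0.0], [0.5, 0.5, 4.0, 0.5]]
labl = [0, 1, 3]
mask = [1, 0, 1]

probs = [softmax(row) for row in pred]
accuracy = masked_accuracy(pred, labl, mask)
```

For downstream tasks, 'sfea' and 'pfea' are usually the entries of interest, since they hold the per-residue and per-pair embeddings.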
import torch
import tfold
# Download the pre-trained model
ppi_model_path = tfold.model.esm_ppi_650m_ab()
tfold_model_path = tfold.model.tfold_ab_trunk()
# Load the model
model = tfold.deploy.PLMComplexPredictor.restore_from_module(ppi_model_path, tfold_model_path)
# Prepare antibody sequences (can be single or multiple sequences)
data =[
{
"sequence": 'QVQLVQSGAEVKKPGASVKVSCKASGYPFTSYGISWVRQAPGQGLEWMGWISTYNGNTNYAQKFQGRVTMTTDTSTTTGYMELRRLRSDDTAVYYCARDYTRGAWFGESLIGGFDNWGQGTLVTVSS', # Heavy chain
"id": 'H'
},
{
"sequence": 'EIVLTQSPGTLSLSPGERATLSCRASQTVSSTSLAWYQQKPGQAPRLLIYGASSRATGIPDRFSGSGSGTDFTLTISRLEPEDFAVYYCQQHDTSLTFGGGTKVEIK', # Light chain
"id": 'L'
}]
output_path = '8df5_A_B_R.pdb'
model.infer_pdb(data, output_path)
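A quick sanity check on a predicted structure is to confirm that each chain in the PDB file has as many residues as its input sequence. The sketch below counts C-alpha atoms per chain using the fixed column layout of PDB ATOM records. The helper names and the toy coordinates are illustrative only, and nothing here depends on tFold itself.

```python
def residues_per_chain(pdb_text):
    """Count residues per chain by counting C-alpha ATOM records."""
    counts = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or len(line) < 22:
            continue
        atom_name = line[12:16].strip()  # PDB columns 13-16
        chain_id = line[21]              # PDB column 22
        if atom_name == "CA":
            counts[chain_id] = counts.get(chain_id, 0) + 1
    return counts


def atom_line(serial, name, residue, chain, number):
    """Format a minimal ATOM record with dummy coordinates (toy data only)."""
    return (f"ATOM  {serial:5d} {name:<4} {residue:>3} {chain}{number:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{0.0:6.2f}")


# Toy structure: two residues in chain H, one in chain L.
toy_pdb = "\n".join([
    atom_line(1, " N", "GLN", "H", 1),
    atom_line(2, " CA", "GLN", "H", 1),
    atom_line(3, " CA", "VAL", "H", 2),
    atom_line(4, " CA", "GLU", "L", 1),
    "END",
])
counts = residues_per_chain(toy_pdb)
```

With a real prediction, read the text of the file at `output_path` and compare each count with `len(record["sequence"])` for the matching chain id.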
import torch
import tfold
# Download the pre-trained model of ESM-PPI
ppi_model_path = tfold.model.esm_ppi_650m_ab()
# Download the pre-trained model of AlphaFold
alphafold_path = tfold.model.alpha_fold_4_ptm()
# Download base model for tFold-Ag
tfold_model_path = tfold.model.tfold_ag_base()
# Download the ppi model for tFold-Ag
# tfold_model_path = tfold.model.tfold_ag_ppi()
# Load the model
model = tfold.deploy.AgPredictor(ppi_model_path, alphafold_path, tfold_model_path)
# Prepare antibody-antigen sequences
msa_path = 'examples/msa.files/8df5_R.a3m'
with open(msa_path) as f:
    msa, deletion_matrix = tfold.protein.parser.parse_a3m(f.read())
# if you don't have msa, you can use the following code to generate msa
#from projects.tfold_ag.gen_msa import generate_msa
#with open('8df5_R.fasta', 'w') as f:
# f.write('>8df5_R\nMGILPSPGMPALLSLVSLLSVLLMGCVAETGTRFPNITNLCPFGEVFNATRFASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLNDLCFTNVYADSFVIRGDEVRQIAPGQTGNIADYNYKLPDDFTGCVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAGSTPCNGVKGFNCYFPLQSYGFQPTYGVGYQPYRVVVLSFELLHAPATVCGPKKSTHHHHHHHHGGSSGLNDIFEAQKIEWHE')
#generate_msa('8df5_R.fasta', output_dir='examples/msa.files/')
#with open('examples/msa.files/8df5_R.a3m') as f:
# msa, deletion_matrix = tfold.protein.parser.parse_a3m(f.read())
data = [
{
"id": "H",
"sequence": "QVQLVQSGAEVKKPGASVKVSCKASGYPFTSYGISWVRQAPGQGLEWMGWISTYNGNTNYAQKFQGRVTMTTDTSTTTGYMELRRLRSDDTAVYYCARDYTRGAWFGESLIGGFDNWGQGTLVTVSS"
},
{
"id": "L",
"sequence": "EIVLTQSPGTLSLSPGERATLSCRASQTVSSTSLAWYQQKPGQAPRLLIYGASSRATGIPDRFSGSGSGTDFTLTISRLEPEDFAVYYCQQHDTSLTFGGGTKVEIK"
},
{
"id": "A",
"sequence": "MGILPSPGMPALLSLVSLLSVLLMGCVAETGTRFPNITNLCPFGEVFNATRFASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLNDLCFTNVYADSFVIRGDEVRQIAPGQTGNIADYNYKLPDDFTGCVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAGSTPCNGVKGFNCYFPLQSYGFQPTYGVGYQPYRVVVLSFELLHAPATVCGPKKSTHHHHHHHHGGSSGLNDIFEAQKIEWHE",
"msa": msa,
"deletion_matrix": deletion_matrix
}
]
output_path = '8df5_A_B_R.pdb'
model.infer_pdb(data, output_path)
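The antigen record above carries an `msa` and a `deletion_matrix` parsed from an A3M file. To show what these two objects typically contain, here is a minimal sketch of the usual A3M convention, in which lowercase letters mark insertions relative to the query. This is not tFold's own parser: `parse_a3m_sketch` is a hypothetical stand-in, and the return format of `tfold.protein.parser.parse_a3m` may differ in detail.

```python
def parse_a3m_sketch(a3m_text):
    """Split A3M text into aligned sequences and per-position deletion counts."""
    sequences, current = [], []
    for line in a3m_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            if current:
                sequences.append("".join(current))
            current = []
        else:
            current.append(line)
    if current:
        sequences.append("".join(current))

    msa, deletion_matrix = [], []
    for raw in sequences:
        aligned, deletions, pending = [], [], 0
        for char in raw:
            if char.islower():
                pending += 1  # insertion relative to the query
            else:
                aligned.append(char)
                deletions.append(pending)  # insertions just before this column
                pending = 0
        msa.append("".join(aligned))
        deletion_matrix.append(deletions)
    return msa, deletion_matrix


# Toy alignment: the last hit has two inserted residues before column 3.
toy_a3m = ">query\nACDE\n>hit1\nA-DE\n>hit2\nACgtDE\n"
toy_msa, toy_deletions = parse_a3m_sketch(toy_a3m)
```

After parsing, every row of the alignment has the length of the query sequence, so the rows can be stacked column by column.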
import torch
import tfold
# Download the pre-trained model
ppi_model_path = tfold.model.esm_ppi_650m_tcr()
tfold_model_path = tfold.model.tfold_tcr_trunk()
# Load the model
model = tfold.deploy.TCRPredictor.restore_from_module(ppi_model_path, tfold_model_path)
# Prepare TCR sequences
data =[
{
"id": "B",
"sequence": "NAGVTQTPKFQVLKTGQSMTLQCSQDMNHEYMSWYRQDPGMGLRLIHYSVGAGITDQGEVPNGYNVSRSTTEDFPLRLLSAAPSQTSVYFCASSYSIRGSRGEQFFGPGTRLTVL"
},
{
"id": "A",
"sequence": "AQEVTQIPAALSVPEGENLVLNCSFTDSAIYNLQWFRQDPGKGLTSLLLIQSSQREQTSGRLNASLDKSSGRSTLYIAASQPGDSATYLCAVTNQAGTALIFGKGTTLSVSS"
}
]
output_path = '6zkw_E_D_A_B_C.pdb'
model.infer_pdb(data, output_path)
import torch
import tfold
# Download the pre-trained model
ppi_model_path = tfold.model.esm_ppi_650m_tcr()
tfold_model_path = tfold.model.tfold_tcr_pmhc_trunk()
# Load the model
model = tfold.deploy.TCRpMHCPredictor(ppi_model_path, tfold_model_path)
# Prepare TCR-pMHC sequences
data =[
{
"id": "B",
"sequence": "NAGVTQTPKFQVLKTGQSMTLQCSQDMNHEYMSWYRQDPGMGLRLIHYSVGAGITDQGEVPNGYNVSRSTTEDFPLRLLSAAPSQTSVYFCASSYSIRGSRGEQFFGPGTRLTVL"
},
{
"id": "A",
"sequence": "AQEVTQIPAALSVPEGENLVLNCSFTDSAIYNLQWFRQDPGKGLTSLLLIQSSQREQTSGRLNASLDKSSGRSTLYIAASQPGDSATYLCAVTNQAGTALIFGKGTTLSVSS"
},
{
"id": "M",
"sequence": "GSHSLKYFHTSVSRPGRGEPRFISVGYVDDTQFVRFDNDAASPRMVPRAPWMEQEGSEYWDRETRSARDTAQIFRVNLRTLRGYYNQSEAGSHTLQWMHGCELGPDGRFLRGYEQFAYDGKDYLTLNEDLRSWTAVDTAAQISEQKSNDASEAEHQRAYLEDTCVEWLHKYLEKGKETLLHLEPPKTHVTHHPISDHEATLRCWALGFYPAEITLTWQQDGEGHTQDTELVETRPAGDGTFQKWAAVVVPSGEEQRYTCHVQHEGLPEPVTLRWKP"
},
{
"id": "N",
"sequence": "MIQRTPKIQVYSRHPAENGKSNFLNCYVSGFHPSDIEVDLLKNGERIEKVEHSDLSFSKDWSFYLLYYTEFTPTEKDEYACRVNHVTLSQPKIVKWDRDM"
},
{
"id": "P",
"sequence": "RLPAKAPLL"
}
]
output_path = '6zkw_E_D_A_B_C.pdb'
model.infer_pdb(data, output_path)
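Because every predictor above takes a list of records with an "id" and a "sequence", it can help to check the list before starting a long run. The sketch below is one possible pre-flight check. The function name `check_chains` is hypothetical, and rejecting letters outside the 20 standard amino acids is an assumption; tFold may handle such residues differently.

```python
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def check_chains(data):
    """Return a list of human-readable problems found in chain records."""
    problems, seen = [], set()
    for index, record in enumerate(data):
        chain_id = record.get("id")
        sequence = record.get("sequence", "")
        if not chain_id:
            problems.append(f"record {index}: missing 'id'")
        elif chain_id in seen:
            problems.append(f"record {index}: duplicate id {chain_id!r}")
        else:
            seen.add(chain_id)
        unknown = sorted(set(sequence) - STANDARD_AA)
        if not sequence:
            problems.append(f"record {index}: empty sequence")
        elif unknown:
            problems.append(f"record {index}: non-standard residues {unknown}")
    return problems


# Toy records: the first list is clean, the second has three problems.
good = [{"id": "B", "sequence": "NAGVTQ"}, {"id": "P", "sequence": "RLPAKAPLL"}]
bad = [{"id": "B", "sequence": "NAGX"}, {"id": "B", "sequence": ""}]

good_problems = check_chains(good)
bad_problems = check_chains(bad)
```

An empty result means the records pass these basic checks and can be handed to `model.infer_pdb`.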
If you use tFold in your research, please cite our paper:
@article{wu2024fast,
title={Fast and accurate modeling and design of antibody-antigen complex using tFold},
author={Wu, Fandi and Zhao, Yu and Wu, Jiaxiang and Jiang, Biaobin and He, Bing and Huang, Longkai and Qin, Chenchen and Yang, Fan and Huang, Ningqiao and Xiao, Yang and others},
journal={bioRxiv},
pages={2024--02},
year={2024},
publisher={Cold Spring Harbor Laboratory}
}
and the earlier version of tFold-Ab:
@article{wu2022tfold,
title={tFold-ab: fast and accurate antibody structure prediction without sequence homologs},
author={Wu, Jiaxiang and Wu, Fandi and Jiang, Biaobin and Liu, Wei and Zhao, Peilin},
journal={bioRxiv},
pages={2022--11},
year={2022},
publisher={Cold Spring Harbor Laboratory}
}
Our preprint on tFold-TCR is coming soon.