GitHub - TencentAI4S/tfold: open source code for Tencent tFold

This package provides an implementation of the inference pipeline of tFold, including tFold-Ab, tFold-Ag and tFold-TCR.

We also provide:

A pre-trained protein language model, ESM-PPI, which extracts both intra-chain and inter-chain information from protein complexes to generate features for downstream tasks.
The test set constructed in our paper.
A library of human germline antibody frameworks to guide antibody generation with tFold-Ag.

Any publication that discloses findings arising from using this source code or the model parameters should cite the tFold paper.

Please also refer to the Supplementary Information for a detailed description of the method.

If you have any questions, please contact the tFold team at [email protected].

For business partnership opportunities, please contact [email protected].

Main models

Shorthand	Dataset	Description
ESM-PPI	UniRef50, PDB, PPI, Antibody	General-purpose protein language model, further pre-trained using ESM2 with 650M parameters. Can be used to predict multimer structure directly from individual sequences
ESM-PPI-tcr	UniRef50, PDB, PPI, Antibody, TCR, peptide	General-purpose protein language model, further pre-trained using ESM2 with 650M parameters. Can be used to predict multimer structure directly from individual sequences
tFold-Ab	SAbDab (before 31 December 2021)	SOTA antibody structure prediction model. MSA-free prediction with ESM-PPI
tFold-Ag	SAbDab (before 31 December 2021)	SOTA antibody-antigen complex structure prediction model. Can be used for virtual screening of binding antibodies and antibody design
tFold-TCR	STCRDab (before 31 December 2021)	SOTA TCR-complex structure prediction model. MSA-free prediction with ESM-PPI. Can be used for TCR design

Overview

Main Results

Unbound Antibody Prediction (SAbDab-22H2-Ab)

Model	RMSD-CDR-H3 (Å)	DockQ
AlphaFold-Multimer	3.07	0.773
Chai-1	3.25	0.772
IgFold	3.37	0.715
DeepAb	3.73	0.721
ImmuneBuilder	3.46	0.749
tFold-Ab	3.01	0.770
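The RMSD columns report the root-mean-square deviation between predicted and native atom positions (conventionally in Å). A minimal sketch of the formula, assuming the two coordinate lists are already superimposed (for CDR loops this is usually done on the framework first) and matched atom by atom; the function name rmsd is illustrative and not part of tFold:

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(pred: Sequence[Coord], native: Sequence[Coord]) -> float:
    """RMSD between two pre-aligned coordinate lists with one-to-one atom matching."""
    if len(pred) != len(native) or not pred:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    # Sum of squared Euclidean distances over matched atom pairs.
    sq_sum = sum(
        (px - nx) ** 2 + (py - ny) ** 2 + (pz - nz) ** 2
        for (px, py, pz), (nx, ny, nz) in zip(pred, native)
    )
    return math.sqrt(sq_sum / len(pred))


# Two matched atoms, each displaced by 3 Å along one axis
print(rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(3.0, 0.0, 0.0), (1.0, 3.0, 0.0)]))  # 3.0
```

No superposition is performed here; in practice a Kabsch alignment on the framework residues would precede this step.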

Unbound Nanobody Prediction (SAbDab-22H2-Nano)

Model	RMSD-CDR-H3 (Å)
AlphaFold	3.96
Chai-1	3.57
IgFold	4.64
ImmuneBuilder	3.79
ESMFold	3.80
OmegaFold	3.63
tFold-Ab	3.57

Antibody-Antigen Complex Prediction (SAbDab-22H2-AbAg)

Model	DockQ	Success Rate (%)
AlphaFold-Multimer	0.158	18.2
AlphaFold-3	0.257	32.3
tFold-Ag	0.217	28.3
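The Success Rate column is the percentage of test targets whose predicted complex counts as a successful docking. A minimal sketch, assuming the common CAPRI-derived convention that a prediction succeeds when its DockQ is at least 0.23 (the "acceptable" quality threshold); the exact cutoff used in the paper should be checked in the Supplementary Information, and the DockQ values below are made up for illustration:

```python
from typing import Iterable

# Assumed cutoff: DockQ >= 0.23 corresponds to CAPRI "acceptable" quality or better.
ACCEPTABLE_DOCKQ = 0.23


def success_rate(dockq_scores: Iterable[float], threshold: float = ACCEPTABLE_DOCKQ) -> float:
    """Percentage of targets whose DockQ score reaches the threshold."""
    scores = list(dockq_scores)
    if not scores:
        raise ValueError("need at least one DockQ score")
    hits = sum(1 for s in scores if s >= threshold)
    return 100.0 * hits / len(scores)


# Illustrative (not real) per-target DockQ values: 2 of 4 reach 0.23
print(success_rate([0.05, 0.23, 0.61, 0.10]))  # 50.0
```

The DockQ column, by contrast, is an average over targets, so a model can have a lower mean DockQ but a comparable success rate if its failures are mild.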

Unliganded TCR Prediction (STCRDab-22-TCR)

Model	RMSD-CDR-A3 (Å)	RMSD-CDR-B3 (Å)	DockQ
AlphaFold-Multimer	1.89	1.62	0.785
AlphaFold-3	1.80	1.50	0.769
TCRModel2	1.77	1.52	0.795
tFold-TCR	1.66	1.35	0.795

Unbound pMHC Prediction (STCRDab-22-pMHC)

Model	DockQ
AlphaFold-Multimer	0.927
AlphaFold-3	0.926
tFold-TCR	0.908

TCR-pMHC Complex Prediction (STCRDab-22-TCR_pMHC)

Model	DockQ	RMSD (Å)	Success Rate (%)
AlphaFold-Multimer	0.490	3.601	83.3
AlphaFold-3	0.496	3.094	72.2
tFold-TCR	0.496	2.413	94.4

Installation

Clone the package

git clone https://github.com/TencentAI4S/tfold.git
cd tfold

Prepare the environment

Please follow the instructions in INSTALL.md to set up the environment.

Download pre-trained weights under the params directory (optional)

Note:

If you download the weights into the ./checkpoints folder, you can proceed directly to the following steps.

If you don't download the weights, they will be downloaded automatically when you run the code.

Download sequence databases for MSA searching (only needed for tFold-Ag)

sh scripts/setup_database.sh

Dataset

Test set constructed in our paper

Human germline antibody frameworks library to guide antibody generation

The repository can be used in two ways: directly from the cloned source code, or after pip installation.

Quick Start

You can use a FASTA file (--fasta) or a JSON file (--json) as input.
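If your sequences start out in Python, a FASTA input can be written with a few lines of code. The helpers below are hypothetical (not part of tFold): they assume one record per chain with the chain ID as the header line (for example >H and >L), mirroring the id/sequence records used by the Python API later in this README. Compare against the files in examples/fasta.files/ to confirm the header convention tFold expects.

```python
from pathlib import Path
from typing import Dict, List, Optional


def write_fasta(records: List[Dict[str, str]], path: str) -> None:
    """Write chain records ({"id": ..., "sequence": ...}) to a FASTA file.

    Hypothetical helper: the chain ID becomes the header line and each
    sequence is written unwrapped on a single line.
    """
    lines: List[str] = []
    for rec in records:
        lines.append(f">{rec['id']}")
        lines.append(rec["sequence"])
    Path(path).write_text("\n".join(lines) + "\n")


def read_fasta(path: str) -> Dict[str, str]:
    """Read a FASTA file back into an {id: sequence} dictionary."""
    seqs: Dict[str, str] = {}
    current: Optional[str] = None
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].strip()
            seqs[current] = ""
        elif current is not None:
            # Concatenate wrapped sequence lines belonging to the same record.
            seqs[current] += line
    return seqs
```

For example, write_fasta(data, "my_antibody.fasta") followed by --fasta my_antibody.fasta on the command line.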

tFold-Ab

Example 1: predicting the structure of an antibody and a nanobody using tFold-Ab

# antibody
python projects/tfold_ab/predict.py --fasta examples/fasta.files/7ox3_A_B.fasta --output examples/predictions/7ox3_A_B.pdb

# nanobody
python projects/tfold_ab/predict.py --fasta examples/fasta.files/7ocj_B.fasta --output examples/predictions/7ocj_B.pdb

tFold-Ag

Example 1: predicting the structure of an antibody-antigen complex and a nanobody-antigen complex with a pre-computed MSA

# antibody-antigen complex
python projects/tfold_ag/predict.py --fasta examples/fasta.files/8df5_A_B_R.fasta --msa examples/msa.files/8df5_R.a3m --output examples/predictions/8df5_A_B_R.pdb

# nanobody-antigen complex
python projects/tfold_ag/predict.py --fasta examples/fasta.files/7sai_C_NA_A.fasta --msa examples/msa.files/7sai_A.a3m --output examples/predictions/7sai_C_NA_A.pdb

Example 2: generating an MSA for structure prediction using MMseqs2

python projects/tfold_ag/gen_msa.py --fasta_file=examples/fasta.files/PD-1.fasta --output_dir=examples/PD-1

Example 3: predicting the structure of an antibody-antigen complex and a nanobody-antigen complex with inter-chain features

# generate inter-chain feature (ppi)
python projects/tfold_ag/gen_icf_feat.py --pid_fpath=examples/fasta.files/8df5_A_B_R.fasta --fas_dpath=examples/fasta.files/ --pdb_dpath=examples/pdb.files.native/ --icf_dpath=examples/icf.files.ppi --icf_type=ppi

# antibody-antigen complex prediction with inter-chain feature
python projects/tfold_ag/predict.py --fasta examples/fasta.files/8df5_A_B_R.fasta --msa examples/msa.files/8df5_R.a3m --icf examples/icf.files.ppi/8df5_A_B_R.pt --output examples/predictions/8df5_A_B_R.pdb --model_version ppi

Example 4: CDR loop design with tFold-Ag using a pre-computed MSA

python projects/tfold_ag/predict.py --fasta examples/fasta.files/7urf_O_P_A.cdrh3.fasta --msa examples/msa.files/7urf_A.a3m --output examples/predictions/7urf_O_P_A.pdb
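Example 4 reads its design input from examples/fasta.files/7urf_O_P_A.cdrh3.fasta. Preparing a similar input for another antibody starts with locating CDR-H3. The sketch below uses a rough motif heuristic (CDR-H3 taken as the residues between the conserved Cys and the following W-G-x-G motif, close to the IMGT definition) and replaces them with a placeholder. Both the heuristic and the "X" placeholder are assumptions: use a proper numbering tool such as ANARCI for real work, and check the example file for the masking convention tFold-Ag actually expects.

```python
import re
from typing import Optional, Tuple

# Heuristic: CDR-H3 lies between the conserved Cys and the W-G-x-G framework-4 motif.
# The lazy {3,30}? quantifier stops at the first W-G-x-G after the Cys.
CDRH3_PATTERN = re.compile(r"C([A-Z]{3,30}?)WG.G")


def find_cdrh3(heavy_seq: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) indices of the CDR-H3 loop, or None if no motif matches."""
    match = CDRH3_PATTERN.search(heavy_seq)
    if match is None:
        return None
    return match.start(1), match.end(1)


def mask_cdrh3(heavy_seq: str, mask_char: str = "X") -> str:
    """Replace CDR-H3 residues with a placeholder (hypothetical design input)."""
    span = find_cdrh3(heavy_seq)
    if span is None:
        raise ValueError("CDR-H3 motif not found")
    start, end = span
    return heavy_seq[:start] + mask_char * (end - start) + heavy_seq[end:]
```

For the heavy chain used in the Python examples later in this README, the heuristic picks out ARDYTRGAWFGESLIGGFDN.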

tFold-TCR

Example 1: predicting the structures of a TCR, a pMHC complex, and a TCR-pMHC complex

# TCR
python projects/tfold_tcr/predict.py --json examples/tcr_example.json --output examples/predictions/ --model_version TCR

# pMHC complex
python projects/tfold_tcr/predict.py --json examples/pmhc_example.json --output examples/predictions/ --model_version pMHC

# TCR-pMHC complex
python projects/tfold_tcr/predict.py --json examples/tcr_pmhc_example.json --output examples/predictions/ --model_version Complex

Quick Start with Pip Installation

Install directly from PyPI:

  pip install tfold

or install from source code:

  cd tfold
  pip install .

After installing with pip, you can load and use a pre-trained model as follows:

Extract cross-chain information using ESM-PPI

import torch
import tfold

# Download the pre-trained model
model_path = tfold.model.esm_ppi_650m_ab()

# Load the model
model = tfold.model.PPIModel.restore(model_path)

# Prepare antibody sequences (can be single or multiple sequences)
data = [
        'QVQLVQSGAEVKKPGASVKVSCKASGYPFTSYGISWVRQAPGQGLEWMGWISTYNGNTNYAQKFQGRVTMTTDTSTTTGYMELRRLRSDDTAVYYCARDYTRGAWFGESLIGGFDNWGQGTLVTVSS', # Heavy chain
        'EIVLTQSPGTLSLSPGERATLSCRASQTVSSTSLAWYQQKPGQAPRLLIYGASSRATGIPDRFSGSGSGTDFTLTISRLEPEDFAVYYCQQHDTSLTFGGGTKVEIK' # Light chain
]
ppi_output = model(data)

The output is a dictionary with the keys 'labl', 'mask', 'pred', 'sfea', and 'pfea'.

Each key represents a different component of the model's output:

'labl': Original token indices from tokn_mat_orig that represent the unmasked (original) amino acid sequences. These serve as ground truth labels for training.
'mask': Binary mask tensor indicating which positions were masked during inference; it marks the positions to be predicted in the masked language modeling task.
'pred': The model's logits (raw prediction scores before softmax) for each position in the sequence. These are the actual predictions made by the model.
'sfea': Single-residue features/embeddings extracted from the final layer representations. These are residue-level embeddings with dimension self.c_s for each amino acid position.
'pfea': Pair features representing interactions between residues, derived from attention weights. These capture the relationships between each pair of residues in the sequence with dimension self.c_z.
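The descriptions above imply how labl, mask, and pred relate: taking the argmax of pred at the masked positions and comparing it with labl gives the masked-token recovery rate. The sketch below shows that computation on plain Python lists. It assumes a single sequence with pred shaped [length][vocab] and labl/mask shaped [length]; the real outputs are tensors whose exact shapes (for example, a leading batch dimension) should be checked before adapting it, e.g. by indexing out one sequence and calling .tolist().

```python
from typing import List, Optional, Sequence


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (the first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def masked_recovery(pred: List[List[float]], labl: List[int], mask: List[int]) -> Optional[float]:
    """Fraction of masked positions where the top-scoring token equals the label.

    Assumes pred is [length][vocab] logits and labl/mask are [length] lists.
    Returns None when no position is masked.
    """
    masked = [i for i, m in enumerate(mask) if m]
    if not masked:
        return None
    correct = sum(1 for i in masked if argmax(pred[i]) == labl[i])
    return correct / len(masked)
```

Because softmax is monotonic, taking the argmax of the raw logits gives the same prediction as taking it after softmax.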

Predict antibody structures with tFold-Ab

import torch
import tfold

# Download the pre-trained model
ppi_model_path = tfold.model.esm_ppi_650m_ab()
tfold_model_path = tfold.model.tfold_ab_trunk()

# Load the model
model = tfold.deploy.PLMComplexPredictor.restore_from_module(ppi_model_path, tfold_model_path)

# Prepare antibody sequences (can be single or multiple sequences)
data =[
        {
          "sequence": 'QVQLVQSGAEVKKPGASVKVSCKASGYPFTSYGISWVRQAPGQGLEWMGWISTYNGNTNYAQKFQGRVTMTTDTSTTTGYMELRRLRSDDTAVYYCARDYTRGAWFGESLIGGFDNWGQGTLVTVSS', # Heavy chain
          "id": 'H'
          },
        {
          "sequence": 'EIVLTQSPGTLSLSPGERATLSCRASQTVSSTSLAWYQQKPGQAPRLLIYGASSRATGIPDRFSGSGSGTDFTLTISRLEPEDFAVYYCQQHDTSLTFGGGTKVEIK', # Light chain
          "id": 'L'
          }]
output_path = '8df5_A_B_R.pdb'

model.infer_pdb(data, output_path)
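infer_pdb writes the prediction to output_path as a PDB file. A quick sanity check is to count residues per chain and compare them with the input sequence lengths. The sketch below is a minimal reader for fixed-column PDB ATOM records (chain ID in column 22, residue number in columns 23-26, insertion code in column 27); it is not part of tFold, and the chain IDs it reports depend on how tFold labels chains in its output.

```python
from typing import Dict, Set


def residues_per_chain(pdb_text: str) -> Dict[str, int]:
    """Count distinct residues per chain from ATOM records in PDB-format text."""
    residues: Dict[str, Set[str]] = {}
    for line in pdb_text.splitlines():
        # Keep only ATOM records long enough to hold the residue fields.
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        chain_id = line[21]    # column 22
        res_key = line[22:27]  # columns 23-26 (residue number) + 27 (insertion code)
        residues.setdefault(chain_id, set()).add(res_key)
    return {chain: len(keys) for chain, keys in residues.items()}
```

For example, residues_per_chain(open(output_path).read()) should return one entry per input chain.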

Predict the structure of a antibody-antigen complex with tFold-Ag

import torch
import tfold

# Download the pre-trained ESM-PPI model
ppi_model_path = tfold.model.esm_ppi_650m_ab()
# Download the pre-trained AlphaFold model
alphafold_path = tfold.model.alpha_fold_4_ptm()
# Download the base model for tFold-Ag
tfold_model_path = tfold.model.tfold_ag_base()

# Alternatively, download the PPI version of tFold-Ag
# tfold_model_path = tfold.model.tfold_ag_ppi()

# Load the model
model = tfold.deploy.AgPredictor(ppi_model_path, alphafold_path, tfold_model_path)

# Prepare antibody-antigen sequences
msa_path = 'examples/msa.files/8df5_R.a3m'
with open(msa_path) as f:
   msa, deletion_matrix = tfold.protein.parser.parse_a3m(f.read())

# If you don't have an MSA, you can generate one with the following code
#from projects.tfold_ag.gen_msa import generate_msa
#with open('8df5_R.fasta', 'w') as f:
#    f.write('>8df5_R\nMGILPSPGMPALLSLVSLLSVLLMGCVAETGTRFPNITNLCPFGEVFNATRFASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLNDLCFTNVYADSFVIRGDEVRQIAPGQTGNIADYNYKLPDDFTGCVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAGSTPCNGVKGFNCYFPLQSYGFQPTYGVGYQPYRVVVLSFELLHAPATVCGPKKSTHHHHHHHHGGSSGLNDIFEAQKIEWHE')
#generate_msa('8df5_R.fasta', output_dir='examples/msa.files/')
#with open('examples/msa.files/8df5_R.a3m') as f:
#   msa, deletion_matrix = tfold.protein.parser.parse_a3m(f.read())


data = [
         {
             "id": "H",
             "sequence": "QVQLVQSGAEVKKPGASVKVSCKASGYPFTSYGISWVRQAPGQGLEWMGWISTYNGNTNYAQKFQGRVTMTTDTSTTTGYMELRRLRSDDTAVYYCARDYTRGAWFGESLIGGFDNWGQGTLVTVSS"
         },
         {
             "id": "L",
             "sequence": "EIVLTQSPGTLSLSPGERATLSCRASQTVSSTSLAWYQQKPGQAPRLLIYGASSRATGIPDRFSGSGSGTDFTLTISRLEPEDFAVYYCQQHDTSLTFGGGTKVEIK"
         },
         {
             "id": "A",
             "sequence": "MGILPSPGMPALLSLVSLLSVLLMGCVAETGTRFPNITNLCPFGEVFNATRFASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLNDLCFTNVYADSFVIRGDEVRQIAPGQTGNIADYNYKLPDDFTGCVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAGSTPCNGVKGFNCYFPLQSYGFQPTYGVGYQPYRVVVLSFELLHAPATVCGPKKSTHHHHHHHHGGSSGLNDIFEAQKIEWHE",
             "msa": msa,
             "deletion_matrix": deletion_matrix
         }
        ]

output_path = '8df5_A_B_R.pdb'

model.infer_pdb(data, output_path)
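The tFold-Ag example reads its MSA with tfold.protein.parser.parse_a3m, which returns the aligned sequences and a deletion matrix. The sketch below shows what such a parser typically does, modeled on the A3M handling in AlphaFold's data pipeline: lowercase letters are insertions relative to the query, so they are counted into the deletion matrix and then stripped. It is an illustration, not tFold's implementation, and details such as comment handling may differ.

```python
from typing import List, Tuple


def parse_a3m_sketch(a3m_text: str) -> Tuple[List[str], List[List[int]]]:
    """Return (aligned sequences, deletion matrix) from A3M-formatted text."""
    sequences: List[str] = []
    for line in a3m_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            sequences.append("")
        elif sequences:
            sequences[-1] += line

    deletion_matrix: List[List[int]] = []
    for seq in sequences:
        deletion_vec: List[int] = []
        deletions = 0
        for ch in seq:
            if ch.islower():
                # Insertion relative to the query: count it, emit no column.
                deletions += 1
            else:
                # Match or gap column: record the insertions seen before it.
                deletion_vec.append(deletions)
                deletions = 0
        deletion_matrix.append(deletion_vec)

    # Strip lowercase insertions so every row lines up with the query columns.
    aligned = ["".join(ch for ch in seq if not ch.islower()) for seq in sequences]
    return aligned, deletion_matrix
```

After stripping, every aligned row has the query's length, which is what allows the MSA and deletion matrix to be stacked into fixed-width arrays.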

Predict TCR structures using tFold-TCR

import torch
import tfold

# Download the pre-trained model
ppi_model_path = tfold.model.esm_ppi_650m_tcr()
tfold_model_path = tfold.model.tfold_tcr_trunk()

# Load the model
model = tfold.deploy.TCRPredictor.restore_from_module(ppi_model_path, tfold_model_path)

# Prepare TCR sequences
data =[
            {
                "id": "B",
                "sequence": "NAGVTQTPKFQVLKTGQSMTLQCSQDMNHEYMSWYRQDPGMGLRLIHYSVGAGITDQGEVPNGYNVSRSTTEDFPLRLLSAAPSQTSVYFCASSYSIRGSRGEQFFGPGTRLTVL"
            },
            {
                "id": "A",
                "sequence": "AQEVTQIPAALSVPEGENLVLNCSFTDSAIYNLQWFRQDPGKGLTSLLLIQSSQREQTSGRLNASLDKSSGRSTLYIAASQPGDSATYLCAVTNQAGTALIFGKGTTLSVSS"
            }
        ]

output_path = '6zkw_E_D_A_B_C.pdb'

model.infer_pdb(data, output_path)

Predict TCR-pMHC structures using tFold-TCR

import torch
import tfold

# Download the pre-trained model
ppi_model_path = tfold.model.esm_ppi_650m_tcr()
tfold_model_path = tfold.model.tfold_tcr_pmhc_trunk()

# Load the model
model = tfold.deploy.TCRpMHCPredictor(ppi_model_path, tfold_model_path)

# Prepare TCR-pMHC sequences
data =[
            {
                "id": "B",
                "sequence": "NAGVTQTPKFQVLKTGQSMTLQCSQDMNHEYMSWYRQDPGMGLRLIHYSVGAGITDQGEVPNGYNVSRSTTEDFPLRLLSAAPSQTSVYFCASSYSIRGSRGEQFFGPGTRLTVL"
            },
            {
                "id": "A",
                "sequence": "AQEVTQIPAALSVPEGENLVLNCSFTDSAIYNLQWFRQDPGKGLTSLLLIQSSQREQTSGRLNASLDKSSGRSTLYIAASQPGDSATYLCAVTNQAGTALIFGKGTTLSVSS"
            },
            {
                "id": "M",
                "sequence": "GSHSLKYFHTSVSRPGRGEPRFISVGYVDDTQFVRFDNDAASPRMVPRAPWMEQEGSEYWDRETRSARDTAQIFRVNLRTLRGYYNQSEAGSHTLQWMHGCELGPDGRFLRGYEQFAYDGKDYLTLNEDLRSWTAVDTAAQISEQKSNDASEAEHQRAYLEDTCVEWLHKYLEKGKETLLHLEPPKTHVTHHPISDHEATLRCWALGFYPAEITLTWQQDGEGHTQDTELVETRPAGDGTFQKWAAVVVPSGEEQRYTCHVQHEGLPEPVTLRWKP"
            },
            {
                "id": "N",
                "sequence": "MIQRTPKIQVYSRHPAENGKSNFLNCYVSGFHPSDIEVDLLKNGERIEKVEHSDLSFSKDWSFYLLYYTEFTPTEKDEYACRVNHVTLSQPKIVKWDRDM"
            },
            {
                "id": "P",
                "sequence": "RLPAKAPLL"
            }
        ]

output_path = '6zkw_E_D_A_B_C.pdb'

model.infer_pdb(data, output_path)
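The structure predictors above (tFold-Ab, tFold-Ag, tFold-TCR) all take a list of chain records with id and sequence keys. Before a long inference run it can help to validate that list. The helper below is hypothetical (not part of tFold): it checks that chain IDs are unique single characters, as in every example here, and that sequences contain only the 20 standard amino-acid letters. Relax these rules if tFold accepts other inputs.

```python
from typing import Dict, List

STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def validate_chains(records: List[Dict]) -> List[str]:
    """Return a list of problems found in chain records (empty list means OK)."""
    problems: List[str] = []
    seen_ids = set()
    for idx, rec in enumerate(records):
        chain_id = rec.get("id")
        seq = rec.get("sequence")
        if not isinstance(chain_id, str) or len(chain_id) != 1:
            problems.append(f"record {idx}: chain id must be a single character, got {chain_id!r}")
        elif chain_id in seen_ids:
            problems.append(f"record {idx}: duplicate chain id {chain_id!r}")
        else:
            seen_ids.add(chain_id)
        if not isinstance(seq, str) or not seq:
            problems.append(f"record {idx}: missing or empty sequence")
            continue
        # Extra keys (e.g. "msa", "deletion_matrix") are ignored.
        bad = sorted(set(seq) - STANDARD_AA)
        if bad:
            problems.append(f"record {idx} ({chain_id}): non-standard residues {''.join(bad)}")
    return problems
```

For example, assert not validate_chains(data) can be placed just before model.infer_pdb(data, output_path).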

Citing tFold

If you use tFold in your research, please cite our paper:

@article{wu2024fast,
  title={Fast and accurate modeling and design of antibody-antigen complex using tFold},
  author={Wu, Fandi and Zhao, Yu and Wu, Jiaxiang and Jiang, Biaobin and He, Bing and Huang, Longkai and Qin, Chenchen and Yang, Fan and Huang, Ningqiao and Xiao, Yang and others},
  journal={bioRxiv},
  pages={2024--02},
  year={2024},
  publisher={Cold Spring Harbor Laboratory}
}

and the earlier version of tFold-Ab:

@article{wu2022tfold,
  title={tFold-ab: fast and accurate antibody structure prediction without sequence homologs},
  author={Wu, Jiaxiang and Wu, Fandi and Jiang, Biaobin and Liu, Wei and Zhao, Peilin},
  journal={bioRxiv},
  pages={2022--11},
  year={2022},
  publisher={Cold Spring Harbor Laboratory}
}

Our preprint on tFold-TCR is coming soon.
