Finding patch of conserved amino acid sites in 3D structure

CONSTRUCT

CONSTRUCT is a software tool that identifies functionally and structurally important sites in proteins by detecting amino acid sites that evolve under strong purifying selection and cluster together in the 3D structure.


✅ Prerequisites

🧪 Tested on macOS 15 (Sequoia, Apple Silicon - M3)

🖥️ Operating System

  • macOS (Apple Silicon — M1/M2/M3)
    • Minimum macOS 11 (Big Sur)
    • Not compatible with Intel Macs at the moment
  • Linux
    • Ubuntu 20.04 LTS or later (Debian-based systems supported)

🔢 Software Requirements

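| Tool     | Minimum Version | Recommended | Notes                                  |
|----------|-----------------|-------------|----------------------------------------|
| Python   | 3.7             | 3.10+       | Required for GUI (customtkinter)       |
| R        | 4.1.0           | 4.1.2+      | Ensures CRAN binary compatibility      |
| Homebrew | Latest          |             | Required for macOS dependency handling |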
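
Before running the installer, it can help to confirm that the basic tools are present. The snippet below is a minimal sketch and not part of CONSTRUCT: the helper names `python_ok` and `find_tools` are hypothetical, and it only checks the Python version and whether `Rscript` and `git` are on the PATH. It does not verify the R version.

```python
import shutil
import sys

MIN_PYTHON = (3, 7)  # minimum Python version listed in the requirements


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the given interpreter version meets the minimum."""
    return tuple(version_info[:2]) >= tuple(minimum)


def find_tools(names=("Rscript", "git")):
    """Map each command name to its location on PATH, or None if missing."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    print("Python >= 3.7:", python_ok())
    for name, path in find_tools().items():
        print(f"{name}: {path or 'not found'}")
```

A `None` entry from `find_tools` means the command was not found and should be installed first.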

Linux quick setup

sudo apt install r-base-core python3-pip git

📦 What gets installed

All of the following dependencies are installed automatically by the script:

Python

  • customtkinter

R

  • tidyverse
  • readr
  • dplyr
  • bio3d
  • msa (and its Bioconductor dependencies)

System tools

  • rate4site (compiled from source)
  • Compilation tools (gcc, make, etc.)

🚀 Installation

After downloading the files from the repository, run the installer script. It will verify and install all dependencies.

Download & Install

git clone https://github.com/Rcoppee/CONSTRUCT
cd CONSTRUCT/
bash install_packages.sh

🧪 Usage

To launch the program:

python3 CONSTRUCT.py

A graphical interface will open:

GUI preview

Just fill in the necessary fields and click "Run post-processing" to start the analysis.


Outputs

CONSTRUCT generates three result files:

  • spatial_rates.txt: a file containing the spatially correlated site-specific substitution rates of amino acid sites, ranked by their level of conservation.
  • log_files.txt: indicates whether a patch of conserved amino acid sites was detected in the protein structure (with the best window size and corresponding correlation strength).
  • color_conserved.pml: a file highlighting the top 10% of conserved amino acid sites (for use with PyMOL).
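
To make the "top 10% of conserved sites" idea concrete, here is a minimal sketch of how such a selection could be computed and turned into PyMOL commands. It is an illustration only, not CONSTRUCT's own code: the function names, the rounding rule (floor, with at least one site kept), the selection name and the color are all assumptions, and the real contents of color_conserved.pml may differ. Lower rates are treated as more conserved.

```python
def top_conserved(rates, percent=10):
    """Return residue numbers of the most conserved sites.

    `rates` maps residue number -> substitution rate. Lower rates are
    treated as more conserved. Ties are broken by residue number.
    """
    ranked = sorted(rates, key=lambda res: (rates[res], res))
    if not ranked:
        return []
    # Integer arithmetic avoids floating-point surprises; keep at least one site.
    n_keep = max(1, len(ranked) * percent // 100)
    return sorted(ranked[:n_keep])


def pymol_commands(residues, name="conserved", color="red"):
    """Build PyMOL `select` and `color` commands for the given residues."""
    selection = "+".join(str(r) for r in residues)
    return [f"select {name}, resi {selection}", f"color {color}, {name}"]
```

For example, with 20 sites the sketch keeps the 2 sites with the lowest rates and emits one `select` line and one `color` line for them.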

Examples

KEAP1

Analyzing the KEAP1 propeller domain

To analyze the KEAP1 propeller domain, two files must be submitted:

  1. A FASTA file: an alignment of orthologous sequences, with the reference sequence listed first.
  2. A PDB file: the Cartesian coordinates of the protein structure (this example uses PDB ID 2FLU).
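
The two input requirements for the FASTA file (an alignment, reference sequence first) can be checked before submitting it. The following is a minimal sketch under stated assumptions: `read_fasta` and `check_alignment` are hypothetical helpers, not part of CONSTRUCT, and "alignment" is taken to mean that all sequences, gaps included, have the same length.

```python
def read_fasta(text):
    """Parse FASTA text into an ordered list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_alignment(records, reference_id):
    """Return a list of problems; an empty list means the checks passed."""
    if not records:
        return ["no sequences found"]
    problems = []
    # The first record's header should start with the reference identifier.
    if not records[0][0].startswith(reference_id):
        problems.append("reference sequence is not listed first")
    # Aligned sequences (gaps included) should all have the same length.
    if len({len(seq) for _, seq in records}) > 1:
        problems.append("sequences have different lengths")
    return problems
```

Call `check_alignment(read_fasta(open(path).read()), "my_reference_id")`, where `path` and the identifier are your own values.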

Once both files are submitted, run the post-processing tool. When it finishes, it reports a score for the strength of the correlation in site-specific substitution rates; a value > 8 indicates the presence of a patch of conserved amino acid sites. In this example, using the side-chain orientation option for the Cartesian coordinates, you might observe a log score of 74.63. This is well above 8 and indicates a patch of conserved amino acid sites, which corresponds to the surface interface with Nrf2, the substrate of KEAP1.

To visualize this patch, you can use PyMOL:

  1. Open PyMOL.
  2. Go to "File" and select "Open."
  3. Load the generated color_conserved.pml file.
[Image]

/!\ If you move the PDB file after running CONSTRUCT, update the first line of color_conserved.pml, which reads: load {pdb_file_path}/my_pdb.pdb (where my_pdb.pdb is your PDB file). Alternatively, open the PDB file manually in PyMOL and then open color_conserved.pml.
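
If the PDB file has moved, the first line of the script can also be rewritten programmatically. This is a minimal sketch, assuming only what is stated above (that the first line is a `load` command); `repoint_pml` is a hypothetical helper, not something shipped with CONSTRUCT.

```python
from pathlib import Path


def repoint_pml(pml_path, new_pdb_path):
    """Rewrite the leading `load` line of a .pml script to a new PDB path."""
    pml = Path(pml_path)
    lines = pml.read_text().splitlines()
    # Refuse to touch the file unless it starts with the expected command.
    if not lines or not lines[0].startswith("load "):
        raise ValueError("first line is not a 'load' command")
    lines[0] = f"load {new_pdb_path}"
    pml.write_text("\n".join(lines) + "\n")
```

For instance, `repoint_pml("color_conserved.pml", "/new/location/my_pdb.pdb")` leaves every other line of the script unchanged.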

Domain-specific analysis

Let’s take DHPS as an example.

[Image]

In the initial analysis, no specific boundaries were set, and the following patch was identified:

[Image]

This patch is located on the DHPS domain of the protein.

If you want to focus on a specific part of the protein, such as the PPPK domain, you can define the boundaries for that domain, which in this case would be from position 1 to 386.
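
To see which residues of a structure fall inside a chosen pair of boundaries, a sketch like the one below can be used. It is an illustration under assumptions: it treats the boundaries as PDB residue numbers (CONSTRUCT may interpret positions differently), reads the residue number from columns 23-26 of `ATOM` records as defined by the PDB format, and `residues_in_range` is a hypothetical helper.

```python
def residues_in_range(pdb_text, start, end):
    """Return sorted residue numbers of ATOM records within [start, end]."""
    found = set()
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        try:
            res_num = int(line[22:26])  # resSeq field, columns 23-26
        except ValueError:
            continue  # skip malformed or truncated records
        if start <= res_num <= end:
            found.add(res_num)
    return sorted(found)
```

With the boundaries from the example, `residues_in_range(pdb_text, 1, 386)` lists the residues that would be kept for the PPPK domain under these assumptions.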

[Image]

After specifying these boundaries, a patch of conserved amino acid sites was specifically detected in the PPPK domain:

[Image]

Tutorial

A video tutorial has been created for easy installation and execution of CONSTRUCT: https://www.youtube.com/watch?v=bf-VYReZIeM&t=10s

Citation

CONSTRUCT: an algorithmic tool for identifying functional or structurally important regions in protein tertiary structure

Lucas Chivot, Noé Mathieux, Anna Cosson, Antoine Bridier-Nahmias, Loic Favennec, Jean-Christophe Gelly, Jérôme Clain, Romain Coppée
