last version of the manuscript

PRIDE-Archive · Feb 10, 2025 · 04c7b58
1 parent cb97484

Showing 3 changed files with 35 additions and 41 deletions.
diff --git a/paper/paper.bib b/paper/paper.bib
@@ -1,42 +1,36 @@
-@article{Perez-Riverol2022-ow,
-  title     = "The {PRIDE} database resources in 2022: a hub for mass
-               spectrometry-based proteomics evidences",
-  author    = "Perez-Riverol, Yasset and Bai, Jingwen and Bandla, Chakradhar
-               and Garc{\'\i}a-Seisdedos, David and Hewapathirana, Suresh and
-               Kamatchinathan, Selvakumar and Kundu, Deepti J and Prakash,
-               Ananth and Frericks-Zipper, Anika and Eisenacher, Martin and
-               Walzer, Mathias and Wang, Shengbo and Brazma, Alvis and
-               Vizca{\'\i}no, Juan Antonio",
-  abstract  = "The PRoteomics IDEntifications (PRIDE) database
-               (https://www.ebi.ac.uk/pride/) is the world's largest data
-               repository of mass spectrometry-based proteomics data. PRIDE is
-               one of the founding members of the global ProteomeXchange (PX)
-               consortium and an ELIXIR core data resource. In this manuscript,
-               we summarize the developments in PRIDE resources and related
-               tools since the previous update manuscript was published in
-               Nucleic Acids Research in 2019. The number of submitted datasets
-               to PRIDE Archive (the archival component of PRIDE) has reached
-               on average around 500 datasets per month during 2021. In
-               addition to continuous improvements in PRIDE Archive data
-               pipelines and infrastructure, the PRIDE Spectra Archive has been
-               developed to provide direct access to the submitted mass spectra
-               using Universal Spectrum Identifiers. As a key point, the file
-               format MAGE-TAB for proteomics has been developed to enable the
-               improvement of sample metadata annotation. Additionally, the
-               resource PRIDE Peptidome provides access to aggregated
-               peptide/protein evidences across PRIDE Archive. Furthermore, we
-               will describe how PRIDE has increased its efforts to reuse and
-               disseminate high-quality proteomics data into other added-value
-               resources such as UniProt, Ensembl and Expression Atlas.",
-  journal   = "Nucleic Acids Res.",
-  publisher = "Oxford University Press (OUP)",
-  volume    =  50,
-  number    = "D1",
-  pages     = "D543--D552",
-  month     =  jan,
-  year      =  2022,
-  copyright = "https://creativecommons.org/licenses/by/4.0/",
-  language  = "en"
+@article{Perez-Riverol2025-mo,
+  title    = "The {PRIDE} database at 20 years: 2025 update",
+  author   = "Perez-Riverol, Yasset and Bandla, Chakradhar and Kundu, Deepti J
+              and Kamatchinathan, Selvakumar and Bai, Jingwen and
+              Hewapathirana, Suresh and John, Nithu Sara and Prakash, Ananth
+              and Walzer, Mathias and Wang, Shengbo and Vizca{\'\i}no, Juan
+              Antonio",
+  abstract = "The PRoteomics IDEntifications (PRIDE) database
+              (https://www.ebi.ac.uk/pride/) is the world's leading mass
+              spectrometry (MS)-based proteomics data repository and one of the
+              founding members of the ProteomeXchange consortium. This
+              manuscript summarizes the developments in PRIDE resources and
+              related tools for the last three years. The number of submitted
+              datasets to PRIDE Archive (the archival component of PRIDE) has
+              reached on average around 534 datasets per month. This has been
+              possible thanks to continuous improvements in infrastructure such
+              as a new file transfer protocol for very large datasets (Globus),
+              a new data resubmission pipeline and an automatic dataset
+              validation process. Additionally, we will highlight novel
+              activities such as the availability of the PRIDE chatbot (based
+              on the use of open-source Large Language Models), and our work to
+              improve support for MS crosslinking datasets. Furthermore, we
+              will describe how we have increased our efforts to reuse,
+              reanalyze and disseminate high-quality proteomics data into
+              added-value resources such as UniProt, Ensembl and Expression
+              Atlas.",
+  journal  = "Nucleic Acids Res.",
+  volume   =  53,
+  number   = "D1",
+  pages    = "D543--D553",
+  month    =  jan,
+  year     =  2025,
+  language = "en"
 }
 
 @article{Dai2024-yc,

diff --git a/paper/paper.md b/paper/paper.md
@@ -35,11 +35,11 @@ bibliography: paper.bib
 
 # Summary
 
-The Proteomics Identification Database (PRIDE) [@Perez-Riverol2022-ow] is the world's largest repository for proteomics data and a founding member of ProteomeXchange [@Deutsch2023-mu]. Here, we introduce [`pridepy`](https://github.com/PRIDE-Archive/pridepy), a Python client designed to access PRIDE Archive data, including project metadata and file downloads. `pridepy` offers a flexible programmatic interface for searching, retrieving, and downloading data via the PRIDE REST API. This tool simplifies the integration of PRIDE datasets into bioinformatics pipelines, making it easier for researchers to handle large datasets programmatically.
+The Proteomics Identification Database (PRIDE) [@Perez-Riverol2025-mo] is the world's largest repository for proteomics data and a founding member of ProteomeXchange [@Deutsch2023-mu]. Here, we introduce [`pridepy`](https://github.com/PRIDE-Archive/pridepy), a Python client designed to access PRIDE Archive data, including project metadata and file downloads. `pridepy` offers a flexible programmatic interface for searching, retrieving, and downloading data via the PRIDE REST API. This tool simplifies the integration of PRIDE datasets into bioinformatics pipelines, making it easier for researchers to handle large datasets programmatically.
 
 # Statement of Need
 
-The PRIDE Archive hosts an extensive collection of proteomics data [@Perez-Riverol2022-ow], but manual access to this data can be inefficient and time-consuming. With the increasing demand for cloud-based [@Dai2024-yc] and HPC bioinformatics tools [@Mehta2023-og], command-line utilities that integrate seamlessly with the PRIDE API are becoming essential. pridepy addresses this need by enabling researchers to programmatically access PRIDE using Python, a widely adopted programming language. It facilitates efficient integration of datasets into automated workflows and supports large-scale data transfers via [Aspera](https://www.ibm.com/products/aspera), [Globus](https://www.globus.org/data-transfer), FTP, and HTTPS, making it ideal for scalable and reproducible pipelines. Unlike other tools such as ppx [@Fondrie2021-xk], which primarily support data downloads from ProteomeXchange databases using the HTTP protocol, pridepy provides advanced functionality by leveraging multiple protocols and the latest PRIDE API to access both public and private datasets.
+The PRIDE Archive hosts an extensive collection of proteomics data [@Perez-Riverol2025-mo], but manual access to this data can be inefficient and time-consuming. With the increasing demand for cloud-based [@Dai2024-yc] and HPC bioinformatics tools [@Mehta2023-og], command-line utilities that integrate seamlessly with the PRIDE API are becoming essential. pridepy addresses this need by enabling researchers to programmatically access PRIDE using Python, a widely adopted programming language. It facilitates efficient integration of datasets into automated workflows and supports large-scale data transfers via [Aspera](https://www.ibm.com/products/aspera), [Globus](https://www.globus.org/data-transfer), FTP, and HTTPS, making it ideal for scalable and reproducible pipelines. Unlike other tools such as ppx [@Fondrie2021-xk], which primarily support data downloads from ProteomeXchange databases using the HTTP protocol, pridepy provides advanced functionality by leveraging multiple protocols and the latest PRIDE API to access both public and private datasets.
 
 # Methods
 

diff --git a/paper/paper.pdf b/paper/paper.pdf