Grabbing genome accessions & download info based on taxonomy, using the NCBI REST API, w00t

ctb/2025-ncbi-rest-api

2025-ncbi-rest-api - examples of using the NCBI Datasets API in Python

This repo contains demo and example code that uses the NCBI Datasets REST API to grab the accessions of all (reference) genomes under a given taxonomic node, and to save, retrieve, and manipulate the resulting information for fun and profit.

The Snakefile provides a few different examples, including the use of the sourmash directsketch plugin to download all of the genomes in bulk.
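To give a feel for the kind of query involved, here is a minimal sketch of paging through genome reports for a taxon. It is not taken from this repo's code: the endpoint path, the `filters.reference_only` and `page_size` parameters, the `api-key` header, and the `reports` / `next_page_token` response fields reflect my understanding of the NCBI Datasets v2 API and should be checked against the official documentation. The function names `fetch_page` and `collect_accessions` are hypothetical.

```python
import json
import os
import urllib.parse
import urllib.request

# Assumed base URL for the NCBI Datasets v2 REST API; verify against the NCBI docs.
BASE_URL = "https://api.ncbi.nlm.nih.gov/datasets/v2"


def fetch_page(taxon, page_token=None, page_size=1000):
    """Fetch one page of genome dataset reports for a taxon (makes a network call)."""
    # Parameter names below are assumptions about the v2 API.
    params = {"filters.reference_only": "true", "page_size": str(page_size)}
    if page_token:
        params["page_token"] = page_token
    url = (
        f"{BASE_URL}/genome/taxon/{urllib.parse.quote(str(taxon))}"
        f"/dataset_report?{urllib.parse.urlencode(params)}"
    )
    headers = {"Accept": "application/json"}
    api_key = os.environ.get("NCBI_API_KEY")
    if api_key:
        headers["api-key"] = api_key  # assumed header name
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def collect_accessions(taxon, fetch=fetch_page):
    """Follow next_page_token until every accession has been collected."""
    accessions = []
    page_token = None
    while True:
        page = fetch(taxon, page_token)
        for report in page.get("reports", []):
            accessions.append(report["accession"])
        page_token = page.get("next_page_token")
        if not page_token:
            return accessions
```

Under these assumptions, `collect_accessions("2759")` would walk every page of reference genomes under Eukaryota (taxid 2759). The `fetch` argument exists so the paging loop can be exercised with a stub instead of a live request.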

Specifically, this repo contains code to:

  • Download a dataset zip for one or more accessions.
  • Retrieve genome accessions for all eukaryotic genomes.
  • Create "subtracted" lists for paraphyletic taxonomic groups such as invertebrates, non-bilateria, and "other" eukaryotes.
  • Download 10 fungal genome sequences.
  • Retrieve NCBI lineage information for a given taxid using pytaxonkit.

and maybe more.
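The "subtracted" lists mentioned above boil down to a set difference: take the accessions under a parent node and drop those that belong to one or more child nodes. The sketch below is one way to do that, not necessarily how the Snakefile does it; `subtract_accessions` is a hypothetical helper and the accession strings are made-up placeholders.

```python
def subtract_accessions(parent, *children):
    """Return accessions in `parent` that appear in none of `children`, keeping order."""
    excluded = set().union(*children)
    return [acc for acc in parent if acc not in excluded]


# Placeholder accessions, for illustration only.
metazoa = ["GCF_A", "GCF_B", "GCF_C"]
vertebrates = ["GCF_B"]

# "Invertebrates" here means everything under Metazoa that is not a vertebrate.
invertebrates = subtract_accessions(metazoa, vertebrates)
print(invertebrates)  # ['GCF_A', 'GCF_C']
```

Passing several child lists at once (for example, every named eukaryotic group) would yield an "other" list in the same way.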

Running this code

To run, set your NCBI API key like so:

export NCBI_API_KEY=foobarbaz
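If you call the API from your own Python code, it can help to fail early when the key is missing. This is a small sketch rather than code from this repo; `get_api_key` and `auth_headers` are hypothetical helpers, and the `api-key` header name is my understanding of what the NCBI Datasets API expects.

```python
import os


def get_api_key(env=None):
    """Return the NCBI API key from the environment, or raise a clear error."""
    env = os.environ if env is None else env
    key = env.get("NCBI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("NCBI_API_KEY is not set; export it before running.")
    return key


def auth_headers(api_key):
    """Build the request header carrying the key (header name is an assumption)."""
    return {"api-key": api_key}
```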

Create a conda environment or otherwise install the things in environment.yml:

conda env create -n ncbi-rest-api -f environment.yml
conda activate ncbi-rest-api

Then run Snakemake (the -p flag prints each shell command as it is executed):

snakemake -p

to do some basic things.

Appendix: getting an API key

Follow these instructions.

Related repos

Support

I can't guarantee support for this code, of course, but odds are good that if you find a bug or need a fix, the fix will be useful to me and others. Please file an issue with any questions or comments! And feel free to say hi over on Bluesky.

C. Titus Brown, 1/26/2025

me on Bluesky
