The CEM Dataset

Overview

The CEM dataset is an unlabeled collection of 2D cellular electron microscopy (EM) images designed for self-supervised learning. Gathered from over 2 PB of data, it is heterogeneous enough to cover a wide variety of organisms, tissues, and imaging methods.

Resources

CEM1.5M: The newest release of the dataset with 1.5 million images.
CEM500K: The first release of the dataset with 500 thousand images.
CEM1.5M Pre-trained Weights: PyTorch weights for a ResNet50 model pre-trained on CEM1.5M using the SwAV algorithm.
CEM500K Pre-trained Weights: PyTorch weights for a ResNet50 model pre-trained on CEM500K using the MoCoV2 algorithm.
CEM Patch Filtering Weights: PyTorch weights for a ResNet34 model trained on 12,000 EM images that were labeled as "informative" or "uninformative". Used to curate patches in the CEM dataset.
cem-dataset: Source code to reproduce the results of our paper; scripts to preprocess, standardize, and curate 2D and 3D EM datasets; scripts to download and prepare the EMOrganelles benchmark datasets (including the All Mitochondria benchmark established in the CEM500K paper); Snakemake files to evaluate pre-trained models on the benchmarks; and explanatory Jupyter notebooks.
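
As a rough illustration of how the pre-trained ResNet50 weights listed above might be loaded into a torchvision model, here is a minimal sketch. The checkpoint layout is an assumption, not something documented here: it supposes the file holds a `state_dict` whose backbone keys carry a wrapper prefix such as `module.encoder_q.` (the convention used by the reference MoCo code) and that the projection head lives under `fc.`. The function names, the default prefix, and the file path are hypothetical; inspect the keys of the actual checkpoint before relying on any of this.

```python
from collections import OrderedDict


def extract_backbone_weights(state_dict, prefix="module.encoder_q.", drop=("fc.",)):
    """Keep only the keys under `prefix`, strip the prefix, and skip head layers.

    `prefix` and `drop` are assumptions about the checkpoint layout.
    """
    backbone = OrderedDict()
    for key, value in state_dict.items():
        if not key.startswith(prefix):
            continue  # e.g. momentum-encoder or optimizer-related entries
        new_key = key[len(prefix):]
        if new_key.startswith(drop):
            continue  # projection head, not part of the ResNet backbone
        backbone[new_key] = value
    return backbone


def load_pretrained_resnet50(checkpoint_path, prefix="module.encoder_q."):
    """Build a torchvision ResNet50 and load backbone weights from a checkpoint."""
    # Imported here so the key-renaming helper above works without PyTorch.
    import torch
    from torchvision.models import resnet50

    checkpoint = torch.load(checkpoint_path, map_location="cpu")
    # Some checkpoints wrap the weights in a "state_dict" entry; others do not.
    state_dict = checkpoint.get("state_dict", checkpoint)

    model = resnet50()
    # strict=False tolerates the missing classification head; the returned
    # object lists missing and unexpected keys so the result can be checked.
    result = model.load_state_dict(
        extract_backbone_weights(state_dict, prefix), strict=False
    )
    return model, result
```

A call such as `model, result = load_pretrained_resnet50("cem500k_weights.pth")` (hypothetical file name) would then return the model together with the lists of missing and unexpected keys. If nearly every key is reported missing, the prefix assumption is probably wrong for that checkpoint. Note that `strict=False` does not ignore shape mismatches, so a checkpoint whose layers differ in shape from the stock torchvision ResNet50 would still raise an error.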

Citing this work

If you find any of these resources useful in your work, please cite:

@article {Conrad2021,
	author = {Conrad, Ryan and Narayan, Kedar},
	doi = {10.7554/eLife.65894},
	issn = {2050-084X},
	journal = {eLife},
	month = {apr},
	title = {{CEM500K, a large-scale heterogeneous unlabeled cellular electron microscopy image dataset for deep learning}},
	url = {https://elifesciences.org/articles/65894},
	volume = {10},
	year = {2021}
}
