Semi-Supervised Recognition Challenge - FGVC7

This project contains my code for the CVPR 2020 challenge on Semi-Supervised Recognition.

Description

This CVPR 2020 machine learning challenge focuses on learning from partially labeled data, a form of semi-supervised learning. The dataset is designed to expose challenges encountered in a realistic setting, such as fine-grained similarity between classes, significant class imbalance, and domain mismatch between the labeled and unlabeled data.

Overview of Approach

I approached the problem primarily as a transfer learning task and combined best practices from five related domains to improve performance.

Major Components of the Project

Transfer learning (ImageNet --> iNat2020)
Fine-grained classification
Long-tail classification
Semi-supervised learning (a huge unlabeled dataset)
Learning from out-of-distribution data
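
To make the last two components more concrete, here is a minimal sketch of confidence-thresholded pseudo-labeling, a common way to use unlabeled data while skipping images the model is unsure about (for example, out-of-class images). This README does not describe the training code, so this is an illustration and not necessarily the method used in this project. The function name `select_pseudo_labels`, the threshold of 0.9, and the toy probabilities are all assumptions made for the example.

```python
def select_pseudo_labels(probabilities, threshold=0.9):
    """Pick unlabeled examples whose top predicted probability reaches `threshold`.

    probabilities: list of per-class probability lists, one per unlabeled image.
    Returns a list of (image_index, predicted_class) pairs.
    NOTE: hypothetical helper for illustration; the 0.9 threshold is an assumption.
    """
    selected = []
    for i, probs in enumerate(probabilities):
        # Index of the most likely class for this image.
        top_class = max(range(len(probs)), key=probs.__getitem__)
        if probs[top_class] >= threshold:
            selected.append((i, top_class))
    return selected


# Toy model outputs for three unlabeled images over three classes.
toy_probs = [
    [0.95, 0.03, 0.02],  # confident prediction: kept as class 0
    [0.40, 0.35, 0.25],  # uncertain (possibly out-of-class): dropped
    [0.05, 0.92, 0.03],  # confident prediction: kept as class 1
]

pseudo = select_pseudo_labels(toy_probs)
print(pseudo)
```

In practice the selected pairs would be added to the labeled training set for another round of training; a stricter threshold keeps fewer but cleaner pseudo-labels.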

Competition Results

Data

This challenge focuses on Aves (bird) classification: the organizers provide labeled data for the target classes and unlabeled data from both target and non-target classes. The data is obtained from iNaturalist, a community-driven project aimed at collecting biodiversity observations.

The dataset comes with standard training, validation, and test sets. The training set consists of:

labeled images from 200 species of Aves (birds), where 10% of the images are labeled.
unlabeled images from the same set of classes as the labeled images (in-class).
unlabeled images from a different set of classes than the labeled set (out-of-class). These images come from other classes within the Aves taxon. This reflects a common scenario where a coarser taxonomic label of an image can be easily obtained.

The validation and test sets contain 10 and 20 images, respectively, for each of the 200 categories in the labeled set. The distributions of these images are shown in the table below.

Split	Details	Classes	Images
Train	Labeled	200	3,959
Train	Unlabeled, in-class	200	26,640
Train	Unlabeled, out-of-class	-	122,208
Val	Labeled	200	2,000
Test	Public	200	4,000
Test	Private	200	4,000
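
As a quick sanity check on the table, the sketch below derives a few ratios from the image counts. The counts are copied from the table; the variable names are chosen only for this example. It shows that labeled images make up roughly 13% of the in-class training images, and that out-of-class unlabeled images outnumber in-class ones by more than four to one.

```python
# Image counts copied from the table above.
splits = {
    ("Train", "Labeled"): 3959,
    ("Train", "Unlabeled, in-class"): 26640,
    ("Train", "Unlabeled, out-of-class"): 122208,
    ("Val", "Labeled"): 2000,
    ("Test", "Public"): 4000,
    ("Test", "Private"): 4000,
}
num_classes = 200

labeled = splits[("Train", "Labeled")]
in_class = splits[("Train", "Unlabeled, in-class")]
out_class = splits[("Train", "Unlabeled, out-of-class")]

# Share of in-class training images that carry a label.
labeled_fraction = labeled / (labeled + in_class)

# How many out-of-class unlabeled images there are per in-class unlabeled image.
ood_ratio = out_class / in_class

# Average images per class in the labeled train and validation splits.
avg_labeled_per_class = labeled / num_classes
val_per_class = splits[("Val", "Labeled")] / num_classes

print(f"labeled fraction of in-class train images: {labeled_fraction:.1%}")
print(f"out-of-class / in-class unlabeled ratio:   {ood_ratio:.2f}")
print(f"average labeled train images per class:    {avg_labeled_per_class:.1f}")
print(f"validation images per class:               {val_per_class:.0f}")
```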

The number of images per class follows a heavy-tailed distribution as shown in the Figure below.
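
A heavy-tailed class distribution is often handled by re-weighting the loss so that rare classes count more. The sketch below computes inverse-frequency ("balanced") class weights from a list of labels. It is a generic illustration under stated assumptions, not necessarily what this project does: the function name `inverse_frequency_weights` and the toy label list are hypothetical, and the real per-class counts are not given in this README.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return {class_id: weight} with weight = n_samples / (n_classes * class_count).

    Rare classes get weights above 1 and frequent classes below 1, and the
    weights average to 1 over the samples.
    NOTE: hypothetical helper for illustration only.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


# Toy imbalanced label list: class 0 is common, class 2 is rare.
toy_labels = [0] * 6 + [1] * 3 + [2] * 1

weights = inverse_frequency_weights(toy_labels)
for class_id in sorted(weights):
    print(class_id, round(weights[class_id], 3))
```

Such weights can be passed to a weighted cross-entropy loss or used to build a class-balanced sampler; which of the two works better depends on the model and the degree of imbalance.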
